Cloud Run

Cloud Run is a fully managed, serverless platform from Google Cloud that runs containerized applications without requiring you to manage infrastructure. Services are expected to be stateless: instances can be started or stopped at any time, so persistent data should live in an external store such as a database. Cloud Run scales your services automatically based on demand and bills only for the resources you use, although any minimum instances you keep running are billed even while idle.

Cloud Run Microservices

We currently run three microservices—Users, Academy, and AI—on Google Cloud Run.

Users Microservice: 8 instances minimum
Academy & AI Microservices: 3 instances minimum each
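
As a rough illustration of how these minimums could be applied, the sketch below builds the `gcloud run services update` command for each service using the `--min-instances` flag. The service names (`users`, `academy`, `ai`) and the region are assumptions made for the example, not values taken from our actual deployment.

```python
import shlex

# Minimum instance counts taken from the list above.
# The service names and the region are hypothetical placeholders.
MIN_INSTANCES = {"users": 8, "academy": 3, "ai": 3}
REGION = "us-central1"  # assumption: replace with the region actually in use


def build_update_command(service: str, min_instances: int, region: str = REGION) -> list[str]:
    """Return the gcloud argument list that sets a service's minimum instances."""
    if min_instances < 0:
        raise ValueError("min_instances must be non-negative")
    return [
        "gcloud", "run", "services", "update", service,
        f"--min-instances={min_instances}",
        f"--region={region}",
    ]


if __name__ == "__main__":
    # Print the commands instead of running them, so they can be reviewed first.
    for service, count in MIN_INSTANCES.items():
        print(shlex.join(build_update_command(service, count)))
```

Printing the commands rather than executing them keeps the sketch safe to run anywhere; the output can be pasted into a shell once the names and region are confirmed.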

Why These Minimum Instances?

Performance: Keeping more instances running helps avoid cold starts during traffic spikes.
Cost vs. Convenience: Our Google Cloud grant allows us to prioritize user experience by running more instances. Over time, we will likely optimize these numbers to reduce costs once the grant no longer covers a significant portion of usage.

Monitoring and Latency

When reviewing user experience, pay close attention to the 95th percentile (p95) latency. This is the response time within which 95% of requests complete, so it captures the slow end of what most users experience:

If the p95 latency is too high, roughly 1 in 20 requests is taking at least that long, and some users are experiencing significant delays.
Investigate the cause (e.g., cold starts, insufficient resources, code inefficiencies) and adjust configurations or optimize the code as needed.
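
To make the p95 metric concrete, here is a minimal sketch that computes a percentile from a list of request latencies using the nearest-rank method. The sample latencies and the 1000 ms alert threshold are made up for illustration; Cloud Run's own dashboards may compute percentiles differently.

```python
import math


def percentile(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank into the sorted list
    return ordered[rank - 1]


# Hypothetical latencies in milliseconds: 18 fast requests and 2 slow ones
# (for example, requests that hit a cold start).
latencies_ms = [100.0] * 18 + [2400.0, 2600.0]
P95_THRESHOLD_MS = 1000.0  # assumption: an illustrative alert threshold

median = percentile(latencies_ms, 50)
p95 = percentile(latencies_ms, 95)

if __name__ == "__main__":
    print(f"median={median} ms, p95={p95} ms")
    if p95 > P95_THRESHOLD_MS:
        print("p95 latency is above the threshold; investigate the slow requests")
```

In this sample the median looks healthy while the p95 exposes the slow requests, which is why the p95 figure is the more useful signal for user experience.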

By balancing minimum instances against cost and carefully monitoring p95 latency, we aim to deliver a responsive platform while keeping operational costs manageable.

Here is an example of instance monitoring:

Introduction to Cloud Build