
Building a Vercel-like Platform on Cloud Run

  • Platform Engineering
  • GCP
  • Cloud Run
  • DevOps
  • PaaS

Vercel gets the deployment loop right: push to git, build, deploy, get a URL, inspect the result from one dashboard. We wanted that shape inside a large GCP environment, but for internal apps that did not justify the standard GKE path.

The existing route to production was heavy for small web apps: repository scaffolding, CI/CD wiring, Helm charts, Backstage provisioning, and coordination with the platform org. Teams could build a proof of concept quickly, then wait days or weeks before a live URL existed with authentication in front of it.

This post covers the technical shape of the internal platform we built: Cloud Build for builds, Artifact Registry for images, Cloud Run for runtime, Secret Manager for credentials, a shared HTTPS routing layer, Terraform for URL-map updates, and a log pipeline that made build and runtime output feel like one console.

For the short portfolio version, see Internal deployment platform on Cloud Run.


Architecture overview

The core path was intentionally boring: source control triggered a build, the build produced a container image, the image landed in Artifact Registry, and Cloud Run served it.

The developer contract was "connect a repository, push to a branch, and get an authenticated URL." Behind that, the platform owned build detection, image publishing, Cloud Run deployment, secret injection, routing, and access posture.

That contract mattered more than matching Vercel feature for feature. The goal was not to reproduce the whole Vercel platform on GCP. The goal was to build a narrow golden path for internal apps that were too small for GKE and too important to run from a laptop.


Why Cloud Run

GCP gave us several compute options. GKE was the production standard, but it was too much machinery for small internal apps. App Engine was simpler, but less aligned with the container model teams already understood. Cloud Functions fit event handlers, not full web apps with many routes.

Cloud Run sat in the middle. Teams could bring a container, and Google operated the runtime. We got scale to zero, revisions, traffic splitting primitives, managed HTTPS integration, and no node pools to patch.

Scale to zero was important. Many internal apps and previews receive a few requests per day. Keeping a Kubernetes deployment warm for each one would have made the platform expensive before it proved useful.

We accepted the Cloud Run constraints:

  • Cold starts were acceptable for most internal tools. Latency-sensitive production services could opt into minimum instances.
  • Long-running HTTP requests and background work needed a different path, such as queues, workers, or Cloud Run Jobs.
  • The platform was request-oriented by default. It was not meant to host every workload pattern.

That tradeoff was the point. Cloud Run covered a large class of internal applications without making every team learn cluster operations.


Build detection

We needed a build path that worked for teams with and without container expertise.

The rule was simple: if a repository had a Dockerfile in the build context, the platform used it. If not, the platform used buildpacks. That gave experienced teams an escape hatch while keeping the first deployment path accessible for teams that only had application source code.
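As a rough illustration, the rule above fits in one function. This is a minimal sketch, not the platform's actual code: the `Builder` enum and `detect_builder` name are hypothetical, and it assumes the Dockerfile sits at the root of the build context under the conventional name.

```python
from enum import Enum
from pathlib import Path


class Builder(Enum):
    DOCKERFILE = "dockerfile"
    BUILDPACKS = "buildpacks"


def detect_builder(build_context: Path) -> Builder:
    """Use the Dockerfile if the build context has one, otherwise buildpacks."""
    # Assumption: only a file literally named "Dockerfile" at the context root counts.
    if (build_context / "Dockerfile").is_file():
        return Builder.DOCKERFILE
    return Builder.BUILDPACKS
```

Keeping the decision in one place is what makes it a single explicit branch: every later stage only needs to know which `Builder` was selected.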

Buildpacks were the right default because they supported common stacks without requiring every team to write a container file. Framework-specific builders would have multiplied the number of platform-maintained paths. Docker plus buildpacks gave us one explicit branch in the build logic instead of a growing matrix of special cases.

Cloud Build cloned the repository, ran the selected builder, and pushed the image to Artifact Registry. Cloud Run then deployed a new revision from that image. The image was the boundary between build and runtime.
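A sketch of what that build step could look like as a generated Cloud Build config. The function names are hypothetical, and the builder images (`gcr.io/cloud-builders/docker`, `gcr.io/k8s-skaffold/pack`, `gcr.io/buildpacks/builder`) are assumptions based on Google's public examples, not necessarily what the platform used. The deploy step is left out on purpose, since the image reference is the only thing the runtime side needs.

```python
def image_ref(region: str, project: str, repo: str, app: str, commit_sha: str) -> str:
    """Artifact Registry image path: REGION-docker.pkg.dev/PROJECT/REPO/IMAGE:TAG."""
    return f"{region}-docker.pkg.dev/{project}/{repo}/{app}:{commit_sha}"


def cloud_build_config(use_dockerfile: bool, image: str) -> dict:
    """Return a Cloud Build config (as a dict) that builds and publishes `image`."""
    if use_dockerfile:
        return {
            "steps": [
                {
                    "name": "gcr.io/cloud-builders/docker",
                    "args": ["build", "-t", image, "."],
                }
            ],
            # Cloud Build pushes the images listed here after the steps succeed.
            "images": [image],
        }
    return {
        "steps": [
            {
                "name": "gcr.io/k8s-skaffold/pack",
                "entrypoint": "pack",
                # --publish makes pack push the image itself, so no "images" key.
                "args": [
                    "build",
                    image,
                    "--builder=gcr.io/buildpacks/builder",
                    "--publish",
                ],
            }
        ]
    }
```

Both branches end in the same place: an image at a known Artifact Registry path, which is the only input the deployment side has to understand.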


Configuration and secrets

Configuration had two categories: non-secret environment variables and sensitive values.

Non-secret configuration could be managed through the platform UI. Sensitive values lived in Secret Manager and were mapped into Cloud Build or Cloud Run by name. That kept credentials out of repository config and avoided treating the platform database as a secret store.

Some values were needed at build time, such as private package registry tokens or framework build variables. Others were needed only at runtime. The platform modeled both paths so teams did not have to flatten every value into one generic environment list.

Each application could override values per environment. That was necessary because the same app often needed different database URLs, service endpoints, or feature flags between staging and production.
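One way to picture that model is a small resolver that keeps build-time and runtime values apart and layers per-environment overrides on top of shared defaults. The type and field names below are hypothetical. Secrets appear only as references to a Secret Manager name, never as values.

```python
from dataclasses import dataclass, field
from enum import Enum


class Phase(Enum):
    BUILD = "build"
    RUNTIME = "runtime"


@dataclass(frozen=True)
class Plain:
    value: str  # non-secret value, safe to store in the platform database


@dataclass(frozen=True)
class SecretRef:
    name: str  # Secret Manager secret name; the value is never stored here


@dataclass
class AppConfig:
    # phase -> variable name -> Plain | SecretRef, shared by all environments
    defaults: dict = field(default_factory=dict)
    # environment -> phase -> variable name -> Plain | SecretRef
    overrides: dict = field(default_factory=dict)

    def resolve(self, environment: str, phase: Phase) -> dict:
        """Merge defaults with the environment's overrides for one phase."""
        merged = dict(self.defaults.get(phase, {}))
        merged.update(self.overrides.get(environment, {}).get(phase, {}))
        return merged
```

For example, a staging override for `DATABASE_URL` replaces the default at runtime, while a build-time registry token stays out of the runtime list entirely.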


Routing and URL maps

Cloud Run gives every service a run.app URL, but that was not enough for the product we wanted. Teams needed predictable internal hostnames behind the organization's HTTPS and access policies.

We put Cloud Run services behind a shared HTTP(S) load balancer using serverless NEGs. Each application received a hostname derived from the service and pool it belonged to. Google-managed certificates handled TLS.

The hard part was keeping the load balancer URL map aligned with the platform's control plane. A manually edited URL map would not scale. We used Firestore as the source of truth for projects and environments, then generated and applied Terraform to update the routing layer.
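A simplified sketch of the generation step, assuming the Firestore records have already been loaded into plain dicts with hypothetical `hostname`, `service`, and `backend` fields. It renders a `google_compute_url_map` resource in Terraform's JSON syntax; the resource and URL map names are placeholders.

```python
import json


def render_url_map(apps: list[dict], default_backend: str) -> str:
    """Render one URL map with a host rule and path matcher per application."""
    host_rules = []
    path_matchers = []
    # Sort so the same control-plane state always produces the same file,
    # which keeps `terraform plan` diffs limited to real changes.
    for app in sorted(apps, key=lambda a: a["hostname"]):
        matcher = app["service"]
        host_rules.append({"hosts": [app["hostname"]], "path_matcher": matcher})
        path_matchers.append({"name": matcher, "default_service": app["backend"]})

    doc = {
        "resource": {
            "google_compute_url_map": {
                "platform": {
                    "name": "platform-url-map",
                    "default_service": default_backend,
                    "host_rule": host_rules,
                    "path_matcher": path_matchers,
                }
            }
        }
    }
    return json.dumps(doc, indent=2, sort_keys=True)
```

Written to a `.tf.json` file, the output can be applied like any other Terraform configuration. The important property is that the file is derived from the control-plane records and never edited by hand.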

This worked, but it was heavier than I would choose today. Cloud Run Domain Mapping would have removed much of the URL-map machinery, but it was not available in Canada when we built the first version. We paid the load-balancer complexity because the regional constraint was real.


Identity-aware access by default

Internal apps needed authentication from the first deployment. The default could not be "public until someone remembers to lock it down."

We put identity-aware access in front of deployed apps by default. That made security the path of least resistance: connect repo, deploy, get a URL, require sign-in. Teams could request a different posture when they needed it, but the platform default was private.

This was also why a single cohesive deployment path mattered. Some routing alternatives would have simplified parts of the topology, but they would have split the product across more control planes or packaging models. Non-technical users were part of the audience, so keeping one repo-push deployment story mattered more than optimizing edge mechanics for power users.


Tenancy and isolation

We started with one shared footprint because it was the fastest way to prove the platform. That simplified bootstrapping, but it created the usual problems: noisy neighbors, fuzzy ownership, shared quotas, and painful teardown.

As adoption grew, the tenancy model needed a path forward:

  • Shared footprint: fastest to bootstrap, weakest for ownership and cleanup.
  • Pooled isolation: workloads grouped into isolated pools with clearer billing and operational boundaries.
  • Dedicated boundary: a path for teams that outgrew the pool and needed their own cloud project or account boundary.

The long-term shape was not "everything in one project forever." It was a progression. Teams could start in a managed pool and move toward stronger isolation when their workload justified it.

Given a less constrained org-level GCP project budget, I would automate project creation and API enablement earlier. Cloud Run, Artifact Registry, Cloud Build, IAM, DNS, and logging all need setup before a tenant is useful. Terraform handled the infrastructure, but project lifecycle automation should not have been the bottleneck for every new tenant.
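The API enablement part of that automation is small enough to sketch. The service names below are my mapping of the products listed above to their `googleapis.com` identifiers, and the helper names are hypothetical. The command builder only assembles a `gcloud services enable` invocation and does not run it.

```python
REQUIRED_APIS = frozenset(
    {
        "run.googleapis.com",
        "artifactregistry.googleapis.com",
        "cloudbuild.googleapis.com",
        "iam.googleapis.com",
        "dns.googleapis.com",
        "logging.googleapis.com",
    }
)


def missing_apis(enabled: set[str]) -> list[str]:
    """APIs a tenant project still needs, in a stable order."""
    return sorted(REQUIRED_APIS - set(enabled))


def enable_command(project_id: str, apis: list[str]) -> list[str]:
    """Build the gcloud invocation; returns [] when nothing is missing."""
    if not apis:
        return []
    return ["gcloud", "services", "enable", *apis, "--project", project_id]
```

Computing the difference first makes the step idempotent: a project that is already set up yields an empty command instead of a redundant call.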


Build logs and runtime logs

Developers expected build logs and runtime logs in one place. GCP split them across Cloud Build and Cloud Run.

For build logs, the platform could stream directly from the build system. Runtime logs were harder because workloads could live in team-owned projects. We shipped a Terraform module that teams could run in their project to forward Cloud Run logs through Pub/Sub. The platform subscribed to those messages, merged the streams, and pushed updates to the UI with server-sent events.
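The merge and the server-sent events framing can be sketched independently of Pub/Sub. This example assumes each stream is already sorted by timestamp and finite. A live tail would need buffering to handle late or out-of-order messages, which is omitted here. The `LogLine` shape and the `log` event name are hypothetical.

```python
import heapq
import json
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass(frozen=True)
class LogLine:
    timestamp: float  # seconds since epoch
    source: str  # "build" or "runtime"
    message: str


def merge_streams(
    build: Iterable[LogLine], runtime: Iterable[LogLine]
) -> Iterator[LogLine]:
    """Interleave two timestamp-sorted streams into one ordered tail."""
    return heapq.merge(build, runtime, key=lambda line: line.timestamp)


def to_sse(line: LogLine) -> str:
    """Frame one log line as a server-sent event."""
    payload = json.dumps(
        {"ts": line.timestamp, "source": line.source, "message": line.message}
    )
    # An SSE message is a set of "field: value" lines ended by a blank line.
    return f"event: log\ndata: {payload}\n\n"
```

Encoding the payload as JSON keeps each event on a single `data:` line even when a log message contains newlines, because `json.dumps` escapes them.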

This was one of the places where product shape mattered. A platform dashboard that deploys an app but sends users elsewhere for every debug loop is not a complete workflow. The console needed to feel like one tail, even if the data came from different GCP systems.


What we did not build

We scoped the first version to the path from git to an authenticated Cloud Run URL. A lot of Vercel-like features did not make that cut.

Preview deployments per PR. Cloud Run revisions gave us the underlying primitive, but we did not build the full lifecycle around pull requests, preview hostnames, cleanup, and UI state in the first version. The routing layer was already the most complex part of the platform, so adding preview URL churn would have expanded the riskiest area first.

Edge runtime features. The platform ran apps in Cloud Run regions. We did not add a separate edge runtime or routing compute layer. Most workloads were internal tools, APIs, and server-rendered pages where regional Cloud Run was enough.

Analytics dashboards. GCP already had Cloud Monitoring, Logging, and Trace. We linked to the relevant places and surfaced the logs developers needed during deploys, but we did not rebuild Vercel Analytics inside the platform.

Managed databases and storage. We did not bundle database, cache, or blob provisioning into v1. The direction was to let teams provision GCP-native resources through maintained templates or approved platform flows, not to make the deployment product own every backing service.

Platform-level caching and static optimization. Apps controlled their own caching behavior. If a Next.js app used ISR or static assets correctly, the container could serve that behavior. We did not add a platform-level cache abstraction in front of every app.

That constraint kept the first version deliverable. The goal was a narrow golden path, not a full cloud platform.


What I would keep

I would keep the git-centric workflow and the buildpack-or-Docker split. Letting teams arrive with or without a Dockerfile preserved accessibility without blocking power users.

I would keep identity-aware access as the default posture for internal deployments. Security should be the path of least resistance, not a separate project.

I would keep separating the control-plane data model from how the platform rendered infrastructure. Firestore described what should exist; Terraform reconciled the GCP routing layer.


What I would change

I would avoid owning URL-map complexity if the regional product surface allowed it. Domain Mapping would have removed a large amount of routing machinery.

I would automate tenant project creation and API enablement earlier. Terraform was useful, but manual project lifecycle setup still created friction for teams moving to stronger isolation.

I would add a first-class GKE path for workloads whose production standard remained Kubernetes. Cloud Run was the right simple path, but not the only path large teams needed.

The lesson is the same one from Gatekeeping vs. Golden Paths: a platform does not need to own every option. It needs to make the preferred path obvious, fast, and safe enough that teams choose it willingly.


Made with ❤️ in 🇨🇦 · Copyright © 2026 Valentin Prugnaud