From 0bd1a7350ff54a92c8ae7ff6f7b62c114fc8bebd Mon Sep 17 00:00:00 2001
From: Danny Avila
Date: Mon, 8 Jun 2026 18:44:52 -0400
Subject: [PATCH] 👷 ci: Add API runtime smoke (boot the production image) to docker-smoke (#13605)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

* 👷 ci: Add API runtime smoke (boot the production image) to docker-smoke

The docker-smoke workflow only built the `client-package-build` stage and
never booted the runtime, so it couldn't catch the class of regression
that recently took production down: the api tsdown bundle externalizes
runtime deps that, after `npm ci --omit=dev`, were missing from the image
(`Cannot find module 'get-stream'`).

- Add an `api-runtime-smoke` job that builds the real production image
  (final `api-build` stage, `npm ci --omit=dev`), then:
  1. loads the @librechat/api bundle's full require graph in the pruned
     image (deterministic, no DB) — fails on any missing/ESM-incompatible
     runtime dependency.
  2. boots the actual entrypoint and asserts no module-load crash (the
     server loads its require graph before connecting to Mongo, so this
     surfaces without a database).
- Expand triggers to include `packages/api/**`,
  `packages/data-schemas/**`, and `api/package.json` (previously a
  packages/api change only triggered this via a root lockfile change, and
  even then only built the client stage).
- Add gha build cache + concurrency cancellation to bound CI cost.

* 👷 ci: Address Codex review — boot smoke against real Mongo + crash detection

- Boot the production image against a real MongoDB container with the env
  the server needs, so the *entire* require graph loads.
  `api/db/connect.js` throws at module scope without `MONGO_URI` and is
  imported before models/services/routes, so the previous no-env boot
  exercised almost none of the legacy API graph. (Codex finding 2)
- Gate on `/health` returning 200 AND the container staying alive,
  failing on any container exit. A non-module startup crash
  (ReferenceError, SyntaxError, bad config) now fails the smoke instead
  of slipping past a missing-module grep. (Codex finding 3)
- Expand trigger from `api/package.json` to `api/**`, since the image
  copies the whole `api/` tree and runs `node server/index.js`.
  (Codex finding 1)

* 👷 ci: Address Codex round 2 — poll /readyz + cover all image inputs

- Poll /readyz instead of /health. /health returns 200 at app.listen, but
  initializeMCPs() and checkMigrations() run *after* listen and
  process.exit(1) on failure; /readyz only returns 200 once serverReady
  is set after those complete. So post-listen startup crashes now fail
  the smoke too. (finding A)
- Expand triggers to every source tree copied into the production image:
  client/**, config/**, skill/** (the final stage copies client/dist,
  config, and skill). (finding B)
---
 .github/workflows/docker-smoke.yml | 92 ++++++++++++++++++++++++++++++
 1 file changed, 92 insertions(+)

diff --git a/.github/workflows/docker-smoke.yml b/.github/workflows/docker-smoke.yml
index d3f313b571..3780959d8b 100644
--- a/.github/workflows/docker-smoke.yml
+++ b/.github/workflows/docker-smoke.yml
@@ -9,12 +9,22 @@ on:
       - 'Dockerfile.multi'
       - 'package.json'
       - 'package-lock.json'
+      - 'api/**'
+      - 'client/**'
+      - 'config/**'
+      - 'skill/**'
+      - 'packages/api/**'
       - 'packages/client/**'
       - 'packages/data-provider/**'
+      - 'packages/data-schemas/**'
 
 permissions:
   contents: read
 
+concurrency:
+  group: docker-smoke-${{ github.workflow }}-${{ github.ref }}
+  cancel-in-progress: true
+
 jobs:
   client-package-target:
     name: Build Docker client package target
@@ -34,3 +44,85 @@ jobs:
           platforms: linux/amd64
           push: false
           target: client-package-build
+
+  api-runtime-smoke:
+    name: API runtime smoke (production image boots)
+    runs-on: ubuntu-latest
+    timeout-minutes: 30
+    steps:
+      - uses: actions/checkout@v4
+
+      - name: Set up Docker Buildx
+        uses: docker/setup-buildx-action@v3
+
+      # Build the real production image (final `api-build` stage), which installs
+      # with `npm ci --omit=dev` — the same prune that, in prod, exposed runtime
+      # dependencies the tsdown bundle externalizes but were never declared.
+      - name: Build production image
+        uses: docker/build-push-action@v5
+        with:
+          context: .
+          file: Dockerfile.multi
+          platforms: linux/amd64
+          push: false
+          load: true
+          tags: librechat-api-smoke:ci
+          cache-from: type=gha,scope=docker-smoke-api
+          cache-to: type=gha,mode=max,scope=docker-smoke-api
+
+      # Loads the entire externalized require graph of the built @librechat/api
+      # bundle inside the pruned production image. A missing or ESM-incompatible
+      # runtime dependency (e.g. the `get-stream` regression) fails here with a
+      # non-zero exit — deterministically, with no database required.
+      - name: Verify production image resolves all runtime modules
+        run: |
+          docker run --rm librechat-api-smoke:ci \
+            node -e "require('@librechat/api'); require('@librechat/api/telemetry'); console.log('module resolution OK')"
+
+      # Boot the real entrypoint against a real MongoDB so the *entire* server
+      # require graph loads (api/db throws at module scope without MONGO_URI, and
+      # is imported before models/services/routes), then gate on /readyz AND the
+      # container staying alive. /readyz only returns 200 after the post-listen
+      # startup (initializeMCPs + checkMigrations) sets serverReady, and those
+      # steps process.exit(1) on failure — so ANY startup crash (missing module,
+      # ReferenceError, bad config, post-listen failure) fails the smoke.
+      - name: Boot production image against MongoDB and poll /readyz
+        run: |
+          set -u
+          docker network create lc-smoke
+          docker run -d --name lc-mongo --network lc-smoke mongo:8.0.20
+          docker run -d --name lc-api --network lc-smoke -p 3080:3080 \
+            -e HOST=0.0.0.0 -e PORT=3080 \
+            -e NODE_ENV=production \
+            -e MONGO_URI=mongodb://lc-mongo:27017/LibreChat \
+            -e CREDS_KEY=0123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef \
+            -e CREDS_IV=0123456789abcdef0123456789abcdef \
+            -e JWT_SECRET=docker-smoke-jwt-secret \
+            -e JWT_REFRESH_SECRET=docker-smoke-jwt-refresh-secret \
+            -e SEARCH=false \
+            librechat-api-smoke:ci
+
+          healthy=""
+          for i in $(seq 1 60); do
+            if [ "$(docker inspect -f '{{.State.Running}}' lc-api 2>/dev/null)" != "true" ]; then
+              echo "::error::API container exited during startup (exit code $(docker inspect -f '{{.State.ExitCode}}' lc-api 2>/dev/null))"
+              break
+            fi
+            if [ "$(curl -sS -o /dev/null -w '%{http_code}' http://localhost:3080/readyz 2>/dev/null || true)" = "200" ]; then
+              healthy="yes"
+              echo "/readyz returned 200 — server fully booted (post-listen startup complete)."
+              break
+            fi
+            sleep 2
+          done
+
+          echo "----- last 100 lines of api container logs -----"
+          docker logs lc-api 2>&1 | tail -100 || true
+          echo "------------------------------------------------"
+          docker rm -f lc-api lc-mongo >/dev/null 2>&1 || true
+          docker network rm lc-smoke >/dev/null 2>&1 || true
+
+          if [ -z "$healthy" ]; then
+            echo "::error::Production image failed to reach a ready /readyz within timeout"
+            exit 1
+          fi