LibreChat/.github/workflows/docker-smoke.yml
Danny Avila 0bd1a7350f
👷 ci: Add API runtime smoke (boot the production image) to docker-smoke (#13605)
* 👷 ci: Add API runtime smoke (boot the production image) to docker-smoke

The docker-smoke workflow only built the `client-package-build` stage and
never booted the runtime, so it couldn't catch the class of regression that
recently took production down: the api tsdown bundle externalizes runtime
deps that, after `npm ci --omit=dev`, were missing from the image
(`Cannot find module 'get-stream'`).

- Add an `api-runtime-smoke` job that builds the real production image
  (final `api-build` stage, `npm ci --omit=dev`), then:
  1. loads the @librechat/api bundle's full require graph in the pruned
     image (deterministic, no DB) — fails on any missing/ESM-incompatible
     runtime dependency.
  2. boots the actual entrypoint and asserts no module-load crash (the
     server loads its require graph before connecting to Mongo, so this
     surfaces without a database).
- Expand triggers to include `packages/api/**`, `packages/data-schemas/**`,
  and `api/package.json` (previously a packages/api change only triggered
  this via a root lockfile change, and even then only built the client stage).
- Add gha build cache + concurrency cancellation to bound CI cost.

* 👷 ci: Address Codex review — boot smoke against real Mongo + crash detection

- Boot the production image against a real MongoDB container with the env
  the server needs, so the *entire* require graph loads. `api/db/connect.js`
  throws at module scope without `MONGO_URI` and is imported before
  models/services/routes, so the previous no-env boot exercised almost none
  of the legacy API graph. (Codex finding 2)
- Gate on `/health` returning 200 AND the container staying alive, failing on
  any container exit. A non-module startup crash (ReferenceError, SyntaxError,
  bad config) now fails the smoke instead of slipping past a missing-module
  grep. (Codex finding 3)
- Expand trigger from `api/package.json` to `api/**`, since the image copies
  the whole `api/` tree and runs `node server/index.js`. (Codex finding 1)
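The gate described above has three outcomes. It fails as soon as the container stops running, passes only once the readiness probe succeeds, and otherwise times out. That logic can be factored into a reusable shell function. The sketch below is hypothetical: `wait_until_ready`, `api_alive` and `api_ready` are illustrative names that are not part of the workflow. The liveness and readiness checks are passed in as command names, so the loop can be exercised without Docker. The two example wrappers reuse the same `docker inspect` and `curl` calls the workflow step makes, assuming the container name `lc-api` and port 3080 from that step.

```shell
#!/bin/sh
# Hypothetical helper (not part of docker-smoke.yml): poll until a service is
# ready, failing fast if its process/container stops running.
# Usage: wait_until_ready ATTEMPTS INTERVAL_SECONDS ALIVE_CMD READY_CMD
#   ALIVE_CMD  exits 0 while the service is still running
#   READY_CMD  exits 0 once the service reports ready (e.g. HTTP 200)
# Prints "ready", "exited", or "timeout" and returns 0, 2, or 1 respectively.
wait_until_ready() {
  local attempts="$1" interval="$2" alive_cmd="$3" ready_cmd="$4"
  local i=1
  while [ "$i" -le "$attempts" ]; do
    # Check liveness first: a crashed service can never become ready,
    # so report the crash instead of waiting out the full timeout.
    if ! "$alive_cmd"; then
      echo "exited"
      return 2
    fi
    if "$ready_cmd"; then
      echo "ready"
      return 0
    fi
    sleep "$interval"
    i=$((i + 1))
  done
  echo "timeout"
  return 1
}

# Example wiring (assumes the container name and port used in the workflow).
api_alive() {
  [ "$(docker inspect -f '{{.State.Running}}' lc-api 2>/dev/null)" = "true" ]
}
api_ready() {
  [ "$(curl -sS -o /dev/null -w '%{http_code}' http://localhost:3080/readyz 2>/dev/null || true)" = "200" ]
}
# wait_until_ready 60 2 api_alive api_ready
```

Passing command names rather than hard-coding `docker` and `curl` keeps the ordering decision (liveness before readiness) testable with stub functions. The same approach would work for `/health` or `/readyz`.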

* 👷 ci: Address Codex round 2 — poll /readyz + cover all image inputs

- Poll /readyz instead of /health. /health returns 200 at app.listen, but
  initializeMCPs() and checkMigrations() run *after* listen and process.exit(1)
  on failure; /readyz only returns 200 once serverReady is set after those
  complete. So post-listen startup crashes now fail the smoke too. (finding A)
- Expand triggers to every source tree copied into the production image:
  client/**, config/**, skill/** (the final stage copies client/dist, config,
  and skill). (finding B)
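One rough way to sanity-check the expanded `pull_request.paths` list is to test changed file paths against the same patterns locally. The function below is a hypothetical sketch, not part of the workflow, and GitHub's own path-filter matching remains authoritative. It approximates the filter with shell `case` patterns. Inside `case`, `*` also matches `/`, so a pattern like `api/*` behaves like the workflow's `api/**`, while exact entries such as `package.json` match only the root file.

```shell
#!/bin/sh
# Hypothetical helper: would a change to the given path trigger docker-smoke?
# Mirrors the pull_request.paths list in docker-smoke.yml. In a `case`
# pattern `*` matches any string, including "/", approximating GitHub's `**`.
triggers_docker_smoke() {
  case "$1" in
    .github/workflows/docker-smoke.yml | .dockerignore | Dockerfile.multi | \
    package.json | package-lock.json | \
    api/* | client/* | config/* | skill/* | \
    packages/api/* | packages/client/* | \
    packages/data-provider/* | packages/data-schemas/*)
      return 0 ;;
    *)
      return 1 ;;
  esac
}
```

For example, feeding `git diff --name-only origin/main...HEAD` into a `while read -r f` loop that calls `triggers_docker_smoke "$f"` shows which files in a branch would start the smoke jobs. Treat the result as an approximation of GitHub's matcher, not a replacement for it.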
2026-06-08 18:44:52 -04:00

name: Docker Build Smoke Tests

on:
  workflow_dispatch:
  pull_request:
    paths:
      - '.github/workflows/docker-smoke.yml'
      - '.dockerignore'
      - 'Dockerfile.multi'
      - 'package.json'
      - 'package-lock.json'
      - 'api/**'
      - 'client/**'
      - 'config/**'
      - 'skill/**'
      - 'packages/api/**'
      - 'packages/client/**'
      - 'packages/data-provider/**'
      - 'packages/data-schemas/**'

permissions:
  contents: read

concurrency:
  group: docker-smoke-${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true

jobs:
  client-package-target:
    name: Build Docker client package target
    runs-on: ubuntu-latest
    timeout-minutes: 25
    steps:
      - uses: actions/checkout@v4

      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v3

      - name: Build client package target
        uses: docker/build-push-action@v5
        with:
          context: .
          file: Dockerfile.multi
          platforms: linux/amd64
          push: false
          target: client-package-build

  api-runtime-smoke:
    name: API runtime smoke (production image boots)
    runs-on: ubuntu-latest
    timeout-minutes: 30
    steps:
      - uses: actions/checkout@v4

      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v3

      # Build the real production image (final `api-build` stage), which installs
      # with `npm ci --omit=dev` — the same prune that, in prod, exposed runtime
      # dependencies that the tsdown bundle externalizes but that were never declared.
      - name: Build production image
        uses: docker/build-push-action@v5
        with:
          context: .
          file: Dockerfile.multi
          platforms: linux/amd64
          push: false
          load: true
          tags: librechat-api-smoke:ci
          cache-from: type=gha,scope=docker-smoke-api
          cache-to: type=gha,mode=max,scope=docker-smoke-api

      # Loads the entire externalized require graph of the built @librechat/api
      # bundle inside the pruned production image. A missing or ESM-incompatible
      # runtime dependency (e.g. the `get-stream` regression) fails here with a
      # non-zero exit — deterministically, with no database required.
      - name: Verify production image resolves all runtime modules
        run: |
          docker run --rm librechat-api-smoke:ci \
            node -e "require('@librechat/api'); require('@librechat/api/telemetry'); console.log('module resolution OK')"

      # Boot the real entrypoint against a real MongoDB so the *entire* server
      # require graph loads (api/db throws at module scope without MONGO_URI, and
      # is imported before models/services/routes), then gate on /readyz AND the
      # container staying alive. /readyz only returns 200 after the post-listen
      # startup (initializeMCPs + checkMigrations) sets serverReady, and those
      # steps process.exit(1) on failure — so ANY startup crash (missing module,
      # ReferenceError, bad config, post-listen failure) fails the smoke.
      - name: Boot production image against MongoDB and poll /readyz
        run: |
          set -u
          docker network create lc-smoke
          docker run -d --name lc-mongo --network lc-smoke mongo:8.0.20
          docker run -d --name lc-api --network lc-smoke -p 3080:3080 \
            -e HOST=0.0.0.0 -e PORT=3080 \
            -e NODE_ENV=production \
            -e MONGO_URI=mongodb://lc-mongo:27017/LibreChat \
            -e CREDS_KEY=0123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef \
            -e CREDS_IV=0123456789abcdef0123456789abcdef \
            -e JWT_SECRET=docker-smoke-jwt-secret \
            -e JWT_REFRESH_SECRET=docker-smoke-jwt-refresh-secret \
            -e SEARCH=false \
            librechat-api-smoke:ci
          healthy=""
          for i in $(seq 1 60); do
            if [ "$(docker inspect -f '{{.State.Running}}' lc-api 2>/dev/null)" != "true" ]; then
              echo "::error::API container exited during startup (exit code $(docker inspect -f '{{.State.ExitCode}}' lc-api 2>/dev/null))"
              break
            fi
            if [ "$(curl -sS -o /dev/null -w '%{http_code}' http://localhost:3080/readyz 2>/dev/null || true)" = "200" ]; then
              healthy="yes"
              echo "/readyz returned 200 — server fully booted (post-listen startup complete)."
              break
            fi
            sleep 2
          done
          echo "----- last 100 lines of api container logs -----"
          docker logs lc-api 2>&1 | tail -100 || true
          echo "------------------------------------------------"
          docker rm -f lc-api lc-mongo >/dev/null 2>&1 || true
          docker network rm lc-smoke >/dev/null 2>&1 || true
          if [ -z "$healthy" ]; then
            echo "::error::Production image failed to reach a ready /readyz within timeout"
            exit 1
          fi