mirror of
https://github.com/danny-avila/LibreChat.git
synced 2026-05-13 16:07:30 +00:00
* fix: add docker system prune before image pull to prevent disk exhaustion

  The 60GB droplet filled up after ~40 deploys because each docker compose pull leaves the previous image's layers as dangling/unused. The gitnexus image is ~700MB, so ~40 stale copies ≈ 28GB of dead layers. Combined with indexes, OS, and Docker's build cache, the disk hits 100% and the next pull fails with 'no space left on device'.

  Add a docker system prune -af --volumes BEFORE pulling the new image on every deploy. This removes stopped containers, unused networks, all images not referenced by a running container, and build cache. Running containers are never touched. Typically frees 1-2GB per deploy (the previous image's layers).

  Also add a hard 2GB free-space guard after prune so the deploy fails with a clear error instead of letting docker pull attempt a 700MB extract onto a near-full disk.

* fix: cap PR indexes at 3 + delete-before-sync for 10GB disk

  The 10GB droplet has ~2GB free. Each index is ~130MB, so 7 PR indexes (~900MB) plus main+dev (~260MB) plus the ~700MB Docker image leaves almost nothing for image pulls. The deploy failed with 'no space left on device' during docker compose pull.

  Three changes:

  1. Cap PR indexes at MAX_PR_INDEXES=3. The resolve step now sorts PR artifacts by created_at descending and only keeps the 3 most recent. Older PR indexes are logged as evicted and their droplet folders get cleaned by the prune step.
  2. Prune BEFORE sync (was after). Freeing disk space from evicted indexes before rsyncing new data is critical on a tight disk. The old order (sync then prune) could briefly hold both old evicted indexes and newly-uploaded ones simultaneously.
  3. Delete-before-sync for every index, including main/dev. Instead of rsync --delete (which transfers new files then removes extras), rm -rf the target folder before rsync so the disk never holds both old and new copies of the same index (~260MB saved per index). Main/dev are only deleted when a fresh artifact is about to replace them; they are never evicted between deploys.

  Budget on 10GB disk:

  - OS + Docker engine: ~4.0 GB
  - Docker image (running): ~0.7 GB
  - main + dev indexes: ~0.26 GB
  - 3 PR indexes: ~0.39 GB
  - Docker prune headroom: ~0.7 GB (for image pull)
  - Free: ~3.9 GB

* refine: restrict automatic PR indexing to danny-avila authored PRs

  With 200+ open PRs and a 10GB disk capped at 3 served PR indexes, auto-indexing every contributor PR burns CI minutes for artifacts that will mostly be evicted before anyone queries them. Narrow the pull_request auto-trigger to PRs authored by danny-avila only. Other contributors' PRs can still be indexed on demand via /gitnexus index (contributor-gated comment command) or manual workflow_dispatch; both arrive as workflow_dispatch events and bypass the pull_request filter entirely.

* fix: drop --volumes from docker system prune to preserve Caddy TLS state

  The deploy workflow explicitly handles a caddy-not-running state later in the same step. If Caddy is stopped when the prune runs, --volumes deletes the caddy-data and caddy-config volumes (TLS certs + ACME account keys), forcing a Let's Encrypt re-issuance on next start. LE rate-limits to 5 certs per domain per week, so repeated wipes could brick HTTPS for days.

  docker system prune -af (without --volumes) still removes stopped containers, unused networks, all dangling/unreferenced images, and build cache, which is where the disk savings come from. Named volumes are left untouched.

* fix: rsync-then-swap instead of delete-before-sync

  The delete-before-sync pattern removed the live index BEFORE rsync ran. If rsync failed (SSH timeout, disk pressure, network error), the index was already gone, and production served nothing for that repo until a later deploy succeeded.

  Replace with rsync-then-swap: upload to a .new temp directory, and only rm + mv into place after rsync succeeds. On rsync failure, the .new temp is cleaned up and the old index stays live. The cost is ~130MB of extra disk while both old and new coexist, but the prune step runs first and frees evicted PR indexes, so this fits comfortably on the 10GB disk.

* fix: fail deploy on main/dev rsync failure, soft-fail PRs only

  The rsync-then-swap pattern downgraded ALL failures to a warning, so the deploy continued even when LibreChat or LibreChat-dev failed to sync. The job would pull the new image, restart the container, and report success while serving stale or missing core indexes.

  Split by criticality: main/dev rsync failures now exit 1 (aborting the deploy before the container restart). PR index failures remain soft-fail with a warning; a missing PR index is inconvenient but shouldn't take the whole server down.

||
|---|---|---|
| ISSUE_TEMPLATE | ||
| workflows | ||
| CODE_OF_CONDUCT.md | ||
| configuration-release.json | ||
| configuration-unreleased.json | ||
| CONTRIBUTING.md | ||
| FUNDING.yml | ||
| playwright.yml | ||
| pull_request_template.md | ||
| SECURITY.md | ||
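
The commit message above describes two mechanisms that are easy to misread in prose: capping PR indexes at the most recent `MAX_PR_INDEXES`, and swapping a fully uploaded `.new` directory into place only after the upload has succeeded. The following is a minimal Python sketch of both ideas, not the workflow's actual implementation. The function names `select_pr_indexes` and `swap_into_place`, and the artifact dictionary shape (`name`, `created_at`), are hypothetical and chosen only for illustration. The upload step itself (rsync over SSH) is out of scope here, so the sketch assumes the staged directory is already complete.

```python
from datetime import datetime
from pathlib import Path
import shutil

MAX_PR_INDEXES = 3  # cap named in the commit message


def _parse_created_at(value: str) -> datetime:
    # Assumption: timestamps are ISO 8601 and may end in "Z".
    # Older Python versions do not accept "Z" in fromisoformat.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def select_pr_indexes(artifacts, limit=MAX_PR_INDEXES):
    """Split artifacts into (kept, evicted), newest first by created_at."""
    ordered = sorted(
        artifacts,
        key=lambda a: _parse_created_at(a["created_at"]),
        reverse=True,
    )
    return ordered[:limit], ordered[limit:]


def swap_into_place(staged: Path, live: Path) -> None:
    """Replace the live index with the staged copy.

    Call this only after the upload into `staged` has succeeded.
    If the upload fails, the caller should remove `staged` instead
    and leave `live` untouched.
    """
    if live.exists():
        shutil.rmtree(live)  # the "rm" half of rm + mv
    staged.rename(live)      # the "mv" half; same filesystem assumed
```

A caller would run `select_pr_indexes` during the resolve step, delete the folders of the evicted entries, upload each kept index to a sibling directory such as `<index>.new`, and call `swap_into_place` only when that upload returns successfully. Between the `rmtree` and the `rename` there is a short window with no live index, which matches the rm + mv sequence the commit describes.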