LibreChat/.github
Danny Avila 76e9543f99
🧹 chore: Cap PR Indexes at 3 and Add Delete-Before-Sync (#12672)
* fix: add docker system prune before image pull to prevent disk exhaustion

The 60GB droplet filled up after ~40 deploys because each
docker compose pull leaves the previous image's layers as
dangling/unused. The gitnexus image is ~700MB, so ~40 stale
copies ≈ 28GB of dead layers. Combined with indexes, OS, and
Docker's build cache, the disk hits 100% and the next pull fails
with 'no space left on device'.

Add a docker system prune -af --volumes BEFORE pulling the new
image on every deploy. This removes stopped containers, unused
networks, all images not referenced by a running container, and
build cache. Running containers are never touched. Typically
frees 1-2GB per deploy (the previous image's layers).

Also add a hard 2GB free-space guard after prune so the deploy
fails with a clear error instead of letting docker pull attempt
a 700MB extract onto a near-full disk.
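
A minimal sketch of the free-space guard described above, written in Python for illustration only. The actual workflow step is not shown in this message and is presumably shell; the function names, the default path, and the error wording here are assumptions.

```python
import shutil

# 2GB threshold taken from the commit message above.
MIN_FREE_BYTES = 2 * 1024**3


def has_enough_space(free_bytes: int, min_free: int = MIN_FREE_BYTES) -> bool:
    """Return True when the free space meets the guard threshold."""
    return free_bytes >= min_free


def check_free_space(path: str = "/", min_free: int = MIN_FREE_BYTES) -> int:
    """Abort with a clear error if the filesystem holding `path` is too full.

    Hypothetical helper: intended to run after the prune and before the pull.
    """
    free = shutil.disk_usage(path).free
    if not has_enough_space(free, min_free):
        # SystemExit with a string prints it to stderr and exits with status 1.
        raise SystemExit(
            f"only {free / 1024**3:.2f} GiB free on {path}; "
            f"need at least {min_free / 1024**3:.2f} GiB before pulling"
        )
    return free
```

Failing early this way surfaces the disk problem directly instead of as a 'no space left on device' error partway through a layer extract.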

* fix: cap PR indexes at 3 + delete-before-sync for 10GB disk

The 10GB droplet has ~2GB free. Each index is ~130MB, so 7 PR indexes
(~900MB) plus main+dev (~260MB) plus the ~700MB Docker image leaves
almost nothing for image pulls. The deploy failed with 'no space left
on device' during docker compose pull.

Three changes:

1. Cap PR indexes at MAX_PR_INDEXES=3. The resolve step now sorts
   PR artifacts by created_at descending and only keeps the 3 most
   recent. Older PR indexes are logged as evicted and their droplet
   folders get cleaned by the prune step.

2. Prune BEFORE sync (was after). Freeing disk space from evicted
   indexes before rsyncing new data is critical on a tight disk. The
   old order (sync then prune) could briefly hold both old evicted
   indexes and newly-uploaded ones simultaneously.

3. Delete-before-sync for every index, including main/dev. Instead
   of rsync --delete (which transfers new files then removes extras),
   rm -rf the target folder before rsync so the disk never holds both
   old and new copies of the same index (~130MB saved per index; the
   two copies together would take ~260MB).
   Main/dev are only deleted when a fresh artifact is about to replace
   them — never evicted between deploys.
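
The cap in change 1 can be sketched as follows. This is an illustration, not the workflow's own resolve step: the artifact shape (dicts with `name` and `created_at`) and the function name are assumptions, and the sort relies on the timestamps being same-format ISO 8601 UTC strings, which order correctly as plain strings.

```python
MAX_PR_INDEXES = 3  # value named in the commit message


def select_pr_indexes(artifacts, limit=MAX_PR_INDEXES):
    """Split PR artifacts into (kept, evicted), newest first.

    Assumes each artifact is a dict with a "created_at" ISO 8601 UTC string
    in one consistent format, so lexicographic order equals time order.
    """
    ordered = sorted(artifacts, key=lambda a: a["created_at"], reverse=True)
    return ordered[:limit], ordered[limit:]
```

Returning the evicted list as well makes it straightforward to log what was dropped and to hand those names to the prune step.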

Budget on 10GB disk:
  OS + Docker engine:     ~4.0 GB
  Docker image (running): ~0.7 GB
  main + dev indexes:     ~0.26 GB
  3 PR indexes:           ~0.39 GB
  Docker prune headroom:  ~0.7 GB (for image pull)
  Free:                   ~3.9 GB
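
As a quick arithmetic check on the budget above, the following sketch sums the listed items and subtracts them from a nominal 10GB disk. The figures are copied from the budget; the variable names are illustrative, and treating the disk as exactly 10.0 GB is an assumption.

```python
# Budget items in GB, copied from the commit message.
budget_gb = {
    "OS + Docker engine": 4.0,
    "Docker image (running)": 0.7,
    "main + dev indexes": 2 * 0.13,   # two indexes at ~130MB each
    "3 PR indexes": 3 * 0.13,         # three indexes at ~130MB each
    "Docker prune headroom": 0.7,
}

DISK_GB = 10.0  # nominal disk size (assumed exact for this check)

used_gb = sum(budget_gb.values())
free_gb = DISK_GB - used_gb

print(f"used: {used_gb:.2f} GB, free: {free_gb:.2f} GB")
```

The items sum to about 6.05 GB, leaving about 3.95 GB, which is consistent with the ~3.9 GB free figure in the budget.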

* refine: restrict automatic PR indexing to danny-avila authored PRs

With 200+ open PRs and a 10GB disk capped at 3 served PR indexes,
auto-indexing every contributor PR burns CI minutes for artifacts
that will mostly be evicted before anyone queries them.

Narrow the pull_request auto-trigger to PRs authored by danny-avila
only. Other contributors' PRs can still be indexed on demand via
/gitnexus index (contributor-gated comment command) or manual
workflow_dispatch — both arrive as workflow_dispatch events and
bypass the pull_request filter entirely.
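
The trigger rule above can be sketched as a small predicate. This is an assumption-laden illustration: the real condition lives in the workflow definition, which is not reproduced here, and the function and constant names are hypothetical.

```python
from typing import Optional

# Authors whose PRs are auto-indexed, per the commit message.
AUTO_INDEX_AUTHORS = {"danny-avila"}


def should_index(event_name: str, pr_author: Optional[str] = None) -> bool:
    """Decide whether a workflow run should build an index.

    The author filter only applies to pull_request events. Everything else,
    including workflow_dispatch runs started by the comment command or by
    hand, passes through unfiltered.
    """
    if event_name != "pull_request":
        return True
    return pr_author in AUTO_INDEX_AUTHORS
```

For example, `should_index("pull_request", "someone-else")` is false, while `should_index("workflow_dispatch")` is true regardless of who opened the PR.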

* fix: drop --volumes from docker system prune to preserve Caddy TLS state

The deploy workflow explicitly handles a caddy-not-running state later
in the same step. If Caddy is stopped when the prune runs, --volumes
deletes the caddy-data and caddy-config volumes (TLS certs + ACME
account keys), forcing a Let's Encrypt re-issuance on next start.
LE rate-limits duplicate certificates (same set of hostnames) to 5 per
week, so repeated wipes could brick HTTPS for days.

docker system prune -af (without --volumes) still removes stopped
containers, unused networks, all dangling/unreferenced images, and
build cache — which is where the disk savings come from. Named
volumes are left untouched.

* fix: rsync-then-swap instead of delete-before-sync

The delete-before-sync pattern removed the live index BEFORE rsync
ran. If rsync failed (SSH timeout, disk pressure, network error),
the index was already gone — production served nothing for that
repo until a later deploy succeeded.

Replace with rsync-then-swap: upload to a .new temp directory, and
only rm + mv into place after rsync succeeds. On rsync failure,
the .new temp is cleaned up and the old index stays live. The cost
is ~130MB of extra disk while both old and new coexist, but the
prune step runs first and frees evicted PR indexes, so this fits
comfortably on the 10GB disk.
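
A local-filesystem sketch of the rsync-then-swap pattern, for illustration only. The real step runs rsync against the droplet; here the transfer is abstracted as an `upload` callable so the swap logic can be shown on its own. The function name, the callable, and the boolean return value are assumptions.

```python
import shutil
from pathlib import Path
from typing import Callable


def sync_then_swap(target: Path, upload: Callable[[Path], None]) -> bool:
    """Upload into '<target>.new', then swap it into place only on success.

    `upload` stands in for the rsync call and must create the staging
    directory. Returns True if the new index went live, False if the old
    one was kept.
    """
    staging = target.with_name(target.name + ".new")
    shutil.rmtree(staging, ignore_errors=True)  # clear leftovers from a failed run
    try:
        upload(staging)
    except Exception:
        # Transfer failed: drop the partial upload, leave the old index live.
        shutil.rmtree(staging, ignore_errors=True)
        return False
    # Transfer succeeded: remove the old index and move the new one in.
    shutil.rmtree(target, ignore_errors=True)
    staging.rename(target)
    return True
```

Note that the swap itself is rm followed by mv, so there is still a brief window with no index in place, but it no longer spans the whole transfer.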

* fix: fail deploy on main/dev rsync failure, soft-fail PRs only

The rsync-then-swap pattern downgraded ALL failures to a warning,
so the deploy continued even when LibreChat or LibreChat-dev failed
to sync. The job would pull the new image, restart the container,
and report success while serving stale or missing core indexes.

Split by criticality: main/dev rsync failures now exit 1 (aborting
the deploy before the container restart). PR index failures remain
soft-fail with a warning — a missing PR index is inconvenient but
shouldn't take the whole server down.
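
The split by criticality can be sketched like this. The index names LibreChat and LibreChat-dev come from the message above; the function name, the warning format, and collecting failures into a list before deciding are assumptions (the actual step may exit as soon as a core index fails).

```python
import sys
from typing import Iterable

# Core indexes whose sync failure should abort the deploy.
CRITICAL_INDEXES = {"LibreChat", "LibreChat-dev"}


def exit_code_for_failures(failed: Iterable[str]) -> int:
    """Return 1 if any core index failed to sync, else 0.

    Non-core (PR) failures only produce a warning on stderr.
    """
    code = 0
    for name in failed:
        if name in CRITICAL_INDEXES:
            print(f"error: sync failed for core index {name}", file=sys.stderr)
            code = 1
        else:
            print(f"warning: sync failed for PR index {name}", file=sys.stderr)
    return code
```

A caller would pass the result to `sys.exit` before the container restart, so a stale core index never gets reported as a successful deploy.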
2026-04-15 09:46:48 -04:00
ISSUE_TEMPLATE 🎨 feat: OpenAI Image Tools (GPT-Image-1) (#7079) 2025-04-26 04:30:58 -04:00
workflows 🧹 chore: Cap PR Indexes at 3 and Add Delete-Before-Sync (#12672) 2026-04-15 09:46:48 -04:00
CODE_OF_CONDUCT.md 🔗 chore: Add Stable Discord and Homepage Links (#1835) 2024-02-19 09:42:57 -05:00
configuration-release.json 📜 ci: Automate CHANGELOG.md (#5838) 2025-02-18 08:35:43 -05:00
configuration-unreleased.json 📜 ci: Automate CHANGELOG.md (#5838) 2025-02-18 08:35:43 -05:00
CONTRIBUTING.md 📝 docs: Add AGENTS.md for Project Structure and Coding Standards (#11866) 2026-02-19 16:33:43 -05:00
FUNDING.yml
playwright.yml chore: Get the latest of all github actions (#1335) 2023-12-14 07:44:38 -05:00
pull_request_template.md 🧹 chore: remove old docs (#2684) 2024-05-13 10:15:30 -04:00
SECURITY.md 🔗 chore: Add Stable Discord and Homepage Links (#1835) 2024-02-19 09:42:57 -05:00