LibreChat/.github/workflows/gitnexus-pr-command.yml
Danny Avila 990763cbee
🧠 feat: Enable GitNexus Embeddings for Dev Branch and PR Indexes (#12620)
* feat: auto-enable embeddings for dev and PR indexes too

Previously only main branch pushes got --embeddings; dev and
contributor PRs ran graph-only and relied on BM25 search. Semantic
search on those indexes silently returned empty, which defeats the
whole point of serving them to MCP clients.

New logic: every automatic trigger (push to main/dev, pull_request
from contributors) enables --embeddings. Only workflow_dispatch
still respects the explicit input toggle, so operators can run a
fast graph-only re-index when they don't need fresh vectors.

Cost: adds ~3-5 minutes per index run. Acceptable tradeoff for
having semantic search work across all served branches + open PRs
instead of just main.

* refine: gate PR embeddings on unit-test path relevance

Previous version auto-enabled --embeddings on every contributor PR,
which cost ~3-5 min of extra CI per index even on PRs that couldn't
benefit from semantic code search (docs, config, workflow files,
i18n strings, etc.).

New logic mirrors the backend-review.yml and frontend-review.yml
path filters — if a PR doesn't touch api/, client/, or packages/
it won't trigger unit tests and it doesn't need embeddings. The
check queries the GitHub API for the PR's changed file list via
`gh api repos/.../pulls/<N>/files` (paginated for very large PRs)
and enables embeddings only when at least one path matches.
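
A minimal sketch of the path check described above, written in JavaScript for illustration. The three prefixes come from this commit message; the function name `needsEmbeddings` and the plain array input are assumptions, and the real workflow obtains the file list from `gh api` rather than from a function argument. The actual review workflows may use narrower path filters than a simple prefix match.

```javascript
// Hypothetical helper: decide from a PR's changed paths whether the index
// run should enable embeddings. Prefixes are taken from the commit message.
const EMBEDDING_PATH_PREFIXES = ['api/', 'client/', 'packages/'];

function needsEmbeddings(changedFiles) {
  // True as soon as one changed path falls under a relevant directory.
  return changedFiles.some((file) =>
    EMBEDDING_PATH_PREFIXES.some((prefix) => file.startsWith(prefix)),
  );
}
```

Under these assumptions, a PR touching only `README.md` and `.github/workflows/ci.yml` would be indexed graph-only, while one that also touches a file under `api/` would get embeddings.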

main/dev pushes still always embed. workflow_dispatch still respects
the explicit input toggle, which also covers the /gitnexus index
[embeddings] PR command.

The contributor gate at the job level is unchanged — non-contributor
PRs are still skipped entirely regardless of paths.

* feat: /gitnexus command works for non-contributor and fork PRs

The command workflow already gated on the commenter's author
association (not the PR author's), so a contributor commenting
/gitnexus index on an outside contributor's PR passes the auth
check. But the downstream index workflow checked out the PR's
raw head SHA, which only exists in the fork for cross-repo PRs —
actions/checkout fetches from the base repo's origin and fails.

Switch the command workflow to dispatch with refs/pull/<N>/head
instead of the SHA. GitHub mirrors every PR's head into the base
repo as this ref regardless of whether the PR is from a fork, so
the checkout always resolves.
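
As a rough illustration of the ref choice above, the dispatch inputs can be derived from the PR number alone. The input names (`pr_number`, `pr_ref`, `embeddings`, `force`) mirror the command workflow; the helper names `prHeadRef` and `buildDispatchInputs` are hypothetical.

```javascript
// Hypothetical helpers: build the PR head ref and the workflow_dispatch
// inputs from a PR number, without needing the head SHA.
function prHeadRef(prNumber) {
  if (!Number.isInteger(prNumber) || prNumber <= 0) {
    throw new TypeError(`Invalid PR number: ${prNumber}`);
  }
  return `refs/pull/${prNumber}/head`;
}

function buildDispatchInputs(prNumber, embeddings) {
  // Input values are passed as strings, as in the workflow.
  return {
    pr_number: String(prNumber),
    pr_ref: prHeadRef(prNumber),
    embeddings: String(Boolean(embeddings)),
    force: 'false',
  };
}
```

Because the ref depends only on the PR number, the same code path serves same-repo and fork PRs.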

End result: a contributor can type `/gitnexus index embeddings`
on any PR — including one opened by a first-time contributor from
a fork — and the index (with embeddings, if requested) is built
and served. The contributor takes responsibility for the trust
boundary by typing the command.

Updated the relevant header/inline comments in both workflows so
the next maintainer understands the refs/pull/<N>/head choice and
the commenter-based gating.

* refine: /gitnexus index defaults to embeddings on

A contributor typing the command has already chosen to spend ~5
minutes of CI on a full re-index; they wouldn't invoke the command
just to get a BM25-only result. Flip the default so the short form
`/gitnexus index` produces an embeddings-enabled index.

Modifier semantics:
  /gitnexus index                -> embeddings ON (new default)
  /gitnexus index embeddings     -> embeddings ON (explicit, no-op alias)
  /gitnexus index fast           -> embeddings OFF (opt-out)
  /gitnexus index graph-only     -> embeddings OFF (alias)
  /gitnexus index no-embeddings  -> embeddings OFF (alias)

The previous `embeddings` modifier is preserved as a no-op alias so
anyone who learned the earlier form still gets what they expected.
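
The modifier semantics above can be sketched as a standalone JavaScript function. This is an illustrative sketch, not the workflow's exact code: `parseGitnexusCommand` and its return shape are assumptions. The modifier pattern here allows hyphens so that `graph-only` and `no-embeddings` are captured whole, since a plain `\w+` stops at the first hyphen.

```javascript
// Hypothetical parser for the /gitnexus comment command.
const OPT_OUT_MODIFIERS = new Set(['fast', 'graph-only', 'no-embeddings']);

function parseGitnexusCommand(body) {
  // Subcommand is a plain word; the optional modifier may contain hyphens.
  const match = body.trim().match(/^\/gitnexus\s+(\w+)(?:\s+([\w-]+))?/);
  if (!match) {
    return { ok: false, error: 'unrecognized command' };
  }
  const [, subcommand, modifier] = match;
  if (subcommand !== 'index') {
    return { ok: false, error: `unknown subcommand: ${subcommand}` };
  }
  // Embeddings default to on; only the opt-out modifiers turn them off.
  return {
    ok: true,
    embeddings: !OPT_OUT_MODIFIERS.has(modifier),
    modifier: modifier ?? null,
  };
}
```

In this sketch, any modifier outside the opt-out set, including `embeddings` and unknown words, leaves embeddings on.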
2026-04-11 13:35:29 -04:00


# Responds to `/gitnexus index` comments on pull requests.
#
# Gated to the same author_association roles (OWNER, MEMBER, COLLABORATOR)
# as the automatic PR index trigger, but applied to the COMMENTER, not
# the PR author. This intentionally lets a contributor index a PR from
# a non-contributor / first-time fork author — the contributor takes
# responsibility for the trust boundary by typing the command.
#
# When a matching comment lands on a PR, this workflow dispatches
# `gitnexus-index.yml` with the PR number and the `refs/pull/<N>/head`
# ref so indexing works for fork PRs too (GitHub mirrors every PR's
# head ref into the base repo regardless of which fork it originated
# from, so actions/checkout can always resolve it).
#
# Use cases:
# - Re-index a PR after a rebase without pushing a new commit
# - Index a docs-only PR that was skipped by paths-ignore
# - Index a non-contributor (fork) PR that the auto-trigger skipped
# - Re-run a failed index
#
# Supported commands:
#   /gitnexus index             index the PR with embeddings (default)
#   /gitnexus index embeddings  explicit form of the above; same effect
#   /gitnexus index fast        graph-only index (skip embeddings), for
#                               a quick re-index without waiting ~5 min
#                               of embedding generation
name: GitNexus PR Command

on:
  issue_comment:
    types: [created]

permissions:
  contents: read
  pull-requests: write
  actions: write # needed to dispatch gitnexus-index.yml

concurrency:
  group: gitnexus-pr-command-${{ github.event.issue.number }}
  cancel-in-progress: false

jobs:
  dispatch:
    # Only run for PR comments that start with /gitnexus from trusted
    # commenters. Intentionally checks the COMMENTER's association so a
    # contributor can index a non-contributor's PR on demand.
    if: |
      github.event.issue.pull_request != null &&
      startsWith(github.event.comment.body, '/gitnexus') &&
      (github.event.comment.author_association == 'OWNER' ||
       github.event.comment.author_association == 'MEMBER' ||
       github.event.comment.author_association == 'COLLABORATOR')
    runs-on: ubuntu-latest
    timeout-minutes: 5
    steps:
      - name: Parse command and resolve PR head ref
        id: parse
        uses: actions/github-script@v7
        with:
          script: |
            const body = context.payload.comment.body.trim();
            // The modifier group allows hyphens so `graph-only` and
            // `no-embeddings` are captured whole; a plain \w+ would
            // truncate them to `graph` / `no` and the opt-out would
            // silently fall through to the default.
            const match = body.match(/^\/gitnexus\s+(\w+)(?:\s+([\w-]+))?/);
            if (!match) {
              core.setFailed(`Unrecognized command: ${body}. Try: /gitnexus index [fast]`);
              return;
            }
            const [, subcommand, modifier] = match;
            if (subcommand !== 'index') {
              core.setFailed(`Unknown subcommand: ${subcommand}. Only 'index' is supported.`);
              return;
            }
            // Default to embeddings on: a contributor typing the command
            // has already decided they want a full re-index. The `fast`
            // modifier (aliases: `graph-only`, `no-embeddings`) is the
            // explicit opt-out for graph-only runs. `embeddings` is
            // accepted as a no-op alias for backwards compat with the
            // previous command form.
            let embeddings = 'true';
            if (modifier === 'fast' || modifier === 'graph-only' || modifier === 'no-embeddings') {
              embeddings = 'false';
            }
            // Use refs/pull/<N>/head instead of the raw head SHA. GitHub
            // mirrors every PR's head into the base repo as this ref, so
            // actions/checkout can always resolve it, even for PRs from
            // forks whose raw SHAs don't exist in the base repo.
            const prNum = context.payload.issue.number;
            core.setOutput('pr_number', String(prNum));
            core.setOutput('pr_ref', `refs/pull/${prNum}/head`);
            core.setOutput('embeddings', embeddings);
            core.info(
              `Dispatching index for PR #${prNum} at refs/pull/${prNum}/head (embeddings=${embeddings}, modifier=${modifier || '(none)'})`,
            );

      - name: Dispatch gitnexus-index workflow
        uses: actions/github-script@v7
        with:
          script: |
            await github.rest.actions.createWorkflowDispatch({
              owner: context.repo.owner,
              repo: context.repo.repo,
              workflow_id: 'gitnexus-index.yml',
              ref: 'main',
              inputs: {
                pr_number: '${{ steps.parse.outputs.pr_number }}',
                pr_ref: '${{ steps.parse.outputs.pr_ref }}',
                embeddings: '${{ steps.parse.outputs.embeddings }}',
                force: 'false',
              },
            });

      - name: React to the comment
        uses: actions/github-script@v7
        with:
          script: |
            await github.rest.reactions.createForIssueComment({
              owner: context.repo.owner,
              repo: context.repo.repo,
              comment_id: context.payload.comment.id,
              content: 'rocket',
            });