# Transparent client migration on DC connection death

Telegram periodically closes the TCP connection to the proxy ("DC connection rotation", typically every 30–70 s). Instead of dropping all clients multiplexed on that connection, the proxy transparently remaps each idle client to a surviving (or freshly started) DC connection.

Key actors:

- `mtp_down_conn` (old) — the dying downstream connection process
- `mtp_dc_pool` — pool managing all downstream connections for one DC
- `mtp_handler` — one process per connected Telegram client
- `mtp_down_conn` (new) — replacement downstream spawned by the pool
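
For orientation before the diagram, here is a simplified sketch of the pool's side of the handoff. It is illustrative only: `downstream_closing` and `maybe_restart_connection` appear in the diagram below, but the state shape and helper names are invented, and the real `mtp_dc_pool` keeps more state.

```erlang
%% Illustrative sketch, not the real mtp_dc_pool: the record shape is
%% invented; only downstream_closing / maybe_restart_connection come
%% from the diagram below.
-record(state, {downstreams = []  :: [pid()],
                monitors    = #{} :: #{pid() => reference()}}).

handle_call({downstream_closing, OldDown}, _From,
            #state{downstreams = Ds, monitors = Mons} = St) ->
    %% Forget the dying connection so no new upstream is routed to it.
    erlang:demonitor(maps:get(OldDown, Mons), [flush]),
    St1 = St#state{downstreams = lists:delete(OldDown, Ds),
                   monitors    = maps:remove(OldDown, Mons)},
    %% Eagerly start a replacement so migrating handlers have a target.
    {reply, ok, maybe_restart_connection(St1)}.
```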

Split-mode note: in front/back split mode `mtp_handler` lives on the front node and `mtp_dc_pool` / `mtp_down_conn` live on the back node. Every message in the diagram below that crosses the front↔back boundary (the `migrate` cast, the `upstream_new` cast, the `Pool->>Handler` reply, etc.) is carried transparently by Erlang distribution — no code changes are needed, because Erlang pids and `gen_server` calls work across nodes unchanged. Process monitors also fire on node disconnection, so a back-node restart causes all affected front-node handlers to exit cleanly.
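
The note relies on two distribution properties, shown schematically below. The node name, the registered pool name, and the message shapes are placeholders for illustration, not the proxy's real API:

```erlang
%% Schematic only: 'mtproto@back.internal' and the registered name
%% mtp_dc_pool are placeholders (the real pool names are per-DC).
Pool = erpc:call('mtproto@back.internal', erlang, whereis, [mtp_dc_pool]),

%% 1. A remote pid behaves like a local one in gen_server casts/calls;
%%    Erlang distribution carries the message across the node boundary.
ok = gen_server:cast(Pool, {upstream_new, self(), #{}}),

%% 2. Monitors are location-transparent: if the back node restarts or
%%    disconnects, the monitor fires with reason noconnection.
MRef = erlang:monitor(process, Pool),
receive
    {'DOWN', MRef, process, Pool, noconnection} ->
        exit({shutdown, backend_node_down})
end.
```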

```mermaid
sequenceDiagram
    participant TG as Telegram
    box LightGreen "BACK node"
        participant OldDown as mtp_down_conn (old)
        participant Pool as mtp_dc_pool
        participant NewDown as mtp_down_conn (new)
    end
    box LightBlue "FRONT node"
        participant Handler as mtp_handler
        participant Handler2 as mtp_handler (late-assigned)
    end

    TG->>OldDown: TCP close

    OldDown->>Pool: downstream_closing(self()) [sync]
    Pool-->>Pool: remove OldDown from ds_store + monitors
    Pool-->>NewDown: spawn & connect (maybe_restart_connection)
    Pool-->>OldDown: ok

    OldDown->>Handler: migrate(OldDown) [cast, to all known upstreams]

    Note over OldDown: drain_mailbox(5000)

    alt upstream_new in mailbox
        Note over Pool,OldDown: Race: pool processed a {get} call just before<br/>downstream_closing — upstream_new cast already queued
        Pool-->>OldDown: upstream_new(Handler2, Opts) [cast, queued]
        OldDown->>Handler2: migrate(OldDown) [cast, immediately]
    end

    alt Handler was blocked in down_send
        Handler-->>OldDown: {send, Data} [call, in mailbox]
        OldDown-->>Handler: {error, migrating}
        Note over Handler: metric[mid_send] → stop<br/>(client reconnects and resends)
    else Handler was idle
        Handler->>Pool: migrate(OldDown, self(), Opts) [sync]
        Pool-->>Pool: remove Handler from upstreams map
        Pool->>NewDown: upstream_new(Handler, Opts) [cast]
        Pool-->>Handler: NewDown pid
        Note over Handler: down = NewDown<br/>metric[ok]
    end

    Note over OldDown: stop {shutdown, downstream_migrated}
```
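
For reference, the "Handler was idle" branch above maps to roughly this shape on the handler side. This is a hypothetical sketch: the message contents follow the diagram, but the state record is invented and the real logic in `mtp_handler` differs in detail.

```erlang
%% Hypothetical handler-side sketch of the idle branch of the diagram.
-record(state, {down :: pid(),    %% current mtp_down_conn
                pool :: pid(),    %% mtp_dc_pool for our DC
                opts :: map()}).

handle_cast({migrate, OldDown}, #state{down = OldDown, pool = Pool,
                                       opts = Opts} = St) ->
    %% Synchronous handoff: the pool drops us from its upstreams map,
    %% casts upstream_new to a live downstream, and replies with its pid.
    NewDown = gen_server:call(Pool, {migrate, OldDown, self(), Opts}),
    {noreply, St#state{down = NewDown}};
handle_cast({migrate, _StaleDown}, St) ->
    %% A migrate cast from a downstream we already left behind: ignore.
    {noreply, St}.
```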