Send TLS decode_error alert on malformed ClientHello and missing SNI

Scanners probe for fake-TLS proxies by sending structurally malformed
ClientHellos (e.g. ExtensionsLen=0 with trailing extension bytes). A
real TLS server responds with a fatal decode_error alert; previously
the proxy crashed the handler process silently, making it detectable.

Changes:
- mtp_fake_tls: add TLS_REC_ALERT, TLS_ALERT_FATAL, TLS_ALERT_DECODE_ERROR
  macros; export tls_decode_error_alert/0 which builds the 7-byte alert
  frame from macros
- mtp_fake_tls: add second clause to parse_client_hello/1 that throws
  {protocol_error, tls_bad_client_hello, bad_client_hello} instead of
  letting a bare function_clause propagate
- mtp_fake_tls: tighten parse_sni/1 catch to match the specific tagged
  error rather than a catch-all error:_
- mtp_handler: add attempt_fronting clauses for tls_bad_client_hello and
  tls_no_sni — both send the decode_error alert before closing
- mtp_handler: effective_secret/2 now raises tls_bad_client_hello (not
  tls_invalid_digest) when per_sni_secrets=on and the ClientHello has
  no SNI, so it also gets the alert treatment
- single_dc_SUITE: new malformed_tls_hello_decode_error_case/1 verifies
  the alert bytes are sent and the metric is incremented
- AGENTS.md: document test organisation, process architecture diagram,
  and upstream/downstream naming note

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
This commit is contained in:
Sergey Prokhorov 2026-04-07 13:46:39 +02:00
parent 36b30e3f5f
commit dfe8ebf034
No known key found for this signature in database
GPG key ID: 1C570244E4EF3337
4 changed files with 158 additions and 4 deletions

View file

@ -36,6 +36,66 @@ Dockerfile Docker image build
| `mtp_metric` | Metrics/telemetry |
| `mtp_session_storage` | Replay-attack protection (session deduplication) |
### Process architecture
```
OTP supervision tree
────────────────────────────────────────────────────────────────────
mtproto_proxy_sup (one_for_one)
├── mtp_config (gen_server, singleton)
├── mtp_session_storage (gen_server, singleton)
├── mtp_dc_pool_sup (supervisor, simple_one_for_one)
│ └── mtp_dc_pool (gen_server, one per DC id, permanent)
├── mtp_down_conn_sup (supervisor, simple_one_for_one)
│ └── mtp_down_conn (gen_server, one per Telegram TCP conn, temporary)
└── Ranch listeners (one per configured port: mtp_ipv4, mtp_ipv6, …)
└── mtp_handler (gen_server, one per client TCP conn, transient)
Data-plane message flow (one client connection)
────────────────────────────────────────────────────────────────────
Telegram client Telegram server
│ │
│ TCP (fake-TLS / obfuscated / secure) │
▼ ▼
┌─────────────┐ gen_server:call({send,Data}) ┌──────────────┐ raw TCP ┌──────────────┐
│ mtp_handler │ ──────────────────────────────► │ mtp_down_conn│ ────────────► │ Telegram DC │
│ (upstream) │ │ (downstream) │ ◄──────────── │ (middle srv) │
│ │ ◄────────────────────────────── │ │ └──────────────┘
└──────┬──────┘ gen_server:cast({proxy_ans}) └──────────────┘
│ on first data: mtp_config:get_downstream_safe/2 → picks (pool, down_conn)
│ on disconnect: cast({return, self()}) → releases slot
┌─────────────┐
│ mtp_dc_pool │ — spawns mtp_down_conn via mtp_down_conn_sup when pool is empty
│ (per DC) │
└─────────────┘
> **Naming note:** the terms "upstream" and "downstream" in the current code are the
> opposite of what one might expect:
> `upstream` = the client-side connection (`mtp_handler`),
> `downstream` = the Telegram-server-side connection (`mtp_down_conn`).
> This will be renamed in a future refactor.
Key interactions
────────────────────────────────────────────────────────────────────
mtp_handler → mtp_config : get_downstream_safe/2 — resolves DC id to
a (pool_pid, down_conn_pid) pair on first
upstream data packet
mtp_handler → mtp_down_conn : send/2 (sync call) — forward client data
mtp_down_conn → mtp_handler : cast {proxy_ans, …} — forward Telegram reply
mtp_down_conn → mtp_handler : cast {close_ext, …} — Telegram closed stream
mtp_handler → mtp_dc_pool : return/2 (cast) — release slot on disconnect
mtp_dc_pool → mtp_down_conn : upstream_new/upstream_closed (cast)
mtp_dc_pool → mtp_down_conn_sup: start_conn/2 — spawn new TCP conn to Telegram
mtp_down_conn → mtp_config : get_netloc/1, get_secret/0 — read DC address
and proxy secret for RPC handshake
mtp_config → mtp_dc_pool_sup : start_pool/1 — create pool when new DC seen
```
## Build
Requires Erlang/OTP 25+.
@ -79,6 +139,32 @@ Individual steps:
Always run `make test` before committing. Fix all xref warnings and dialyzer errors — they are treated as errors.
### Test organisation — where to add new tests
There are three kinds of tests, each with a clear home:
| Kind | Files | When to add |
|------|-------|-------------|
| **EUnit** (unit) | `src/*.erl`, `-ifdef(TEST)` blocks | Pure functions with no I/O: codec encode/decode round-trips, packet parsing helpers, crypto primitives |
| **PropEr** (property-based) | `test/prop_mtp_<module>.erl` | Codec/parser properties that should hold for *arbitrary* inputs — e.g. encode→decode identity, parser accepts all valid inputs, parser never crashes on random bytes |
| **Common Test** (integration) | `test/single_dc_SUITE.erl` | End-to-end behaviour involving a real listener + fake DC: protocol negotiation, policy enforcement, error handling visible at the TCP level (alerts sent, connections closed), domain fronting, replay protection |
**Rule of thumb:** if the behaviour is observable only over a TCP socket or requires a running application, it belongs in `single_dc_SUITE`. If it is a property of a pure function, add a PropEr property in the matching `prop_mtp_<module>.erl`. If it is a targeted unit case for a specific input, use EUnit.
**What changes need new tests:**
- **New codec or protocol module** → PropEr round-trip property in `prop_mtp_<module>.erl` + a CT `echo_*_case` in `single_dc_SUITE`
- **New protocol error path** → CT case that sends the triggering byte sequence over TCP and asserts the exact response (alert bytes, metric counter, connection close)
- **New policy or config option** → CT case that sets the env, exercises the path, resets env in `{post, Cfg}`
- **New parser clause or binary pattern** → PropEr property verifying the clause accepts all valid inputs and a targeted EUnit/PropEr case for boundary/malformed inputs
- **Security-critical paths** (replay detection, session storage, digest validation) → CT case; also consider PropEr for the pure crypto/comparison functions
**Naming conventions:**
- CT cases: `<description>_case/1` — auto-discovered by `all/0`
- PropEr properties: `prop_<description>/0` (or `/1` with a `doc` clause)
- Each CT case must implement `{pre, Cfg}` / `{post, Cfg}` / `Cfg when is_list(Cfg)` clauses and call `setup_single` / `stop_single` to avoid resource leaks
### Debugging CT failures
When `rebar3 ct` (or `make test`) reports failures, **do not rely on the terminal output** — it is truncated and shows only the last error. Instead, go straight to the HTML logs:

View file

@ -16,6 +16,7 @@
-export([from_client_hello/2,
derive_sni_secret/3,
parse_sni/1,
tls_decode_error_alert/0,
new/0,
try_decode_packet/2,
decode_all/2,
@ -54,9 +55,13 @@
-define(TLS_12_VERSION, 3, 3).
-define(TLS_13_VERSION, 3, 4).
-define(TLS_REC_CHANGE_CIPHER, 20).
-define(TLS_REC_ALERT, 21).
-define(TLS_REC_HANDSHAKE, 22).
-define(TLS_REC_DATA, 23).
-define(TLS_ALERT_FATAL, 2).
-define(TLS_ALERT_DECODE_ERROR, 50).
-define(TLS_12_DATA, ?TLS_REC_DATA, ?TLS_12_VERSION).
-define(DIGEST_POS, 11).
@ -156,10 +161,17 @@ parse_sni(Data) ->
{error, no_sni}
end
catch
error:_ ->
error:{protocol_error, tls_bad_client_hello, _} ->
{error, bad_hello}
end.
%% TLS fatal decode_error alert (RFC 8446 §6).
%% Sent to clients whose ClientHello is structurally invalid or lacks an SNI,
%% making the proxy behave like a real TLS server rather than silently dropping.
-spec tls_decode_error_alert() -> binary().
tls_decode_error_alert() ->
<<?TLS_REC_ALERT, ?TLS_12_VERSION, 0, 2, ?TLS_ALERT_FATAL, ?TLS_ALERT_DECODE_ERROR>>.
%% Derive a per-SNI 16-byte secret from the base secret, SNI domain and a salt.
%% Derivation: SHA256(salt || hex32(base_secret) || sni_domain)[0:16]
%%
@ -194,7 +206,9 @@ parse_client_hello(<<?TLS_REC_HANDSHAKE, ?TLS_10_VERSION, TlsFrameLen:?u16, %Fra
cipher_suites = parse_suites(CipherSuites),
compression_methods = parse_compression(CompMethods),
extensions = parse_extensions(Extensions)
}.
};
parse_client_hello(_Data) ->
error({protocol_error, tls_bad_client_hello, bad_client_hello}).
parse_suites(Bin) ->
[Suite || <<Suite:?u16>> <= Bin].

View file

@ -516,6 +516,14 @@ attempt_fronting(replay_session_detected, SniDomain,
Config ->
do_front(SniDomain, Config, Acc, Ip, Listener, S)
end;
attempt_fronting(tls_bad_client_hello, _Extra,
#state{sock = Sock, transport = Transport} = _S) ->
_ = Transport:send(Sock, mtp_fake_tls:tls_decode_error_alert()),
skip;
attempt_fronting(tls_no_sni, _Extra,
#state{sock = Sock, transport = Transport} = _S) ->
_ = Transport:send(Sock, mtp_fake_tls:tls_decode_error_alert()),
skip;
attempt_fronting(_Type, _Extra, _S) ->
skip.
@ -544,7 +552,7 @@ effective_secret(Data, Secret) ->
mtp_fake_tls:derive_sni_secret(Secret, Sni, Salt);
{error, Reason} ->
%% No SNI not a valid fake-TLS MTP connection; fail fast.
error({protocol_error, tls_invalid_digest, Reason})
error({protocol_error, tls_bad_client_hello, Reason})
end
end.

View file

@ -26,7 +26,8 @@
domain_fronting_fragmented_case/1,
domain_fronting_replay_case/1,
per_sni_secrets_on_case/1,
per_sni_secrets_wrong_secret_case/1
per_sni_secrets_wrong_secret_case/1,
malformed_tls_hello_decode_error_case/1
]).
-export([set_env/2,
@ -735,6 +736,51 @@ per_sni_secrets_wrong_secret_case(Cfg) when is_list(Cfg) ->
1, mtp_test_metric:get_tags(
count, [?APP, protocol_error, total], [?FUNCTION_NAME, tls_invalid_digest])).
%% @doc A structurally malformed ClientHello (ExtensionsLen=0 but data follows) must cause
%% the proxy to send a TLS fatal decode_error alert and then close the connection,
%% rather than crashing silently.
malformed_tls_hello_decode_error_case({pre, Cfg}) ->
setup_single(?FUNCTION_NAME, 10000 + ?LINE, #{}, Cfg);
malformed_tls_hello_decode_error_case({post, Cfg}) ->
stop_single(Cfg);
malformed_tls_hello_decode_error_case(Cfg) when is_list(Cfg) ->
Host = ?config(mtp_host, Cfg),
Port = ?config(mtp_port, Cfg),
%% Build a ClientHello that is structurally valid at the TLS record layer
%% (correct lengths, version bytes) but lies about ExtensionsLen=0 while
%% trailing bytes follow this is the exact pattern seen from real scanners.
TlsPacketLen = 512,
HelloLen = 508, % TlsPacketLen - 4 (hello type + hello len field)
Random = crypto:strong_rand_bytes(32),
SessId = crypto:strong_rand_bytes(32),
CipherSuites = <<19, 1>>, % TLS_AES_128_GCM_SHA256, 2 bytes
%% Padding fills the rest of the TLS frame after ExtensionsLen to hit TlsPacketLen exactly.
%% Consumed so far inside the frame: hello_type(1)+hello_len(3)+version(2)+random(32)
%% +sessid_len(1)+sessid(32)+cs_len(2)+cs(2)+comp_len(1)+comp(1)+ext_len(2) = 79
PaddingLen = TlsPacketLen - 79,
Padding = binary:copy(<<0>>, PaddingLen),
MalformedHello = <<22, 3, 1, TlsPacketLen:16, % TLS record header (handshake, TLS1.0)
1, HelloLen:24, % ClientHello type + length
3, 3, % legacy version (TLS1.2)
Random/binary, % 32-byte random
32, SessId/binary, % session ID
2:16, CipherSuites/binary, % cipher suites
1, 0, % compression methods
0:16, % ExtensionsLen = 0 (lie)
Padding/binary>>, % trailing bytes that should be extensions
{ok, Sock} = gen_tcp:connect(Host, Port, [binary, {active, false}], 2000),
ok = gen_tcp:send(Sock, MalformedHello),
%% Proxy must send back a TLS fatal decode_error alert (21, 3, 3, 0, 2, 2, 50)
ExpectedAlert = mtp_fake_tls:tls_decode_error_alert(),
{ok, Response} = gen_tcp:recv(Sock, byte_size(ExpectedAlert), 5000),
?assertEqual(ExpectedAlert, Response),
%% Then close the connection
?assertEqual({error, closed}, gen_tcp:recv(Sock, 0, 2000)),
gen_tcp:close(Sock),
?assertEqual(
1, mtp_test_metric:get_tags(
count, [?APP, protocol_error, total], [?FUNCTION_NAME, tls_bad_client_hello])).
setup_single(Name, MtpPort, DcCfg0, Cfg) ->
setup_single(Name, "127.0.0.1", MtpPort, DcCfg0, Cfg).