2026-06-15 21:44:04 +02:00

# sqlmap architecture
A contributor-oriented map of how sqlmap is put together: the major components,
how a run flows through them, and where to start looking for a given concern.
> This is a map, not a spec. It describes the durable structure and data flow; for
> exact signatures, option names, and enumerable lists (tampers, DBMSes, options),
> the source is authoritative. **When this document disagrees with the code, the code wins.**

sqlmap runs on both Python 2.7 and 3.x; sources are kept pure-ASCII unless a literal
non-ASCII byte is unavoidable. Compatibility shims live in `lib/core/compat.py` and
`thirdparty/six`.

---
## 1. Entry points
| Entry | File | Purpose |
|-------|------|---------|
| CLI | `sqlmap.py` -> `main()` | the scanner. Applies runtime patches, parses options, runs a scan. |
| REST API | `sqlmapapi.py` | `-s` server / `-c` client wrappers around `lib/utils/api.py`. |

`main()` (`sqlmap.py`) does, in order: `dirtyPatches()` (monkey-patches the stdlib for
quirks and security hardening; see `lib/core/patch.py`), `setPaths()`, `init()` (option
parsing + environment setup), then dispatches either to `start()` for a normal scan or to
the self-tests (`--smoke-test` / `--vuln-test` / `--api-test`) in `lib/core/testing.py`.

---
## 2. Global state: `conf` and `kb`
Almost everything hangs off two process-global singletons defined in `lib/core/data.py`,
both instances of `AttribDict` (attribute-accessible dicts; missing keys read back as `None`):
- **`conf`** - the resolved user configuration (options + derived settings). What the
user asked for.
- **`kb`** ("knowledge base") - mutable runtime state discovered during a run
(identified DBMS, injection points, page templates, caches, locks, counters).
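
A minimal sketch of the attribute-dict behavior described above, assuming exactly the semantics stated here: attribute access mirrors item access, and a missing key reads back as `None`. `AttribDictSketch` is a hypothetical name; the real `AttribDict` may differ in details such as how it treats dunder names or copying.

```python
class AttribDictSketch(dict):
    """Hypothetical stand-in for AttribDict: keys double as attributes,
    and reading a missing key returns None instead of raising."""

    def __getattr__(self, name):
        # Called only when normal lookup fails; keep dunder lookups strict so
        # copy/pickle probing (e.g. __deepcopy__) still gets AttributeError.
        if name.startswith("__") and name.endswith("__"):
            raise AttributeError(name)
        return self.get(name)

    def __setattr__(self, name, value):
        self[name] = value

    def __delattr__(self, name):
        try:
            del self[name]
        except KeyError:
            raise AttributeError(name)


conf = AttribDictSketch()   # what the user asked for
kb = AttribDictSketch()     # what the run discovered

conf.level = 3
kb.injections = []
print(conf.level, conf["level"], conf.risk)  # 3 3 None
```
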

The configuration pipeline (paths relative to `lib/`):
- `parse/cmdline.py` - argparse definition of every CLI option.
- `core/optiondict.py` - option name -> type map (used for config-file/API coercion).
- `core/defaults.py` - default values.
- `core/option.py` - the heavy lifter: `_setConfAttributes()`, `_setKnowledgeBaseAttributes()`,
`_setHTTPHandlers()` (installs the global urllib opener incl. keep-alive), DBMS/encoding
setup, etc. Merges CLI + config file + defaults into `conf`/`kb`.
- `core/settings.py` - constants, version, regexes, thresholds. **New constants go here.**

Function, method, and variable names in the codebase use camelCase.

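
The merge in `core/option.py` can be pictured as a layered lookup. This sketch assumes a precedence of command line over config file over defaults, with `None` meaning "not set" in every layer; `merge_options` and the sample values are illustrative, not sqlmap code, so confirm the exact precedence rules in `lib/core/option.py`.

```python
def merge_options(cmdline, config_file, defaults):
    """Hypothetical layered merge: the first layer holding a non-None value
    wins, checked in the order command line -> config file -> defaults."""
    merged = {}
    names = set(defaults) | set(config_file) | set(cmdline)
    for name in sorted(names):
        value = None
        for layer in (cmdline, config_file, defaults):
            if layer.get(name) is not None:
                value = layer[name]
                break
        merged[name] = value
    return merged


# Sample values only.
defaults = {"level": 1, "risk": 1, "threads": 1}
config_file = {"level": 2, "threads": None}
cmdline = {"risk": 3, "level": None}

conf = merge_options(cmdline, config_file, defaults)
print(conf)  # {'level': 2, 'risk': 3, 'threads': 1}
```
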
---
## 3. Top-level layout
| Path | Responsibility |
|------|----------------|
| `lib/core/` | conf/kb model, common helpers, settings, enums, dump, session, agent, option parsing |
| `lib/controller/` | the scan orchestrator (`controller.py`), detection checks (`checks.py`), enumeration dispatch (`action.py`), DBMS handler selection (`handler.py`) |
| `lib/request/` | HTTP layer: `connect.py` (sending), `comparison.py` (the true/false oracle), `inject.py` (value extraction), protocol handlers, response processing |
| `lib/techniques/` | the exploitation engines: `blind/inference.py`, `error/use.py`, `union/{test,use}.py`, `dns/` |
| `lib/parse/` | parsing of inputs: CLI, config, HTTP request/log files, HTML, sitemap, and the XML payload/boundary loader (`payloads.py`) |
| `lib/utils/` | feature modules: `api.py` (REST), `hashdb.py` (session), `crawler.py`, `hash.py` (cracking), `har.py`, `brute.py`, `search.py`, ... |
| `lib/takeover/` | OS-level takeover: shells, file access, UDF, registry, Metasploit, `xp_cmdshell` |
| `plugins/generic/` | DBMS-agnostic enumeration/fingerprint/filesystem/takeover base classes |
| `plugins/dbms/<dbms>/` | per-DBMS subclasses + dialect (one dir per supported DBMS) |
| `tamper/` | payload-mutation scripts (WAF bypass), one `tamper()` per file |
| `data/xml/` | the data-driven engine: `boundaries.xml`, `payloads/*.xml`, `queries.xml`, `errors.xml` |
| `data/` (other) | wordlists/common tables/columns (`txt/`), UDFs (`udf/`), stored procs (`procs/`), shells (`shell/`) |
| `tests/` | stdlib-unittest suite (offline); see section 11 |
| `thirdparty/` | vendored dependencies (six, bottle, keepalive, chardet, ...) - no pip at runtime |
| `extra/` | auxiliary tools (e.g. `vulnserver` used by `--vuln-test`) |
---
## 4. The scan lifecycle (`lib/controller/controller.py: start()`)
For each target:
1. **Target setup** - `initTargetEnv()` / `setupTargetEnv()` (`lib/core/target.py`):
resolve URL/params, open the per-target output dir and session file
(`conf.hashDBFile`), and **resume** anything already known (DBMS, injection points,
cached values) from the session.
2. **Connection & profiling** (`lib/controller/checks.py`): `checkConnection()`,
`checkWaf()` (fills `kb.identifiedWafs`), `checkStability()` /
dynamic-content detection (establishes `kb.pageTemplate`, `kb.matchRatio`).
3. **Heuristics** - `heuristicCheckSqlInjection()` (cheap error-based hint).
4. **Detection** - `checkSqlInjection(place, parameter, value)` per parameter, driven by
the data engine (section 5). Confirmed points are appended to `kb.injections`.
5. **Fingerprint & handler** - `lib/controller/handler.py: setHandler()` identifies the
back-end DBMS and assigns `conf.dbmsHandler`, the object through which all
enumeration is dispatched (section 7).
6. **Action** - `action()` (`lib/controller/action.py`) routes the requested operation
(`--banner`, `--dbs`, `--tables`, `--dump`, `--sql-query`, `--os-shell`, ...) to
`conf.dbmsHandler` methods, and feeds results to `conf.dumper`.

If no parameter turns out to be injectable, sqlmap raises `SqlmapNotVulnerableException`
carrying a dead-end advisory: hints about `--level`/`--risk`, `--technique`, `--text-only`,
and `--tamper`, with the `--tamper` hint made definitive when `kb.identifiedWafs` is set.

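
The per-target flow can be condensed into a runnable control-flow skeleton. It is a sketch, not `controller.py`: `run_target` and the callables passed to it are hypothetical stand-ins for the steps listed above, and `NotVulnerable` stands in for `SqlmapNotVulnerableException`. As a simplification, it assumes that injection points resumed from the session skip detection entirely.

```python
class NotVulnerable(Exception):
    """Stand-in for SqlmapNotVulnerableException."""


def run_target(target, setup, detect, fingerprint, act):
    """Hypothetical per-target pipeline mirroring steps 1-6 above."""
    state = setup(target)                        # 1. target env; may resume from session
    injections = state.get("injections") or []   # resumed points skip re-detection
    if not injections:
        injections = detect(target, state)       # 2-4. profiling, heuristics, detection
    if not injections:
        raise NotVulnerable("no injectable parameter found for %s" % target)
    handler = fingerprint(injections)            # 5. identify DBMS, pick handler
    return act(handler)                          # 6. run the requested action


# Toy stand-ins so the skeleton runs end to end.
def fake_setup(target):
    return {"injections": []}                    # nothing resumed from a session


def fake_detect(target, state):
    return ["id"] if "id=" in target else []


result = run_target("http://host/item?id=1", fake_setup, fake_detect,
                    lambda points: "MySQL handler",
                    lambda handler: "banner via %s" % handler)
print(result)  # banner via MySQL handler
```
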
---
## 5. The data-driven detection engine
Detection behavior lives in **data, not code** - `data/xml/`, loaded by
`lib/parse/payloads.py` (`loadBoundaries()`, `loadPayloads()`):
- **`boundaries.xml`** - injection *boundaries*: prefix/suffix pairs and the
clause/where/parameter-type context they apply to (e.g. quote vs. numeric contexts).
- **`payloads/*.xml`** - the *tests*, one file per technique
(`boolean_blind`, `error_based`, `inline_query`, `stacked_queries`, `time_blind`,
`union_query`), each with the request template and the comparison/grep logic that
decides success.
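
A minimal sketch of the data-driven idea: read boundary and test definitions from XML, then cross every test with every boundary into a candidate injected value, filling `[RANDNUM]`/`[RANDSTR]` markers. The XML excerpts are simplified and hypothetical (the real files carry more fields, such as level, risk, clause, and where, and nest them differently), and `load`, `fill_markers`, and `candidates` are illustrative names rather than `lib/parse/payloads.py` functions. Ordering and filtering by DBMS, `--level`, and `--risk` are omitted.

```python
import random
import xml.etree.ElementTree as ET

# Simplified, hypothetical excerpts; the real data/xml/ files differ.
BOUNDARIES_XML = """<root>
  <boundary><ptype>1</ptype><prefix></prefix><suffix></suffix></boundary>
  <boundary><ptype>2</ptype><prefix>'</prefix><suffix> AND '[RANDSTR]'='[RANDSTR]</suffix></boundary>
</root>"""

TESTS_XML = """<root>
  <test><title>AND boolean-based blind (sketch)</title><payload>AND [RANDNUM]=[RANDNUM]</payload></test>
</root>"""


def load(xml_text, tag):
    """Return one dict per <tag> element, mapping child tag -> child text."""
    return [dict((child.tag, child.text or "") for child in node)
            for node in ET.fromstring(xml_text).iter(tag)]


def fill_markers(template, rng):
    """Replace every [RANDNUM]/[RANDSTR] with one random value per payload,
    so that e.g. 'N=N' stays a tautology."""
    number = str(rng.randint(1000, 9999))
    word = "".join(rng.choice("abcdefghijklmnopqrstuvwxyz") for _ in range(4))
    return template.replace("[RANDNUM]", number).replace("[RANDSTR]", word)


def candidates(boundaries, tests, original="1", seed=0):
    """Cross every test with every boundary: original + prefix + ' ' + payload + suffix."""
    rng = random.Random(seed)
    for test in tests:
        for boundary in boundaries:
            raw = "%s%s %s%s" % (original, boundary["prefix"], test["payload"], boundary["suffix"])
            yield test["title"], fill_markers(raw, rng)


for title, value in candidates(load(BOUNDARIES_XML, "boundary"), load(TESTS_XML, "test")):
    print("%s -> id=%s" % (title, value))
```
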

`getSortedInjectionTests()` (`lib/core/common.py`) orders the candidate tests by the
identified/likely DBMS, `--level`, and `--risk`. The **agent** (`lib/core/agent.py`)
forges the actual payload string - applying boundary prefix/suffix, the
`[RANDNUM]`/`[DELIMITER]`-style markers, comments, and tamper scripts. Requests go out via
`lib/request/connect.py`; the **oracle** `lib/request/comparison.py` decides true/false
by comparing the response against `kb.pageTemplate` (difflib ratio vs. `kb.matchRatio`,
plus titles/errors/HTTP-code signals).

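
The oracle's core comparison can be sketched with `difflib.SequenceMatcher`. This is a simplification under stated assumptions: `similarity`, `looks_true`, and the fixed `tolerance` are hypothetical, the baseline ratio is taken from two identical fetches, and the title/error/HTTP-code signals that `comparison.py` also weighs are left out.

```python
import difflib


def similarity(a, b):
    """difflib similarity in [0, 1] between two response bodies."""
    return difflib.SequenceMatcher(None, a, b).ratio()


def looks_true(template, page, match_ratio, tolerance=0.05):
    """Hypothetical oracle: 'true' when the page is (nearly) as similar to
    the template as the template's own baseline was. Text similarity only."""
    return similarity(template, page) >= match_ratio - tolerance


template = "<html><title>Shop</title><body>Item 42: blue mug, in stock</body></html>"
match_ratio = similarity(template, template)  # baseline from two identical fetches

print(looks_true(template, template, match_ratio))                                                   # True
print(looks_true(template, "<html><title>Shop</title><body>No results</body></html>", match_ratio))  # False
```
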
---
## 6. Exploitation techniques
Once a parameter is injectable, value extraction is dispatched by
`lib/request/inject.py: getValue()` to the matching engine in `lib/techniques/`:

| Technique | Engine | Mechanism |
|-----------|--------|-----------|
| boolean-based blind | `blind/inference.py: bisection()` | binary-search each character via true/false oracle |
| time-based blind / stacked | `blind/inference.py` (time compare) | same bisection, oracle is a measured delay |
| error-based | `error/use.py: errorUse()` | parse the value straight out of a provoked DB error |
| UNION query | `union/{test,use}.py` | column-count detection then `UNION SELECT` extraction |
| inline query | (inline, via inject) | value embedded in the original query position |
| DNS exfiltration | `dns/` | `--dns-domain` out-of-band channel |

`bisection()` is the hot loop. It caches the `--charset` table in
`kb.cache.charsetAsciiTbl` and honors the `kb.disableShiftTable` latch, an intentional
runaway guard. Multi-threaded extraction is coordinated via `kb.locks` and
`getCurrentThreadData()` (`lib/core/threads.py`).

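
The idea behind `bisection()` is a per-character binary search over character codes. In this sketch, `oracle(pos, n)` is a hypothetical stand-in for one true/false request; a time-based variant would only swap in a delay-measuring oracle. The real function adds the charset table, threading, and latch handling described above, none of which is reproduced here.

```python
def bisection(length, oracle, low=0, high=127):
    """Recover `length` characters using only yes/no questions of the form
    'is the code of the character at position pos greater than n?'."""
    chars = []
    for pos in range(1, length + 1):        # SQL string positions are 1-based
        lo, hi = low, high                  # invariant: lo <= code <= hi
        while lo < hi:
            mid = (lo + hi) // 2
            if oracle(pos, mid):            # code > mid: keep the upper half
                lo = mid + 1
            else:                           # code <= mid: keep the lower half
                hi = mid
        chars.append(chr(lo))
    return "".join(chars)


# Simulated target: answers what e.g. ASCII(SUBSTRING(@@version, pos, 1)) > n
# would reveal through the true/false oracle.
SECRET = "5.7.42-log"
calls = []


def fake_oracle(pos, n):
    calls.append((pos, n))
    return ord(SECRET[pos - 1]) > n


print(bisection(len(SECRET), fake_oracle))  # 5.7.42-log
print(len(calls))                           # 70: 7 questions per character
```
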
---
## 7. DBMS abstraction
Enumeration is DBMS-agnostic at the top and specialized underneath:
- **`plugins/generic/`** - base classes for each concern: `fingerprint.py`,
`enumeration.py`, `databases.py`, `entries.py`, `users.py`, `filesystem.py`,
`takeover.py`, `syntax.py`, `misc.py`, `search.py`, `custom.py`, `connector.py`
(direct DB connection for `-d`).
- **`plugins/dbms/<dbms>/`** - one directory per supported DBMS, subclassing the generic
pieces and supplying dialect specifics.
- **`data/xml/queries.xml`** - per-DBMS SQL query templates (banner, current user, table
enumeration, casting, etc.) keyed by DBMS. The generic code asks for a query by name;
the dialect comes from XML.

`conf.dbmsHandler` (set in `handler.py`) is the live object that `action()` calls into.

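
The split between generic classes, per-DBMS subclasses, and XML query templates can be sketched as follows. All class names, the `QUERIES` excerpt, and `run_query` (a stand-in for `inject.getValue()`) are hypothetical. The per-DBMS expressions are illustrative, and `data/xml/queries.xml` is the authoritative source. The dictionary lookup at the end plays the role of `setHandler()` picking `conf.dbmsHandler`.

```python
# Hypothetical excerpt of per-DBMS templates (cf. data/xml/queries.xml).
QUERIES = {
    "MySQL": {"banner": "VERSION()", "current_user": "CURRENT_USER()"},
    "Microsoft SQL Server": {"banner": "@@VERSION", "current_user": "SYSTEM_USER"},
}


class GenericEnumSketch(object):
    """DBMS-agnostic logic: asks for a query by name, never hard-codes SQL."""
    dbms = None

    def __init__(self, run_query):
        self.run_query = run_query      # stand-in for inject.getValue()

    def query(self, name):
        return QUERIES[self.dbms][name]

    def getBanner(self):
        return self.run_query(self.query("banner"))

    def getCurrentUser(self):
        return self.run_query(self.query("current_user"))


class MySQLSketch(GenericEnumSketch):
    dbms = "MySQL"                      # dialect comes entirely from QUERIES


class MSSQLSketch(GenericEnumSketch):
    dbms = "Microsoft SQL Server"

    def getBanner(self):
        # Dialect-specific override: flatten the multi-line @@VERSION text.
        return " ".join(GenericEnumSketch.getBanner(self).split())


# Plays the role of setHandler(): identified DBMS name -> handler class.
HANDLERS = dict((cls.dbms, cls) for cls in (MySQLSketch, MSSQLSketch))


def fake_run_query(expression):
    return {"VERSION()": "8.0.36", "CURRENT_USER()": "app@localhost",
            "@@VERSION": "Microsoft SQL Server 2019\n\t(RTM)"}[expression]


handler = HANDLERS["Microsoft SQL Server"](fake_run_query)
print(handler.getBanner())  # Microsoft SQL Server 2019 (RTM)
```
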
---
## 8. Output and session
- **Output** - `conf.dumper` is a `Dump` instance (`lib/core/dump.py`): console tables
plus per-table file export in CSV / HTML / SQLITE / JSONL (`--dump-format`). Logging
is via `logger` (`lib/core/log.py`).
- **Session / resume** - each target gets a SQLite session file
(`<output>/<host>/session.sqlite`). `hashDBWrite()` / `hashDBRetrieve()`
(`lib/core/common.py`, backed by `lib/utils/hashdb.py`) cache injection points,
fingerprint, and extracted values so a re-run *resumes* instead of re-testing
(`--flush-session` discards it; `--fresh-queries` ignores cached query results). A
stale-session nudge fires on resume when the file is older than `HASHDB_STALE_DAYS`.
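
The resume mechanism boils down to a key/value cache persisted in SQLite. This sketch uses only the stdlib `sqlite3` module. `SessionSketch`, its table layout, and `cached_query` are hypothetical, and the real `hashdb.py` schema, key handling, and serialization may differ. `flush()` mirrors `--flush-session`, and `fresh=True` mirrors `--fresh-queries`.

```python
import sqlite3


class SessionSketch(object):
    """Hypothetical per-target key/value store; ':memory:' keeps the demo
    self-contained, whereas a real run would point at the session file."""

    def __init__(self, path=":memory:"):
        self.db = sqlite3.connect(path)
        self.db.execute("CREATE TABLE IF NOT EXISTS cache (id TEXT PRIMARY KEY, value TEXT)")

    def write(self, key, value):
        self.db.execute("INSERT OR REPLACE INTO cache (id, value) VALUES (?, ?)", (key, value))
        self.db.commit()

    def retrieve(self, key):
        row = self.db.execute("SELECT value FROM cache WHERE id = ?", (key,)).fetchone()
        return row[0] if row else None

    def flush(self):
        # Analogue of --flush-session: drop everything cached for the target.
        self.db.execute("DELETE FROM cache")
        self.db.commit()


def cached_query(session, expression, run_query, fresh=False):
    """Resume a cached result unless fresh=True (cf. --fresh-queries)."""
    if not fresh:
        cached = session.retrieve(expression)
        if cached is not None:
            return cached
    value = run_query(expression)
    session.write(expression, value)
    return value


sent = []


def run_query(expression):
    sent.append(expression)           # stands in for real HTTP traffic
    return "8.0.36"


session = SessionSketch()
cached_query(session, "VERSION()", run_query)
cached_query(session, "VERSION()", run_query)   # served from the cache
print(len(sent))  # 1
```
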
---
## 9. Request layer and tampering
`lib/request/connect.py` (`Connect.getPage`) is the single HTTP chokepoint. Around it sit
protocol handlers (`httpshandler`, `redirecthandler`, `chunkedhandler`, `rangehandler`,
keep-alive via `thirdparty/keepalive`), response processing (`basic.py`), and the
comparison oracle (`comparison.py`).

**Tamper scripts** (`tamper/`) mutate the payload just before sending to evade WAF/IPS.
Each file exposes a `tamper(payload, **kwargs)` and a `__priority__`; `--tamper=a,b,c`
chains them in priority order. They are payload-string transforms only (no engine
coupling), which is why they compose freely.

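
The priority-ordered chaining can be sketched with two toy transforms. Each `(priority, function)` entry imitates one tamper module's `__priority__` and `tamper()`. The `_sketch` functions, the plain integer priorities, and the rule that a higher priority runs first are assumptions for illustration (sqlmap's own scripts take their priority from an enum), so check the tamper loading in `lib/core/option.py` for the actual ordering. Ordering matters here: if the space-to-comment transform ran first, the spaces introduced by the `=`-to-`LIKE` rewrite would survive.

```python
import re


def equaltolike_sketch(payload, **kwargs):
    """Rewrite '=' (and surrounding whitespace) as ' LIKE '."""
    return re.sub(r"\s*=\s*", " LIKE ", payload)


def space2comment_sketch(payload, **kwargs):
    """Replace every run of spaces with an inline SQL comment."""
    return re.sub(r" +", "/**/", payload)


# Each entry imitates one tamper module: (__priority__, tamper).
# Assumption: plain integers, and a higher priority runs first.
TAMPERS = {
    "equaltolike_sketch": (10, equaltolike_sketch),
    "space2comment_sketch": (-10, space2comment_sketch),
}


def apply_tampers(payload, names):
    """Chain the named tampers by priority, whatever order they were listed in."""
    chain = sorted((TAMPERS[name] for name in names), key=lambda entry: entry[0], reverse=True)
    for _, tamper in chain:
        payload = tamper(payload)
    return payload


print(apply_tampers("1 AND 5=5", ["space2comment_sketch", "equaltolike_sketch"]))
# 1/**/AND/**/5/**/LIKE/**/5
```
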
---
## 10. REST API and JSON report
`lib/utils/api.py` runs a Bottle server (`sqlmapapi.py -s`) that drives sqlmap scans as
subprocesses and exposes them over HTTP. Key pieces: `DataStore`/`Task` (task registry),
an IPC SQLite `Database` (the subprocess writes results/logs/errors back through
`StdDbOut`), and the route handlers (`/task/*`, `/option/*`, `/scan/*`, `/version`, ...).
The contract is documented in `sqlmapapi.yaml` (OpenAPI) and `REST-API.md`.

`--report-json` reuses the *same* assembly code (`_assembleData` / `_sanitizeScanData`)
that the `/scan/<id>/data` endpoint uses, so the CLI report and the API result can't
drift. `RESTAPI_VERSION` is the API contract version (its major component is exposed as an
integer).

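
A thin client for the REST server might look like this. `ApiClientSketch`, the injectable `transport`, the exact paths, and the response field names (`taskid`, `status`, `data`) are assumptions to be checked against `sqlmapapi.yaml`, and the default address assumes the server's usual local port 8775. The fake transport lets the sketch run without a server.

```python
import json

try:
    from urllib.request import Request, urlopen   # Python 3
except ImportError:
    from urllib2 import Request, urlopen          # Python 2

BASE = "http://127.0.0.1:8775"   # assumed local sqlmapapi.py -s address


def http_transport(method, url, body=None):
    """Real transport: one JSON round trip via urllib (unused in the demo)."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    request = Request(url, data=data, headers={"Content-Type": "application/json"})
    request.get_method = lambda: method
    response = urlopen(request)
    try:
        return json.loads(response.read().decode("utf-8"))
    finally:
        response.close()


class ApiClientSketch(object):
    """Hypothetical client; paths and field names must match sqlmapapi.yaml."""

    def __init__(self, base=BASE, transport=http_transport):
        self.base = base.rstrip("/")
        self.transport = transport

    def call(self, method, path, body=None):
        return self.transport(method, self.base + path, body)

    def scan(self, options):
        taskid = self.call("GET", "/task/new")["taskid"]
        self.call("POST", "/scan/%s/start" % taskid, options)
        return taskid

    def status(self, taskid):
        return self.call("GET", "/scan/%s/status" % taskid)["status"]

    def data(self, taskid):
        return self.call("GET", "/scan/%s/data" % taskid)["data"]


seen = []


def fake_transport(method, url, body=None):
    """Canned responses so the sketch runs without a server."""
    seen.append((method, url, body))
    path = url[len(BASE):]
    if path == "/task/new":
        return {"taskid": "abc123"}
    if path.endswith("/status"):
        return {"status": "terminated"}
    if path.endswith("/data"):
        return {"data": []}
    return {}


client = ApiClientSketch(transport=fake_transport)
taskid = client.scan({"url": "http://target.example/item?id=1"})
print("%s %s" % (taskid, client.status(taskid)))  # abc123 terminated
```
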
---
## 11. Tests and self-tests
Two complementary layers:
- **Offline unit/regression suite** (`tests/`) - stdlib `unittest` only (no pytest/pip),
green on py2 + py3. `_testutils.py` bootstraps global state and provides the
property/fuzz harness (`Rng` - a cross-version-identical PRNG - and `for_all`). Run:
`python -B -m unittest discover -s tests -p "test_*.py"` (`-B` matters: it stops Python
from writing `.pyc` files, and a cached `.pyc` makes a `getFileType(__file__)` doctest
see `binary`).
- **In-tree self-tests** (`lib/core/testing.py`, hidden switches): `--smoke-test`
(doctests + regex sanity over the whole tree), `--vuln-test` (end-to-end scans against
the bundled `extra/vulnserver`), `--api-test` (live REST round-trip). The CI workflow
(`.github/workflows/tests.yml`) runs all of these.
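
A hypothetical test module in the spirit of the offline suite. It combines a small integer-only PRNG (so sequences match on 2.7 and 3.x, which `random.Random` does not guarantee for methods such as `randint` or `choice`), a `for_all` helper, and two `unittest` property checks. `Rng`, `for_all`, and `escape_quotes` here are illustrative re-implementations, not the `_testutils.py` API. If saved as `tests/test_*.py`, the module would be picked up by the discover command above.

```python
import unittest

MASK64 = (1 << 64) - 1


class Rng(object):
    """Hypothetical cross-version PRNG: 64-bit LCG with Knuth's MMIX
    constants, integer arithmetic only, so 2.7 and 3.x agree."""

    def __init__(self, seed):
        self.state = seed & MASK64

    def next_int(self):
        self.state = (self.state * 6364136223846793005 + 1442695040888963407) & MASK64
        return self.state >> 33            # keep the stronger high bits

    def randint(self, low, high):
        # Modulo bias is negligible for generating test inputs.
        return low + self.next_int() % (high - low + 1)

    def text(self, max_len=12, alphabet="abcXYZ019 '\"%_"):
        length = self.randint(0, max_len)
        return "".join(alphabet[self.randint(0, len(alphabet) - 1)] for _ in range(length))


def for_all(generate, check, runs=200, seed=1):
    """Check a property on `runs` generated inputs, naming the first failure."""
    rng = Rng(seed)
    for i in range(runs):
        value = generate(rng)
        if not check(value):
            raise AssertionError("property failed on run %d: %r" % (i, value))


def escape_quotes(value):
    """Toy function under test: SQL-style single-quote doubling."""
    return value.replace("'", "''")


class EscapeQuotesProperties(unittest.TestCase):
    def test_round_trip(self):
        for_all(lambda rng: rng.text(),
                lambda s: escape_quotes(s).replace("''", "'") == s)

    def test_even_quote_count(self):
        for_all(lambda rng: rng.text(),
                lambda s: escape_quotes(s).count("'") % 2 == 0)
```
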
---
## 12. "Where do I start for ...?"
| I want to change... | Start in |
|---------------------|----------|
| a CLI option | `lib/parse/cmdline.py` (+ `optiondict.py`, `defaults.py`) |
| a constant/threshold | `lib/core/settings.py` |
| how injection is *detected* | `data/xml/boundaries.xml` + `data/xml/payloads/*.xml`, then `lib/controller/checks.py` |
| how a value is *extracted* | `lib/request/inject.py` + the relevant `lib/techniques/` engine |
| the true/false decision | `lib/request/comparison.py` |
| a per-DBMS query/dialect | `data/xml/queries.xml` + `plugins/dbms/<dbms>/` |
| enumeration behavior | `plugins/generic/*.py` |
| dump/output format | `lib/core/dump.py` |
| a WAF-bypass transform | add a file under `tamper/` |
| the REST API surface | `lib/utils/api.py` (+ keep `sqlmapapi.yaml` in sync) |
| session/resume behavior | `lib/utils/hashdb.py` + `hashDB*` in `lib/core/common.py` |
| a stdlib monkey-patch / security shim | `lib/core/patch.py` |