mirror of
https://github.com/sqlmapproject/sqlmap.git
synced 2026-06-28 12:31:00 +00:00
# sqlmap architecture

A contributor-oriented map of how sqlmap is put together: the major components,
how a run flows through them, and where to start looking for a given concern.

> This is a map, not a spec. It describes the durable structure and data flow; for
> exact signatures, option names, and enumerable lists (tampers, DBMSes, options),
> the source is authoritative. **When this document disagrees with the code, the code wins.**

sqlmap runs on both Python 2.7 and 3.x; sources are kept pure-ASCII unless a literal
non-ASCII byte is unavoidable. Compatibility shims live in `lib/core/compat.py` and
`thirdparty/six`.

---

## 1. Entry points

| Entry | File | Purpose |
|-------|------|---------|
| CLI | `sqlmap.py` -> `main()` | the scanner. Applies runtime patches, parses options, runs a scan. |
| REST API | `sqlmapapi.py` | `-s` server / `-c` client wrappers around `lib/utils/api.py`. |

`main()` (in `sqlmap.py`) does, in order: `dirtyPatches()` (monkey-patches the stdlib for
quirks/security - see section 12), `setPaths()`, `init()` (option parsing + environment
setup), then dispatches to `start()` for a normal scan, or to the self-tests
(`--smoke-test` / `--vuln-test` / `--api-test`) in `lib/core/testing.py`.

---

## 2. Global state: `conf` and `kb`

Almost everything hangs off two process-global singletons defined in `lib/core/data.py`,
both `AttribDict` (attribute-accessible dicts; missing keys read back as `None`):

- **`conf`** - the resolved user configuration (options + derived settings). What the
  user asked for.
- **`kb`** ("knowledge base") - mutable runtime state discovered during a run
  (identified DBMS, injection points, page templates, caches, locks, counters).

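The access pattern described above can be illustrated with a minimal sketch. `AttribDictSketch` is a hypothetical stand-in written for this document, not sqlmap's actual `AttribDict`; it only models the two properties named here (attribute access and `None` for missing keys), and the real class may differ in detail.

```python
class AttribDictSketch(dict):
    """Hypothetical stand-in: a dict whose keys can be read and written as attributes."""

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails, so dict methods
        # such as .get() and .items() keep working. Unknown keys yield None.
        return self.get(name)

    def __setattr__(self, name, value):
        # Attribute assignment stores into the dict itself.
        self[name] = value


conf = AttribDictSketch()
conf.level = 3                 # same effect as conf["level"] = 3

print(conf["level"])           # attribute writes are visible as dict items
print(conf.notSetAnywhere)     # a missing key reads back as None
```

The practical consequence for callers is that `if conf.someOption:` is safe even when the option was never set, at the cost of typos in option names failing silently.
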
The configuration pipeline (paths relative to `lib/`):

- `parse/cmdline.py` - argparse definition of every CLI option.
- `core/optiondict.py` - option name -> type map (used for config-file/API coercion).
- `core/defaults.py` - default values.
- `core/option.py` - the heavy lifter: `_setConfAttributes()`, `_setKnowledgeBaseAttributes()`,
  `_setHTTPHandlers()` (installs the global urllib opener incl. keep-alive), DBMS/encoding
  setup, etc. Merges CLI + config file + defaults into `conf`/`kb`.
- `core/settings.py` - constants, version, regexes, thresholds. **New constants go here.**

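The merge of CLI values, config-file values, and defaults can be sketched roughly as follows. Everything in the block is hypothetical: `OPTION_TYPES`, `DEFAULTS`, `coerce()` and `merge_options()` are illustration names, and the precedence shown (CLI over config file over defaults) is an assumption of this sketch rather than something stated above.

```python
# Hypothetical option -> type map, playing the role described for optiondict.py.
OPTION_TYPES = {"url": str, "level": int, "threads": int, "batch": bool}

# Hypothetical defaults, playing the role described for defaults.py.
DEFAULTS = {"level": 1, "threads": 1, "batch": False}


def coerce(name, raw):
    """Convert a raw (often string) value to the declared type of the option."""
    kind = OPTION_TYPES[name]
    if kind is bool and isinstance(raw, str):
        return raw.strip().lower() in ("1", "true", "yes", "on")
    return kind(raw)


def merge_options(cli, config_file):
    """Start from defaults, then let the config file and the CLI override them."""
    merged = dict(DEFAULTS)
    for source in (config_file, cli):      # later sources win (assumed precedence)
        for name, raw in source.items():
            if raw is not None:            # None means "not provided by this source"
                merged[name] = coerce(name, raw)
    return merged


conf = merge_options(
    cli={"url": "http://target.example/?id=1", "level": None, "threads": "4"},
    config_file={"level": "3", "batch": "true"},
)
print(conf)
```

Config files and API requests deliver strings, which is why a separate name-to-type map is useful: coercion happens in one place instead of at every point of use.
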
Identifiers in the codebase are camelCase.

---

## 3. Top-level layout

| Path | Responsibility |
|------|----------------|
| `lib/core/` | conf/kb model, common helpers, settings, enums, dump, session, agent, option parsing |
| `lib/controller/` | the scan orchestrator (`controller.py`), detection checks (`checks.py`), enumeration dispatch (`action.py`), DBMS handler selection (`handler.py`) |
| `lib/request/` | HTTP layer: `connect.py` (sending), `comparison.py` (the true/false oracle), `inject.py` (value extraction), protocol handlers, response processing |
| `lib/techniques/` | the exploitation engines: `blind/inference.py`, `error/use.py`, `union/{test,use}.py`, `dns/` |
| `lib/parse/` | parsing of inputs: CLI, config, HTTP request/log files, HTML, sitemap, and the XML payload/boundary loader (`payloads.py`) |
| `lib/utils/` | feature modules: `api.py` (REST), `hashdb.py` (session), `crawler.py`, `hash.py` (cracking), `har.py`, `brute.py`, `search.py`, ... |
| `lib/takeover/` | OS-level takeover: shells, file access, UDF, registry, Metasploit, `xp_cmdshell` |
| `plugins/generic/` | DBMS-agnostic enumeration/fingerprint/filesystem/takeover base classes |
| `plugins/dbms/<dbms>/` | per-DBMS subclasses + dialect (one dir per supported DBMS) |
| `tamper/` | payload-mutation scripts (WAF bypass), one `tamper()` per file |
| `data/xml/` | the data-driven engine: `boundaries.xml`, `payloads/*.xml`, `queries.xml`, `errors.xml` |
| `data/` (other) | wordlists/common tables/columns (`txt/`), UDFs (`udf/`), stored procs (`procs/`), shells (`shell/`) |
| `tests/` | stdlib-unittest suite (offline); see section 11 |
| `thirdparty/` | vendored dependencies (six, bottle, keepalive, chardet, ...) - no pip at runtime |
| `extra/` | auxiliary tools (e.g. `vulnserver` used by `--vuln-test`) |

---

## 4. The scan lifecycle (`lib/controller/controller.py: start()`)

For each target:

1. **Target setup** - `initTargetEnv()` / `setupTargetEnv()` (`lib/core/target.py`):
   resolve URL/params, open the per-target output dir and session file
   (`conf.hashDBFile`), and **resume** anything already known (DBMS, injection points,
   cached values) from the session.
2. **Connection & profiling** (`lib/controller/checks.py`): `checkConnection()`,
   `checkWaf()` (fills `kb.identifiedWafs`), `checkStability()` /
   dynamic-content detection (establishes `kb.pageTemplate`, `kb.matchRatio`).
3. **Heuristics** - `heuristicCheckSqlInjection()` (cheap error-based hint).
4. **Detection** - `checkSqlInjection(place, parameter, value)` per parameter, driven by
   the data engine (section 5). Confirmed points are appended to `kb.injections`.
5. **Fingerprint & handler** - `lib/controller/handler.py: setHandler()` identifies the
   back-end DBMS and assigns `conf.dbmsHandler`, the object through which all
   enumeration is dispatched (section 7).
6. **Action** - `action()` (`lib/controller/action.py`) routes the requested operation
   (`--banner`, `--dbs`, `--tables`, `--dump`, `--sql-query`, `--os-shell`, ...) to
   `conf.dbmsHandler` methods, and feeds results to `conf.dumper`.

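The control flow of steps 4 to 6 can be sketched as a small driver. This is a simplified illustration, not the body of `start()`: `scan_target()`, `NotVulnerable`, and the three callables passed in are hypothetical names, and target setup, profiling, and heuristics are left out.

```python
class NotVulnerable(Exception):
    """Hypothetical stand-in for the dead end raised when nothing is injectable."""


def scan_target(params, detect, fingerprint, act):
    # Step 4: test every (place, parameter, value) and keep confirmed points.
    injections = [found for found in (detect(*p) for p in params) if found]
    if not injections:
        raise NotVulnerable("all tested parameters appear not to be injectable")
    # Step 5: choose the DBMS handler from what detection learned.
    handler = fingerprint(injections)
    # Step 6: route the requested operation to that handler.
    return act(handler)


# Stub stages, just enough to exercise the flow.
def detect(place, parameter, value):
    return {"place": place, "parameter": parameter} if parameter == "id" else None


def fingerprint(injections):
    return "MySQL handler"


def act(handler):
    return "banner via %s" % handler


params = [("GET", "id", "1"), ("GET", "page", "2")]
result = scan_target(params, detect, fingerprint, act)
print(result)
```
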
If nothing is injectable, the dead-end advisory (level/risk, technique, `--text-only`,
`--tamper` - definitive when `kb.identifiedWafs` is set) is raised as
`SqlmapNotVulnerableException`.

---

## 5. The data-driven detection engine

Detection behavior lives in **data, not code** - `data/xml/`, loaded by
`lib/parse/payloads.py` (`loadBoundaries()`, `loadPayloads()`):

- **`boundaries.xml`** - injection *boundaries*: prefix/suffix pairs and the
  clause/where/parameter-type context they apply to (e.g. quote vs. numeric contexts).
- **`payloads/*.xml`** - the *tests*, one file per technique
  (`boolean_blind`, `error_based`, `inline_query`, `stacked_queries`, `time_blind`,
  `union_query`), each with the request template and the comparison/grep logic that
  decides success.

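To make the boundary/test split concrete, here is a minimal sketch of how one test template might be wrapped in each boundary. The dictionaries are simplified, hypothetical stand-ins for the XML entries (the real schema has more fields), and `forge()` is an illustration name, not the agent's API. Only the `[RANDNUM]` marker is taken from sqlmap's documented marker style.

```python
import random

# Hypothetical, simplified stand-ins for entries in boundaries.xml.
BOUNDARIES = [
    {"ptype": "numeric", "prefix": "", "suffix": ""},
    {"ptype": "string", "prefix": "'", "suffix": " AND 'x'='x"},
]

# Hypothetical, simplified stand-in for one test in payloads/boolean_blind.xml.
TESTS = [
    {"title": "AND boolean-based blind", "payload": "AND [RANDNUM]=[RANDNUM]"},
]


def forge(original_value, boundary, test, rng):
    """Wrap a test template in a boundary and fill in its marker."""
    number = str(rng.randint(1000, 9999))
    body = test["payload"].replace("[RANDNUM]", number)
    return "%s%s %s%s" % (original_value, boundary["prefix"], body, boundary["suffix"])


rng = random.Random(0)
payloads = [forge("1", b, t, rng) for b in BOUNDARIES for t in TESTS]
for payload in payloads:
    print(payload)
```

Because both lists are plain data, covering a new injection context or a new test means adding an entry rather than changing the loop that combines them.
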
`getSortedInjectionTests()` (`lib/core/common.py`) orders the candidate tests by the
identified/likely DBMS, `--level`, and `--risk`. The **agent** (`lib/core/agent.py`)
forges the actual payload string - applying boundary prefix/suffix, the `[RANDNUM]`/
`[DELIMITER]`-style markers, comments, and tamper scripts. Requests go out via
`lib/request/connect.py`; the **oracle** `lib/request/comparison.py` decides true/false
by comparing the response against `kb.pageTemplate` (difflib ratio vs. `kb.matchRatio`,
plus titles/errors/HTTP-code signals).

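The ratio part of the oracle can be sketched with the standard library alone. This is a deliberately reduced illustration: `page_ratio()` and `looks_true()` are hypothetical names, the plain "ratio above threshold" rule is an assumption of the sketch, and the title/error/HTTP-code signals mentioned above are ignored.

```python
import difflib


def page_ratio(template, page):
    """Similarity in [0, 1] between the baseline page and a response."""
    return difflib.SequenceMatcher(None, template, page).ratio()


def looks_true(template, page, match_ratio):
    # Assumed rule for this sketch: a response counts as "true" when it is
    # at least as similar to the template as the calibrated threshold.
    return page_ratio(template, page) > match_ratio


template = "Welcome back, alice. You have 3 new messages."
true_page = "Welcome back, alice. You have 4 new messages."   # minor dynamic content
false_page = "No results found."

print(looks_true(template, true_page, match_ratio=0.8))
print(looks_true(template, false_page, match_ratio=0.8))
```

A similarity threshold rather than an equality check is what lets the oracle tolerate small dynamic differences between otherwise identical "true" responses.
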
---

## 6. Exploitation techniques

Once a parameter is injectable, value extraction is dispatched by
`lib/request/inject.py: getValue()` to the matching engine in `lib/techniques/`:

| Technique | Engine | Mechanism |
|-----------|--------|-----------|
| boolean-based blind | `blind/inference.py: bisection()` | binary-search each character via true/false oracle |
| time-based blind / stacked | `blind/inference.py` (time compare) | same bisection, oracle is a measured delay |
| error-based | `error/use.py: errorUse()` | parse the value straight out of a provoked DB error |
| UNION query | `union/{test,use}.py` | column-count detection then `UNION SELECT` extraction |
| inline query | (inline, via inject) | value embedded in the original query position |
| DNS exfiltration | `dns/` | `--dns-domain` out-of-band channel |

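The "binary-search each character" mechanism in the first table row can be sketched as follows. This is a textbook bisection under stated assumptions, not the code of `bisection()`: the value length is assumed known, characters are assumed to be 7-bit ASCII, and `oracle(position, threshold)` is a hypothetical callable answering "is the character code at this position greater than the threshold?". In a real run each oracle call would be one HTTP request.

```python
def extract_char(oracle, position, low=0, high=127):
    """Find one character code with a binary search over [low, high]."""
    while low < high:
        mid = (low + high) // 2
        if oracle(position, mid):    # code > mid: search the upper half
            low = mid + 1
        else:                        # code <= mid: search the lower half
            high = mid
    return chr(low)


def extract_value(oracle, length):
    return "".join(extract_char(oracle, i) for i in range(length))


# Simulated target: the oracle only ever answers true/false.
secret = "root@localhost"
requests = []


def oracle(position, threshold):
    requests.append((position, threshold))
    return ord(secret[position]) > threshold


value = extract_value(oracle, len(secret))
print(value)
print(len(requests) // len(secret))   # oracle calls per character
```

With a 128-value range the search always takes seven oracle calls per character, compared with up to 127 for a linear scan, which is why this loop dominates the cost of blind extraction.
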
`bisection()` is the hot loop; it caches the `--charset` table in
`kb.cache.charsetAsciiTbl` and respects the `kb.disableShiftTable` runaway-guard latch
(intentional). Multi-threaded extraction is coordinated via `kb.locks` and
`getCurrentThreadData()` (`lib/core/threads.py`).

---

## 7. DBMS abstraction

Enumeration is DBMS-agnostic at the top and specialized underneath:

- **`plugins/generic/`** - base classes for each concern: `fingerprint.py`,
  `enumeration.py`, `databases.py`, `entries.py`, `users.py`, `filesystem.py`,
  `takeover.py`, `syntax.py`, `misc.py`, `search.py`, `custom.py`, `connector.py`
  (direct DB connection for `-d`).
- **`plugins/dbms/<dbms>/`** - one directory per supported DBMS, subclassing the generic
  pieces and supplying dialect specifics.
- **`data/xml/queries.xml`** - per-DBMS SQL query templates (banner, current user, table
  enumeration, casting, etc.) keyed by DBMS. The generic code asks for a query by name;
  the dialect comes from XML.

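The three layers above can be pictured with a small sketch: generic code asks for a query by name, a per-DBMS subclass only says which dialect it is, and the SQL text comes from a data table. All names here (`QUERIES`, `GenericEnumeration`, `MySQLHandler`, `PostgreSQLHandler`, `run_query`) are hypothetical, and the dictionary is a stand-in for `queries.xml`, not its actual structure.

```python
# Hypothetical stand-in for queries.xml: query templates keyed by DBMS, then by name.
QUERIES = {
    "MySQL": {
        "banner": "SELECT VERSION()",
        "current_user": "SELECT CURRENT_USER()",
    },
    "PostgreSQL": {
        "banner": "SELECT VERSION()",
        "current_user": "SELECT CURRENT_USER",
    },
}


class GenericEnumeration(object):
    """DBMS-agnostic logic: knows *what* to ask for, not how it is spelled."""

    dbms = None  # set by each per-DBMS subclass

    def __init__(self, run_query):
        self.run_query = run_query   # callable that extracts a query's result

    def query_for(self, name):
        return QUERIES[self.dbms][name]

    def get_banner(self):
        return self.run_query(self.query_for("banner"))

    def get_current_user(self):
        return self.run_query(self.query_for("current_user"))


class MySQLHandler(GenericEnumeration):
    dbms = "MySQL"


class PostgreSQLHandler(GenericEnumeration):
    dbms = "PostgreSQL"


def fake_run(sql):
    return "ran: " + sql


print(MySQLHandler(fake_run).get_current_user())
print(PostgreSQLHandler(fake_run).get_current_user())
```
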
`conf.dbmsHandler` (set in `handler.py`) is the live object that `action()` calls into.

---

## 8. Output and session

- **Output** - `conf.dumper` is a `Dump` instance (`lib/core/dump.py`): console tables
  plus per-table file export in CSV / HTML / SQLITE / JSONL (`--dump-format`). Logging
  is via `logger` (`lib/core/log.py`).
- **Session / resume** - each target gets a SQLite session file
  (`<output>/<host>/session.sqlite`). `hashDBWrite()` / `hashDBRetrieve()`
  (`lib/core/common.py`, backed by `lib/utils/hashdb.py`) cache injection points,
  fingerprint, and extracted values so a re-run *resumes* instead of re-testing
  (`--flush-session` discards it; `--fresh-queries` ignores cached query results). A
  stale-session nudge fires on resume when the file is older than `HASHDB_STALE_DAYS`.

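The resume behavior can be sketched as a small key/value cache on SQLite. This is an illustration of the idea only: `SessionStore` and `cached()` are hypothetical names, the table layout, key hashing, and JSON serialization are assumptions of the sketch, and none of it reflects the actual `hashdb.py` schema or the signatures of `hashDBWrite()` / `hashDBRetrieve()`.

```python
import hashlib
import json
import sqlite3


class SessionStore(object):
    """Hypothetical key/value session cache backed by SQLite."""

    def __init__(self, path=":memory:"):
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS storage (id TEXT PRIMARY KEY, value TEXT)")

    @staticmethod
    def _key(name):
        # Fixed-size key derived from an arbitrary description of the lookup.
        return hashlib.sha256(name.encode("utf-8")).hexdigest()

    def write(self, name, value):
        self.db.execute("INSERT OR REPLACE INTO storage VALUES (?, ?)",
                        (self._key(name), json.dumps(value)))
        self.db.commit()

    def retrieve(self, name):
        row = self.db.execute("SELECT value FROM storage WHERE id = ?",
                              (self._key(name),)).fetchone()
        return json.loads(row[0]) if row else None


def cached(store, name, compute, fresh=False):
    """Return the stored value unless absent or a fresh result is requested."""
    if not fresh:
        hit = store.retrieve(name)
        if hit is not None:
            return hit
    value = compute()
    store.write(name, value)
    return value


calls = []


def run_banner_query():
    calls.append(1)            # stands in for an expensive extraction
    return "5.7.31"


store = SessionStore()
first = cached(store, "banner", run_banner_query)
second = cached(store, "banner", run_banner_query)              # resumed from the store
third = cached(store, "banner", run_banner_query, fresh=True)   # forces a re-query
print(first, second, third, len(calls))
```
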
---

## 9. Request layer and tampering

`lib/request/connect.py` (`Connect.getPage`) is the single HTTP chokepoint. Around it:
protocol handlers (`httpshandler`, `redirecthandler`, `chunkedhandler`, `rangehandler`,
keep-alive via `thirdparty/keepalive`), response processing (`basic.py`), and the
comparison oracle (`comparison.py`).

**Tamper scripts** (`tamper/`) mutate the payload just before sending to evade WAF/IPS.
Each file exposes a `tamper(payload, **kwargs)` and a `__priority__`; `--tamper=a,b,c`
chains them in priority order. They are payload-string transforms only (no engine
coupling), which is why they compose freely.

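As an illustration of why string-only transforms compose, the sketch below chains two tamper-like transforms. The two classes stand in for tamper modules so the example fits in one file; their names, the integer priorities, `apply_tampers()`, and the "highest priority first" ordering are all assumptions of this sketch. Only the `tamper(payload, **kwargs)` / `__priority__` shape comes from the description above.

```python
import re


class UpperCaseKeywords(object):
    """Stand-in for a tamper module: upper-cases a few SQL keywords."""
    __priority__ = 2

    @staticmethod
    def tamper(payload, **kwargs):
        if not payload:
            return payload
        return re.sub(r"\b(select|union|from|where|and|or)\b",
                      lambda m: m.group(0).upper(), payload, flags=re.IGNORECASE)


class SpaceToComment(object):
    """Stand-in for a tamper module: replaces spaces with inline comments."""
    __priority__ = 1

    @staticmethod
    def tamper(payload, **kwargs):
        return payload.replace(" ", "/**/") if payload else payload


def apply_tampers(payload, scripts, **kwargs):
    # Assumed ordering for this sketch: higher __priority__ runs first.
    for script in sorted(scripts, key=lambda s: s.__priority__, reverse=True):
        payload = script.tamper(payload, **kwargs)
    return payload


tampered = apply_tampers("1 and 1=1", [SpaceToComment, UpperCaseKeywords])
print(tampered)
```

Each transform takes a string and returns a string, so the chain is plain function composition and adding a new script needs no change to the code that applies them.
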
---

## 10. REST API and JSON report

`lib/utils/api.py` runs a Bottle server (`sqlmapapi.py -s`) that drives sqlmap scans as
subprocesses and exposes them over HTTP. Key pieces: `DataStore`/`Task` (task registry),
an IPC SQLite `Database` (the subprocess writes results/logs/errors back through
`StdDbOut`), and the route handlers (`/task/*`, `/option/*`, `/scan/*`, `/version`, ...).
The contract is documented in `sqlmapapi.yaml` (OpenAPI) and `REST-API.md`.

`--report-json` reuses the *same* assembly code (`_assembleData` / `_sanitizeScanData`)
that the `/scan/<id>/data` endpoint uses, so the CLI report and the API result can't
drift; `RESTAPI_VERSION` is the API contract version (major exposed as integer).

---

## 11. Tests and self-tests

Two complementary layers:

- **Offline unit/regression suite** (`tests/`) - stdlib `unittest` only (no pytest/pip),
  green on py2 + py3. `_testutils.py` bootstraps global state and provides the
  property/fuzz harness (`Rng` - a cross-version-identical PRNG - and `for_all`). Run:
  `python -B -m unittest discover -s tests -p "test_*.py"` (`-B` matters: a cached `.pyc`
  makes a `getFileType(__file__)` doctest see `binary`).
- **In-tree self-tests** (`lib/core/testing.py`, hidden switches): `--smoke-test`
  (doctests + regex sanity over the whole tree), `--vuln-test` (end-to-end scans against
  the bundled `extra/vulnserver`), `--api-test` (live REST round-trip). The CI workflow
  (`.github/workflows/tests.yml`) runs all of these.

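The idea behind a cross-version-identical PRNG plus a `for_all` helper can be sketched as below. This is not the code in `_testutils.py`, whose API is not described here: `SketchRng`, `for_all_sketch()`, their signatures, and the choice of a 32-bit linear congruential generator are all assumptions made for illustration.

```python
import binascii


class SketchRng(object):
    """Tiny 32-bit LCG: integer-only arithmetic, so the same seed yields the
    same sequence on any Python version."""

    def __init__(self, seed):
        self.state = seed & 0xFFFFFFFF

    def next_u32(self):
        self.state = (1664525 * self.state + 1013904223) & 0xFFFFFFFF
        return self.state

    def randint(self, low, high):
        # Use the higher bits; the lowest bits of an LCG are the least random.
        return low + (self.next_u32() >> 8) % (high - low + 1)

    def text(self, max_len=16):
        length = self.randint(0, max_len)
        return "".join(chr(self.randint(32, 126)) for _ in range(length))


def for_all_sketch(generator, prop, runs=200, seed=1):
    """Check that prop(case) holds for `runs` generated cases."""
    rng = SketchRng(seed)
    for i in range(runs):
        case = generator(rng)
        assert prop(case), "property failed on run %d: %r" % (i, case)
    return runs


def hex_round_trips(s):
    raw = s.encode("ascii")
    return binascii.unhexlify(binascii.hexlify(raw)) == raw


checked = for_all_sketch(lambda rng: rng.text(), hex_round_trips)
print(checked)
```

A fixed seed makes any failing case reproducible, and a hand-rolled generator avoids depending on the standard `random` module producing the same sequence on every interpreter.
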
---

## 12. "Where do I start for ...?"

| I want to change... | Start in |
|---------------------|----------|
| a CLI option | `lib/parse/cmdline.py` (+ `optiondict.py`, `defaults.py`) |
| a constant/threshold | `lib/core/settings.py` |
| how injection is *detected* | `data/xml/boundaries.xml` + `data/xml/payloads/*.xml`, then `lib/controller/checks.py` |
| how a value is *extracted* | `lib/request/inject.py` + the relevant `lib/techniques/` engine |
| the true/false decision | `lib/request/comparison.py` |
| a per-DBMS query/dialect | `data/xml/queries.xml` + `plugins/dbms/<dbms>/` |
| enumeration behavior | `plugins/generic/*.py` |
| dump/output format | `lib/core/dump.py` |
| a WAF-bypass transform | add a file under `tamper/` |
| the REST API surface | `lib/utils/api.py` (+ keep `sqlmapapi.yaml` in sync) |
| session/resume behavior | `lib/utils/hashdb.py` + `hashDB*` in `lib/core/common.py` |
| a stdlib monkey-patch / security shim | `lib/core/patch.py` |