Mirror of https://github.com/ollama/ollama.git, synced 2026-05-13 14:27:00 +00:00.
Use the current fragment offset when emitting unmatched spans during multi-regex BPE splitting. This avoids duplicating earlier prompt text and inflating token counts for multi-stage BPE tokenizers.

| File |
|---|
| testdata |
| bytepairencoding.go |
| bytepairencoding_test.go |
| sentencepiece.go |
| sentencepiece_test.go |
| tokenizer.go |
| vocabulary.go |
| vocabulary_test.go |
| wordpiece.go |
| wordpiece_test.go |