ollama

mirror of https://github.com/ollama/ollama.git synced 2026-05-13 14:27:00 +00:00


Daniel Hiltgen ec9b4e9e47 tokenizer: fix multi-regex BPE offset handling (#15844)	2026-04-27 14:14:27 -07:00
Use the current fragment offset when emitting unmatched spans during multi-regex BPE splitting. This avoids duplicating earlier prompt text and inflating token counts for multi-stage BPE tokenizers.
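
The offset handling described in this commit can be illustrated with a small, self-contained sketch. It is not the code in bytepairencoding.go: the names splitStages, demoStages, and joinFragments and the two example regexes are hypothetical, chosen only to show the idea. Each stage re-splits the fragments from the previous stage, and the cursor used to emit unmatched spans is relative to the current fragment, so text from earlier in the prompt is never emitted twice.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// splitStages applies each regex in order. Every stage re-splits the
// fragments produced by the previous stage into matched spans and the
// unmatched gaps between them. (Hypothetical helper, not ollama's API.)
func splitStages(text string, stages []*regexp.Regexp) []string {
	frags := []string{text}
	for _, re := range stages {
		var next []string
		for _, frag := range frags {
			// offset is relative to the current fragment, not to the
			// original text or to an earlier fragment.
			offset := 0
			for _, m := range re.FindAllStringIndex(frag, -1) {
				start, end := m[0], m[1]
				if start == end {
					continue // ignore empty matches
				}
				if start > offset {
					next = append(next, frag[offset:start]) // unmatched gap
				}
				next = append(next, frag[start:end]) // matched span
				offset = end
			}
			if offset < len(frag) {
				next = append(next, frag[offset:]) // unmatched tail
			}
		}
		frags = next
	}
	return frags
}

// demoStages returns two example patterns (assumptions for illustration).
func demoStages() []*regexp.Regexp {
	return []*regexp.Regexp{
		regexp.MustCompile(`[0-9]+`), // stage 1: isolate digit runs
		regexp.MustCompile(`[a-z]+`), // stage 2: isolate lowercase runs
	}
}

// joinFragments renders fragments with a visible separator for inspection.
func joinFragments(frags []string) string {
	return strings.Join(frags, "|")
}

func main() {
	frags := splitStages("ab12 cd34!", demoStages())
	fmt.Println(joinFragments(frags))
	// Concatenating the fragments should reproduce the input exactly.
	fmt.Println(strings.Join(frags, "") == "ab12 cd34!")
}
```

A useful invariant for this kind of splitter is that the fragments concatenate back to the original input. If unmatched spans were sliced with an offset from the wrong frame of reference, the fragments could repeat earlier text and the downstream token count would grow, which is the symptom the commit message describes.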
testdata	move tokenizers to separate package (#13825)	2026-02-05 17:44:11 -08:00
bytepairencoding.go	tokenizer: fix multi-regex BPE offset handling (#15844)	2026-04-27 14:14:27 -07:00
bytepairencoding_test.go	tokenizer: fix multi-regex BPE offset handling (#15844)	2026-04-27 14:14:27 -07:00
sentencepiece.go	move tokenizers to separate package (#13825)	2026-02-05 17:44:11 -08:00
sentencepiece_test.go	move tokenizers to separate package (#13825)	2026-02-05 17:44:11 -08:00
tokenizer.go	move tokenizers to separate package (#13825)	2026-02-05 17:44:11 -08:00
vocabulary.go	move tokenizers to separate package (#13825)	2026-02-05 17:44:11 -08:00
vocabulary_test.go	move tokenizers to separate package (#13825)	2026-02-05 17:44:11 -08:00
wordpiece.go	move tokenizers to separate package (#13825)	2026-02-05 17:44:11 -08:00
wordpiece_test.go	move tokenizers to separate package (#13825)	2026-02-05 17:44:11 -08:00