
When converting mistralai/Mistral-Small-3.2-24B-Instruct-2506 to GGUF (via llama_cpp), I get an error saying the tokenizer.json file is missing. After re-examining the HF repo, there is in fact no tokenizer.json file in it, although there is one for the other models I have converted, and for the base model.

However, others have successfully converted and quantised this model, which is my end goal. What do I need to do to convert this model to GGUF?

Can I use the tokenizer.json file from the base model?
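
For reference, the missing file can be confirmed programmatically; here is a quick sketch using huggingface_hub (repo id as above, the comments reflect what I observed):

    from huggingface_hub import list_repo_files

    # List the files in the instruct repo and check for tokenizer.json.
    files = list_repo_files("mistralai/Mistral-Small-3.2-24B-Instruct-2506")
    print("tokenizer.json" in files)   # -> False: no tokenizer.json in this repo
    print([f for f in files if "tekken" in f or "token" in f])  # tokenizer ships as tekken.json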

  • Programming questions (like "why isn't this program working?") or questions related to specific software are off-topic here. We focus on the theory of AI. Commented Nov 26 at 15:17

1 Answer


You should be able to use the base model's tokenizer.json, as the tekken.json tokenizer files are nearly the same in both models. The only differences I found are:

  1. Small uses 'v11' while the base uses 'v7' (v11 uses a different encoding in the token_str, but the vocab is the same).
  2. v11 includes special tokens like [INST], [SYSTEM_PROMPT], etc., but they are already present in tokenizer.json, so that should be fine too.
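
A minimal sketch of that workaround, assuming the base model's tokenizer.json really is compatible and that llama.cpp's convert_hf_to_gguf.py script is used for the conversion (the base-model repo id, file names and paths are illustrative, substitute the ones you verified):

    import shutil
    import subprocess

    from huggingface_hub import hf_hub_download, snapshot_download

    # Download the instruct model locally (large download).
    model_dir = snapshot_download(
        "mistralai/Mistral-Small-3.2-24B-Instruct-2506",
        local_dir="Mistral-Small-3.2-24B-Instruct-2506",
    )

    # Fetch tokenizer.json from the base-model repo and drop it into the
    # instruct model's directory (base repo id below is an assumption).
    tok_path = hf_hub_download(
        "mistralai/Mistral-Small-3.1-24B-Base-2503", filename="tokenizer.json"
    )
    shutil.copy(tok_path, f"{model_dir}/tokenizer.json")

    # Run llama.cpp's HF -> GGUF converter on the patched directory.
    subprocess.run(
        [
            "python", "convert_hf_to_gguf.py", model_dir,
            "--outfile", "mistral-small-3.2-24b-instruct-f16.gguf",
            "--outtype", "f16",
        ],
        check=True,
    )

The resulting f16 GGUF can then be quantised with llama.cpp's quantize tool as usual.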
  • Thank you for the advice, I shall give that a try (then upvote / accept as appropriate). Commented Nov 25 at 22:52
  • Programming questions are off-topic here and we shouldn't answer them but instead vote to close them. Commented Nov 26 at 15:18
