Missing tokenizer files?
This directory doesn't have the openvino_tokenizer.bin and openvino_tokenizer.xml files. Likewise for the detokenizer files. Should they be here? Or do I need to use tokenizer files from elsewhere?
This model does appear to be missing those files. I will reconvert and update. It's title may also change.
The openvino tokenizer/detokenizer are used for other optimization features instead of using the one provided during conversion, in this case, from Pytorch. For example, you can pass string tensors which would not work with Autotokenizers. You also have to handle chat templating yourself, much less dynamic than what apply_chat_template method allows for different models.
On the transformers end there are no checks for these tokenizer files so the pytorch files in the repo will work as is; however I have improved quants to update the weights with here. How are you infering the model?
How are you infering the model?
I'm on a bit of journey trying to get the best performance from my recently purchased A770 card, and suggestions on a now-deleted reddit thread have pointed me toward openvino. I tried OpenArc and OVMS but am now toying with openvino-api-server. This last one has the advantage of being almost trivial (~200 lines of Python), which I can summarize as:
# Setup
model_path = "Hermes-3-Llama-3.2-3B-awq-ov" # This model fails "Neither tokenizer nor detokenzier models were provided" but others work
device = "GPU"
tokenizer = openvino_genai.Tokenizer(model_path)
pipe = openvino_genai.LLMPipeline(model_path, tokenizer=tokenizer, device=device)
...
# Generation in response to HTTP requests
history = [{"role": m.role, "content": m.content} for m in messages]
model_inputs = tokenizer.apply_chat_template(history, add_generation_prompt=True)
answer = pipe.generate(model_inputs, max_new_tokens=max_tokens, streamer=streamer)
OpenArc is my project! Glad to hear people are using it. Would love some feedback
The OVMS targets a production scenario and the documentation can be confusing. I have not seen openvino-api-server before. You should join the discord sever. https://discord.gg/maMY7QjG. Other people who are working on the same thing have started showing up and it's pretty sick.
Are you looking for a simpler entrypoint to the framework?