How to run your model with opencode with tool calling?

#4
by dfsafdsf - opened

CUDA_VISIBLE_DEVICES=1 ../bin/llama-server -m MiniCPM5-1B-Q4_K_M.gguf -c 30003 -ngl 99 --cont-batching --host 0.0.0.0 --port 1113 -fa on -b 2048 -ub 1024 -t 8 -tb 8 --no-mmap --kv-unified --alias minicpm5 --jinja


dfsafdsf changed discussion title from "How to run your model with opencode?" to "How to run your model with opencode with tool calling?"
OpenBMB org

Hi @dfsafdsf
We have a PR that adds tool call support to llama.cpp, but it has not been merged upstream yet. If you need this functionality right away, you can temporarily apply the patch from the PR below.
https://github.com/ggml-org/llama.cpp/pull/23802
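
In case it helps, one way to try the PR locally is to fetch it as a branch and rebuild. This is only a sketch, not an official procedure. It assumes you are inside an existing llama.cpp checkout whose `origin` remote points at ggml-org/llama.cpp, and that you want a CUDA build (a guess based on `CUDA_VISIBLE_DEVICES` in your command). The branch name, the `run` helper, and the `APPLY` switch are made up for illustration.

```shell
# PR number taken from the link above; the local branch name is an arbitrary choice.
PR=23802
BRANCH="pr-$PR"

# Hypothetical helper: print each command, and execute it only when APPLY=1.
run() {
  echo "+ $*"
  if [ "${APPLY:-0}" = "1" ]; then "$@"; fi
}

# GitHub exposes every pull request as the ref pull/<number>/head.
run git fetch origin "pull/$PR/head:$BRANCH"
run git checkout "$BRANCH"

# Rebuild with CUDA enabled (assumed; drop -DGGML_CUDA=ON for a CPU-only build).
run cmake -B build -DGGML_CUDA=ON
run cmake --build build --config Release
```

By default this only prints the commands; set `APPLY=1` to actually run them. After the build, the patched server binary should be under `build/bin/`, so point your command at that `llama-server` instead of the old one.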

It's not working (screenshots are in the pull request).

OpenBMB org

Thanks for testing and sharing the screenshots! 🙏

One thing worth noting: MiniCPM5-1B is a fairly small model (1B parameters), so it is more prone than larger models to falling into repetition loops on out-of-distribution inputs. opencode's system prompts and tool-calling format differ from our training distribution, so they fall into that category. The tool-calling patch enables the capability, but the model can still degrade on prompts that are far from what it saw during training.

A few things that often help mitigate this:

- Add or raise a repetition penalty, e.g. --repeat-penalty 1.1 (and optionally --repeat-last-n 256).
- Try --presence-penalty / --frequency-penalty instead of or alongside --repeat-penalty.
- Keep sampling slightly stochastic (e.g. --temp 0.7 --top-p 0.8) rather than near-greedy, which tends to loop more.
- If opencode lets you customize the system prompt, simplifying or shortening it to be closer to a standard tool-calling format can reduce out-of-distribution (OOD) behavior.

If you can share the exact prompt and the repeated output, we're happy to take a closer look.
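
For reference, here is roughly how the sampling suggestions above could be added to your original command. Treat it as a starting point under assumptions rather than a tuned configuration: the server flags are copied unchanged from your post, the sampling values are the examples mentioned above, and `build_cmd` and the `LAUNCH` switch are hypothetical names used only to make the command easy to inspect first.

```shell
# Sampling flags from the suggestions above; starting points, not tuned values.
SAMPLING_ARGS="--repeat-penalty 1.1 --repeat-last-n 256 --temp 0.7 --top-p 0.8"

# Server flags copied unchanged from the first post.
SERVER_ARGS="-m MiniCPM5-1B-Q4_K_M.gguf -c 30003 -ngl 99 --cont-batching --host 0.0.0.0 --port 1113 -fa on -b 2048 -ub 1024 -t 8 -tb 8 --no-mmap --kv-unified --alias minicpm5 --jinja"

# Hypothetical helper: print the full command line so it can be checked before launching.
build_cmd() {
  echo "../bin/llama-server $SERVER_ARGS $SAMPLING_ARGS"
}

# LAUNCH is a hypothetical switch: print by default, start the server only when LAUNCH=1.
if [ "${LAUNCH:-0}" = "1" ]; then
  # Variables are left unquoted on purpose so the shell splits them into separate arguments.
  CUDA_VISIBLE_DEVICES=1 ../bin/llama-server $SERVER_ARGS $SAMPLING_ARGS
else
  build_cmd
fi
```

If loops persist, swapping `--repeat-penalty` for `--presence-penalty` / `--frequency-penalty` only requires editing `SAMPLING_ARGS`. A client such as opencode may also send its own sampling parameters per request, in which case those could take precedence over the server-side defaults.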

suhmily changed discussion status to closed
