Active filters: rlvr
lastmass/Qwen3.5-Medical-GSPO
Image-Text-to-Text
• 5B • Updated • 4.27k
• 8
nishantup/nanogpt-rlvr-slm-tinystories-124m
Text Generation
• Updated • 1.65k
• 1
EvanOLeary/laguna-xs2-dense-k8-cuda-grpo
Text Generation
• 3B • Updated • 121
• 1
JayZenith/RLVR_HELDOUT69_PASSK_STEP25
Text Generation
• 4B • Updated • 1
SultanR/SmolTulu-1.7b-Reinforced-GGUF
Text Generation
• 2B • Updated • 19
• 1
thuml/rt1-world-model-multi-step-rlvr
thuml/rt1-world-model-single-step-rlvr
thuml/webarena-world-model-rlvr
2B • Updated • 4
thuml/bytesized32-world-model-rlvr-binary-reward
2B • Updated • 5
thuml/bytesized32-world-model-rlvr-task-specific-reward
2B • Updated • 5
DebateLabKIT/Llama-3.1-Argunaut-1-8B-HIRPO
Text Generation
• 8B • Updated • 8
• 1
Question Answering
• 4B • Updated • 9
• 2
thinkwee/NOVER1-Qwen2.5-7B
Question Answering
• 8B • Updated • 7
• 2
mradermacher/NOVER1-Qwen3-4B-GGUF
4B • Updated • 179
• 1
mradermacher/NOVER1-Qwen2.5-7B-GGUF
8B • Updated • 635
• 1
mradermacher/NOVER1-Qwen3-4B-i1-GGUF
4B • Updated • 311
• 1
mradermacher/NOVER1-Qwen2.5-7B-i1-GGUF
8B • Updated • 720
• 1
DebateLabKIT/Phi-4-Argunaut-1-HIRPO
Text Generation
• 415k • Updated • 5
mradermacher/Llama-3.1-Argunaut-1-8B-HIRPO-GGUF
8B • Updated • 157
• 1
mradermacher/Llama-3.1-Argunaut-1-8B-HIRPO-i1-GGUF
8B • Updated • 313
• 1
Text Generation
• 2B • Updated • 13
• 9
Text Generation
• 4B • Updated • 8
• 1
mradermacher/airesupdated-v2-GGUF
Reinforcement Learning
• 4B • Updated • 106
ABaroian/Apertus-8B-RLVR-GSM
Text Generation
• Updated • 2
Anonymouslolol/qwen3-8B-hanabi-step110
Reinforcement Learning
• Updated • 22
Text Generation
• 4B • Updated • 1
anonymousatom/IntelliAsk-Qwen3-32B-450-Merged
Text Generation
• 33B • Updated • 28
mradermacher/IntelliAsk-Qwen3-32B-450-Merged-GGUF
Reinforcement Learning
• 33B • Updated • 160
mradermacher/Phi-4-Argunaut-1-HIRPO-GGUF
15B • Updated • 172
mradermacher/Phi-4-Argunaut-1-HIRPO-i1-GGUF
15B • Updated • 642