How to use from
Pi
Start the llama.cpp server
# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf john-broadway/Qwen3-8B-RYS-16-19-GGUF:Q4_K_M
Configure the model in Pi
# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "john-broadway/Qwen3-8B-RYS-16-19-GGUF:Q4_K_M"
        }
      ]
    }
  }
}
Run Pi
# Start Pi in your project directory:
pi
Quick Links

Qwen3-8B-RYS-16-19

RYS-enhanced Qwen3-8B with layers 16-19 duplicated. 36 layers expanded to 39. Zero training, zero weight changes.

Math +6.7%, Reasoning +23.5%. Baseline reasoning healed from 53% to 76%.

Results

Metric Baseline RYS (16,19) Delta
Math 0.6568 0.7240 +6.7%
EQ 91.91 90.74 -1.17
Reasoning 52.94% 76.47% +23.5%

117 configurations tested. The 8B's baseline reasoning was the weakest in the Qwen3 family (53%). RYS at (16,19) heals it to 76%.

Usage

llama-server -m Qwen3-8B-RYS-16-19-Q4_K_M.gguf -ngl 99

Full sweep data

117 configurations tested. Sweep results published with the model files.

Part of the v2 Qwen3-family cohort โ€” parallel Qwen3-family RYS-applied weights from April 2026, expanded alongside the v1 Qwen2.5 cross-scale collection. (The "four model scales" originally referenced here was a Qwen3-only expansion; the original v1 writeup described Qwen2.5 cross-scale + Qwen3-32B as headline.)

Where this sits in the Sovereign Collection

v1 โ€” Qwen2.5 cross-scale + Qwen3-32B headline crossover (the original v1 intent per the 2026-04-11 writeup). 5 model repos on HuggingFace; see john-broadway.

v2 Qwen3-family cohort (this card's cohort โ€” parallel Qwen3-family RYS-applied weights, April 2026):

v2 cross-architecture corpus (21 model variants spanning 10 architecture families): john-broadway/rys-sovereign-collection-v2

Attribution: John Broadway, with collaboration from Claude (Opus 4.6 in April 2026 build; Opus 4.7 in May 2026 cross-architecture analysis and family-relabeling). Original RYS method by David Ng on Qwen2-72B; sweep toolkit by alainnothere.


v2 cross-architecture context (2026-05-13)

This model's place in the v2 curve: baseline reasoning 52.94%, peak RYS ฮ” +29.41%. Of 117 swept configurations, 71 boost reasoning >5% โ€” the highest hit rate in the corpus. The (16,19) block is one of the consistent boosters at L16-19.

Across the 21 model variants (10 architecture families) surveyed in john-broadway/rys-sovereign-collection-v2:

  • Pearson r(baseline reasoning, peak RYS lift) = โˆ’0.726. Weak baselines lift more, in their weakest dimension.
  • Three RYS-recoverable suppression mechanisms identified: under-training scale, MoE routing inefficiency, specialization training trade-off.
  • One published negative result (SmolLM2-1.7B). RYS is not universal.

v2 attribution: John Broadway, with cross-architecture analysis by Claude (Opus 4.7). Original RYS method by David Ng; circuit-finder toolkit by alainnothere.

Downloads last month
221
GGUF
Model size
9B params
Architecture
qwen3
Hardware compatibility
Log In to add your hardware

4-bit

Inference Providers NEW
This model isn't deployed by any Inference Provider. ๐Ÿ™‹ Ask for provider support

Model tree for john-broadway/Qwen3-8B-RYS-16-19-GGUF

Finetuned
Qwen/Qwen3-8B
Quantized
(294)
this model

Collection including john-broadway/Qwen3-8B-RYS-16-19-GGUF