metadata
license: apache-2.0
tags:
  - 256k context
  - Qwen3
  - Mixture of Experts
  - MOE
  - MOE Dense
  - 2 experts
  - 4Bx6
  - All use cases
  - bfloat16
  - merge
  - creative
  - creative writing
  - fiction writing
  - plot generation
  - sub-plot generation
  - story generation
  - scene continue
  - storytelling
  - fiction story
  - science fiction
  - romance
  - all genres
  - story
  - writing
  - vivid prosing
  - vivid writing
  - fiction
pipeline_tag: text-generation
language:
  - en
library_name: transformers

Qwen3-24B-MOE-6x-4B-Star-Trek-AwayTeam-Instruct

A fully gated INSTRUCT MOE (Mixture of Experts) model: six 4B models (24B parameters in total) compressed into an 18B "model size".

This is a collaboration between myself and Nightmedia.

Gating is based on the names of the Star Trek characters that each model said it was closest to during testing (type the names without the quotes):

  • "Q Continuum"
  • "[Q]"
  • "Enterprise Computer"
  • "Quark"
  • "Picard"
  • "Sisko"
  • "Janeway"
  • "Garak"
  • "Martok"
  • "Spock"
  • "Sarek"
  • "Data"
  • "Seven of Nine"
  • "Kira"
  • "Odo"
  • "Dr Crusher"
  • "Bashir"
  • "Worf"
  • "Klingons"

Use like:

"Sisko, [prompt here]"

or

"Sisko, Kira and Worf [prompt here]"

You can also use "Away-Team" to address all experts.

The models are isolated from one another and are controlled through prompts and/or by activating additional experts.
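The addressing forms above can be sketched as a small prompt builder. This is only an illustration: the helper name `address` is hypothetical, the name list is copied from this card, and the comma after "Away-Team" is an assumption, since the card does not show that form in full.

```python
# Gating names copied from the list in this card.
EXPERT_NAMES = [
    "Q Continuum", "[Q]", "Enterprise Computer", "Quark", "Picard", "Sisko",
    "Janeway", "Garak", "Martok", "Spock", "Sarek", "Data", "Seven of Nine",
    "Kira", "Odo", "Dr Crusher", "Bashir", "Worf", "Klingons",
]
ALL_EXPERTS = "Away-Team"  # addresses all experts, per this card


def address(names, prompt):
    """Prefix `prompt` with one or more gating names (hypothetical helper)."""
    if not names:
        raise ValueError("give at least one gating name")
    for name in names:
        if name != ALL_EXPERTS and name not in EXPERT_NAMES:
            raise ValueError(f"unknown gating name: {name!r}")
    if len(names) == 1:
        # Single-name form shown in the card: "Sisko, [prompt here]"
        return f"{names[0]}, {prompt}"
    # Multi-name form shown in the card: "Sisko, Kira and Worf [prompt here]"
    return f"{', '.join(names[:-1])} and {names[-1]} {prompt}"


if __name__ == "__main__":
    print(address(["Sisko"], "outline a first-contact scene."))
    print(address(["Sisko", "Kira", "Worf"], "outline a first-contact scene."))
    # Assumed form: "Away-Team" is treated like a single name.
    print(address([ALL_EXPERTS], "outline a first-contact scene."))
```

The resulting string can be sent as the user prompt or placed in a system prompt.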

You can set the number of active experts from 1 to 6; the default is 2.
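With the transformers library, one way to change the active expert count might look like the sketch below. It rests on an assumption this card does not confirm: that the checkpoint uses the Qwen3 MoE configuration, whose `num_experts_per_tok` field controls how many experts run per token. The helper names and the `repo_id` argument are placeholders.

```python
MIN_EXPERTS, MAX_EXPERTS, DEFAULT_EXPERTS = 1, 6, 2  # range and default stated in this card


def check_expert_count(n=DEFAULT_EXPERTS):
    """Return `n` if it lies in the supported 1..6 range, else raise."""
    if not isinstance(n, int) or not MIN_EXPERTS <= n <= MAX_EXPERTS:
        raise ValueError(
            f"active experts must be an integer from {MIN_EXPERTS} to {MAX_EXPERTS}, got {n!r}"
        )
    return n


def load_with_experts(repo_id, n_experts=DEFAULT_EXPERTS):
    """Load tokenizer and model with `n_experts` active experts (sketch)."""
    # Imported lazily so the validation helper works without transformers installed.
    from transformers import AutoConfig, AutoModelForCausalLM, AutoTokenizer

    config = AutoConfig.from_pretrained(repo_id)
    # Assumption: the config exposes `num_experts_per_tok` (Qwen3 MoE style).
    config.num_experts_per_tok = check_expert_count(n_experts)
    tokenizer = AutoTokenizer.from_pretrained(repo_id)
    model = AutoModelForCausalLM.from_pretrained(repo_id, config=config)
    return tokenizer, model
```

Calling `load_with_experts(repo_id, 4)` with this model's repository id would then load it with four active experts instead of the default two. Other runtimes expose the expert count through their own settings.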

Features:

  • Six of the top Qwen3 4B models (each benchmarked) in one package.
  • 2 experts activated (adjustable)
  • A "programmable" model: gating instructions can be embedded in prompts and/or system prompts.
  • 256k context.

[more coming soon...]