Libraries

How to use karakuri-ai/karakuri-lm-8x7b-chat-v0.1 with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="karakuri-ai/karakuri-lm-8x7b-chat-v0.1")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("karakuri-ai/karakuri-lm-8x7b-chat-v0.1")
model = AutoModelForCausalLM.from_pretrained("karakuri-ai/karakuri-lm-8x7b-chat-v0.1")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use karakuri-ai/karakuri-lm-8x7b-chat-v0.1 with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "karakuri-ai/karakuri-lm-8x7b-chat-v0.1"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "karakuri-ai/karakuri-lm-8x7b-chat-v0.1",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
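
The same request can also be issued from Python. The following is a minimal sketch using only the standard library; it assumes the server started above is listening on `http://localhost:8000`, and the helper names `build_chat_request` and `send_chat_request` are hypothetical, chosen for illustration.

```python
import json
import urllib.request

MODEL_ID = "karakuri-ai/karakuri-lm-8x7b-chat-v0.1"


def build_chat_request(user_content, base_url="http://localhost:8000"):
    """Build (but do not send) a POST request mirroring the curl call above."""
    payload = {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": user_content}],
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_chat_request(request, timeout=60):
    """Send the request and return the parsed JSON reply (needs a running server)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    req = build_chat_request("What is the capital of France?")
    print(req.full_url)
    # Uncomment once the server is up; the reply follows the
    # OpenAI chat-completions response shape.
    # reply = send_chat_request(req)
    # print(reply["choices"][0]["message"]["content"])
```

Passing `base_url="http://localhost:30000"` would target a server on port 30000 instead.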

Use Docker

docker model run hf.co/karakuri-ai/karakuri-lm-8x7b-chat-v0.1

SGLang

How to use karakuri-ai/karakuri-lm-8x7b-chat-v0.1 with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "karakuri-ai/karakuri-lm-8x7b-chat-v0.1" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "karakuri-ai/karakuri-lm-8x7b-chat-v0.1",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "karakuri-ai/karakuri-lm-8x7b-chat-v0.1" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "karakuri-ai/karakuri-lm-8x7b-chat-v0.1",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use karakuri-ai/karakuri-lm-8x7b-chat-v0.1 with Docker Model Runner:
```
docker model run hf.co/karakuri-ai/karakuri-lm-8x7b-chat-v0.1
```

	---
	library_name: transformers
	license: apache-2.0
	datasets:
	- OpenAssistant/oasst2
	- nvidia/HelpSteer
	language:
	- en
	- ja
	tags:
	- mixtral
	- steerlm
	base_model: tokyotech-llm/Swallow-MX-8x7b-NVE-v0.1
	model-index:
	- name: karakuri-ai/karakuri-lm-8x7b-chat-v0.1
	  results:
	  - task:
	      type: text-generation
	      name: Text Generation
	    dataset:
	      name: MT-Bench
	      type: unknown
	    metrics:
	    - type: unknown
	      name: score
	      value: 7.39375
	    source:
	      url: https://huggingface.co/spaces/lmsys/mt-bench
	  - task:
	      type: text-generation
	      name: Text Generation
	    dataset:
	      name: MT-Bench-jp
	      type: unknown
	    metrics:
	    - type: unknown
	      name: score
	      value: 7.540625
	    source:
	      url: https://api.wandb.ai/links/wandb-japan/6ff86bp3
	---

	# KARAKURI LM 8x7B Chat v0.1

	![KARAKURI LM](./thumbnail.png)

	## Model Details

	### Model Description

	- Developed by: [KARAKURI Inc.](https://about.karakuri.ai/)
	- Model type: Mixture of Experts (MoE)
	- Languages: Primarily English and Japanese
	- License: Apache 2.0
	- Finetuned from model: [tokyotech-llm/Swallow-MX-8x7b-NVE-v0.1](https://huggingface.co/tokyotech-llm/Swallow-MX-8x7b-NVE-v0.1)
	- Contact: For questions and comments about the model, please email `karakuri-rd@karakuri.ai`
	- Demo: https://lm.karakuri.cc/

	## Usage

	### Warning

	The prompt format differs from that of [KARAKURI LM 70B Chat v0.1](https://huggingface.co/karakuri-ai/karakuri-lm-70b-chat-v0.1).
	Make sure to follow the format described below; otherwise, the model will generate sub-optimal outputs.

	### Prompt Format

	We use the following multi-turn prompt template, based on the Mistral format. Each user turn ends with an encoded string of attribute values.

	```python
	from transformers import AutoTokenizer

	tokenizer = AutoTokenizer.from_pretrained("karakuri-ai/karakuri-lm-8x7b-chat-v0.1")

	messages = [
	    {"role": "system", "content": "System prompt"},
	    {"role": "user", "content": "User prompt"},
	    {"role": "assistant", "content": "Model response"},
	    {"role": "user", "content": "User prompt"},
	]
	tokenizer.apply_chat_template(messages, tokenize=False)
	# <s>[INST] <<SYS>>
	# System prompt
	# <</SYS>>
	#
	# User prompt [ATTR] helpfulness: 4 correctness: 4 coherence: 4 complexity: 4 verbosity: 4 quality: 4 toxicity: 0 humor: 0 creativity: 0 [/ATTR] [/INST]Model response</s>[INST] User prompt [ATTR] helpfulness: 4 correctness: 4 coherence: 4 complexity: 4 verbosity: 4 quality: 4 toxicity: 0 humor: 0 creativity: 0 [/ATTR] [/INST]
	```
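
To make the layout of the template easier to see, here is a minimal pure-Python sketch that rebuilds the string shown in the comment above. `render_prompt` and `DEFAULT_ATTR` are hypothetical names introduced for illustration, and the whitespace is inferred from that comment; the chat template bundled with the tokenizer remains the authoritative implementation.

```python
# Default attribute string, copied from the documented template output.
DEFAULT_ATTR = (
    "[ATTR] helpfulness: 4 correctness: 4 coherence: 4 complexity: 4 "
    "verbosity: 4 quality: 4 toxicity: 0 humor: 0 creativity: 0 [/ATTR]"
)


def render_prompt(messages, attr=DEFAULT_ATTR):
    """Approximate the documented prompt string for a list of chat messages."""
    system = None
    if messages and messages[0]["role"] == "system":
        system, messages = messages[0]["content"], messages[1:]

    parts = ["<s>"]
    for message in messages:
        if message["role"] == "user":
            content = message["content"]
            if system is not None:
                # The system prompt is folded into the first user turn only.
                content = f"<<SYS>>\n{system}\n<</SYS>>\n\n{content}"
                system = None
            # Every user turn carries its own attribute string.
            parts.append(f"[INST] {content} {attr} [/INST]")
        elif message["role"] == "assistant":
            # Assistant turns are closed with the end-of-sequence token.
            parts.append(f"{message['content']}</s>")
        else:
            raise ValueError(f"unexpected role: {message['role']!r}")
    return "".join(parts)
```

Note that the system prompt appears once, inside the first `[INST]` block, while the attribute string is repeated at the end of every user turn.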

	The prompt template contains nine attributes.
	The first five are derived from HelpSteer, while the remaining four are derived from OASST2.
	The values are represented by integers ranging from 0 to 4, with 0 being the lowest and 4 being the highest.

	- helpfulness (default: 4): Overall helpfulness of the response to the prompt.
	- correctness (default: 4): Inclusion of all pertinent facts without errors.
	- coherence (default: 4): Consistency and clarity of expression.
	- complexity (default: 4): Intellectual depth required to write the response (i.e., whether it can be written by anyone with basic language competency or requires deep domain expertise).
	- verbosity (default: 4): Amount of detail included in the response, relative to what is asked for in the prompt.
	- quality (default: 4): Perceived goodness of the response.
	- toxicity (default: 0): Presence of undesirable elements such as vulgar, harmful, or potentially biased content.
	- humor (default: 0): Sense of humor in the response.
	- creativity (default: 0): Willingness to generate an unconventional response.
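
The encoded attribute string can be sketched in a few lines of Python. This is an illustration only: `encode_attributes` is a hypothetical helper, not part of the tokenizer, and it assumes the attribute order and defaults listed above together with the `[ATTR] ... [/ATTR]` layout shown in the template output.

```python
# Attribute names in template order, with the defaults listed above.
DEFAULTS = {
    "helpfulness": 4,
    "correctness": 4,
    "coherence": 4,
    "complexity": 4,
    "verbosity": 4,
    "quality": 4,
    "toxicity": 0,
    "humor": 0,
    "creativity": 0,
}


def encode_attributes(**overrides):
    """Return the attribute string, with defaults replaced by any overrides."""
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"unknown attributes: {sorted(unknown)}")

    # Merging keeps the key order of DEFAULTS because every override key exists there.
    values = {**DEFAULTS, **overrides}
    for name, value in values.items():
        is_int = isinstance(value, int) and not isinstance(value, bool)
        if not is_int or not 0 <= value <= 4:
            raise ValueError(f"{name} must be an integer from 0 to 4, got {value!r}")

    body = " ".join(f"{name}: {value}" for name, value in values.items())
    return f"[ATTR] {body} [/ATTR]"


print(encode_attributes())
print(encode_attributes(humor=3, creativity=2))
```

Rejecting values outside 0 to 4 is a design choice of this sketch; it catches typos before they reach the prompt.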

	To override the default attribute values specified in the template, add the attributes you want to change as extra keys on the user messages:

	```python
	messages = [
	    {"role": "user", "content": "User prompt", "helpfulness": 0, "complexity": 0},
	]
	tokenizer.apply_chat_template(messages, tokenize=False)
	# <s>[INST] User prompt [ATTR] helpfulness: 0 correctness: 4 coherence: 4 complexity: 0 verbosity: 4 quality: 4 toxicity: 0 humor: 0 creativity: 0 [/ATTR] [/INST]
	```
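
In a multi-turn conversation, the same overrides usually need to be set on every user turn. A small helper can do this; the sketch below is hypothetical (`with_attributes` is not part of Transformers) and assumes, as in the example above, that the template reads attribute values from keys on each user message.

```python
def with_attributes(messages, **attributes):
    """Return a copy of `messages` with `attributes` merged into every user turn."""
    return [
        # User turns get the overrides; other roles are copied unchanged.
        {**message, **attributes} if message["role"] == "user" else dict(message)
        for message in messages
    ]


conversation = [
    {"role": "user", "content": "User prompt"},
    {"role": "assistant", "content": "Model response"},
    {"role": "user", "content": "User prompt"},
]
steered = with_attributes(conversation, humor=4, verbosity=1)
print(steered[0])
```

The input list is left untouched, so the same conversation can be rendered several times with different attribute settings.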

	### Run the model

	```python
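	from transformers import AutoModelForCausalLM, AutoTokenizer

	model_id = "karakuri-ai/karakuri-lm-8x7b-chat-v0.1"
	tokenizer = AutoTokenizer.from_pretrained(model_id)
	model = AutoModelForCausalLM.from_pretrained(
	    model_id,
	    torch_dtype="auto",  # use the dtype stored in the checkpoint
	    device_map="auto",  # place the weights on the available devices
	)

	messages = [
	    {
	        "role": "user",
	        # English: "I'm planning a day trip to Tokyo this weekend. Since it's a
	        # day trip, please recommend a sightseeing plan I can cover in a short time."
	        "content": "週末に日帰りで東京に遊びに行こうと思っています。日帰りなので、短時間で回れるおすすめの観光プランを教えてください。",
	    },
	]

	# return_dict=True returns both input_ids and attention_mask.
	inputs = tokenizer.apply_chat_template(
	    messages,
	    return_dict=True,
	    return_tensors="pt",
	).to(model.device)
	outputs = model.generate(**inputs, max_new_tokens=512)
	# Decode only the newly generated tokens, skipping the prompt.
	print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))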
	```

	## Performance

	| Model                      | # Active Params | Alignment | MT-Bench-jp |
	| :------------------------- | :-------------: | :-------: | ----------: |
	| Qwen1.5 72B Chat           |       72B       |    DPO    |        8.19 |
	| KARAKURI LM 8x7B Chat v0.1 |       13B       |  SteerLM  |        7.54 |
	| Command R+                 |      104B       |     -     |        7.31 |
	| Mixtral 8x7B Instruct v0.1 |       13B       |    DPO    |        7.24 |
	| Llama 3 70B Instruct       |       70B       |   RLHF    |        7.13 |
	| KARAKURI LM 70B Chat v0.1  |       70B       |  SteerLM  |        6.43 |
	| Llama 2 70B Chat           |       70B       |   RLHF    |        5.23 |

	## Training Details

	### Training Data

	- [OASST2](https://huggingface.co/datasets/OpenAssistant/oasst2)
	- [HelpSteer](https://huggingface.co/datasets/nvidia/HelpSteer)
	- Internal Japanese dataset

	### Training Infrastructure

	- Hardware: The model was trained on 8 Amazon EC2 trn1.32xlarge nodes.
	- Software: We use code based on [neuronx-nemo-megatron](https://github.com/aws-neuron/neuronx-nemo-megatron).

	## Citation

	```
	@misc{karakuri_lm_8x7b_chat_v01,
	    author = { {KARAKURI} {I}nc. },
	    title = { {KARAKURI} {LM} 8x7{B} {C}hat v0.1 },
	    year = { 2024 },
	    url = { https://huggingface.co/karakuri-ai/karakuri-lm-8x7b-chat-v0.1 },
	    publisher = { Hugging Face },
	    journal = { Hugging Face repository }
	}
	```