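An experiment in stepping onto the same playing field as world-class research labs with a personal GPU, ideas, and implementation speed.
HinoMoto-100M-v7 is the consolidated SOTA model of a Japanese LM built entirely in-house (architecture, tokenizer, weights, corpus, and benchmark).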
| Axis | Score | vs base 7B GGUF |
|---|---|---|
| family (n=110) | 6.83/12 (57%) | base: 0% |
| keigo (n=70) | 16% | base: - |
| silence (n=50) | 20% | base: - |
| degenerate (family) | 0% | base: 96% |
The 100M model trains in 26 minutes on an RTX 3090 and outperforms the base 7B GGUF on every axis for which a base score is reported.
| Item | Value |
|---|---|
| Hardware | 1× RTX 3090 (24GB VRAM) |
| Wallclock | ~26 minutes |
| Throughput | 12,200 tok/s (fp32, sequence length 512) |
| Optimizer | AdamW (β₁=0.9, β₂=0.95), weight decay 0.1, grad_clip 1.0 |
| LR Schedule | warmup 400 steps, cosine decay (min ratio 0.1), peak 3e-4 |
| Batch | 2 × grad_accum 4 = 8 sequences × 512 tokens = 4096 tokens/step |
| Steps | 20,000 |
| Total tokens | ~82M |
| Stability | EMA decay 0.999, Loss spike detector (ratio 4.0, floor 1.5) |
| Seed | 0 |
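The LR Schedule, Batch, and Steps rows above can be checked with a few lines of arithmetic. The following is a minimal sketch, not the project's training code: it assumes the warmup is linear and counted in optimizer steps, and that "min ratio 0.1" means the cosine decays to 0.1 × peak. The function name `lr_at` is hypothetical.

```python
import math

PEAK_LR = 3e-4        # from the LR Schedule row
WARMUP_STEPS = 400    # assumed to be counted in optimizer steps
TOTAL_STEPS = 20_000  # from the Steps row
MIN_RATIO = 0.1       # assumed: final LR = MIN_RATIO * PEAK_LR


def lr_at(step: int) -> float:
    """Linear warmup (assumed shape), then cosine decay down to MIN_RATIO * PEAK_LR."""
    if step < WARMUP_STEPS:
        return PEAK_LR * (step + 1) / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / max(1, TOTAL_STEPS - WARMUP_STEPS)
    progress = min(1.0, progress)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return PEAK_LR * (MIN_RATIO + (1.0 - MIN_RATIO) * cosine)


# Token budget from the Batch and Steps rows:
# micro-batch 2 x grad_accum 4 x sequence length 512.
tokens_per_step = 2 * 4 * 512
total_tokens = tokens_per_step * TOTAL_STEPS

print(tokens_per_step)  # 4096
print(total_tokens)     # 81920000, i.e. the ~82M in the table
print(lr_at(0), lr_at(WARMUP_STEPS), lr_at(TOTAL_STEPS))
```

Under these assumptions the learning rate reaches 3e-4 at the end of warmup and ends near 3e-5 at step 20,000.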
| Source | License | Share | Size |
|---|---|---|---|
| 国会会議録 (Diet records, 1995-2024) | Public domain | 53% | 39 MB |
| 青空文庫 (Aozora Bunko, 610 works) | Copyright expired | 25% | 19 MB |
| Dolly-15k-ja + in-house family conversations | CC-BY-SA | 21% | 16 MB |
| Total (balanced corpus v6) | | | 75 MB |
Tokenizer: tokenizer_v3_32k_clean.json (vocab 9,506, byte-level BPE).
Decontamination report: 8-gram overlap with HinoMotoBench-ja questions averages 12-16%; some overlap from formulaic phrases is unavoidable.
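As an illustration of what an 8-gram overlap figure measures, here is a minimal sketch. It assumes character-level 8-grams and defines overlap as the fraction of a question's distinct 8-grams that also occur in the training corpus; the actual report may tokenize or aggregate differently. The helper names and the toy strings are hypothetical.

```python
def char_ngrams(text: str, n: int = 8) -> set[str]:
    """All distinct character n-grams of text (empty set if text is shorter than n)."""
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def overlap_ratio(question: str, corpus_ngrams: set[str], n: int = 8) -> float:
    """Fraction of the question's n-grams that also appear in the corpus."""
    grams = char_ngrams(question, n)
    if not grams:
        return 0.0
    return len(grams & corpus_ngrams) / len(grams)


# Toy data, for illustration only.
corpus = ["本日は誠にありがとうございます。会議を始めます。"]
questions = [
    "本日は誠にありがとうございます、と言う場面は？",  # shares a formulaic phrase
    "猫はどこで寝ていますか？",                        # no shared 8-gram
]

corpus_ngrams = set().union(*(char_ngrams(doc) for doc in corpus))
ratios = [overlap_ratio(q, corpus_ngrams) for q in questions]
mean_overlap = sum(ratios) / len(ratios)
print(ratios, mean_overlap)
```

The first toy question overlaps only through a stock polite phrase, which is the kind of match the report describes as unavoidable.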
import torch, json
from huggingface_hub import hf_hub_download
# Download weights + config + tokenizer
repo = "FiShota/hinomoto-100m-v7"
ckpt_path = hf_hub_download(repo_id=repo, filename="pytorch_model.bin")
config_path = hf_hub_download(repo_id=repo, filename="config.json")
tok_path = hf_hub_download(repo_id=repo, filename="tokenizer.json")
# Need the HinoMoto codebase for model class
# git clone https://github.com/FIshota/hinomoto-model && cd hinomoto-model && pip install -e .
from hinomoto.tokenizer import ByteBPETokenizer
from hinomoto.model.hinomoto_model import HinoMotoConfig, HinoMotoModel
from hinomoto.infer.generate import generate_ids
# Load
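with open(config_path, encoding="utf-8") as f:
    cfg = HinoMotoConfig(**json.load(f))
model = HinoMotoModel(cfg).to("cuda")
# weights_only=False unpickles arbitrary objects: only load checkpoints you trust
model.load_state_dict(torch.load(ckpt_path, map_location="cuda", weights_only=False))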
model.eval()
tok = ByteBPETokenizer.load(tok_path)
# Generate
prompt = "今日もいい天気"
ids = tok.encode(prompt, add_bos=True)
inp = torch.tensor([ids], dtype=torch.long, device="cuda")
with torch.no_grad():
    out_ids = generate_ids(model, inp, max_new_tokens=80,
                           temperature=0.7, top_p=0.9)
print(tok.decode(out_ids[0].tolist()))
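The `temperature` and `top_p` arguments passed to `generate_ids` above control how the next token is drawn. The internals of `generate_ids` are not shown here, so the following is a generic pure-Python sketch of temperature scaling followed by nucleus (top-p) filtering, not the HinoMoto implementation. The function name `sample_top_p` and the toy logits are hypothetical.

```python
import math
import random


def sample_top_p(logits, temperature=0.7, top_p=0.9, rng=random):
    """Draw one token id using temperature scaling and nucleus (top-p) filtering."""
    # Temperature: values below 1 sharpen the distribution, values above 1 flatten it.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]  # subtract the max for stability
    total = sum(exps)
    probs = [e / total for e in exps]

    # Nucleus: keep the smallest set of most likely ids whose mass reaches top_p.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break

    # Sample from the kept ids in proportion to their probabilities.
    r = rng.random() * mass
    acc = 0.0
    for i in kept:
        acc += probs[i]
        if r < acc:
            return i
    return kept[-1]


rng = random.Random(0)
toy_logits = [4.0, 3.0, 0.0, -2.0]
draws_07 = {sample_top_p(toy_logits, 0.7, 0.9, rng) for _ in range(1000)}
draws_03 = {sample_top_p(toy_logits, 0.3, 0.9, rng) for _ in range(1000)}
print(draws_07, draws_03)
```

For these toy logits, the two unlikely ids fall outside the nucleus at temperature 0.7, and at temperature 0.3 the top id alone already exceeds 0.9 of the mass, so sampling becomes deterministic.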
- temperature: 0.7 (family conversation) / 0.5 (keigo, polite speech) / 0.3 (factual)
- top_p: 0.9
- max_new_tokens: 80 (recommended), maximum 256
- eos_boost: 2.0 (EOS bias at sentence ends)

import re
SENTENCE_END = re.compile(r"[。?!\n]")
LOOP_RE = re.compile(r"(.{4,30}?)\1{2,}")

def clean(text, max_sentences=2):
    # 1. Loop detection (n-gram repeat) -> truncate after the first occurrence
    m = LOOP_RE.search(text)
    if m:
        text = text[:m.start() + len(m.group(1))]
    # 2. Truncate at a sentence end, keeping at most max_sentences sentences
    parts = []
    last = 0
    for m in SENTENCE_END.finditer(text):
        parts.append(text[last:m.end()])
        last = m.end()
        if len(parts) >= max_sentences:
            break
    return "".join(parts).rstrip() if parts else text
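To make the two truncation steps concrete, the sketch below runs the post-processing on three toy strings. The patterns and function are repeated from above only so that the snippet runs on its own; the input strings are made up for illustration.

```python
import re

SENTENCE_END = re.compile(r"[。?!\n]")
LOOP_RE = re.compile(r"(.{4,30}?)\1{2,}")


def clean(text, max_sentences=2):
    # Same logic as above: cut at the first repeated chunk, then at a sentence end.
    m = LOOP_RE.search(text)
    if m:
        text = text[:m.start() + len(m.group(1))]
    parts = []
    last = 0
    for m in SENTENCE_END.finditer(text):
        parts.append(text[last:m.end()])
        last = m.end()
        if len(parts) >= max_sentences:
            break
    return "".join(parts).rstrip() if parts else text


# A chunk repeated three times is cut to one copy; the unterminated
# fragment after the last sentence end is then dropped.
looped = clean("今日は晴れ。そうだねそうだねそうだね")
print(looped)    # 今日は晴れ。

# Only the first max_sentences sentences are kept.
capped = clean("おはよう。元気?うん!")
print(capped)    # おはよう。元気?

# With no sentence terminator, the (loop-truncated) text is returned as is.
bare = clean("そうだね")
print(bare)      # そうだね
```

Note that a chunk must appear at least three times in a row to count as a loop, because the pattern requires the captured group plus two or more repeats.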
→ family score +3 pt, degenerate rate 5% → 0% (paired t-test, p = 0.021)
Model audit (2026-04-26):
Full report: SECURITY_AUDIT_MODEL_v3.md
@misc{hinomoto2026v7,
title = {HinoMoto-100M-v7: A Solo-Built Japanese Family-Conversation LM},
author = {{Project HinoMoto}},
year = {2026},
publisher = {HuggingFace},
url = {https://huggingface.co/FiShota/hinomoto-100m-v7},
}