--- license: other license_name: hyperclovax license_link: >- https://huggingface.co/naver-hyperclovax/HyperCLOVAX-SEED-Think-32B/blob/main/LICENSE language: - en - ko base_model: - naver-hyperclovax/HyperCLOVAX-SEED-Think-32B pipeline_tag: text-generation tags: - llama - chat - bf16 - safetensors - model-editing --- # HyperCLOVAX-SEED-Think-32B-heretic **HyperCLOVAX-SEED-Think-32B-heretic**은 `naver-hyperclovax/HyperCLOVAX-SEED-Think-32B`를 기반으로, 사후(weight editing) 방식으로 **과잉 거부(refusal) 성향을 완화**하는 방향의 수정이 적용된 변형 모델입니다. --- ## Model Summary (KO) - **Base model**: `naver-hyperclovax/HyperCLOVAX-SEED-Think-32B` - **Format**: BF16 (safetensors) - **Approach**: targeted post-hoc **weight editing** - **Primary effect**: refusal behavior calibration (reduce over-refusal) - **Observed drift**: small (see KL metric below) --- ## Model Summary (EN) - **Base model**: `naver-hyperclovax/HyperCLOVAX-SEED-Think-32B` - **Weights**: BF16 (safetensors) - **Method**: targeted post-hoc **weight editing** - **Goal**: reduce over-refusal on benign/borderline prompts while keeping the output distribution close to the base model --- ## What’s Changed This variant applies focused modifications around projection layers to shift refusal-related behavior. ### Editing Parameters (as-run) - `direction_index = 42.77` - `attn.o_proj.max_weight = 1.13` - `attn.o_proj.max_weight_position = 67.44` - `attn.o_proj.min_weight = 0.46` - `attn.o_proj.min_weight_distance = 25.36` - `mlp.down_proj.max_weight = 1.49` - `mlp.down_proj.max_weight_position = 43.36` - `mlp.down_proj.min_weight = 0.97` - `mlp.down_proj.min_weight_distance = 26.08` ### Behavior / Drift Indicators - **Refusals**: **10 / 100** - **KL divergence**: **0.1123** > Note: these values depend on your prompt set and refusal criterion. > If you can disclose the evaluation slice (even roughly), add it in the section below. --- ## Evaluation Notes ### Refusal Definition (edit as needed) A completion is counted as a refusal if it: - explicitly denies compliance (e.g., “I can’t / I won’t”), **and** - does not provide a meaningful safe alternative or partial completion. ### Prompt Set - prompt mix: `[benign / borderline / policy-sensitive]` - sample size: `100` - source: `[private/internal or 공개 가능하면 설명]` --- ## Intended Use ### Recommended - General chat - Creative writing / brainstorming - Everyday Q&A where over-refusal hurts usability - Research on refusal behavior, steering, and drift tradeoffs ### Not Recommended (without extra guardrails) - Public-facing deployment without moderation/filters - High-stakes domains (medical/legal/financial) - Any use that requires strict compliance guarantees --- ## Safety & Risks Reducing refusals can increase the chance that the model responds in situations where the base model would refuse. For real deployments, consider: - input filtering / output moderation - rate limits & logging - clear acceptable-use policy and enforcement Known limitations: - side effects may exist (tone shift, verbosity changes, occasional riskier completions) - evaluation is not exhaustive; additional red-teaming is recommended --- ## GGUF (llama.cpp) Inference This repository also provides an **F16 GGUF** build under `gguf/`, intended for running with **llama.cpp**. ### Run with `llama-server` (Thinking ON) > This command enables the model's "thinking" behavior via `--chat-template-kwargs`. #### Linux / macOS ```bash ./llama-server \ -m {PATH}/HyperCLOVAX-SEED-Think-32B-heretic2.f16.gguf \ --host 0.0.0.0 --port 10000 \ --jinja \ --chat-template-kwargs '{"thinking":true,"enable_thinking":true}' \ -cb -fa on --- ## How to Use ### Transformers (example) ```python from transformers import AutoTokenizer, AutoModelForCausalLM import torch model_id = "hostkimjang/HyperCLOVAX-SEED-Think-32B-heretic" # <- your repo id tok = AutoTokenizer.from_pretrained(model_id, use_fast=True) model = AutoModelForCausalLM.from_pretrained( model_id, torch_dtype=torch.bfloat16, device_map="auto", ) messages = [ {"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "Explain KL divergence in simple terms."}, ] # If the tokenizer provides a chat template: prompt = tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True) inputs = tok(prompt, return_tensors="pt").to(model.device) out = model.generate( **inputs, max_new_tokens=512, temperature=0.7, top_p=0.95, do_sample=True, ) print(tok.decode(out[0], skip_special_tokens=True))