Qwen3.5 2B Korean Multi-session Memory Extraction LoRA

This repository contains a PEFT LoRA adapter trained for turn-level Korean memory extraction.

The model decides whether the current user message contains information worth storing as long-term memory. If it does, the model returns structured JSON describing each memory. If not, it returns should_remember: false with an empty memories array.

Base Model

Base model: unsloth/Qwen3.5-2B
Adapter type: LoRA
PEFT task type: CAUSAL_LM
LoRA rank: 16
LoRA alpha: 16
Source checkpoint: local checkpoint-400
Validation snapshot: eval_loss = 0.06641462445259094 on a capped validation run

Training Data

This adapter was trained using a processed turn-level memory extraction dataset derived from AI Hub's 한국어 멀티세션 대화 dataset.

Dataset name: 한국어 멀티세션 대화 (Korean multi-session dialogue)
Provider: AI Hub
Dataset page: https://aihub.or.kr/aihubdata/data/view.do?aihubDataSe=data&dataSetSn=71630
Domain/type: Korean text, multi-session dialogue
Original purpose: multi-session Korean dialogue data for chatbot research and development with long-term conversational memory

The original AI Hub dataset was converted into supervised turn-level memory extraction samples. This repository does not contain the original AI Hub dataset, raw conversations, or the processed training JSONL shards.

Recommended attribution:

This model was trained using data derived from AI Hub's "한국어 멀티세션 대화" dataset. The original dataset is available from AI Hub: https://aihub.or.kr/aihubdata/data/view.do?aihubDataSe=data&dataSetSn=71630

License and Data Use

The LoRA adapter is a derived training artifact. It is distributed separately from the original AI Hub dataset and does not redistribute the source dataset.

Use of the underlying AI Hub dataset is subject to AI Hub's data usage policy:

AI Hub states that its open AI training data was built as part of the Ministry of Science and ICT / National Information Society Agency data infrastructure program, and rights are held by the participating organizations and NIA.
AI Hub open data may be used for commercial and non-commercial research and development for AI technology, products, and services, subject to the AI Hub policy.
Users must disclose that the data is an NIA project result when using AI Hub data or derivative works based on it.
Overseas use or export of AI Hub data may require a separate agreement with the participating organizations and NIA.
The original AI Hub data must not be disclosed, provided, transferred, rented, or sold to third parties without approval.
The original AI Hub data is for AI model training use; commercial sale of the dataset itself requires separate consultation with the performing organization.

See AI Hub's official usage policy for the authoritative terms: https://www.aihub.or.kr/intrcn/guid/usagepolicy.do?currMenu=151&topMenu=105

Files

adapter_config.json: PEFT adapter config
adapter_model.safetensors: LoRA adapter weights
gguf/qwen35-2b-korean-multisession-memory-extract-lora-checkpoint-400.gguf: llama.cpp-compatible LoRA GGUF adapter

Optimizer, scheduler, RNG state, and other training-resume files are intentionally not included.

Task Format

Input:

{
  "recent_context": [
    {
      "role": "assistant",
      "content": "좋아하는 음식이 있으세요?"
    }
  ],
  "current_user_message": "난 떡볶이를 좋아해"
}

Expected output style:

{
  "should_remember": true,
  "memories": [
    {
      "type": "preference",
      "content": "사용자는 떡볶이를 좋아한다.",
      "evidence": "난 떡볶이를 좋아해",
      "confidence": "high",
      "sensitivity": "normal",
      "scope": "long_term"
    }
  ]
}

Negative example:

{
  "should_remember": false,
  "memories": []
}
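
The output shape above can be checked before a result is stored. The following is a minimal sketch, not part of this repository: validate_extraction and MEMORY_KEYS are hypothetical names, the key list is taken from the single positive example above, and the allowed values for fields such as type or confidence are not documented here, so only key presence and basic types are checked.

```python
import json

# Keys taken from the positive example above. Allowed values per field are
# not documented, so only presence and basic types are checked (assumption).
MEMORY_KEYS = ("type", "content", "evidence", "confidence", "sensitivity", "scope")


def validate_extraction(raw: str) -> dict:
    """Parse model output and check it against the documented shape.

    Hypothetical helper. Raises ValueError on any mismatch.
    """
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc}") from exc

    if not isinstance(data, dict):
        raise ValueError("top-level value must be an object")
    if not isinstance(data.get("should_remember"), bool):
        raise ValueError("should_remember must be a boolean")
    memories = data.get("memories")
    if not isinstance(memories, list):
        raise ValueError("memories must be an array")

    # Documented rule: a negative result carries an empty memories array.
    if not data["should_remember"] and memories:
        raise ValueError("should_remember is false but memories is not empty")

    for i, memory in enumerate(memories):
        if not isinstance(memory, dict):
            raise ValueError(f"memories[{i}] must be an object")
        missing = [key for key in MEMORY_KEYS if key not in memory]
        if missing:
            raise ValueError(f"memories[{i}] is missing keys: {missing}")
        if not all(isinstance(memory[key], str) for key in MEMORY_KEYS):
            raise ValueError(f"memories[{i}] fields must all be strings")
    return data
```

Rejecting malformed output outright, rather than repairing it, keeps bad records out of the memory store; a caller can treat a ValueError as "nothing to remember" for that turn.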

Python Usage

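import json

from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_model = "unsloth/Qwen3.5-2B"
adapter_id = "mangoo3431/aura-qwen35-2b-korean-multisession-memory-extract-lora"

# Load the base model and tokenizer, then attach the LoRA adapter on top.
tokenizer = AutoTokenizer.from_pretrained(base_model, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    base_model,
    device_map="auto",
    torch_dtype="auto",
    trust_remote_code=True,
)
model = PeftModel.from_pretrained(model, adapter_id)

# Instruction in Korean, kept verbatim. In English: "Look at the user's current
# utterance and the preceding context and decide whether there is information
# worth remembering long term. If so, extract it as JSON. If not, set
# should_remember to false and leave memories as an empty array."
instruction = (
    "다음 사용자의 현재 발화와 이전 문맥을 보고, 장기적으로 기억할 정보가 있는지 판단하세요.\n"
    "기억할 정보가 있으면 JSON으로 추출하세요.\n"
    "없으면 should_remember를 false로 두고 memories는 빈 배열로 두세요."
)

# Example turn: "Hi, my name is Kim Cheol-su", with no prior context.
sample = {
    "recent_context": [],
    "current_user_message": "안녕 내 이름은 김철수야",
}

# Prompt layout: instruction, "입력:" (input) with the JSON payload,
# then "정답:" (answer) as the cue for the model to generate.
prompt = f"{instruction}\n\n입력:\n{json.dumps(sample, ensure_ascii=False)}\n\n정답:\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
# Greedy decoding (do_sample=False) keeps the output deterministic.
outputs = model.generate(
    **inputs,
    max_new_tokens=256,
    do_sample=False,
)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))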
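
The prompt layout used in the example above can be wrapped in a small helper so that every call builds it the same way. This is a sketch only: build_prompt and INSTRUCTION are hypothetical names, and the instruction text and the "입력:" / "정답:" markers are copied from the example rather than from any published training template.

```python
import json

# Instruction text copied verbatim from the usage example above.
INSTRUCTION = (
    "다음 사용자의 현재 발화와 이전 문맥을 보고, 장기적으로 기억할 정보가 있는지 판단하세요.\n"
    "기억할 정보가 있으면 JSON으로 추출하세요.\n"
    "없으면 should_remember를 false로 두고 memories는 빈 배열로 두세요."
)


def build_prompt(current_user_message, recent_context=None):
    """Return the prompt string for one turn (hypothetical helper).

    recent_context is a list of {"role": ..., "content": ...} dicts,
    as in the Task Format section; it defaults to an empty list.
    """
    sample = {
        "recent_context": list(recent_context or []),
        "current_user_message": current_user_message,
    }
    # ensure_ascii=False keeps Korean text readable instead of \u escapes.
    payload = json.dumps(sample, ensure_ascii=False)
    return f"{INSTRUCTION}\n\n입력:\n{payload}\n\n정답:\n"
```

For example, build_prompt("난 떡볶이를 좋아해", [{"role": "assistant", "content": "좋아하는 음식이 있으세요?"}]) reproduces the input shown under Task Format inside the same prompt layout.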

GGUF Usage

The GGUF file is a LoRA adapter, not a standalone model. Use it with a compatible Qwen3.5 2B base GGUF in llama.cpp:

llama-cli -m /path/to/qwen35-2b-base.gguf \
  --lora gguf/qwen35-2b-korean-multisession-memory-extract-lora-checkpoint-400.gguf \
  -p "<prompt>"
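
The prompt for this task is multi-line and contains JSON with double quotes, which is awkward to quote by hand in a shell. One option is to build the argument list in Python and pass the prompt as a single argument. The sketch below uses only the flags shown above (-m, --lora, -p); llama_cli_command is a hypothetical helper, and it assumes llama-cli is on PATH.

```python
def llama_cli_command(base_gguf, lora_gguf, prompt):
    """Build the llama-cli argument list with the flags shown above.

    Hypothetical helper. The prompt is passed as one list element, so no
    shell quoting or escaping is needed.
    """
    return ["llama-cli", "-m", base_gguf, "--lora", lora_gguf, "-p", prompt]


# To run it (assumes llama-cli is installed and on PATH):
#   import subprocess
#   subprocess.run(llama_cli_command(base, lora, prompt), check=True)
```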

Notes

This adapter was trained for a narrow extraction task, not for general chat. For best results, keep the inference prompt close to the training format and parse only the generated JSON object.
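
Parsing only the generated JSON object can be done with the standard library. The sketch below scans the generated text for the first span that decodes as a JSON object and ignores anything before or after it; extract_json_object is a hypothetical helper name, not something shipped with this adapter.

```python
import json


def extract_json_object(text):
    """Return the first decodable JSON object in text, or None.

    Hypothetical helper: tries each "{" as a possible start and lets
    JSONDecoder.raw_decode find where the object ends.
    """
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            value, _end = decoder.raw_decode(text, start)
        except json.JSONDecodeError:
            value = None
        if isinstance(value, dict):
            return value
        # This "{" did not start a valid object; try the next one.
        start = text.find("{", start + 1)
    return None
```

Returning None instead of raising lets the caller treat unparseable output the same way as a negative result.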