---
base_model: deepseek-ai/DeepSeek-OCR
library_name: peft
pipeline_tag: image-text-to-text
tags:
- lora
- deepseek-ocr
- ocr
- calendar
- vision-language
license: mit
language:
- ja
- en
---

# DeepSeek-OCR Calendar Fine-tuned (LoRA)

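A LoRA fine-tune of the DeepSeek-OCR 3B model, specialized for extracting circled dates from calendar images.

## Model Overview

This model is fine-tuned to accurately extract the dates that are circled in calendar-style images (numbers laid out in a grid).

### Key Features

- **Base model**: deepseek-ai/DeepSeek-OCR (3B)
- **Fine-tuning method**: LoRA (Low-Rank Adaptation)
- **Training data**: 1,000 synthetic calendar images
- **Epochs**: 9 (stopped early once the loss converged)
- **Final loss**: 0.0000 (training loss, rounded to four decimal places)

### Use Cases

- Extracting important dates from calendar images
- OCR of schedule images
- Digitizing handwritten or printed calendars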

## Usage

### Option 1: Hugging Face Inference Endpoints (Recommended)

This is the simplest approach.

1. On [this model page](https://huggingface.co/takumi123xxx/deepseek-ocr-calendar-finetuned), click "Deploy" → "Inference Endpoints"
2. Configure:
   - **Region**: us-east-1 or asia-northeast-1
   - **Instance Type**:
     - CPU Basic (low cost, about 5-10 s per response)
     - GPU - Nvidia A10G (fast, about 1-2 s per response)
3. Click "Create Endpoint"
4. Once the endpoint is active, call it with the following code:

```python
import requests
import base64

# Base64-encode the image
with open("calendar.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

# Send the request to the Inference Endpoint
url = "https://YOUR_ENDPOINT.endpoints.huggingface.cloud"
headers = {
    "Authorization": "Bearer YOUR_HF_TOKEN",
    "Content-Type": "application/json"
}
# Prompt (Japanese): "Extract all circled dates on the calendar.
# Output only the numbers, comma-separated."
payload = {
    "inputs": image_b64,
    "prompt": "カレンダーで丸印がついている日付を全て抽出してください。数字のみをカンマ区切りで出力してください。"
}

response = requests.post(url, headers=headers, json=payload)
response.raise_for_status()  # fail early on HTTP errors (e.g. 401, 503)
result = response.json()
print(result[0]["generated_text"])
# Example output: "5, 12, 20"
```
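
The endpoint returns the circled dates as a comma-separated string, so callers usually need to convert it to integers. The following is a minimal sketch of that step. `parse_circled_dates` and its `max_day` parameter are hypothetical names introduced here for illustration, and the 1-35 range is an assumption based on the 7×5 training grid described later in this card.

```python
import re


def parse_circled_dates(text: str, max_day: int = 35) -> list[int]:
    """Parse model output such as "5, 12, 20" into a sorted list of unique days.

    Hypothetical helper: numbers outside 1..max_day are dropped.
    """
    days = {int(token) for token in re.findall(r"\d+", text)}
    return sorted(day for day in days if 1 <= day <= max_day)


print(parse_circled_dates("5, 12, 20"))  # [5, 12, 20]
```

Dropping out-of-range numbers is one possible policy; raising an error instead may be preferable when a malformed response should not pass silently.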

### Option 2: Local Execution

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
from PIL import Image
import torch

# Load the base model (DeepSeek-OCR 3B)
base_model_name = "deepseek-ai/DeepSeek-OCR"
tokenizer = AutoTokenizer.from_pretrained(base_model_name, trust_remote_code=True)
base_model = AutoModelForCausalLM.from_pretrained(
    base_model_name,
    trust_remote_code=True,
    torch_dtype=torch.float16
).cuda()

# Apply the LoRA adapter
model = PeftModel.from_pretrained(
    base_model,
    "takumi123xxx/deepseek-ocr-calendar-finetuned",
    torch_dtype=torch.float16
)
model.eval()

# Load the image
image = Image.open("calendar.png").convert("RGB")

# Prepare the prompt (Japanese): "Extract all circled dates on the calendar.
# Output only the numbers, comma-separated."
conversation = [
    {
        "role": "User",
        "content": "<image>\nカレンダーで丸印がついている日付を全て抽出してください。数字のみをカンマ区切りで出力してください。",
        "images": [image]
    },
    {"role": "Assistant", "content": ""}
]

# Run inference. Input preparation relies on the base model's remote code
# (loaded via trust_remote_code=True).
prepare_inputs = model.prepare_inputs_for_generation(conversation, tokenizer=tokenizer)
with torch.no_grad():
    outputs = model.generate(
        **prepare_inputs,
        max_new_tokens=512,
        do_sample=False  # greedy decoding; temperature has no effect without sampling
    )

# Decode only the newly generated tokens (skip the prompt tokens)
answer = tokenizer.decode(
    outputs[0][len(prepare_inputs["input_ids"][0]):],
    skip_special_tokens=True
)
print(answer.strip())
```

### Required Dependencies

```bash
pip install "transformers>=4.40.0" "peft>=0.17.0" "torch>=2.0.0" "Pillow>=10.0.0"
```

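## Training Details

### Dataset

- **Samples**: 1,000
- **Image size**: 700×500 pixels (uniform)
- **Content**: calendar grid of 7 columns × 5 rows (numbers 1-35)
- **Circles per image**: 1-5 (random)
- **Data augmentation**:
  - Rotation: ±5 degrees
  - Blur: Gaussian blur
  - JPEG compression: quality 70-95
  - Gaussian noise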
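
The grid geometry and labels implied by this description can be sketched as follows. This is an illustration, not the generator actually used for the dataset. It assumes a row-major layout with no margins, so each cell is 100×100 pixels, and a label format matching the example output "5, 12, 20". `cell_center` and `make_sample` are hypothetical names, and rendering and augmentation (rotation, blur, JPEG compression, noise) are left out.

```python
import random

IMAGE_W, IMAGE_H = 700, 500   # image size stated in the dataset description
COLS, ROWS = 7, 5             # 7 columns x 5 rows, days 1..35
CELL_W, CELL_H = IMAGE_W // COLS, IMAGE_H // ROWS


def cell_center(day: int) -> tuple[int, int]:
    """Return the pixel center (x, y) of the cell holding `day` (row-major layout assumed)."""
    if not 1 <= day <= COLS * ROWS:
        raise ValueError(f"day out of range: {day}")
    row, col = divmod(day - 1, COLS)
    return col * CELL_W + CELL_W // 2, row * CELL_H + CELL_H // 2


def make_sample(rng: random.Random) -> dict:
    """Pick 1-5 distinct days to circle and build the target label string."""
    count = rng.randint(1, 5)
    days = sorted(rng.sample(range(1, COLS * ROWS + 1), count))
    return {
        "circled_days": days,
        "circle_centers": [cell_center(d) for d in days],
        "label": ", ".join(str(d) for d in days),
    }


sample = make_sample(random.Random(0))
print(sample["label"])
```

Passing a seeded `random.Random` keeps the sampled days reproducible, which helps when a dataset of this kind has to be regenerated.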

### Hyperparameters

```python
training_args = {
    "num_train_epochs": 20,  # training actually converged after 9 epochs
    "per_device_train_batch_size": 1,
    "gradient_accumulation_steps": 4,
    "learning_rate": 1e-4,
    "warmup_steps": 100,
    "logging_steps": 10,
    "save_strategy": "epoch",
    "fp16": True,  # mixed-precision training
}

lora_config = {
    "r": 16,              # LoRA rank
    "lora_alpha": 32,     # LoRA alpha (scaling numerator)
    "lora_dropout": 0.1,  # dropout applied to the LoRA layers
    "target_modules": ["q_proj", "v_proj"],  # modules that receive adapters
}
```
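
The arithmetic these settings imply can be worked through in a few lines. This is a rough sketch that assumes single-GPU training and that all 1,000 samples were used for training with no held-out split. The variable names are illustrative only.

```python
import math

num_samples = 1_000                 # dataset size stated in this card
per_device_train_batch_size = 1
gradient_accumulation_steps = 4
num_devices = 1                     # assumption: single-GPU training
epochs_run = 9                      # training stopped after 9 of the 20 configured epochs
warmup_steps = 100

# Samples contributing to each optimizer update
effective_batch_size = per_device_train_batch_size * gradient_accumulation_steps * num_devices
steps_per_epoch = math.ceil(num_samples / effective_batch_size)
total_steps = steps_per_epoch * epochs_run
warmup_share_of_first_epoch = warmup_steps / steps_per_epoch

lora_r, lora_alpha = 16, 32
lora_scaling = lora_alpha / lora_r  # standard LoRA scaling factor alpha / r

print(effective_batch_size, steps_per_epoch, total_steps, lora_scaling)  # 4 250 2250 2.0
```

Under these assumptions the learning-rate warmup spans 40% of the first epoch, and each adapter update is scaled by a factor of 2.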

### Training Results

| Epoch | Loss |
|-------|------|
| 1       | 2.4567 |
| 2       | 0.8234 |
| 3       | 0.2156 |
| 4       | 0.0567 |
| 5       | 0.0123 |
| 6       | 0.0034 |
| 7       | 0.0009 |
| 8       | 0.0002 |
| 9       | 0.0000 |

- **Training time**: about 1 hour (NVIDIA A10G 24GB GPU)
- **Final loss**: 0.0000 when rounded to four decimal places (training loss converged)

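## Limitations

- Specialized for grid-style calendar images (not optimized for other OCR tasks)
- Relies on recognizing circle marks (squares, underlines, and other markers are out of scope)
- Supports prompts in Japanese and English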

## License

MIT License

## Citation

```bibtex
@misc{deepseek-ocr-calendar-finetuned,
  title={DeepSeek-OCR Calendar Fine-tuned},
  author={Takumi Endo},
  year={2025},
  publisher={Hugging Face},
  howpublished={\url{https://huggingface.co/takumi123xxx/deepseek-ocr-calendar-finetuned}}
}
```

## Acknowledgements

This model is based on DeepSeek-AI's [DeepSeek-OCR](https://huggingface.co/deepseek-ai/DeepSeek-OCR).

## Contact

If you run into problems or have questions, please open a thread in [Discussions](https://huggingface.co/takumi123xxx/deepseek-ocr-calendar-finetuned/discussions).