🐕 Stanford Dogs ViT Dog Breed Classifier

A fine-grained dog breed classification model, fine-tuned from a Vision Transformer (ViT) on the Stanford Dogs dataset.

📦 Model Downloads

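Model file	Classes	Purpose	Notes
`ViT-full120.pt`	120	🚀 Inference / deployment	Full model covering all 120 breeds
`ViT-top100.pt`	100	📊 Evaluation	Evaluated on the Stanford Dogs and Atharva Dog Breeds datasets
`ViT-top80.pt`	80	🔧 Fine-grained tuning	Used for hyperparameter tuning experiments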

🚀 Quick Start

Inference with ViT-full120.pt

import torch
from torchvision import transforms
from PIL import Image

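# 1. Load the model
# This assumes the .pt file stores a full pickled nn.Module (not a state_dict).
# PyTorch >= 2.6 defaults to weights_only=True, which rejects such files, so pass
# weights_only=False, and only for checkpoints from a source you trust.
model = torch.load("ViT-full120.pt", map_location="cpu", weights_only=False)
model.eval()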

# 2. Preprocess the image: resize to 224 × 224, then normalize with ImageNet mean/std
transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225]
    ),
])

# 3. Run inference
image = Image.open("dog.jpg").convert("RGB")
input_tensor = transform(image).unsqueeze(0)  # add a batch dimension: (1, 3, 224, 224)

with torch.no_grad():
    output = model(input_tensor)
    predicted_class = output.argmax(dim=1).item()  # class index in 0..119
    print(f"Predicted class: {predicted_class}")
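The quick-start snippet reports only the top-1 class index. A small post-processing step can turn the raw output into top-k probabilities. This is a minimal sketch, assuming the model returns a logits tensor of shape (1, num_classes); the `.logits` fallback is only a hedge in case the checkpoint wraps a model that returns an output object, which this card does not confirm. The helper name `top_k_predictions` is hypothetical.

```python
import torch
import torch.nn.functional as F


def top_k_predictions(output, k: int = 5):
    """Return (class_index, probability) pairs for the k highest-scoring classes.

    Hypothetical helper. Accepts a logits tensor of shape (1, num_classes), or an
    object exposing `.logits` (an assumption about possible wrapper outputs).
    """
    logits = output.logits if hasattr(output, "logits") else output
    probs = F.softmax(logits, dim=1)        # convert logits to probabilities
    k = min(k, probs.shape[1])              # guard against k > num_classes
    values, indices = probs.topk(k, dim=1)  # sorted in descending order
    return [(int(i), float(p)) for i, p in zip(indices[0], values[0])]


# Synthetic logits for a 120-class model (not real model output)
fake_logits = torch.zeros(1, 120)
fake_logits[0, 7] = 5.0
fake_logits[0, 3] = 2.0
preds = top_k_predictions(fake_logits, k=3)
print(preds[0][0])  # 7
```

In the quick-start code, `top_k_predictions(output)` could be called inside the `torch.no_grad()` block in place of `argmax`.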

📚 Datasets

Stanford Dogs

🌐 Official homepage
🏷️ 120 dog breeds
🖼️ 20,580 images
✅ Primary training dataset
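The quick-start example returns a class index, not a breed name. Below is a sketch of one way to recover names, assuming (1) the Stanford Dogs `Images/` layout, where each breed has its own sub-folder named like `n02085620-Chihuahua` (WordNet ID, hyphen, breed name), and (2) that training assigned class indices in sorted folder order, as torchvision's `ImageFolder` does. Neither assumption is confirmed by this card, and `load_breed_names` is a hypothetical helper.

```python
import os
import tempfile


def load_breed_names(images_dir: str) -> list:
    """Return breed names ordered by class index.

    Hypothetical helper. Assumes one sub-folder per breed named
    '<WordNet ID>-<Breed_Name>' and sorted-folder class order (ImageFolder style).
    """
    folders = sorted(
        entry.name for entry in os.scandir(images_dir) if entry.is_dir()
    )
    # Drop the WordNet ID prefix and turn underscores into spaces.
    # split("-", 1) keeps hyphens inside names such as "flat-coated_retriever".
    return [name.split("-", 1)[1].replace("_", " ") for name in folders]


# Demo on a tiny fake directory tree with Stanford Dogs style folder names
with tempfile.TemporaryDirectory() as root:
    for folder in ["n02099267-flat-coated_retriever",
                   "n02085620-Chihuahua",
                   "n02085782-Japanese_spaniel"]:
        os.makedirs(os.path.join(root, folder))
    breed_names = load_breed_names(root)

print(breed_names)  # ['Chihuahua', 'Japanese spaniel', 'flat-coated retriever']
```

With the real dataset, `load_breed_names("Images")[predicted_class]` would give a readable breed name, provided the index order matches the one used during training.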

Atharva Dog Breeds

🌐 GitHub repository
📊 Used for model evaluation
🔬 Cross-dataset generalization testing
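Cross-dataset evaluation requires aligning the two label sets, because breed names are not guaranteed to be spelled the same way in both datasets. A minimal sketch, assuming accuracy is computed only over breeds present in both label sets and that names can be matched after lower-casing and unifying separators. The helpers and example labels are hypothetical, and this is not necessarily how the reported evaluation was carried out.

```python
def normalize_breed(name: str) -> str:
    """Lower-case and unify separators so names from two datasets can be compared."""
    return " ".join(name.lower().replace("_", " ").replace("-", " ").split())


def shared_breed_accuracy(predictions, targets, model_classes, eval_classes):
    """Accuracy restricted to breeds the model knows about.

    predictions: predicted indices into `model_classes`
    targets:     ground-truth indices into `eval_classes`
    Samples whose true breed is missing from `model_classes` are skipped.
    """
    model_lookup = {normalize_breed(n): i for i, n in enumerate(model_classes)}
    correct = total = 0
    for pred, target in zip(predictions, targets):
        key = normalize_breed(eval_classes[target])
        if key not in model_lookup:
            continue  # breed unknown to the model, so it is not scored
        total += 1
        correct += int(pred == model_lookup[key])
    return correct / total if total else float("nan")


# Hypothetical labels: the model's class order vs. an evaluation set's labels
model_classes = ["Chihuahua", "Japanese_spaniel", "beagle"]
eval_classes = ["Beagle", "chihuahua", "corgi"]
predictions = [2, 1, 1]  # model class indices
targets = [0, 1, 2]      # evaluation label indices
accuracy = shared_breed_accuracy(predictions, targets, model_classes, eval_classes)
print(accuracy)  # 0.5
```

Here the "corgi" sample is skipped because the example model has no such class, leaving one correct prediction out of two scored samples.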

⚙️ Training Details

Setting	Value
Base model	ViT (Vision Transformer)
Task	Fine-grained image classification
Framework	PyTorch
Input size	224 × 224 pixels
