---
library_name: pytorch
license: apache-2.0
datasets:
- huuuyeah/meetingbank
tags:
- pytorch
- transformer
- meeting-summarization
- custom-code
- attention-variant
---
# Run Sliding GQA
Custom PyTorch Transformer checkpoint trained on MeetingBank for meeting summarization research. This repository is part of the [`transformer-lab`](https://huggingface.co/collections/Pradheep1647/transformer-lab-6a07fe3185f5728e217997e0) collection.
## Model Details
| Field | Value |
|---|---|
| Repository | `Pradheep1647/run_sliding_gqa-meetingbank-bs8-e20-fp32-19` |
| Attention | `sliding_gqa` |
| Dataset | `meetingbank` |
| Layers | `6` |
| Hidden size | `512` |
| Heads | `8` |
| Batch size | `8` |
| Epochs | `20` |
| Precision | `fp32` |
| Checkpoint | `meeting_model19.pt` |
## Training Loss

A static plot of the training loss is stored in [`loss_curve.svg`](loss_curve.svg), and the raw curve data is available in [`loss_curve.csv`](loss_curve.csv).
## Available Models
| Variant | Repository |
|---|---|
| `gqa_rope` | [`Pradheep1647/run_gqa_rope-meetingbank-bs8-e20-fp32-19`](https://huggingface.co/Pradheep1647/run_gqa_rope-meetingbank-bs8-e20-fp32-19) |
| `mqa` | [`Pradheep1647/run_mqa-meetingbank-bs8-e20-fp32-19`](https://huggingface.co/Pradheep1647/run_mqa-meetingbank-bs8-e20-fp32-19) |
| `gqa` | [`Pradheep1647/run_gqa-meetingbank-bs8-e20-fp32-19`](https://huggingface.co/Pradheep1647/run_gqa-meetingbank-bs8-e20-fp32-19) |
| `mha` | [`Pradheep1647/run_mha-meetingbank-bs8-e20-fp32-19`](https://huggingface.co/Pradheep1647/run_mha-meetingbank-bs8-e20-fp32-19) |
| `sliding_gqa` | [`Pradheep1647/run_sliding_gqa-meetingbank-bs8-e20-fp32-19`](https://huggingface.co/Pradheep1647/run_sliding_gqa-meetingbank-bs8-e20-fp32-19) |
## Files
| File | Purpose |
|---|---|
| `meeting_model19.pt` | PyTorch checkpoint containing `model_state_dict`, optimizer states, epoch, and global step. |
| `config.json` | Training and architecture config converted from the Hydra run config. |
| `tokenizer.json` | Alias of the MeetingBank transcript tokenizer, used for source inputs. |
| `transcript_tokenizer.json` | Explicit MeetingBank transcript tokenizer. |
| `summary_tokenizer.json` | MeetingBank summary tokenizer for target text. |
| `loss_curve.csv` | TensorBoard `train/loss` scalar export. |
| `loss_curve.svg` | Static training-loss plot generated from `loss_curve.csv`. |
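
The loss curve can be inspected without TensorBoard by reading the CSV with the standard library. The following is a minimal sketch, assuming only that `loss_curve.csv` has a header row. The column names are not documented in this card, so the helper (`read_loss_rows`, a hypothetical name) returns them as found instead of guessing.

```python
import csv
from pathlib import Path


def read_loss_rows(csv_path):
    """Return (column_names, rows) from a CSV file that has a header row."""
    with Path(csv_path).open(newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        rows = list(reader)  # each row is a dict keyed by column name
        columns = list(reader.fieldnames or [])
    return columns, rows


if __name__ == "__main__":
    # Assumes loss_curve.csv was downloaded to the working directory.
    path = Path("loss_curve.csv")
    if path.exists():
        columns, rows = read_loss_rows(path)
        print("columns:", columns)
        print("rows:", len(rows))
```

All values come back as strings, so convert the step and loss columns to numbers once their names are known.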
## Usage
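This checkpoint comes from a custom PyTorch codebase and is not a `transformers.AutoModel` checkpoint. Use the repo-native builder, `build_transformer`, to instantiate the architecture, then load the `model_state_dict` from the checkpoint. The `src` package from the training codebase must be importable for the snippet to run.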
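```python
from pathlib import Path

import torch
from huggingface_hub import hf_hub_download
from omegaconf import OmegaConf

import src  # importing the training codebase registers its components
from src.model.builder import build_transformer

repo_id = "Pradheep1647/run_sliding_gqa-meetingbank-bs8-e20-fp32-19"

# Download the config and checkpoint from the Hub (cached locally after the first call).
config_path = hf_hub_download(repo_id=repo_id, filename="config.json")
checkpoint_path = hf_hub_download(repo_id=repo_id, filename="meeting_model19.pt")

# Rebuild the architecture from the saved config, then restore the weights.
cfg = OmegaConf.load(config_path)
model = build_transformer(cfg)
# PyTorch >= 2.6 defaults to weights_only=True; if loading fails, pass
# weights_only=False only if you trust the checkpoint source.
state = torch.load(checkpoint_path, map_location="cpu")
model.load_state_dict(state["model_state_dict"])
model.eval()  # switch to inference mode

print(f"Loaded {repo_id} from {Path(checkpoint_path).name}")
```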
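
Before calling `load_state_dict`, it can help to check what the checkpoint actually contains. The sketch below is one way to do that. `summarize_checkpoint` is a hypothetical helper, not part of the codebase, and the only key it relies on is `model_state_dict`, which the Files table confirms. Any other top-level keys are reported as found.

```python
def summarize_checkpoint(state):
    """Summarize a checkpoint dict: top-level keys plus weight statistics.

    Works with any mapping whose "model_state_dict" values expose .numel(),
    which torch.Tensor does. The element count covers every tensor in the
    state dict, so it includes buffers as well as trainable parameters.
    """
    weights = state["model_state_dict"]
    return {
        "top_level_keys": sorted(state.keys()),
        "num_tensors": len(weights),
        "num_elements": sum(t.numel() for t in weights.values()),
    }


# Usage, with `state` returned by torch.load as in the snippet above:
#   print(summarize_checkpoint(state))
```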
## Notes
- This is a research checkpoint for comparing attention variants under the same MeetingBank setup.
- The config and tokenizers are included so future runs can reproduce the architecture and preprocessing assumptions.
- Use `config.json` as the source of truth for architecture parameters.
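
The key names inside `config.json` are not listed in this card, so a generic way to locate the architecture parameters is to flatten the file into dotted paths and read them off. This is a minimal sketch using only the standard library. `flatten_config` is a hypothetical helper, and the local file path in the usage section is an assumption.

```python
import json
from pathlib import Path


def flatten_config(node, prefix=""):
    """Flatten nested dicts/lists into {"dotted.path": leaf_value}."""
    if isinstance(node, dict):
        items = node.items()
    elif isinstance(node, list):
        items = enumerate(node)
    else:
        return {prefix: node}  # leaf value: stop recursing
    flat = {}
    for key, value in items:
        path = f"{prefix}.{key}" if prefix else str(key)
        flat.update(flatten_config(value, path))
    return flat


if __name__ == "__main__":
    # Assumes config.json was downloaded to the working directory.
    path = Path("config.json")
    if path.exists():
        config = json.loads(path.read_text(encoding="utf-8"))
        for dotted, value in flatten_config(config).items():
            print(f"{dotted} = {value}")
```

The printed values can then be compared against the Model Details table, for example the layer count, hidden size, and number of heads.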