---
language:
- ru
tags:
- text-generation
- causal-lm
- pytorch
- custom-code
- byte-level-bpe
library_name: pytorch
pipeline_tag: text-generation
datasets:
- IgorVolochay/russian_jokes
---

# Craftic/llm-course-hw1-nano

Educational causal language model trained on a corpus of Russian jokes as part of a Deep Learning homework assignment.

This repository contains:
- a custom Byte-Level BPE tokenizer (`vocabulary.json`, `merges.json`)
- model weights in `model.safetensors`
- a minimal training/inference config in `config.json`
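
To illustrate what the tokenizer files encode, here is a minimal sketch of byte-level BPE in pure Python. It is not the homework implementation: the function names and the in-memory merge format (a list of `((left_id, right_id), new_id)` entries in learned order) are assumptions for illustration, and the actual layout of `vocabulary.json` and `merges.json` may differ.

```python
def bpe_encode(text, merges):
    """Encode text to token ids: start from UTF-8 bytes, then apply merges in learned order."""
    ids = list(text.encode("utf-8"))  # base vocabulary: one id per byte (0..255)
    for (left, right), new_id in merges:
        out, i = [], 0
        while i < len(ids):
            # Replace each adjacent (left, right) pair with the merged token id
            if i + 1 < len(ids) and ids[i] == left and ids[i + 1] == right:
                out.append(new_id)
                i += 2
            else:
                out.append(ids[i])
                i += 1
        ids = out
    return ids


def bpe_decode(ids, merges):
    """Decode token ids back to text by expanding every merged token into its bytes."""
    table = {i: bytes([i]) for i in range(256)}
    for (left, right), new_id in merges:
        table[new_id] = table[left] + table[right]
    return b"".join(table[i] for i in ids).decode("utf-8", errors="replace")


# Toy merges (hypothetical, not taken from merges.json):
# the Cyrillic letter "а" is the two UTF-8 bytes 208, 176.
toy_merges = [((208, 176), 256), ((256, 256), 257)]
encoded = bpe_encode("аа", toy_merges)
decoded = bpe_decode(encoded, toy_merges)
```

With a vocabulary of 1024 tokens, at most 768 ids remain for learned merges after the 256 byte tokens, and fewer if some ids are reserved for special tokens.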

## Model summary

- Model type: decoder-only Transformer for causal language modeling
- Tokenizer: custom Byte-Level BPE
- Vocabulary size: 1024
- Context length: 128 tokens
- Attention: Grouped-Query Attention (`n_head=4`, `n_kv_head=2`)
- Feed-forward: SwiGLU
- Normalization: RMSNorm
- Positional bias: ALiBi
- Dropout: 0.1
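
As a rough illustration of two items in this list, the sketch below computes ALiBi slopes and the causal ALiBi bias for one head, and shows how query heads could map to shared KV heads under Grouped-Query Attention. The function names are hypothetical, the slope formula is the standard one for power-of-two head counts, and the contiguous head grouping is an assumption; the homework code may organize this differently.

```python
def alibi_slopes(n_head):
    """Per-head slopes as a geometric sequence (standard recipe for power-of-two head counts)."""
    start = 2 ** (-8 / n_head)
    return [start ** (i + 1) for i in range(n_head)]


def alibi_bias(slope, seq_len):
    """Causal bias added to attention scores: -slope * distance, -inf for future positions."""
    return [
        [-slope * (i - j) if j <= i else float("-inf") for j in range(seq_len)]
        for i in range(seq_len)
    ]


def kv_head_for_query_head(q_head, n_head, n_kv_head):
    """Assumed contiguous grouping: consecutive query heads share one KV head."""
    group_size = n_head // n_kv_head
    return q_head // group_size


slopes = alibi_slopes(4)            # n_head=4 as in this model
bias = alibi_bias(slopes[0], 3)     # bias matrix for the first head, 3 positions
mapping = [kv_head_for_query_head(h, 4, 2) for h in range(4)]  # n_kv_head=2
```

With `n_head=4` and `n_kv_head=2`, each KV head serves two query heads, which halves the size of the key and value projections compared with standard multi-head attention.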

## Configuration

- Variant: `nano`
- Layers: 3
- Hidden size: 96
- Attention heads: 4
- KV heads: 2
- Intermediate size: 256
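
For a sense of scale, the configuration above can be turned into a back-of-the-envelope parameter count. The sketch below is an estimate under explicit assumptions that this card does not confirm: no bias terms in linear layers, two RMSNorm layers per block plus one final norm, three SwiGLU projection matrices per block, no learned positional parameters (ALiBi slopes are fixed), and a vocabulary of 1024. The function name `estimate_params` is hypothetical.

```python
def estimate_params(vocab=1024, d_model=96, n_layer=3, n_head=4,
                    n_kv_head=2, d_ff=256, tie_embeddings=False):
    """Rough parameter count for a bias-free decoder with GQA, SwiGLU and RMSNorm."""
    head_dim = d_model // n_head              # 24 for this config
    kv_dim = n_kv_head * head_dim             # K and V project to fewer heads under GQA
    attn = 2 * d_model * d_model + 2 * d_model * kv_dim   # Q and O, then K and V
    ffn = 3 * d_model * d_ff                  # SwiGLU: gate, up and down projections
    norms = 2 * d_model                       # two RMSNorm weight vectors per block
    embed = vocab * d_model
    lm_head = 0 if tie_embeddings else vocab * d_model
    return embed + n_layer * (attn + ffn + norms) + d_model + lm_head


untied = estimate_params()
tied = estimate_params(tie_embeddings=True)
```

Under these assumptions the estimate is roughly 0.5M parameters with a separate output head and roughly 0.4M with tied embeddings. The true count is whatever `model.safetensors` contains.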

## Training data

- Dataset: [IgorVolochay/russian_jokes](https://huggingface.co/datasets/IgorVolochay/russian_jokes)
- Domain: Russian jokes / short humorous texts

## Training setup

- Optimizer: AdamW
- Learning rate: 3e-4
- Training steps: 10,000
- Max sequence length: 128

## Usage

These weights were uploaded from a custom homework implementation, so loading requires the same Python classes that were used during training.

```python
import torch

# Define ByteLevelBPETokenizer and TransformerForCausalLM exactly as in the homework notebook.
# Then load artifacts from the Hub:

tokenizer = ByteLevelBPETokenizer.from_pretrained("Craftic/llm-course-hw1-nano")
model = TransformerForCausalLM.from_pretrained("Craftic/llm-course-hw1-nano")
model.eval()

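prompt = "Муж приходит домой и говорит:"  # "A husband comes home and says:"
# Encode once without the EOS token so the model continues the prompt
# rather than seeing a text that has already ended.
prompt_ids = tokenizer.encode(prompt, add_eos_token=False)
input_ids = torch.tensor([prompt_ids], dtype=torch.long)
attention_mask = torch.ones_like(input_ids)

with torch.no_grad():
    logits = model(input_ids, attention_mask)

# Greedy choice: take the most likely token at the last position.
next_token_id = logits[0, -1].argmax().item()
print(tokenizer.decode(prompt_ids + [next_token_id]))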
```
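
The snippet above predicts a single next token. A longer continuation can be produced by feeding each prediction back into the model. The following is a minimal greedy-decoding sketch, not part of the homework code: `greedy_generate` and `next_token_fn` are hypothetical names, the 128-token window mirrors the context length listed above, and how to obtain the EOS id depends on the tokenizer class.

```python
def greedy_generate(next_token_fn, prompt_ids, max_new_tokens=32,
                    context_len=128, eos_id=None):
    """Append greedily chosen tokens until max_new_tokens or EOS is reached.

    next_token_fn takes a list of token ids and returns the next token id.
    """
    ids = list(prompt_ids)
    for _ in range(max_new_tokens):
        window = ids[-context_len:]      # keep at most the last context_len tokens
        next_id = next_token_fn(window)
        if eos_id is not None and next_id == eos_id:
            break                        # stop without appending the EOS token
        ids.append(next_id)
    return ids


# Wrapping the model from the usage example (assumes it returns logits of shape
# [batch, seq_len, vocab], as the snippet above implies):
#
# def next_token_fn(window):
#     x = torch.tensor([window], dtype=torch.long)
#     with torch.no_grad():
#         logits = model(x, torch.ones_like(x))
#     return logits[0, -1].argmax().item()

# Toy stand-in so the loop can be tried without the model: counts upward modulo 5.
toy_ids = greedy_generate(lambda w: (w[-1] + 1) % 5, [0], eos_id=4)
```

Greedy decoding tends to repeat itself with a model this small; sampling with a temperature or top-k cutoff is a common alternative.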

## Notes

- This is an educational model, not a production-ready checkpoint.
- Because the model is very small (3 layers, hidden size 96) and trained only on jokes, generations may be low-quality, repetitive, or stylistically narrow.
- The repository does not include a packaged inference library; the custom tokenizer/model classes should be copied from the homework notebook or moved into a Python module before loading.

## Limitations

- Small context window
- Small vocabulary
- Trained on a narrow-domain dataset
- No safety alignment or moderation tuning

## Intended use

- Coursework demonstration
- Experiments with custom tokenization and compact LMs
- Lightweight local generation experiments

## Out-of-scope use

- Factual QA
- Safety-critical tasks
- Production deployment without additional packaging, evaluation, and safeguards