---
language:
- ru
tags:
- text-generation
- causal-lm
- pytorch
- custom-code
- byte-level-bpe
library_name: pytorch
pipeline_tag: text-generation
datasets:
- IgorVolochay/russian_jokes
---

# Craftic/llm-course-hw1-nano

Educational causal language model trained on a Russian jokes corpus as part of a Deep Learning homework assignment.

This repository contains:

- a custom Byte-Level BPE tokenizer (`vocabulary.json`, `merges.json`)
- model weights in `model.safetensors`
- a minimal training/inference config in `config.json`

## Model summary

- Model type: decoder-only Transformer for causal language modeling
- Tokenizer: custom Byte-Level BPE
- Vocabulary size: 1024
- Context length: 128 tokens
- Attention: Grouped-Query Attention (`n_head=4`, `n_kv_head=2`)
- Feed-forward: SwiGLU
- Normalization: RMSNorm
- Positional bias: ALiBi
- Dropout: 0.1

## Configuration

- Variant: `nano`
- Layers: 3
- Hidden size: 96
- Attention heads: 4
- KV heads: 2
- Intermediate size: 256

## Training data

- Dataset: [IgorVolochay/russian_jokes](https://huggingface.co/datasets/IgorVolochay/russian_jokes)
- Domain: Russian jokes / short humorous texts

## Training setup

- Optimizer: AdamW
- Learning rate: 3e-4
- Training steps: 10,000
- Max sequence length: 128

## Usage

These weights were uploaded from a custom homework implementation, so loading them requires the same Python classes that were used during training.

```python
import torch

# Define ByteLevelBPETokenizer and TransformerForCausalLM exactly as in the homework notebook.
# Then load artifacts from the Hub:
tokenizer = ByteLevelBPETokenizer.from_pretrained("Craftic/llm-course-hw1-nano")
model = TransformerForCausalLM.from_pretrained("Craftic/llm-course-hw1-nano")
model.eval()

# Russian prompt: "A husband comes home and says:"
prompt = "Муж приходит домой и говорит:"

# Encode once without an EOS token so the prompt is left open for continuation,
# and reuse the same ids for the model input and for decoding.
prompt_ids = tokenizer.encode(prompt, add_eos_token=False)
input_ids = torch.tensor([prompt_ids], dtype=torch.long)
attention_mask = torch.ones_like(input_ids)

with torch.no_grad():
    logits = model(input_ids, attention_mask)

# Greedy choice of the single next token from the last position.
next_token_id = logits[0, -1].argmax().item()
print(tokenizer.decode(prompt_ids + [next_token_id]))
```

## Notes

- This is an educational model, not a production-ready checkpoint.
- Because the model is very small and trained only on jokes, generations may be low-quality, repetitive, or stylistically narrow.
- The repository does not include a packaged inference library; the custom tokenizer/model classes must be copied from the homework notebook or moved into a Python module before loading.

## Limitations

- Small context window (128 tokens)
- Small vocabulary (1024 tokens)
- Trained on a narrow-domain dataset
- No safety alignment or moderation tuning

## Intended use

- Coursework demonstration
- Experiments with custom tokenization and compact LMs
- Lightweight local generation experiments

## Not intended use

- Factual QA
- Safety-critical tasks
- Production deployment without additional packaging, evaluation, and safeguards
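
## Attention settings, illustrated

The model summary lists ALiBi positional bias and Grouped-Query Attention with `n_head=4` and `n_kv_head=2`. The pure-Python sketch below illustrates what those settings usually mean. It is not taken from the homework notebook: it assumes the standard geometric slope schedule from the ALiBi paper and the common convention that consecutive query heads share one KV head. The function names `alibi_slopes`, `alibi_bias`, and `kv_head_for_query_head` are hypothetical and exist only for this example.

```python
# Illustrative sketch only; the homework implementation may differ.

def alibi_slopes(n_head: int) -> list[float]:
    """Assumed ALiBi schedule: slope_i = 2^(-8 * i / n_head) for i = 1..n_head."""
    if n_head <= 0 or n_head & (n_head - 1):
        raise ValueError("this sketch only handles power-of-two head counts")
    return [2.0 ** (-8.0 * (i + 1) / n_head) for i in range(n_head)]


def alibi_bias(slope: float, seq_len: int) -> list[list[float]]:
    """Causal bias for one head, added to attention scores before softmax.

    Entry [q][k] is slope * (k - q) for keys at or before the query,
    and -inf for future keys (causal masking).
    """
    neg_inf = float("-inf")
    return [
        [slope * (k - q) if k <= q else neg_inf for k in range(seq_len)]
        for q in range(seq_len)
    ]


def kv_head_for_query_head(query_head: int, n_head: int = 4, n_kv_head: int = 2) -> int:
    """Assumed GQA grouping: consecutive query heads share one KV head."""
    group_size = n_head // n_kv_head
    return query_head // group_size


if __name__ == "__main__":
    print(alibi_slopes(4))                               # [0.25, 0.0625, 0.015625, 0.00390625]
    print(alibi_bias(0.25, 3)[2])                        # [-0.5, -0.25, 0.0]
    print([kv_head_for_query_head(h) for h in range(4)]) # [0, 0, 1, 1]
```

Under these assumptions, each head penalizes distant tokens linearly at its own rate, and the four query heads read from only two sets of keys and values, which halves the key/value projection size relative to standard multi-head attention.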