---
license: apache-2.0
datasets:
- Shuu12121/python-treesitter-filtered-datasetsV2
- Shuu12121/javascript-treesitter-filtered-datasetsV2
- Shuu12121/ruby-treesitter-filtered-datasetsV2
- Shuu12121/go-treesitter-dedupe_doc-filtered-dataset
- Shuu12121/java-treesitter-dedupe_doc-filtered-dataset
- Shuu12121/rust-treesitter-filtered-datasetsV2
- Shuu12121/php-treesitter-filtered-datasetsV2
- Shuu12121/typescript-treesitter-filtered-datasetsV2
pipeline_tag: fill-mask
tags:
- code
- python
- java
- javascript
- typescript
- go
- ruby
- rust
- php
language:
- en
base_model:
- Shuu12121/CodeModernBERT-Crow-v1-Pre
---

# CodeModernBERT-Crow-v1.1🐦‍⬛

## Model Details

* **Model type**: Bi-encoder architecture based on ModernBERT
* **Architecture**:

  * Hidden size: 768
  * Layers: 12
  * Attention heads: 12
  * Intermediate size: 3,072
  * Max position embeddings: 8,192
  * Local attention window size: 128
  * RoPE positional encoding: θ = 160,000
  * Local RoPE positional encoding: θ = 10,000
* **Sequence length**: up to 2,048 tokens for code and docstring inputs during pretraining

## Pretraining

* **Tokenizer**: Custom BPE tokenizer trained for code and docstring pairs.
* **Data**: Functions and natural language descriptions extracted from GitHub repositories.
* **Masking strategy**: Two-phase pretraining.

  * **Phase 1: Random Masked Language Modeling (MLM)**
    30% of tokens in code functions are randomly masked and predicted using standard MLM.
  * **Phase 2: Line-level Span Masking**
    Inspired by SpanBERT, continued pretraining on the same data with span masking at line granularity:

    1. Convert input tokens back to strings.
    2. Detect newline tokens with regex and segment inputs by line.
    3. Exclude whitespace-only tokens from masking.
    4. Apply padding to align sequence lengths.
    5. Randomly mask 30% of tokens in each line segment and predict them.
* **Pretraining hyperparameters**:

  * Batch size: 16
  * Gradient accumulation steps: 16
  * Effective batch size: 256
  * Optimizer: AdamW
  * Learning rate: 5e-5
  * Scheduler: Cosine
  * Epochs: 3
  * Precision: Mixed precision (fp16) using `transformers`
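
### Illustrative sketch: line-level masking

The Phase 2 steps listed under **Masking strategy** can be sketched in plain Python as follows. This is not the training code used for this model, only a minimal illustration under stated assumptions: tokens are already decoded to strings (step 1), the names `split_into_lines`, `line_level_mask`, and `MASK_TOKEN` are hypothetical, at least one token is masked per non-empty line (an assumption, so that short lines still contribute), and batch-level padding (step 4) is left out.

```python
import random
import re

# Assumption: a token ends a line if its decoded text contains "\n".
NEWLINE_RE = re.compile(r"\n")
# Assumption: placeholder mask string; the real tokenizer's mask token may differ.
MASK_TOKEN = "[MASK]"


def split_into_lines(tokens):
    """Group token indices into line segments (step 2).

    A segment is closed by any token whose text contains a newline.
    """
    segments, current = [], []
    for i, tok in enumerate(tokens):
        current.append(i)
        if NEWLINE_RE.search(tok):
            segments.append(current)
            current = []
    if current:  # trailing line without a final newline
        segments.append(current)
    return segments


def line_level_mask(tokens, mask_ratio=0.3, rng=None):
    """Mask roughly `mask_ratio` of the non-whitespace tokens in each line.

    Returns (masked_tokens, labels), where labels[i] holds the original
    token at masked positions and None everywhere else.
    """
    rng = rng or random.Random()
    masked = list(tokens)
    labels = [None] * len(tokens)
    for segment in split_into_lines(tokens):
        # Step 3: whitespace-only tokens (including newlines) are never masked.
        candidates = [i for i in segment if tokens[i].strip()]
        if not candidates:
            continue
        # Assumption: mask at least one token per non-empty line.
        n_mask = max(1, round(mask_ratio * len(candidates)))
        # Step 5: pick positions at random within this line only.
        for i in rng.sample(candidates, n_mask):
            labels[i] = tokens[i]
            masked[i] = MASK_TOKEN
    return masked, labels
```

For example, calling `line_level_mask(["def", " ", "f", "(", ")", ":", "\n", "    ", "return", " ", "1", "\n"], rng=random.Random(0))` masks tokens independently within the `def f():` line and the `return 1` line, and leaves the spaces, indentation, and newline tokens untouched. Sampling per line, rather than over the whole sequence, is what keeps the masked positions spread across every line of the function.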