Text Classification
Transformers
English
genomics
dna
sequence-classification
nucleotide-transformer
fine-tuning
bioinformatics
Eval Results (legacy)

Nucleotide Transformer v2 50M, fine-tuned on GenomicBenchmarks

Full weight-level fine-tuning of InstaDeepAI/nucleotide-transformer-v2-50m-multi-species for binary DNA sequence classification on two GenomicBenchmarks tasks. All parameters are updated rather than using LoRA or a frozen backbone, with a leakage-free train/validation/test protocol and multi-seed evaluation.

Weights are not redistributed: this is a card and code release

The base model is licensed CC-BY-NC-SA-4.0 (non-commercial, share-alike), and any fine-tuned derivative inherits those terms. To respect that license, the trained checkpoints are not hosted here. The full training and evaluation code is open source and the run is reproducible, with every seed, split, and hyperparameter fixed.

Code, reproduction steps, figures, and the full model card: github.com/ankurgenomics/genome-ft

Running the documented commands regenerates the checkpoints described below.

Results

All numbers are on the held-out test set, evaluated exactly once on the best checkpoint (selected by validation MCC). The base model is the pretrained backbone with an untrained classification head, measured under the identical pipeline.

Enhancers: human_enhancers_cohn (6,948 test examples, 3 seeds: 42, 0, 123)

Model Accuracy F1 MCC
Base (pretrained backbone, untrained head) 0.499 0.021 −0.009
Fine-tuned (this work) 0.735 ± 0.004 0.745 ± 0.016 0.478 ± 0.003

Per-seed test MCC: 0.481, 0.478, 0.474 (σ = 0.003), indicating a stable result across initialisations.

Promoters: human_nontata_promoters (9,034 test examples, seed 42)

Model Accuracy F1 MCC
Base (pretrained backbone, untrained head) 0.451 0.064 −0.044
Fine-tuned (this work) 0.872 0.878 0.747

Where it lands (published reference numbers, accuracy on enhancers)

Model Accuracy Note
GB-CNN (Grešová et al. 2023) 0.69 published reference
DNABERT (Ji et al. 2021) 0.706 published reference
This work — NT-v2 50M full fine-tune 0.735 measured here, 3 seeds
HyenaDNA tiny-1k (Nguyen et al. 2023) 0.74 published reference
NT-v2 500M (Dalla-Torre et al. 2023) 0.776 published reference
DNABERT-2 (Zhou et al. 2023) 0.785 published reference

On enhancers, the 50M model performs above the published CNN and DNABERT baselines and below the larger transformers, consistent with its parameter count. It does not outperform the 500M models, and is not intended to.

How it was trained

  • Method: full fine-tuning of all 53.8M parameters with a 2-layer classification head
  • Optimizer: AdamW, learning rate 1e-5, weight decay 0.01
  • Schedule: linear warmup (300 steps) with cosine decay, gradient clipping (max-norm 1.0)
  • Augmentation: reverse-complement
  • Protocol: 15% of training data held out for validation; checkpoint selected by validation MCC; test set evaluated once; 3 seeds for the primary task
  • Best epoch: 1–2, after which validation MCC declines; the low learning rate and validation-based selection limit overfitting beyond the early epochs

Intended use & limitations

  • Intended use: research and educational demonstration of adapting a genomic foundation model at the weight level on standard DNA classification benchmarks.
  • Not for: clinical, diagnostic, or any commercial use (the CC-BY-NC-SA-4.0 license forbids commercial use).
  • Scope: two GenomicBenchmarks tasks only; results reflect those datasets and may not transfer to other genomic tasks, species, or sequence lengths.

License & attribution

  • Fine-tuned weights: CC-BY-NC-SA-4.0 (inherited from the base model; non-commercial, share-alike, attribution).
  • Training/evaluation code: MIT (see the GitHub repository).
  • Attribution: base model by InstaDeep; benchmark datasets by the GenomicBenchmarks authors (GreÅ¡ová et al. 2023).

Citation / related work

For the broader research context, see Fesser, Zhang, Li, Zitnik et al., How Post-Training Shapes Biological Reasoning Models (arXiv:2606.16517, 2026). Its Finding 1, that supervised fine-tuning improves in-domain accuracy while out-of-domain performance peaks early and declines, has an in-domain counterpart in this work: validation MCC peaked at epoch 1–2 and then declined, which motivates the use of a low learning rate and validation-based checkpoint selection rather than longer training. This is a single controlled fine-tune, not a comparable research program; the connection is one of shared principle, not scope.


Developed by Ankur Sharma — GitHub · LinkedIn

Code and full results: github.com/ankurgenomics/genome-ft

This is a personal open-source project, developed independently in a personal capacity. It is not affiliated with, endorsed by, or representative of any current or former employer, and uses only public models and public benchmark datasets. All views and results are the author's own.

Downloads last month

-

Downloads are not tracked for this model. How to track
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for ankur0050/nucleotide-transformer-v2-50m-genomicbenchmarks-ft

Datasets used to train ankur0050/nucleotide-transformer-v2-50m-genomicbenchmarks-ft

Paper for ankur0050/nucleotide-transformer-v2-50m-genomicbenchmarks-ft

Evaluation results

  • Test MCC (mean of 3 seeds) on GenomicBenchmarks human_enhancers_cohn
    self-reported
    0.478
  • Test accuracy (mean of 3 seeds) on GenomicBenchmarks human_enhancers_cohn
    self-reported
    0.735
  • Test F1 (mean of 3 seeds) on GenomicBenchmarks human_enhancers_cohn
    self-reported
    0.745
  • Test MCC (seed 42) on GenomicBenchmarks human_nontata_promoters
    self-reported
    0.747
  • Test accuracy (seed 42) on GenomicBenchmarks human_nontata_promoters
    self-reported
    0.872
  • Test F1 (seed 42) on GenomicBenchmarks human_nontata_promoters
    self-reported
    0.878