Expressive Jazz Piano Performance Modeling: ISPR 2025–2026
Four trained checkpoints for the ISPR 2025–2026 project on jazz piano
performance modeling: two pretrained PerformanceRNN LSTM variants and
two fine-tunes of the publicly released Aria 1B-parameter piano language
model (Bradshaw & Colton 2025), all fine-tuned on PiJAMA (Edwards et al.
2024). Companion code lives in the submission directory
napolitano_antonio_ispr_2025_2026_project/; the report is the
accompanying napolitano_antonio_ispr_2025_2026_report.pdf.
Models in this repo
The model names below match the headline table of the report.
aria-full-quality/: Aria fine-tune, full quality (offline)
Aria 1B-parameter LLaMA-3.2-style decoder fine-tuned on the PiJAMA
hawthorne split with the default 17 727-id AbsTokenizer (no
sustain pedal). Architecture: medium (d=1536, 16 layers, 24 heads,
RoPE, GQA, max_seq_len=8192). Best swept mean OA on the test split:
0.911. FMD vs the kong test-pool reference (CLaMP-2 encoder):
272.6.
- `tested.safetensors`: the checkpoint whose results appear in the report (4-stage train/val pipeline: retrained on TRAIN+VAL for the patience-selected epoch count, evaluated once on TEST).
- `deployed.safetensors`: full retrain on TRAIN+VAL+TEST for the same epoch count. For deployment and listening only; test metrics cannot be honestly reported for this checkpoint because it has seen the test set.
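Both Aria checkpoints reuse a patience-selected epoch count. A minimal sketch of that selection rule, assuming the usual early-stopping convention (stop once `patience` epochs pass without a new best validation loss and keep the best epoch index); the function name and the toy loss curve are hypothetical, not taken from the companion scripts:

```python
def patience_selected_epochs(val_losses, patience):
    """Return the 1-based epoch with the lowest validation loss seen before
    `patience` epochs pass without improvement (0 if the curve is empty)."""
    best_loss, best_epoch = float("inf"), 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            break  # early stop: later epochs are never inspected


    return best_epoch


# Toy validation curve, made up for illustration only.
val_curve = [2.1, 1.8, 1.7, 1.72, 1.75, 1.69, 1.9]
print(patience_selected_epochs(val_curve, patience=2))   # 3 (scan stops at epoch 5)
print(patience_selected_epochs(val_curve, patience=10))  # 6 (scans the whole curve)
```

The selected count is then reused unchanged when retraining on TRAIN+VAL (`tested`) and on TRAIN+VAL+TEST (`deployed`).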
aria-real-time/: Aria fine-tune, real-time MLX-compatible
Same backbone but loaded from the public model-demo.safetensors
checkpoint with the residual-stream embedding-projection layer
preserved (medium-emb architecture, +1536×512 emb_proj). Trained on the
PiJAMA kong album-aware split with the 2 675-id demo tokenizer
that adds explicit sustain-pedal events. Drop-in jazz replacement for
the upstream aria/demo/demo_mlx.py iOS sampler. Best swept mean OA:
0.804. FMD: 233.6.
- `tested.safetensors`, `deployed.safetensors`: same conventions as above.
lstm-hawthorne/: pretrained PerformanceRNN, hawthorne split
3-layer stacked LSTM (hidden 512, embed 512, tied I/O head, 6.46M params; paper-faithful PerformanceRNN, Oore et al. 2018). Pretrained on the 820 944-file Aria-MIDI corpus (30k steps) and fine-tuned on the PiJAMA hawthorne split with the 413-id no-pedal vocabulary. Best swept mean OA: 0.664. FMD: 427.8.
- `tested.pt`: Stage-B equivalent (the checkpoint reported on).
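As a cross-check of the 6.46M figure, the parameter arithmetic for an embedding, a 3-layer LSTM and a tied output head can be written out. This is a sketch under stated assumptions: PyTorch `nn.LSTM` conventions (two bias vectors per layer) and an output bias on the tied head; the real module may differ in either respect.

```python
def lstm_param_count(vocab_size, embed=512, hidden=512, layers=3,
                     tied=True, head_bias=True):
    """Count parameters of embedding -> stacked LSTM -> linear output head.

    Assumes PyTorch nn.LSTM conventions: per layer, weight_ih (4H x in),
    weight_hh (4H x H) and two bias vectors (b_ih, b_hh) of size 4H.
    """
    if tied and embed != hidden:
        raise ValueError("weight tying without a projection needs embed == hidden")
    total = vocab_size * embed  # input embedding (shared with the head if tied)
    for layer in range(layers):
        in_dim = embed if layer == 0 else hidden
        total += 4 * hidden * (in_dim + hidden)  # input and recurrent weights, 4 gates
        total += 2 * 4 * hidden                  # b_ih and b_hh
    if not tied:
        total += vocab_size * hidden  # separate output weight matrix
    if head_bias:
        total += vocab_size           # output bias (assumed present)
    return total


for vocab in (314, 413):
    print(vocab, f"{lstm_param_count(vocab):,}")
# 314 6,464,826
# 413 6,515,613
```

Under these assumptions the 314-id vocabulary gives ≈6.46M and the 413-id vocabulary ≈6.52M, so the quoted figure is sensitive to the vocabulary size and to whether the head carries a bias.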
lstm-kong-pedal/: pretrained PerformanceRNN, kong+pedal split
Same architecture but with the 314-id pedal-aware vocabulary (NOTE_ON×88 + NOTE_OFF×88 + TIME_SHIFT×100 + VELOCITY×32 + SUSTAIN_ON/OFF + 4 specials). Fine-tuned on the PiJAMA kong album-aware split. Best swept mean OA: 0.768 (only ≈0.04 below Aria real-time despite a ~150× parameter ratio). FMD: 438.6.
- `tested.pt`
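The 314-id count follows directly from the event types listed for lstm-kong-pedal. A minimal sketch that builds such an event-to-id table, assuming the 88 keys are MIDI pitches 21 to 108; the id ordering and the special-token names are hypothetical, not read from the companion code:

```python
# Hypothetical layout of the 314-id pedal-aware vocabulary. Counts follow the
# description above; the id order and special-token names are assumptions.
SPECIALS = ["<pad>", "<bos>", "<eos>", "<unk>"]  # 4 specials (names assumed)


def build_pedal_vocab():
    tokens = list(SPECIALS)
    tokens += [f"NOTE_ON_{pitch}" for pitch in range(21, 109)]   # 88 keys, MIDI A0..C8
    tokens += [f"NOTE_OFF_{pitch}" for pitch in range(21, 109)]  # 88 keys
    tokens += [f"TIME_SHIFT_{step}" for step in range(1, 101)]   # 100 shift bins
    tokens += [f"VELOCITY_{bin_}" for bin_ in range(32)]         # 32 velocity bins
    tokens += ["SUSTAIN_ON", "SUSTAIN_OFF"]
    return {token: idx for idx, token in enumerate(tokens)}


vocab = build_pedal_vocab()
print(len(vocab))  # 314
```

In the original PerformanceRNN the 100 time shifts are 10 ms steps up to 1 s; the bin width used by these checkpoints is not stated here.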
Note: a full retrain on TRAIN+VAL+TEST was not performed for the LSTMs (their compute cost is small enough that the 4-stage generalisation-honest pipeline already gives a strong deployment baseline). If you need that variant, the training script in the companion submission directory reproduces it in ≈25 minutes on a single NVIDIA B200 (or comparable GPU).
MLX variants for macOS inference
Each Aria model also has `mlx-tested/` and `mlx-deployed/` directories
containing:
- `model.safetensors`: same weights as the top-level safetensors, laid out for loading via `mlx.core.load()` on Apple silicon.
- `config.json`: the corresponding Aria model config (`medium.json` for full-quality, `medium-emb.json` for real-time).
- For `aria-real-time/mlx-*` only: `tokenizer-config.json`, the same 2 675-id demo tokenizer the upstream `aria/demo/demo_mlx.py` uses.
Running on macOS
aria-real-time/mlx-tested/ is a drop-in replacement for the
weights expected by the upstream aria/demo/demo_mlx.py (iOS / Apple
silicon real-time sampler from EleutherAI/aria). Point that script at
model.safetensors and use the bundled tokenizer-config.json:
```bash
python aria/demo/demo_mlx.py \
  --checkpoint-path /path/to/mlx-tested/model.safetensors \
  --tokenizer-config /path/to/mlx-tested/tokenizer-config.json
```
`aria-full-quality/mlx-*/` ships the full-quality weights and the
`medium.json` config. The upstream `demo_mlx.py` hardcodes the
`medium-emb` architecture, so to run these checkpoints on MLX you can either:
- adapt `aria.inference.model_mlx.TransformerLM` to load `medium` instead of `medium-emb` (drop the `emb_proj` layer), or
- run inference via PyTorch with the MPS backend on macOS, using the top-level `tested.safetensors` / `deployed.safetensors` and the default `AbsTokenizer` (no demo tokenizer config needed).
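Because the two Aria variants differ by the `emb_proj` layer, a loader can pick the matching config from the checkpoint itself. A minimal sketch, assuming the projection's tensor names contain the substring `emb_proj` (the exact key names are an assumption; the function is hypothetical):

```python
def infer_aria_config(weight_keys):
    """Guess (config name, vocab size) for an Aria checkpoint from tensor names.

    Assumes the embedding-projection tensors have 'emb_proj' in their names;
    adjust the substring if the real key names differ.
    """
    if any("emb_proj" in key for key in weight_keys):
        return "medium-emb", 2675  # real-time variant, demo tokenizer
    return "medium", 17727  # full-quality variant, default AbsTokenizer


# Example with made-up key names; with MLX you could pass
# mx.load("mlx-tested/model.safetensors").keys() instead.
print(infer_aria_config(["tok_embeddings.weight", "emb_proj.weight"]))
print(infer_aria_config(["tok_embeddings.weight"]))
```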
The full-quality checkpoints are ≈2.5 GB in bf16, so for inference they fit easily in the unified memory of Apple silicon machines with ≥16 GB.
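The ≈2.5 GB figure lines up with the 1B-parameter scale, since bf16 stores 2 bytes per parameter. A back-of-the-envelope sketch (weights only: KV cache, activations and runtime overhead are not counted, and GB is taken as 10^9 bytes); the helper names are illustrative:

```python
BYTES_PER_PARAM = {"fp32": 4, "bf16": 2, "fp16": 2}


def weights_gb(n_params, dtype="bf16"):
    """Size of the raw weights in decimal gigabytes (1 GB = 1e9 bytes)."""
    return n_params * BYTES_PER_PARAM[dtype] / 1e9


def params_from_gb(size_gb, dtype="bf16"):
    """Invert weights_gb: how many parameters a file of this size holds."""
    return size_gb * 1e9 / BYTES_PER_PARAM[dtype]


n = params_from_gb(2.5)  # parameters implied by a 2.5 GB bf16 checkpoint
print(f"{n:.3g} params; an fp32 copy would be {weights_gb(n, 'fp32'):.1f} GB")
```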
Loading from Python
Aria (any variant) on CUDA / ROCm / MPS
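```python
import torch
from aria.config import load_model_config
from aria.model import ModelConfig, TransformerLM
from safetensors.torch import load_file

model_config = ModelConfig(**load_model_config("medium"))  # or "medium-emb"
model_config.set_vocab_size(17727)  # or 2675 for real-time
model = TransformerLM(model_config)

# strict=False skips mismatched keys silently, so report what was skipped
result = model.load_state_dict(load_file("tested.safetensors"), strict=False)
print("missing:", result.missing_keys, "unexpected:", result.unexpected_keys)
# ROCm builds of PyTorch also expose the GPU as "cuda"
device = "cuda" if torch.cuda.is_available() else ("mps" if torch.backends.mps.is_available() else "cpu")
model = model.to(device).eval()
```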
Aria on MLX (Apple silicon)
```python
import mlx.core as mx

weights = mx.load("mlx-tested/model.safetensors")  # dict: tensor name -> mx.array
# … then build the MLX TransformerLM as in aria.inference.model_mlx
```
LSTM
```python
import torch
from src.models.performancernn_lstm import PerformanceRNNLSTM, PerformanceRNNLSTMConfig

# weights_only=False unpickles arbitrary objects: only load checkpoints you trust
ckpt = torch.load("tested.pt", map_location="cpu", weights_only=False)
cfg = PerformanceRNNLSTMConfig(**ckpt["config"])  # hyperparameters saved with the weights
model = PerformanceRNNLSTM(cfg)
model.load_state_dict(ckpt["model_state"], strict=True)
model.eval()
```
(The PerformanceRNNLSTM / PerformanceRNNLSTMConfig definitions live
in the companion submission directory under
src/models/performancernn_lstm.py.)
Recommended sampling settings
The Stage-C sampling sweep covered 12 cells,
T ∈ {0.8, 1.0, 1.2} × top-k ∈ {0, 24} × min-p ∈ {0.035, 0.05}, with 4
PiJAMA test prompts × 20 variations per cell. The same cell
(temperature = 1.2, top-k = 0, i.e. no truncation, min-p = 0.035)
wins on both mean OA and FMD for every model in this repo.
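The grid is small enough to enumerate directly. A minimal sketch of the sweep layout in plain Python; the constant names are illustrative and not taken from the companion scripts:

```python
from itertools import product

# Grid from the Stage-C description; constant names are illustrative.
TEMPERATURES = (0.8, 1.0, 1.2)
TOP_KS = (0, 24)          # 0 disables top-k truncation
MIN_PS = (0.035, 0.05)
PROMPTS, VARIATIONS = 4, 20

cells = list(product(TEMPERATURES, TOP_KS, MIN_PS))
samples_per_cell = PROMPTS * VARIATIONS

print(len(cells), samples_per_cell, len(cells) * samples_per_cell)  # 12 80 960
```

That is 960 generations per model across the whole sweep.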
| Model | Best (T, top-k, min-p) | Mean OA ↑ | FMD ↓ (CLaMP-2) |
|---|---|---|---|
| aria-full-quality | (1.2, 0, 0.035) | 0.911 | 272.6 |
| aria-real-time | (1.2, 0, 0.035) | 0.804 | 233.6 |
| lstm-kong-pedal | (1.2, 0, 0.035) | 0.768 | 438.6 |
| lstm-hawthorne | (1.2, 0, 0.035) | 0.664 | 427.8 |
Three robust observations from the sweep:
- Temperature dominates. Raising `T` from 0.8 to 1.2 adds +0.18 to +0.30 absolute OA on Aria at every `(k, p)` cell and +0.28 on both LSTM splits.
- Don't truncate. `top-k = 0` (no truncation) beats `top-k = 24` by 0.03–0.07 OA at every `(T, p)` cell; aggressive truncation hurts distributional fidelity on this corpus.
- `min-p` is comparatively flat between 0.035 and 0.05; the smaller value wins by a small margin everywhere.
If you only want a single set of knobs that works across all four
models, use temperature=1.2, top_k=0, min_p=0.035.
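To make the three knobs concrete, here is a minimal, framework-free sketch of what they do to a single next-token distribution. The filter order (temperature, then top-k, then min-p) and the min-p rule (drop tokens whose probability is below `min_p` times the top probability) are assumptions; Aria's own sampler may apply them differently.

```python
import math
import random


def filtered_probs(logits, temperature=1.2, top_k=0, min_p=0.035):
    """Next-token distribution after temperature, top-k and min-p filtering.

    Assumed order: temperature -> top-k (0 disables it) -> min-p, where min-p
    drops tokens whose probability is below min_p times the top probability.
    """
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]  # subtract max for numerical stability
    total = sum(exps)
    probs = [e / total for e in exps]
    if top_k > 0:
        # k-th largest probability; ties at the cut-off are all kept
        kth = sorted(probs, reverse=True)[min(top_k, len(probs)) - 1]
        probs = [p if p >= kth else 0.0 for p in probs]
    if min_p > 0:
        threshold = min_p * max(probs)
        probs = [p if p >= threshold else 0.0 for p in probs]
    total = sum(probs)
    return [p / total for p in probs]


def sample(logits, rng=random, **knobs):
    """Draw one token id from the filtered distribution."""
    probs = filtered_probs(logits, **knobs)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


rng = random.Random(0)
logits = [5.0, 2.0, 0.0, -3.0]  # toy logits over a 4-token vocabulary
print(filtered_probs(logits))  # tokens 2 and 3 fall below the min-p threshold
print(sample(logits, rng=rng, temperature=1.2, top_k=0, min_p=0.035))
```

With `top_k=0`, the only pruning comes from the min-p cut-off, which scales with the most likely token instead of keeping a fixed number of candidates.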
Reproducibility
All four checkpoints were produced by the pipeline scripts in the
companion submission directory (scripts/aria_pipeline_per_variant.sh
for the Aria variants, scripts/train_performancernn_lstm_pipeline.sh
for the LSTMs). The metrics quoted in the report come from
src/eval_aria_metrics.py (OA / KLD) and
scripts/fmd_eval_sweeps.py (FMD with the CLaMP-2 music encoder).
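For orientation, these are the standard definitions the two headline metric names usually denote; this assumes the scripts follow them. FMD is the Fréchet distance between Gaussians fitted to the CLaMP-2 embeddings of the generated set ($\mu_g, \Sigma_g$) and the reference set ($\mu_r, \Sigma_r$), and an overlapping area compares two estimated densities $p$ and $q$ of a musical feature:

$$
\mathrm{FMD} = \lVert \mu_g - \mu_r \rVert_2^2 + \operatorname{Tr}\!\left(\Sigma_g + \Sigma_r - 2\,(\Sigma_g \Sigma_r)^{1/2}\right),
\qquad
\mathrm{OA}(p, q) = \int \min\bigl(p(x), q(x)\bigr)\,dx .
$$

Lower FMD and higher OA (bounded in $[0, 1]$, with 1 for identical densities) indicate closer agreement with the reference; which features enter the mean OA is presumably set in src/eval_aria_metrics.py.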
Citation
If you use these checkpoints, please cite the report and the original PiJAMA, Aria and PerformanceRNN papers:
- Edwards, Dixon and Benetos. PiJAMA: Piano Jazz with Automatic MIDI Annotations. ISMIR Transactions, 6(1):89–102, 2024.
- Bradshaw and Colton. Aria: A Generative Model for Music-Aware AI. arXiv:2506.23869, 2025.
- Oore, Simon, Dieleman, Eck and Simonyan. This Time with Feeling: Learning Expressive Musical Performance. Neural Computing and Applications, 32:955–967, 2020.