---
pipeline_tag: other
language: en
library_name: pytorch
license: apache-2.0
tags:
- music
- midi
- mir
- deduplication
- caugbert
model-index:
- name: LMD Deduplication - CAugBERT
  results:
  - task:
      type: representation-learning
      name: symbolic music representation learning
    dataset:
      type: midi
      name: Lakh MIDI Dataset
    metrics:
    - type: F1
      value: 0.493
---

# LMD Deduplication Supplements
This repository provides the pre-trained CAugBERT model checkpoint used in:

**"On the De-duplication of the Lakh MIDI Dataset" (ISMIR 2025)**

[[Paper]](https://ismir2025program.ismir.net/poster_188.html) | [[GitHub Code]](https://github.com/jech2/LMD_Deduplication)

---

# Usage
You can either use this checkpoint with the inference script in the main GitHub repository, or load it directly in your own Python code:
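```bash
# Option 1: Run inference in the main repo
git clone https://github.com/jech2/LMD_Deduplication
cd LMD_Deduplication
poetry install  # install the repo's Poetry-managed dependencies
poetry run python inference.py  # first check that the paths in yamls/inference.yaml are correct
```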
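```python
# Option 2: Load the checkpoint manually
import torch
from contrastive_musicbert.model.BERT import BERT_Lightning

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
checkpoint_path = "path/to/checkpoint"  # placeholder: path to the downloaded checkpoint file

model = BERT_Lightning(...)  # see .hydra/config.yaml for arguments
# Lightning checkpoints keep the weights under "state_dict". PyTorch >= 2.6 defaults to
# weights_only=True, which may reject the extra metadata; disable it only for trusted files.
checkpoint = torch.load(checkpoint_path, map_location="cpu", weights_only=False)
model.load_state_dict(checkpoint["state_dict"])
model = model.to(device).eval()  # inference mode: disables dropout
```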
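The card does not spell out how the model's outputs are compared to detect duplicates. As a hedged illustration only, the sketch below assumes you have already extracted one fixed-length embedding per MIDI file with the model (the extraction call lives in the GitHub repo and is not reproduced here) and flags candidate duplicate pairs by cosine similarity. The names `cosine_similarity` and `find_duplicate_pairs`, the 0.95 threshold, and the toy file names are hypothetical; this is a generic technique, not the paper's exact matching procedure.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def find_duplicate_pairs(
    embeddings: Dict[str, Sequence[float]], threshold: float = 0.95
) -> List[Tuple[str, str, float]]:
    """Return (file_a, file_b, similarity) for every pair at or above `threshold`.

    Hypothetical helper: the threshold is illustrative, not the value used in the paper.
    """
    pairs = []
    # Sort by file name so the output order is deterministic
    for (name_a, emb_a), (name_b, emb_b) in combinations(sorted(embeddings.items()), 2):
        sim = cosine_similarity(emb_a, emb_b)
        if sim >= threshold:
            pairs.append((name_a, name_b, sim))
    return pairs


# Toy embeddings standing in for model outputs (hypothetical file names and values)
toy = {
    "song_a.mid": [1.0, 0.0, 0.2],
    "song_a_copy.mid": [0.98, 0.01, 0.21],
    "other.mid": [0.0, 1.0, 0.0],
}
for a, b, sim in find_duplicate_pairs(toy):
    print(f"{a} <-> {b}: {sim:.3f}")
```

The all-pairs loop is quadratic in the number of files, so for a corpus the size of the Lakh MIDI Dataset you would typically switch to a batched matrix product over normalized embeddings or an approximate nearest-neighbor index.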

# Note
If you have any questions regarding the checkpoint, please contact:
Eunjin Choi (jech@kaist.ac.kr)