
	---
	pipeline_tag: other
	language: en
	library_name: pytorch
	license: apache-2.0
	tags:
	- music
	- midi
	- mir
	- deduplication
	- caugbert
	model-index:
	- name: LMD Deduplication - CAugBERT
	  results:
	  - task:
	      type: representation-learning
	      name: symbolic music representation learning
	    dataset:
	      type: midi
	      name: Lakh MIDI Dataset
	    metrics:
	    - type: F1
	      value: 0.493
	---

	# LMD Deduplication Supplements
	This repository provides the pre-trained CAugBERT model checkpoint used in:
	"On the De-duplication of the Lakh MIDI Dataset" (ISMIR 2025)
	[[Paper]](https://ismir2025program.ismir.net/poster_188.html) \| [[GitHub Code]](https://github.com/jech2/LMD_Deduplication)

	---

	# Usage
	You can either integrate this checkpoint into the main repository for inference, or load it directly:
	```bash
	# Option 1: Run inference in the main repo
	poetry run python inference.py # make sure yamls/inference.yaml paths are correct
	```
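	```python
	# Option 2: Load checkpoint manually
	import torch
	from contrastive_musicbert.model.BERT import BERT_Lightning

	device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
	checkpoint_path = "path/to/checkpoint.ckpt"  # placeholder: local path to the downloaded checkpoint

	model = BERT_Lightning(...)  # see .hydra/config.yaml for arguments
	# On PyTorch >= 2.6, torch.load defaults to weights_only=True; if loading fails,
	# passing weights_only=False may help (only for checkpoints you trust).
	checkpoint = torch.load(checkpoint_path, map_location="cpu")
	model.load_state_dict(checkpoint['state_dict'])
	model = model.to(device).eval()  # move to the target device and disable dropout
	```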
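
Once the model is loaded, its output embeddings can be compared to flag candidate duplicates. This card does not describe that step, so the following is only a minimal sketch under stated assumptions: embeddings are taken to be plain lists of floats, cosine similarity is assumed as the comparison, and the `0.9` threshold, the helper names, and the file names are all hypothetical. It is not the procedure or the settings used in the paper; see the GitHub repository for the actual pipeline.

```python
# Illustrative sketch only: cosine similarity and the 0.9 threshold are
# assumptions, not the procedure or settings from the paper.
import math
from itertools import combinations


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def find_duplicate_pairs(embeddings, threshold=0.9):
    """Return (name_a, name_b, similarity) for every pair at or above the threshold."""
    pairs = []
    for (name_a, vec_a), (name_b, vec_b) in combinations(embeddings.items(), 2):
        sim = cosine_similarity(vec_a, vec_b)
        if sim >= threshold:
            pairs.append((name_a, name_b, sim))
    return pairs


# Toy stand-ins for model outputs (hypothetical file names and values).
embeddings = {
    "a.mid": [1.0, 0.0, 0.0],
    "b.mid": [0.9, 0.1, 0.0],
    "c.mid": [0.0, 1.0, 0.0],
}
duplicate_pairs = find_duplicate_pairs(embeddings)
```

The all-pairs loop is quadratic in the number of files, which is fine for a small check but would likely need batching or an approximate nearest-neighbour index at the scale of the full dataset.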

	# Note
	If you have any questions regarding the checkpoint, please contact:
	Eunjin Choi (jech@kaist.ac.kr)