
	---
	language:
	- en
	tags:
	- sentence-transformers
	- cross-encoder
	- reranker
	- generated_from_trainer
	- dataset_size:942069
	- loss:PrecomputedDistillationLoss
	base_model: jhu-clsp/ettin-encoder-17m
	datasets:
	- dleemiller/all-nli-distill
	pipeline_tag: text-classification
	library_name: sentence-transformers
	metrics:
	- f1_macro
	- f1_micro
	- f1_weighted
	model-index:
	- name: CrossEncoder based on jhu-clsp/ettin-encoder-17m
	  results:
	  - task:
	      type: cross-encoder-classification
	      name: Cross Encoder Classification
	    dataset:
	      name: AllNLI dev
	      type: AllNLI-dev
	    metrics:
	    - type: f1_macro
	      value: 0.843215238686306
	      name: F1 Macro
	    - type: f1_micro
	      value: 0.8435163046243068
	      name: F1 Micro
	    - type: f1_weighted
	      value: 0.8438547382511594
	      name: F1 Weighted
	  - task:
	      type: cross-encoder-classification
	      name: Cross Encoder Classification
	    dataset:
	      name: AllNLI test
	      type: AllNLI-test
	    metrics:
	    - type: f1_macro
	      value: 0.8442865676487733
	      name: F1 Macro
	    - type: f1_micro
	      value: 0.8446784696784697
	      name: F1 Micro
	    - type: f1_weighted
	      value: 0.8449960204914074
	      name: F1 Weighted
	---

	# EttinX Cross-Encoder: Natural Language Inference (NLI)

	This cross-encoder classifies sentence pairs as contradiction, neutral, or entailment. It is a drop-in
	replacement for comparable Sentence Transformers cross-encoders.

	To train this model, I added logits from the `dleemiller/ModernCE-large-nli` teacher model to the AllNLI
	dataset, published as `dleemiller/all-nli-distill`. Training against these teacher logits significantly improves performance compared with standard training.
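
The metadata lists `loss:PrecomputedDistillationLoss`, but this card does not document its exact form. Because the teacher logits are stored in the dataset, the teacher does not need to run during training. As an illustration only, the sketch below assumes a common knowledge-distillation objective: KL divergence between temperature-softened teacher and student distributions, scaled by the squared temperature. The helper names, the temperature of 2.0, and the example logits are all hypothetical.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax over one row of logits, softened by `temperature`."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max to avoid overflow in exp
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_kl(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) for one example, scaled by temperature**2.

    Hypothetical helper: an assumed distillation objective, not necessarily the
    PrecomputedDistillationLoss used to train this model.
    """
    p = softmax(teacher_logits, temperature)  # soft targets from stored teacher logits
    q = softmax(student_logits, temperature)  # student's current predictions
    kl = sum(pi * (math.log(pi) - math.log(qi)) for pi, qi in zip(p, q))
    return kl * temperature ** 2

def batch_loss(student_batch, teacher_batch, temperature=2.0):
    """Mean distillation loss over a batch of (student, teacher) logit rows."""
    losses = [distillation_kl(s, t, temperature) for s, t in zip(student_batch, teacher_batch)]
    return sum(losses) / len(losses)

# Hypothetical 3-way NLI logits; in practice the teacher rows come from the dataset.
teacher = [[-1.0, 4.0, 0.5], [3.0, -2.0, 0.0]]
student = [[-0.5, 2.0, 0.5], [1.0, -1.0, 0.5]]
print(batch_loss(student, teacher))
```

The loss is zero when the student reproduces the teacher's distribution exactly and grows as the two diverge. A higher temperature exposes more of the teacher's relative preferences among the non-top labels.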

	This 17M-parameter model is based on the ModernBERT architecture, which makes it a strong candidate for lightweight CPU inference.

	---

	## Features
	- High performance: achieves 80.47% and 86.95% micro F1 on the MNLI mismatched and SNLI test sets, respectively.
	- Efficient architecture: based on the Ettin-17m encoder design (17M parameters), offering faster inference than larger cross-encoders.
	- Extended context length: processes sequences of up to 8192 tokens, which is useful for evaluating long LLM outputs.

	---

	## Performance

	| Model | MNLI Mismatched | SNLI Test | Context Length | # Parameters |
	|---------------------------|-------------------|--------------|----------------|----------------|
	| `dleemiller/ModernCE-large-nli` | 0.9202 | 0.9110 | 8192 | 395M |
	| `dleemiller/ModernCE-base-nli` | 0.9034 | 0.9025 | 8192 | 149M |
	| `cross-encoder/deberta-v3-large` | 0.9049 | 0.9220 | 512 | 435M |
	| `cross-encoder/deberta-v3-base` | 0.9004 | 0.9234 | 512 | 184M |
	| `cross-encoder/nli-distilroberta-base` | 0.8398 | 0.8838 | 512 | 82M |
	| `dleemiller/EttinX-nli-xxs` | 0.8047 | 0.8695 | 8192 | 17M |
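
The metadata reports macro, micro, and weighted F1, and the feature list and validation results quote micro F1. For single-label classification such as NLI, micro F1 pools true positives, false positives, and false negatives across all classes. Every error counts once as a false positive and once as a false negative, so micro F1 works out to plain accuracy. The pure-Python sketch below computes all three averages. The `f1_scores` helper and the toy labels are hypothetical illustrations, not the evaluation code behind the numbers above.

```python
from collections import Counter

def f1_scores(y_true, y_pred):
    """Return (macro, micro, weighted) F1 for single-label multiclass predictions."""
    labels = sorted(set(y_true) | set(y_pred))
    support = Counter(y_true)  # number of true examples per class
    per_class, tp_sum, fp_sum, fn_sum = {}, 0, 0, 0
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        per_class[c] = 2 * tp / denom if denom else 0.0
        tp_sum, fp_sum, fn_sum = tp_sum + tp, fp_sum + fp, fn_sum + fn
    macro = sum(per_class.values()) / len(labels)               # unweighted class mean
    micro = 2 * tp_sum / (2 * tp_sum + fp_sum + fn_sum)          # pooled counts
    weighted = sum(per_class[c] * support[c] for c in labels) / len(y_true)  # support-weighted
    return macro, micro, weighted

# Hypothetical toy predictions
y_true = ["entailment", "neutral", "contradiction", "entailment"]
y_pred = ["entailment", "contradiction", "contradiction", "neutral"]
print(f1_scores(y_true, y_pred))
```

On these toy labels, 2 of 4 predictions are correct, so micro F1 equals the 0.5 accuracy. Macro F1 is lower because the `neutral` class scores 0.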


	---

	## Usage

	To use EttinX for NLI tasks, you can load the model with the Hugging Face `sentence-transformers` library:

	```python
	from sentence_transformers import CrossEncoder

	# Load the EttinX NLI cross-encoder from the Hugging Face Hub
	model = CrossEncoder("dleemiller/EttinX-nli-xxs")

	# Each input is a (premise, hypothesis) pair; the result has one row of
	# three label scores per pair
	scores = model.predict([
	    ('A man is eating pizza', 'A man eats something'),
	    ('A black race car starts up in front of a crowd of people.', 'A man is driving down a lonely road.'),
	])

	# Convert scores to labels by taking the highest-scoring index in each row
	label_mapping = ['contradiction', 'entailment', 'neutral']
	labels = [label_mapping[score_max] for score_max in scores.argmax(axis=1)]
	print(labels)
	# ['entailment', 'contradiction']
	```
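
When a confidence value is needed along with the label, one option is a row-wise softmax over each pair's scores. This assumes the scores are unnormalized logits. `CrossEncoder.predict` also accepts `apply_softmax=True`, which should produce the same normalization. The pure-Python sketch below makes the computation explicit. The helper names and example scores are hypothetical, and the label order is copied from the usage example, so it should be checked against the model's configuration.

```python
import math

# Label order copied from the usage example; verify against the model config.
LABELS = ["contradiction", "entailment", "neutral"]

def scores_to_probs(row):
    """Row-wise softmax: turn one pair's raw scores into probabilities summing to 1."""
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in row]
    total = sum(exps)
    return [e / total for e in exps]

def label_with_confidence(scores, labels=LABELS):
    """Return a (label, probability) tuple for each row of raw scores."""
    results = []
    for row in scores:
        probs = scores_to_probs(row)
        best = max(range(len(probs)), key=probs.__getitem__)  # argmax index
        results.append((labels[best], probs[best]))
    return results

# Hypothetical scores standing in for the output of model.predict(...)
example_scores = [[-2.1, 3.4, 0.2], [2.8, -1.5, -0.3]]
print(label_with_confidence(example_scores))
```

The softmax does not change which label wins, only how the scores are expressed. This makes it convenient when, for example, an LLM output evaluation needs to threshold on entailment probability.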

	---

	## Training Details

	### Pretraining
	We initialize the `` weights.

	Details:
	- Batch size: 512
	- Learning rate: 1e-4
	- Attention dropout: 0.1

	### Fine-Tuning
	Fine-tuning was performed on the `dleemiller/all-nli-distill` dataset.

	### Validation Results
	The model achieved the following test set micro F1 scores after fine-tuning:
	- MNLI mismatched: 0.8047
	- SNLI: 0.8695

	---

	## Model Card

	- Architecture: Ettin-encoder-17m
	- Fine-Tuning Data: `dleemiller/all-nli-distill`

	---

	## Thank You

	Thanks to the Johns Hopkins team for providing the Ettin models (built on the ModernBERT architecture), and to the Sentence Transformers team for their leadership in transformer encoder models.

	---

	## Citation

	If you use this model in your research, please cite:

	```bibtex
	@misc{moderncenli2025,
	author = {Miller, D. Lee},
	title = {EttinX NLI: An NLI cross encoder model},
	year = {2025},
	publisher = {Hugging Face Hub},
	url = {https://huggingface.co/dleemiller/EttinX-nli-xxs},
	}
	```

	---

	## License

	This model is licensed under the [MIT License](LICENSE).