---
license: mit
tags:
- tokenizer
- sentencepiece
- multilingual
- cluster-8
- vocab-128000
---

# Grand Tokenizer - Cluster 8 (Vocab 128000)

This is a multilingual SentencePiece Unigram tokenizer trained on cluster 8, with a vocabulary of 128,000 tokens.

## Usage

```python
from transformers import AutoTokenizer

# Load the tokenizer by its repository ID.
tokenizer = AutoTokenizer.from_pretrained("tokenizer-iso-cluster-8-vocab-128000")
```
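
Once loaded, the tokenizer can be inspected like any other `transformers` tokenizer. The following is a minimal sketch, assuming the loaded object exposes the standard `tokenize` and `convert_tokens_to_ids` methods; `show_pieces` is a hypothetical helper name used for illustration and is not part of this repository.

```python
def show_pieces(tokenizer, text):
    """Return (piece, id) pairs for `text` using the standard tokenizer methods."""
    # Split the text into subword pieces, then map each piece to its vocabulary ID.
    pieces = tokenizer.tokenize(text)
    ids = tokenizer.convert_tokens_to_ids(pieces)
    return list(zip(pieces, ids))


# Usage, with `tokenizer` loaded as shown under Usage:
# for piece, idx in show_pieces(tokenizer, "Hello world"):
#     print(piece, idx)
```

Printing the pieces next to their IDs is a quick way to check how the vocabulary segments text in a given language.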

## Files

- `final_normalized_tokenizer.model`: SentencePiece model file
- `final_normalized_tokenizer.vocab`: Vocabulary file
- `tokenizer.config`: Tokenizer configuration
- `special_tokens_map.json`: Special tokens mapping
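
The vocabulary file can also be read without any tokenizer library. The sketch below assumes `final_normalized_tokenizer.vocab` follows the usual SentencePiece text layout, with one piece per line followed by a tab and its score; `read_vocab` is a hypothetical helper, and the layout should be checked against the actual file before relying on it.

```python
def read_vocab(path):
    """Parse a SentencePiece-style .vocab file into (piece, score) pairs.

    Assumes each non-empty line is "<piece>\t<score>".
    """
    entries = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\n")
            if not line:
                continue
            # Split on the last tab so the score is always the final field.
            piece, _, score = line.rpartition("\t")
            entries.append((piece, float(score)))
    return entries


# Usage (path is relative to a local copy of this repository):
# vocab = read_vocab("final_normalized_tokenizer.vocab")
# print(len(vocab), vocab[:5])
```

The `.model` file, by contrast, is a binary file intended to be loaded with the `sentencepiece` package, for example via `sentencepiece.SentencePieceProcessor(model_file=...)`.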

## Training Details

- Cluster: 8
- Vocabulary Size: 128000
- Model Type: SentencePiece Unigram
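
For orientation, the settings above map onto the `sentencepiece` trainer roughly as follows. This is a sketch under stated assumptions, not the actual training script: only the vocabulary size and model type come from this card. The corpus path is hypothetical, the model prefix is inferred from the file names listed under Files, and any other options used in the original run are unknown.

```python
def build_training_args(corpus_path, model_prefix="final_normalized_tokenizer"):
    """Collect keyword arguments for SentencePieceTrainer.train.

    Only vocab_size and model_type are taken from this card; the rest are assumptions.
    """
    return {
        "input": corpus_path,          # hypothetical path to the cluster 8 training text
        "model_prefix": model_prefix,  # assumed from the file names listed under Files
        "vocab_size": 128000,          # from Training Details
        "model_type": "unigram",       # from Training Details
    }


def train(corpus_path):
    """Train a Unigram model; writes <model_prefix>.model and <model_prefix>.vocab."""
    # Imported lazily so build_training_args works without sentencepiece installed.
    import sentencepiece as spm

    spm.SentencePieceTrainer.train(**build_training_args(corpus_path))


# Usage (requires `pip install sentencepiece` and a plain-text corpus):
# train("cluster_8_corpus.txt")
```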