BiLSTM + CRF + CharCNN with Spanish FastText embeddings (cc.es.300).
Trained on the Spanish portion of CoNLL-2002. This is a custom model, so it must be loaded with trust_remote_code=True.
| Metric | Value |
|---|---|
| F1 | 0.8059 |
| Precision | 0.8057 |
| Recall | 0.8061 |
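As a quick sanity check on the table, the sketch below recomputes F1 from the reported precision and recall. It assumes the reported F1 is the plain harmonic mean of those two numbers (the card does not say how the scores were averaged), so treat it as an illustration rather than the evaluation procedure actually used.

```python
# Values copied from the metrics table above.
precision = 0.8057
recall = 0.8061

# Assumption: F1 is the harmonic mean of the reported precision and recall.
f1 = 2 * precision * recall / (precision + recall)

print(f"{f1:.4f}")
```

Under that assumption the result agrees with the reported F1 to four decimal places.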
# Requires: pip install pytorch-crf
from transformers import AutoModelForTokenClassification, AutoConfig
import json
config = AutoConfig.from_pretrained("cvalenciaunivalle/bilstm-crf-fasttext-charcnn-conll-bs16", trust_remote_code=True)
model = AutoModelForTokenClassification.from_pretrained("cvalenciaunivalle/bilstm-crf-fasttext-charcnn-conll-bs16", trust_remote_code=True)
# Load the vocab (includes word2idx, char2idx, id2tag)
from huggingface_hub import hf_hub_download
vocab_path = hf_hub_download("cvalenciaunivalle/bilstm-crf-fasttext-charcnn-conll-bs16", "vocab.json")
with open(vocab_path, encoding="utf-8") as f:
    vocab = json.load(f)
# Predict
oraciones_tokenizadas = [["Juan", "vive", "en", "Bogotá", "."]]
tags = model.predict(oraciones_tokenizadas, vocab)
print(tags) # [['B-PER', 'O', 'O', 'B-LOC', 'O']]
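The predicted tags in the sample output look like the BIO scheme (`B-PER`, `B-LOC`, `O`). Assuming that is the case, a minimal sketch for grouping them into entity spans could look like the following. `bio_to_spans` is a hypothetical helper written for this example; it is not part of the model's remote code.

```python
def bio_to_spans(tokens, tags):
    """Group BIO tags into (entity_type, text) pairs.

    Hypothetical helper: assumes tags are "O", "B-<TYPE>" or "I-<TYPE>".
    """
    spans = []
    current_type, current_tokens = None, []

    def flush():
        # Close the entity being built, if any.
        if current_tokens:
            spans.append((current_type, " ".join(current_tokens)))

    for token, tag in zip(tokens, tags):
        starts_new = tag.startswith("B-") or (
            tag.startswith("I-") and tag[2:] != current_type
        )
        if starts_new:
            # A B- tag, or an I- tag with no matching open entity, starts a span.
            flush()
            current_type, current_tokens = tag[2:], [token]
        elif tag.startswith("I-"):
            # Continuation of the open entity.
            current_tokens.append(token)
        else:
            # "O" closes any open entity.
            flush()
            current_type, current_tokens = None, []
    flush()
    return spans


# Tokens and tags taken from the usage example above.
tokens = ["Juan", "vive", "en", "Bogotá", "."]
tags = ["B-PER", "O", "O", "B-LOC", "O"]
entities = bio_to_spans(tokens, tags)
print(entities)
```

Since `model.predict` returns one tag list per sentence, the helper would be called once per sentence, pairing each token list with its tag list.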