bertimbau-large-lener_br-onnx
ONNX conversion of Luciano/bertimbau-large-lener_br — Brazilian legal Named Entity Recognition (LeNER-Br, 6 classes) on top of BERTimbau-large — for Transformers.js (v3+) and ONNX Runtime.
- Conversion pipeline (reproducible): github.com/rchuluc/bertimbau-large-lener_br-onnx
- Encoder weights: unchanged from upstream. License: MIT.
Files
| file | dtype | size |
|---|---|---|
| onnx/model.onnx | fp32 | 1.33 GB |
| onnx/model_quantized.onnx | int8 (QUInt8 dynamic) | 335 MB |
Verified parity (vs PyTorch reference, 10 PT-BR legal sentences)
| metric | fp32 | q8 |
|---|---|---|
| entity parity (Optimum/ORT, aggregation_strategy="simple") | 26/26 (100%) | 25/26 (96.2%) |
| token-level argmax parity (PyTorch↔ONNX) | 196/196 (100%) | 193/196 (98.5%) |
| max \|Δlogit\| vs PyTorch | 2.1e-5 | — |
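To make the two token-level metrics in the table concrete, here is a minimal sketch of how argmax parity and the maximum absolute logit difference could be computed from two logit matrices of shape [tokens][labels]. The helper names (`argmax`, `logitParity`) and the toy numbers are illustrative assumptions; this is not the script from the conversion repository, and the values are not the model's real logits.

```javascript
// Hypothetical helpers; names are illustrative, not taken from the conversion repo.

// Index of the largest value in one token's logit row.
function argmax(row) {
  let best = 0;
  for (let i = 1; i < row.length; i++) {
    if (row[i] > row[best]) best = i;
  }
  return best;
}

// Compare two [tokens][labels] logit matrices of identical shape.
// Returns how many tokens get the same predicted label, plus the largest
// absolute difference between corresponding logits.
function logitParity(reference, candidate) {
  if (reference.length !== candidate.length) {
    throw new Error('token counts differ');
  }
  let matches = 0;
  let maxAbsDelta = 0;
  reference.forEach((refRow, t) => {
    const candRow = candidate[t];
    if (argmax(refRow) === argmax(candRow)) matches++;
    refRow.forEach((value, k) => {
      maxAbsDelta = Math.max(maxAbsDelta, Math.abs(value - candRow[k]));
    });
  });
  return { matches, total: reference.length, maxAbsDelta };
}

// Toy data (made up for illustration): the second token flips its argmax.
const ref = [[2.0, 0.5, -1.0], [0.1, 3.0, 0.25]];
const cand = [[1.9, 0.6, -1.0], [0.1, 0.25, 3.0]];
const parity = logitParity(ref, cand);
console.log(`${parity.matches}/${parity.total} tokens agree`, parity.maxAbsDelta);
```

In a real check, `reference` would hold the PyTorch logits and `candidate` the ONNX logits for the same tokenized sentences.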
Usage
Python (Optimum / ONNX Runtime) — recommended
from optimum.onnxruntime import ORTModelForTokenClassification
from transformers import AutoTokenizer, pipeline
model = ORTModelForTokenClassification.from_pretrained(
    "rchuluc/bertimbau-large-lener_br-onnx",
    subfolder="onnx",
    file_name="model_quantized.onnx",  # or "model.onnx"
)
tok = AutoTokenizer.from_pretrained("rchuluc/bertimbau-large-lener_br-onnx")
pipe = pipeline("ner", model=model, tokenizer=tok, aggregation_strategy="simple")
print(pipe("Conforme o art. 5º da Constituição Federal, todos são iguais perante a lei."))
Transformers.js
import { pipeline } from '@huggingface/transformers';
const ner = await pipeline(
  'token-classification',
  'rchuluc/bertimbau-large-lener_br-onnx',
  { dtype: 'q8' }, // or 'fp32'
);
const out = await ner('O Supremo Tribunal Federal julgou a ação em Brasília.', {
  ignore_labels: ['O'],
});
// Aggregate B-/I- + WordPiece (##) yourself.
⚠️ For legal text, prefer the Python/Optimum path. Transformers.js v3's TokenClassificationPipeline drops [UNK] tokens and lacks aggregation_strategy. Legal entities contain out-of-vocabulary ordinals and symbols (5º, nº → [UNK]), so spans like "art. 5º da Constituição Federal" lose pieces in JS. The ONNX model labels those tokens correctly (see token-level parity); the loss is in the JS pipeline, not the model.
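The manual aggregation mentioned in the Transformers.js snippet (merging B-/I- tags and WordPiece `##` continuations) could look roughly like the sketch below. It assumes each pipeline item has the shape `{ entity, score, index, word }`, with `index` being the token position; `aggregateEntities` is a hypothetical helper, and the sample tokens and scores are made up for illustration. It loosely mirrors the idea behind `aggregation_strategy="simple"` (group adjacent tokens, average their scores) without claiming to reproduce it exactly. Because it relies on consecutive `index` values, a token dropped by the pipeline (such as `[UNK]`) still splits a span.

```javascript
// Hypothetical helper: merges per-token predictions into entity spans.
// Assumes items shaped like { entity, score, index, word } (an assumption
// about the pipeline output, not something this model card specifies).
function aggregateEntities(tokens) {
  const open = [];
  let current = null;

  for (const tok of tokens) {
    if (tok.entity === 'O') {
      current = null; // outside any entity: close the open span
      continue;
    }

    const dash = tok.entity.indexOf('-');
    const prefix = dash === -1 ? 'B' : tok.entity.slice(0, dash);
    const type = dash === -1 ? tok.entity : tok.entity.slice(dash + 1);
    const isSubword = tok.word.startsWith('##');
    const adjacent = current !== null && tok.index === current.lastIndex + 1;

    // Extend the open span for WordPiece continuations and for
    // I- tags of the same entity type that directly follow it.
    const continues =
      adjacent && (isSubword || (prefix === 'I' && type === current.type));

    if (continues) {
      current.text += isSubword ? tok.word.slice(2) : ' ' + tok.word;
      current.scores.push(tok.score);
      current.lastIndex = tok.index;
    } else {
      current = {
        type,
        text: isSubword ? tok.word.slice(2) : tok.word,
        scores: [tok.score],
        lastIndex: tok.index,
      };
      open.push(current);
    }
  }

  // Report the mean token score per span.
  return open.map(({ type, text, scores }) => ({
    entity_group: type,
    word: text,
    score: scores.reduce((a, b) => a + b, 0) / scores.length,
  }));
}

// Made-up sample in the assumed output shape ('O' tokens already filtered).
const sample = [
  { entity: 'B-ORGANIZACAO', score: 0.99, index: 2, word: 'Supremo' },
  { entity: 'I-ORGANIZACAO', score: 0.98, index: 3, word: 'Tribunal' },
  { entity: 'I-ORGANIZACAO', score: 0.97, index: 4, word: 'Federal' },
  { entity: 'B-LOCAL', score: 0.95, index: 9, word: 'Bras' },
  { entity: 'I-LOCAL', score: 0.93, index: 10, word: '##ília' },
];
const spans = aggregateEntities(sample);
console.log(spans.map((s) => `${s.entity_group}: ${s.word}`));
```

With the real pipeline, `sample` would be replaced by the `out` array returned by `ner(...)`.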
Classes (LeNER-Br, 6 types)
ORGANIZACAO, PESSOA, TEMPO, LOCAL, LEGISLACAO, JURISPRUDENCIA
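For reference, the six types expand into per-token tags under the B-/I- scheme mentioned in the Transformers.js section. The sketch below assumes the usual BIO layout (one B- and one I- tag per type, plus O); the constant and function names are hypothetical, and the list order is not meant to reproduce the id order in the model's config.

```javascript
// The six LeNER-Br entity types listed in this model card.
const ENTITY_TYPES = [
  'ORGANIZACAO',
  'PESSOA',
  'TEMPO',
  'LOCAL',
  'LEGISLACAO',
  'JURISPRUDENCIA',
];

// Assumed BIO tagging: 'O' plus a B- and an I- tag for every type.
// The order here is illustrative only, not the model's id2label order.
const BIO_LABELS = ['O', ...ENTITY_TYPES.flatMap((t) => [`B-${t}`, `I-${t}`])];

// Hypothetical helper: strip the B-/I- prefix; returns null for 'O'
// or for any label outside the six known types.
function entityTypeOf(label) {
  const match = /^[BI]-(.+)$/.exec(label);
  return match && ENTITY_TYPES.includes(match[1]) ? match[1] : null;
}

console.log(BIO_LABELS.length, entityTypeOf('I-LEGISLACAO'));
```

A helper like `entityTypeOf` can be used to group or filter pipeline output by entity type.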
Attribution
- Original: Luciano/bertimbau-large-lener_br, fine-tuned on peluz/lener_br.
- Base: neuralmind/bert-large-portuguese-cased (BERTimbau).
- Dataset: LeNER-Br (Luz de Araujo et al., PROPOR 2018).
Not affiliated with the original authors. Cite the original work in any publication.