facebook/esm2_t6_8M_UR50D



TemporalMesh Transformer: 29.4 PPL at 48% compute — beats Mamba, new open-source architecture

#18 opened 21 days ago by

Request: DOI

#17 opened 6 months ago by

Request: DOI

#16 opened 10 months ago by

Request: DOI

#15 opened over 1 year ago by

Request: DOI

#13 opened over 1 year ago by

Attention matrix

#12 opened almost 3 years ago by

Lower precision

#11 opened almost 3 years ago by

PEFT LoRA and QLoRA

#10 opened almost 3 years ago by AmelieSchreiber

Accessing the embedding layer and generating embeddings step by step

#9 opened about 3 years ago by francescopatane

Understanding vocabulary size

#8 opened about 3 years ago by

How to visualize the attention matrix

#7 opened about 3 years ago by francescopatane

TorchScript export failed. Maybe related to sequence length cache.

#5 opened over 3 years ago by

Inferring device map for model

#4 opened over 3 years ago by

Passing parameters to the underlying model's forward

#3 opened over 3 years ago by