How to use nasa-impact/indus-sde-v0.2 with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline
pipe = pipeline("fill-mask", model="nasa-impact/indus-sde-v0.2")

# Load model directly
from transformers import AutoTokenizer, AutoModelForMaskedLM
tokenizer = AutoTokenizer.from_pretrained("nasa-impact/indus-sde-v0.2")
model = AutoModelForMaskedLM.from_pretrained("nasa-impact/indus-sde-v0.2")

This model starts from nasa-smd-ibm-v0.1. After its context size was extended, it was further pre-trained on the full Science Discovery Engine (SDE) website data with a masked language modelling task.
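As a quick way to try the fill-mask pipeline shown above, the following minimal sketch prints the top predictions for a masked sentence. The helper names `insert_mask` and `print_top_predictions`, the `[BLANK]` placeholder, and the example sentence are illustrative assumptions and are not part of the model card; the mask token is read from the tokenizer rather than hard-coded.

```python
def insert_mask(template, mask_token, placeholder="[BLANK]"):
    """Swap a single placeholder for the tokenizer's mask token (hypothetical helper)."""
    if template.count(placeholder) != 1:
        raise ValueError("template must contain exactly one placeholder")
    return template.replace(placeholder, mask_token)


def print_top_predictions(template, top_k=3, model_id="nasa-impact/indus-sde-v0.2"):
    """Download the model, fill the blank, and print the top_k candidate tokens."""
    # Imported here so that the helper above can be used without transformers installed.
    from transformers import pipeline

    pipe = pipeline("fill-mask", model=model_id)
    prompt = insert_mask(template, pipe.tokenizer.mask_token)
    for pred in pipe(prompt, top_k=top_k):
        # Each prediction is a dict with "token_str", "score", and "sequence" keys.
        print(f'{pred["token_str"].strip():<15} {pred["score"]:.3f}')
```

Calling `print_top_predictions("The rover collected soil [BLANK] on Mars.")` downloads the weights on first use and prints one candidate token per line with its score.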
Paper: INDUS-SDE: A Language Model for Scientific Content Curation and Discovery (KDD 2026, AI for Sciences Track).

INDUS-SDE prioritizes scientific terminology during pretraining via Weighted Dynamic Masking, which combines YAKE keyword masking with random masking, on NASA's noisy, web-sourced SDE corpus.

Code: NASA-IMPACT/mlm-fine-tuning
Sentence transformer: indus-sde-st-v0.2
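The idea of combining keyword masking with random masking can be illustrated with a small sketch. This is not the authors' implementation: the function `weighted_dynamic_mask`, the probabilities `p_keyword` and `p_random`, and the hard-coded keyword set (standing in for YAKE output) are all assumptions made for illustration. It assumes that keyword tokens are masked with a higher probability than other tokens and that masks are re-sampled on every call, which is one plausible reading of "weighted" and "dynamic".

```python
import random


def weighted_dynamic_mask(tokens, keywords, p_keyword=0.5, p_random=0.15,
                          mask_token="<mask>", rng=None):
    """Mask keyword tokens with probability p_keyword and all others with p_random.

    Returns (masked_tokens, labels), where labels holds the original token at
    masked positions and None elsewhere. All parameter values are assumptions.
    """
    rng = rng or random.Random()
    masked, labels = [], []
    for tok in tokens:
        # Keywords (e.g. scientific terms) get the higher masking probability.
        p = p_keyword if tok.lower() in keywords else p_random
        if rng.random() < p:
            masked.append(mask_token)
            labels.append(tok)
        else:
            masked.append(tok)
            labels.append(None)
    return masked, labels


# Hypothetical keyword set; in practice this would come from a keyword extractor.
keywords = {"aerosol", "modis"}
tokens = ["aerosol", "optical", "depth", "from", "MODIS"]

# Calling the function once per epoch re-samples the mask for the same text.
for epoch in range(2):
    print(weighted_dynamic_mask(tokens, keywords, rng=random.Random(epoch)))
```

Because the mask is drawn afresh on each call, the same sentence can expose different tokens across epochs while keyword tokens are still hidden more often on average.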
If you use INDUS-SDE, please cite:
@inproceedings{pantha2026indussde,
author = {Pantha, Nishan and Awale, Sajil and Kuruvanthodi, Vishnudev and KC, Simran and Ramasubramanian, Muthukumaran and Davis, Carson and Praveen, Bishwas and Foshee, Emily and Bhattacharjee, Bishwaranjan and Bugbee, Kaylin and Ramachandran, Rahul},
title = {{INDUS-SDE}: A Language Model for Scientific Content Curation and Discovery},
year = {2026},
isbn = {979-8-4007-2259-2},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
doi = {10.1145/3770855.3818847},
booktitle = {Proceedings of the 32nd ACM SIGKDD Conference on Knowledge Discovery and Data Mining V.2 (KDD '26)},
location = {Jeju Island, Republic of Korea},
series = {KDD '26}
}