---
language: en
tags:
- Text Classification
- TDAMM
- Multi-label Classification
- NASA
- Astrophysics
- Science Document Entity
base_model:
- nasa-impact/indus-sde-v0.2
library_name: transformers
license: apache-2.0
pipeline_tag: text-classification
---

# TDAMM Multi-Label Classification Model v2

The TDAMM (Time Domain Multi-Messenger Astronomy) model v2 categorizes NASA's time domain multi-messenger resources into one or more of 36 distinct categories identified by subject matter experts (SMEs). This updated version is fine-tuned from [INDUS-SDE](https://huggingface.co/nasa-impact/indus-sde-v0.2), a domain-adapted language model for Scientific Content Curation & Discovery in noisy contexts.

## Model Description

- **Base Model:** [nasa-impact/indus-sde-v0.2](https://huggingface.co/nasa-impact/indus-sde-v0.2), fine-tuned for multi-label classification
- **Architecture:** RobertaForSequenceClassification
- **Task:** Multi-label classification (36 categories)
- **Training Data:** NASA and non-NASA documents related to TDAMM topics identified by SMEs (same data split as [v1](https://huggingface.co/nasa-impact/tdamm-classification))

## Changes from v1

- **New Base Model:** Fine-tuned from INDUS-SDE v0.2 (previously [astroBERT](https://huggingface.co/adsabs/astroBERT) in [v1](https://huggingface.co/nasa-impact/tdamm-classification))
- Leverages domain-adapted embeddings from INDUS-SDE for improved understanding of scientific document entities

## Performance Metrics

| Metric | Value |
|--------|-------|
| Eval Accuracy | 0.657 |
| Weighted Precision (threshold=0.5) | 0.854 |

### Model Comparison

| Model | Weighted Precision (%) |
|-------|------------------------|
| ModernBERT-SDE | 45.2 |
| ModernBERT | 72.5 |
| INDUS | 73.4 |
| AstroBERT | 85.5 |
| **INDUS-SDE** | **85.3** |

*TDAMM classification performance (Weighted Precision). All models fine-tuned with focal loss.
INDUS-SDE matches domain-specific AstroBERT despite no astrophysics-specific pretraining.*

## Usage

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

tokenizer = AutoTokenizer.from_pretrained("nasa-impact/tdamm-classification-v2")
model = AutoModelForSequenceClassification.from_pretrained("nasa-impact/tdamm-classification-v2")

# Prepare input
text = "Your astronomical text here"
inputs = tokenizer(text, return_tensors="pt", padding=True, truncation=True, max_length=512)

# Get predictions
with torch.no_grad():
    outputs = model(**inputs)
    predictions = torch.sigmoid(outputs.logits)

# Convert to binary predictions (threshold = 0.5)
binary_predictions = (predictions > 0.5).int()

# Get predicted label indices
predicted_indices = torch.where(binary_predictions[0] == 1)[0].tolist()
print(f"Predicted indices: {predicted_indices}")
```

## Label Mapping During Inference

After obtaining predictions from the model, you can map the predicted label indices to their actual names using the `model.config.id2label` dictionary:

```python
# Example usage
predicted_indices = [0, 2, 5]
predicted_labels = [model.config.id2label[idx] for idx in predicted_indices]
print(predicted_labels)
```

## Related Models

- [TDAMM Classification v1](https://huggingface.co/nasa-impact/tdamm-classification) - Previous version based on astroBERT
- [INDUS-SDE v0.2](https://huggingface.co/nasa-impact/indus-sde-v0.2) - Base model for this fine-tuned version

## Citation

If you use this model, please cite:

```bibtex
@misc{tdamm-classification-v2,
  author = {NASA IMPACT},
  title = {TDAMM Multi-Label Classification Model v2},
  year = {2025},
  publisher = {Hugging Face},
  url = {https://huggingface.co/nasa-impact/tdamm-classification-v2}
}
```

## License

Apache 2.0
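
## Example: Predicted Labels with Scores

The thresholding step from Usage and the `id2label` lookup from Label Mapping can be combined into one helper that returns label names together with their probabilities. This is a minimal sketch, not part of the released model: the function name `labels_above_threshold` is hypothetical, and the label names and scores in the example are placeholders rather than the model's actual categories or outputs. With the real model, `probabilities` would be `predictions[0].tolist()` and `id2label` would be `model.config.id2label`.

```python
from typing import Dict, List, Tuple


def labels_above_threshold(
    probabilities: List[float],
    id2label: Dict[int, str],
    threshold: float = 0.5,
) -> List[Tuple[str, float]]:
    """Return (label, probability) pairs whose probability exceeds the threshold.

    Hypothetical helper; pairs are sorted from most to least confident.
    """
    selected = [
        (id2label[idx], prob)
        for idx, prob in enumerate(probabilities)
        if prob > threshold  # same strict comparison as the Usage snippet
    ]
    return sorted(selected, key=lambda pair: pair[1], reverse=True)


# Placeholder data for illustration only; with the real model, pass
# predictions[0].tolist() and model.config.id2label instead.
example_probs = [0.91, 0.12, 0.67, 0.50]
example_id2label = {0: "label_a", 1: "label_b", 2: "label_c", 3: "label_d"}
print(labels_above_threshold(example_probs, example_id2label))
```

The default of 0.5 mirrors the threshold used for the reported weighted precision. A higher threshold typically yields fewer but more confident labels.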