NLP Text Datasets
valiantlynxz/tripletex-tool-embeddings
Viewer
• Updated • 1.6k • 61
• 1
Alignment-Lab-AI/Evoltext
Viewer
• Updated • 70k • 10
Alignment-Lab-AI/Non-english-flan
Viewer
• Updated • 7.56M • 54
• 1
Alignment-Lab-AI/english-flan
Viewer
• Updated • 8.3M • 5
• 1
Alignment-Lab-AI/evoltext8192
Viewer
• Updated • 1.37k • 9
NLPC-UOM/document_alignment_dataset-Sinhala-Tamil-English
Preview
• Updated • 103
NLPC-UOM/sentence_alignment_dataset-Sinhala-Tamil-English
Updated • 558
• 2
PJMixers/AP-News-CGPT-Summarize-Not-Cucked-PreferenceShareGPT
Viewer
• Updated • 79 • 7
PJMixers/jondurbin_contextual-dpo-v0.1-PreferenceShareGPT
Viewer
• Updated • 1.37k • 3
• 1
PJMixers/nvidia_HelpSteer2-Correctness-Binary-Classification
Viewer
• Updated • 21.4k • 7
PJMixers/princeton-nlp_llama3-ultrafeedback-armorm-PreferenceShareGPT
Viewer
• Updated • 61.8k • 4
tum-nlp/cognitive-biases-in-llms
Viewer
• Updated • 30k • 58
• 1
AlekseyKorshuk/guanaco-english
Viewer
• Updated • 217k • 8
• 2
Eurolingua/tokenizer_final_dataset
Intuit-GenSRF/all_english_datasets
Viewer
• Updated • 2.92M • 20
MongoDB/english-words-definitions
Viewer
• Updated • 467k • 65
• 3
NewEden-Forge/Asstr-120K-English-Only
Viewer
• Updated • 121k • 25
• 3
breadlicker45/test-bread-tokenizer
Viewer
• Updated • 29M • 11
breadlicker45/tokenizer-dataset-test
Viewer
• Updated • 10k • 5
skbose/indian-english-nptel-test
Viewer
• Updated • 544k • 443
skbose/indian-english-nptel-v0
Viewer
• Updated • 544k • 1.13k
• 3
skbose/indian-english-nptel-v0-tags
Viewer
• Updated • 544k • 2
skbose/indian-english-nptel-v0-tags-gender
Viewer
• Updated • 544k • 9
skbose/indian-english-nptel-v0-tags-gender-accent
Viewer
• Updated • 544k • 8
theblackcat102/sharegpt-english
Viewer
• Updated • 50.5k • 520
• 14
0x22almostEvil/russe-semantics-sim
Viewer
• Updated • 201k • 9
0x22almostEvil/semantics-ws-qna-oa
Viewer
• Updated • 1.81k • 16
0x22almostEvil/ws-semantics-simnrel
Viewer
• Updated • 1.81k • 7
Abdelkareem/Arabic-article-summarization-30-000
Viewer
• Updated • 8.38k • 14
Abdelkareem/arabic-article-summarization
Viewer
• Updated • 5.87k • 8
Abdelkareem/arabic-bbc-news
Viewer
• Updated • 9.38k • 37
• 3
Abdelkareem/arabic_articles
Abdelkareem/arabic_summarization_text
Preview
• Updated • 22
Abdelkareem/arabic_tweets_classification
Viewer
• Updated • 13.2k • 13
• 1
Abdelkareem/rwkv_articles_30_000
Abdelkareem/rwkv_articles_xp3all
AlekseyKorshuk/dummy-text
Viewer
• Updated • 100 • 3
Aratako/Magpie-Tanuki-Instruction-100k-Embeddings
Viewer
• Updated • 100k • 10
Arun63/query-domain-classification-sharegpt
Viewer
• Updated • 12.2k • 7
Arun63/query-domain-classification-sharegpt-v2
Viewer
• Updated • 12k • 13
Arun63/rag_domain_query_classification
Viewer
• Updated • 25k • 7
Arun63/rag_dsl_filter_classification
Viewer
• Updated • 10k • 6
Arun63/text_to_dsl_opensearch
Viewer
• Updated • 7.33k • 6
Arun63/text_to_dsl_opensearch_new_2026
Viewer
• Updated • 9.25k • 11
Arun63/text_to_dsl_opensearch_v1_new
Viewer
• Updated • 951 • 10
Viewer
• Updated • 4.96k • 3
Ayushnangia/autotrain-data-qa_context
Preview
• Updated • 3
BEE-spoke-data/edgar-corpus
Viewer
• Updated • 517k • 22
BEE-spoke-data/financial-news-articles-filtered
Viewer
• Updated • 200k • 51
BEE-spoke-data/medium-articles-en
Viewer
• Updated • 180k • 40
• 2
BEE-spoke-data/yahoo_answers_topics-long-text
Viewer
• Updated • 3.49k • 12
• 2
BUT-FIT/CzechSingleDocumentSummarization
Viewer
• Updated • 90k • 10
Dahoas/full-single-context
Viewer
• Updated • 125k • 5
Viewer
• Updated • 89.5k • 44
• 1
Dahoas/sft-single-context
Viewer
• Updated • 35k • 69
• 1
Viewer
• Updated • 95.3k • 20
DamarJati/indocorpus-sastra
Viewer
• Updated • 28.8k • 26
Updated • 152
EMBO/sd-nlp-non-tokenized
Updated • 87
El-chapoo/Urdu-1M-news-text
Viewer
• Updated • 1.04M • 44
• 4
Emm9625/nsw_commonwealth_corpus
Viewer
• Updated • 223k • 3
Viewer
• Updated • 194k • 9
Emm9625/textwork-00-dedupe-0.5
Viewer
• Updated • 656 • 6
Emm9625/textwork-00-dedupe-0.75
Viewer
• Updated • 37k • 2
Emm9625/textwork-00-dedupe-0.8
Viewer
• Updated • 69.4k • 9
Emm9625/textwork-00-dedupe-0.85
Viewer
• Updated • 122k • 8
Emm9625/textwork-00-deduped
Viewer
• Updated • 71.2k • 2
Viewer
• Updated • 231k • 34
FreedomIntelligence/ApolloCorpus
Viewer
• Updated • 3.74M • 926
• 40
HiTZ/composite_corpus_es_v1.0
Viewer
• Updated • 526k • 62
HiTZ/composite_corpus_eseu_v1.0
Viewer
• Updated • 742k • 210
• 2
HiTZ/composite_corpus_eu_v2.1
Viewer
• Updated • 407k • 775
• 3
Viewer
• Updated • 4.13M • 402
• 2
Viewer
• Updated • 4.18M • 240
• 1
HuggingFaceTB/openstax_paragraphs
Viewer
• Updated • 77 • 256
• 6
HuggingFaceTB/smollm-corpus
Viewer
• Updated • 237M • 36.2k
• 469
HumynLabs/Arabic_Documents_Dataset_PDF
Viewer
• Updated • 127 • 498
HumynLabs/Chinese_Documents_Dataset_PDF
Viewer
• Updated • 30 • 336
HumynLabs/French_Documents_Dataset_PDF
Viewer
• Updated • 60 • 697
HumynLabs/German_Documents_Dataset_PDF
Viewer
• Updated • 54 • 442
HumynLabs/Italian_Documents_Dataset_PDF
Viewer
• Updated • 293 • 279
HumynLabs/Japanese_Documents_Dataset_PDF
Viewer
• Updated • 63 • 510
• 1
HumynLabs/Korean-Documents-Dataset
Viewer
• Updated • 3 • 30
HumynLabs/Russian_Documents_Dataset_PDF
Viewer
• Updated • 17 • 183
• 1
HumynLabs/Spanish_Documents_Dataset_PDF
Viewer
• Updated • 21 • 228
HumynLabs/Turkish_Documents_Datasets_PDF
Viewer
• Updated • 4 • 60
IDEA-CCNL/PretrainCorpusDemo
Updated • 242
• 7
Intuit-GenSRF/hackathon-somos-nlp-2023-suicide-comments-es
Viewer
• Updated • 10.1k • 3
Intuit-GenSRF/hackathon-somos-nlp-2023-suicide-comments-es-en
Viewer
• Updated • 8.82k • 26
Viewer
• Updated • 125 • 40
• 10
Viewer
• Updated • 8.4M • 263
• 5
Viewer
• Updated • 61.7M • 13
MongoDB/airbnb_embeddings
Viewer
• Updated • 5.56k • 486
• 7
MongoDB/devcenter-articles
Viewer
• Updated • 619 • 24
MongoDB/devcenter-articles-embedded
Viewer
• Updated • 218 • 13
MongoDB/subset_arxiv_papers_with_embeddings
Viewer
• Updated • 50k • 6.86k
• 2
MongoDB/tech-news-embeddings
Viewer
• Updated • 1.58M • 883
• 6
NLPC-UOM/AnanyaSinhalaNERDataset
Preview
• Updated • 5
NLPC-UOM/English-Tamil-Parallel-Corpus
Viewer
• Updated • 62.9k • 15
• 3
Viewer
• Updated • 22.1k • 25
• 3
Viewer
• Updated • 66.3k • 14
NLPC-UOM/Sentiment-tagger
Viewer
• Updated • 68.4k • 9
Viewer
• Updated • 1k • 11
NLPC-UOM/Sinhala-Neuspellcorrector
NLPC-UOM/Sinhala-News-Category-classification
Viewer
• Updated • 3.33k • 82
• 1
NLPC-UOM/Sinhala-News-Source-classification
Viewer
• Updated • 24.1k • 14
NLPC-UOM/Sinhala-POS-Data
NLPC-UOM/Sinhala-Stopword-list
Updated • 125
NLPC-UOM/Sinhala-Tamil-Aligned-Parallel-Corpus
Viewer
• Updated • 2.27k • 9
NLPC-UOM/Sinhala-news-clustering
NLPC-UOM/Sinhala-short-sentences
Updated • 8
• 1
NLPC-UOM/Student_feedback_analysis_dataset
Preview
• Updated • 31
• 6
NLPC-UOM/Tamil-Sinhala-short-sentence-similarity-deep-learning
Updated • 17
NLPC-UOM/Travel-Dataset-5000
Updated • 18
• 8
NLPC-UOM/ensi_enta_sita_curated_parallel_data
Preview
• Updated • 40
NLPC-UOM/nllb-top25k-ensi-cleaned
Viewer
• Updated • 25k • 9
• 2
NLPC-UOM/nllb-top25k-enta-cleaned
Viewer
• Updated • 25k • 4
NLPC-UOM/sinhala-sentiment-lexicon-generation
Viewer
• Updated • 1.84M • 109
• 18
Viewer
• Updated • 5.3k • 7
• 14
Viewer
• Updated • 2.45k • 54
OdiaGenAI/odia_context_10K_llama2_set
Viewer
• Updated • 10.5k • 6
• 1
OdiaGenAI/odia_context_qa_98k
Viewer
• Updated • 98k • 13
OdiaGenAI/odia_domain_context_train_v1
Viewer
• Updated • 10.5k • 8
OdiaGenAI/sentiment_analysis_hindi
Viewer
• Updated • 2.5k • 117
• 2
Open-Orca/gpt4-1m-orca-embeddings
Viewer
• Updated • 355k • 72
• 6
OusiaResearch/Aureth-Corpus-Hermes4.3-Generated
Viewer
• Updated • 654k • 66
• 14
Viewer
• Updated • 1.04k • 28
• 2
PJMixers/AP-News-2024-CGPT-Summarize-ShareGPT
Viewer
• Updated • 616 • 4
• 1
SEACrowd/indolem_sentiment
Updated • 54
SEACrowd/indonesian_news_dataset
Updated • 37
SEACrowd/mtop_intent_classification
Updated • 20
SeppeV/jokeTailor_embeddings
SeppeV/user_embeddings_jester
Viewer
• Updated • 45k • 12
SeppeV/user_embeddings_jester_bert
Viewer
• Updated • 45k • 6
allenai/olmoearth-paper-embeddings
Updated • 3.29k
• 8
argilla/banking_sentiment_setfit
Viewer
• Updated • 144 • 65
• 2
argilla/end2end_textclassification
Viewer
• Updated • 1k • 42
• 2
argilla/end2end_textclassification_with_metadata
Viewer
• Updated • 1k • 129
• 1
argilla/end2end_textclassification_with_suggestions_and_responses
Viewer
• Updated • 1k • 33
• 3
argilla/end2end_textclassification_with_vectors
Viewer
• Updated • 1k • 44
• 1
Viewer
• Updated • 38.1k • 22
Viewer
• Updated • 44.9k • 55
• 1
Viewer
• Updated • 21.4k • 184
• 40
Viewer
• Updated • 114 • 6
• 1
argilla/rag-embeddings-relevance-similarity
Viewer
• Updated • 6.25k • 18
• 1
argilla/textcat-tokencat-pii-per-domain
Viewer
• Updated • 2.1k • 12
astarostap/autonlp-data-antisemitism-2
Preview
• Updated • 127
• 1
Viewer
• Updated • 138 • 13
Viewer
• Updated • 78.6k • 4.43k
• 500
behavior-in-the-wild/content-behavior-corpus
Viewer
• Updated • 24.6k • 122
• 6
beyoru/synthetic_text_to_sql_filter
Viewer
• Updated • 71.2k • 18
breadlicker45/gender-bluesky-classification
Viewer
• Updated • 63.1k • 8
breadlicker45/gender-bluesky-classification-v2
Viewer
• Updated • 8.11k • 3
breadlicker45/gender-bluesky-classification-v3
Viewer
• Updated • 975k • 19
breadlicker45/gender-bluesky-classification-v4
Viewer
• Updated • 8.24M • 12
breadlicker45/gender-classification-v4.5
Viewer
• Updated • 79.6M • 8
Preview
• Updated • 6
chillies/course-review-multilabel-sentiment-analysis
Viewer
• Updated • 8.21k • 27
Viewer
• Updated • 27.9M • 5
Viewer
• Updated • 20.7M • 5
communityai/gretelai___synthetic_text_to_sql
Viewer
• Updated • 100k • 11
communityai/gretelai___synthetic_text_to_sql-10k
Viewer
• Updated • 10k • 14
communityai/gretelai___synthetic_text_to_sql-15k
Viewer
• Updated • 15k • 12
communityai/gretelai___synthetic_text_to_sql-20k
Viewer
• Updated • 20k • 15
communityai/gretelai___synthetic_text_to_sql-25k
Viewer
• Updated • 25k • 15
communityai/gretelai___synthetic_text_to_sql-30k
Viewer
• Updated • 30k • 15
cyberlangke/whitesilkmarisa-corpus
Preview
• Updated • 42
darkknight25/Adversarial_Machine_Learning_TextFooler_Dataset
Updated • 16
davanstrien/newspaper_navigator
Viewer
• Updated • 48M • 172
derek-thomas/autotrain-data-i-bert-twitter-sentiment
Preview
• Updated • 3
Viewer
• Updated • 82.5k • 4
• 1
Viewer
• Updated • 7.14k • 10
Viewer
• Updated • 3k • 4
dinushiTJ/nz_hansard_classification
Viewer
• Updated • 6.23k • 35
• 1
dinushiTJ/nz_hansard_classification_10k_tokens
Viewer
• Updated • 2.61k • 5
dinushiTJ/nz_hansard_classification_4096_tokens
Viewer
• Updated • 1.06k • 4
dinushiTJ/nz_research_commons_classification
Viewer
• Updated • 16.6k • 107
• 1
diwank/IBMDebaterEvidenceSentences
Viewer
• Updated • 5.78k • 6
diwank/imaginary-nlp-dataset
Viewer
• Updated • 1.04M • 25
• 1
diwank/llmlingua-compressed-text
Viewer
• Updated • 222k • 9
• 2
dmayhem93/random-walk-reddit-corpus-55-cleaned
Viewer
• Updated • 6.14M • 5
dmayhem93/random-walk-reddit-corpus-small
Viewer
• Updated • 8.29k • 10
dmayhem93/self-critiquing-base-topic-embeddings
Viewer
• Updated • 2.76k • 3
dmayhem93/top-2-reddit-corpus-small
Viewer
• Updated • 8.29k • 2
dmayhem93/top-n-reddit-corpus-55-cleaned
Viewer
• Updated • 6.14M • 5
femboysLover/gemini_trader_embeddings_dataset
Viewer
• Updated • 60.1k • 8
flamesbob/Line_style-Embedding
Updated • 1
• 3
free-law/alaska_embeddings
Viewer
• Updated • 10.7k • 1
free-law/arizona_embeddings
Viewer
• Updated • 28.4k • 12
• 1
free-law/arkansas_embeddings
Viewer
• Updated • 60.5k • 1
free-law/california_embeddings
Viewer
• Updated • 144k • 1
free-law/colorado_embeddings
Viewer
• Updated • 40.9k • 8
Viewer
• Updated • 56.4k • 11
Viewer
• Updated • 172k • 5
Viewer
• Updated • 18.4k • 1
free-law/idaho_embeddings
Viewer
• Updated • 19.4k • 1
Viewer
• Updated • 184k • 9
Viewer
• Updated • 92.7k • 10
Viewer
• Updated • 57.7k • 6
Viewer
• Updated • 79.6k • 1
Viewer
• Updated • 313k • 9
Viewer
• Updated • 91.7k • 1
Viewer
• Updated • 43.9k • 1
Viewer
• Updated • 82.8k • 3
Viewer
• Updated • 56.1k • 1
Viewer
• Updated • 60.4k • 1
Viewer
• Updated • 140k • 1
free-law/n_mar_i_embeddings
Viewer
• Updated • 395 • 1
free-law/navajo_nation_embeddings
Viewer
• Updated • 966 • 1
Viewer
• Updated • 108k • 4
• 1
Viewer
• Updated • 21.5k • 1
Viewer
• Updated • 18.5k • 8
Viewer
• Updated • 683k • 1
Viewer
• Updated • 67.1k • 1
Viewer
• Updated • 56.8k • 1
Viewer
• Updated • 239k • 6
Viewer
• Updated • 45.7k • 1
Viewer
• Updated • 41.9k • 6
Viewer
• Updated • 16.6k • 1
Viewer
• Updated • 38.4k • 1
Viewer
• Updated • 251k • 1
free-law/tribal_embeddings
Viewer
• Updated • 1.4k • 1
Viewer
• Updated • 2 • 1
Viewer
• Updated • 3.47k • 1
Viewer
• Updated • 27.7k • 6
Viewer
• Updated • 106k • 9
Viewer
• Updated • 49.1k • 3
Viewer
• Updated • 1.67M • 30.5k
• 243
hac541309/polyglot-ko-tokenizer-corpus
Viewer
• Updated • 11.8M • 475
• 1
hac541309/polyglot-ko-tokenizer-corpus-merge_ws
Viewer
• Updated • 11.8M • 163
harpreetsahota/CVPR_2024_Papers_with_Embeddings
Viewer
• Updated • 2.38k • 4
• 2
harpreetsahota/fiftyone-qa-with-qwen-embeddings
Viewer
• Updated • 28.1k • 8
harpreetsahota/testing_qwen3vl_embeddings
Viewer
• Updated • 412 • 442
haseong8012/Korean_Political-News_By_Media-Outlet
Updated • 62
Viewer
• Updated • 6.28k • 362
• 3
Updated • 3
• 1
irds/msmarco-document_trec-dl-hard
irds/nfcorpus_test_nontopic
irds/wapo_v2_trec-news-2018
irds/wapo_v2_trec-news-2019
irds/wapo_v3_trec-news-2020
jayavibhav/classification-gen-ai
Viewer
• Updated • 141k • 4
jayavibhav/new-updated-gen-classification
jayavibhav/text2sql-cleaned
Viewer
• Updated • 262k • 8
• 1
jondurbin/contextual-dpo-v0.1
Viewer
• Updated • 1.37k • 153
• 33
Viewer
• Updated • 4.34M • 4
jtatman/myers_briggs_text_classify
Viewer
• Updated • 8.68k • 18
justinphan3110/textquests
Viewer
• Updated • 407 • 8
justinphan3110/wmdp-bio-forget-corpus
Viewer
• Updated • 24.5k • 5
Updated • 51
• 2
lamini/bird_spider_train_text_to_sql
Viewer
• Updated • 17.5k • 29
• 5
Viewer
• Updated • 11k • 128
• 7
lamini/spider_text_to_sql
Viewer
• Updated • 8.03k • 63
• 9
lamini/text_to_sql_finetune
Viewer
• Updated • 17.5k • 62
• 15
lianghsun/free_english_news
Viewer
• Updated • 1.6M • 8
lianghsun/tw-gov-news-90M
Viewer
• Updated • 117k • 9
lianghsun/tw-hokkien-seed-text
Viewer
• Updated • 1.24M • 12
• 4
lianghsun/tw-law-article-evolution
Viewer
• Updated • 1.42M • 9
lianghsun/tw-law-article-num-convention
Viewer
• Updated • 2.61k • 42
lianghsun/tw-law-article-qa-DPO
Viewer
• Updated • 108 • 9
lianghsun/tw-legal-news-24M
Viewer
• Updated • 17.7k • 7
Viewer
• Updated • 171 • 62
• 4
Viewer
• Updated • 649k • 11
• 1
lianghsun/tw-structured-law-article
lightonai/embeddings-fine-tuning
Viewer
• Updated • 53.7M • 2.72k
• 21
lightonai/embeddings-pre-training
Viewer
• Updated • 1.38B • 2.73k
• 48
lightonai/embeddings-pre-training-curated
Viewer
• Updated • 665M • 6.2k
• 12
lightonai/embeddings-pre-training-test
Viewer
• Updated • 323k • 11
lightonai/embeddings_supervised
Viewer
• Updated • 3.43M • 1.26k
• 10
lightonai/nfcorpus-decontaminated
Viewer
• Updated • 18.3k • 48
lionelchg/dolly_classification
Viewer
• Updated • 2.14k • 17
Viewer
• Updated • 16.4k • 1.89k
• 24
litagin/jvnv_corpus_v1_no_nv
Viewer
• Updated • 1.62k • 651
• 4
Viewer
• Updated • 44.5k • 20
Viewer
• Updated • 509k • 440
• 11
manishiitg/en-embeddings-bge
Viewer
• Updated • 724k • 43
maxidl/FineNews-unfiltered
Viewer
• Updated • 31.4M • 1.92k
• 1
meandyou200175/word_embedding
Viewer
• Updated • 10.4k • 4
• 1
meandyou200175/word_embedding_200k
Viewer
• Updated • 207k • 20
Viewer
• Updated • 1.41k • 141
• 8
mlabonne/synthetic_text_to_sql-ShareGPT
Viewer
• Updated • 106k • 13
• 4
Viewer
• Updated • 200k • 5
multi-train/ccnews_title_text_1107
Viewer
• Updated • 200k • 4
nahiar/sentiment_3kdata-inggris
Viewer
• Updated • 2.9k • 9
nahiar/sentiment_clean_20k-60k_ham_only
Viewer
• Updated • 32.5k • 9
nahiar/sentiment_data-20-60k-labelling
Viewer
• Updated • 32.5k • 7
nahiar/sentiment_data-en-3k-labelling
Viewer
• Updated • 3.42k • 8
nahiar/sentiment_data-en-sentiment-3k
Viewer
• Updated • 3.42k • 8
nahiar/sentiment_data-inggris
Viewer
• Updated • 3.42k • 8
nahiar/sentiment_data-testing-300k-labelling
Viewer
• Updated • 300 • 6
nahiar/sentiment_data-testing-sentiment-300
Viewer
• Updated • 300 • 4
nahiar/sentiment_data-train-30k-id
Viewer
• Updated • 30k • 7
nahiar/sentiment_data-train-30k-sentimen-id-en
Viewer
• Updated • 47.5k • 6
nahiar/sentiment_data-train-bahasa-inggris
Viewer
• Updated • 31.2k • 7
nahiar/sentiment_data-train-sentiment-32k-up-id
Viewer
• Updated • 34.9k • 10
nahiar/sentiment_data-train-sentiment-40k-id-en
Viewer
• Updated • 32.5k • 6
nahiar/sentiment_data-train_db_sentimen_full
Viewer
• Updated • 67.4k • 12
nahiar/sentiment_data-train_db_sentimen_full-copy1
Viewer
• Updated • 67.4k • 9
nahiar/sentiment_inggris_3k_csv
Viewer
• Updated • 3.42k • 12
nahiar/sentiment_tmp_20k-100k_sentimen
Viewer
• Updated • 67.4k • 13
Viewer
• Updated • 200 • 3
Viewer
• Updated • 12.3k • 4
nekofura/tooth_classification
Preview
• Updated • 41
• 1
Viewer
• Updated • 9.72k • 3
nlplabtdtu/Extract-QA-question-answer-with-context
Viewer
• Updated • 7.6k • 2
nlplabtdtu/Extractive-QA-type-2
Viewer
• Updated • 9.22k • 1
Viewer
• Updated • 393k • 8
• 1
nlplabtdtu/OpenOrca-2-fact-vi
Viewer
• Updated • 2.72k • 11
nlplabtdtu/OpenOrca-conclusion-condition-vi
Viewer
• Updated • 1.11k • 6
nlplabtdtu/OpenOrca-describe-vi
Viewer
• Updated • 3.22k • 6
nlplabtdtu/OpenOrca-predict-people-action-vi
Viewer
• Updated • 2.15k • 8
nlplabtdtu/OpenOrca-solution-for-a-goal-vi
Viewer
• Updated • 1.25k • 11
nlplabtdtu/ai_la_trieu_phu
Viewer
• Updated • 13.6k • 1
nlplabtdtu/biosses-sts-vi
Viewer
• Updated • 100 • 3
Viewer
• Updated • 18.7k • 1
• 2
nlplabtdtu/classification_fqa
Viewer
• Updated • 1.95k • 3
nlplabtdtu/classification_fqa_cmc
Viewer
• Updated • 2.3k • 5
nlplabtdtu/classification_fqa_cmc_31
Viewer
• Updated • 341 • 7
Viewer
• Updated • 4.17k • 5
Viewer
• Updated • 6.38k • 5
Viewer
• Updated • 25.7k • 5
Viewer
• Updated • 1.13k • 4
nlplabtdtu/daily_dialog_gan
Viewer
• Updated • 13.1k • 5
nlplabtdtu/daily_dialog_gan_discriminator
Viewer
• Updated • 6.45k • 1
• 1
nlplabtdtu/data-synthetic-part-2
Viewer
• Updated • 467k • 33
• 1
Viewer
• Updated • 27 • 7
nlplabtdtu/diem_chuan_dai_hoc
Viewer
• Updated • 36.2k • 1
Viewer
• Updated • 2.39k • 5
nlplabtdtu/ds-synthetic-version-2
Viewer
• Updated • 416k • 24
nlplabtdtu/edu-crawl-with-date
Viewer
• Updated • 279k • 1
nlplabtdtu/edu_data_with_tag
Viewer
• Updated • 214k • 3
Viewer
• Updated • 251 • 3
nlplabtdtu/general-multi-choices-ailatrieuphu-870
Viewer
• Updated • 870 • 3
nlplabtdtu/general-multi-choices-food-100-v2
Viewer
• Updated • 78 • 4
nlplabtdtu/general-multi-choices-geo
Viewer
• Updated • 62 • 2
nlplabtdtu/general-multi-choices-tech
nlplabtdtu/general-people-multichoices-vi
Viewer
• Updated • 100 • 2
Viewer
• Updated • 1.31k • 3
Viewer
• Updated • 4.25k • 3
Viewer
• Updated • 9.72k • 6
nlplabtdtu/legal-citation-choosen-qa
Viewer
• Updated • 775 • 3
nlplabtdtu/legal-multiple-choice
Viewer
• Updated • 1.78k • 4
nlplabtdtu/legal_qa_with_old_docs
Viewer
• Updated • 16.9k • 5
Viewer
• Updated • 19 • 2
Viewer
• Updated • 27.2k • 2
nlplabtdtu/multi-choices-food-100-v2
Viewer
• Updated • 78 • 1
nlplabtdtu/multi-choices-text
Viewer
• Updated • 58.3k • 1
Viewer
• Updated • 56.2k • 42
Viewer
• Updated • 19.6k • 1
Viewer
• Updated • 20.6k • 5
Viewer
• Updated • 1k • 5
Viewer
• Updated • 48 • 9
Viewer
• Updated • 203 • 5
nlplabtdtu/review_edu_data
Viewer
• Updated • 684 • 6
• 1
Viewer
• Updated • 777k • 5
Viewer
• Updated • 16.4k • 4
nlplabtdtu/sentiment-analysis-UIT
Viewer
• Updated • 16.4k • 3
nlplabtdtu/sentiment-analysis-se
Viewer
• Updated • 494 • 3
Viewer
• Updated • 9.93k • 3
Viewer
• Updated • 3.11k • 6
Viewer
• Updated • 1.5k • 3
Viewer
• Updated • 3.75k • 3
Viewer
• Updated • 3k • 3
Viewer
• Updated • 1.19k • 3
nlplabtdtu/summarization_sft
Viewer
• Updated • 1.2k • 4
Viewer
• Updated • 66.4k • 4
Viewer
• Updated • 570 • 3
nlplabtdtu/tdtu_info_major
Viewer
• Updated • 44 • 3
Viewer
• Updated • 106 • 4
nlplabtdtu/train-tokenizor-ds-T5
Viewer
• Updated • 1.89M • 3
Viewer
• Updated • 14.3k • 1
nlplabtdtu/tvpl-chinh-sach-moi
Viewer
• Updated • 49.7k • 1
nlplabtdtu/tvpl-chinh-sach-moi-links
Viewer
• Updated • 49.7k • 1
nlplabtdtu/tvpl-qa-detail
Viewer
• Updated • 46.4k • 5
• 1
Viewer
• Updated • 329k • 75
nlplabtdtu/tvpl_split_error
Viewer
• Updated • 2.63k • 8
nlplabtdtu/uni_collection
Viewer
• Updated • 224k • 4
nlplabtdtu/uni_law_review_data
Viewer
• Updated • 10.4k • 3
nlplabtdtu/university-dataset
Viewer
• Updated • 214k • 1
nlplabtdtu/val-tokenizor-ds-T5
Viewer
• Updated • 210k • 2
Viewer
• Updated • 329k • 4
Viewer
• Updated • 330k • 5
Preview
• Updated • 21
nvidia/Nemotron-Terminal-Corpus
Viewer
• Updated • 366k • 4.11k
• 136
Viewer
• Updated • 13.8M • 114
openai/BrowseCompLongContext
Viewer
• Updated • 295 • 10.1k
• 53
Viewer
• Updated • 2.91k • 49
• 9
opensporks/hackernews-top
Viewer
• Updated • 503k • 10
opensporks/stocknewseventssentiment-snes-10
Viewer
• Updated • 218k • 5
• 1
Viewer
• Updated • 77.5k • 24
sam2ai/odia_cc_news_parallel
Viewer
• Updated • 6.15k • 6
sert121/spambase_dataset_balanced_text
Viewer
• Updated • 3.63k • 5
sert121/spambase_dataset_balanced_text_serialized
Viewer
• Updated • 3.26k • 4
sert121/synthetic_data_textual
Viewer
• Updated • 9.54k • 11
sert121/synthetic_data_textual_leavingT_Q_W_O_V_U_X
Viewer
• Updated • 9.54k • 8
sert121/synthetic_data_textual_leaving_T
Viewer
• Updated • 9.54k • 12
sert121/synthetic_data_textual_leaving_T_Q
Viewer
• Updated • 9.54k • 11
sert121/synthetic_data_textual_leaving_T_Q_W
Viewer
• Updated • 9.54k • 9
sert121/synthetic_data_textual_leaving_T_Q_W_L
Viewer
• Updated • 9.54k • 10
sert121/synthetic_data_textual_leaving_T_Q_W_L_N2
Viewer
• Updated • 9.54k • 8
sert121/synthetic_data_textual_leaving_T_Q_W_L_N2_O
Viewer
• Updated • 9.54k • 11
sert121/synthetic_data_textual_leaving_T_Q_W_L_N2_O_V
Viewer
• Updated • 9.54k • 9
sert121/synthetic_data_textual_leaving_T_Q_W_L_N2_O_V_U
Viewer
• Updated • 9.54k • 10
sert121/synthetic_data_textual_leaving_T_Q_W_L_N2_O_V_U_X
Viewer
• Updated • 9.54k • 11
sert121/synthetic_data_textual_leaving_T_Q_W_L_N2_O_V_U_X_A
Viewer
• Updated • 9.54k • 8
sert121/synthetic_data_textual_leaving_T_Q_W_L_N2_O_V_U_X_A_Z
Viewer
• Updated • 9.54k • 10
sert121/synthetic_data_textual_leaving_T_Q_W_L_N2_O_V_U_X_A_Z_R
Viewer
• Updated • 9.54k • 8
sert121/synthetic_data_textual_leaving_T_Q_W_L_N2_O_V_U_X_A_Z_R_B
Viewer
• Updated • 9.54k • 8
sert121/synthetic_data_textual_leaving_T_Q_W_L_N2_O_V_U_X_A_Z_R_B_S
Viewer
• Updated • 9.54k • 8
sert121/synthetic_data_textual_leaving_T_Q_W_L_N2_O_V_U_X_A_Z_R_B_S_M
Viewer
• Updated • 9.54k • 7
sert121/synthetic_data_textual_leaving_T_Q_W_L_N2_O_V_U_X_A_Z_R_B_S_M_P
Viewer
• Updated • 9.54k • 9
sert121/synthetic_data_textual_leaving_T_Q_W_O_V_U_X
Viewer
• Updated • 9.54k • 10
shawhin/ai-job-embedding-finetuning
Viewer
• Updated • 1.01k • 40
• 4
skbose/indian-english-nptel-v0-tags-gender-accent-text
Viewer
• Updated • 544k • 8
skbose/indian-english-nptel-v0-tags-gender-text
Viewer
• Updated • 544k • 10
skbose/indian-english-nptel-v0-tags-text
Viewer
• Updated • 544k • 8
swagat-panda/POS_language_detect_tagged
thangvip/combined-vietnamese-legal-text
Viewer
• Updated • 215k • 21
• 1
Viewer
• Updated • 329k • 9
thangvip/legal-documents-splits-filtered
Viewer
• Updated • 207k • 6
thangvip/legal-documents-splitted
Viewer
• Updated • 2.93M • 32
thangvip/legaldocuments-nli-test
Viewer
• Updated • 1.92k • 3
thangvip/legaldocuments-nli-test-v2
Viewer
• Updated • 1.42k • 3
thangvip/legaldocuments-nli-test-v3
Viewer
• Updated • 1.42k • 13
tum-nlp/German4All-Corpus
Preview
• Updated • 110
• 2
Updated • 70
• 9
Viewer
• Updated • 77.4k • 29
tum-nlp/sexism-socialmedia-balanced
Viewer
• Updated • 20.1k • 48
• 2
tum-nlp/span-similarity-dataset
Viewer
• Updated • 1k • 51
Viewer
• Updated • 144k • 14
valurank/News_Articles_Categorization
Viewer
• Updated • 3.72k • 148
• 5
Viewer
• Updated • 13.4k • 20
• 1
valurank/Topic_Classification
Viewer
• Updated • 22.5k • 87
• 4
Viewer
• Updated • 81 • 24
voidful/earica_text_train
Viewer
• Updated • 497k • 2
waifu-research-department/embeddings
Updated • 26
• 3
Viewer
• Updated • 367 • 62
• 1
wow2000/japanese_fake_news
Viewer
• Updated • 6.85k • 6
Jofthomas/gemma-japanese-english-translation
Viewer
• Updated • 16.2k • 10
Jofthomas/japanese-english-translation
Viewer
• Updated • 16.2k • 14
• 2
PinkPixel/vietnamese-to-english-crochet
Viewer
• Updated • 2k • 29
hac541309/multilingual_tokenizers
Preview
• Updated • 22
lianghsun/chinese-english-technical-patent-glossary
Viewer
• Updated • 3.25M • 38
• 2