---
tags:
- bertopic
library_name: bertopic
pipeline_tag: text-classification
---

# BERTopic_ArXiv

This is a [BERTopic](https://github.com/MaartenGr/BERTopic) model.
BERTopic is a flexible and modular topic modeling framework that allows for the generation of easily interpretable topics from large datasets.

## Usage

To use this model, please install BERTopic:

```
pip install -U bertopic
```

You can use the model as follows:

```python
from bertopic import BERTopic
topic_model = BERTopic.load("shantanudave/BERTopic_ArXiv")
topic_model.get_topic_info()
```
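
Beyond inspecting the topic table, a loaded model can assign topics to new text. The sketch below wraps `transform`, which returns predicted topic IDs together with probabilities. The helper name `assign_topics`, the `demo` function, and the sample sentences are hypothetical and only illustrate the call. It also assumes the saved configuration names an embedding model that BERTopic can load on its own; if it does not, one would need to be passed through the `embedding_model` argument of `BERTopic.load`.

```python
from typing import Any, List, Tuple


def assign_topics(topic_model: Any, docs: List[str]) -> List[Tuple[str, int]]:
    """Pair each document with the topic ID predicted by `transform`."""
    # transform returns (topic IDs, probabilities); only the IDs are used here.
    topics, _probabilities = topic_model.transform(docs)
    return [(doc, int(topic)) for doc, topic in zip(docs, topics)]


def demo() -> None:
    """Download the model from the Hub and label two made-up sentences."""
    from bertopic import BERTopic  # imported here so the helper has no hard dependency

    topic_model = BERTopic.load("shantanudave/BERTopic_ArXiv")
    docs = [
        "The delivery was fast and the parcel arrived intact.",  # hypothetical input
        "I could not log in and my shopping cart was emptied.",  # hypothetical input
    ]
    for doc, topic_id in assign_topics(topic_model, docs):
        print(topic_id, doc)
```

Calling `demo()` needs network access for the first download. The returned IDs can be looked up in the `Topic` column of `topic_model.get_topic_info()`.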

## Topic overview

* Number of topics: 18
* Number of training documents: 8526

<details>
<summary>Click here for an overview of all topics.</summary>

| Topic ID | Topic Keywords | Topic Frequency | Label |
|----------|----------------|-----------------|-------|
| 0 | payment - pay - card - bank - money | 742 | Payment Issues Detection |
| 1 | load - slow - search - article - doesnt | 705 | Slow Search Function |
| 2 | clothes - clothing - size - fashion - large size | 683 | Large Size Quality Clothing |
| 3 | bon - - - - | 668 | bon documents collection |
| 4 | clear - intuitive - clear easy - recommend - selection | 665 | Easy Clear Navigation |
| 5 | - - - - | 649 | Keyword-Driven Document Analysis |
| 6 | shopping - staff - friendly - store - satisfy | 578 | Friendly staff satisfaction |
| 7 | delivery - fast delivery - fast - shipping - ship | 563 | Fast Delivery Quality |
| 8 | cart - shop cart - log - password - add | 548 | Shopping Cart Issues |
| 9 | easy use - easy - use - use easy - quick easy | 531 | Quick & Easy Solutions |
| 10 | awesome - excellent - think - clearly - phenomenal | 462 | Really Phenomenal Clear Thinking |
| 11 | quality - price - quality quality - price quality - comfortable | 454 | Excellent Quality Price |
| 12 | work work - work - work quickly - flawlessly - work flawlessly | 390 | Efficient Flawless Work |
| 13 | super super - super - superb - superb super - super friendly | 349 | Superb Friendly Coat |
| 14 | really simple - ra - solve problem - control - satisfied easy | 145 | User-Friendly Problem Solver |
| 15 | clear clear - clear - fast clear - clear fast - super clear | 144 | Clear and Transparent Working |
| 16 | discover - stuff good - stuff - fact - clearly | 129 | Discovering Interesting Facts |
| 17 | satisfied - satisfaction - totally satisfied - satisfied good - completely satisfied | 121 | Utmost Satisfaction |

</details>
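
As a quick consistency check on the table above, the per-topic frequencies can be summed and compared with the stated number of training documents. The sketch below copies the frequencies by hand from the table and also derives each topic's share of the corpus. The variable names are illustrative only.

```python
# Topic ID -> Topic Frequency, copied from the topic overview table.
frequencies = {
    0: 742, 1: 705, 2: 683, 3: 668, 4: 665, 5: 649,
    6: 578, 7: 563, 8: 548, 9: 531, 10: 462, 11: 454,
    12: 390, 13: 349, 14: 145, 15: 144, 16: 129, 17: 121,
}

total = sum(frequencies.values())

# Percentage of training documents assigned to each topic, one decimal place.
shares = {topic: round(100 * count / total, 1) for topic, count in frequencies.items()}

print(len(frequencies), "topics,", total, "documents")
for topic, share in shares.items():
    print(f"topic {topic}: {share}%")
```

The total comes to 8526, which matches the document count listed above. Since the table has no `-1` row, this suggests that no training documents were left in BERTopic's outlier topic, although the card does not say so explicitly.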

## Training hyperparameters

* calculate_probabilities: True
* language: None
* low_memory: False
* min_topic_size: 10
* n_gram_range: (1, 1)
* nr_topics: None
* seed_topic_list: None
* top_n_words: 10
* verbose: True
* zeroshot_min_similarity: 0.7
* zeroshot_topic_list: None
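
The values listed above map onto keyword arguments of the `BERTopic` constructor. The following is a minimal sketch of passing them through, with a hypothetical helper name `build_model`. It will not reproduce the published topics on its own: the card does not state the embedding model, the UMAP and HDBSCAN settings, the vectorizer, or the training data, so those would fall back to library defaults.

```python
# Hyperparameters exactly as listed in this card.
hyperparameters = {
    "calculate_probabilities": True,
    "language": None,
    "low_memory": False,
    "min_topic_size": 10,
    "n_gram_range": (1, 1),
    "nr_topics": None,
    "seed_topic_list": None,
    "top_n_words": 10,
    "verbose": True,
    "zeroshot_min_similarity": 0.7,
    "zeroshot_topic_list": None,
}


def build_model(params=None):
    """Create an untrained BERTopic instance from the listed settings."""
    from bertopic import BERTopic  # imported lazily; requires `pip install bertopic`

    # Sub-models (embedding, UMAP, HDBSCAN, vectorizer) are NOT given in the
    # card, so the library defaults are used here. That is an assumption.
    return BERTopic(**(params or hyperparameters))
```

Note that `n_gram_range` is `(1, 1)` while several topic keywords are bigrams such as "fast delivery". This suggests a custom vectorizer or representation model was supplied during training, which the listed hyperparameters do not capture.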

## Framework versions

* Numpy: 1.23.5
* HDBSCAN: 0.8.33
* UMAP: 0.5.5
* Pandas: 1.3.5
* Scikit-Learn: 1.4.1.post1
* Sentence-transformers: 2.6.1
* Transformers: 4.39.3
* Numba: 0.59.1
* Plotly: 5.20.0
* Python: 3.10.13
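
If a loaded model behaves unexpectedly, comparing the local environment with the versions listed above can be a useful first step. The sketch below uses only the standard library. The mapping from the names in this card to PyPI distribution names (for example `umap-learn` for UMAP) is an assumption, and the function names are illustrative.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions from this card, keyed by assumed PyPI distribution name.
EXPECTED = {
    "numpy": "1.23.5",
    "hdbscan": "0.8.33",
    "umap-learn": "0.5.5",
    "pandas": "1.3.5",
    "scikit-learn": "1.4.1.post1",
    "sentence-transformers": "2.6.1",
    "transformers": "4.39.3",
    "numba": "0.59.1",
    "plotly": "5.20.0",
}


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def version_report(expected=EXPECTED):
    """Map each package to a (card version, installed version) pair."""
    return {pkg: (want, installed_version(pkg)) for pkg, want in expected.items()}


for pkg, (want, have) in version_report().items():
    status = "ok" if have == want else "differs"
    print(f"{pkg}: card={want} installed={have} ({status})")
```

A mismatch does not necessarily mean the model will fail to load. The report only shows where the environment differs from the one used for training.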