---
license: mit
language:
- pt
tags:
- tabular-classification
- graph-theory
- urban-mobility
- public-transport
- scikit-learn
- sao-paulo
- brazil
library_name: sklearn
datasets:
- cintia-shinoda/sp-transit-network-centrality
metrics:
- f1
- accuracy
---

# SP Transit Node Classifier

Classifies bus stops in São Paulo's transit network as **Hub**, **Intermediate**, or **Peripheral** from graph features and geographic coordinates.

The goal: **predict a stop's betweenness centrality class without computing betweenness itself**, which is computationally expensive on large networks.

## How to Use

```python
import joblib
import numpy as np
from huggingface_hub import hf_hub_download

# Download the serialized scikit-learn model from the Hugging Face Hub
path = hf_hub_download(
    repo_id="cintia-shinoda/sp-transit-node-classifier",
    filename="model.joblib",
)
model = joblib.load(path)

# Input columns, in this order: [degree, degree_centrality, closeness_centrality, lat, lon]
node = np.array([[8, 0.00036, 0.018, -23.55, -46.63]])

# Class codes: 0 = Peripheral, 1 = Intermediate, 2 = Hub
LABELS = {0: "Peripheral", 1: "Intermediate", 2: "Hub"}
pred = model.predict(node)
print(LABELS[int(pred[0])])
```

## Features

| Feature | Description |
|---------|-------------|
| degree | Number of direct connections to other stops |
| degree_centrality | Normalized degree centrality (degree divided by the number of other stops) |
| closeness_centrality | Closeness centrality, based on shortest-path distances to the other stops |
| lat | Latitude (decimal degrees) |
| lon | Longitude (decimal degrees) |

## Metrics

| Metric | Value |
|--------|-------|
| F1 Macro (test) | 0.59 |
| Accuracy (test) | 0.68 |
| F1 Macro (5-fold CV) | 0.43 |

## Feature Importance

| Feature | Importance |
|---------|------------|
| lat | 0.2793 |
| lon | 0.2604 |
| closeness_centrality | 0.2566 |
| degree | 0.1061 |
| degree_centrality | 0.0976 |

## Key Finding

Geographic position (lat/lon) is the strongest predictor of a stop's centrality class: together, the two coordinates account for about 54% of total feature importance. This suggests that high-centrality stops concentrate in specific corridors of São Paulo.

## Limitations

- Labels are derived from betweenness centrality quantiles, which is a simplified classification
- Trained on a single GTFS snapshot, so the model may not generalize to later network changes
- Does not consider temporal patterns (peak vs. off-peak)
- Class imbalance: 66% Peripheral, 24% Intermediate, 10% Hub

## Dataset

[SP Transit Network Centrality](https://huggingface.co/datasets/cintia-shinoda/sp-transit-network-centrality): 21,892 bus stops with graph centrality metrics.

## Citation

```bibtex
@misc{shinoda2026sp-classifier,
  author    = {Cintia Shinoda},
  title     = {SP Transit Node Classifier},
  year      = {2026},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/cintia-shinoda/sp-transit-node-classifier}
}
```
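
## Sketch: Reproducing Features and Labels

The input features and the quantile-based labels described above can be approximated with NetworkX. This is a minimal sketch, not the author's pipeline. The helper names (`node_features`, `betweenness_labels`), the undirected stop graph, the `coords` mapping, and the 0.66/0.90 quantile cut-offs (inferred from the reported 66/24/10 class split) are all assumptions. Under NetworkX's definition, `degree_centrality` is degree / (N − 1). The example input in How to Use (degree 8, degree_centrality 0.00036) is consistent with N = 21,892 stops, since 8 / 21,891 ≈ 0.000365.

```python
import networkx as nx
import numpy as np


def node_features(G, coords):
    """Hypothetical helper: one row per stop, columns in the model's input order
    [degree, degree_centrality, closeness_centrality, lat, lon].

    Assumes G is an undirected networkx.Graph of stops and coords maps node -> (lat, lon).
    """
    deg_c = nx.degree_centrality(G)     # degree / (N - 1)
    clo_c = nx.closeness_centrality(G)  # based on shortest-path distances
    nodes = list(G.nodes())
    X = np.array(
        [[G.degree(n), deg_c[n], clo_c[n], coords[n][0], coords[n][1]] for n in nodes],
        dtype=float,
    )
    return nodes, X


def betweenness_labels(G, nodes, q_low=0.66, q_high=0.90):
    """Hypothetical labeling: 0 = Peripheral, 1 = Intermediate, 2 = Hub.

    The quantile cut-offs are assumptions chosen to mirror the reported 66/24/10 split.
    """
    bc = nx.betweenness_centrality(G)  # the expensive step the classifier aims to avoid
    values = np.array([bc[n] for n in nodes])
    lo, hi = np.quantile(values, [q_low, q_high])
    # Above the upper cut-off -> Hub, above the lower cut-off -> Intermediate, else Peripheral
    return np.where(values > hi, 2, np.where(values > lo, 1, 0))


# Toy example: a 5-node star graph (center node 0) with made-up coordinates
G = nx.star_graph(4)
coords = {n: (-23.55 + 0.01 * n, -46.63) for n in G.nodes()}
nodes, X = node_features(G, coords)
labels = betweenness_labels(G, nodes)
print(X[0])    # center stop: degree 4, degree_centrality 1.0, closeness 1.0
print(labels)  # [2 0 0 0 0]
```

Many peripheral stops can share a betweenness of exactly 0, so ties may shift the class proportions away from 66/24/10 on real data. On the full network, `nx.betweenness_centrality` is the costly call; its `k` argument samples source nodes to trade exactness for speed.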