cintia-shinoda/sp-transit-node-classifier

Tags: Tabular Classification, GradientBoostingClassifier, public-transport

Commit 800c292 (verified), committed by cintia-shinoda on Apr 3

1 parent: a0cd463

Create README.md

Files changed (1)

README.md ADDED (+97 -0)

	@@ -0,0 +1,97 @@

+---
+license: mit
+language:
+  - pt
+tags:
+  - tabular-classification
+  - graph-theory
+  - urban-mobility
+  - public-transport
+  - scikit-learn
+  - sao-paulo
+  - brazil
+library_name: sklearn
+datasets:
+  - cintia-shinoda/sp-transit-network-centrality
+metrics:
+  - f1
+  - accuracy
+---
+# SP Transit Node Classifier
+Classifies bus stops in São Paulo's transit network as **Hub**, **Intermediate**, or **Peripheral** based on graph features and geographic coordinates.
+The goal is to **predict a stop's betweenness centrality class without computing betweenness itself**, which is expensive on large networks (Brandes' algorithm takes O(VE) time on an unweighted graph).
+## How to Use
+```python
+import joblib
+import numpy as np
+from huggingface_hub import hf_hub_download
+
+# Download the serialized model from the Hub and load it
+path = hf_hub_download(
+    repo_id="cintia-shinoda/sp-transit-node-classifier",
+    filename="model.joblib",
+)
+model = joblib.load(path)
+
+# Input columns: [degree, degree_centrality, closeness_centrality, lat, lon]
+node = np.array([[8, 0.00036, 0.018, -23.55, -46.63]])
+pred = model.predict(node)  # one integer class id per input row
+
+# Class ids: 0 = Peripheral, 1 = Intermediate, 2 = Hub
+labels = {0: "Peripheral", 1: "Intermediate", 2: "Hub"}
+print(labels[int(pred[0])])
+```
+## Features
+| Feature | Description |
+|---------|-------------|
+| degree | Number of direct connections |
+| degree_centrality | Normalized degree centrality |
+| closeness_centrality | Closeness centrality |
+| lat | Latitude |
+| lon | Longitude |
+## Metrics
+| Metric | Value |
+|--------|-------|
+| F1 Macro (test) | 0.59 |
+| Accuracy (test) | 0.68 |
+| F1 Macro (5-fold CV) | 0.43 |
+## Feature Importance
+| Feature | Importance |
+|---------|-----------|
+| lat | 0.2793 |
+| lon | 0.2604 |
+| closeness_centrality | 0.2566 |
+| degree | 0.1061 |
+| degree_centrality | 0.0976 |
+## Key Finding
+Geographic position is the strongest predictor of hub status: lat and lon together account for about 54% of total feature importance. This is consistent with high-centrality stops concentrating in specific corridors of São Paulo.
+## Limitations
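+- Labels are derived from betweenness centrality quantiles, so the three classes are a simplification of a continuous measure
+- Trained on a single GTFS snapshot, so it may not generalize after network changes
+- Does not consider temporal patterns (peak vs. off-peak)
+- Class imbalance (66% Peripheral, 24% Intermediate, 10% Hub): if the test split has a similar distribution, always predicting Peripheral would already score about 0.66 accuracy, so macro F1 is the more informative metric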
+## Dataset
+[SP Transit Network Centrality](https://huggingface.co/datasets/cintia-shinoda/sp-transit-network-centrality): 21,892 bus stops with graph centrality metrics.
+## Citation
+```bibtex
+@misc{shinoda2026sp-classifier,
+  author = {Cintia Shinoda},
+  title = {SP Transit Node Classifier},
+  year = {2026},
+  publisher = {Hugging Face},
+  url = {https://huggingface.co/cintia-shinoda/sp-transit-node-classifier}
+}
+```
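
The README's feature table names `degree`, `degree_centrality`, and `closeness_centrality` but does not say how they were computed. As a minimal sketch, assuming an undirected, unweighted stop graph and the standard textbook definitions (neither is confirmed by the README), one input row could be assembled as follows. The toy graph, the coordinates, and the helper names `bfs_distances` and `node_features` are hypothetical and exist only for illustration.

```python
from collections import deque


def bfs_distances(adj, source):
    """Hop distance from `source` to every node reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def node_features(adj, coords, node):
    """Return [degree, degree_centrality, closeness_centrality, lat, lon]."""
    n = len(adj)
    degree = len(adj[node])
    # Degree centrality: degree divided by the maximum possible degree.
    degree_centrality = degree / (n - 1)

    dist = bfs_distances(adj, node)
    total = sum(dist.values())
    reachable = len(dist) - 1  # exclude the node itself
    # Closeness: inverse mean distance to reachable nodes, scaled by the
    # fraction of the graph that is reachable (assumed normalization).
    if total > 0:
        closeness = (reachable / total) * (reachable / (n - 1))
    else:
        closeness = 0.0

    lat, lon = coords[node]
    return [degree, degree_centrality, closeness, lat, lon]


# Hypothetical toy network: four stops on a single line, A - B - C - D.
adj = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C"}}
coords = {
    "A": (-23.55, -46.63),
    "B": (-23.56, -46.64),
    "C": (-23.57, -46.65),
    "D": (-23.58, -46.66),
}

row = node_features(adj, coords, "B")
print(row)
```

The row follows the column order given in the README's usage snippet, so it could be wrapped in a 2D array and passed to `model.predict`. Note that closeness centrality itself needs one shortest-path search per node, so on a large network this step carries a non-trivial cost of its own.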