cintia-shinoda committed 800c292 (parent a0cd463): Create README.md

---
license: mit
language:
- pt
tags:
- tabular-classification
- graph-theory
- urban-mobility
- public-transport
- scikit-learn
- sao-paulo
- brazil
library_name: sklearn
datasets:
- cintia-shinoda/sp-transit-network-centrality
metrics:
- f1
- accuracy
---

# SP Transit Node Classifier

Classifies bus stops in São Paulo's transit network as **Hub**, **Intermediate**, or **Peripheral** from three graph features (degree, degree centrality, closeness centrality) and the stop's geographic coordinates.

The goal is to **predict a stop's betweenness centrality class without computing betweenness itself**, which is computationally expensive on large networks.
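
For reference, the betweenness centrality of a stop $v$ sums, over all pairs of other stops $s$ and $t$, the fraction of shortest paths between them that pass through $v$:

$$
C_B(v) = \sum_{s \neq v \neq t} \frac{\sigma_{st}(v)}{\sigma_{st}}
$$

where $\sigma_{st}$ is the number of shortest paths from $s$ to $t$ and $\sigma_{st}(v)$ is the number of those paths that pass through $v$. Brandes' algorithm, the standard exact method, runs in $O(nm)$ time on an unweighted graph with $n$ nodes and $m$ edges ($O(nm + n^2 \log n)$ with edge weights) because it performs one single-source shortest-path search from every node. Assuming an exact computation, the 21,892 stops in the dataset would need 21,892 such searches; the tool used to produce the labels is not specified here.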

## How to Use

```python
import joblib
import numpy as np
from huggingface_hub import hf_hub_download

# Download the serialized scikit-learn model from the Hub and load it
path = hf_hub_download(
    repo_id="cintia-shinoda/sp-transit-node-classifier",
    filename="model.joblib",
)
model = joblib.load(path)

# Input columns, in this order: [degree, degree_centrality, closeness_centrality, lat, lon]
node = np.array([[8, 0.00036, 0.018, -23.55, -46.63]])
pred = model.predict(node)

# Class encoding: 0 = Peripheral, 1 = Intermediate, 2 = Hub
labels = {0: "Peripheral", 1: "Intermediate", 2: "Hub"}
print(labels[int(pred[0])])
```
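
The example input uses `degree_centrality = 0.00036` for a stop of degree 8. That value is consistent with the common convention (used, for example, by NetworkX's `degree_centrality`) of dividing degree by n - 1, with n = 21,892 stops: 8 / 21,891 ≈ 0.000365. The sketch below builds an input row under that assumption; `build_feature_row` is a hypothetical helper, and the normalization is inferred from the numbers, not confirmed by this README.

```python
N_STOPS = 21_892  # number of stops in the source dataset


def build_feature_row(degree, closeness, lat, lon, n_stops=N_STOPS):
    """Hypothetical helper: returns [degree, degree_centrality, closeness_centrality, lat, lon].

    Assumes degree_centrality = degree / (n_stops - 1), the normalization
    NetworkX uses; the README does not confirm how the feature was computed.
    """
    degree_centrality = degree / (n_stops - 1)
    return [degree, degree_centrality, closeness, lat, lon]


row = build_feature_row(degree=8, closeness=0.018, lat=-23.55, lon=-46.63)
print(f"{row[1]:.6f}")  # 0.000365
```

Wrap the row in a 2-D array, for example `np.array([row])`, before passing it to `model.predict`.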

## Features

| Feature | Description |
|---------|-------------|
| degree | Number of direct connections to other stops in the graph |
| degree_centrality | Degree normalized by the number of other stops in the network |
| closeness_centrality | Closeness centrality, based on shortest-path distances from the stop to the other stops |
| lat | Latitude (decimal degrees) |
| lon | Longitude (decimal degrees) |
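
To illustrate how the three graph features relate, here is a dependency-free sketch on a hypothetical four-stop line. It uses the textbook definitions for a connected, unweighted, undirected graph: degree centrality is degree / (n - 1), and closeness is (n - 1) divided by the sum of shortest-path (hop) distances to all other nodes. These match NetworkX's `degree_centrality` and `closeness_centrality` on connected graphs; whether the model's features were produced with exactly these definitions (rather than, say, with edge weights) is an assumption.

```python
from collections import deque


def bfs_distances(adj, source):
    """Hop distances from source to every reachable node (unweighted BFS)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def graph_features(adj):
    """Per node: (degree, degree_centrality, closeness_centrality).

    Assumes a connected, undirected, unweighted graph with at least two
    nodes, given as an adjacency dict.
    """
    n = len(adj)
    features = {}
    for node, neighbors in adj.items():
        degree = len(neighbors)
        total_dist = sum(bfs_distances(adj, node).values())
        features[node] = (degree, degree / (n - 1), (n - 1) / total_dist)
    return features


# Hypothetical toy line of four stops: A - B - C - D
adj = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
for stop, (deg, deg_c, clo) in graph_features(adj).items():
    print(stop, deg, round(deg_c, 3), round(clo, 3))
# A 1 0.333 0.5
# B 2 0.667 0.75
# C 2 0.667 0.75
# D 1 0.333 0.5
```

The two middle stops score higher on both degree and closeness than the two end stops.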

## Metrics

| Metric | Value |
|--------|-------|
| Macro F1 (test set) | 0.59 |
| Accuracy (test set) | 0.68 |
| Macro F1 (5-fold cross-validation) | 0.43 |
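
Because the classes are imbalanced (66% Peripheral, 24% Intermediate, 10% Hub, see Limitations), accuracy alone is hard to read. The sketch below computes accuracy and macro F1 (the unweighted mean of per-class F1, the same quantity as scikit-learn's `f1_score(..., average="macro")`) for a trivial baseline that always predicts Peripheral, on a hypothetical 100-stop test set with those class shares. The test split's actual class distribution is not reported, so treat the numbers as an illustration.

```python
def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of per-class F1; a class with no true or predicted samples scores 0."""
    scores = []
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn  # F1 = 2TP / (2TP + FP + FN)
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical test set with the README's class shares:
# 66 Peripheral (0), 24 Intermediate (1), 10 Hub (2)
y_true = [0] * 66 + [1] * 24 + [2] * 10
y_majority = [0] * len(y_true)  # baseline: always predict Peripheral

print(f"accuracy = {accuracy(y_true, y_majority):.2f}")  # accuracy = 0.66
print(f"macro F1 = {macro_f1(y_true, y_majority, labels=[0, 1, 2]):.2f}")  # macro F1 = 0.27
```

Under that assumption, macro F1 separates the model from the majority-class baseline (0.59 vs. about 0.27) far more clearly than accuracy does (0.68 vs. 0.66).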

## Feature Importance

| Feature | Importance |
|---------|-----------|
| lat | 0.2793 |
| lon | 0.2604 |
| closeness_centrality | 0.2566 |
| degree | 0.1061 |
| degree_centrality | 0.0976 |
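
If the saved model is a scikit-learn tree ensemble (the README does not name the estimator, so this is an assumption), importances like these come from its `feature_importances_` attribute, which is normalized to sum to 1. The hypothetical helper below pairs them with the feature order from the How to Use section and reports the combined share of the two coordinates. A stub object holding the values from the table stands in for the real model.

```python
from types import SimpleNamespace

FEATURES = ["degree", "degree_centrality", "closeness_centrality", "lat", "lon"]


def ranked_importances(model, feature_names=FEATURES):
    """Hypothetical helper: (feature, importance) pairs, highest first.

    Assumes `model` exposes `feature_importances_` (true for scikit-learn
    tree ensembles such as RandomForestClassifier, but not for every estimator).
    """
    pairs = zip(feature_names, model.feature_importances_)
    return sorted(pairs, key=lambda p: p[1], reverse=True)


# Stub standing in for the loaded model, holding the values from the table
stub = SimpleNamespace(feature_importances_=[0.1061, 0.0976, 0.2566, 0.2793, 0.2604])

ranking = ranked_importances(stub)
for name, value in ranking:
    print(f"{name:22s} {value:.4f}")

geo_share = sum(v for name, v in ranking if name in ("lat", "lon"))
print(f"lat + lon: {geo_share:.4f}")  # lat + lon: 0.5397
```

With the real model, replace `stub` with the object returned by `joblib.load(path)`.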
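
## Key Finding

Geographic position is the strongest predictor of hub status: latitude is the single most important feature, and latitude and longitude together account for about 54% of total importance (0.2793 + 0.2604). This suggests that high-centrality stops concentrate in specific corridors of São Paulo.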

## Limitations

- Labels are derived from betweenness centrality quantiles, a simplified three-class discretization of a continuous measure.
- Trained on a single GTFS snapshot, so it may not generalize after changes to the network.
- Does not consider temporal patterns (peak vs. off-peak service).
- Class imbalance: 66% Peripheral, 24% Intermediate, 10% Hub.
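
The quantile cut points are not stated. The class shares (66% / 24% / 10%) are consistent with thresholds at the 66th and 90th percentiles of betweenness, so the sketch below assumes those. `label_by_quantiles` is a hypothetical helper, not the author's labeling code; it uses NumPy's `np.quantile` and `np.digitize` to map each stop to 0 = Peripheral, 1 = Intermediate, 2 = Hub.

```python
import numpy as np


def label_by_quantiles(betweenness, q_low=0.66, q_high=0.90):
    """Hypothetical labeling: 0 = Peripheral, 1 = Intermediate, 2 = Hub.

    The 0.66 / 0.90 cut points are an assumption inferred from the class shares.
    """
    b = np.asarray(betweenness, dtype=float)
    low, high = np.quantile(b, [q_low, q_high])
    # right=True: b <= low -> 0, low < b <= high -> 1, b > high -> 2
    return np.digitize(b, [low, high], right=True)


# Toy input: 100 distinct betweenness values
labels = label_by_quantiles(np.arange(100))
print(np.bincount(labels))  # [66 24 10]
```

With heavily tied values (for example, many stops with zero betweenness), the realized class shares can drift from the nominal quantiles, because every tied value falls on the same side of a cut point.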

## Dataset

[SP Transit Network Centrality](https://huggingface.co/datasets/cintia-shinoda/sp-transit-network-centrality): 21,892 bus stops with graph centrality metrics.

## Citation

```bibtex
@misc{shinoda2026sp-classifier,
  author    = {Cintia Shinoda},
  title     = {SP Transit Node Classifier},
  year      = {2026},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/cintia-shinoda/sp-transit-node-classifier}
}
```