Model Card: DQN Traffic Light Controller (Belgradzka-KEN, Warsaw)
Model Summary
This project contains a Deep Q-Network (DQN) model trained to control traffic lights at the Belgradzka-KEN intersection in Warsaw.
The model learns signal switching behavior in a SUMO simulation to improve traffic flow metrics such as queue length and delay.
Intended Use
- Primary use: Research and demonstration of reinforcement learning for traffic signal control in SUMO.
- Deployment scope: This model is trained for one specific intersection only (Belgradzka-KEN).
- Important limitation: It is not expected to work correctly on other intersections without retraining.
Training Setup
- Algorithm: DQN
- Final selected model: dqn_fixed_lr_5e-4_225k_final.zip
  - Training steps: 225,000
  - Learning rate: 0.0005
- Traffic during training:
  - Random traffic with fixed car arrival interval
  - Random traffic with random car arrival interval (closer to real-world variability)
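The two traffic regimes can be illustrated with a minimal sketch. It is one plausible reading, not the project's actual generator: the function names are hypothetical, and modelling the "random arrival interval" with exponentially distributed gaps is an assumption, since the card does not state which distribution was used.

```python
import random


def fixed_arrivals(interval_s, horizon_s):
    """Departure times spaced exactly `interval_s` seconds apart (hypothetical helper)."""
    return [k * interval_s for k in range(int(horizon_s // interval_s))]


def random_arrivals(mean_interval_s, horizon_s, seed=42):
    """Departure times with random gaps between cars.

    Assumption: gaps are exponentially distributed with the given mean;
    the model card does not specify the distribution.
    """
    rng = random.Random(seed)
    times, t = [], 0.0
    while True:
        t += rng.expovariate(1.0 / mean_interval_s)
        if t >= horizon_s:
            return times
        times.append(t)


print(fixed_arrivals(5, 20))  # [0, 5, 10, 15]
print(len(random_arrivals(5, 3600)))  # count varies with the seed
```

Departure times like these could then be written into a SUMO route file, one vehicle per entry.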
Hyperparameter Configurations
Best / Stable Model (dqn_fixed_lr_5e-4_225k_final.zip)
This is the final selected model. Compared with the unstable variant, it uses a lower learning rate and more frequent target-network updates, which gave more stable behavior and better final metrics.
training:
  total_timesteps: 225_000
  seed: 42
  learning_rate: 0.0005
  buffer_size: 50000
  learning_starts: 1000
  batch_size: 64
  gamma: 0.99
  train_freq: 1
  target_update_interval: 500
  exploration_fraction: 0.2
  exploration_initial_eps: 1.0
  exploration_final_eps: 0.05
  device: auto
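To make the exploration settings concrete, here is a small sketch of the epsilon schedule they imply. It assumes a linear decay over the first `exploration_fraction` of training, which is how Stable-Baselines3's DQN interprets parameters with these names. The card does not name the training library, so treat that as an assumption. The helper `epsilon_at` is hypothetical.

```python
def epsilon_at(step, total_timesteps=225_000, exploration_fraction=0.2,
               initial_eps=1.0, final_eps=0.05):
    """Exploration rate at a given training step.

    Assumption: epsilon decays linearly from `initial_eps` to `final_eps`
    over the first `exploration_fraction` of training, then stays constant.
    """
    progress = step / total_timesteps
    if progress >= exploration_fraction:
        return final_eps
    return initial_eps + progress * (final_eps - initial_eps) / exploration_fraction


# Number of steps spent decaying: 20% of 225,000.
decay_steps = round(0.2 * 225_000)
print(decay_steps)  # 45000

for step in (0, 22_500, 45_000, 225_000):
    print(step, round(epsilon_at(step), 3))
```

Under this reading the agent acts fully at random at step 0, reaches the final exploration rate of 0.05 after 45,000 steps, and keeps it for the remaining 180,000 steps.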
Unstable Model (dqn_fixed_tui_1e3_225k_unstable.zip)
This model was intentionally kept for comparison. With a higher learning rate and a slower target-network update schedule, training was less stable and produced worse traffic control quality.
training:
  total_timesteps: 225_000
  seed: 42
  learning_rate: 0.001
  buffer_size: 50000
  learning_starts: 1000
  batch_size: 64
  gamma: 0.99
  train_freq: 1
  target_update_interval: 1000
  exploration_fraction: 0.2
  exploration_initial_eps: 1.0
  exploration_final_eps: 0.05
  device: auto
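The two configurations differ in only two fields, which a short comparison makes explicit. The dictionaries below copy the values listed in this card; `diff_configs` is a hypothetical helper written for illustration.

```python
stable = {
    "total_timesteps": 225_000, "seed": 42, "learning_rate": 0.0005,
    "buffer_size": 50000, "learning_starts": 1000, "batch_size": 64,
    "gamma": 0.99, "train_freq": 1, "target_update_interval": 500,
    "exploration_fraction": 0.2, "exploration_initial_eps": 1.0,
    "exploration_final_eps": 0.05, "device": "auto",
}
unstable = {
    "total_timesteps": 225_000, "seed": 42, "learning_rate": 0.001,
    "buffer_size": 50000, "learning_starts": 1000, "batch_size": 64,
    "gamma": 0.99, "train_freq": 1, "target_update_interval": 1000,
    "exploration_fraction": 0.2, "exploration_initial_eps": 1.0,
    "exploration_final_eps": 0.05, "device": "auto",
}


def diff_configs(a, b):
    """Return {key: (value_in_a, value_in_b)} for every key whose value differs."""
    return {k: (a[k], b[k]) for k in a if a[k] != b[k]}


print(diff_configs(stable, unstable))
# {'learning_rate': (0.0005, 0.001), 'target_update_interval': (500, 1000)}
```

Everything else, including the seed and the exploration schedule, is identical, so differences in behavior between the two runs trace back to these two settings.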
Demonstration
GUI recording showing how the trained model controls traffic lights:
Training Results
Effect of Training Steps (Fixed Arrival Interval)
This plot shows performance trends for models trained with different numbers of steps on random traffic with fixed car arrival interval:
Effect of Training Steps (Random Arrival Interval)
This plot shows performance trends for models trained with different numbers of steps on random traffic with random car arrival interval, intended to better simulate real-world traffic:
Hyperparameter Tuning
This plot compares how different hyperparameter settings affect model performance, tested on:
- low fixed arrival interval
- medium fixed arrival interval
- high fixed arrival interval
- truly random traffic (mix of low/medium/high, closer to real world)
Final Evaluation and Stability Comparison
Two models were compared:
- Stable/best model: selected final model
- Unstable model: model with target_update_interval = 1000 (instead of 500) and learning_rate = 0.001 (instead of 0.0005)
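The comparison shows how sensitive DQN training is to hyperparameters: with every other setting held equal, the learning rate and the target-network update interval decide whether training is stable.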
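A comparison of this kind can be sketched as a rollout loop that averages per-step metrics. This is not the repository's evaluation code. The Gymnasium-style `reset`/`step` interface, the `info` keys `queue_length` and `delay`, and the `ToyIntersection` stand-in environment are all assumptions made so the example runs without SUMO.

```python
from statistics import mean


class ToyIntersection:
    """Stand-in environment (hypothetical): queue grows by 1 per step, action 1 clears it."""

    def reset(self):
        self.queue, self.t = 0, 0
        return self.queue, {}

    def step(self, action):
        self.t += 1
        self.queue = 0 if action == 1 else self.queue + 1
        info = {"queue_length": self.queue, "delay": 2.0 * self.queue}
        truncated = self.t >= 10  # episode ends after 10 steps
        return self.queue, -self.queue, False, truncated, info


def evaluate(policy, env, max_steps=3600):
    """Run one episode and return mean queue length and mean delay.

    `policy` is any callable mapping an observation to an action. With a
    Stable-Baselines3 model this could be
    `lambda obs: model.predict(obs, deterministic=True)[0]`.
    """
    obs, info = env.reset()
    queues, delays = [], []
    for _ in range(max_steps):
        obs, reward, terminated, truncated, info = env.step(policy(obs))
        queues.append(info["queue_length"])
        delays.append(info["delay"])
        if terminated or truncated:
            break
    return {"mean_queue_length": mean(queues), "mean_delay": mean(delays)}


print(evaluate(lambda obs: 1, ToyIntersection()))  # always clears the queue
print(evaluate(lambda obs: 0, ToyIntersection()))  # never clears the queue
```

Running the same loop once per method, with identical traffic and seeds, yields one mean delay and one mean queue length per method.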
Mean Delay by Method
Mean Queue Length by Method
Usage
If you want to train, evaluate, or use these models yourself, see the repository:





