Model Card: DQN Traffic Light Controller (Belgradzka-KEN, Warsaw)

Model Summary

This project contains a Deep Q-Network (DQN) model trained to control traffic lights at the Belgradzka-KEN intersection in Warsaw.

The model learns signal-switching behavior in a SUMO simulation, with the goal of reducing traffic-flow metrics such as queue length and delay.

Intended Use

  • Primary use: Research and demonstration of reinforcement learning for traffic signal control in SUMO.
  • Deployment scope: This model is trained for one specific intersection only (Belgradzka-KEN).
  • Important limitation: It is not expected to work correctly on other intersections without retraining.

Training Setup

  • Algorithm: DQN
  • Final selected model:
    • Training steps: 225,000
    • Learning rate: 0.0005
  • Traffic during training:
    • Random traffic with fixed car arrival interval
    • Random traffic with random car arrival interval (closer to real-world variability)

Hyperparameter Configurations

Best / Stable Model (dqn_fixed_lr_5e-4_225k_final.zip)

This is the final selected model. Compared with the unstable variant, it uses a lower learning rate (0.0005 vs. 0.001) and more frequent target-network updates (every 500 steps vs. every 1,000), which gave more stable behavior and better final metrics.

training:
    total_timesteps: 225_000
    seed: 42
    learning_rate: 0.0005
    buffer_size: 50000
    learning_starts: 1000
    batch_size: 64
    gamma: 0.99
    train_freq: 1
    target_update_interval: 500
    exploration_fraction: 0.2
    exploration_initial_eps: 1.0
    exploration_final_eps: 0.05
    device: auto
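
The exploration settings above can be read as a linear epsilon-greedy schedule. The sketch below assumes the semantics used by Stable-Baselines3's DQN, whose parameter names this config matches; the card itself does not name the library, and the function name epsilon_at is hypothetical.

```python
def epsilon_at(step, total_timesteps=225_000, exploration_fraction=0.2,
               initial_eps=1.0, final_eps=0.05):
    """Exploration rate at a given training step.

    Assumption: epsilon decays linearly from initial_eps to final_eps over
    the first exploration_fraction * total_timesteps steps, then stays
    constant (Stable-Baselines3-style semantics, not confirmed by the card).
    """
    decay_steps = exploration_fraction * total_timesteps
    progress = min(step / decay_steps, 1.0)  # fraction of the decay completed
    return initial_eps + progress * (final_eps - initial_eps)


if __name__ == "__main__":
    for step in (0, 22_500, 45_000, 225_000):
        print(step, round(epsilon_at(step), 3))
```

Under these assumptions, exploration decays over the first 45,000 steps (20% of 225,000) and stays at 0.05 for the remaining 180,000.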

Unstable Model (dqn_fixed_tui_1e3_225k_unstable.zip)

This model was intentionally kept for comparison. With a higher learning rate (0.001 vs. 0.0005) and less frequent target-network updates (every 1,000 steps vs. every 500), training was less stable and produced worse traffic control quality.

training:
    total_timesteps: 225_000
    seed: 42
    learning_rate: 0.001
    buffer_size: 50000
    learning_starts: 1000
    batch_size: 64
    gamma: 0.99
    train_freq: 1
    target_update_interval: 1000
    exploration_fraction: 0.2
    exploration_initial_eps: 1.0
    exploration_final_eps: 0.05
    device: auto
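
To make the difference between the two configurations explicit, the sketch below diffs them and estimates how often each run syncs its target network. The helper names are hypothetical, and the sync count assumes one sync every target_update_interval environment steps over the whole run (single environment), which the card does not state.

```python
# Values copied from the two configuration blocks in this card.
STABLE = {
    "total_timesteps": 225_000, "seed": 42, "learning_rate": 0.0005,
    "buffer_size": 50_000, "learning_starts": 1_000, "batch_size": 64,
    "gamma": 0.99, "train_freq": 1, "target_update_interval": 500,
    "exploration_fraction": 0.2, "exploration_initial_eps": 1.0,
    "exploration_final_eps": 0.05, "device": "auto",
}
# The unstable variant overrides only two keys.
UNSTABLE = {**STABLE, "learning_rate": 0.001, "target_update_interval": 1_000}


def diff_configs(a, b):
    """Return {key: (value_in_a, value_in_b)} for keys whose values differ."""
    return {k: (a[k], b.get(k)) for k in a if a[k] != b.get(k)}


def target_syncs(cfg):
    """Approximate number of target-network syncs (assumed schedule)."""
    return cfg["total_timesteps"] // cfg["target_update_interval"]


if __name__ == "__main__":
    print(diff_configs(STABLE, UNSTABLE))
    print(target_syncs(STABLE), target_syncs(UNSTABLE))
```

Under that assumption, the stable run syncs the target network about 450 times and the unstable run about 225 times; every other setting, including the seed, is identical.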

Demonstration

GUI recording showing how the trained model controls traffic lights:

[Video: DQN GUI demo]

Training Results

Effect of Training Steps (Fixed Arrival Interval)

This plot shows performance trends for models trained with different numbers of steps on random traffic with fixed car arrival interval:

[Figure: Fixed interval training steps trend]

Effect of Training Steps (Random Arrival Interval)

This plot shows performance trends for models trained with different numbers of steps on random traffic with random car arrival interval, intended to better simulate real-world traffic:

[Figure: Random interval training steps trend]

Hyperparameter Tuning

This plot compares how different hyperparameter settings affect model performance, tested on:

  • low fixed arrival interval
  • medium fixed arrival interval
  • high fixed arrival interval
  • truly random traffic (a mix of low, medium, and high arrival intervals, closer to real-world conditions)

[Figure: Hyperparameter tuning results]

Final Evaluation and Stability Comparison

Two models were compared:

  • Stable/best model: selected final model
  • Unstable model: model with target_update_interval = 1000 (instead of 500) and learning_rate = 0.001 (instead of 0.0005)

The comparison shows how strongly the results depend on hyperparameter choice: the two models differ only in learning rate and target-network update interval.

Mean Delay by Method

[Figure: Mean delay comparison]

Mean Queue Length by Method

[Figure: Mean queue length comparison]

Usage

If you want to train, evaluate, or use these models yourself, see the repository:

https://github.com/Tombiczek/rl-traffic-control-sumo
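
As a rough orientation, a checkpoint could be loaded along the following lines. This is a sketch under assumptions, not the repository's actual workflow: it assumes the .zip files were saved with Stable-Baselines3 (the hyperparameter names in this card match its DQN class, but the card does not name the library), and the helper names load_controller and greedy_action are hypothetical. Observations must come from the repository's SUMO environment, which is not reproduced here.

```python
def load_controller(path="dqn_fixed_lr_5e-4_225k_final.zip"):
    """Load a checkpoint. Assumes the file was saved by Stable-Baselines3."""
    # Imported lazily so this module can be imported without the library.
    from stable_baselines3 import DQN
    return DQN.load(path, device="auto")


def greedy_action(model, observation):
    """Return the greedy (non-exploring) action as a plain int.

    `observation` must match the observation space the model was trained
    on; its layout is defined by the repository's environment.
    """
    action, _state = model.predict(observation, deterministic=True)
    return int(action)
```

Passing deterministic=True disables epsilon-greedy exploration, which is usually what is wanted when evaluating a trained controller.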
