Model Card: DQN Traffic Light Controller (Belgradzka-KEN, Warsaw)

Model Summary

This project contains a Deep Q-Network (DQN) model trained to control traffic lights at the Belgradzka-KEN intersection in Warsaw.

The model learns when to switch signal phases in a SUMO simulation, with the goal of reducing queue length and vehicle delay.

Intended Use

Primary use: Research and demonstration of reinforcement learning for traffic signal control in SUMO.
Deployment scope: This model is trained for one specific intersection only (Belgradzka-KEN).
Important limitation: It is not expected to work correctly on other intersections without retraining.

Training Setup

Algorithm: DQN
Final selected model:
- Training steps: 225,000
- Learning rate: 0.0005
Traffic during training:
- Random traffic with fixed car arrival interval
- Random traffic with random car arrival interval (closer to real-world variability)
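
As a rough illustration of the two traffic regimes, the sketch below generates departure schedules for both. It assumes that routes are picked uniformly at random in both cases and that the random-interval regime draws exponentially distributed gaps; the card does not state the actual distribution, and the route names and the `departures` helper are hypothetical.

```python
import random

# Hypothetical route identifiers; the real SUMO route file is not part of this card.
ROUTES = ["north_south", "south_north", "east_west", "west_east"]


def departures(n_cars, mean_gap_s, fixed_interval, seed=42):
    """Return a list of (depart_time_s, route) pairs.

    fixed_interval=True  -> one car every mean_gap_s seconds.
    fixed_interval=False -> gaps drawn from an exponential distribution
                            with mean mean_gap_s (an assumption).
    The route is chosen uniformly at random in both cases.
    """
    rng = random.Random(seed)
    t = 0.0
    schedule = []
    for _ in range(n_cars):
        gap = mean_gap_s if fixed_interval else rng.expovariate(1.0 / mean_gap_s)
        t += gap
        schedule.append((round(t, 2), rng.choice(ROUTES)))
    return schedule


fixed = departures(5, mean_gap_s=3.0, fixed_interval=True)
varied = departures(5, mean_gap_s=3.0, fixed_interval=False)
print([t for t, _ in fixed])   # evenly spaced departure times
print([t for t, _ in varied])  # irregularly spaced departure times
```

Both schedules have the same mean arrival rate, so any difference in controller performance between them would come from the variability of the gaps rather than from the traffic volume.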

Hyperparameter Configurations

Best / Stable Model (`dqn_fixed_lr_5e-4_225k_final.zip`)

This is the final selected model. Compared with the unstable variant, it uses a lower learning rate (0.0005 instead of 0.001) and more frequent target-network updates (every 500 steps instead of every 1,000), which gave more stable behavior and better final metrics.

training:
    total_timesteps: 225_000
    seed: 42
    learning_rate: 0.0005
    buffer_size: 50000
    learning_starts: 1000
    batch_size: 64
    gamma: 0.99
    train_freq: 1
    target_update_interval: 500
    exploration_fraction: 0.2
    exploration_initial_eps: 1.0
    exploration_final_eps: 0.05
    device: auto
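
For reference, the exploration settings above imply that epsilon reaches its final value after 20% of training, that is, after 45,000 of the 225,000 steps. The sketch below assumes a linear decay, which is how Stable-Baselines3 interprets these three values; the parameter names in this config match that library's DQN, but the card does not name the library, so treat this as an assumption. `epsilon_at` is a hypothetical helper.

```python
TOTAL_TIMESTEPS = 225_000
EXPLORATION_FRACTION = 0.2
EPS_INITIAL = 1.0
EPS_FINAL = 0.05


def epsilon_at(step):
    """Exploration rate at a given environment step, assuming linear decay."""
    progress = step / TOTAL_TIMESTEPS
    if progress >= EXPLORATION_FRACTION:
        return EPS_FINAL
    # Interpolate linearly from EPS_INITIAL down to EPS_FINAL.
    return EPS_INITIAL + progress * (EPS_FINAL - EPS_INITIAL) / EXPLORATION_FRACTION


# Number of steps over which epsilon is still decaying.
decay_steps = round(EXPLORATION_FRACTION * TOTAL_TIMESTEPS)

for step in (0, decay_steps // 2, decay_steps, TOTAL_TIMESTEPS):
    print(step, round(epsilon_at(step), 3))
```

Under this assumption the agent acts greedily 95% of the time for the remaining 180,000 steps.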

Unstable Model (`dqn_fixed_tui_1e3_225k_unstable.zip`)

This model was intentionally kept for comparison. With a higher learning rate (0.001) and less frequent target-network updates (every 1,000 steps instead of every 500), training was less stable and the resulting policy controlled traffic less well.

training:
    total_timesteps: 225_000
    seed: 42
    learning_rate: 0.001
    buffer_size: 50000
    learning_starts: 1000
    batch_size: 64
    gamma: 0.99
    train_freq: 1
    target_update_interval: 1000
    exploration_fraction: 0.2
    exploration_initial_eps: 1.0
    exploration_final_eps: 0.05
    device: auto
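
The two configurations differ in only two values, which is easy to confirm programmatically. The sketch below copies the values from the two listings into plain dictionaries (the variable names are illustrative) and reports the keys that differ, along with the nominal number of target-network updates over the full run, assuming one update every `target_update_interval` environment steps.

```python
# Values copied from the "Best / Stable Model" listing.
stable = {
    "total_timesteps": 225_000,
    "seed": 42,
    "learning_rate": 0.0005,
    "buffer_size": 50000,
    "learning_starts": 1000,
    "batch_size": 64,
    "gamma": 0.99,
    "train_freq": 1,
    "target_update_interval": 500,
    "exploration_fraction": 0.2,
    "exploration_initial_eps": 1.0,
    "exploration_final_eps": 0.05,
    "device": "auto",
}

# The unstable listing matches the stable one except for these two keys.
unstable = {**stable, "learning_rate": 0.001, "target_update_interval": 1000}

# Keys whose values differ, as (stable, unstable) pairs.
diff = {k: (stable[k], unstable[k]) for k in stable if stable[k] != unstable[k]}

# Nominal count of target-network syncs over the whole training run.
updates = {
    name: cfg["total_timesteps"] // cfg["target_update_interval"]
    for name, cfg in (("stable", stable), ("unstable", unstable))
}

print(diff)
print(updates)
```

Because both values change at once, this comparison alone cannot attribute the instability to the learning rate or to the update interval individually.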

Demonstration

GUI recording showing how the trained model controls traffic lights:

Training Results

Effect of Training Steps (Fixed Arrival Interval)

This plot shows how performance changes with the number of training steps on random traffic with a fixed car arrival interval:

Effect of Training Steps (Random Arrival Interval)

This plot shows how performance changes with the number of training steps on random traffic with a random car arrival interval, which is intended to better approximate real-world traffic: