---
base_model:
- custom/SimpleMLP
datasets:
- heig-vd-geo/GridNet-HD
language:
- en
license: mit
metrics:
- mean_iou
pipeline_tag: other
---

# GridNet-HD Baseline: Late Fusion MLP on Dual Softmax Outputs

This repository provides a simple Multi-Layer Perceptron (MLP) baseline for the late fusion of two per-point LiDAR softmax outputs, as presented in the paper [GridNet-HD: A High-Resolution Multi-Modal Dataset for LiDAR-Image Fusion on Power Line Infrastructure](https://huggingface.co/papers/2601.13052).

## Overview

The MLP fuses per-point predictions produced by the two other GridNet-HD baselines (ImageVote and SPT), so both must be run before using this one. This repository includes:

- Per-zone preprocessing of two LiDAR softmax files (`image_vote` and `spt`) into combined feature tensors
- A lightweight `SimpleMLP` model that concatenates the two softmax vectors per point
- Training, validation and inference loops
- Weights & Biases integration for real-time experiment tracking

This implementation serves as one of the official baselines for GridNet-HD.

---

## Table of Contents

- [Project Structure](#project-structure)
- [Configuration](#configuration)
- [Environment](#environment)
- [Dataset Structure](#dataset-structure)
- [Installation](#installation)
- [Supported Modes](#supported-modes)
- [Results](#results)
- [Pretrained Weights](#pretrained-weights)
- [Usage Examples](#usage-examples)
- [Weights & Biases Integration](#weights--biases-integration)
- [License](#license)
- [Contact](#contact)
- [Citation](#citation)

---

## Project Structure

```
project_root/
├── main.py # Entry point (all modes)
├── config.yaml # Configuration parameters
├── dataset/
│ ├── lidar_dataset.py # dataset for training/validation/test
│ └── preprocess_multi_processing.py # prepare files for training
├── las_utils/
│ ├── io.py # .las reading
│ └── matching.py # Nearest-neighbor matching
├── model/
│ └── model.py # SimpleMLP definition
├── train/
│ ├── train.py # eval and train loop
│ └── test.py # inference function for test split
├── utils/
│ ├── logging_utils.py
│ └── metrics.py # compute all metrics
├── requirements.txt # Python dependencies
└── README.md # This file
```


---

## Configuration

All training and evaluation settings are stored in a single file: `config.yaml`.

### Key sections

#### `dataset`  
Controls data loading, preprocessing, and class remapping.
- `root`: path to the folder containing all raw zones
- `split_file`: path to a JSON file defining train/val/test splits
- `n_classes`: number of target classes
- `voxel_size`: downsampling voxel size (used for KDTree)
- `pre-processing_num_workers`: parallelism for data preprocessing
- `max_point_per_class`: maximum number of points sampled per class during training
- `class_map`: label remapping rules (original → new class)
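
As an illustration of what `class_map` and `max_point_per_class` control, here is a minimal sketch in plain Python. The mapping values, the ignore label `255`, and the helper names `remap_labels` and `cap_per_class` are assumptions made for this example; the repository's own preprocessing may be implemented differently.

```python
import random
from collections import defaultdict

# Hypothetical remapping: original label -> new class (255 = ignored).
CLASS_MAP = {0: 0, 1: 0, 2: 1, 3: 255}


def remap_labels(labels, class_map, ignore_index=255):
    """Map each original label to its new class; unmapped labels are ignored."""
    return [class_map.get(label, ignore_index) for label in labels]


def cap_per_class(labels, max_points_per_class, seed=0):
    """Return sorted point indices, keeping at most N points per class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    kept = []
    for indices in by_class.values():
        if len(indices) > max_points_per_class:
            # Random subsample of the over-represented class.
            indices = rng.sample(indices, max_points_per_class)
        kept.extend(indices)
    return sorted(kept)


remapped = remap_labels([0, 1, 2, 3, 7, 1, 0], CLASS_MAP)
kept = cap_per_class(remapped, max_points_per_class=2)
print(remapped)
print(kept)
```

Capping the number of points per class is a common way to keep dominant classes (such as vegetation or ground) from overwhelming rare ones during training.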

#### `training`  
Hyperparameters and runtime configuration.
- `batch_size`, `epochs`, `learning_rate`, `weight_decay`
- `lr_step_size`, `lr_gamma`: learning rate scheduler (StepLR)
- `device`: `"cuda"` or `"cpu"`
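
For reference, a step-decay schedule such as StepLR multiplies the learning rate by `lr_gamma` every `lr_step_size` epochs. The sketch below reproduces that rule in plain Python; the numeric values are hypothetical and are not the repository defaults.

```python
def step_lr(base_lr, epoch, lr_step_size, lr_gamma):
    """Learning rate at `epoch` under a step-decay schedule."""
    return base_lr * lr_gamma ** (epoch // lr_step_size)


# Hypothetical settings: halve the learning rate every 10 epochs.
for epoch in (0, 9, 10, 25):
    lr = step_lr(1e-3, epoch, lr_step_size=10, lr_gamma=0.5)
    print(f"epoch {epoch:2d}: lr = {lr:.6f}")
```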

#### `model`  
Defines the MLP structure.
- `hidden_dims`: list of layer widths
- `ignore_index`: label to ignore during loss computation
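
To see how `hidden_dims` shapes the network, the sketch below lists the linear layers it would imply. It assumes that the input is the concatenation of two softmax vectors of length `n_classes` and that the output has one logit per class; the widths `[64, 32]` and both helper names are hypothetical.

```python
def mlp_layer_shapes(n_classes, hidden_dims):
    """List (in_features, out_features) for each linear layer of the MLP."""
    # Two softmax vectors are concatenated, hence 2 * n_classes inputs.
    dims = [2 * n_classes, *hidden_dims, n_classes]
    return list(zip(dims[:-1], dims[1:]))


def parameter_count(shapes):
    """Weights plus biases over all linear layers."""
    return sum(n_in * n_out + n_out for n_in, n_out in shapes)


shapes = mlp_layer_shapes(n_classes=11, hidden_dims=[64, 32])
print(shapes)
print(parameter_count(shapes))
```

Under these assumptions the model stays very small (a few thousand parameters), which is consistent with its role as a lightweight fusion head on top of the two heavier baselines.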

#### `logging`  
Output and checkpoint configuration.
- `save_dir`: where to store logs and model weights
- `save_freq`: save checkpoint every N epochs

#### `wandb`  
Weights & Biases experiment tracking.
- `project`: W&B project name
- `entity`: your W&B team or username

## Environment

| Component       | Details                          |
| --------------- | -------------------------------- |
| GPU             | NVIDIA A40 (48 GB VRAM)          |
| CUDA Version    | 12.x                             |
| OS              | Ubuntu 22.04 LTS                 |
| RAM             | 256 GB     |

## Dataset Structure

The structure of the GridNet-HD dataset is unchanged (see the [GridNet-HD dataset](https://huggingface.co/datasets/heig-vd-geo/GridNet-HD) for more information).
Each of the 36 raw zone folders is extended with the outputs of the two other baselines, namely LiDAR files carrying the per-point softmax from ImageVote and from SPT:

```
/path/to/data/
├── t1z4/
│   ├── lidar_softmax_image_vote/t1z4_with_softmax.las   # LiDAR with soft-log from ImageVote baseline
│   ├── lidar_softmax_spt/t1z4_with_softmax.las          # LiDAR with soft-log from SPT baseline
│   └── lidar/t1z4.las                                   # ground-truth
├── …
└── split.json                      # maps zones → train/val/test
```
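
Before preprocessing, it can help to check that every zone contains the three expected files. The sketch below builds the paths from the layout above, assuming each file is named after its zone folder; the helpers `zone_inputs` and `missing_inputs` are hypothetical and not part of the repository.

```python
from pathlib import Path


def zone_inputs(root, zone):
    """Return the three per-zone .las paths implied by the layout above."""
    zone_dir = Path(root) / zone
    return {
        "image_vote": zone_dir / "lidar_softmax_image_vote" / f"{zone}_with_softmax.las",
        "spt": zone_dir / "lidar_softmax_spt" / f"{zone}_with_softmax.las",
        "ground_truth": zone_dir / "lidar" / f"{zone}.las",
    }


def missing_inputs(root, zone):
    """Names of the expected inputs that do not exist on disk."""
    return [name for name, path in zone_inputs(root, zone).items() if not path.is_file()]


# Reports which inputs are still missing for one zone.
print(missing_inputs("/path/to/data", "t1z4"))
```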

After preprocessing:

```
/path/to/data/preprocessed/
├── t1z4.pt        # contains {features and labels}
├── t1z5a.pt
└── …

```

## Installation

1. **Clone the repository**:
 
   ```bash
   git clone https://github.com/heig-vd-geo/baseline_fusion_mlp.git
   cd baseline_fusion_mlp
   ```

2. **Create a conda virtual environment**:

   ```bash
   conda create -n gridnet_hd_mlp python=3.12
   conda activate gridnet_hd_mlp
   
   ```

3. **Install dependencies**:

   ```bash
   pip install --upgrade pip
   pip install -r requirements.txt
   ```

## Supported Modes

Select the mode with the `--mode` argument of `main.py`:

| Mode         | Description                                     |
| ------------ | ----------------------------------------------- |
| `preprocess` | Convert all zones `.las` → `.pt` with remapping |
| `train`      | Train SimpleMLP on train split                  |
| `val`        | Validate model on val split                     |
| `test`       | Evaluate on test split                          |


## Results

The following table summarizes the per-class Intersection over Union (IoU) scores on the test set at 3D level for the best model. 

| Class                     | IoU (Test set) (%)|
|---------------------------|------------|
| Pylon                     |   94.82    |
| Conductor cable           |   94.40    |
| Structural cable          |   82.52    |
| Insulator                 |   86.98    |
| High vegetation           |   83.08    |
| Low vegetation            |   47.64    |
| Herbaceous vegetation     |   80.75    |
| Rock, gravel, soil        |   42.89    |
| Impervious soil (Road)    |   80.26    |
| Water                     |   61.69    |
| Building                  |   61.40    |
| **Mean IoU (mIoU)**       |  **74.22** |
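
The reported mIoU is the unweighted mean of the eleven per-class IoUs in the table, which can be checked directly:

```python
# Per-class test IoUs (%) copied from the table above.
per_class_iou = {
    "Pylon": 94.82,
    "Conductor cable": 94.40,
    "Structural cable": 82.52,
    "Insulator": 86.98,
    "High vegetation": 83.08,
    "Low vegetation": 47.64,
    "Herbaceous vegetation": 80.75,
    "Rock, gravel, soil": 42.89,
    "Impervious soil (Road)": 80.26,
    "Water": 61.69,
    "Building": 61.40,
}

mean_iou = sum(per_class_iou.values()) / len(per_class_iou)
print(f"{mean_iou:.2f}")  # prints 74.22
```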


## Pretrained Weights

Checkpoints for the best-performing model (mIoU = 74.22%) are available directly in the repository.


## Usage Examples

Configure `config.yaml` first, then run the `preprocess` mode before training the model.

### Preprocessing

```bash
python main.py --mode preprocess --config config.yaml
```
This concatenates the per-point softmax features from SPT and ImageVote, applies the class remapping, and writes the `.pt` files used for training.
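
The fusion input can be pictured as follows: for each point, the softmax vector from one baseline is concatenated with the softmax vector of the nearest point from the other baseline. This is only a sketch of the idea. It uses a brute-force nearest-neighbor search in place of a KD-tree, and the function names and toy data are assumptions for the example rather than the repository's code.

```python
import math


def nearest_index(point, candidates):
    """Index of the candidate closest to `point` (brute force, O(n))."""
    return min(range(len(candidates)), key=lambda i: math.dist(point, candidates[i]))


def fuse_softmax(ref_xyz, probs_a, other_xyz, probs_b):
    """Concatenate each reference point's softmax vector with the softmax
    vector of its nearest neighbor in the other point cloud."""
    fused = []
    for xyz, p_a in zip(ref_xyz, probs_a):
        j = nearest_index(xyz, other_xyz)
        fused.append(list(p_a) + list(probs_b[j]))
    return fused


# Toy example with 2 points and 2 classes per softmax vector.
ref_xyz = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
probs_a = [[0.9, 0.1], [0.2, 0.8]]
other_xyz = [(1.1, 0.0, 0.0), (0.1, 0.0, 0.0)]
probs_b = [[0.3, 0.7], [0.6, 0.4]]

features = fuse_softmax(ref_xyz, probs_a, other_xyz, probs_b)
print(features)
```

Each fused feature vector has length `2 * n_classes`. On real point clouds a spatial index is needed, since the brute-force search above scales quadratically with the number of points.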

### Training

```bash
python main.py --mode train --config config.yaml
```
Trains the MLP late fusion model using the dataset and settings defined in `config.yaml`. Checkpoints and logs are saved under `logging.save_dir`.

### Validation

```bash
python main.py --mode val --config config.yaml --weights best_model.pt
```
Evaluates the model on the validation set and prints out per-class IoUs and mIoU.
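
For readers unfamiliar with the metric, the sketch below shows one common way to compute per-class IoU and mIoU from predicted and ground-truth labels while skipping an ignored label. It is a generic illustration; the ignore value `255` and the function names are assumptions, and `utils/metrics.py` may differ in its details.

```python
def per_class_iou(preds, labels, n_classes, ignore_index=255):
    """IoU = TP / (TP + FP + FN) per class; None if a class never occurs."""
    tp = [0] * n_classes
    fp = [0] * n_classes
    fn = [0] * n_classes
    for pred, label in zip(preds, labels):
        if label == ignore_index:
            continue  # ignored points do not count toward any class
        if pred == label:
            tp[label] += 1
        else:
            fp[pred] += 1
            fn[label] += 1
    ious = []
    for c in range(n_classes):
        denom = tp[c] + fp[c] + fn[c]
        ious.append(tp[c] / denom if denom else None)
    return ious


def mean_iou(ious):
    """Unweighted mean over the classes that have a defined IoU."""
    valid = [v for v in ious if v is not None]
    return sum(valid) / len(valid)


preds = [0, 0, 1, 1, 2, 0]
labels = [0, 1, 1, 1, 2, 255]
ious = per_class_iou(preds, labels, n_classes=3)
print(ious, mean_iou(ious))
```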

### Test (LAS export)

```bash
python main.py --mode test --config config.yaml --weights best_model.pt
```
Runs inference on the test set and exports the original `.las` files with the `classification` field set to the predicted class label of each point.

## Weights & Biases Integration

- Log in:
  ```bash
  wandb login
  ```
- Set the W&B `project` and `entity` in `config.yaml` (see [Configuration](#configuration)).

All training and validation metrics will be tracked live.

## License

This project is released under the MIT License.

## Contact

For questions, issues, or contributions, please open an issue on the repository.

## Citation

If you use this repo in research, please cite:

```
@misc{gridnet-hd-dataset,
      title={GridNet-HD: A High-Resolution Multi-Modal Dataset for LiDAR-Image Fusion on Power Line Infrastructure}, 
      author={Antoine Carreaud and Shanci Li and Malo De Lacour and Digre Frinde and Jan Skaloud and Adrien Gressin},
      year={2026},
      eprint={2601.13052},
      url={https://arxiv.org/abs/2601.13052}, 
}
```