---
base_model:
- custom/SimpleMLP
datasets:
- heig-vd-geo/GridNet-HD
language:
- en
license: mit
metrics:
- mean_iou
pipeline_tag: other
---

# GridNet-HD Baseline: Late Fusion MLP on Dual Softmax Outputs

This repository provides an implementation of a simple Multi-Layer Perceptron (MLP) baseline for the late fusion of two LiDAR softmax outputs, as presented in the paper [GridNet-HD: A High-Resolution Multi-Modal Dataset for LiDAR-Image Fusion on Power Line Infrastructure](https://huggingface.co/papers/2601.13052).

## Overview

The MLP takes as input the per-point softmax outputs of the two other GridNet-HD baselines (ImageVote and SPT), so results from both must be generated before using this baseline.

This repository includes:

- Per-zone preprocessing of two LiDAR softmax files (`image_vote` and `spt`) into combined feature tensors
- A lightweight `SimpleMLP` model that concatenates the two softmax vectors per point
- Training, validation and inference loops
- Weights & Biases integration for real-time experiment tracking

This implementation serves as one of the official baselines for GridNet-HD.

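
To make the late-fusion idea concrete, here is a minimal sketch in pure Python of what happens for a single point: the two softmax vectors are concatenated and passed through a small MLP. It is not the repository's `SimpleMLP`; the helper names (`fuse_features`, `mlp_forward`, `random_layers`), the toy class count, the hidden width, and the random untrained weights are all assumptions made for illustration.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_features(p_image_vote, p_spt):
    """Late-fusion input: the two per-point softmax vectors, concatenated."""
    if len(p_image_vote) != len(p_spt):
        raise ValueError("both softmax vectors must cover the same classes")
    return list(p_image_vote) + list(p_spt)


def linear(x, weights, bias):
    """Dense layer; `weights` holds one row of input weights per output unit."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def mlp_forward(x, layers):
    """Apply (weights, bias) layers with ReLU in between, softmax at the end."""
    for i, (weights, bias) in enumerate(layers):
        x = linear(x, weights, bias)
        if i < len(layers) - 1:
            x = [max(0.0, v) for v in x]
    return softmax(x)


def random_layers(dims, seed=0):
    """Hypothetical helper: untrained random weights, only to run the sketch."""
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(dims[:-1], dims[1:]):
        weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
        layers.append((weights, [0.0] * n_out))
    return layers


n_classes = 3                   # toy value, not the dataset's class count
p_image_vote = [0.7, 0.2, 0.1]  # made-up softmax from the ImageVote baseline
p_spt = [0.3, 0.6, 0.1]         # made-up softmax from the SPT baseline

features = fuse_features(p_image_vote, p_spt)         # length 2 * n_classes
layers = random_layers([2 * n_classes, 8, n_classes])  # one hidden layer of width 8
probs = mlp_forward(features, layers)
predicted_class = max(range(n_classes), key=probs.__getitem__)
```

With trained weights, `predicted_class` would be the fused label for the point; here the weights are random, so only the shapes are meaningful.
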

---

## Table of Contents

- [Project Structure](#project-structure)
- [Configuration](#configuration)
- [Environment](#environment)
- [Dataset Structure](#dataset-structure)
- [Installation](#installation)
- [Supported Modes](#supported-modes)
- [Results](#results)
- [Pretrained Weights](#pretrained-weights)
- [Usage Examples](#usage-examples)
- [Weights & Biases Integration](#weights--biases-integration)
- [License](#license)
- [Contact](#contact)
- [Citation](#citation)

---

## Project Structure

```
project_root/
├── main.py                             # Entry point (all modes)
├── config.yaml                         # Configuration parameters
├── dataset/
│   ├── lidar_dataset.py                # dataset for training/validation/test
│   └── preprocess_multi_processing.py  # prepare files for training
├── las_utils/
│   ├── io.py                           # .las reading
│   └── matching.py                     # Nearest-neighbor matching
├── model/
│   └── model.py                        # SimpleMLP definition
├── train/
│   ├── train.py                        # eval and train loop
│   └── test.py                         # inference function for test split
├── utils/
│   ├── logging_utils.py
│   └── metrics.py                      # compute all metrics
├── requirements.txt                    # Python dependencies
└── README.md                           # This file
```

---

## Configuration

All training and evaluation settings are stored in a single file: `config.yaml`.

### Key sections

#### `dataset`

Controls data loading, preprocessing, and class remapping.

- `root`: path to the folder containing all raw zones
- `split_file`: path to a JSON file defining train/val/test splits
- `n_classes`: number of target classes
- `voxel_size`: downsampling voxel size (used for KDTree)
- `pre-processing_num_workers`: parallelism for data preprocessing
- `max_point_per_class`: maximum number of points sampled per class during training
- `class_map`: label remapping rules (original → new class)

#### `training`

Hyperparameters and runtime configuration.

- `batch_size`, `epochs`, `learning_rate`, `weight_decay`
- `lr_step_size`, `lr_gamma`: learning rate scheduler (StepLR)
- `device`: `"cuda"` or `"cpu"`

#### `model`

Defines the MLP structure.

- `hidden_dims`: list of layer widths
- `ignore_index`: label to ignore during loss computation

#### `logging`

Output and checkpoint configuration.

- `save_dir`: where to store logs and model weights
- `save_freq`: save checkpoint every N epochs

#### `wandb`

Weights & Biases experiment tracking.

- `project`: W&B project name
- `entity`: your W&B team or username

## Environment

| Component    | Details                 |
| ------------ | ----------------------- |
| GPU          | NVIDIA A40 (48 GB VRAM) |
| CUDA Version | 12.x                    |
| OS           | Ubuntu 22.04 LTS        |
| RAM          | 256 GB                  |

## Dataset Structure

The structure of the GridNet-HD dataset remains the same (see [GridNet-HD dataset](https://huggingface.co/datasets/heig-vd-geo/GridNet-HD) for more information).

The raw zones (36 folders) are supplemented with the results of the two other baselines (soft-log LiDAR from ImageVote and SPT):

```
/path/to/data/
├── t1z5b/
│   ├── lidar_softmax_image_vote/t1z4_with_softmax.las  # LiDAR with soft-log from ImageVote baseline
│   ├── lidar_softmax_spt/t1z4_with_softmax.las         # LiDAR with soft-log from SPT baseline
│   └── lidar/t1z4.las                                  # ground-truth
├── …
└── split.json                                          # maps zones → train/val/test
```

After preprocessing:

```
/path/to/data/preprocessed/
├── t1z4.pt   # contains {features and labels}
├── t1z5a.pt
└── …
```

## Installation

1. **Clone the repository**:

   ```bash
   git clone https://github.com/heig-vd-geo/baseline_fusion_mlp.git
   cd baseline_fusion_mlp
   ```

2. **Create a conda virtual environment**:

   ```bash
   conda create -n gridnet_hd_mlp python=3.12
   conda activate gridnet_hd_mlp
   ```

3. **Install dependencies**:

   ```bash
   pip install --upgrade pip
   pip install -r requirements.txt
   ```

## Supported Modes

Select the mode with the `--mode` argument of `main.py`:

| Mode         | Description                                     |
| ------------ | ----------------------------------------------- |
| `preprocess` | Convert all zones `.las` → `.pt` with remapping |
| `train`      | Train SimpleMLP on train split                  |
| `val`        | Validate model on val split                     |
| `test`       | Evaluate on test split                          |

## Results

The following table summarizes the per-class Intersection over Union (IoU) scores of the best model on the test set, evaluated at the 3D (point) level.

| Class                  | IoU on test set (%) |
| ---------------------- | ------------------- |
| Pylon                  | 94.82               |
| Conductor cable        | 94.40               |
| Structural cable       | 82.52               |
| Insulator              | 86.98               |
| High vegetation        | 83.08               |
| Low vegetation         | 47.64               |
| Herbaceous vegetation  | 80.75               |
| Rock, gravel, soil     | 42.89               |
| Impervious soil (Road) | 80.26               |
| Water                  | 61.69               |
| Building               | 61.40               |
| **Mean IoU (mIoU)**    | **74.22**           |

## Pretrained Weights

Checkpoints for the best-performing model (mIoU = 74.22%) are available directly in the repository.

## Usage Examples

Before training the model, configure `config.yaml` for your data and run the `preprocess` mode.

### Preprocessing

```bash
python main.py --mode preprocess --config config.yaml
```

This concatenates the features from the SPT soft-log and the ImageVote soft-log, applies the class remapping, and prepares the files for training.

### Training

```bash
python main.py --mode train --config config.yaml
```

Trains the MLP late fusion model using the dataset and settings defined in `config.yaml`. Checkpoints and logs are saved under `logging.save_dir`.

### Validation

```bash
python main.py --mode val --config config.yaml --weights best_model.pt
```

Evaluates the model on the validation set and prints the per-class IoUs and the mIoU.

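
For reference, per-class IoU and mIoU can be derived from a confusion matrix as sketched below. This is an illustrative pure-Python version, not the code in `utils/metrics.py`; the function names, the toy labels, and the ignore value `255` are assumptions. The last lines average the per-class test IoUs listed in the Results section.

```python
def confusion_matrix(y_true, y_pred, n_classes, ignore_index=None):
    """Count (true, predicted) pairs, skipping points labelled ignore_index."""
    cm = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        if t == ignore_index:
            continue
        cm[t][p] += 1
    return cm


def per_class_iou(cm):
    """IoU_c = TP / (TP + FP + FN); None for classes absent from both sets."""
    n = len(cm)
    ious = []
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                        # true c, predicted other
        fp = sum(cm[r][c] for r in range(n)) - tp   # predicted c, true other
        denom = tp + fp + fn
        ious.append(tp / denom if denom else None)
    return ious


def mean_iou(ious):
    """Average over the classes that have a defined IoU."""
    valid = [v for v in ious if v is not None]
    return sum(valid) / len(valid) if valid else 0.0


# Toy example with 3 classes; 255 is an assumed ignore label.
y_true = [0, 0, 1, 1, 2, 2, 255]
y_pred = [0, 1, 1, 1, 2, 0, 0]
cm = confusion_matrix(y_true, y_pred, n_classes=3, ignore_index=255)
ious = per_class_iou(cm)   # 1/3, 2/3 and 1/2
miou = mean_iou(ious)      # 0.5

# Per-class test IoUs (%) copied from the Results table.
table_ious = [94.82, 94.40, 82.52, 86.98, 83.08, 47.64,
              80.75, 42.89, 80.26, 61.69, 61.40]
table_miou = round(mean_iou(table_ious), 2)  # 74.22
```

Classes that never occur in either the labels or the predictions are left out of the mean here; whether the repository handles that case the same way is not stated in this README.
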

### Test (LAS export)

```bash
python main.py --mode test --config config.yaml --weights best_model.pt
```

Runs inference on the test set and exports the original `.las` files with the `classification` field set to the predicted class label of each point.

## Weights & Biases Integration

- Log in:

  ```bash
  wandb login
  ```

- Set `logging.wandb.project` and `logging.wandb.entity` in `config.yaml`.

All training and validation metrics are then tracked live.

## License

This project is released under the MIT License.

## Contact

For questions, issues, or contributions, please open an issue on the repository.

## Citation

If you use this repository in your research, please cite:

```
@misc{gridnet-hd-dataset,
  title={GridNet-HD: A High-Resolution Multi-Modal Dataset for LiDAR-Image Fusion on Power Line Infrastructure},
  author={Antoine Carreaud and Shanci Li and Malo De Lacour and Digre Frinde and Jan Skaloud and Adrien Gressin},
  year={2026},
  eprint={2601.13052},
  url={https://arxiv.org/abs/2601.13052},
}
```