+---
+language: en
+license: apache-2.0
+model_name: fcn-resnet50-12.onnx
+tags:
+- validated
+- vision
+- object_detection_segmentation
+- fcn
+---
+<!--- SPDX-License-Identifier: MIT -->
+# Fully Convolutional Network (FCN)
+## Description
+FCNs are neural networks for real-time, class-wise image segmentation. As the name implies, every weight layer in the network is convolutional. The final layer has the same height and width as the input image, so FCNs can make dense pixel-wise predictions without significant postprocessing. Being fully convolutional also lets the model handle a wide range of input resolutions.
+This specific model detects 20 different [classes](dependencies/voc_classes.txt). The models were pre-trained on the COCO train2017 dataset, restricted to this class subset.
+## Model
+| Model          | Download                              | Download (with sample test data)        | ONNX version | Opset version | Mean IoU |
+|----------------|:--------------------------------------|:----------------------------------------|:-------------|:--------------|:--------|
+| FCN ResNet-50  | [134 MB](model/fcn-resnet50-11.onnx)  | [213 MB](model/fcn-resnet50-11.tar.gz)  | 1.8.0        | 11 | 60.5% |
+| FCN ResNet-101 | [207 MB](model/fcn-resnet101-11.onnx) | [281 MB](model/fcn-resnet101-11.tar.gz) | 1.8.0        | 11 | 63.7% |
+| FCN ResNet-50  | [134 MB](model/fcn-resnet50-12.onnx)  | [125 MB](model/fcn-resnet50-12.tar.gz)  | 1.8.0        | 12 | 65.0% |
+| FCN ResNet-50-int8  | [34 MB](model/fcn-resnet50-12-int8.onnx)  | [29 MB](model/fcn-resnet50-12-int8.tar.gz)  | 1.8.0        | 12 | 64.7% |
+| FCN ResNet-50-qdq  | [34 MB](model/fcn-resnet50-12-qdq.onnx)  | [21 MB](model/fcn-resnet50-12-qdq.tar.gz)  | 1.8.0        | 12 | 64.4% |
+### Source
+* PyTorch Torchvision FCN ResNet50 ==> ONNX FCN ResNet50
+* PyTorch Torchvision FCN ResNet101 ==> ONNX FCN ResNet101
+* ONNX FCN ResNet50 ==> Quantized ONNX FCN ResNet50
+## Inference
+### Input
+The input is expected to be an image with the shape `(N, 3, height, width)` where `N` is the number of images in the batch, and `height` and `width` are consistent across all images.
+### Preprocessing
+The images must be loaded in RGB with a range of `[0, 1]` per channel, then normalized per channel using `mean = [0.485, 0.456, 0.406]` and `std = [0.229, 0.224, 0.225]`.
+This model can take images of different sizes as input. However, it is recommended to resize images so that the shorter edge is 520 pixels.
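The resize recommendation above can be sketched as a small helper that computes the target size while preserving the aspect ratio. The function name `shorter_edge_size` and its rounding choice are assumptions made for illustration, not part of the original pipeline. The result can be passed to an image library's resize call, for example `img.resize((new_width, new_height))` in PIL.

```python
def shorter_edge_size(width, height, min_size=520):
    """Return (new_width, new_height) with the shorter edge scaled to min_size."""
    # Hypothetical helper: one scale factor for both edges keeps the aspect ratio
    scale = min_size / min(width, height)
    return max(1, round(width * scale)), max(1, round(height * scale))


print(shorter_edge_size(640, 480))  # (693, 520)
```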
+The following code shows an example of how to preprocess a [demo image](dependencies/000000017968.jpg):
+```python
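+from PIL import Image
+from torchvision import transforms
+
+# Scale pixels to [0, 1], then normalize each channel with the mean and std above
+preprocess = transforms.Compose([
+    transforms.ToTensor(),
+    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
+])
+
+# Force three RGB channels in case the file is grayscale or has an alpha channel
+img = Image.open('dependencies/000000017968.jpg').convert('RGB')
+# Add the batch dimension: (3, height, width) -> (1, 3, height, width)
+img_data = preprocess(img).unsqueeze(0).detach().cpu().numpy()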
+```
+### Output of model
+The model has two outputs, `("out", "aux")`. `"out"` is the main classifier and has shape `(N, 21, height, width)`. Each output pixel holds one score per class, so `np.argmax(out[image, :, y, x])` is that pixel's predicted class. Class 0 is the background class.
+`"aux"` is an auxiliary classifier with the same shape and the same function. The difference between the two is that `"out"` sources features from the last layer of the ResNet backbone, while `"aux"` sources features from the second-to-last layer.
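A minimal sketch of running the model and reading both outputs, assuming ONNX Runtime is the inference engine (the quantization section uses it, but this README does not prescribe a runtime). The helper name `run_fcn` and the model path in the usage comment are illustrative. `session` only needs the `get_inputs()` and `run()` methods that `onnxruntime.InferenceSession` provides.

```python
def run_fcn(session, batch):
    """Run one batch of shape (N, 3, height, width) and return (out, aux)."""
    # Look up the input name from the model instead of hard-coding it
    input_name = session.get_inputs()[0].name
    # Request the two outputs by the names given in this README
    out, aux = session.run(["out", "aux"], {input_name: batch})
    return out, aux


# Usage (assumes onnxruntime is installed and the model file is available locally):
# import onnxruntime as ort
# session = ort.InferenceSession("fcn-resnet50-12.onnx",
#                                providers=["CPUExecutionProvider"])
# out, aux = run_fcn(session, img_data)
# one_output = out[0]  # scores for the first image, shape (21, height, width)
```

Taking the first element of `out` yields the per-image score array that the postprocessing code expects.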
+### Postprocessing steps
+The following code shows how to overlay the segmentation on the original image:
+```python
+from PIL import Image
+from matplotlib.colors import hsv_to_rgb
+import numpy as np
+import cv2
+
+classes = [line.rstrip('\n') for line in open('voc_classes.txt')]
+num_classes = len(classes)
+
+def get_palette():
+    # prepare and return palette
+    palette = [0] * num_classes * 3
+    for hue in range(num_classes):
+        if hue == 0:  # Background color
+            colors = (0, 0, 0)
+        else:
+            colors = hsv_to_rgb((hue / num_classes, 0.75, 0.75))
+        for i in range(3):
+            palette[hue * 3 + i] = int(colors[i] * 255)
+    return palette
+
+def colorize(labels):
+    # generate colorized image from output labels and color palette
+    result_img = Image.fromarray(labels).convert('P', colors=num_classes)
+    result_img.putpalette(get_palette())
+    return np.array(result_img.convert('RGB'))
+
+def visualize_output(image, output):
+    # image and output must have the same height and width
+    assert image.shape[0] == output.shape[1] and image.shape[1] == output.shape[2]
+    assert output.shape[0] == num_classes
+    # get classification labels
+    raw_labels = np.argmax(output, axis=0).astype(np.uint8)
+    # compute confidence score (mean of the per-pixel maximum class score)
+    confidence = float(np.max(output, axis=0).mean())
+    # generate segmented image
+    result_img = colorize(raw_labels)
+    # generate blended image; image[:, :, ::-1] reverses the channel order of `image`
+    blended_img = cv2.addWeighted(image[:, :, ::-1], 0.5, result_img, 0.5, 0)
+    result_img = Image.fromarray(result_img)
+    blended_img = Image.fromarray(blended_img)
+    return confidence, result_img, blended_img, raw_labels
+
+# orig_tensor: the original image as a (height, width, 3) uint8 array
+# one_output: the "out" scores for a single image, shape (num_classes, height, width)
+conf, result_img, blended_img, raw_labels = visualize_output(orig_tensor, one_output)
+```
+## Model Creation
+### Dataset (Train and validation)
+The FCN models have been pretrained on the [COCO train2017 dataset](https://cocodataset.org/#download), using the subset of classes that also appear in Pascal VOC. See the [Torchvision Model Zoo](https://pytorch.org/docs/stable/torchvision/models.html) for more details.
+### Training
+Pretrained weights from the Torchvision Model Zoo were used instead of training these models from scratch. A [conversion notebook](dependencies/conversion.ipynb) is provided.
+### Validation accuracy
+Mean IoU (intersection over union) and global pixelwise accuracy are computed on the COCO val2017 dataset.
+[Torchvision](https://pytorch.org/docs/stable/torchvision/models.html) reports these values as follows:
+| Model          | mean IoU (%) | global pixelwise accuracy (%) |
+|----------------|:-------------|:------------------------------|
+| FCN ResNet 50  | 60.5         | 91.4                          |
+| FCN ResNet 101 | 63.7         | 91.9                          |
+If you have the [COCO val2017 dataset](https://cocodataset.org/#download) downloaded, you can confirm updated numbers using [the provided notebook](dependencies/validation_accuracy.ipynb):
+| Model              | mean IoU (%) | global pixelwise accuracy (%) |
+|--------------------|:-------------|:------------------------------|
+| FCN ResNet 50      | 65.0         | 99.6                          |
+| FCN ResNet 50-int8 | 64.7         | 99.5                          |
+| FCN ResNet 101     | 66.7         | 99.6                          |
+The more conservative of the two estimates is used in the model files table.
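For reference, both metrics can be derived from a confusion matrix. The sketch below is a plain-Python illustration of the standard definitions, not the code used in the validation notebook. The function name `segmentation_metrics` and the toy label lists are made up for the example.

```python
def segmentation_metrics(true_labels, pred_labels, num_classes):
    """Return (mean IoU, global pixelwise accuracy) for flat label sequences."""
    # confusion[t][p] counts pixels of true class t predicted as class p
    confusion = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(true_labels, pred_labels):
        confusion[t][p] += 1

    correct = sum(confusion[c][c] for c in range(num_classes))
    total = sum(sum(row) for row in confusion)

    ious = []
    for c in range(num_classes):
        true_count = sum(confusion[c])                  # pixels labeled c
        pred_count = sum(row[c] for row in confusion)   # pixels predicted c
        union = true_count + pred_count - confusion[c][c]
        if union > 0:  # skip classes absent from both labels and predictions
            ious.append(confusion[c][c] / union)

    return sum(ious) / len(ious), correct / total


# Toy data: six pixels, three classes
truth = [0, 0, 1, 1, 2, 2]
preds = [0, 1, 1, 1, 2, 0]
mean_iou, pixel_acc = segmentation_metrics(truth, preds, num_classes=3)
print(f"mean IoU = {mean_iou:.3f}, pixel accuracy = {pixel_acc:.3f}")
# mean IoU = 0.500, pixel accuracy = 0.667
```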
+> Compared with the fp32 FCN ResNet 50, FCN ResNet 50-int8's mean IoU drop ratio is 0.46%, its global pixelwise accuracy drop ratio is 0.10%, and its performance improvement is 1.28x.
+>
+> **Note**
+>
+> The performance depends on the test hardware. Performance data here was collected on an Intel® Xeon® Platinum 8280 Processor, 1s 4c per instance, CentOS Linux 8.3, with a data batch size of 1.
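The drop ratios quoted above follow from the validation table as a relative difference against the fp32 baseline. A quick check, where the helper name `drop_ratio` is illustrative:

```python
def drop_ratio(fp32_value, int8_value):
    """Relative drop of the int8 metric versus the fp32 baseline, in percent."""
    return (fp32_value - int8_value) / fp32_value * 100


print(f"mean IoU drop: {drop_ratio(65.0, 64.7):.2f}%")        # 0.46%
print(f"pixel accuracy drop: {drop_ratio(99.6, 99.5):.2f}%")  # 0.10%
```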
+<hr>
+## Quantization
+FCN ResNet 50-int8 and FCN ResNet-50-qdq are obtained by quantizing the fp32 FCN ResNet 50 model. We use [Intel® Neural Compressor](https://github.com/intel/neural-compressor) with the onnxruntime backend to perform quantization. See the [instructions](https://github.com/intel/neural-compressor/blob/master/examples/onnxrt/image_recognition/onnx_model_zoo/fcn/quantization/ptq/README.md) for how to use Intel® Neural Compressor for quantization.
+### Environment
+* onnx: 1.9.0
+* onnxruntime: 1.8.0
+### Prepare model
+```shell
+wget https://github.com/onnx/models/raw/main/vision/object_detection_segmentation/fcn/model/fcn-resnet50-12.onnx
+```
+### Model quantize
+Make sure to specify the appropriate dataset path in the configuration file.
+```bash
+# --input_model takes the model path as *.onnx
+bash run_tuning.sh --input_model=path/to/model \
+                   --config=fcn_rn50.yaml \
+                   --data_path=path/to/coco/val2017 \
+                   --label_path=path/to/coco/annotations/instances_val2017.json \
+                   --output_model=path/to/save
+```
+## References
+* Jonathan Long, Evan Shelhamer, Trevor Darrell; Fully Convolutional Networks for Semantic Segmentation; Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015, pp. 3431-3440
+* This model is converted from the [Torchvision Model Zoo](https://pytorch.org/docs/stable/torchvision/models.html), originally implemented by Francisco Massa [here](https://github.com/pytorch/vision/tree/master/torchvision/models/segmentation/fcn.py).
+* [Intel® Neural Compressor](https://github.com/intel/neural-compressor)
+## Contributors
+* [Jack Duvall](https://github.com/duvallj)
+* [mengniwang95](https://github.com/mengniwang95) (Intel)
+* [yuwenzho](https://github.com/yuwenzho) (Intel)
+* [airMeng](https://github.com/airMeng) (Intel)
+* [ftian1](https://github.com/ftian1) (Intel)
+* [hshen14](https://github.com/hshen14) (Intel)
+## License
+MIT License