Upload README.md with huggingface_hub
README.md
ADDED
@@ -0,0 +1,197 @@
---
language: en
license: apache-2.0
model_name: fcn-resnet50-12.onnx
tags:
- validated
- vision
- object_detection_segmentation
- fcn
---
<!--- SPDX-License-Identifier: MIT -->

# Fully Convolutional Network (FCN)

## Description
FCNs are neural networks for real-time, class-wise image segmentation. As the name implies, every weight layer in the network is convolutional. The final layer has the same height and width as the input image, which makes FCNs well suited to dense pixel-wise prediction without significant postprocessing. Being fully convolutional also lets the model handle a wide range of input resolutions.

This specific model detects 20 different [classes](dependencies/voc_classes.txt). The models were pre-trained on the COCO train2017 dataset, restricted to this class subset.

## Model

| Model              | Download                                 | Download (with sample test data)           | ONNX version | Opset version | Mean IoU |
|--------------------|:-----------------------------------------|:-------------------------------------------|:-------------|:--------------|:---------|
| FCN ResNet-50      | [134 MB](model/fcn-resnet50-11.onnx)     | [213 MB](model/fcn-resnet50-11.tar.gz)     | 1.8.0        | 11            | 60.5%    |
| FCN ResNet-101     | [207 MB](model/fcn-resnet101-11.onnx)    | [281 MB](model/fcn-resnet101-11.tar.gz)    | 1.8.0        | 11            | 63.7%    |
| FCN ResNet-50      | [134 MB](model/fcn-resnet50-12.onnx)     | [125 MB](model/fcn-resnet50-12.tar.gz)     | 1.8.0        | 12            | 65.0%    |
| FCN ResNet-50-int8 | [34 MB](model/fcn-resnet50-12-int8.onnx) | [29 MB](model/fcn-resnet50-12-int8.tar.gz) | 1.8.0        | 12            | 64.7%    |
| FCN ResNet-50-qdq  | [34 MB](model/fcn-resnet50-12-qdq.onnx)  | [21 MB](model/fcn-resnet50-12-qdq.tar.gz)  | 1.8.0        | 12            | 64.4%    |

### Source

* PyTorch Torchvision FCN ResNet50 ==> ONNX FCN ResNet50
* PyTorch Torchvision FCN ResNet101 ==> ONNX FCN ResNet101
* ONNX FCN ResNet50 ==> Quantized ONNX FCN ResNet50

## Inference

### Input
The input is a batch of images with shape `(N, 3, height, width)`, where `N` is the number of images in the batch and `height` and `width` are the same for every image in the batch.

### Preprocessing
The images must be loaded in RGB with a range of `[0, 1]` per channel, then normalized per channel using `mean = [0.485, 0.456, 0.406]` and `std = [0.229, 0.224, 0.225]`.

This model accepts images of different sizes. However, it is recommended to resize images so that the shorter edge is 520 pixels.

The following code shows an example of how to preprocess a [demo image](dependencies/000000017968.jpg):

```python
from PIL import Image
from torchvision import transforms

# Scale pixel values to [0, 1], then normalize each channel with the given mean and std.
preprocess = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

# Convert to RGB so that grayscale or RGBA files also yield three channels.
img = Image.open('dependencies/000000017968.jpg').convert('RGB')
# Result has shape (3, height, width); the model input also needs a leading batch dimension.
img_data = preprocess(img).detach().cpu().numpy()
```
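
Where torchvision is not available, the same preprocessing can be sketched in plain NumPy. The helper names `preprocess_numpy`, `make_batch`, and `resized_dims` are illustrative and do not appear in the original code; the mean, std, and the 520-pixel shorter edge are taken from the text above. The sketch also shows how same-sized images could be stacked into the `(N, 3, height, width)` batch layout.

```python
import numpy as np

# Per-channel statistics quoted in the Preprocessing section.
MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)

def resized_dims(width, height, shorter_edge=520):
    """Hypothetical helper: target (width, height) so that the shorter edge equals `shorter_edge`."""
    scale = shorter_edge / min(width, height)
    return round(width * scale), round(height * scale)

def preprocess_numpy(image_hwc_uint8):
    """Turn one RGB image of shape (H, W, 3), dtype uint8, into a (3, H, W) float32 array."""
    scaled = image_hwc_uint8.astype(np.float32) / 255.0   # range [0, 1]
    normalized = (scaled - MEAN) / STD                    # broadcasts over the channel axis
    return normalized.transpose(2, 0, 1)                  # HWC -> CHW

def make_batch(images):
    """Stack images of identical height and width into an (N, 3, H, W) batch."""
    return np.stack([preprocess_numpy(im) for im in images], axis=0)

# Two tiny synthetic images stand in for real photos.
demo_images = [
    np.zeros((4, 6, 3), dtype=np.uint8),
    np.full((4, 6, 3), 255, dtype=np.uint8),
]
batch = make_batch(demo_images)
print(batch.shape, batch.dtype)   # (2, 3, 4, 6) float32
print(resized_dims(640, 480))     # (693, 520)
```

The actual resize is left to the image library of your choice (for example `Image.resize` in PIL), with `resized_dims` only supplying the target size.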

### Output of model
The model has two outputs, `("out", "aux")`. `"out"` is the main classifier and has shape `(N, 21, height, width)`. Each output pixel holds one score per class, so `np.argmax(out[n, :, y, x])` is the predicted class of the pixel at row `y`, column `x` of image `n`. Class 0 is the background class.

`"aux"` is an auxiliary classifier with the same shape and the same function. The difference between the two is that `"out"` takes its features from the last layer of the ResNet backbone, while `"aux"` takes its features from the second-to-last layer.
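
As a rough illustration of how an `"out"` array could be turned into a label map, the sketch below takes the argmax over the class axis. The function name `decode_output` is hypothetical, and treating the scores as logits and applying a softmax to get a per-pixel confidence is an assumption: the text above does not state whether the scores are normalized.

```python
import numpy as np

def decode_output(out):
    """Map scores of shape (N, C, H, W) to labels (N, H, W) and a per-pixel confidence.

    Assumption: the scores behave like logits, so a softmax over the class
    axis gives values in [0, 1] that sum to 1 per pixel.
    """
    labels = np.argmax(out, axis=1).astype(np.uint8)
    shifted = out - out.max(axis=1, keepdims=True)     # subtract the max for numerical stability
    exp = np.exp(shifted)
    probs = exp / exp.sum(axis=1, keepdims=True)
    confidence = probs.max(axis=1)
    return labels, confidence

# Synthetic scores stand in for a real model output with 21 classes.
rng = np.random.default_rng(0)
fake_out = rng.normal(size=(1, 21, 4, 5)).astype(np.float32)
fake_out[0, 7, 2, 3] = 50.0                            # make class 7 win at row 2, column 3
labels, confidence = decode_output(fake_out)
print(labels.shape, labels[0, 2, 3])                   # (1, 4, 5) 7
```

Class index 0 in `labels` corresponds to the background, as described above.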

### Postprocessing steps

The following code shows how to overlay the segmentation on the original image:

```python
from PIL import Image
from matplotlib.colors import hsv_to_rgb
import numpy as np
import cv2


# One class name per line; the count must match the class axis of the model output.
with open('voc_classes.txt') as f:
    classes = [line.rstrip('\n') for line in f]
num_classes = len(classes)

def get_palette():
    # prepare and return palette
    palette = [0] * num_classes * 3

    for hue in range(num_classes):
        if hue == 0: # Background color
            colors = (0, 0, 0)
        else:
            colors = hsv_to_rgb((hue / num_classes, 0.75, 0.75))

        for i in range(3):
            palette[hue * 3 + i] = int(colors[i] * 255)

    return palette

def colorize(labels):
    # generate colorized image from output labels and color palette
    result_img = Image.fromarray(labels).convert('P', colors=num_classes)
    result_img.putpalette(get_palette())
    return np.array(result_img.convert('RGB'))

def visualize_output(image, output):
    # image: (height, width, 3) uint8 array; output: (num_classes, height, width) scores
    assert (image.shape[0] == output.shape[1] and
            image.shape[1] == output.shape[2])  # Same height and width
    assert output.shape[0] == num_classes

    # get classification labels
    raw_labels = np.argmax(output, axis=0).astype(np.uint8)

    # compute confidence score
    confidence = float(np.max(output, axis=0).mean())

    # generate segmented image
    result_img = colorize(raw_labels)

    # generate blended image (the channel order of `image` is reversed before blending)
    blended_img = cv2.addWeighted(image[:, :, ::-1], 0.5, result_img, 0.5, 0)

    result_img = Image.fromarray(result_img)
    blended_img = Image.fromarray(blended_img)

    return confidence, result_img, blended_img, raw_labels

# `orig_tensor` and `one_output` are not defined in this snippet: they are the original
# image array and the model output for that single image, shaped as described above.
conf, result_img, blended_img, raw_labels = visualize_output(orig_tensor, one_output)
```
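
The snippet above uses `orig_tensor` and `one_output` without defining them. One way they might be produced is sketched below with ONNX Runtime. The helper names `to_model_input` and `run_fcn` are hypothetical, the input name is read from the session instead of being assumed, and requesting the output by the name `"out"` relies on the output names given in the "Output of model" section. This is a sketch under those assumptions, not the conversion notebook's code.

```python
import numpy as np

MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)

def to_model_input(image_rgb_uint8):
    """Convert an RGB image (H, W, 3), uint8, into a contiguous (1, 3, H, W) float32 array."""
    normalized = (image_rgb_uint8.astype(np.float32) / 255.0 - MEAN) / STD
    chw = normalized.transpose(2, 0, 1)                # HWC -> CHW
    return np.ascontiguousarray(chw[np.newaxis])       # add the batch dimension

def run_fcn(model_path, image_rgb_uint8):
    """Run a single image through the model; returns (orig_tensor, one_output)."""
    # Imported here so that the helpers above can be used without onnxruntime installed.
    import onnxruntime as ort

    session = ort.InferenceSession(model_path, providers=["CPUExecutionProvider"])
    input_name = session.get_inputs()[0].name          # read the name instead of guessing it
    (out,) = session.run(["out"], {input_name: to_model_input(image_rgb_uint8)})
    return image_rgb_uint8, out[0]                     # out[0] has shape (21, H, W)

# Shape check on a synthetic image; no model file is needed for this part.
demo_input = to_model_input(np.zeros((8, 10, 3), dtype=np.uint8))
print(demo_input.shape)   # (1, 3, 8, 10)
```

With a downloaded model file, a call such as `orig_tensor, one_output = run_fcn("fcn-resnet50-12.onnx", np.asarray(img))` would then supply the two arguments expected by `visualize_output`, where `img` is an RGB PIL image.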

## Model Creation

### Dataset (Train and validation)
The FCN models were pretrained on the [COCO train2017 dataset](https://cocodataset.org/#download), using the subset of classes that also appear in Pascal VOC. See the [Torchvision Model Zoo](https://pytorch.org/docs/stable/torchvision/models.html) for more details.

### Training
Pretrained weights from the Torchvision Model Zoo were used instead of training these models from scratch. A [conversion notebook](dependencies/conversion.ipynb) is provided.

### Validation accuracy
Mean IoU (intersection over union) and global pixelwise accuracy are computed on the COCO val2017 dataset.
[Torchvision](https://pytorch.org/docs/stable/torchvision/models.html) reports these values as follows:

| Model          | mean IoU (%) | global pixelwise accuracy (%) |
|----------------|:-------------|:------------------------------|
| FCN ResNet 50  | 60.5         | 91.4                          |
| FCN ResNet 101 | 63.7         | 91.9                          |

If you have the [COCO val2017 dataset](https://cocodataset.org/#download) downloaded, you can confirm updated numbers using [the provided notebook](dependencies/validation_accuracy.ipynb):

| Model              | mean IoU (%) | global pixelwise accuracy (%) |
|--------------------|:-------------|:------------------------------|
| FCN ResNet 50      | 65.0         | 99.6                          |
| FCN ResNet 50-int8 | 64.7         | 99.5                          |
| FCN ResNet 101     | 66.7         | 99.6                          |

The more conservative of the two estimates is used in the model files table.
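
For readers unfamiliar with the two metrics, the sketch below computes them from a confusion matrix using the usual definitions: per-class IoU is true positives divided by the union of predicted and ground-truth pixels, and global pixelwise accuracy is the fraction of correctly labelled pixels. The function names are illustrative, and whether the validation notebook computes the metrics exactly this way is an assumption.

```python
import numpy as np

def confusion_matrix(true_labels, pred_labels, num_classes):
    """Count pixels per (true class, predicted class) pair."""
    flat = true_labels.astype(np.int64).ravel() * num_classes + pred_labels.astype(np.int64).ravel()
    return np.bincount(flat, minlength=num_classes ** 2).reshape(num_classes, num_classes)

def segmentation_metrics(cm):
    """Return (mean IoU, global pixelwise accuracy) for a confusion matrix."""
    true_positive = np.diag(cm).astype(np.float64)
    global_accuracy = true_positive.sum() / cm.sum()
    # union = ground-truth pixels + predicted pixels - pixels counted in both
    union = cm.sum(axis=1) + cm.sum(axis=0) - true_positive
    with np.errstate(divide="ignore", invalid="ignore"):
        iou = true_positive / union                    # NaN for classes that never occur
    return float(np.nanmean(iou)), float(global_accuracy)

# Tiny 2x3 example with three classes.
truth = np.array([[0, 0, 1], [1, 2, 2]])
prediction = np.array([[0, 1, 1], [1, 2, 0]])
mean_iou, accuracy = segmentation_metrics(confusion_matrix(truth, prediction, 3))
print(f"mean IoU = {mean_iou:.3f}, global accuracy = {accuracy:.3f}")
# mean IoU = 0.500, global accuracy = 0.667
```

In a full evaluation the confusion matrix would be accumulated over every validation image before the metrics are computed.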

> Compared with the fp32 FCN ResNet 50, the mean IoU drop ratio of FCN ResNet 50-int8 is 0.46%, its global pixelwise accuracy drop ratio is 0.10%, and its performance improvement is 1.28x.
>
> **Note**
>
> The performance depends on the test hardware. The performance data here was collected with an Intel® Xeon® Platinum 8280 Processor, 1s 4c per instance, CentOS Linux 8.3, and a data batch size of 1.
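
The quoted drop ratios can be reproduced from the validation table if "drop ratio" is read as the relative decrease with respect to the fp32 value. That reading is an assumption, since the text does not define the term, and `relative_drop_percent` is a name chosen for this sketch.

```python
def relative_drop_percent(fp32_value, int8_value):
    """Relative decrease of the int8 metric with respect to the fp32 metric, in percent."""
    return (fp32_value - int8_value) / fp32_value * 100.0

# Values from the validation table: mean IoU 65.0 vs 64.7, accuracy 99.6 vs 99.5.
miou_drop = relative_drop_percent(65.0, 64.7)
accuracy_drop = relative_drop_percent(99.6, 99.5)
print(f"{miou_drop:.2f}% {accuracy_drop:.2f}%")   # 0.46% 0.10%
```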
<hr>

## Quantization
FCN ResNet 50-int8 and FCN ResNet-50-qdq are obtained by quantizing the fp32 FCN ResNet 50 model. We use [Intel® Neural Compressor](https://github.com/intel/neural-compressor) with the onnxruntime backend to perform quantization. See the [instructions](https://github.com/intel/neural-compressor/blob/master/examples/onnxrt/image_recognition/onnx_model_zoo/fcn/quantization/ptq/README.md) to understand how to use Intel® Neural Compressor for quantization.
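
To give a feel for what 8-bit quantization does to a tensor, the sketch below applies the affine scheme used by the ONNX `QuantizeLinear` and `DequantizeLinear` operators: `q = saturate(round(x / scale) + zero_point)` and `x' = (q - zero_point) * scale`. It is a minimal NumPy illustration with hypothetical helper names and a simple min/max range, not the calibration or tuning procedure that Intel® Neural Compressor actually runs.

```python
import numpy as np

def quantize_params(x, qmin=0, qmax=255):
    """Pick a scale and zero point from the tensor's min/max range (extended to include 0)."""
    low = min(float(x.min()), 0.0)
    high = max(float(x.max()), 0.0)
    scale = (high - low) / (qmax - qmin) or 1.0        # fall back to 1.0 for an all-zero tensor
    zero_point = int(round(qmin - low / scale))
    return scale, zero_point

def quantize(x, scale, zero_point, qmin=0, qmax=255):
    """Map float values to uint8: round, shift by the zero point, then saturate."""
    q = np.round(x / scale) + zero_point
    return np.clip(q, qmin, qmax).astype(np.uint8)

def dequantize(q, scale, zero_point):
    """Map uint8 values back to approximate float values."""
    return (q.astype(np.float32) - zero_point) * scale

weights = np.linspace(-1.0, 1.0, 11, dtype=np.float32)
scale, zero_point = quantize_params(weights)
restored = dequantize(quantize(weights, scale, zero_point), scale, zero_point)
max_error = float(np.abs(weights - restored).max())
print(f"scale = {scale:.5f}, max round-trip error = {max_error:.5f}")
```

The round-trip error stays within one quantization step (`scale`), which is the kind of small perturbation behind the modest accuracy differences reported for the int8 model.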

### Environment
* onnx: 1.9.0
* onnxruntime: 1.8.0

### Prepare model
```shell
wget https://github.com/onnx/models/raw/main/vision/object_detection_segmentation/fcn/model/fcn-resnet50-12.onnx
```

### Model quantize
Make sure to specify the appropriate dataset path in the configuration file.
```bash
# --input_model takes the model path as *.onnx
bash run_tuning.sh --input_model=path/to/model \
                   --config=fcn_rn50.yaml \
                   --data_path=path/to/coco/val2017 \
                   --label_path=path/to/coco/annotations/instances_val2017.json \
                   --output_model=path/to/save
```

## References
* Jonathan Long, Evan Shelhamer, Trevor Darrell. "Fully Convolutional Networks for Semantic Segmentation." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015, pp. 3431-3440.

* This model is converted from the [Torchvision Model Zoo](https://pytorch.org/docs/stable/torchvision/models.html), originally implemented by Francisco Massa [here](https://github.com/pytorch/vision/tree/master/torchvision/models/segmentation/fcn.py).

* [Intel® Neural Compressor](https://github.com/intel/neural-compressor)

## Contributors
* [Jack Duvall](https://github.com/duvallj)
* [mengniwang95](https://github.com/mengniwang95) (Intel)
* [yuwenzho](https://github.com/yuwenzho) (Intel)
* [airMeng](https://github.com/airMeng) (Intel)
* [ftian1](https://github.com/ftian1) (Intel)
* [hshen14](https://github.com/hshen14) (Intel)

## License
MIT License