Committed by jemartin
Commit 0ecfb88 · verified · 1 parent: 73c846f

Upload README.md with huggingface_hub

Files changed (1): README.md (added, +197 -0)
 
---
language: en
license: apache-2.0
model_name: fcn-resnet50-12.onnx
tags:
- validated
- vision
- object_detection_segmentation
- fcn
---
<!--- SPDX-License-Identifier: MIT -->

# Fully Convolutional Network (FCN)

## Description
FCNs are real-time neural network models for class-wise image segmentation. As the name implies, every weight layer in the network is convolutional. The final layer has the same height and width as the input image, which makes FCNs well suited to dense, pixel-wise prediction without much postprocessing. Being fully convolutional also lets the model handle a wide range of input resolutions.

These models detect 20 different [classes](dependencies/voc_classes.txt), plus a background class. They were pre-trained on the COCO train2017 dataset, restricted to this subset of classes.

## Model

| Model | Download | Download (with sample test data) | ONNX version | Opset version | Mean IoU |
|----------------|:--------------------------------------|:----------------------------------------|:-------------|:--------------|:--------|
| FCN ResNet-50 | [134 MB](model/fcn-resnet50-11.onnx) | [213 MB](model/fcn-resnet50-11.tar.gz) | 1.8.0 | 11 | 60.5% |
| FCN ResNet-101 | [207 MB](model/fcn-resnet101-11.onnx) | [281 MB](model/fcn-resnet101-11.tar.gz) | 1.8.0 | 11 | 63.7% |
| FCN ResNet-50 | [134 MB](model/fcn-resnet50-12.onnx) | [125 MB](model/fcn-resnet50-12.tar.gz) | 1.8.0 | 12 | 65.0% |
| FCN ResNet-50-int8 | [34 MB](model/fcn-resnet50-12-int8.onnx) | [29 MB](model/fcn-resnet50-12-int8.tar.gz) | 1.8.0 | 12 | 64.7% |
| FCN ResNet-50-qdq | [34 MB](model/fcn-resnet50-12-qdq.onnx) | [21 MB](model/fcn-resnet50-12-qdq.tar.gz) | 1.8.0 | 12 | 64.4% |

### Source

* PyTorch Torchvision FCN ResNet50 ==> ONNX FCN ResNet50
* PyTorch Torchvision FCN ResNet101 ==> ONNX FCN ResNet101
* ONNX FCN ResNet50 ==> Quantized ONNX FCN ResNet50

## Inference

### Input
The input is a batch of images with shape `(N, 3, height, width)`, where `N` is the number of images in the batch. All images in a batch must have the same `height` and `width`.

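Here is a minimal NumPy sketch of assembling such a batch. It assumes each image has already been preprocessed into a `(3, height, width)` float32 array. `make_batch` is a hypothetical helper, not part of this repository:

```python
import numpy as np


def make_batch(images):
    """Stack preprocessed (3, H, W) arrays into one (N, 3, H, W) float32 batch.

    Hypothetical helper: raises ValueError unless every image has the same
    3-channel shape, because the model expects a uniform batch.
    """
    shapes = {img.shape for img in images}
    if len(shapes) != 1:
        raise ValueError(f"all images must have the same shape, got {sorted(shapes)}")
    (shape,) = shapes
    if len(shape) != 3 or shape[0] != 3:
        raise ValueError(f"expected (3, H, W) arrays, got {shape}")
    return np.stack(images, axis=0).astype(np.float32)


batch = make_batch([np.zeros((3, 520, 693), np.float32)] * 2)
print(batch.shape)  # (2, 3, 520, 693)
```

Images of different sizes must either be resized to a common size first or be sent through the model one at a time.
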
### Preprocessing
The images must be loaded in RGB with values in the range `[0, 1]` per channel, then normalized per channel using `mean = [0.485, 0.456, 0.406]` and `std = [0.229, 0.224, 0.225]`.

The model accepts images of different sizes. However, it is recommended to resize each image so that its shorter edge is 520 pixels.

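The resize recommendation amounts to scaling both edges by `520 / shorter_edge`. Below is a minimal sketch of that arithmetic. `shorter_edge_size` is a hypothetical helper, and truncating the longer edge with `int()` is an assumption; other resize code may round instead:

```python
def shorter_edge_size(width, height, target=520):
    """Return (new_width, new_height) so that the shorter edge equals `target`.

    The longer edge is scaled by the same factor and truncated to an int,
    which preserves the aspect ratio as closely as whole pixels allow.
    """
    if width <= height:
        return target, int(target * height / width)
    return int(target * width / height), target


print(shorter_edge_size(640, 480))  # (693, 520)
```

With Pillow, you can pass the returned pair to `img.resize((new_width, new_height))`. Given a single integer, `torchvision.transforms.Resize(520)` also resizes the shorter edge to 520.
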
The following code shows how to preprocess a [demo image](dependencies/000000017968.jpg):

```python
from PIL import Image
from torchvision import transforms

# ToTensor converts an HWC uint8 image to a CHW float tensor in [0, 1];
# Normalize then applies the per-channel mean and std
preprocess = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

# Force RGB so that grayscale or RGBA files still yield 3 channels
img = Image.open('dependencies/000000017968.jpg').convert('RGB')
# Add the batch dimension the model expects: (3, H, W) -> (1, 3, H, W)
img_data = preprocess(img).unsqueeze(0).numpy()
```

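Without PyTorch, you can reproduce the same preprocessing with NumPy alone. This sketch assumes the image is already an `(H, W, 3)` RGB `uint8` array, for example `np.asarray(Image.open(path).convert('RGB'))`. `preprocess_numpy` is a hypothetical name, and its output should match the torchvision version up to floating-point rounding:

```python
import numpy as np

MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)


def preprocess_numpy(rgb_uint8):
    """Convert an (H, W, 3) uint8 RGB image into a (1, 3, H, W) float32 batch."""
    x = rgb_uint8.astype(np.float32) / 255.0  # scale to [0, 1]
    x = (x - MEAN) / STD                      # per-channel normalization, broadcast over H and W
    x = x.transpose(2, 0, 1)                  # HWC -> CHW
    # Add the batch dimension and make the memory layout C-contiguous after the transpose
    return np.ascontiguousarray(x[np.newaxis, ...])


img = np.full((4, 6, 3), 255, dtype=np.uint8)  # tiny all-white test image
batch = preprocess_numpy(img)
print(batch.shape)  # (1, 3, 4, 6)
```
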
### Output of model
The model has two outputs, `("out", "aux")`. `"out"` is the main classifier output and has shape `(N, 21, height, width)`. Each pixel holds one raw (unnormalized) score per class rather than a one-hot vector, so `np.argmax(out[image, :, y, x])` is that pixel's predicted class. Class 0 is the background class.

`"aux"` is an auxiliary classifier output with the same shape and the same purpose. The two differ in where their features come from: `"out"` uses the last layer of the ResNet backbone, while `"aux"` uses the second-to-last layer.

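Here is a minimal NumPy sketch that turns the `"out"` tensor into per-pixel labels and probabilities. It assumes `out` has already been fetched from the model as an `(N, 21, height, width)` array; a random array stands in for it here. The softmax step is an illustrative addition, since the raw scores are not probabilities:

```python
import numpy as np


def labels_and_probs(out):
    """Return (labels, probs) for an (N, C, H, W) array of raw class scores.

    labels: (N, H, W) argmax over the class axis.
    probs:  (N, C, H, W) softmax over the class axis, summing to 1 per pixel.
    """
    labels = out.argmax(axis=1)
    shifted = out - out.max(axis=1, keepdims=True)  # subtract the max for numerical stability
    exp = np.exp(shifted)
    probs = exp / exp.sum(axis=1, keepdims=True)
    return labels, probs


rng = np.random.default_rng(0)
out = rng.standard_normal((2, 21, 8, 8)).astype(np.float32)  # stand-in for the model's "out"
labels, probs = labels_and_probs(out)
print(labels.shape, probs.shape)  # (2, 8, 8) (2, 21, 8, 8)
```
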
### Postprocessing steps

The following code shows how to overlay the segmentation on the original image:

```python
from PIL import Image
from matplotlib.colors import hsv_to_rgb
import numpy as np
import cv2
import onnxruntime as ort


# voc_classes.txt lists one class name per line, starting with the background class
with open('dependencies/voc_classes.txt') as f:
    classes = [line.rstrip('\n') for line in f]
num_classes = len(classes)

def get_palette():
    # Prepare and return a flat [R, G, B, R, G, B, ...] palette, one color per class
    palette = [0] * num_classes * 3

    for hue in range(num_classes):
        if hue == 0:  # Background color
            colors = (0, 0, 0)
        else:
            colors = hsv_to_rgb((hue / num_classes, 0.75, 0.75))

        for i in range(3):
            palette[hue * 3 + i] = int(colors[i] * 255)

    return palette

def colorize(labels):
    # Generate a colorized RGB image from the label map: each pixel value
    # is used as an index into the class color palette
    result_img = Image.fromarray(labels).convert('P')
    result_img.putpalette(get_palette())
    return np.array(result_img.convert('RGB'))

def visualize_output(image, output):
    # image: (H, W, 3) uint8 RGB array; output: (num_classes, H, W) scores for one image
    assert image.shape[0] == output.shape[1] and \
        image.shape[1] == output.shape[2]  # Same height and width
    assert output.shape[0] == num_classes

    # Get classification labels
    raw_labels = np.argmax(output, axis=0).astype(np.uint8)

    # Compute a confidence score: the mean over pixels of the highest class score
    # (these are raw, unnormalized scores, not probabilities)
    confidence = float(np.max(output, axis=0).mean())

    # Generate segmented image
    result_img = colorize(raw_labels)

    # Generate blended image (both inputs are RGB uint8 arrays of the same size)
    blended_img = cv2.addWeighted(image, 0.5, result_img, 0.5, 0)

    result_img = Image.fromarray(result_img)
    blended_img = Image.fromarray(blended_img)

    return confidence, result_img, blended_img, raw_labels

# Run the model on `img_data` from the preprocessing example above
# (the model path matches the fcn-resnet50-12 link in the model table)
session = ort.InferenceSession('model/fcn-resnet50-12.onnx', providers=['CPUExecutionProvider'])
out, aux = session.run(None, {session.get_inputs()[0].name: img_data})
one_output = out[0]  # "out" scores for the first image in the batch

# The original image as an RGB array, the same size as the model input
orig_img = np.array(Image.open('dependencies/000000017968.jpg').convert('RGB'))
conf, result_img, blended_img, raw_labels = visualize_output(orig_img, one_output)
```

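The overlay requires the image and the model output to share the same height and width. If the input was resized (for example to a 520-pixel shorter edge), one option is to map the predicted label map back to the original resolution with nearest-neighbor sampling. This method never produces a class index that was not predicted. Here is a minimal NumPy sketch; `resize_labels_nearest` is a hypothetical helper:

```python
import numpy as np


def resize_labels_nearest(labels, out_height, out_width):
    """Nearest-neighbor resize of an (H, W) integer label map.

    Each output pixel copies the label of the input pixel that contains
    the output pixel's center, so only class indices present in `labels`
    can appear in the result.
    """
    in_height, in_width = labels.shape
    rows = ((np.arange(out_height) + 0.5) * in_height / out_height).astype(int)
    cols = ((np.arange(out_width) + 0.5) * in_width / out_width).astype(int)
    rows = np.clip(rows, 0, in_height - 1)
    cols = np.clip(cols, 0, in_width - 1)
    return labels[rows[:, None], cols[None, :]]


small = np.array([[0, 1],
                  [2, 3]], dtype=np.uint8)
print(resize_labels_nearest(small, 4, 4))
```

Alternatively, resize the original image to the model's input size before calling `visualize_output`.
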
## Model Creation

### Dataset (Train and validation)
The FCN models were pretrained on the [COCO train2017 dataset](https://cocodataset.org/#download), using only the subset of classes that also appear in Pascal VOC. See the [Torchvision Model Zoo](https://pytorch.org/docs/stable/torchvision/models.html) for more details.

### Training
These models were not trained from scratch; they use pretrained weights from the Torchvision Model Zoo. A [conversion notebook](dependencies/conversion.ipynb) is provided.

### Validation accuracy
Mean IoU (intersection over union) and global pixelwise accuracy are computed on the COCO val2017 dataset.
[Torchvision](https://pytorch.org/docs/stable/torchvision/models.html) reports these values as follows:

| Model | mean IoU (%) | global pixelwise accuracy (%) |
|----------------|:-------------|:------------------------------|
| FCN ResNet 50 | 60.5 | 91.4 |
| FCN ResNet 101 | 63.7 | 91.9 |

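Both metrics can be derived from a per-class confusion matrix. The NumPy sketch below uses the usual definitions. Per-class IoU is TP / (TP + FP + FN), mean IoU averages it over classes, and global pixelwise accuracy is the fraction of correctly labeled pixels. This is an illustration, not the code in the validation notebook. It assumes every label lies in `[0, num_classes)` (no ignore index) and skips classes absent from both prediction and ground truth; the notebook may handle both cases differently:

```python
import numpy as np


def confusion_matrix(gt, pred, num_classes):
    """Count pixels: rows are ground-truth classes, columns are predicted classes."""
    idx = num_classes * gt.astype(np.int64).ravel() + pred.astype(np.int64).ravel()
    return np.bincount(idx, minlength=num_classes ** 2).reshape(num_classes, num_classes)


def segmentation_metrics(cm):
    """Return (mean IoU, global pixelwise accuracy) from a confusion matrix."""
    tp = np.diag(cm).astype(np.float64)
    union = cm.sum(axis=0) + cm.sum(axis=1) - tp  # TP + FP + FN per class
    valid = union > 0                             # skip classes absent from both
    iou = tp[valid] / union[valid]
    pixel_acc = tp.sum() / cm.sum()
    return iou.mean(), pixel_acc


gt = np.array([[0, 0, 1], [1, 2, 2]])
pred = np.array([[0, 1, 1], [1, 2, 0]])
miou, acc = segmentation_metrics(confusion_matrix(gt, pred, num_classes=3))
print(f"mean IoU = {miou:.3f}, pixel accuracy = {acc:.3f}")  # mean IoU = 0.500, pixel accuracy = 0.667
```

Over a whole dataset, the per-image confusion matrices are summed before the metrics are computed.
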
If you have downloaded the [COCO val2017 dataset](https://cocodataset.org/#download), you can compute updated numbers with [the provided notebook](dependencies/validation_accuracy.ipynb):

| Model | mean IoU (%) | global pixelwise accuracy (%) |
|--------------------|:-------------|:------------------------------|
| FCN ResNet 50 | 65.0 | 99.6 |
| FCN ResNet 50-int8 | 64.7 | 99.5 |
| FCN ResNet 101 | 66.7 | 99.6 |

The model files table uses the more conservative of the two estimates.

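The drop ratios in the note that follows are relative differences, (fp32 - int8) / fp32, computed from the table above. This is a quick check of that arithmetic; `relative_drop` is an illustrative helper:

```python
def relative_drop(fp32_value, int8_value):
    """Relative drop of an int8 metric versus its fp32 baseline, in percent."""
    return (fp32_value - int8_value) / fp32_value * 100


print(f"mean IoU drop: {relative_drop(65.0, 64.7):.2f}%")        # mean IoU drop: 0.46%
print(f"pixel accuracy drop: {relative_drop(99.6, 99.5):.2f}%")  # pixel accuracy drop: 0.10%
```
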
> Compared with the fp32 FCN ResNet 50, FCN ResNet 50-int8 shows a relative mean IoU drop of 0.46%, a relative global pixelwise accuracy drop of 0.10%, and a 1.28x performance improvement.
>
> **Note**
>
> Performance depends on the test hardware. The performance data here was collected on an Intel® Xeon® Platinum 8280 processor with 1 socket and 4 cores per instance, running CentOS Linux 8.3, with a batch size of 1.

<hr>

## Quantization
FCN ResNet 50-int8 and FCN ResNet 50-qdq are obtained by quantizing the fp32 FCN ResNet 50 model. We use [Intel® Neural Compressor](https://github.com/intel/neural-compressor) with the ONNX Runtime backend to perform quantization. See the [instructions](https://github.com/intel/neural-compressor/blob/master/examples/onnxrt/image_recognition/onnx_model_zoo/fcn/quantization/ptq/README.md) for how to use Intel® Neural Compressor for quantization.

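The `-qdq` model represents quantization with explicit ONNX `QuantizeLinear`/`DequantizeLinear` node pairs (the QDQ format). The NumPy sketch below illustrates what those two operators compute for `uint8`. Following the ONNX operator definitions, it divides by the scale, rounds half to even, adds the zero point, and saturates. The scale and zero point are made-up example values, not the parameters Intel® Neural Compressor calibrates for this model, and the model's actual integer type may differ:

```python
import numpy as np


def quantize_linear(x, scale, zero_point):
    """uint8 QuantizeLinear: rint(x / scale) + zero_point, saturated to [0, 255].

    np.rint rounds halves to the nearest even integer, matching the
    round-half-to-even rule in the ONNX definition.
    """
    q = np.rint(x / scale) + zero_point
    return np.clip(q, 0, 255).astype(np.uint8)


def dequantize_linear(q, scale, zero_point):
    """DequantizeLinear: (q - zero_point) * scale, returned as float32."""
    return ((q.astype(np.int32) - zero_point) * scale).astype(np.float32)


# Illustrative parameters covering roughly [-1, 1)
scale, zero_point = 1 / 128, 128
x = np.array([-1.0, -0.5, 0.0, 0.3, 0.99])
q = quantize_linear(x, scale, zero_point)
x_hat = dequantize_linear(q, scale, zero_point)
print(q)                        # quantized uint8 codes
print(np.abs(x - x_hat).max())  # round-trip error, at most scale / 2 inside the range
```

The round-trip error shows how much precision each tensor loses. Calibration tools choose `scale` and `zero_point` to keep this error small over the observed value range.
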
### Environment
* onnx: 1.9.0
* onnxruntime: 1.8.0

### Prepare model
```shell
wget https://github.com/onnx/models/raw/main/vision/object_detection_segmentation/fcn/model/fcn-resnet50-12.onnx
```

### Quantize model
Make sure to set the appropriate dataset path in the configuration file.
```bash
# --input_model: path to the fp32 *.onnx model
bash run_tuning.sh --input_model=path/to/model \
                   --config=fcn_rn50.yaml \
                   --data_path=path/to/coco/val2017 \
                   --label_path=path/to/coco/annotations/instances_val2017.json \
                   --output_model=path/to/save
```

## References
* Jonathan Long, Evan Shelhamer, Trevor Darrell. "Fully Convolutional Networks for Semantic Segmentation." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015, pp. 3431-3440.

* This model is converted from the [Torchvision Model Zoo](https://pytorch.org/docs/stable/torchvision/models.html), originally implemented by Francisco Massa [here](https://github.com/pytorch/vision/tree/master/torchvision/models/segmentation/fcn.py).

* [Intel® Neural Compressor](https://github.com/intel/neural-compressor)

## Contributors
* [Jack Duvall](https://github.com/duvallj)
* [mengniwang95](https://github.com/mengniwang95) (Intel)
* [yuwenzho](https://github.com/yuwenzho) (Intel)
* [airMeng](https://github.com/airMeng) (Intel)
* [ftian1](https://github.com/ftian1) (Intel)
* [hshen14](https://github.com/hshen14) (Intel)

## License
MIT License