[Object Detection] R-CNN, SPPNet, Fast R-CNN, Faster R-CNN | 2019.07.30
1. R-CNN
2. SPPNet
3. Fast R-CNN
4. Faster R-CNN
[Concepts]
Bounding box regressor : A network that regresses box coordinates (X, Y, W, H) so that predicted bounding boxes move closer to the ground-truth bounding boxes.
Spatial Pyramid Pooling : Extends the BoW (Bag-of-Words) idea by pooling over a pyramid of spatial bins, so that spatial layout is taken into account. The SPP layer applies max pooling within each bin and concatenates the results, which gives a fixed-length output for any input size.
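To make the fixed-length property concrete, here is a minimal sketch of spatial pyramid pooling on a single-channel feature map stored as a list of rows. The function names and the pyramid levels (1, 2, 4) are assumptions chosen for illustration, not values taken from this post.

```python
def max_pool_bins(fmap, n):
    """Max-pool a 2D feature map (list of rows) into an n x n grid of bins."""
    h, w = len(fmap), len(fmap[0])
    out = []
    for i in range(n):
        # Bin start is floored and bin end is ceiled, so bins are never empty.
        r0, r1 = (i * h) // n, -((-(i + 1) * h) // n)
        for j in range(n):
            c0, c1 = (j * w) // n, -((-(j + 1) * w) // n)
            out.append(max(fmap[r][c]
                           for r in range(r0, r1)
                           for c in range(c0, c1)))
    return out


def spatial_pyramid_pool(fmap, levels=(1, 2, 4)):
    """Concatenate the pooled bins of every pyramid level into one vector."""
    out = []
    for n in levels:
        out.extend(max_pool_bins(fmap, n))
    return out


# Feature maps of different sizes produce vectors of the same length:
# 1*1 + 2*2 + 4*4 = 21 values per channel with the default levels.
small = [[r * 6 + c for c in range(6)] for r in range(5)]
large = [[r * 13 + c for c in range(13)] for r in range(9)]
print(len(spatial_pyramid_pool(small)), len(spatial_pyramid_pool(large)))
```

In a real network the same pooling is applied to every channel, so the output length is 21 times the number of channels under these assumed levels.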
[R-CNN]
1. Extract region proposals with selective search (~2K proposals)
2. Resize (warp) each proposal to a fixed size
3. Apply the CNN to each proposal
4. Classify the CNN features with one binary SVM per class, and refine the box with a bounding box regressor
Pros :
1. Huge improvement over previous object detection approaches
Cons :
1. Too much computation, since the CNN is applied separately to every proposed region.
2. Distortion, since every proposal is resized to the same fixed size regardless of its aspect ratio.
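The bounding box regressor in step 4 is usually trained on offsets between a proposal and its ground-truth box rather than on raw coordinates. The sketch below shows one common encoding, assuming (X, Y) is the box center and (W, H) its size; the function names `encode_box` and `decode_box` are hypothetical.

```python
import math


def encode_box(proposal, gt):
    """Regression targets that map a proposal onto a ground-truth box.

    Both boxes are (x_center, y_center, width, height).
    """
    px, py, pw, ph = proposal
    gx, gy, gw, gh = gt
    return (
        (gx - px) / pw,       # center shift, scaled by proposal width
        (gy - py) / ph,       # center shift, scaled by proposal height
        math.log(gw / pw),    # log-space width ratio
        math.log(gh / ph),    # log-space height ratio
    )


def decode_box(proposal, t):
    """Apply predicted offsets t to a proposal (inverse of encode_box)."""
    px, py, pw, ph = proposal
    tx, ty, tw, th = t
    return (px + pw * tx, py + ph * ty, pw * math.exp(tw), ph * math.exp(th))


proposal = (50.0, 50.0, 100.0, 100.0)
ground_truth = (60.0, 40.0, 200.0, 50.0)
targets = encode_box(proposal, ground_truth)
print(targets)
print(decode_box(proposal, targets))  # recovers the ground-truth box
```

Scaling the shifts by the proposal size and using log ratios keeps the targets in a similar numeric range for small and large boxes.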
[SPPNet]
1. Apply the CNN once to the entire image
2. Extract region proposals with selective search
3. On the CNN feature maps, apply Spatial Pyramid Pooling (SPP) to each proposed region to get a fixed-size representation
4. Apply fully connected (FC) layers, then per-class SVM classifiers and a bounding box regressor, as in R-CNN
Pros :
1. Much less computation than R-CNN, since the CNN is applied only once per image.
2. Avoids the distortion caused by resizing, since Spatial Pyramid Pooling accepts regions of any size.
Cons :
1. Still a multi-stage pipeline, and region proposal with selective search remains slow.
[Fast R-CNN]
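Similar to SPPNet, except that Fast R-CNN replaces the multi-level SPP layer with ROI pooling, which is an SPP layer with a single pyramid level. The classifier (softmax) and the bounding box regressor are trained together in one stage.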
[ROI Pooling]
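ROI pooling can be sketched as: project the region from image coordinates onto the feature map, split it into a fixed grid, and max-pool each cell. This is a simplified single-channel illustration; the stride of 16 and the 2 x 2 output grid are assumptions chosen to keep the example small, and `roi_pool` is a hypothetical name.

```python
def roi_pool(fmap, roi, out_size=(2, 2), stride=16):
    """Max-pool one region of a 2D feature map into a fixed out_h x out_w grid.

    roi is (x1, y1, x2, y2) in image pixels; stride is the assumed ratio
    between image resolution and feature-map resolution.
    """
    h, w = len(fmap), len(fmap[0])
    # Project the ROI onto feature-map cells (rounded to the nearest cell).
    x1, y1, x2, y2 = (int(round(v / stride)) for v in roi)
    # Clamp to the map and make sure the region covers at least one cell.
    x1, y1 = min(max(x1, 0), w - 1), min(max(y1, 0), h - 1)
    x2, y2 = min(max(x2, x1), w - 1), min(max(y2, y1), h - 1)
    rh, rw = y2 - y1 + 1, x2 - x1 + 1

    out_h, out_w = out_size
    pooled = []
    for i in range(out_h):
        # Floor the cell start and ceil the cell end so no cell is empty.
        r0 = y1 + (i * rh) // out_h
        r1 = y1 - ((-(i + 1) * rh) // out_h)
        row = []
        for j in range(out_w):
            c0 = x1 + (j * rw) // out_w
            c1 = x1 - ((-(j + 1) * rw) // out_w)
            row.append(max(fmap[r][c]
                           for r in range(r0, r1)
                           for c in range(c0, c1)))
        pooled.append(row)
    return pooled


fmap = [[r * 8 + c for c in range(8)] for r in range(8)]
print(roi_pool(fmap, (0, 0, 48, 48)))      # [[9, 11], [25, 27]]
print(roi_pool(fmap, (32, 16, 112, 64)))   # [[20, 23], [36, 39]]
```

Both regions have different sizes, yet each yields a 2 x 2 output, which is what lets the following fully connected layers take proposals of any shape.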
[Comparison between R-CNN and Fast R-CNN (SPPNet)]
[Faster R-CNN]
1. Apply the CNN once to the entire image
2. Apply the RPN (Region Proposal Network) to the feature maps. At each sliding-window position it uses k anchors to cover objects of different sizes and shapes (base size 16, scale=[8,16,32], ratio=[0.5, 1, 2], so k = 9) and maps the window to a fixed-length vector.
3. Two sibling layers on top of that vector, one for classification and one for regression, produce 2k scores (foreground / background) and 4k coordinates (X, Y, W, H) per position, from which the RPN classification and regression losses are computed.
4. Using the RPN's region proposals, apply ROI pooling to the shared feature maps and feed the result to a second set of classification and regression layers.
At first, the authors used 'alternating optimization' to train the object detection network, training the RPN and the detector in turn rather than jointly. Later, joint training was shown to work as well.
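The anchor settings in step 2 can be turned into concrete boxes as follows. This is a simplified sketch: it treats ratio as height / width (an assumption), skips the integer rounding that reference implementations may apply, and uses the hypothetical names `generate_anchors` and `shift_anchors`.

```python
import math


def generate_anchors(base_size=16, scales=(8, 16, 32), ratios=(0.5, 1, 2)):
    """Return k = len(ratios) * len(scales) base anchors as (cx, cy, w, h).

    Each anchor keeps the area (base_size * scale) ** 2 while its shape
    follows ratio = h / w (assumed convention).
    """
    anchors = []
    for ratio in ratios:
        for scale in scales:
            area = (base_size * scale) ** 2
            w = math.sqrt(area / ratio)
            h = w * ratio
            anchors.append((0.0, 0.0, w, h))
    return anchors


def shift_anchors(anchors, feat_h, feat_w, stride=16):
    """Copy the k base anchors to every feature-map position."""
    shifted = []
    for y in range(feat_h):
        for x in range(feat_w):
            # Center of this feature-map cell in image coordinates.
            cx, cy = (x + 0.5) * stride, (y + 0.5) * stride
            for _, _, w, h in anchors:
                shifted.append((cx, cy, w, h))
    return shifted


base = generate_anchors()
k = len(base)
print(k, 2 * k, 4 * k)                  # 9 anchors -> 18 scores, 36 coordinates
print(len(shift_anchors(base, 2, 3)))   # 2 * 3 positions * 9 anchors = 54
```

With k = 9, the two RPN output layers in step 3 produce 18 scores and 36 coordinates at every feature-map position.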
[Performance]
Non-Maximum Suppression | 2019.07.24
Non-Maximum Suppression (NMS) is used in object detection to remove redundant bounding boxes, so that each detected object keeps a single box.
[Algorithm]
For the bounding boxes detected for each object class,
1. Sort the bounding boxes in descending order of confidence score
2. Starting from the highest-scoring box, calculate its IoU (Intersection over Union) with every lower-scoring box and discard those whose IoU is higher than the threshold (a hyperparameter); then repeat with the next remaining box
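The two steps above can be sketched in plain Python. The boxes here are given as corner coordinates (x1, y1, x2, y2), which is an assumption made to keep the IoU computation short; the default threshold of 0.5 is also only illustrative.

```python
def iou(a, b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Return the indices of the boxes kept, highest score first."""
    # Step 1: sort box indices by confidence score, descending.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        # Step 2: keep the best remaining box, drop boxes that overlap it.
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order
                 if iou(boxes[best], boxes[i]) <= iou_threshold]
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # [0, 2]
```

The second box overlaps the first with an IoU of 81 / 119, about 0.68, so it is discarded, while the third box does not overlap at all and is kept.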