Posts in the 'Deep Learning/Computer Vision' category

Deep Learning/Computer Vision

Mask R-CNN presentation 2019.08.12
[Object Detection] R-CNN, SPPNet, Fast R-CNN, Faster R-CNN 2019.07.30
Non-Maximum Suppression 2019.07.24

Mask R-CNN presentation

2019. 8. 12. 14:09

[Object Detection] R-CNN, SPPNet, Fast R-CNN, Faster R-CNN

2019. 7. 30. 17:38

1. R-CNN

2. SPPNet

3. Fast R-CNN

4. Faster R-CNN

[Concepts]

Bounding box regressor : A network that regresses the coordinates (X, Y, W, H) of a rectangular bounding box so that the predicted box matches the ground-truth bounding box.
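As a rough sketch of what the regressor learns, the targets below follow the parameterization from the R-CNN paper: center offsets scaled by the proposal size, and log ratios for width and height. This parameterization is not spelled out in this post, and treating (X, Y) as the box center is an assumption. The function names `encode_box` and `decode_box` are made up for illustration.

```python
import math


def encode_box(proposal, ground_truth):
    """Regression targets (tx, ty, tw, th) that move `proposal` onto `ground_truth`.

    Both boxes are (x, y, w, h), with (x, y) assumed to be the box center.
    """
    px, py, pw, ph = proposal
    gx, gy, gw, gh = ground_truth
    return (
        (gx - px) / pw,      # horizontal shift, relative to proposal width
        (gy - py) / ph,      # vertical shift, relative to proposal height
        math.log(gw / pw),   # log of the width ratio
        math.log(gh / ph),   # log of the height ratio
    )


def decode_box(proposal, deltas):
    """Apply predicted deltas to a proposal to get the refined box."""
    px, py, pw, ph = proposal
    tx, ty, tw, th = deltas
    return (px + pw * tx, py + ph * ty, pw * math.exp(tw), ph * math.exp(th))


proposal = (50.0, 50.0, 20.0, 40.0)
ground_truth = (55.0, 60.0, 40.0, 40.0)
deltas = encode_box(proposal, ground_truth)   # tx = 0.25, ty = 0.25, tw = log(2), th = 0
recovered = decode_box(proposal, deltas)      # matches ground_truth up to float error
```

The network is trained to output `deltas` from the region features; at test time `decode_box` turns its output back into a box.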

Spatial Pyramid Pooling : Applies a BoW (Bag-of-Words) style pooling to regions split into grids at several levels, so that spatial layout is taken into account. Here the SPP layer uses max pooling within each region.
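A minimal sketch of the idea, in plain Python on a single-channel 2D feature map. The pyramid levels (1, 2, 4) and the way bin boundaries are rounded are assumptions chosen for illustration, not the exact settings of the SPPNet paper.

```python
import math


def spatial_pyramid_pool(feature_map, levels=(1, 2, 4)):
    """Max-pool a 2D feature map over an n x n grid for each n in `levels`.

    Returns a flat list of length sum(n * n for n in levels), whatever the
    height and width of the input.
    """
    height, width = len(feature_map), len(feature_map[0])
    pooled = []
    for n in levels:
        for i in range(n):
            # Floor for the start and ceil for the end, so the bins cover
            # the whole map and no bin is empty.
            top = (i * height) // n
            bottom = math.ceil((i + 1) * height / n)
            for j in range(n):
                left = (j * width) // n
                right = math.ceil((j + 1) * width / n)
                pooled.append(max(feature_map[r][c]
                                  for r in range(top, bottom)
                                  for c in range(left, right)))
    return pooled


small = [[1, 2, 3],
         [4, 5, 6]]
large = [[r * 7 + c for c in range(7)] for r in range(5)]

print(spatial_pyramid_pool(small, levels=(1, 2)))   # [6, 2, 3, 5, 6]
print(len(spatial_pyramid_pool(small)), len(spatial_pyramid_pool(large)))   # 21 21
```

Both inputs give a vector of the same length, which is why no resize is needed before the fully connected layers.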

[R-CNN]

1. Extract proposal regions by selective search (~2K proposals)

2. Resize proposals to the fixed size

3. Apply CNN on each proposal

4. Classify the CNN features with one binary SVM per class, and refine the boxes with a bounding box regressor

Pros :

1. Huge improvement over previous object detection approaches

Cons :

1. Too much computation, since the CNN is applied to every proposed region.

2. Distortion from resizing every proposal to the same fixed size, regardless of its original size and aspect ratio.

[SPPNet]

1. Apply CNN on an entire image

2. Extract proposal regions by selective search

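3. On the feature maps from CNN, apply Spatial Pyramid Pooling (SPP) on each proposed region to get a fixed-size representation

4. Apply fully connected (FC) layers, followed by per-class SVM classifiers (as in R-CNN; the softmax classifier was introduced with Fast R-CNN) and a bounding box regressor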

Pros :

1. Much less computation than R-CNN, since the CNN is applied only once per image.

2. Avoids the distortion caused by resizing, because Spatial Pyramid Pooling accepts regions of any size.

Cons :

1. Still a multi-stage pipeline, and region proposal with selective search remains slow.

[Fast R-CNN]

Similar to SPPNet, except that Fast R-CNN replaces the multi-level SPP layer with a single-level pyramid, which is the ROI pooling layer.

[ROI Pooling]
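ROI pooling, seen as a single-level SPP applied to one proposal, can be sketched as below. The stride of 16 (the total downsampling of the CNN), the 2 x 2 output size, and the rounding of ROI coordinates are assumptions for a small example. Real implementations differ in these details.

```python
import math


def roi_pool(feature_map, roi, output_size=2, stride=16):
    """Max-pool the part of `feature_map` under `roi` into an
    output_size x output_size grid.

    `roi` is (x1, y1, x2, y2) in input-image pixels; `stride` is the assumed
    downsampling factor between the image and the feature map.
    """
    height, width = len(feature_map), len(feature_map[0])
    # Project the ROI onto the feature map and clamp to valid indices.
    x1, y1, x2, y2 = (int(round(v / stride)) for v in roi)
    x1, x2 = max(0, min(x1, width - 1)), max(0, min(x2, width - 1))
    y1, y2 = max(0, min(y1, height - 1)), max(0, min(y2, height - 1))
    roi_h = max(y2 - y1 + 1, 1)
    roi_w = max(x2 - x1 + 1, 1)

    pooled = []
    for i in range(output_size):
        top = y1 + (i * roi_h) // output_size
        bottom = y1 + math.ceil((i + 1) * roi_h / output_size)
        row = []
        for j in range(output_size):
            left = x1 + (j * roi_w) // output_size
            right = x1 + math.ceil((j + 1) * roi_w / output_size)
            row.append(max(feature_map[r][c]
                           for r in range(top, bottom)
                           for c in range(left, right)))
        pooled.append(row)
    return pooled


feature_map = [[r * 4 + c for c in range(4)] for r in range(4)]   # 4 x 4, values 0..15
print(roi_pool(feature_map, roi=(0, 0, 48, 32)))   # [[5, 7], [9, 11]]
```

Every proposal, whatever its size, comes out as the same output_size x output_size grid, so all proposals can share the layers that follow.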

[Comparison between R-CNN and Fast R-CNN (SPPNet)]

[Faster R-CNN]

1. Apply CNN on an entire image

2. Slide an RPN (Region Proposal Network) over the feature maps. At each position, k anchors (base size 16, scale=[8,16,32], ratio=[0.5, 1, 2], so k = 9) cover objects of different sizes and shapes in the input image, and the RPN produces a fixed-length vector.

3. On top of these vectors, two sibling layers (classification and regression) produce 2k scores (foreground / background) and 4k coordinates (X, Y, W, H), each with its own loss (classification loss and regression loss).

4. Using the RPN's region proposals, apply ROI pooling on the feature maps from step 1, then apply a second pair of classification and regression layers to the pooled features.
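The anchor settings in step 2 can be made concrete with a small sketch. It uses the base size, scales, and ratios quoted above, and takes `ratio` to mean height / width. It is simplified compared with reference implementations (no rounding to integer pixels), and the stride of 16 and the placement of anchors at cell centers are assumptions.

```python
import math


def generate_anchors(base_size=16, scales=(8, 16, 32), ratios=(0.5, 1, 2)):
    """Return k = len(scales) * len(ratios) anchors as (x1, y1, x2, y2),
    centered on the origin. `ratio` is taken to mean height / width."""
    anchors = []
    for ratio in ratios:
        for scale in scales:
            area = (base_size * scale) ** 2      # every ratio keeps the same area
            width = math.sqrt(area / ratio)
            height = width * ratio
            anchors.append((-width / 2, -height / 2, width / 2, height / 2))
    return anchors


def shift_anchors(anchors, feat_h, feat_w, stride=16):
    """Copy the k anchors to every cell of a feat_h x feat_w feature map."""
    shifted = []
    for row in range(feat_h):
        for col in range(feat_w):
            cx, cy = (col + 0.5) * stride, (row + 0.5) * stride
            shifted.extend((x1 + cx, y1 + cy, x2 + cx, y2 + cy)
                           for x1, y1, x2, y2 in anchors)
    return shifted


anchors = generate_anchors()
k = len(anchors)                                   # 9
all_anchors = shift_anchors(anchors, feat_h=2, feat_w=3)
print(k, 2 * k, 4 * k, len(all_anchors))           # 9 18 36 54
```

With k = 9, each feature-map position gets 18 foreground / background scores and 36 box coordinates, which matches the 2k and 4k outputs in step 3.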

At first, the authors trained the object detection network with 'alternating optimization' (training the RPN and the detection network in turn, not jointly). Later it was shown that joint training also works.

[Performance]

Non-Maximum Suppression

2019. 7. 24. 16:41

Used in object detection tasks to reduce redundant bounding boxes for each detected object.

[Algorithm]

For the set of bounding boxes detected for each object,

1. Sort bounding boxes in descending order with regard to confidence score

2. Starting from the highest-scoring box, calculate its IoU (Intersection over Union) with each lower-scoring box, and discard the lower-scoring box if the IoU is higher than a threshold (hyperparameter). Repeat with the next remaining box.
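The two steps above can be sketched in plain Python. The box format (x1, y1, x2, y2) and the default threshold of 0.5 are assumptions for the example. In practice NMS is usually run separately for each class.

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def non_maximum_suppression(boxes, scores, iou_threshold=0.5):
    """Return the indices of the boxes to keep, highest score first."""
    # Step 1: sort box indices by confidence score, descending.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        # Step 2: keep the best remaining box and drop every box that
        # overlaps it by more than the threshold.
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_threshold]
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
keep = non_maximum_suppression(boxes, scores)
print(keep)   # [0, 2]
```

The second box overlaps the first with IoU 81/119 (about 0.68), so it is discarded; the third box does not overlap and survives.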


AI for All