Basic Modules for Computer Vision Jitendra Malik December 9, 2009
Important modules for computer vision
1. Contours (gPb)
2. Regions (gPb-owt-ucm)
3. Computing descriptors on points, regions, or windows
4. Vector quantization of descriptors (k-means)
5. Nearest-neighbors for high dimensional descriptors
6. Training of SVMs (linear, additive, rbf)
7. Evaluation of SVMs (linear, additive, rbf)
8. Hough transform voting
9. Optical Flow
10. Tracking objects/humans

Semantic Segmentation

Object detection by multi-scale scanning
Ask this question repeatedly, varying position, scale, category…
Paradigm introduced by Rowley, Baluja & Kanade 96 for face detection.
Viola & Jones 01, Dalal & Triggs 05, Felzenszwalb, McAllester, Ramanan 08
UC Berkeley Computer Vision Group

PASCAL VOC 2009 Detection
AP = 0.16
Challenges
• Sub-categories
• Aspects
• Occlusion
Addressed by Poselets (Bourdev & Malik, '09): AP = 0.394

Contours & Regions (Arbelaez, Maire, Fowlkes)

Descriptors
• SIFT, HOG, GB ..
  – Typically high dimensional, 100-1000
  – Computed on points, regions, or windows
• Used in different ways
  – Evaluate using SVM
  – Vector quantize to a “word” and then use “Bag of Words” models
• Computational problems
  – Vector quantization of descriptors (k-means)
  – Nearest-neighbors for high dimensional descriptors
  – Training of SVMs (linear, additive, rbf)
  – Evaluation of SVMs (linear, additive, rbf)
[Figure: SIFT descriptor on region]

Efficient Training of Additive Classifiers (Maji & Berg)
• SVMs with additive kernels are additive classifiers
• Histogram based kernels
  – Histogram intersection, chi-squared kernel
  – Pyramid Match Kernel (Grauman & Darrell, ICCV’05)
  – Spatial Pyramid Match Kernel (Lazebnik, Schmid, Ponce CVPR’06)
• IKSVMs can be efficiently evaluated at runtime (CVPR ’08)
• New result: one can train these classifiers up to two orders of magnitude faster w/o loss in accuracy compared to kernel SVM

Efficient Training of Additive Classifiers
• Approximate classifiers where h is piecewise linear
• Use standard linear SVM techniques to solve
    Encourages smooth functions
    Closely approximates min kernel SVM
    Custom solver: PWLSGD (see paper)
• Trains classifiers up to two orders of magnitude faster w/o loss in accuracy compared to kernel SVM

Max-Margin Hough Transform (Maji)
1. Local parts vote for object pose
2. Complexity: # parts * # votes
   Can be significantly lower than brute force search over pose (e.g. sliding window detectors)
3. Learn weights for the votes in a max-margin framework to optimize detection

Learned Weights (ETHZ shape)
[Figure: learned part weights; color scale blue (low), dark red (high)]
Naïve Bayes: influenced by clutter (rare structures)
Max-Margin: important parts

Region based Detection (Gu, Lim, Arbelaez)
Det. rate at 0.3 FPPI
  Hough baseline [1]    31.0%
  kAS [1]               62.4%
  Shape [2]             67.2%
  Ours                  87.1±2.8%
[1] Ferrari et al. PAMI 2008.
[2] Ferrari, Jurie, Schmid. CVPR 2007.
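
The Descriptors slide mentions vector quantizing a descriptor to a “word” and then using a “Bag of Words” model. A minimal sketch of that step follows, assuming a codebook has already been learned (for example by k-means, which is not shown here). The function names `nearest_word` and `bag_of_words` and the tiny 2-D codebook are hypothetical, chosen only for illustration; real descriptors would have 100-1000 dimensions and would need a faster nearest-neighbor search than this brute-force loop.

```python
def nearest_word(descriptor, codebook):
    """Return the index of the codebook centre closest to the descriptor
    (squared Euclidean distance, brute force over all centres)."""
    def sq_dist(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))
    return min(range(len(codebook)), key=lambda k: sq_dist(descriptor, codebook[k]))


def bag_of_words(descriptors, codebook):
    """Count how many descriptors fall on each visual word."""
    histogram = [0] * len(codebook)
    for d in descriptors:
        histogram[nearest_word(d, codebook)] += 1
    return histogram


# Hypothetical toy data: three 2-D "words" and four descriptors.
codebook = [[0.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
descriptors = [[0.1, 0.0], [0.9, 1.1], [0.2, 0.9], [1.0, 0.8]]
histogram = bag_of_words(descriptors, codebook)
print(histogram)
```

The resulting histogram is the kind of input the histogram based kernels on the additive classifier slides operate on.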
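
The additive classifier slides state that SVMs with additive kernels are additive classifiers and that the per-dimension function h can be treated as piecewise linear. The sketch below illustrates this for the histogram intersection (min) kernel: the kernel decision function splits into one function per dimension, which can be tabulated on a grid and evaluated by linear interpolation, independent of the number of support vectors. This is an illustration under stated assumptions, not the authors' implementation: the support vectors, coefficients, bias, and grid are made-up values, all function names are hypothetical, and training (e.g. the PWLSGD solver named on the slide) is not shown.

```python
def intersection_kernel(x, z):
    """Histogram intersection kernel: sum of element-wise minima."""
    return sum(min(a, b) for a, b in zip(x, z))


def kernel_decision(x, support_vectors, coeffs, bias):
    """Standard kernel SVM decision value; cost grows with the number of support vectors."""
    return sum(c * intersection_kernel(x, sv)
               for sv, c in zip(support_vectors, coeffs)) + bias


def build_tables(support_vectors, coeffs, grid):
    """Tabulate h_i(g) = sum_j coeffs[j] * min(g, sv_j[i]) for every dimension i and grid value g."""
    dims = len(support_vectors[0])
    return [[sum(c * min(g, sv[i]) for sv, c in zip(support_vectors, coeffs))
             for g in grid]
            for i in range(dims)]


def interpolate(table, grid, value):
    """Piecewise-linear lookup on a uniform grid, clamped to the grid range."""
    step = grid[1] - grid[0]
    pos = min(max((value - grid[0]) / step, 0.0), len(grid) - 1)
    k = min(int(pos), len(grid) - 2)
    frac = pos - k
    return table[k] * (1.0 - frac) + table[k + 1] * frac


def additive_decision(x, tables, grid, bias):
    """Additive form: one table lookup per dimension, no support vectors needed."""
    return sum(interpolate(tables[i], grid, xi) for i, xi in enumerate(x)) + bias


# Made-up model whose support-vector values lie on the grid, so the
# interpolation reproduces the kernel decision up to rounding error.
grid = [0.0, 0.25, 0.5, 0.75, 1.0]
support_vectors = [[0.25, 0.75], [0.5, 0.5], [1.0, 0.0]]
coeffs = [0.8, -0.5, 0.3]
bias = -0.1
tables = build_tables(support_vectors, coeffs, grid)

x = [0.6, 0.3]
slow = kernel_decision(x, support_vectors, coeffs, bias)
fast = additive_decision(x, tables, grid, bias)
print(slow, fast)
```

When support-vector values do not fall on the grid, the table lookup is only an approximation of the kernel decision, and a finer grid reduces the error.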
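
The Max-Margin Hough Transform slide describes local parts voting for object pose, with cost proportional to # parts * # votes. A minimal sketch of weighted voting for an object centre is given below. It assumes each part has already been matched to a codeword, that each codeword stores a few offsets to the object centre, and that the per-codeword weights are given; on the slide those weights are learned in a max-margin framework, which is not shown here. The names `hough_vote`, `offsets`, and `weights` and all numbers are hypothetical.

```python
from collections import defaultdict


def hough_vote(parts, offsets, weights, bin_size=10):
    """Accumulate weighted votes for the object centre.

    parts   : list of (x, y, codeword) detections
    offsets : codeword -> list of (dx, dy) displacements to the object centre
    weights : codeword -> weight of each vote cast by that codeword
    Returns the accumulator (grid cell -> score) and the number of votes cast.
    """
    accumulator = defaultdict(float)
    n_votes = 0
    for x, y, word in parts:
        for dx, dy in offsets[word]:
            cx, cy = x + dx, y + dy
            cell = (int(cx // bin_size), int(cy // bin_size))
            accumulator[cell] += weights[word]
            n_votes += 1
    return accumulator, n_votes


# Hypothetical toy example: two "wheel" parts and one "roof" part.
offsets = {"wheel": [(30, -10), (-30, -10)], "roof": [(0, 15)]}
weights = {"wheel": 1.0, "roof": 2.0}
parts = [(20, 30, "wheel"), (80, 30, "wheel"), (50, 5, "roof")]

accumulator, n_votes = hough_vote(parts, offsets, weights)
peak = max(accumulator, key=accumulator.get)
print(peak, accumulator[peak], n_votes)
```

The loop casts one vote per (part, offset) pair, so the work scales with the number of parts times the votes per part rather than with the number of candidate poses, which is the contrast with sliding window search drawn on the slide.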