Smart surveillance with Sakbot
Roberto Vezzani - Imagelab – Università di Modena e Reggio Emilia

Agenda
• Presentation of ImageLab
  – Digital library: content-based retrieval
  – Computer vision for robotic automation
  – Multimedia: video annotation
  – Medical imaging
• Video analysis for indoor/outdoor surveillance
  – People and vehicle surveillance
  – Off-line video analysis for telemetry and forensics

Imagelab-Softech
Lab of Computer Vision, Pattern Recognition and Multimedia
Dipartimento di Ingegneria dell'Informazione, Università di Modena e Reggio Emilia, Italy
http://imagelab.ing.unimore.it

Imagelab: recent projects in surveillance
• European
  – THIS: Transport hubs intelligent surveillance, EU JLS/CHIPS project, 2009-2010
  – VIDI-Video: STREP VI FP EU (ViSOR, Video Surveillance Online Repository), 2007-2009
• International
  – BE SAFE: NATO Science for Peace project, 2007-2009
  – Detection of infiltrated objects for security, Australian Council, 2006-2008
• Italian & regional
  – Behave_Lib: Regione Emilia Romagna, Tecnopolo Softech, 2010-2013
  – LAICA: Regione Emilia Romagna, 2005-2007
  – FREE_SURF: MIUR PRIN project, 2006-2008
• With companies
  – Building site surveillance, with Bridge-129 Italia, 2009-2010
  – Stopped vehicles, with Digitek Srl, 2007-2008
  – SmokeWave, with Bridge-129 Italia, 2007-2010
  – Sakbot for traffic analysis, with Traficon, 2004-2006
  – Mobile surveillance, with Sistemi Integrati, 2007
  – Home automation for the disabled: posture detection, FCRM, 2004-2005

AD-HOC: Appearance Driven Human tracking with Occlusion Handling

Key aspects
• Based on the SAKBOT system
  – Background estimation and updating
  – Shadow removal
• Appearance-based tracking
  – We aim at recovering a pixel-based foreground mask, even during an occlusion
  – Recovery of missing parts from the background subtraction
  – Managing split-and-merge situations
• Occlusion detection and classification
  – Classify the differences as real shape changes or occlusions

Example 1 (from ViSOR)

Example 2 (from PETS 2002)

Example 3

Other experimental results
• Imagelab videos (available on ViSOR)
• PETS series

Results on the PETS2006 dataset
• Working in real time at 10 fps!

Posture classification

Distributed surveillance with non-overlapping fields of view

Exploit the knowledge about the scene
• To avoid all-to-all matches, the tracking system can exploit knowledge about the scene:
  – Preferential paths -> pathnodes
  – Border lines / exit zones
  – Physical constraints & forbidden zones
  – NVR temporal constraints

Tracking with pathnodes
• A possible path between Camera 1 and Camera 4

Pathnodes lead particle diffusion

Results with PF and pathnodes
• Single-camera tracking: Recall = 90.27%, Precision = 88.64%
• Multi-camera tracking: Recall = 84.16%, Precision = 80.00%

"VIP: Vision tool for comparing Images of People", Lantagne et al., Vision Interface 2003
• Each extracted silhouette is segmented into significant regions using the JSEG algorithm (Y. Deng, B.S.
Manjunath, "Unsupervised segmentation of color-texture regions in images and video").
• Colour and texture descriptors are calculated for each region.
• The colour descriptor is a modified version of the descriptor presented in Y. Deng et al., "Efficient color representation for image retrieval": basically an HSV histogram of the dominant colours.
• The texture descriptor is based on D.K. Park et al., "Efficient Use of Local Edge Histogram Descriptor": essentially it characterizes the edge density inside a region along four orientations (0°, 45°, 90° and 135°).
• The similarity between two regions is the weighted sum of the two descriptor similarities.

To compare the regions inside two silhouettes, a region-matching scheme is used, involving a modified version of the IRM algorithm presented in J.Z. Wang et al., "Simplicity: Semantics-sensitive integrated matching for picture libraries". The IRM algorithm is simple and works as follows:
1) Calculate the similarities between all pairs of regions.
2) Sort the similarities in decreasing order, select the first one, and compare the areas of the corresponding pair of regions. A weight, equal to the smaller percentage area of the two regions, is assigned to the similarity measure.
3) Update the percentage area of the larger region by removing the percentage area of the smaller one, so that the larger region can be matched again; the smaller region is not matched any more with any other region.
4) Continue in decreasing order for all of the similarities.
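The greedy matching loop above can be sketched as follows (a minimal illustration, not the VIP implementation: the function name is hypothetical, similarities are assumed precomputed, and region areas are given as fractions of the silhouette area):

```python
def irm_match(sim, areas_a, areas_b):
    """Greedy IRM-style region matching (sketch).

    sim[i][j]        : similarity between region i of silhouette A and region j of B
    areas_a, areas_b : percentage areas of the regions (each list sums to 1.0)
    Returns the overall similarity: the weighted sum of the matched-pair
    similarities, each weighted by the smaller remaining percentage area.
    """
    rem_a, rem_b = list(areas_a), list(areas_b)   # unmatched area left per region
    # Steps 1-2: all pairwise similarities, sorted in decreasing order
    pairs = sorted(((sim[i][j], i, j)
                    for i in range(len(rem_a))
                    for j in range(len(rem_b))), reverse=True)
    total = 0.0
    for s, i, j in pairs:
        w = min(rem_a[i], rem_b[j])   # weight = smaller percentage area
        if w <= 0.0:
            continue                  # this region is already fully matched
        total += w * s
        rem_a[i] -= w                 # Step 3: the larger region keeps its
        rem_b[j] -= w                 # leftover area and can be matched again
    return total
```

Note how the larger region's leftover area stays available for later, lower-similarity pairs, which is what makes the scheme robust to over- and under-segmentation of the silhouettes.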
In the end, the overall similarity between the two region sets is calculated as the weighted sum of all the matched region-pair similarities.

ViSOR: Video Surveillance Online Repository

The ViSOR video repository

Aims of ViSOR
• Gather and make freely available a repository of surveillance videos
• Store metadata annotations, both manually provided as ground truth and automatically generated by video surveillance tools and systems
• Execute online performance evaluation and comparison
• Create an open forum to exchange, compare and discuss problems and results on video surveillance

Different types of annotation
• Structural annotation: video size, authors, keywords, …
• Base annotation: ground truth, with concepts referring to the whole video. Annotation tool: online!
• GT annotation: ground truth with frame-level annotation; concepts can refer to the whole video, to a frame interval or to a single frame. Annotation tool: ViPER-GT (offline)
• Automatic annotation: output of automatic systems shared by ViSOR users

Video corpus set: the 14 categories

Outdoor multicamera: synchronized views

Surveillance of the entrance door of a building
• About 10 hours!

Videos for smoke detection with GT

Videos for shadow detection
• Already used by many researchers working on shadow detection
• Some videos with GT
A. Prati, I. Mikic, M.M. Trivedi, R.
Cucchiara, "Detecting Moving Shadows: Algorithms and Evaluation", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 25, no. 7, pp. 918-923, July 2003

Some statistics
• We need videos and annotations!

Action recognition
SIMULTANEOUS HMM ACTION SEGMENTATION AND RECOGNITION

Probabilistic action classification
• Classical approach:
  – Given a set of training videos, each containing a single, manually labelled atomic action
  – Given a new video with a single action
  – … find the most likely action
• Dataset: "Actions as Space-Time Shapes" (ICCV '05), M. Blank, L. Gorelick, E. Shechtman, M. Irani, R. Basri

Classical HMM framework
• Definition of a feature set
• For each frame t, computation of the feature set Ot (the observations)
• Given a set of training observations O = {O1 … OT} for each action, training of an HMM λ(k) for each action k
• Given a new set of observations O = {O1 … OT}, find the model λ(k) which maximizes P(λ(k)|O)

A sample 17-dim feature set
• Computed on the extracted blob after the foreground segmentation and people tracking

From the Rabiner tutorial

Online action recognition
• Given a video with a sequence of actions:
  – Which is the current action? Frame-by-frame action classification (online: action recognition)
  – When does an action finish and the next one start? (offline: action segmentation)
R. Vezzani, M. Piccardi, R.
Cucchiara, "An efficient Bayesian framework for on-line action recognition", in press, Proceedings of the IEEE International Conference on Image Processing, Cairo, Egypt, November 7-11, 2009

Main problem of this approach
• We do not know when the action starts and when it finishes: using all the observations, only the first action is recognized.
• A possible solution, "brute force": for each action, for each starting frame and for each ending frame, compute the model likelihood and select the maximum. UNFEASIBLE.

Our approach
• Subsampling of the starting frames (one every 10)
• Adoption of recursive formulas
• Computation of the emission probability once for each model (action)
• Current frame used as ending frame
• Maximum length allowed for each action
• The resulting computational complexity is compliant with real-time requirements

Different-length sequences
• Sequences with different starting frames have different lengths, leading to unfair comparisons under the traditional HMM schema
• The output of each HMM is normalized using the sequence length and a term related to the mean duration of the considered action
• This allows us to classify the current action and, at the same time, to perform an online action segmentation
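The scheme above can be sketched in a toy form (this is an illustration under stated assumptions, not the authors' implementation: discrete observations, placeholder model parameters, and a simplified duration penalty; all names are hypothetical). Candidate starting frames are subsampled one every 10, each open hypothesis keeps a running forward recursion so past frames are never revisited, and per-model log-likelihoods are length-normalized before comparison:

```python
import math

class ForwardHMM:
    """Minimal discrete-observation HMM evaluated with the forward recursion."""
    def __init__(self, pi, A, B, mean_duration):
        self.pi, self.A, self.B = pi, A, B   # start, transition, emission probs
        self.mean_duration = mean_duration   # mean action length, in frames

    def start(self, obs):
        return [p * self.B[s][obs] for s, p in enumerate(self.pi)]

    def step(self, alpha, obs):
        # one recursive forward update per new frame
        n = len(alpha)
        return [sum(alpha[i] * self.A[i][j] for i in range(n)) * self.B[j][obs]
                for j in range(n)]

def classify_online(observations, models, subsample=10, max_length=300):
    """Return, per frame, the best (action, start_frame) hypothesis."""
    hypotheses = []                          # list of (name, start, alpha)
    decisions = []
    for t, obs in enumerate(observations):
        # advance every open hypothesis by one frame
        hypotheses = [(name, s, models[name].step(a, obs))
                      for name, s, a in hypotheses
                      if t - s + 1 <= max_length]   # cap the action length
        if t % subsample == 0:               # open new candidate start frames
            for name, m in models.items():
                hypotheses.append((name, t, m.start(obs)))
        best = None
        for name, s, a in hypotheses:
            like = sum(a)
            if like <= 0.0:
                continue
            length = t - s + 1
            # normalize by length, penalize deviation from the mean duration
            score = (math.log(like) / length
                     - abs(length - models[name].mean_duration) / length)
            if best is None or score > best[0]:
                best = (score, name, s)
        decisions.append(best[1:] if best else None)
    return decisions
```

With two one-state models that prefer observation 0 and observation 1 respectively, feeding five 0s then five 1s makes the decision switch from the first action to the second, with the second hypothesis correctly anchored at the later start frame; that switch is exactly the online segmentation described above.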