Pictorial Information Homework 2

1 Question 1: Calibrating cam1
The calibration process is separated into two steps: initialization and non-linear optimization.

During the initialization step the user manually selects the four extreme corners of the grid. The coordinates of the selected points are refined by a corner detection algorithm, which searches for a corner in the vicinity of each point selected by the user. Then, given the number of squares in the grid, the remaining corner points of the grid are estimated. A closed-form solution for the calibration parameters (intrinsic and extrinsic) is computed from these image points, which yields the projection matrix $P$.

The optimization step minimizes the reprojection error, which is defined as follows. The metric configuration of the grid is known: the number of squares in the grid and the length of each square along the X and Y directions. Hence, placing the grid so that it lies in the $Z = 0$ plane, we know the 3D coordinates $X_i$ of the grid corners (note that the location of the grid with respect to the world coordinate frame needs to be fixed during the initialization step). We can reproject these points to the image plane using $\hat{x}_i = P X_i$. The reprojection error of point $X_i$ is $d(x_i, \hat{x}_i)$, where $d$ is the Euclidean distance and $x_i$ is the 2D image coordinate of the corner point $X_i$ obtained in the initialization step. The optimized projection matrix is the one that minimizes the sum of the reprojection errors over all grid corners:

\[ \hat{P} = \arg\min_{P} \sum_i d(x_i, \hat{x}_i) \]

Once $\hat{P}$ is obtained, the coordinates of the grid corners in the image are updated to $x_i = \hat{P} X_i$.

1. Calibration parameters estimated after extracting grid corners with the corner finder window size set to 5. Note that the estimated principal point (cc) is far from the expected [320 240] for a 640 × 480 image.
Focal Length:     fc = [ 569.89119 561.69978 ] +- [ 66.90937 73.86583 ]
Principal point:  cc = [ 356.36231 299.73559 ] +- [ 58.51278 104.49382 ]
Skew:             alpha_c = [ 0.00000 ] +- [ 0.00000 ] => angle of pixel axes = 90.00000 +- 0.00000 degrees
Distortion:       kc = [ -0.71192 0.72151 -0.05281 -0.00828 0.00000 ] +- [ 0.51963 2.40495 0.07955 0.03679 0.00000 ]
Pixel error:      err = [ 1.13242 1.04376 ]

The numbers following +- are the numerical errors (uncertainties) of the corresponding parameters after the non-linear minimization of the reprojection error; the toolbox output describes them as approximately three times the standard deviations.

The corresponding calibration matrix $K$ is given by

\[ K = \begin{bmatrix} \alpha_x & s & x_0 \\ 0 & \alpha_y & y_0 \\ 0 & 0 & 1 \end{bmatrix} \]

where $\alpha_x$ and $\alpha_y$ are the focal length of the camera expressed in units of horizontal and vertical pixels (the two values differ if the pixels are not perfect squares), $s$ is the skew factor, and $(x_0, y_0)$ is the principal point expressed in pixels. In our case the calibration matrix is

\[ K = \begin{bmatrix} 569.89119 & 0 & 356.36231 \\ 0 & 561.69978 & 299.73559 \\ 0 & 0 & 1 \end{bmatrix} \]

2. Points that are further away from the camera are imaged at a lower resolution. If, in addition, the grid is imaged as a skewed rectangle, the corners between two adjacent squares may be imaged as either separated (by one or more pixels) or connected (i.e. they join in a boundary of more than one pixel). This results in incorrect corner detection by the corner detection algorithm. In our particular example, images 9, 12 and 13 (the first image is indexed as 1) posed this problem. One possible solution is to reduce the area within which the algorithm looks for a corner. That way the points detected by the algorithm are forced to be close to those selected by the user.

Figure 1: Problem with automatic corner detection. The bottom right corner was not detected properly.

Calibration parameters after re-extracting corners with the corner finder window size set to 1. Note a significant decrease in pixel error.
Focal Length:     fc = [ 548.73706 549.66836 ] +- [ 25.60846 25.65445 ]
Principal point:  cc = [ 314.79952 280.90596 ] +- [ 31.31835 34.65060 ]
Skew:             alpha_c = [ 0.00000 ] +- [ 0.00000 ] => angle of pixel axes = 90.00000 +- 0.00000 degrees
Distortion:       kc = [ -0.04097 0.09409 0.01544 -0.01001 0.00000 ] +- [ 0.26055 2.21165 0.02139 0.01749 0.00000 ]
Pixel error:      err = [ 0.47038 0.29962 ]

A further improvement can be achieved by recomputing the corners of the grid. The grid corners computed after reprojection error minimization are used as seeds for the automatic corner detection algorithm. Since the current points are expected to be close to their true values, a small search window should be used. Note that this method gives a considerable improvement if the images are highly distorted. This is not the case for our images.

Calibration parameters after recomputing corners with the corner finder window size set to 1. Note that the pixel error was redistributed more equally over x and y.

Focal Length:     fc = [ 539.40530 538.89995 ] +- [ 24.40718 24.12283 ]
Principal point:  cc = [ 310.75535 267.41846 ] +- [ 34.47806 30.51359 ]
Skew:             alpha_c = [ 0.00000 ] +- [ 0.00000 ] => angle of pixel axes = 90.00000 +- 0.00000 degrees
Distortion:       kc = [ 0.02023 -0.47770 0.01059 -0.01385 0.00000 ] +- [ 0.25233 2.16381 0.01829 0.01962 0.00000 ]
Pixel error:      err = [ 0.39922 0.39523 ]

Comparing the reprojection error scatter plots for the calibration results before and after refining the corner points shows that the reprojection error was significantly reduced. All extreme outlier errors were reduced.

Figure 2: Reprojection error scatter plot (in pixels) before and after refining the corner points.

Figure 3: Extracted (red crosses) and reprojected (black circles) grid corners for image 10.
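The reprojection error minimized during calibration can be illustrated with a short sketch. This is a minimal Python example under stated assumptions, not the toolbox's implementation: it uses the calibration matrix K from the first cam1 estimate, ignores lens distortion (kc), and the grid pose (R, t), the 30-unit square size and the detected corner positions are hypothetical values chosen only for illustration.

```python
import math

# Calibration matrix K from the first cam1 estimate (fc, cc, zero skew).
# Lens distortion (kc) is deliberately ignored in this sketch.
K = [[569.89119, 0.0, 356.36231],
     [0.0, 561.69978, 299.73559],
     [0.0, 0.0, 1.0]]

def mat_vec(M, v):
    """Multiply a 3x3 matrix by a 3-vector."""
    return [sum(M[i][j] * v[j] for j in range(3)) for i in range(3)]

def project(K, R, t, X):
    """Project a 3D grid point X to pixel coordinates: x ~ K (R X + t)."""
    Xc = [a + b for a, b in zip(mat_vec(R, X), t)]  # grid frame -> camera frame
    u = mat_vec(K, Xc)                              # homogeneous pixel coordinates
    return (u[0] / u[2], u[1] / u[2])

def total_reprojection_error(K, R, t, grid_points, detected):
    """Sum of Euclidean distances d(x_i, x_hat_i) over all grid corners."""
    total = 0.0
    for X, x in zip(grid_points, detected):
        x_hat = project(K, R, t, X)
        total += math.hypot(x[0] - x_hat[0], x[1] - x_hat[1])
    return total

# Hypothetical pose: grid parallel to the image plane, 1000 units in front of the camera.
R = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
t = [0.0, 0.0, 1000.0]

# Two corners of a grid lying in the Z = 0 plane (hypothetical 30-unit squares).
grid_points = [[0.0, 0.0, 0.0], [30.0, 0.0, 0.0]]

# Hypothetical detections: the exact projections shifted by (0.3, 0.4) pixels.
detected = [(px + 0.3, py + 0.4)
            for px, py in (project(K, R, t, X) for X in grid_points)]

error = total_reprojection_error(K, R, t, grid_points, detected)
print(f"total reprojection error: {error:.3f} px")
```

In this toy setup each corner is displaced by a (0.3, 0.4) pixel offset, so each contributes a distance of 0.5 pixels to the sum. A real optimizer would adjust the intrinsic and extrinsic parameters to make this sum as small as possible over all corners of all images.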
Figure 4: Positions of grids with respect to camera 1.

2 Question 2: Calibrating cam2

Calibration parameters estimated after extracting grid corners with the corner finder window size set to 5.

Focal Length:     fc = [ 788.10309 790.11003 ] +- [ 70.84724 71.15334 ]
Principal point:  cc = [ 243.85905 213.72360 ] +- [ 106.09846 100.33384 ]
Skew:             alpha_c = [ 0.00000 ] +- [ 0.00000 ] => angle of pixel axes = 90.00000 +- 0.00000 degrees
Distortion:       kc = [ -0.01543 0.01696 -0.00678 -0.00382 0.00000 ] +- [ 0.24104 0.37531 0.03679 0.03504 0.00000 ]
Pixel error:      err = [ 0.57108 0.30490 ]

Calibration parameters estimated after re-extracting corner points for the problematic images (6, 9) and recomputing corners automatically.

Focal Length:     fc = [ 718.43763 721.61863 ] +- [ 48.37756 51.96494 ]
Principal point:  cc = [ 289.89476 198.15260 ] +- [ 58.53936 69.17993 ]
Skew:             alpha_c = [ 0.00000 ] +- [ 0.00000 ] => angle of pixel axes = 90.00000 +- 0.00000 degrees
Distortion:       kc = [ -0.05610 -0.02608 -0.01986 0.00688 0.00000 ] +- [ 0.16310 0.26592 0.03114 0.01792 0.00000 ]
Pixel error:      err = [ 0.38740 0.38471 ]

Figure 5: Reprojection error scatter plot (in pixels) before and after refining the corner points.

Figure 6: Extracted (red crosses) and reprojected (black circles) grid corners for image 10.

Figure 7: Positions of grids with respect to camera 2.
3 Question 3

The total pixel error can be computed as the sum of the pixel errors along the x and y directions:

\[ E_{pix1} = 0.7945, \qquad E_{pix2} = 0.7721 \]

Pixel error, however, is not a good metric for comparing the quality of calibration of two cameras that were located at different distances from the same calibration grid. The same pixel error represents a greater real-world distance error for the camera that is further away from the grid. A better metric would be the total distance error. Calculating the total distance error precisely can be difficult (I am not even sure it is possible), but we can estimate it by using the fact that, for a fixed real-world error, the pixel error is inversely proportional to the distance $Z$ of the calibration grid from the camera, $E_{pix} \propto 1/Z$. Hence the distance error can be estimated as

\[ E_{dist} \propto E_{pix} \times Z \]

We can take $Z$ as the mean distance of the calibration grid centres from the camera. In our case it was estimated from the plots of grid positions with respect to the cameras:

\[ E_{dist1} \propto 0.7945 \times 1300 \approx 1033, \qquad E_{dist2} \propto 0.7721 \times 1700 \approx 1313 \]

This calculation suggests that camera 1 is better calibrated than camera 2. This is expected, as camera 2 is located further from the grids, which results in worse corner point estimation.

4 Question 4: Stereo calibration

1.
Calibration parameters obtained after running stereo optimization:

• Left camera

Focal Length:     fc_left = [ 595.62985 595.95122 ] +- [ 8.41064 8.06296 ]
Principal point:  cc_left = [ 328.15377 244.49640 ] +- [ 19.69169 18.28879 ]
Skew:             alpha_c_left = [ 0.00000 ] +- [ 0.00000 ] => angle of pixel axes = 90.00000 +- 0.00000 degrees
Distortion:       kc_left = [ -0.18283 2.21910 0.00378 -0.00954 0.00000 ] +- [ 0.21197 2.61603 0.00844 0.01432 0.00000 ]

• Right camera

Focal Length:     fc_right = [ 794.00268 796.12964 ] +- [ 12.81526 13.17540 ]
Principal point:  cc_right = [ 259.92620 247.92904 ] +- [ 49.45971 20.69040 ]
Skew:             alpha_c_right = [ 0.00000 ] +- [ 0.00000 ] => angle of pixel axes = 90.00000 +- 0.00000 degrees
Distortion:       kc_right = [ -0.08870 0.26831 0.00051 -0.00628 0.00000 ] +- [ 0.11665 0.27897 0.00546 0.01421 0.00000 ]

• Extrinsic parameters (position of right camera with respect to left camera):

Rotation vector:     om = [ -0.06425 0.83695 0.02628 ] +- [ 0.03460 0.06895 0.017... ]
Translation vector:  T = [ -762.20898 70.67856 1040.76619 ] +- [ 64.17818 26.22549 47.12061 ]

2. The rotation vector om is a non-normalized vector codirectional with the rotation axis, whose magnitude is equal to the rotation angle. The rotation matrix can be retrieved using the Rodrigues formula:

\[ R = \begin{bmatrix} 0.6695 & -0.0486 & 0.7412 \\ -0.0020 & 0.9977 & 0.0673 \\ -0.7428 & -0.0466 & 0.6679 \end{bmatrix} \]

3. Stereo rig spatial configuration

Figure 8: Stereo rig spatial configuration

4. A stereo image pair is rectified if the epipolar lines in both images are parallel to the x axis and aligned so that corresponding lines match up between the two views. The figure below shows the rectified pair for image 13.

Figure 9: A pair of stereo rectified images.
Red lines show the matching epipolar lines.

Two points $X_L$ and $X_R$, which represent the same point in 3D space expressed in the left camera coordinate frame and the right camera coordinate frame respectively, are related by the following equations:

\[ X_R = R X_L + T, \qquad X_L = R^T (X_R - T) \]

where $R$ is the rotation matrix and $T$ is the translation vector between the coordinate frames. Note that for orthogonal matrices $Q^{-1} = Q^T$.

5 Question 5: Stereo triangulation

Consider a camera with projection matrix $P = [\, I \mid 0 \,]$. A point in space $X_{cam}$ expressed in the camera coordinate frame is mapped to

\[ x_n = P X_{cam} = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ 1 \end{bmatrix} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} \mapsto \begin{bmatrix} x_1 / x_3 \\ x_2 / x_3 \end{bmatrix} \]

$x_n$ is the image point expressed in normalized coordinates. Note that it corresponds to a ray in 3D space on which $X_{cam}$ lies.

Now consider a calibrated stereo rig. Assume that $x_L$ and $x_R$ are the image points in the left and right cameras corresponding to a 3D point $X$. Suppose that we can normalize these points and obtain the normalized coordinates $x_{Ln}$ and $x_{Rn}$. For each camera we can use the normalized coordinates to construct the ray that originates at the camera centre and contains the point $X$. In the ideal case these two rays intersect at $X$. Thus we have found the 3D coordinates of a point in space from the normalized coordinates of its images in a stereo rig. The Camera Calibration Toolbox for MATLAB provides a function normalize.m which computes the normalized coordinates of an image point given the camera calibration parameters (including the lens distortion model).

In practice, there is always an error in estimating the image points $x_L$ and $x_R$. Hence the rays from the left and right cameras will in general not intersect, and $X$ is approximated as the point for which the sum of distances to both rays is minimised.
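The two ingredients described above, recovering $R$ from the rotation vector om and triangulating a point from a pair of normalized image points, can be sketched in a few lines of Python. This is an illustration under assumptions rather than the toolbox's own code: the midpoint of the shortest segment between the two rays is used as the estimate of $X$ (one common choice, not necessarily the one used for the results in this report), the helper names are hypothetical, and the test point `X_L` is made up to synthesize a noise-free correspondence. The values of om and T are the stereo calibration results from Question 4.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def mat_vec(M, v):
    return [dot(row, v) for row in M]

def transpose(M):
    return [list(col) for col in zip(*M)]

def rodrigues(om):
    """Rotation vector (axis scaled by angle in radians) -> 3x3 rotation matrix."""
    theta = math.sqrt(dot(om, om))
    if theta < 1e-12:
        return [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    kx, ky, kz = (v / theta for v in om)
    c, s = math.cos(theta), math.sin(theta)
    v = 1.0 - c
    return [[c + v * kx * kx, v * kx * ky - s * kz, v * kx * kz + s * ky],
            [v * kx * ky + s * kz, c + v * ky * ky, v * ky * kz - s * kx],
            [v * kx * kz - s * ky, v * ky * kz + s * kx, c + v * kz * kz]]

def triangulate_midpoint(xLn, xRn, R, T):
    """Estimate X (left camera frame) from normalized points, with X_R = R X_L + T."""
    Rt = transpose(R)
    d1 = [xLn[0], xLn[1], 1.0]                 # left ray direction (left frame)
    d2 = mat_vec(Rt, [xRn[0], xRn[1], 1.0])    # right ray direction (left frame)
    c2 = [-v for v in mat_vec(Rt, T)]          # right camera centre (left frame)
    r = [-v for v in c2]                       # left centre (origin) minus right centre
    a, b, c = dot(d1, d1), dot(d1, d2), dot(d2, d2)
    d, e = dot(d1, r), dot(d2, r)
    denom = a * c - b * b
    if abs(denom) < 1e-12:
        raise ValueError("rays are parallel")
    s = (b * e - c * d) / denom                # parameter of closest point on left ray
    t = (a * e - b * d) / denom                # parameter of closest point on right ray
    p1 = [s * v for v in d1]
    p2 = [c2[i] + t * d2[i] for i in range(3)]
    return [(p1[i] + p2[i]) / 2.0 for i in range(3)]

# Extrinsic parameters from the stereo calibration in Question 4.
om = [-0.06425, 0.83695, 0.02628]
T = [-762.20898, 70.67856, 1040.76619]
R = rodrigues(om)

# Hypothetical 3D point in the left camera frame, used to build a noise-free pair.
X_L = [100.0, -50.0, 1500.0]
X_R = [a + b for a, b in zip(mat_vec(R, X_L), T)]
xLn = (X_L[0] / X_L[2], X_L[1] / X_L[2])
xRn = (X_R[0] / X_R[2], X_R[1] / X_R[2])

X_est = triangulate_midpoint(xLn, xRn, R, T)
print([round(v, 3) for v in X_est])
```

With noise-free inputs the two rays intersect, so the midpoint coincides with the original point up to floating-point error. With noisy image points the same function returns the point halfway between the two closest points on the rays.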
Figure 10: Rays from left and right cameras

Figure 11: Note that those rays do not intersect

1. The location of a point in 3D world coordinates can be found from a pair of calibrated stereo images by performing stereo triangulation on the matched images of the point. In our case two points for the eyes, two points for the mouth corners and one point for the nose tip were matched.

Figure 12: Face point correspondence for image 7

The figures below show 3D plots containing the face locations as well as the calibration grid locations for images 0, 7 and 12. Note that for the objects representing faces, the middle point, which corresponds to the tip of the nose, is closer to the camera than the other 4 points.

Figure 13: Face locations, isometric view

Figure 14: Face locations, frontal view

Figure 15: Face locations, side view

2. Both cabinets and the wall can be represented, to a certain degree of accuracy, as planes. The following scheme was used to perform the dense stereo reconstruction of these objects:

• Plane estimation: the matching points from both images are used to estimate the equation of the plane;
• Corner estimation: object corners are estimated from the picture in which all of the corner points of the desired object are visible. These corner points are then projected onto the estimated plane.
Plane Estimation

We need to estimate the 4 coefficients in the equation of the plane:

\[ a x + b y + c z + d = 0 \]

We can obtain the 3D coordinates of a point in space, given the corresponding matching points in the left and right images of a calibrated stereo pair, using stereo triangulation. Given 3 or more points $(x_i, y_i, z_i)$ that belong to the same plane, we can estimate the equation of the plane by solving the following system (shown here for 3 points; each additional point adds one row):

\[ \begin{bmatrix} x_1 & y_1 & z_1 & 1 \\ x_2 & y_2 & z_2 & 1 \\ x_3 & y_3 & z_3 & 1 \end{bmatrix} \begin{bmatrix} a \\ b \\ c \\ d \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix} \]

Since the scale doesn't matter we can choose $d = 1$ (this is valid as long as the plane does not pass through the origin of the coordinate frame).

Figure 16: Image points used for cabinet plane estimation

Figure 17: Estimated plane for cabinet

Corner estimation

Once the plane has been estimated, we can select the image in which all of the corners of the object are visible and project them onto the estimated plane. The 3D rays corresponding to the image points are intersected with the plane.

Figure 18: Image points from right camera corresponding to cabinet corners

Figure 19: Image points from right camera corresponding to cabinet corners

Figure 20: Dense reconstruction of cabinet and walls
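The plane estimation and corner estimation steps of this section can be sketched as follows. This is a minimal Python illustration under assumptions, not the code used to produce the figures: the plane is fitted in the least-squares sense with $d = 1$ by solving the 3 × 3 normal equations with Cramer's rule, the function names are hypothetical, and the sample points and ray are made-up values lying on the plane $0.0005x - 0.001z + 1 = 0$.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def det3(M):
    """Determinant of a 3x3 matrix."""
    return (M[0][0] * (M[1][1] * M[2][2] - M[1][2] * M[2][1])
            - M[0][1] * (M[1][0] * M[2][2] - M[1][2] * M[2][0])
            + M[0][2] * (M[1][0] * M[2][1] - M[1][1] * M[2][0]))

def solve3(M, rhs):
    """Solve the 3x3 linear system M n = rhs with Cramer's rule."""
    D = det3(M)
    if D == 0:
        raise ValueError("degenerate point configuration")
    sol = []
    for k in range(3):
        Mk = [[rhs[i] if j == k else M[i][j] for j in range(3)] for i in range(3)]
        sol.append(det3(Mk) / D)
    return sol

def fit_plane(points):
    """Least-squares plane a*x + b*y + c*z + d = 0 with d fixed to 1.

    Assumes the plane does not pass through the origin of the frame.
    """
    # Normal equations (A^T A) n = A^T (-1), where the rows of A are the points.
    AtA = [[sum(p[i] * p[j] for p in points) for j in range(3)] for i in range(3)]
    Atb = [-sum(p[i] for p in points) for i in range(3)]
    a, b, c = solve3(AtA, Atb)
    return (a, b, c, 1.0)

def intersect_ray_plane(origin, direction, plane):
    """Intersect the ray origin + s * direction with the plane (a, b, c, d)."""
    a, b, c, d = plane
    n = (a, b, c)
    denom = dot(n, direction)
    if denom == 0:
        raise ValueError("ray is parallel to the plane")
    s = -(dot(n, origin) + d) / denom
    return [origin[i] + s * direction[i] for i in range(3)]

# Hypothetical triangulated points (left camera frame) lying on one plane.
points = [(0.0, 0.0, 1000.0), (200.0, 0.0, 1100.0),
          (0.0, 300.0, 1000.0), (200.0, 300.0, 1100.0)]
plane = fit_plane(points)

# Hypothetical corner ray from the left camera centre through the
# normalized image point (0.1, 0.05).
corner = intersect_ray_plane([0.0, 0.0, 0.0], [0.1, 0.05, 1.0], plane)
print(plane, corner)
```

For a ray that starts at the right camera, the same function applies once the ray is expressed in the left camera frame, i.e. with origin $-R^T T$ and direction $R^T (x_{Rn}, y_{Rn}, 1)^T$, which follows from $X_L = R^T (X_R - T)$.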