MATH 3795 Lecture 6. Sensitivity of the Solution of a Linear System
Dmitriy Leykekhman, Fall 2008
D. Leykekhman - MATH 3795 Introduction to Computational Mathematics

Goals
- Understand how the solution of Ax = b changes when A or b change.
- Condition number of a matrix (with respect to inversion).
- Vector and matrix norms.

Linear Systems
- Given A ∈ R^{n×n} and b ∈ R^n, we are interested in the solution x ∈ R^n of Ax = b.
- Suppose that instead of A and b we are given A + ∆A and b + ∆b, where ∆A ∈ R^{n×n} and ∆b ∈ R^n. How do these perturbations in the data change the solution of the linear system?
- First we need to understand how to measure the size of vectors and of matrices. This leads to vector norms and matrix norms.

Vector Norms
Definition. A (vector) norm on R^n is a function ||·|| : R^n → R, x ↦ ||x||, which for all x, y ∈ R^n and α ∈ R satisfies
1. ||x|| ≥ 0, and ||x|| = 0 ⇔ x = 0,
2. ||αx|| = |α| ||x||,
3. ||x + y|| ≤ ||x|| + ||y|| (triangle inequality).

The most frequently used norms on R^n are:
- the 2-norm
      ||x||_2 = ( Σ_{i=1}^n x_i^2 )^{1/2},
  computed by MATLAB's built-in function norm(x) or norm(x, 2);
- more generally, for any p ∈ [1, ∞), the p-norm
      ||x||_p = ( Σ_{i=1}^n |x_i|^p )^{1/p},
  computed by MATLAB's built-in function norm(x, p);
- the ∞-norm
      ||x||_∞ = max_{i=1,...,n} |x_i|,
  computed by MATLAB's built-in function norm(x, inf).

Example. Let x = (1, −2, 3, −4)^T. Then
    ||x||_1 = 1 + 2 + 3 + 4 = 10,
    ||x||_2 = sqrt(1 + 4 + 9 + 16) = sqrt(30) ≈ 5.48,
    ||x||_∞ = max{1, 2, 3, 4} = 4.

[Figure omitted: boundaries of the unit balls {x ∈ R^n : ||x||_p ≤ 1} for p = 1, 2, ∞.]
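The slides compute norms with MATLAB's built-in norm. As a cross-check of the example x = (1, −2, 3, −4)^T, the three norms can be sketched in a few lines of plain Python (an illustration, not the course's MATLAB workflow):

```python
import math

def norm_1(x):
    # 1-norm: sum of absolute values
    return sum(abs(xi) for xi in x)

def norm_2(x):
    # 2-norm: square root of the sum of squares
    return math.sqrt(sum(xi * xi for xi in x))

def norm_inf(x):
    # infinity-norm: largest absolute entry
    return max(abs(xi) for xi in x)

x = [1, -2, 3, -4]
print(norm_1(x))    # 10
print(norm_inf(x))  # 4
print(norm_2(x))    # sqrt(30), about 5.48
```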
One can show the following useful inequalities:
- ||x||_∞ ≤ ||x||_2 ≤ ||x||_1.
- Let ||·|| be any vector norm on R^n; then ||x + y|| ≥ ||x|| − ||y|| for all x, y ∈ R^n.
- Cauchy-Schwarz inequality: |x^T y| ≤ ||x||_2 ||y||_2 for all x, y ∈ R^n.

Vector Norms
Theorem. Vector norms on R^n are equivalent, i.e., for every two vector norms ||·||_a and ||·||_b on R^n there exist constants c_ab, C_ab (depending on the norms ||·||_a and ||·||_b, but not on x) such that
    c_ab ||x||_b ≤ ||x||_a ≤ C_ab ||x||_b   for all x ∈ R^n.
In particular, for any x ∈ R^n we have the inequalities
    (1/√n) ||x||_1 ≤ ||x||_2 ≤ ||x||_1,
    ||x||_∞ ≤ ||x||_2 ≤ √n ||x||_∞,
    ||x||_∞ ≤ ||x||_1 ≤ n ||x||_∞.

Matrix Norms
Definition. A matrix norm on R^{m×n} is a function ||·|| : R^{m×n} → R, A ↦ ||A||, which for all A, B ∈ R^{m×n} and α ∈ R satisfies
1. ||A|| ≥ 0, and ||A|| = 0 ⇔ A = 0 (the zero matrix),
2. ||αA|| = |α| ||A||,
3. ||A + B|| ≤ ||A|| + ||B|| (triangle inequality).

Warning: Matrix norms and vector norms are denoted by the same symbol ||·||.
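The equivalence inequalities are theorems, but a quick numerical sanity check can make them concrete. A sketch that verifies all three chains for a few sample vectors (the vectors are arbitrary choices, not from the slides):

```python
import math

def norms(x):
    # returns (1-norm, 2-norm, inf-norm) of a vector
    n1 = sum(abs(xi) for xi in x)
    n2 = math.sqrt(sum(xi * xi for xi in x))
    ninf = max(abs(xi) for xi in x)
    return n1, n2, ninf

samples = [[1.0, -2.0, 3.0, -4.0], [5.0, 0.0, 0.0], [-1.0, 1.0, -1.0, 1.0, -1.0]]
for x in samples:
    n = len(x)
    n1, n2, ninf = norms(x)
    # (1/sqrt(n))||x||_1 <= ||x||_2 <= ||x||_1
    assert n1 / math.sqrt(n) <= n2 + 1e-12 and n2 <= n1 + 1e-12
    # ||x||_inf <= ||x||_2 <= sqrt(n) ||x||_inf
    assert ninf <= n2 + 1e-12 and n2 <= math.sqrt(n) * ninf + 1e-12
    # ||x||_inf <= ||x||_1 <= n ||x||_inf
    assert ninf <= n1 + 1e-12 and n1 <= n * ninf + 1e-12
ok = True
```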
However, as we will see shortly, vector norms and matrix norms are computed very differently. Before computing a norm we therefore need to examine carefully whether it is applied to a vector or to a matrix; it should be clear from the context which norm, a vector norm or a matrix norm, is used.

Matrix Norms. First Approach.
- View a matrix A ∈ R^{m×n} as a vector in R^{mn} by stacking the columns of the matrix into one long vector.
- Apply a vector norm to this vector of length mn.
- This gives matrix norms. For example, applying the 2-vector-norm yields
      ||A||_F = ( Σ_{i=1}^m Σ_{j=1}^n a_ij^2 )^{1/2}.
  This is called the Frobenius norm. (We will use ||A||_2 to denote a different matrix norm.)
- This approach is not very useful.

Matrix Norms. Second Approach.
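The first approach can be made concrete: stacking the columns of A into one long vector and applying the 2-vector-norm reproduces the Frobenius norm formula. A sketch (the 2×3 matrix is an arbitrary illustration):

```python
import math

A = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0]]  # stored row-wise; A is 2x3

# stack the columns of A into one vector of length m*n
stacked = [A[i][j] for j in range(len(A[0])) for i in range(len(A))]

# 2-vector-norm of the stacked vector
frob = math.sqrt(sum(v * v for v in stacked))

# the double sum from the Frobenius norm formula gives the same value
frob_direct = math.sqrt(sum(A[i][j] ** 2
                            for i in range(len(A)) for j in range(len(A[0]))))
```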
- We want to solve linear systems Ax = b: find a vector x such that if we multiply A by this vector (we apply A to this vector), we obtain b.
- View a matrix A ∈ R^{m×n} as a linear mapping, which maps a vector x ∈ R^n to a vector Ax ∈ R^m:
      A : R^n → R^m,  x ↦ Ax.
- How do we define the size of a linear mapping?
- Compare the size of the image Ax ∈ R^m with the size of x. This leads us to look at
      sup_{x ≠ 0} ||Ax|| / ||x||.
  Here Ax ∈ R^m and x ∈ R^n are vectors and ||·|| denotes vector norms (in R^m and R^n).

Matrix Norms
- Let p ∈ [1, ∞].
  The following identities are valid:
      sup_{x ≠ 0} ||Ax||_p / ||x||_p = sup_{||x||_p = 1} ||Ax||_p = max_{||x||_p = 1} ||Ax||_p = max_{x ≠ 0} ||Ax||_p / ||x||_p.
- One can show that
      ||A||_p = max_{x ≠ 0} ||Ax||_p / ||x||_p        (1)
  defines a matrix norm. Note that on the left-hand side of (1) the symbol ||·||_p refers to the p-matrix-norm, while on the right-hand side of (1) it refers to the p-vector-norm applied to the vectors Ax ∈ R^m and x ∈ R^n, respectively.

For the most commonly used matrix norms (1), with p = 1, p = 2, or p = ∞, there exist rather simple representations. Let ||·||_p be the matrix norm defined in (1); then
    ||A||_1 = max_{j=1,...,n} Σ_{i=1}^m |a_ij|    (maximum column sum norm),
    ||A||_∞ = max_{i=1,...,m} Σ_{j=1}^n |a_ij|    (maximum row sum norm),
    ||A||_2 = sqrt( λ_max(A^T A) )                (spectral norm),
where λ_max(A^T A) is the largest eigenvalue of A^T A.

Example. Let
    A = [  1   3  −6
          −2   4   2
           2   1  −1 ].
Then
    ||A||_1 = max{5, 8, 9} = 9,
    ||A||_∞ = max{10, 8, 4} = 10,
    ||A||_2 = sqrt( max{3.07, 23.86, 49.06} ) ≈ 7.0045,
    ||A||_F = sqrt(76) ≈ 8.718.

Matrix Norms
Two important inequalities.
Theorem. For any A ∈ R^{m×n}, B ∈ R^{n×k} and x ∈ R^n, the following inequalities hold:
    ||Ax||_p ≤ ||A||_p ||x||_p    (compatibility of matrix and vector norm),
    ||AB||_p ≤ ||A||_p ||B||_p    (submultiplicativity of matrix norms).
Note that for the identity matrix I,
    ||I||_p = max_{x ≠ 0} ||Ix||_p / ||x||_p = 1.
Compare this with the first approach, in which we view I as a vector of length n^2; for example, the Frobenius norm (2-vector-norm) is ||I||_F = √n.

Error Analysis
- Let
      Ax = b                                      (2)
  be the original system, where A ∈ R^{n×n} and b ∈ R^n.
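For the 3×3 example matrix, the column-sum, row-sum, and Frobenius formulas are direct to code, and the spectral norm can be estimated by power iteration on A^T A. This is a sketch, not the slides' method: in MATLAB one would simply call norm(A, p).

```python
import math

A = [[1.0, 3.0, -6.0],
     [-2.0, 4.0, 2.0],
     [2.0, 1.0, -1.0]]
m, n = len(A), len(A[0])

# maximum absolute column sum
norm1 = max(sum(abs(A[i][j]) for i in range(m)) for j in range(n))
# maximum absolute row sum
norm_inf = max(sum(abs(A[i][j]) for j in range(n)) for i in range(m))
# Frobenius norm
norm_f = math.sqrt(sum(A[i][j] ** 2 for i in range(m) for j in range(n)))

# spectral norm: sqrt of the largest eigenvalue of B = A^T A, via power iteration
B = [[sum(A[k][i] * A[k][j] for k in range(m)) for j in range(n)] for i in range(n)]
v = [1.0] * n
for _ in range(200):
    w = [sum(B[i][j] * v[j] for j in range(n)) for i in range(n)]
    s = math.sqrt(sum(wi * wi for wi in w))
    v = [wi / s for wi in w]
# Rayleigh quotient v^T B v of the unit vector v approximates lambda_max
lam = sum(v[i] * sum(B[i][j] * v[j] for j in range(n)) for i in range(n))
norm2 = math.sqrt(lam)
```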
- Let
      (A + ∆A) x̃ = b + ∆b                        (3)
  be the perturbed system, where ∆A ∈ R^{n×n} and ∆b ∈ R^n represent the perturbations in A and b, respectively.
- What is the error ∆x = x̃ − x between the solution x of the exact linear system (2) and the solution x̃ of the perturbed linear system (3)?
- Use the representation x̃ = x + ∆x.

Error Analysis. Perturbation in b only.
The original linear system is Ax = b, where A ∈ R^{n×n} and b ∈ R^n. The perturbed linear system is
    A(x + ∆x) = b + ∆b,
where ∆b ∈ R^n represents the perturbation in b. Subtracting, we get A ∆x = ∆b, or ∆x = A^{−1} ∆b.
Take norms:
    ||∆x|| = ||A^{−1} ∆b|| ≤ ||A^{−1}|| ||∆b||.                   (4)
To estimate the relative error, note that Ax = b, and as a result
    ||b|| = ||Ax|| ≤ ||A|| ||x||  ⇒  1/||x|| ≤ ||A|| · 1/||b||.   (5)
Combining (4) and (5) we get
    ||∆x|| / ||x|| ≤ ||A|| ||A^{−1}|| (||∆b|| / ||b||).           (6)

Definition. The (p-) condition number κ_p(A) of a matrix A (with respect to inversion) is defined by
    κ_p(A) = ||A||_p ||A^{−1}||_p.
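The bound (6) can be watched in action on a small diagonal system, where everything is computable by hand. The 2×2 example below is hypothetical (not from the slides) and uses the 1-norm:

```python
# A = diag(1, 100): ||A||_1 = 100, ||A^{-1}||_1 = 1, so kappa_1(A) = 100
a11, a22 = 1.0, 100.0
b = [1.0, 1.0]
x = [b[0] / a11, b[1] / a22]        # exact solution (1, 0.01)

db = [0.0, 1e-3]                    # perturbation of b only
dx = [db[0] / a11, db[1] / a22]     # A dx = db, solved entrywise for a diagonal A

# 1-norm condition number of a diagonal matrix
kappa = max(a11, a22) * max(1 / a11, 1 / a22)

rel_err = sum(abs(v) for v in dx) / sum(abs(v) for v in x)
bound = kappa * sum(abs(v) for v in db) / sum(abs(v) for v in b)
# rel_err is about 9.9e-6, well below the bound kappa * ||db||/||b|| = 0.05
```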
Set κ_p(A) = ∞ if A is not invertible. MATLAB's built-in function: cond(A).
If Ax = b and A(x + ∆x) = b + ∆b, then the relative error between the solutions obeys
    ||∆x|| / ||x|| ≤ κ_p(A) (||∆b|| / ||b||).

Error Analysis. General Case.
- Let
      Ax = b                                      (7)
  be the original system, where A ∈ R^{n×n} and b ∈ R^n.
- Let
      (A + ∆A)(x + ∆x) = b + ∆b                   (8)
  be the perturbed system, where ∆A ∈ R^{n×n} and ∆b ∈ R^n represent the perturbations in A and b, respectively.
- If ||A^{−1}||_p ||∆A||_p < 1, then
      ||∆x||_p / ||x||_p ≤ [ κ_p(A) / (1 − κ_p(A) ||∆A||_p / ||A||_p) ] ( ||∆A||_p / ||A||_p + ||∆b||_p / ||b||_p ).   (9)
If κ_p(A) is small, we say that the linear system is well conditioned. Otherwise, we say that the linear system is ill conditioned.

Error Analysis. Example: Hilbert Matrix.
Example. The Hilbert matrix H ∈ R^{n×n} has entries
    h_ij = ∫_0^1 x^{i+j−2} dx = 1 / (i + j − 1).
For n = 4,
    H = [ 1    1/2  1/3  1/4
          1/2  1/3  1/4  1/5
          1/3  1/4  1/5  1/6
          1/4  1/5  1/6  1/7 ],
    H^{−1} = [   16   −120    240   −140
               −120   1200  −2700   1680
                240  −2700   6480  −4200
               −140   1680  −4200   2800 ].

Example. The condition number of the Hilbert matrix grows very fast with n. For n = 4,
    ||H||_1 = ||H||_∞ = 25/12,    ||H^{−1}||_1 = ||H^{−1}||_∞ = 13620,    κ_1(H) = κ_∞(H) = 28375,
    ||H||_2 ≈ 1.5,    ||H^{−1}||_2 ≈ 1.03·10^4,    κ_2(H) ≈ 1.55·10^4.

Example. We consider the linear systems Hx = b. For given n we set x_ex = (1, ..., 1)^T ∈ R^n and compute b = H x_ex. Then we compute the solution of the linear system Hx = b using the LU-decomposition and compute the relative error between the exact solution x_ex and the computed solution x.
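The value κ_1(H) = 28375 for n = 4 can be reproduced without MATLAB's cond by inverting H with Gauss-Jordan elimination and applying the maximum-column-sum formula. A sketch (double precision is accurate enough here because n = 4 is only mildly ill conditioned):

```python
def hilbert(n):
    # h_ij = 1/(i+j-1) with 1-based i, j
    return [[1.0 / (i + j + 1) for j in range(n)] for i in range(n)]

def inverse(M):
    # Gauss-Jordan elimination with partial pivoting on the augmented matrix [M | I]
    n = len(M)
    aug = [row[:] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(M)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def norm1(M):
    # maximum absolute column sum
    n = len(M)
    return max(sum(abs(M[i][j]) for i in range(n)) for j in range(n))

H = hilbert(4)
Hinv = inverse(H)
kappa1 = norm1(H) * norm1(Hinv)   # close to 28375
```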
    n     κ_∞(H)           ||x_ex − x||_∞ / ||x_ex||_∞
    4     2.837500e+004    2.958744e−013
    5     9.436560e+005    5.129452e−012
    6     2.907028e+007    5.096734e−011
    7     9.851949e+008    2.214796e−008
    8     3.387279e+010    1.973904e−007
    9     1.099651e+012    4.215144e−005
    10    3.535372e+013    5.382182e−004

Error Analysis.
- If we use finite precision arithmetic, then rounding causes errors in the input data. Using m-digit floating point arithmetic, it holds that
      |x − fl(x)| / |x| ≤ 0.5 · 10^{−m+1}.
- Thus, if we solve the linear system in m-digit floating point arithmetic, then, as a rule of thumb, we may approximate the input errors due to rounding by
      ||∆A|| / ||A|| ≈ 0.5 · 10^{−m+1},    ||∆b|| / ||b|| ≈ 0.5 · 10^{−m+1}.
- If the condition number of A is κ(A) = 10^α, then, provided 10^{α−m+1} < 1, the bound (9) gives
      ||∆x|| / ||x|| ≤ [ 10^α / (1 − 10^{α−m+1}) ] (0.5 · 10^{−m+1} + 0.5 · 10^{−m+1}) ≈ 10^{α−m+1}.
- Rule of thumb: if the linear system is solved in m-digit floating point arithmetic and the condition number of A is of the order 10^α, then only about m − α − 1 digits in the solution can be trusted.

Summary.
- If the condition number of a matrix A is large, then small errors in the data may lead to large errors in the solution.
- Rule of thumb: if the linear system is solved in m-digit floating point arithmetic and the condition number of A is of the order 10^α, then only about m − α − 1 digits in the solution can be trusted.
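The qualitative behavior in the table, relative error growing with κ_∞(H), can be reproduced with a plain Gaussian-elimination solve. The exact figures depend on the arithmetic and the elimination variant used, so this sketch only checks the trend, not the slides' numbers:

```python
def hilbert(n):
    return [[1.0 / (i + j + 1) for j in range(n)] for i in range(n)]

def solve(A, b):
    # Gaussian elimination with partial pivoting, then back substitution
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def rel_err(n):
    # set x_ex = (1,...,1)^T, compute b = H x_ex, solve, and compare
    H = hilbert(n)
    b = [sum(row) for row in H]
    x = solve(H, b)
    return max(abs(xi - 1.0) for xi in x)  # inf-norm error; ||x_ex||_inf = 1

err4, err10 = rel_err(4), rel_err(10)
```

The error for n = 10 comes out many orders of magnitude larger than for n = 4, consistent with the growth of the condition number.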