Transformed Linear Regression vs. Copula Regression
Rahul A. Parsa, Drake University; Paul G. Ferrara, PhD, FSA, CERA, Homesite Insurance; Stuart Klugman

Outline of Talk
- Copula regression, OLS, GLM
- Alternative methodology
- Examples

Notation
- Y is the dependent variable; X_1, X_2, \ldots, X_k are the independent variables.
- Assumption: Y is related to the X's in some functional form,
  E[Y | X_1 = x_1, \ldots, X_k = x_k] = f(x_1, x_2, \ldots, x_k)

OLS Regression
- Y is linearly related to the X's.
- OLS model:
  Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_k X_{ki} + \varepsilon_i

OLS and the Multivariate Normal Distribution
- Assume Y, X_1, X_2, \ldots, X_k jointly follow a multivariate normal distribution.
- Then the conditional distribution of Y | X is normal with mean and variance
  E(Y | X = x) = \mu_Y + \Sigma_{YX} \Sigma_{XX}^{-1} (x - \mu_X)
  Var(Y | X = x) = \Sigma_{YY} - \Sigma_{YX} \Sigma_{XX}^{-1} \Sigma_{XY}

GLM
- Y belongs to an exponential family of distributions, with
  E(Y | X = x) = g^{-1}(\beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k)
  where g is called the link function.
- The x's are not random, and Y | x belongs to the exponential family.
- The conditional variance is no longer constant.
- Parameters are estimated by MLE using numerical methods.

Copula Regression
- Y can have any distribution, and each X_i can have any distribution.
- The joint distribution is described by a copula.
- Estimate Y by the conditional mean E(Y | X = x).

MVN Copula
- The CDF for the MVN copula is
  F(x_1, x_2, \ldots, x_n) = G(\Phi^{-1}[F(x_1)], \ldots, \Phi^{-1}[F(x_n)])
  where G is the multivariate normal CDF with zero means, unit variances, and correlation matrix R.
- The density of the MVN copula is
  f(x_1, x_2, \ldots, x_n) = f(x_1) f(x_2) \cdots f(x_n) \, |R|^{-0.5} \exp\left\{ -\frac{v^T (R^{-1} - I) v}{2} \right\}
  where v is the vector with i-th element v_i = \Phi^{-1}[F(x_i)].

Conditional Distribution in the MVN Copula
- The conditional distribution of x_n given x_1, \ldots, x_{n-1} is
  f(x_n | x_1, \ldots, x_{n-1}) = f(x_n) \, (1 - r^T R_{n-1}^{-1} r)^{-0.5} \exp\left\{ -0.5 \left[ \frac{\{\Phi^{-1}[F(x_n)] - r^T R_{n-1}^{-1} v_{n-1}\}^2}{1 - r^T R_{n-1}^{-1} r} - \{\Phi^{-1}[F(x_n)]\}^2 \right] \right\}
  where v_{n-1} = (v_1, \ldots, v_{n-1})^T and
  R = \begin{pmatrix} R_{n-1} & r \\ r^T & 1 \end{pmatrix}

Alternative Method
- Convert Y, X_1, X_2, \ldots, X_k to standard normal random variables using
  U = \Phi^{-1}(F_Y(y)), \quad V_i = \Phi^{-1}(F_{X_i}(x_i))
- Note: U and the V's jointly follow a multivariate normal distribution if Y and the X's follow an MVN copula.
- Regress U on the V's and obtain \hat{U}. Convert \hat{U} to \hat{Y} using
  \hat{Y}_A = (F_Y^{-1} \circ \Phi)(\hat{U})
- Advantages of this method:
  - Easy to implement (can be done in Excel); a code sketch of these steps appears after the Jensen's inequality slides below.
  - Easy to understand.
  - Transformations are well understood in regression.

Difference in Approaches
- Let \hat{Y}_C be the copula estimate and \hat{Y}_A the alternative-method estimate.
- Question: What is the difference between these two estimates?

Jensen's Inequality
- The two estimates are
  \hat{Y}_A = F_Y^{-1}(\Phi(E(U | V_1, V_2, \ldots, V_k))), \quad \hat{Y}_C = E(Y | X_1, X_2, \ldots, X_k)
- Jensen's inequality:
  E[(F_Y^{-1} \circ \Phi)(U) | V] \ge (F_Y^{-1} \circ \Phi)(E(U | V))
- We considered the case of two variables and show in the handout that
  E[(F_Y^{-1} \circ \Phi)(U) | V] = E(Y | X)
- Combining these, \hat{Y}_C \ge \hat{Y}_A whenever F_Y^{-1} \circ \Phi is convex, which motivates the convexity question below.
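A minimal sketch of the alternative method described above, written in Python with NumPy/SciPy rather than the Excel implementation mentioned on the slide: transform Y and the X's to normal scores, run OLS of U on the V's, and back-transform the fitted values through F_Y^{-1} \circ \Phi. The function names are illustrative and not from the talk, and the marginals shown (Pareto(3, 8) for Y, Gamma(2, 4) for X) simply mirror Example 1 below; in practice the marginal parameters would first be estimated by MLE.

```python
# Sketch of the alternative (transformed linear regression) method, assuming
# the marginal CDFs of Y and the X's are known or already fitted by MLE.
# Names here are illustrative, not taken from the talk.
import numpy as np
from scipy import stats

def fit_transformed_regression(y, X, F_y, F_x_list):
    """OLS of U = Phi^{-1}(F_Y(y)) on V_i = Phi^{-1}(F_{X_i}(x_i))."""
    U = stats.norm.ppf(F_y(y))                                # normal score of Y
    V = stats.norm.ppf(np.column_stack(
        [F(X[:, i]) for i, F in enumerate(F_x_list)]))        # normal scores of the X's
    design = np.column_stack([np.ones(len(U)), V])            # intercept + V's
    beta, *_ = np.linalg.lstsq(design, U, rcond=None)         # ordinary least squares
    return beta

def predict_transformed_regression(beta, X, F_y_inv, F_x_list):
    """Back-transform U-hat to Y-hat_A = F_Y^{-1}(Phi(U-hat))."""
    V = stats.norm.ppf(np.column_stack(
        [F(X[:, i]) for i, F in enumerate(F_x_list)]))
    U_hat = np.column_stack([np.ones(len(V)), V]) @ beta
    return F_y_inv(stats.norm.cdf(U_hat))

# Illustrative marginals matching Example 1 below: Y ~ Pareto(3, 8), X ~ Gamma(2, 4).
# SciPy's lomax(c, scale) has density c*scale^c / (y + scale)^(c+1), i.e. the
# Pareto parameterization used in the talk with alpha = c and theta = scale.
pareto_y = stats.lomax(c=3, scale=8)
gamma_x = stats.gamma(a=2, scale=4)

# With data y (length n) and X (an n-by-1 array), the estimate would be:
# beta = fit_transformed_regression(y, X, pareto_y.cdf, [gamma_x.cdf])
# y_hat_A = predict_transformed_regression(beta, X, pareto_y.ppf, [gamma_x.cdf])
```

The copula estimate \hat{Y}_C = E(Y | X = x), by contrast, would be computed from the conditional density on the earlier slide, generally by numerical integration.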
Convexity Problem
- Show that y = F^{-1}(\Phi(x)) is a convex function, as required for the direction of Jensen's inequality above (see the handout for the proof). That is,
  \frac{d^2}{dx^2} F^{-1}(\Phi(x)) \ge 0
  or
  \frac{f^2(y)}{\phi^2(x)} - \frac{f'(y)}{\phi'(x)} \ge 0, \quad \text{where } y = F^{-1}(\Phi(x))

Example: F ~ Pareto
- Pareto distribution:
  f(y) = \frac{\alpha \theta^\alpha}{(y + \theta)^{\alpha + 1}}, \quad f'(y) = -f(y) \, \frac{\alpha + 1}{y + \theta}, \quad \phi'(x) = -x \, \phi(x)
- Convexity condition:
  \frac{f(y)}{\phi(x)} \ge \frac{\alpha + 1}{x (y + \theta)}

[Figure: Graph - Pareto. Plot of y = F^{-1}(\Phi(x)) for x from -6 to 6.]

Example: F ~ Gamma
- Gamma distribution:
  f(y) = \frac{1}{\Gamma(\alpha)\,\theta^\alpha} \, y^{\alpha - 1} e^{-y/\theta}, \quad f'(y) = f(y) \left[ \frac{\alpha - 1}{y} - \frac{1}{\theta} \right]
- Convexity condition:
  \frac{f(y)}{\phi(x)} \ge -\frac{1}{x} \left[ \frac{\alpha - 1}{y} - \frac{1}{\theta} \right]

[Figure: Graph - Gamma. Plot of y = F^{-1}(\Phi(x)) for x from -6 to 6.]

A code sketch that reproduces these curves and checks convexity numerically appears at the end of this transcript.

Example 1
- Data were simulated with Y ~ Pareto(3, 8) and X ~ Gamma(2, 4); 2000 observations were generated.
- The MLEs were:

               Alpha       Theta
  Y ~ Pareto   2.849075    7.48509
  X ~ Gamma    1.906755    4.234371

- Error (SSE):

  Copula       40,508.92
  OLS          42,844.31
  Transformed  45,337.45

[Figure: Example 1 - scatter of Y, Cop-Yhat, and Yhat-Method 3.]

Example 2
- Taken from the Copula Regression paper (Example 1). The dependent variable is X3 (Gamma).
- Although X2 was simulated from a Pareto distribution, its parameter estimates did not converge, so a gamma model was fitted.

               X1-Pareto      X2-Pareto       X3-Gamma
  Parameters   3, 100         4, 300          3, 100
  MLE          3.44, 161.11   1.04, 112.003   3.77, 85.93

- Error:

  Copula       590,000.5
  OLS          637,172.8
  Transformed  597,552.6

[Figure: Example from Copula Paper - Y, Yhat-Cop, and Yhat-Trans (shown on two slides).]
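As a numerical companion to the convexity discussion and the Graph - Pareto / Graph - Gamma slides, the sketch below (not part of the talk; the parameters simply mirror Example 1) plots y = F^{-1}(\Phi(x)) for Pareto(3, 8) and Gamma(2, 4) marginals and checks that the discrete second differences are non-negative on a grid, consistent with the convexity conditions stated above.

```python
# Numerical check of the convexity of y = F^{-1}(Phi(x)) for the Pareto and
# gamma marginals used in the talk.  Illustrative only, not the authors' code.
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt

x = np.linspace(-5, 5, 1001)

marginals = {
    "Pareto(3, 8)": stats.lomax(c=3, scale=8),   # lomax = Pareto with alpha=c, theta=scale
    "Gamma(2, 4)": stats.gamma(a=2, scale=4),
}

for label, dist in marginals.items():
    y = dist.ppf(stats.norm.cdf(x))              # y = F^{-1}(Phi(x))
    second_diff = np.diff(y, n=2)                # discrete analogue of d^2 y / dx^2
    # A non-negative minimum indicates convexity on this grid.
    print(label, "min second difference:", second_diff.min())
    plt.plot(x, y, label=label)

plt.xlabel("x")
plt.ylabel("y = F^{-1}(Phi(x))")
plt.legend()
plt.show()
```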