A Study on the Process Optimizing of Bank’s Lending Service
HUANG Xiaokun
School of Business Administration, South China University of Technology, Guangzhou, P.R. China, 510640
[email protected]

Abstract
A commercial bank can be seen as an enterprise that manufactures loans for firms and individuals. In the lending process, the credit-scoring model plays an important role in evaluating the probability of default of a loan application. In general, credit-scoring models suffer from sample-selection bias. This paper uses the bivariate probit approach to estimate an unbiased credit-scoring model. The model is estimated on a data set of large commercial loans provided by a commercial bank of China, which contains financial and firm information on both rejected and approved applicants. We find that the bivariate probit selection model provides more efficient estimates than a single-equation model. The results show that the bivariate probit model can help the loan committee of a commercial bank to optimize the lending process.

Keywords: Lending process, Credit scoring, Sample selection bias, Bivariate probit selection model

1 Introduction
Commercial banks have played an important role as an accelerator of economic growth in China since the reform and opening-up. As part of the service industry, the banking industry provides financial services for firms and individuals. In effect, a commercial bank can be regarded as an enterprise that manufactures loans for other enterprises or individuals. Generally, in evaluating an application for a large loan, such as a mortgage or a construction loan, a commercial bank relies on direct, individual scrutiny by a loan committee.
[Figure 1: The process of bank lending. Elements: firm financial index, firm ordinary data, macroeconomic status, the bank's lending policy, statistical model, loan committee, rejected loans, accepted loans, bad loans, good loans, feedback and model amended.]

The loan committee carefully analyzes the firm's financial indices, its ordinary data (such as the number of employees and the registered capital), the macroeconomic situation and the bank's lending policy (see Fig. 1), and then decides whether to accept the loan application on the basis of all the information available. However, this decision process is not optimal, because it is inefficient and relies heavily on subjective judgement. Many advanced international commercial banks use statistical models to make better decisions and to improve the efficiency of lending scrutiny. Among these statistical models, the credit-scoring model is the most important.

The objective of most credit-scoring models is to minimize the misclassification rate or the expected default rate. To achieve this, various statistical methods are used to separate loan applicants who are expected to pay back their debts from those who are likely to fall into arrears. The most commonly used method has been some form of discriminant analysis (DA). The DA model assumes that the exogenous variables $x_i$ are normally distributed, with means that differ according to the group to which the dependent variable belongs [1]. The objective is then to estimate these means and to predict which group an observation with characteristics $x_i$ most likely comes from. DA thus differs from probit and logit analysis, in which the exogenous variables explicitly determine group membership. One potential weakness of the DA model is that its underlying assumptions are easily violated.
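To make the DA assumptions concrete, here is a minimal sketch of two-group linear discriminant classification with a single exogenous variable. It assumes normal class-conditional distributions with different means and a common variance (the linear-DA case); the function names and the toy values of the "financial ratio" are hypothetical illustrations, not taken from the paper.

```python
import math
from statistics import mean


def fit_lda_1d(good, bad):
    """Two-group linear DA for one variable: group means, pooled variance, prior of 'good'."""
    mu_good, mu_bad = mean(good), mean(bad)
    # Common (pooled) within-group variance, as linear DA assumes equal variances.
    ss = sum((x - mu_good) ** 2 for x in good) + sum((x - mu_bad) ** 2 for x in bad)
    var = ss / (len(good) + len(bad) - 2)
    prior_good = len(good) / (len(good) + len(bad))
    return mu_good, mu_bad, var, prior_good


def log_normal_pdf(x, mu, var):
    """Log density of N(mu, var) at x."""
    return -0.5 * math.log(2.0 * math.pi * var) - (x - mu) ** 2 / (2.0 * var)


def classify(x, params):
    """Assign x to the group with the larger prior-weighted normal density."""
    mu_good, mu_bad, var, prior_good = params
    score_good = math.log(prior_good) + log_normal_pdf(x, mu_good, var)
    score_bad = math.log(1.0 - prior_good) + log_normal_pdf(x, mu_bad, var)
    return "good" if score_good >= score_bad else "bad"


# Hypothetical values of one financial ratio for repaid ("good") and defaulted ("bad") loans.
params = fit_lda_1d([2.0, 2.5, 3.0, 3.5], [0.0, 0.5, 1.0, 1.5])
print(classify(3.0, params), classify(0.2, params))  # prints: good bad
```

With equal priors and a common variance, the decision boundary is simply the midpoint of the two group means (here 1.75), which is why the method is sensitive to violations of the normality and equal-variance assumptions.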
More importantly, such models can only be estimated on samples of granted loans (a subset extracted from all loan applications), which causes a sample selection bias in the parameter estimates [2]. In practice, most credit-scoring models suffer from sample selection bias because they are estimated from a sample of granted loans and the criteria by which applicants are rejected are not taken into account [3]. Many researchers have developed effective approaches to the sample selection problem in credit scoring. Boyes, Hoffman and Low (1989) avoided the bias by designing a bivariate probit model with two sequential events as the dependent variables: the lender's decision to grant the loan or not and, conditional on the loan having been granted, the borrower's ability to pay it off or not [4]. Greene (1998) developed a similar binary choice model for sample selection that is relevant for modeling credit scoring by commercial banks [5]. Jacobson and Roszbach (2003) followed the same method to analyze the default rate of credit-card loans with a larger data set [6]. All of these studies focused on credit-card loans, which are revolving and have no predetermined maturity. Large commercial loans differ from credit-card loans because they have a predetermined maturity. The contribution of this paper is to extend the use of credit-scoring models to large commercial loans and to study whether a credit-scoring model can optimize the lending process of a commercial bank.

This paper is organized as follows. Section 2 presents the bivariate probit selection model. Section 3 presents the empirical analysis in two parts: the first describes the data set and the variables used in the model, and the second reports the parameter estimates of the bivariate sample selection probit model. Section 4 summarizes the results.
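The selection problem described above can be illustrated with a small simulation. This is a sketch under assumed, hypothetical parameter values (intercept-only equations with intercepts a1 and a2, and error correlation rho): applicants are drawn from a two-equation latent model of the kind set out in Section 2, only those with a non-negative granting index receive a loan, and the default rate observed among granted loans is compared with the rate among all applicants.

```python
import math
import random


def simulate_default_rates(n, a1, a2, rho, seed=0):
    """Simulate applicants from an intercept-only version of the two-equation model.

    Granting index:  y1* = a1 + e1 (loan granted if y1* >= 0).
    Repayment index: y2* = a2 + e2 (loan repaid if y2* >= 0, default otherwise).
    (e1, e2) are standard bivariate normal with correlation rho.
    Returns (default rate among all applicants, default rate among granted loans).
    """
    rng = random.Random(seed)
    defaults_all = granted = defaults_granted = 0
    for _ in range(n):
        e1 = rng.gauss(0.0, 1.0)
        # e2 = rho * e1 + sqrt(1 - rho^2) * u has unit variance and corr(e1, e2) = rho.
        e2 = rho * e1 + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        default = a2 + e2 < 0.0
        defaults_all += default
        if a1 + e1 >= 0.0:
            granted += 1
            defaults_granted += default
    return defaults_all / n, defaults_granted / granted


for rho in (0.0, -0.5):
    print(rho, simulate_default_rates(100_000, 0.5, 1.0, rho))
```

With rho = 0 the two rates agree up to sampling noise. With a negative rho, the sign estimated later in the paper, the granted sample shows a higher default rate than the applicant population, which is the kind of distortion that a model estimated on granted loans alone inherits.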
2 Econometric Model

In this section, we briefly present the bivariate probit selection model; for details, we refer to Boyes et al. (1989), Greene (1998) and Jacobson et al. (2003). The model consists of two simultaneous equations, one for the binary decision to grant a loan or not, and another for the binary outcome, "default" or "no default". Let the superscript * indicate an unobserved variable, and assume that $y_{1i}^{*}$ and $y_{2i}^{*}$ follow

$$ y_{1i}^{*} = X_{1i}\alpha_1 + \varepsilon_{1i}, \qquad y_{2i}^{*} = X_{2i}\alpha_2 + \varepsilon_{2i}, \qquad i = 1, 2, \ldots, N \tag{1} $$

where $X_{ji}$, $j = 1, 2$, are $1 \times k_j$ vectors of explanatory variables, and the disturbances $\varepsilon_{1i}$ and $\varepsilon_{2i}$ are assumed to be bivariate normally distributed with zero means, unit variances and correlation coefficient $\rho$:

$$ \begin{pmatrix} \varepsilon_{1i} \\ \varepsilon_{2i} \end{pmatrix} \sim N_2\!\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix} \right) \tag{2} $$

If $\rho = 0$, the selection is of no consequence, and the sample selection bias need not be corrected. The binary choice variable $y_{1i}$ takes the value 1 if the loan was granted and 0 if the application was rejected:

$$ y_{1i} = \begin{cases} 0 & \text{if } y_{1i}^{*} < 0 \\ 1 & \text{if } y_{1i}^{*} \ge 0 \end{cases} \tag{3} $$

The second variable, $y_{2i}$, takes the value 0 if the loan defaults and 1 if it does not:

$$ y_{2i} = \begin{cases} 0 & \text{if } y_{2i}^{*} < 0 \\ 1 & \text{if } y_{2i}^{*} \ge 0 \end{cases} \tag{4} $$

In general, one observes whether a loan is good or bad only if it was granted, so $(y_{1i}, y_{2i})$ is subject not only to a censoring rule but also to an observation rule. Because there are three types of observations (no loans, bad loans and good loans), the likelihood function takes the following form:

$$ l = \prod_{\text{no loans}} \mathrm{prob}(\text{no loan}) \times \prod_{\text{bad loans}} \mathrm{prob}(\text{bad loan}) \times \prod_{\text{good loans}} \mathrm{prob}(\text{good loan}) \tag{5} $$

where "no loan" means that the application was rejected, "bad loan" that the loan defaulted, and "good loan" that the loan did not default.
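The remark that $\rho = 0$ makes the selection inconsequential can be checked with a standard property of the bivariate normal distribution, added here for illustration ($\phi$ denotes the standard normal density). Since $y_{1i} = 1$ exactly when $\varepsilon_{1i} \ge -X_{1i}\alpha_1$, and $\mathrm{E}[\varepsilon_{2i} \mid \varepsilon_{1i}] = \rho\,\varepsilon_{1i}$ under (2):

$$
\mathrm{E}[\varepsilon_{2i} \mid y_{1i} = 1]
= \mathrm{E}\big[\, \mathrm{E}[\varepsilon_{2i} \mid \varepsilon_{1i}] \;\big|\; \varepsilon_{1i} \ge -X_{1i}\alpha_1 \big]
= \rho\, \mathrm{E}[\varepsilon_{1i} \mid \varepsilon_{1i} \ge -X_{1i}\alpha_1]
= \rho\, \frac{\phi(X_{1i}\alpha_1)}{\Phi(X_{1i}\alpha_1)}
$$

So, whenever $\rho \neq 0$, the disturbance of the default equation has a non-zero mean among granted loans that varies with $X_{1i}\alpha_1$; a probit fitted to granted loans alone omits this term, whereas for $\rho = 0$ it vanishes.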
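Combining (3) and (4) with these three observation types, the likelihood function in equation (5) becomes:

$$ l = \prod_{i=1}^{N} \mathrm{prob}(y_{1i}^{*} < 0)^{1-y_{1i}} \times \prod_{i=1}^{N} \mathrm{prob}(y_{1i}^{*} \ge 0,\, y_{2i}^{*} < 0)^{y_{1i}(1-y_{2i})} \times \prod_{i=1}^{N} \mathrm{prob}(y_{1i}^{*} \ge 0,\, y_{2i}^{*} \ge 0)^{y_{1i} y_{2i}} \tag{6} $$

Substituting (1) into (6) gives the log-likelihood function:

$$ \begin{aligned} \ln l = {} & \sum_{i=1}^{N} (1-y_{1i}) \ln\big[\mathrm{prob}(\varepsilon_{1i} < -X_{1i}\alpha_1)\big] + \sum_{i=1}^{N} y_{1i}(1-y_{2i}) \ln\big[\mathrm{prob}(\varepsilon_{1i} \ge -X_{1i}\alpha_1 \cap \varepsilon_{2i} < -X_{2i}\alpha_2)\big] \\ & + \sum_{i=1}^{N} y_{1i} y_{2i} \ln\big[\mathrm{prob}(\varepsilon_{1i} \ge -X_{1i}\alpha_1 \cap \varepsilon_{2i} \ge -X_{2i}\alpha_2)\big] \end{aligned} \tag{7} $$

Because $(-\varepsilon_{1i}, -\varepsilon_{2i})$ has the same bivariate normal distribution as $(\varepsilon_{1i}, \varepsilon_{2i})$, the probability in the last line of (7) can be rewritten as

$$ \mathrm{prob}(\varepsilon_{1i} \ge -X_{1i}\alpha_1 \cap \varepsilon_{2i} \ge -X_{2i}\alpha_2) = \mathrm{prob}(-\varepsilon_{1i} \le X_{1i}\alpha_1 \cap -\varepsilon_{2i} \le X_{2i}\alpha_2) = \Phi_2(X_{1i}\alpha_1, X_{2i}\alpha_2; \rho) \tag{8} $$

Similarly, $\mathrm{prob}(\varepsilon_{1i} < -X_{1i}\alpha_1) = 1 - \Phi(X_{1i}\alpha_1)$, and the probability in the middle term of (7) equals $\mathrm{prob}(\varepsilon_{1i} \ge -X_{1i}\alpha_1) - \Phi_2(X_{1i}\alpha_1, X_{2i}\alpha_2; \rho) = \Phi(X_{1i}\alpha_1) - \Phi_2(X_{1i}\alpha_1, X_{2i}\alpha_2; \rho)$. Hence, for all $i$, the log-likelihood function can be written as:

$$ \begin{aligned} \ln l = {} & \sum_{i=1}^{N} (1-y_{1i}) \ln\big[1 - \Phi(X_{1i}\alpha_1)\big] + \sum_{i=1}^{N} y_{1i}(1-y_{2i}) \ln\big[\Phi(X_{1i}\alpha_1) - \Phi_2(X_{1i}\alpha_1, X_{2i}\alpha_2; \rho)\big] \\ & + \sum_{i=1}^{N} y_{1i} y_{2i} \ln \Phi_2(X_{1i}\alpha_1, X_{2i}\alpha_2; \rho) \end{aligned} \tag{9} $$

where $\Phi(\cdot)$ and $\Phi_2(\cdot, \cdot; \rho)$ denote the univariate and bivariate standard normal cumulative distribution functions, the latter with correlation coefficient $\rho$.

3 Empirical Analysis

3.1 Data

The original data set consists of 16384 commercial loan contracts at one branch of a major commercial bank of China between December 1991 and February 2004. These contracts include ordinary commercial loans (such as mortgage loans, pledge loans and credit loans), loans for private housing, acceptance credit loans, outward documentary loans and discount loans. In the original data set, ordinary commercial loans account for more than 70% of total lending. Because ordinary commercial loans are the primary assets of the bank we study and carry higher risk, we focus on this type of loan and exclude the others. Moreover, to study which characteristics of firms affect credit rating, we exclude individual loans and keep only firm loans. Before the combined data were handed over for analysis, the names of the firms were removed.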
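The sample construction just described amounts to a simple filter. A minimal sketch, with a hypothetical record layout: the class `LoanContract`, its field names and the category labels are assumptions for illustration, since the bank's actual schema is not given.

```python
from dataclasses import dataclass, replace
from typing import List, Optional


@dataclass(frozen=True)
class LoanContract:
    # Hypothetical record layout; the bank's actual schema is not given in the paper.
    contract_id: str
    loan_type: str        # e.g. "ordinary", "housing", "acceptance", "documentary", "discount"
    borrower_type: str    # "firm" or "individual"
    firm_name: Optional[str]


def build_sample(contracts: List[LoanContract]) -> List[LoanContract]:
    """Keep ordinary commercial loans to firms and blank out the firm name."""
    sample = []
    for c in contracts:
        if c.loan_type != "ordinary":     # drop housing, acceptance, documentary and discount loans
            continue
        if c.borrower_type != "firm":     # drop individual loans
            continue
        sample.append(replace(c, firm_name=None))  # anonymise before analysis
    return sample
```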
Finally, we obtain a data set consisting of 2798 granted loans and 299 rejected loans.

Table 1 Definition of variables

Variable    Definition
CAPT        The registered capital of the firm (in 10 thousand Yuan)
COMTY       Dummy, takes the value 1 if the firm is a joint-stock company, otherwise 0
COMOWN      Dummy, takes the value 1 if the firm is a state-owned enterprise, otherwise 0
RELAT       Dummy, takes the value 1 if the firm has some relationship with the bank (for example, an enterprise in which the bank holds shares), otherwise 0
ESTATE      Dummy, takes the value 1 if the firm is a real estate development enterprise, otherwise 0
RMB         Dummy, takes the value 1 if the loan currency is RMB, otherwise 0
LOANSIZE    Amount of the loan (in 10 thousand Yuan)
RATE        Interest rate of the loan (%)
MATURITY    The maturity of the loan (days)
MORTGAGE    Dummy, takes the value 1 if the loan is a pledge loan or a mortgage loan, otherwise 0
APPROVAL    Categorical, takes the value 1 if the loan was examined and approved by a sub-branch of the bank, 2 if by a branch of the bank, and 3 if by the bank headquarters
Table 2 Descriptive statistics for all loans

              Rejections (N=153)                       Granted loans (N=653)
Variable      Mean     Stdev    Min      Max           Mean     Stdev    Min      Max
CAPT          4336     10752    40       12000         6882     17010    15       150000
COMTY         0.26     0.44     0        1             0.20     0.42     0        1
COMOWN        0.46     0.50     0        1             0.32     0.47     0        1
RELAT         0.013    0.11     0        1             0.005    0.07     0        1
ESTATE        0.11     0.32     0        1             0.12     0.32     0        1
RMB           0.78     0.41     0        1             0.90     0.31     0        1
LOANSIZE      484      645      27       5000          1558     2016     1        10000
RATE          9.83%    3.01%    5.841%   24.6%         5.39%    1.35%    2.42%    18%
MATURITY      301      222      31       1826          553      358      23       3652
MORTGAGE      0.36     0.48     0        1             0.47     0.49     0        1
APPROVAL      0.95     0.47     1        3             1.02     0.30     1        3

Table 3 Descriptive statistics for granted loans

              Defaulted loans (N=57)                   Good loans (N=596)
Variable      Mean     Stdev    Min      Max           Mean     Stdev    Min      Max
CAPT          5874     10642    100      60000         7642     19778    15       200000
COMTY         0.26     0.44     0        1             0.20     0.40     0        1
COMOWN        0.42     0.50     0        1             0.31     0.46     0        1
RELAT         0.018    0.13     0        1             0.005    0.07     0        1
ESTATE        0.18     0.38     0        1             0.11     0.31     0        1
RMB           0.81     0.40     0        1             0.91     0.29     0        1
LOANSIZE      609      854      2        4500          2714     12666    1        30000
RATE          6.30%    1.77%    3%       7.623%        5.31%    1.28%    2.42%    18%
MATURITY      247      330      23       2192          370      355      66       3652
MORTGAGE      0.37     0.49     0        1             0.47     0.50     0        1
APPROVAL      1.00     0.22     1        3             1.02     0.32     1        3

The database includes useful information, such as guarantee type, loan size, interest rate and loan maturity, which can be used as important variables when developing a statistical credit-scoring model. Other information, such as the loan contract code, the starting date of the loan and the name of the guarantor, cannot be used as an explanatory factor in the model. In total, 38 variables are available, of which 27 were not used in the final estimation of the model described in Section 2. Most were disregarded because they were unrelated to the probability of loan default or were highly correlated with another variable that measured approximately the same thing but had greater explanatory power. Table 1 contains definitions of the variables selected for the estimation of the model in Section 2, and Tables 2 and 3 contain descriptive statistics for these variables.
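Tables 2 and 3 report the mean, standard deviation, minimum and maximum of each variable by group. A minimal sketch of that computation for one variable, using Python's standard `statistics` module. It assumes the sample standard deviation (n - 1 denominator), since the paper does not say which convention it used; the function name and the group labels in the example are hypothetical.

```python
from statistics import mean, stdev


def describe_by_group(values, groups):
    """Return {group: (mean, stdev, min, max)} for one variable split by a group label.

    Uses the sample standard deviation (n - 1 denominator); a single-observation
    group is given a standard deviation of 0.0.
    """
    by_group = {}
    for v, g in zip(values, groups):
        by_group.setdefault(g, []).append(v)
    return {
        g: (mean(vs), stdev(vs) if len(vs) > 1 else 0.0, min(vs), max(vs))
        for g, vs in by_group.items()
    }


# Hypothetical MORTGAGE dummies for four applications.
print(describe_by_group([1, 0, 1, 1], ["rejected", "rejected", "granted", "granted"]))
```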
Of all loans, 2798 are granted loans and 299 are rejected loans; both types are firm loans. Of the granted loans, 210 defaulted and 2588 are good loans.

3.2 Empirical results

We use maximum likelihood estimation (MLE) to estimate the model described in Section 2. Table 4 presents the results of the bivariate and single-equation probit models, including the standard errors of the regression coefficients and their t-statistics. The univariate probit model, which assumes zero correlation between $\varepsilon_{1i}$ and $\varepsilon_{2i}$ (see Section 2), contains the same independent variables as the default equation of the bivariate probit selection model, and it also has a significant overall fit. In the univariate model, RMB, LOANSIZE and MATURITY have positive coefficients at the 0.05 level of significance, while COMOWN, RATE and APPROVAL have negative coefficients; recall from (4) that $y_{2i} = 1$ denotes a loan that does not default, so a positive coefficient raises the probability of repayment and lowers the probability of default. Compared with the bivariate probit selection model, CAPT and MORTGAGE have no significant impact on the probability of loan default in the univariate probit model.

The significant estimate of $\rho$ in the bivariate probit model implies that the single-equation estimates of the probability of loan default suffer from sample selection bias. $\rho$ estimates the correlation between the two outcomes after the effects of the included variables have been taken into account. Its value of -0.5268 implies that unobserved tendencies to grant a loan and hold it on the balance sheet are moderately correlated with unobserved increases in default risk; the sign is negative because $y_{2i} = 1$ denotes repayment. In other words, the unobserved elements of the bank's business captured by the first equation in (1), which increase a loan's chance of being on the current balance sheet, are positively related to increases in default risk that cannot be explained by a systematic relation with the covariates $X_{2i}$.
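The estimates above are obtained by maximizing the log-likelihood (9). A minimal sketch of that objective in plain Python follows. It assumes covariate rows are given as lists, and it computes $\Phi_2$ by one-dimensional numerical integration (Simpson's rule over $\Phi_2(a, b; \rho) = \int_{-\infty}^{a} \phi(x)\,\Phi\big((b - \rho x)/\sqrt{1-\rho^2}\big)\,dx$). That integration is a choice made here for a dependency-free example, not the method used in the paper. An actual estimation would pass the negative of `log_likelihood` to a numerical optimizer, which is not shown.

```python
import math


def norm_cdf(z):
    """Standard normal CDF, Phi(z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def norm_pdf(z):
    """Standard normal density, phi(z)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def binorm_cdf(a, b, rho, steps=2000):
    """Phi_2(a, b; rho) for |rho| < 1, by Simpson's rule on
    int_{-inf}^{a} phi(x) * Phi((b - rho * x) / sqrt(1 - rho^2)) dx."""
    lo = -9.0  # the integrand is negligible below this point
    if a <= lo:
        return 0.0
    s = math.sqrt(1.0 - rho * rho)

    def f(x):
        return norm_pdf(x) * norm_cdf((b - rho * x) / s)

    h = (a - lo) / steps  # steps must be even for Simpson's rule
    total = f(lo) + f(a)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * f(lo + k * h)
    return total * h / 3.0


def log_likelihood(x1, x2, y1, y2, alpha1, alpha2, rho):
    """Log-likelihood (9) of the bivariate probit selection model.

    x1, x2: covariate rows for the granting and default equations;
    y1: 1 if granted; y2: 1 if repaid (ignored when y1 == 0).
    """
    ll = 0.0
    for r1, r2, g, ok in zip(x1, x2, y1, y2):
        i1 = sum(a * b for a, b in zip(r1, alpha1))  # X_1i alpha_1
        i2 = sum(a * b for a, b in zip(r2, alpha2))  # X_2i alpha_2
        if g == 0:
            p = 1.0 - norm_cdf(i1)                       # rejected application
        elif ok == 0:
            p = norm_cdf(i1) - binorm_cdf(i1, i2, rho)   # granted, defaulted
        else:
            p = binorm_cdf(i1, i2, rho)                  # granted, repaid
        ll += math.log(p)
    return ll
```

Each term is the probability of one of the three observation types in (5), so for any applicant the three probabilities sum to one. For example, with both indices at zero and $\rho = 0.5$ they are 1/2, 1/6 and 1/3, using the identity $\Phi_2(0, 0; \rho) = 1/4 + \arcsin(\rho)/(2\pi)$.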
Table 4 Results of MLE

              Univariate probit (N=2798)                   Bivariate probit selection (N=3097)
Variable      Estimate (α̂1)  Std. error   t-stat.          Estimate (α̂2)  Std. error   t-stat.
CONSTANT       3.726294       0.360225     10.34435***       2.342374       0.303638     7.714374***
CAPT           1.11E-05       8.03E-06     1.383071          1.11E-05       4.82E-06     2.302901***
COMTY         -0.247540       0.176469    -1.402744         -0.210365       0.179794    -1.170034
COMOWN        -0.406916       0.149876    -2.715010***      -0.479586       0.155308    -3.087964***
RELAT          0.042303       0.816670     0.051799         -0.800273       0.897579    -0.891591
ESTATE         0.531193       0.952175     0.557873         -0.094837       0.216255    -0.438541
RMB            0.494534       0.227626     2.172574**        0.586126       0.213365     2.747059***
LOANSIZE       0.000221       7.50E-05     0.415307          0.000330       7.48E-05     4.415668***
RATE          -0.471971       0.035274    -13.37997***      -0.440710       0.038318    -11.50136***
MATURITY      -0.000107       0.000208    -0.514697          0.000557       0.000335     1.664689*
MORTGAGE       0.159942       0.157018     1.018622          0.646626       0.155685     4.153429***
APPROVAL      -0.491712       0.209751    -2.344265***      -0.302108       0.087666    -3.446106***
ρ              n/a            n/a          n/a              -0.526815       0.032439    -5.064592***
Chi-square     419.6083 (p-value 0.0000***)                 552.0222 (p-value 0.0000***)
McFadden R2    0.535635                                     0.596983

The coefficient estimates correspond to the parameters of model (1). *, ** and *** denote statistical significance at the 10%, 5% and 1% levels, respectively.

COMOWN, RMB, LOANSIZE, RATE, MATURITY and APPROVAL are significant in both models. We infer that state-owned enterprises have a higher default rate than other enterprises, that RMB loans have a lower default rate than loans in other currencies, that larger loan sizes and longer maturities lead to lower default rates, and that a higher interest rate leads to a higher default rate. These results are consistent with the experience-based judgement of the loan committee when it examines the risk factors of a loan application. In addition, we find that the loan-approval level clearly affects the risk of loans.
Because the bank headquarters have more risk-management experience and scrutiny skill than the branches and sub-branches, the loans that they examined and approved had lower risk. Moving the approval up one level shifts the index $X_{2i}\alpha_2$ by -0.302 in the bivariate probit selection model and by -0.492 in the univariate probit model; because these are probit coefficients, the implied change in the probability of default depends on each applicant's other characteristics and cannot be read directly as a percentage change.

Furthermore, apart from the coefficients and estimation precision, there is another important difference between the two probit models: CAPT and MORTGAGE are significant in the bivariate selection probit model but not in the univariate probit model. Both variables are important factors for the loan committee in lending decisions. According to this bank's lending experience, a higher CAPT indicates a larger firm, which may lead to a lower probability of loan default, and a mortgage reduces the risk of a loan. Moreover, the average estimated loan default probability is 15.3% in the bivariate sample selection model and 21.9% in the univariate probit model. Therefore, the sample selection model can help the loan committee to improve not only the efficiency but also the precision of its scrutiny of loan applications.

4 Conclusions

In this paper, the bivariate probit selection model has been applied to the credit rating of large loans. Using a data set provided by a commercial bank, we find evidence that a loan credit-rating model yields biased estimates when it ignores the sample selection problem. The bivariate probit selection model gives better estimates than the univariate probit model.
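Because probit coefficients act on the index rather than on the probability, the same coefficient implies different probability changes for different applicants. A minimal sketch, assuming the coding in (4): the repayment probability is $\Phi(X_{2i}\alpha_2)$ and the default probability is $1 - \Phi(X_{2i}\alpha_2)$. The coefficient 0.3 and the baseline indices are hypothetical. The averaging function shows one way an average estimated default probability, such as the 15.3% and 21.9% figures, could be computed from fitted indices; it is not necessarily the author's procedure.

```python
import math


def norm_cdf(z):
    """Standard normal CDF, Phi(z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def default_prob(index):
    """P(default) = P(y2 = 0) = 1 - Phi(index), where index = X_2i alpha_2, under coding (4)."""
    return 1.0 - norm_cdf(index)


def default_prob_change(index, coef, step=1.0):
    """Change in P(default) when one covariate with coefficient `coef` rises by `step`."""
    return default_prob(index + coef * step) - default_prob(index)


def average_default_prob(indices):
    """Mean predicted default probability over fitted indices X_2i alpha_2."""
    return sum(default_prob(z) for z in indices) / len(indices)


# A hypothetical coefficient of 0.3 implies different probability changes
# at different hypothetical baseline indices.
for baseline in (0.5, 1.5, 2.5):
    print(baseline, round(default_prob_change(baseline, 0.3), 4))
```

The change is largest where the normal density is highest (indices near zero) and shrinks toward zero for applicants whose default probability is already very small.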
Because settled loan projects have been removed from the current balance sheet, the sample selection model predicts a much lower default rate for the population as a whole (15.3% vs. 21.9%). Our results show that using the bivariate probit model to measure the probability of default of loan applications can help the loan committee of a commercial bank to make proper lending decisions and optimize the lending process.

In this study, extensive financial information on the firms, such as the asset-liability ratio and the liquidity ratio, was lacking; these are primary variables in developing a credit-scoring model, but the bank did not provide them for our study. Our empirical sample selection model is therefore not a perfect one, but despite this data limitation it is sufficient for studying the lending process. Nevertheless, when a credit-scoring model is constructed for practical use, it is important to include financial variables in addition to accounting for the sample selection bias in the data.

References
[1] Carling, K., Jacobson, T., Roszbach, K., Dormancy risk and expected profits of consumer loans, Journal of Banking and Finance, 2001, vol. 25, no. 2, pp. 717-739.
[2] Henley, W.E., Hand, D.J., A K-nearest-neighbor classification for assessing consumer credit risk, The Statistician, 1996, vol. 45, no. 1, pp. 77-95.
[3] Phillips, R.F., Yezer, A.M.J., Self-selection and tests for bias and risk in mortgage lending: Can you price the mortgage if you don't know the process?, The Journal of Real Estate Research, 1996, vol. 11, no. 1, pp. 87-102.
[4] Boyes, W.J., Hoffman, D.L., Low, S.A., An econometric analysis of the bank credit scoring problem, Journal of Econometrics, 1989, vol. 40, no. 2, pp. 3-14.
[5] Greene, W., Sample selection in credit-scoring models, Japan and the World Economy, 1998, vol. 10, no. 3, pp. 299-316.
[6] Jacobson, T., Roszbach, K., Bank lending policy, credit scoring and value-at-risk, Journal of Banking and Finance, 2003, vol. 27, no. 4, pp. 615-633.
[7] Greene, W., Econometric Analysis, 2nd edition, New York: Macmillan, 1993.