Management & Engineering Research on Agricultural Insurance Rate Based on
Management & Engineering 22 (2016) 1838-5745
Contents lists available at SEI
Management & Engineering
journal homepage: www.seiofbluemountain.com

Research on Agricultural Insurance Rate Based on Wavelet-SVM-GMD: A Case Study of Shandong Province Cotton Insurance

Hairong CUI∗, Xunfa LU
School of Economics and Management, NUIST, Nanjing 210044, P.R.China

KEYWORDS
Agricultural insurance, GMD, Rates estimated, SVM, Wavelet analysis

ABSTRACT
This article establishes a crop insurance rate estimation model based on Wavelet-SVM-GMD. First, the crop trend yields in the known years are obtained by wavelet analysis; then the trend data are fitted and forecast by SVM to obtain the crop trend yield in the future insured year; next, the loss probability distribution of the crop is fitted by GMD; finally, the crop insurance premium is calculated according to the yield distribution model. An empirical study of cotton insurance in Shandong Province shows that the Wavelet-SVM-GMD model is appropriate for determining the crop premium rate.

© ST. PLUM-BLOSSOM PRESS PTY LTD

1 Introduction

Agriculture is a typical high-risk industry. Agricultural insurance, as an economic compensation system for agricultural risk, is a measure commonly used internationally. In recent years, the Chinese government has continuously strengthened policies that support agricultural development, and agricultural insurance is entering an unprecedented period of strategic development. Scientific determination of agricultural insurance rates is an important prerequisite for the stable operation of agricultural insurance. There are two ways to determine agricultural insurance rates: the experience method and the yield distribution model method. Because the recorded historical data on crop yields in China are limited, the yield distribution model method is the one scholars commonly use. This paper therefore determines crop insurance rates using the yield distribution model.
Research on the yield distribution model has focused on two problems: calculating the crop trend yield and estimating the loss distribution. For the trend yield, most of the existing literature uses a trend equation or wavelet analysis [1]. Although the trend equation is simple and practical, the choice of equation is subjective and affects the prediction results [2]. Wavelet analysis examines a signal scale by scale and fully exploits advantages such as multi-resolution [3]. With only a small amount of data, however, wavelet analysis is ineffective at predicting the trend [4]. The support vector machine (SVM), a machine learning method, can overcome this disadvantage of wavelet analysis [5]. This article therefore uses both wavelet analysis and SVM to predict crop trend yields.

The methods for estimating the loss distribution include parametric and non-parametric methods. Parametric methods require less sample data but need to presume a model form such as the normal, Beta, or Weibull distribution. Non-parametric methods are more flexible and do not require a prior assumption about the form of the probability distribution, but they require more data [6]. Many scholars choose a non-parametric method to estimate the distribution of crop losses [7]. In practice, the form of a parametric model can rarely be determined in advance, while the non-parametric method, despite its flexibility, requires a large amount of data and may not be applicable in China [8]. The Gaussian mixture distribution (GMD) can approximate an arbitrary probability distribution to arbitrary precision. It lies between the parametric and non-parametric methods and requires less data [9]. This article therefore uses the GMD to estimate the loss distribution of crop yield.

∗ Corresponding author. English edition copyright © ST. PLUM-BLOSSOM PRESS PTY LTD. DOI:10.5503/J.ME.2016.22.005
On the basis of existing research, we use wavelet analysis, SVM, and GMD to construct the Wavelet-SVM-GMD model for determining crop insurance rates.

2 The Principle of the Pure Crop Yield Insurance Rate

Let $y$ be the actual crop yield in the insured year; $y$ is a random variable with probability density function $F(y)$. Let $\lambda$ be the level of insurance coverage and $\hat{y}$ the crop trend yield in the insured year. When $y < \lambda\hat{y}$, the insurer is responsible for the loss. In that case the expected value of the actual yield is $E(y \mid y < \lambda\hat{y})$, and the loss is $\lambda\hat{y} - E(y \mid y < \lambda\hat{y})$ [10]. The expected loss is

$$
\begin{aligned}
E(\text{loss}) &= E\bigl(\lambda\hat{y} - E(y \mid y < \lambda\hat{y})\bigr) = P(y < \lambda\hat{y}) \times \bigl(\lambda\hat{y} - E(y \mid y < \lambda\hat{y})\bigr) \\
&= \int_0^{\lambda\hat{y}} F(y)\,dy \times \left(\lambda\hat{y} - \frac{\int_0^{\lambda\hat{y}} y F(y)\,dy}{\int_0^{\lambda\hat{y}} F(y)\,dy}\right) = \int_0^{\lambda\hat{y}} (\lambda\hat{y} - y) F(y)\,dy
\end{aligned}
\tag{1}
$$

The pure rate of crop yield insurance is calculated as

$$
R = \frac{E(\text{loss})}{\lambda\hat{y}} = \frac{\int_0^{\lambda\hat{y}} (\lambda\hat{y} - y) F(y)\,dy}{\lambda\hat{y}}
\tag{2}
$$

If the relative loss rate of the crop yield is

$$
l = \frac{\lambda\hat{y} - y}{\lambda\hat{y}}, \qquad l \in (0, 1]
\tag{3}
$$

then substituting $y = \lambda\hat{y}(1 - x)$, $dy = -\lambda\hat{y}\,dx$ into (2) gives

$$
R = \int_0^1 x f(x)\,dx
\tag{4}
$$

where $f(x) = \lambda\hat{y}\,F(\lambda\hat{y} - \lambda\hat{y}x)$ is the probability density function of the relative loss rate $l$.

3 Wavelet-SVM-GMD Model

3.1 Wavelet

Wavelet analysis uses a family of basis functions to decompose the original signal into two parts, the low-frequency part and the high-frequency part, by the Mallat algorithm. When wavelet analysis is used to calculate crop trend yields, the sequence of actual crop yields is written as

$$
y_t = \hat{y}_t + y_{wt} + \varepsilon_t
$$

where $y_t$ is the actual crop yield in year $t$ $(t = 1, \dots, n)$, $\hat{y}_t$ is the trend yield determined by the level of technology, $y_{wt}$ is the yield component driven by meteorological factors, and $\varepsilon_t$ is the random error term. Ignoring the random error term gives

$$
y_t = \hat{y}_t + y_{wt}
$$

Wavelet analysis decomposes the sequence of actual crop yields $y_t$ at the largest scale to obtain the trend term $\hat{y}_t$ and the fluctuation term $y_{wt}$.
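The decomposition and reconstruction described here can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the procedure used in the paper: it swaps in the Haar wavelet (the simplest case of the Mallat scheme) for brevity, the yield values are hypothetical, and the function names `haar_step`, `haar_inverse_step`, and `haar_trend` are made up for this example. Keeping only the coarsest approximation and setting all detail coefficients to zero gives the trend; the residual plays the role of the fluctuation term.

```python
import math

SQRT2 = math.sqrt(2.0)

def haar_step(signal):
    """One Mallat step with Haar filters: returns (approximation, detail)."""
    approx = [(signal[i] + signal[i + 1]) / SQRT2 for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / SQRT2 for i in range(0, len(signal), 2)]
    return approx, detail

def haar_inverse_step(approx, detail):
    """Invert one Haar step, doubling the length of the sequence."""
    out = []
    for a, d in zip(approx, detail):
        out.append((a + d) / SQRT2)
        out.append((a - d) / SQRT2)
    return out

def haar_trend(y, levels):
    """Reconstruct only the low-frequency part after `levels` decompositions."""
    if len(y) % (2 ** levels) != 0:
        raise ValueError("length must be a multiple of 2**levels in this sketch")
    approx = list(y)
    for _ in range(levels):
        approx, _detail = haar_step(approx)   # details are discarded
    trend = approx
    for _ in range(levels):
        # Reconstruct with every detail coefficient set to zero.
        trend = haar_inverse_step(trend, [0.0] * len(trend))
    return trend

# Hypothetical yield series (kg/ha), for illustration only.
yields = [300.0, 320.0, 290.0, 350.0, 400.0, 380.0, 450.0, 430.0]
trend = haar_trend(yields, levels=2)
fluctuation = [y - t for y, t in zip(yields, trend)]
print(trend)
print(fluctuation)
```

With Haar filters, the level-J trend is simply the mean of each block of 2^J consecutive values, which makes the result easy to check by hand. Smoother wavelets produce smoother trends.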
The trend term $\hat{y}_t$ for the known years is then obtained by wavelet reconstruction.

3.2 SVM

When SVM is used to forecast the crop trend yield in the insured year, the relationship between the input time and the output trend yield is learned from the sample input-output pairs $(t_i, \hat{y}_i)$ $(i = 1, \dots, m)$, where $t_i$ is the $i$-th time sample, $\hat{y}_i$ is the crop trend yield corresponding to time $t_i$, and $m$ is the number of training samples.

According to SVM theory, the input vector is first mapped to a high-dimensional feature space, and the best approximation problem is then constructed in the feature space as follows:

$$
\hat{y}(t) = \sum_{i=1}^{N} (\alpha_i^* - \alpha_i) K(t_i, t) + b
$$

where $\alpha_i, \alpha_i^*$ $(i = 1, 2, \dots, N)$ are undetermined coefficients, $b$ is the bias, $N$ is the number of support vectors, and $K(\cdot, \cdot)$ is the SVM kernel function. The choice of kernel function has no significant influence on the results. The widely used RBF kernel is adopted here:

$$
K(t_i, t) = \exp\left(-\frac{\|t_i - t\|^2}{2p^2}\right)
$$

where $p$ is the RBF kernel parameter. $\alpha_i$, $\alpha_i^*$, $b$ and $N$ are determined by solving the following optimization problem:

$$
\begin{aligned}
\max:\ & \sum_{i=1}^{N} \hat{y}_i (\alpha_i^* - \alpha_i) - \varepsilon \sum_{i=1}^{N} (\alpha_i^* + \alpha_i) - \frac{1}{2} \sum_{i,j=1}^{N} (\alpha_i^* - \alpha_i)(\alpha_j^* - \alpha_j) K(t_i, t_j) \\
\text{s.t.}\ & \sum_{i=1}^{N} \alpha_i^* = \sum_{i=1}^{N} \alpha_i, \qquad 0 \le \alpha_i, \alpha_i^* \le C, \qquad i = 1, \dots, N
\end{aligned}
$$

where $\varepsilon$ is the prediction accuracy (error tolerance) and $C$ is the penalty coefficient. Tuning $C$ and $\varepsilon$ can improve the generalization ability of the SVM.

3.3 GMD

Suppose the relative loss rate of crop yields follows a GMD with $k$ components. Its density function is

$$
f(x) = \sum_{i=1}^{k} p_i f_i(x; \mu_i, \sigma_i)
$$

where $f_i(x; \mu_i, \sigma_i)$ $(i = 1, 2, \dots, k)$ is a Gaussian probability density function with mean $\mu_i$ and standard deviation $\sigma_i$, and $p_i$ $(i = 1, 2, \dots, k)$ is the weight of each component, satisfying $\sum_{i=1}^{k} p_i = 1$, $p_i \ge 0$. The model parameters $\mu_i$, $\sigma_i$, $p_i$ and $k$ are estimated by the maximum likelihood principle.
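As a rough illustration of how the mixture density enters the rate calculation, the following Python sketch evaluates a two-component Gaussian mixture and integrates x f(x) over [0, 1], as in equation (4), using the trapezoid rule. The weights, means, and standard deviations are hypothetical values chosen for the example, not fitted estimates, and `normal_pdf`, `mixture_pdf`, and `pure_rate` are illustrative names.

```python
import math

def normal_pdf(x, mu, sigma):
    """Gaussian density with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def mixture_pdf(x, weights, means, sigmas):
    """f(x) = sum_i p_i * f_i(x; mu_i, sigma_i)."""
    return sum(p * normal_pdf(x, m, s) for p, m, s in zip(weights, means, sigmas))

def pure_rate(weights, means, sigmas, n=20000):
    """Trapezoid-rule approximation of R = integral over [0, 1] of x * f(x)."""
    if abs(sum(weights) - 1.0) > 1e-9 or min(weights) < 0.0:
        raise ValueError("weights must be non-negative and sum to 1")
    h = 1.0 / n
    total = 0.0
    for j in range(n + 1):
        x = j * h
        w = 0.5 if j in (0, n) else 1.0   # end points get half weight
        total += w * x * mixture_pdf(x, weights, means, sigmas)
    return total * h

# Hypothetical mixture parameters (assumed for illustration only).
weights = [0.3, 0.7]
means = [0.05, 0.20]
sigmas = [0.02, 0.03]
rate = pure_rate(weights, means, sigmas)
print(f"pure rate R = {rate:.4f}")
```

When nearly all of the mixture's mass lies inside [0, 1], the result is close to the weighted mean of the component means, which gives a quick sanity check on the integration.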
The EM (Expectation Maximization) algorithm is an iterative statistical technique that computes maximum likelihood estimates of distribution parameters from incomplete data. It converges readily and is easy to implement. This paper uses the EM algorithm to fit the GMD model.

4 Empirical Studies

Shandong Province is a major agricultural province in China that suffers serious natural disasters. The cotton premium rate in Shandong Province is 4%. Cotton insurance is used below as an empirical case to test the validity of the model.

4.1 Trend yield calculation and prediction

The sample data are the cotton yields of Shandong Province from 1949 to 2011, taken from the Statistical Information Network of Shandong Province and the 2011 Statistical Yearbook of Shandong Province. In the Matlab wavelet toolbox, the compactly supported Daubechies wavelet db3 is selected as the mother wavelet. The cotton yield series is decomposed at the largest scale, and the low-frequency part is then reconstructed to obtain the trend yields, as shown in Figure 1.

[Figure 1: Cotton yields sequence wavelet decomposition. Panels: a3, d3, d2, d1; horizontal axis: t]

The cotton trend yield data obtained by wavelet analysis were divided into two groups: the data from 1949 to 2003 form the training set, and the data from 2004 to 2011 form the prediction set. First, the SVM is trained on the training set. The parameters, determined by grid search with a gradually narrowing step, are $p = 3$, $C = 2.3106 \times 10^4$, $\varepsilon = 0.001$. The fitting results are shown in Figure 2, where 1949 is the starting year, corresponding to $t = 1$, and so on. The correlation coefficient between the raw and fitted data is 0.9996. The prediction set is then fitted by the SVM optimal prediction model, as shown in Figure 3, where 2004 corresponds to $t = 56$. The correlation coefficient between the raw and fitted data is 0.9778, so the prediction performance is very good.
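Once the coefficients are known, applying the fitted model to a new time index only requires evaluating the regression function of Section 3.2 with the RBF kernel. The Python sketch below shows that evaluation step only. The support times, coefficients, and bias are hypothetical placeholders rather than values fitted to the cotton data; only the kernel parameter p = 3 is taken from the text, and the training step (solving the dual problem) is not shown.

```python
import math

def rbf_kernel(ti, t, p):
    """K(t_i, t) = exp(-(t_i - t)^2 / (2 p^2)) for scalar time inputs."""
    return math.exp(-((ti - t) ** 2) / (2.0 * p ** 2))

def svr_predict(t, support_times, betas, b, p):
    """Evaluate y_hat(t) = sum_i beta_i * K(t_i, t) + b.

    betas[i] stands for the difference (alpha_i^* - alpha_i).
    """
    return sum(beta * rbf_kernel(ti, t, p)
               for ti, beta in zip(support_times, betas)) + b

# Hypothetical support vectors and coefficients (illustration only).
support_times = [1.0, 20.0, 40.0, 55.0]
betas = [-50.0, 10.0, 30.0, 60.0]
b = 900.0
p = 3.0   # RBF kernel parameter reported in the text

forecast = svr_predict(64.0, support_times, betas, b, p)
print(f"y_hat(64) = {forecast:.2f}")
```

Because the RBF kernel decays quickly with distance in time, a forecast at a new index is driven mainly by the bias and the support vectors closest to that index.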
Finally, the forecast value of the cotton trend yield in 2012 is $\hat{y} = \hat{y}_{64} = 1014.7$ kg/ha.

[Figure 2: Fitting curve of the training set by SVM optimal prediction model. Vertical axis: cotton yields (kg/ha); horizontal axis: t]

[Figure 3: Fitting curve of the prediction set by SVM optimal prediction model. Vertical axis: cotton yields (kg/ha); horizontal axis: t; series: raw data, fitted data]

4.2 Loss probability density function estimation

The relative loss rate of crop yields is calculated according to equation (3), and the EM algorithm is then used to estimate the GMD parameters, as shown in Table 1.

Table 1 The parameters of the GMD model

λ      k    p1       μ1       σ1         p2       μ2       σ2
100%   2    0.5752   0.0591   0.0013     0.4248   0.2574   0.0176
95%    2    0.2760   0.0208   8.4843E6   0.7240   0.1855   0.0191
90%    1    1        0.1678   0.0190     -        -        -

Figure 4 shows the histogram and the GMD density function of the relative loss rate $l$. The GMD reflects the distribution pattern of the sample data well. The empirical distribution and the GMD cumulative distribution function (CDF) of the relative loss rate $l$ are also calculated (Figure 5). The Anderson-Darling (AD) method is used to test the goodness of fit; the GMD's goodness of fit is 0.99, indicating a very good fit.

[Figure 4: Histogram and GMD density function of the relative loss rate l. Axes: frequency against relative loss rate; f(x) against x; legend: GMD]

[Figure 5: Empirical distribution and GMD CDF of the relative loss rate l. Vertical axis: F(x); horizontal axis: x; series: empirical distribution, GMD]

4.3 Insurance rates determined

In sum, the pure cotton insurance rates at the different levels of protection are calculated according to equation (4), as shown in Table 2.
Table 2 The pure cotton insurance rates at different levels of protection

λ (%)   Rates (%)
100     4.36
95      4.12
90      3.98

As can be seen from Table 2, the pure cotton insurance rate increases gradually with the level of protection. This indicates that the higher the level of protection, the higher the risk, which is reasonable for insurance companies. At each level of protection, the pure cotton insurance rate calculated by the model differs only slightly from the 4% rate actually charged in Shandong Province.

5 Conclusion

Policy-oriented agricultural insurance benefits the country. However, a number of technical problems remain in carrying it out, and the key to success is the scientific determination of agricultural insurance rates. Because historical crop yield data in China are scarce, wavelet analysis, the support vector machine, and the Gaussian mixture distribution are used to build the Wavelet-SVM-GMD model for determining crop insurance rates. The empirical results for cotton insurance in Shandong Province show that the Gaussian mixture distribution can effectively improve the goodness of fit of the loss distribution, and that the pure cotton insurance rates calculated by the Wavelet-SVM-GMD model differ only slightly from the 4% actually charged in Shandong Province. The model's results are therefore reasonable.

Acknowledgment: The authors would like to thank the Universities Philosophy and Social Science Foundation of the Department of Education of Jiangsu Province, China, for its support under grant 2012SJB630047.

References

[1]. LIANG Laicun. A Comparison and Choice on the Pure Rate-making Methods of Grain Insurance in China. The Journal of Quantitative & Technical Economics. 2011 (2): 124-134 (in Chinese)
[2]. LIU Xiaokang, GU Hongbo. Setting the Premium Rate of Crop Insurance Under Catastrophic Risk. Journal of Jiangxi Agricultural University (Social Sciences Edition). 2013, 11 (1): 63-68 (in Chinese)
[3]. M. J. Pringle, B. P. Marchant, R. M. Lark. Analysis of Two Variants of a Spatially Distributed Crop Model, Using Wavelet Transforms and Geostatistics. Agricultural Systems. 2008, 98 (2): 135-146
[4]. HUANG Xinyang, TAN Minsheng, WU Hongru, et al. An Improved Invariant Wavelet Transform. Procedia Engineering. 2012, 29: 1963-1968 (in Chinese)
[5]. ZHU Xiaochuan. Application of Software Aging Prediction Based on Wavelet Analysis and Support Vector Machines. Computer Simulation. 2012, 29 (3): 266-269 (in Chinese)
[6]. WANG Ke, ZHANG Qiao. Influence of Flexible Crop Yield Distributions on Crop Insurance Premium Rate: A Case Study on Cotton Insurance in Three Counties of Xinjiang Province. Journal of Chinese Agricultural University. 2010, 15 (2): 114-120 (in Chinese)
[7]. R. Yang, L. Wang, Z. Xian, Y. Zhongyong. Evaluation on the Efficiency of Crop Insurance in China's Major Grain-Producing Area. Agriculture and Agricultural Science Procedia. 2010, 1: 90-99
[8]. GUO Xingxu, TAO Jianping, ZENG Xiaoyan. Comparative Study of Rapeseed Yield Insurance Pure Premium of Hubei Province: Empirical Analysis Under Alternative Yield Distributions. Insurance Studies. 2010 (1): 65-72 (in Chinese)
[9]. V. Melnykov, I. Melnykov. Initializing the EM Algorithm in Gaussian Mixture Models with an Unknown Number of Components. Computational Statistics & Data Analysis. 2013, 56 (6): 1381-1395
[10]. B. J. Sherrick. Crop Insurance Valuation Under Alternative Yield Distributions. American Journal of Agricultural Economics. 2004, 86 (2): 406-419