Management & Engineering 22 (2016) 1838-5745
Contents lists available at SEI
Management & Engineering
journal homepage: www.seiofbluemountain.com
Research on Agricultural Insurance Rate Based on
Wavelet-SVM-GMD: A Case Study of Shandong Province Cotton
Insurance
Hairong CUI∗, Xunfa LU
School of Economics and Management, NUIST, Nanjing 210044, P.R.China
KEYWORDS
Agricultural insurance, GMD, Rate estimation, SVM, Wavelet analysis
ABSTRACT
This article establishes a crop insurance rate estimation model based on Wavelet-SVM-GMD. First, the crop trend yields in the known years are obtained by wavelet analysis; then the data are fitted and forecast by SVM to obtain the crop trend yield in the future insured year; next, the loss probability distribution of the crop is fitted by GMD; finally, the crop insurance premium is calculated according to the yield distribution model. An empirical study of cotton insurance in Shandong Province shows that the Wavelet-SVM-GMD model is appropriate for determining the crop premium rate.
© ST. PLUM-BLOSSOM PRESS PTY LTD
1 Introduction
Agriculture is a typical high-risk industry. Agricultural insurance, as an economic compensation system for agricultural risk, is a measure commonly used by the international community. In recent years, the Chinese government has continuously strengthened policies that benefit agricultural development, and agricultural insurance is entering an unprecedented period of strategic development. Scientific determination of agricultural insurance rates is an important prerequisite for the stable operation of agricultural insurance.
There are two ways to determine agricultural insurance rates: the experience method and the yield distribution model method. Because recorded historical data on China's crop yields are limited, the yield distribution model method is commonly used by scholars. Therefore, this paper determines crop insurance rates using the yield distribution model.
Research on the yield distribution model has focused on two problems: calculating the crop trend yield and estimating the loss distribution. To calculate the trend yield, most of the existing literature uses trend equations or wavelet analysis [1]. Although the trend equation is simple and practical, the subjective choice of equation affects the prediction results [2]. Wavelet analysis examines a signal step by step at successive scales and fully embodies advantages such as multi-resolution [3]. When only a small amount of data is available, however, wavelet analysis is ineffective at predicting the trend [4]. The support vector machine (SVM), a newer machine learning method, can overcome this disadvantage of wavelet analysis [5]. This article therefore uses both wavelet analysis and SVM to predict crop trend yields. The methods for estimating the loss distribution include parametric and non-parametric methods. The parametric method requires less sample data, but needs to presume a model form such as the normal distribution, Beta distribution or Weibull distribution. The non-parametric method is more flexible and does not require a prior assumption about the probability distribution, but it requires more data [6]. Many scholars choose a non-parametric method to estimate the distribution of crop losses [7]. In fact, the form of the parametric model can rarely be determined in advance. Although the non-parametric method is more flexible, it requires a large amount of data, which may not be available in China [8]. A Gaussian mixture distribution (GMD) can approximate an arbitrary probability distribution with arbitrary precision. It lies between the parametric and non-parametric methods and requires less data [9]. This article therefore uses GMD to estimate the loss distribution of crop yield.
On the basis of existing research, we use wavelet analysis, SVM and GMD to construct the Wavelet-SVM-GMD model for determining crop insurance rates.
∗ Corresponding author. E-mail address: [email protected]
DOI:10.5503/J.ME.2016.22.005
2 The Principle of the Pure Crop Yield Insurance Rate
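Let $y$ be the actual crop yield in the insured year; $y$ is a random variable with probability density function $F(y)$. Let $\lambda$ be the level of insurance coverage and $\hat{y}$ the crop trend yield in the insured year. When $y < \lambda\hat{y}$, the insurer is responsible for the loss. The expected value of the actual yield is then $E(y \mid y < \lambda\hat{y})$, and the loss is $\lambda\hat{y} - E(y \mid y < \lambda\hat{y})$ [10]. The expected loss is
$$
\begin{aligned}
E(\mathrm{loss}) &= E\bigl(\lambda\hat{y} - E(y \mid y < \lambda\hat{y})\bigr) = P(y < \lambda\hat{y}) \times \bigl(\lambda\hat{y} - E(y \mid y < \lambda\hat{y})\bigr) \\
&= \int_0^{\lambda\hat{y}} F(y)\,dy \times \left(\lambda\hat{y} - \frac{\int_0^{\lambda\hat{y}} y F(y)\,dy}{\int_0^{\lambda\hat{y}} F(y)\,dy}\right) = \int_0^{\lambda\hat{y}} (\lambda\hat{y} - y) F(y)\,dy
\end{aligned}
\tag{1}
$$
The pure rate of crop yield insurance is calculated as
$$
R = \frac{E(\mathrm{loss})}{\lambda\hat{y}} = \frac{\int_0^{\lambda\hat{y}} (\lambda\hat{y} - y) F(y)\,dy}{\lambda\hat{y}}
\tag{2}
$$
If the relative loss rate of the crop yield is
$$
l = \frac{\lambda\hat{y} - y}{\lambda\hat{y}}, \qquad l \in (0, 1]
\tag{3}
$$
then (2) takes the following form:
$$
R = \int_0^1 x f(x)\,dx
\tag{4}
$$
where $f(x) = \lambda\hat{y}\,F(\lambda\hat{y} - \lambda\hat{y}x)$ is the probability density function of the relative loss rate $l$.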
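As a numerical illustration of equations (2) and (4), the following minimal Python sketch evaluates the pure rate both ways and shows that the two forms agree. It is only a sketch under stated assumptions: the yield density is taken to be normal, and the trend yield of 1000 kg/ha and standard deviation of 150 kg/ha are hypothetical values, not figures from this paper. The function names are likewise illustrative.

```python
import math

def normal_pdf(y, mean, sd):
    """Gaussian density, standing in here for the yield density F(y) (an assumption)."""
    z = (y - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def midpoint_integral(g, a, b, n=20000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))

def pure_rate_from_yield(F, lam, y_hat):
    """Equation (2): expected shortfall below lam*y_hat, divided by lam*y_hat."""
    guarantee = lam * y_hat
    expected_loss = midpoint_integral(lambda y: (guarantee - y) * F(y), 0.0, guarantee)
    return expected_loss / guarantee

def pure_rate_from_loss_rate(F, lam, y_hat):
    """Equation (4): integral of x*f(x) on [0, 1], with f(x) = lam*y_hat*F(lam*y_hat - lam*y_hat*x)."""
    guarantee = lam * y_hat
    f = lambda x: guarantee * F(guarantee - guarantee * x)
    return midpoint_integral(lambda x: x * f(x), 0.0, 1.0)

# Hypothetical inputs: trend yield 1000 kg/ha, yield standard deviation 150 kg/ha.
F = lambda y: normal_pdf(y, 1000.0, 150.0)
for lam in (1.0, 0.95, 0.9):
    r2 = pure_rate_from_yield(F, lam, 1000.0)
    r4 = pure_rate_from_loss_rate(F, lam, 1000.0)
    print(f"lambda={lam:.2f}  R via (2)={r2:.4f}  R via (4)={r4:.4f}")
```

The substitution $y = \lambda\hat{y}(1 - x)$ maps one integral onto the other, which is why the two columns coincide for every coverage level.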
3 Wavelet-SVM-GMD Model
3.1 Wavelet
Wavelet analysis uses a family of basis functions to decompose the original signal, by the Mallat algorithm, into two parts: the low-frequency part and the high-frequency part.
When wavelet analysis is used to calculate the crop trend yields, the sequence of actual crop yields is written in the following form:
$$
y_t = \hat{y}_t + y_{wt} + \varepsilon_t
$$
where $y_t$ is the actual crop yield in year $t$ ($t = 1, \dots, n$), $\hat{y}_t$ is the trend yield determined by the level of technology, $y_{wt}$ is the yield component affected by meteorological factors, and $\varepsilon_t$ is the random error term.
If the random error term is ignored, then
$$
y_t = \hat{y}_t + y_{wt}
$$
By wavelet analysis, the sequence of actual crop yields $y_t$ is decomposed at the largest scale to obtain the trend term $\hat{y}_t$ and the fluctuation term $y_{wt}$. The trend term $\hat{y}_t$ in the known years is then obtained by wavelet reconstruction.
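The decompose-then-reconstruct idea can be sketched in a few lines of Python. Note the substitution: this sketch uses the Haar wavelet because it is short enough to write out by hand, so it illustrates the split $y_t = \hat{y}_t + y_{wt}$ but is not the wavelet used in the empirical section. The yield values are hypothetical, and the function names are illustrative.

```python
import math

def haar_step(signal):
    """One level of the orthonormal Haar transform: approximation and detail coefficients."""
    s = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return approx, detail

def haar_inverse_step(approx, detail):
    """Invert one Haar level."""
    s = math.sqrt(2.0)
    out = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) / s, (a - d) / s])
    return out

def haar_trend(y, levels):
    """Decompose y over `levels` levels, zero every detail band, reconstruct the low-frequency part."""
    if len(y) % (2 ** levels) != 0:
        raise ValueError("series length must be a multiple of 2**levels")
    approx = list(y)
    for _ in range(levels):
        approx, _detail = haar_step(approx)  # detail coefficients are discarded
    for _ in range(levels):
        approx = haar_inverse_step(approx, [0.0] * len(approx))
    return approx

# Hypothetical yield series (kg/ha), length 16 so that three levels are possible.
yields = [300, 320, 310, 350, 400, 380, 420, 460,
          500, 540, 520, 600, 650, 640, 700, 720]
trend = haar_trend(yields, 3)                         # trend term
fluctuation = [y - t for y, t in zip(yields, trend)]  # fluctuation term
print([round(t, 1) for t in trend])
```

With Haar, the reconstructed low-frequency part is simply the block mean over 2**levels samples; smoother wavelets give a smoother trend, at the cost of boundary handling that this sketch does not address.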
3.2 SVM
When SVM is used to forecast the crop trend yield in the insured year, the relationship between the input time and the output trend yield is computed from the sample input-output data set $(t_i, \hat{y}_i)$ $(i = 1, \dots, m)$, where $t_i$ is the $i$-th time sample, $\hat{y}_i$ is the crop trend yield corresponding to time $t_i$, and $m$ is the number of training samples.
According to SVM theory, the input vector is first mapped to a high-dimensional feature space, and the best approximation problem is then constructed in the feature space as follows:
$$
\hat{y}(t) = \sum_{i=1}^{N} (\alpha_i^* - \alpha_i) K(t_i, t) + b
$$
where $\alpha_i, \alpha_i^*$ $(i = 1, 2, \dots, N)$ are undetermined coefficients, $b$ is the bias, $N$ is the number of support vectors, and $K(\cdot, \cdot)$ is the SVM kernel function. The choice of kernel function has no significant influence on the result. The widely used RBF kernel is adopted here:
$$
K(t_i, t) = \exp\left(-\frac{|t_i - t|^2}{2p^2}\right)
$$
where $p$ is the RBF kernel parameter.
$\alpha_i$, $\alpha_i^*$, $b$ and $N$ can be determined by solving the following optimization problem:
$$
\max: \ \sum_{i=1}^{N} \hat{y}_i (\alpha_i^* - \alpha_i) - \varepsilon \sum_{i=1}^{N} (\alpha_i^* + \alpha_i) - \frac{1}{2} \sum_{i,j=1}^{N} (\alpha_i^* - \alpha_i)(\alpha_j^* - \alpha_j) K(t_i, t_j)
$$
$$
\text{s.t.} \ \sum_{i=1}^{N} \alpha_i^* = \sum_{i=1}^{N} \alpha_i, \qquad 0 \le \alpha_i, \alpha_i^* \le C, \qquad i = 1, \dots, N
$$
where $\varepsilon$ is the prediction accuracy and $C$ is the penalty coefficient. Controlling $C$ and $\varepsilon$ can enhance the generalization ability of the SVM.
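To make the formulas above concrete, the sketch below evaluates the RBF kernel, the regression function $\hat{y}(t)$, the dual objective and the constraints for a given set of coefficients. It does not solve the optimization problem; a quadratic programming or SMO solver would be needed for that. All coefficient values are hypothetical placeholders chosen only to satisfy the constraints, and the function names are illustrative.

```python
import math

def rbf_kernel(ti, t, p):
    """K(ti, t) = exp(-|ti - t|^2 / (2 p^2))."""
    return math.exp(-((ti - t) ** 2) / (2.0 * p * p))

def svr_predict(t, t_train, alpha_star, alpha, b, p):
    """Regression function: sum_i (alpha_i* - alpha_i) K(t_i, t) + b."""
    return sum((a_s - a) * rbf_kernel(ti, t, p)
               for ti, a_s, a in zip(t_train, alpha_star, alpha)) + b

def dual_objective(t_train, y_train, alpha_star, alpha, eps, p):
    """Value of the objective that the optimization problem maximizes."""
    beta = [a_s - a for a_s, a in zip(alpha_star, alpha)]
    linear = sum(y * bi for y, bi in zip(y_train, beta))
    penalty = eps * sum(a_s + a for a_s, a in zip(alpha_star, alpha))
    quad = 0.5 * sum(bi * bj * rbf_kernel(ti, tj, p)
                     for ti, bi in zip(t_train, beta)
                     for tj, bj in zip(t_train, beta))
    return linear - penalty - quad

def is_feasible(alpha_star, alpha, C, tol=1e-9):
    """Check sum(alpha*) == sum(alpha) and 0 <= alpha, alpha* <= C."""
    balanced = abs(sum(alpha_star) - sum(alpha)) <= tol
    boxed = all(-tol <= v <= C + tol for v in list(alpha_star) + list(alpha))
    return balanced and boxed

# Hypothetical coefficients for three training times (not fitted values).
t_train, y_train = [1.0, 2.0, 3.0], [300.0, 310.0, 325.0]
alpha_star, alpha = [0.5, 0.0, 0.2], [0.0, 0.7, 0.0]
print(is_feasible(alpha_star, alpha, C=1.0))
print(svr_predict(4.0, t_train, alpha_star, alpha, b=310.0, p=3.0))
print(dual_objective(t_train, y_train, alpha_star, alpha, eps=0.001, p=3.0))
```

Only training points with $\alpha_i^* - \alpha_i \neq 0$ contribute to the prediction, which is why the sum runs over support vectors.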
3.3 GMD
Suppose the relative loss rate of the crop yields follows a GMD with $k$ components. The density function has the form
$$
f(x) = \sum_{i=1}^{k} p_i f_i(x; \mu_i, \sigma_i)
$$
where $f_i(x; \mu_i, \sigma_i)$ $(i = 1, 2, \dots, k)$ is a Gaussian probability density function with mean $\mu_i$ and standard deviation $\sigma_i$, and $p_i$ $(i = 1, 2, \dots, k)$ is the weight of each component, satisfying
$$
\sum_{i=1}^{k} p_i = 1, \qquad p_i \ge 0.
$$
The model parameters $\mu_i$, $\sigma_i$, $p_i$ and $k$ are estimated by the maximum likelihood principle. The EM (Expectation Maximization) algorithm is an iterative statistical technique for finding maximum likelihood estimates of distribution parameters from incomplete data. It converges readily and is easy to implement. This paper uses the EM algorithm to solve the GMD model.
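A minimal EM iteration for a one-dimensional Gaussian mixture can be sketched as follows. Several choices here are assumptions rather than details given in the text: the number of components $k$ is fixed in advance, the initial means come from splitting the sorted sample into equal chunks, and a small floor on $\sigma_i$ guards against a component collapsing onto a single point. The sample of relative loss rates is hypothetical.

```python
import math

def gaussian_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def em_gmd(data, k, iterations=200, min_sigma=1e-3):
    """Fit a k-component 1-D Gaussian mixture by EM; returns (weights, means, sigmas)."""
    xs = sorted(data)
    n = len(xs)
    if n < k:
        raise ValueError("need at least k observations")
    # Assumed initialisation: equal chunks of the sorted sample, equal weights.
    chunks = [xs[i * n // k:(i + 1) * n // k] for i in range(k)]
    means = [sum(c) / len(c) for c in chunks]
    sigmas = [max((xs[-1] - xs[0]) / (2.0 * k), min_sigma)] * k
    weights = [1.0 / k] * k
    for _ in range(iterations):
        # E-step: responsibility of each component for each observation.
        resp = []
        for x in xs:
            dens = [w * gaussian_pdf(x, m, s) for w, m, s in zip(weights, means, sigmas)]
            total = sum(dens) or 1e-300
            resp.append([d / total for d in dens])
        # M-step: re-estimate weights, means and standard deviations.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj < 1e-12:
                continue  # empty component: keep its previous parameters
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj
            sigmas[j] = max(math.sqrt(var), min_sigma)
        total_w = sum(weights)
        weights = [w / total_w for w in weights]
    return weights, means, sigmas

# Hypothetical relative loss rates forming two clusters.
sample = [0.03, 0.04, 0.05, 0.06, 0.07, 0.22, 0.24, 0.25, 0.26, 0.28]
weights, means, sigmas = em_gmd(sample, k=2)
print(weights, means, sigmas)
```

In practice the fit is usually repeated for several values of $k$ and compared with a model-selection criterion; how $k$ is selected here is not specified, so the sketch leaves it as an input.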
4 Empirical Studies
Shandong Province is a major agricultural province in China that suffers from serious natural disasters. The cotton premium rate in Shandong Province is 4%. Cotton insurance is taken below as the subject of an empirical study to test the validity of the model.
4.1 Trend yield calculation and prediction
The sample data are the cotton yields of Shandong Province from 1949 to 2011. The data are from the Statistical Information Network of Shandong Province and the 2011 Statistical Yearbook of Shandong Province.
In the Matlab wavelet toolbox, the compactly supported orthogonal wavelet db3 (Daubechies) is selected as the mother wavelet. The cotton yield series is decomposed at the largest scale, and the low-frequency part is then reconstructed to obtain the trend yields, as shown in Figure 1.
Figure 1 Cotton yields sequence wavelet decomposition (panels a3, d3, d2 and d1, each plotted against t)
The cotton trend yield data obtained by wavelet analysis were divided into two groups: the data from 1949 to 2003 form the training set, and the data from 2004 to 2011 form the prediction set. First, the training set is learned by the SVM. The parameters, determined by a grid search with a gradually narrowing step, are $p = 3$, $C = 2.3106 \times 10^4$ and $\varepsilon = 0.001$. The fitting results are shown in Figure 2, where 1949 is the starting year, corresponding to $t = 1$, and so on. The correlation coefficient of the two sets of data is 0.9996. The prediction set is then fitted by the SVM optimal prediction model, as shown in Figure 3, where 2004 corresponds to $t = 56$. The correlation coefficient of the two sets of data is 0.9778, so the prediction performs very well. Finally, the forecast value of the cotton trend yield in 2012 is calculated as $\hat{y} = \hat{y}_{64} = 1014.7$ kg/ha.
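The time indexing and the correlation coefficient used in this subsection can be checked with a short Python sketch. The mapping follows the text (1949 corresponds to $t = 1$); the helper names are illustrative, and the correlation shown is the ordinary Pearson coefficient, which is assumed here to be the measure reported above.

```python
import math

START_YEAR = 1949  # first sample year, mapped to t = 1

def year_to_t(year):
    """Map a calendar year to the time index t used as SVM input."""
    return year - START_YEAR + 1

def pearson_r(a, b):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

training_t = [year_to_t(y) for y in range(1949, 2004)]    # 1949-2003
prediction_t = [year_to_t(y) for y in range(2004, 2012)]  # 2004-2011
print(training_t[0], training_t[-1], prediction_t[0], prediction_t[-1], year_to_t(2012))
```

The training set therefore covers 55 time points and the prediction set 8, and the 2012 forecast is evaluated at the next index after the prediction set.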
Figure 2 Fitting curve of the training set by SVM optimal prediction model (cotton yields in kg/ha against t)
Figure 3 Fitting curve of the prediction set by SVM optimal prediction model (cotton yields in kg/ha against t; series: raw data, fitted data)
4.2 Loss probability density function estimation
According to equation (3), the relative loss rate of crop yields is calculated, and then the EM algorithm is used in calculating GMD
related parameters, as shown in Table 1.
Table 1 Parameters of the GMD model
λ      k   p1       μ1       σ1         p2       μ2       σ2
100%   2   0.5752   0.0591   0.0013     0.4248   0.2574   0.0176
95%    2   0.2760   0.0208   8.4843E6   0.7240   0.1855   0.0191
90%    1   1        0.1678   0.0190     -        -        -
Figure 4 shows the histogram and the GMD density function of the relative loss rate $l$. It can be seen that the GMD reflects the distribution pattern of the sample data well. Furthermore, the empirical distribution and the GMD cumulative distribution function (CDF) of the relative loss rate $l$ are calculated (Figure 5). The Anderson-Darling (AD) method is used to test the goodness of fit; the GMD's goodness of fit is 0.99, showing that the fit is very good.
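The comparison between the empirical distribution and the GMD CDF can be sketched as below, together with the standard Anderson-Darling statistic $A^2 = -n - \frac{1}{n}\sum_{i=1}^{n}(2i-1)\bigl[\ln F(x_{(i)}) + \ln(1 - F(x_{(n+1-i)}))\bigr]$. This is a sketch under assumptions: the mixture parameters and the sample are hypothetical, the CDF values are clamped away from 0 and 1 to keep the logarithms finite, and only the statistic is computed. Turning it into a significance level needs critical values or simulation, which are not shown, and the text does not state how the reported 0.99 was derived.

```python
import math

def normal_cdf(x, mu, sigma):
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def gmd_cdf(x, weights, means, sigmas):
    """CDF of a Gaussian mixture: weighted sum of the component CDFs."""
    return sum(w * normal_cdf(x, m, s) for w, m, s in zip(weights, means, sigmas))

def empirical_cdf(sample, x):
    """Fraction of observations less than or equal to x."""
    return sum(1 for v in sample if v <= x) / len(sample)

def anderson_darling(sample, cdf, floor=1e-12):
    """Anderson-Darling statistic A^2 of `sample` against the fully specified `cdf`."""
    xs = sorted(sample)
    n = len(xs)
    # Clamp to (0, 1) so that log() stays finite (a numerical safeguard, an assumption).
    u = [min(max(cdf(x), floor), 1.0 - floor) for x in xs]
    s = sum((2 * i - 1) * (math.log(u[i - 1]) + math.log(1.0 - u[n - i]))
            for i in range(1, n + 1))
    return -n - s / n

# Hypothetical mixture parameters and sample of relative loss rates.
weights, means, sigmas = [0.6, 0.4], [0.05, 0.25], [0.02, 0.04]
sample = [0.02, 0.04, 0.05, 0.06, 0.08, 0.20, 0.24, 0.26, 0.31]
model = lambda x: gmd_cdf(x, weights, means, sigmas)
for x in (0.05, 0.15, 0.25):
    print(x, round(empirical_cdf(sample, x), 3), round(model(x), 3))
print("A^2 =", anderson_darling(sample, model))
```

Smaller values of the statistic indicate closer agreement between the fitted CDF and the sample, with extra weight on the tails.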
Figure 4 Histogram and GMD density function of the relative loss rate l (left panel: frequency against relative loss rate; right panel: GMD density f(x) against x)
Figure 5 Empirical distribution and GMD CDF of the relative loss rate l (F(x) against x; series: empirical distribution, GMD)
4.3 Insurance rates determined
In sum, the pure cotton insurance rates at different levels of protection are calculated according to equation (4), as shown in Table 2.
Table 2 Pure cotton insurance rates at different levels of protection
λ (%)   Rates (%)
100     4.36
95      4.12
90      3.98
As can be seen from Table 2, the pure cotton insurance rate increases gradually as the level of protection increases. This suggests that the higher the level of protection, the higher the risk borne by the insurance company, which is reasonable. At each level of protection, the pure cotton insurance rate calculated by the model differs only slightly from the 4% rate actually charged in Shandong Province.
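When $f(x)$ is a Gaussian mixture, the integral in equation (4) has a closed form, since for each component $\int_a^b x f_i(x)\,dx = \mu_i[\Phi(\beta_i) - \Phi(\alpha_i)] - \sigma_i[\varphi(\beta_i) - \varphi(\alpha_i)]$ with $\alpha_i = (a - \mu_i)/\sigma_i$ and $\beta_i = (b - \mu_i)/\sigma_i$. The Python sketch below checks this identity against a numerical integral. The mixture parameters are hypothetical, and the sketch only evaluates the integral $\int_0^1 x f(x)\,dx$ for a given density; it is not an attempt to reproduce Table 2, whose full calculation is not spelled out in the text.

```python
import math

def std_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def std_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def gmd_partial_mean(weights, means, sigmas, lo=0.0, hi=1.0):
    """Closed form of the integral of x*f(x) over [lo, hi] for a Gaussian mixture f."""
    total = 0.0
    for w, m, s in zip(weights, means, sigmas):
        a, b = (lo - m) / s, (hi - m) / s
        total += w * (m * (std_cdf(b) - std_cdf(a)) - s * (std_pdf(b) - std_pdf(a)))
    return total

def gmd_partial_mean_numeric(weights, means, sigmas, lo=0.0, hi=1.0, n=20000):
    """Midpoint-rule check of the same integral."""
    h = (hi - lo) / n
    acc = 0.0
    for i in range(n):
        x = lo + (i + 0.5) * h
        density = sum(w * std_pdf((x - m) / s) / s
                      for w, m, s in zip(weights, means, sigmas))
        acc += x * density
    return acc * h

# Hypothetical mixture parameters (not the fitted values of this study).
weights, means, sigmas = [0.6, 0.4], [0.05, 0.25], [0.03, 0.05]
print(gmd_partial_mean(weights, means, sigmas))
print(gmd_partial_mean_numeric(weights, means, sigmas))
```

The closed form avoids numerical integration error when a component is very narrow, which is where a fixed-step rule is least reliable.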
5 Conclusion
Policy-oriented agricultural insurance benefits the country. However, a number of technical problems remain when this work is carried out. The key to carrying it out successfully is the scientific determination of agricultural insurance rates. Because historical data on China's crop yields are scarce, wavelet analysis, the support vector machine and the Gaussian mixture distribution are used to build the Wavelet-SVM-GMD model for determining crop insurance rates.
The empirical results for cotton insurance in Shandong Province show that the Gaussian mixture distribution can effectively improve the goodness of fit of the loss distribution, and that the pure cotton insurance rates calculated by the Wavelet-SVM-GMD model differ only slightly from the 4% actually charged in Shandong Province. The results of the model are therefore reasonable.
Acknowledgment:
The authors thank the Universities Philosophy and Social Science Foundation of the Department of Education of Jiangsu, China (grant 2012SJB630047) for its support.
References
[1]. LIANG Laicun. A Comparison and Choice on the Pure Rate-making Methods of Grain Insurance in China. The Journal of Quantitative & Technical Economics. 2011 (2): 124-134 (in Chinese)
[2]. LIU Xiaokang, GU Hongbo. Setting the Premium Rate of Crop Insurance Under Catastrophic Risk. Journal of Jiangxi Agricultural University (Social Sciences Edition). 2013, 11 (1): 63-68 (in Chinese)
[3]. M. J. Pringle, B. P. Marchant, R. M. Lark. Analysis of Two Variants of a Spatially Distributed Crop Model, Using Wavelet Transforms and Geostatistics. Agricultural Systems. 2008, 98 (2): 135-146
[4]. HUANG Xinyang, TAN Minsheng, WU Hongru, et al. An Improved Invariant Wavelet Transform. Procedia Engineering. 2012, 29: 1963-1968 (in Chinese)
[5]. ZHU Xiaochuan. Application of Software Aging Prediction Based on Wavelet Analysis and Support Vector Machines. Computer Simulation. 2012, 29 (3): 266-269 (in Chinese)
[6]. WANG Ke, ZHANG Qiao. Influence of Flexible Crop Yield Distributions on Crop Insurance Premium Rate: A Case Study on Cotton Insurance in Three Counties of Xinjiang Province. Journal of Chinese Agricultural University. 2010, 15 (2): 114-120 (in Chinese)
[7]. R. Yang, L. Wang, Z. Xian, Y Zhongyong. Evaluation on the Efficiency of Crop Insurance in China's Major Grain-Producing Area. Agriculture and Agricultural Science Procedia. 2010, 1: 90-99
[8]. GUO Xingxu, TAO Jianping, ZENG Xiaoyan. Comparative Study of Rapeseed Yield Insurance Pure Premium of Hubei Province: Empirical Analysis Under Alternative Yield Distributions. Insurance Studies. 2010 (1): 65-72 (in Chinese)
[9]. V. Melnykov, I. Melnykov. Initializing the EM algorithm in Gaussian Mixture Models with an Unknown Number of Components. Computational Statistics & Data Analysis. 2013, 56 (6): 1381-1395
[10]. B. J. Sherrick. Crop Insurance Valuation Under Alternative Yield Distributions. American Journal of Agricultural Economics. 2004, 86 (2): 406-419