Best Practices vs. Misuse of PCA in the Analysis of Climate Variability

Bob Livezey
Climate Services/Office of Services/NWS/NOAA
30th Climate Diagnostics and Prediction Workshop
State College, PA, October 26, 2005
Outline
• Motivation, take-home messages and references
• Preprocessing considerations
• S-mode example: Mathematics, characteristics,
interpretation, testing, and truncation
• Rotation: Benefits and truncation considerations
• Conclusions
Eigenvector-Based Linear Techniques
• Dealing simultaneously with many time series:
– Principal Component Analysis (PCA) – efficient representation of the information in multiple time series (time series of gridded maps);
– Rotation – linear transformation of PCA and other eigenvector-based methods to improve the representation;
– Canonical Correlation Analysis (CCA) – one of the better ways to obtain an efficient linear representation of the relationships between two different time series of gridded maps (say, 500 mb heights and surface temperatures).
Take-Home Messages
• PCA is an extremely useful linear tool for data compression, orthogonalization, and filtering
• PCA results are mathematical and (for even the first mode) don’t necessarily have to have physical relevance
– Even when the first mode has physical relevance its representation may be flawed (e.g. the “Arctic Oscillation”)
• PCA results can be critically impacted by choices of domain, grid, scaling, etc.
• Effective PC truncation requires insight and experimentation
• Rotation can enhance physical relevance and reduce sampling variability
– Under- and over-rotation can negate these gains
• Just because an area on a map has a closed loading contour doesn’t make it part of a “dipole” or “tripole”
REFERENCES FOR BASIC PCA AND RPCA
• Barnston, A. G., and R. E. Livezey, 1987: Classification, seasonality, and persistence of low frequency atmospheric circulation patterns. Mon. Wea. Rev., 115, 1083-1126.
• Huth, R., 2006: The effect of various methodological options on the detection of leading modes of sea level pressure variability. Tellus, under revision.
• Jolliffe, I. T., 1995: Rotation of principal components: choice of normalization constraints. J. Appl. Statistics, 22, 29-35.
• Livezey, R. E., and T. M. Smith, 1999b: Considerations for use of the Barnett and Preisendorfer (1987) algorithm for canonical correlation analysis of climate variations. J. Climate, 12, 303-305.
• North, G. R., T. L. Bell, and R. F. Cahalan, 1982: Sampling errors in the estimation of empirical orthogonal functions. Mon. Wea. Rev., 110, 699-706.
• O'Lenic, E., and R. E. Livezey, 1988: Practical considerations in the use of rotated principal components analysis (RPCA) in diagnostic studies of upper-air height fields. Mon. Wea. Rev., 116, 1682-1689.
• Richman, M. B., 1986: Rotation of principal components. J. Climatology, 6, 293-335.
• Richman, M. B., and P. J. Lamb, 1985: Climatic pattern analysis of 3- and 7-day summer rainfall in the central United States: Some methodological considerations and a regionalization. J. Clim. Appl. Meteor., 24, 1325-1343.
Preparing Data
1. Preprocessing often has a major impact on results and their interpretation.
2. PCA results are inherently domain dependent, as I will illustrate later.
3. Standardization means each record has equal weight in variance-based multivariate analyses, e.g. high latitudes vs. tropics, January vs. November. If this is desirable, then PCA should be based on the correlation matrix; if not, on the covariance matrix.
4. PCA should be performed on as narrow a window in the seasonal cycle as sample considerations permit, to avoid mixing inhomogeneous climates (like the January vs. November example in 3 above).
5. Area-averaged or gridded data often must be weighted in multivariate analyses:
– Smaller areas can influence results as much as larger ones;
– On lat/lon grids the density of points (and hence their influence) increases with latitude.
Two ways to treat the problem:
– Create an approximately equal-area representation (e.g. CPC megadivisions, or the Barnston and Livezey, 1987, grid);
– Weight the data, generally in proportion to the square root of the area.
If weights are needed and PCA on the correlation matrix is the objective, then standardization should be performed before weighting, and the covariance matrix then formed from the standardized, weighted data. Otherwise the weights are removed in the standardization step.
6. In EPCA (see below), CCA, etc., maps of variables with greater numbers of data points will have disproportionate influence on the results unless the maps are weighted, e.g. in proportion to the square root of the ratio of the total variance in all variables to the total variance in the weighted variable (see Livezey and Smith, 1999b).
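The ordering described in item 5 (standardize first, then weight, then form the covariance matrix) can be sketched in NumPy. This is only an illustration under stated assumptions: the data are synthetic, the grid is taken to be a regular lat/lon grid whose cell area is proportional to cos(latitude), and the helper name `standardize` is hypothetical.

```python
import numpy as np

def standardize(series):
    """Remove the time mean and scale each column (grid point) to unit variance."""
    anomalies = series - series.mean(axis=0)
    return anomalies / anomalies.std(axis=0, ddof=1)

# Synthetic example (assumption): 200 times at 4 grid points with very different variances.
rng = np.random.default_rng(0)
lat_deg = np.array([0.0, 30.0, 60.0, 75.0])
series = rng.normal(size=(200, 4)) * np.array([1.0, 5.0, 10.0, 20.0])

# Assumption: cell area on a regular lat/lon grid is proportional to cos(latitude),
# so a square-root-of-area weight is sqrt(cos(latitude)).
weights = np.sqrt(np.cos(np.deg2rad(lat_deg)))

# Standardize first, then weight, then form the covariance matrix.
weighted = standardize(series) * weights
cov_weighted = np.cov(weighted, rowvar=False)

# Reversed order: the standardization step divides the weights back out.
weights_lost = standardize(series * weights)

print(np.diag(cov_weighted))                            # variances now scale with area
print(np.allclose(weights_lost, standardize(series)))   # weighting had no effect
```

With the first ordering each point contributes variance in proportion to the area it represents; with the reversed ordering the result is identical to an unweighted correlation-matrix analysis.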
Principal Component Analysis
1. Used principally for data compression and filtering, often as a first step to other analyses; direct physical interpretation is VERY limited.
2. The form most commonly used in climate studies (S-mode) starts with n maps or groups of maps z(x,t), t = 1,…,n, each with m data points x and with the period-of-record means removed.
3. The maps are decomposed into a linear combination of map patterns; the first pattern explains the most variance, the second is orthogonal to the first and explains the second most variance, etc.
$$z(x,t) = \sum_{i=1}^{N} a_i(t)\, e_i(x)$$
$$a_i(t) = z^T(x,t)\, e_i(x)$$
$$e_i(x) = \lambda_i^{-1}\, z^T(x,t)\, a_i(t)$$
$N = \min(m, n)$
$z(x,t)$: Original maps, linear combinations of fixed patterns $e_i(x)$ with time-dependent weights $a_i(t)$
$a_i(t)$: Principal component scores (time series), the projections of the maps onto the eigenvectors
$e_i(x)$: Principal component loadings (map patterns), also eigenvectors of the covariance matrix of z.
$\lambda_i$: Eigenvalues of the covariance matrix of z.
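The decomposition above can be sketched with a singular value decomposition, which yields the patterns, scores, and eigenvalues in one step. This is a minimal sketch, not a reference implementation: the function name `s_mode_pca` is hypothetical, the data are synthetic, and the eigenvalues are normalized by n - 1 (the sample covariance), which is an assumption about the normalization.

```python
import numpy as np

def s_mode_pca(maps):
    """S-mode PCA of maps with shape (m points, n times), computed via the SVD."""
    n_times = maps.shape[1]
    # Remove the period-of-record mean at each point.
    anomalies = maps - maps.mean(axis=1, keepdims=True)
    patterns, singular_values, _ = np.linalg.svd(anomalies, full_matrices=False)
    # Eigenvalues of the sample covariance matrix of the anomalies.
    eigenvalues = singular_values**2 / (n_times - 1)
    # Scores: projection of each map onto each unit-length pattern.
    scores = patterns.T @ anomalies
    return anomalies, eigenvalues, patterns, scores

# Synthetic example (assumption): m = 6 points, n = 50 times.
rng = np.random.default_rng(42)
maps = rng.normal(size=(6, 50))
anomalies, eigenvalues, patterns, scores = s_mode_pca(maps)

reconstructed = patterns @ scores   # sum over i of a_i(t) e_i(x)
score_covariance = np.cov(scores)   # diagonal: the scores are mutually uncorrelated
```

Here `patterns[:, i]` plays the role of $e_i(x)$ and `scores[i]` the role of $a_i(t)$; the number of modes returned is min(m, n).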
4. Example of the first four patterns of 3-day precipitation for May-August over the central US (Richman and Lamb, 1985). The sequence of patterns is seen repeatedly in other analyses and can be considered an artifact of the geometry of PCA.
5. All of the patterns (the e’s) are orthogonal and the leading ones reflect the data points with the most variance. The eigenvalues give these variances; the first four for the Richman and Lamb patterns are 11.13%, 9.33%, 5.55%, and 4.54%.
6. Usually (always when the PCA is on the correlation matrix) the numbers on the maps are correlations of the original data series with the corresponding scores, so their squares represent explained variance. Thus in the latter context:
(a) a point with 0.5 is more than 6 times as important as a point with 0.2 (0.25 vs. 0.04), a point with 0.8 more than 7 times as important as one with 0.3 (0.64 vs. 0.09), etc.;
(b) summations of the squares over the maps give the total variances listed in 5 above;
(c) comparing the squared central values within closed contours allows practical discrimination between monopoles, dipoles, etc.
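The statements in item 6 can be checked numerically for a correlation-matrix PCA. The sketch below uses synthetic data and assumes the map values are the eigenvectors scaled by the square root of their eigenvalues (a common convention, assumed here rather than taken from the slides); all variable names are illustrative.

```python
import numpy as np

# Synthetic example (assumption): 300 times at 5 points sharing one common signal.
rng = np.random.default_rng(1)
n_times, n_points = 300, 5
common = rng.normal(size=(n_times, 1))
data = common * np.array([1.0, 0.8, 0.5, 0.2, 0.0]) + rng.normal(size=(n_times, n_points))
standardized = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)

# PCA on the correlation matrix, modes sorted by decreasing eigenvalue.
correlation = np.corrcoef(standardized, rowvar=False)
eigenvalues, eigenvectors = np.linalg.eigh(correlation)
order = np.argsort(eigenvalues)[::-1]
eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]

# Loadings scaled by sqrt(eigenvalue), and the corresponding scores.
loadings = eigenvectors * np.sqrt(eigenvalues)
scores = standardized @ eigenvectors

# Correlation of each original series with the first score.
corr_with_score = np.array(
    [np.corrcoef(standardized[:, j], scores[:, 0])[0, 1] for j in range(n_points)]
)

# Fraction of total variance explained by the first mode, from the squared loadings.
explained_fraction = (loadings[:, 0] ** 2).sum() / n_points

# Relative importance of two map values is the ratio of their squares, as in 6(a).
importance_ratio = 0.5**2 / 0.2**2
```

In this setting `corr_with_score` matches `loadings[:, 0]`, and the sum of the squared loadings over the map equals the first eigenvalue.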
7. The time series that go with the patterns (the a’s) are uncorrelated (i.e. not collinear), so they are desirable predictors for multiple linear regression.
8. To compress or filter the data some of the patterns must be thrown out, i.e. the series must be truncated; this is an ART (see O’Lenic and Livezey, 1988, for the best approach I know). In these applications over-truncation (throwing the baby out with the bath water) is of far more concern than under-truncation (retention of some noise). As a pre-step for rotation, CCA, etc., both should be of concern (see below).
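The mechanics of truncation in item 8 (not the choice of truncation point, which is the hard part) can be sketched as a reconstruction from the leading modes only. This is a minimal sketch on synthetic data; the function name `truncate_modes` is hypothetical, and it does not implement the O’Lenic and Livezey (1988) procedure.

```python
import numpy as np

def truncate_modes(anomalies, n_keep):
    """Rebuild (m points, n times) anomalies from the n_keep leading PCA modes."""
    patterns, singular_values, scores_t = np.linalg.svd(anomalies, full_matrices=False)
    filtered = (patterns[:, :n_keep] * singular_values[:n_keep]) @ scores_t[:n_keep]
    # Fraction of the total variance carried by the retained modes.
    retained = (singular_values[:n_keep] ** 2).sum() / (singular_values**2).sum()
    return filtered, retained

# Synthetic example (assumption): one coherent signal plus weak noise at 8 points.
rng = np.random.default_rng(7)
signal = np.outer(np.linspace(-1.0, 1.0, 8), np.sin(np.linspace(0.0, 12.0, 120)))
anomalies = signal + 0.1 * rng.normal(size=(8, 120))
anomalies -= anomalies.mean(axis=1, keepdims=True)

filtered_1, kept_1 = truncate_modes(anomalies, 1)      # heavy truncation
filtered_all, kept_all = truncate_modes(anomalies, 8)  # no truncation

# Variance discarded by the truncation, as a fraction of the total.
discarded_1 = ((anomalies - filtered_1) ** 2).sum() / (anomalies**2).sum()
```

Comparing `kept_1` with `discarded_1` for several values of `n_keep` is one simple way to see what a candidate truncation throws away.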
9. Physical interpretation of other than the leading PC pattern is usually unwarranted, and this is often the case for the first as well. Richman (1986) shows this for the example in two ways. First he splits the domain in two and does a separate PCA on each. Here’s the result for the first PCA mode. Note that the first mode for the southern domain (a monopole covering the domain) is not reproduced in the full-domain analysis.
Next he computes the one-point teleconnection pattern for the largest loading on each pattern. Here’s the result for the second PCA mode. The PCA mode is a dipole, while the teleconnection pattern (reflecting the physical covariance structure around the point) is a monopole.
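The comparison in item 9 between a PCA mode and the one-point teleconnection pattern at its largest loading can be sketched as follows. The field is synthetic (spatially smoothed noise on a line of points), so the numbers carry no physical meaning, and the function name `one_point_teleconnection` is hypothetical.

```python
import numpy as np

def one_point_teleconnection(anomalies, base_index):
    """Correlate the series at one base point with the series at every point."""
    return np.corrcoef(anomalies)[base_index]

# Synthetic example (assumption): 10 points on a line, neighbors share noise.
rng = np.random.default_rng(3)
n_points, n_times = 10, 400
noise = rng.normal(size=(n_points + 2, n_times))
anomalies = noise[:-2] + noise[1:-1] + noise[2:]   # 3-point running sum in space
anomalies -= anomalies.mean(axis=1, keepdims=True)

# Unrotated PCA patterns from the SVD of the anomalies.
patterns, _, _ = np.linalg.svd(anomalies, full_matrices=False)

mode = 1                                           # second PCA mode (zero-based)
base_index = int(np.argmax(np.abs(patterns[:, mode])))
teleconnection = one_point_teleconnection(anomalies, base_index)

# Pattern correlation between the mode and its teleconnection map.
# The sign of a PCA pattern is arbitrary, so only the magnitude is meaningful.
pattern_correlation = abs(np.corrcoef(patterns[:, mode], teleconnection)[0, 1])
```

A low `pattern_correlation` indicates that the mode’s spatial structure is not what the data show around its own center of action.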
10. The North et al. (1982) test determines whether two consecutive patterns can reasonably be interpreted as distinct patterns or separate signals. It assumes the n samples are independent (heuristically adjust n downward for dependence):
$$\frac{2(\lambda_i - \lambda_{i+1})}{(n/2)^{-1/2}\,(\lambda_i + \lambda_{i+1})} \ge 1$$
11. Other kinds of PCA:
Combined (CPCA) – more than one mapped variable;
Extended (EPCA) – group of maps of the same variable at different lags to capture pattern evolution (MSSA is a variant);
Rotated (RPCA) – to reduce sampling error and improve physical representativeness.
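The North et al. (1982) test in item 10 compares the spacing between neighboring eigenvalues with their approximate sampling error, the square root of 2/n times the eigenvalue. A minimal sketch follows; the function name `north_separation` and the argument `n_eff` (the number of independent samples, already adjusted for dependence) are hypothetical, and the eigenvalues are made up for illustration rather than taken from any cited study.

```python
import numpy as np

def north_separation(eigenvalues, n_eff):
    """Ratio of eigenvalue spacing to sampling error for each consecutive pair."""
    lam = np.asarray(eigenvalues, dtype=float)
    spacing = lam[:-1] - lam[1:]
    # Sampling error evaluated at the mean of the two neighboring eigenvalues.
    sampling_error = np.sqrt(2.0 / n_eff) * 0.5 * (lam[:-1] + lam[1:])
    ratio = spacing / sampling_error
    return ratio, ratio >= 1.0

# Illustrative eigenvalues (assumption), sorted in decreasing order.
eigenvalues = [10.0, 9.0, 5.0, 1.0]
ratio, separated = north_separation(eigenvalues, n_eff=50)
print(ratio)
print(separated)
```

In this made-up case the first two modes fail the test, so they would be treated as a possibly mixed pair rather than as two distinct patterns.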
Rotation
1. Rotation, i.e. the linear transformation of a truncated set of patterns (Richman, 1986), should be considered in many problems when patterns with minimum sampling variability, little domain dependence, and increased physical relevance are needed.
2. Note the robustness of rotated patterns in Richman’s split-domain example (all patterns are present in both analyses).
Now compare rotated mode 2 and its corresponding teleconnection pattern (both are monopoles with similar scales).
3. Barnston and Livezey (1987) compared 120 monthly 700 mb height PCA and RPCA patterns with their corresponding one-point teleconnection patterns; the average pattern correlations were 0.69 and 0.90, respectively. They also used sensitivity tests to demonstrate dramatic reductions in sampling error.
[Figure: Barnston and Livezey (1987) RPCA patterns. Panels: North Atlantic Oscillation (a dipole!), Western Pacific Oscillation, Pacific North America, Tropical Northern Hemisphere.]
4. The most likely reason for the success of rotation is the relaxation of the geometrical and mathematical constraints on the analysis, i.e. the data can speak more for themselves. In a commonly used variant of varimax, where the eigenvectors are weighted by the square root of the eigenvalue, the resulting patterns do not have to be orthogonal and the resulting time series do not have to be independent (Jolliffe, 1995).
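A varimax rotation of eigenvalue-weighted patterns can be sketched as follows. This uses the widely circulated SVD-based iteration for the raw varimax criterion (no row normalization) and is not necessarily the variant used in the cited studies; the function name, the synthetic data, and the choice of three retained modes are all assumptions made for illustration.

```python
import numpy as np

def varimax(loadings, max_iter=200, tol=1e-8):
    """Orthogonal varimax rotation of a (points, modes) loading matrix."""
    n_points, n_modes = loadings.shape
    rotation = np.eye(n_modes)
    objective = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        # Gradient of the raw varimax criterion with respect to the rotation.
        gradient = loadings.T @ (
            rotated**3 - rotated * (rotated**2).sum(axis=0) / n_points
        )
        u, s, vt = np.linalg.svd(gradient)
        rotation = u @ vt              # nearest orthogonal matrix to the gradient
        new_objective = s.sum()
        if new_objective < objective * (1.0 + tol):
            break
        objective = new_objective
    return loadings @ rotation, rotation

# Synthetic example (assumption): 12 points, two groups each sharing a signal.
rng = np.random.default_rng(11)
anomalies = rng.normal(size=(12, 300))
anomalies[:4] += rng.normal(size=300)
anomalies[4:8] += rng.normal(size=300)
anomalies -= anomalies.mean(axis=1, keepdims=True)

# Truncate to the leading modes and weight each eigenvector by sqrt(eigenvalue).
u, s, _ = np.linalg.svd(anomalies, full_matrices=False)
n_keep = 3
loadings = u[:, :n_keep] * (s[:n_keep] / np.sqrt(anomalies.shape[1] - 1))

rotated, rotation = varimax(loadings)
pattern_overlap = rotated.T @ rotated   # off-diagonal terms show non-orthogonality
```

The rotation matrix itself is orthogonal, so the variance explained at each point by the retained modes is unchanged; what changes is how that variance is shared among the patterns. Repeating the call with different values of `n_keep` is a direct way to see the sensitivity to truncation.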
Under- and Over-Rotation
5. Under-rotation (truncation of too many modes) can result in discarded signal, while over-rotation (truncation of too few) can result in over-regionalization of signals (see O’Lenic and Livezey, 1988). Map (a) here is a dipole but (b) and (c) are monopoles.
Conclusions
• PCA is an extremely useful linear tool for data compression, orthogonalization, and filtering
• PCA results are mathematical and (for even the first mode) don’t necessarily have to have physical relevance
– Even when the first mode has physical relevance its representation may be flawed (e.g. the “Arctic Oscillation”)
• PCA results can be critically impacted by choices of domain, grid, scaling, etc.
• Effective PC truncation requires insight and experimentation
• Rotation can enhance physical relevance and reduce sampling variability
– Under- and over-rotation can negate these gains
• Just because an area on a map has a closed loading contour doesn’t make it part of a “dipole” or “tripole”