Mann et al. have recently argued that they can salvage MBH98-type results using correct PC calculations under "the standard selection rule (Preisendorfer's Rule N) used by MBH98". http://www.realclimate.org/index.php?p=8 They say that this method permits them to retain 5 PCs in the North American network. Since the bristlecones are in the PC4, this expanded roster still permits them to imprint the NH temperature reconstruction. We have discussed elsewhere many issues regarding the robustness and statistical significance of this calculation. Here I consider the narrow issue of whether this method was actually "used by MBH98" for tree ring networks. I have been able to closely replicate the diagram published at realclimate.org on Nov. 22, 2004, said to be an example of the selection method used in MBH98. I have tested the 19 network/calculation step combinations used in MBH98 and, in 18 of 19 cases, the selections from the Preisendorfer-type calculation are inconsistent with the reported selections at the Corrigendum SI. In some cases, the results are higher; in some cases, lower. In three calculations, different selections are taken from the same network in different calculation steps, a result inconsistent with the stated policy. We remain puzzled why Mann et al. continue to refuse to provide source code for MBH98 calculations and why climate scientists do not expect them to do so.
First, there is no mention in MBH98 or the MBH98 SI that Preisendorfer's Rule N was used to determine the number of retained PC series for tree ring networks. The only pertinent reference in MBH98 was as follows:
Certain densely sampled regional dendroclimatic data sets have been represented in the network by a smaller number of leading principal components (typically 3–11 depending on the spatial extent and size of the data set). This form of representation ensures a reasonably homogeneous spatial sampling in the multiproxy network (112 indicators back to 1820). [our bolds]
This statement contains no reference to the use of Preisendorfer's Rule N.
In connection with the calculation of temperature principal component series, a different calculation, MBH98 does refer to the use of Preisendorfer's Rule N as follows:
a conventional Principal Component Analysis (PCA) is performed... An objective criterion was used to determine the particular set of eigenvectors which should be used in the calibration as follows. Preisendorfer’s selection rule ‘rule N’ was applied to the multiproxy network to determine the approximate number N_{eofs} of significant independent climate patterns that are resolved by the network, taking into account the spatial correlation within the multiproxy data set.
Before trying to interpret these two statements from a text analytic point of view, I will make four quick points about rules for deciding the number of PCs to retain:
The briefest survey of PC literature will show that there are many approaches to selecting the number of PC series to retain and Preisendorfer's Rule N is far from being a "standard selection rule".
In fact, Urban, in a presentation about PCs cited on Jan. 6, 2005 by Mann at realclimate, stated that the choice was subjective, as follows:
It should be noted that because the goal of PCA is essentially utilitarian, the choice of how many axes to retain is ultimately subjective. In practice, either 2 or 3 axes are retained, simply because it is difficult to project more than this onto a printed page.
Overland and Preisendorfer [1982] themselves argued that being significant under Rule N was only necessary for significance; they did not argue that it was sufficient.
The real test for retaining a PC series is not whether it is significant under Preisendorfer's Rule N (or some other such rule), but whether it is scientifically significant. For example, Franklin et al. [1995] stated:
In the final analysis, the retained components must make good scientific sense (Frane & Hill 1976; Legendre & Legendre 1983; Pielou 1984; Zwick & Velicer 1986; Ludwig & Reynolds 1988; Palmer 1993).
Now, from a text analytic perspective, a reasonable reader might conclude that the difference in description of the PC retention policy in the two cases (tree rings and temperatures) pointed to the use of different procedures in the two calculations. In fact, the form of PC calculation in the two calculations differed: we have determined that the temperature PC calculations were centered calculations, while, as we've pointed out in our recent articles (and earlier), the tree ring PC calculations were not conventional centered calculations. Mann et al. have recently (Jan. 6, 2005) acknowledged that they did not use a "standard centered method" so their use of an uncentered method is no longer in dispute.
The real test of whether Preisendorfer's Rule N was used in MBH98 is whether the actual number of selected PCs can be replicated using this method.
The actual retentions for each calculation step/network combination were not provided in MBH98, its SI or at Mann's FTP site. The first complete listing of actual retentions came in the Corrigendum SI (July 2004). Even the Corrigendum SI contains no summarized listing: the following table was collated from the Corrigendum SI and shows the number of retained PCs by calculation step/network combination. (It was impossible to deduce this table with the additional disinformation of Mann et al. [2003] that 159 distinct series were used, since only 139 distinct series were actually used. Any such deduction attempts were further blocked by erroneous listings of the number of series used in the AD1450 step and the erroneous non-use of 6 available series in the AD1500 step. These do not affect early 15th century results, but frustrate attempts at replication.)
Network       1400  1450  1500  1600  1700  1730  1750  1760  1780  1800  1820
Stahle/OK     0     0     0     0     3     3     3     3     3     3     3
Stahle/SWM    1     1     2     4     7     7     9     9     9     9     9
NOAMER        2     2     6     7     7     7     9     9     9     9     9
SOAMER        0     0     0     2     2     2     3     3     3     3     3
AUSTRAL       0     0     0     3     3     3     4     4     4     4     4
Vaganov       0     1     1     2     2     2     3     3     3     3     3
PC series     3     4     9     18    24    24    31    31    31    31    31
Direct proxy  19    21    19*   39    50    55    58    62    66    71    81
Total series  22    25*   28*   57    74    79    89    93    97    102   112
Table 1. Proxy series used in MBH98 (collated from Corrigendum SI, July 2004), showing the number of retained PC series by network/calculation step combination. *: The total number of series used in the AD1450 step is incorrectly stated in MBH98 as 24 (an error that has not yet been reported). Six series available in the AD1500 network are not used.
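As a quick arithmetic check on the collation, the figures in Table 1 can be summed directly. The sketch below only transcribes the table (dropping the asterisks) and confirms that the six network rows add up to the "PC series" row and that "PC series" plus "Direct proxy" gives "Total series"; the variable names are mine.

```python
# Retained PC series per network, transcribed from Table 1 (asterisks dropped).
steps = [1400, 1450, 1500, 1600, 1700, 1730, 1750, 1760, 1780, 1800, 1820]
retained = {
    "Stahle/OK":  [0, 0, 0, 0, 3, 3, 3, 3, 3, 3, 3],
    "Stahle/SWM": [1, 1, 2, 4, 7, 7, 9, 9, 9, 9, 9],
    "NOAMER":     [2, 2, 6, 7, 7, 7, 9, 9, 9, 9, 9],
    "SOAMER":     [0, 0, 0, 2, 2, 2, 3, 3, 3, 3, 3],
    "AUSTRAL":    [0, 0, 0, 3, 3, 3, 4, 4, 4, 4, 4],
    "Vaganov":    [0, 1, 1, 2, 2, 2, 3, 3, 3, 3, 3],
}
pc_series    = [3, 4, 9, 18, 24, 24, 31, 31, 31, 31, 31]
direct_proxy = [19, 21, 19, 39, 50, 55, 58, 62, 66, 71, 81]
total_series = [22, 25, 28, 57, 74, 79, 89, 93, 97, 102, 112]

# Column sums of the six network rows should reproduce the "PC series" row.
column_sums = [sum(row[i] for row in retained.values()) for i in range(len(steps))]
pc_row_ok = column_sums == pc_series

# "PC series" + "Direct proxy" should reproduce the "Total series" row.
totals_ok = [p + d for p, d in zip(pc_series, direct_proxy)] == total_series

print(pc_row_ok, totals_ok)  # True True
```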
The first hints that a Preisendorfer-type policy had supposedly been used in MBH98 came in our Nature correspondence. In response to our observation of the error in their PC methods, Mann et al. [Revised Nature Reply] had noticed that, under correct PC calculations, the bristlecone pine pattern was demoted from the PC1 to the PC4:
precisely the same 'hockey stick' PC pattern appears using their convention, albeit lower down in the eigenvalue spectrum (PC#4) (Figure 1a). If the correct 5 PC indicators are used, rather than incorrectly truncating at 2 PCs (as MM04 have done), a reconstruction similar to MBH98 is obtained.
They argued that they could still salvage a hockey-stick-shaped series using a Preisendorfer-type calculation on the AD1400 North American network. The calculation published on Nov. 22, 2004 at realclimate showing the implementation of a Preisendorfer-type calculation on the AD1400 North American network was originally submitted in our Nature correspondence. We had seen this diagram and calculation in August 2004 and had fully considered it in our GRL submission; in fact, it contributed to the approach taken in our GRL submission, which differs substantially from our previous Nature submission.
Realclimate Nov 2004 Figure 1
Figure 1 below, http://www.realclimate.org/index.php?p=9 (Nov. 22, 2004), and the two tables are all taken from realclimate, illustrating the application of the supposed Preisendorfer-type calculation. (The original section from Preisendorfer is retyped here for reference.) The blue and red lines show the simulation results (using AR1 models of the AD1400 North American network) under MBH98 and centered PC calculations respectively; the blue and red points show actual results from the MBH98 and centered PC methods respectively. Preisendorfer's Rule N selects PC series as long as the actual eigenvalue exceeds the simulation. For the MBH98 method, 3 eigenvalues are clearly separated under Rule N and perhaps 7 in a centered calculation. This result is strangely described by Mann et al. as follows:
"In the former case, 2 (or perhaps 3) eigenvalues are distinct from the noise eigenvalue continuum. In the latter case, 5 (or perhaps 6) eigenvalues are distinct from the noise eigenvalue continuum."
It seems obvious that the selection of 2 (rather than 3) eigenvalues in MBH98 cannot be directly justified on this diagram without appeal to some still unstated method.




FIGURE 1. "Comparison of eigenvalue spectrum resulting from a Principal Components Analysis (PCA) of the 70 North American ITRDB data used by Mann et al (1998) back to AD 1400 based on Mann et al (1998) centering/normalization convention (blue circles) and MM centering/normalization convention (red crosses). Shown also is the null distribution based on Monte Carlo simulations with 70 independent red noise series of the same length and same lag-one autocorrelation structure as the actual ITRDB data using the respective centering and normalization conventions (blue curve for MBH98 convention, red curve for MM convention). In the former case, 2 (or perhaps 3) eigenvalues are distinct from the noise eigenvalue continuum. In the latter case, 5 (or perhaps 6) eigenvalues are distinct from the noise eigenvalue continuum." Original legend from: Mann, http://www.realclimate.org/index.php?p=9
Replication of Realclimate Nov 22, 2004 Figure 1
Figure 2 below shows my replication of the above calculations. The left panel repeats realclimate Nov 22, 2004 Figure 1 (as above), while the right panel shows my emulation, using the script here. The salient features of the methods are obviously captured.
FIGURE 2. AD1400 North American network: Preisendorfer-type calculations. Left panel: Mann et al. [realclimate]. Points: NOAMER network; lines: simulations. Blue: MBH98 decentered; red: centered. Right panel: emulation of the calculation in the left panel.
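For readers who want to experiment, here is a rough Python sketch of the kind of calculation described in the Figure 1 legend: red noise series with the same length and lag-one autocorrelation as the data are simulated, and the eigenvalue spectrum of the data is compared rank by rank with the simulated spectra. It is not the script used for Figure 2 and not the MBH98 code. In particular, the `ref` argument (centering and scaling each series on a sub-period rather than the full period) is only my stand-in for a decentered convention; the function names, the 95% level, the "greatest rank" reading of the rule and the toy data are all assumptions.

```python
import numpy as np

def lag1_autocorr(X):
    """Lag-one autocorrelation of each column of X."""
    D = X - X.mean(axis=0)
    return (D[1:] * D[:-1]).sum(axis=0) / (D ** 2).sum(axis=0)

def ar1_noise(n, phi, rng):
    """Independent AR(1) series of length n, one column per entry of phi."""
    E = rng.standard_normal((n, phi.size))
    X = np.empty_like(E)
    X[0] = E[0] / np.sqrt(1.0 - phi ** 2)  # start from the stationary distribution
    for t in range(1, n):
        X[t] = phi * X[t - 1] + E[t]
    return X

def eigen_spectrum(X, ref=slice(None)):
    """Share of the total sum of squares carried by each PC.

    ref selects the rows used to center and scale the columns. All rows gives
    a conventional centered calculation; a sub-period gives a decentered one
    (an assumed simplification, not the documented MBH98 procedure).
    """
    Y = (X - X[ref].mean(axis=0)) / X[ref].std(axis=0)
    s2 = np.linalg.svd(Y, compute_uv=False) ** 2
    return s2 / s2.sum()

def null_curve(X, ref=slice(None), n_sim=100, q=0.95, seed=0):
    """Per-rank q-quantile of the spectrum under AR(1) noise matched to X."""
    rng = np.random.default_rng(seed)
    phi = np.clip(lag1_autocorr(X), -0.99, 0.99)
    sims = np.array([eigen_spectrum(ar1_noise(X.shape[0], phi, rng), ref)
                     for _ in range(n_sim)])
    return np.quantile(sims, q, axis=0)

def n_retained(X, ref=slice(None), **kw):
    """Greatest rank whose data eigenvalue exceeds the simulated curve (0 if none)."""
    above = np.nonzero(eigen_spectrum(X, ref) > null_curve(X, ref, **kw))[0]
    return int(above[-1]) + 1 if above.size else 0

# Toy data (hypothetical): 20 red noise series, 3 of which shift late in the record.
rng = np.random.default_rng(1)
X = ar1_noise(300, np.full(20, 0.3), rng)
X[-50:, :3] += 2.0
full = eigen_spectrum(X)                         # centered on the full period
short = eigen_spectrum(X, ref=slice(-50, None))  # centered on the last 50 rows
print(full[0], short[0])
print(n_retained(X), n_retained(X, ref=slice(-50, None)))
```

In this toy case the leading eigenvalue share is larger under the sub-period centering, which is why the simulated curve has to be recomputed under the same convention as the data before the two are compared.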
MBH98 has 6 networks with erratic changes of PC retention by timestep, yielding a total of 19 network/timestep combinations, all of which are examined below.
Stahle/OK
MBH98 only used one network/directory combination here, retaining three PCs. The observed retention is inconsistent with Rule N. The PC3 is insignificant under Rule N, but is retained anyway. There is little difference between MBH98 (blue) and centered (red) results, presumably because Stahle prewhitens site chronologies.
Blue: MBH98 decentered; red: centered.

Stahle/SWM
In this network, there are 6 timesteps with different retained PCs. None of the retention patterns can be obtained through a direct application of Rule N.
Blue: MBH98 decentered; red: centered (all panels).

South America
In the South American network, the PC3 is retained for calculation steps after AD1750, but does not qualify under Rule N. (This network was also the source of major inconsistencies between the listed sites and the sites actually used (see Corrigendum), as the network was reduced from 18 sites to 11 sites. The reasons for the inconsistency provided in the Corrigendum are incorrect, as I'll post on another occasion.)

Australia/NZ
The retention of the PC3 in the AD1600-1730 steps and the retention of the PC3 and PC4 in the AD1750+ steps are inconsistent with Rule N.



Vaganov
The Vaganov network is a network of Russian sites. There is relatively little difference between centered and uncentered methods in this network. This appears to be because of prewhitening used by Vaganov in developing tree ring chronologies. The prewhitening dramatically reduces the autocorrelation in the network. These graphs are very important in assessing MBH98 practices, as they show series with large Preisendorfer significance which are not used: the PC2 in the AD1450 network and the PC3-5 in all steps from AD1600 on.





North America after 1400
Again, there are inconsistencies in the implementation of the supposed policy: a) the non-use of the PC3 and perhaps PC4 in the AD1450 step; b) the use of the PC5-PC6 in the AD1500 step; c) the increase in retained PCs (within the same network) from the AD1450 step to the AD1500 step; d) the non-use of the PC8 in the AD1600 step.





Summary
The following table summarizes the inconsistencies between the observed PC retentions and the retentions according to Rule N. It is obvious that application of the Preisendorfer Rule N method to actual networks does not yield the PC selections archived at the Corrigendum SI. In some cases, more PCs are archived; in other cases, fewer PCs; there is no obvious pattern. In addition, in 3 cases, different selections were made from the same network, e.g. 7 PCs were selected from the AD1700 SWM network in the AD1700 calculation; the same network was used in the AD1750 calculation step, but this time 9 PCs were selected. This would not be permitted without some still unreported adaptation of the method.
Retained PCs

Network/Step     Reported  Emulated
OK/AD1700        3         2
SWM/AD1400       1         2
SWM/AD1450       1         2
SWM/AD1500       2         3
SWM/AD1600       4         3
SWM/AD1700       7/9       3
SOAMER/AD1600    2/3       2
AUSTRAL/AD1600   3         2
AUSTRAL/AD1750   4         2
VAGANOV/AD1450   1         2
VAGANOV/AD1600   2         5
VAGANOV/AD1750   3         5
NOAMER/AD1400    2         3
NOAMER/AD1450    2/6       4
NOAMER/AD1600    7         8
NOAMER/AD1750    9         10
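The tallies quoted in this post can be recovered from the summary table with a few lines of Python. The sketch below transcribes the table and reads entries such as "7/9" as two reported retentions drawn from the same network in different calculation steps (the reading given above for SWM/AD1700, which I am assuming also applies to the other two slashed entries); the variable names are mine.

```python
# (network/step, reported retentions, Rule N emulation), transcribed from the
# summary table. An entry such as "7/9" is expanded into two reported values.
rows = [
    ("OK/AD1700", [3], 2),
    ("SWM/AD1400", [1], 2),
    ("SWM/AD1450", [1], 2),
    ("SWM/AD1500", [2], 3),
    ("SWM/AD1600", [4], 3),
    ("SWM/AD1700", [7, 9], 3),
    ("SOAMER/AD1600", [2, 3], 2),
    ("AUSTRAL/AD1600", [3], 2),
    ("AUSTRAL/AD1750", [4], 2),
    ("VAGANOV/AD1450", [1], 2),
    ("VAGANOV/AD1600", [2], 5),
    ("VAGANOV/AD1750", [3], 5),
    ("NOAMER/AD1400", [2], 3),
    ("NOAMER/AD1450", [2, 6], 4),
    ("NOAMER/AD1600", [7], 8),
    ("NOAMER/AD1750", [9], 10),
]

# One pair per network/calculation step combination.
pairs = [(name, rep, emu) for name, reps, emu in rows for rep in reps]
higher = sum(rep > emu for _, rep, emu in pairs)  # more PCs reported than emulated
lower = sum(rep < emu for _, rep, emu in pairs)   # fewer PCs reported than emulated
equal = sum(rep == emu for _, rep, emu in pairs)  # consistent with the emulation
reused = [name for name, reps, _ in rows if len(reps) > 1]

print(len(pairs), higher, lower, equal)  # 19 8 10 1
print(reused)  # ['SWM/AD1700', 'SOAMER/AD1600', 'NOAMER/AD1450']
```

On this reading there are 19 combinations, of which 18 differ from the emulation, and 3 networks from which two different selections were taken.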
Perhaps there is some common factor to the above process that we have not discerned; however, we are confident that no other third party in the world has been able to discern the pattern. Had Mann et al. archived their source code for these calculations (and for other calculations), then these issues would not be a matter of speculation.
Notwithstanding all of the above, as far as I'm concerned, the main issue is whether the PC series so selected are significant in a scientific sense, rather than a data mining sense. We've provided many caveats in our E&E article to reliance on the bristlecone pine series as arbiters of world temperature history, whether they are in a PC1 or a PC4. However, if Mann is to insist at this late stage that the selection of 5 PCs is justified on the present record, it seems evident to me that an explanation is required for exactly how the Preisendorfer policy set forth here can be reconciled with actual retentions. Perhaps it can, but I've so far been unable to figure out the secret. These guessing games are also pointless. I invite any readers who have got this far to express to the U.S. National Science Foundation and to Nature their objections to this important source code continuing to remain undisclosed.
References:
Franklin, Scott B., Gibson, David J., Robertson, Philip A., Pohlmann, John T. and Fralish, James S. (1995), Parallel Analysis: a method for determining significant principal components, Journal of Vegetation Science 6: 99-106.
Overland and Preisendorfer [1982], "A Significance Test for Principal Components Applied to a Cyclone Climatology", Mon. Wea. Rev. 110, 1-4.
Footnote: Preisendorfer’s Rule N is a simulation method based on white noise, stated as follows:
This rule of PC selection is a dominant-variance rule and is based on a Monte Carlo procedure which simulates sampling from N_{p}(0, Σ), with Σ = σ^{2} I_{p}. The null hypothesis is that our n x p data matrix Z has been drawn from such a population. By following the procedure outlined below, we can systematically accept or reject this hypothesis. Let R be the centered random data set so formed (cf 5.4 and 5.5). Forming S = R^{T}R for each such sample, we then build up a cumulative distribution for each of the ρ = min(n-1, p) nonzero eigenvalues λ_{j}, j = 1,…,ρ. We can then compare the data eigenvalues of the given n x p data set Z, one by one, with these cumulative distributions. The details follow.
Construct (say) 100 independent realizations of each of np variates from N(0,1). Form the n x p matrix R as in (5.4) and (5.5). This is the random n x p counterpart R to the given n x p data matrix Z. The ω^{th} realization R(ω) of the centered R results in an ordered sequence of nonzero eigenvalues:
λ_{1}(ω) > … > λ_{ρ}(ω), ω = 1,…,100, ρ = min(n-1, p).
Write
U_{j}(ω) = λ_{j}(ω) { ρ^{-1} Σ_{k=1:ρ} λ_{k}(ω) }^{-1}, ω = 1:100, j = 1:ρ
For each j, order these (after relabelling) as
U_{j}(ω_{1}) < … < U_{j}(ω_{100})
and set
σ_{j}(05) = U_{j}(ω_{5}); and σ_{j}(95) = U_{j}(ω_{95});
These σ_{j} values define the 5% and 95% points on the cumulative distribution for the jth random eigenvalues.
For the given data matrix Z with its associated ordered set of nonzero eigenvalues, write:
V_{j} = d_{j} { ρ^{-1} Σ_{k=1:ρ} d_{k} }^{-1}, j = 1,…,ρ
Thus we have
Rule N: p’ is the greatest j for which V_{j} > σ_{j}(95); 0 if no such j exists.
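The white-noise procedure quoted in this footnote can be sketched in a few lines of Python. This is only an illustration of the rule as stated here, not code from MBH98 or from Preisendorfer; the function names, the random seeds and the toy data matrix are my own assumptions.

```python
import numpy as np

def normalized_eigenvalues(X):
    """Return lambda_j divided by the mean eigenvalue, for column-centered X."""
    n, p = X.shape
    R = X - X.mean(axis=0)                  # center each column
    rho = min(n - 1, p)                     # number of nonzero eigenvalues
    s = np.linalg.svd(R, compute_uv=False)  # singular values, descending
    lam = (s ** 2)[:rho]                    # eigenvalues of R^T R
    return lam / lam.mean()

def rule_n(Z, n_sim=100, seed=0):
    """Greatest j with V_j > sigma_j(95) under white noise; 0 if there is none."""
    n, p = Z.shape
    rng = np.random.default_rng(seed)
    U = np.array([normalized_eigenvalues(rng.standard_normal((n, p)))
                  for _ in range(n_sim)])
    U.sort(axis=0)                             # order realizations separately for each j
    sigma95 = U[int(round(0.95 * n_sim)) - 1]  # the 95th of 100 ordered values
    V = normalized_eigenvalues(Z)
    above = np.nonzero(V > sigma95)[0]
    return int(above[-1]) + 1 if above.size else 0

# Toy data (hypothetical): one strong shared signal plus white noise in 10 series.
rng = np.random.default_rng(42)
common = rng.standard_normal(200)
Z = 2.0 * common[:, None] + rng.standard_normal((200, 10))
print(rule_n(Z))
```

Because the eigenvalues are divided by their mean, a single dominant pattern pushes the remaining normalized eigenvalues below the white-noise curve, so only the leading PC qualifies in this toy case.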