REFEREE REPORTS ON NATURE SUBMISSIONS

1. On Original Submission

2. On Revised Submission

 

Referee Reports on First Submission 

 

Referee #1(Remarks to the Author):

I find merit in the arguments of both protagonists, though Mann et al. (MBH) is much more difficult to read than McIntyre & McKitrick (MM). Their explanations are (at least superficially) less clear and they cram too many things onto the same diagram, so I find it harder to judge whether I agree with them.

 

There are two main points of dispute:

1. The principal component technique used.

2. The quality of the early data.

 

I deal with 1. first. It is an area where I have expertise, but it is not at all clear what exactly is being done. MM talk about 'scaling to 1902-1980 mean and standard deviation', whereas MBH's phrase is 'restandardisation by detrended standard deviation'. These suggest different things to me. The latter seems more appropriate than the former, but I am still uneasy about applying a standardisation based on a small segment of the series to the whole series, if that is what is being done. MBH seem to be too dismissive of MM's red noise simulations. Even if red noise is not the best model for the series, they should have reservations about a procedure that gives the 'hockey stick' shape for all 10 simulations, when such a shape would not be expected. Having said this, MM's corrected series in their Figure 4c still has the upward trend towards the end of the series, so this trend is not just an artifact of MBH's PCA procedure.
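
The concern about centring on a short segment can be explored numerically. The sketch below is only an illustration and is not either group's code: it assumes independent AR(1) series as the "red noise", an arbitrary persistence of 0.9, and illustrative sizes (581 years, 70 series, a 79-year calibration window). The helper names `ar1_noise`, `leading_pc` and `offset_index` are hypothetical. It computes the leading principal component twice, once after subtracting full-period means and once after subtracting calibration-period means, and reports how far the calibration-period mean of each PC sits from its full-period mean.

```python
import numpy as np

def ar1_noise(n_years, n_series, rho, rng):
    """Return an (n_years, n_series) array of independent AR(1) series."""
    x = np.zeros((n_years, n_series))
    eps = rng.standard_normal((n_years, n_series))
    x[0] = eps[0] / np.sqrt(1.0 - rho**2)  # first year drawn from the stationary distribution
    for t in range(1, n_years):
        x[t] = rho * x[t - 1] + eps[t]
    return x

def leading_pc(data, centre_rows):
    """Leading PC time series after subtracting each column's mean over centre_rows."""
    centred = data - data[centre_rows].mean(axis=0)
    u, s, _ = np.linalg.svd(centred, full_matrices=False)
    return u[:, 0] * s[0]

def offset_index(pc, calib_rows):
    """|calibration mean - full-period mean| in units of the full-period std."""
    return abs(pc[calib_rows].mean() - pc.mean()) / pc.std()

rng = np.random.default_rng(0)
n_years, n_series, n_calib = 581, 70, 79      # illustrative sizes, assumed here
noise = ar1_noise(n_years, n_series, rho=0.9, rng=rng)
calib = slice(n_years - n_calib, n_years)     # last 79 "years"
full = slice(0, n_years)

pc_full = leading_pc(noise, full)             # conventional centring
pc_short = leading_pc(noise, calib)           # centring on the calibration window only
print("offset index, full-period centring: ", offset_index(pc_full, calib))
print("offset index, calibration centring:", offset_index(pc_short, calib))
```

Repeating the run over many seeds and persistence values would show how often, under these assumptions, the short-centred PC has a markedly larger offset. A single run does not settle the dispute either way.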

 

I am not qualified to say much on 2. but it seems to be the crucial point. Both sets of authors agree that the omission of some early data changes the early reconstruction considerably. MBH say that the omitted data are reliable; MM say they are not. Does anyone know who is correct? If there is disagreement among experts, then the true behaviour of the series must be very uncertain.

 

Incidentally, I am not entirely convinced by MBH's dismissal of the MM model reconstruction on the basis of RE. I suspect that a lot of the difference is due to the much larger variance in the MM model reconstruction compared to MBH's. This is probably inevitable, given the reduced sample size for the early data.

 

Referee #2(Remarks to the Author):

The technical criticisms raised by McIntyre and McKitrick (MM) concerning the temperature reconstructions by Mann et al. (MBH98), and the reply to this criticism by Mann et al., are quite difficult to evaluate in a short period of time, since they are aimed at particular technical points of the statistical methods used by Mann et al., or at the use of particular time series of proxy data. A proper evaluation would require redoing most of the calculations presented in both manuscripts, something which is obviously out of reach in two weeks' time. Furthermore, both manuscripts seem to contradict each other in some basic facts.

 

Therefore, my comments are based on my impression of the consistency of the results presented, but there is a wide margin of uncertainty that could be resolved only by looking in detail into the whole data set and the whole software used by the authors.

 

In general terms I found the criticisms raised by McIntyre and McKitrick worthy of being taken seriously. They have made an in-depth analysis of the MBH reconstructions and they have found several technical errors that are only partially addressed in the reply by Mann et al.

 

1) Mann et al. assert that important features in the reconstruction by MM, for instance the increased warmth in the 15th century, are due to the fact that they completely ignore the time series from the NOAM tree ring data sets. However, MM explicitly state that they have used the two leading PCs of this data set. Of course, it is impossible to ascertain who is right and who is wrong on this particular point, but I feel that Mann et al. should have taken into account in their reply the statement by MM concerning the NOAM time series.

 

2) My doubts expressed in point 1 are strengthened by Figure 1b in Mann et al.'s reply. Mann et al. have tried to replicate in this figure the MM reconstruction (MM04c, green line). But one can clearly see that the variance of this reconstruction is much larger than that of the observations, even in the calibration period. Although it might be possible, it seems a very awkward result. Any linear regression method that I am aware of must produce a reconstructed predictand with less variance than the observed predictand (the rest of the variance being the residuals). It seems to me that something is technically not correct in the replication by Mann et al. of the MM reconstruction. If the main difference between the original MBH98 and the MM04c reconstruction is just the elimination of the NOAM data set, why is the variance of the reconstruction in the calibration period inflated by a factor of 2-3?

 

3) The reply by Mann et al. is in my opinion correct when requiring MM to present some validation statistics in a validation period, and the RE statistic seems to me adequate in this context. Since this is the main argument in Mann et al.'s reply, I would urge MM to address this criticism. The low value of the RE statistic in the replicated MM reconstruction (MM04c) indicated by Mann et al. seems to be due to the erroneous replication of the MM reconstruction. An inspection by eye of the MM04c reconstruction seems to indicate that both reconstructions, the replicated MM04c and the MBH98 method (1400-1500 model), are more or less equally correlated with the instrumental temperature in the calibration period, and that the low RE value of the MM04c reconstruction stems from its much larger variance. I think that this large variance is unrealistic and therefore the real RE value should be positive.

 

4) The MM reconstruction presented by MM (Fig. 4, bottom) should in principle be very similar to the replicated reconstruction MM04c, but it seems to me that it is not. For instance, in Fig. 4 I do not see this excess variance compared to the original MBH98 reconstruction. This suggests again that something is not correct in the calculation of MM04c by Mann et al.
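
The statement in point 2 about regression variance can be checked with a small synthetic example. The sketch below uses made-up data and ordinary least squares with an intercept, fitted directly to the predictand. The sizes and variable names are assumptions for illustration. Whether the same bound carries over to the calibration scheme actually used in the reconstructions is not something this sketch addresses.

```python
import numpy as np

rng = np.random.default_rng(1)
n_calib, n_proxies = 79, 5                      # illustrative sizes only
proxies = rng.standard_normal((n_calib, n_proxies))
temperature = proxies @ rng.standard_normal(n_proxies) + 2.0 * rng.standard_normal(n_calib)

# Ordinary least squares with an intercept column.
design = np.column_stack([np.ones(n_calib), proxies])
coef, *_ = np.linalg.lstsq(design, temperature, rcond=None)
fitted = design @ coef
residual = temperature - fitted

# Observed variance splits into fitted variance plus residual variance.
var_obs, var_fit, var_res = temperature.var(), fitted.var(), residual.var()
print(var_fit / var_obs)   # fraction of calibration variance captured by the fit
```

Because the residual variance cannot be negative, the fitted series cannot have more in-sample variance than the observations under these assumptions.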

In summary, my recommendation is that MM offer validation statistics for their reconstruction and that they make their original reconstruction (Fig. 4, bottom) available to MBH, so that these authors can also compute validation statistics with the original MM reconstruction. Furthermore, it should be explicitly cleared up whether or not MM are using the NOAM data set. Should this validation be successful, I would recommend the publication of both manuscripts. Further evaluation requires months of work and should be left, in my opinion, to the scientific community. This should be the normal scientific process.

 

At this stage, I think any Correction or Retraction by MBH98 is premature and really not required.

Referee Reports on Revised Submission

Referee #1(Remarks to the Author):
Review of the comment by McIntyre et al. and reply by Mann et al. submitted to Nature

It seems interesting that in the comment not only the original publication (MBH98), but also MBH99, a rebuttal by Mann et al. available from a CRU website (ref. 3), an "unreported MBH calculation" available from a University of Virginia website (ref. 5), another rebuttal (corrigendum) by Mann et al. now published in Nature (ref. 9), and a detailed critique of MBH by McIntyre and McKitrick published in Energy & Environment (ref. 14) are cited. Additionally, the paper by Jones and Mann published in Reviews of Geophysics (ref. 4, response) already touches on this issue.

Besides numerous technical and data-related issues, McIntyre and McKitrick also address a possible CO2 effect on southwestern US strip-bark trees that was "corrected" using high-latitude tree-ring data. Whether it was at all useful to use these data or to apply this correction, seems not highly relevant, since MBH never hid this issue, but described it in detail. More relevant and pleasing would be if someone would find a way to assess the possible CO2 fertilization effects that potentially influence growth at many sites. Additionally, the observation that some of the chronologies used in MBH98 and MBH99 have quite low sample replication during their early periods is also not new and was mentioned in a recent paper published in EOS.

To judge whether the criticism by McIntyre and McKitrick is valid would require downloading all data and applying the seemingly differing approaches. Further, judgments would be needed on methodological decisions that were made by both McIntyre and McKitrick and by Mann et al. as two possibilities within the whole spectrum of methodological decisions on which chronologies to use, the calibration and computation of PCs over different time periods, special treatments to series, and so on. It could be seen as interesting that the calculations as done by another operator with other, perhaps reasonable, alternative methodologies can have such a large effect on the resulting reconstruction.

Unfortunately, I have the impression that preconceived notions affect the potential "audit" by McIntyre and McKitrick. That would, of course, not mean that their assessment is necessarily wrong, but might explain the rather harsh and tricky wording used here and at other places by both parties, and I generally do not believe that this sort of an "audit" and rebuttal will lead to a better understanding of past climate variations.

Generally, I believe that the technical issues addressed in the comment and the reply are quite difficult to understand and not necessarily of interest to the wide readership of the Brief Communications section of Nature. I do not see a way to make this communication much clearer, particularly with the space requirements, as this comment is largely related to technical details. I also find it relevant that McIntyre and McKitrick already published a critique of MBH98 including some arguments similar to what is outlined in the current manuscript (ref. 14).

Referee #2(Remarks to the Author):
[SM Note: This is Referee #1 in Nature manuscript 2004-01-14277B]

My comments on the original versions in this exchange suggested that there was insufficient time to understand all the technicalities involved. This is even more the case with these revised versions, their supplementary material and the various replies to replies and comments on comments. The amount of material, often contradictory, is simply too complex and lengthy to resolve all the rights and wrongs in a realistic length of time. I can only give some general impressions and home in on two or three points of detail.

Regarding publication, I think it is all or nothing. Either you publish neither, or both. In the latter case, the main thing that would be achieved is to highlight that a serious disagreement exists. Only a reader with several days to spare (longer if they are unfamiliar with the area), to chase references and probably the authors, could hope to come close to a full understanding of the arguments.

I started my original review by saying that I found merit in the arguments of both MBH & MM. To rewrite this, I believe that some of the criticisms raised by each group of the other's work are valid, but not all. I am particularly unimpressed by the MBH style of 'shouting louder and longer so they must be right'.

I do not have the days of time needed to fully get to the bottom of the arguments, so I look briefly at just three here.

1. I think I understand better than before what the MBH98 PCA is doing, namely centering the data about the mean of the 1902-80 period rather than of the whole series. The question is why, and what properties and interpretation does such a procedure have? Given the non-stationarity of the series, it is certainly not successively maximising variance as in PCA, and talking about 'explained variance' therefore makes little sense. I don't feel I can comment on whether or not this procedure is appropriate without understanding its properties and interpretation.

2. Continuing this theme, the original MM article said that using MBH's PCA on 10 red noise simulations produced a 'hockey stick' (hs) shape in all 10. MBH's response says they have repeated the simulations and 'shown that the claimed result is not true'. It is very unlikely that 10 of MM's simulations all show the hs effect and MBH's do not, simply by chance. Either the two sets of simulations are constructed differently, or there is a mistake in someone's code. This is not something that a referee can resolve.


3. The advocacy of RE in preference to r by MBH is a bit extreme. The correlation coefficient certainly has drawbacks, but no verification measure is perfect, and I see no evidence in the verification literature (or Wilks) that RE is the standard preferred measure. Indeed the only one of the 3 references (7) cited in the revised response that was available to me is somewhat critical of RE. My preference would be not to rely on a single measure, but to look at contributions from bias, differences in variances and departures from linear dependence.
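
One way to look at the three contributions named in point 3 is the standard algebraic split of mean squared error into a squared-bias term, a term for the mismatch in standard deviations, and a term that vanishes only when the correlation is perfect. The sketch below is illustrative only: the function name `mse_decomposition` and the synthetic data are assumptions, and this is not presented as the method used by either group.

```python
import numpy as np

def mse_decomposition(estimate, observed):
    """Split mean squared error into bias, variance-mismatch and correlation terms."""
    e, o = np.asarray(estimate, float), np.asarray(observed, float)
    s_e, s_o = e.std(), o.std()              # population standard deviations (ddof=0)
    r = np.corrcoef(e, o)[0, 1]
    return {
        "bias": (e.mean() - o.mean()) ** 2,
        "variance": (s_e - s_o) ** 2,
        "correlation": 2.0 * s_e * s_o * (1.0 - r),
    }

rng = np.random.default_rng(2)
observed = rng.standard_normal(50)
estimate = 0.3 + 1.8 * observed + rng.standard_normal(50)   # offset, over-dispersed, noisy
parts = mse_decomposition(estimate, observed)
mse = np.mean((estimate - observed) ** 2)
print(parts, sum(parts.values()), mse)       # the three parts add up to the MSE
```

Reporting the three terms separately shows whether a poor score comes from an offset, from excess or deficient variance, or from weak linear association, which a single summary statistic cannot do.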

Referee #3(Remarks to the Author):
Comments on the manuscripts
Global-scale temperature patterns and climate forcings over the past centuries: a comment, by McIntyre and McKitrick (hereafter MM04) followed by comments on "Reply" by Mann, Bradley and Hughes (hereafter MBH04)

After going through both revised manuscripts and the accompanying, voluminous supplementary material, I now have a much clearer idea about the points of disagreement between the two manuscripts. I must confess that this has been one of the most difficult reviews that I have been confronted with. Both manuscripts and their supplementary material are dense, methodological and data quality questions are entangled, and both rely heavily on information in previously published papers that has to be scoured beforehand.

The comment by MM04 underlines two apparent errors in the original work of MBH98: the incorrect use of the Principal component methodology to reduce the dimensionality of the NOAMER tree ring data set, and the inclusion of a time series (Gaspe), that seems to be very influential on the final temperature reconstruction.

Considering the changes relative to the first version of MM04, it seems to me that the case presented by MM04 has weakened considerably. The main claim presented in MM04 is now that the main features of the MBH98 reconstruction (the hockey stick) derive from two methodological aspects. Now, no preeminence is given to the 16th century being warmer than the 20th century. MM04 have emulated the MBH98 reconstruction, improving the methodological aspects that they think are flawed, and arrive at another reconstruction that yields rather low values of RE as a verification statistic. They claim that the MBH98 reconstruction also has low values of another verification statistic (R2), and conclude that MBH98 is therefore on an equal footing with theirs. Accepting this line of reasoning, however, a reader of these manuscripts will be led to think that both reconstructions are untrustworthy (at least, I would not trust any reconstruction with such low values of verification statistics; see table 2 in the MM04 supplementary material). This sole conclusion seems to me rather weak for a manuscript. As a reader, I would rather see a more substantial contribution, such as an alternative reconstruction with a sound validation, that may offer some further interesting points: comparison with reconstructions, comparison with the different estimates of forcings, etc.

On the other hand, the RE statistic is one that is commonly accepted, not only by MBH98, but by a number of authors working in this field. To argue that other statistics, such as R2, may be more meaningful than RE requires in my opinion a strong justification, which is missing. MBH04 offers, furthermore, a plausible reason for the low values of verification R2 in MBH98 found by MM04. Although this is a crucial point in MM04, the authors do not provide enough information to assess whether these calculations have been performed properly, or for that matter how they have been performed. My own calculations of the validation statistics in the 19th century for the full proxy network, using the data available to me, tend to support the numbers indicated by MBH04, but this is of course a limited test.

This notwithstanding, I see some merit in MM04 and I would encourage them to pursue their testing of MBH98, and by the way other reconstructions. As I wrote in my first evaluation, this should be a normal and sound scientific process that should not be hampered. For instance, questions that seem to be quite critical, such as the sensitivity of the MBH98 reconstructions in more remote periods to changes or omissions in the proxy network, or the dependency of the final results on the rescaling of the reconstructed PCs, have become clearer to me now. From the reply in MBH04 I am now afraid that they were not sufficiently described in the original MBH98 work. In particular, the PC renormalization could have been included as a clarification in the recent Corrigendum in Nature by MBH.

At the moment, my opinion is that the present MM04 manuscript could be of interest only to the handful of specialists working exactly in the area of statistical methods for climate reconstructions, and this only after several hours of considerable work to understand all technical details properly. Perhaps this is caused by the tight constraints imposed on the Communications Arising category.

In summary, judging from the present version of the manuscript and the response by MBH04, I now think that the basis for MM04 has weakened and that further work, or further convincing evidence, would be needed to present a more solid case.

Just in case the editor decides to publish MM04, I would suggest reformulating the first half of the second paragraph, describing the calculation of principal components of the NOAMER data set in MBH98. The present version can hardly be understood, even by specialists. At the end of the manuscript, I would avoid the term "goodness of fit", since this has another meaning in the framework of standard linear regression methods (related to the linearity or non-linearity of the fit).

Comments on Reply MBH04

The reply by MBH04 to the previous comment by MM04 addresses, in my opinion, both points raised by MM04 in a convincing way. Although it is impossible for a reviewer to check all the technical details involved in this reply, the arguments used by MBH04 seem plausible, and I would say they are probably correct. This is of course no guarantee that the entirety of the MBH98 work and conclusions is free of error.

Therefore, if the editor decides to go ahead with the publication of MM04, I would recommend publishing MBH04 as well.

I would have some minor comments on this reply:

In general, in a scientific text I would rather avoid disqualifying formulations as much as possible (e.g. "demonstrate a lack of familiarity", "without merit"). This is of course a matter of taste, but I think that science and scientists benefit if the same thing is said in a neutral way.

On page 2, in the middle: RE=-1 is the average value for a random estimate. This is correct only when the estimate has the same variance as the true values. However, I think that a more useful benchmark value for RE would actually be zero, since a poor man's prediction using just climatology would yield zero in the case of a stationary process. Certainly, negative values of RE indicate quite poor skill.
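
These benchmark values can be illustrated with synthetic data. The sketch below assumes the common definition of RE as one minus the ratio of the estimate's squared error to the squared error of a constant forecast equal to the calibration-period mean. That definition, the function name `reduction_of_error` and all the numbers are assumptions for illustration. For an independent estimate with the same mean and variance as the truth, the expected squared error is twice the variance, which gives RE near -1. A larger variance pushes RE lower, and the climatological forecast gives exactly zero.

```python
import numpy as np

def reduction_of_error(estimate, observed, calib_mean):
    """RE = 1 - SSE(estimate) / SSE(constant calibration-mean forecast)."""
    estimate, observed = np.asarray(estimate, float), np.asarray(observed, float)
    sse = np.sum((observed - estimate) ** 2)
    sse_ref = np.sum((observed - calib_mean) ** 2)
    return 1.0 - sse / sse_ref

rng = np.random.default_rng(3)
n = 200_000                                    # large sample so the averages are stable
observed = rng.standard_normal(n)              # stationary series with true mean 0
calib_mean = 0.0                               # assumed equal to the true mean

random_guess = rng.standard_normal(n)          # independent, same mean and variance
inflated_guess = 2.0 * rng.standard_normal(n)  # independent, four times the variance
climatology = np.full(n, calib_mean)           # constant "poor man's" forecast

re_random = reduction_of_error(random_guess, observed, calib_mean)
re_inflated = reduction_of_error(inflated_guess, observed, calib_mean)
re_clim = reduction_of_error(climatology, observed, calib_mean)
print(re_random, re_inflated, re_clim)         # roughly -1, roughly -4, exactly 0
```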

Second to last paragraph: MBH04 refer to other published reconstructions to support the lack of 16th century warming. I think this reference is to some extent bent to match the authors' intentions. Some of these reconstructions are their own, and others (e.g. Esper et al.) show considerable disagreement with MBH98. In any case, the 16th century warmth has been dropped in this version of MM04.

Supplementary material: I have reviewed the original MBH98 publication and I could not find any description of the renormalisation of the reconstructed PCs. If I am correct, one could not reproach MM04 for not having included this step in their protocol. This renormalisation seems to me somewhat awkward. If I understood properly, this amounts to a statistical inflation, which would not yield the best estimates in the least-squares sense. I do not think that this invalidates the method, but some readers will perhaps be surprised to find out that the MBH98 reconstruction method included this step.
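
The general point about inflation can be illustrated on synthetic data. The sketch below assumes that "inflation" means rescaling the fitted anomalies so that their variance matches the observed variance. Whether this corresponds exactly to the renormalisation step in MBH98 is not established here, and the data and variable names are made up for illustration. It fits a one-predictor least-squares line, inflates the fitted values, and compares the in-sample mean squared errors.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 100
proxy = rng.standard_normal(n)
temperature = 0.6 * proxy + rng.standard_normal(n)

# Least-squares fit of temperature on the proxy (polyfit returns slope first).
slope, intercept = np.polyfit(proxy, temperature, 1)
fitted = intercept + slope * proxy

# "Inflation": rescale the fitted anomalies so their variance matches the observations.
scale = temperature.std() / fitted.std()
inflated = temperature.mean() + scale * (fitted - fitted.mean())

mse_fit = np.mean((temperature - fitted) ** 2)
mse_inflated = np.mean((temperature - inflated) ** 2)
print(scale, mse_fit, mse_inflated)
```

In this setting the scale factor is at least one and the inflated series cannot have a smaller in-sample error than the least-squares fit. The inflated series does match the observed variance, which is the usual motivation for such a step.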
