Referee Reports on First Submission
Referee #1 (Remarks to the Author):
I find merit in the arguments of both protagonists, though Mann et al. (MBH) is much more difficult to read than McIntyre & McKitrick (MM). MBH's explanations are (at least superficially) less clear and they cram too many things onto the same diagram, so I find it harder to judge whether I agree with them.
There are two main points of dispute:
1. The principal component technique used.
2. The quality of the early data.
I deal with point 1 first. It is an area where I have expertise, but it is not at all clear what exactly is being done. MM talk about 'scaling to 1902-1980 mean and standard deviation', whereas MBH's phrase is 'restandardisation by detrended standard deviation'. These suggest different things to me. The latter seems more appropriate than the former, but I am still uneasy about applying a standardisation based on a small segment of the series to the whole series, if that is what is being done. MBH seem to be too dismissive of MM's red noise simulations. Even if red noise is not the best model for the series, they should have reservations about a procedure that gives the 'hockey stick' shape for all 10 simulations, when such a shape would not be expected. Having said this, MM's corrected series in their Figure 4c still has the upward trend towards the end of the series, so this trend is not just an artifact of MBH's PCA procedure.
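The red-noise question can be explored with a small simulation. The sketch below is only an illustration under stated assumptions: AR(1) persistence with phi = 0.9, a network of 50 independent series of length 581, and a hockey-stick index (HSI) defined here as the calibration-segment mean minus the full-series mean, in full-series standard deviations, are all our choices, not MM's or MBH's settings. It compares the leading PC after centring and scaling on the whole series with the leading PC after centring and scaling on a final 79-step "calibration" segment, which is one possible reading of MM's phrase 'scaling to 1902-1980 mean and standard deviation'.

```python
import numpy as np


def ar1(rng, n, p, phi):
    """Return p independent AR(1) ('red noise') series of length n as an (n, p) array."""
    e = rng.normal(size=(n, p))
    x = np.empty((n, p))
    x[0] = e[0] / np.sqrt(1.0 - phi ** 2)  # start in the stationary distribution
    for t in range(1, n):
        x[t] = phi * x[t - 1] + e[t]
    return x


def pc1(X, ref):
    """Leading PC time series after centring and scaling each column on the rows in ref."""
    Z = (X - X[ref].mean(axis=0)) / X[ref].std(axis=0)
    U, s, _ = np.linalg.svd(Z, full_matrices=False)
    return U[:, 0] * s[0]


def hockey_stick_index(series, cal):
    """Hypothetical index: calibration-segment mean minus full mean, in full-series SDs."""
    return (series[cal].mean() - series.mean()) / series.std()


rng = np.random.default_rng(0)
n, p, phi = 581, 50, 0.9   # toy stand-ins: 581 "years" (1400-1980), 50 proxies
whole = slice(0, n)        # conventional PCA: centre and scale on the whole series
cal = slice(n - 79, n)     # last 79 rows, standing in for 1902-1980

hsi_whole, hsi_cal = [], []
for _ in range(10):
    X = ar1(rng, n, p, phi)
    # The sign of a PC is arbitrary, so compare magnitudes.
    hsi_whole.append(abs(hockey_stick_index(pc1(X, whole), cal)))
    hsi_cal.append(abs(hockey_stick_index(pc1(X, cal), cal)))

print("mean |HSI|, whole-series scaling:      ", np.mean(hsi_whole))
print("mean |HSI|, calibration-period scaling:", np.mean(hsi_cal))
```

On pure red noise the calibration-scaled PC1 tends to show a much larger |HSI|: any series whose final segment happens to sit away from its long-term mean acquires a step that the leading PC collects. A toy run like this cannot say how MM's or MBH's own simulations were constructed.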
I am not qualified to say much on point 2, but it seems to be the crucial point. Both sets of authors agree that the omission of some early data changes the early reconstruction considerably. MBH say that the omitted data are reliable; MM say they are not. Does anyone know who is correct? If there is disagreement among experts, then the true behaviour of the series must be very uncertain.
Incidentally, I am not entirely convinced by MBH's dismissal of the MM model reconstruction on the basis of RE. I suspect that much of the difference is due to the much larger variance of the MM model reconstruction compared with MBH's. This is probably inevitable, given the reduced sample size for the early data.
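The suspicion that RE is sensitive to variance can be illustrated with a toy example. RE is computed in its usual form: 1 minus the verification-period sum of squared errors, divided by the sum of squared departures from the calibration-period mean. The series, noise levels, calibration mean of zero and inflation factor of 2.5 are all assumptions for illustration. Stretching an estimate about its own mean leaves its correlation with the observations unchanged but lowers RE.

```python
import numpy as np


def reduction_of_error(obs, est, cal_mean):
    """RE = 1 - sum((obs - est)^2) / sum((obs - cal_mean)^2) over the verification period."""
    return 1.0 - np.sum((obs - est) ** 2) / np.sum((obs - cal_mean) ** 2)


rng = np.random.default_rng(1)
t = np.arange(100)
obs = 0.3 * np.sin(2 * np.pi * t / 25) + rng.normal(scale=0.1, size=t.size)
est = obs + rng.normal(scale=0.1, size=t.size)   # a fairly skilful toy estimate
cal_mean = 0.0                                   # assumed calibration-period mean

# Same shape, 2.5 times the spread: the correlation is unchanged, the errors are not.
inflated = est.mean() + 2.5 * (est - est.mean())

for name, e in [("original", est), ("inflated", inflated)]:
    r = np.corrcoef(obs, e)[0, 1]
    print(f"{name}: r = {r:.3f}, RE = {reduction_of_error(obs, e, cal_mean):.3f}")
```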
Referee #2 (Remarks to the Author):
The technical criticisms raised by McIntyre and McKitrick (MM) concerning the temperature reconstructions by Mann et al. (MBH98), and the reply to this criticism by Mann et al., are quite difficult to evaluate in a short period of time, since they are aimed at particular technical points of the statistical methods used by Mann et al., or at the use of particular proxy time series. A proper evaluation would require redoing most of the calculations presented in both manuscripts, something which is obviously out of reach in two weeks. Furthermore, the two manuscripts seem to contradict each other on some basic facts.
Therefore, my comments are based on my impression of the consistency of the results presented, but there is a wide margin of uncertainty that could be resolved only by looking in detail at the whole data set and the whole software used by the authors.
In general terms I found the criticisms raised by McIntyre and McKitrick worthy of being taken seriously. They have made an in-depth analysis of the MBH reconstructions and they have found several technical errors that are only partially addressed in the reply by Mann et al.
1) Mann et al. assert that important features in the reconstruction by MM, for instance the increased warmth in the 15th century, are due to the fact that MM completely ignore the time series from the NOAM tree-ring data set. However, MM explicitly state that they have used the two leading PCs of this data set. Of course, it is impossible to ascertain who is right and who is wrong on this particular point, but I feel that Mann et al. should have taken into account in their reply the statement by MM concerning the NOAM time series.
2) My doubts expressed in point 1 are strengthened by Figure 1b in the Mann et al. reply. In this figure Mann et al. have tried to replicate the MM reconstruction (MM04c, green line). But one can clearly see that the variance of this reconstruction is much larger than that of the observations, even in the calibration period. Although this might be possible, it seems a very awkward result. Any linear regression method that I am aware of must produce a reconstructed predictand with less variance than the observed predictand (the rest of the variance being in the residuals). It seems to me that something is technically not correct in Mann et al.'s replication of the MM reconstruction. If the main difference between the original MBH98 and the MM04c reconstruction is just the elimination of the NOAM data set, why is the variance of the reconstruction in the calibration period inflated by a factor of 2-3?
3) In my opinion the reply by Mann et al. is correct in requiring MM to present some validation statistics in a validation period, and the RE statistic seems to me adequate in this context. Since this is the main argument in the Mann et al. reply, I would urge MM to address this criticism. The low value of the RE statistic in the replicated MM reconstruction (MM04c) indicated by Mann et al. seems to be due to the erroneous replication of the MM reconstruction. An inspection by eye of the MM04c reconstruction seems to indicate that both reconstructions, the replicated MM04c and the MBH98 method (1400-1500 model), are more or less equally correlated with the instrumental temperature in the calibration period, and that the low RE value of the MM04c reconstruction stems from its much larger variance. I think that this large variance is unrealistic and therefore the real RE value should be positive.
4) The MM reconstruction presented by MM (Fig. 4, bottom) should in principle be very similar to the replicated reconstruction MM04c, but it seems to me that they are not. For instance, in Fig. 4 I do not see this excess variance compared with the original MBH98 reconstruction. This suggests again that something is not correct in the calculation of MM04c by Mann et al.
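The regression property appealed to in point 2 can be checked directly. This is a minimal sketch with ordinary least squares (with an intercept) on synthetic data; the predictors, coefficients and sample size are arbitrary assumptions, and MBH98's actual calibration scheme is not reproduced.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 79                                   # toy calibration length
X = rng.normal(size=(n, 3))              # hypothetical proxy predictors
y = X @ np.array([0.5, -0.3, 0.2]) + rng.normal(scale=0.8, size=n)  # toy temperature

A = np.column_stack([np.ones(n), X])     # design matrix with an intercept
beta, *_ = np.linalg.lstsq(A, y, rcond=None)
y_hat = A @ beta                         # in-sample fitted values
resid = y - y_hat

# With an intercept, the residuals are orthogonal to the fit, so the variances add up.
print("var(y)                :", np.var(y))
print("var(y_hat)            :", np.var(y_hat))
print("var(y_hat) + var(res) :", np.var(y_hat) + np.var(resid))
```

The decomposition holds in the calibration sample for least squares with an intercept; out of sample, or after any rescaling of the fitted values, the reconstruction's variance is no longer bounded in this way.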
In summary, my recommendation is that MM offer validation statistics for their reconstruction and that they make their original reconstruction (Fig. 4, bottom) available to MBH, so that these authors can also compute validation statistics with the original MM reconstruction. Furthermore, it should be explicitly clarified whether or not MM are using the NOAM data set. Should this validation be successful, I would recommend the publication of both manuscripts. Further evaluation requires months of work and should, in my opinion, be left to the scientific community. This should be the normal scientific process.
At this stage, I think any Correction or Retraction of MBH98 is premature and really not required.
Referee #1 (Remarks to the Author):
Review of the comment by McIntyre et al. and reply by Mann et al. submitted to Nature
It seems interesting that the comment cites not only the original publication (MBH98) but also MBH99, a rebuttal by Mann et al. available from a CRU website (ref. 3), an "unreported MBH calculation" available from a University of Virginia website (ref. 5), another rebuttal (corrigendum) by Mann et al. now published in Nature (ref. 9), and a detailed critique of MBH by McIntyre and McKitrick published in Energy & Environment (ref. 14). Additionally, the paper by Jones and Mann published in Reviews of Geophysics (ref. 4, response) already touches on this issue.
Besides numerous technical and data-related issues, McIntyre and McKitrick also address a possible CO2 effect on southwestern US strip-bark trees that was "corrected" using high-latitude tree-ring data. Whether it was at all useful to use these data or to apply this correction seems not highly relevant, since MBH never hid this issue but described it in detail. More relevant and welcome would be a way to assess the possible CO2 fertilization effects that potentially influence growth at many sites. Additionally, the observation that some of the chronologies used in MBH98 and MBH99 have quite low sample replication during their early periods is also not new and was mentioned in a recent paper published in EOS.
To judge whether the criticism by McIntyre and McKitrick is valid would require downloading all the data and applying the seemingly differing approaches. Further, judgments would be needed on the methodological decisions made by McIntyre and McKitrick and by Mann et al., as two possibilities within the whole spectrum of methodological decisions on which chronologies to use, the calibration and computation of PCs over different time periods, special treatments of series, and so on. It could be seen as interesting that calculations done by another operator with other, perhaps reasonable, alternative methodologies can have such a large effect on the resulting reconstruction.
Unfortunately, I have the impression that preconceived notions affect the potential "audit" by McIntyre and McKitrick. That would, of course, not mean that their assessment is necessarily wrong, but it might explain the rather harsh and tricky wording used here and elsewhere by both parties, and I generally do not believe that this sort of "audit" and rebuttal will lead to a better understanding of past climate variations.
Generally, I believe that the technical issues addressed in the comment and the reply are quite difficult to understand and not necessarily of interest to the wide readership of the Brief Communications section of Nature. I do not see a way to make this communication much clearer, particularly given the space requirements, as this comment is largely related to technical details.
I also find it relevant that McIntyre and McKitrick have already published a critique of MBH98 including some arguments similar to those outlined in the current manuscript (ref. 14).
Referee #2 (Remarks to the Author):
[SM Note: This is Referee #1 in Nature manuscript 2004-01-14277B]
My comments on the original versions in this exchange suggested that there was insufficient time to understand all the technicalities involved. This is even more the case with these revised versions, their supplementary material and the various replies to replies and comments on comments. The amount of material, often contradictory, is simply too complex and lengthy to resolve all the rights and wrongs in a realistic length of time. I can only give some general impressions and home in on two or three points of detail.
Regarding publication, I think it is all or nothing. Either you publish neither, or both. In the latter case, the main thing that would be achieved is to highlight that a serious disagreement exists. Only a reader with several days to spare (longer if they are unfamiliar with the area), to chase references and probably the authors, could hope to come close to a full understanding of the arguments.
I started my original review by saying that I found merit in the arguments of both MBH & MM. To restate this, I believe that some of the criticisms raised by each group of the other's work are valid, but not all. I am particularly unimpressed by the MBH style of 'shouting louder and longer so they must be right'.
I do not have the days needed to get fully to the bottom of the arguments, so I look briefly at just three here.
1. I think I understand better than before what the MBH98 PCA is doing, namely centering the data about the mean of the 1902-80 period rather than of the whole series. The question is why, and what properties and interpretation does such a procedure have? Given the non-stationarity of the series, it is certainly not successively maximising variance as in PCA, and talking about 'explained variance' therefore makes little sense. I don't feel I can comment on whether or not this procedure is appropriate without understanding its properties and interpretation.
2. Continuing this theme, the original MM article said that using MBH's PCA on 10 red noise simulations produced a 'hockey stick' (hs) shape in all 10. MBH's response says they have repeated the simulations and 'shown that the claimed result is not true'. It is very unlikely that all 10 of MM's simulations show the hs effect and MBH's do not, simply by chance. Either the two sets of simulations are constructed differently, or there is a mistake in someone's code. This is not something that a referee can resolve.
3. The advocacy of RE in preference to r by MBH is a bit extreme. The correlation coefficient certainly has drawbacks, but no verification measure is perfect, and I see no evidence in the verification literature (or Wilks) that RE is the standard preferred measure. Indeed, the only one of the 3 references (7) cited in the revised response that was available to me is somewhat critical of RE. My preference would be not to rely on a single measure, but to look at contributions from bias, differences in variances and departures from linear dependence.
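The three contributions named in point 3 correspond to a standard identity for the mean squared error. With population standard deviations s_e and s_o of the estimate and observations and their correlation r, MSE = (mean_e - mean_o)^2 + (s_e - s_o)^2 + 2 s_e s_o (1 - r). A minimal sketch on synthetic data (the data and the bias and scale factors are assumptions for illustration):

```python
import numpy as np


def mse_decomposition(obs, est):
    """Split the mean squared error into bias, variance-mismatch and correlation terms.

    Population (ddof=0) statistics are used so that the three terms add up
    exactly to the mean squared error.
    """
    bias_term = (est.mean() - obs.mean()) ** 2
    s_o, s_e = obs.std(), est.std()
    r = np.corrcoef(obs, est)[0, 1]
    variance_term = (s_e - s_o) ** 2
    correlation_term = 2.0 * s_e * s_o * (1.0 - r)
    return bias_term, variance_term, correlation_term


rng = np.random.default_rng(3)
obs = rng.normal(size=50)                                # toy verification data
est = 0.2 + 1.5 * obs + rng.normal(scale=0.5, size=50)   # biased, over-dispersed estimate

parts = mse_decomposition(obs, est)
mse = np.mean((obs - est) ** 2)
print("bias, variance, correlation terms:", parts)
print("sum of terms:", sum(parts), " MSE:", mse)
```

Reporting the three terms separately shows whether a poor score comes from an offset, a spread mismatch or weak co-variation, which a single number such as RE or r cannot distinguish.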
Referee #3 (Remarks to the Author):
Comments on the manuscripts
"Global-scale temperature patterns and climate forcings over the past centuries: a comment", by McIntyre and McKitrick (hereafter MM04), followed by comments on the "Reply" by Mann, Bradley and Hughes (hereafter MBH04)
After going through both revised manuscripts and the accompanying, voluminous supplementary material, I now have a much clearer idea of the points of disagreement between the two manuscripts. I must confess that this has been one of the most difficult reviews that I have been confronted with. Both manuscripts plus supplementary material are dense; methodological questions and data quality questions are entangled, and both rely heavily on previous information in published papers that has to be scoured beforehand.
The comment by MM04 underlines two apparent errors in the original work of MBH98: the incorrect use of the principal component methodology to reduce the dimensionality of the NOAMER tree-ring data set, and the inclusion of a time series (Gaspe) that seems to be very influential on the final temperature reconstruction.
Considering the changes relative to the first version of MM04, it seems to me that the case presented by MM04 has weakened considerably. The main claim presented in MM04 is now that the main features of the MBH98 reconstruction (the hockey stick) derive from two methodological aspects. No preeminence is now given to the 16th century being warmer than the 20th century. MM04 have emulated the MBH98 reconstruction, improving the methodological aspects that they think are flawed, and arrive at another reconstruction that yields rather low values of RE as a verification statistic. They claim that the MBH98 reconstruction also has low values of another verification statistic (R2), and therefore conclude that MBH98 is on an equal footing with theirs.
Accepting this line of reasoning, however, a reader of these manuscripts will be led to think that neither reconstruction is trustworthy (at least, I would not trust any reconstruction with such low values of verification statistics; see Table 2 in the MM04 supplementary material). This conclusion alone seems to me rather weak for a manuscript. As a reader, I would rather see a more substantial contribution, such as an alternative reconstruction with a sound validation, that may offer some further interesting points: comparison with other reconstructions, comparison with the different estimates of forcings, etc.
On the other hand, the RE statistic is one that is commonly accepted, not only by MBH98 but by a number of authors working in this field. To argue that other statistics, such as R2, may be more meaningful than RE requires, in my opinion, a strong justification, which is missing. MBH04 offers, furthermore, a plausible reason for the low values of verification R2 in MBH98 found by MM04. Although this is a crucial point in MM04, the authors do not provide enough information to assess whether these calculations have been performed properly, or for that matter how they have been performed. My own calculations of the validation statistics in the 19th century for the full proxy network, with the data available to me, tend to support the numbers indicated by MBH04, but this is of course a limited test.
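How RE and verification R2 can disagree is easy to show with a constructed example. This is purely illustrative and is not necessarily the reason MBH04 give: it assumes a calibration mean of zero and a verification period that sits one unit lower. An estimate that captures this shift in level but none of the year-to-year variation earns a clearly positive RE while its r^2 is zero, because RE measures skill relative to the calibration mean whereas r^2 only measures co-variation within the verification period.

```python
import numpy as np


def reduction_of_error(obs, est, cal_mean):
    """RE over the verification period, relative to the calibration-period mean."""
    return 1.0 - np.sum((obs - est) ** 2) / np.sum((obs - cal_mean) ** 2)


n = 100
t = 2.0 * np.pi * np.arange(n) / n
cal_mean = 0.0                   # assumed calibration-period mean
obs = -1.0 + np.sin(t)           # verification period sits one unit below it
est = -1.0 + 0.1 * np.cos(t)     # captures the shift, not the year-to-year wiggles

re = reduction_of_error(obs, est, cal_mean)
r2 = np.corrcoef(obs, est)[0, 1] ** 2
print(f"RE = {re:.3f}, r^2 = {r2:.3f}")  # prints: RE = 0.663, r^2 = 0.000
```

Over whole cycles the sine and cosine are orthogonal, so r^2 is zero, while RE = 1 - 50.5/150.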
This notwithstanding, I see some merit in MM04 and I would encourage them to pursue their testing of MBH98, and for that matter of other reconstructions. As I wrote in my first evaluation, this should be a normal and sound scientific process that should not be hampered. For instance, questions that seem to be quite critical, such as the sensitivity of the MBH98 reconstructions in more remote periods to changes or omissions in the proxy network, or the dependence of the final results on the rescaling of the reconstructed PCs, have now become clearer to me. From the reply in MBH04, I am now afraid that they were not sufficiently described in the original MBH98 work. In particular, the PC renormalisation could have been included as a clarification in the recent Corrigendum in Nature by MBH.
At the moment, my opinion is that the present MM04 manuscript could be of interest only to the small group of specialists working in exactly the area of statistical methods for climate reconstructions, and this only after several hours of considerable work to understand all the technical details properly. Perhaps this is caused by the tight constraints imposed on the Communications Arising category.
In summary, judging from the present version of the manuscript and the response by MBH04, I now think that the basis for MM04 has wavered and that further work, or further convincing evidence, would be needed to present a more solid case.
Just in case the editor decides to publish MM04, I would suggest reformulating the first half of the second paragraph, describing the calculation of principal components of the NOAMER data set in MBH98. The present version can hardly be understood, even by specialists. At the end of the manuscript, I would avoid the term "goodness of fit", since this has another meaning in the framework of standard linear regression methods (related to the linearity or non-linearity of the fit).
Comments on Reply MBH04
The reply by MBH04 to the previous comment by MM04 addresses, in my opinion, both points raised by MM04 in a convincing way. Although it is impossible for a reviewer to check all the technical details involved in this reply, the arguments used by MBH04 seem plausible, and I would say they are probably correct. This is of course no guarantee that the entirety of the MBH98 work and conclusions is free of error.
Therefore, if the editor decides to go ahead with the publication of MM04, I would recommend publishing MBH04 as well.
I have some minor comments on this reply:
In general, in a scientific text I would rather avoid as much as possible disqualifying formulations (e.g. "demonstrate a lack of familiarity", "without merit"). This is of course a matter of taste, but I think that science and scientists benefit if the same thing is said in a neutral way.
On page 2, in the middle: "RE = -1 is the average value for a random estimate". This is correct only when the estimate has the same variance as the true values. However, I think that a more useful benchmark value for RE would actually be zero, since a poor man's prediction using just climatology would yield zero in the case of a stationary process. Certainly, negative values of RE indicate quite poor skill.
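Both benchmarks can be checked with a small Monte Carlo sketch. It assumes a stationary, independent Gaussian toy process with unit variance and a known calibration mean of zero; these are our assumptions, not MBH04's setup. A random estimate with the same variance as the truth averages RE near -1, a random estimate with a quarter of the variance averages near -0.25, and predicting climatology (the calibration mean) gives exactly zero.

```python
import numpy as np


def reduction_of_error(obs, est, cal_mean):
    """RE over the verification period, relative to the calibration-period mean."""
    return 1.0 - np.sum((obs - est) ** 2) / np.sum((obs - cal_mean) ** 2)


rng = np.random.default_rng(4)
n, trials = 100, 2000
cal_mean = 0.0   # stationary toy process: the calibration mean is the true mean

re_same, re_quarter, re_clim = [], [], []
for _ in range(trials):
    obs = rng.normal(size=n)                                    # "true" values, variance 1
    re_same.append(reduction_of_error(obs, rng.normal(size=n), cal_mean))
    re_quarter.append(reduction_of_error(obs, rng.normal(scale=0.5, size=n), cal_mean))
    re_clim.append(reduction_of_error(obs, np.full(n, cal_mean), cal_mean))

print("random estimate, same variance:   ", np.mean(re_same))     # close to -1
print("random estimate, quarter variance:", np.mean(re_quarter))  # close to -0.25
print("climatology:                      ", np.mean(re_clim))     # exactly 0
```

For an independent random estimate with variance v (true variance 1), the expected RE is roughly -v, so the -1 figure depends on the variance match, as the comment notes.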
Second-to-last paragraph: MBH04 refer to other published reconstructions to support the lack of 16th-century warming. I think this reference is to some extent bent to match the authors' intentions. Some of these reconstructions are their own, and others (e.g. Esper et al.) show considerable disagreement with MBH98. In any case, the 16th-century warmth has been dropped in this version of MM04.
Supplementary material: I have re-read the original MBH98 publication and I could not find any description of the renormalisation of the reconstructed PCs. If I am correct, MM04 cannot be blamed for not having included this step in their protocol. This renormalisation seems to me somewhat awkward. If I understood properly, it amounts to a statistical inflation, which would not yield the best estimates in the least-squares sense. I do not think that this invalidates the method, but some readers will perhaps be surprised to find out that the MBH98 reconstruction method included this step.
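Why such an inflation cannot improve the least-squares fit can be sketched in a few lines. This assumes a single-predictor OLS calibration on synthetic data and a simple rescaling of the fitted series to the target's standard deviation; it is a hypothetical stand-in, not a reproduction of MBH98's actual PC renormalisation.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 79
x = rng.normal(size=n)                        # toy predictor, e.g. a reconstructed PC
y = 0.6 * x + rng.normal(scale=0.8, size=n)   # toy target

A = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(A, y, rcond=None)
y_ols = A @ beta                              # least-squares fit

# Hypothetical renormalisation: stretch the fit to the target's standard deviation.
y_renorm = y.mean() + (y_ols - y_ols.mean()) * (y.std() / y_ols.std())

mse_ols = np.mean((y - y_ols) ** 2)
mse_renorm = np.mean((y - y_renorm) ** 2)
print("std (target, OLS, renormalised):", y.std(), y_ols.std(), y_renorm.std())
print("MSE (OLS, renormalised):", mse_ols, mse_renorm)
```

The rescaled series is still a linear function of x, so it is one of the candidates least squares already minimised over; its calibration MSE can only be equal or larger. What it buys is a reconstruction whose variance matches the observations.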