##README FOR MCINTYRE AND MCKITRICK 2003 (MM03) SCRIPTS

Detailed information was archived in October 2003; this summary readme was prepared in April 2006. Since MM03 was published, much new information regarding MBH98 data and methodology has come to light. Some key issues:

In April 2003, I requested the ftp location for the data used in MBH98. Mann first said that he had forgotten where the data was. Subsequently Scott Rutherford identified ftp://holocene.evsc.virginia.edu/pub/sdr/pcproxy.txt as the location of the data. The data set called here (presently archived at "http://www.climate2003.com/data/MM03/pcproxy.txt") was downloaded in April 2003 from Mann's ftp site from the url "ftp://holocene.evsc.virginia.edu/pub/sdr/pcproxy.txt". Correspondence regarding this file is located at http://climate2003.com/file.issues.htm.

Problems with this data set were noticed during the summer, and in September 2003 confirmation was sought from Mann that this was the data used in MBH98. Mann said that he was too busy to reply to this question and that von Storch and Zorita had been able to replicate his results.

After the publication of MM03, Mann said that this data set was the "wrong" data set. In early November 2003, this data set was deleted from Mann's FTP site. Mann said that the correct data was located in ftp://holocene.evsc.virginia.edu/pub/MBH98, a directory to which no prior reference had ever been made at Mann's website. Considerable controversy ensued, which has been described elsewhere.

Other than principal component series, all data in the original pcproxy.txt file could be traced to the new directory in identical format. In fact, a number of important new inconsistencies were identified between the original SI at Nature (now also deleted) and the data available at the newly disclosed FTP directory. These were reported to Nature in November 2003 in a Materials Complaint.

For the principal component series, once the MBH98 directory became available in November 2003, we were able to diagnose all the defects in the PC series in the pcproxy.txt file. There were 4 different problems:

1) The PC series had not been calculated with a principal components algorithm. Instead, Mann had short-centered the data (centering on a short period), which had the effect of "mining" for hockey stick shaped series. We eventually described this in McIntyre and McKitrick GRL 2005, although we identified the problem as early as December 2003, from some remnant Fortran code at Mann's FTP site.

2) In MM03, we noted that MBH did not describe how they handled missing data, which was a real problem in the tree ring networks. We calculated PC series over the maximum available segments. This methodology was consistent with the practices of dendrochronologists, e.g. Jacoby and d'Arrigo. However, MBH again used an unusual methodology. Instead of calculating PC series over the maximum available period, they did "stepwise" calculations, which were reported for the first time in the wake of MM03. However, the number of retained PC series in each step was not disclosed. In addition, PC series were not calculated for each of the 11 calculation steps in MBH98, but erratically. To complicate matters further, Mann said that a total of 159 different series needed to be used - a figure nowhere mentioned in MBH98, which referred only to 112 series (as in pcproxy.txt). Mann refused to provide information on the schedule of retained PC series. We speculated on retention policies in our 2004 emulations, but an accurate schedule only became available with the MBH98 Corrigendum in July 2004, where an extensive new SI was archived.

3) The PC series in pcproxy.txt were incorrectly collated so that many PC series were inserted one year too early.
Where there were blank values in 1980, the values were overwritten from left to right so that, for example, 7 Stahle.SWM PC series had identical 1980 values to 7 decimal places!

There has been controversy ever since 2003 about the PC series. Calculating PC series over the maximum available period resulted in downstream MM03 regression calculations in the 15th century not using the North American PC1. This was quickly identified by Mann et al., who accused us of "throwing out" data - an accusation which continues to this day. Obviously no data was "thrown out". The extreme sensitivity of MBH98 results to the presence/absence of the North American PC1 should not have occurred anyway, given MBH claims that their reconstruction was "robust" to the presence/absence of ALL dendroclimatic indicators.

This extreme sensitivity led us to closely examine what was happening with the North American PC calculations, where we subsequently identified the extreme bias in MBH methodology - see point (1) above - which led to the mining of the data set for hockey stick shaped series. In this case these happened to be bristlecone pines, known to be problematic. Indeed, all the heavily weighted series proved to have been collected by Donald Graybill to illustrate CO2 fertilization. However, this lay in the future.

Once we identified the problematic data, we re-collated all the tree ring networks (except the Vaganov network, where source data could not be identified) and carried out fresh principal components calculations using a standard algorithm in R - princomp.

MBH98 also used an odd multivariate methodology. The MBH98 emulation used in MM03 is an early edition and did not contain some unreported re-scaling steps, but it did replicate a HS-shaped series using the pcproxy.txt data and yielded different results with updated data and a conventional PC algorithm. A linear algebra viewpoint was adopted; a highly similar approach was later taken by Wahl and Ammann in their emulation of MBH.
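
The centering issue in point (1) above can be illustrated with a toy calculation. This is only a sketch under stated assumptions: it is written in Python rather than the R used by the MM03 scripts, the function name `centered_sum_of_squares` and both toy series are hypothetical, and it is not taken from either the MBH98 Fortran code or our scripts. It compares the sum of squared deviations of a series about its full-period mean (conventional centering) with the sum of squares about the mean of only its last few values (short-centering). A PC calculation apportions weight according to this kind of sum of squares, so a series whose late-period mean departs from its long-term mean is inflated under short-centering.

```python
from statistics import fmean


def centered_sum_of_squares(series, center_window=None):
    """Sum of squared deviations of `series` about a centering mean.

    center_window=None -> center on the full-period mean (conventional).
    center_window=k    -> center on the mean of the last k values only
                          (short-centering; k is a hypothetical parameter).
    """
    reference = series if center_window is None else series[-center_window:]
    mu = fmean(reference)
    return sum((x - mu) ** 2 for x in series)


# Two hypothetical toy series of equal length (not real proxy data).
flat = [1.0, -1.0] * 5           # noisy but trendless
stick = [0.0] * 8 + [4.0, 4.0]   # flat shaft with a late upturn

for name, series in (("flat", flat), ("stick", stick)):
    full = centered_sum_of_squares(series)
    short = centered_sum_of_squares(series, center_window=2)
    print(f"{name}: full-centered={full:.1f} short-centered={short:.1f}")
```

For these toy inputs the trendless series has the same sum of squares (10) under both centerings, while the series with the late upturn rises from 25.6 to 128. That fivefold inflation is the kind of effect that lets such series dominate the first PC when the data are short-centered.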

Burger and Cubasch's view on re-scaling alternatives is that you can get many different results using equally plausible variations on MBH and that you can't choose a "right" method of re-scaling based on verification statistics - else your verification becomes part of your calibration. The controversy is ongoing.

Scripts
_______

The scripts for MM03 were archived in October 2003 to show methodology. In parts of the script, there are references to my own directories and the scripts were not turnkey for public users. In MM05a [GRL] we made an effort to create turnkey scripts, although there were still some directory labels that needed to be tweaked to be operable by public users. I've edited the MM03 scripts in April 2006 primarily to reflect directory re-organizations, but occasionally to add some clarity. No calculations have been altered. The original scripts are retained with suffix *.old.txt. My current programming style is hopefully cleaner.

redo.mann.txt - this is the main script to produce results. It calls read.mann.txt and rpc.function.txt.

rpc.function.txt - functions used in reconstruction. This is the first emulation, which has been streamlined since. However, the structure of the emulation has remained unchanged: regression of proxies against temperature PCs to obtain calibration coefficients; regression in historical period to obtain reconstructed PCs. Later versions implement a strictly linear algebra approach.

pcproxy.calculations.txt - this calculates PC series from freshly collated tree ring series and compares explained variance from PC series in pcproxy.txt to potential explained variance from proper PC calculations. These results were the clue to problems with the MBH98 PC calculations. Now we know exactly what MBH did. These calculations established the existence of a problem (now recognized as such) and served their purpose at the time.

read.mann.txt - collates MBH information from public archives. Older version collects more.
Only data used in MM03 is collected here.

read.jones.txt - collates CRU temperature data (also NDP-020 data, which is not used in MM03). This is used only to obtain gridcell standard deviations.

Data
____

pcproxy.txt - the original MBH98 proxy data set formerly archived at ftp://holocene.evsc.virginia.edu/pub/sdr/pcproxy.txt and used in all calculations here. See discussion above.

proxy4.txt - updated version using newer data and refreshed PC calculations over maximum available period.

prname.txt - a file of 112 proxy names.

mwpproxy.txt - a file of proxy series used in MBH99, collated from UMass archive (mirrored at WDCP).

tree.stahle.texoke.data.txt - a file of tree ring series collated from descriptions in original SI. There are a few differences between this data file and the versions used in MBH98 archived at UVA.

tree.stahle.texmex.data.txt - a file of tree ring series collated from descriptions in original SI. There are a few differences between this data file and the versions used in MBH98 archived at UVA. Unaccountably the UVA calculations use 22 series (20 here), noted in Corrigendum. It appears that near-duplicate versions of two pairs exist at UVA.

tree.mbh98.data.txt - this has 231 series collated to match identifications at original SI. The UVA version had only 212 series; one series at UVA could not be located at WDCP. The discrepancies were noted in the Corrigendum.

tree.southamerica.data.txt - again this was collated to match identifiers at original SI. The UVA version had fewer series and again this is noted in the Corrigendum.

tree.australia.data.txt - as above.

tree.mbh99.data.txt - collation of North American sites back to AD1000.

weights.txt - this is the list of weights emailed by Scott Rutherford. Also at UVA under multiproxy.inf (but I'm blocked).

OLD
___

Original scripts have *.old.txt or *.original.txt suffix.

Apr 24, 2006