###========================================================================================================### ### BEGIN FUNCTION AND VARIABLE DEFINITIONS ### ###========================================================================================================### #################################### ### SEGMENT: PLOTTING FUNCTIONS ### #################################### ### ### 1. setColors: Generates a color map scale using a supplied set of anchor ### colors. Called by plotMap(). ### ### cols: A set of anchor colors. ### ### divs: The number of different colors between each anchor color. Allowed ### range is 0 to 255 ### ######################################################################################### ### ### 2. mapParams: Sets up projection parameters for plotMap(). ### ### database: Database to use in the maps package ### ### boundary: (Logical) Draw boundaries? ### ### orientation: Vector of lat, lon, and rotation for centering the map ### ### fill: (Logical) Fill polygon interiors? ### ### col: Color for polygon outlines/fill; can be a vector ### ### xlim: Vector of longitude limits ### ### ylim: Vector of latitude limits ### ### wrap: (Logical) Wrap polygons that stretch across the screen? ### ### bg: Background color ### ### border: Percent of plot area to use as white space ### ### lat.lines/lon.lines: (Logical) Draw latitude/longitude lines? ### ### lat.start/lon.start: Starting latitude/longitude ### ### lat.end/lon.end: Ending latitude/longitude ### ### lat.col/lon.col: Color for latitude/longitude lines ### ### lat.res/lon.res: Resolution for lines ### ### type: Draw points or rectangles ### ### pch: Plot character to use ### ### cex: Size for plot characters ### ######################################################################################### ### ### 3. makeOverlay: Formats a set of points to use as an overlay in plotMap(). ### ### X: Data set ### ### coord: Lat, lon, spatial weights, latitude span, and longitude span ### (latter 3 columns are not used for the Antarctica plots) ### ### norm: Normalize data to a maximum of -1/+1? ### ### convert: (Logical) Set to TRUE if color scale needs to be assigned ### to the data ### ### pch: Plotting character ### ### cex: Plotting character size ### ### type: "rect" for rectangles, "points" for points ### ### density: Lines per inch for shaded rectangles ### ### angle: Angle of fill lines for shaded rectangles ### ######################################################################################### ### ### 4. plotMap: Plots points on a map. ### ### dat: The data to be plotted ### ### coord: Matrix containing latitudes, longitudes, spatial weights (not used), ### latitude span (not used), and longitude span (not used) ### ### rng: Color scale maximum (scale will be made from -rng to +rng) ### ### divs: Number of divisions between anchor colors (0 to 255 allowable) ### ### cols: Vector of anchor colors ### ### na.col: Color to use for NA points ### ### m.title: Main plot title ### ### units: Text for units to be displayed on the color scale ### ### font.size: Factor to expand font size ### ### project: Projection type ### ### map.params: Output of mapParams() ### ### norm: (Logical) Normalize data to a maximum of -1/+1? ### ### overlays: (Logical) Are there supplied overlays? ### ### Y: Output of makeOverlay() ### ######################################################################################### ### ### 5. 
###      mapRectPlot/mapPointPlot/assignColors: Subroutines for plotMap().
###
#########################################################################################
###
### 6. plotSat: Overplots satellite coverage periods.
###
###     max: y position coordinate for satellite labels
###
###     s: cex setting for text size
###
###     rt: Text rotation (degrees)
###
###     type: 1 = numeric, 2 = date
###
#########################################################################################
###
### 7. plotEst: Plots results of paired Wilcoxon/t-tests.
###
###     est.w: Vector of Wilcoxon results
###
###     est.t: Vector of t-test results
###
###     rng: Range the test was performed over
###
#########################################################################################
###
### 8. plotTrend: Plots continental averages with trends annotated.
###
###     dat: Time series of anomalies
###
###     st / en: Start and stop times
###
###     x.pos / y.pos: X and Y positions for the text annotation
###
###     main.t: Text for the main plot label
###
#########################################################################################
###
### 9. plotBar: Generic bar plot to sort stations by location and color.
###
###     dat: Input data matrix
###
###     stns: Vector of station names
###
###     cols: Vector of colors
###
###     stat: Row number containing the statistic to be plotted in "dat"
###
###     y.lb: Y-axis label
###
###     x.lb: X-axis label
###
###     main.t: Main title text
###
###     sub.t: Subtitle text
###
###     b.mar: Bottom margin width in lines
###
###     x.cex: Font size for station text labels
###
###     shadow: Data set to be displayed in gray behind the input data matrix
###
###     p.ch: Plotting character for line end points
###
#########################################################################################
###
### 10. plotWts: Plots the output of getWts(), which is used to compare
###      imputation spatial structure vs. AVHRR spatial structure.
###
###     dat: The output of getWts()
###
###     method: "S09" or "E-W"
###
###     type: Type of plot
###      = 1: Plot the spatial structure of the selected imputation EOF
###      = 2: Plot the spatial structure used to impute a selected PC vs.
### the spatial structure from the AVHRR data ### ### eof: The imputation EOF to plot ### ### pc: The AVHRR PC to plot ### ### main.t: Text for a plot title ### setColors=function (cols, divs) { ### Ensure divs within limits if (divs < 0) {divs = 0} if (divs > 255) {divs = 255} ### Get number of anchor points num.pts = length(cols) ### Start the color map len = length(cols) - 1 color.map = cols[1] ### Interpolate between anchors for (i in 1:len) { ### Convert to RGB col.st = col2rgb(cols[i]) col.en = col2rgb(cols[i + 1]) ### Find steps r = (col.en[1, 1] - col.st[1, 1]) / (divs + 1) g = (col.en[2, 1] - col.st[2, 1]) / (divs + 1) b = (col.en[3, 1] - col.st[3, 1]) / (divs + 1) ### Define # of steps ramp = seq(1:(divs + 1)) ### Interpolate r.ramp = col.st[1, 1] + ramp * r g.ramp = col.st[2, 1] + ramp * g b.ramp = col.st[3, 1] + ramp * b ### Concatenate color.map = c(color.map, rgb(red = r.ramp, green = g.ramp, blue = b.ramp, maxColorValue = 255)) } ### Return map color.map } mapParams = function (database = "world", boundary = TRUE, parameters = NULL, orientation = c(-90, 0, 0), fill = FALSE, col = 1, xlim = c(-180, 180), ylim = c(-90, -60), wrap = TRUE, bg = "white", border = c(0.2, 0.2), lat.lines = FALSE, lat.start = -90, lat.end = 90, lat.inc = 10, lat.col = "darkgray", lat.lty = 3, lat.res = 1, lon.lines = FALSE, lon.start = -180, lon.end = 180, lon.inc = 45, lon.col = "darkgray", lon.lty = 3, lon.res = 1, type = "points", pch = 15, cex = 1, lwd = 1) { ### Assemble into list for plotMap() map.params = list() map.params[[1]] = database map.params[[2]] = boundary map.params[[3]] = parameters map.params[[4]] = orientation map.params[[5]] = fill map.params[[6]] = col map.params[[7]] = xlim map.params[[8]] = ylim map.params[[9]] = wrap map.params[[10]] = bg map.params[[11]] = border map.params[[12]] = lat.lines map.params[[13]] = c(lat.start, lat.end, lat.inc, lat.col, lat.lty, lat.res) map.params[[14]] = lon.lines map.params[[15]] = c(lon.start, lon.end, lon.inc, lon.col, lon.lty, lon.res) map.params[[16]] = type map.params[[17]] = pch map.params[[18]] = cex map.params[[19]] = lwd map.params } makeOverlay = function (X, coord = coords, norm = FALSE, convert = FALSE, pch = 15, cex = 0.3, lwd = 1, type = "points", density = NULL, angle = 45) { ### Format for read by plotMap() overlay = list() overlay$data = X overlay$char = pch overlay$size = cex overlay$lwd = lwd overlay$lat = coord[, 1] overlay$lon = coord[, 2] overlay$lat.span = coord[, 4] overlay$lon.span = coord[, 5] overlay$norm = norm overlay$convert = convert overlay$type = type overlay$density = density overlay$angle = angle overlay } plotMap=function (X, coord = coords, rng = 1, divs=255, cols = anchor.colors, na.col = "gray", m.title = NA, units = NA, font.size = 0.8, project = "azequidistant", map.params = NA, norm = FALSE, overlays = FALSE, Y = NA, outline = TRUE) { library(mapproj) ### Read coordinates lat = coord[, 1] lon = coord[, 2] lat.span = coord[, 4] lon.span = coord[, 5] ### If no user defined projection parameters, load defaults if (is.na(map.params[[1]])) {map.params = mapParams()} ### Assign projection parameters dbase = map.params[[1]] bound = map.params[[2]] p.params = map.params[[3]] orientate = map.params[[4]] poly.fill = map.params[[5]] poly.cols = map.params[[6]] xlims = map.params[[7]] ylims = map.params[[8]] wrapper = map.params[[9]] backgnd = map.params[[10]] borders = map.params[[11]] lat.lines = map.params[[12]] lat.params = map.params[[13]] lon.lines = map.params[[14]] lon.params = map.params[[15]] type = 
map.params[[16]]
  char = map.params[[17]]
  char.size = map.params[[18]]
  line.w = map.params[[19]]
  ### Split screen
  if (screen() != FALSE) {close.screen(all.screens = TRUE)}
  fig = matrix(nrow = 3, ncol = 4)
  fig[1, ] = c(0, 1, 0.85, 1)
  fig[2, ] = c(0, .85, 0, 0.85)
  fig[3, ] = c(.85, 1, 0, 0.85)
  split.screen(figs = fig)
  screen(2)
  ### Plot projection
  par(mar = c(0, 0, 0, 0) + 0.1)
  map(database = dbase, boundary = bound, projection = project, parameters = p.params, orientation = orientate, fill = poly.fill, col = poly.cols, xlim = xlims, ylim = ylims, wrap = wrapper, bg = backgnd, border = borders)
  ### Generate color scale
  map.cols = setColors(cols, divs)
  ### Convert data to color scale
  lims = c(-rng, rng)
  temp = assignColors(X, lims, map.cols, norm)
  cols = temp[[1]]
  na.map = temp[[2]]
  ### Plot rectangles
  if (type == "rect") {
    temp = mapRectPlot(map.cols[cols], lat, lon, lat.span, lon.span, dens = NULL, ang = 45)
    ### Plot NAs
    if (sum(na.map) > 0) {
      temp = mapRectPlot(rep(na.col, length(lat))[na.map], lat[na.map], lon[na.map], lat.span[na.map], lon.span[na.map], dens = NULL, ang = 45)
    }
  }
  ### Plot points
  if (type == "points") {
    temp = mapPointPlot(map.cols[cols], lat, lon, char, char.size, line.w)
    ### Plot NAs
    if (sum(na.map) > 0) {
      temp = mapPointPlot(rep(na.col, length(lat))[na.map], lat[na.map], lon[na.map], char, char.size, line.w)
    }
  }
  ### Overlays?
  if (overlays == TRUE) {
    ### Find number of overlays
    num.overlays = length(Y)
    ### Iterate through each overlay
    for (i in 1:num.overlays) {
      ### Extract data
      overlay = Y[[i]]
      dat = overlay$data
      char = overlay$char
      size = overlay$size
      line.w = overlay$lwd
      lat = overlay$lat
      lon = overlay$lon
      lat.span = overlay$lat.span
      lon.span = overlay$lon.span
      ovr.norm = overlay$norm
      convert = overlay$convert
      type = overlay$type
      dens = overlay$density
      ang = overlay$angle
      over.cols = dat
      ### Convert? (use the overlay's own norm setting)
      if (convert == TRUE) {
        converted = assignColors(dat, lims, map.cols, ovr.norm)
        cols = converted[[1]]
        over.cols = map.cols[cols]
        na.map = converted[[2]]
      }
      ### Plot overlay
      if (type == "rect") {
        temp = mapRectPlot(over.cols, lat, lon, lat.span, lon.span, dens, ang)
      }
      if (type == "points") {
        temp = mapPointPlot(over.cols, lat, lon, char, size, line.w)
      }
    }
  }
  ### Overlay map
  if (outline == TRUE) {map(database = dbase, boundary = bound, projection = project, parameters = p.params, orientation = orientate, fill = poly.fill, col = poly.cols, xlim = xlims, ylim = ylims, wrap = wrapper, bg = backgnd, border = borders, add = TRUE)}
  ### Lat lines?
  if (lat.lines == TRUE) {
    ### Get parameters
    start.lat = as.numeric(lat.params[1])
    end.lat = as.numeric(lat.params[2])
    inc.lat = as.numeric(lat.params[3])
    col.lat = lat.params[4]
    lty.lat = as.numeric(lat.params[5])
    res.lat = as.numeric(lat.params[6])
    ### Generate line segments
    lats = seq(start.lat, end.lat, inc.lat)
    x.seq = seq(-180, 180, res.lat)
    x.len = length(x.seq)
    ### Plot
    for (i in 1:length(lats)) {
      lines(mapproject(list(x = x.seq, y = rep(lats[i], x.len))), col = col.lat, lty = lty.lat)
    }
  }
  ### Lon lines?
  if (lon.lines == TRUE) {
    ### Get parameters
    start.lon = as.numeric(lon.params[1])
    end.lon = as.numeric(lon.params[2])
    inc.lon = as.numeric(lon.params[3])
    col.lon = lon.params[4]
    lty.lon = as.numeric(lon.params[5])
    res.lon = as.numeric(lon.params[6])
    ### Generate line segments
    lons = seq(start.lon, end.lon, inc.lon)
    y.seq = seq(-90, 90, res.lon)
    y.len = length(y.seq)
    ### Plot
    for (i in 1:length(lons)) {
      lines(mapproject(list(x = rep(lons[i], y.len), y = y.seq)), col = col.lon, lty = lty.lon)
    }
  }
  ### Plot color scale
  screen(3)
  par(mar = c(0, 0, 0, 0) + 0.1)
  rect(xleft = 0, xright = 1, ybottom = 0, ytop = 1, col = "white", border = 0)
  lines(x = c(0.45, 0.65), y = c(0.9, 0.9), col = 1)
  lines(x = c(0.45, 0.65), y = c(0.1, 0.1), col = 1)
  lines(x = c(0.4, 0.65), y = c(0.5, 0.5), col = 1)
  yinc = 0.8 / length(map.cols)
  yvals = 0.1 + seq(1:length(map.cols)) * yinc
  for (i in 1:length(map.cols)) {
    rect(xleft = 0.45, xright = 0.6, ytop = yvals[i], ybottom = yvals[i] - yinc, col = map.cols[i], border = NA)
  }
  text(x = 0.66, y = 0.5, round(0.5 * (lims[2] + lims[1]), 1), pos = 4, cex = font.size)
  text(x = 0.66, y = 0.9, round(lims[2], 1), pos = 4, cex = font.size)
  text(x = 0.66, y = 0.1, round(lims[1], 1), pos = 4, cex = font.size)
  par(srt = 90)
  text(x = 0.2, y = 0.5, units, cex = font.size)
  ### Plot title
  if (!is.na(m.title)) {
    par(mar = c(0, 0, 0, 0) + 0.1, cex.main = font.size)
    screen(1)
    title(main = m.title)
  }
  ### Remove split screens for subsequent plotting
  close.screen(all = TRUE)
}
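### Illustrative usage sketch only (guarded so it does not run on source).
### The data are hypothetical; "coords" and "anchor.colors" are assumed to
### exist in the workspace, as the defaults above expect.
if (FALSE) {
  my.params = mapParams(lat.lines = TRUE, lon.lines = TRUE)
  ### A simple station overlay: black points at every grid cell
  stn.overlay = makeOverlay(rep(1, nrow(coords)), coord = coords, convert = FALSE, pch = 19)
  plotMap(rnorm(nrow(coords)), coord = coords, rng = 2, m.title = "Example Map", units = "Deg C", map.params = my.params, overlays = TRUE, Y = list(stn.overlay))
}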
mapRectPlot = function (dat, lat, lon, lat.span, lon.span, dens, ang) {
  idx = sign(lat * lon) == -1
  xy1 = mapproject(lon - (0.5 * lon.span), lat + (0.5 * lat.span))
  xy2 = mapproject(lon + (0.5 * lon.span), lat - (0.5 * lat.span))
  xy3 = mapproject(lon - (0.5 * lon.span), lat - (0.5 * lat.span))
  xy4 = mapproject(lon + (0.5 * lon.span), lat + (0.5 * lat.span))
  xy1$x[idx] = xy3$x[idx]
  xy1$y[idx] = xy3$y[idx]
  xy2$x[idx] = xy4$x[idx]
  xy2$y[idx] = xy4$y[idx]
  rect(xleft = xy1$x, xright = xy2$x, ybottom = xy1$y, ytop = xy2$y, col = dat, border = NA, density = dens, angle = ang)
}

mapPointPlot = function (dat, lat, lon, char, size, line.w) {
  xy = mapproject(lon, lat)
  points(x = xy$x, y = xy$y, col = dat, cex = size, pch = char, lwd = line.w)
}

assignColors = function (X, lims, map.cols, norm) {
  ### Determine scale limits
  steps = (1 + length(map.cols)) / 2
  inc = lims[2] / steps
  ### Scale data by maximum absolute value?
  dat = as.vector(X)
  na.map = is.na(dat)
  if (norm == TRUE) {dat = dat / max(abs(dat), na.rm = TRUE)}
  ### Assign colors to data points
  cols = trunc(dat / inc)
  cols = cols + steps
  hi = cols > length(map.cols)
  cols[hi] = length(map.cols)
  lo = cols < 1
  cols[lo] = 1
  ### Return
  list(cols, na.map)
}

plotSat = function (max, s, rt, type) {
  ### Which labels?
  if (type == 1) {lbls = noaa} else {lbls = noaa.date}
  ### Plot lines & legend
  for (i in 4:length(noaa)) {
    abline(v = lbls[[i]][1], lwd = 1, col = "gray")
    text(x = lbls[[i]][1], y = max, labels = noaa.lbls[i], cex = s, pos = 4, srt = rt)
  }
}

plotEst = function (est.w, est.t, rng) {
  ### Initialize plotting area
  plot(est.w ~ I(1:300), ylim = c(-3, 3), xlab = "Months from 1982", ylab = "Normalized to 95% CI", main = paste("Estimate of Difference in Means\nAVHRR vs. Ground Data (Common Points)\n", rng, " Month Range", sep = ""), type = "l", cex.main = .9)
  ### Define +/- 95% CI range
  rect(-100, -1, 400, 1, col = "gray95")
  ### Plot satellite labels
  plotSat(-3, .55, 0, 1)
  box()
  ### Add pretty lines
  abline(h = 1, col = 4)
  abline(h = -1, col = 4)
  abline(h = 0, lwd = 2)
  ### Replot the points & add the legend (Wilcoxon in red, t-test in black,
  ### matching the line colors below)
  points(est.w ~ I(1:300), type = "l", col = 2, lwd = 2)
  points(est.t ~ I(1:300), type = "l", lwd = 2)
  legend("topleft", legend = c("Paired Wilcoxon Test", "Paired T-Test"), col = c(2, 1), fill = c(2, 1), cex = .8, bg = "white")
}
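### Illustrative usage sketch only (hypothetical inputs; plotSat() and hence
### plotEst() rely on the global satellite definitions "noaa", "noaa.date",
### and "noaa.lbls" loaded elsewhere in this script):
if (FALSE) {
  plotEst(est.w = rnorm(300), est.t = rnorm(300), rng = 24)
}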
plotTrend = function (x, st = 1957, en = c(2006, 12), y.pos = NA, x.pos = NA, main.t = "Untitled") {
  ### Get trend
  trend = lm(window(x, st, en) ~ I(time(window(x, st, en))))
  ### Initialize variables
  N = length(window(x, st, en))
  I = seq(1:N) / frequency(x)
  rmI = ssq(I - mean(I))
  ### Calculate sum of squared errors
  SSE = ssq(trend[[2]]) / (N - 2)
  ### Calculate OLS standard errors
  seOLS = sqrt(SSE / rmI) * 10
  ### Calculate two-tailed 95% CI
  ci95 = seOLS * 1.96
  ### Get lag-1 ACF
  resids = residuals(trend)
  r1 = acf(resids, lag.max = 1, plot = FALSE)$acf[[2]]
  ### Calculate CIs with Quenouille (Santer) adjustment for autocorrelation
  Q = (1 - r1) / (1 + r1)
  ciQ = ci95 / sqrt(Q)
  ### Plot data
  plot(window(x, st, en), main = main.t, ylab = "Anomaly (Deg C)")
  ### Add trendline and x-axis
  abline(h = 0, col = 2)
  abline(trend, col = 4, lwd = 2)
  ### Annotate text
  text(x = ifelse(is.na(x.pos), st, x.pos), y = ifelse(is.na(y.pos), max(x, na.rm = TRUE) - 0.2 * max(x, na.rm = TRUE), y.pos), paste("Slope: ", round(trend$coef[2] * 10, 4), "+/-", round(ciQ, 4), "Deg C/Decade\n(corrected for AR(1) serial correlation)\nLag-1 value:", round(r1, 3)), cex = 0.8, col = 4, pos = 4)
  ### Return
  list(trend$coef[2] * 10, ciQ)
}

plotBar = function (dat, stns, cols, stat, y.lb = NA, x.lb = NA, main.t = NA, sub.t = NA, b.mar = 8, x.cex = 0.8, sup.line = NA, sup.txt = NA, shadow = NA, shadow.col = "lightgray", p.ch = 20, ver.line = FALSE) {
  shad = 0
  plt.dat = dat
  cols = cbind(matrix(stns, ncol = 1), matrix(cols, ncol = 1))
  ### Check if a shadow is present
  if (!is.na(shadow[[1]])) {
    shad = 1
    shadow = sortStats(shadow, cols)
  }
  ### Sort by geographical location
  plt.dat = sortStats(plt.dat, cols)
  ### Initialize the plotting area
  par(mar = c(b.mar, 4, 7, 2))
  plot(c(1:length(stns)), xaxt = "n", col = "white", ylim = c(-0.3, 0.25 + as.numeric(max(plt.dat[stat + 1, ], 1.1))), xlab = x.lb, main = main.t, sub = sub.t, ylab = y.lb, type = "h")
  abline(h = 0)
  abline(h = 0.1 + as.numeric(max(plt.dat[stat + 1, ], 1.1)))
  if (ver.line == TRUE) {
    abline(h = 0.5, col = "lightgray", lty = 2)
    abline(h = 1, col = "lightgreen")
  }
  axis(side = 1, at = c(1:length(stns)), labels = colnames(plt.dat), las = 2, cex.axis = 0.9, tck = 0.01)
  row.s = stat + 1
  ### Plot the shadow
  if (shad == 1) {
    points(shadow[stat + 1, ] ~ I(seq(1:ncol(shadow)) - 0.35), type = "h", lend = 2, col = shadow.col, lwd = 2)
    abline(h = 0)
    points(shadow[stat + 1, ] ~ I(seq(1:ncol(shadow)) - 0.35), type = "p", cex = 1, pch = p.ch, lwd = 1, col = shadow.col)
  }
  ### Plot the data
  points((plt.dat[stat + 1, ]), type = "h", lend = 2, col = plt.dat[1, ], main = main.t, lwd = 2)
  abline(h = 0)
  legend("top", legend = c("Peninsula", "West Antarctica", "Ross Ice Shelf", "East Antarctica"), cex = 1, col = c(1, 2, 3, 4), bg = "white", pch = 15, ncol = 4, bty = "n")
  points((plt.dat[stat + 1, ]), type = "p", cex = 1, pch = p.ch, lwd = 1)
  if (stat != 1) {abline(h = 1, col = "gray")}
  plt.dat
}
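### Illustrative usage sketch of plotTrend() only (synthetic monthly series;
### guarded so it does not run on source):
if (FALSE) {
  x = ts(rnorm(600) + seq(0, 1, length.out = 600), start = 1957, freq = 12)
  tr = plotTrend(x, st = 1957, en = c(2006, 12), main.t = "Synthetic Example")
  tr[[1]]   ### decadal trend
  tr[[2]]   ### +/- 95% CI, AR(1)-adjusted
}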
plotWts = function (dat, method = "S09", type = 1, eof = 1, pc, main.t = "") {
  ### Find number of ground station columns
  gnd.l = ncol(dat[[2]][[1]]) - 1
  ### Define pc length
  pc.l = 1
  ### Check for allowed combinations
  if (method == "S09") {
    idx = 1
    pc.l = ncol(dat[[1]][[1]]) - gnd.l
  }
  if (method == "E-W") {idx = 2}
  if (type == 1) {pc = 1}
  ### Select proper matrix
  plt.dat = dat[[idx]][[pc]]
  ### Find neigs
  neigs = nrow(plt.dat) - 5
  ### Find the y-axis limits for plotting type = 2
  ylm = c(min(plt.dat[(neigs + 2):(neigs + 3), ], na.rm = TRUE), max(plt.dat[(neigs + 2):(neigs + 3), ], na.rm = TRUE))
  ylm = c(ylm[1], 0.1 * (ylm[2] - ylm[1]) + ylm[2])
  ### Set up plotting
  par(mar = c(8, 4, 4, 2))
  ### Plot imputation spatial EOF
  if (type == 1) {
    plot(plt.dat[eof + 1, ], xaxt = "n", col = "white", xlab = "", ylab = "Imputation Spatial EOF Coefficients (Unitless)", main = paste(main.t, "\nImputation EOF #", eof, sep = ""))
    points(plt.dat[eof + 1, ], col = c(plt.dat[1, 1:gnd.l], rep("goldenrod", pc.l)), type = "h")
    points(plt.dat[eof + 1, ])
    abline(h = 0, lwd = 2)
  }
  ### Plot comparative relative contributions for a particular PC
  if (type == 2) {
    plot(plt.dat[neigs + 2, ], xaxt = "n", col = "white", ylim = ylm, xlab = "", ylab = "Normalized Spatial EOF Coefficients (Unitless)", main = paste(main.t, "\nAVHRR PC #", pc, sep = ""))
    legend("top", legend = c("RegEM Spatial EOF Coefficient", "AVHRR Spatial EOF Coefficient"), cex = .8, pch = c(1, 8), bg = "white", ncol = 2, bty = "n")
    abline(h = 0, lwd = 2)
    abline(h = ylm[2] - 0.05 * (ylm[2] - ylm[1]))
    for (i in 1:gnd.l) {lines(x = c(i, i), y = c(plt.dat[neigs + 2, i], plt.dat[neigs + 3, i]), col = plt.dat[1, i])}
    points(plt.dat[neigs + 2, 1:gnd.l], col = 1)
    points(plt.dat[neigs + 3, 1:gnd.l], pch = 8, lwd = 1.5)
    if (idx == 1) {
      for (i in (gnd.l + 1):(ncol(plt.dat))) {
        y.dat = ifelse((i - gnd.l) == pc, NA, plt.dat[neigs + 2, i])
        points(x = i, y = y.dat, col = "goldenrod", lwd = 1.5, type = "h")
        points(x = i, y = y.dat)
      }
    }
  }
  ### Label axis
  axis(side = 1, at = c(1:ncol(plt.dat)), labels = colnames(plt.dat), las = 2, cex.axis = .8, tck = .01)
  attr(plt.dat, "Info") = paste("Total Contribution and AVHRR Eigenvector stats are for PC #", pc, sep = "")
  plt.dat
}

######################################################################################
######################        END PLOTTING FUNCTIONS           ######################
######################################################################################

###################################
### SEGMENT: GENERAL FUNCTIONS ###
###################################
###
### 1. ssq: Calculates the sum of the squares of the input data.
###
#########################################################################################
###
### 2. mScale: Scale/center/add function for matrices.
###
###     X: Input matrix
###
###     d: Scaling vector / adding vector
###
###     scale: (Logical) Multiply columns/rows of X by d?
###
###     center: (Logical) Column/row center X?
###
###     add: (Logical) Add vector d to columns/rows of X?
###
###     type: Column or row operations
###      = 1: Column operations
###      = 2: Row operations
###
###     max.size: x1000, maximum number of elements to process at one time
###
###     NOTE: The order of operations is always centering -> scaling -> adding.
###
#########################################################################################
###
### 3. makeAnom: Convert a matrix or vector to anomalies.
###
###     X: Input matrix or vector
###
###     st: Baseline start time / row number
###
###     en: Baseline end time / row number
###
###     idx: Only use values where idx = TRUE for anomaly calculations
###
###     max.size: See mScale
###
#########################################################################################
###
### 4. makeSeason: Parse a vector or matrix for seasons.
###
###     X: Input matrix or vector
###
###     st: Truncate before this start time / row number
###
###     en: Truncate after this end time / row number
###
###     season: Which season to parse
###      = "w": Winter
###      = "sp": Spring
###      = "su": Summer
###      = "f": Fall
###
###     south: (Logical) Southern hemisphere seasons?
###
###     year.type: Data set is in calendar ("cal") or meteorological ("met") years
###
#########################################################################################
###
### 5. makeAnnual: Convert a monthly series into annual.
###
###     X: Input matrix or vector
###
###     cutoff: Minimum number of monthly data points that must be present to
###     calculate an annual average
###
#########################################################################################
###
### 6. parseAll: Parse data. Returns a list of:
###
###      [[1]]: Cloudmasked AVHRR data
###      [[2]]: Combined ground station data (converted to Kelvin)
###      [[3]]: AVHRR grid cell coordinates
###
###     dat: A list generated from "loadData"
###
#########################################################################################
###
### 7. parseSat: Parses an input AVHRR data set for ground station locations.
###
###     dat: Input matrix
###
###     idx: Metadata to select appropriate grid cells corresponding to ground data
###
###     form: Select the ground station set; i.e. "form.S09". The set designates
###     stations to be REMOVED.
###
#########################################################################################
###
### 8. parseRandom: Parses a READER set for random withholding.
###
###     dat: Input matrix
###
###     form: Select the ground station set; i.e. "form.S09". The set designates
###     stations to be REMOVED.
###
###     withhold: Fraction of actual data points to withhold
###
#########################################################################################
###
### 9. parseStn: Parses a ground data set. Returns a list of:
###
###      [[1]]: Parsed ground data
###      [[2]]: Early calibration set
###      [[3]]: Late calibration set
###
###     dat: Input matrix
###
###     form: Select the ground station set; i.e. "form.steig". The set designates
###     stations to be REMOVED.
###
###     form.ver: Select the ground station set; i.e. "form.no.ocean.aws". The set
###     designates those stations for which 1/2 of the data will be removed
###     for split calibration/verification experiments.
###
#########################################################################################
###
### 10. sortStats: Sorts ground station statistics into geographical order
###      for cleaner plotting. Geographical order is column 10 in all.idx,
###      where "1" is Peninsula, "2" is West Antarctica, "3" is Ross Ice
###      Shelf, and "4" is East Antarctica.
###
###     dat: Input matrix
###
###     idx: Index matrix with the station names in Row 1 and location numbers in Row 2
###
#########################################################################################
###
### 11. loadData: Loads the ground station data, index metafile, AVHRR
###      data, and AVHRR coordinates.
###      Returns a list of:
###
###      [[1]]: Manned station data
###      [[2]]: AWS station data
###      [[3]]: Manned station metafile
###      [[4]]: AWS station metafile
###      [[5]]: Cloudmasked AVHRR data
###      [[6]]: Grid coordinates for the AVHRR data
###
#########################################################################################
###
### 12. downData: Downloads the AVHRR data, the S09 Antarctic reconstruction,
###      AVHRR longitudes, and AVHRR latitudes from Steig's site. Also
###      downloads the MET READER data for all manned and AWS stations.
###      Saves files in the working directory. NOTE: "loadData" must be
###      subsequently used for loading the files into R.
###
#########################################################################################
###
### 13. getReader: Script to scrape BAS' READER site.
###
###      x: Vector of station names to read
###
###      type: Set to "aws" (AWS stations) or "surface" (manned stations)
###

ssq = function (x) {sum(x ^ 2)}

mScale = function (X, d = 1, scale = TRUE, center = FALSE, add = FALSE, type = 1, na.rm = TRUE, max.size = 500) {
  ### Convert max.size (command line argument is in thousands)
  max.size = max.size * 1000
  ### Get matrix data
  n = nrow(X)
  p = ncol(X)
  pts = n * p
  ts.test = FALSE
  if (is.ts(X)) {
    ts.st = start(X)
    ts.fr = frequency(X)
    tsp(X) = NULL
    ts.test = TRUE
  }
  ### Check for proper length of d & correct ADD setting
  if (length(d) == 1 & type == 1 & (scale == TRUE | add == TRUE)) {d = rep(d, p)}
  if (length(d) == 1 & type == 2 & (scale == TRUE | add == TRUE)) {d = rep(d, n)}
  if (length(d) != p & type == 1 & (scale == TRUE | add == TRUE)) {warning("Scaling vector length different than number of columns.\nScaling vector recycled.", call. = FALSE, immediate. = TRUE)}
  if (length(d) < n & type == 2 & (scale == TRUE | add == TRUE)) {warning("Scaling vector length less than number of rows.\nNAs produced.", call. = FALSE, immediate. = TRUE)}
  if (length(d) > n & type == 2 & (scale == TRUE | add == TRUE)) {warning("Scaling vector length greater than number of rows.\nScaling vector truncated.", call. = FALSE, immediate. 
= TRUE)} ### If matrix too big, break into chunks if (pts > max.size) { blocks = ceiling(pts / max.size) rows = ceiling(n / blocks) } else { blocks = 1 rows = n } ### Do scaling / centering for (i in 1:blocks) { ### Get start and end points st = 1 + (i - 1) * rows en = min(i * rows, n) ### Column centering if (type == 1 & center == TRUE & en > st) { cd = colMeans(X, na.rm = na.rm) s = matrix(cd, ncol = p, nrow = en - st + 1, byrow = TRUE) X[st:en, ] = X[st:en, ] - s rm(cd, s) } ### Row centering if (type == 2 & center == TRUE & en > st) { cd = rowMeans(X, na.rm = na.rm) s = matrix(cd[st:en], ncol = p, nrow = en - st + 1) X[st:en, ] = X[st:en, ] - s rm(cd, s) } ### Column scaling / adding if (type == 1 & (scale == TRUE | add == TRUE) & en > st) { s = matrix(d, ncol = p, nrow = en - st + 1, byrow = TRUE) if (scale == TRUE) {X[st:en, ] = X[st:en, ] * s} if (add == TRUE) {X[st:en, ] = X[st:en, ] + s} rm(s) } ### Row scaling / adding if (type == 2 & (scale == TRUE | add == TRUE) & en > st) { s = matrix(d[st:en], ncol = p, nrow = en - st + 1) if (scale == TRUE) {X[st:en, ] = X[st:en, ] * s} if (add == TRUE) {X[st:en, ] = X[st:en, ] + s} rm(s) } } ### Cleanup & return if (ts.test == TRUE) {X = ts(X, start = ts.st, freq = ts.fr)} X } makeAnom = function (X, st = NA, en = NA, idx = NA, max.size = 500) { ### Initialize vec = FALSE ts.conv = FALSE use.idx = FALSE def.st = st def.en = en ### If a time series, make sure to restore after conversion to matrix if (is.ts(X)) { ### If no start/stop dates, use whole series if (is.na(st)) {st = start(X)} if (is.na(en)) {en = end(X)} ### Store characteristics ts.conv = TRUE def.st = st def.en = en fr = frequency(X) ### Convert start/ends to row numbers if (length(st) == 1) {st = (st[1] * fr) - (time(X)[1] * fr) + 1} if (length(st) == 2) {st = (st[1] * fr + st[2]) - (time(X)[1] * fr)} if (length(en) == 1) {en = (en[1] * fr) - (time(X)[1] * fr) + 1} if (length(en) == 2) {en = (en[1] * fr + en[2]) - (time(X)[1] * fr)} ### Remove time series attribute tsp(X) = NULL } ### Convert vector to matrix if (is.null(dim(X))) { X = matrix(X, ncol = 1) vec = TRUE } ### Start/end row numbers? if (is.na(st)) {st = def.st = 1} if (is.na(en)) {en = def.en = nrow(X)} ### Index provided? 
temp = X if (!identical(idx, NA)) { temp[!idx] = NA use.idx = TRUE } ### Define mean function if (ncol(X) == 1) {meanFn = mean} else {meanFn = colMeans} ### Iterate through each month for (i in 1:12) { ### Find month and subset for baseline all.mo = seq(from = i, to = nrow(X), by = 12) baseline = all.mo[all.mo >= st & all.mo <= en] ### Calculate and apply monthly means mo.mean = meanFn(temp[baseline, ], na.rm = TRUE) X[all.mo, ] = mScale(as.matrix(X[all.mo, ], ncol = 1), -mo.mean, type = 1, add = TRUE, scale = FALSE, center = FALSE) } ### Convert data back to input characteristics and return if (vec == TRUE) {X = as.vector(X)} if (ts.conv == TRUE) {X = ts(X, start = def.st, freq = fr)} attr(X, "Baseline start: ") = def.st attr(X, "Baseline end: ") = def.en attr(X, "Index used: ") = use.idx X } makeSeason = function(X, st = NA, en = NA, season = "summer", south = TRUE, year.type = "cal") { vec = FALSE; ts.conv = FALSE ### If a time series, make sure to restore after conversion to matrix if (is.ts(X)) {ts.conv = TRUE} ### Allows anomalies to be calculated for vectors if (is.null(dim(X))) { vec = TRUE if (!is.ts(X)) {X = matrix(X, ncol = 1)} } ### Set window if (ts.conv == TRUE) { if (identical(st, NA)) {st = start(X)} if (identical(en, NA)) {en = end(X)} tsp(X) = NULL } else { if (is.na(st)) {st = 1} if (is.na(en)) {en = nrow(X)} X = matrix(X[st:en, ], ncol = ncol(X)) } ### Define seasons if (year.type == "met") { win = c(10:12) spr = c(1:3) sum = c(4:6) fal = c(7:9) y.text = "Meteorological" } else { win = c(1:2, 12) spr = c(3:5) sum = c(6:8) fal = c(9:11) y.text = "Calendar" } if (south == FALSE) { if (season == "winter" | season == "w" | season == 1) {seas = win; s.text = "NH Winter"} if (season == "spring" | season == "sp" | season == 2) {seas = spr; s.text = "NH Spring"} if (season == "summer" | season == "su" | season == 3) {seas = sum; s.text = "NH Summer"} if (season == "fall" | season == "autumn" | season == "f" | season == "a" | season == 4) {seas = fal; s.text = "NH Fall"} } else { if (season == "winter" | season == "w" | season == 1) {seas = sum; s.text = "SH Winter"} if (season == "spring" | season == "sp" | season == 2) {seas = fal; s.text = "SH Spring"} if (season == "summer" | season == "su" | season == 3) {seas = win; s.text = "SH Summer"} if (season == "fall" | season == "autumn" | season == "f" | season == "a" | season == 4) {seas = spr; s.text = "SH Fall"} } ### Parse for season idx = rep(TRUE, 12) idx[seas] = FALSE idx = rep(idx, ceiling(nrow(X) / 12)) length(idx) = nrow(X) X[idx, ] = NA ### Restore characteristics of input data if (vec == TRUE) {X = as.vector(X)} if (ts.conv == TRUE) {X = ts(X, start = st, freq = 12)} ### Return season attr(X, "Season: ") = s.text attr(X, "Year Type: ") = y.text X } makeAnnual = function (X, cutoff = 10) { vec=FALSE; ts.conv=FALSE; mean.f = colMeans; sum.f = colSums ### If a time series, make sure to restore after conversion to matrix if (is.ts(X)) {st = start(X); ts.conv = TRUE} ### Allows anomalies to be calculated for vectors if (is.null(dim(X))) {X = as.matrix(X, ncol = 1); vec = TRUE; mean.f = mean; sum.f = sum} ### Check if input data was a 1-column matrix if (ncol(X) == 1) {mean.f = mean; sum.f = sum} ### Run if data has complete years len = nrow(X) if (len / 12 == trunc(len / 12)) { ### Set up new storage Y = matrix(ncol = ncol(X), nrow = len / 12) ### Get each year's mean; replace with NAs if months < cutoff for (i in 1:(len / 12)) { idx = seq(1:12) + (12 * (i - 1)) complete = sum.f(!is.na(X[idx, ])) < cutoff Y[i, ] = 
mean.f(X[idx, ], na.rm = TRUE) Y[i, complete] = NA } ### Replace NaNs with NAs idx = is.nan(Y) Y[idx] = NA ### Restore original characteristics of input data if (vec == TRUE) {Y = as.vector(Y)} if (ts.conv == TRUE) {Y = ts(Y, start = st, freq = 1)} } else { ### Oops! Whole years not supplied Y=NA message("Input matrix not in whole years. No calculation performed.") } ### Return Y } parseAll=function(dat = l.data) { ### Initialize list parsed = list() ### Combine and convert ground to ### Kelvin ground = ts.union(dat[[1]], dat[[2]]) + 273.15 ### Return list parse = list(dat[[3]], ground, dat[[4]]) } parseSat=function(dat, idx, form = 0) { if (sum(form) != 0) { idx = idx[-form, ] dat = dat[, idx[, 4]] } dat } parseRandom = function(dat, form, withhold = 0.05) { ### Get form dat = dat[, -form.list[[form]]] ### Random withholding for imputation experiments keep = 1 - withhold rand = matrix(rnorm(dat), ncol = ncol(dat), nrow = nrow(dat)) ver = dat ver[rand > qnorm(keep)] = NA ### Return sets sets = list(dat, ver) } parseStn=function(dat, form = 0, ver.form = 0) { ### Initialize variables early = late = base = dat all.stns = seq(1:nrow(all.idx)) idx = matrix(FALSE, ncol = nrow(all.idx)) idx[, ver.form] = TRUE ### Ensure stations are selected if (sum(ver.form) != 0) { ### Iterate station-by-station in ver.form for (i in 1:length(ver.form)) { ### Find length of data and total record length temp = dat[, ver.form[i]] l1 = l2 = trunc(0.5 * (length(na.omit(as.vector(temp))))) l3 = length(temp) ### Iterate by point. For late calibration, the ### earliest 1/2 of points are removed. Vice-versa ### for early calibration. Record removed points ### in "idx". for (j in 1:l3) { if (!is.na(temp[j]) & l1 > 0) {late[j, ver.form[i]] = NA; l1 = l1 - 1} if (!is.na(temp[l3 + 1 - j]) & l2 > 0) {early[l3 + 1 - j, ver.form[i]] = NA; l2 = l2 - 1} } } } ### Parse data for desired stations and record indices if (sum(form) != 0) { base = dat[, -form] early = early[, -form] late = late[, -form] idx = matrix(idx[, -form], nrow = 1) idx = col(idx)[idx == TRUE] stns = intersect(all.stns[-form], ver.form) } ### Return list ver = list(base, early, late, stns, idx) } sortStats = function(dat, idx) { ### Set up temporary matrix temp = matrix(ncol =ncol(dat), nrow = nrow(dat) + 1) temp[2:(nrow(dat) + 1), ] = dat ### Add color codes for geographical location as row 1 temp[1, ] = as.numeric(idx[, 2]) ### Assign names colnames(temp) = idx[, 1] rownames(temp) = c("Geographical Group", rownames(dat)) ### Remove any stations with NAs rmv.nas = !is.na(temp[2, ]) temp = temp[, rmv.nas] ### Sort by geographical location sort.vec = order(temp[1, ]) sorted = temp[, sort.vec] sorted } loadData = function() { ### Load ground data load("manned.RData") load("aws.RData") ### Load S09 ground data grid = scan("READER_groundstations.txt", n = -1) gnd.S09 = ts(t(array(grid, dim = c(42, 600))), start = 1957, freq = 12) rm(grid) ### Load AVHRR data grid = scan("cloudmaskedavhrr.txt", n = -1) avhrr = ts(t(array(grid, dim = c(5509, 300))), start = 1982, freq=12) rm(grid) ### Load longitude/latitude coordinates lon = read.table("tir_lons.txt") lat = read.table("tir_lats.txt") sat.coord = cbind(lon,lat) ### Load S09 recon grid = scan("ant_recon.txt", n = -1) recon.S09 = ts(t(array(grid, dim = c(5509, 600))), start = 1957, freq = 12) rm(grid) l.data = list(all.manned, all.aws, avhrr, sat.coord, recon.S09, gnd.S09) l.data } downData = function() { ### Download and save S09 data 
download.file("http://faculty.washington.edu/steig/nature09data/data/cloudmaskedAVHRR.txt", "cloudmaskedavhrr.txt", mode = "wb")
  download.file("http://faculty.washington.edu/steig/nature09data/data/READER_groundstations.txt", "READER_groundstations.txt", mode = "wb")
  download.file("http://faculty.washington.edu/steig/nature09data/data/ant_recon.txt", "ant_recon.txt", mode = "wb")
  ### Save coordinate files under the lowercase names that loadData() reads
  download.file("http://faculty.washington.edu/steig/nature09data/data/Tir_lons.txt", "tir_lons.txt", mode = "wb")
  download.file("http://faculty.washington.edu/steig/nature09data/data/Tir_lats.txt", "tir_lats.txt", mode = "wb")
  ### Get manned station data from MET READER
  reader.names = all.idx[1:44, 1]
  all.manned = getReader(x = reader.names, type = "surface")
  all.manned = window(all.manned, start = 1957, end = c(2009, 12))
  save(all.manned, file = "manned.RData")
  ### Get AWS data from MET READER
  reader.names = all.idx[45:109, 1]
  all.aws = getReader(x = reader.names, type = "aws")
  all.aws = window(all.aws, start = 1957, end = c(2009, 12))
  save(all.aws, file = "aws.RData")
}

getReader = function(x = reader.names, type = "surface") {
  ### Set up dummy time series
  reader.data = ts(matrix(NA, ncol = 1, nrow = 800), start = 1957, freq = 12)
  for (i in 1:length(x)) {
    ### Download READER data
    data.url = paste("http://www.antarctica.ac.uk/met/READER/", type, "/", x[i], ".All.temperature.txt", sep = "")
    temp = read.table(data.url, sep = "", skip = 1, na.strings = "-", stringsAsFactors = FALSE)
    ### Make a logical array to throw away pre-1957
    datemap = (temp[, 1] > 1956)
    temp = temp[datemap, ]
    ### Get earliest date
    startdate = temp[1, 1]
    ### Stack to a single column and make a time series
    temp = ts(matrix(t(temp[, -1]), ncol = 1), start = startdate, freq = 12)
    ### Place data into reader.data
    reader.data = ts.union(reader.data, temp)
  }
  ### Remove dummy column and return
  reader.data = reader.data[, -1]
  colnames(reader.data) = x
  reader.data
}

#####################################################################################
######################        END GENERAL FUNCTIONS           ######################
#####################################################################################

########################################
### SEGMENT: VERIFICATION FUNCTIONS ###
########################################
###
### 1. getStats: Calculates calibration period rms error, average explained
###     variance, and correlation coefficient. Calculates verification
###     period rms error, RE, CE, and correlation coefficient. If no
###     calibration period map is supplied, it assumes all values are
###     verification, and does not calculate RE or calibration statistics.
###
###     orig: Vector/matrix of original data
###
###     est: Vector/matrix of estimated data
###
###     cal.idx: Logical map of calibration points
###
#########################################################################################
###
### 2. corFn: Calculates the correlation coefficient between ordered pairs of
###     variables in a matrix - much faster than diag(cor(X, Y)).
###
###     X: First set of variables
###
###     Y: Second set of variables
###
###     cols: (Logical) If TRUE, variables are arranged in columns; otherwise,
###     variables are arranged in rows
###
#########################################################################################
###
### 3. lmFn: Performs linear regression on each variable in a matrix - much faster
###     than lm().
###
###     Calculates: Regression coefficient, intercept, residual SE, residual
###     lag-1 ACF, DoF adjustment for serial correlation, corrected SE,
###     2-tailed significance values, and any desired set of confidence intervals.
###
###     Also outputs the fit line and the residuals.
###
###     X: Independent variable (if a vector, will be recycled against each
###     variable in Y)
###
###     Y: Dependent variables
###
###     cols: (Logical) If TRUE, variables are arranged in columns
###
###     full: (Logical) If FALSE, return only coefficient and intercept
###
###     ci: Vector of confidence intervals - i.e., c(0.95, 0.99, 0.995) - to calculate
###
###     sig: Returns significance calculations
###
###     adj.DoF: (Logical) If TRUE, adjusts degrees-of-freedom for serial
###     correlation of the residuals
###
############################################################################################
###
### 4. getVerStns: Obtains the on-grid stations not used for a particular form.
###
###     Y: Ground station set
###
###     form: Ground station clipping vector
###
###     form.list: List containing all ground station clipping vectors
###
###     ver: List containing all verification station clipping vectors
###
#########################################################################################
###
### 5. getOffsets: Performs moving Wilcoxon and t-tests between the averages of
###     the sat and ground anomalies. Calculates offsets based on definitions
###     supplied in "sat.cal.matrix". Offsets are based on satellite coverage
###     periods.
###
###     dat: Satellite data at ground locations. MUST BE A TIME SERIES - RAW DATA.
###
###     base: Ground data. MUST BE A TIME SERIES - RAW DATA (not anomalies).
###
###     rng: Range for the moving Wilcoxon and t-tests.
###
###     st: Start date.
###
###     en: End date.
###
#########################################################################################
###
### 6. getWts: Gets the spatial structure used for imputation vs. the AVHRR
###     spatial structure for reconstructions done using the infilling
###     algorithm. Calculates three different scenarios (the user must select
###     the scenario that was actually used when the user goes to plot; the
###     script has no way of knowing how the PCs were infilled).
###
###     X: Ground station matrix
###
###     Y: PC matrix
###
###     form: Ground station clipping vector used
###
###     neigs: Number of imputation EOFs used to generate Y
###
###     alpha: Scaling factor used to generate Y
###
###     DoFL: Lost degrees of freedom assumed when generating Y
###
#########################################################################################
###
### 7. gCirc: Computes great-circle distance between provided lat/longs
###     using the Vincenty formula for spheres.
###
###     lat.1: Origin latitude
###
###     lon.1: Origin longitude
###
###     lat.2: Destination latitude (may be a vector)
###
###     lon.2: Destination longitude (may be a vector)
###
###     r: Radius of the sphere
###
getStats = function(orig, est, cal.idx = NA) {
  ### Only use overlapping points
  na.idx = is.na(orig) | is.na(est)
  orig[na.idx] = NA
  est[na.idx] = NA
  ### Matrix or vectors?
  if (is.null(dim(orig))) {orig = matrix(orig, ncol = 1)}
  if (is.null(dim(est))) {est = matrix(est, ncol = 1)}
  if (!identical(cal.idx, NA) & is.null(dim(cal.idx))) {cal.idx = matrix(cal.idx, ncol = 1)}
  if (!identical(dim(orig), dim(est)) | (!identical(dim(orig), dim(cal.idx)) & !identical(cal.idx, NA))) {stop("Nonconformable arguments.")}
  if (is.null(colnames(orig))) {colnames(orig) = c(1:ncol(orig))}
  ### Dummy for cal stats (in case no calibration period)
  cal.stats = NA
  ### # of columns
  nc = ncol(orig)
  ### Calibration period?
  if (!identical(cal.idx, NA)) {
    ### Ensure only overlapping points used
    cal.idx[na.idx] = FALSE
    ### Center baselines on calibration period (means taken over the
    ### calibration points only)
    orig.temp = orig
    orig.temp[!cal.idx] = NA
    orig = mScale(orig, -apply(orig.temp, 2, mean, na.rm = TRUE), type = 1, add = TRUE, scale = FALSE, center = FALSE)
    est.temp = est
    est.temp[!cal.idx] = NA
    est = mScale(est, -apply(est.temp, 2, mean, na.rm = TRUE), type = 1, add = TRUE, scale = FALSE, center = FALSE)
    ### Store calibration data
    orig.cal = orig
    est.cal = est
    orig.cal[!cal.idx] = NA
    est.cal[!cal.idx] = NA
    ### Set up storage
    cal.stats = matrix(nrow = 3, ncol = ncol(orig) + 2)
    colnames(cal.stats) = c(colnames(orig), "Station Mean", "Pointwise Mean")
    rownames(cal.stats) = c("RMS Error", "Explained Variance", "Cor. Coef.")
    attr(cal.stats, "Type: ") = "Calibration Period"
    ### Calculate rms error
    resids = (orig.cal - est.cal) ^ 2
    n.eff = colSums(!is.na(resids)) - 1
    rms = sqrt(colSums(resids, na.rm = TRUE) / n.eff)
    stn.rms = mean(rms, na.rm = TRUE)
    point.rms = sqrt(sum(resids, na.rm = TRUE) / (sum(!is.na(resids)) - 1))
    cal.stats[1, ] = c(rms, stn.rms, point.rms)
    ### Calculate average explained variance
    den = orig.cal ^ 2
    R2c = 1 - colSums(resids, na.rm = TRUE) / colSums(den, na.rm = TRUE)
    stn.R2c = mean(R2c, na.rm = TRUE)
    point.R2c = 1 - sum(resids, na.rm = TRUE) / sum(den, na.rm = TRUE)
    cal.stats[2, ] = c(R2c, stn.R2c, point.R2c)
    ### Calculate correlation coefficient
    r = corFn(orig.cal, est.cal)
    stn.r = mean(r, na.rm = TRUE)
    o = matrix(as.vector(orig.cal), nrow = 1)
    e = matrix(as.vector(est.cal), nrow = 1)
    point.r = corFn(o, e, cols = FALSE)
    cal.stats[3, ] = c(r, stn.r, point.r)
    ### Set up for verification period
    orig[cal.idx] = NA
    est[cal.idx] = NA
  }
  ### Verification period
  if (identical(cal.idx, NA)) {
    ### Center
    orig = mScale(orig, -apply(orig, 2, mean, na.rm = TRUE), type = 1, add = TRUE, scale = FALSE, center = FALSE)
    est = mScale(est, -apply(est, 2, mean, na.rm = TRUE), type = 1, add = TRUE, scale = FALSE, center = FALSE)
  }
  ### Set up storage
  ver.stats = matrix(nrow = 4, ncol = ncol(orig) + 2)
  colnames(ver.stats) = c(colnames(orig), "Station Mean", "Pointwise Mean")
  rownames(ver.stats) = c("RMS Error", "RE", "CE", "Cor. Coef.")
  attr(ver.stats, "Type: ") = "Verification Period"
  ### Calculate rms error
  resids = matrix((orig - est) ^ 2, ncol = nc)
  n.eff = colSums(!is.na(resids)) - 1
  n.eff[n.eff < 1] = NA
  rms = sqrt(colSums(resids, na.rm = TRUE) / n.eff)
  stn.rms = mean(rms, na.rm = TRUE)
  point.rms = sqrt(sum(resids, na.rm = TRUE) / (sum(!is.na(resids)) - 1))
  ver.stats[1, ] = c(rms, stn.rms, point.rms)
  ### Calculate RE
  if (!identical(cal.idx, NA)) {
    den = orig ^ 2
    RE = 1 - colSums(resids, na.rm = TRUE) / colSums(den, na.rm = TRUE)
    stn.RE = mean(RE, na.rm = TRUE)
    point.RE = 1 - sum(resids, na.rm = TRUE) / sum(den, na.rm = TRUE)
    ver.stats[2, ] = c(RE, stn.RE, point.RE)
  } else {
    ver.stats[2, ] = NA
  }
  ### Calculate CE
  den = mScale(orig, -apply(orig, 2, mean, na.rm = TRUE), type = 1, add = TRUE, scale = FALSE, center = FALSE) ^ 2
  den = colSums(den, na.rm = TRUE)
  den[den == 0] = NA
  CE = 1 - colSums(resids, na.rm = TRUE) / den
  stn.CE = mean(CE, na.rm = TRUE)
  point.CE = 1 - sum(resids, na.rm = TRUE) / sum(den, na.rm = TRUE)
  ver.stats[3, ] = c(CE, stn.CE, point.CE)
  ### Calculate correlation coefficient
  r = corFn(orig, est)
  stn.r = mean(r, na.rm = TRUE)
  o = matrix(as.vector(orig), nrow = 1)
  e = matrix(as.vector(est), nrow = 1)
  point.r = corFn(o, e, cols = FALSE)
  ver.stats[4, ] = c(r, stn.r, point.r)
  ### Return
  all.stats = list(cal.stats, ver.stats)
  all.stats
}
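### Illustrative usage sketch of getStats() only (synthetic data; the first
### half of each series is treated as calibration, the second half as
### verification):
if (FALSE) {
  orig = matrix(rnorm(200), ncol = 2)
  est = orig + matrix(rnorm(200, sd = 0.5), ncol = 2)
  cal = matrix(rep(c(rep(TRUE, 50), rep(FALSE, 50)), 2), ncol = 2)
  stats = getStats(orig, est, cal.idx = cal)
  stats[[1]]   ### calibration: RMS error, explained variance, correlation
  stats[[2]]   ### verification: RMS error, RE, CE, correlation
}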
Coef.") attr(ver.stats, "Type: ") = "Verification Period" ### Calculate rms error resids = matrix((orig - est) ^ 2, ncol = nc) n.eff = colSums(!is.na(resids)) - 1 n.eff[n.eff < 1] = NA rms = sqrt(colSums(resids, na.rm = TRUE) / n.eff) stn.rms = mean(rms, na.rm = TRUE) point.rms = sqrt(sum(resids, na.rm = TRUE) / (sum(!is.na(resids)) - 1)) ver.stats[1, ] = c(rms, stn.rms, point.rms) ### Calculate RE if (!identical(cal.idx, NA)) { den = orig ^ 2 RE = 1 - colSums(resids, na.rm = TRUE) / colSums(den, na.rm = TRUE) stn.RE = mean(RE, na.rm = TRUE) point.RE = 1 - sum(resids, na.rm = TRUE) / sum(den, na.rm = TRUE) ver.stats[2, ] = c(RE, stn.RE, point.RE) } else { ver.stats[2, ] = NA } ### Calculate CE den = mScale(orig, -apply(orig, 2, mean, na.rm = TRUE), type = 1, add = TRUE, scale = FALSE, center = FALSE) ^ 2 den = colSums(den, na.rm = TRUE) den[den == 0] = NA CE = 1 - colSums(resids, na.rm = TRUE) / den stn.CE = mean(CE, na.rm = TRUE) point.CE = 1 - sum(resids, na.rm = TRUE) / sum(den, na.rm = TRUE) ver.stats[3, ] = c(CE, stn.CE, point.CE) ### Calculate correlation coefficient r = corFn(orig, est) stn.r = mean(r, na.rm = TRUE) o = matrix(as.vector(orig), nrow = 1) e = matrix(as.vector(est), nrow = 1) point.r = corFn(o, e, cols = FALSE) ver.stats[4, ] = c(r, stn.r, point.r) ### Return all.stats = list(cal.stats, ver.stats) all.stats } corFn = function (X, Y, cols = TRUE) { ### Compare existing points only map = is.na(X) | is.na(Y) X[map] = NA Y[map] = NA ### Variables in columns if (cols == TRUE) { X = t(X) - colMeans(X, na.rm = TRUE) Y = t(Y) - colMeans(Y, na.rm = TRUE) ### Variables in rows } else { X = X - rowMeans(X, na.rm = TRUE) Y = Y - rowMeans(Y, na.rm = TRUE) } ### Calculate r = rowSums(X * Y, na.rm = TRUE) / sqrt(rowSums(X ^ 2, na.rm = TRUE) * rowSums(Y ^ 2, na.rm = TRUE)) ### Return r } lmFn = function (X, Y, cols = TRUE, full = FALSE, ci = NA, sig = FALSE, adj.DoF = TRUE) { ### Time series? ### Check for problems ts.prob = 0 if (is.ts(X) & is.ts(Y)) { ts.prob = -1 if (!identical(start(X), start(Y)) | !identical(end(X), end(Y)) | !identical(frequency(X), frequency(Y))) {ts.prob = 1} } if (is.ts(X) & !is.ts(Y)) {ts.prob = 1} if (ts.prob == 1) {stop("Time series problems. Start/end times or frequencies do not match (or Y is not a time series).")} ### Save ts data and get rid of ts attribute if (ts.prob == -1) { st = start(Y) en = end(Y) fr = frequency(Y) tsp(X) = NULL tsp(Y) = NULL } ### Check for dimensions Y.rows = X.rows = 1 if (is.matrix(Y) & cols == TRUE) { Y = t(Y) Y.rows = nrow(Y) } if (is.matrix(Y) & cols == FALSE) { Y.rows = nrow(Y) } if (is.matrix(X) & cols == TRUE) { X = t(X) X.rows = nrow(X) } if (is.matrix(X) & cols == FALSE) { X.rows = nrow(X) } ### Find max dimension rows = max(Y.rows, X.rows) ### Vectors? if (is.vector(Y)) {Y = matrix(rep(Y, rows), nrow = rows, byrow = TRUE)} if (is.vector(X)) {X = matrix(rep(X, rows), nrow = rows, byrow = TRUE)} ### Compare existing points only map = is.na(X) | is.na(Y) X[map] = NA Y[map] = NA ### Find means & center Y.mean = rowMeans(Y, na.rm = TRUE) X.mean = rowMeans(X, na.rm = TRUE) Y = Y - Y.mean X = X - X.mean ### Calculate coefficients n = ncol(X) b1 = ((1 / n) * rowSums(X * Y, na.rm = TRUE)) / ((1 / n) * rowSums(X ^ 2, na.rm = TRUE)) b2 = Y.mean - b1 * X.mean ### All standard calculations? 
lmFn = function (X, Y, cols = TRUE, full = FALSE, ci = NA, sig = FALSE, adj.DoF = TRUE) {
  ### Time series? Check for problems
  ts.prob = 0
  if (is.ts(X) & is.ts(Y)) {
    ts.prob = -1
    if (!identical(start(X), start(Y)) | !identical(end(X), end(Y)) | !identical(frequency(X), frequency(Y))) {ts.prob = 1}
  }
  if (is.ts(X) & !is.ts(Y)) {ts.prob = 1}
  if (ts.prob == 1) {stop("Time series problems. Start/end times or frequencies do not match (or Y is not a time series).")}
  ### Save ts data and get rid of ts attribute
  if (ts.prob == -1) {
    st = start(Y)
    en = end(Y)
    fr = frequency(Y)
    tsp(X) = NULL
    tsp(Y) = NULL
  }
  ### Check for dimensions
  Y.rows = X.rows = 1
  if (is.matrix(Y) & cols == TRUE) {
    Y = t(Y)
    Y.rows = nrow(Y)
  }
  if (is.matrix(Y) & cols == FALSE) {
    Y.rows = nrow(Y)
  }
  if (is.matrix(X) & cols == TRUE) {
    X = t(X)
    X.rows = nrow(X)
  }
  if (is.matrix(X) & cols == FALSE) {
    X.rows = nrow(X)
  }
  ### Find max dimension
  rows = max(Y.rows, X.rows)
  ### Vectors?
  if (is.vector(Y)) {Y = matrix(rep(Y, rows), nrow = rows, byrow = TRUE)}
  if (is.vector(X)) {X = matrix(rep(X, rows), nrow = rows, byrow = TRUE)}
  ### Compare existing points only
  map = is.na(X) | is.na(Y)
  X[map] = NA
  Y[map] = NA
  ### Find means & center
  Y.mean = rowMeans(Y, na.rm = TRUE)
  X.mean = rowMeans(X, na.rm = TRUE)
  Y = Y - Y.mean
  X = X - X.mean
  ### Calculate coefficients
  n = ncol(X)
  b1 = ((1 / n) * rowSums(X * Y, na.rm = TRUE)) / ((1 / n) * rowSums(X ^ 2, na.rm = TRUE))
  b2 = Y.mean - b1 * X.mean
  ### All standard calculations?
  if (full == TRUE | !identical(ci, NA) | sig == TRUE) {
    ### Find fitted values
    fitted = b1 * X + Y.mean
    ### Find residuals
    resids = Y - (X * b1)
    ### Find SE of the residuals
    resid.SE = sqrt(rowSums(resids ^ 2, na.rm = TRUE) / (ncol(Y) - 2))
    ### Find SE of the fit
    sd.X = sqrt(rowSums(X ^ 2, na.rm = TRUE) / (ncol(Y) - 1))
    SE = resid.SE / (sd.X * sqrt(ncol(Y) - 1))
    ### Adjust DoF for serial correlation?
    if (adj.DoF == TRUE) {
      ### Find the lag-1 ACF
      r = corFn(resids[, -1], resids[, -ncol(Y)], cols = FALSE)
      ### Apply DoF correction
      Q = sqrt((1 - r) / (1 + r))
      SE = SE / Q
    } else {
      r = Q = NA
    }
    ### Time series?
    if (ts.prob == -1) {
      fitted = ts(t(fitted), start = st, freq = fr)
      resids = ts(t(resids), start = st, freq = fr)
    }
  } else {
    fitted = resids = resid.SE = SE = r = Q = NA
  }
  ### Confidence intervals?
  if (!identical(ci, NA)) {
    SE.mat = matrix(rep(SE, length(ci)), nrow = rows, byrow = TRUE)
    ci = SE.mat * -qnorm(0.5 * (1 - ci))
  } else {
    ci = NA
  }
  ### Significance calculations (TWO-TAILED)
  if (sig == TRUE) {
    p = 2 * (1 - pnorm(abs(b1) / SE))
  } else {
    p = NA
  }
  ### Return
  lm = list()
  lm$coef = b1
  lm$int = b2
  lm$resid.SE = resid.SE
  lm$resid.acf = r
  lm$dof.adj = Q
  lm$SE = SE
  lm$resids = resids
  lm$fitted = fitted
  lm$ci = ci
  lm$p = p
  lm
}

getVerStns = function(Y, form = 11, forms = form.list, ver = form.ver) {
  ### Obtain verification experiment station complement
  ### (all stations in vector "idx" withheld)
  idx = rep(FALSE, nrow(all.idx))
  idx[ver[[form]]] = TRUE
  idx = matrix(idx[-forms[[form]]], nrow = 1)
  idx = col(idx)[idx == TRUE]
  Y.ver = ts(Y[, -idx], start = start(Y), freq = frequency(Y))
  ### Find stations that were entirely withheld from the ground station imputation
  y = x = rep(TRUE, nrow(all.idx))
  y[form.grid] = FALSE
  x[forms[[form]]] = FALSE
  z = as.matrix(x != y)
  form.withheld = row(z)[z != TRUE]
  ### Return list
  list(Y.ver, idx, form.withheld)
}

getOffsets = function(dat, base, rng, st = 1982, en = c(2006, 12)) {
  ### Calculate anomalies using only common points
  base = window(base, st, en)
  dat = window(dat, st, en)
  idx = !is.na(base) & !is.na(dat)
  base.anom = makeAnom(base, idx = idx)
  dat.anom = makeAnom(dat, idx = idx)
  ### Get averages
  dat.m = rowMeans(dat.anom, na.rm = TRUE)
  base.m = rowMeans(base.anom, na.rm = TRUE)
  ### Initialize variables
  w.error = vector()
  t.error = vector()
  w.sat = list()
  t.sat = list()
  w.fac = vector()
  t.fac = vector()
  for (i in 1:length(dat.m)) {
    ### Get the start and stop points for the test
    st = i - trunc(0.5 * rng)
    en = i + trunc(0.5 * rng)
    if (st > 0 & en < length(dat.m)) {
      ### If start and stop do not exceed series length, calculate
      est.means = wilcox.test(dat.m[st:en], base.m[st:en], conf.int = TRUE, paired = TRUE)
      w.error[i] = as.vector(est.means$est) / (as.vector(est.means$est) - as.vector(est.means$conf.int[1]))
      est.means = t.test(dat.m[st:en], base.m[st:en], paired = TRUE)
      t.error[i] = as.vector(est.means$est) / (as.vector(est.means$est) - as.vector(est.means$conf.int[1]))
    } else {
      ### If not, fill in NAs
      w.error[i] = NA
      t.error[i] = NA
    }
  }
  offsets = rep(0, length(dat.m))
  for (i in 1:nrow(sat.cal.matrix)) {
    type = sat.cal.matrix[i, 1]
    st = sat.cal.matrix[i, 2]
    en = sat.cal.matrix[i, 3]
    st2 = sat.cal.matrix[i, 4]
    en2 = sat.cal.matrix[i, 5]
    ### Check to ensure a valid regression model was entered. Default
    ### to t-test if not.
    if (type != "lm" & type != "rlm" & type != "loess" & type != "t-test") {
      print(paste("Invalid type entered (", type, "). 
Using t-test.", sep = ""), quote = FALSE) type = "t-test" } ### Linear regression if(type == "lm") { fit = lm(base.m[st:en] - dat.m[st:en] ~ I(st:en)) temp.pred = fit$fitted.values + dat.m[st:en] } ### Robust linear regression if(type == "rlm") { fit = rlm(base.m[st:en] - dat.m[st:en] ~ I(st:en)) temp.pred = fit$fitted.values + dat.m[st:en] } ### Loess (only allows polynomial degree to be 1 and uses full ### span because the fit is extrapolated) if(type == "loess") { fit = loess(base.m[st:en] ~ dat.m[st:en], degree = 1, span = 1, model = TRUE, surface = "direct") temp.pred = predict(fit, dat.m[st2:en2]) } ### Simple offset if(type == "t-test") { fit = t.test(base.m[st:en], dat.m[st:en], paired = TRUE)$est temp.pred = as.numeric(fit) + dat.m[st:en] } offsets[st:en] = dat.m[st:en ]- temp.pred } ### Return the list err = list(w.error, t.error, offsets) } getWts = function (X, Y, form, neigs = 3, alpha = 10, DoFL = 1) { ### Set up index idx = all.idx[-form.list[[form]], ] ### Extract AVHRR eigenvector coefficients svd.avhrr = getPcs(all.avhrr, 1982, c(2006, 12), form, FALSE, DoFL) coefs = svd.avhrr[[9]] wts = svd.avhrr[[9]] * alpha PCs = svd.avhrr[[1]] rm(svd.avhrr) ### Prepare results matrix wts.matrix = matrix(nrow = neigs + 5, ncol = ncol(X) + ncol(Y)) wts.matrix[1, ] = c(as.numeric(idx[, 10]), rep(9999, ncol(Y))) colnames(wts.matrix) = c(as.character(as.vector(idx[, 1])), paste("AVHRR PC ", seq(1:ncol(Y)), sep = "")) rownames(wts.matrix) = c("Geographic Location", paste("Imputation EOF ", seq(1:neigs), sep = ""), "Imputation Coefs", "AVHRR Coefs", "Imputation Signs", "AVHRR Signs") wts.matrix.individual = wts.matrix[, -((ncol(X)+2):(ncol(X)+ncol(Y)))] ### Prepare results list wts.list = list() ### S09 Method: wts.list[[1]] = list() for (i in 1:ncol(Y)) { wts.list[[1]][[i]] = wts.matrix attr(wts.list[[1]][[i]], "Method") = "S09" attr(wts.list[[1]][[i]], "PC") = i } attr(wts.list[[1]], "Method") = "S09" ### Eigenvector-weighted method: wts.list[[2]] = list() for(i in 1:ncol(Y)) { wts.list[[2]][[i]] = wts.matrix.individual colnames(wts.list[[2]][[i]]) = c(as.character(as.vector(idx[, 1])), paste("AVHRR PC ", i, sep = "")) attr(wts.list[[2]][[i]], "Method") = "Individual, Weighted" attr(wts.list[[2]][[i]], "PC") = i } attr(wts.list[[2]], "Method") = "Eigenvector-Weighted" ### Prepare comparison PC matrix (for sign detection) pc.matrix = matrix(nrow = nrow(Y), ncol = ncol(Y)) colnames(pc.matrix) = paste("PC ", seq(1:ncol(Y)), sep = "") ### Scale PCs to unit variance DY = sqrt((nrow(Y) - 1) / colSums(Y ^ 2)) Yo = mScale(Y, DY, type = 1, scale = TRUE, add = FALSE, center = FALSE) ### Scale ground data DX = sqrt((nrow(X) - 1) / colSums(X ^ 2)) Xo = mScale(X, DX, type = 1, scale = TRUE, add = FALSE, center = FALSE) ### S09: Simultaneous for (i in 1:ncol(Y)) { ### Collate ground data and PCs dat = cbind(Xo, Yo) dat = mScale(dat, -colMeans(dat, na.rm = TRUE), type = 1, add = TRUE, scale = FALSE, center = FALSE) ### Take SVD svd.dat = svd(t(dat), nu = neigs, nv = neigs) ### Store resulting spatial structure for (j in 1:neigs) { for (k in 1:ncol(Y)) { wts.list[[1]][[k]][j + 1, ] = svd.dat$u[, j] * svd.dat$d[j] } } ### Reconstitute new.X = t(svd.dat$u %*% diag(svd.dat$d[1:neigs]) %*% t(svd.dat$v)) ### Store for sign comparison pc.matrix = new.X[, (ncol(X) + 1):(ncol(X) + ncol(Y))] ### Calculate signs s = vector() for (j in 1:ncol(Y)) { s[j] = as.numeric(sign(cor.test(Y[1:300, j], pc.matrix[1:300, j])$estimate)) } ### Save signs wts.list[[1]][[i]][neigs + 4, ] = c(rep(NA, ncol(X)), s) ### Get AVHRR signs 
& store coefficients wts.list[[1]][[i]][neigs + 3, ] = c(coefs[, i], rep(NA, ncol(Y))) stdev = sd(wts.list[[1]][[i]][neigs + 3, 1:ncol(X)]) wts.list[[1]][[i]][neigs + 3, ] = wts.list[[1]][[i]][neigs + 3, ] / (stdev) s = as.numeric(sign(cor.test(Y[301:600, i], PCs[1:300, i])$estimate)) wts.list[[1]][[i]][neigs + 5, ncol(X) + i] = s } ### Eigenvector-weighted for (i in 1:ncol(Y)) { ### Collate dat = cbind(wts[, i] * sign(wts[, i]) * Xo, Yo[, i]) ### Take SVD svd.dat = svd(t(dat), nu = neigs, nv = neigs) ### Store resulting spatial structure for (j in 1:neigs) {wts.list[[2]][[i]][j + 1, ] = svd.dat$u[, j] * svd.dat$d[j]} ### Reconstitute new.X = t(svd.dat$u %*% diag(svd.dat$d[1:neigs]) %*% t(svd.dat$v)) ### Calculate sign s = as.numeric(sign(cor.test(Y[1:300, i], new.X[1:300, ncol(X)+1])$estimate)) ### Save sign wts.list[[2]][[i]][neigs + 4, ncol(X) + 1] = s ### Get AVHRR signs & store coefficients wts.list[[2]][[i]][neigs + 3, ] = c(coefs[, i], NA) stdev = sd(wts.list[[2]][[i]][neigs + 3, 1:ncol(X)]) wts.list[[2]][[i]][neigs + 3, ] = wts.list[[2]][[i]][neigs + 3, ] / (stdev) s = as.numeric(sign(cor.test(Y[301:600, i], PCs[1:300, i])$estimate)) wts.list[[2]][[i]][neigs + 5, ncol(X) + 1] = s } ### Calculate imputation coefficients for (i in 1:ncol(Y)) { ### Get relative contribution of each imputation EOF for selected PC pc.wts1 = wts.list[[1]][[i]][2:(neigs + 1), ncol(X) + i] pc.wts2 = wts.list[[2]][[i]][2:(neigs + 1), ncol(X) + 1] ### Find relative contribution of each series for (j in 1:ncol(X)) { wts.list[[1]][[i]][neigs + 2, j] = sum(wts.list[[1]][[i]][2:(neigs + 1), j] * pc.wts1) wts.list[[2]][[i]][neigs + 2, j] = sum(wts.list[[2]][[i]][2:(neigs + 1), j] * pc.wts2) } ### Find PC contribution for simultaneous for (j in 1:(ncol(wts.list[[1]][[i]]) - ncol(X))) { wts.list[[1]][[i]][neigs + 2, ncol(X) + j] = sum(wts.list[[1]][[i]][2:(neigs + 1), ncol(X) + j] * pc.wts1) } ### Normalize such that the variance is 1 wts.list[[1]][[i]][neigs + 2, ] = wts.list[[1]][[i]][neigs + 4, ncol(X) + i] * wts.list[[1]][[i]][neigs + 5, ncol(X) + i] * wts.list[[1]][[i]][neigs + 2, ] / sd(wts.list[[1]][[i]][neigs + 2, 1:ncol(X)], na.rm = TRUE) wts.list[[2]][[i]][neigs + 2, ] = wts.list[[2]][[i]][neigs + 4, ncol(X) + 1] * wts.list[[2]][[i]][neigs + 5, ncol(X) + 1] * wts.list[[2]][[i]][neigs + 2, ] / sd(wts.list[[2]][[i]][neigs + 2, 1:ncol(X)], na.rm = TRUE) } ### Sort for geographic location for (i in 1:ncol(Y)) { sort.vec = order(wts.list[[1]][[i]][1, ]) wts.list[[1]][[i]] = wts.list[[1]][[i]][, sort.vec] sort.vec = order(wts.list[[2]][[i]][1, ]) wts.list[[2]][[i]] = wts.list[[2]][[i]][, sort.vec] } ### Return wts.list } gCirc = function(lat.1, lon.1, lat.2, lon.2, r) { ### Convert to radians lat.1 = lat.1 * pi / 180 lat.2 = lat.2 * pi / 180 lon.1 = lon.1 * pi / 180 lon.2 = lon.2 * pi / 180 ### Get delta longitude D.lon = lon.2 - lon.1 ### First numerator value D.rho.num.1 = (cos(lat.2) * sin(D.lon)) ^ 2 ### Second numerator value D.rho.num.2 = (cos(lat.1) * sin(lat.2) - sin(lat.1) * cos(lat.2) * cos(D.lon)) ^ 2 ### Calculate numerator D.rho.num = sqrt(D.rho.num.1 + D.rho.num.2) ### Calculate denominator D.rho.den = sin(lat.1) * sin(lat.2) + cos(lat.1) * cos(lat.2) * cos(D.lon) ### Calculate angular distance D.rho = atan2(D.rho.num, D.rho.den) ### Calculate linear distance d = r * D.rho d } ########################################################################################## ###################### END VERIFICATION FUNCTIONS ###################### 
########################################################################################## ############################## ### SEGMENT: EM ALGORITHM ### ############################## ### ### The EM algorithm performs estimation of missing values in sparse matrices with the ### following methodological options: ### ### Truncated total least squares (TTLS) ### Ridge regression (IRidge and MRidge) ### Truncated SVD (TSVD) ### ### TTLS and TSVD may be performed by directly proceeding to the fixed truncation parameter ### (default behavior) or may progressively approach the fixed truncation parameter by ### progressively increasing the parameter from 1, iterating to convergence, and using the ### converged covariance/correlation matrix as input for the next truncation value (recommended). ### ### Ridge regression may be performed by using a common regularization parameter for ### each time step that is selected by a GCV function (MRidge), using an individual regularization ### parameter for each variable and each time step (IRidge), or by using a user-supplied ### vector of regularization parameters (either method). ### ### Main arguments: ### ### X: Input data matrix, with variables arranged in columns. If a time-series, the ### solution will be returned as a time series. ### ### C: Optional covariance/correlation matrix. If using the EM algorithm in a correlation ### setting, C will be scaled to have the diagonal equal unity. ### ### type: The method of regression to be used: ### ### "ttls" - TTLS (default) ### "mridge" - Multiple ridge regression ### "iridge" - Individual ridge regression ### "tsvd" - TSVD ### ### n.eof: Maximum number of retained EOFs. Must be set to a numeric value for TTLS ### and TSVD. If set to "NA" for ridge regression and RTTLS, all EOFs with non-zero ### eigenvalues will be used. ### ### maxiter: Maximum number of iterations (default value of 100). ### ### tol: Stagnation tolerance, which is calculated as the RMS change in missing values ### from the previous iteration, scaled by the relative distance from the initial ### mean of zero (default value of 0.001). ### ### sub.tol: Convergence tolerance to use for progressively raising retained EOFs by 1 ### until "n.eof" is reached. This tolerance will be used instead of "tol" for ### estimation prior to reaching "n.eof" for the truncation parameter (default ### value of 0.005). ### ### covr: (Logical) If FALSE (default), perform regression using standardized variables. ### If true, the algorithm does not automatically standardize, which allows the ### user to supply a set of weights for standardization. ### ### DoFL: Degrees of freedom lost (default value of 1). ### ### inflate: Parameter to inflate the residual scaled covariance matrices to account for ### underestimation of residual variance (default value of 1). ### ### min.relvar: Minimum relative variance of the residuals for setting the lower bound ### for determination of the regularization parameter (ridge regression, ### default value of 0.05). If set below the sum of the trace of the negligible ### eigenvalues, machine precision is used to set the lower bound. For RTTLS, ### machine precision is always used for the lower bound. ### ### min.var: Minimum fraction of variance required to be retained by regularization ### (ridge regression only; default to zero). If set to > 0, min.var is used to ### set the upper bound of the regularization parameter. Not used for RTTLS. 
###
### penalty: The penalty power used for the GCV function (ridge regression only; default
###          value of 2.0).
###
### h.default: Fixed regularization parameters for ridge regression (default value of NA).
###            This must be a vector of length 1 for "mridge" and a vector of length equal
###            to the number of variables for "iridge".
###
### wts: A vector of weighting factors to be used (default value of NA). Length must be
###      equal to the number of variables.
###
### B.series: A vector containing the variable numbers for which retention of the regression
###           coefficients is desired (default value of NA). If any element is set to "0", the
###           raw B matrices containing all regression coefficients are returned for each unique
###           combination of available and missing values.
###
### progressive: A vector describing whether the regression will be done progressively
###              (ignored for ridge regression) and which solutions for the incremented values
###              of the truncation parameter are to be stored in memory.
###
###              [1] - (Logical) Use progressive method (default of FALSE).
###              [2] - Starting EOF (default of 1).
###              [3] - (Logical) Store intermediate converged solutions (default of FALSE).
###              [4] - Starting EOF for storage (default of 1).
###
### plt: A vector describing whether to produce the standard plots for each iteration.
###
###      [1] - (Logical) Plot? (default of TRUE)
###      [2] - Variable column number to plot (default of 1)
###
### gap: Defines the number of iterations between printing iteration results and saving
###      iteration results to disk (default value of 10).
###
### mem: (Logical) If TRUE, print memory usage information (default of FALSE).
###
### save.file: (Logical) If TRUE, the algorithm will save according to the save.method
###            settings (default of FALSE).
###
###            NOTE: To prevent duplicate saves, setting this parameter does
###            NOT result in saving the overall list of outputs. Saving
###            the overall output must be done externally to the algorithm.
###
### save.method: Determines the type of saving to disk to be performed:
###
###              [1] - (Logical) Save each gap iteration (default of FALSE).
###              [2] - (Logical) Save upon convergence for each value of the
###                    truncation parameter (progressive TTLS and TSVD only;
###                    default of FALSE).
###
### dat.path: Path for saving. Format is: "X:/path". Do not include "/" after the path.
###           If the specified path does not exist, it is created. To save in the working
###           directory, set dat.path to NA (default value of NA).
###
### file.name: Filename for saving. Do not include a leading "/".
###
### max.size: Number of elements in 1,000s to be simultaneously calculated during matrix
###           scaling/adding operations to prevent allocation and memory errors. Lower
###           values of max.size let the algorithm handle larger matrices, but result in
###           slower performance. Larger values speed up scaling operations, but can
###           result in memory allocation errors. Default value of 1,000, which allows
###           matrices up to 1 million elements to be calculated in one step.
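###
### Illustrative usage sketch for emFn() (added; not part of the original
### analysis). The toy matrix and parameter choices below are assumptions for
### demonstration only. Wrapped in if (FALSE) so nothing executes when this
### file is sourced.
###
if (FALSE) {
  ### Toy data: 240 x 10 matrix with ~10% of entries deleted at random
  set.seed(42)
  toy = matrix(rnorm(2400), nrow = 240, ncol = 10)
  toy[sample(length(toy), 240)] = NA
  ### Infill by individual ridge regression with plotting suppressed
  toy.fit = emFn(toy, type = "iridge", maxiter = 200, tol = 0.0005, plt = c(FALSE, 1))
  ### For the ridge types the result list has a single element; $X is the
  ### infilled matrix and $Xhat holds the regression estimates
  toy.X = toy.fit[[1]]$X
}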
### emFn = function (X, C = NA, type = "ttls", n.eof = NA, maxiter = 100, tol = 0.001, sub.tol = NA, covr = FALSE, DoFL = 1, inflate = 1, min.relvar = 0.05, min.var = 0, penalty = 2, h.default = NA, wts = NA, progressive = c(FALSE, 1, FALSE, 1), plt = c(TRUE, 1), gap = 10, mem = FALSE, save.file = FALSE, save.method = c(FALSE, FALSE), B.series = NA, dat.path = NA, file.name = NA, max.size = 1000) { ############################### ### Section 1: Initialization ############################### ############################## ### DEFINE INTERNAL FUNCTIONS ############################## ### Calculates sum-of-squares ssq = function (x) {sum(x ^ 2)} ### Finds identical patterns combo = function (x, m, s) {identical(x, m)} ### Collation for unique patterns of missing/available data collate1 = function (x) {c(1:p)[missing[x[1], ]]} collate2 = function (x) {c(1:p)[!missing[x[1], ]]} ### Function for calculating elements of the scaled residual covariance matrix residFn = function (x, h, Fo, d, j) { ### Calculate diag.S = (h[j] ^ 2 * h[x] ^ 2) / ((d + h[j] ^ 2) * (d + h[x] ^ 2)) S = sum(Fo[, j] * diag.S * Fo[ ,x]) S } ### Function for performing generalized cross validation (ridge regression) gcvRidge = function (x, a) { ### Define function to be optimized for h gcvFn = function(h, x) { filt = h ^ 2 / (x$d + h ^ 2) G = (sum(filt ^ 2 * x$F2) + x$trS0 ) / (x$n.eff - length(x$d) + sum(filt)) ^ x$penalty G } ### Parse parameters d = a$d dv = a$d.var len = length(a$d) mv = a$min.var mr = a$min.relvar ### Trace of residuals not dependent on h a$trS0 = sum(a$d.S0[x]) ### Estimated minimum trace of missing scaled covariance matrix min.trS = mr * sum(a$d.C[x]) ### Define regularization parameter discrimination ### Default value h.tol = 0.2 / sqrt(a$n.eff) ### Set upper limit on regularization parameter if (mv > min(dv)) { q = max(which(dv < mv)) max.h = (d[q] + (d[q + 1] - d[q]) * (mv - dv[q]) / (dv[q + 1] - dv[q])) } else { max.h = sqrt(max(d)) / h.tol } ### Set lower limit on regularization parameter ### If residual trace not dependent on h exceeds the lower bound of the trace ### of the scaled covariance matrix, use machine precision. 
Otherwise, find ### eigenvalue closest to min.trS using residual norms of truncated SVD solutions if (a$trS0 > min.trS) { min.h = sqrt(.Machine$double.eps) } else { r = vector() r[len] = a$trS0 for (j in 1:(len - 1)) {r[len - j] = r[len - j + 1] + a$F2[len - j + 1]} min.r = which.min(abs(r - min.trS)) min.h = sqrt(max(d[min.r], min(d) / a$n.eff)) } ### Check for sensible limits & optimize if (min.h > max.h) { warning("Upper bound less than lower bound on regularization parameter; using lower bound.") opt.h = min.h } else { opt.h = optimize(gcvFn, interval = c(min.h, max.h), tol = h.tol, x = a)$minimum } ### Return opt.h } ### Memory smart & faster matrix scaling & adding mScale = function (X, d = 1, scale = TRUE, center = FALSE, add = FALSE, type = 1, na.rm = TRUE, max.size = 500) { ### Convert max.size (command line argument is in thousands) max.size = max.size * 1000 ### Get matrix data n = nrow(X) p = ncol(X) pts = n * p ts.test = FALSE if (is.ts(X)) { ts.st = start(X) ts.fr = frequency(X) tsp(X) = NULL ts.test = TRUE } ### Check for proper length of d & correct ADD setting if (length(d) == 1 & type == 1 & (scale == TRUE | add == TRUE)) {d = rep(d, p)} if (length(d) == 1 & type == 2 & (scale == TRUE | add == TRUE)) {d = rep(d, n)} if (length(d) != p & type == 1 & (scale == TRUE | add == TRUE)) {warning("Scaling vector length different than number of columns.\nScaling vector recycled.", call. = FALSE, immediate. = TRUE)} if (length(d) < n & type == 2 & (scale == TRUE | add == TRUE)) {warning("Scaling vector length less than number of rows.\nNAs produced.", call. = FALSE, immediate. = TRUE)} if (length(d) > n & type == 2 & (scale == TRUE | add == TRUE)) {warning("Scaling vector length greater than number of rows.\nScaling vector truncated.", call. = FALSE, immediate. 
= TRUE)} ### If matrix too big, break into chunks if (pts > max.size) { blocks = ceiling(pts / max.size) rows = ceiling(n / blocks) } else { blocks = 1 rows = n } ### Do scaling / centering for (i in 1:blocks) { ### Get start and end points st = 1 + (i - 1) * rows en = min(i * rows, n) ### Column centering if (type == 1 & center == TRUE & en > st) { cd = colMeans(X, na.rm = na.rm) s = matrix(cd, ncol = p, nrow = en - st + 1, byrow = TRUE) X[st:en, ] = X[st:en, ] - s rm(cd, s) } ### Row centering if (type == 2 & center == TRUE & en > st) { cd = rowMeans(X, na.rm = na.rm) s = matrix(cd[st:en], ncol = p, nrow = en - st + 1) X[st:en, ] = X[st:en, ] - s rm(cd, s) } ### Column scaling / adding if (type == 1 & (scale == TRUE | add == TRUE) & en > st) { s = matrix(d, ncol = p, nrow = en - st + 1, byrow = TRUE) if (scale == TRUE) {X[st:en, ] = X[st:en, ] * s} if (add == TRUE) {X[st:en, ] = X[st:en, ] + s} rm(s) } ### Row scaling / adding if (type == 2 & (scale == TRUE | add == TRUE) & en > st) { s = matrix(d[st:en], ncol = p, nrow = en - st + 1) if (scale == TRUE) {X[st:en, ] = X[st:en, ] * s} if (add == TRUE) {X[st:en, ] = X[st:en, ] + s} rm(s) } } ### Cleanup & return if (ts.test == TRUE) {X = ts(X, start = ts.st, freq = ts.fr)} X } ##################################### ### TS TO MATRIX (FASTER PROCESSING) ##################################### if (is.ts(X)) { ### Record characteristics x.ts = TRUE st = start(X) en = end(X) fr = frequency(X) r.names = rownames(X) c.names = colnames(X) ### Convert to non-ts (allows faster computations) X = matrix(X, nrow = nrow(X), ncol = ncol(X)) } else { x.ts = FALSE st = en = fr = NA r.names = rownames(X) c.names = colnames(X) } ########################################### ### FIND UNIQUE AVAILABLE/MISSING PATTERNS ########################################### n = nrow(X) p = ncol(X) n.eff = max(1, n - DoFL) missing = is.na(X) miss = list() avail = list() ### Perform if not TSVD if (type != "tsvd") { master = seq(1:n) common = matrix(nrow = n, ncol = n) ### Step through all rows for (i in 1:n) { temp = master[apply(missing, 1, combo, m = missing[i, ])] common[i, ] = c(temp, rep(0, n - length(temp))) } ### Remove empty columns and duplicated rows empty = master[colSums(common) == 0] dupe = master[master != common[, 1]] common = common[-dupe, -empty] ### Store missing/available variables miss = apply(common, 1, collate1) avail = apply(common, 1, collate2) } else {temp = common = empty = dupe = master = miss = avail = NA} #################### ### CHECK ARGUMENTS #################### if (maxiter < 1) { warning("Supplied iteration limit too low (allowed range >= 1).\nSetting to 1.\n") maxiter = 1 } if (tol < 1e-9) { warning("Supplied convergence tolerance too low (allowed range > 1e-9).\nSetting to 1e-9.\n") tol = 1e-9 } if (inflate < 0) { warning("Supplied inflation parameter less than limit (0).\nSetting to 0.\n") inflate = 0 } if (is.na(n.eof) & type != "mridge" & type != "iridge") { warning("Retained EOFs greater than or equal to effective full rank.\nResetting retained EOFs to one less than effective full rank.\n") n.eof = min(n - DoFL - 1, p - 1) } if (n.eof > min(n - DoFL - 1, p - 1) & type != "mridge" & type != "iridge") { warning("Retained EOFs greater than or equal to effective full rank.\nResetting retained EOFs to one less than effective full rank.\n") n.eof = min(n - DoFL - 1, p - 1) } if (!identical(h.default, NA) & length(h.default) != p & type == "iridge") { warning("Inappropriate length of regularization parameter vector.\nDiscarding supplied vector and 
calculating theoretical parameters using on-board GCV function.\n")
  h.default = NA
}
if (!identical(h.default, NA) & length(h.default) != 1 & type == "mridge") {
  warning("Inappropriate length of regularization parameter vector.\nDiscarding supplied vector and calculating theoretical parameters using on-board GCV function.\n")
  h.default = NA
}
if (is.na(sub.tol)) {sub.tol = tol}
if (plt[1] == TRUE) {
  ### Use the first variable if the requested plot column is invalid
  if (is.na(plt[2]) | plt[2] < 1 | plt[2] > p | length(plt[2]) > 1) {plt[2] = 1}
}
B.series = sort(unique(B.series))
if (is.numeric(B.series) == FALSE) {B.series = NA}
if (!identical(B.series, NA) & (min(B.series) < 0 | max(B.series) > p | length(B.series) > p)) {
  warning("Invalid variables for storing B matrices.\nStorage of B matrices will not be performed.\n")
  B.series = NA
}
#################
### Prep weights
#################
### No weights?
if (identical(wts, NA)) {
  wts = rep(1, p)
  wts.flag = FALSE
} else {wts.flag = TRUE}
### Check for correct length of weights
if (length(wts) != p) {stop("Length of weighting vector does not equal number of variables")}
#################
### PREP OPTIONS
#################
if (type != "tsvd" & type != "ttls" & type != "iridge" & type != "mridge") {stop(paste("Unknown regression type ", type, ".", sep = ""))}
if (type == "ttls") {
  regFn = ttlsRegFn
  reg.type = 3
  conv.message = "eof"
  ridge.trunc = n.eof
}
if (type == "iridge") {
  regFn = ridgeRegFn
  reg.type = 1
  conv.message = "ridge"
  ridge.trunc = n.eof
  if (is.na(ridge.trunc)) {ridge.trunc = max(1, min(n - DoFL, p))}
  n.eof = 1
}
if (type == "mridge") {
  regFn = ridgeRegFn
  reg.type = 2
  conv.message = "ridge"
  ridge.trunc = n.eof
  if (is.na(ridge.trunc)) {ridge.trunc = max(1, min(n - DoFL, p))}
  n.eof = 1
}
if (type == "tsvd") {
  regFn = tsvdRegFn
  reg.type = 3
  B.series = NA
  conv.message = "eof"
  ridge.trunc = n.eof
}
### Used if progressively iterating (applicable to TTLS and TSVD only)
start.eof = end.eof = store.eof = n.eof
if (progressive[1] == TRUE & (reg.type == 3 | reg.type == 0)) {
  start.eof = progressive[2]
  end.eof = n.eof
  if (start.eof > n.eof) {
    warning("Invalid starting EOF for progressive (greater than ending EOF).\nProgressive not used.\n")
    start.eof = end.eof = store.eof = n.eof
  }
  if (progressive[3] == TRUE) {store.eof = progressive[4]}
  if (store.eof > end.eof) {
    warning("Invalid starting EOF for retaining infilled estimates.\nOnly storing final estimate.\n")
    store.eof = n.eof
  }
}
### Save options
if (save.file == TRUE) {
  ### Parse names and ensure directory exists
  len = nchar(dat.path)
  ### Strip a trailing slash from the path and a leading slash from the file name
  if (!is.na(dat.path) & substr(dat.path, len, len) == "/") {dat.path = substr(dat.path, 1, len - 1)}
  if (!is.na(file.name) & substr(file.name, 1, 1) == "/") {file.name = substr(file.name, 2, nchar(file.name))}
  if (is.na(dat.path)) {dat.path = getwd()}
  if (!file.exists(dat.path)) {
    dir.create(dat.path, recursive = TRUE)
    message(paste("Directory ", dat.path, " does not exist. 
Creating.", sep = "")) } if (is.na(file.name)) { a = 1 while (file.exists(paste("EM-", type, "-", Sys.Date(), "-", a, ".RData", sep = ""))) {a = a + 1} file.name = paste("EM-", type, "-", Sys.Date(), "-", a, ".RData", sep = "") } } ########### ### PREP X ########### ### Check for sufficient # of points n.orig = colSums(!missing) n.low = n.orig < DoFL + 1 if (sum(n.low) > 0) { stop.cols = paste(" ", c(1:p)[n.low], sep = "") stop(paste("Insufficient number of points (less than ", DoFL + 1, ") in variables", stop.cols, sep = "")) } ### Center center = colMeans(X, na.rm = TRUE) center[is.na(center)] = 0 X = mScale(X, -center, add = TRUE, scale = FALSE, max.size = max.size) ### Estimate variance scales = sqrt(colSums(X ^ 2, na.rm = TRUE) / (n.orig - DoFL)) ### Initial infill X[missing] = 0 Xhat = matrix(0, nrow = n, ncol = p) ########### ### Prep C ########### ### Supplied C? if (identical(C, NA)) {C = (t(X) %*% X) / (n - DoFL)} ### Check for proper # of rows if (nrow(C) != p) { warning("Incorrect dimensions for covariance matrix.\nGenerating new matrix.\n") C = (t(X) %*% X) / (n - DoFL) } ############################# ### Establish argument lists ############################# ### Option/info list .em.arg = list() .em.arg$n = nrow(X) .em.arg$p = ncol(X) .em.arg$n.eff = max(1, n - DoFL) .em.arg$type = type .em.arg$reg.type = reg.type .em.arg$n.eof = n.eof .em.arg$trunc = ridge.trunc .em.arg$store.eof = store.eof .em.arg$maxiter = maxiter .em.arg$tol = tol .em.arg$sub.tol = sub.tol .em.arg$covr = covr .em.arg$DoFL = DoFL .em.arg$inflate = inflate .em.arg$min.relvar = min.relvar .em.arg$min.var = min.var .em.arg$penalty = penalty .em.arg$h.default = h.default .em.arg$wts = wts .em.arg$wts.flag = wts.flag .em.arg$B.series = B.series .em.arg$gap = gap .em.arg$mem = mem .em.arg$common = common .em.arg$missing = missing .em.arg$miss = miss .em.arg$avail = avail .em.arg$nmis = sum(missing) .em.arg$max.size = max.size .em.arg$plot = plt[1] .em.arg$plot.series = plt[2] .em.arg$start.eof = start.eof .em.arg$end.eof = end.eof ### Establish data list .em.results = list() .em.dat = list() .em.dat$X = X .em.dat$Xhat = Xhat .em.dat$C = C .em.dat$C.res = matrix(0, nrow = p, ncol = p) .em.dat$B = list() .em.dat$dXhat = 1e6 .em.dat$nXhat = 1e6 .em.dat$rXhat = 1e6 .em.dat$p.eff.ave = vector() .em.dat$v.ret.ave = vector() .em.dat$d = vector() .em.dat$M = center ### Establish characteristics list .em.char = list() .em.char$st = st .em.char$en = en .em.char$fr = fr .em.char$ts = x.ts .em.char$r.names = r.names .em.char$c.names = c.names .em.char$scales = scales ### Establish function list .em.fn = list() .em.fn$ssq = ssq .em.fn$residFn = residFn .em.fn$gcvRidge = gcvRidge .em.fn$mScale = mScale ### Clean up rm(X, C, type, reg.type, n.eof, ridge.trunc, maxiter, tol, sub.tol, covr, DoFL, inflate, min.relvar, min.var, penalty, h.default, wts, wts.flag, B.series, gap, mem, common, miss, avail, empty, dupe, temp, master, scales, center, x.ts, st, en, fr) temp = gc() ########################### ### Section 2: Regression ########################### for (i in start.eof:end.eof) { ### Print estimated memory usage? 
if (.em.arg$mem == TRUE) { size.X = round(object.size(.em.dat$X) / 1048576, 5) size.C = round(object.size(.em.dat$C) / 1048576, 5) if (is.na(.em.arg$B.series[1])) {B.factor = 0} else { if (.em.arg$B.series[1] == 0) {B.factor = .em.arg$p} else {B.factor = length(.em.arg$B.series)} } message("Memory usage:") message("-------------") message(paste("Size of X: ", size.X, " Mb", sep = "")) if (.em.arg$type != "tsvd") { message(paste("Size of C: ", size.C, " Mb", sep = "")) message(paste("Estimated required memory for storage of X, Xhat, C, C.res, and B: ", (2 + (end.eof - start.eof) * progressive[3]) * (((3 + B.factor)) * size.X + 2 * size.C), " Mb", sep = "")) } else {message(paste("Estimated required memory for storage of X, Xhat, and B: ", (2 + (end.eof - start.eof) * progressive[3]) * ((3 + B.factor)) * size.X, " Mb", sep = ""))} message("-------------") message("") } ### Initialize .em.results[[i]] = list() iter = 1 stag = 1e6 ### Check to see if using secondary tolerance if (i == end.eof) {tol = .em.arg$tol} else {tol = .em.arg$sub.tol} ### Iterate while(iter <= .em.arg$maxiter & stag > tol) { ### Regression .em.results[[i]] = regFn(.em.arg, .em.dat, .em.char, .em.fn, i, iter) ### Find changes in estimated values rel = round(.em.results[[i]]$nXhat[iter], 5) stag = round(.em.results[[i]]$rXhat[iter], 5) rms = round(.em.results[[i]]$dXhat[iter], 5) ### Update .em.dat = .em.results[[i]] ### Print iteration/memory usage or save iteration? if (iter / .em.arg$gap == trunc(iter / .em.arg$gap)) { ### Iteration info message(paste(iter, ": Relative: ", rel, "; RMS: ", rms, "; Stagnation: ", stag, ".", sep = "")) ### Memory info if (.em.arg$mem == TRUE) { size.X = round(object.size(.em.dat$X) / 1048576, 5) size.C = round(object.size(.em.dat$C) / 1048576, 5) if (is.na(.em.arg$B.series[1])) {B.factor = 0} else { if (.em.arg$B.series[1] == 0) {B.factor = .em.arg$p} else {B.factor = length(.em.arg$B.series)} } message(paste("Total memory usage: ", sum(gc()[, 2]), " Mb", sep = "")) message(paste("Estimated current usage to retain B matrices: ", (i - start.eof + 1) * B.factor * size.X, " Mb", sep = "")) } ### Save iteration if (save.file == TRUE & save.method[1] == TRUE) { ### Recenter to original centering temp = .em.results[[i]]$X temp[.em.arg$missing] = NA center.vec = colMeans(temp, na.rm = TRUE) .em.results[[i]]$X = mScale(.em.results[[i]]$X, -center.vec + .em.dat$M, type = 1, scale = FALSE, add = TRUE, center = FALSE, max.size = .em.arg$max.size) .em.results[[i]]$Xhat = mScale(.em.results[[i]]$Xhat, -center.vec + .em.dat$M, type = 1, scale = FALSE, add = TRUE, center = FALSE, max.size = .em.arg$max.size) ### Restore row/column names rownames(.em.results[[i]]$X) = rownames(.em.results[[i]]$Xhat) = .em.char$r.names colnames(.em.results[[i]]$X) = colnames(.em.results[[i]]$Xhat) = .em.char$c.names ### Restore ts if (.em.char$ts == TRUE) { .em.results[[i]]$X = ts(.em.results[[i]]$X, start = .em.char$st, freq = .em.char$fr) .em.results[[i]]$Xhat = ts(.em.results[[i]]$Xhat, start = .em.char$st, freq = .em.char$fr) } if (.em.char$ts == TRUE & !is.na(.em.arg$B.series[1]) & .em.arg$B.series[1] != 0) { for (j in .em.arg$B.series) { .em.results[[i]]$B[[j]] = ts(.em.results[[i]]$B[[j]], start = .em.char$st, freq = .em.char$fr) rownames(.em.results[[i]]$B[[j]] ) = .em.char$r.names colnames(.em.results[[i]]$B[[j]] ) = .em.char$c.names } } ### Save save.name = paste(dat.path, "/eof-", i, "-iter-", iter, "-", file.name, sep = "") em = list() em[[iter]] = .em.results[[i]] em[[iter]]$arg = .em.arg save(em, file = 
save.name) rm(em) } } ### Increment iter = iter + 1 } ### Inform about convergence if (stag > .em.arg$tol & conv.message == "eof") { message(paste(.em.arg$maxiter, " iterations without convergence at ", i, " retained EOFs.\nRelative change: ", rel, "\nRMS change: ", rms, "\nStagnation value: ", stag, "\n", sep = "")) .em.results[[i]]$conv = FALSE } if (stag <= .em.arg$tol & conv.message == "eof") { message(paste(iter - 1, " iterations to convergence at ", i, " retained EOFs.\nRelative change: ", rel, "\nRMS change: ", rms, "\nStagnation value: ", stag, "\n", sep = "")) .em.results[[i]]$conv = TRUE } if (stag > .em.arg$tol & conv.message == "ridge") { message(paste(.em.arg$maxiter, " iterations without convergence with ", round(.em.dat$p.eff.ave[iter - 1], 3), " average number of parameters.\nRelative change: ", rel, "\nRMS change: ", rms, "\nStagnation value: ", stag, sep = "")) message(paste("Truncation point: ", .em.arg$trunc, " (maximum available parameters).\n", sep = "")) .em.results[[i]]$conv = FALSE } if (stag <= .em.arg$tol & conv.message == "ridge") { message(paste(iter - 1, " iterations to convergence with ", round(.em.dat$p.eff.ave[iter - 1], 3), " average number of parameters.\nRelative change: ", rel, "\nRMS change: ", rms, "\nStagnation value: ", stag, sep = "")) message(paste("Truncation point: ", .em.arg$trunc, " (maximum available parameters).\n", sep = "")) .em.results[[i]]$conv = TRUE } ### Cleanup if (store.eof <= i | (save.file == TRUE & save.method[2] == TRUE)) { ### Recenter to original centering temp = .em.results[[i]]$X temp[.em.arg$missing] = NA center.vec = colMeans(temp, na.rm = TRUE) .em.results[[i]]$X = mScale(.em.results[[i]]$X, -center.vec + .em.dat$M, type = 1, scale = FALSE, add = TRUE, center = FALSE, max.size = .em.arg$max.size) .em.results[[i]]$Xhat = mScale(.em.results[[i]]$Xhat, -center.vec + .em.dat$M, type = 1, scale = FALSE, add = TRUE, center = FALSE, max.size = .em.arg$max.size) ### Restore row/column names rownames(.em.results[[i]]$X) = rownames(.em.results[[i]]$Xhat) = .em.char$r.names colnames(.em.results[[i]]$X) = colnames(.em.results[[i]]$Xhat) = .em.char$c.names ### Restore ts if (.em.char$ts == TRUE) { .em.results[[i]]$X = ts(.em.results[[i]]$X, start = .em.char$st, freq = .em.char$fr) .em.results[[i]]$Xhat = ts(.em.results[[i]]$Xhat, start = .em.char$st, freq = .em.char$fr) } if (.em.char$ts == TRUE & !is.na(.em.arg$B.series[1]) & .em.arg$B.series[1] != 0) { for (j in .em.arg$B.series) { .em.results[[i]]$B[[j]] = ts(.em.results[[i]]$B[[j]], start = .em.char$st, freq = .em.char$fr) rownames(.em.results[[i]]$B[[j]] ) = .em.char$r.names colnames(.em.results[[i]]$B[[j]] ) = .em.char$c.names } } ### Save if (save.file == TRUE & save.method[2] == TRUE) { save.name = paste(dat.path, "/eof-", i, "-", file.name, sep = "") converged = list() converged[[i]] = .em.results[[i]] converged[[i]]$arg = .em.arg save(converged, file = save.name) rm(converged) } } ### Setup for next EOF (progressive) .em.dat$dXhat = .em.results[[i]]$dXhat[iter - 1] .em.dat$nXhat = .em.results[[i]]$nXhat[iter - 1] .em.dat$rXhat = .em.results[[i]]$rXhat[iter - 1] .em.dat$dXhat.act = .em.results[[i]]$dXhat.act[iter - 1] .em.dat$rXhat.act = .em.results[[i]]$rXhat.act[iter - 1] .em.dat$rho = list() if (store.eof > i) {.em.results[[i]] = list()} } ### Return .em.results[[i]]$arg = .em.arg .em.results } ### Master routine for TTLS ttlsRegFn = function (arg, dat, char, fn, n.eof, iter) { ### Initialize B storage if (!is.na(arg$B.series[1]) & arg$B.series[1] != 0) { for (j in 
arg$B.series) { dat$B[[j]] = matrix(0, nrow = arg$n, ncol = arg$p) attr(dat$B[[j]], "name") = char$r.names[j] attr(dat$B[[j]], "variable number") = j } } ### New Xhat, C.res, S0, and S Xhat = matrix(0, nrow = arg$n, ncol = arg$p) S = S0 = C.res = matrix(0, nrow = arg$p, ncol = arg$p) ### Find scales and weighting factors if (arg$covr == TRUE) { D = arg$wts / rep(1, arg$p) } else { D = sqrt(diag(dat$C)) D = arg$wts / D } ### Scale if (arg$covr == FALSE | arg$wts.flag == TRUE) { dat$X = fn$mScale(dat$X, D, type = 1, add = FALSE, center = FALSE, max.size = arg$max.size) dat$Xhat = fn$mScale(dat$Xhat, D, type = 1, add = FALSE, center = FALSE, max.size = arg$max.size) dat$C = fn$mScale(D * dat$C, D, type = 1, add = FALSE, center = FALSE, max.size = arg$max.size) } ### Perform eigendecomposition C.eig = eigen(dat$C, symmetric = TRUE) ### Test for positive eigenvalues; change sign & re-sort if not positive sign.vec = sign(C.eig$values) if (sum(sign.vec) == length(C.eig$values)) { ### All positive, so save dat$d = C.eig$values dat$V = C.eig$vectors } else { ### First, eigenvalues d = abs(C.eig$values) sort.vec = rev(order(d)) d = d[sort.vec] dat$d = d ### Now, eigenvectors V = fn$mScale(C.eig$vectors, sign.vec, max.size = arg$max.size) V = V[, sort.vec] dat$V = V ### Cleanup rm(d, V) } ### Cleanup rm(C.eig) ### Setup dat$rho[[iter]] = vector() ### Set the effective rank, adjusted for zero eigenvalues and truncation rank.eff = max(1, min(sum(dat$d > (dat$d[1] * sqrt(.Machine$double.eps))), arg$n.eff, n.eof)) ### Iterate through unique combinations of available/missing values for (i in 1:nrow(arg$common)) { ### Get the row indices idx = arg$common[i, ] idx = idx[idx != 0] ### Parse for rank.eff V = matrix(dat$V[, 1:rank.eff], ncol = rank.eff) V11 = matrix(dat$V[arg$avail[[i]], 1:rank.eff], ncol = rank.eff) ### Calculate estimated covariance of the truncated modes if (rank.eff < arg$p) { ### Get missing residual estimate V22 = matrix(dat$V[arg$miss[[i]], (rank.eff + 1):arg$p], ncol = (arg$p - rank.eff)) ### Estimate residual scaled covariance matrix resids = V22 %*% diag(dat$d[(rank.eff + 1):arg$p], ncol = (arg$p - rank.eff)) %*% t(V22) S0[arg$miss[[i]], arg$miss[[i]]] = S0[arg$miss[[i]], arg$miss[[i]]] + length(idx) * arg$inflate * resids ### If full rank, all residuals dependent on regression } else {S0 = 0} ### Find the matrix of coefficients B = V11 %*% solve(t(V11) %*% V11) %*% t(V) ### Find the solution Xhat[idx, ] = dat$X[idx, arg$avail[[i]]] %*% B ### Store B if (!is.na(arg$B.series[1]) & arg$B.series[1] == 0) { attr(B, "miss") = arg$miss[[i]] attr(B, "rows") = idx dat$B[[i]] = B } if (!is.na(arg$B.series[1]) & arg$B.series[1] != 0) { for (j in arg$B.series) { dat$B[[j]][idx, arg$avail[[i]]] = matrix(B[, j], ncol = length(B[, j]), nrow = length(idx), byrow = TRUE) } } } ### Update C.res C.res = S0 + S dat$C.res = C.res ### Unscale if (arg$covr == FALSE | arg$wts.flag == TRUE) { dat$X = fn$mScale(dat$X, 1 / D, type = 1, add = FALSE, center = FALSE, max.size = arg$max.size) Xhat = fn$mScale(Xhat, 1 / D, type = 1, add = FALSE, center = FALSE, max.size = arg$max.size) dat$C.res = fn$mScale((1 / D) * dat$C.res, 1 / D, type = 1, add = FALSE, center = FALSE, max.size = arg$max.size) } ### Compute solution rms change in Xhat dat$dXhat[iter] = sqrt(fn$ssq(Xhat[arg$missing] - dat$X[arg$missing]) / arg$nmis) ### Compute relative change dat$nXhat[iter] = sqrt(fn$ssq(dat$X[arg$missing]) / arg$nmis) ### Compute stagnation value dat$rXhat[iter] = dat$dXhat[iter] / dat$nXhat[iter] ### Update X & Xhat 
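### (EM update: only the missing entries of X are overwritten with the new
### estimates, while observed values are left untouched; X is then recentered
### and the covariance matrix is rebuilt from the updated X plus the residual
### term C.res before the next iteration.)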
dat$X[arg$missing] = Xhat[arg$missing] dat$Xhat = Xhat ### Recenter center.vec = colMeans(dat$X) dat$X = fn$mScale(dat$X, -center.vec, type = 1, add = TRUE, scale = FALSE, center = FALSE, max.size = arg$max.size) dat$Xhat = fn$mScale(dat$Xhat, -center.vec, type = 1, add = TRUE, scale = FALSE, center = FALSE, max.size = arg$max.size) colnames(dat$X) = colnames(dat$Xhat) = char$c.names rownames(dat$X) = rownames(dat$Xhat) = char$r.names ### Generate C from X plus C.res dat$C = ((t(dat$X) %*% dat$X) + dat$C.res) / arg$n.eff ### Plotting? if (arg$plot == TRUE) { ### Margins par(mar = c(0.5, 4, 2, .15) + 0.1) ### Split screen close.screen(all = TRUE) split.screen(c(3, 1)) ### Plot selected series screen(1) temp.X = dat$X[, arg$plot.series] temp.X[arg$missing[, arg$plot.series]] = NA temp.Xhat = dat$Xhat[, arg$plot.series] plot(temp.Xhat, ylim = c(min(temp.X, temp.Xhat, na.rm = TRUE), max(temp.X, temp.Xhat, na.rm = TRUE)), type = "l", col = 2, xaxt = "n", ylab = "Deg C", main = paste("Series ", arg$plot.series, " Actual (Black) & Estimated (Red)")) points(temp.X, type = "l", col = 1) ### Plot effective parameters (regularized TLS) or series variance (TTLS) screen(2) temp.D = diag(dat$C) plot(char$scales ^ 2, type = "h", ylim = c(0, max(temp.D, char$scales ^ 2)), xaxt = "n", ylab = "Variance", main = "Variance (Black: Original; Red: Cov. Matrix Diagonal)") rect(arg$plot.series - 0.3, 0, arg$plot.series + 0.6, max(temp.D, char$scales ^ 2), border = NA, col = 7) points(char$scales ^ 2, type = "h") points(temp.D ~ I(c(1:arg$p) + 0.3), type = "h", col = 2) ### Plot eigenvalues screen(3) plot(dat$d, type = "h", xaxt = "n", ylab = "Eigenvalue", main = paste("Eigenvalue Spectrum (Truncation Parameter: ", rank.eff, ")", sep = "")) ### Close & reset to defaults close.screen(all = TRUE) par(mar = c(5, 4, 4, 2) + 0.1) } ### Cleanup & return dat } ### Master routine for ridge regression ridgeRegFn = function (arg, dat, char, fn, n.eof, iter) { ### Initialize B storage if (!is.na(arg$B.series[1]) & arg$B.series[1] != 0) { for (j in arg$B.series) { dat$B[[j]] = matrix(0, nrow = arg$n, ncol = arg$p) attr(dat$B[[j]], "name") = char$r.names[j] attr(dat$B[[j]], "variable number") = j } } ### New C.res and Xhat C.res = matrix(0, nrow = arg$p, ncol = arg$p) Xhat = matrix(0, nrow = arg$n, ncol = arg$p) ### Find scales and weighting factors if (arg$covr == TRUE) { D = arg$wts / rep(1, arg$p) } else { D = sqrt(diag(dat$C)) D = arg$wts / D } ### Scale if (arg$covr == FALSE | arg$wts.flag == TRUE) { dat$X = fn$mScale(dat$X, D, type = 1, add = FALSE, center = FALSE, max.size = arg$max.size) dat$Xhat = fn$mScale(dat$Xhat, D, type = 1, add = FALSE, center = FALSE, max.size = arg$max.size) dat$C = fn$mScale(D * dat$C, D, type = 1, add = FALSE, center = FALSE, max.size = arg$max.size) } ### Setup storage variables if (arg$reg.type == 1) { dat$h = matrix(0, nrow = arg$n, ncol = arg$p) dat$p.eff = matrix(0, nrow = arg$n, ncol = arg$p) dat$v.ret = matrix(0, nrow = arg$n, ncol = arg$p) } else { dat$h = rep(0, arg$n) dat$p.eff = rep(0, arg$n) dat$v.ret = rep(0, arg$n) } ### Iterate through unique combinations of available/missing values for (i in 1:nrow(arg$common)) { ### Setup B B = matrix(diag(rep(1, arg$p))[arg$avail[[i]], ], nrow = length(arg$avail[[i]])) ### Find number of missing values n.miss = length(arg$miss[[i]]) ### Perform eigendecomposition C.eig = eigen(dat$C[arg$avail[[i]], arg$avail[[i]]], symmetric = TRUE) ### Test for positive eigenvalues; change sign & re-sort if not positive sign.vec = sign(C.eig$values) if 
(sum(sign.vec) == length(C.eig$values)) { ### All positive, so save dat$d = C.eig$values dat$V = C.eig$vectors } else { ### First, eigenvalues d = abs(C.eig$values) sort.vec = rev(order(d)) d = d[sort.vec] dat$d = d ### Now, eigenvectors V = fn$mScale(C.eig$vectors, sign.vec, max.size = arg$max.size) V = V[, sort.vec] dat$V = V ### Cleanup rm(d, V) } ### Cleanup rm(C.eig) ### Find number of non-zero eigenvalues, adjusted for effective DoF and truncation rank.eff = min(sum(dat$d > (dat$d[1] * 1e-12)), arg$n.eff, arg$trunc, length(arg$avail[[i]])) dat$d = dat$d[1:rank.eff] dat$V = matrix(dat$V[, 1:rank.eff], ncol = rank.eff) dat$d.var = cumsum(dat$d) / sum(dat$d) ### Get the row indices idx = arg$common[i, ] idx = idx[idx != 0] ### Compute B, scaled residual covariance matrix, and estimated values if (n.miss > 0) { ### Setup list for GCV gcv.arg = list() gcv.arg$d = dat$d gcv.arg$d.var = dat$d.var gcv.arg$n.eff = arg$n.eff gcv.arg$min.var = arg$min.var gcv.arg$min.relvar = arg$min.relvar gcv.arg$penalty = arg$penalty gcv.arg$d.C = diag(diag(dat$C)[arg$miss[[i]]], nrow = n.miss) ### Fourier coefficients Fo = diag(1 / sqrt(dat$d), nrow = rank.eff) %*% t(dat$V) %*% matrix(dat$C[arg$avail[[i]], arg$miss[[i]]], ncol = n.miss) gcv.arg$F2 = rowSums(Fo * Fo) ### Scaled residual covariance matrix that does not depend on h if (arg$n.eff > rank.eff) { S0 = dat$C[arg$miss[[i]], arg$miss[[i]]] - t(Fo) %*% Fo } else { S0 = matrix(0, nrow = n.miss, ncol = n.miss) } ### Diagonal of S0 gcv.arg$d.S0 = diag(S0) ### Find ridge parameters ### Individual if (arg$reg.type == 1) { h = vector() S = matrix(0, nrow = n.miss, ncol = n.miss) for (j in 1:n.miss) { ### Ridge parameters if (is.na(arg$h.default[arg$miss[[i]][j]])) { gcv.arg$F2 = matrix(Fo[, j] ^ 2, ncol = 1) dat$h[idx, arg$miss[[i]][j]] = rep(fn$gcvRidge(x = j, a = gcv.arg), length(idx)) h[j] = dat$h[idx[1], arg$miss[[i]][j]] } else { h[j] = dat$h[idx[1], arg$miss[[i]][j]] = arg$h.default[arg$miss[[i]][j]] } ### Effective number of parameters dat$p.eff[idx, arg$miss[[i]][j]] = matrix(sum(dat$d / (dat$d + h[j] ^ 2)), nrow = length(idx), byrow = TRUE) ### Retained variance dat$v.ret[idx, arg$miss[[i]][j]] = matrix(sum(dat$d ^ 2 / (dat$d + h[j] ^ 2)) / sum(dat$d), nrow = length(idx), byrow = TRUE) ### Get B filt = 1 / (dat$d + h[j] ^ 2) if (rank.eff == 1) {temp = matrix(dat$V * filt, ncol = 1)} else { temp = fn$mScale(dat$V, filt, type = 1, scale = TRUE, add = FALSE, center = FALSE, max.size = arg$max.size) } B.temp = temp %*% t(dat$V) %*% matrix(dat$C[arg$avail[[i]], arg$miss[[i]][j]], ncol=1) rm(temp, filt) ### Assemble residual scaled covariance matrix temp = apply(matrix(c(1:j), nrow = 1), 2, fn$residFn, h = h[1:j], Fo = matrix(Fo[, 1:j], ncol = j), d = dat$d, j = j) S[j, 1:j] = S[1:j, j] = length(idx) * arg$inflate * (S0[1:j, j] + temp) rm(temp) ### Get solution B[, arg$miss[[i]][j]] = B.temp Xhat[idx, arg$miss[[i]][j]] = dat$X[idx, arg$avail[[i]]] %*% B.temp } } ### Multiple if (arg$reg.type == 2) { ### Ridge parameter if (is.na(arg$h.default)) {h = dat$h[i] = fn$gcvRidge(x = c(1:n.miss), a = gcv.arg)} else {h = dat$h[i] = arg$h.default} ### Effective number of parameters dat$p.eff[idx] = sum(dat$d / (dat$d + h ^ 2)) ### Retained variance dat$v.ret[idx] = sum(dat$d ^ 2 / (dat$d + h ^ 2)) / sum(dat$d) ### Get B filt = 1 / (dat$d + h ^ 2) if (rank.eff == 1) {temp = matrix(dat$V * filt, ncol = 1)} else { temp = fn$mScale(dat$V, filt, type = 1, scale = TRUE, add = FALSE, center = FALSE, max.size = arg$max.size) } B = temp %*% t(dat$V) %*% 
matrix(dat$C[arg$avail[[i]], ], nrow = length(arg$avail[[i]])) rm(temp, filt) ### Estimated scaled covariance matrix of residuals S = length(idx) * arg$inflate * (S0 + t(Fo) %*% (Fo * h ^ 4 / (dat$d + h ^ 2) ^ 2)) ### Get solution Xhat[idx, ] = dat$X[idx, arg$avail[[i]]] %*% B } } else { Xhat[idx, ] = dat$X[idx, ] B = 0 S = 0 } ### Update C.res if (!identical(S, 0)) {C.res[arg$miss[[i]], arg$miss[[i]]] = C.res[arg$miss[[i]], arg$miss[[i]]] + S} ### Store B if (!is.na(arg$B.series[1]) & arg$B.series[1] == 0) { attr(B, "miss") = arg$miss[[i]] attr(B, "rows") = idx dat$B[[i]] = B } if (!is.na(arg$B.series[1]) & arg$B.series[1] != 0) { for (j in arg$B.series) { dat$B[[j]][idx, arg$avail[[i]]] = matrix(B[, j], ncol = length(B[, j]), nrow = length(idx), byrow = TRUE) } } } ### Compute average effective number of parameters, retained variance, and store C.res if (arg$reg.type == 1) { temp.p = dat$p.eff temp.p[!arg$missing] = NA dat$p.eff.ave[iter] = sum(temp.p, na.rm = TRUE) / sum(arg$missing) } else { dat$p.eff.ave[iter] = sum(dat$p.eff) / arg$n } dat$v.ret.ave[iter] = sum(dat$v.ret) / arg$nmis dat$C.res = C.res ### Unscale if (arg$covr == FALSE | arg$wts.flag == TRUE) { dat$X = fn$mScale(dat$X, 1 / D, type = 1, add = FALSE, center = FALSE, max.size = arg$max.size) Xhat = fn$mScale(Xhat, 1 / D, type = 1, add = FALSE, center = FALSE, max.size = arg$max.size) dat$C.res = fn$mScale((1 / D) * dat$C.res, 1 / D, type = 1, add = FALSE, center = FALSE, max.size = arg$max.size) } ### Compute solution rms change in Xhat dat$dXhat[iter] = sqrt(fn$ssq(Xhat[arg$missing] - dat$X[arg$missing]) / arg$nmis) ### Compute relative change dat$nXhat[iter] = sqrt(fn$ssq(dat$X[arg$missing]) / arg$nmis) ### Compute stagnation value dat$rXhat[iter] = dat$dXhat[iter] / dat$nXhat[iter] ### Update X & Xhat dat$X[arg$missing] = Xhat[arg$missing] dat$Xhat = Xhat ### Recenter center.vec = colMeans(dat$X) dat$X = fn$mScale(dat$X, -center.vec, type = 1, add = TRUE, scale = FALSE, center = FALSE, max.size = arg$max.size) dat$Xhat = fn$mScale(dat$Xhat, -center.vec, type = 1, add = TRUE, scale = FALSE, center = FALSE, max.size = arg$max.size) colnames(dat$X) = colnames(dat$Xhat) = char$c.names rownames(dat$X) = rownames(dat$Xhat) = char$r.names ### Generate C from X plus C.res dat$C = ((t(dat$X) %*% dat$X) + dat$C.res) / arg$n.eff ### Plotting? if (arg$plot == TRUE) { ### Margins par(mar = c(0.5, 4, 2, .15) + 0.1) ### Split screen close.screen(all = TRUE) split.screen(c(3, 1)) ### Plot selected series screen(1) temp.X = dat$X[, arg$plot.series] temp.X[arg$missing[, arg$plot.series]] = NA temp.Xhat = dat$Xhat[, arg$plot.series] plot(temp.Xhat, ylim = c(min(temp.X, temp.Xhat, na.rm = TRUE), max(temp.X, temp.Xhat, na.rm = TRUE)), type = "l", col = 2, xaxt = "n", ylab = "Deg C", main = paste("Series ", arg$plot.series, " Actual (Black) & Estimated (Red)")) points(temp.X, type = "l", col = 1) ### Plot series variance screen(2) temp.D = diag(dat$C) plot(char$scales ^ 2, type = "h", ylim = c(0, max(temp.D, char$scales ^ 2)), xaxt = "n", ylab = "Variance", main = "Variance (Black: Original; Red: Cov. 
Matrix Diagonal)") rect(arg$plot.series - 0.3, 0, arg$plot.series + 0.6, max(temp.D, char$scales ^ 2), border = NA, col = 7) points(char$scales ^ 2, type = "h") points(temp.D ~ I(c(1:arg$p) + 0.3), type = "h", col = 2) ### Plot effective parameters screen(3) if (arg$reg.type == 1) { temp.p = dat$p.eff temp.p[!arg$missing] = NA temp.p = rowMeans(temp.p, na.rm = TRUE) temp.p[is.na(temp.p)] = 0 } else { temp.p = dat$p.eff } plot(temp.p, ylim = c(0, max(na.omit(temp.p))), pch = 15, col = "white", xaxt = "n", ylab = "# Eff. Params", main = paste("Effective Parameters (Time Step Average)", sep = "")) points(temp.p, pch = 15, cex = 0.5) ### Close & reset to defaults close.screen(all = TRUE) par(mar = c(5, 4, 4, 2) + 0.1) } ### Cleanup & return dat } ### Master routine for TSVD tsvdRegFn = function (arg, dat, char, fn, n.eof, iter) { ### Remove unnecessary matrices dat$C = dat$C.res = NA ### Setup Xhat Xhat = matrix(0, nrow = arg$n, ncol = arg$p) ### Find scales and weighting factors if (arg$covr == TRUE) { D = arg$wts / rep(1, arg$p) } else { D = sqrt(colSums(dat$X ^ 2) / arg$n.eff) D = arg$wts / D } ### Scale if (arg$covr == FALSE | arg$wts.flag == TRUE) { dat$X = fn$mScale(dat$X, D, type = 1, add = FALSE, center = FALSE, max.size = arg$max.size) dat$Xhat = fn$mScale(dat$Xhat, D, type = 1, add = FALSE, center = FALSE, max.size = arg$max.size) } ### Approximate X svd.X=svd(dat$X, nu = n.eof, nv = n.eof) if(n.eof > 1) { Xhat = svd.X$u[, 1:n.eof] %*% diag(svd.X$d[1:n.eof]) %*% t(svd.X$v[, 1:n.eof]) } if(n.eof == 1) { Xhat = as.matrix(svd.X$u[, 1]) %*% as.matrix(svd.X$d[1]) %*% t(svd.X$v[, 1]) } ### Unscale if (arg$covr == FALSE | arg$wts.flag == TRUE) { dat$X = fn$mScale(dat$X, 1 / D, type = 1, add = FALSE, center = FALSE, max.size = arg$max.size) Xhat = fn$mScale(Xhat, 1 / D, type = 1, add = FALSE, center = FALSE, max.size = arg$max.size) } ### Compute solution rms change in Xhat dat$dXhat[iter] = sqrt(fn$ssq(Xhat[arg$missing] - dat$X[arg$missing]) / arg$nmis) ### Compute relative change dat$nXhat[iter] = sqrt(fn$ssq(dat$X[arg$missing]) / arg$nmis) ### Compute stagnation value dat$rXhat[iter] = dat$dXhat[iter] / dat$nXhat[iter] ### Update X & Xhat dat$X[arg$missing] = Xhat[arg$missing] dat$Xhat = Xhat ### Recenter center.vec = colMeans(dat$X) dat$X = fn$mScale(dat$X, -center.vec, type = 1, add = TRUE, scale = FALSE, center = FALSE, max.size = arg$max.size) dat$Xhat = fn$mScale(dat$Xhat, -center.vec, type = 1, add = TRUE, scale = FALSE, center = FALSE, max.size = arg$max.size) colnames(dat$X) = colnames(dat$Xhat) = char$c.names rownames(dat$X) = rownames(dat$Xhat) = char$r.names ### Plotting? if (arg$plot == TRUE) { ### Margins par(mar = c(0.5, 4, 2, .15) + 0.1) ### Split screen close.screen(all = TRUE) split.screen(c(3, 1)) ### Plot selected series screen(1) temp.X = dat$X[, arg$plot.series] temp.X[arg$missing[, arg$plot.series]] = NA temp.Xhat = dat$Xhat[, arg$plot.series] plot(temp.Xhat, ylim = c(min(temp.X, temp.Xhat, na.rm = TRUE), max(temp.X, temp.Xhat, na.rm = TRUE)), type = "l", col = 2, xaxt = "n", ylab = "Deg C", main = paste("Series ", arg$plot.series, " Actual (Black) & Estimated (Red)")) points(temp.X, type = "l", col = 1) ### Plot series variance screen(2) temp.D = colSums(dat$X ^ 2) / arg$n.eff plot(char$scales ^ 2, type = "h", ylim = c(0, max(temp.D, char$scales ^ 2)), xaxt = "n", ylab = "Variance", main = "Variance (Black: Original; Red: Cov. 
Matrix Diagonal)") rect(arg$plot.series - 0.3, 0, arg$plot.series + 0.6, max(temp.D, char$scales ^ 2), border = NA, col = 7) points(char$scales ^ 2, type = "h") points(temp.D ~ I(c(1:arg$p) + 0.3), type = "h", col = 2) ### Plot eigenvalues screen(3) plot(svd.X$d, type="h", xaxt = "n", ylab = "Eigenvalue", main = paste("Eigenvalue Spectrum (Truncation Parameter: ", n.eof, ")", sep = "")) ### Close & reset to defaults close.screen(all = TRUE) par(mar = c(5, 4, 4, 2) + 0.1) } ### Cleanup & return dat } ################################################################################ ###################### END EM ALGORITHM ###################### ################################################################################ ######################################## ### SEGMENT: IMPUTATION EXPERIMENTS ### ######################################## ### ### 1. doGndRandom: Performs an infilling experiment with each of the ### station sets. Verification statistics are calculated ### using random withholding. ### ### Y: The full time series set of READER data ### ### n.max: The maximum number of EOFs to use for the ground station infilling ### ### tol: Convergence tolerance ### ### withhold: Proportion of data point to withhold for cross-validation ### ### st: Start date ### ### en: End date ### ### form.st: The ground station clipping vector to start with ### ### form.en: The ground station clipping vector to end with ### ### form: List of all ground station clipping vectors ### ### form.names: Vector of descriptive names for the station sets ### ### plt: See emFn() ### ######################################################################################### ### ### 2. doGndStn: Performs an infilling experiment with each of the ### station sets. Verification statistics are calculated ### using early/late withholding for designated ### stations. ### ### For early calibration, the last half of the data for stations ### in vector "form.ver" is withheld. For late calibration, ### the first half of the same data is withheld. ### ### dat: The full time series set of READER data ### ### n.max: The maximum number of EOFs to use for the ground station imputation ### ### tol: Convergence tolerance ### ### path: Path to use for saving both the imputation results and the statistics ### file. Relative paths allowed. ### ### st: Start date for the imputation ### ### en: End date for the imputation ### ### form.st: The ground station clipping vector to start with ### ### form.en: The ground station clipping vector to end with ### ### form: List of all ground station clipping vectors ### ### ver: List of ground station verification clipping vectors ### ### form.names: Vector of descriptive names for the station sets ### ### plt: See emFn() ### ######################################################################################### ### ### 3. doRls: Performs a series of reconstruction experiments using a regularized ### least squares fit of ground data to the AVHRR spatial EOFs. ### ### v.min: Starting number of spatial eigenvectors ### ### v.max: Maximum number of spatial eigenvectors. If NA, doRls() will ### set v.max equal to the effective rank of the ground station data. ### ### other options: See getRls() in the RECONSTRUCTION functions segment ### ######################################################################################### ### ### 4. cvGnd: Generates separate infilled ground station sets by withholding one ### station at a time. 
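### Stations that share a grid cell are withheld together, so each
### withheld location is entirely unseen by the corresponding infilled set.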
###
### Yo: Full ground station set
###
### type: Regression type ("iridge", "mridge", "tsvd", or "ttls"). See emFn().
###
### tol / maxiter / DoFL / n.eof / plt / progressive: See emFn()
###
#########################################################################################
###
### 5. verEigen: Calculates estimates for withheld stations using the output of cvGnd().
###
### cv.gnd: Output of cvGnd()
###
### dat: AVHRR data (must be a time series)
###
### n.max: Maximum number of retained EOFs to use in the regression
###
### alpha: Weighting factor for ground stations (alpha > 1 --> OLS solution with ground
###        stations considered independent; alpha < 1 --> OLS solution with PCs
###        considered independent)
###
### st: Start date for the reconstruction
###
### en: End date for the reconstruction
###
### forms: List of ground station clipping vectors
###
### type: Regression type ("iridge", "mridge", "tsvd", or "ttls"). See emFn().
###
### tol / maxiter / DoFL / n.eof / plt / progressive: See emFn()
###
#########################################################################################
###
### 6. verRLS: Calculates estimates for withheld stations using the output of cvGnd().
###
### cv.gnd: Output of cvGnd()
###
### X: AVHRR data (must be a time series)
###
### form: Ground station clipping vector to use
###
### v.min: Minimum number of AVHRR spatial eigenvectors to use
###
### v.max: Maximum number of AVHRR spatial eigenvectors to use
###
### h.min: Lower bound on regularization parameter
###
### h.max: Upper bound on regularization parameter
###
### DoFL: Degrees of freedom lost
###
### tol: Tolerance for the regularization parameter optimize function; see optimize()
###      in the R help
###
### st.y: Start date for the ground data
###
### en.y: End date for the ground data
###
### st.x: Start date for the satellite data
###
### en.x: End date for the satellite data
###
### external: Not implemented. Leave as NA.
###
#########################################################################################
###
### 7. verS09: Performs the S09 reconstruction by withholding one station at a time.
###
### Y: The 42 S09 stations in the first 42 columns, the first 3 AVHRR PCs in the
###    last 3 columns. Must be a time series.
### ### default.pcs: The output of getPcs() for form.S09 ### doGndRandom = function (Y = Y.all, n.max = 10, tol = 0.0005, withhold = 0.05, plt = c(T, 1), st = 1957, en = c(2006, 12), form.st = 1, form.en = length(form.list), form = form.list, form.names = form.name) { ### Set time window Y = window(Y, st, en) Y = mScale(Y, -colMeans(Y, na.rm = TRUE), type = 1, add = TRUE, scale = FALSE, center = FALSE) ### Set up storage variables method = c("TSVD", "TTLS") gnd.stats = list() for (i in 1:2) { gnd.stats[[i]] = list() attr(gnd.stats[[i]], "Method") = method[i] for (j in form.st:form.en) { gnd.stats[[i]][[j]] = list() attr(gnd.stats[[i]][[j]], "Method") = method[i] attr(gnd.stats[[i]][[j]], "Ground Set") = form.names[j] } } ### Iterate through each ground set for (j in form.st:form.en) { ### Obtain the station complement and verification complement stn.set = parseRandom(Y, form = j, withhold = withhold) base = stn.set[[1]] ver = stn.set[[2]] cal.idx = !is.na(ver) ### TTLS cannot run with regpar > avail, so limit maximum ### regpar to minimum available stations at any time ttls.lim = min(c(rowSums(ver / ver, na.rm = TRUE))) - 1 ################### ### Mode TSVD ################### ### Set mode variable i = 1 ### Set up storage gnd.stats[[i]][[j]] = matrix(NA, nrow = 14, ncol = n.max) colnames(gnd.stats[[i]][[j]]) = paste(seq(1:n.max), "Retained EOFs") rownames(gnd.stats[[i]][[j]]) = c("Cal. RMS Error", "Cal. Explained Variance", "Cal. Cor. Coef.", "Ver. RMS Error", "Ver. RE", "Ver. CE", "Ver. Cor. Coef.", "Cal. RMS Error", "Cal. Explained Variance", "Cal. Cor. Coef.", "Ver. RMS Error", "Ver. RE", "Ver. CE", "Ver. Cor. Coef.") attr(gnd.stats[[i]][[j]], "Info: ") = c("Rows 1 - 7 are station means; rows 8 - 14 are pointwise statistics.") infill = emFn(ver, maxiter = 5000, tol = tol, n.eof = n.max, type = "tsvd", covr = FALSE, progressive = c(T, 1, T, 1), DoFL= 12, plt = plt) ### Calculate and collate statistics for(k in 1:n.max) { stats = getStats(base, infill[[k]]$Xhat, cal.idx) gnd.stats[[i]][[j]][1:3, k] = stats[[1]][, ncol(ver) + 1] gnd.stats[[i]][[j]][4:7, k] = stats[[2]][, ncol(ver) + 1] gnd.stats[[i]][[j]][8:10, k] = stats[[1]][, ncol(ver) + 2] gnd.stats[[i]][[j]][11:14, k] = stats[[2]][, ncol(ver) + 2] attr(gnd.stats[[i]][[j]], "Method") = method[i] attr(gnd.stats[[i]][[j]], "Ground Set") = form.names[j] } ################### ### Mode TTLS ################### ### Set mode variable i = 2 ### Set up storage gnd.stats[[i]][[j]] = matrix(NA, nrow = 14, ncol = n.max) colnames(gnd.stats[[i]][[j]]) = paste(seq(1:n.max), "Retained EOFs") rownames(gnd.stats[[i]][[j]]) = c("Cal. RMS Error", "Cal. Explained Variance", "Cal. Cor. Coef.", "Ver. RMS Error", "Ver. RE", "Ver. CE", "Ver. Cor. Coef.", "Cal. RMS Error", "Cal. Explained Variance", "Cal. Cor. Coef.", "Ver. RMS Error", "Ver. RE", "Ver. CE", "Ver. Cor. 
Coef.") attr(gnd.stats[[i]][[j]], "Info: ") = c("Rows 1 - 7 are station means; rows 8 - 14 are pointwise statistics.") infill = emFn(ver, maxiter = 1500, tol = tol, n.eof = min(n.max, ttls.lim), type = "ttls", covr = FALSE, progressive = c(T, 1, T, 1), DoFL= 12, plt = plt) ### Calculate and collate statistics for(k in 1:min(n.max, ttls.lim)) { stats = getStats(base, infill[[k]]$Xhat, cal.idx) gnd.stats[[i]][[j]][1:3, k] = stats[[1]][, ncol(ver) + 1] gnd.stats[[i]][[j]][4:7, k] = stats[[2]][, ncol(ver) + 1] gnd.stats[[i]][[j]][8:10, k] = stats[[1]][, ncol(ver) + 2] gnd.stats[[i]][[j]][11:14, k] = stats[[2]][, ncol(ver) + 2] attr(gnd.stats[[i]][[j]], "Method") = method[i] attr(gnd.stats[[i]][[j]], "Ground Set") = form.names[j] } } ### Return gnd.stats } doGndStn = function(Y = Y.all, n.max = 10, tol = .0005, plt = c(T, 1), st = 1957, en = c(2006, 12), form.st = 1, form.en = length(form.list), form = form.list, ver = form.ver, form.names = form.name) { ### Set time window Y = window(Y, st, en) Y = mScale(Y, -colMeans(Y, na.rm = TRUE), type = 1, add = TRUE, scale = FALSE, center = FALSE) ### Set up storage variables method = c("TSVD", "TTLS") gnd.stats = list() for (i in 1:2) { gnd.stats[[i]] = list() attr(gnd.stats[[i]], "Method") = method[i] for (j in form.st:form.en) { gnd.stats[[i]][[j]] = list() attr(gnd.stats[[i]][[j]], "Method") = method[i] attr(gnd.stats[[i]][[j]], "Ground Set") = form.names[j] } } ### Iterate through each ground set for (j in form.st:form.en) { ### Obtain the station complement and verification complement stn.set = parseStn(Y, form[[j]], ver[[j]]) base = stn.set[[1]] early = stn.set[[2]] late = stn.set[[3]] stns = stn.set[[4]] idx = stn.set[[5]] ### Map calibration data points cal.early = (early == base & !is.na(early)) cal.late = (late == base & !is.na(late)) ### TTLS cannot run with regpar > kavlr, so limit maximum ### regpar to minimum available stations at any time ttls.lim = min(c(rowSums(early / early, na.rm = TRUE), rowSums(late / late, na.rm = TRUE))) - 1 ################### ### Mode TSVD ################### ### Set mode variable i=1 ### Set up storage gnd.stats[[i]][[j]] = matrix(NA, nrow = 28, ncol = n.max) colnames(gnd.stats[[i]][[j]]) = paste(seq(1:n.max), "Retained EOFs") row.list = c("Cal. RMS Error", "Cal. Explained Variance", "Cal. Cor. Coef.", "Ver. RMS Error", "Ver. RE", "Ver. CE", "Ver. Cor. Coef.", "Cal. RMS Error", "Cal. Explained Variance", "Cal. Cor. Coef.", "Ver. RMS Error", "Ver. RE", "Ver. CE", "Ver. Cor. 
Coef.") rownames(gnd.stats[[i]][[j]]) = c(paste("Early Cal.", row.list), paste("Late Cal.", row.list)) attr(gnd.stats[[i]][[j]], "Info: ") = c("Rows 1 - 7 and 15 - 21 are station means; rows 8 - 14 and 22 - 28 are pointwise statistics.") ### Infill infill.early = emFn(early, maxiter = 5000, tol = tol, n.eof = n.max, type = "tsvd", covr = FALSE, DoFL = 12, progressive = c(T, 1, T, 1), plt = plt) infill.late = emFn(late, maxiter = 5000, tol = tol, n.eof = n.max, type = "tsvd", covr = FALSE, DoFL = 12, progressive = c(T, 1, T, 1), plt = plt) ### Calculate and collate statistics for(k in 1:n.max) { early.stats = getStats(base, infill.early[[k]]$Xhat, cal.early) late.stats = getStats(base, infill.late[[k]]$Xhat, cal.late) gnd.stats[[i]][[j]][1:3, k] = early.stats[[1]][, ncol(early) + 1] gnd.stats[[i]][[j]][4:7, k] = early.stats[[2]][, ncol(early) + 1] gnd.stats[[i]][[j]][8:10, k] = early.stats[[1]][, ncol(early) + 2] gnd.stats[[i]][[j]][11:14, k] = early.stats[[2]][, ncol(early) + 2] gnd.stats[[i]][[j]][15:17, k] = late.stats[[1]][, ncol(late) + 1] gnd.stats[[i]][[j]][18:21, k] = late.stats[[2]][, ncol(late) + 1] gnd.stats[[i]][[j]][22:24, k] = late.stats[[1]][, ncol(late) + 2] gnd.stats[[i]][[j]][25:28, k] = late.stats[[2]][, ncol(late) + 2] attr(gnd.stats[[i]][[j]], "Method") = method[i] attr(gnd.stats[[i]][[j]], "Ground Set") = form.names[j] } ################### ### Mode TTLS ################### ### Set mode variable i=2 ### Set up storage gnd.stats[[i]][[j]] = matrix(NA, nrow = 28, ncol = n.max) colnames(gnd.stats[[i]][[j]]) = paste(seq(1:n.max), "Retained EOFs") row.list = c("Cal. RMS Error", "Cal. Explained Variance", "Cal. Cor. Coef.", "Ver. RMS Error", "Ver. RE", "Ver. CE", "Ver. Cor. Coef.", "Cal. RMS Error", "Cal. Explained Variance", "Cal. Cor. Coef.", "Ver. RMS Error", "Ver. RE", "Ver. CE", "Ver. Cor. 
Coef.") rownames(gnd.stats[[i]][[j]]) = c(paste("Early Cal.", row.list), paste("Late Cal.", row.list)) attr(gnd.stats[[i]][[j]], "Info: ") = c("Rows 1 - 7 and 15 - 21 are station means; rows 8 - 14 and 22 - 28 are pointwise statistics.") ### Infill infill.early = emFn(early, maxiter = 5000, tol = tol, n.eof = n.max, type = "ttls", covr = FALSE, DoFL = 12, progressive = c(T, 1, T, 1), plt = plt) infill.late = emFn(late, maxiter = 5000, tol = tol, n.eof = n.max, type = "ttls", covr = FALSE, DoFL = 12, progressive = c(T, 1, T, 1), plt = plt) ### Calculate and collate statistics for(k in 1:min(n.max, ttls.lim)) { early.stats = getStats(base, infill.early[[k]]$Xhat, cal.early) late.stats = getStats(base, infill.late[[k]]$Xhat, cal.late) gnd.stats[[i]][[j]][1:3, k] = early.stats[[1]][, ncol(early) + 1] gnd.stats[[i]][[j]][4:7, k] = early.stats[[2]][, ncol(early) + 1] gnd.stats[[i]][[j]][8:10, k] = early.stats[[1]][, ncol(early) + 2] gnd.stats[[i]][[j]][11:14, k] = early.stats[[2]][, ncol(early) + 2] gnd.stats[[i]][[j]][15:17, k] = late.stats[[1]][, ncol(late) + 1] gnd.stats[[i]][[j]][18:21, k] = late.stats[[2]][, ncol(late) + 1] gnd.stats[[i]][[j]][22:24, k] = late.stats[[1]][, ncol(late) + 2] gnd.stats[[i]][[j]][25:28, k] = late.stats[[2]][, ncol(late) + 2] attr(gnd.stats[[i]][[j]], "Method") = method[i] attr(gnd.stats[[i]][[j]], "Ground Set") = form.names[j] } } ### Return gnd.stats } doRls = function (Y, X = all.avhrr, form = form.grid.96, v.min = 1, v.max = 300, h.min = 0.05, h.max = 10, tol = 0.001, DoFL = 12, st.y = 1957, en.y = c(2006,12), st.x = 1982, en.x = c(2006,12)) { ### Setup storage err = vector() h = vector() external = list() ### Get predictor station grid cells external$Y.ver = window(Y.all[, -form], st.y, en.y) external$Y.ver = mScale(external$Y.ver, -colMeans(external$Y.ver, na.rm = TRUE), type = 1, add = TRUE, scale = FALSE, center = FALSE) na.ver = is.na(external$Y.ver) Y.idx = all.idx[-form, 4] external$Y.idx = Y.idx Y = window(Y, st.y, en.y) Y = mScale(Y, -colMeans(Y, na.rm = TRUE), type = 1, add = TRUE, scale = FALSE, center = FALSE) tsp(Y) = NULL external$Y = matrix(Y, ncol = ncol(Y)) ### Get verification stations for optimization form.ver = form[!is.element(form, form.grid)] Z = Y.all[, form.ver] Z = window(Z, st.y, en.y) Z = mScale(Z, -colMeans(Z, na.rm = TRUE), type = 1, add = TRUE, scale = FALSE, center = FALSE) Z = matrix(Z, ncol = ncol(Z)) colnames(Z) = all.idx[form.ver, 1] external$Z = Z external$na.ver = cbind(is.na(Z), na.ver) external$Z.idx = all.idx[form.ver, 4] ### Get spatial structure X = makeAnom(window(X, st.x, en.x)) X.svd = svd(X) ### Find rank rank = max(1, min(v.max, sum(X.svd$d > sqrt(.Machine$double.eps) * 10))) ### Extract components external$V = matrix(X.svd$v[, 1:rank], ncol = rank) external$d = X.svd$d[1:rank] external$U = matrix(X.svd$u[, 1:rank], ncol = rank) external$n.eff = nrow(X) - DoFL ### Define subcomponents external$VY = matrix(external$V[external$Y.idx, ], ncol = rank) external$LY = mScale(external$VY, external$d / sqrt(external$n.eff), type = 1, scale = TRUE, add = FALSE, center = FALSE) external$VZt = t(matrix(external$V[external$Z.idx, ], ncol = rank)) ### Prepare spatial EOFs external$LYt = external$LY2 = list() external$LYt[[1]] = t(external$LY) external$LY2[[1]] = external$LYt[[1]] %*% external$LY ### Prepare censored EOFs for internal cross-validation for (j in 1:ncol(Y)) { external$LYt[[j + 1]] = external$LYt[[1]][, -j] external$LY2[[j + 1]] = external$LYt[[j + 1]] %*% t(external$LYt[[j + 1]]) } ### Iterate through v.min to 
v.max for (i in v.min:v.max) { ### Find rank external$r = r = min(i, rank) ### Perform regression temp = getRls(Y, X, form, i, h.min, h.max, tol, DoFL, st.y, en.y, st.x, en.x, external) err[i] = temp$err h[i] = temp$h } ### Return rls.stats = list(err, h) rls.stats } cvGnd = function (Yo, type = "iridge", tol = 0.0005, maxiter = 500, DoFL = 12, n.eof = NA, plt = c(T, 1), progressive = c(F, 1, F, 1), form = form.grid.96) { ### Set up storage cv.gnd = list() if (is.na(n.eof)) {idx = 1} else {idx = n.eof} ### Infill the supplied set with all series message(paste("Infilling ground set 1 of", ncol(Yo) + 1)) if (type != "tsvd") { temp = suppressMessages(emFn(Yo, type = type, tol = tol, maxiter = maxiter, n.eof = n.eof, DoFL = DoFL, plt = plt, progressive = progressive)) } else { temp = suppressMessages(emFn(Yo, type = type, tol = tol, maxiter = maxiter, n.eof = n.eof, DoFL = DoFL, plt = plt, progressive = c(T, 1, T, 1))) } ### Save results & covariance matrix (to speed up infilling when stations are withheld) cv.gnd[[1]] = list() cv.gnd[[1]]$X = temp[[idx]]$X cv.gnd[[1]]$C = temp[[idx]]$C grid.idx = all.idx[-form, 4] cv.gnd[[1]]$idx = grid.idx ### Repeat infilling with stations in unique grid cells withheld, one at a time for (i in 1:ncol(Yo)) { message(paste("Infilling ground set", i + 1, "of", ncol(Yo) + 1)) cv.gnd[[i + 1]] = list() g.idx = c(1:ncol(Yo))[grid.idx[i] == grid.idx] Y2 = Yo[, -g.idx] if (type != "tsvd") { temp = suppressMessages(emFn(Y2, C = cv.gnd[[1]]$C[-g.idx, -g.idx], type = type, tol = tol, maxiter = maxiter, n.eof = n.eof, DoFL = DoFL, plt = plt, progressive = progressive)) } else { temp = suppressMessages(emFn(Y2, type = type, tol = tol, maxiter = maxiter, n.eof = n.eof, DoFL = DoFL, plt = plt, progressive = c(T, 1, T, 1))) } cv.gnd[[i + 1]]$X = temp[[idx]]$X cv.gnd[[i + 1]]$idx = grid.idx[-g.idx] cv.gnd[[i + 1]]$omit = g.idx } ### Return cv.gnd } verEigen = function (cv.gnd, dat = all.avhrr, type = "tsvd", n.max = 8, tol = 0.005, form = 11, DoFL = 12, alpha = 10, pc.max = 150, maxiter = 50, st = 1957, en = c(2006,12), forms = form.list) { ### Get satellite anomalies dat = makeAnom(dat) ### Center Y.ver Y.ver = window(Y.all[, -forms[[form]]], st, en) Y.ver = mScale(Y.ver, -colMeans(Y.ver, na.rm = TRUE), type = 1, add = TRUE, scale = FALSE, center = FALSE) na.ver = is.na(Y.ver) Y.idx = all.idx[-forms[[form]], 4] ### Get original data for unused stations form.ver = forms[[form]][!is.element(forms[[form]], form.grid)] Z = Y.all[, form.ver] Z = window(Z, st, en) Z = mScale(Z, -colMeans(Z, na.rm = TRUE), type = 1, add = TRUE, scale = FALSE, center = FALSE) Z = matrix(Z, ncol = ncol(Z)) Z.len = ncol(Z) colnames(Z) = all.idx[form.ver, 1] na.ver = cbind(is.na(Z), na.ver) Z.idx = all.idx[form.ver, 4] Z = cbind(Z, Y.ver) ### Get correlation AVHRR PCs & weights pc.data = getPcs(dat, st = start(dat), en = en, form = form, covr = FALSE, DoFL = DoFL) U = ts(pc.data[[4]], st = start(dat), freq = 12) d = pc.data[[3]] D = pc.data[[5]] DY = D[Y.idx] DZ = D[Z.idx] D = c(DZ, DY) V = pc.data[[2]] VY = V[Y.idx, ] VZ = V[Z.idx, ] V = rbind(VZ, VY) Y.wts = pc.data[[9]] rownames(Y.wts) = colnames(Y.ver) ### Set up storage variables ver.estimates = matrix(0, nrow = nrow(cv.gnd[[1]]$X), ncol = ncol(Z)) colnames(ver.estimates) = c(all.idx[form.ver, 1], colnames(Y.ver)) ### Iterate through each station in cv.gnd for (s in 1:length(cv.gnd)) { ### Get ground data Y = cv.gnd[[s]]$X Y = mScale(Y, -colMeans(Y), type = 1, add = TRUE, scale = FALSE, center = FALSE) rank = min(pc.max, length(d)) 
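### Pass s = 1 infills the AVHRR PCs using the complete ground set; each
### subsequent pass repeats the infilling with one unique-grid-cell
### station withheld (as stored in cv.gnd by cvGnd), so the estimates at
### the withheld location can be verified against the ground data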
message(paste("\n\nReconstruction", s, "of", length(cv.gnd))) ### Set up storage variables sat.stats = matrix(nrow = n.max + 2, ncol = rank) colnames(sat.stats) = paste("Recon PCs", c(1:rank)) rownames(sat.stats) = c(paste("RMS Error, k =", c(1:n.max)), "Min Error at k =", "Recon RMS Error:") attr(sat.stats, "Method") = type infilled.Xhat = infilled.X = list() for(i in 1:n.max) { ### Storage for full recon PCs - unspliced infilled.Xhat[[i]] = matrix(nrow = nrow(Y), ncol = rank) colnames(infilled.Xhat[[i]]) = paste("AVHRR PC #", c(1:rank), sep = "") attr(infilled.Xhat[[i]], "Method") = type attr(infilled.Xhat[[i]], "Truncation Parameter") = i ### Storage for full recon PCs - spliced infilled.X[[i]] = matrix(nrow = nrow(Y), ncol = rank) colnames(infilled.X[[i]]) = paste("AVHRR PC #", c(1:rank), sep = "") attr(infilled.X[[i]], "Method") = type attr(infilled.X[[i]], "Truncation Parameter") = i } ### Setup optimal.pcs.Xhat = matrix(0, nrow = nrow(Y), ncol = rank) optimal.pcs.X = matrix(0, nrow = nrow(Y), ncol = rank) recon = matrix(0, nrow = nrow(Y), ncol = ncol(Z)) if (s > 1) {recon = recon[, -1]} Z.temp = Z V.temp = V D.temp = D na.temp = na.ver if (s > 1) { Z.temp = Z.temp[, -(s - 1 + Z.len)] V.temp = V.temp[-(s - 1 + Z.len), ] D.temp = D.temp[-(s - 1 + Z.len)] na.temp = na.temp[, -(s - 1 + Z.len)] } ### Iterate through each PC, up to pc.max for (j in 1:rank) { wts = Y.wts if (s > 1) {wts = Y.wts[-(s - 1), ]} ### Set up X = cbind(Y, U[, j]) wts = as.vector(c(alpha * wts[, j], 1)) ### Infill message(paste("Infilling AVHRR PC #", j)) infill.Y = suppressMessages(emFn(X, n.eof = n.max, maxiter = maxiter, tol = tol, type = type, wts = wts, progressive = c(T, 1, T, 1), plt = c(T, ncol(X)), DoFL = DoFL)) ### Save PCs for (k in 1:n.max) { infilled.Xhat[[k]][, j] = infill.Y[[k]]$Xhat[, ncol(X)] * d[j] infilled.X[[k]][, j] = infill.Y[[k]]$X[, ncol(X)] * d[j] } } message("\nOptimizing . . . ") ### Get verification stats for (j in 1:rank) { for (k in 1:n.max) { recon.eof = (matrix(infilled.Xhat[[k]][, j], ncol = 1) %*% t(V.temp[, j])) %*% diag(D.temp) recon.eof[na.temp] = NA recon.eof = mScale(recon.eof, -colMeans(recon.eof, na.rm = TRUE), type = 1, add = TRUE, center = FALSE, scale = FALSE) sat.stats[k, j] = sqrt(sum((Z.temp - recon.eof) ^ 2, na.rm = TRUE) / sum(na.ver)) } ### Record truncation value for minimum error sat.stats[n.max + 1, j] = c(1:n.max)[min(sat.stats[1:n.max, j]) == sat.stats[1:n.max, j]] ### Save the PC optimal.pcs.Xhat[, j] = infilled.Xhat[[sat.stats[n.max + 1, j]]][, j] optimal.pcs.X[, j] = infilled.X[[sat.stats[n.max + 1, j]]][, j] ### Calculate recon verification stats recon = recon + (matrix(optimal.pcs.Xhat[, j], ncol = 1) %*% t(V.temp[, j])) %*% diag(D.temp) recon[na.temp] = NA recon = mScale(recon, -colMeans(recon, na.rm = TRUE), type = 1, add = TRUE, center = FALSE, scale = FALSE) sat.stats[n.max + 2, j] = sqrt(sum((Z.temp - recon) ^ 2, na.rm = TRUE) / sum(na.temp)) } ### Find minimum recon rms min.rms = min(sat.stats[n.max + 2, ]) rank = c(1:rank)[sat.stats[n.max + 2, ] == min(sat.stats[n.max + 2, ])] message(paste(". . . 
complete.\nMinimum RMS error of", round(min.rms, 5), "at", rank, "retained satellite PCs.")) ### Store estimates if (s == 1) { ver.estimates[, 1:Z.len] = ((optimal.pcs.Xhat[, 1:rank] %*% t(V[, 1:rank])) %*% diag(D))[, 1:Z.len] } else { ver.estimates[, Z.len + s - 1] = ((optimal.pcs.Xhat[, 1:rank] %*% t(V[, 1:rank])) %*% diag(D))[, Z.len + s - 1] } } ### Return ver.eigen = list() ver.eigen$ver = ts(Z, start = st, freq = 12) ver.eigen$estimates = ts(ver.estimates, start = st, freq = 12) ver.eigen } verRls = function (cv.gnd, X = all.avhrr, form = form.grid.96, v.min = 1, v.max = 300, h.min = 0.05, h.max = 10, DoFL = 12, tol = 0.001, st.y = 1957, en.y = c(2006,12), st.x = 1982, en.x = c(2006,12), external = NA) { ### Setup storage external = list() external[[1]] = list() ### Get location vectors and original station data ### Get predictor station grid cells Y.ver = window(Y.all[, -form], st.y, en.y) tsp(Y.ver) = NULL external[[1]]$Y.ver = Y.ver = mScale(Y.ver, -colMeans(Y.ver, na.rm = TRUE), type = 1, add = TRUE, scale = FALSE, center = FALSE) na.ver = is.na(Y.ver) external[[1]]$Y.idx = Y.idx = all.idx[-form, 4] ### Get verification stations for optimization form.ver = form[!is.element(form, form.grid)] Z = Y.all[, form.ver] Z = window(Z, st.y, en.y) tsp(Z) = NULL external[[1]]$Z = Z = mScale(Z, -colMeans(Z, na.rm = TRUE), type = 1, add = TRUE, scale = FALSE, center = FALSE) external[[1]]$na.ver = na.ver = cbind(is.na(Z), na.ver) colnames(Z) = all.idx[form.ver, 1] external[[1]]$Z.idx = Z.idx = all.idx[form.ver, 4] ### Setup storage for estimates estimates = matrix(nrow = nrow(Y.ver), ncol = ncol(Y.ver) + ncol(Z)) estimates = ts(estimates, start = st.y, freq = 12) ### Get spatial structure X = makeAnom(window(X, st.x, en.x)) X.svd = svd(X) ### Find rank if (is.na(v.max)) {v.max = length(X.svd$d)} rank.eff = max(1, min(v.max, sum(X.svd$d > sqrt(.Machine$double.eps) * 10))) ### Extract components external[[1]]$V = V = matrix(X.svd$v[, 1:rank.eff], ncol = rank.eff) external[[1]]$d = d = X.svd$d[1:rank.eff] external[[1]]$U = U = matrix(X.svd$u[, 1:rank.eff], ncol = rank.eff) external[[1]]$n.eff = n.eff = nrow(X) - DoFL ### Define subcomponents external[[1]]$Y = Y = cv.gnd[[1]]$X external[[1]]$VY = VY = matrix(V[Y.idx, ], ncol = rank.eff) external[[1]]$LY = LY = mScale(VY, d / sqrt(n.eff), type = 1, scale = TRUE, add = FALSE, center = FALSE) external[[1]]$VZt = VZt = t(matrix(V[Z.idx, ], ncol = rank.eff)) ### Prepare spatial EOFs external[[1]]$LYt = LYt = external[[1]]$LY2 = LY2 = list() external[[1]]$LYt[[1]] = LYt[[1]] = t(LY) external[[1]]$LY2[[1]] = LY2[[1]] = LYt[[1]] %*% LY ### Prepare censored EOFs for internal cross-validation for (i in 1:ncol(Y)) { external[[1]]$LYt[[i + 1]] = LYt[[i + 1]] = LYt[[1]][, -i] external[[1]]$LY2[[i + 1]] = LY2[[i + 1]] = LYt[[i + 1]] %*% t(LYt[[i + 1]]) } ### Find optimal regularization parameter for full set and store estimates ### to unused stations temp.err = rep(1e12, v.max) min.err = rep(1e12, ncol(Y.ver) + 1) message("Finding optimal parameters for full set.") for (i in v.min:v.max) { message(paste("AVHRR EOFs 1 -", i)) external[[1]]$r = i temp = suppressMessages(getRls(cv.gnd[[1]]$X, X, form, i, h.min, h.max, tol, DoFL, st.y, en.y, st.x, en.x, external[[1]])) temp.err[i] = temp$err } min.err[1] = external[[1]]$r = c(1:v.max)[temp.err == min(temp.err)] full = suppressMessages(getRls(Yo[[1]], X, form, min.err[1], h.min, h.max, tol, DoFL, st.y, en.y, st.x, en.x, external[[1]])$Xhat) estimates[, 1:ncol(Z)] = full %*% matrix(VZt[1:min.err[1], ], nrow = 
min.err[1]) message(paste("Complete. Optimal AVHRR EOFs:", min.err[1], "\n")) ### Set up for withholding by each predictor for (i in 1:ncol(Y)) { ### Extract the appropriate components grid.idx = cv.gnd[[i + 1]]$idx omit = cv.gnd[[i + 1]]$omit external[[2]] = list() external[[2]]$Y = cv.gnd[[i + 1]]$X external[[2]]$Y.ver = Y.ver[, -omit] external[[2]]$Y.idx = grid.idx external[[2]]$Z = Z external[[2]]$na.ver = cbind(is.na(external[[2]]$Z), is.na(external[[2]]$Y.ver)) external[[2]]$Z.idx = Z.idx external[[2]]$d = d external[[2]]$n.eff = n.eff external[[2]]$V = V external[[2]]$U = U external[[2]]$VY = matrix(VY[-omit, ], ncol = rank.eff) external[[2]]$VZt = VZt external[[2]]$LY = matrix(LY[-omit, ], ncol = rank.eff) external[[2]]$LYt = external[[2]]$LY2 = list() external[[2]]$LYt[[1]] = t(external[[2]]$LY) external[[2]]$LY2[[1]] = external[[2]]$LYt[[1]] %*% external[[2]]$LY for (j in 1:ncol(external[[2]]$Y)) { external[[2]]$LYt[[j + 1]] = external[[2]]$LYt[[1]][, -j] external[[2]]$LY2[[j + 1]] = external[[2]]$LYt[[j + 1]] %*% t(external[[2]]$LYt[[j + 1]]) } ### Find optimal regularization parameter for full set and store estimates ### to unused stations temp.err = rep(1e12, v.max) message(paste("Finding optimal parameters based on station #", i, "being withheld.")) for (j in v.min:v.max) { message(paste("AVHRR EOFs 1 -", j)) external[[2]]$r = j temp = suppressMessages(getRls(external[[2]]$Y, X, form, j, h.min, h.max, tol, DoFL, st.y, en.y, st.x, en.x, external[[2]])) temp.err[j] = temp$err } ### Record minimum error min.err[i + 1] = c(1:v.max)[temp.err == min(temp.err)] external[[2]]$r = min.err[i + 1] ### Get the PC estimates est = suppressMessages(getRls(external[[2]]$Y, X, form, min.err[i + 1], h.min, h.max, tol, DoFL, st.y, en.y, st.x, en.x, external[[2]])$Xhat) ### Get the station estimate estimates[, ncol(Z) + i] = est %*% t(matrix(external[[1]]$VY[i, 1:min.err[i + 1]], ncol = min.err[i + 1])) message(paste("Complete. 
Optimal AVHRR EOFs:", min.err[i + 1], "\n")) } ### Return full.ver = list() full.ver$ver = ts(cbind(Z, Y.ver), start = st.y, freq = 12) full.ver$estimates = ts(estimates, start = st.y, freq = 12) full.ver$err = min.err full.ver } verS09 = function (Y = S09, default.pcs = S09.pcs) { ### Get S09 stations Y.ver = window(Y.all, 1957, c(2006, 12)) Y.ver = Y.ver[, -form.S09] Y.ver = mScale(Y.ver, -colMeans(Y.ver, na.rm = TRUE), type = 1, add = TRUE, scale = FALSE, center = FALSE) na.ver = is.na(Y.ver) tsp(Y.ver) = NULL Y.idx = all.idx[-form.S09, 4] ### Get unused stations form.ver = form.S09[!is.element(form.S09, form.grid)] Z = Y.all[, form.ver] Z = window(Z, 1957, c(2006,12)) tsp(Z) = NULL Z = mScale(Z, -colMeans(Z, na.rm = TRUE), type = 1, add = TRUE, scale = FALSE, center = FALSE) na.ver = na.ver = cbind(is.na(Z), na.ver) colnames(Z) = all.idx[form.ver, 1] Z.idx = all.idx[form.ver, 4] ### Setup storage estimates = matrix(0, nrow = 600, ncol = 98) ### Get unused station estimates estimates[, 1:ncol(Z)] = (default.pcs$X %*% t(default.pcs$V[Z.idx, ])) %*% diag(default.pcs$D[Z.idx]) colnames(estimates) = c(colnames(Z), colnames(Y.ver)) ### Leave out one station at a time & get estimates for (i in 1:42) { message(paste("Infilling PCs station", i, "of", 42, "withheld.")) infilled.S09 = suppressMessages(emFn(Y[, -i], type = "ttls", DoFL = 1, maxiter = 50, tol = 0.001, n.eof = 3, plt = c(T, 42), progressive = c(F, 1, F, 1))) pcs = infilled.S09[[3]]$X[, 42:44] estimates[, ncol(Z) + i] = (pcs %*% t(matrix(default.pcs$V[Y.idx[i], ], nrow = 1))) %*% diag(default.pcs$D[Y.idx[i]], ncol = 1) } ### Remove off-grid stations off.grid = all.idx[form.S09.no.ocean, 1] off.grid = c(1:42)[is.element(colnames(Y.ver), off.grid)] Y.ver = Y.ver[, -off.grid] estimates = estimates[, -(ncol(Z) + off.grid)] ver = cbind(Z, Y.ver) ### Return S09.ver = list() S09.ver$ver = ts(ver, start = 1957, freq = 12) S09.ver$estimates = ts(estimates, start = 1957, freq = 12) S09.ver } ########################################################################################## ###################### END IMPUTATION EXPERIMENTS ###################### ########################################################################################## ########################################## ### SEGMENT: RECONSTRUCTION FUNCTIONS ### ########################################## ### ### 1. getPcs: Returns a list of the following: ### ### [[1]] Principal components as a time series ### [[2]] Spatial singular vectors ### [[3]] Singular values ### [[4]] Temporal singular vectors ### [[5]] Standardization vector ### [[6]] Spatial EOFs ### [[7]] Spatial EOFs at ground station locations ### [[8]] Spatial eigenvectors at ground station locations ### [[9]] Normalized (sum-of-squares = 1.0) spatial singular ### vectors at ground station locations ### ### NOTE: Data set does not need to be anomalies, but must be a ### time series. ### ### dat: Input data set ### ### st: Start date for processing ### ### en: End date for processing ### ### form: Ground station index vector ### ### covr: (Logical) Covariance (TRUE) or standardized (FALSE)? ### ### DoFL: Degrees of freedom lost ### ######################################################################################### ### ### 2. 
getEigen: Wrapper function for emFn() for infilling satellite components ### using a WLS concept for the ground stations ### ### Y: Ground data (must be a time series) ### ### dat: AVHRR data (must be a time series) ### ### type: Regression method ("tsvd" or "ttls") ### ### n.max: Maximum number of retained EOFs for each regression ### ### tol: Convergence tolerance to use ### ### form: Ground station selection vector ### ### DoFL: Degrees of freedom lost (12 for monthly anomalies) ### ### alpha: Scaling factor when using eigenvector weighting ### ### pc.max: Maximum number of AVHRR PCs to include ### ### maxiter: Maximum number of iterations to convergence ### ### st: Start period for the reconstruction ### ### en: End period for the reconstruction ### ### forms: The list of station clipping vectors ### ######################################################################################### ### ### 3. getRls: Function to generate principal components by using ### ground stations as predictors and AVHRR spatial data ### to provide the spatial coefficients. ### ### Y: Ground data (must be a time series) ### ### X: Data set to provide the spatial weights (AVHRR data) ### ### form: Ground station selection vector ### ### idx: Index for removing verification stations (vector of stations to REMOVE) ### ### v.max: Maximum number of spatial eigenvectors to use from X ### ### h.min: Minimum allowable regularization parameter (set as small as possible) ### ### h.max: Maximum allowable regularization parameter ### ### tol: Tolerance to use in the optimization function ### ### DoFL: Degrees of freedom lost (12 for monthly anomalies) ### ### st.y: Reconstruction start year ### ### en.y: Reconstruction end year ### ### st.x: Satellite data start year ### ### en.x: Satellite data end year ### ### external: Externally parsed data to be fed to getRls (used by doRls) ### ######################################################################################### ### ### 4. getRecon: Function to generate a reconstruction using a fully infilled ### set of principal components. ### ### est.pcs: The estimated PCs ### ### verbose: (Logical) If true, calculates maps of p-values for all trends ### ### compare.S09: (Logical) If true, compares all trend and significance maps ### to S09 ### ######################################################################################### ### ### 5. offsetArea: Function to calculate anomaly offsets for the nearest- ### station reconstruction. 
### ### Yo: Station data, presented as a time series of anomalies ### ### maxoverlap: Maximum number of overlapping data points to use ### for offset calculations ### getPcs = function (dat, st = 1982, en = c(2006, 12), form = form.grid.96, covr = FALSE, DoFL = 12) { ### Define period dat = makeAnom(window(dat, st, en)) ### Initialize scaling D = rep(1, ncol(dat)) n.eff = nrow(dat) - DoFL ### If covr=FALSE, scale to correlation if (covr == FALSE) { D = sqrt(colSums(dat ^ 2) / n.eff) dat = mScale(dat, 1 / D, type = 1, scale = TRUE, add = FALSE, center = FALSE) dat = ts(dat, st, freq = 12) } ### Calculate the singular value decomposition of the input matrix dat.svd = svd(t(dat)) ### Find rank rank.eff = max(1, sum(dat.svd$d > sqrt(.Machine$double.eps) * 10)) ### Store U = matrix(dat.svd$v[, 1:rank.eff], ncol = rank.eff) d = dat.svd$d[1:rank.eff] V = matrix(dat.svd$u[, 1:rank.eff], ncol = rank.eff) ### Generate PCs pcs = ts(mScale(U, d, scale = TRUE, add = FALSE, center = FALSE), start = st, freq = 12) ### Calculate LY (Spatial EOFs at ground stations) L = mScale(V, d / sqrt(n.eff), scale = TRUE, add = FALSE, center = FALSE) LY = matrix(L[all.idx[-form.list[[form]], 4], ], ncol = rank.eff) ### Calculate VY (eigenvector coefficients at ground stations) VY = matrix(V[all.idx[-form.list[[form]], 4], ], ncol = rank.eff) ### Calculate V.wts (VY normalized to have a sum-of-squares of 1) V.wts = VY V.scale = sqrt(1 / colSums(VY ^ 2)) V.wts = mScale(VY, V.scale, type = 1, scale = TRUE, add = FALSE, center = FALSE) ### Return the desired number of PCs pc.data = list(pcs, V, d, U, D, L, LY, VY, V.wts) ### Return pc.data } getEigen = function(Y, dat = all.avhrr, type = "tsvd", n.max = 8, tol = 0.005, form = 11, DoFL = 12, alpha = 10, pc.max = 150, maxiter = 50, st = 1957, en = c(2006,12), forms = form.list) { ### Get satellite anomalies dat = makeAnom(dat) ### Center Y Y.ver = window(Y.all[, -forms[[form]]], st, en) Y.ver = mScale(Y.ver, -colMeans(Y.ver, na.rm = TRUE), type = 1, add = TRUE, scale = FALSE, center = FALSE) na.ver = is.na(Y.ver) Y.idx = all.idx[-forms[[form]], 4] Y = window(Y, st, en) Y = mScale(Y, -colMeans(Y, na.rm = TRUE), type = 1, add = TRUE, scale = FALSE, center = FALSE) ### Get original data for unused stations form.ver = forms[[form]][!is.element(forms[[form]], form.grid)] Z = Y.all[, form.ver] Z = window(Z, st, en) Z = mScale(Z, -colMeans(Z, na.rm = TRUE), type = 1, add = TRUE, scale = FALSE, center = FALSE) Z = matrix(Z, ncol = ncol(Z)) colnames(Z) = all.idx[form.ver, 1] na.ver = cbind(is.na(Z), na.ver) Z.idx = all.idx[form.ver, 4] Z = cbind(Z, Y.ver) ### Get correlation AVHRR PCs & weights pc.data = getPcs(dat, st = start(dat), en = en, form = form, covr = FALSE, DoFL = DoFL) U = ts(pc.data[[4]], st = start(dat), freq = 12) d = pc.data[[3]] D = pc.data[[5]] DY = D[Y.idx] DZ = D[Z.idx] D = c(DZ, DY) V = pc.data[[2]] VY = V[Y.idx, ] VZ = V[Z.idx, ] V = rbind(VZ, VY) Y.wts = pc.data[[9]] rownames(Y.wts) = colnames(Y) rank = min(pc.max, length(d)) ### Set up storage variables sat.stats = matrix(nrow = n.max + 2, ncol = rank) colnames(sat.stats) = paste("Recon PCs", c(1:rank)) rownames(sat.stats) = c(paste("RMS Error, k =", c(1:n.max)), "Min Error at k =", "Recon RMS Error:") attr(sat.stats, "Method") = type infilled.Xhat = infilled.X = list() for(i in 1:n.max) { ### Storage for full recon PCs - unspliced infilled.Xhat[[i]] = matrix(nrow = nrow(Y), ncol = rank) colnames(infilled.Xhat[[i]]) = paste("AVHRR PC #", c(1:rank), sep = "") attr(infilled.Xhat[[i]], "Method") = type 
attr(infilled.Xhat[[i]], "Truncation Parameter") = i ### Storage for full recon PCs - spliced infilled.X[[i]] = matrix(nrow = nrow(Y), ncol = rank) colnames(infilled.X[[i]]) = paste("AVHRR PC #", c(1:rank), sep = "") attr(infilled.X[[i]], "Method") = type attr(infilled.X[[i]], "Truncation Parameter") = i } optimal.pcs.Xhat = matrix(0, nrow = nrow(Y), ncol = rank) optimal.pcs.X = matrix(0, nrow = nrow(Y), ncol = rank) recon = matrix(0, nrow = nrow(Y), ncol = ncol(Z)) ### Iterate through each PC, up to pc.max for (j in 1:rank) { ### Set up X = cbind(Y, U[, j]) wts = as.vector(c(alpha * Y.wts[, j], 1)) ### Infill message(paste("Infilling AVHRR PC #", j)) infill.Y = suppressMessages(emFn(X, n.eof = n.max, maxiter = maxiter, tol = tol, type = type, wts = wts, progressive = c(T, 1, T, 1), plt = c(T, ncol(X)), DoFL = DoFL)) ### Save PCs for (k in 1:n.max) { infilled.Xhat[[k]][, j] = infill.Y[[k]]$Xhat[, ncol(X)] * d[j] infilled.X[[k]][, j] = infill.Y[[k]]$X[, ncol(X)] * d[j] } } message("\nOptimizing . . . ") ### Get verification stats for (j in 1:rank) { for (k in 1:n.max) { recon.eof = (matrix(infilled.Xhat[[k]][, j], ncol = 1) %*% t(V[, j])) %*% diag(D) recon.eof[na.ver] = NA recon.eof = mScale(recon.eof, -colMeans(recon.eof, na.rm = TRUE), type = 1, add = TRUE, center = FALSE, scale = FALSE) sat.stats[k, j] = sqrt(sum((Z - recon.eof) ^ 2, na.rm = TRUE) / sum(na.ver)) } ### Record truncation value for minimum error sat.stats[n.max + 1, j] = c(1:n.max)[min(sat.stats[1:n.max, j]) == sat.stats[1:n.max, j]] ### Save the PC optimal.pcs.Xhat[, j] = infilled.Xhat[[sat.stats[n.max + 1, j]]][, j] optimal.pcs.X[, j] = infilled.X[[sat.stats[n.max + 1, j]]][, j] ### Calculate recon verification stats recon = recon + (matrix(optimal.pcs.Xhat[, j], ncol = 1) %*% t(V[, j])) %*% diag(D) recon[na.ver] = NA recon = mScale(recon, -colMeans(recon, na.rm = TRUE), type = 1, add = TRUE, center = FALSE, scale = FALSE) sat.stats[n.max + 2, j] = sqrt(sum((Z - recon) ^ 2, na.rm = TRUE) / sum(na.ver)) } ### Find minimum recon rms min.rms = min(sat.stats[n.max + 2, ]) rank = c(1:pc.max)[sat.stats[n.max + 2, ] == min(sat.stats[n.max + 2, ])] message(paste(". . . complete.\nMinimum RMS error of", round(min.rms, 5), "at", rank, "retained satellite PCs.")) ### Return eigen = list() eigen$stats = sat.stats eigen$Xhat = ts(optimal.pcs.Xhat[, 1:rank], start = st, frequency = 12) eigen$X = ts(optimal.pcs.X[, 1:rank], start = st, frequency = 12) eigen$D = pc.data[[5]] eigen$V = pc.data[[2]][, 1:rank] eigen } getRls = function (Y, X = all.avhrr, form = form.grid.96, v.max = NA, h.min = 0.05, h.max = 10, tol = 0.001, DoFL = 12, st.y = 1957, en.y = c(2006,12), st.x = 1982, en.x = c(2006,12), external = NA) { message(paste("RLS using ", v.max, " satellite spatial eigenvectors.\n==============================================", sep = "")) ### Externally supplied data? 
if (identical(external, NA)) { ### Get predictor station grid cells Y.ver = window(Y.all[, -form], st.y, en.y) Y.ver = mScale(Y.ver, -colMeans(Y.ver, na.rm = TRUE), type = 1, add = TRUE, scale = FALSE, center = FALSE) na.ver = is.na(Y.ver) Y.idx = all.idx[-form, 4] Y = window(Y, st.y, en.y) Y = mScale(Y, -colMeans(Y, na.rm = TRUE), type = 1, add = TRUE, scale = FALSE, center = FALSE) tsp(Y) = NULL tsp(Y.ver) = NULL ### Get verification stations for optimization form.ver = form[!is.element(form, form.grid)] Z = Y.all[, form.ver] Z = window(Z, st.y, en.y) Z = mScale(Z, -colMeans(Z, na.rm = TRUE), type = 1, add = TRUE, scale = FALSE, center = FALSE) tsp(Z) = NULL na.ver = cbind(is.na(Z), na.ver) colnames(Z) = all.idx[form.ver, 1] Z.idx = all.idx[form.ver, 4] ### Get spatial structure X = makeAnom(window(X, st.x, en.x)) X.svd = svd(X) ### Find rank if (is.na(v.max)) {v.max = length(X.svd$d)} rank.eff = max(1, min(v.max, sum(X.svd$d > sqrt(.Machine$double.eps) * 10))) ### Extract components V = matrix(X.svd$v[, 1:rank.eff], ncol = rank.eff) d = X.svd$d[1:rank.eff] U = matrix(X.svd$u[, 1:rank.eff], ncol = rank.eff) n.eff = nrow(X) - DoFL ### Define subcomponents VY = matrix(V[Y.idx, ], ncol = rank.eff) LY = mScale(VY, d / sqrt(n.eff), type = 1, scale = TRUE, add = FALSE, center = FALSE) VZt = t(matrix(V[Z.idx, ], ncol = rank.eff)) ### Prepare spatial EOFs LYt = LY2 = list() LYt[[1]] = t(LY) LY2[[1]] = LYt[[1]] %*% LY ### Prepare censored EOFs for internal cross-validation for (i in 1:ncol(Y)) { LYt[[i + 1]] = LYt[[1]][, -i] LY2[[i + 1]] = LYt[[i + 1]] %*% t(LYt[[i + 1]]) } } else { ### Load the external data rank.eff = external$r Y = external$Y Y.ver = external$Y.ver Y.idx = external$Y.idx Z = external$Z na.ver = external$na.ver Z.idx = external$Z.idx V = matrix(external$V[, 1:rank.eff], ncol = rank.eff) d = external$d[1:rank.eff] U = matrix(external$U[, 1:rank.eff], ncol = rank.eff) n.eff = external$n.eff VY = matrix(external$VY[, 1:rank.eff], ncol = rank.eff) LY = matrix(external$LY[, 1:rank.eff], ncol = rank.eff) VZt = matrix(external$VZt[1:rank.eff, ], nrow = rank.eff) LY2 = LYt = list() LY2[[1]] = matrix(external$LY2[[1]][1:rank.eff, 1:rank.eff], ncol = rank.eff) LYt[[1]] = matrix(external$LYt[[1]][1:rank.eff, ], nrow = rank.eff) for (i in 1:ncol(Y)) { LYt[[i + 1]] = matrix(external$LYt[[1]][1:rank.eff, -i], nrow = rank.eff) LY2[[i + 1]] = LYt[[i + 1]] %*% t(LYt[[i + 1]]) } rm(external) } ### Define optimization function rlsOpt = function(h, LY2, LYt, Y, Y.ver, Z, n.eff, r, d, VY, VZt, na.ver) { ### Estimate PCs based on all predictor stations inv = solve(LY2[[1]] + diag(rep(h ^ 2, r), ncol = r)) %*% LYt[[1]] sol.pcs = mScale(Y %*% t(inv), d / sqrt(n.eff), type = 1, scale = TRUE, add = FALSE, center = FALSE) ### Find estimates for withheld stations (Z) sol = matrix(nrow = nrow(Z), ncol = ncol(Z) + ncol(Y)) sol[, 1:ncol(Z)] = sol.pcs %*% VZt ### Perform leave-one-out verification for predictor stations for (i in 1:ncol(Y)) { VZt = matrix(VY[i, ], ncol = 1) inv = solve(LY2[[i + 1]] + diag(rep(h ^ 2, r), ncol = r)) %*% LYt[[i + 1]] sol.pcs = mScale(Y[, -i] %*% t(inv), d / sqrt(n.eff), type = 1, scale = TRUE, add = FALSE, center = FALSE) sol[, ncol(Z) + i] = sol.pcs %*% VZt } ### Get error sol[na.ver] = NA sol = mScale(sol, -colMeans(sol, na.rm = TRUE), type = 1, add = TRUE, scale = FALSE, center = FALSE) err = sqrt(sum((sol - cbind(Z, Y.ver)) ^ 2, na.rm = TRUE) / sum(!na.ver)) err } ### Define cross-validation extraction function rlsCv = function(h, LY2, LYt, Y, Y.ver, Z, n.eff, r, d, VY, VZt, 
na.ver) { ### Estimate PCs based on all predictor stations inv = solve(LY2[[1]] + diag(rep(h ^ 2, r), ncol = r)) %*% LYt[[1]] sol.pcs = mScale(Y %*% t(inv), d / sqrt(n.eff), type = 1, scale = TRUE, add = FALSE, center = FALSE) ### Find estimates for withheld stations (Z) sol = matrix(nrow = nrow(Z), ncol = ncol(Z) + ncol(Y)) sol[, 1:ncol(Z)] = sol.pcs %*% VZt ### Perform leave-one-out verification for predictor stations for (i in 1:ncol(Y)) { VZt = matrix(VY[i, ], ncol = 1) inv = solve(LY2[[i + 1]] + diag(rep(h ^ 2, r), ncol = r)) %*% LYt[[i + 1]] sol.pcs = mScale(Y[, -i] %*% t(inv), d / sqrt(n.eff), type = 1, scale = TRUE, add = FALSE, center = FALSE) sol[, ncol(Z) + i] = sol.pcs %*% VZt } ### Get error sol[na.ver] = NA sol = mScale(sol, -colMeans(sol, na.rm = TRUE), type = 1, add = TRUE, scale = FALSE, center = FALSE) sol } ### Find optimal regularization parameter opt = optimize(rlsOpt, tol = tol, interval = c(h.min, h.max), LY2, LYt, Y, Y.ver, Z, n.eff, rank.eff, d, VY, VZt, na.ver) h = opt$minimum err = opt$objective message(paste("Regularization parameter: ", round(h, 5), sep = "")) message(paste("Minimum cross-validation RMS error: ", round(err, 5), "\n", sep = "")) ### Perform RLS regression inv = solve(LY2[[1]] + diag(rep(h ^ 2, rank.eff), ncol = rank.eff)) %*% LYt[[1]] sol.pcs = ts(mScale(Y %*% t(inv), d / sqrt(n.eff), type = 1, scale = TRUE, add = FALSE, center = FALSE), st.y, freq = 12) ### Get cross-validation estimates sol = rlsCv(h, LY2, LYt, Y, Y.ver, Z, n.eff, rank.eff, d, VY, VZt, na.ver) ### Store est.pcs = list() est.pcs$X = sol.pcs est.pcs$Xhat = sol.pcs est.pcs$V = V est.pcs$D = rep(1, ncol(X)) est.pcs$h = h est.pcs$err = err est.pcs$sol = sol est.pcs$ver = cbind(Z, Y.ver) ### Return est.pcs } getRecon = function (est.pcs, verbose = FALSE, compare.S09 = FALSE) { ### Get data Xhat = est.pcs$Xhat X = est.pcs$X V = est.pcs$V D = est.pcs$D st = start(Xhat) fr = frequency(Xhat) recon = list() splice = TRUE if (identical(Xhat, X)) {splice = FALSE} if (compare.S09 == TRUE) {verbose = TRUE} ### Get recons recon[[1]] = ts(mScale(Xhat %*% t(V), D, type = 1, scale = TRUE, add = FALSE, center = FALSE), start = st, freq = fr) if (splice == TRUE) { recon[[2]] = ts(mScale(X %*% t(V), D, type = 1, scale = TRUE, add = FALSE, center = FALSE), start = st, freq = fr) } else { recon[[2]] = recon[[1]] } message(" Elements in Returned List") message("=================================================================") message("[[1]]: Unspliced reconstruction [[2]]: Spliced reconstruction") message("=================================================================") ### Get regions recon[[3]] = recon[[4]] = list() recon[[3]][[1]] = ts(rowMeans(recon[[1]][, mask.east]), start = st, freq = fr) recon[[3]][[2]] = ts(rowMeans(recon[[1]][, mask.west]), start = st, freq = fr) recon[[3]][[3]] = ts(rowMeans(recon[[1]][, mask.west.a]), start = st, freq = fr) recon[[3]][[4]] = ts(rowMeans(recon[[1]][, mask.ross]), start = st, freq = fr) recon[[3]][[5]] = ts(rowMeans(recon[[1]][, mask.pen]), start = st, freq = fr) recon[[3]][[6]] = ts(rowMeans(recon[[1]]), start = st, freq = fr) if (splice == TRUE) { recon[[4]][[1]] = ts(rowMeans(recon[[2]][, mask.east]), start = st, freq = fr) recon[[4]][[2]] = ts(rowMeans(recon[[2]][, mask.west]), start = st, freq = fr) recon[[4]][[3]] = ts(rowMeans(recon[[2]][, mask.west.a]), start = st, freq = fr) recon[[4]][[4]] = ts(rowMeans(recon[[2]][, mask.ross]), start = st, freq = fr) recon[[4]][[5]] = ts(rowMeans(recon[[2]][, mask.pen]), start = st, freq = fr) recon[[4]][[6]] 
= ts(rowMeans(recon[[2]]), start = st, freq = fr) } else { recon[[4]] = recon[[3]] } message(" Time Series of Regional Averages") message("-----------------------------------------------------------------") message("[[3]]: Unspliced reconstruction [[4]]: Spliced reconstruction") message(" [[1]] <----- East Antarctic Avg -----> [[1]]") message(" [[2]] <----- West Antarctic Avg -----> [[2]]") message(" [[3]] <----- West (No Ross) Avg -----> [[3]]") message(" [[4]] <----- Ross Ice Shelf Avg -----> [[4]]") message(" [[5]] <----- Peninsula Avg -----> [[5]]") message(" [[6]] <----- Continental Avg -----> [[6]]") message("=================================================================") ### Get time-period trend maps & significance calculations recon[[5]] = recon[[6]] = recon[[7]] = recon[[8]] = list() X = c(1:600) / 120 recon[[5]][[1]] = lmFn(X[1:300], recon[[1]][1:300, ], sig = verbose, adj.DoF = TRUE) recon[[5]][[2]] = lmFn(X, recon[[1]], sig = verbose, adj.DoF = TRUE) recon[[5]][[3]] = lmFn(X[265:564], recon[[1]][265:564, ], sig = verbose, adj.DoF = TRUE) recon[[5]][[4]] = lmFn(X[301:600], recon[[1]][301:600, ], sig = verbose, adj.DoF = TRUE) if (splice == TRUE) { recon[[6]][[1]] = lmFn(X[1:300], recon[[2]][1:300, ], sig = verbose, adj.DoF = TRUE) recon[[6]][[2]] = lmFn(X, recon[[2]], sig = verbose, adj.DoF = TRUE) recon[[6]][[3]] = lmFn(X[265:564], recon[[2]][265:564, ], sig = verbose, adj.DoF = TRUE) recon[[6]][[4]] = lmFn(X[301:600], recon[[2]][301:600, ], sig = verbose, adj.DoF = TRUE) } else { recon[[6]] = recon[[5]] } message(" Spatial Trend Maps by Period (Deg C / Decade)") message("-----------------------------------------------------------------") message("[[5]]: Unspliced reconstruction [[6]]: Spliced reconstruction") message(" [[1]] <----- 1957 - 1982 -----> [[1]]") message(" [[2]] <----- 1957 - 2006 -----> [[2]]") message(" [[3]] <----- 1979 - 2003 -----> [[3]]") message(" [[4]] <----- 1982 - 2006 -----> [[4]]") if (verbose == TRUE) { SE = list() SE[[1]] = SE[[3]] = list() SE[[1]][[1]] = recon[[5]][[1]]$SE recon[[7]][[1]] = recon[[5]][[1]]$p recon[[7]][[2]] = recon[[5]][[2]]$p recon[[7]][[3]] = recon[[5]][[3]]$p recon[[7]][[4]] = recon[[5]][[4]]$p recon[[5]][[1]] = recon[[5]][[1]]$coef recon[[5]][[2]] = recon[[5]][[2]]$coef recon[[5]][[3]] = recon[[5]][[3]]$coef recon[[5]][[4]] = recon[[5]][[4]]$coef recon[[8]][[1]] = recon[[6]][[1]]$p recon[[8]][[2]] = recon[[6]][[2]]$p recon[[8]][[3]] = recon[[6]][[3]]$p recon[[8]][[4]] = recon[[6]][[4]]$p recon[[6]][[1]] = recon[[6]][[1]]$coef recon[[6]][[2]] = recon[[6]][[2]]$coef recon[[6]][[3]] = recon[[6]][[3]]$coef recon[[6]][[4]] = recon[[6]][[4]]$coef message("-----------------------------------------------------------------") message(" Two-Tailed Significance Tests (p > |t|)") message("-----------------------------------------------------------------") message("[[7]]: Unspliced reconstruction [[8]]: Spliced reconstruction") message(" [[1]] <----- 1957 - 1982 -----> [[1]]") message(" [[2]] <----- 1957 - 2006 -----> [[2]]") message(" [[3]] <----- 1979 - 2003 -----> [[3]]") message(" [[4]] <----- 1982 - 2006 -----> [[4]]") message("=================================================================") } else { recon[[5]][[1]] = recon[[5]][[1]]$coef recon[[5]][[2]] = recon[[5]][[2]]$coef recon[[5]][[3]] = recon[[5]][[3]]$coef recon[[5]][[4]] = recon[[5]][[4]]$coef recon[[6]][[1]] = recon[[6]][[1]]$coef recon[[6]][[2]] = recon[[6]][[2]]$coef recon[[6]][[3]] = recon[[6]][[3]]$coef recon[[6]][[4]] = recon[[6]][[4]]$coef 
message("=================================================================") } ### Get seasonal trend maps & significance calculations recon[[9]] = recon[[10]] = recon[[11]] = recon[[12]] = list() X = c(1:600) / 120 wint1 = makeSeason(recon[[1]], season = "w", south = TRUE) spri1 = makeSeason(recon[[1]], season = "sp", south = TRUE) summ1 = makeSeason(recon[[1]], season = "su", south = TRUE) fall1 = makeSeason(recon[[1]], season = "f", south = TRUE) recon[[9]][[1]] = lmFn(X, wint1, sig = verbose, adj.DoF = TRUE) recon[[9]][[2]] = lmFn(X, spri1, sig = verbose, adj.DoF = TRUE) recon[[9]][[3]] = lmFn(X, summ1, sig = verbose, adj.DoF = TRUE) recon[[9]][[4]] = lmFn(X, fall1, sig = verbose, adj.DoF = TRUE) if (splice == TRUE) { wint2 = makeSeason(recon[[2]], season = "w", south = TRUE) spri2 = makeSeason(recon[[2]], season = "sp", south = TRUE) summ2 = makeSeason(recon[[2]], season = "su", south = TRUE) fall2 = makeSeason(recon[[2]], season = "f", south = TRUE) recon[[10]][[1]] = lmFn(X, wint2, sig = verbose, adj.DoF = TRUE) recon[[10]][[2]] = lmFn(X, spri2, sig = verbose, adj.DoF = TRUE) recon[[10]][[3]] = lmFn(X, summ2, sig = verbose, adj.DoF = TRUE) recon[[10]][[4]] = lmFn(X, fall2, sig = verbose, adj.DoF = TRUE) } else { recon[[10]] = recon[[9]] } message(" Seasonal Trend Maps (Deg C / Decade)") message("-----------------------------------------------------------------") message("[[9]]: Unspliced reconstruction [[10]]: Spliced reconstruction") message(" [[1]] <----- Winter -----> [[1]]") message(" [[2]] <----- Spring -----> [[2]]") message(" [[3]] <----- Summer -----> [[3]]") message(" [[4]] <----- Autumn -----> [[4]]") if (verbose == TRUE) { SE[[1]][[2]] = recon[[9]][[1]]$SE SE[[1]][[3]] = recon[[9]][[2]]$SE SE[[1]][[4]] = recon[[9]][[3]]$SE SE[[1]][[5]] = recon[[9]][[4]]$SE recon[[11]][[1]] = recon[[9]][[1]]$p recon[[11]][[2]] = recon[[9]][[2]]$p recon[[11]][[3]] = recon[[9]][[3]]$p recon[[11]][[4]] = recon[[9]][[4]]$p recon[[9]][[1]] = recon[[9]][[1]]$coef recon[[9]][[2]] = recon[[9]][[2]]$coef recon[[9]][[3]] = recon[[9]][[3]]$coef recon[[9]][[4]] = recon[[9]][[4]]$coef recon[[12]][[1]] = recon[[10]][[1]]$p recon[[12]][[2]] = recon[[10]][[2]]$p recon[[12]][[3]] = recon[[10]][[3]]$p recon[[12]][[4]] = recon[[10]][[4]]$p recon[[10]][[1]] = recon[[10]][[1]]$coef recon[[10]][[2]] = recon[[10]][[2]]$coef recon[[10]][[3]] = recon[[10]][[3]]$coef recon[[10]][[4]] = recon[[10]][[4]]$coef message("-----------------------------------------------------------------") message(" Two-Tailed Significance Tests (p > |t|)") message("-----------------------------------------------------------------") message("[[11]]: Unspliced reconstruction [[12]]: Spliced reconstruction") message(" [[1]] <----- Winter -----> [[1]]") message(" [[2]] <----- Spring -----> [[2]]") message(" [[3]] <----- Summer -----> [[3]]") message(" [[4]] <----- Autumn -----> [[4]]") message("=================================================================") } else { recon[[9]][[1]] = recon[[9]][[1]]$coef recon[[9]][[2]] = recon[[9]][[2]]$coef recon[[9]][[3]] = recon[[9]][[3]]$coef recon[[9]][[4]] = recon[[9]][[4]]$coef recon[[10]][[1]] = recon[[10]][[1]]$coef recon[[10]][[2]] = recon[[10]][[2]]$coef recon[[10]][[3]] = recon[[10]][[3]]$coef recon[[10]][[4]] = recon[[10]][[4]]$coef message("=================================================================") } ### S09 comparisons if (compare.S09 == TRUE) { recon[[13]] = recon[[14]] = list() wints = makeSeason(saved.S09.recon, season = "w", south = TRUE) spris = makeSeason(saved.S09.recon, 
season = "sp", south = TRUE) summs = makeSeason(saved.S09.recon, season = "su", south = TRUE) falls = makeSeason(saved.S09.recon, season = "f", south = TRUE) recon[[13]][[1]] = lmFn(X, saved.S09.recon, sig = verbose, adj.DoF = TRUE) recon[[13]][[2]] = lmFn(X, wints, sig = verbose, adj.DoF = TRUE) recon[[13]][[3]] = lmFn(X, spris, sig = verbose, adj.DoF = TRUE) recon[[13]][[4]] = lmFn(X, summs, sig = verbose, adj.DoF = TRUE) recon[[13]][[5]] = lmFn(X, falls, sig = verbose, adj.DoF = TRUE) SE[[3]][[1]] = recon[[13]][[1]]$SE SE[[3]][[2]] = recon[[13]][[2]]$SE SE[[3]][[3]] = recon[[13]][[3]]$SE SE[[3]][[4]] = recon[[13]][[4]]$SE SE[[3]][[5]] = recon[[13]][[5]]$SE recon[[14]][[1]] = recon[[13]][[1]]$p recon[[14]][[2]] = recon[[13]][[2]]$p recon[[14]][[3]] = recon[[13]][[3]]$p recon[[14]][[4]] = recon[[13]][[4]]$p recon[[14]][[5]] = recon[[13]][[5]]$p recon[[13]][[1]] = recon[[13]][[1]]$coef recon[[13]][[2]] = recon[[13]][[2]]$coef recon[[13]][[3]] = recon[[13]][[3]]$coef recon[[13]][[4]] = recon[[13]][[4]]$coef recon[[13]][[5]] = recon[[13]][[5]]$coef message(" S09 Maps (1957 - 2006)") message("-----------------------------------------------------------------") message("[[13]]: Trend (Deg C / Decade) [[14]]: Significance (p > |t|)") message(" [[1]] <----- All Seasons -----> [[1]]") message(" [[2]] <----- Winter -----> [[2]]") message(" [[3]] <----- Spring -----> [[3]]") message(" [[4]] <----- Summer -----> [[4]]") message(" [[5]] <----- Autumn -----> [[5]]") message("=================================================================") ### Generate significance maps (residual trend p-value) recon[[15]] = list() recon[[15]][[1]] = lmFn(X, saved.S09.recon - recon[[1]], sig = verbose, adj.DoF = TRUE)$p recon[[15]][[2]] = lmFn(X, wints - wint1, sig = verbose, adj.DoF = TRUE)$p recon[[15]][[3]] = lmFn(X, spris - spri1, sig = verbose, adj.DoF = TRUE)$p recon[[15]][[4]] = lmFn(X, summs - summ1, sig = verbose, adj.DoF = TRUE)$p recon[[15]][[5]] = lmFn(X, falls - fall1, sig = verbose, adj.DoF = TRUE)$p message(" Maps of p-values (1957 - 2006)") message("-----------------------------------------------------------------") message("[[15]]: Residual Trend p-value (2-Tailed)") message(" [[1]] <----- All Seasons") message(" [[2]] <----- Winter") message(" [[3]] <----- Spring") message(" [[4]] <----- Summer") message(" [[5]] <----- Autumn") message("=================================================================") ### Generate tabular summary recon[[16]] = matrix(nrow = 20, ncol = 8) comp.season = list() comp.season[[1]] = comp.season[[2]] = comp.season[[3]] = matrix(nrow = 600, ncol = 20) comp.season[[1]][, 1] = recon[[3]][[6]] comp.season[[1]][, 2] = recon[[3]][[1]] comp.season[[1]][, 3] = recon[[3]][[2]] comp.season[[1]][, 4] = recon[[3]][[5]] comp.season[[1]][, 5] = ts(rowMeans(wint1), start = st, freq = fr) comp.season[[1]][, 6] = ts(rowMeans(wint1[, mask.east]), start = st, freq = fr) comp.season[[1]][, 7] = ts(rowMeans(wint1[, mask.west]), start = st, freq = fr) comp.season[[1]][, 8] = ts(rowMeans(wint1[, mask.pen]), start = st, freq = fr) comp.season[[1]][, 9] = ts(rowMeans(spri1), start = st, freq = fr) comp.season[[1]][, 10] = ts(rowMeans(spri1[, mask.east]), start = st, freq = fr) comp.season[[1]][, 11] = ts(rowMeans(spri1[, mask.west]), start = st, freq = fr) comp.season[[1]][, 12] = ts(rowMeans(spri1[, mask.pen]), start = st, freq = fr) comp.season[[1]][, 13] = ts(rowMeans(summ1), start = st, freq = fr) comp.season[[1]][, 14] = ts(rowMeans(summ1[, mask.east]), start = st, freq = fr) 
comp.season[[1]][, 15] = ts(rowMeans(summ1[, mask.west]), start = st, freq = fr) comp.season[[1]][, 16] = ts(rowMeans(summ1[, mask.pen]), start = st, freq = fr) comp.season[[1]][, 17] = ts(rowMeans(fall1), start = st, freq = fr) comp.season[[1]][, 18] = ts(rowMeans(fall1[, mask.east]), start = st, freq = fr) comp.season[[1]][, 19] = ts(rowMeans(fall1[, mask.west]), start = st, freq = fr) comp.season[[1]][, 20] = ts(rowMeans(fall1[, mask.pen]), start = st, freq = fr) comp.season[[3]][, 1] = ts(rowMeans(saved.S09.recon), start = st, freq = fr) comp.season[[3]][, 2] = ts(rowMeans(saved.S09.recon[, mask.east]), start = st, freq = fr) comp.season[[3]][, 3] = ts(rowMeans(saved.S09.recon[, mask.west]), start = st, freq = fr) comp.season[[3]][, 4] = ts(rowMeans(saved.S09.recon[, mask.pen]), start = st, freq = fr) comp.season[[3]][, 5] = ts(rowMeans(wints), start = st, freq = fr) comp.season[[3]][, 6] = ts(rowMeans(wints[, mask.east]), start = st, freq = fr) comp.season[[3]][, 7] = ts(rowMeans(wints[, mask.west]), start = st, freq = fr) comp.season[[3]][, 8] = ts(rowMeans(wints[, mask.pen]), start = st, freq = fr) comp.season[[3]][, 9] = ts(rowMeans(spris), start = st, freq = fr) comp.season[[3]][, 10] = ts(rowMeans(spris[, mask.east]), start = st, freq = fr) comp.season[[3]][, 11] = ts(rowMeans(spris[, mask.west]), start = st, freq = fr) comp.season[[3]][, 12] = ts(rowMeans(spris[, mask.pen]), start = st, freq = fr) comp.season[[3]][, 13] = ts(rowMeans(summs), start = st, freq = fr) comp.season[[3]][, 14] = ts(rowMeans(summs[, mask.east]), start = st, freq = fr) comp.season[[3]][, 15] = ts(rowMeans(summs[, mask.west]), start = st, freq = fr) comp.season[[3]][, 16] = ts(rowMeans(summs[, mask.pen]), start = st, freq = fr) comp.season[[3]][, 17] = ts(rowMeans(falls), start = st, freq = fr) comp.season[[3]][, 18] = ts(rowMeans(falls[, mask.east]), start = st, freq = fr) comp.season[[3]][, 19] = ts(rowMeans(falls[, mask.west]), start = st, freq = fr) comp.season[[3]][, 20] = ts(rowMeans(falls[, mask.pen]), start = st, freq = fr) RO10 = lmFn(X, comp.season[[1]], sig = TRUE, ci = 0.95, adj.DoF = TRUE) S09 = lmFn(X, comp.season[[3]], sig = TRUE, ci = 0.95, adj.DoF = TRUE) resids = lmFn(X, comp.season[[3]] - comp.season[[1]], sig = TRUE, ci = 0.95, adj.DoF = TRUE) recon[[16]][, 1] = RO10$coef recon[[16]][, 2] = RO10$ci recon[[16]][, 3] = RO10$SE recon[[16]][, 4] = S09$coef recon[[16]][, 5] = S09$ci recon[[16]][, 6] = S09$SE recon[[16]][, 7] = resids$coef recon[[16]][, 8] = resids$ci colnames(recon[[16]]) = c("RO10 Trend", "RO10 95% CI", "RO10 SE", "S09 Trend", "S09 95% CI", "S09 SE", "Residual Trend", "Residual 95% CI") r.names = c("Cont", "East", "West", "Pen") rownames(recon[[16]]) = c(r.names, paste("Winter", r.names), paste("Spring", r.names), paste("Summer", r.names), paste("Fall", r.names)) message(" Tabulated Trends") message("-----------------------------------------------------------------") message("[[16]]: Matrix of tabulated trends, CIs, SEs, and Z-test results") message("=================================================================") } ### Return recon } offsetArea = function(Yo, maxoverlap = 60) { ### Initialize variables offset = rep(NA, ncol(Yo)) offset[!is.na(Yo[1, ])] = 0 ### Cycle through each year for (i in 1:600) { ### Create station mask for data with offsets already computed mask = is.na(offset) newmask = ((!is.na(Yo[i, ])) - !mask) > 0 ### Generate offsets if new stations have been introduced if (sum(newmask) > 0) { ### Iterate for each new station for (j in which(newmask == 
TRUE)) {
  ### Calculate distance
  dist = g.circ(surf.coord[j, 1], surf.coord[j, 2], surf.coord[, 1], surf.coord[, 2], 6371)
  ### Sort stations, smallest distance to largest
  hh = order(dist[!mask], decreasing = FALSE)
  ### Remove stations with no data
  namask = is.na(Yo[i, hh])
  hh = hh[!namask][1]
  ### Get the station information for the selected station
  stationindexb = which(dist == dist[!mask][hh])[1]
  ### Define raw station as a new object (easier to track)
  newstation = Yo[, j]
  ### Define offset station as new object
  oldstation = Yo[, stationindexb] + offset[stationindexb]
  maskb = (!is.na(newstation) & !is.na(oldstation))
  ### Define mask for maximum overlap
  if (sum(maskb) > maxoverlap) {
    ### Truncate if greater than maxoverlap
    oldvals = oldstation[maskb][1:maxoverlap]
    newvals = newstation[maskb][1:maxoverlap]
  } else {
    ### Use all if less than or equal to maxoverlap
    oldvals = oldstation[maskb]
    newvals = newstation[maskb]
  }
  ### Calculate offset value
  offset[j] = mean(oldvals) - mean(newvals)
}
}
}

### Return offsets
offset
}

############################################################################################
######################         END RECONSTRUCTION FUNCTIONS          ######################
############################################################################################

######################################
### SEGMENT: VARIABLE ASSIGNMENTS  ###
######################################
###
### The purpose of this section is to create turnkey code by removing
### the need to download separate indexing files for the different
### station configurations, closest grid cell annotations, geographical
### locations, satellite dates, calibration offset periods, and trend
### calculation periods.
###
### Form variables (for parsing stations):
###
###    S09:           removes all stations not used by S09
###    no.ocean:      removes all southern ocean stations
###    man.no.ocean:  removes all AWS and southern ocean stations
###    aws.no.ocean:  removes all manned and southern ocean stations
###    grid:          removes all stations not on the AVHRR data grid
###    man.grid:      removes all AWS stations and stations not on the AVHRR data grid
###    aws.grid:      removes all manned stations and stations not on the AVHRR data grid
###
### Suffixes:
###
###    .b:   Adelaide and Deception removed
###
###    .96:  Stations with fewer than 8 years (96 months) of data and Adelaide
###          and Deception removed
###
###    .ver: Stations designated for partial data removal for split calibration/
###          verification experiments. For verification with sets that do not include
###          AWS stations, stations with fewer than 16 years of data are selected. For
###          verification with sets that include AWS stations, the AWS stations are
###          selected.
###
### Casey_New_Airstrip is removed in all sets. It does not have enough points for
### anomaly calculations.
### Casey_Airstrip is removed in all sets. It does not have enough points for
### ground station verification testing.
### Dome_F is removed in all sets. It has only 1 data point.
### Terra_Nova_Bay is removed in all sets. It is a duplicate of Mario_Zucchelli.
### Hi_Priestley_Gl (Modesta) is removed in all sets. It displays a large difference
### in measured temperature following maintenance in ~1992.
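###
### Example (illustrative only): each form vector lists the station columns
### to REMOVE from Y.all, so a station complement is selected by negative
### indexing. For instance,
###
###    Y.grid.96 = Y.all[, -form.grid.96]
###
### keeps the on-grid stations with at least 96 months of data, Adelaide
### and Deception excluded ("Y.grid.96" is an arbitrary name).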
###

form.all = c(52, 53, 60, 74, 105)
form.S09 = c(17, 26, 45:109)
form.S09.no.ocean = c(7, 9, 12, 16, 18, 19, 21, 22, 24, 28, 36, 41)
form.no.ocean = c(9, 17, 19, 24, 26, 36, 41, 52, 53, 60, 74, 102, 105)
form.no.ocean.b = c(1, 9, 12, 17, 19, 24, 26, 36, 41, 52, 53, 60, 74, 102, 105)
form.no.ocean.96 = c(form.no.ocean.b, 4, 45, 48, 52, 56, 57, 58, 59, 61, 64, 69, 72, 75, 78, 80, 82, 84, 94, 95, 98, 100, 103, 104, 106)
form.man.no.ocean = c(form.S09, 9, 19, 24, 36, 41)
form.man.no.ocean.b = c(form.man.no.ocean, 1, 12)
form.man.no.ocean.96 = c(form.man.no.ocean.b, 4)
form.grid = c(7, 9, 16, 17, 18, 19, 21, 22, 24, 26, 28, 36, 41, 52, 53, 60, 74, 96, 98, 102, 105, 109)
form.grid.a = c(7, 9, 16, 17, 18, 19, 21, 22, 24, 26, 28, 36, 41, 52, 53, 60, 74, 96, 98, 102, 105, 109)
form.grid.b = c(form.grid, 1, 12)
form.grid.96 = c(form.grid.b, 4, 45, 48, 52, 56, 57, 58, 59, 61, 64, 69, 72, 75, 78, 80, 82, 84, 94, 95, 100, 103, 104, 106)
form.man.grid = c(7, 9, 12, 16, 17, 18, 19, 21, 22, 24, 26, 28, 36, 41, 45:109)
form.man.grid.b = c(form.man.grid, 1)
form.man.grid.96 = c(form.man.grid.b, 4)
form.aws.no.ocean = c(1:44, 52, 53, 60, 74, 102, 105)
form.aws.no.ocean.96 = c(form.aws.no.ocean, 45, 48, 52, 56, 57, 58, 59, 61, 64, 69, 72, 75, 78, 80, 82, 84, 94, 95, 98, 100, 103, 104, 106)
form.aws.grid = c(1:44, 52, 53, 60, 74, 96, 102, 105, 109)
form.aws.grid.96 = c(form.aws.grid, 45, 48, 52, 56, 57, 58, 59, 61, 64, 69, 72, 75, 78, 80, 82, 84, 94, 95, 98, 100, 103, 104, 106)
form.noaws.ver = c(1, 4, 6, 8, 12, 16, 17, 21, 22, 26, 27, 38, 44)
form.aws.ver = c(45:104, 106:109)
form.long = c(1:109)[-c(2, 8, 10, 11, 13, 15, 20, 29, 30, 31, 34, 40)]
form.long.ver = c(1:109)[-c(2, 8, 10, 11, 13, 15, 20, 29, 30, 31, 34, 40, 52, 53, 60, 74, 96, 102, 105, 109)]
form.list = list(form.all, form.S09, form.no.ocean, form.no.ocean.b, form.no.ocean.96, form.man.no.ocean, form.man.no.ocean.b, form.man.no.ocean.96, form.grid, form.grid.b, form.grid.96, form.man.grid, form.man.grid.b, form.man.grid.96, form.long)
form.name = c("Full", "S09", "No.Ocean.1A", "No.Ocean.1B", "No.Ocean.1C", "No.Ocean.2A", "No.Ocean.2B", "No.Ocean.2C", "Grid.1A", "Grid.1B", "Grid.1C", "Grid.2A", "Grid.2B", "Grid.2C", "Restricted Verification")
form.ver = list(form.aws.ver, form.noaws.ver, form.aws.ver, form.aws.ver, form.aws.ver, form.noaws.ver, form.noaws.ver, form.noaws.ver, form.aws.ver, form.aws.ver, form.aws.ver, form.noaws.ver, form.noaws.ver, form.noaws.ver, form.long.ver, 1:109)
form.ver.name = c("AWS", "No.AWS", "AWS", "AWS", "AWS", "No.AWS", "No.AWS", "No.AWS", "AWS", "AWS", "AWS", "No.AWS", "No.AWS", "No.AWS", "Restricted Verification", "All")

### Satellite periods
noaa = list(1:300, 1:150, 151:300, 1:36, 37:83, 84:151, 157:228, 229:282, 283:300)
noaa.date = list(1982, 1982, 1982, 1982, 1985, 1988.917, 1995, 2001, 2005.5)
noaa.lbls = c("1982-2006", "1982-1994", "1995-2006", "NOAA-7", "NOAA-9", "NOAA-11", "NOAA-14", "NOAA-16", "NOAA-17")

### Dummy label for geographical plotting
map.lbls = c("Deg C/Decade", "", "")

### Satellite offsets matrix
sat.cal.matrix = matrix(ncol = 5, nrow = 6)
sat.cal.matrix[1, ] = c("t-test", 143, 156, 143, 156)
sat.cal.matrix[2, ] = c("t-test", 84, 142, 84, 142)
sat.cal.matrix[3, ] = c("t-test", 37, 83, 37, 83)
sat.cal.matrix[4, ] = c("t-test", 157, 214, 157, 214)
sat.cal.matrix[5, ] = c("lm", 1, 36, 1, 36)
sat.cal.matrix[6, ] = c("t-test", 229, 282, 229, 282)

### Ground station names, lat/lon, nearest grid cells, distances, geographic group
temp.idx = matrix(nrow = 109, ncol = 10)
temp.idx[1, ] =
c("Adelaide", -67.8, 292.1, 48, 41, 40, 55, 54, 41.68639807, 1) temp.idx[2, ] = c( "Amundsen_Scott", -90, 0, 1970, 1971, 2040, 1900, 1969, 29.9732532, 4) temp.idx[3, ] = c( "Arturo_Prat", -62.5, 300.3, 3, 2, 4, 8, 9, 114.1041535, 1) temp.idx[4, ] = c( "Asuka", -71.5, 24.1, 3243, 3162, 3244, 3163, 3161, 44.99882239, 4) temp.idx[5, ] = c( "Belgrano_I", -78, 321.2, 926, 925, 879, 977, 976, 70.63341805, 4) temp.idx[6, ] = c( "Belgrano_II", -77.9, 325.4, 974, 1029, 973, 1028, 975, 27.52472898, 4) temp.idx[7, ] = c( "Bellingshausen", -62.2, 301.1, 2, 1, 3, 7, 8, 142.4397892, 1) temp.idx[8, ] = c( "Byrd", -80, 240, 817, 818, 864, 863, 816, 13.24221794, 2) temp.idx[9, ] = c( "Campbell", -52, 169, 4935, 4934, 4865, 4793, 4864, 1639.851284, 1) temp.idx[10, ] = c( "Casey", -66.3, 110.5, 5423, 5422, 5393, 5394, 5450, 68.60901916, 4) temp.idx[11, ] = c( "Davis", -68.6, 78, 5334, 5299, 5260, 5300, 5261, 32.453277, 4) temp.idx[12, ] = c( "Deception", -63, 299.3, 4, 3, 10, 9, 11, 108.4405257, 1) temp.idx[13, ] = c( "Dumont_Durville", -66.7, 140, 4576, 4500, 4501, 4499, 4575, 55.12623375, 4) temp.idx[14, ] = c( "Esperanza", -63.4, 303, 6, 1, 15, 7, 2, 31.74784973, 1) temp.idx[15, ] = c( "Faraday", -65.4, 295.6, 21, 22, 13, 29, 28, 56.14826582, 1) temp.idx[16, ] = c( "Ferraz", -62.1, 301.6, 1, 2, 6, 7, 3, 135.1830542, 1) temp.idx[17, ] = c( "Gough", -40.4, 350.1, 1855, 1450, 1583, 1856, 1451, 3385.574039, 1) temp.idx[18, ] = c( "Great_Wall", -62.2, 301, 2, 1, 3, 7, 8, 144.3266525, 1) temp.idx[19, ] = c( "Grytviken", -54.3, 323.5, 6, 1, 15, 7, 2, 1795.237216, 1) temp.idx[20, ] = c( "Halley", -75.5, 333.6, 1021, 1022, 1080, 1079, 1081, 28.00646963, 4) temp.idx[21, ] = c( "Jubany", -62.2, 301.4, 1, 2, 3, 7, 6, 135.3601642, 1) temp.idx[22, ] = c( "King_Sejong", -62.2, 301.3, 2, 1, 3, 7, 6, 140.335687, 1) temp.idx[23, ] = c( "Leningradskaja", -69.5, 159.4, 3240, 3157, 3156, 3239, 3155, 53.99238049, 4) temp.idx[24, ] = c( "Macquarie", -54.5, 158.9, 4273, 4196, 3647, 3566, 3967, 1632.703337, 1) temp.idx[25, ] = c( "Marambio", -64.2, 303.3, 15, 8, 7, 9, 6, 19.63652572, 1) temp.idx[26, ] = c( "Marion", -46.8, 37.8, 4502, 4426, 4350, 4650, 4577, 2455.134345, 1) temp.idx[27, ] = c( "Mario_Zucchelli", -74.7, 164.1, 2646, 2647, 2645, 2648, 2732, 30.02328627, 3) temp.idx[28, ] = c( "Marsh", -62.2, 301.1, 2, 1, 3, 7, 8, 120.8924296, 1) temp.idx[29, ] = c( "Mawson", -67.6, 62.9, 5196, 5139, 5197, 5140, 5250, 11.74122039, 4) temp.idx[30, ] = c( "McMurdo", -77.92, 166.67, 2418, 2419, 2416, 2417, 2345, 7.885579706, 3) temp.idx[31, ] = c( "Mirny", -66.5, 93, 5483, 5462, 5484, 5463, 5436, 33.4106842, 4) temp.idx[32, ] = c( "Molodeznaja", -67.7, 45.9, 4650, 4651, 4577, 4723, 4795, 14.61550264, 4) temp.idx[33, ] = c( "Neumayer", -70.7, 351.6, 1451, 1517, 1858, 1787, 1584, 6.016633582, 4) temp.idx[34, ] = c( "Novolazarevskaya", -70.8, 11.8, 2496, 2573, 2574, 2422, 2658, 18.54203705, 4) temp.idx[35, ] = c( "O_Higgins", -63.3, 302.1, 1, 7, 6, 2, 15, 15.54485958, 1) temp.idx[36, ] = c( "Orcadas", -60.7, 315.3, 6, 1, 15, 7, 2, 814.7070993, 1) temp.idx[37, ] = c( "Rothera", -67.5, 291.9, 41, 40, 47, 48, 51, 8.923597155, 1) temp.idx[38, ] = c( "Russkaya", -74.8, 223.1, 699, 700, 741, 657, 698, 47.7955837, 2) temp.idx[39, ] = c( "San_Martin", -68.1, 292.9, 55, 54, 48, 62, 61, 15.22295047, 1) temp.idx[40, ] = c( "Scott_Base", -77.85, 166.77, 2418, 2419, 2417, 2416, 2345, 7.885579706, 3) temp.idx[41, ] = c( "Signy", -60.7, 314.4, 6, 1, 15, 7, 2, 766.003401, 1) temp.idx[42, ] = c( "Syowa", -69, 39.6, 4197, 4198, 4274, 4121, 4122, 
40.10138004, 4) temp.idx[43, ] = c( "Vostok", -78.5, 106.9, 3855, 3934, 3694, 3776, 3775, 51.25190992, 4) temp.idx[44, ] = c( "Zhongshan", -69.4, 76.4, 5259, 5209, 5154, 5210, 5155, 20.31622125, 4) temp.idx[45, ] = c( "Bonaparte_Point", -64.8, 295.9, 13, 5, 12, 20, 21, 44.3733877, 1) temp.idx[46, ] = c( "Butler_Island", -72.2, 299.8, 223, 250, 195, 278, 166, 17.33735847, 1) temp.idx[47, ] = c( "Byrd", -80, 240.6, 817, 818, 816, 864, 863, 5.856875557, 2) temp.idx[48, ] = c( "Cape_Denison", -67, 142.7, 4349, 4425, 4348, 4272, 4501, 44.79745815, 4) temp.idx[49, ] = c( "Cape_King", -73.6, 166.6, 2566, 2649, 2567, 2650, 2490, 19.99906213, 3) temp.idx[50, ] = c( "Cape_Philips", -73.1, 169.6, 2491, 2492, 2490, 2493, 2569, 37.08498924, 3) temp.idx[51, ] = c( "Cape_Ross", -76.7, 163, 2642, 2643, 2564, 2641, 2726, 33.29787583, 3) temp.idx[52, ] = c( "Casey_Airstrip", -66.3, 110.8, 5423, 5422, 5450, 5393, 5394, 42.86898755, 4) temp.idx[53, ] = c( "Casey_New_Airstrip", -66.3, 110.8, 5423, 5422, 5450, 5393, 5394, 42.86898755, 4) temp.idx[54, ] = c( "Clean_Air", -89.99, 179.99, 1970, 1971, 2040, 1900, 1969, 29.9732532, 4) temp.idx[55, ] = c( "D_10", -66.7, 139.8, 4576, 4500, 4499, 4501, 4575, 40.52006846, 4) temp.idx[56, ] = c( "D_47", -67.4, 138.7, 4575, 4499, 4574, 4648, 4500, 18.62263574, 4) temp.idx[57, ] = c( "D_57", -68.1, 137.5, 4497, 4573, 4496, 4574, 4572, 29.68688679, 4) temp.idx[58, ] = c( "D_80", -70, 134.9, 4416, 4417, 4493, 4340, 4339, 29.29911398, 4) temp.idx[59, ] = c( "Dome_C_II", -75.1, 123.4, 4098, 4174, 4023, 4099, 3943, 50.26200551, 4) temp.idx[60, ] = c( "Dome_F", -77.3, 39.7, 3341, 3260, 3259, 3342, 3340, 23.22316898, 4) temp.idx[61, ] = c( "Doug", -82.3, 246.8, 1002, 952, 907, 1112, 860, 27.93340239, 2) temp.idx[62, ] = c( "Drescher", -72.87, 340.97, 1132, 1133, 1134, 1197, 1196, 32.60891964, 4) temp.idx[63, ] = c( "Elaine", -83.1, 174.2, 2055, 2125, 2126, 2056, 2054, 38.61377024, 3) temp.idx[64, ] = c( "Elizabeth", -82.6, 222.9, 1242, 1180, 1305, 1243, 1181, 56.51167125, 2) temp.idx[65, ] = c( "Enigma_Lake", -74.7, 164, 2646, 2647, 2645, 2732, 2731, 30.13542191, 3) temp.idx[66, ] = c( "Erin", -84.9, 231.2, 1431, 1366, 1498, 1239, 1303, 50.91496271, 2) temp.idx[67, ] = c( "Ferrell", -77.9, 170.8, 2276, 2277, 2275, 2274, 2348, 36.26682375, 3) temp.idx[68, ] = c( "GC41", -71.6, 111.3, 4842, 4912, 4913, 4770, 4984, 24.97980932, 4) temp.idx[69, ] = c( "GEO3", -68.7, 61.1, 5079, 5014, 5141, 5142, 5015, 15.50248633, 4) temp.idx[70, ] = c( "GF08", -68.5, 102.1, 5353, 5318, 5386, 5354, 5385, 39.98831567, 4) temp.idx[71, ] = c( "Gill", -80, 181.4, 1922, 1992, 1923, 1993, 1851, 18.55797104, 3) temp.idx[72, ] = c( "Harry", -83, 238.6, 1114, 1115, 1175, 1058, 1057, 40.55293055, 2) temp.idx[73, ] = c( "Henry", -89, 359, 1892, 1893, 1891, 1894, 1890, 466.9032731, 4) temp.idx[74, ] = c( "Hi_Priestley_Gl", -73.6, 160.7, 2901, 2817, 2902, 2818, 2984, 13.87841726, 3) temp.idx[75, ] = c( "LGB10", -71.3, 59.2, 4664, 4663, 4734, 4591, 4735, 32.34699498, 4) temp.idx[76, ] = c( "LGB20", -73.8, 55.7, 4291, 4134, 4212, 4213, 4058, 9.036639558, 4) temp.idx[77, ] = c( "LGB35", -76, 65, 4068, 4142, 4219, 4143, 3993, 40.11379661, 4) temp.idx[78, ] = c( "LGB59", -73.5, 76.78, 4676, 4677, 4747, 4603, 4604, 15.53572843, 4) temp.idx[79, ] = c( "Larsen_Ice_Shelf", -66.9, 299.1, 49, 43, 42, 50, 57, 17.55978504, 1) temp.idx[80, ] = c( "Law_Dome_Summit", -66.7, 112.7, 5395, 5424, 5425, 5363, 5396, 19.65541292, 4) temp.idx[81, ] = c( "Lettau", -82.5, 185.6, 1847, 1846, 1848, 1845, 1916, 99.19493816, 3) temp.idx[82, ] 
= c( "Limbert", -75.4, 300.1, 458, 497, 420, 538, 498, 43.56113626, 1) temp.idx[83, ] = c( "Linda", -78.5, 168.4, 2346, 2345, 2347, 2348, 2344, 7.312975998, 3) temp.idx[84, ] = c( "Lynn", -74.2, 160.4, 2900, 2816, 2815, 2901, 2899, 29.01424379, 3) temp.idx[85, ] = c( "Manuela", -74.9, 163.7, 2646, 2645, 2647, 2731, 2732, 35.49586369, 3) temp.idx[86, ] = c( "Marble_Point", -77.4, 163.7, 2564, 2563, 2489, 2488, 2642, 27.32305531, 3) temp.idx[87, ] = c( "Marilyn", -80, 165.1, 2413, 2343, 2414, 2342, 2484, 25.48352667, 3) temp.idx[88, ] = c( "Minna_Bluff", -78.6, 166.7, 2417, 2418, 2416, 2344, 2419, 56.36846685, 3) temp.idx[89, ] = c( "Mount_Siple", -73.2, 232.9, 456, 419, 494, 495, 382, 115.8677037, 2) temp.idx[90, ] = c( "Nansen_Ice_Sheet", -74.8, 163.3, 2731, 2645, 2646, 2730, 2732, 58.82892652, 3) temp.idx[91, ] = c( "Nico", -89, 89.7, 2392, 2463, 2321, 2393, 2464, 324.8575404, 4) temp.idx[92, ] = c( "Pegasus_North", -77.9, 166.5, 2418, 2417, 2419, 2416, 2489, 23.96057281, 3) temp.idx[93, ] = c( "Pegasus_South", -78, 166.6, 2418, 2417, 2419, 2416, 2345, 16.24984004, 3) temp.idx[94, ] = c( "Penguin_Point", -67.6, 146.2, 4196, 4272, 4348, 4349, 4424, 7.449501047, 4) temp.idx[95, ] = c( "Port_Martin", -66.8, 141.4, 4501, 4425, 4424, 4500, 4348, 31.55325773, 4) temp.idx[96, ] = c( "Possession_Island", -71.9, 171.2, 2494, 2493, 2492, 2491, 2571, 134.1092842, 3) temp.idx[97, ] = c( "Priestley_Gl", -74.3, 163.2, 2731, 2732, 2647, 2730, 2646, 19.00585189, 3) temp.idx[98, ] = c( "Racer_Rock", -64.1, 298.4, 4, 10, 9, 5, 11, 49.47196117, 1) temp.idx[99, ] = c( "Relay_Station", -74, 43.1, 3822, 3743, 3823, 3742, 3821, 7.636661371, 4) temp.idx[100, ] = c( "Santa_Claus_Island", -65, 294.3, 14, 22, 21, 13, 29, 84.15697374, 1) temp.idx[101, ] = c( "Schwerdtfeger", -79.9, 170, 2272, 2273, 2202, 2271, 2203, 13.05798907, 3) temp.idx[102, ] = c( "Scott_Island", -67.4, 180, 3157, 3240, 3322, 3404, 3485, 422.0673688, 4) temp.idx[103, ] = c( "Siple", -75.9, 276, 324, 358, 293, 395, 359, 50.35567713, 2) temp.idx[104, ] = c( "Sutton", -67.1, 141.4, 4425, 4501, 4424, 4500, 4349, 22.90986224, 4) temp.idx[105, ] = c( "Terra_Nova_Bay", -74.7, 164.1, 2646, 2647, 2645, 2648, 2732, 30.02328627, 3) temp.idx[106, ] = c( "Theresa", -84.6, 244.2, 1236, 1363, 1173, 1113, 1299, 88.8988494, 2) temp.idx[107, ] = c( "Tourmaline_Plateau", -74.1, 163.4, 2732, 2731, 2648, 2817, 2647, 15.38304621, 3) temp.idx[108, ] = c( "Uranus_Glacier", -71.4, 291.1, 124, 146, 110, 109, 123, 15.74802782, 1) temp.idx[109, ] = c( "Whitlock", -76.2, 168.4, 2419, 2348, 2347, 2418, 2346, 173.4223562, 3) all.idx=matrix(nrow = 109, ncol = 10) all.idx[, 1] = as.vector(as.character(temp.idx[, 1])) all.idx = data.frame(all.idx) all.idx[, 2:10] = as.vector(as.numeric(temp.idx[, 2:10])) rm(temp.idx) ### Grid cell masks for geographic locations mask.pen = c(1:114, 117:128, 140:152, 166:181, 195:208, 223:235, 250:260, 278:287, 310:317, 345:350, 383:386, 420:422, 457:459) mask.west.a = c(115, 116, 129:139, 153:165, 182:194, 209:222, 236:249, 261:277, 288:309, 318:344, 351:382, 387:419, 423:456, 460:495, 497:535, 538:575, 578:615, 620:657, 663:700, 707:743, 752:787, 797:832, 843:878, 891:924, 938:970, 988:1020, 1043:1074, 1101:1131, 1163:1184, 1187:1192, 1227:1246, 1252:1256, 1292:1310, 1357:1374, 1424:1442, 1493:1505, 1560:1567, 1628:1634, 1696:1701, 1765:1770, 1835:1840, 1907:1912, 1978:1982, 2049:2053, 2120:2123, 2191:2194, 2262:2265, 2335:2337, 2407:2410, 2482:2483) mask.ross = c(1185:1186, 1247:1251, 1311:1319, 1375:1383, 1443:1449, 1506:1516, 1568:1582, 
### Grid cell masks for geographic locations
mask.pen = c(1:114, 117:128, 140:152, 166:181, 195:208, 223:235, 250:260, 278:287, 310:317, 345:350, 383:386, 420:422, 457:459)
mask.west.a = c(115, 116, 129:139, 153:165, 182:194, 209:222, 236:249, 261:277, 288:309, 318:344, 351:382, 387:419, 423:456, 460:495, 497:535, 538:575, 578:615, 620:657, 663:700, 707:743, 752:787, 797:832, 843:878, 891:924, 938:970, 988:1020, 1043:1074, 1101:1131, 1163:1184, 1187:1192, 1227:1246, 1252:1256, 1292:1310, 1357:1374, 1424:1442, 1493:1505, 1560:1567, 1628:1634, 1696:1701, 1765:1770, 1835:1840, 1907:1912, 1978:1982, 2049:2053, 2120:2123, 2191:2194, 2262:2265, 2335:2337, 2407:2410, 2482:2483)
mask.ross = c(1185:1186, 1247:1251, 1311:1319, 1375:1383, 1443:1449, 1506:1516, 1568:1582, 1635:1649, 1702:1717, 1771:1785, 1841:1854, 1913:1925, 1983:1996, 2054:2066, 2124:2136, 2195:2206, 2266:2277, 2338:2348, 2411:2416, 2418:2419, 2484:2486)
mask.west = c(mask.west.a, mask.ross)
mask.east = seq(1:5509)
mask.east[c(mask.pen, mask.west)] = NA
mask.east = as.vector(na.omit(mask.east))
mask.cols = rep(4, 5509)
mask.cols[mask.pen] = 1
mask.cols[mask.west.a] = 2
mask.cols[mask.ross] = 3
mask.cols.noross = mask.cols
mask.cols.noross[mask.ross] = 2

### Set anchor colors
cold.end = rgb(red = 50, green = 0, blue = 70, maxColorValue = 255)
cold.2 = rgb(red = 0, green = 80, blue = 220, maxColorValue = 255)
cold.1 = rgb(red = 0, green = 210, blue = 255, maxColorValue = 255)
mid = rgb(red = 248, green = 248, blue = 248, maxColorValue = 255)
hot.1 = rgb(red = 255, green = 225, blue = 0, maxColorValue = 255)
hot.2 = rgb(red = 255, green = 80, blue = 0, maxColorValue = 255)
hot.end = rgb(red = 60, green = 0, blue = 0, maxColorValue = 255)
anchor.colors = c(cold.end, cold.2, cold.1, mid, hot.1, hot.2, hot.end)
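
### NOT RUN: two quick sanity checks (illustration only, not part of the
### original workflow). The three regional masks should partition the 5509
### AVHRR cells, and the anchor colors can be previewed with base R's
### colorRampPalette(), analogous to the expansion setColors() performs:
# length(mask.pen) + length(mask.west) + length(mask.east) == 5509   # should be TRUE
# plot(1:255, col = colorRampPalette(anchor.colors)(255), pch = 15)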
########################################################################################
###################### END VARIABLE ASSIGNMENTS ######################
########################################################################################

###======================================================================================================###
###                              END FUNCTION AND VARIABLE DEFINITIONS                                   ###
###                                  BEGIN COMPUTATIONAL SECTION                                         ###
###======================================================================================================###

#####################################
### SEGMENT: LOAD AND PARSE DATA ###
#####################################
###
### If desired, un-comment "downData()" to obtain the S09 and READER data
### directly from online sources. The data will be saved in the working
### directory and subsequently loaded into R with "loadData()".
###

### Download data from online sources
# downData()

### Load data into R
l.data = loadData()
saved.S09.recon = l.data[[5]]
gnd.S09 = l.data[[6]]

### Get parsed data
all = parseAll(l.data)
all.avhrr = all[[1]]
all.gnd = all[[2]]
colnames(all.gnd) = all.idx[, 1]
sat.coord = all[[3]]
coords = matrix(nrow = nrow(sat.coord), ncol = 5)
coords[, 1] = sat.coord[, 2]
coords[, 2] = sat.coord[, 1]
coords[, 3] = coords[, 4] = coords[, 5] = 1
allsurf.coord = all.idx[2:3]

### Grid cell masks used by S09
mask.S09.pen = (seq(1:200))[(sat.coord[1:200, 1] < 0 | sat.coord[1:200, 1] > 180) & (sat.coord[1:200, 2] > -72)]
mask.S09.pen = mask.S09.pen[-115]
mask.S09.west = sort(c((seq(1:5509))[(sat.coord[, 1] < 299.5 & sat.coord[, 1] > 179) & !((sat.coord[, 1] < 0 | sat.coord[, 1] > 180) & (sat.coord[, 2] > -72))], 116, 223, 278))
noteast = sort(c(mask.S09.pen, mask.S09.west))
mask.S09.east = (seq(1:5509))[-noteast]

#######################################################################################
###################### END LOAD AND PARSE DATA ######################
#######################################################################################

###################################
### SEGMENT: SATELLITE OFFSETS ###
###################################
###
### This section computes gross satellite offsets based
### on Wilcoxon and t-tests for differences in means.
###
### NOAA-7:  Correction for linear drop in temperature.
###
### NOAA-9:  Correction for small negative offset in temperature.
###
### NOAA-11: Correction for small negative offset through 1993
###          and major negative offset post-1993.
###
### NOAA-14: Correction for major positive offset.
###
### NOAA-16: Correction for small negative offset.
###
### NOAA-17: No correction.
###
### NOTE: OFFSETS DETERMINED FOR INFORMATIONAL PURPOSES ONLY.
###       SATELLITE DATA ARE ***NOT*** CORRECTED BY THESE OFFSETS.

### Get only stations on the AVHRR grid and the associated
### AVHRR cells
gnd.grid = all.gnd[, -form.grid]
av.grid = parseSat(all.avhrr, all.idx, form = form.grid)

### Plot results of Wilcoxon/t-tests and determine offsets
rng = 24

### Get the offsets
offsets = getOffsets(av.grid, gnd.grid, rng = rng)

### Plot the offsets
plotEst(offsets[[2]], offsets[[1]], rng)
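
### NOT RUN: a minimal sketch (illustration only; getOffsets() is the
### authoritative implementation) of the kind of test applied at each
### suspected splice point: compare the satellite-minus-ground difference
### series on either side of a boundary month, e.g. 157 (start of NOAA-14
### in sat.cal.matrix). Assumes av.grid spans the same 300 months
### (1982 - 2006) as rows 301:600 of gnd.grid:
# d = rowMeans(av.grid - gnd.grid[301:600, ], na.rm = TRUE)
# t.test(d[(157 - rng):156], d[157:(156 + rng)])       # difference in means
# wilcox.test(d[(157 - rng):156], d[157:(156 + rng)])  # rank-based counterpart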
### Get trends
sat.trends = list()
X = c(1:300) / 120   # month index expressed in decades (120 months per decade)
sat.trends[[2]] = lmFn(X, makeAnom(all.avhrr))$coef

#####################################################################################
###################### END SATELLITE OFFSETS ######################
#####################################################################################

########################################
### SEGMENT: IMPUTATION EXPERIMENTS ###
########################################
###
### This segment performs the cross-validation experiments used to select the
### optimal number of retained EOFs and to compare infilling methods.
###
### doGndRandom() performs TTLS/TSVD infilling of the pre-set ground station sets
### from the VARIABLE ASSIGNMENT segment utilizing random withholding.
###
### doGndStn() performs TTLS/TSVD infilling of the pre-set ground station sets
### from the VARIABLE ASSIGNMENT segment utilizing early/late withholding.
###
### doSat() requires a fully imputed ground station set for input and
### performs infilling of the AVHRR principal components.
###
### doRls() utilizes the internal cross-validation function in the RLS
### algorithm to determine the optimal number of retained satellite EOFs
### for any given infilled ground station matrix.
###
### The output of these functions may be plotted using "plt.gnd.stats.iter()"
### and "plt.gnd.stats.avg()" for ground station imputations, and
### "plt.sat.stats.iter()" and "plt.sat.stats.avg()" for satellite
### imputations.

### Ground station experiment using random withholding (NOT RUN)
# gnd.stats.random = doGndRandom()

### Ground station experiment using early/late withholding
### on designated stations (NOT RUN)
# gnd.stats.stn = doGndStn()

### Infill satellite sets using fully infilled ground station set (NOT RUN)
# sat.stats = doSat()

### Impute RLS sets using fully infilled ground station set (NOT RUN)
# rls.stats = doRls()

##########################################################################################
###################### END IMPUTATION EXPERIMENTS ######################
##########################################################################################

##########################################################
### SEGMENT: NEAREST-STATION RECONSTRUCTION (NOT RUN) ###
##########################################################
###

### Parse for ground stations (NOT RUN)
# Yo = makeAnom(window(all.gnd[, -form.grid.96], 1957, c(2006, 12)))
# surf.coord = allsurf.coord[-form.grid.96, ]
# statnumber = length(surf.coord[, 1])
# colnames(Yo) = all.idx[-form.grid.96, 1]

### Find nearest stations (NOT RUN)
# stationindex = array(0, dim = c(5509, 600))
### Cycle through each time
# for (j in 1:600) {
### Find stations with data
#   grndmask = !is.na(Yo[j, ])
### Cycle through each grid point
#   for (i in 1:5509) {
### Get distances
#     dist = gCirc(surf.coord[, 1], surf.coord[, 2], sat.coord[i, 2], sat.coord[i, 1], 6371)
### Store nearest station
#     stationindex[i, j] = which(min(dist[grndmask]) == dist)[1]
#   }
# }

### Calculate offsets (NOT RUN)
# offset = offsetArea(Yo = Yo, maxoverlap = 60)
# Yoc = t(t(Yo) + offset)
# Yoc = ts(Yoc, start = 1957, freq = 12)

### Assemble into reconstruction (NOT RUN)
# rec.area = array(NA, dim = c(600, 5509))
### Cycle through all values
# for (j in 1:600) {
#   rec.area[j, ] = Yoc[j, stationindex[, j]]
# }
# rec.area = ts(rec.area, start = 1957, freq = 12)
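
### For reference, a NOT-RUN sketch of a standard great-circle distance
### (haversine form). This is an assumption about what gCirc(), defined
### earlier in the script, computes -- shown only to make the
### nearest-station logic above self-explanatory:
# gcircSketch = function(lat1, lon1, lat2, lon2, R = 6371) {
#   rad = pi / 180
#   dlat = (lat2 - lat1) * rad
#   dlon = (lon2 - lon1) * rad
#   a = sin(dlat / 2) ^ 2 + cos(lat1 * rad) * cos(lat2 * rad) * sin(dlon / 2) ^ 2
#   2 * R * asin(sqrt(pmin(1, a)))   # distance in km for R = 6371
# }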
##########################################################################################
###################### END NEAREST-STATION RECONSTRUCTION ######################
##########################################################################################

################################
### SEGMENT: RECONSTRUCTION ###
################################
###
###========================================================
### Infill ground stations: Set Grid.1C (form.list[[11]])
###========================================================

### Infill the ground set (optimal settings)

### Get the station set
Y.all = makeAnom(window(all.gnd, 1957, c(2006, 12)))
Yo = Y.all[, -form.grid.96]

### Infill (with neat plotting)
main.gnd = emFn(Yo, maxiter = 500, tol = 0.0005, type = "ttls", n.eof = 7, plt = c(T, 1), DoFL = 12)

### Extract final iteration
Y = main.gnd[[7]]$X
Y = mScale(Y, -colMeans(Y, na.rm = TRUE), type = 1, add = TRUE, scale = FALSE, center = FALSE)

###====================================
### Set up ground station verification
###====================================

### Obtain unused stations
ver.list = getVerStns(Y, 2, form.list, form.ver)
form.withheld = ver.list[[3]]
withheld.names = all.idx[-form.withheld, 1]
withheld.locs = all.idx[-form.withheld, 10]
grid.names = all.idx[-form.grid, 1]
grid.locs = all.idx[-form.grid, 10]

###======================================================================
### Infill satellite PCs: Eigenvector-weighted method, optimal settings
###======================================================================

### Infill satellite PCs
eigen.pcs = getEigen(Y, pc.max = 150, n.max = 8)

### Get reconstruction
recon.eigen = getRecon(eigen.pcs, verbose = TRUE, compare.S09 = TRUE)

###=============================================
### Regularized Least Squares estimation of PCs
###=============================================

### Calculate estimates
### Optimal v.max for iridge from doRls(): 126
rls.pcs = getRls(Y, v.max = 126)

### Get reconstruction
recon.rls = getRecon(rls.pcs)

###=================
### S09 REPLICATION
###=================
### Replicates the S09 procedure with PCs imputed simultaneously with
### ground stations, TTLS, n.eof = 3, and satellite verification
### reconstructions.

### Get the AVHRR data
### Temporal information - S09, correlation
S09.data = getPcs(all.avhrr, 1982, c(2006, 12), 2, FALSE, 1)
S09.early.data = getPcs(all.avhrr, 1982, c(1994, 6), 2, FALSE, 1)
S09.late.data = getPcs(all.avhrr, c(1994, 7), c(2006, 12), 2, FALSE, 1)

### Infill
### Get the station set
S09.Y = gnd.S09
colnames(S09.Y) = colnames(all.gnd)[-form.S09]

### Get first 3 satellite PCs
S09.X = ts(S09.data[[1]][, 1:3], start = 1982, freq = 12)
S09.X.early = ts(S09.early.data[[1]][, 1:3], start = 1982, freq = 12)
S09.X.late = ts(S09.late.data[[1]][, 1:3], start = c(1994, 7), freq = 12)

### Merge
S09 = ts.union(S09.Y, S09.X)
S09.early = ts.union(S09.Y, S09.X.early)
S09.late = ts.union(S09.Y, S09.X.late)

### Infill and extract
infilled.S09 = emFn(S09, type = "ttls", DoFL = 1, maxiter = 50, tol = 0.001, n.eof = 3, plt = c(T, 43), progressive = c(F, 1, F, 1))
infilled.S09.early = emFn(S09.early, type = "ttls", DoFL = 1, maxiter = 50, tol = 0.001, n.eof = 3, plt = c(T, 43), progressive = c(F, 1, F, 1))
infilled.S09.late = emFn(S09.late, type = "ttls", DoFL = 1, maxiter = 50, tol = 0.001, n.eof = 3, plt = c(T, 43), progressive = c(F, 1, F, 1))
S09.pcs = S09.pcs.early = S09.pcs.late = list()
S09.pcs$V = S09.data[[2]][, 1:3]
S09.pcs.early$V = S09.early.data[[2]][, 1:3]
S09.pcs.late$V = S09.late.data[[2]][, 1:3]
S09.pcs$D = S09.data[[5]]
S09.pcs.early$D = S09.early.data[[5]]
S09.pcs.late$D = S09.late.data[[5]]
rm(S09.data, S09.early.data, S09.late.data)
S09.pcs$X = ts(infilled.S09[[3]]$X[, 43:45], 1957, freq = 12)
S09.pcs$Xhat = ts(infilled.S09[[3]]$Xhat[, 43:45], 1957, freq = 12)
S09.pcs.early$X = ts(infilled.S09.early[[3]]$X[, 43:45], 1957, freq = 12)
S09.pcs.early$Xhat = ts(infilled.S09.early[[3]]$Xhat[, 43:45], 1957, freq = 12)
S09.pcs.late$X = ts(infilled.S09.late[[3]]$X[, 43:45], 1957, freq = 12)
S09.pcs.late$Xhat = ts(infilled.S09.late[[3]]$Xhat[, 43:45], 1957, freq = 12)
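
### The merged S09 matrices hold the 42 S09 ground stations (all.gnd less
### form.S09) in columns 1:42 and the three satellite PCs in columns 43:45;
### hence the infilled PCs are extracted from columns 43:45 above, and
### plt = c(T, 43) tracks the first PC during infilling.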
### Get reconstructions
recon.S09 = getRecon(S09.pcs)
recon.S09.early = getRecon(S09.pcs.early)
recon.S09.late = getRecon(S09.pcs.late)

### Ground station verification test: see the VERIFICATION segment below

###################################################################################
###################### END RECONSTRUCTIONS ######################
###################################################################################

##############################
### SEGMENT: VERIFICATION ###
##############################
###
### This section compares verification statistics to ground stations
### between the eigen and RLS reconstructions and the S09 reconstruction.
### It also replicates the S09 satellite verification statistics.
###

###=================
### S09 REPLICATION
###=================

### Define proper time windows
orig = makeAnom(all.avhrr)
est.early = window(recon.S09.early[[1]], start = 1982)
est.late = window(recon.S09.late[[1]], start = 1982)
early.idx = late.idx = est.early == est.early
window(early.idx, start = c(1994, 7)) = FALSE
window(late.idx, end = c(1994, 6)) = FALSE

### Get satellite verification statistics
S09.ver.early = getStats(orig, est.early, early.idx)
S09.ver.late = getStats(orig, est.late, late.idx)

###===================================
### GROUND STATION VERIFICATION STATS
###===================================

### Generate the ground station sets (NOT RUN)
# cv.gnd = cvGnd(Yo)

### Generate E-W reconstructions (NOT RUN)
# eigen.ver.estimates = verEigen(cv.gnd, pc.max = 150, n.max = 8)

### Get E-W verification stats (NOT RUN)
# eigen.ver.stats = getStats(eigen.ver.estimates$ver, eigen.ver.estimates$estimates)

### Generate RLS reconstructions (NOT RUN)
# rls.ver.estimates = verRls(cv.gnd, v.min = 1, v.max = 150)

### Get RLS verification stats (NOT RUN)
# rls.ver.stats = getStats(rls.ver.estimates$ver, rls.ver.estimates$estimates)

### Get S09 estimates (NOT RUN)
# S09.ver.estimates = verS09(S09, S09.pcs)

### Get S09 verification stats (NOT RUN)
# S09.ver.stats = getStats(S09.ver.estimates$ver, S09.ver.estimates$estimates)

###==========================================================
### GROUND STATION EXPLAINED VARIANCE - FULL RECONSTRUCTIONS
###==========================================================

### Obtain the original data for all on-grid stations
grid.locs = all.idx[-form.grid, 4]
grid.names = all.idx[-form.grid, 1]
grid.orig = all.gnd[, -form.grid]
grid.orig = makeAnom(window(grid.orig, 1957, c(2006, 12)))
colnames(grid.orig) = grid.names
idx = !is.na(grid.orig)

### Get the data from the E-W reconstruction
eigen.grid = recon.eigen[[1]][, grid.locs]
colnames(eigen.grid) = grid.names

### Get the data from the RLS reconstruction
rls.grid = recon.rls[[1]][, grid.locs]
colnames(rls.grid) = grid.names

### Get the data from the S09 reconstruction
S09.grid = saved.S09.recon[, grid.locs]
colnames(S09.grid) = grid.names

### Calculate stats
eigen.stats = getStats(grid.orig, eigen.grid, idx)[[1]]
rls.stats = getStats(grid.orig, rls.grid, idx)[[1]]
S09.stats = getStats(grid.orig, S09.grid, idx)[[1]]

### Calculate stats for just 1957 - 1981 (rows 1:300)
eigen.stats.57.81 = getStats(grid.orig[1:300, ], eigen.grid[1:300, ], idx[1:300, ])[[1]]
rls.stats.57.81 = getStats(grid.orig[1:300, ], rls.grid[1:300, ], idx[1:300, ])[[1]]
S09.stats.57.81 = getStats(grid.orig[1:300, ], S09.grid[1:300, ], idx[1:300, ])[[1]]

### Calculate stats for just 1982 - 2006 (rows 301:600)
eigen.stats.82.06 = getStats(grid.orig[301:600, ], eigen.grid[301:600, ], idx[301:600, ])[[1]]
rls.stats.82.06 = getStats(grid.orig[301:600, ], rls.grid[301:600, ], idx[301:600, ])[[1]]
S09.stats.82.06 = getStats(grid.orig[301:600, ], S09.grid[301:600, ], idx[301:600, ])[[1]]

###==============================================
### EIGEN AND RLS SATELLITE STATS FOR COMPARISON
###==============================================

### Compare EIGEN and RLS vs. satellite
eigen.sat.stats = getStats(makeAnom(all.avhrr), window(recon.eigen[[1]], start = 1982))
rls.sat.stats = getStats(makeAnom(all.avhrr), window(recon.rls[[1]], start = 1982))
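
### NOT RUN: a quick alignment check (illustration only). These comparisons
### run over 1982 - 2006, i.e. the last 300 months of the reconstructions
### against the full 300-month AVHRR record:
# nrow(window(recon.eigen[[1]], start = 1982)) == nrow(makeAnom(all.avhrr))   # should be TRUE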
### Compare S09 unspliced solution vs. satellite
S09.sat.stats = getStats(makeAnom(all.avhrr), window(recon.S09[[1]], start = 1982))

################################################################################
###################### END VERIFICATION ######################
################################################################################