Auk and Seal: June 2014

I just finished TA'ing Advanced Quantitative Methods in Psychology at Murdoch University, and I'd like to share an Excel/Calc file (note its an .ods) to help students visualize ANCOVA/regression. Users can tweak regression coefficients to minimize the Sum of Square Residuals and/or find the lines of best fit. For the math enthusiasts, it also demos the underlying matrix multiplication needed to solve the problem analytically. The Experiment: we are interested in whether treatments of sucrose (8%, 32%, 64%) effect rats' speed at completing a maze (DV) while controlling for the rats age (aka, the 'covariate' in ANCOVA).

One should never 'do stats' in excel (so I also provide R code below), but this is a great, quick way to visualize several statistics concepts that many non-staticians may not understand, including:
- how a mix of continuous and categorical variables can be visualized and different parallel lines, - how categorical variables are represented as indicator variables in the model matrix. - how Planned Constrasts determine the type of indicator columns (the default in the .ods file is coded using the 'first' treatment as the reference treatment, like in R, but users can do their own. Try the Helmert!)
- how to solve least squares regression coefficients via matrix multiplication

If I were a teacher in an intro stats course, I'd ask students to reproduce the above. One operation I couldn't figure out was how to calculate the S.E. of the regression coefficients: this requires more matrix operations than Excel offers.


# ANCOVA in R

# data: DV is speed to complete maze

# categorical variables: 3 sucrose treatments

# covariate: rate AGE

library(car)

d<-data.frame(speed=c(1.4,2,3.2,1.4,2.3,4,5,4.7,6.2,3.1,3.2,4,4.5,6.4,4.4,4.1,5.8,6.6,6.5,5.9,5.9,3,5.9,5.6),
              trt=factor(rep(c(0,1,2),each=8),labels=c("sucrose8","sucrose32","sucrose64")),
              age=c(12,5,8,7,11,2,3,4,7,5,12,12,8,4,3,2,5,8,12,2,7,11,4,3))

# calculate regression coefficients and statistics manually

# then compare to worksheet and R

X <- model.matrix(~age+trt,data=d) # model matrix

betas<-solve(t(X)%*%X)%*%(t(X)%*%d$speed) # regression coefficients

SSm<-sum(((X%*%betas) - mean(d$speed))**2) # explained variation

SSr<- sum(((X%*%betas) - d$speed)**2) # residual sum of squares

df_m<-ncol(X)-1; df_r<-nrow(X)-ncol(X) # degrees of freedom

MSm<-SSm/df_m # Mean Square of Model

MSr <-SSr/df_r # means square error / estimate of variance

F.stat <- MSm/MSr



# standard error of regression coefficients

betas.se<- sqrt(diag(solve(t(X)%*%X)*MSr)) # MSr is an estimate of sigma^2

betas.t <- betas/betas.se # chance of seeing this large of effect ~ t distr

betas.p <- 2*(1-pt(q=abs(betas.t),df=df_r)) # p-values of regression coefs




# redo above, but with R's linear model function

m1 <- lm(speed~age+trt,d)

F.stat1<-summary(m1)$fstatistic # compare F.stat and F.stat1

coef(m1) # compare with betas

summary(m1)$coefficients # compare with betas.se,t,and p



# try with different Planned Constrasts:

# use Helmert contrast to test the hypothesis that 8% is different from

# 32% and 64% is different from both 8 and 32%

m.helm <- lm(speed~ age+trt, data=d,  contrasts = list(trt = "contr.Helmert"))

Sunday, 1 June 2014

Planned contrasts and ANCOVA in Excel / LibreOffice Calc