Overview

In this workshop we will take our first look at (generalised) linear mixed models (GLMMs or LMMs). The use of LMMs is becoming increasingly widespread across many aspects of the Psychological and Life Sciences in place of more traditional models (such as ANOVA) that are based on the general linear model. LMMs work by modelling individual data points (rather than aggregate data). They can cope with unbalanced designs, missing data, a combination of categorical and continuous predictors, multiple random effects, and participant and item covariates - and with GLMMs we can model data from different distribution types (e.g., data from the binomial distribution). The philosophy behind (G)LMMs is relatively straightforward and can be thought of as an extension of the general linear model. In this first video, I’ll summarise the general linear model and then show how LMMs take things a little bit further.

Slides

You can download the slides in .odp format by clicking here and in .pdf format by clicking on the image below.

Link to slides

Once you’ve watched the video above, run the code below on your own machines.

Linear Models Recap

Predicting Height From Gender

library(tidyverse)

We first read in the datafile we need. We are also going to use mutate() to turn our subject and gender columns into factors.

gender_height_data <- read_csv("https://raw.githubusercontent.com/ajstewartlang/15_mixed_models_pt1/master/data/gender_height_data.csv")

gender_height_data <- gender_height_data %>%
  mutate(subject = factor(subject),
         gender = factor(gender))

Now let’s plot our data. It’s really important to do this before you build any models. You need to make sure the data look as you expect, and if you’re trying to fit a linear model to your data it’s important to be sure the relationship between your variables is roughly linear. Note, I’ve added a little bit of horizontal jitter to the plot.

set.seed(1234)
gender_height_data %>%
  ggplot(aes(x = gender, y = height)) +
  geom_jitter(width = .1) +
  theme_minimal() +
  labs(x = "Gender", y = "Height (cm)") +
  scale_x_discrete(labels = c("Female", "Male"))

Now let’s build a linear model using the lm() function and look at the parameter estimates using summary().

height_model <- lm(height ~ gender, data = gender_height_data)
summary(height_model)
## 
## Call:
## lm(formula = height ~ gender, data = gender_height_data)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -7.500 -3.125  0.000  3.125  7.500 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  165.000      2.700  61.104 1.29e-09 ***
## gendermale    12.500      3.819   3.273    0.017 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 5.401 on 6 degrees of freedom
## Multiple R-squared:  0.641,  Adjusted R-squared:  0.5812 
## F-statistic: 10.71 on 1 and 6 DF,  p-value: 0.01696

We can see from this output that the mean height of our Females is 165 cm - this corresponds to the intercept of our model. To calculate the mean height of our Males, we add 12.5 cm to our intercept of 165 cm (giving us 177.5 cm). We see from the t-test on this parameter estimate that the difference between the two groups is significant (p = .017).
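
If you’d like to convince yourself of this, you can compare the parameter estimates against the group means directly. A quick check using the data we read in above:

# Compute the group means - the intercept should match the Female
# mean, and intercept + estimate should match the Male mean
gender_height_data %>%
  group_by(gender) %>%
  summarise(mean_height = mean(height))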

Predicting Height From Age

Now let’s look at a case where we have two continuous variables.

age_height_data <- read_csv("https://raw.githubusercontent.com/ajstewartlang/15_mixed_models_pt1/master/data/age_height_data.csv")

Let’s plot our data.

age_height_data %>%
  ggplot(aes(x = age, y = height)) +
  geom_point() +
  theme_minimal() +
  labs(x = "Age (years)", y = "Height (cm)")

The plot suggests there’s a linear relationship between our two variables. Let’s build a linear model and examine the parameter estimates.

age_model <- lm(height ~ age, data = age_height_data)
summary(age_model)
## 
## Call:
## lm(formula = height ~ age, data = age_height_data)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -9.045 -2.104  1.646  3.201  3.557 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  126.281     11.411  11.067 3.24e-05 ***
## age            2.398      0.602   3.984  0.00725 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.721 on 6 degrees of freedom
## Multiple R-squared:  0.7257, Adjusted R-squared:  0.6799 
## F-statistic: 15.87 on 1 and 6 DF,  p-value: 0.007252

We can see that age is a significant predictor (p = .007). We interpret the parameter estimates slightly differently with continuous predictors compared to how we interpreted them when our predictor was categorical (as in the previous example). The intercept corresponds to someone’s height when they are zero years old - which obviously makes no sense in this case. The model we’ve built really only captures the linear relationship between height and age for the age range we have data for; the relationship between height and age across the lifespan is obviously not linear! The estimate for age (2.398) should be interpreted as the increase in height for each additional year of age. So, for our dataset, people tend to grow 2.398 cm per year.
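
To see how the model turns these estimates into a prediction, we can compute a fitted value by hand and compare it with predict(). A quick sketch - the age of 15 here is just an illustrative choice:

# By hand: intercept + estimate * age = 126.281 + 2.398 * 15
# predict() does the same calculation using the model object
predict(age_model, newdata = tibble(age = 15))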

For a further gentle introduction to (G)LMMs I recommend you have a look at these two great tutorial papers by Bodo Winter. Just click on the images below to access them.

Link to tutorial 1

Link to tutorial 2

Mixed Models

One Factor Design With Two Levels

Imagine we are interested in how a person’s reaction time varies depending on whether they’re responding to Large or Small target items. We observe the same 10 people each responding to 5 Large and 5 Small target items, giving us 10 observations per person. These observations are not independent of each other (whereas independence is an assumption of the linear model).

Imagine also that we have different Target Items (e.g., 10 different items that were each presented in either Large or Small format). Each Target Item might be a little different - one particular Target might just be responded to more quickly (regardless of what condition it was in). In other words, the Target Items will also have different baselines.

We’ll be using the {lme4} package to build our models so let’s load that first. Remember, if you haven’t already installed {lme4} on your machine, you first need to run install.packages("lme4") in the Console. We’ll also load the {lmerTest} package to give us estimated p-values for our parameter estimates.

library(lme4)
library(lmerTest)

Let’s read in our data and turn our subject, item, and condition columns into factors.

mixed_model_data <- read_csv("https://raw.githubusercontent.com/ajstewartlang/15_mixed_models_pt1/master/data/mixed_model_data.csv")

mixed_model_data <- mixed_model_data %>%
  mutate(subject = factor(subject),
         item = factor(item),
         condition = factor(condition))

And generate some descriptives.

mixed_model_data %>% 
  group_by(condition) %>%
  summarise(mean_rt = mean(rt), sd_rt = sd(rt))
## # A tibble: 2 × 3
##   condition mean_rt sd_rt
##   <fct>       <dbl> <dbl>
## 1 large        854.  87.8
## 2 small        804.  97.5

We’ll build our mixed model with condition as a fixed effect, and subject and item as random effects.

mixed_model <- lmer(rt ~ condition + (1 | subject) + (1 | item), 
                    data = mixed_model_data)
summary(mixed_model)
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: rt ~ condition + (1 | subject) + (1 | item)
##    Data: mixed_model_data
## 
## REML criterion at convergence: 4696.8
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -2.5385 -0.6366 -0.1475  0.6054  2.6043 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  subject  (Intercept) 1240.3   35.22   
##  item     (Intercept)  442.8   21.04   
##  Residual             7126.7   84.42   
## Number of obs: 400, groups:  subject, 10; item, 5
## 
## Fixed effects:
##                Estimate Std. Error      df t value Pr(>|t|)    
## (Intercept)     854.140     15.755  12.166  54.213 6.99e-16 ***
## conditionsmall  -49.780      8.442 385.000  -5.897 8.12e-09 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr)
## conditnsmll -0.268

We see from the above output that our Intercept parameter estimate is 854.140. This corresponds to the mean RT for the Large experimental condition. The estimate -49.780 corresponds to the difference in mean RT between the Large and the Small experimental conditions. In other words, people are around 50 ms faster in the Small condition relative to the Large condition. Note, in this example R is using dummy (or treatment) coding to code the levels of our condition. This is fine for a design with one factor, but becomes problematic when we have factorial designs involving interaction effects. For those kinds of designs we need to use contrast coding. We’ll cover this in more detail later, but it’s worth noting at this point that the coding scheme you use can change how you interpret the parameter estimates. Many people get this wrong…
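
If you want to see exactly how R is coding the levels of a factor, the contrasts() function will show you. With the default treatment coding, the reference level is coded as 0:

# Inspect the contrast coding for our condition factor - under
# treatment coding "large" is the reference level (coded 0)
contrasts(mixed_model_data$condition)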

We can use the Likelihood Ratio Test (LRT) to determine whether our model containing our fixed effect of condition is better than a model that contains only the random effects. Note, you can only use the LRT when one model is a subset of (or nested within) the other. Let’s build a model containing only our random effects. We’ll call it mixed_model_null.

mixed_model_null <- lmer(rt ~ (1 | subject) + (1 | item), 
                         data = mixed_model_data)

We can then compare the two models to each other via the LRT using anova(). Note that anova() automatically refits the models using maximum likelihood (rather than REML) before making the comparison.

anova(mixed_model, mixed_model_null)
## Data: mixed_model_data
## Models:
## mixed_model_null: rt ~ (1 | subject) + (1 | item)
## mixed_model: rt ~ condition + (1 | subject) + (1 | item)
##                  npar    AIC    BIC  logLik deviance  Chisq Df Pr(>Chisq)    
## mixed_model_null    4 4751.5 4767.5 -2371.8   4743.5                         
## mixed_model         5 4720.1 4740.1 -2355.1   4710.1 33.368  1  7.626e-09 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

We can see the models differ significantly from each other, with the AIC, BIC, and deviance measures all lower for the model containing the fixed effect. This indicates that the model with the fixed effect of condition explains significantly more of the variability in our data than does the model with only random effects (and no fixed effect).

Let’s now build a model that also includes random slopes for our random effects. This means we are allowing the difference between the two levels of our fixed effect to differ in magnitude from one participant to the next, and from one item to the next.

mixed_model_slopes <- lmer(rt ~ condition + (1 + condition | subject)
                           + (1 + condition | item), data = mixed_model_data)

We can investigate the model parameter estimates using the summary() function.

summary(mixed_model_slopes)
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: rt ~ condition + (1 + condition | subject) + (1 + condition |  
##     item)
##    Data: mixed_model_data
## 
## REML criterion at convergence: 4689.1
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -2.5261 -0.6680 -0.1617  0.6653  2.8354 
## 
## Random effects:
##  Groups   Name           Variance Std.Dev. Corr
##  subject  (Intercept)     689.3   26.25        
##           conditionsmall  624.7   24.99    0.61
##  item     (Intercept)     321.4   17.93        
##           conditionsmall  482.4   21.96    0.01
##  Residual                6880.4   82.95        
## Number of obs: 400, groups:  subject, 10; item, 5
## 
## Fixed effects:
##                Estimate Std. Error      df t value Pr(>|t|)    
## (Intercept)     854.140     12.946   7.751  65.975 6.09e-12 ***
## conditionsmall  -49.780     15.092   5.924  -3.298   0.0168 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr)
## conditnsmll 0.033

We can see that with a more complex random effect structure (i.e., random slopes as well as intercepts), the effect of our fixed effect of condition is still clearly there (and it is significant).
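
If you wanted to check whether the random slopes are supported by the data, one option is to compare the intercepts-only model with the slopes model via the LRT - they are nested, so anova() applies here too. Because the two models differ only in their random effects, we can keep the REML fits by setting refit = FALSE; bear in mind that LRTs on variance components can be conservative:

# Compare the random-intercepts-only model with the random-slopes
# model, keeping the REML fits (the fixed effects are identical)
anova(mixed_model, mixed_model_slopes, refit = FALSE)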

One Factor With Three Levels

In this next video, I’ll show you how to build a mixed model for a design where you have one experimental factor with three levels.

Once you’ve watched the video, run the code below on your own machines.

Imagine we measured 24 subjects’ gaze duration while reading a sentence that appeared in one of three conditions - Negative, Neutral, and Positive. We had 24 items. The study is repeated measures (so everyone saw every condition).

tidied_factor_1_data <- read_csv("https://raw.githubusercontent.com/ajstewartlang/15_mixed_models_pt1/master/data/one_factor.csv")

tidied_factor_1_data <- tidied_factor_1_data %>%
  mutate(Subject = factor(Subject),
         Item = factor(Item),
         Condition = factor(Condition))

Let’s plot our data.

tidied_factor_1_data %>%
  ggplot(aes(x = Condition, y = Gaze, colour = Condition)) +
  geom_violin(width = .5) +
  geom_jitter(width = .1, alpha = .1) +
  stat_summary(fun.data = "mean_cl_boot", colour = "black") +
  theme_minimal() +
  labs(x = "Condition",
       y = "Gaze Duration (ms.)") +
  guides(colour = "none") +
  coord_flip()

If we build the following model, we end up with a singular fit warning, suggesting we are trying to estimate more parameters than our data can support.

factor_1_model <- lmer(Gaze ~ Condition + (1 + Condition | Subject) + 
                         (1 + Condition | Item), data = tidied_factor_1_data)
## boundary (singular) fit: see ?isSingular
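
As the warning suggests, you can look at ?isSingular for details, or test for singularity directly:

# Returns TRUE if some variance components (or correlations) are
# estimated at the boundary (e.g., variances of effectively zero)
isSingular(factor_1_model)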

We can simplify the model by dropping random effect terms until the warning goes away and we have a set of parameter estimates we can be confident in.

factor_1_model <- lmer(Gaze ~ Condition + (1 | Subject) + (1 | Item), 
                       data = tidied_factor_1_data) 

We can check the model assumptions using the {performance} package. Remember, we want to see the residuals (roughly) normally distributed.

library(performance)
check_model(factor_1_model)

These look pretty good to me.

We can generate the summary of the model.

summary(factor_1_model)
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: Gaze ~ Condition + (1 | Subject) + (1 | Item)
##    Data: tidied_factor_1_data
## 
## REML criterion at convergence: 8713.1
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -2.4240 -0.6246 -0.1505  0.4069  7.5250 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  Subject  (Intercept)  81340   285.2   
##  Item     (Intercept)  29586   172.0   
##  Residual             206838   454.8   
## Number of obs: 574, groups:  Subject, 24; Item, 24
## 
## Fixed effects:
##                   Estimate Std. Error      df t value Pr(>|t|)    
## (Intercept)        1083.90      75.53   45.12  14.350  < 2e-16 ***
## ConditionNeutral    100.84      46.55  525.13   2.166  0.03073 *  
## ConditionPositive   123.40      46.48  525.09   2.655  0.00818 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr) CndtnN
## ConditnNtrl -0.308       
## ConditnPstv -0.309  0.501

In this case, the Intercept corresponds to the Negative condition - this is because we are still using R’s default dummy (treatment) coding of contrasts, and the reference level is the condition that comes first alphabetically. The estimates for ConditionNeutral and ConditionPositive involve a comparison of these two levels of our factor with the reference level (i.e., the Negative condition). We see that both levels of our factor differ from this reference level.
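
If you’re ever unsure which level of a factor is the reference level, you can check the level ordering directly - under treatment coding, the first level listed is the reference:

# The first level listed is the reference level under R's default
# treatment (dummy) coding - here that's Negative
levels(tidied_factor_1_data$Condition)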

To determine whether each condition differs from each other condition, we can run pairwise comparisons using the {emmeans} package.

library(emmeans)

These are the same type of pairwise comparisons that we ran in the context of ANOVA. By default, the Tukey correction for multiple comparisons is used.

emmeans(factor_1_model, pairwise ~ Condition)
## $emmeans
##  Condition emmean   SE   df lower.CL upper.CL
##  Negative    1084 75.5 45.1      932     1236
##  Neutral     1185 75.5 45.1     1033     1337
##  Positive    1207 75.5 45.0     1055     1359
## 
## Degrees-of-freedom method: kenward-roger 
## Confidence level used: 0.95 
## 
## $contrasts
##  contrast            estimate   SE  df t.ratio p.value
##  Negative - Neutral    -100.8 46.5 525  -2.166  0.0780
##  Negative - Positive   -123.4 46.5 525  -2.655  0.0222
##  Neutral - Positive     -22.6 46.5 525  -0.485  0.8783
## 
## Degrees-of-freedom method: kenward-roger 
## P value adjustment: tukey method for comparing a family of 3 estimates

We can see that, with an appropriate correction for multiple comparisons (remember the familywise error rate?), the only significant pairwise comparison is the Negative vs. Positive condition comparison.

Your Challenge

Within R, import the dataset “data1.csv” that can be found at:

https://raw.githubusercontent.com/ajstewartlang/15_mixed_models_pt1/master/data/data1.csv

These data are from a reaction time experiment. Fifty participants had to respond to a word on the screen. Their task was to press a button on a button box only when they recognized the word (our DV is measured in milliseconds). The words were either Rare or Common. The design is repeated measures. We might expect Common words to be recognized more quickly than Rare words. Run the appropriate LMM to determine whether this is indeed correct.

You should find a mixed model where RTs to the Rare condition are about 200 ms slower than RTs to the Common condition, and that this difference is significant - so our prediction is supported. Did you remember to check the model assumptions?
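
If you get stuck, here is a minimal skeleton to adapt. Note that the column names below (subject, item, condition, rt) are assumptions - inspect the file with head() or colnames() first and adjust as needed:

challenge_data <- read_csv("https://raw.githubusercontent.com/ajstewartlang/15_mixed_models_pt1/master/data/data1.csv")

# Column names are assumptions - check with head(challenge_data)
challenge_model <- lmer(rt ~ condition + (1 | subject) + (1 | item),
                        data = challenge_data)
check_model(challenge_model)
summary(challenge_model)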

Improve this Workshop

If you spot any issues/errors in this workshop, you can raise an issue or create a pull request for this repo.