Lecture 10 - Generalised linear models

Example 2 - Negative binomial regression - Student absence

Author

Anna Lindgren

Published

May 11, 2026

1 Packages

We now need to activate the MASS package. MASS stands for Modern Applied Statistics with S ("S" is the statistical software that "R" is inspired by).

# Lecture 10 - Negative binomial regression
library(tidyverse); theme_set(theme_bw() + theme(text = element_text(size = 16)))
library(MASS)

2 Number of days absent

The data is taken from https://stats.oarc.ucla.edu/r/dae/negative-binomial-regression/

2.1 Read the data

The data at UCLA is in Stata format (Stata is another statistical package). The foreign package has functions for reading, among others, SAS, Stata, Minitab, Octave and SPSS formats. Use it if you like.

#library(foreign)
#data.nb <- read.dta("https://stats.idre.ucla.edu/stat/stata/dae/nb_data.dta")

If not, I have made an R version of the data:

load("Data/nb_data.RData")
glimpse(data.nb)
Rows: 314
Columns: 5
$ id      <dbl> 1001, 1002, 1003, 1004, 1005, 1006, 1007, 1008, 1009, 1010, 10…
$ gender  <fct> male, male, female, female, female, female, female, male, male…
$ math    <dbl> 63, 27, 20, 16, 2, 71, 63, 3, 51, 49, 31, 22, 73, 77, 10, 89, …
$ daysabs <dbl> 4, 4, 2, 3, 3, 13, 11, 7, 10, 9, 4, 5, 5, 6, 1, 0, 1, 0, 5, 24…
$ prog    <dbl> 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 2, 2, 1, 3, 2, 2, 2, 1, 2, 2,…

2.2 Set factor labels

See the description in the link above.

data.nb <- data.nb |> mutate(
  prog = factor(prog, 
                levels = c(1, 2, 3),
                labels = c("General", "Academic", "Vocational")))
glimpse(data.nb)
Rows: 314
Columns: 5
$ id      <dbl> 1001, 1002, 1003, 1004, 1005, 1006, 1007, 1008, 1009, 1010, 10…
$ gender  <fct> male, male, female, female, female, female, female, male, male…
$ math    <dbl> 63, 27, 20, 16, 2, 71, 63, 3, 51, 49, 31, 22, 73, 77, 10, 89, …
$ daysabs <dbl> 4, 4, 2, 3, 3, 13, 11, 7, 10, 9, 4, 5, 5, 6, 1, 0, 1, 0, 5, 24…
$ prog    <fct> Academic, Academic, Academic, Academic, Academic, Academic, Ac…

2.3 Plot vs math by program

Plot the number of days absent against math score, separately for each program:

ggplot(data.nb, aes(math, daysabs, color = prog)) +
  geom_point() +
  facet_wrap(~ prog)

3 Fit a negbin model

The ordinary glm() function cannot fit a negative binomial regression, so we have to use the glm.nb() (nb = negative binomial) function from the MASS package.

absence_glmnb <- glm.nb(daysabs ~ math + prog, data = data.nb)
absence_glmnb

Call:  glm.nb(formula = daysabs ~ math + prog, data = data.nb, init.theta = 1.032713156, 
    link = log)

Coefficients:
   (Intercept)            math    progAcademic  progVocational  
      2.615265       -0.005993       -0.440760       -1.278651  

Degrees of Freedom: 313 Total (i.e. Null);  310 Residual
Null Deviance:      427.5 
Residual Deviance: 358.5    AIC: 1741
summary(absence_glmnb)

Call:
glm.nb(formula = daysabs ~ math + prog, data = data.nb, init.theta = 1.032713156, 
    link = log)

Coefficients:
                Estimate Std. Error z value Pr(>|z|)    
(Intercept)     2.615265   0.197460  13.245  < 2e-16 ***
math           -0.005993   0.002505  -2.392   0.0167 *  
progAcademic   -0.440760   0.182610  -2.414   0.0158 *  
progVocational -1.278651   0.200720  -6.370 1.89e-10 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for Negative Binomial(1.0327) family taken to be 1)

    Null deviance: 427.54  on 313  degrees of freedom
Residual deviance: 358.52  on 310  degrees of freedom
AIC: 1741.3

Number of Fisher Scoring iterations: 1

              Theta:  1.033 
          Std. Err.:  0.106 

 2 x log-likelihood:  -1731.258 

The summary() also gives us an estimate of \(\theta\) and its standard error.
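The \(\theta\) estimate and its standard error can also be extracted directly from the fitted object. A minimal self-contained sketch (it uses simulated data rather than the course data file, so it runs on its own):

```r
# The theta estimate and its standard error are stored in the glm.nb fit
# as $theta and $SE.theta.
library(MASS)
set.seed(1)
y <- rnbinom(500, size = 1, mu = 5)  # simulated counts, true theta = 1
fit <- glm.nb(y ~ 1)
fit$theta     # estimate of theta
fit$SE.theta  # its standard error
```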

3.1 Confidence intervals for \(\beta\)

Profile likelihood confidence intervals for the \(\beta\)-parameters are calculated with confint() as usual.

beta <- absence_glmnb$coefficients
ci_beta <- confint(absence_glmnb)
cbind(beta = beta, ci_beta) |> round(digits = 2)
                beta 2.5 % 97.5 %
(Intercept)     2.62  2.24   3.01
math           -0.01 -0.01   0.00
progAcademic   -0.44 -0.81  -0.09
progVocational -1.28 -1.68  -0.89
cbind(RR = exp(beta), exp(ci_beta)) |> round(digits = 2)
                  RR 2.5 % 97.5 %
(Intercept)    13.67  9.41  20.35
math            0.99  0.99   1.00
progAcademic    0.64  0.44   0.91
progVocational  0.28  0.19   0.41

3.2 Estimate means with confidence intervals

The linear predictor and its confidence interval are calculated with the help of predict(), as before:

absence_pred <- cbind(
  data.nb,
  xb = predict(absence_glmnb, se.fit = TRUE))

absence_pred <- absence_pred |> 
  mutate(
    xb.residual.scale = NULL,
    xb.lwr = xb.fit - 1.96*xb.se.fit,
    xb.upr = xb.fit + 1.96*xb.se.fit,
    mu.hat = exp(xb.fit),
    mu.lwr = exp(xb.lwr),
    mu.upr = exp(xb.upr))

Add the fitted line and confidence intervals to the plot:

ggplot(absence_pred, aes(math, daysabs, color = prog)) +
  geom_point() +
  geom_line(aes(y = mu.hat), linewidth = 1) +
  geom_ribbon(aes(ymin = mu.lwr, ymax = mu.upr), alpha = 0.1) +
  facet_wrap(~ prog)

4 Likelihood ratio tests

4.1 Against the null model

The \(D\) and \(D_0\) saved in the glm.nb result reflect what happens if we remove the x-variables while keeping the \(\theta\)-estimate fixed. But when we remove variables, \(\theta\) has to pick up the left-over variability and will change, so the null deviance differs between models.

absence_null <- glm.nb(daysabs ~ 1, data = data.nb)
absence_null$null.deviance
[1] 357.4863
absence_glmnb$null.deviance
[1] 427.5401

We need to use a likelihood ratio test here. The easiest approach is to rely on the anova() function, which has this implemented.

anova(absence_null, absence_glmnb)
Likelihood ratio tests of Negative Binomial Models

Response: daysabs
        Model     theta Resid. df    2 x log-lik.   Test    df LR stat.
1           1 0.7999836       313       -1792.945                      
2 math + prog 1.0327132       310       -1731.258 1 vs 2     3 61.68695
       Pr(Chi)
1             
2 2.563505e-13

We should reject H\(_0\): \(\beta_1 = \beta_2 = \beta_3 = 0\). At least one of the \(\beta\)-parameters is significantly different from zero.

4.2 Are there differences between the programs?

We need to take into account the fact that the null deviance changes when we change the model, since we also get a new \(\theta\)-estimate. Again, use the anova() function here.

Update the model by removing prog. We can use the update() function:

absence_red <- update(absence_glmnb, . ~ . - prog)
summary(absence_red)

Call:
glm.nb(formula = daysabs ~ math, data = data.nb, init.theta = 0.8558564931, 
    link = log)

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)  2.24663    0.13916  16.145  < 2e-16 ***
math        -0.01034    0.00259  -3.991 6.58e-05 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for Negative Binomial(0.8559) family taken to be 1)

    Null deviance: 375.05  on 313  degrees of freedom
Residual deviance: 357.90  on 312  degrees of freedom
AIC: 1782.3

Number of Fisher Scoring iterations: 1

              Theta:  0.8559 
          Std. Err.:  0.0829 

 2 x log-likelihood:  -1776.3060 
anova(absence_red, absence_glmnb)
Likelihood ratio tests of Negative Binomial Models

Response: daysabs
        Model     theta Resid. df    2 x log-lik.   Test    df LR stat.
1        math 0.8558565       312       -1776.306                      
2 math + prog 1.0327132       310       -1731.258 1 vs 2     2 45.04798
      Pr(Chi)
1            
2 1.65179e-10

We should reject H\(_0\): \(\beta_1 = \beta_2 = 0\). There are significant differences between at least two of the programs.

5 AIC, BIC and pseudo R2

Using the null model from above, collect AIC, BIC, \(p\), deviance \(D\) and null deviance \(D_0\) for the three models.

collect.AICetc <- data.frame(
  AIC(absence_null, absence_red, absence_glmnb), 
  BIC(absence_null, absence_red, absence_glmnb),
  D = c(absence_null$deviance, absence_red$deviance, absence_glmnb$deviance),
  D0 = c(absence_null$null.deviance, absence_red$null.deviance,
         absence_glmnb$null.deviance),
  p = c(absence_null$df.null - absence_null$df.residual,
        absence_red$df.null - absence_red$df.residual,
        absence_glmnb$df.null - absence_glmnb$df.residual)) |> 
  mutate(df.1 = NULL)
collect.AICetc
              df      AIC      BIC        D       D0 p
absence_null   2 1796.945 1804.444 357.4863 357.4863 0
absence_red    3 1782.306 1793.554 357.8987 375.0467 1
absence_glmnb  5 1741.258 1760.005 358.5193 427.5401 3
Note

Note that the null deviance now changes with the model.

Calculate the pseudo \(R^2\) and adjusted \(R^2\) for each of the models.

collect.AICetc |> mutate(
  R2 = 1 - D/D0,
  R2.adj = 1 - (D + p)/D0) |> round(digits = 3)
              df      AIC      BIC       D      D0 p    R2 R2.adj
absence_null   2 1796.945 1804.444 357.486 357.486 0 0.000  0.000
absence_red    3 1782.306 1793.554 357.899 375.047 1 0.046  0.043
absence_glmnb  5 1741.258 1760.005 358.519 427.540 3 0.161  0.154

The full model has the lowest AIC and BIC and the highest \(R^2_{adj}\), so all three criteria agree that it is the best one.

6 Negbin or Poisson?

6.1 Using the standardized deviance residuals

Calculate the standardized deviance residuals for the negative binomial model.

absence_infl <- influence(absence_glmnb)
absence_pred <- absence_pred |> mutate(
  v = hatvalues(absence_glmnb),
  devres = absence_infl$dev.res,
  std.devres = devres/sqrt(1 - v))

Fit a Poisson regression model to the data and calculate its standardized deviance residuals as well:

absence_pois_glm <- glm(daysabs ~ math + prog, family = "poisson", 
                        data = data.nb)
absence_pois_infl <- influence(absence_pois_glm)

absence_pred <- absence_pred |> mutate(
  xb_pois = predict(absence_pois_glm),
  v_pois = hatvalues(absence_pois_glm),
  devres_pois = absence_pois_infl$dev.res,
  std.devres_pois = devres_pois/sqrt(1 - v_pois))

Check the minimum and maximum values so that we can use equal y-axes:

absence_pred |> summarise(
  min_nb = min(std.devres),
  max_nb = max(std.devres),
  min_pois = min(std.devres_pois),
  max_pois = max(std.devres_pois))
     min_nb   max_nb  min_pois max_pois
1 -2.172154 2.539892 -4.300453 7.446744

A y-axis from \(-4.5\) to \(+7.5\) seems reasonable.

6.1.1 Residual plot for Poisson

ggplot(absence_pred, aes(x = xb_pois, color = prog)) +
  geom_point(aes(y = std.devres_pois), size = 2) +
  geom_hline(yintercept = c(-3, -2, 0, 2, 3), 
             linetype = c("dotted", "dashed", "solid", "dashed", "dotted"), 
             linewidth = 1) +
  expand_limits(y = c(-4.5, 7.5)) +
  labs(y = "std dev.res", x = "xb", color = "program",
       title = "Absence: Poisson model")

6.1.2 Residual plot for negbin

ggplot(absence_pred, aes(x = xb.fit, color = prog)) +
  geom_point(aes(y = std.devres), size = 2) +
  geom_hline(yintercept = c(-3, -2, 0, 2, 3), 
             linetype = c("dotted", "dashed", "solid", "dashed", "dotted"), 
             linewidth = 1) +
  expand_limits(y = c(-4.5, 7.5)) +
  labs(y = "std dev.res", x = "xb", color = "program",
       title = "Absence: Negbin model")

The Poisson residuals are too large; use the negbin model.

6.2 With Likelihood ratio test

Do NOT use the anova() function here, as it assumes the two models follow the same distribution, which is not the case.

You have to compute everything "by hand" using the log-likelihood function, logLik().

Warning

Note that \(-2\log L\) is not the same as the deviance, since the saturated models behave differently for Poisson and negative binomial. We are not testing \(\beta\)-parameters here. Instead, the Poisson distribution is nested within the negative binomial as the boundary case \(1/\theta = 0\).
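As an aside (not needed for the test itself), the nesting can be illustrated numerically: for very large \(\theta\) the negative binomial pmf is practically indistinguishable from the Poisson pmf with the same mean, while for small \(\theta\) it has a much heavier tail.

```r
# Negative binomial(mu, theta) approaches Poisson(mu) as theta -> Inf,
# i.e. as 1/theta -> 0; dnbinom() calls the theta parameter 'size'.
mu <- 5
k <- 0:10
max(abs(dpois(k, lambda = mu) - dnbinom(k, size = 1e6, mu = mu)))  # tiny
max(abs(dpois(k, lambda = mu) - dnbinom(k, size = 1, mu = mu)))    # large
```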

D_pois <- -2*logLik(absence_pois_glm)[1]
D_pois
[1] 2657.285
D_nb <- -2*logLik(absence_glmnb)[1]
D_nb
[1] 1731.258
D_diff <- D_pois - D_nb
D_diff
[1] 926.0272
qchisq(1 - 0.05, 1)
[1] 3.841459
pchisq(D_diff, 1, lower.tail = FALSE)
[1] 2.157298e-203

We should reject H\(_0\): \(1/\theta = 0\). The extra parameter, which allows for the extra variability and turns a Poisson regression into a negative binomial regression, is significant.

Poisson gives the same overall \(\mu\) as negative binomial regression, but the Poisson fit does not allow the variance to grow faster than \(\mu\).
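The difference is easiest to see in the variance functions; a small sketch (the \(\theta\) value is just the estimate from the summary above, rounded):

```r
# Poisson:           Var(Y) = mu
# Negative binomial: Var(Y) = mu + mu^2/theta, growing faster than mu.
theta <- 1.033
mu <- c(1, 5, 10, 20)
cbind(mu, var_pois = mu, var_negbin = mu + mu^2 / theta)
```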

7 Bootstrap prediction intervals

The bootstrap prediction intervals are pre-calculated and available in nb_predint.RData.

load("Data/nb_predint.RData")

Add them to the plot. They are step functions, since \(Y\) takes non-negative integer values, but they are plotted here with geom_smooth() to avoid the raggedness.

ggplot(absence_pred, aes(math, daysabs, color = prog)) +
  geom_point() +
  geom_line(aes(y = mu.hat)) +
  geom_ribbon(aes(ymin = mu.lwr, ymax = mu.upr), alpha = 0.1) +
  geom_smooth(data = nb.predint, aes(y = pred.lwr), linewidth = 1, se = FALSE) +
  geom_smooth(data = nb.predint, aes(y = pred.upr), linewidth = 1, se = FALSE) +
  labs(title = "Expected number of days absent",
       caption = "95% conf.int. and 95% bootstrap pred.int.") +
  facet_wrap(~ prog)

Calculating bootstrap intervals

If interested, see Lecture10_ex1_negbin-predint.R for how to calculate these yourself, using the function in boot_negbin.R.

8 Are the awards negative binomial?

Refit the Poisson model from Example 1 and compare it with a negative binomial model.

award <- read.csv("Data/poisson_sim.csv")
award <- award |> mutate(
  prog = factor(prog,
                levels = c(1, 2, 3),
                labels = c("General", "Academic", "Vocational")))
award_pois_glm <- glm(num_awards ~ prog + math, family = "poisson",
                      data = award)
award_glmnb <- glm.nb(num_awards ~ prog + math, data = award)

award_pred <- cbind(
  award,
  xb_pois = predict(award_pois_glm),
  xb_nb = predict(award_glmnb))

award_pred <- award_pred |> mutate(
  devres_pois = influence(award_pois_glm)$dev.res,
  v_pois = hatvalues(award_pois_glm),
  std.devres_pois = devres_pois/sqrt(1 - v_pois),
  devres_nb = influence(award_glmnb)$dev.res,
  v_nb = hatvalues(award_glmnb),
  std.devres_nb = devres_nb/sqrt(1 - v_nb))
ggplot(award_pred, aes(x = xb_pois, color = prog)) +
  geom_point(aes(y = std.devres_pois), size = 2) +
  geom_hline(yintercept = c(-3, -2, 0, 2, 3), 
             linetype = c("dotted", "dashed", "solid", "dashed", "dotted"), 
             linewidth = 1) +
  labs(y = "std dev.res", x = "xb", color = "program",
       title = "Awards: Poisson model")

ggplot(award_pred, aes(x = xb_nb, color = prog)) +
  geom_point(aes(y = std.devres_nb), size = 2) +
  geom_hline(yintercept = c(-3, -2, 0, 2, 3), 
             linetype = c("dotted", "dashed", "solid", "dashed", "dotted"), 
             linewidth = 1) +
  labs(y = "std dev.res", x = "xb", color = "program",
       title = "Awards: Negbin model")

D_diff.award <- -2*(logLik(award_pois_glm)[1] - logLik(award_glmnb)[1])
D_diff.award
[1] 1.693609
qchisq(1 - 0.05, 1)
[1] 3.841459
pchisq(D_diff.award, 1, lower.tail = FALSE)
[1] 0.1931259

No, the number of awards does not need a negative binomial model; Poisson is OK.