# 📊 [archived] Multinomial regression in R

In my current project on long-term care we needed a regression model for a multinomial response. I was surprised that, in contrast to the well-covered binomial GLM for a binary response, the multinomial case is poorly described. There are half a dozen packages with overlapping functionality, yet there is no sound tutorial or vignette. Hopefully, this post improves the situation.

Disclaimer: This post is outdated and was archived for backward compatibility: please use with care! This post does not reflect the author’s current point of view and might deviate from the current best practices.

We can distinguish two types of multinomial responses, namely nominal and ordinal. For a nominal response, the variable takes a value from a predefined finite set, and these values are not ordered. For instance, a variable `color` can be `red`, `green`, or `blue`. In machine learning this problem is often referred to as classification. In contrast to the nominal case, for an ordinal response variable the set of values has a relative ordering. For example, a variable `size` can be `small < middle < large`. Furthermore, depending on the link function we can have logit or probit models.
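In R the distinction between the two response types is usually encoded in the factor itself. The following is a minimal sketch in base R; the values are purely illustrative and not taken from any dataset used in this post.

```r
# Nominal response: categories without any ordering
color <- factor(c("green", "blue", "red", "blue"))
levels(color)        # alphabetical by default; the order carries no meaning
is.ordered(color)    # FALSE

# Ordinal response: the levels are explicitly ordered
size <- factor(c("small", "large", "middle", "small"),
               levels = c("small", "middle", "large"),
               ordered = TRUE)
is.ordered(size)     # TRUE
size[1] < size[2]    # TRUE: "small" < "large"
```

Comparisons such as `<` are only meaningful for the ordered factor; for the unordered one R returns `NA` with a warning.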

## Nominal response models

According to Agresti (2002), the problem can be formulated through two similar approaches: baseline-category logits or a multivariate GLM. In general, these two approaches are equivalent and yield identical maximum-likelihood estimates; the only difference is the formula representation.

### Baseline-category logits (multinomial logit model)

The baseline-category logit model is implemented in three distinct packages: `nnet::multinom()` (referred to as a log-linear model), `mlogit::mlogit()`, and `mnlogit::mnlogit()` (which claims to be a more efficient implementation than `mlogit`; see the comparison of the performance of these packages).

Let $p_j = \mathbb{P}(Y = j \mid \boldsymbol{x})$ be the probability that the dependent variable $Y$ takes value $j$ given a vector of explanatory variables $\boldsymbol{x}$. In total there are $J$ categories and, by the second axiom of probability, $\sum_j p_j = 1$. We fix a baseline category at level $J$ (or at any other level), and the model is as follows:

\[\log \frac{p_j}{p_J} = \alpha_j + \boldsymbol{\beta}'_j \boldsymbol{x}, \quad j = 1, \ldots, J - 1,\]

describing the effects of the explanatory variables $\boldsymbol{x}$ on the log odds of level $j$ versus the baseline level. Of course, using these $J-1$ equations and the second axiom it’s possible to come back to probabilities (which is a nice exercise, by the way):

\[p_j = \frac{\exp(\alpha_j + \boldsymbol{\beta}'_j \boldsymbol{x})}{1 + \sum_{h = 1}^{J-1}\exp(\alpha_h + \boldsymbol{\beta}'_h \boldsymbol{x})}, \quad j = 1, \ldots, J - 1,\]

with $p_J = 1 / \big(1 + \sum_{h = 1}^{J-1}\exp(\alpha_h + \boldsymbol{\beta}'_h \boldsymbol{x})\big)$ for the baseline category. For each category $j$ the parameters $\alpha_j$ and $\boldsymbol{\beta}_j$ are distinct. Let’s now estimate $\alpha_j, \boldsymbol{\beta}_j, \; j = 1, \ldots, J - 1$ with different packages and make sure that the estimates are identical. I use the `marital.nz` data from the `VGAM` package.

```
# install.packages("VGAM")
library(VGAM)
data(marital.nz)
head(marital.nz)
#   age ethnicity            mstatus
# 1  29  European             Single
# 2  55  European  Married/Partnered
# 3  44  European  Married/Partnered
# 4  53  European Divorced/Separated
# 5  45  European  Married/Partnered
# 7  30  European             Single
unique(marital.nz$mstatus)
# [1] Single             Married/Partnered  Divorced/Separated Widowed
# Levels: Divorced/Separated Married/Partnered Single Widowed
```

The data contain “marital data mainly from a large NZ company collected in the early 1990s”. The dependent variable `mstatus` has four unordered classes: `Divorced/Separated`, `Married/Partnered`, `Single`, and `Widowed`. We use `age` as the only explanatory variable.

- Package `nnet`

```
library(nnet)
fit_nnet <- multinom(mstatus ~ age, marital.nz)
coef(fit_nnet)
#                   (Intercept)          age
# Married/Partnered    2.778686 -0.003538729
# Single               6.368064 -0.152745520
# Widowed             -6.753123  0.099333903
```

- Package `mlogit`

```
library(mlogit)
fit_mlogit <- mlogit(mstatus ~ 0 | age, data = marital.nz, shape = "wide")
# Column 1: intercepts, column 2: age slopes
matrix(fit_mlogit$coefficients, ncol = 2)
#           [,1]         [,2]
# [1,]  2.778666 -0.003538297
# [2,]  6.368056 -0.152745424
# [3,] -6.753157  0.099334560
```

- Package `mnlogit`

```
library(mnlogit)
# mlogit.data() comes from the mlogit package loaded above; the shape is
# stated explicitly because its default differs between mlogit versions
marital.nz_long <- mlogit.data(data = marital.nz, choice = "mstatus",
                               shape = "wide")
fit_mnlogit <- mnlogit(mstatus ~ 1 | age | 1, marital.nz_long)
matrix(fit_mnlogit$coefficients, ncol = 2, byrow = TRUE)
#           [,1]         [,2]
# [1,]  2.778666 -0.003538297
# [2,]  6.368056 -0.152745424
# [3,] -6.753157  0.099334560
```

Even though the latter package is very efficient and customizable, there are several points I am not a big fan of. First off, `mnlogit` works *only* with long data instead of the wide format that is common and familiar in regression. That’s why we had to use `mlogit.data` to convert the data. Second, the formula syntax is confusing despite its flexibility. Of course, this list of packages is not exhaustive; others exist, e.g. `brglm2`.
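Coming back to the probability formula for $p_j$ given earlier, here is a minimal base-R sketch that turns the fitted baseline-category logits into class probabilities. The coefficients are rounded copies of the `nnet::multinom()` output above; the helper `predict_probs()` and the age of 40 are hypothetical and used only for illustration.

```r
# Estimates copied (rounded) from the nnet::multinom() output above;
# the baseline category is "Divorced/Separated".
alpha <- c("Married/Partnered" = 2.7787, "Single" = 6.3681, "Widowed" = -6.7531)
beta  <- c("Married/Partnered" = -0.0035, "Single" = -0.1527, "Widowed" = 0.0993)

# Hypothetical helper: class probabilities for a given age
predict_probs <- function(age) {
  eta   <- alpha + beta * age        # J - 1 baseline-category logits
  denom <- 1 + sum(exp(eta))
  c("Divorced/Separated" = 1 / denom, exp(eta) / denom)
}

p <- predict_probs(age = 40)
round(p, 3)
sum(p)                 # the probabilities sum to 1
log(p[-1] / p[1])      # recovers alpha + beta * 40
```

In practice `predict(fit_nnet, newdata = ..., type = "probs")` should give the same numbers up to rounding; the sketch only spells out the arithmetic.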

### Multinomial logit model as multivariate GLM

For this model, instead of treating the response variable as a scalar, we set it to be a vector of $J-1$ elements (the $J$-th is redundant), where $y_{i,j} = 1$ if observation $i$ falls in category $j$ and $0$ otherwise. Then $\boldsymbol{y}_i = (y_{i,1}, \ldots, y_{i, J-1})'$ and $\boldsymbol{\mu}_i = (p_{i,1}, \ldots, p_{i, J-1})'$. Therefore,

\[g_j(\boldsymbol{\mu}_i) = \log \frac{\mu_{i,j}}{1 - (\mu_{i,1}+\ldots+\mu_{i, J-1})}\]

and

\[\boldsymbol{g}(\boldsymbol{\mu}_i) = \boldsymbol{X}_i \boldsymbol{\beta}\]

where $\boldsymbol{g}$ is a vector of link functions.

The package `VGAM` deals exactly with such multivariate GLMs and GAMs. Let’s compute the estimates for this model, which should coincide with the previously calculated ones:

```
library(VGAM)
fit_vgam <- vglm(mstatus ~ age, multinomial(refLevel = 1),
                 data = marital.nz)
# Column 1: intercepts, column 2: age slopes
matrix(fit_vgam@coefficients, ncol = 2)
#           [,1]         [,2]
# [1,]  2.778666 -0.003538297
# [2,]  6.368056 -0.152745424
# [3,] -6.753157  0.099334560
```
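To make the vector link function $\boldsymbol{g}$ more tangible, here is a small base-R sketch of the link and its inverse. The function names `baseline_logit()` and `baseline_logit_inv()` are hypothetical, the probabilities are illustrative, and the reference is the omitted $J$-th category as in the formula (the `vglm()` call above uses `refLevel = 1` instead).

```r
# Multivariate logit link: maps (p_1, ..., p_{J-1}) to J - 1 logits,
# with the omitted J-th category as the reference
baseline_logit <- function(mu) log(mu / (1 - sum(mu)))

# Inverse link: maps J - 1 linear predictors back to probabilities
baseline_logit_inv <- function(eta) exp(eta) / (1 + sum(exp(eta)))

mu  <- c(0.2, 0.5, 0.1)      # illustrative probabilities, J = 4, p_J = 0.2
eta <- baseline_logit(mu)
eta
baseline_logit_inv(eta)      # returns the original mu
```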

## Ordinal response model: proportional odds model

For an ordinal response variable the model is slightly different. Let $Y$ be a categorical response variable with $J$ ordered categories $1 < \ldots < J$. It is then possible to define cumulative probabilities as

\[\mathbb{P}(Y \leq j \mid \boldsymbol{x}) = p_1 + \ldots + p_j, \quad j = 1, \ldots, J\]

Then the cumulative logits are:

\[\text{logit}(\mathbb{P}(Y \leq j \mid \boldsymbol{x})) = \log\frac{\mathbb{P}(Y \leq j \mid \boldsymbol{x})}{1 - \mathbb{P}(Y \leq j \mid \boldsymbol{x})} = \log\frac{p_1 + \ldots + p_j}{p_{j+1} + \ldots + p_J}, \quad j = 1, \ldots, J - 1\]

Let’s now relate the cumulative logits to the explanatory variables $\boldsymbol{x}$:

\[\text{logit}(\mathbb{P}(Y \leq j \mid \boldsymbol{x})) = \alpha_j + \boldsymbol{\beta}' \boldsymbol{x}, \quad j = 1, \ldots, J-1\]

Note that $\boldsymbol{\beta}$ is the same for each logit. The intercepts $\alpha_j$, however, can differ and are necessarily non-decreasing in $j$, because the cumulative probabilities $\mathbb{P}(Y \leq j \mid \boldsymbol{x})$ are.
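As a minimal sketch of how the cumulative logits translate into category probabilities, the base-R snippet below uses made-up parameters (`alpha`, `beta`) and hypothetical helpers (`category_probs()`, `cum_logit()`); none of these come from a fitted model.

```r
alpha <- c(-2, 0, 1.5)   # illustrative intercepts, non-decreasing in j (J = 4)
beta  <- 0.8             # illustrative slope, shared by all cumulative logits

# Hypothetical helper: category probabilities p_1, ..., p_J at a scalar x
category_probs <- function(x) {
  cum <- plogis(alpha + beta * x)   # P(Y <= j | x), j = 1, ..., J - 1
  diff(c(0, cum, 1))                # p_j = P(Y <= j) - P(Y <= j - 1)
}

p1 <- category_probs(x = 1)
p2 <- category_probs(x = -0.5)
sum(p1)                             # 1

# Cumulative logits recovered from the category probabilities
cum_logit <- function(p) qlogis(cumsum(p)[-length(p)])
cum_logit(p1) - cum_logit(p2)       # 1.2 for every j, i.e. beta * (1 - (-0.5))
```

Because the slope is shared, shifting $x$ moves every cumulative logit by the same amount, which is why the last line returns one repeated value.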

The model got its name from its property:

\[\text{logit}(\mathbb{P}(Y \leq j \mid \boldsymbol{x}_1)) - \text{logit}(\mathbb{P}(Y \leq j \mid \boldsymbol{x}_2)) = \log\frac{\mathbb{P}(Y \leq j \mid \boldsymbol{x}_1) / \mathbb{P}(Y > j \mid \boldsymbol{x}_1)}{\mathbb{P}(Y \leq j \mid \boldsymbol{x}_2) / \mathbb{P}(Y > j \mid \boldsymbol{x}_2)} = \boldsymbol{\beta}' (\boldsymbol{x}_1 - \boldsymbol{x}_2)\]

That is, the log of the cumulative odds ratio is proportional to the distance between $\boldsymbol{x}_1$ and $\boldsymbol{x}_2$, with the same constant of proportionality for every $j$.

Again, there are at least four packages that fit the proportional odds model. Let’s quickly compare their estimates using the Italian household data for 2006, the `ecb06.it` dataset from the `VGAMdata` package. We try to explain the ordinal variable `education`, which has 8 levels, by the numeric variable `age`.

```
# install.packages("VGAMdata")
library(VGAMdata)
data(ecb06.it)
# str(ecb06.it)
head(ecb06.it[, c("age", "education")])
#    age     education
# 1   58    highschool
# 4   81 primaryschool
# 5   52    highschool
# 9   67  middleschool
# 12  56  middleschool
# 16  72 primaryschool
```

- Package `MASS`

Perhaps the most famous function is `MASS::polr`.

```
library(MASS)
fit_polr <- polr(formula = education ~ age, data = ecb06.it)
summary(fit_polr)$coefficients[, 1, drop = FALSE]
#                                  Value
# age                        -0.06417893
# none|primaryschool         -6.95688936
# primaryschool|middleschool -4.51869196
# middleschool|profschool    -3.06471919
# profschool|highschool      -2.73295822
# highschool|bachelors       -0.96907401
# bachelors|masters          -0.89517059
# masters|higherdegree        2.42815131
```

- Package `VGAM`

```
fit_vglm <- vglm(formula = education ~ age, family = propodds, data = ecb06.it)
as.matrix(fit_vglm@coefficients)
#                      [,1]
# (Intercept):1  6.95576156
# (Intercept):2  4.51825182
# (Intercept):3  3.06430069
# (Intercept):4  2.73254206
# (Intercept):5  0.96867493
# (Intercept):6  0.89470432
# (Intercept):7 -2.42867591
# age           -0.06417086
```

- Package `ordinal`

```
library(ordinal)
fit_clm <- clm(formula = education ~ age, data = ecb06.it)
as.matrix(fit_clm$coefficients)
#                                  [,1]
# none|primaryschool         -6.9557784
# primaryschool|middleschool -4.5182645
# middleschool|profschool    -3.0643131
# profschool|highschool      -2.7325541
# highschool|bachelors       -0.9686858
# bachelors|masters          -0.8947152
# masters|higherdegree        2.4286635
# age                        -0.0641711
```

A nice thing about this package is that it supports different link functions, namely `"logit"`, `"probit"`, `"cloglog"`, `"loglog"`, and `"cauchit"`. To my regret, I know only `"logit"` and `"probit"` from this list.
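For orientation, each of these links corresponds to a cumulative distribution function that maps a linear predictor to a probability. The base-R sketch below uses the usual textbook definitions; that `ordinal::clm()` uses exactly these forms for `"cloglog"` and `"loglog"` is my assumption, so check the package documentation before relying on it.

```r
# Inverse links (CDFs) mapping a linear predictor eta to a probability.
# The cloglog/loglog forms are the textbook definitions and are assumed,
# not verified, to match what ordinal::clm() uses.
inv_links <- list(
  logit   = plogis,                            # logistic CDF
  probit  = pnorm,                             # standard normal CDF
  cloglog = function(eta) 1 - exp(-exp(eta)),  # Gumbel (minimum) CDF
  loglog  = function(eta) exp(-exp(-eta)),     # Gumbel (maximum) CDF
  cauchit = pcauchy                            # standard Cauchy CDF
)

eta   <- c(-2, 0, 2)
probs <- sapply(inv_links, function(f) f(eta))
rownames(probs) <- paste0("eta = ", eta)
round(probs, 3)
```

The symmetric links (logit, probit, cauchit) give 0.5 at `eta = 0` and differ mainly in tail weight, while the two Gumbel-based links are asymmetric.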

- Package `rms`

```
library(rms)
fit_lrm <- lrm(formula = education ~ age, data = ecb06.it)
as.matrix(fit_lrm$coefficients)
#                        [,1]
# y>=primaryschool  6.9557784
# y>=middleschool   4.5182645
# y>=profschool     3.0643131
# y>=highschool     2.7325541
# y>=bachelors      0.9686858
# y>=masters        0.8947152
# y>=higherdegree  -2.4286635
# age              -0.0641711
```

This function was rather unstable: adding more explanatory variables threw an error a couple of times.

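The coefficients are consistent across packages, which is good. The intercepts differ in sign because `polr` and `clm` report thresholds for $\mathbb{P}(Y \leq j)$, whereas `vglm` with `propodds` and `lrm` model $\mathbb{P}(Y \geq j)$.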
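The sign flip can be checked numerically with the estimates printed above. This is a sketch under the assumption that `clm()` parameterises the model as $\text{logit}\,\mathbb{P}(Y \leq j) = \theta_j - \beta x$ and `lrm()` as $\text{logit}\,\mathbb{P}(Y \geq j + 1) = \alpha_j + \beta x$; the age of 50 is an arbitrary illustrative value.

```r
# Thresholds reported by ordinal::clm() above (none|primaryschool, ...)
theta <- c(-6.9557784, -4.5182645, -3.0643131, -2.7325541,
           -0.9686858, -0.8947152, 2.4286635)
# Intercepts reported by rms::lrm() above (y>=primaryschool, ...)
alpha_lrm <- c(6.9557784, 4.5182645, 3.0643131, 2.7325541,
               0.9686858, 0.8947152, -2.4286635)
beta <- -0.0641711     # age coefficient, identical in both outputs
age  <- 50             # illustrative value

# clm (assumed form): logit P(Y <= j) = theta_j - beta * age
p_le <- plogis(theta - beta * age)
# lrm (assumed form): logit P(Y >= j + 1) = alpha_j + beta * age
p_ge <- plogis(alpha_lrm + beta * age)

all.equal(p_le + p_ge, rep(1, 7))   # TRUE: both describe the same model
```

The two sets of probabilities are complements of each other at every cut point, so the fits agree despite the opposite intercept signs.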

Perhaps you are now wondering which package to use. Well, I do not know; just choose one and stick to it. I will probably use `VGAM`, since it covers various models and seems to be nicely documented.

References:

- Agresti, A. (2002). Categorical Data Analysis, second edition. Wiley.
- STAT 504: Analysis of Discrete Data, Penn State online course notes.