Inference for linear combinations of slope coefficients
Inference for conditional expectations
Recall: Region takes on values SOUTH, WEST, MIDWEST.
                    Estimate Std. Error t value Pr(>|t|)
(Intercept)        -555.5274   212.5735  -2.613  0.01817 *
RegionSOUTH         837.0791   292.4465   2.862  0.01079 *
RegionWEST          527.3892   393.8955   1.339  0.19823
Bonus                 1.8994     0.6424   2.957  0.00883 **
Advert                2.6585     0.2132  12.469 5.59e-10 ***
RegionSOUTH:Bonus    -0.8742     0.9571  -0.913  0.37382
RegionWEST:Bonus      0.7477     1.0145   0.737  0.47117
RegionSOUTH:Advert   -1.5546     0.4800  -3.239  0.00483 **
RegionWEST:Advert    -1.6964     0.5644  -3.006  0.00796 **
Which region's slope on Bonus is estimated to be 1.8994?
What does the coefficient on RegionWEST:Advert represent?
How would I assess whether I need to include the interaction between Advert and Region in the above regression?
Today we'll discuss the distribution of linear combinations of estimated slope coefficients:
\begin{align*} a^\top\hat{\beta} = \sum_{j=0}^p a_{j+1}\hat{\beta}_j \end{align*}
We've been studying a special case to date:
Setting $a_{j+1} = 1$ and the remaining entries zero returns the slope coefficient $\hat{\beta}_j$.
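As a small illustration (taking $p = 2$ purely for concreteness, which is an assumption and not the Sales example), two choices of $a$ and the combinations they return might look like:

```latex
\begin{align*}
a = (0, 1, 0)^\top &\implies a^\top\hat{\beta} = \hat{\beta}_1,\\
a = (0, 1, 1)^\top &\implies a^\top\hat{\beta} = \hat{\beta}_1 + \hat{\beta}_2.
\end{align*}
```

The first choice is the special case studied so far; the second is a genuine linear combination of two slopes.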
The more general form will be useful for, among other things:
Inference for slope coefficients for the non-reference category when doing regression with interactions
Inference for $E(Y\mid X = \tilde{x})$.
We'll now explore the distribution of certain linear combinations of slopes. Let's first recall some basic properties about random vectors:
Let $Z\in \mathbb{R}^m$ be a random vector.
\begin{align*} Z &= (Z_1, \ldots, Z_m)^\top \\ E(Z) &= (E(Z_1), \ldots, E(Z_m))^\top \end{align*}
Properties of Expectation
For $A\in \mathbb{R}^{n\times m}$, $c\in \mathbb{R}^n$, $A,c$ constant:
\begin{align*} E(AZ + c) &= A\;E(Z) + c \end{align*}
For $Z\in \mathbb{R}^m$ a random vector:
\begin{align*} \text{Var}(Z) &= E[(Z-E(Z))(Z-E(Z))^\top]\\ &= E(ZZ^\top) - E(Z)E(Z)^\top\\ &=\left( \begin{array}{cccc} \text{var}(Z_1) & \cdots & \cdots & \text{cov}(Z_1, Z_m) \\ \text{cov}(Z_2, Z_1) & \text{var}(Z_2) & \cdots & \text{cov}(Z_2, Z_m) \\ \vdots & \ddots & \ddots & \vdots \\ \text{cov}(Z_m, Z_1) & \cdots & \cdots & \text{var}(Z_m) \end{array} \right) \end{align*}
Properties of Variance
For $A\in \mathbb{R}^{n\times m}$, $c \in \mathbb{R}^n$, $A,c$ constant:
\begin{align*} \text{Var}(AZ + c) &= A\;\text{Var}(Z)A^\top \end{align*}
For $\hat{\beta}\in \mathbb{R}^{p+1}$ our vector of estimated coefficients:
\begin{align*} E(\hat{\beta}) &= (E(\hat{\beta}_0), \ldots,E(\hat{\beta}_p))^\top\\ \text{Var}(\hat{\beta}) &= E[(\hat{\beta}-E(\hat{\beta}))(\hat{\beta}-E(\hat{\beta}))^\top]\\ &=\left( \begin{array}{cccc} \text{var}(\hat{\beta}_0) & \cdots & \cdots & \text{cov}(\hat{\beta}_0, \hat{\beta}_p) \\ \text{cov}(\hat{\beta}_1, \hat{\beta}_0) & \text{var}(\hat{\beta}_1) & \cdots & \text{cov}(\hat{\beta}_1, \hat{\beta}_p) \\ \vdots & \ddots & \ddots & \vdots \\ \text{cov}(\hat{\beta}_p, \hat{\beta}_0) & \cdots & \cdots & \text{var}(\hat{\beta}_p) \end{array} \right) \end{align*}
Properties
For $A\in \mathbb{R}^{d\times (p+ 1)}$, $c \in \mathbb{R}^d$, $A,c$ constant:
\begin{align*} E(A\hat{\beta} + c) &= AE(\hat{\beta}) + c\\ \text{Var}(A\hat{\beta} + c) &= A\;\text{Var}(\hat{\beta})A^\top \end{align*}
Consider the stronger linear model
\begin{align*} y = X\beta + \varepsilon, \quad \varepsilon &\sim \text{MVN}(0, \sigma^2_\varepsilon I) \end{align*}with $E(\varepsilon) = 0$, $\text{Var}(\varepsilon) = \sigma^2_\varepsilon I$.
Consequences
Suppose the stronger linear model holds. For $A\in \mathbb{R}^{d\times (p+1)}$, $c \in \mathbb{R}^d$, $A,c$ constant:
\begin{align*} E(A\hat{\beta} + c) &= A\beta + c\\ \text{Var}(A\hat{\beta} + c) &= \sigma^2_\varepsilon A(X^\top X)^{-1}A^\top\\ A\hat{\beta} + c &\sim \text{MVN}(A\beta + c, \sigma^2_\varepsilon A(X^\top X)^{-1}A^\top) \end{align*}
Multivariate normality is a consequence of the following fact: if $Z$ follows a multivariate normal distribution, so too does $AZ + c$.
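To connect this back to a single linear combination, here is a sketch of the special case $d = 1$, $A = a^\top$, $c = 0$ under the stronger linear model. The unit vector $e_{j+1}$ (a 1 in position $j+1$, zeros elsewhere) is notation introduced here for illustration only:

```latex
\begin{align*}
a^\top\hat{\beta} &\sim N\left(a^\top\beta,\; \sigma^2_\varepsilon\, a^\top(X^\top X)^{-1}a\right),\\
a = e_{j+1} &\implies \hat{\beta}_j \sim N\left(\beta_j,\; \sigma^2_\varepsilon\,(X^\top X)^{-1}_{(j+1),(j+1)}\right).
\end{align*}
```

The second line recovers the familiar sampling distribution of an individual coefficient.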
We know in practice we don't have access to $\sigma^2_\varepsilon$.
We estimate it by $\hat{\sigma}^2_\varepsilon$, the mean squared error.
We can estimate $\text{Var}(\hat{\beta})$ by $\hat{V}(\hat{\beta})$, which simply replaces $\sigma^2_\varepsilon$ with its estimate:
Estimated Variance Matrices
Suppose the weaker linear model holds. For $A\in \mathbb{R}^{d\times (p+1)}$, $c \in \mathbb{R}^d$, $A,c$ constant:
\begin{align*} \hat{V}(\hat{\beta}) &= \hat{\sigma}^2_\varepsilon(X^\top X)^{-1}\\ \hat{V}(\hat{\beta}_j) &= \hat{\sigma}^2_\varepsilon(X^\top X)^{-1}_{(j+1), (j+1)}\\ \hat{V}(A\hat{\beta}+c)&= \hat{\sigma}^2_\varepsilon A(X^\top X)^{-1}A^\top \end{align*}
Note that $\text{se}(\hat{\beta}_j) = \sqrt{\hat{V}(\hat{\beta}_j)}$.
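The estimated variance formulas above can be sketched numerically for the simplest case of one predictor ($p = 1$). This is a minimal pure-Python illustration, not the course's R workflow: the function names and the toy data are hypothetical, and the mean squared error is assumed to divide the residual sum of squares by $n - (p+1)$.

```python
import math


def fit_simple_ols(x, y):
    """Fit y = b0 + b1*x by OLS and return the estimates, the mean squared
    error, and the estimated variance matrix sigma2_hat * (X^T X)^{-1}
    for the two-column design matrix X = [1, x]."""
    n = len(x)
    # Entries of X^T X and X^T y.
    sx = sum(x)
    sxx = sum(xi * xi for xi in x)
    sy = sum(y)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    # Closed-form inverse of the 2x2 matrix [[n, sx], [sx, sxx]].
    det = n * sxx - sx * sx
    xtx_inv = [[sxx / det, -sx / det],
               [-sx / det, n / det]]
    # beta_hat = (X^T X)^{-1} X^T y, with X^T y = (sy, sxy).
    b0 = xtx_inv[0][0] * sy + xtx_inv[0][1] * sxy
    b1 = xtx_inv[1][0] * sy + xtx_inv[1][1] * sxy
    # Assumed MSE: residual sum of squares over n - (p + 1), here n - 2.
    rss = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    sigma2_hat = rss / (n - 2)
    v_hat = [[sigma2_hat * entry for entry in row] for row in xtx_inv]
    return (b0, b1), sigma2_hat, v_hat


def estimated_variance(a, v_hat):
    """Return a^T V_hat a, the estimated variance of a^T beta_hat."""
    k = len(a)
    return sum(a[i] * v_hat[i][j] * a[j] for i in range(k) for j in range(k))


# Hypothetical toy data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
beta_hat, sigma2_hat, v_hat = fit_simple_ols(x, y)

se_b0 = math.sqrt(v_hat[0][0])  # se(beta_hat_0): square root of a diagonal entry
se_b1 = math.sqrt(v_hat[1][1])  # se(beta_hat_1)
# Standard error of beta_hat_0 + 3 * beta_hat_1, i.e. a = (1, 3).
se_combo = math.sqrt(estimated_variance([1.0, 3.0], v_hat))
print(beta_hat, se_b0, se_b1, se_combo)
```

The individual standard errors use only the diagonal of $\hat{V}(\hat{\beta})$, whereas the standard error of the combination also needs the off-diagonal entry.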
                    Estimate Std. Error t value Pr(>|t|)
(Intercept)        -555.5274   212.5735  -2.613  0.01817 *
RegionSOUTH         837.0791   292.4465   2.862  0.01079 *
RegionWEST          527.3892   393.8955   1.339  0.19823
Bonus                 1.8994     0.6424   2.957  0.00883 **
Advert                2.6585     0.2132  12.469 5.59e-10 ***
RegionSOUTH:Bonus    -0.8742     0.9571  -0.913  0.37382
RegionWEST:Bonus      0.7477     1.0145   0.737  0.47117
RegionSOUTH:Advert   -1.5546     0.4800  -3.239  0.00483 **
RegionWEST:Advert    -1.6964     0.5644  -3.006  0.00796 **
The output provides standard errors for the slopes on Bonus and Advert in the Midwest (the reference category). What about the South and West?
Let $\hat{\beta}_\text{Advert, South}$ be the estimated slope on Advert for the southern region. Note that
\begin{align*} \hat{\beta}_\text{Advert, South} &= \hat{\beta}_\text{Advert} + \hat{\beta}_\text{RegionSOUTH:Advert} \end{align*}
Now, using the general formula for the variance of the sum of two random variables:
\begin{align*} \text{var}(\hat{\beta}_\text{Advert, South}) &= \text{var}(\hat{\beta}_\text{Advert}) + \text{var}(\hat{\beta}_\text{RegionSOUTH:Advert})\\ &+ 2\text{cov}(\hat{\beta}_\text{Advert}, \hat{\beta}_\text{RegionSOUTH:Advert}) \end{align*}
The standard summary output gives estimates of $\text{var}(\hat{\beta}_\text{Advert})$ and $\text{var}(\hat{\beta}_\text{RegionSOUTH:Advert})$ (the squared standard errors), but provides no information about $\text{cov}(\hat{\beta}_\text{Advert}, \hat{\beta}_\text{RegionSOUTH:Advert})$.
Instead, this covariance is an off-diagonal entry of $\text{Var}(\hat{\beta})$.
Consider the slope on Advert for the southern region
Define $a = (a_1,\ldots,a_9)^\top$ as:
\begin{align*} a_i &= \begin{cases} 1 & i=5,8\\ 0 & \text{otherwise} \end{cases} \end{align*}
Then,
\begin{align*} \hat{\beta}_\text{Advert, South} &= a^\top\hat{\beta}, \end{align*}where $\hat{\beta}$ contains all 9 estimated coefficients, in the order listed below.
[1,] "(Intercept)"
[2,] "RegionSOUTH"
[3,] "RegionWEST"
[4,] "Bonus"
[5,] "Advert"
[6,] "RegionSOUTH:Bonus"
[7,] "RegionWEST:Bonus"
[8,] "RegionSOUTH:Advert"
[9,] "RegionWEST:Advert"
Let $X \in \mathbb{R}^{n\times 9}$ be the design matrix from
lm(Sales~Region*Bonus + Region*Advert)
Can't calculate this standard error using summary output alone.
Can calculate it if we have $\hat{\sigma}_{\varepsilon}$ and the design matrix.
Note: the (5,8) entry of $\sigma^2_\varepsilon (X^\top X)^{-1}$ equals $\text{cov}(\hat{\beta}_\text{Advert}, \hat{\beta}_\text{RegionSOUTH:Advert})$
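A quick numerical check of how the vector $a$ with ones in positions 5 and 8 interacts with a variance matrix. The $9\times 9$ matrix `V` below is a hypothetical stand-in (entries $0.5^{|i-j|}$), not the variance matrix from the Sales regression, and `quadratic_form` is an illustrative helper name.

```python
def quadratic_form(a, V):
    """Return a^T V a for a vector a and a square matrix V (list of lists)."""
    k = len(a)
    return sum(a[i] * V[i][j] * a[j] for i in range(k) for j in range(k))


k = 9
# Hypothetical symmetric stand-in for Var(beta_hat); NOT estimated from data.
V = [[0.5 ** abs(i - j) for j in range(k)] for i in range(k)]

# a has ones in (1-indexed) positions 5 and 8, zeros elsewhere.
a = [1.0 if i in (5, 8) else 0.0 for i in range(1, k + 1)]

lhs = quadratic_form(a, V)
# Python lists are 0-indexed, so positions 5 and 8 are indices 4 and 7.
rhs = V[4][4] + V[7][7] + 2 * V[4][7]
print(lhs, rhs)  # both equal 2.25 for this stand-in matrix
```

The quadratic form picks out exactly the two diagonal entries and twice the (5,8) off-diagonal entry, which is the structure of the variance of a sum of two coefficients.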
Show: with our choice of $a$, the quantity $\sigma^2_\varepsilon\, a^\top(X^\top X)^{-1}a$ returns
\[ \begin{aligned} &\text{var}(\hat{\beta}_\text{Advert}) + \text{var}(\hat{\beta}_\text{RegionSOUTH:Advert}) \\ &\quad+ 2\text{cov}(\hat{\beta}_\text{Advert}, \hat{\beta}_\text{RegionSOUTH:Advert}) \end{aligned} \]
Suppose $y_1,\ldots,y_n$ are iid and normally distributed with $E(y_i) = \mu_y$ and $\text{var}(y_i) = \sigma^2$.
Estimate $\mu_y$ by $\hat{\mu}_y = \bar{y} = \frac{1}{n}\sum_{i=1}^n y_i$
$\text{SD}(\hat{\mu}_y) = \sigma/\sqrt{n}$
$\text{SE}(\hat{\mu}_y) = \hat{\sigma}/\sqrt{n}$, where $\hat{\sigma} = \sqrt{\frac{1}{n-1}\sum_{i=1}^n(y_i-\bar{y})^2}$
Form confidence intervals / perform hypothesis tests for $\mu_y$ using $\hat{\mu}_y$, $\text{SE}(\hat{\mu}_y)$, and the $t_{n-1}$ distribution.
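A minimal sketch of these quantities using only Python's standard library. The data values and the function name `mean_confidence_interval` are hypothetical, and the $t$ quantile is passed in by hand (here $t_{0.975,\,4} \approx 2.776$ from a standard $t$ table) because the standard library has no $t$ distribution. The interval computed is the estimate plus or minus the quantile times the standard error.

```python
import math
import statistics


def mean_confidence_interval(y, t_quantile):
    """Return (lower, upper) for mu_hat +/- t_quantile * SE(mu_hat)."""
    n = len(y)
    mu_hat = statistics.mean(y)
    # statistics.stdev uses the n - 1 denominator, matching sigma_hat above.
    se = statistics.stdev(y) / math.sqrt(n)
    return mu_hat - t_quantile * se, mu_hat + t_quantile * se


# Hypothetical sample of n = 5 observations, for illustration only.
y = [68.0, 70.0, 71.0, 69.0, 72.0]

# 95% interval: t quantile with n - 1 = 4 degrees of freedom.
lower, upper = mean_confidence_interval(y, 2.776)
print(lower, upper)
```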
For instance, a $100(1-\alpha)\%$ confidence interval for $\mu_y$ takes the form:
\begin{align*} \hat{\mu}_y \pm t_{1-\alpha/2, n-1}\,\text{SE}(\hat{\mu}_y) \end{align*}
Now, suppose $y_1,\ldots,y_n$ are generated from the (stronger) linear model:
\begin{align*} y_i &= \beta_0 + \beta_1x_{i1} + \ldots + \beta_px_{ip} + \varepsilon_i,\\ \varepsilon_i &\overset{\text{iid}}{\sim}N(0, \sigma^2_\varepsilon). \end{align*}
For any particular value for the predictors $\tilde{x} = (1, \tilde{x}_1,\ldots,\tilde{x}_p)^\top$:
\begin{align*} \mu_{y\mid \tilde{x}} = E(y\mid x = \tilde{x}) &= \beta_0 + \beta_1\tilde{x}_{1} + \ldots + \beta_p\tilde{x}_{p}\\ &= \tilde{x}^\top\beta \end{align*}
After running OLS regression and obtaining my estimate $\hat{\beta}$:
\begin{align*} \hat{\mu}_{y\mid \tilde{x}} = \hat{E}(y\mid x = \tilde{x}) &= \hat{\beta}_0 + \hat{\beta}_1\tilde{x}_{1} + \ldots + \hat{\beta}_p\tilde{x}_{p}\\ &= \tilde{x}^\top\hat{\beta} \end{align*}
How can I perform inference on $\mu_{y\mid \tilde{x}}$ (hypothesis tests, confidence intervals, etc.)?
Inference on $\mu_y$
Across the entire population of individuals, what's the expected height of a male?
Unconditional expectation of male height.
Inference on $\mu_{y\mid \tilde{x}}$
Across the entire population of individuals whose fathers were 76 inches tall, what's the expected height of a male?
Conditional expectation of male height: condition on the value of an explanatory variable (here, height of father).
$\tilde{x} = (1, 76)^\top$
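One way to approach this, sketched under the stronger linear model, is to treat $\hat{\mu}_{y\mid\tilde{x}} = \tilde{x}^\top\hat{\beta}$ as a linear combination with $a = \tilde{x}$. For the height example with $\tilde{x} = (1, 76)^\top$ and $p = 1$, the pieces might look as follows. The notation $\widehat{\text{cov}}$ for the off-diagonal entry of $\hat{V}(\hat{\beta})$ is introduced here, and the $n - p - 1$ degrees of freedom assume the mean squared error divides by $n - p - 1$:

```latex
\begin{align*}
\hat{V}(\hat{\mu}_{y\mid\tilde{x}}) &= \hat{\sigma}^2_\varepsilon\,\tilde{x}^\top(X^\top X)^{-1}\tilde{x}\\
&= \hat{V}(\hat{\beta}_0) + 76^2\,\hat{V}(\hat{\beta}_1) + 2\cdot 76\;\widehat{\text{cov}}(\hat{\beta}_0, \hat{\beta}_1),\\
\text{se}(\hat{\mu}_{y\mid\tilde{x}}) &= \sqrt{\hat{V}(\hat{\mu}_{y\mid\tilde{x}})},\\
\text{CI:}\quad & \hat{\mu}_{y\mid\tilde{x}} \pm t_{1-\alpha/2,\, n-p-1}\;\text{se}(\hat{\mu}_{y\mid\tilde{x}}).
\end{align*}
```

This mirrors the interval for $\mu_y$, with $\tilde{x}^\top\hat{\beta}$ in place of $\bar{y}$ and $n - p - 1$ in place of $n - 1$ degrees of freedom.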