Statistical properties of $\hat{\beta}$
$\hat{\beta} = ({X}^\top X)^{-1}{X}^\top y$
Unbiasedness of $\hat{\beta}$
Under the weaker linear model described on Slide 2:
\begin{align*} \textbf{E}(\hat{\beta}) = \beta,\quad \textbf{E}(X\hat{\beta}) = X\beta \end{align*}
Which properties of the linear model do we need?
$\textbf{E}(\varepsilon) = 0$ (Yes or no)?
$\text{Var}(\varepsilon) = \sigma_\varepsilon^2 I_n$ (Yes or no)?
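One way to check the two questions is to write out the expectation directly. The sketch below assumes the model on Slide 2 has the form $y = X\beta + \varepsilon$ with $X$ fixed and ${X}^\top X$ invertible; Slide 2 is not reproduced here, so treat that form as an assumption.
\begin{align*} \hat{\beta} &= ({X}^\top X)^{-1}{X}^\top (X\beta + \varepsilon) = \beta + ({X}^\top X)^{-1}{X}^\top \varepsilon \\ \textbf{E}(\hat{\beta}) &= \beta + ({X}^\top X)^{-1}{X}^\top\, \textbf{E}(\varepsilon) = \beta \\ \textbf{E}(X\hat{\beta}) &= X\,\textbf{E}(\hat{\beta}) = X\beta \end{align*}
Under these assumptions the only property used is $\textbf{E}(\varepsilon) = 0$; the form of $\text{Var}(\varepsilon)$ never enters the argument.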
Variance of $\hat{\beta}$
Under the weaker linear model described on Slide 2:
\begin{align*} \text{Var}(\hat{\beta}) = \sigma^2_{\varepsilon}({X}^\top X)^{-1} \end{align*}
Which properties of the linear model do we need?
$\textbf{E}(\varepsilon) = 0$ (Yes or no)?
$\text{Var}(\varepsilon) = \sigma_\varepsilon^2 I_n$ (Yes or no)?
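A short derivation may help with these two questions. It assumes, as a reading of Slide 2 that is not confirmed here, that $y = X\beta + \varepsilon$ with $X$ fixed and ${X}^\top X$ invertible, and it uses the general rule $\text{Var}(A\varepsilon + c) = A\,\text{Var}(\varepsilon)\,A^\top$ for a fixed matrix $A$ and fixed vector $c$.
\begin{align*} \text{Var}(\hat{\beta}) &= \text{Var}\big(\beta + ({X}^\top X)^{-1}{X}^\top \varepsilon\big) \\ &= ({X}^\top X)^{-1}{X}^\top\, \text{Var}(\varepsilon)\, X({X}^\top X)^{-1} \\ &= \sigma^2_{\varepsilon}({X}^\top X)^{-1}{X}^\top X({X}^\top X)^{-1} \\ &= \sigma^2_{\varepsilon}({X}^\top X)^{-1} \end{align*}
The third line is where $\text{Var}(\varepsilon) = \sigma_\varepsilon^2 I_n$ is needed. The identity itself does not use $\textbf{E}(\varepsilon) = 0$; that property is what centers $\hat{\beta}$ at $\beta$.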
When we discuss inference, we'll need the variances of individual coefficient estimates. These are embedded within $\text{Var}(\hat{\beta})= \sigma^2_{\varepsilon}({X}^\top X)^{-1}$.
$\sigma^2_{\varepsilon}({X}^\top X)^{-1}$ is a $(p+1)\times (p+1)$ matrix.
The variances of $\hat{\beta}_0, \hat{\beta}_1,\dots,\hat{\beta}_p$ lie on the diagonal.
For the $j$-th coefficient, $\hat{\beta}_j$ with $j = 0, 1, \dots, p$, look at the $(j+1)$-th diagonal element of $\sigma^2_{\varepsilon}({X}^\top X)^{-1}$.
Off-diagonal elements give the covariances between pairs of estimated coefficients.
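As a rough illustration of reading variances and covariances off this matrix, here is a minimal NumPy sketch for a single predictor ($p = 1$). The seed, the uniform predictor values, and all variable names are hypothetical choices made for this example, and $\sigma_\varepsilon$ is treated as known.

```python
import numpy as np

rng = np.random.default_rng(0)          # hypothetical seed
n, sigma_eps = 50, 2.0                  # sigma_eps treated as known here
x = rng.uniform(0.0, 1.0, size=n)       # hypothetical predictor values
X = np.column_stack([np.ones(n), x])    # design matrix: intercept column, then x

# Var(beta_hat) = sigma_eps^2 (X^T X)^{-1}, a (p+1) x (p+1) matrix
cov_beta_hat = sigma_eps**2 * np.linalg.inv(X.T @ X)

# Diagonal entries: variances of beta_hat_0 and beta_hat_1
var_beta_hat = np.diag(cov_beta_hat)
se_beta_hat = np.sqrt(var_beta_hat)     # standard deviations of the estimates

# Off-diagonal entry: covariance between beta_hat_0 and beta_hat_1
cov_b0_b1 = cov_beta_hat[0, 1]

print(var_beta_hat, cov_b0_b1)
```

With zero-based indexing, `var_beta_hat[j]` is the $(j+1)$-th diagonal element, so it corresponds to $\hat{\beta}_j$.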
See the lecture code for an animation. The result: fitted regression lines from a series of data sets, each of size $n = 50$ and each using the same points $x_1,\dots,x_n$. The true intercept and slope are $\beta_0 = 1$ and $\beta_1 = 5$, and the true standard deviation of $\varepsilon$ is $\sigma_\varepsilon = 2$.
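The following is a minimal simulation sketch in the same spirit. It is not the lecture code. It uses the stated values $\beta_0 = 1$, $\beta_1 = 5$, $\sigma_\varepsilon = 2$ and $n = 50$, while the evenly spaced $x$ values, the normal errors, the seed, and the number of simulated data sets are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)            # hypothetical seed
n, sigma_eps = 50, 2.0
beta = np.array([1.0, 5.0])               # true intercept and slope
x = np.linspace(0.0, 1.0, n)              # assumed fixed design, reused every time
X = np.column_stack([np.ones(n), x])
n_sims = 5000                             # assumed number of simulated data sets

# Each row of eps is one error vector; normal errors are an assumption,
# the weaker model only fixes the mean and variance of the errors.
eps = rng.normal(0.0, sigma_eps, size=(n_sims, n))
Y = X @ beta + eps                        # shape (n_sims, n), one data set per row

# Least squares fit for every data set at once via the normal equations
beta_hats = np.linalg.solve(X.T @ X, X.T @ Y.T).T   # shape (n_sims, 2)

emp_mean = beta_hats.mean(axis=0)                   # compare with beta
emp_cov = np.cov(beta_hats, rowvar=False)           # compare with theory_cov
theory_cov = sigma_eps**2 * np.linalg.inv(X.T @ X)

print(emp_mean)
print(emp_cov)
print(theory_cov)
```

The empirical mean of the fitted coefficients should sit close to $(1, 5)$, and their empirical covariance matrix should be close to $\sigma^2_{\varepsilon}({X}^\top X)^{-1}$, up to simulation noise.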