Aanalysis of Variance (ANOVA)
The partial $\mathcal{F}$- test
We now know how to, given a set of explanatory variables $x_1,\dots,x_p$, create a multiple regression model to try to predict the value of the response $y$.
Is there evidence that the model helped? That is, is there evidence that by exploiting information about our predictor variables, we can provide improved predictions in $y$?
Improved relative to what?
If I didn't have information on these explanatory variables, what would my best guess be for any individual? The overall mean
We'd certainly hope we're doing better!
As we now show, the test will be based on the magnitude of $R$-squared: how much variation in $y$ could we explain from our regression?
Compare the variation in $y$ we can attribute to $X$ relative to the overall variation in $y$.
Under the assumptions of the multiple regression model, the null hypothesis for testing "was multiple regression worth it?" takes on the form:
\[\mathbf{H_o}: \beta_1 = \beta_2 = \dots = \beta_p = 0\]The alternative is
\[\mathbf{H_a}: \text{at least one } \beta_j \neq 0, \quad j=1,\dots,p\]Another way to think about this: under the null, the only thing that matters for predicting a response is the intercept, $\beta_0$.
If this were true, $\text{E}(y_i \mid x_i) = \beta_0$. That is, every point has the same mean regardless of the values of the explanatory variables
As we showed: in a regression that only contains an intercept, $\hat{\beta}_0$, the estimated intercept coefficient, equals $\bar{y}$
We'll form a test statistic whose value is large when $x_1,\dots,x_p$ are strong predictors of $y$ (large values $\Rightarrow$ strong evidence against the null). This corresponds to large values for $R$-squared!
The $F$-Statistic
Our test statistic takes on the form:
\[F_{\text{stat}} = \frac{R^2}{1 - R^2}\left(\frac{n - p - 1}{p}\right)\]This is what R reports as the F-statistic in its output from multiple regression
If, in reality, the null is true and none of the $p$ covariates had predictive power, would we observe $R$-squared exactly equal zero?
$R^2$ will be greater than zero due to chance correlations, even if the explanatory variables don't actually matter
Also: the more predictors you add to your regression model, the larger the $R^2$ value becomes, even if your predictor variables are unrelated to the response (see the Lecture 3 code for a demonstration).
We need to account for this when assessing statistical significance
We need to find: what does an "extreme" value for the $F_{\text{stat}}$ look like.
Under the null hypothesis, what is the distribution of $F_{\text{stat}}$? Let's first simulate...
Under the null hypothesis, the distribution of $F_{\text{stat}}$ is a member of a family of probability distributions known as $\mathcal{F}$-distributions.
These distributions are not symmetric
Just like with $t$-distributions, the $\mathcal{F}$ is parameterized by "degrees of freedom."
Unlike the $t$, the $\mathcal{F}$ has two separate degrees of freedom: $\mathcal{F}_{df_1, df_2}$
Under the null hypothesis, $F_{\text{stat}}$ follows a $\mathcal{F}_{p,\, n-p-1}$ distribution. Once again:
$p$ is the number of predictor variables in our model
$n$ is the sample size
We calculate a $p$-value as $P(\mathcal{F}_{p,\, n-p-1} \geq F_{\text{stat}})$
Here, extremeness is always in the right tail. Do you see why?
In R: 1-pf(Fstat, df1=p, df2=n-p-1)
summary(lm.medicorp)
...
Multiple R-squared: 0.8872, Adjusted R-squared: 0.8774
F-statistic: 90.47 on 2 and 23 DF, p-value: 1.261e-11
$\mathtt{F\text{-}statistic} = 90.47$
$F_{\text{stat}} = 0.8872/(1-0.8872) \times (23/2) = 90.47$
on 2 and 23 DF
$F_{\text{stat}}$ follows a $\mathcal{F}_{2, 23}$ distribution
p-value: 1.261e-11
$P(\mathcal{F}_{2, 23} \geq 90.47) = \mathtt{1\text{-}pf(90.47, 2, 23)} \approx 0$
summary(lm.medicorp)
...
Multiple R-squared: 0.8872, Adjusted R-squared: 0.8774
F-statistic: 90.47 on 2 and 23 DF, p-value: 1.261e-11
Hence, we reject the null hypothesis that $\beta_{\text{advert}} = \beta_{\text{bonus}} = 0$ and suggest that our regression model provides a statistically significant improvement in predictions for sales.
We'll soon introduce other hypothesis tests which also make use of the $\mathcal{F}$-distribution to compute $p$-values
Will be called Partial $\mathcal{F}$-tests
More natural to view these tests as comparisons between the sum of squared errors in models with different predictor variables
Let's try to understand the overall $\mathcal{F}$ test when viewed through this lens...
The overall $\mathcal{F}$-test can be viewed as a comparison between:
The sum of squared errors in a model including all of the predictor variables;
The sum of squared errors in a model only including those variables whose coefficients don't equal zero under the null
Under the null hypothesis tested by the overall $\mathcal{F}$ test, $\beta_1 = \beta_2 = \dots = \beta_p = 0$
Compare a regression (a) including all predictor variables; to (b) a regression only including an intercept term.
Due to chance correlations, $SSE < SST$ even if the null hypothesis is true!
Objective: Quantify how large of a discrepancy between $SST$ and $SSE$ we should expect under the null.
If what we observe is much larger than expected, we reject the null.
The $F$-Statistic
Our test statistic may be expressed as:
\[F_{\text{stat}} = \frac{(SST - SSE)/p}{SSE/(n - p - 1)}\]Numerator: difference between the sum of squared total and the sum of squared errors, divided by the differences in the degrees of freedom for the residual terms in those two regressions
Denominator: sum of squared errors in the full model, divided by degrees of freedom for residual vector in the full model
Definition
Categorical (also called nominal or qualitative) variables are ones in which the "values" are labels for group memberships. These values are often (but not always) coded as words.
The variable "colors" can take on values 'red', 'blue,'...
The variable "Nationality" can take on values 'American', 'Canadian,'...
Definition
Categorical (also called nominal or qualitative) variables are ones in which the "values" are labels for group memberships. These values are often (but not always) coded as words.
Categorical variables that can only take on two distinct values are called binary variables
The variable "College Graduate?" can take on values 'yes' or 'no'
So far we've talked about conducting multiple regression where our predictor and response variables are both continuous.
What does R do when we feed it a categorical predictor variable?
lm(Sales~Region)$coefficients
(Intercept) RegionSOUTH RegionWEST
1498.6355 -461.0997 -336.1042
If I have a categorical variable that takes on $G$ values, R creates $G-1$ binary variables
(also called dummy variables) and includes them as regressors
RegionSOUTH $= \mathbb{1}\{\text{Region = South}\}$
RegionSOUTH: 1 if territory was southern, 0 otherwise.
Referred to as "one-hot encoding" of a categorical variable.
Equation is then of the form
\[\widehat{\text{Score}} = \hat{\beta}_0 + \hat{\beta}_1 \mathbb{1}\{\text{Region = South}\} + \hat{\beta}_2 \mathbb{1}\{\text{Region = West}\}\]My region-specific prediction from the regression line would then be
Midwest: $\widehat{\text{Sales}} = \hat{\beta}_0 = 1498.6$
South: $\widehat{\text{Sales}} = \hat{\beta}_0 + \hat{\beta}_1 = 1498.6 - 461.1 = 1037.5$
West: $\widehat{\text{Sales}} = \hat{\beta}_0 + \hat{\beta}_2 = 1498.6 - 336.1 = 1162.5$
Compare to...
tapply(Sales, Region, mean)
MIDWEST SOUTH WEST
1498.635 1037.536 1162.531
The group that R doesn't include an indicator for (in this case, Midwest) is known as the
reference category.
The reason: the interpretation of the slope coefficients is made relative to (in reference to) the excluded category
Slopes of Categorical Predictors
Suppose my categorical predictor variable takes on $G$ values, $c_1,\dots,c_G$, and my equation is of the form
\[\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 \mathbb{1}\{x_i = c_1\} + \dots + \hat{\beta}_{G-1} \mathbb{1}\{x_i = c_{G-1}\},\]such that $c_G$ is the reference category. My expected difference in values of $y$ for two individuals, one in category $c_j$ and one in $c_G$, is $\hat{\beta}_j$.
Why can't we include $G$ dummy variables, one for each category? Recall the normal equations:
\[X^\top X \widehat{\beta} = X^\top y\]To express our vector of slope coefficients as $\widehat{\beta} = (X^\top X)^{-1} X^\top y$, we needed to make sure $(X^\top X)^{-1}$ was invertible
Need $p + 1 < n$
Need linear independence between the columns of $X$ (avoidance of collinearity)
If I include $G$ dummy variables, $d_1,\dots,d_G$, what would $d_1 + d_2 + \dots + d_G$ equal?
Implication: since first column of the design matrix $X$ is the intercept column, the resulting design matrix would not have linearly independent columns.
> summary(lm(Sales~Region))
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1498.64 45.22 33.141 < 2e-16 ***
RegionSOUTH -461.10 72.51 -6.359 1.73e-06 ***
RegionWEST -336.10 69.69 -4.823 7.24e-05 ***
Residual standard error: 150 on 23 degrees of freedom
Multiple R-squared: 0.669, Adjusted R-squared: 0.6402
F-statistic: 23.24 on 2 and 23 DF, p-value: 3.005e-06
What does the (stronger) regression model corresponding to the output above look like, specialized to the case of these predictors?
What would it look like under the null hypothesis tested by the overall $\mathcal{F}$-test?
The population expectation under the multiple regression model would be
\[\text{E}(Y_i \mid x_i) = \beta_0 + \beta_1 \mathbb{1}\{x_i = c_1\} + \dots + \beta_{G-1} \mathbb{1}\{x_i = c_{G-1}\}\]Under the null hypothesis tested by the overall $\mathcal{F}$,
\[\mathbf{H_o}: \beta_1 = \dots = \beta_{G-1} = 0,\]the expectation function would instead be
\[\text{E}(Y_i \mid x_i) = \beta_0,\]i.e. the expectation is the same for all categories.
So, the overall $\mathcal{F}$-test from this regression can be used to assess the null $\mathbf{H_0:}$ $\mu_g = \beta_0$ for all $g$, where $\mu_g$ is the population mean in the $g$th category.
In our example: $p$-value was $3 \times 10^{-6}$, so reject the null.
This regression-based approach for testing the equality of means through use of the overall $\mathcal{F}$ test has its own name in the literature: Analysis of Variance, or ANOVA for short.
Note that the test is assuming the truth of the (stronger) regression model. Most importantly here...
Homoskedasticity
Normality
In this context, homoskedasticity means $\sigma_1 = \dots = \sigma_G$. Do you see why?
When you conducted two-sample $t$-tests in your introductory statistics class, did you assume the standard deviations were equal between groups?
Estimate Std. Error t value Pr(>|t|)
(Intercept) -682.3699 179.3558 -3.805 0.000913 ***
Bonus 2.0970 0.6904 3.037 0.005855 **
Advert 2.6949 0.2563 10.513 2.95e-10 ***
F-statistic: 90.47 on 2 and 23 DF, p-value: 1.261e-11
What lurking variables aren't accounted for in this regression?
In the same way, our test for differences in means across regions did not account for differences on the basis of bonus and advertisement
Maybe there were more sales in the midwest because there was more money spent on advertisements there!
Can we try to control for all variables through a multiple regression?
lm(Sales~Region+Bonus+Advert)
(Intercept) RegionSOUTH RegionWEST Bonus Advert
-151.586 -155.291 -150.234 1.534 2.132
Compare the slope on South before and after adding Bonus and Advert
Only Categoricals
(Intercept) RegionSOUTH RegionWEST
1498.6 -461.1 -336.1
Comparing two territories, one in the midwest and one in the south, we'd expect the one in the south to have $461.1 thousand fewer sales.
Include Bonus and Advert
(Intercept) RegionSOUTH RegionWEST
-151.586 -155.291 -150.234
Comparing two territories, one in the midwest and one in the south with the same bonus and advertisement spending we'd expect the one in the south to have $155.29 thousand fewer sales.
Once again, here's our regression output
Coefficients:
(Intercept) RegionSOUTH RegionWEST Bonus Advert
-151.586 -155.291 -150.234 1.534 2.132
In the south, for two territories with the same advertising spend but who differ in total bonus paid by 1 hundred dollars, what's my expected difference in sales (in thousands of dollars)?
In the midwest, for two territories with the same advertising spend but who differ in total bonus paid by 1 hundred dollars, what's my expected difference in sales (in thousands of dollars)?
Once again, here's our regression output
Coefficients:
(Intercept) RegionSOUTH RegionWEST Bonus Advert
-151.586 -155.291 -150.234 1.534 2.132
In general, how do the regression lines vary across groups? Different intercepts, but same slopes
Is it obvious that this should be the case? Might the impact depend on region? Might the two interact?
It may be that the slope of a predictor variable changes as a function of another categorical variable.
The continuous variable may impact the expectation in different groups differently
We can capture these differences through the incorporation of interaction terms.
Interaction Term
To form an interaction term, we simply take the product of the two covariates whose interaction we are
investigating: $X_i\times X_j$
The interaction term between a continuous and categorical covariate takes on the following form:
\begin{align*} X_i \mathbf{1}\{X_j = c\} &= \begin{cases} 0 & X_j \neq c\\ X_i & X_j = c\end{cases} \end{align*}In R, including interactions is straightforward:
summary(lm(Sales~Region*Bonus + Region*Advert))
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -555.5274 212.5735 -2.613 0.01817 *
RegionSOUTH 837.0791 292.4465 2.862 0.01079 *
RegionWEST 527.3892 393.8955 1.339 0.19823
Bonus 1.8994 0.6424 2.957 0.00883 **
Advert 2.6585 0.2132 12.469 5.59e-10 ***
RegionSOUTH:Bonus -0.8742 0.9571 -0.913 0.37382
RegionWEST:Bonus 0.7477 1.0145 0.737 0.47117
RegionSOUTH:Advert -1.5546 0.4800 -3.239 0.00483 **
RegionWEST:Advert -1.6964 0.5644 -3.006 0.00796 **
The last four coefficients are the interactions.
At this extreme, each group has its own slopes and intercepts
Midwest: $-555.5 + 1.90(\texttt{Bonus}) + 2.66(\texttt{Advert})$
South: $(-555.5 + 837.1) + (1.90-0.87)\texttt{Bonus} + (2.66-1.55)\texttt{Advert}$
West: $(-555.5 + 527.4) + (1.90+0.75)\texttt{Bonus} + (2.66-1.70)\texttt{Advert}$
What do you think would happen if I fit three separate regressions, one each for Midwest, South, and West? How would the output compare to what I'm seeing above?
Through this process we've built up from a fairly simple model for sales to one that separately models sales according to region.
Did we need to do this to the full extent we've provided here?
For example: perhaps the effectiveness of advertising varies regionally, but not the impact of bonus...
How do we assess this?
Unfortunately, this won't generally correspond to a simple $t$-test...
The required test is called a partial $\mathcal{F}$-test.
Partial in the sense that it compares large (full) models to some partial subset of the large model
General use: suppose we fit a model $Y_i = \beta_0 + \beta_1x_{i1} + \dots+\beta_px_{ip} + \varepsilon_i$, and are interested in the null hypothesis that some subset of slopes $\mathcal{I}\subseteq \{1,\dots,p\}$ are zero.
For example, suppose we want to test if the slopes on the third and fifth predictors are both zero, versus the alternative that at least one is nonzero.
The test compares the predictive performance in the larger model to the predictive performance in the smaller model, which uses only a subset of the predictor variables.
Natural extension of the $t$-tests on individual coefficients to dealing with sets of coefficients.
In fact: if the subset is a single coefficient, the two will be identical
Suppose I am interested in the null hypothesis
\begin{align*} \mathbf{H_o}:& \;\beta_j = 0 \text{ for some subset of covariates } \mathcal{I}\\ \mathbf{H_a}:& \;\text{at least one slope in the subset $\mathcal{I}$ is nonzero} \end{align*}A natural way to test this is to compare the $SSE$ from a regression including all variables to the $SSE$ from a regression excluding the variables whose slopes are zero under the null
Full Model: The model including the full set of covariates
Reduced Model: The model only including the covariates whose slopes aren't set to zero under the null
Ex: if we wanted to see if we needed interactions between region and bonus, we'd compare a model including them (the full model) to a model excluding them (the reduced model)
Full Model (includes all the predictor variables)
$SSE_{Full}$: the $SSE$ in the full model, including all candidate predictor variables
$df_{Full}$: the degrees of freedom for the residual vector under the full model
Reduced Model (includes subset where slopes aren't equal to zero under $H_0$)
$SSE_{Red}$: the $SSE$ in the reduced model, including only the predictor variables whose slope coefficients aren't equal to zero under the null
$df_{Red}$: the degrees of freedom for the residual vector under the reduced model.
The $F$-Statistic
Our test statistic may be expressed as:
Numerator: difference between the SSE in reduced and full models, divided by the differences in the degrees of freedom
The $F$-Statistic
Our test statistic may be expressed as:
Denominator: SSE in the full model divided by degrees of freedom for residual vector in the full model
Under the Null: $F_{stat}\sim \mathcal{F}_{(df_{red}-df_{full}),\; df_{full}}$
Under the assumptions of the multiple regression model, the null hypothesis for testing "was multiple regression worth it at all?" with our predictor variables $x_1,\dots,x_p$ takes on the form:
\begin{align*} \mathbf{H_o}: & \;\beta_1 =\beta_2=\dots=\beta_p = 0 \end{align*}The alternative is
\begin{align*} \mathbf{H_a}: & \;\text{at least one } \beta_j\neq 0\;,\; j=1,\dots,p \end{align*}Another way to think about this: under the null, the only thing that matters for predicting a response is the intercept, $\beta_0$.
If this were true, $\textbf{E}(Y_i|x_1,\dots,x_p) = \beta_0$. That is, every point has the same mean regardless of the values of the explanatory variables
One can show: in a regression that only contains an intercept, $\hat{\beta}_0$, the estimated intercept coefficient, equals $\bar{y}$
Full Model
Contains all $p_{full}$ of the predictor variables
$df_{full} = n-p_{full}-1$
Reduced Model
Only contains the intercept. $p=0$ for this regression
$df_{red} = n-1$
$SSE_{red} = SST$
> summary(lm(Sales~Region*Bonus + Region*Advert))
...
Residual standard error: 48.57 on 17 degrees of freedom
Multiple R-squared: 0.9743, Adjusted R-squared: 0.9623
F-statistic: 80.68 on 8 and 17 DF, p-value: 5.371e-12
$p$-value for the Overall $\mathcal{F}$-test is given in the very last line: $5.371\times 10^{-12}$
Hence, at $\alpha=0.05$ we reject the null hypothesis and suggest that our regression model provides a statistically significant improvement in prediction in sales relative to predicting the overall mean for all individuals.
Acts as a "proof of concept." There's at least something going on that makes doing regression useful in the first place.
Take as our full model the model including interactions between Bonus and Region and Advertisement and Region. Let's test the null hypothesis that there isn't an interaction between Bonus and Region.
Full Model: Interactions between Bonus and Region, and between Advertisement and Region
Reduced Model: Interaction between Advertisement and Region, but no interaction between Bonus and Region
To conduct the test, we fit the full and reduced linear models, and feed the results into the
anova command.
lm.full = lm(Sales~Bonus*Region + Advert*Region)
lm.reduced = lm(Sales~Bonus + Advert*Region)
anova(lm.reduced, lm.full)
Model 1: Sales ~ Bonus + Advert * Region
Model 2: Sales ~ Bonus * Region + Advert * Region
Res.Df RSS Df Sum of Sq F Pr(>F)
1 19 45723
2 17 40109 2 5614.1 1.1898 0.3284
Conclusion: Fail to reject the null. It suggests that we don't need to include interactions between region and bonus in the model.
Interaction effects are not solely limited to interactions between continuous variables and categoricals.
Let's go back to the mall and consider a regression of sales on income and
competitors
(Intercept) income competitors
91.422 7.495 -25.135
For two stores with median income equal to 47K, what's the expected difference in sales if
one store has 1 more competitor? -25.135
For two stores with median income equal to 74K, what's the expected difference in sales if
one store has 1 more competitor? -25.135
Interaction effects are not solely limited to interactions between continuous variables and categoricals.
Let's go back to the mall and consider a regression of sales on income and
competitors
(Intercept) income competitors
91.422 7.495 -25.135
That is, the particular value of median income doesn't affect the expected difference in sales.
Is this obviously true? Might competitors and income not
interact?
The interaction term between two continuous variables takes on the form
\begin{align*} X_i:X_j &= X_i\times X_j \end{align*}lm(formula = sales ~ income * competitors)
(Intercept) income competitors income:competitors
205.0266 5.6312 -61.6539 0.5841
The variable income:competitors is found by, for each row, taking the product of the income of
that store and the number of competitors within that store.
If income was 47K and there were two competitor stores, the interaction variable would take on the value $47\times 2 = 94$
lm(formula = sales ~ income * competitors)
(Intercept) income competitors income:competitors
205.0266 5.6312 -61.6539 0.5841
What's the predicted sales/sqft for a store with three competitors and 47K in income?
$205 + 5.63\times 47 + (-61.65)\times 3 + 0.584\times 47\times 3$
For two stores with median income equal to 47K, what's the expected difference in sales per
square foot if one store has 1 more competitor?
$-61.65 + 0.585\times 47 = -34$
lm(formula = sales ~ income * competitors)
(Intercept) income competitors income:competitors
205.0266 5.6312 -61.6539 0.5841
For two stores with median income equal to 74K, what's the expected difference in sales if one
store has 1 more competitor?
$-61.65 + 0.585\times 74 = -18$
Estimate Std. Error t value Pr(>|t|)
(Intercept) 205.0266 128.7509 1.592 0.1165
income 5.6312 2.1195 2.657 0.0101 *
competitors -61.6539 38.2112 -1.614 0.1119
income:competitors 0.5841 0.6026 0.969 0.3363
How can we use this output to see if including the interaction improved predictive performance relative to a model without an interaction? What's the conclusion at $\alpha=0.05$?
Suppose that $T \sim t_{df}$. Then,
\begin{align*} T^2 & \sim \mathcal{F}_{1, df} \end{align*}In words, the square of a $t_{df}$-distributed random variable follows an $\mathcal{F}$ distribution, with one numerator degree of freedom and $df$ denominator degrees of freedom.
With this in mind, let's consider an interesting correspondence between
$t$-tests for individual slope parameters
Partial $\mathcal{F}$-tests where only one slope parameter equals zero under the null
Consider two approaches to testing whether or not we need the interaction between income and competitors:
Partial $\mathcal{F}$-test
Model 1: sales ~ income + competitors
Model 2: sales ~ income * competitors
Res.Df RSS Df Sum of Sq F Pr(>F)
1 61 277232
2 60 272958 1 4274.1 0.9395 0.3363
Consider two approaches to testing whether or not we need the interaction between income and competitors:
$t$-test
Estimate Std. Error t value Pr(>|t|)
(Intercept) 205.0266 128.7509 1.592 0.1165
income 5.6312 2.1195 2.657 0.0101 *
competitors -61.6539 38.2112 -1.614 0.1119
income:competitors 0.5841 0.6026 0.969 0.3363
$0.97^2 = 0.94$. $p$-values are identical.
Consider a simple regression of Sales on Income
lm(formula = sales ~ income)
Estimate Std. Error t value Pr(>|t|)
(Intercept) 120.4583 58.6327 2.054 0.0442 *
income 6.0807 0.9066 6.707 6.89e-09 ***
---
F-statistic: 44.99 on 1 and 62 DF, p-value: 6.888e-09
Compare the $p$-values on the overall F test and for the slope on income. Why are they equal?
A random variable $T_\nu$ follows a $t_\nu$ distribution if it can be expressed as follows:
\begin{align*} T_\nu &= \frac{Z}{\sqrt{\sum_{i=1}^\nu Z_i^2/\nu}}, \end{align*}where $Z,Z_1,\dots,Z_\nu\overset{iid}{\sim}N(0,1)$.
All of our $t$-statistics can be expressed in this form under the stronger linear model!
A random variable $F_{\nu_1, \nu_2}$ follows a $\mathcal{F}_{\nu_1,\nu_2}$ distribution if it can be expressed as follows:
\begin{align*} F_{\nu_1,\nu_2} &= \frac{\sum_{i=1}^{\nu_1} W_i^2/\nu_1}{\sum_{j=1}^{\nu_2} Z_j^2/{\nu_2}}, \end{align*}where $W_1,\dots,W_{\nu_1}, Z_1,\dots,Z_{\nu_2}\overset{iid}{\sim}N(0,1)$.
All of our $F$-statistics can be expressed in this form under the stronger linear model!