## Problem

The question that one of my teaching assistants posed was “What is the difference between `lm(y ~ x)` and `lm(y ~ poly(x, 1))` for linear regression in R?” That is, it is quickly apparent that those commands produce different results, even though the intention is the same. Here I will try to explore the underlying difference.

Disclaimer: I know that the following discussion is incomplete. These are simply notes that I threw together overnight.

## Exposition

For a quick study, we can observe that the commands `lm(y ~ x)` and `lm(y ~ poly(x, 1))` produce different results:

```
# 50 ordered pairs of random numbers
x <- runif(50)
y <- runif(50, -3, 3)
# first linear model
lm1 <- lm(y ~ x)
# second linear model
lm2 <- lm(y ~ poly(x, 1))
# found coefficients
lm1
```

```
##
## Call:
## lm(formula = y ~ x)
##
## Coefficients:
## (Intercept) x
## -0.03135 -0.37976
```

`lm2`

```
##
## Call:
## lm(formula = y ~ poly(x, 1))
##
## Coefficients:
## (Intercept) poly(x, 1)
## -0.2030 -0.7187
```

```
# plot
plot(x,y, main = "quick scatterplot and linear regression")
points(mean(x), mean(y), col = "purple", pch = 8)
abline(lm1, col = "red", lwd = 3)
abline(lm2, col = "blue", lwd = 3)
legend(0.5, 2,
c("lm(y ~ x)", "lm(y ~ poly(x, 1))"),
col = c("red", "blue"),
lwd = c(3,3))
```

We observe that the line drawn for `lm(y ~ x)` goes through the sample means \((\bar{x}, \bar{y})\), while the line drawn for `lm(y ~ poly(x, 1))` does not. Oddly enough, if we apply a metric such as the adjusted coefficient of determination (adjusted \(R^{2}\)), then the model metrics are the same!

`summary(lm1)$adj.r.squared`

`## [1] -0.01688544`

`summary(lm2)$adj.r.squared`

`## [1] -0.01688544`
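One way to see why the metrics agree is to compare the fitted values directly. A quick check (assuming the `lm1` and `lm2` objects from above are still in the workspace): both formulas fit the same line, just expressed in different bases, while `abline(lm2)` reads `coef(lm2)` as an (intercept, slope) pair in raw `x`, which is why the blue line in the plot looks different.

```
# both models describe the same fitted line, only in different bases
all.equal(fitted(lm1), fitted(lm2))
# abline(lm2) misreads the orthogonal-basis coefficient as a slope in raw x,
# so the blue line above is not the fitted line even though the fits agree
```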

## Orthogonal Polynomials

Some searches on the web point toward how the `poly` command uses orthogonal polynomials by default. Modeling with \[\hat{y} = a + bx + cx^{2} + \ldots\] is easy to interpret, but higher-degree terms will not add much to the predictive ability of our regression models (e.g. \(x^7\) and \(x^8\) are “too close” within some interval). Side note: with the \[\{1, x, x^{2}, \ldots\}\] basis, we learn that the Vandermonde matrix for this basis has a *high condition number* and calculations along this route are *ill-conditioned*.
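As a rough numerical sketch of that claim (reusing the 50 random `x` values from the example above; the choice of degree 7 is arbitrary), we can compare condition numbers with `kappa`:

```
# raw power basis 1, x, ..., x^7 (Vandermonde-style columns)
V <- outer(x, 0:7, "^")
# orthogonal polynomial basis of the same degree, plus an intercept column
P <- cbind(1, poly(x, 7))
kappa(V)   # very large condition number: ill-conditioned
kappa(P)   # much smaller: well-conditioned
```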

## Coefficients of Orthogonal Polynomials

There has been a lot of discussion about how `poly` works and its internal, recursive algorithm to produce a set of orthogonal polynomials over an interval of values. Here I hope to produce a simple example where we can follow the numbers.

```
x <- 1:5 # mean is 3
y <- 15*x + 18 # line of slope 15 and y-intercept 18
```

Using the `lm` command quickly recovers the slope and intercept:

```
lm_raw <- lm(y ~ x)
lm_raw
```

```
##
## Call:
## lm(formula = y ~ x)
##
## Coefficients:
## (Intercept) x
## 18 15
```

However, the route with orthogonal polynomials yields different coefficients.

```
lm_orth <- lm(y ~ poly(x,1))
lm_orth
```

```
##
## Call:
## lm(formula = y ~ poly(x, 1))
##
## Coefficients:
## (Intercept) poly(x, 1)
## 63.00 47.43
```

We can extract the constants that `poly` uses to build the orthogonal polynomial: a centering value (`alpha`) and the squared norms (`norm2`) that serve as normalization factors.

```
a <- attributes(poly(x,1))$coefs$alpha
a
```

`## [1] 3`

```
n <- attributes(poly(x,1))$coefs$norm2
n
```

`## [1] 1 5 10`
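With `a` and `n` in hand, we can rebuild the single column that `poly(x, 1)` returns by hand (a small check using the values just extracted): center at `alpha`, then divide by the square root of the last `norm2` entry.

```
p1_col <- (x - a) / sqrt(n[3])
cbind(by_hand = p1_col, poly(x, 1))  # the two columns agree
```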

## Building Orthogonal Polynomials

There are several ways to build orthogonal polynomials, and here I will try out the Gram-Schmidt process. For the base case, set \[p_{0} := 1.\] The degree-one term is \[p_{1}(x) = x - \frac{\langle x, p_{0}\rangle}{\langle p_{0}, p_{0}\rangle}\, p_{0} = x - \frac{\int_{1}^{5} \! x \, dx}{\int_{1}^{5} \, dx} = x - 3.\]

Note that \(p_{1}(x) = x - 3\) is centered at the mean \(\bar{x} = 3\).
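Strictly speaking, `poly` works with sums over the sample points rather than integrals over an interval, but the projection step looks the same; a minimal discrete sketch with `x <- 1:5`:

```
p0 <- rep(1, length(x))
p1 <- x - sum(x * p0) / sum(p0 * p0) * p0  # x - mean(x), i.e. x - 3
p1
```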

Combining the coefficients found from `lm(y ~ poly(x,1))` and a normalization factor from above, we get

\[\hat{y} = 63 + 47.43 \cdot \frac{x - 3}{\sqrt{10}}\]

which is indeed \(\hat{y} = 18 + 15x\) when simplified (up to rounding error, too much hand-waving, and a missing number).
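As a final sanity check (assuming the `lm_orth`, `a`, and `n` objects from above are still available), we can undo the centering and scaling numerically to recover the raw slope and intercept:

```
slope_raw <- coef(lm_orth)[2] / sqrt(n[3])          # 47.43... / sqrt(10) = 15
intercept_raw <- coef(lm_orth)[1] - slope_raw * a   # 63 - 15 * 3 = 18
c(intercept = unname(intercept_raw), slope = unname(slope_raw))
```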