Degree-One Polynomials

Problem

The question that one of my teaching assistants posed was “What is the difference between lm(y ~ x) and lm(y ~ poly(x, 1)) for linear regression in R?” It is quickly apparent that those commands report different results, even though the intent is the same. Here I will try to explore the underlying difference.

Disclaimer: I know that the following discussion is incomplete. These are simply notes that I threw together overnight.

Exposition

For a quick study, we can observe that the commands lm(y ~ x) and lm(y ~ poly(x, 1)) produce different results:

# 50 ordered pairs of random numbers
x <- runif(50)
y <- runif(50, -3, 3)

# first linear model
lm1 <- lm(y ~ x)

# second linear model
lm2 <- lm(y ~ poly(x, 1))

# found coefficients
lm1
## 
## Call:
## lm(formula = y ~ x)
## 
## Coefficients:
## (Intercept)            x  
##    -0.03135     -0.37976
lm2
## 
## Call:
## lm(formula = y ~ poly(x, 1))
## 
## Coefficients:
## (Intercept)   poly(x, 1)  
##     -0.2030      -0.7187
# plot
plot(x,y, main = "quick scatterplot and linear regression")
points(mean(x), mean(y), col = "purple", pch = 8)
abline(lm1, col = "red", lwd = 3)
abline(lm2, col = "blue", lwd = 3)
legend(0.5, 2, 
       c("lm(y ~ x)", "lm(y ~ poly(x, 1))"),
       col = c("red", "blue"),
       lwd = c(3,3))

We observe that the line drawn from the lm(y ~ x) coefficients goes through the sample means \((\bar{x}, \bar{y})\), while the line drawn from the lm(y ~ poly(x, 1)) coefficients does not. Oddly enough, if we apply a metric such as the adjusted coefficient of determination (\(R^{2}\)), then the model metrics are the same!

summary(lm1)$adj.r.squared
## [1] -0.01688544
summary(lm2)$adj.r.squared
## [1] -0.01688544
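
In fact, the two commands fit the same line, just expressed in different bases. A quick check (a sketch, reusing the lm1 and lm2 objects from above) confirms that the fitted values, and therefore any goodness-of-fit summary, agree:

# the two parameterizations describe the same fitted values
all.equal(fitted(lm1), fitted(lm2))

# so the usual goodness-of-fit summaries must match as well
all.equal(summary(lm1)$r.squared, summary(lm2)$r.squared)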

Orthogonal Polynomials

Some searches on the web point toward how the poly command uses orthogonal polynomials by default. That is, modeling with \[\hat{y} = a + bx + cx^{2} + \dots\] is easy to interpret, but higher-degree terms will not add much to the predictive ability of our regression models (e.g. \(x^7\) and \(x^8\) are “too close” to each other within some intervals). Side note: with the \[\{1, x, x^{2}, \dots\}\] basis, the Vandermonde matrix has a high condition number, so calculations along this route are ill-conditioned.
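
As a rough illustration of that side note (a sketch; the degree 7 and the random points are my own choices, not part of the original discussion), we can compare the condition numbers of a raw monomial design matrix and its orthogonal-polynomial counterpart with kappa():

set.seed(1)                          # arbitrary seed, for reproducibility
x_demo <- sort(runif(100))           # 100 points in (0, 1)

V_raw  <- outer(x_demo, 0:7, `^`)    # Vandermonde-style columns 1, x, ..., x^7
V_orth <- cbind(1, poly(x_demo, 7))  # intercept plus orthogonal columns

kappa(V_raw)                         # very large: ill-conditioned
kappa(V_orth)                        # much smaller: well-conditioned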

Coefficients of Orthogonal Polynomials

There has been a lot of discussion about how poly works and its internal, recursive algorithm for producing a set of orthogonal polynomials over the supplied values. Here I hope to produce a simple example where we can follow the numbers.

x <- 1:5       # mean is 3
y <- 15*x + 18 # line of slope 15 and y-intercept 18

Using the lm command quickly recovers the slope and intercept.

lm_raw <- lm(y ~ x)
lm_raw
## 
## Call:
## lm(formula = y ~ x)
## 
## Coefficients:
## (Intercept)            x  
##          18           15

However, the route with orthogonal polynomials yields different coefficients.

lm_orth <- lm(y ~ poly(x,1))
lm_orth
## 
## Call:
## lm(formula = y ~ poly(x, 1))
## 
## Coefficients:
## (Intercept)   poly(x, 1)  
##       63.00        47.43

We can extract the stored coefficients from the orthogonal-polynomial route: a centering constant alpha and some squared normalization factors norm2.

# centering constant (alpha): the mean of x
a <- attributes(poly(x,1))$coefs$alpha
a
## [1] 3
# squared norms (norm2) of the polynomial columns
n <- attributes(poly(x,1))$coefs$norm2
n
## [1]  1  5 10
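
From these stored values we can rebuild the column that poly(x, 1) actually hands to lm: center x at alpha, then divide by the square root of the last norm2 entry (a sketch based on the attributes printed above; the 5 and 10 are the squared lengths of the constant and centered columns, and the leading 1 appears to be a starting value for poly's recurrence).

# rebuild the degree-one column: center at alpha, divide by its length sqrt(n[3])
p1_scaled <- (x - a) / sqrt(n[3])
p1_scaled

# compare with the column that poly() returns
as.vector(poly(x, 1))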

Building Orthogonal Polynomials

There are several ways to build orthogonal polynomials, and here I will try out the Gram-Schmidt process. For the base case, set \[p_{0} := 1.\] The degree-one term is then \[p_{1}(x) = x - \frac{\langle x, p_{0}\rangle}{\langle p_{0}, p_{0}\rangle}\, p_{0} = x - \frac{\int_{1}^{5} \! x \, dx}{\int_{1}^{5} \, dx} = x - 3.\]

Note that \(p_{1}(x) = x - 3\) is centered at the mean \(\bar{x} = 3\).
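
R's poly works with discrete inner products (sums over the data points) rather than integrals, but the same Gram-Schmidt step can be traced numerically. This short sketch (the names p0 and p1 are my own) also shows where the 10 in norm2 comes from:

# discrete Gram-Schmidt: subtract the projection of x onto the constant vector
p0 <- rep(1, length(x))
p1 <- x - sum(x * p0) / sum(p0 * p0)   # x minus its mean, i.e. x - 3
p1

sum(p1^2)                              # 10, the last entry of norm2 above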

Combining the coefficients found from lm(y ~ poly(x,1)) with the normalization factor \(\sqrt{10}\) (from norm2 above), we get

\[\hat{y} = 63 + 47.43 \cdot \frac{x - 3}{\sqrt{10}}\]

which is indeed \(\hat{y} = 18 + 15x\) when simplified, since \(47.43/\sqrt{10} \approx 15\) and \(63 - 15 \cdot 3 = 18\) (up to rounding in the printed coefficients).
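
To close the loop, the back-substitution can be checked numerically (a sketch, reusing lm_orth, a, and n from above):

# rewrite the orthogonal-basis fit in terms of the raw x
b <- coef(lm_orth)                        # roughly 63 and 47.43
y_hat <- b[1] + b[2] * (x - a) / sqrt(n[3])

all.equal(unname(y_hat), 18 + 15 * x)     # should be TRUE, up to rounding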
