Visualizing a Continuous by Continuous Interaction in Linear Regression



“Did you see if there was a difference by [insert the person’s favorite population or topic of interest]?” With some resignation, I reply, “No, I haven’t, but that is a good idea.” I’ve had this exchange in every research talk I’ve given.

Often we are interested in whether the association between a predictor and an outcome differs across levels of another variable. This can be examined with an interaction term in a regression model. An interaction term is the product of two independent variables, included so we can see how that product predicts the outcome variable. In doing so, we can test whether the effect of one variable changes across the range of the other. This is in contrast to stratifying by a variable and running two separate regression models. Although stratification lets us compare regression coefficients between strata, we cannot directly test whether those differences are significant, because the coefficients are estimated in two completely separate models.
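As a minimal sketch of what the interaction term does, the model can be written out as follows. The symbols are illustrative notation only (Y for the outcome, X and Z for the two continuous predictors), not output from any particular package:

```latex
\begin{aligned}
Y &= \beta_0 + \beta_1 X + \beta_2 Z + \beta_3 (X \times Z) + \varepsilon \\
\frac{\partial\, \mathrm{E}[Y]}{\partial X} &= \beta_1 + \beta_3 Z
\end{aligned}
```

The second line shows why the interaction matters: the slope of X is no longer a single number but depends on the value of Z. If \beta_3 = 0, the slope is \beta_1 everywhere and there is no interaction.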

Although interactions are relatively easy to program in statistical packages, interpreting their coefficients can be tricky. This is especially true when the interacting variables are both continuous. I’ve found the most effective way to understand interactions is through visualizations.

In this post, I visualize and interpret an interaction between two continuous variables in a linear regression.

The Data

The data come from the 2013-2014 cycle of the National Health and Nutrition Examination Survey (NHANES). I am specifically interested in how income and healthy eating, measured by the Healthy Eating Index (HEI) score, are associated with body mass index (BMI). In other words, is the association between BMI and healthy eating consistent across income levels (i.e., no interaction), or does it vary by income (i.e., presence of an interaction)?

The Model

To test this question, we will use a generalized linear model (GLM) with a Gaussian family and identity link, which is equivalent to ordinary linear regression. The specific choice of linear model doesn’t matter much here; the interpretation is mostly the same. I am choosing this model because it can easily incorporate the NHANES probability weights. I will also adjust for confounding variables such as minutes sedentary per day, age, and gender.
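To make the model structure concrete, here is a rough sketch of a weighted linear fit with an interaction term. It is not the code used for this post: it uses plain weighted least squares in NumPy, the data are synthetic and noise-free so the recovered coefficients can be checked, and every variable name (`hei`, `income`, `age`, `weights`) is a hypothetical stand-in. Weighted least squares reproduces the weighted point estimates only; proper standard errors for a complex survey like NHANES also need the design information, which is left out here.

```python
import numpy as np

def fit_weighted_interaction(y, x, z, covariates, weights):
    """Weighted least squares for y ~ 1 + x + z + x*z + covariates.

    Returns coefficients in the order:
    intercept, x, z, x*z, then one per covariate column.
    """
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    z = np.asarray(z, dtype=float)
    covariates = np.asarray(covariates, dtype=float).reshape(len(y), -1)
    w = np.asarray(weights, dtype=float)

    # Design matrix: the interaction term is simply the product x * z.
    X = np.column_stack([np.ones_like(x), x, z, x * z, covariates])

    # Scaling rows by sqrt(weight) turns weighted LS into ordinary LS.
    sw = np.sqrt(w)
    beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return beta

# Synthetic, noise-free data (assumed values, for illustration only).
rng = np.random.default_rng(0)
n = 500
hei = rng.uniform(20, 90, n)        # healthy eating score
income = rng.uniform(0, 5, n)       # income measure
age = rng.uniform(20, 80, n)        # one adjustment covariate
weights = rng.uniform(0.5, 2.0, n)  # stand-in for probability weights

true_beta = np.array([30.0, -0.02, 0.5, -0.01, 0.03])
bmi = (true_beta[0] + true_beta[1] * hei + true_beta[2] * income
       + true_beta[3] * hei * income + true_beta[4] * age)

beta = fit_weighted_interaction(bmi, hei, income, age, weights)
```

The fourth element of `beta` is the interaction coefficient, the quantity the rest of this post tries to interpret.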

Results

I found a significant interaction between income and HEI score, which suggests that the association between HEI score and predicted BMI differs across income levels. That is, HEI and income should not be considered independent when predicting BMI. If we were to include HEI and income in the model only as separate terms (with no interaction), then we would not be modeling the relationship appropriately.

The Visualization

The interaction coefficient from the GLM isn’t very intuitive. This is especially true for a continuous by continuous interaction since there are so many possible value combinations. With a categorical by continuous interaction, things are simpler. For example, if we had an interaction between HEI and gender, then we could show a separate regression line between BMI and HEI for men and for women.

With two continuous variables, such a plot isn’t possible because there is no natural grouping variable. Plotting HEI by each level of income isn’t practical because there are hundreds of income levels. So how do we solve this issue?

The most common method is to split one of the continuous variables into meaningful categories. “Meaningful categories” is vague and could be interpreted in a lot of different ways, so the standard approach is to use three values: 1 SD below the mean, the mean, and 1 SD above the mean. This is a nice approach because it is relatively easy to interpret and makes comparisons across studies consistent.
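The ±1 SD approach can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the code behind the plot: the coefficients and the income values are made up, and the adjustment covariates are assumed to be held constant and folded into the intercept `b0`.

```python
from statistics import mean, stdev

def moderator_levels(values):
    """Return the three conventional plotting values for a moderator."""
    m = mean(values)
    s = stdev(values)  # sample standard deviation
    return {"-1 SD": m - s, "mean": m, "+1 SD": m + s}

def predict_bmi(b0, b_hei, b_income, b_inter, hei, income):
    """Predicted BMI from a model with an HEI x income interaction."""
    return b0 + b_hei * hei + b_income * income + b_inter * hei * income

# Hypothetical income values and coefficients, for illustration only.
incomes = [1.0, 2.0, 3.0, 4.0, 5.0]
coefs = dict(b0=30.0, b_hei=-0.02, b_income=0.5, b_inter=-0.01)

levels = moderator_levels(incomes)
hei_grid = [20, 40, 60, 80]

# One predicted line (a list of BMI values over hei_grid) per income level.
lines = {
    label: [predict_bmi(hei=h, income=inc, **coefs) for h in hei_grid]
    for label, inc in levels.items()
}
```

Each entry in `lines` can then be drawn as its own facet or its own line on a shared axis.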

Interpretation of the Visualization

This plot shows predicted values of BMI by HEI score for those with incomes 1 SD below the mean (lower income), at the mean (moderate income), and 1 SD above the mean (higher income). The overall interaction is significant (P < 0.05). That isn’t too surprising given the sample size and small units of measurement in both variables. The next question is: are the slopes at −1 SD, at the mean, and at +1 SD significantly different from 0? The answers are no, yes, and yes.
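Testing whether each of those slopes differs from 0 is usually called a simple slopes analysis. A minimal sketch of the arithmetic follows, assuming you already have the HEI coefficient, the interaction coefficient, and their variances and covariance from the fitted model. All numbers below are hypothetical, and the p-value uses a large-sample normal approximation rather than survey-adjusted degrees of freedom.

```python
import math

def simple_slope(b_x, b_xz, var_x, var_xz, cov_x_xz, z):
    """Slope of x at moderator value z, with a two-sided normal test.

    slope = b_x + b_xz * z
    Var(slope) = Var(b_x) + z^2 * Var(b_xz) + 2 * z * Cov(b_x, b_xz)
    """
    slope = b_x + b_xz * z
    se = math.sqrt(var_x + z ** 2 * var_xz + 2 * z * cov_x_xz)
    stat = slope / se
    p_value = math.erfc(abs(stat) / math.sqrt(2))  # two-sided, normal approx.
    return slope, se, stat, p_value

# Hypothetical estimates and moderator values, for illustration only.
estimates = dict(b_x=-0.02, b_xz=-0.01,
                 var_x=0.0004, var_xz=0.000025, cov_x_xz=-0.00005)
income_levels = {"-1 SD": 1.0, "mean": 2.5, "+1 SD": 4.0}

results = {
    label: simple_slope(z=z, **estimates)
    for label, z in income_levels.items()
}
```

Note that the covariance term matters: dropping it and using only the two variances will generally give the wrong standard error.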

When looking across the three facets of the graph, the first thing to notice is that there isn’t a big difference in BMI by income and HEI. That is, the effect size is pretty small. We also see that the slopes are similar, although those with higher incomes have the steepest slope. Interestingly, for those with lower incomes, eating healthier isn’t associated with lower BMI; that is only true for those with moderate to higher incomes.
