# Using a Forest Plot to Display Regression Results

How many times have you sat through a presentation and stared blankly at a table of regression results? If you have been to my presentations, it has been many, many times. I was thinking about a better and more intuitive way to present regression results that also gives a sense of uncertainty.

In this post, I show how to visualize OLS regression results with a forest plot. The nice thing about forest plots is that they give a sense of magnitude, uncertainty, and distribution in an easy-to-understand format.

There are some important things to know about the forest plot above.

1. If the bar crosses 0, marked by the vertical dashed line, the variable is not statistically significant at the 5% level.
2. The point in the middle of each bar is the regression point estimate (i.e., the beta-hat value).
3. The length of the bar represents a 95% confidence interval, giving a sense of uncertainty of the point estimate.
4. If the point and bar are less than zero, then the variable is negatively associated with the dependent variable. If they are above zero, then the variable is positively associated with the dependent variable.
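The four reading rules above can be sketched in a few lines of Python. This is a minimal illustration, not the code behind the plot in this post (that analysis was done in R). It assumes a normal-approximation critical value of 1.96 for the 95% interval, whereas a survey-weighted analysis would typically use a t value based on the design degrees of freedom. The `ForestRow` class, the `text_forest_plot` helper, and all of the numbers are hypothetical.

```python
from dataclasses import dataclass

Z_95 = 1.96  # assumed normal-approximation critical value for a 95% interval


@dataclass
class ForestRow:
    label: str
    estimate: float   # regression point estimate (beta-hat)
    std_error: float  # standard error of the estimate

    @property
    def ci(self):
        """Lower and upper bounds of the approximate 95% confidence interval."""
        half_width = Z_95 * self.std_error
        return self.estimate - half_width, self.estimate + half_width

    @property
    def crosses_zero(self):
        """True when the interval includes 0 (not significant at the 5% level)."""
        low, high = self.ci
        return low <= 0.0 <= high

    @property
    def direction(self):
        """Sign of the association, or a flag when the bar crosses zero."""
        if self.crosses_zero:
            return "not significant"
        return "positive" if self.estimate > 0 else "negative"


def text_forest_plot(rows, lo=-2.0, hi=2.0, width=41):
    """Draw each interval as dashes, the estimate as 'o', and zero as '|'."""
    def col(x):
        x = min(max(x, lo), hi)  # clamp to the plotting range
        return round((x - lo) / (hi - lo) * (width - 1))

    lines = []
    for row in rows:
        cells = [" "] * width
        cells[col(0.0)] = "|"  # reference line at zero
        low, high = row.ci
        for i in range(col(low), col(high) + 1):
            cells[i] = "-"     # a bar that crosses zero overwrites the line
        cells[col(row.estimate)] = "o"
        lines.append(f"{row.label:<12}{''.join(cells)}  {row.direction}")
    return "\n".join(lines)


# Hypothetical numbers for illustration only, not the results from this post.
rows = [
    ForestRow("Variable A", estimate=-0.8, std_error=0.2),
    ForestRow("Variable B", estimate=0.3, std_error=0.25),
    ForestRow("Variable C", estimate=1.1, std_error=0.3),
]
print(text_forest_plot(rows))
```

A real figure would use a plotting library rather than text, but the inputs are the same: one estimate and one interval per variable, plus a reference line at zero.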

In the graph above, I chose to standardize the continuous variables so they are more easily comparable. Each point and bar represents the change in BMI associated with a 1 standard deviation increase in the variable. For gender this isn’t necessary since it is a binary factor variable (coded as 0 or 1).
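To make the effect of standardizing concrete, here is a small Python sketch with a single predictor. It is not the model from this post: the data are made up, and `standardize` and `ols_slope` are hypothetical helpers. It shows that the slope on a z-scored predictor equals the raw slope multiplied by the predictor's standard deviation, which is why standardized coefficients can be compared across variables measured in different units.

```python
from statistics import mean, stdev


def standardize(values):
    """Return z-scores: subtract the mean, then divide by the sample SD."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def ols_slope(x, y):
    """Slope from a one-predictor OLS fit of y on x."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx


# Made-up toy data for illustration only (not NHANES values).
age = [25, 32, 41, 50, 58, 67, 74]
bmi = [24.1, 26.0, 27.3, 28.8, 28.1, 29.5, 28.9]

raw_slope = ols_slope(age, bmi)               # BMI units per year of age
std_slope = ols_slope(standardize(age), bmi)  # BMI units per 1 SD of age

# The standardized slope equals the raw slope times the SD of the predictor.
print(round(raw_slope, 4), round(std_slope, 4), round(raw_slope * stdev(age), 4))
```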

To create this forest plot, I used data from the 2013-2014 cycle of NHANES. I limited the analysis to adults aged 18-79 and incorporated the survey weights, so the results generalize to the U.S. population. The variables of interest are:

• Body Mass Index (BMI). This is kg/m² and is a standard measure of body composition, mostly used to determine weight status (i.e., underweight, normal, overweight, or obese).
• Minutes Sedentary. Survey item that asks “How much time do you usually spend sitting on a typical day?” The lead to the question states “The following question is about sitting at school, at home, getting to and from places, or with friends including time spent sitting at a desk, traveling in a car or bus, reading, playing cards, watching television, or using a computer. Do not include time spent sleeping.”
• Healthy Eating Index (HEI). The 2010 version of the HEI. HEI is a continuous measure of how well a person’s diet conforms to the 2010 Dietary Guidelines for Americans and serves as an overall measure of dietary quality. Scores range from 0-100 with higher scores indicating better dietary quality.
• Gender. Factor variable with men as the reference category.
• Percent of the Federal Poverty Level (FPL). This is a measure of a household’s income relative to the poverty level. Higher scores indicate that a household has a higher income. In NHANES, FPL is top-coded at 500% of the FPL (i.e., values don’t exceed 500%).
• Age. Age in years.
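Two of the definitions above are simple enough to write down as code. The sketch below is illustrative only. The function names are hypothetical, and the weight-status cutoffs (18.5, 25, and 30) are the commonly used adult thresholds, which this post does not state explicitly.

```python
def bmi(weight_kg, height_m):
    """Body mass index in kg/m^2."""
    return weight_kg / height_m ** 2


def weight_status(bmi_value):
    """Weight status using the commonly cited adult cutoffs (an assumption here)."""
    if bmi_value < 18.5:
        return "underweight"
    if bmi_value < 25.0:
        return "normal"
    if bmi_value < 30.0:
        return "overweight"
    return "obese"


def top_code(value, cap=500.0):
    """Cap a value at `cap`, mirroring FPL percentages top-coded at 500%."""
    return min(value, cap)


# Example inputs are made up for illustration.
example_bmi = bmi(weight_kg=70.0, height_m=1.75)
print(round(example_bmi, 1), weight_status(example_bmi))
print(top_code(730.0), top_code(297.0))
```

Top-coding means that every household above the cap is recorded at the cap, so the FPL coefficient mostly reflects differences among households below 500%.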

Here is a table of descriptive statistics:

| Variable | Mean (SE) or Percent |
| --- | --- |
| Minutes Sedentary | 425 (6.1) |
| HEI | 54 (0.4) |
| FPL | 297 (0.1) |
| Age | 46 (0.4) |
| Gender | |
| Men | 47.5 |
| Women | 52.5 |
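Assuming the descriptive statistics were computed with the survey weights, as the regression was, the point estimates are weighted means and weighted percentages. The Python sketch below shows only that arithmetic, using made-up values and weights and the hypothetical helpers `weighted_mean` and `weighted_percent`. It does not attempt the standard errors, which require design-based variance estimation that accounts for the survey's strata and sampling units.

```python
def weighted_mean(values, weights):
    """Mean of `values` where each observation counts in proportion to its weight."""
    total_weight = sum(weights)
    return sum(v * w for v, w in zip(values, weights)) / total_weight


def weighted_percent(flags, weights):
    """Weighted percentage of observations for which the flag is True."""
    indicator = [1.0 if flag else 0.0 for flag in flags]
    return 100.0 * weighted_mean(indicator, weights)


# Made-up respondents and weights for illustration only (not NHANES values).
hei = [40.0, 55.0, 70.0]
is_woman = [True, False, True]
weights = [1000.0, 3000.0, 2000.0]

print(weighted_mean(hei, weights))          # weighted mean HEI
print(weighted_percent(is_woman, weights))  # weighted percent women
```

A respondent with a weight of 3000 stands in for three times as many people as one with a weight of 1000, which is what lets the estimates generalize to the population.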

What the graph shows makes sense. As dietary quality and income increase, predicted BMI scores decrease. As age and sedentary behaviors increase, BMI increases. Also, women are predicted to have higher BMIs than men. The differences aren’t too big though, ranging from -1 to 1.5 BMI units.

Analysis done using R and RStudio.