Browsed by
Category: How To

How to do fuzzy matching in R

Fuzzy Wuzzy was a bear,
Fuzzy Wuzzy had no hair,
Fuzzy Wuzzy wasn’t very Fuzzy,
Was he?
(Extremely Relevant Children’s Rhyme)

Fuzzy matching links two or more non-identical character strings together. Ideally, when linking data sets together, there would be a unique variable that identifies each row (or rows) in each data set. We do not, however, live in an ideal world. Often, when getting data from sources or systems that are not explicitly linked, we won’t have a perfect unique…
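
To make the idea of linking non-identical strings concrete, here is a minimal, language-neutral sketch in Python using only the standard library's difflib. It is not the method from the post, which works in R; the clinic names, the helper names similarity and best_match, and the 0.8 cutoff are all assumptions chosen for illustration.

```python
from difflib import SequenceMatcher, get_close_matches


def similarity(a: str, b: str) -> float:
    """Return a 0-1 similarity ratio, ignoring case and surrounding spaces."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def best_match(name, candidates, cutoff=0.8):
    """Return the closest candidate at or above `cutoff`, or None if there is none."""
    # Map normalized spellings back to the original candidate strings.
    lookup = {c.strip().lower(): c for c in candidates}
    hits = get_close_matches(name.strip().lower(), list(lookup), n=1, cutoff=cutoff)
    return lookup[hits[0]] if hits else None


# Hypothetical data: the same clinic spelled differently in two sources.
clinics = ["St. Mary's Hospital", "Mercy General", "Lakeside Clinic"]

matched = best_match("St Marys Hospital", clinics)
unmatched = best_match("Unrelated Name", clinics)
print(matched, unmatched)
```

The cutoff is the main design choice: a lower value links more records but risks false matches, so it is usually worth reviewing borderline pairs by hand.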

Stata to R: How to Tabulate a Categorical Variable

When working with a data set, one of the first things I do is look at the count and relative frequency of categorical variables of interest. In Stata, this is relatively straightforward with the tab command. In R, however, it isn’t quite as straightforward, but it is still possible via the dplyr package.

You might also be interested in my other posts on getting started with R:
How to Rename Variables in R – Stylized Data
How to Recode Factor…

How to Recode Factor and Character Variables in R

Recoding factor and character variable values is a common task in data analysis. Although common, it isn’t as easy as you might expect in R, especially compared to Stata and SAS. I’ve found that people often get the most frustrated with these basic tasks when learning R, so in this post my goal is to take away that frustration. Often we want to use factor variables that can be dummy coded and easily used in regression models. Other times we…

How to Rename Variables in R

I want to show you how to rename variables in R. This is a basic task but one that I do frequently when working with a new dataset. Renaming variables is useful, especially when creating graphics. For example, if I were plotting these data, I would want the variable name to show as “Coffee Roast” rather than “coffee.” If I were just doing data wrangling, I wouldn’t care as much about the variable name. But when presenting data, I want the…

How to Use Survey Weights in R

Survey weights are common in large-scale, government-funded data collections. For example, NHIS and NHANES are two large-scale surveys with survey weights that track the health and well-being of Americans. These data collections use complex, multi-stage survey sampling to ensure that results are representative of the U.S. population. Although the use of survey weights is sometimes contested in regression analyses, they are needed for simple means and proportions. The general guidance is that if analysts can control for the…
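
As a rough illustration of why weights matter for simple means and proportions, the sketch below compares an unweighted and a weighted proportion in Python. The respondents and weights are made up, weighted_mean is a hypothetical helper, and the sketch covers point estimates only; it does not handle the variance estimation that a complex, multi-stage design requires.

```python
def weighted_mean(values, weights):
    """Return sum(w * v) / sum(w) for paired values and weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(v * w for v, w in zip(values, weights)) / total_weight


# Hypothetical respondents: 1 = has the condition, 0 = does not.
has_condition = [1, 1, 0, 0]
# Hypothetical weights: each respondent stands in for this many people.
weights = [100, 100, 400, 400]

unweighted = sum(has_condition) / len(has_condition)
weighted = weighted_mean(has_condition, weights)
print(unweighted, weighted)
```

In this toy sample, the respondents with the condition carry less weight in the population than in the sample, so the weighted proportion is lower than the unweighted one.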

Using Python to interface with PostgreSQL

A big part of doing data analyses is simply getting the data. Sometimes this is super easy and takes minutes. Other times it can be a bit more complicated. This is especially true if data are stored in multiple data sets, as is often the case in data science projects or large-scale epidemiological studies. Luckily, the Structured Query Language (SQL) makes this task much easier and less error-prone. PostgreSQL (Postgres for short) is a powerful, open-source object-relational database system…
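
A minimal sketch of what querying a database from Python can look like, written against the generic DB-API 2.0 interface. The excerpt does not say which driver the post uses, so psycopg2, the connection parameters, and the patients table in the comments are assumptions. The fetch_rows helper is hypothetical, and the runnable demo uses the standard library's sqlite3 as a stand-in so that it works without a Postgres server.

```python
import sqlite3
from contextlib import closing


def fetch_rows(connect, query, params=None):
    """Run a query and return all rows.

    `connect` is a zero-argument callable returning a DB-API 2.0 connection.
    """
    with closing(connect()) as conn:
        with closing(conn.cursor()) as cur:
            if params is None:
                cur.execute(query)
            else:
                # Pass values as parameters; never format them into the SQL string.
                cur.execute(query, params)
            return cur.fetchall()


# Stand-in demo with an in-memory SQLite database (placeholder style is "?").
demo_rows = fetch_rows(lambda: sqlite3.connect(":memory:"), "SELECT ? * 2", (21,))
print(demo_rows)

# With Postgres and psycopg2 (assumed driver; placeholder style is "%s"):
#   import psycopg2
#   rows = fetch_rows(
#       lambda: psycopg2.connect(dbname="mydb", user="me", host="localhost"),
#       "SELECT id, name FROM patients WHERE age > %s",  # hypothetical table
#       (65,),
#   )
```

Passing a connection factory rather than a connection keeps the helper driver-agnostic and ensures each call closes what it opens.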
