As everyday computing power increases, data becomes more available, and more researchers become familiar with powerful programming languages like Python and R, data visualization will continue to become a larger and larger part of social science. I hope that not too far in the future, pages like this one will seem quaint (maybe they already do) and visualization will be a standard part of methodological training in many social science graduate programs. We are not quite there yet, though. Here, I have compiled a list of resources for making visualization work easier.

Books and the like

Kieran Healy’s book, Data Visualization: A Practical Introduction, is a wonderful reference, and large sections of it are available free online.

The “Data visualization” chapter of R for Data Science is very helpful. So is the chapter on “Graphics for communication.”

Although not on visualization per se, Matt Salganik’s book, Bit by Bit, on social science in the digital age, is also a great resource for those interested in “big data” and the like.

Web resources

The R Graph Gallery is a fantastic resource for sorting through different approaches and options for visualizing data in R. It was created and is maintained by Yan Holtz.

A note on ggplot2 for beginners

Anyone interested in visualizing data with R will quickly come to know ggplot2, which is part of the tidyverse family of packages. ggplot2 is flexible and powerful, but, as with R in general, the learning curve can be a bit steep. Once you begin to understand the logic behind it, though (essentially, building up layers of visual information on top of each other), you will deeply appreciate the power of this package and its “layered grammar of graphics.” As cataloged in the R Graph Gallery above, ggplot2 supports everything from visualizing spatial data, to using forest (i.e., dot-and-whisker) plots to visualize regression results, to generating alluvial plots to show flows and time trends.

For example, I get a kick out of the alluvial plot below, which I made with ggplot2. It shows the proportion of survey respondents who mentioned different issues as “the most important problem” in the United States from 1939 to 2015, with the issues mentioned more often towards the top. Data are taken from the Roper Center Most Important Problem Dataset using all supplied sources, not just the Gallup data. “Environment” is highlighted in dark gray. Click the image for a larger view; it’s big!

Figure: Issue areas identified as “the most important problem” by U.S. survey respondents, 1939 to 2015. Built from the Roper Center Most Important Problem Dataset.

Color schemes

colorbrewer – Ideal for generating low-n color schemes (i.e., palettes with only a handful of colors).

iwanthue – Ideal for generating high-n color schemes (e.g., 10+ colors). Hosted and developed by the MediaLab at SciencesPo, which is also home to a number of other useful tools like Table2Net, for drawing graphs (networks) from tables of data.

htmlcolorcodes – A simple site for getting HTML hex color codes.

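8-digit hex codes – Rather than designating a single “alpha” (a transparency factor) for a whole plot or layer, it can sometimes be useful to specify transparency directly in specific color codes, e.g., when you want to highlight a specific line or trend in a plot with many lines and trends. You can do this with 8-digit hex codes, which append two extra hexadecimal digits for the alpha channel to the usual six-digit color (#RRGGBBAA, where 00 is fully transparent and FF is fully opaque). (See also here.)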
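
To make the 8-digit idea concrete, here is a minimal sketch in Python of how such codes can be built from an ordinary 6-digit color and an alpha between 0 and 1. The helper name with_alpha and the three issue labels are hypothetical, chosen only for illustration; the sketch assumes the #RRGGBBAA ordering, which is the form that R and matplotlib accept.

```python
import string


def with_alpha(hex_color: str, alpha: float) -> str:
    """Turn a 6-digit hex color into an 8-digit #RRGGBBAA code.

    `alpha` runs from 0.0 (fully transparent) to 1.0 (fully opaque).
    This helper is a hypothetical illustration, not part of any library.
    """
    digits = hex_color.lstrip("#")
    if len(digits) != 6 or any(c not in string.hexdigits for c in digits):
        raise ValueError("expected a 6-digit hex color such as '#1B9E77'")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    # Scale alpha to 0-255 and write it as two uppercase hex digits.
    return f"#{digits.upper()}{round(alpha * 255):02X}"


# Hypothetical series names: keep one series opaque and fade the rest.
issues = ["Economy", "Environment", "Health"]
palette = {
    issue: with_alpha("#4D4D4D", 1.0 if issue == "Environment" else 0.2)
    for issue in issues
}

for issue, color in palette.items():
    print(issue, color)
```

The resulting strings can then be supplied as a manual color scale, so that the highlighted series stays solid while the others recede, without setting one alpha for the whole layer.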