Visualization

As everyday computing power increases, data becomes more available, and more researchers become familiar with powerful programming languages like Python and R, data visualization will continue to become a larger and larger part of social science. Hopefully, not too far in the future, pages like this one will seem quaint (maybe they already do) and visualization will be a standard part of methodological training in many social science graduate programs. We are not quite there yet, though. Here, I have compiled a list of resources for making visualization work easier.

Books and the like

Kieran Healy’s book, Data Visualization, is a wonderful reference and large sections of it are available free online.

The “Data visualization” chapter of R for Data Science is very helpful. So is the chapter on “Graphics for communication.”

Better Data Visualizations, by Jonathan Schwabish, is an excellent guide to producing effective data visualizations for social science and beyond.

Although not on visualization per se, Matt Salganik’s book, Bit by Bit, on social science in the digital age, is also a great resource for those interested in “big data” and the like.

Web resources

The R Graph Gallery is a nice (ad-filled) resource for sorting through different approaches and options for visualizing data in R. It was created and is maintained by Yan Holtz.

A note on ggplot2 for beginners

Anyone interested in visualizing data with R will quickly learn ggplot2, which is part of the tidyverse family of R packages. ggplot2 is flexible and powerful but, as with R in general, the learning curve can be a bit steep. Once you begin to understand the logic behind it, though – essentially, building up layers of visual information on top of one another – you will deeply appreciate the power of this package and its “layered grammar of graphics.” As cataloged in the R Graph Gallery above, ggplot2 supports everything from visualizing spatial data to using forest (i.e. dot-and-whisker) plots to visualize regression results to generating alluvial plots to show flows of things through time.

The plot below, which reveals important patterns in environmental civil litigation across federal court districts in the United States, illustrates the power of ggplot2. Each plot element – the points, the dashed diagonal line, and the n, μ, and M text annotations – is a separate “layer” in the overall plot, sourced from its own data with its aesthetic attributes specified independently. Things like the color, shape, and size of the points are easy to map onto variables of interest. In this case, the color and shape of points are mapped onto a variable for geographic region while the size of points is mapped onto a variable that captures the number of cases in a federal court district. The data are also structured so that they can be “faceted” into several plots – in this case, four plots, one for each of four plaintiff types. The “background” or meta-level plot attributes are separate layers, too, like the visual “theme” (appearance) of the plot background, the axis labels, and so on.

Once you get the hang of it, ggplot2 is really easy to understand and use – it’s an extremely versatile platform for visualizing data, telling stories, and revealing patterns in the name of social science and policy. Don’t be intimidated – skim over Hadley Wickham’s work, dive into Stack Overflow, and you’ll have a handle on things in no time!

Wins and losses for environmental civil suits in U.S. federal court districts by region and by plaintiff type. It turns out that the federal government uses the courts to enforce environmental law relatively uniformly across the country – and wins most of the cases it brings. Environmental advocacy groups, by contrast, focus their legal attention overwhelmingly on the West. Because these groups normally sue the government, this geographic maldistribution puts less enforcement pressure on the environmental state in other parts of the country.

Color Schemes

viridis – A thoughtfully constructed color scheme, implemented in R, that works well in greyscale, is legible to folks who are color blind, and is aesthetically appealing.

colorbrewer – Ideal for generating low-n color schemes (i.e. a small number of colors).

iwanthue – Ideal for generating high-n color schemes (e.g. 10+ colors). Hosted and developed by the MediaLab at SciencesPo, which is also home to a number of other useful tools like Table 2 Net, for drawing graphs (networks) from tables of data.

htmlcolorcodes – A simple site for getting HTML hex color codes.

8-digit hex codes – Rather than designating an “alpha” (a transparency factor) for a plot, it can sometimes be useful to specify transparency directly in specific color codes, e.g. when you want to highlight a specific line or trend in a plot with many lines and trends. You can do this with 8-digit hex codes: the usual six digits give the color, and the last two digits give the alpha, from 00 (fully transparent) to FF (fully opaque). (See also here.)
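
To make the 8-digit idea concrete, here is a minimal sketch in Python of how a 6-digit hex code and an alpha between 0 and 1 combine into an 8-digit code. The function name with_alpha and the example color and region names are hypothetical, chosen only for illustration; the arithmetic is the same in any language.

```python
import string


def with_alpha(hex_color: str, alpha: float) -> str:
    """Turn a 6-digit hex color plus an alpha in [0, 1] into #RRGGBBAA.

    Hypothetical helper for illustration; not part of any plotting package.
    """
    digits = hex_color.lstrip("#")
    # Accept only plain 6-digit codes such as "#1B9E77" or "1b9e77".
    if len(digits) != 6 or any(c not in string.hexdigits for c in digits):
        raise ValueError("expected a 6-digit hex color such as '#1B9E77'")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    # Scale alpha to 0..255 and write it as two uppercase hex digits.
    return f"#{digits.upper()}{round(alpha * 255):02X}"


# Example: keep one series fully opaque and fade the rest.
# The region names and base color here are made up for the example.
regions = ["West", "South", "Midwest", "Northeast"]
palette = {
    region: with_alpha("#1B9E77", 1.0 if region == "West" else 0.2)
    for region in regions
}
```

Codes in this #RRGGBBAA form can be passed wherever a color is expected in R, and in Python's matplotlib as well, so the highlighted series keeps its full color while the others recede into the background.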