As everyday computing power increases, data becomes more available, and more researchers become familiar with powerful programming languages like Python and R, data visualization will continue to become a larger and larger part of social science. Hopefully, not too far in the future, pages like this one will seem quaint (maybe they already do) and visualization will be a standard part of methodological training in many social science graduate programs. We are not quite there yet, though. Here, I have compiled a list of resources for making visualization work easier.
Books and the like
Kieran Healy’s book, Data Visualization, is a wonderful reference and large sections of it are available free online.
The “Data visualization” chapter of R for Data Science is very helpful. So is the chapter on “Graphics for communication.”
Better Data Visualizations, by Jonathan Schwabish, is an excellent guide to producing effective data visualizations for social science and beyond.
Although not on visualization per se, Matt Salganik’s book, Bit by Bit, on social science in the digital age, is also a great resource for those interested in “big data” and the like.
Web resources
The R Graph Gallery, created and maintained by Yan Holtz, is a nice (ad-filled) resource for sorting through different approaches and options for visualizing data in R.
A note on ggplot2 for beginners
Anyone interested in visualizing data with R will quickly learn ggplot2, which is part of the
The plot below, which reveals important patterns in environmental civil litigation across federal court districts in the United States, illustrates the power of ggplot2. Each plot element (the points, the dashed diagonal line, and the n, μ, and M text annotations) is a separate “layer” in the overall plot, sourced from its own data with its aesthetic attributes specified independently. Things like the color, shape, and size of the points are easy to map onto variables of interest. In this case, the color and shape of points are mapped onto a variable for geographic region, while the size of points is mapped onto a variable that captures the number of cases in a federal court district. The data are also structured so that they can be “faceted” into several plots: in this case, four plots, one for each of four plaintiff types. The “background” or meta-level plot attributes are separate layers, too, like the visual “theme” (appearance) of the plot background, the axis labels, and so on.
Once you get the hang of it, ggplot2 is really easy to understand and use. It is an extremely versatile tool for visualizing data, telling stories, and revealing the patterns that we might like to unearth in the name of social science and policy. Don’t be intimidated: skim over Hadley Wickham’s work, dive into Stack Overflow, and you’ll have a handle on things in no time!
Color Schemes
viridis – A thoughtfully constructed color scheme implemented in R that is aesthetically appealing and works well both in greyscale and for folks who are color blind.
colorbrewer – Ideal for generating low-n color schemes.
iwanthue – Ideal for generating high-n color schemes (e.g. 10+ colors). Hosted and developed by the MediaLab at SciencesPo, which is also home to a number of other useful tools like Table 2 Net, for drawing graphs (networks) from tables of data.
htmlcolorcodes – A simple site for getting HTML hex color codes.
8-digit hex codes – Rather than designating an “alpha” (a transparency factor) for a plot, it can sometimes be useful to specify transparency directly in specific color codes, e.g. when you want to highlight a specific line or trend in a plot with many lines and trends. You can do this with 8-digit hex codes. (See also here.)
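As a rough illustration of the 8-digit idea, here is a minimal Python sketch that appends a two-digit alpha channel to an ordinary 6-digit hex code, producing a string of the form #RRGGBBAA. The function name with_alpha and the example series names and colors are hypothetical, made up for this illustration. To my knowledge, both R graphics (including ggplot2) and matplotlib accept color strings in this format, but check the documentation for whatever plotting tool you use.

```python
HEX_DIGITS = set("0123456789abcdefABCDEF")


def with_alpha(hex_color: str, alpha: float) -> str:
    """Return an 8-digit hex code (#RRGGBBAA) from a 6-digit code and an alpha in [0, 1].

    Hypothetical helper for illustration; alpha = 0 is fully transparent,
    alpha = 1 is fully opaque.
    """
    digits = hex_color.lstrip("#")
    if len(digits) != 6 or not set(digits) <= HEX_DIGITS:
        raise ValueError(f"expected a 6-digit hex color, got {hex_color!r}")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError(f"alpha must be between 0 and 1, got {alpha!r}")
    # Scale alpha to one byte (0-255) and write it as two hex digits.
    alpha_byte = round(alpha * 255)
    return f"#{digits.upper()}{alpha_byte:02X}"


# Example: keep one series opaque and fade the rest into the background.
# The series names and base colors below are invented for this sketch.
base_colors = {"focus": "#D55E00", "other_a": "#0072B2", "other_b": "#009E73"}
line_colors = {
    name: with_alpha(color, 1.0 if name == "focus" else 0.2)
    for name, color in base_colors.items()
}
```

The resulting dictionary can then be passed to a manual color scale, so the highlighted line keeps its full color while the others recede without needing a separate alpha setting for each layer.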