A few months ago I was helping a student figure out the different ways to visualize Goodreads book ratings. The data lent itself to a particular style of visualization, but after that conversation was over, I promised myself I'd take another crack at it.
This article describes the data, how I approached it, and three visualization methods, each with its own embedded workbook. Each workbook starts with an explanation of how the data is organized and, where I had to manipulate the data, how I did it.
The subsequent pages show the visualizations (vizzes) I created with each method, along with the chart type and my goal in using it. I also included examples that show how a reader can approach each viz.
I wanted a relatively small set of data and I wanted to work quickly, so I manually pulled the ratings for the top 25 Forbidden Books on this Goodreads list. The data included the book title, author, and number of reviewers who marked the book 5-star, 4-star, 3-star, 2-star and 1-star.
Method 1 is, as I recall, the approach we settled on last fall: the simplest and most direct way to show this data. Each book is displayed with its star ratings, shown both as a percentage and as a count of reviewers. I created three different charts to show this data: a lollipop chart, an area chart and a set of pie charts.
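As a rough illustration of the percent-and-count view, here is a minimal Python sketch. The counts are made-up placeholders rather than figures from the Goodreads list, and `rating_shares` is a hypothetical helper name; the workbooks themselves were built differently.

```python
# Hypothetical rating counts for one book (star level -> number of reviewers).
# These numbers are placeholders, not data from the Goodreads list.
ratings = {5: 3100, 4: 2900, 3: 2400, 2: 1000, 1: 600}


def rating_shares(counts):
    """Return each star level's share of all ratings, as a percentage."""
    total = sum(counts.values())
    if total == 0:
        # A book with no ratings has no meaningful shares.
        return {stars: 0.0 for stars in counts}
    return {stars: 100 * n / total for stars, n in counts.items()}


shares = rating_shares(ratings)
for stars in sorted(shares, reverse=True):
    print(f"{stars}-star: {ratings[stars]:>5} reviewers ({shares[stars]:.1f}%)")
```

Keeping the count next to the percentage matters here because two books can have the same proportions with very different numbers of reviewers.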
Method 2 was inspired by a project I'd done in school, something I called the Malaria Playground. It was designed to show the percentage of children (shown as a number) with and without malaria on any given playground in Africa, with the dates selected by the viewer.
I wasn't satisfied with the way this looked, and I eventually found a different way to show it. In this second example, rather than list the numbers of children with and without malaria, I showed a field of 100 child-shaped icons and color-coded them for kids with and without malaria.
Method 2 takes a few liberties with the data. I wanted to look at the percentages of votes differently. Instead of seeing a pie chart (or a stacked bar chart) that showed proportion, I thought it might be different to see the proportion based on the field of 100, just like my second Malaria viz.
So if 5-star ratings account for 31% of a book's total reviews, 31 of the 100 stars in the field are color-coded as 5-star ratings. Each of the chart types in this workbook tells a different story about the percentage of reviews for the different books.
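The field-of-100 idea can be sketched in Python as follows. One detail I'm assuming: rounded percentages do not always add up to exactly 100, so this sketch uses largest-remainder rounding to settle the leftover cells. That rounding rule, the sample counts, and the function names are illustrative only and are not taken from the workbook.

```python
def field_of_100(counts):
    """Allocate exactly 100 cells across star levels, proportional to counts.

    Uses largest-remainder rounding (an assumption of this sketch) so the
    cells always sum to 100.
    """
    total = sum(counts.values())
    if total == 0:
        raise ValueError("cannot build a field of 100 from zero ratings")
    cells, remainders = {}, {}
    for stars, n in counts.items():
        # Integer arithmetic avoids floating-point surprises.
        cells[stars], remainders[stars] = divmod(100 * n, total)
    leftover = 100 - sum(cells.values())
    # Hand the leftover cells to the levels with the largest remainders.
    for stars in sorted(remainders, key=remainders.get, reverse=True)[:leftover]:
        cells[stars] += 1
    return cells


def as_grid(cells, width=10):
    """Lay the 100 cells out row by row, highest star level first."""
    marks = [stars for stars in sorted(cells, reverse=True)
             for _ in range(cells[stars])]
    return [marks[i:i + width] for i in range(0, len(marks), width)]


# Hypothetical counts for one book, not real Goodreads data.
counts = {5: 3140, 4: 2870, 3: 2410, 2: 1020, 1: 560}
cells = field_of_100(counts)
grid = as_grid(cells)
for row in grid:
    print(" ".join(str(stars) for stars in row))
```

Each number in the printed grid stands for one star in the field, so color-coding by that number gives the same picture as the viz.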
Method 3 takes radical liberties with the original data set - and from the resulting charts you can see quite a different story than in the other two methods.
The data for this method is based on the number of reviewers for each book, with each mark representing 1,000 reviewers.
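To make the one-mark-per-1,000-reviewers idea concrete, here is a small Python sketch that expands rating counts into one row per mark, the kind of tall table a charting tool can plot directly. Rounding to the nearest thousand, splitting marks by star level, the column names, and the sample numbers are all assumptions for illustration; I did the actual manipulation in Excel.

```python
REVIEWERS_PER_MARK = 1000  # each mark stands for this many reviewers


def marks_for(count, per_mark=REVIEWERS_PER_MARK):
    """Number of marks for a count, rounded half up (an assumed rule)."""
    return (count + per_mark // 2) // per_mark


def expand_to_marks(book, counts):
    """Return one row per mark: the book, the star level, and a running index."""
    rows = []
    for stars in sorted(counts, reverse=True):
        for i in range(marks_for(counts[stars])):
            rows.append({"book": book, "stars": stars, "mark": i + 1})
    return rows


# Hypothetical book and counts, not real Goodreads data.
counts = {5: 12400, 4: 8600, 3: 3499, 2: 1500, 1: 400}
rows = expand_to_marks("Book A", counts)
print(len(rows), "marks for Book A")
```

Under this rounding rule, any group with fewer than 500 reviewers gets no mark at all, which is one way this kind of view can tell a different story than the percentages do.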
Was this a good use of time? Certainly - I found new ways to show simple data, though it took a fair amount of data manipulation in Excel. Is it a better way? That really depends on the intent and the audience. I don't want to make the data harder to understand, and Method 3 runs that risk - but what I learned could change how I approach other analyses, and for that reason this was worth every minute of my time.