Summarizing Books Read Over Time

I recently read an interesting blog post in which the author examined the books they had rated on Goodreads and summarized the trends they found. I decided to do a similar analysis, even though I use LibraryThing instead.

LibraryThing has a handy option that lets you export your data in a variety of formats. Since I write R code to parse CSV files every day, I thought I would do something different and parse a JSON file with Python.
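A minimal sketch of the parsing step might look like the following. It assumes the export is a JSON object keyed by book ID, and that each record has `title`, `rating`, and `datefinished` fields. Those field names and the overall layout are assumptions, not a documented LibraryThing schema, so check them against your own export.

```python
import json
from datetime import date


def parse_books(text):
    """Return (title, rating, finished_date) tuples from JSON export text.

    Assumes (hypothetically) a top-level object mapping book IDs to records
    with 'title', 'rating', and 'datefinished' (YYYY-MM-DD...) keys.
    """
    data = json.loads(text)
    books = []
    for record in data.values():
        rating = record.get("rating")
        finished = record.get("datefinished")
        # Skip books that were never rated or have no finish date.
        if rating in (None, "") or not finished:
            continue
        # Keep only the YYYY-MM-DD part in case a time is attached.
        books.append((record.get("title", ""), float(rating),
                      date.fromisoformat(finished[:10])))
    return books


# Usage: read the exported file and parse it.
# with open("librarything_export.json", encoding="utf-8") as f:
#     books = parse_books(f.read())
```

Taking the JSON text rather than a path keeps the function easy to test with a small inline sample.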

I have been on LibraryThing since 2007, and the first question I wanted to answer was: have my average ratings changed over time? I calculated the mean rating of the books I read in each year:

Year Average Rating
2007 3.446809
2008 3.480000
2009 3.485294
2010 3.641509
2011 3.456522
2012 3.529412
2013 3.321429
2014 3.614583
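The per-year averages above can be computed by grouping ratings on the year and taking the mean of each group. This sketch assumes each book is a hypothetical (title, rating, finished_date) tuple and groups by the finish date; the original analysis may have used a different date field.

```python
from collections import defaultdict
from datetime import date
from statistics import mean


def average_rating_by_year(books):
    """Map each year to the mean rating of books finished that year.

    `books` is an iterable of (title, rating, finished_date) tuples,
    a hypothetical shape used only for this sketch.
    """
    ratings_by_year = defaultdict(list)
    for _title, rating, finished in books:
        ratings_by_year[finished.year].append(rating)
    # Sort by year so the result prints in chronological order.
    return {year: mean(r) for year, r in sorted(ratings_by_year.items())}


books = [
    ("Book A", 4.0, date(2013, 2, 1)),
    ("Book B", 3.0, date(2013, 9, 15)),
    ("Book C", 5.0, date(2014, 6, 30)),
]
for year, avg in average_rating_by_year(books).items():
    print(f"{year} {avg:.6f}")
```

With the made-up sample data this prints `2013 3.500000` and `2014 5.000000`, in the same layout as the table.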

The averages are fairly stable, staying between about 3.3 and 3.6. While uninteresting, this makes sense: if I am reading a book that I do not enjoy, I will usually abandon it, which tends to bias my ratings upward. There have been a few notable exceptions over the years.

Another interesting analysis in the blog post examined how the reviewer's ratings varied with the month of the year. I wanted to make a similar plot with R's ggplot2, but since I was writing this in Python I was largely limited to matplotlib. Fortunately, many people have struggled with this issue, and the fine folks at yhat have ported ggplot2 to Python. With this library I was able to use geom_smooth to produce the following plot showing rating trends by week.
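To show the idea behind a weekly trend line without depending on the plotting library, the sketch below buckets ratings by ISO week number and then smooths the weekly means with a centered moving average. The moving average is a deliberately simpler stand-in for the local regression that ggplot2's geom_smooth uses by default on small data sets, and the (title, rating, finished_date) tuples are a hypothetical shape.

```python
from collections import defaultdict
from datetime import date
from statistics import mean


def weekly_means(books):
    """Mean rating per ISO week number (1-53), pooling all years.

    Note: dates in early January can fall in ISO week 52 or 53.
    """
    by_week = defaultdict(list)
    for _title, rating, finished in books:
        by_week[finished.isocalendar()[1]].append(rating)
    return {week: mean(r) for week, r in sorted(by_week.items())}


def moving_average(values, window=3):
    """Centered moving average; the window shrinks at both ends."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        chunk = values[max(0, i - half): i + half + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


books = [
    ("A", 4.0, date(2014, 1, 8)),
    ("B", 5.0, date(2014, 1, 9)),
    ("C", 3.0, date(2014, 3, 12)),
    ("D", 2.0, date(2014, 7, 2)),
]
means = weekly_means(books)
trend = moving_average(list(means.values()))
```

The moving average runs over the sorted weeks that actually have data, so weeks with no finished books are effectively skipped rather than treated as gaps.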

I never figured out why the legend did not show up, but since most of the trend lines were nearly identical anyway, I decided the plot was fine without one. It appears that most of my good reviews come early in the year and that I am harsher later in the year.
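One rough way to check an early-versus-late impression numerically is to compare the mean rating of books finished in the first half of the year with those finished in the second half. This sketch again assumes hypothetical (title, rating, finished_date) tuples, and the split at the end of June is an arbitrary choice.

```python
from datetime import date
from statistics import mean


def half_year_means(books):
    """Return (mean rating Jan-Jun, mean rating Jul-Dec), pooling all years.

    Either value is None when that half of the year has no rated books.
    """
    first = [rating for _t, rating, finished in books if finished.month <= 6]
    second = [rating for _t, rating, finished in books if finished.month > 6]
    return (mean(first) if first else None,
            mean(second) if second else None)


books = [
    ("A", 4.0, date(2012, 2, 10)),
    ("B", 5.0, date(2013, 5, 1)),
    ("C", 3.0, date(2013, 11, 20)),
]
early, late = half_year_means(books)
```

A difference between the two halves on real data would still need care to interpret, since the number of books read in each half can differ a lot.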

The last figure in the blog post compares the writer's review scores to the Goodreads consensus score. I attempted to replicate this, but extracting that data from LibraryThing turned out to be more trouble than it was worth, so I abandoned that analysis.

If you are interested, I put my Python code in a GitHub gist.