Hardcover Analysis

Goals of the notebook

NOTE: This analysis was last updated in Fall 2024, and the data has been updated since then, so the writing may not match the data.

Here’s what I’ll find in this notebook

  • I’ll make a searchable table
  • What authors have had the most appearances on the list?
  • What author has had the most appearances this year? last 2 years? 5 years? 10 years?
  • What authors have had the most No. 1 appearances?
  • What author has had the most No. 1 appearances this year? last 2 years? 5 years? 10 years?
  • What authors have had the most books on the list?
  • What author has had the most books on the the list this year? last 2 years? 5 years? 10 years?
  • What books have been on the list for the most number of weeks?
  • What books were No. 1 for the most number of weeks?
  • How did movie adaptations affect the charts?
  • What authors have had the most No. 1 books?
  • What author has had the most No. 1 books in the last 5 years? 10 years?

Setup

I’ll load the necessary libraries.

library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.2     ✔ tibble    3.3.0
✔ lubridate 1.9.4     ✔ tidyr     1.3.1
✔ purrr     1.0.4     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(lubridate)
library(DT)

Import

I’ll import the data into the notebook by saving the file into a new object

bestsellers <- read_rds("data-processed/bestsellers-full.rds") # saving the data into a new object

glimpse(bestsellers) # glimpsing the data
Rows: 63,446
Columns: 5
$ year   <dbl> 1931, 1931, 1931, 1931, 1931, 1931, 1931, 1931, 1931, 1931, 193…
$ week   <date> 1931-10-12, 1931-10-12, 1931-10-12, 1931-10-12, 1931-10-12, 19…
$ rank   <dbl> 1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, …
$ title  <chr> "THE TEN COMMANDMENTS", "FINCHE'S FORTUNE", "THE GOOD EARTH", "…
$ author <chr> "Warwick Deeping", "Mazo de la Roche", "Pearl S. Buck", "Willa …

Making a searchable table

I’ll make a searchable table to look for certain authors and books.

bestsellers |> 
  datatable()
Warning in instance$preRenderHook(instance): It seems your data is too big for
client-side DataTables. You may consider server-side processing:
https://rstudio.github.io/DT/server.html

What authors have had the most appearances on the list?

I’ll use gsa to find what authors have had the most appearances on the list.

Romance author Danielle Steel has had the most appearances on the list with 1022. Horror author Stephen King comes in second with 972. Legal thriller author John Grisham comes in third with 893, still hundreds above 4th place

I’m using this data for a story chart. Here is a link to that chart.

bestsellers_appearances <- bestsellers |> # saving the data into a new object
  group_by(author) |> # group by author
  summarize(total_appearances = n()) |> # count appearances
  arrange(desc(total_appearances)) # arrange in descending order by total_appearances

bestsellers_appearances |> # start with the data
   filter(total_appearances > 400) |> # only include rows with total_appearances over 400
  write_csv("data-processed/most-appearances.csv") |> # save the data for data wrapper
  print() # print it
# A tibble: 10 × 2
   author             total_appearances
   <chr>                          <int>
 1 Danielle Steel                  1023
 2 Stephen King                     973
 3 John Grisham                     893
 4 Taylor Caldwell                  524
 5 James A. Michener                478
 6 David Baldacci                   465
 7 Nicholas Sparks                  441
 8 Robert Ludlum                    406
 9 Leon Uris                        405
10 Mary Higgins Clark               403

What author has had the most appearances this year? last 2 years? 5 years? 10 years?

First, I’ll find which authors have had the most appearances this year.

Rebecca Yarros has had 118, Bonnie Garmus has had 68 and Barbara Kingsolver has had 62.

bestsellers |> # start with the data
  filter(year > 2022) |> # only include rows with a year over 2022
  group_by(author) |> # group by author
  summarize(appearances = n()) |> # count appearances per author
  arrange(desc(appearances)) |> # arrange in descending order by appearances
  filter(appearances > 33) # only include appearances above 33

I’ll look at the past two years.

It’s Rebecca Yarros with 118.

bestsellers |> # start with the data
  filter(year > 2021) |> # only include rows with a year over 2021
  group_by(author) |> # group by author
  summarize(appearances = n()) |> # count appearances per author
  arrange(desc(appearances)) |> # arrange in descending order by appearances
  filter(appearances > 40) # only include appearances above 40

I’ll look at the past five years.

It’s John Grisham with 159.

bestsellers |> # start with the data
  filter(year > 2018) |> # only include rows with a year over 2018
  group_by(author) |> # group by author
  summarize(appearances = n()) |> # count appearances per author
  arrange(desc(appearances)) |> # arrange in descending order by appearances
  filter(appearances > 68) # only include appearances above 68

I’ll look at 10 years.

It’s John Grisham again with 276.

bestsellers |> # start with the data
  filter(year > 2013) |> # only include rows with a year over 2013
  group_by(author) |> # group by author
  summarize(appearances = n()) |> # count appearances per author
  arrange(desc(appearances)) |> # arrange in descending order by appearances
  filter(appearances > 111) # only include appearances above 111

What authors have had the most No. 1 appearances?

John Grisham comes in first with 188. Stephen King takes second with 152. James A. Michener takes third with 132.

bestsellers |> # start with the data
  filter(rank == 1) |> # only include rows with a rank of 1
  group_by(author) |> # group by author
  summarize(appearances = n()) |> # count appearances per author
  arrange(desc(appearances)) |> # arrange in descending order by appearances
  filter(appearances > 61) # only include appearances above 61

What author has had the most No. 1 appearances this year? last 2 years? 5 years? 10 years?

I’ll look at this year first.

It’s Rebecca Yarros with 24.

bestsellers |> # start with the data
  filter(year > 2022,
         rank == 1) |> # only include rows with a year over 2022 and a rank of 1
  group_by(author) |> # group by author
  summarize(appearances = n()) |> # count appearances per author
  arrange(desc(appearances)) |> # arrange in descending order by appearances
  filter(appearances > 1) # only include appearances above 1

Now two years.

It’s Rebecca Yarros with 24.

bestsellers |> # start with the data
  filter(year > 2021,
         rank == 1) |> # only include rows with a year over 2021 and a rank of 1
  group_by(author) |> # group by author
  summarize(appearances = n()) |> # count appearances per author
  arrange(desc(appearances)) |> # arrange in descending order by appearances
  filter(appearances > 4) # only include appearances above 4

5 years?

It’s Delia Owens with 54.

bestsellers |> # start with the data
  filter(year > 2018,
         rank == 1) |> # only include rows with a year over 2018 and a rank of 1
  group_by(author) |> # group by author
  summarize(appearances = n()) |> # count appearances per author
  arrange(desc(appearances)) |> # arrange in descending order by appearances
  filter(appearances > 4) # only include appearances above 4

And finally, 10 years?

It’s John Grisham with 59, but Delia Owens comes close with 54.

bestsellers |> # start with the data
  filter(year > 2013,
         rank == 1) |> # only include rows with a year over 2013 and a rank of 1
  group_by(author) |> # group by author
  summarize(appearances = n()) |> # count appearances per author
  arrange(desc(appearances)) |> # arrange in descending order by appearances
  filter(appearances > 9) # only include appearances above 9

What authors have had the most books on the list?

Danielle Steel has had the most books on the list with 142. Stuart Woods comes in a far away second with 68. Stephen King comes in third with 58.

bestsellers |> # start with the data
  distinct(author, title) |> # only include unique author/title combinations
  group_by(author) |> # group by author
  summarize(titles = n()) |> # count titles per author
  arrange(desc(titles)) |> # arrange in descending order by titles
  filter(titles > 40) # only include titles above 40

What author has had the most books on the list in the last year? 2 years? 5 years? 10 years?

I’ll start with this year.

It’s Danielle Steel with 12.

bestsellers |> # start with the data
  filter(year >2022) |> # only include rows with year over 2022
  distinct(author, title) |> # only include unique author/title combinations
  group_by(author) |> # group by author
  summarize(titles = n()) |> # count titles per author
  arrange(desc(titles)) |> # arrange in descending order by titles
  filter(titles > 2) # only include titles above 2

I’ll look at two years.

It’s Danielle Steel with 19.

bestsellers |> # start with the data
  filter(year >2021) |> # only include rows with a year over 2021
  distinct(author, title) |> # only include unique author/title combinations
  group_by(author) |> # group by author
  summarize(titles = n()) |> # count titles per author
  arrange(desc(titles)) |> # arrange in descending order by titles
  filter(titles > 3) # only include titles above 3

I’ll look at the last 5 years.

It’s Danielle Steel again with 39.

bestsellers |> # start with the data
  filter(year > 2018) |> # only include rows with a year over 2018
  distinct(author, title) |> # only include unique author/title combinations
  group_by(author) |> # group by author
  summarize(titles = n()) |> # count titles per author
  arrange(desc(titles)) |> # arrange in descending order by titles
  filter(titles > 8) # only include titles above 8

Now, I’ll look at the last 10 years.

It’s also Danielle Steel with 65.

bestsellers |> # start with the data
  filter(year > 2013) |> # only include rows with a year over 2013
  distinct(author, title) |> # only include unique author/title combinations
  group_by(author) |> # group by author
  summarize(titles = n()) |> # count titles per author
  arrange(desc(titles)) |> # arrange in descending order by titles
  filter(titles > 15) # only include titles above 15

What books have been on the list for the most number of weeks?

I’ll use gsa again to find which books have been on the list for the most number of weeks.

Oh, The Places You’ll Go! by Dr. Suess takes first with 178 weeks on the list. The Da Vinci Code by Dan Brown and The Celestine Prophecy by James Redfield are tied for second with 165 weeks. The Bridges of Madison County by Robert James Waller is in fourth with 164 weeks.

I’m using this data for a story chart. Here is a link to that chart.

bestsellers |> # start with the data
  group_by(author, title) |> # group by author and title
  summarize(appearances = n()) |> # count number of appearances
  arrange(desc(appearances)) |> # arrange in descending order
  filter(appearances > 106) |> # only include rows with appearances greater than 106
  write_csv("data-processed/most-weeks.csv") |> # save the data for data wrapper
  print() # print it
`summarise()` has grouped output by 'author'. You can override using the
`.groups` argument.
# A tibble: 10 × 3
# Groups:   author [10]
   author              title                         appearances
   <chr>               <chr>                               <int>
 1 Dr. Seuss           OH, THE PLACES YOU'LL GO!             178
 2 Dan Brown           THE DA VINCI CODE                     165
 3 James Redfield      THE CELESTINE PROPHECY                165
 4 Robert James Waller THE BRIDGES OF MADISON COUNTY         164
 5 Anthony Doerr       ALL THE LIGHT WE CANNOT SEE           132
 6 Delia Owens         WHERE THE CRAWDADS SING               132
 7 Herman Wouk         THE CAINE MUTINY                      123
 8 Patrick Dennis      AUNTIE MAME                           112
 9 Lloyd C. Douglas    THE ROBE                              111
10 Kathryn Stockett    THE HELP                              108

What books were No. 1 for the most number of weeks?

I’ll find which books were No. 1 for the most number of weeks by filtering first, and then using gsa.

The Da Vinci Code has the most number of weeks at No. 1 with 59. Where the Crawdads Sing by Delia Owens is in second with 54 weeks at No. 1.

It’s interesting that both these books had their fair share of controversy AND a movie adaptation.

I’m using this data for a story chart. Here is a link to that chart.

bestsellers |> # start with the data
  filter(rank == 1) |> # only include rows where the rank is 1
  group_by(author, title) |> # group by author and title
  summarize(appearances = n()) |> # count number of appearances
  arrange(desc(appearances)) |> # arrange in descending order
  filter(appearances > 33) |> # only include rows with appearances greater than 33
  write_csv("data-processed/most-one-weeks.csv") |> # save the data for data wrapepr
  print() # print it
`summarise()` has grouped output by 'author'. You can override using the
`.groups` argument.
# A tibble: 10 × 3
# Groups:   author [10]
   author              title                             appearances
   <chr>               <chr>                                   <int>
 1 Dan Brown           THE DA VINCI CODE                          59
 2 Delia Owens         WHERE THE CRAWDADS SING                    54
 3 James A. Michener   HAWAII                                     49
 4 Herman Wouk         THE CAINE MUTINY                           48
 5 Erich Segal         LOVE STORY                                 41
 6 James Michener      THE SOURCE                                 41
 7 Richard Bach        JONATHAN LIVINGSTON SEAGULL                38
 8 Robert James Waller THE BRIDGES OF MADISON COUNTY              38
 9 Leon Uris           TRINITY                                    36
10 John le Carré       THE SPY WHO CAME IN FROM THE COLD          34

How did movie adaptations affect the charts?

Let’s see if the books returned to the charts after the move adaptations came out.

First, I’ll look at The Da Vinci Code. Once I filtered to just look The Da Vinci Code, I noticed that it actually appeared to stay on the charts pretty consistently. I grouped and summarized by year to check what I was seeing. Rather than having a resurgance after the movie came out (in 2006), The Da Vinci code appears to have actually just stayed on the list pretty much the whole time.

bestsellers |> # start with the data
  filter(str_detect(title, "THE DA VINCI CODE")) |> # only inlcude The Da Vinci Code
  group_by(year) |> # group by year
  summarize(appearance = n()) # count the number of appearances in each year

Now, I’ll look at Where the Crawdads Sing. The book was published in 2018 and the movie was released in July 2022 (the same week it got on the list in 2022). It looks like the book gained traction in the years following it’s release, possibly from the adaptation being announced, but it had slowed down by the time the movie was released.

bestsellers |> # start with the data
  filter(str_detect(title, "WHERE THE CRAWDADS SING")) # only include Where the Crawdads Sing
bestsellers |> # start with the data
  filter(str_detect(title, "WHERE THE CRAWDADS SING")) |> # only include Where the Crawdads Sing
  group_by(year) |> # group by year
  summarize(appearances = n()) # count number of appearances per year

What authors have had the most titles reach No. 1?

I’ll find which authors have had the most No. 1 titles.

Stephen King is in first with 44 No. 1 titles. John Grisham is in second with 40 titles. Danielle Steel is in third with 36 titles.

These authors were also top three for most appearances on the chart, just in a different order.

bestsellers |> # start with the data
  filter(rank == 1) |> # only include rows with a rank of 1
  distinct(author, title) |> # only include distinct author/title combinations
  group_by(author) |> # group by author
  summarize(titles = n()) |> # count number of titles per author
  arrange(desc(titles)) |> # arrange in descending order by titles
  filter(titles > 16) # only include rows with titles greater than 16

What author has had the most No. 1 books in the last 2 years? 5 years? 10 years?

I’ll start with 2 years.

John Grisham has the most with 5 No. 1 books.

bestsellers |> # start with the data
  filter(rank == 1, 
         year >2021) |> # only include rows with a rank of 1 and a year over 2021
  distinct(author, title) |> # only include unique author/title combinations
  group_by(author) |> # group by author
  summarize(titles = n()) |> # count titles per author
  arrange(desc(titles)) |> # arrange in descending order by titles
  filter(titles > 1) # only include titles above 1

I’ll find which authors have had the most No. 1 books in the last 5 years.

John Grisham is in first with 10 No.1 titles in the last 5 years. David Baldacci is in second with 8.

bestsellers |> # start with the data
  filter(rank == 1, 
         year > 2018) |> # only include rows with a rank of 1 and a year over 2018
  distinct(author, title) |> # only include unique author/title combinations
  group_by(author) |> # group by author
  summarize(titles = n()) |> # count titles per author
  arrange(desc(titles)) |> # arrange in descending order by titles
  filter(titles > 2) # only include titles above 2

Now, I’ll look at the last 10 years.

The results are fairly similar to the last five years. John Grisham is in first with 16 No. 1 titles in the last 10 years. David Baldacci is in second with 14. Stephen King is in third with 13.

bestsellers |> # start with the data
  filter(rank == 1,
         year > 2013) |> # only include rows with a rank of 1 and a year over 2013
  distinct(author, title) |> # only include unique author/title combinations
  group_by(author) |> # group by author
  summarize(titles = n()) |> # count titles per author
  arrange(desc(titles)) |> # arrange in descending order by titles
  filter(titles > 7) # only include titles above 7