NOTE: This analysis was last updated in Fall 2024, and the data has been updated since then, so the writing may not match the data.
Here’s what I’ll find in this notebook
I’ll make a searchable table
What authors have had the most appearances on the list?
What author has had the most appearances this year? last 2 years? 5 years? 10 years?
What authors have had the most No. 1 appearances?
What author has had the most No. 1 appearances this year? last 2 years? 5 years? 10 years?
What authors have had the most books on the list?
What author has had the most books on the the list this year? last 2 years? 5 years? 10 years?
What books have been on the list for the most number of weeks?
What books were No. 1 for the most number of weeks?
How did movie adaptations affect the charts?
What authors have had the most No. 1 books?
What author has had the most No. 1 books in the last 5 years? 10 years?
Setup
I’ll load the necessary libraries.
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.1.4 ✔ readr 2.1.5
✔ forcats 1.0.0 ✔ stringr 1.5.1
✔ ggplot2 3.5.2 ✔ tibble 3.3.0
✔ lubridate 1.9.4 ✔ tidyr 1.3.1
✔ purrr 1.0.4
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(lubridate)library(DT)
Import
I’ll import the data into the notebook by saving the file into a new object
bestsellers <-read_rds("data-processed/bestsellers-full.rds") # saving the data into a new objectglimpse(bestsellers) # glimpsing the data
Rows: 63,446
Columns: 5
$ year <dbl> 1931, 1931, 1931, 1931, 1931, 1931, 1931, 1931, 1931, 1931, 193…
$ week <date> 1931-10-12, 1931-10-12, 1931-10-12, 1931-10-12, 1931-10-12, 19…
$ rank <dbl> 1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, …
$ title <chr> "THE TEN COMMANDMENTS", "FINCHE'S FORTUNE", "THE GOOD EARTH", "…
$ author <chr> "Warwick Deeping", "Mazo de la Roche", "Pearl S. Buck", "Willa …
Making a searchable table
I’ll make a searchable table to look for certain authors and books.
bestsellers |>datatable()
Warning in instance$preRenderHook(instance): It seems your data is too big for
client-side DataTables. You may consider server-side processing:
https://rstudio.github.io/DT/server.html
What authors have had the most appearances on the list?
I’ll use gsa to find what authors have had the most appearances on the list.
Romance author Danielle Steel has had the most appearances on the list with 1022. Horror author Stephen King comes in second with 972. Legal thriller author John Grisham comes in third with 893, still hundreds above 4th place
I’m using this data for a story chart. Here is a link to that chart.
bestsellers_appearances <- bestsellers |># saving the data into a new objectgroup_by(author) |># group by authorsummarize(total_appearances =n()) |># count appearancesarrange(desc(total_appearances)) # arrange in descending order by total_appearancesbestsellers_appearances |># start with the datafilter(total_appearances >400) |># only include rows with total_appearances over 400write_csv("data-processed/most-appearances.csv") |># save the data for data wrapperprint() # print it
# A tibble: 10 × 2
author total_appearances
<chr> <int>
1 Danielle Steel 1023
2 Stephen King 973
3 John Grisham 893
4 Taylor Caldwell 524
5 James A. Michener 478
6 David Baldacci 465
7 Nicholas Sparks 441
8 Robert Ludlum 406
9 Leon Uris 405
10 Mary Higgins Clark 403
What author has had the most appearances this year? last 2 years? 5 years? 10 years?
First, I’ll find which authors have had the most appearances this year.
Rebecca Yarros has had 118, Bonnie Garmus has had 68 and Barbara Kingsolver has had 62.
bestsellers |># start with the datafilter(year >2022) |># only include rows with a year over 2022group_by(author) |># group by authorsummarize(appearances =n()) |># count appearances per authorarrange(desc(appearances)) |># arrange in descending order by appearancesfilter(appearances >33) # only include appearances above 33
I’ll look at the past two years.
It’s Rebecca Yarros with 118.
bestsellers |># start with the datafilter(year >2021) |># only include rows with a year over 2021group_by(author) |># group by authorsummarize(appearances =n()) |># count appearances per authorarrange(desc(appearances)) |># arrange in descending order by appearancesfilter(appearances >40) # only include appearances above 40
I’ll look at the past five years.
It’s John Grisham with 159.
bestsellers |># start with the datafilter(year >2018) |># only include rows with a year over 2018group_by(author) |># group by authorsummarize(appearances =n()) |># count appearances per authorarrange(desc(appearances)) |># arrange in descending order by appearancesfilter(appearances >68) # only include appearances above 68
I’ll look at 10 years.
It’s John Grisham again with 276.
bestsellers |># start with the datafilter(year >2013) |># only include rows with a year over 2013group_by(author) |># group by authorsummarize(appearances =n()) |># count appearances per authorarrange(desc(appearances)) |># arrange in descending order by appearancesfilter(appearances >111) # only include appearances above 111
What authors have had the most No. 1 appearances?
John Grisham comes in first with 188. Stephen King takes second with 152. James A. Michener takes third with 132.
bestsellers |># start with the datafilter(rank ==1) |># only include rows with a rank of 1group_by(author) |># group by authorsummarize(appearances =n()) |># count appearances per authorarrange(desc(appearances)) |># arrange in descending order by appearancesfilter(appearances >61) # only include appearances above 61
What author has had the most No. 1 appearances this year? last 2 years? 5 years? 10 years?
I’ll look at this year first.
It’s Rebecca Yarros with 24.
bestsellers |># start with the datafilter(year >2022, rank ==1) |># only include rows with a year over 2022 and a rank of 1group_by(author) |># group by authorsummarize(appearances =n()) |># count appearances per authorarrange(desc(appearances)) |># arrange in descending order by appearancesfilter(appearances >1) # only include appearances above 1
Now two years.
It’s Rebecca Yarros with 24.
bestsellers |># start with the datafilter(year >2021, rank ==1) |># only include rows with a year over 2021 and a rank of 1group_by(author) |># group by authorsummarize(appearances =n()) |># count appearances per authorarrange(desc(appearances)) |># arrange in descending order by appearancesfilter(appearances >4) # only include appearances above 4
5 years?
It’s Delia Owens with 54.
bestsellers |># start with the datafilter(year >2018, rank ==1) |># only include rows with a year over 2018 and a rank of 1group_by(author) |># group by authorsummarize(appearances =n()) |># count appearances per authorarrange(desc(appearances)) |># arrange in descending order by appearancesfilter(appearances >4) # only include appearances above 4
And finally, 10 years?
It’s John Grisham with 59, but Delia Owens comes close with 54.
bestsellers |># start with the datafilter(year >2013, rank ==1) |># only include rows with a year over 2013 and a rank of 1group_by(author) |># group by authorsummarize(appearances =n()) |># count appearances per authorarrange(desc(appearances)) |># arrange in descending order by appearancesfilter(appearances >9) # only include appearances above 9
What authors have had the most books on the list?
Danielle Steel has had the most books on the list with 142. Stuart Woods comes in a far away second with 68. Stephen King comes in third with 58.
bestsellers |># start with the datadistinct(author, title) |># only include unique author/title combinationsgroup_by(author) |># group by authorsummarize(titles =n()) |># count titles per authorarrange(desc(titles)) |># arrange in descending order by titlesfilter(titles >40) # only include titles above 40
What author has had the most books on the list in the last year? 2 years? 5 years? 10 years?
I’ll start with this year.
It’s Danielle Steel with 12.
bestsellers |># start with the datafilter(year >2022) |># only include rows with year over 2022distinct(author, title) |># only include unique author/title combinationsgroup_by(author) |># group by authorsummarize(titles =n()) |># count titles per authorarrange(desc(titles)) |># arrange in descending order by titlesfilter(titles >2) # only include titles above 2
I’ll look at two years.
It’s Danielle Steel with 19.
bestsellers |># start with the datafilter(year >2021) |># only include rows with a year over 2021distinct(author, title) |># only include unique author/title combinationsgroup_by(author) |># group by authorsummarize(titles =n()) |># count titles per authorarrange(desc(titles)) |># arrange in descending order by titlesfilter(titles >3) # only include titles above 3
I’ll look at the last 5 years.
It’s Danielle Steel again with 39.
bestsellers |># start with the datafilter(year >2018) |># only include rows with a year over 2018distinct(author, title) |># only include unique author/title combinationsgroup_by(author) |># group by authorsummarize(titles =n()) |># count titles per authorarrange(desc(titles)) |># arrange in descending order by titlesfilter(titles >8) # only include titles above 8
Now, I’ll look at the last 10 years.
It’s also Danielle Steel with 65.
bestsellers |># start with the datafilter(year >2013) |># only include rows with a year over 2013distinct(author, title) |># only include unique author/title combinationsgroup_by(author) |># group by authorsummarize(titles =n()) |># count titles per authorarrange(desc(titles)) |># arrange in descending order by titlesfilter(titles >15) # only include titles above 15
What books have been on the list for the most number of weeks?
I’ll use gsa again to find which books have been on the list for the most number of weeks.
Oh, The Places You’ll Go! by Dr. Suess takes first with 178 weeks on the list. The Da Vinci Code by Dan Brown and The Celestine Prophecy by James Redfield are tied for second with 165 weeks. The Bridges of Madison County by Robert James Waller is in fourth with 164 weeks.
I’m using this data for a story chart. Here is a link to that chart.
bestsellers |># start with the datagroup_by(author, title) |># group by author and titlesummarize(appearances =n()) |># count number of appearancesarrange(desc(appearances)) |># arrange in descending orderfilter(appearances >106) |># only include rows with appearances greater than 106write_csv("data-processed/most-weeks.csv") |># save the data for data wrapperprint() # print it
`summarise()` has grouped output by 'author'. You can override using the
`.groups` argument.
# A tibble: 10 × 3
# Groups: author [10]
author title appearances
<chr> <chr> <int>
1 Dr. Seuss OH, THE PLACES YOU'LL GO! 178
2 Dan Brown THE DA VINCI CODE 165
3 James Redfield THE CELESTINE PROPHECY 165
4 Robert James Waller THE BRIDGES OF MADISON COUNTY 164
5 Anthony Doerr ALL THE LIGHT WE CANNOT SEE 132
6 Delia Owens WHERE THE CRAWDADS SING 132
7 Herman Wouk THE CAINE MUTINY 123
8 Patrick Dennis AUNTIE MAME 112
9 Lloyd C. Douglas THE ROBE 111
10 Kathryn Stockett THE HELP 108
What books were No. 1 for the most number of weeks?
I’ll find which books were No. 1 for the most number of weeks by filtering first, and then using gsa.
The Da Vinci Code has the most number of weeks at No. 1 with 59. Where the Crawdads Sing by Delia Owens is in second with 54 weeks at No. 1.
It’s interesting that both these books had their fair share of controversy AND a movie adaptation.
I’m using this data for a story chart. Here is a link to that chart.
bestsellers |># start with the datafilter(rank ==1) |># only include rows where the rank is 1group_by(author, title) |># group by author and titlesummarize(appearances =n()) |># count number of appearancesarrange(desc(appearances)) |># arrange in descending orderfilter(appearances >33) |># only include rows with appearances greater than 33write_csv("data-processed/most-one-weeks.csv") |># save the data for data wrapeprprint() # print it
`summarise()` has grouped output by 'author'. You can override using the
`.groups` argument.
# A tibble: 10 × 3
# Groups: author [10]
author title appearances
<chr> <chr> <int>
1 Dan Brown THE DA VINCI CODE 59
2 Delia Owens WHERE THE CRAWDADS SING 54
3 James A. Michener HAWAII 49
4 Herman Wouk THE CAINE MUTINY 48
5 Erich Segal LOVE STORY 41
6 James Michener THE SOURCE 41
7 Richard Bach JONATHAN LIVINGSTON SEAGULL 38
8 Robert James Waller THE BRIDGES OF MADISON COUNTY 38
9 Leon Uris TRINITY 36
10 John le Carré THE SPY WHO CAME IN FROM THE COLD 34
How did movie adaptations affect the charts?
Let’s see if the books returned to the charts after the move adaptations came out.
First, I’ll look at The Da Vinci Code. Once I filtered to just look The Da Vinci Code, I noticed that it actually appeared to stay on the charts pretty consistently. I grouped and summarized by year to check what I was seeing. Rather than having a resurgance after the movie came out (in 2006), The Da Vinci code appears to have actually just stayed on the list pretty much the whole time.
bestsellers |># start with the datafilter(str_detect(title, "THE DA VINCI CODE")) |># only inlcude The Da Vinci Codegroup_by(year) |># group by yearsummarize(appearance =n()) # count the number of appearances in each year
Now, I’ll look at Where the Crawdads Sing. The book was published in 2018 and the movie was released in July 2022 (the same week it got on the list in 2022). It looks like the book gained traction in the years following it’s release, possibly from the adaptation being announced, but it had slowed down by the time the movie was released.
bestsellers |># start with the datafilter(str_detect(title, "WHERE THE CRAWDADS SING")) # only include Where the Crawdads Sing
bestsellers |># start with the datafilter(str_detect(title, "WHERE THE CRAWDADS SING")) |># only include Where the Crawdads Singgroup_by(year) |># group by yearsummarize(appearances =n()) # count number of appearances per year
What authors have had the most titles reach No. 1?
I’ll find which authors have had the most No. 1 titles.
Stephen King is in first with 44 No. 1 titles. John Grisham is in second with 40 titles. Danielle Steel is in third with 36 titles.
These authors were also top three for most appearances on the chart, just in a different order.
bestsellers |># start with the datafilter(rank ==1) |># only include rows with a rank of 1distinct(author, title) |># only include distinct author/title combinationsgroup_by(author) |># group by authorsummarize(titles =n()) |># count number of titles per authorarrange(desc(titles)) |># arrange in descending order by titlesfilter(titles >16) # only include rows with titles greater than 16
What author has had the most No. 1 books in the last 2 years? 5 years? 10 years?
I’ll start with 2 years.
John Grisham has the most with 5 No. 1 books.
bestsellers |># start with the datafilter(rank ==1, year >2021) |># only include rows with a rank of 1 and a year over 2021distinct(author, title) |># only include unique author/title combinationsgroup_by(author) |># group by authorsummarize(titles =n()) |># count titles per authorarrange(desc(titles)) |># arrange in descending order by titlesfilter(titles >1) # only include titles above 1
I’ll find which authors have had the most No. 1 books in the last 5 years.
John Grisham is in first with 10 No.1 titles in the last 5 years. David Baldacci is in second with 8.
bestsellers |># start with the datafilter(rank ==1, year >2018) |># only include rows with a rank of 1 and a year over 2018distinct(author, title) |># only include unique author/title combinationsgroup_by(author) |># group by authorsummarize(titles =n()) |># count titles per authorarrange(desc(titles)) |># arrange in descending order by titlesfilter(titles >2) # only include titles above 2
Now, I’ll look at the last 10 years.
The results are fairly similar to the last five years. John Grisham is in first with 16 No. 1 titles in the last 10 years. David Baldacci is in second with 14. Stephen King is in third with 13.
bestsellers |># start with the datafilter(rank ==1, year >2013) |># only include rows with a rank of 1 and a year over 2013distinct(author, title) |># only include unique author/title combinationsgroup_by(author) |># group by authorsummarize(titles =n()) |># count titles per authorarrange(desc(titles)) |># arrange in descending order by titlesfilter(titles >7) # only include titles above 7