NOTE: This analysis was last updated in Fall 2024, and the data has been updated since then, so the writing may not match the data.
Here’s what I’ll find in this notebook
I’ll make a searchable table
What authors have had the most appearances on the list?
What author has had the most appearances this year? last 2 years? 5 years? 10 years?
What authors have had the most No. 1 appearances?
What author has had the most No. 1 appearances this year? last 2 years? 5 years? 10 years?
What authors have had the most books on the list?
What author has had the most books on the the list this year? last 2 years? 5 years? 10 years?
What books have been on the list for the most number of weeks?
What books were No. 1 for the most number of weeks?
How did movie adaptations affect the charts?
What authors have had the most No. 1 books?
What author has had the most No. 1 books in the last 5 years? 10 years?
Setup
I’ll load the necessary libraries.
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.1.4 ✔ readr 2.1.5
✔ forcats 1.0.0 ✔ stringr 1.5.1
✔ ggplot2 3.5.2 ✔ tibble 3.3.0
✔ lubridate 1.9.4 ✔ tidyr 1.3.1
✔ purrr 1.0.4
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(lubridate)library(DT)
Import
I’ll import the data into the notebook by saving the file into a new object
bestsellers <-read_rds("data-processed/bestsellers-combined.rds") # saving the data into a new objectglimpse(bestsellers) # glimpsing the data
Rows: 7,125
Columns: 7
$ year <dbl> 2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016…
$ week <date> 2016-02-14, 2016-02-14, 2016-02-14, 2016-02-14, 2016-02-1…
$ rank <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 1, 2, 3…
$ title <chr> "NYPD RED 4", "SPIDER GAME", "THE BANDS OF MOURNING", "THE…
$ author <chr> "James Patterson and Marshall Karp", "Christine Feehan", "…
$ publisher <chr> "Little, Brown", "Jove", "Tor/Tom Doherty", "Grand Central…
$ description <chr> "Detective Zach Gordon and his partner, members of an elit…
Making a searchable table
I’ll make a searchable table to look for certain authors and books.
bestsellers |>datatable()
What authors have had the most appearances on the list?
I’ll use gsa to find what authors have had the most appearances on the list.
Colleen Hoover blows everyone else out of the water with 448 total appearances. John Grisham comes in second with 355, and David Baldacci comes in third with 248.
bestsellers_appearances <- bestsellers |># saving the data into a new objectgroup_by(author) |># group by authorsummarize(total_appearances =n()) |># count appearancesarrange(desc(total_appearances)) # arrange in descending order by total_appearancesbestsellers_appearances |># start with the datafilter(total_appearances >152) |># only include rows with total_appearances over 152write_csv("data-processed/most-appearances.csv") |># save the data for data wrapperprint() # print it
# A tibble: 5 × 2
author total_appearances
<chr> <int>
1 Colleen Hoover 443
2 John Grisham 233
3 Delia Owens 193
4 David Baldacci 153
5 Taylor Jenkins Reid 153
What author has had the most appearances this year? last 2 years? 5 years? 10 years?
First, I’ll find which authors have had the most appearances this year.
Again, Colleen Hoover blows everyone else out of the water with almost double 2nd place. She has 193 appearances to be exact.
bestsellers |># start with the datafilter(year >2022) |># only include rows with a year over 2022group_by(author) |># group by authorsummarize(appearances =n()) |># count appearances per authorarrange(desc(appearances)) |># arrange in descending order by appearancesfilter(appearances >26) # only include appearances above 26
I’ll look at the past two years.
Colleen Hoover again. More than triple second place with 400.
bestsellers |># start with the datafilter(year >2021) |># only include rows with a year over 2021group_by(author) |># group by authorsummarize(appearances =n()) |># count appearances per authorarrange(desc(appearances)) |># arrange in descending order by appearancesfilter(appearances >45) # only include appearances above 45
I’ll look at the past five years.
Colleen Hoover again, still with a large lead. She has 427.
bestsellers |># start with the datafilter(year >2018) |># only include rows with a year over 2018group_by(author) |># group by authorsummarize(appearances =n()) |># count appearances per authorarrange(desc(appearances)) |># arrange in descending order by appearancesfilter(appearances >84) # only include appearances above 55
I’ll look at 10 years.
This is crazy to me that Colleen Hoover still has the lead despite only becoming popular a few years ago. Note that she only has 6 appearances 2013-2017 and still takes the lead. She has 434 total appearances at this point.
bestsellers |># start with the datafilter(year >2013) |># only include rows with a year over 2013group_by(author) |># group by authorsummarize(appearances =n()) |># count appearances per authorarrange(desc(appearances)) |># arrange in descending order by appearancesfilter(appearances >109) # only include appearances above 109
What authors have had the most No. 1 appearances?
It’s John Grisham with 48, though Delia Owens comes close with 44. Colleen Hoover takes third with 38.
bestsellers |># start with the datafilter(rank ==1) |># only include rows with a rank of 1group_by(author) |># group by authorsummarize(appearances =n()) |># count appearances per authorarrange(desc(appearances)) |># arrange in descending order by appearancesfilter(appearances >15) # only include appearances above 16
What author has had the most No. 1 appearances this year? last 2 years? 5 years? 10 years?
I’ll look at this year first.
It’s Colleen Hoover with 19.
bestsellers |># start with the datafilter(year >2022, rank ==1) |># only include rows with a year over 2022 and a rank of 1group_by(author) |># group by authorsummarize(appearances =n()) |># count appearances per authorarrange(desc(appearances)) |># arrange in descending order by appearancesfilter(appearances >1) # only include appearances above 1
Now two years.
It’s Colleen Hoover with 38.
bestsellers |># start with the datafilter(year >2021, rank ==1) |># only include rows with a year over 2021 and a rank of 1group_by(author) |># group by authorsummarize(appearances =n()) |># count appearances per authorarrange(desc(appearances)) |># arrange in descending order by appearancesfilter(appearances >2) # only include appearances above 2
5 years?
First thing Colleen Hoover has not won. Delia Owens takes it with 44, most likely with Where the Crawdads Sing. Colleen Hoover comes in second with 38.
bestsellers |># start with the datafilter(year >2018, rank ==1) |># only include rows with a year over 2018 and a rank of 1group_by(author) |># group by authorsummarize(appearances =n()) |># count appearances per authorarrange(desc(appearances)) |># arrange in descending order by appearancesfilter(appearances >5) # only include rows with appearances above 5
And finally, 10 years?
Also Delia Owens with 44 for Where the Crawdads Sing, but closely followed by John Grisham at 41.
bestsellers |># start with the datafilter(year >2013, rank ==1) |># only include rows with a year over 2013 and a rank of 1group_by(author) |># group by authorsummarize(appearances =n()) |># count appearances per authorarrange(desc(appearances)) |># arrange in descending order by appearancesfilter(appearances >12) # only include appearances above 12
What authors have had the most books on the list?
Danielle Steel has had the most with 74 books on the list. Christine Feehan comes in second with 56. Stuart Woods takes third with 44.
bestsellers |># start with the datadistinct(author, title) |># only include unique author/title combinationsgroup_by(author) |># group by authorsummarize(titles =n()) |># count titles per authorarrange(desc(titles)) |># arrange in descending order by titlesfilter(titles >22) # only include titles above 22
What author has had the most books on the list in the last year? 2 years? 5 years? 10 years?
I’ll start with this year.
It’s Danielle Steel with 12.
bestsellers |># start with the datafilter(year >2022) |># only include rows with a year over 2022distinct(author, title) |># only include unique author/title combinationsgroup_by(author) |># group by authorsummarize(titles =n()) |># count titles per authorarrange(desc(titles)) |># arrange in descending order by titlesfilter(titles >3) # only include titles above 3
I’ll look at two years.
It’s Danielle Steel with 19. Colleen Hoover is in second with 11.
bestsellers |># start with the datafilter(year >2021) |># only include rows with a year over 2021distinct(author, title) |># only include unique author/title combinationsgroup_by(author) |># group by authorsummarize(titles =n()) |># count titles per authorarrange(desc(titles)) |># arrange in descending order by titlesfilter(titles >4) # only include titles above 4
I’ll look at the last 5 years.
It’s Danielle Steel again with 40.
bestsellers |># start with the datafilter(year >2018) |># only include rows with a year over 2018distinct(author, title) |># only include unique author/title combinationsgroup_by(author) |># group by authorsummarize(titles =n()) |># count titles per authorarrange(desc(titles)) |># arrange in descending order by titlesfilter(titles >10) # only include titles above 10
Now, I’ll look at the last 10 years.
It’s also Danielle Steel with 65.
bestsellers |># start with the datafilter(year >2013) |># only include rows with a year over 2013distinct(author, title) |># only include unique author/title combinationsgroup_by(author) |># group by authorsummarize(titles =n()) |># count titles per authorarrange(desc(titles)) |># arrange in descending order by titlesfilter(titles >18) # only include titles above 18
What books have been on the list for the most number of weeks?
I’ll use gsa again to find which books have been on the list for the most number of weeks.
Delia Owens wins with 193 appearances for Where the Crawdads Sing. Colleen Hoover is the only person with 2 books in the top 10 and takes second with 139. Gillian Flynn comes in third with 122 for Where the Crawdads Sing.
bestsellers |># start with the datagroup_by(author, title) |># group by author and titlesummarize(appearances =n()) |># count number of appearancesarrange(desc(appearances)) |># arrange in descending orderfilter(appearances >67) |># only include rows with appearances greater than 67write_csv("data-processed/most-weeks.csv") |># save the data for data wrapperprint() # print it
`summarise()` has grouped output by 'author'. You can override using the
`.groups` argument.
# A tibble: 8 × 3
# Groups: author [7]
author title appearances
<chr> <chr> <int>
1 Delia Owens WHERE THE CRAWDADS SING 193
2 Colleen Hoover IT ENDS WITH US 146
3 Taylor Jenkins Reid THE SEVEN HUSBANDS OF EVELYN HUGO 115
4 Colleen Hoover VERITY 94
5 Freida McFadden THE HOUSEMAID 86
6 Rebecca Yarros FOURTH WING 84
7 Celeste Ng LITTLE FIRES EVERYWHERE 81
8 Heather Morris THE TATTOOIST OF AUSCHWITZ 70
What books were No. 1 for the most number of weeks?
I’ll find which books were No. 1 for the most number of weeks by filtering first, and then using gsa.
Where the Crawdads Sing was number one for the most number of weeks by a good amount with 44. Fifty Shades of Grey by E. L. James is in second with 29.
bestsellers |># start with the datafilter(rank ==1) |># only include rows where the rank is 1group_by(author, title) |># group by author and titlesummarize(appearances =n()) |># count number of appearancesarrange(desc(appearances)) |># arrange in descending orderfilter(appearances >7) |># only include rows with appearances greater than 7write_csv("data-processed/most-one-weeks.csv") |># save the data for data wrapperprint() # print it
`summarise()` has grouped output by 'author'. You can override using the
`.groups` argument.
# A tibble: 6 × 3
# Groups: author [5]
author title appearances
<chr> <chr> <int>
1 Delia Owens WHERE THE CRAWDADS SING 44
2 Colleen Hoover IT ENDS WITH US 18
3 Colleen Hoover IT STARTS WITH US 17
4 Rebecca Yarros FOURTH WING 17
5 Kristin Hannah THE WOMEN 10
6 Paula Hawkins THE GIRL ON THE TRAIN 8
Let’s see if the books returned to the charts after the move adaptations came out.
First, I’ll look at Where the Crawdads Sing. he book was published in 2018 and the movie was released in July 2022. It looks like it stayed on the list pretty consistently after it first gained traction in 2019, though there’s a small dip in 2021, and then its sales faded in 2023 after the movie talk wore off.
bestsellers |># start with the datafilter(str_detect(title, "WHERE THE CRAWDADS SING")) |># only include Where the Crawdads Singgroup_by(year) |># group by yearsummarize(appearance =n()) # count the number of appearances in each year
Now, I’ll look at Fifty Shades of Grey. The book was published in 2011 and the movie was released in 2015. It looks like it got most traction in 2012 and 2013 (when the book actually got a redistribution deal after a small initial publication) with a slight resurgence in 2015 with the movie release.
bestsellers |># start with the datafilter(str_detect(title, "FIFTY SHADES OF GREY")) |># only include Fifty Shades of Greygroup_by(year) |># group by yearsummarize(appearance =n()) # count the number of appearances in each year
Finally, I’ll look at Gone Girl. The book was published in 2012 and the movie was released in 2014. It looks like it stayed on the list pretty consistently after publication, peaking in 2014 with the movie release and then dropping off in 2015.
bestsellers |># start with the datafilter(str_detect(title, "GONE GIRL")) |># only include Gone Girlgroup_by(year) |># group by yearsummarize(appearance =n()) # count the number of appearances in each year
What authors have had the most titles reach No. 1?
I’ll find which authors have had the most No. 1 titles.
Nora Roberts takes first on this one with 20 number one titles.
These authors were also top three for most appearances on the chart, just in a different order.
bestsellers |># start with the datafilter(rank ==1) |># only include rows with a rank of 1distinct(author, title) |># only include distinct author/title combinationsgroup_by(author) |># group by authorsummarize(titles =n()) |># count number of titles per authorarrange(desc(titles)) |># arrange in descending order by titlesfilter(titles >8) # only include rows with titles greater than 8
What author has had the most No. 1 books in the last 2 years? 5 years? 10 years?
I’ll start with 2 years.
John Sandford takes first with 5 No. 1 books.
bestsellers |># start with the datafilter(rank ==1, year >2021) |># only include rows with a rank of 1 and a year over 2021distinct(author, title) |># only include unique author/title combinationsgroup_by(author) |># group by authorsummarize(titles =n()) |># count titles per authorarrange(desc(titles)) |># arrange in descending order by titlesfilter(titles >2) # only include rows with titles above 2
I’ll find which authors have had the most No. 1 books in the last 5 years.
David Baldacci is first with 11 No. 1 books.
bestsellers |># start with the datafilter(rank ==1, year >2018) |># only include rows with a rank of 1 and a year over 2018distinct(author, title) |># only include unique author/title combinationsgroup_by(author) |># group by authorsummarize(titles =n()) |># count titles per authorarrange(desc(titles)) |># arrange in descending order by titlesfilter(titles >4) # only include rows with titles above 4
Now, I’ll look at the last 10 years.
Nora Roberts is first with 17 and David Baldacci is second with 16.
bestsellers |># start with the datafilter(rank ==1, year >2013) |># only include rows with a rank of 1 and a year over 2013distinct(author, title) |># only include unique author/title combinationsgroup_by(author) |># group by authorsummarize(titles =n()) |># count titles per authorarrange(desc(titles)) |># arrange in descending order by titlesfilter(titles >8) # only include rows with titles above 8