New York Times Best Seller Analysis

Project Overview

This project collects and analyzes data from the New York Times Best Seller lists, tracking trends and patterns in bestselling fiction over time. The data covers two lists:

- Hardcover Fiction (1931-present)
- Combined Print & E-Book Fiction (2011-present)

Key Research Questions

- How has the NYT bestseller list evolved over time?
- Which authors and titles have dominated the lists?
- What patterns can we observe in bestseller performance?
- How do hardcover bestsellers compare to the combined print and e-book listings?

Data Collection

Get the Data

Hardcover Fiction (1931-present)
- RDS
- CSV
Combined Print & E-Book Fiction (2011-present)
- RDS
- CSV
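
As a starting point for questions such as which authors have dominated the lists, here is a minimal Python sketch that reads a CSV export and counts weekly list appearances per author. The column names (`week`, `rank`, `title`, `author`) and the sample rows are assumptions made for illustration and may not match the actual files; adjust them to the real headers.

```python
import csv
import io
from collections import Counter


def weeks_per_author(csv_file, author_column="author"):
    """Count how many weekly list rows each author has in a CSV export.

    `author_column` is a hypothetical column name, not confirmed by the data.
    """
    reader = csv.DictReader(csv_file)
    counts = Counter()
    for row in reader:
        author = (row.get(author_column) or "").strip()
        if author:  # skip rows with a missing or empty author field
            counts[author] += 1
    return counts


# Made-up in-memory sample standing in for a downloaded CSV file.
sample = io.StringIO(
    "week,rank,title,author\n"
    "2024-01-07,1,Example Title A,Author One\n"
    "2024-01-07,2,Example Title B,Author Two\n"
    "2024-01-14,1,Example Title A,Author One\n"
)

top_authors = weeks_per_author(sample).most_common(2)
print(top_authors)
```

To run this against a downloaded file, pass `open(path, newline="", encoding="utf-8")` in place of the in-memory sample, where `path` is wherever the CSV was saved.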

Data Sources

This project combines data from multiple sources:

- Web Scraping: Current bestseller data collected directly from the New York Times website
- Historical Archives: Hardcover fiction data (1931-2010) from the Post 45 Collective
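
One way to picture combining an archive with newly scraped rows is sketched below in Python. This is not the project's actual pipeline: the row fields (`week`, `rank`, `title`) and the rule that a scraped row replaces an archive row for the same week and rank are assumptions chosen for illustration.

```python
def combine_sources(archive_rows, scraped_rows):
    """Merge two lists of row dicts, keeping one row per (week, rank).

    Assumes each row has hypothetical "week" (ISO date string) and "rank"
    (int) fields. On overlap, the scraped row replaces the archive row.
    """
    combined = {}
    for row in archive_rows:
        combined[(row["week"], row["rank"])] = row
    for row in scraped_rows:
        combined[(row["week"], row["rank"])] = row  # scraped wins on overlap
    # ISO date strings sort chronologically, so sorting keys orders by week, then rank.
    return [combined[key] for key in sorted(combined)]


# Made-up rows standing in for the two sources.
archive = [
    {"week": "2010-12-26", "rank": 1, "title": "Archive Title"},
    {"week": "2011-01-02", "rank": 1, "title": "Overlap From Archive"},
]
scraped = [
    {"week": "2011-01-02", "rank": 1, "title": "Overlap From Scrape"},
    {"week": "2011-01-09", "rank": 1, "title": "Scraped Title"},
]

merged = combine_sources(archive, scraped)
print([row["title"] for row in merged])
```

Keying on week and rank keeps the merge idempotent, so rerunning it after each weekly scrape would not duplicate rows that are already present.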

Automated Updates

The dataset is automatically refreshed weekly through GitHub Actions workflows, ensuring the analysis remains current with the latest bestseller information.

About the Author

This project is maintained by Aislyn Gaddis, a senior journalism major at The University of Texas at Austin interested in data, newsroom innovation and multimedia journalism. The work began in spring 2023 as a class project for Reporting with Data, taught by Professor Christian McDonald, and has since evolved into an ongoing project. In early 2025, GitHub Actions workflows were added to automatically scrape the new list and update the data each week.

Repository

The complete code and data for this project are available on GitHub.