New York Times Best Seller Analysis

Project Overview

This project collects and analyzes data from the New York Times Best Seller lists, tracking trends and patterns in bestselling fiction over time. The data covers two lists:

  • Hardcover Fiction (1931-present)
  • Combined Print & E-Book Fiction (2011-present)

Key Research Questions

  • How has the NYT bestseller list evolved over time?
  • Which authors and titles have dominated the lists?
  • What patterns can we observe in bestseller performance?
  • How do hardcover bestsellers compare with the combined print and e-book list?

Data Collection

Get the Data

  • Hardcover Fiction (1931-present)
    • RDS
    • CSV
  • Combined Print & E-Book Fiction (2011-present)
    • RDS
    • CSV
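
As a minimal sketch of working with the CSV downloads, the snippet below reads bestseller rows using only Python's standard library. The column names `week`, `rank`, `title`, and `author`, the `load_list` helper, and the sample rows are assumptions for illustration; check the header row of the downloaded file for the actual names.

```python
import csv
import io


def load_list(source):
    """Read bestseller rows from an open text file and coerce basic types."""
    rows = []
    for row in csv.DictReader(source):
        # Column names here are hypothetical; adjust them to the real header.
        row["rank"] = int(row["rank"])
        rows.append(row)
    return rows


# Inline sample standing in for a downloaded file (made-up rows).
sample = io.StringIO(
    "week,rank,title,author\n"
    "2024-01-07,1,Example Title,Example Author\n"
    "2024-01-07,2,Another Title,Another Author\n"
)
rows = load_list(sample)
print(len(rows), rows[0]["title"])
```

To read a real download, pass an opened file such as `open(path, newline="", encoding="utf-8")` in place of the sample, where `path` points to the saved CSV.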

Data Sources

This project combines data from multiple sources:

  • Web Scraping: Current bestseller data collected directly from the New York Times website
  • Historical Archives: Hardcover fiction data (1931-2010) from the Post45 Data Collective
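
One way to picture how the two sources fit together is sketched below: archive rows and scraped rows are merged into one table, with a `source` label kept on each row and overlapping entries stored only once. This is an illustration under assumptions, not the project's actual pipeline; the field names (`week`, `rank`, `title`), the `combine_sources` helper, and the sample rows are all hypothetical.

```python
def combine_sources(archive_rows, scraped_rows):
    """Merge two lists of row dicts, keeping one row per (week, rank)."""
    combined = {}
    # Archive rows go in first; scraped rows only fill week/rank slots not yet present.
    for source, rows in (("archive", archive_rows), ("scrape", scraped_rows)):
        for row in rows:
            key = (row["week"], row["rank"])
            if key not in combined:
                combined[key] = {**row, "source": source}
    # ISO dates (YYYY-MM-DD) sort correctly as plain strings.
    return sorted(combined.values(), key=lambda r: (r["week"], r["rank"]))


# Made-up rows for illustration only.
archive = [{"week": "2010-12-26", "rank": 1, "title": "Archive Example"}]
scraped = [
    {"week": "2010-12-26", "rank": 1, "title": "Archive Example"},  # overlap
    {"week": "2011-01-02", "rank": 1, "title": "Scraped Example"},
]
merged = combine_sources(archive, scraped)
print([(r["week"], r["source"]) for r in merged])
```

Preferring the archive row on overlap is an arbitrary choice made for this sketch; the opposite rule would work the same way with the loop order swapped.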

Automated Updates

The dataset is automatically refreshed weekly through GitHub Actions workflows, ensuring the analysis remains current with the latest bestseller information.
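
A scheduled refresh is simpler to run safely when the update step is idempotent, so that rerunning the job does not duplicate a week. The sketch below shows that idea in Python; it is an assumption about how such an update could work, not the project's actual workflow code, and the `append_week` helper, the `week` field, and the sample rows are hypothetical.

```python
def append_week(existing_rows, new_rows):
    """Return existing rows plus new rows whose week is not already stored."""
    known_weeks = {row["week"] for row in existing_rows}
    fresh = [row for row in new_rows if row["week"] not in known_weeks]
    return existing_rows + fresh


# Made-up rows for illustration only.
data = [{"week": "2025-01-05", "rank": 1, "title": "Example A"}]
latest = [{"week": "2025-01-12", "rank": 1, "title": "Example B"}]

data = append_week(data, latest)
data = append_week(data, latest)  # a rerun adds nothing
print(len(data))
```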

Project Navigation

This website contains four main sections:

  1. Hardcover Fiction Data Cleaning - Methodology for preparing the hardcover fiction dataset
  2. Combined List Data Cleaning - Methodology for preparing the combined print and e-book dataset
  3. Hardcover Fiction Analysis - Findings and visualizations from the hardcover fiction dataset
  4. Combined List Analysis - Findings and visualizations from the combined print and e-book dataset

Related Projects

I’ve made two projects based on this data:

  • What Books Top the New York Times Best Sellers List and Why? - Article exploring key findings
  • Data Dives: Investigating the NYT Bestseller Lists - Podcast discussion of the research methodology and results

About the Author

This project is maintained by Aislyn Gaddis, a senior journalism major at The University of Texas at Austin interested in data, newsroom innovation and multimedia journalism. The work began in spring 2023 as a class project for Reporting with Data, taught by Professor Christian McDonald, and has evolved into an ongoing project. In early 2025, I added GitHub Actions workflows that scrape each new list and update the data every week.

Repository

The complete code and data for this project are available on GitHub.