Newspaper Archive Delivery System

Completed: Summer 2018
Computer Science Senior Design Project, Spring 2018-Summer 2018
College of Engineering and Computer Science

The Chronicling America Parser is a research tool built to make America’s historic newspapers easier to search and analyze. It works alongside the Library of Congress’s Chronicling America project—a national effort by the National Endowment for the Humanities and the Library of Congress to digitize and preserve historic U.S. newspapers.

While Chronicling America makes scanned newspaper pages available online, those pages aren’t separated into individual articles, which limits how precisely researchers can search them. This project tackles that problem. A custom parser reads the underlying ALTO XML file for each page, breaks it into discrete articles, and stores them in a searchable database. A query tool then lets users run advanced searches that combine Boolean operators, exact phrases, proximity matching, and metadata filters (city, state, newspaper, and date) to surface the most relevant articles quickly.
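
To make the parsing step concrete, here is a minimal Python sketch of reading one ALTO XML page and pulling out its text blocks. It is not the project's actual parser: the function names and the sample page are hypothetical, and treating each `TextBlock` as a candidate article unit is an assumption, since the project's real segmentation rules are not described here. The element and attribute names (`TextBlock`, `TextLine`, `String`, `CONTENT`) come from the ALTO schema.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip any '{namespace}' prefix so the code works across ALTO versions."""
    return tag.rsplit("}", 1)[-1]


def extract_text_blocks(alto_xml):
    """Return one record per ALTO TextBlock: its ID and its joined text.

    Assumption: each TextBlock is treated as a candidate article unit.
    A real parser would likely merge or split blocks using layout cues.
    """
    root = ET.fromstring(alto_xml)
    blocks = []
    for elem in root.iter():
        if local_name(elem.tag) != "TextBlock":
            continue
        lines = []
        for line in elem:
            if local_name(line.tag) != "TextLine":
                continue
            # Each recognized word is a String element with a CONTENT attribute.
            words = [
                s.attrib["CONTENT"]
                for s in line
                if local_name(s.tag) == "String" and "CONTENT" in s.attrib
            ]
            if words:
                lines.append(" ".join(words))
        if lines:
            blocks.append({"id": elem.attrib.get("ID", ""), "text": " ".join(lines)})
    return blocks


# Hypothetical, hand-written sample page (not real archive data).
SAMPLE = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v2#">
  <Layout><Page ID="P1"><PrintSpace>
    <TextBlock ID="TB1">
      <TextLine><String CONTENT="CITY"/><SP/><String CONTENT="COUNCIL"/></TextLine>
      <TextLine><String CONTENT="MEETS"/></TextLine>
    </TextBlock>
    <TextBlock ID="TB2">
      <TextLine><String CONTENT="Weather"/><SP/><String CONTENT="report"/></TextLine>
    </TextBlock>
  </PrintSpace></Page></Layout>
</alto>"""

if __name__ == "__main__":
    for block in extract_text_blocks(SAMPLE):
        print(block["id"], "->", block["text"])
```

Matching on the local tag name rather than a fixed namespace is a deliberate choice in this sketch, because ALTO files produced at different times may declare different namespace URIs.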

The result is a faster, more precise way for historians, journalists, educators, and other researchers to explore the past. Users can save their queries, export results as CSV files for further analysis, and generate downloadable heat maps that visualize where articles originated geographically. Rather than replacing Chronicling America, the project extends it—turning a vast archive of historical newspapers into a more flexible resource for serious research and big-data exploration.
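
As a rough illustration of the export step, the Python sketch below writes a set of query results to CSV and tallies articles per state, which is the kind of aggregate a geographic heat map could be drawn from. The field names, function names, and sample rows are all hypothetical placeholders, not the project's actual schema or data.

```python
import csv
import io
from collections import Counter


def results_to_csv(rows, fieldnames):
    """Serialize result rows (dicts) to CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def counts_by_state(rows):
    """Count articles per state; a heat map could shade each state by this count."""
    return Counter(row["state"] for row in rows)


# Made-up placeholder rows for illustration only.
RESULTS = [
    {"newspaper": "Example Gazette", "city": "Columbus", "state": "Ohio", "date": "1900-01-01"},
    {"newspaper": "Example Gazette", "city": "Columbus", "state": "Ohio", "date": "1900-01-02"},
    {"newspaper": "Sample Herald", "city": "Austin", "state": "Texas", "date": "1900-01-01"},
]

if __name__ == "__main__":
    print(results_to_csv(RESULTS, ["newspaper", "city", "state", "date"]))
    print(dict(counts_by_state(RESULTS)))
```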