This phase of the Johnson’s Dictionary Quotations project set out to take a working but slow source-tracing tool and scale it up to handle the full dictionary. Sponsored by Dr. Amy Giroux and Connie Harper as part of CHDR’s Johnson’s Dictionary Online effort, it built on an earlier team’s Python algorithm that traces the roughly 300,000 literary quotations across two editions of Samuel Johnson’s 1755 Dictionary of the English Language back to their original sources. That task is complicated by Johnson’s habit of paraphrasing quotations, altering titles, and misattributing authors. The original tool worked, but processing just 10,000 quotations took five days; at that rate, the full set of roughly 300,000 would take about 150 days. This phase had three goals: to port the fuzzy-matching algorithm onto STOKES, UCF’s high-performance computing cluster, to speed up the work substantially; to draw on the scholarly EEBO and ECCO text corpora for more authoritative matches; and to enrich the results with specific work titles, links to the original online texts, and Library of Congress linked open data for each author. The phase also aimed to produce a master searchable index of every quotation, title, and author Johnson drew upon, so that researchers can find where a canonical passage appears in the dictionary even when Johnson recorded it imperfectly.
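To illustrate the general idea of fuzzy matching a paraphrased quotation against candidate source passages, here is a minimal sketch using Python's standard-library `difflib`. It is not the project's actual algorithm, which the summary above does not describe in detail. The function names `normalize` and `best_match`, the 0.6 threshold, and the sample passages are all assumptions chosen for illustration.

```python
import re
from difflib import SequenceMatcher


def normalize(text):
    """Lowercase and collapse punctuation and whitespace into single spaces."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def best_match(quote, candidates, threshold=0.6):
    """Return (score, passage) for the most similar candidate passage.

    Returns None when no candidate reaches the (hypothetical) threshold.
    """
    target = normalize(quote)
    scored = [
        (SequenceMatcher(None, target, normalize(passage)).ratio(), passage)
        for passage in candidates
    ]
    score, passage = max(scored, key=lambda pair: pair[0])
    return (score, passage) if score >= threshold else None


# Illustrative source passages (hypothetical stand-ins for corpus text).
candidates = [
    "To be, or not to be: that is the question.",
    "All the world's a stage, and all the men and women merely players.",
]

# A shortened, imperfectly recorded version of the first passage.
result = best_match("To be or not, that is the question", candidates)
print(result)
```

Comparing every quotation against every candidate passage in this way is expensive, which is one reason a workload of this size is a natural fit for a computing cluster: each quotation can be scored independently of the others.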
Senior Design Project — Samuel Johnson’s Dictionary Online
