Did Gaius Julius Caesar Write De Bello Hispaniensi? A Computational Study of Latin Classics Authorship

Olivia R Zhang, Trevor Cohen, Scott McGill


This project addresses a two-millennia-old mystery surrounding the authorship of ancient Latin war memoirs attributed to Caesar, using distributional semantics, a modern computational method for detecting patterns in written text. The Civil War has been confirmed to be Caesar’s work, as have the first seven of the eight chapters of the Gallic War; the eighth was written by Hirtius. The authorship of the African, Alexandrine, and Spanish Wars, though attributed to Caesar, is still under debate. Methods of distributional semantics derive representations of words from their distribution across a large amount of text, such that words that occur in similar contexts have similar representations. These representations can then be combined to model larger units of text, such as chapters and whole books. SemanticVectors software was used to calculate the similarity between chapters or books after dimension reduction using Random Indexing. The results show that the Gallic War’s eighth chapter is significantly different from its other seven chapters and from the Civil War, verifying the ability of distributional semantics to detect different Latin authorships. The African, Alexandrine, and Spanish Wars are notably different from the Civil War and the Gallic War (first seven chapters), suggesting that Caesar did not write these three. Furthermore, the African, Alexandrine, and Spanish Wars are different from each other and from the Civil and Gallic Wars, suggesting that they were written by different authors. This project demonstrates the value of distributional semantics in classics research. Its implications for digital humanities and real-world problems such as plagiarism are discussed.


authorship attribution; Caesar; Classics; computational linguistics; distributional semantics; Latin
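
The pipeline summarized in the abstract (word representations built from context, combined into chapter or book vectors, then compared by similarity) can be illustrated with a minimal sketch. This is a generic sliding-window variant of Random Indexing written in plain Python, not the SemanticVectors implementation. The constants DIM, NONZERO, and WINDOW, all function names, and the toy token lists are assumptions made for illustration only.

```python
import math
import random
from collections import defaultdict

# Hypothetical settings chosen for illustration, not taken from the study.
DIM = 200      # dimensionality of the reduced space
NONZERO = 10   # number of +1/-1 entries in each sparse index vector
WINDOW = 2     # context words considered on each side of a target word


def index_vector(rng):
    """Return a sparse random vector: half +1 entries, half -1, zeros elsewhere."""
    vec = [0.0] * DIM
    for i, pos in enumerate(rng.sample(range(DIM), NONZERO)):
        vec[pos] = 1.0 if i < NONZERO // 2 else -1.0
    return vec


def train_word_vectors(documents, seed=0):
    """Build a vector per word by summing the index vectors of its neighbours."""
    rng = random.Random(seed)
    index_vectors = {}
    for tokens in documents:
        for tok in tokens:
            if tok not in index_vectors:
                index_vectors[tok] = index_vector(rng)

    word_vectors = defaultdict(lambda: [0.0] * DIM)
    for tokens in documents:
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - WINDOW), min(len(tokens), i + WINDOW + 1)
            target = word_vectors[word]
            for j in range(lo, hi):
                if j == i:
                    continue
                neighbour = index_vectors[tokens[j]]
                for k in range(DIM):
                    target[k] += neighbour[k]
    return dict(word_vectors)


def document_vector(tokens, word_vectors):
    """Model a chapter or book as the sum of the vectors of its words."""
    total = [0.0] * DIM
    for tok in tokens:
        vec = word_vectors.get(tok)
        if vec is not None:
            for k in range(DIM):
                total[k] += vec[k]
    return total


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Toy usage: two tiny token lists stand in for whole chapters.
chapter_a = "gallia est omnis divisa in partes tres".split()
chapter_b = "bellum est in gallia".split()  # placeholder text
vectors = train_word_vectors([chapter_a, chapter_b], seed=1)
similarity = cosine(document_vector(chapter_a, vectors),
                    document_vector(chapter_b, vectors))
print(f"similarity between the two toy chapters: {similarity:.3f}")
```

In a real study the inputs would be full tokenized texts, and a lower similarity between two books would be read as evidence of differing style only relative to a baseline, such as the similarity among chapters by a single known author.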

