The Canonizer

This tool uses machine learning to predict, from a novel's word frequencies, whether the novel is part of the literary canon. The system has been trained on a corpus of 1346 novels from 1800 to 2000 available in DBNL (a digital archive of Dutch literature) and distinguishes canonical from non-canonical texts with a cross-validated accuracy of 72%. For the purposes of this tool, a novel is considered canonical when DBNL contains one or more secondary references to it (i.e., a reviewer or literary scholar discusses the novel in a secondary text that is part of DBNL). This "distant reading" tool serves to explore the relation between textual features and canonicity, and is intended purely for academic purposes; there is no substitute for actual reading!
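As a rough illustration of the two ingredients described above (word-frequency features and the "one or more secondary references" labeling rule), here is a minimal Python sketch. The function names, the simple regex tokenizer, and the use of relative frequencies are all assumptions made for this example; the tool's actual preprocessing and classifier are not specified here and may differ.

```python
import re
from collections import Counter


def relative_word_frequencies(text):
    """Return each lowercased word's share of all tokens in the text.

    Hypothetical helper: the tokenizer is a plain regex, an assumption
    made for illustration only.
    """
    tokens = re.findall(r"\w+", text.lower())
    if not tokens:
        return {}
    counts = Counter(tokens)
    total = len(tokens)
    return {word: count / total for word, count in counts.items()}


def is_canonical(secondary_reference_count):
    """Labeling rule from the description: canonical if there is at
    least one secondary reference to the novel."""
    return secondary_reference_count >= 1


# Example: build one (features, label) training pair.
features = relative_word_frequencies("Zegt hij, zegt hij.")
label = is_canonical(secondary_reference_count=2)
print(features, label)
```

Pairs like this, one per novel, are the kind of input a classifier could be trained and cross-validated on.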

Enter (Dutch) text

OR

Enter one or two words (e.g., telefoon, or zegt hij)

For more information, see the GitHub repository.