Project Publications

Books

  • Ahnert, Ruth, et al. Collaborative Historical Research in the Age of Big Data. Cambridge University Press, forthcoming.

Articles, book chapters and conference proceedings

  • Ardanuy, Mariona Coll, et al. “Resolving Places, Past and Present: Toponym Resolution in Historical British Newspapers Using Multiple Resources.” Proceedings of the 13th Workshop on Geographic Information Retrieval, Association for Computing Machinery, 2019, pp. 1–6, https://doi.org/10.1145/3371140.3371143.
  • Arenas, Diego, et al. Design Choices for Productive, Secure, Data-Intensive Research at Scale in the Cloud. arXiv:1908.08737, arXiv, 15 Sept. 2019, http://arxiv.org/abs/1908.08737.
  • Beelen, Kaspar, Jon Lawrence, Daniel C. S. Wilson, and David Beavan. “Bias and Representativeness in Digitized Newspaper Collections: Introducing the Environmental Scan.” Digital Scholarship in the Humanities, July 2022, p. fqac037, https://doi.org/10.1093/llc/fqac037.
  • Beelen, Kaspar, Ruth Ahnert, David Beavan, Mariona Coll Ardanuy, Kasra Hosseini, et al. Contextualizing Victorian Newspapers. https://dh2020.adho.org/wp-content/uploads/2020/07/621_ContextualizingVictorianNewspapers.html.
  • Beelen, Kaspar, Ruth Ahnert, David Beavan, Mariona Coll Ardanuy, Emma Griffin, et al. Living with Machines: Exploring Bias in the British Newspaper Archive.
  • Beelen, Kaspar, Federico Nanni, Mariona Coll Ardanuy, Kasra Hosseini, et al. “When Time Makes Sense: A Historically-Aware Approach to Targeted Sense Disambiguation.” Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021, Association for Computational Linguistics, 2021, pp. 2751–61, https://doi.org/10.18653/v1/2021.findings-acl.243.
  • Boyd Davis, Stephen, et al. “Can I Believe What I See? Data Visualization and Trust in the Humanities.” Interdisciplinary Science Reviews, vol. 46, no. 4, Oct. 2021, pp. 522–46, https://doi.org/10.1080/03080188.2021.1872874.
  • Coll Ardanuy, Mariona, Kasra Hosseini, et al. A Deep Learning Approach to Geographical Candidate Selection through Toponym Matching.
  • Coll Ardanuy, Mariona, Federico Nanni, et al. “Living Machines: A Study of Atypical Animacy.” Proceedings of the 28th International Conference on Computational Linguistics, International Committee on Computational Linguistics, 2020, pp. 4534–45, https://doi.org/10.18653/v1/2020.coling-main.400.
  • Coll Ardanuy, Mariona, Kaspar Beelen, et al. “Station to Station: Linking and Enriching Historical British Railway Data.” Proceedings of the Conference on Computational Humanities Research 2021.
  • Ryan, Yann, Mariona Coll Ardanuy, Daniel van Strien, Kasra Hosseini, Kaspar Beelen, James Hetherington, Katherine McDonough, Barbara McGillivray, Mia Ridge, Olivia Vane, and Daniel C. S. Wilson. Using Smart Annotations to Map the Geography of Newspapers. https://dh2020.adho.org/wp-content/uploads/2020/07/532_Usingsmartannotationstomapthegeographyofnewspapers.html. Accessed 13 July 2022.
  • Darby, Andrew, et al. AI Training Resources for GLAM: A Snapshot. arXiv, 10 May 2022, http://arxiv.org/abs/2205.04738.
  • Data Study Group Team. Data Study Group Final Report: The National Archives, UK. Zenodo, 18 June 2021, https://doi.org/10.5281/ZENODO.4981184.
  • —. Data Study Group Final Report: WWF. Zenodo, 5 June 2020, https://doi.org/10.5281/ZENODO.3878457.
  • De Toni, Francesco, et al. Entities, Dates, and Languages: Zero-Shot on Historical Texts with T0. arXiv:2204.05211, arXiv, 11 Apr. 2022, http://arxiv.org/abs/2204.05211.
  • Filgueira, Rosa, et al. “Defoe: A Spark-Based Toolbox for Analysing Digital Historical Textual Data.” 2019 15th International Conference on EScience (EScience), IEEE, 2019, pp. 235–42, https://doi.org/10.1109/eScience.2019.00033.
  • Hosseini, Kasra, Federico Nanni, et al. “DeezyMatch: A Flexible Deep Learning Approach to Fuzzy String Matching.” Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, Association for Computational Linguistics, 2020, pp. 62–69, https://doi.org/10.18653/v1/2020.emnlp-demos.9.
  • Hosseini, Kasra, Daniel C. S. Wilson, et al. MapReader: A Computer Vision Pipeline for the Semantic Exploration of Maps at Scale. arXiv:2111.15592, arXiv, 30 Nov. 2021, http://arxiv.org/abs/2111.15592.
  • Hosseini, Kasra, Katherine McDonough, et al. “Maps of a Nation? The Digitized Ordnance Survey for New Historical Research.” Journal of Victorian Culture, vol. 26, no. 2, May 2021, pp. 284–99, https://doi.org/10.1093/jvcult/vcab009.
  • Hosseini, Kasra, Kaspar Beelen, et al. Neural Language Models for Nineteenth-Century English. 24 May 2021, http://arxiv.org/abs/2105.11321.
  • McGillivray, Barbara, et al. The Challenges and Prospects of the Intersection of Humanities and Data Science: A White Paper from The Alan Turing Institute. Aug. 2020, https://doi.org/10.6084/M9.FIGSHARE.12732164.
  • McMillan-Major, Angelina, et al. Documenting Geographically and Contextually Diverse Data Sources: The BigScience Catalogue of Language Data and Resources. arXiv:2201.10066, arXiv, 24 Jan. 2022, http://arxiv.org/abs/2201.10066.
  • Ridge, Mia, et al. Historic Machines from “prams” to “Parliament”: New Avenues for Collaborative Linguistic Research. May 2022, https://doi.org/10.5281/zenodo.6578021.
  • Strien, Daniel van, et al. “An Introduction to AI for GLAM.” Proceedings of the Second Teaching Machine Learning and Artificial Intelligence Workshop, PMLR, 2022, pp. 20–24, https://proceedings.mlr.press/v170/strien22a.html.
  • van Strien, D., et al. Assessing the Impact of OCR Quality on Downstream NLP Tasks. 2020, https://doi.org/10.17863/CAM.52068.
  • Wilson, Daniel. “How We Got Here.” Time Travelers: Victorian Encounters with Time and History, edited by Adelene Buckland et al., University of Chicago Press, https://press.uchicago.edu/ucp/books/book/chicago/T/bo50699947.html. Accessed 13 July 2022.

Datasets, software and code underlying papers

  • British Library. 19th Century Books – Metadata with Additional Crowdsourced Annotations. British Library, 2021, https://doi.org/10.23636/BKHQ-0312.
  • British Library Labs, and British Library. Digitised Books. c. 1510 – c. 1900. JSONL (OCR Derived Text + Metadata). British Library, 2021, https://doi.org/10.23636/R7W6-ZY15.
  • Code and Supplementary Material for the Paper ‘A Deep Learning Approach to Geographical Candidate Selection through Toponym Matching.’ https://github.com/Living-with-machines/LwM_SIGSPATIAL2020_ToponymMatching.
  • Code for Atypical Animacy. https://github.com/Living-with-machines/AtypicalAnimacy/.
  • Code for Station to Station: Linking and Enriching Historical British Railway Data & StopsGB. https://github.com/Living-with-machines/station-to-station.
  • Code for Targeted Sense Disambiguation. https://github.com/Living-with-machines/TargetedSenseDisambiguation.
  • Code for the Paper “Assessing the Impact of OCR Quality on Downstream NLP Tasks.” https://github.com/Living-with-machines/lwm_ARTIDIGH_2020_OCR_impact_downstream_NLP_tasks.
  • Code for the Paper “Resolving Places, Past and Present: Toponym Resolution in Historical British Newspapers Using Multiple Resources.” https://github.com/Living-with-machines/lwm_GIR19_resolving_places.
  • Coll Ardanuy, Mariona, David Beavan, et al. “A Dataset for Toponym Resolution in Nineteenth-Century English Newspapers.” Journal of Open Humanities Data, vol. 8, Jan. 2022, p. 3, https://doi.org/10.5334/johd.56.
  • —. Dataset for Toponym Resolution in Nineteenth-Century English Newspapers.
  • Coll Ardanuy, Mariona, Kaspar Beelen, et al. StopsGB: Structured Timeline of Passenger Stations in Great Britain.
  • DeezyMatch. https://github.com/Living-with-machines/DeezyMatch.
  • Gh_orgstats. https://github.com/Living-with-machines/github_stats_report.
  • Hetherington, J., et al. Defoe – Analysis of Historical Books and Newspapers Data. 2020, https://github.com/alan-turing-institute/defoe.
  • Neural Language Models for Historical Research. https://github.com/Living-with-machines/histLM.
  • Nnanno. https://github.com/Living-with-machines/nnanno.
  • Press Directories // British Library. https://bl.iro.bl.uk/collections/580fe312-0e41-41fc-bb38-40122798cec1. Accessed 19 July 2022.
  • Ridge, Mia, et al. Living with Machines Alpha and Beta Zooniverse “accident” Task Data. British Library, 3 Sept. 2020, https://doi.org/10.23636/1197.
  • Tolfo, Giorgia, et al. Living Machines Atypical Animacy Dataset. British Library, 30 Oct. 2020, https://doi.org/10.23636/1215.
  • van Strien, Daniel. 19th Century United States Newspaper Advert Images with “illustrated” or “Non Illustrated” Labels. Zenodo, 14 Oct. 2021, https://doi.org/10.5281/ZENODO.5838410.
  • —. 19th Century United States Newspaper Images Predicted as Photographs with Labels for “Human”, “Animal”, “Human-Structure” and “Landscape.” Zenodo, 11 Jan. 2022, https://doi.org/10.5281/ZENODO.4487141.
  • —. British Library Books Genre Detection Model. Zenodo, 30 Sept. 2021, https://doi.org/10.5281/ZENODO.5245175.
  • van Strien, Daniel. Flyswot-Models: 20210922. Zenodo, 2021, https://doi.org/10.5281/ZENODO.5521125.
  • van Strien, Daniel. Images from Newspaper Navigator Predicted as Maps, with Human Corrected Labels. Zenodo, 30 Oct. 2020, https://doi.org/10.5281/ZENODO.4156510.

Podcast

  • Vilcins, S. “Free Thinking: Archiving, Curating and Digging for Data.” Free Thinking: Archiving, Curating and Digging for Data, BBC Radio 3, 12 May 2021, https://www.bbc.co.uk/programmes/m000vydf.
