Living with Machines is a large research project with many outputs across a range of disciplines. This page lists some of our achievements and outputs to date.

To simplify navigation, you’ll first find the major thematic clusters, then the main outputs sorted by type, and finally the list of publications.

 

Thematic clusters

 

Using 19th-century Newspapers 

The project has digitised over half a million pages of historical newspapers and press directories (which provide information about readership, places and dates of publication). These, together with the newspapers already available digitally on the British Newspaper Archive, have made it possible to study the newspaper landscape of the 19th century. These papers are also used for historical research and as source documents for crowdsourced activities. The work has also offered the opportunity to develop new software tools to help the digitisation process, analyse the quality of the Optical Character Recognition (OCR), develop copyright-aware data access infrastructure and much more.

Copyright for the newspapers has been assessed during the life of the project to ensure that any out-of-copyright content can be re-used by researchers, including members of the public. This content can be found on the British Newspaper Archive website: https://blog.britishnewspaperarchive.co.uk/2021/08/09/introducing-free-to-view-pages-on-the-british-newspaper-archive/

 

What is a Machine

What do we talk about when we talk about machines? Are these simple tools, mechanical instruments used in factories, or means of transport? Or are they metaphors that refer to abstract entities like states and justice, or physical ones like bodies or theatre props? Mixing crowdsourced annotation activities and innovative computational linguistic analysis, we have tried to answer this question, writing academic papers testing new working methodologies and producing useful datasets.

 

Living Machines: the language of animacy

How did people in the 19th century imagine the relationship between humans and machines? In what ways have machines appeared to be alive? Using a number of large text corpora we created language models to help explore this question, which has long preoccupied historians. This research has produced papers in linguistics, with papers in history to follow, as well as freely available datasets and models.

 

Using 19th-century Maps

Industrialisation is visually evident on tens of thousands of Ordnance Survey maps printed during the 19th century. We have created new, computational tools (MapReader) to navigate and analyse digitised map collections from the National Library of Scotland and the British Library. Working with maps in this way allows us to turn visual content into machine-readable data. This means we can link information from maps to other sources, like place names in newspapers or streets in the census. 

 

Using 19th-century Census Microdata

Historical census information is a fantastic tool for researchers interested in how people were employed in the past. 

We have also used census data to investigate how new technologies and machines changed the type of work that people were doing. To do this, we developed algorithms to follow people through multiple censuses, to understand how their employment (and where they lived) changed over their lifetime.

 

Beyond the Tracks Research

Building on the methodological work to analyse Ordnance Survey maps and census data independently, we are now also bringing these massive Victorian datasets together. Using a further dataset we created that locates all passenger railway stations in Britain, we explore who lived close to or distant from railway infrastructure and stations. By combining these materials we are able to generate national coverage of all cities, towns, and villages across England and Wales. This research highlights how linking historical information by place sheds light on social change.

 

Making Spatial Humanities Data

The British Library hosts one of the largest collections of historical maps. Living with Machines has contributed to the digitisation and georeferencing of over 15,000 new maps with the collaboration of the National Library of Scotland. 

 

New collaborative methodologies

Bringing together experts and scholars from different fields, from historians to computational linguists and from copyright managers to software developers, we have also experimented with new ways of working collaboratively, using our expertise to shape new avenues for future research in the ever-expanding world of digital humanities. This experience is described in the forthcoming book Collaborative Historical Research in the Age of Big Data (CUP).

 

Sharing knowledge about AI with the GLAM sector

We have shared some of the methodologies used within the project by organising tutorials and workshops, and have contributed to making AI more accessible in the GLAM sector. One of the most important tutorials is the one released on The Programming Historian.

Datasets

Digitised collections

Press Directories

Press Directories list the newspapers published in a given year, with locations, political leanings, cost, distribution and other information. Find out more about the Victorian newspaper landscape or view automatically transcribed text on the British Library open repository.

 

Historical Newspapers

We digitised over half a million pages of historical newspapers and made them ‘free to view’ on the British Newspaper Archive website. They cover a period ranging from c. 1780 to c. 1920 and cover most areas of England (excluding London).

 

Georeferenced OS Maps

Around 15,000 historical OS Maps have been digitised and fully georeferenced. We’re working on making these publicly available.

 

Pandaemonium: the coming of the machine as seen by contemporary observers, 1660-1886

From the personal papers of Humphrey Jennings, we digitised the original manuscript material for the posthumously published book (André Deutsch, 1985), a compendium of primary sources and testimonies about industrialisation in Britain. This archive contains three times more material than previously published. We are working to make these texts publicly available.

 

Derived datasets (new information or datasets that are created from existing data)

 

StopsGB (Structured Timeline of Passenger Stations in Great Britain)

Taking Michael Quick’s book Railway Passenger Stations in Great Britain: a Chronology as a starting source, we transformed the listing of over 12,000 stations into a structured data format. Each station is given attributes such as operating companies and opening and closing dates. Where possible, they’re georeferenced and linked to Wikidata. This structured, linked, and georeferenced dataset could be a key resource for historical, digital library and semantic web communities, and others researching the impact of the railway in Great Britain. https://bl.iro.bl.uk/concern/datasets/0abea1b1-2a43-4422-ba84-39b354c8bb09.

Related code is below.

 

Dataset for Toponym Resolution in Nineteenth-Century English Newspapers

455 annotated articles from newspapers based in four different locations in England (Manchester, Ashton-under-Lyne, Poole and Dorchester), published between 1780 and 1870. Place names within articles were manually annotated and linked (where possible) to Wikipedia. The dataset was produced to aid toponym resolution in English-language digitised historical newspapers. The dataset is especially of interest to researchers working on improving semantic access to historical newspaper content. https://bl.iro.bl.uk/concern/datasets/f3686eb9-4227-45cb-9acb-0453d35e6a03

 

Living Machines Atypical Animacy dataset

The Atypical Animacy detection dataset is based on nineteenth-century sentences in English extracted from an open dataset of nineteenth-century books digitised by the British Library (available via https://doi.org/10.21250/db14, British Library Labs, 2014). This dataset contains 598 sentences containing mentions of machines. Each sentence has been annotated according to the animacy and humanness of the machine in the sentence. This dataset has been created as part of the following paper: Ardanuy, M. C., F. Nanni, K. Beelen, Kasra Hosseini, Ruth Ahnert, J. Lawrence, Katherine McDonough, Giorgia Tolfo, D.C.S. Wilson and B. McGillivray. “Living Machines: A study of atypical animacy.” In Proceedings of the 28th International Conference on Computational Linguistics (COLING2020). https://bl.iro.bl.uk/work/323177af-6081-4e93-8aaf-7932ca4a390a

Related code is below.

 

Living with Machines alpha and beta Zooniverse ‘accident’ task data

Annotations on some 19th century newspaper articles that possibly mentioned accidents involving machinery, via crowdsourcing tasks on the Zooniverse platform. Some personal, organisational and place names mentioned were transcribed with a brief summary of relevant accidents. https://doi.org/10.23636/1197

 

Neural Language Models for Historical Research

Four types of pre-trained neural language models trained on a large historical dataset of books in English, published between 1760 and 1900 and composed of ~5.1 billion tokens. The language model architectures include word type embeddings (word2vec and fastText) and contextualised models (BERT and Flair). For each architecture, we trained a model instance using the whole dataset. Additionally, we trained separate instances on text published before 1850 for the type embeddings (i.e., word2vec and fastText), and four instances considering different time slices for BERT. http://doi.org/10.5281/zenodo.4782245

Related code is below.

 

Geocoded census data (work-in-progress)

I-CeM census data enhanced with links to OS Open Roads and GB1900.

 

Geocoded census data enhanced with StopsGB distance and MapReader metrics (work-in-progress)

Geocoded census data enhanced with the distance to the nearest StopsGB station and a railspace score derived from MapReader-identified patches.
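
As a rough illustration of the "distance to nearest station" enrichment described above, the sketch below computes great-circle (haversine) distances from one geocoded address to a handful of stations and keeps the closest. It is only a sketch under stated assumptions: the function names, the station names and the coordinates are all hypothetical, and the project's own pipeline may compute distances differently (for example in a projected coordinate system).

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, a common approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_station(address, stations):
    """Return (station_name, distance_km) for the station closest to `address`.

    `address` is a (lat, lon) pair; `stations` maps a name to a (lat, lon) pair.
    """
    lat, lon = address
    return min(
        ((name, haversine_km(lat, lon, s_lat, s_lon))
         for name, (s_lat, s_lon) in stations.items()),
        key=lambda pair: pair[1],
    )


# Hypothetical coordinates for illustration only (not taken from StopsGB).
stations = {
    "Station A": (53.80, -1.55),
    "Station B": (53.48, -2.24),
}
name, dist = nearest_station((53.79, -1.54), stations)
```

A brute-force scan like this is fine for a few thousand stations; for national census coverage a spatial index would be the more usual choice.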

Data visualisations and visualisation tools

Macromap

‘Macromap’ is an interactive ‘small multiples’ visualisation for historical map collections. It is designed to help researchers understand what map sheets the British Ordnance Survey (OS) made, when and where. The interface could alternatively be used with other historical maps and map series metadata, and, more generally, to understand the geographic and temporal shape of large-scale polygon datasets. Find out more at our blog post Macromap: Interactive Maps in Time and Observable Macromap Notebook.

Press Tracer

A data visualisation to help trace the lineage of historical newspaper titles in the British Library. Find out more at Press Tracer: Visualise Newspaper Lineage and dig into the code behind Press Tracer at https://observablehq.com/@oliviafvane/press-tracer.

Branching sparklines line graphs

A notebook that demonstrates the branching design used in Press Picker: an interactive visualisation tool for newspaper metadata at the British Library.

https://observablehq.com/@oliviafvane/branching-sparklines-line-graphs

Exhibition and Public engagement

Exhibition

Our free exhibition, Living with Machines: human stories from the industrial age, is open at Leeds City Museum from July 29, 2022 to January 8, 2023.

 

Living with Machines: human stories in the industrial age, British Library with Leeds City Museum, 29 July 2022 - 08 January 2023

 

Crowdsourcing – digital volunteering to create research data

Crowdsourcing around digitised collections was built into our public engagement plans from the start of the project. Our tasks have been designed to expose some of the processes of data science and digital history to participants, while also collecting data to the quality required for the computational linguistic processes they support.

Our projects are built on Zooniverse and available at https://www.zooniverse.org/projects/bldigital/living-with-machines/.

Prizes and honours

Inter Circle U. Prize

Barbara McGillivray was awarded the Inter Circle U. Prize for inter- and transdisciplinary research for “The Language of Mechanisation” project, joined by Jon Lawrence, Mia Ridge, Kalle Westerling, Giorgia Tolfo and Nilo Pedrazzini. The prize is co-funded by the European Union’s Horizon 2020 Research and Innovation Programme.

Software packages, research tools and code supporting published papers

Software packages

Alto2Text

A plain text and metadata extraction tool that processes XML in METS/ALTO format into plain text and metadata fields. Will be available soon, in XSLT and Python versions.

 

DeezyMatch – A Flexible Deep Neural Network Approach to Fuzzy String Matching.

DeezyMatch, a new deep learning approach to fuzzy string matching and candidate ranking, is free, open-source software. It addresses advanced string matching and candidate ranking challenges in a comprehensive and integrated manner. https://github.com/Living-with-machines/DeezyMatch

 

Zoonyper

Zoonyper is a work-in-progress Python library that facilitates interpretation and wrangling for Zooniverse files in Jupyter and Python more generally.

 

MapReader

MapReader is a free open-source software library written in Python for analysing large map collections. MapReader allows users with little or no computer vision expertise to i) retrieve maps via web-servers; ii) preprocess and divide them into patches; iii) annotate patches; iv) train, fine-tune, and evaluate deep neural network models; and v) create structured data about map content. https://living-with-machines.github.io/MapReader/
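
To make step ii) above more concrete, here is a minimal sketch of dividing a map image into non-overlapping square patches. It is not MapReader's API: the function name `slice_into_patches` and the toy "map sheet" are hypothetical, and a plain list of rows stands in for a real raster image.

```python
def slice_into_patches(image, patch_size):
    """Yield (top, left, patch) for non-overlapping square patches.

    `image` is a list of equal-length rows of pixel values.
    Patches on the right and bottom edges may be smaller than `patch_size`.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            patch = [row[left:left + patch_size]
                     for row in image[top:top + patch_size]]
            yield top, left, patch


# A toy 4x6 "map sheet" whose pixel values are simply their running index.
sheet = [[r * 6 + c for c in range(6)] for r in range(4)]
patches = list(slice_into_patches(sheet, patch_size=2))
```

Keeping the `top` and `left` offsets with each patch is what lets a label predicted for a patch be placed back on the map, and from there linked to other geographic data.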

 

PressPicker

PressPicker is a software package created to help the selection of newspapers to digitise in Living with Machines. Thanks to this Jupyter Notebook-based tool we could cater for research-led and practical factors, e.g. selecting titles by format, viewing the holdings of a specific title at a glance, and predicting retrieval and scanning timeframes and costs. Find out more about how PressPicker was created and how it works.

 

nnanno

Newspapers are a visual medium, communicating through text and visual information such as photographs, comics, maps and other images. Research on images within newspapers is advancing as developments in computer vision are powered by deep learning-based approaches. A recent project, Newspaper Navigator, from Benjamin Lee and the Library of Congress Labs extracted visual content from Chronicling America.

To help make it easier to work with this dataset, we created a tool called nnanno. nnanno helps with sampling from the newspaper navigator dataset, downloading images, annotating and experimentally applying computer vision models to the newspaper navigator data. https://github.com/Living-with-machines/nnanno

 

Flyswot

flyswot is a command line tool which allows you to run Hugging Face Transformers image classification models available via the Hugging Face Hub 🤗 against a directory of images. It returns a CSV report containing the model's predictions. It is being used by the Heritage Made Digital team at the British Library to run computer vision models which predict whether an image of a manuscript page contains a ‘flysheet’ or not. https://github.com/davanstrien/flyswot

 

Hugit

hugit is a command line tool for loading ImageFolder style datasets into a HuggingFace 🤗 dataset and pushing to the HuggingFace 🤗 hub.

The primary goal of hugit is to help quickly get a local dataset into a format that can be used for training computer vision models. hugit was developed to support the workflow for flyswot where we wanted a quicker iteration between creating new training data, training a model, and using the new model inside flyswot.

 

Census Geo-coder (work-in-progress)

A Python package that can link historic Great British Census data to existing GIS datasets of streets using geo-blocking and fuzzy string matching. Currently we use OS Open Roads and GB1900 but it will accept any GIS of roads. https://github.com/Living-with-machines/historic-census-gb-geocoder
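
The combination of geo-blocking and fuzzy string matching mentioned above can be sketched as follows. This is an illustration under stated assumptions rather than the package's actual code: the function names, the parish labels and the default threshold are hypothetical, and Python's standard `difflib.SequenceMatcher` stands in for whatever similarity measure the package uses.

```python
from difflib import SequenceMatcher


def normalise(name):
    """Lower-case, drop full stops and collapse whitespace."""
    return " ".join(name.lower().replace(".", "").split())


def best_street_match(census_street, parish, gis_streets, threshold=0.8):
    """Return (street_name, score) for the best match, or None.

    `gis_streets` is a list of (parish, street_name) pairs. Only streets in
    the same parish are compared (the blocking step); the survivors are then
    scored with a fuzzy string similarity between 0 and 1.
    """
    candidates = [street for p, street in gis_streets if p == parish]
    scored = [
        (street, SequenceMatcher(None, normalise(census_street), normalise(street)).ratio())
        for street in candidates
    ]
    if not scored:
        return None
    best = max(scored, key=lambda pair: pair[1])
    return best if best[1] >= threshold else None


# Hypothetical street list for illustration only.
gis_streets = [
    ("Parish X", "High Street"),
    ("Parish X", "Church Lane"),
    ("Parish Y", "High Street"),
]
match = best_street_match("Hgh Street", "Parish X", gis_streets)
```

Blocking by area keeps the number of string comparisons manageable and avoids linking an address to a same-named street elsewhere in the country.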

 

Research Tools

Neural Language Models for Historical Research

Four types of pre-trained neural language models trained on a large historical dataset of books in English, published between 1760 and 1900 and composed of ~5.1 billion tokens. The language model architectures include word type embeddings (word2vec and fastText) and contextualised models (BERT and Flair). For each architecture, we trained a model instance using the whole dataset. Additionally, we trained separate instances on text published before 1850 for the type embeddings (i.e., word2vec and fastText), and four instances considering different time slices for BERT. This repository describes the models and includes code that shows how to use them: https://github.com/Living-with-machines/histLM

 

Living with Machines GitHub Stats report

GitHub provides statistics for repositories which include views and clones traffic. However, by default, this information is only shown for two weeks. This repository uses GitHub Actions and gh_orgstats to grab data every week and update a CSV file for public repositories under the Living with Machines GitHub Organization. You can find this documented in more detail below. This repository also uses Jupyter notebooks and nbconvert to update a report based on these GitHub stats automatically: https://github.com/Living-with-machines/github_stats_report

 

Gh_orgstats

Gh_orgstats https://github.com/Living-with-machines/gh_orgstats is a small Python wrapper for retrieving GitHub stats for a particular organization.

 

Zooniverse images uploader

Images from the digitised newspaper articles were selected and uploaded to Zooniverse for annotation. Defoe, a Spark-based toolbox for analysing digital historical textual data, was used to select the images for annotation on Zooniverse. It can also be used in tasks such as sentence/document classification. https://github.com/alan-turing-institute/Living-with-Machines-code/tree/master/communities-mro/zooniverse/zooniverse_upload

 

Word2vec model explorers and lexicon expansion

Notebook for exploring word2vec models in order to build a lexicon that can trace certain topics in a collection. The Lexicon Expansion Interface allows users to navigate a vector space and expand a list of seed words into a Lexicon. https://github.com/alan-turing-institute/Living-with-Machines-code/tree/lexicon-expansion/language-lab-mro/lexicon_expansion/interactive_expansion/expansion_tools
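
The idea of expanding a list of seed words by navigating a vector space can be sketched with a few lines of plain Python. This is a minimal illustration, not the notebook's code: the function names are hypothetical and the three-dimensional vectors are made up, whereas a real word2vec model has hundreds of dimensions and would normally be queried through a library such as Gensim.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def expand_lexicon(seeds, vectors, top_n=2):
    """Suggest the `top_n` words closest to the centroid of the seed words.

    Seed words themselves are excluded from the suggestions.
    """
    dims = len(next(iter(vectors.values())))
    centroid = [sum(vectors[s][i] for s in seeds) / len(seeds) for i in range(dims)]
    ranked = sorted(
        ((word, cosine(centroid, vec)) for word, vec in vectors.items() if word not in seeds),
        key=lambda pair: pair[1],
        reverse=True,
    )
    return ranked[:top_n]


# Made-up vectors for illustration only.
toy_vectors = {
    "machine": [0.9, 0.1, 0.0],
    "engine": [0.8, 0.2, 0.1],
    "loom": [0.7, 0.3, 0.0],
    "boiler": [0.75, 0.15, 0.2],
    "garden": [0.0, 0.2, 0.9],
}
suggestions = expand_lexicon(["machine", "engine"], toy_vectors, top_n=2)
```

In an interactive setting the researcher would review each suggestion, add the accepted words to the seed list and repeat, so the lexicon grows under human supervision rather than automatically.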

 

Pipeline for processing the Newspaper Press Directories.

The series of notebooks includes a pipeline for processing the OCR (derived from the scans of Mitchell’s Press Directories). The stages include: annotation, preprocessing, automatic tagging and database ingest. https://github.com/alan-turing-institute/Living-with-Machines-code/tree/master/sources-lab-mro/ndp_pipeline

 

Bl-books-genre detection model

This fine-tuned distilbert-base-cased model is trained to predict whether a book from the British Library’s Digitised printed books (18th-19th century) collection is fiction or non-fiction based on the title of the book.

 

Flyswot computer vision model

A fine-tuned computer vision model (convnext-tiny-224) that has been trained to classify different types of digitised manuscript pages in order to detect digitised manuscripts which have incorrect metadata associated with them.

 

Code supporting published papers

Code for Targeted Sense Disambiguation

Underlying code and materials for the paper ‘When Time Makes Sense: A Historically-Aware Approach to Targeted Sense Disambiguation’ (https://aclanthology.org/2021.findings-acl.243.pdf). Time-sensitive Targeted Sense Disambiguation (TSD) aims to detect instances of a sense or set of related senses in historical and time-stamped texts. It aims to 1) scrutinise the effect of applying historical language models on the performance of several TSD methods and 2) assess different disambiguation methods that take into account the year in which a text was produced.  https://github.com/Living-with-machines/TargetedSenseDisambiguation

 

Code for Atypical Animacy

Underlying code and materials for the paper ‘Living Machines: A Study of Atypical Animacy’ (COLING2020) https://github.com/Living-with-machines/AtypicalAnimacy/

 

Code for Station to Station: Linking and Enriching Historical British Railway Data & StopsGB

Underlying code and materials for the paper ‘Station to Station: Linking and Enriching Historical British Railway Data’.

https://github.com/Living-with-machines/station-to-station

 

Code and supplementary material for the paper ‘A Deep Learning Approach to Geographical Candidate Selection through Toponym Matching’

Underlying code and materials for the paper ‘A Deep Learning Approach to Geographical Candidate Selection through Toponym Matching’, accepted to SIGSPATIAL2020 as a poster paper. This work looks for potential locations for each toponym (place name) identified in text. It addresses the issues of a high degree of variation in toponyms (due to regional spelling differences, transliterations strategies, cross-language and diachronic variation) and variations due to OCR errors.

Supplementary material: https://zenodo.org/record/4034818

Code: https://github.com/Living-with-machines/LwM_SIGSPATIAL2020_ToponymMatching

 

Code for the paper ‘Assessing the Impact of OCR Quality on Downstream NLP Tasks’

Underlying code for the paper ‘Assessing the Impact of OCR Quality on Downstream NLP Tasks’. The code runs the experiments reported in the paper and generates its figures. https://github.com/alan-turing-institute/lwm_ARTIDIGH_2020_OCR_impact_downstream_NLP_tasks

 

Code for the paper ‘Resolving Places, Past and Present: Toponym Resolution in Historical British Newspapers Using Multiple Resources’.

Resolving Places is one of the first outputs of Living with Machines, a collaborative digital history project at The Alan Turing Institute and the British Library. This research is part of our work to build a nineteenth-century gazetteer that combines place names derived from historical sources (GB1900) with online resources (Wikipedia and Geonames). The Living with Machines gazetteer follows best practices in combining multiple existing resources, and is novel in accounting for places that have different scales (e.g. streets, buildings, cities, counties). https://github.com/alan-turing-institute/lwm_GIR19_resolving_places/

Workshops and tutorials

Zooniverse: how to download and analyse your task annotations

This workshop was created for British Library staff to introduce the widely used Zooniverse platform and the services it offers, and to share new developments in using the Library’s IIIF items on Zooniverse. It also aimed to teach how to process annotations to obtain clean and readable spreadsheets for use in personal and library projects.

 

Genre Classification

This Jupyter book was created to document work to develop a machine learning model and associated datasets with the goal of classifying the genre of books from the British Library. The tutorial is available here: https://github.com/Living-with-machines/genre-classification and there’s background for the project here: https://livingwithmachines.ac.uk/can-we-use-machine-learning-to-classify-whether-a-book-is-fiction-or-non-fiction-from-its-title/

 

Intro to AI for GLAM

This Carpentries lesson was developed with the aim of empowering GLAM (Galleries, Libraries, Archives, and Museums) staff by providing the foundation to support, participate in and begin to undertake, in their own right, machine learning-based research and projects with heritage collections. https://carpentries-incubator.github.io/machine-learning-librarians-archivists/

 

How to use Jupyter notebooks

A workshop given as part of the Digital Scholarship “hack and yack” cycle. It aims to explain what Jupyter notebooks are and why they are used, to show how notebooks created by other people can be run, and to introduce the learner to some weird and wonderful things that can be done with notebooks. https://github.com/Living-with-machines/Jupyter-Notebooks-The-Weird-and-Wonderful

 

Working with maps at scale using Computer Vision and Jupyter notebooks

A workshop delivered as part of Digital Humanities and Digital Archives workshop at the National Library of Estonia to show how Jupyter notebooks can be particularly useful for working with digitised collections at scale, to give a brief sense of what is possible using computer vision with image collections and give some ideas for how existing GLAM infrastructure (in this case IIIF) can support new machine learning-based approaches.

https://github.com/Living-with-machines/maps-at-scale-using-computer-vision-and-jupyter-notebooks

 

Computer-Vision-for-the-Humanities-workshop

This workshop aims to provide an introduction to computer vision for humanities uses. In particular this workshop focuses on providing a high level overview of machine learning based approaches to computer vision focusing on supervised learning. The workshop includes discussion on working with historical data. The materials are based on a two-part Programming Historian lesson.

 

Programming Historian: Computer vision for the humanities: an introduction to deep learning for image classification

A two-part Programming Historian tutorial which aims to introduce humanities researchers, or those working with humanities data, to deep learning-based computer vision methods. Work-in-progress versions of part one and part two are available.

 

image-search: Materials for a workshop on image search for heritage data

Materials for a workshop on image search with a focus on heritage data. The workshop is based on a blog post Image search with HuggingFace 🤗datasets but goes into a little bit more detail: https://github.com/Living-with-machines/image-search

Publications

Books

  • Ahnert, Ruth, et al. Collaborative Historical Research in the Age of Big Data. Cambridge University Press, forthcoming.

Articles, book chapters and conference proceedings

  • Ardanuy, Mariona Coll, et al. “Resolving Places, Past and Present: Toponym Resolution in Historical British Newspapers Using Multiple Resources.” Proceedings of the 13th Workshop on Geographic Information Retrieval, Association for Computing Machinery, 2019, pp. 1–6, https://doi.org/10.1145/3371140.3371143.
  • Arenas, Diego, et al. Design Choices for Productive, Secure, Data-Intensive Research at Scale in the Cloud. arXiv:1908.08737, arXiv, 15 Sept. 2019, http://arxiv.org/abs/1908.08737.
  • Beelen, Kaspar, Jon Lawrence, Daniel C. S. Wilson, and David Beavan. “Bias and Representativeness in Digitized Newspaper Collections: Introducing the Environmental Scan.” Digital Scholarship in the Humanities, July 2022, p. fqac037, https://doi.org/10.1093/llc/fqac037.
  • Beelen, Kaspar, Ruth Ahnert, David Beavan, Mariona Coll Ardanuy, Kasra Hosseini, et al. Contextualizing Victorian Newspapers. https://dh2020.adho.org/wp-content/uploads/2020/07/621_ContextualizingVictorianNewspapers.html.
  • Beelen, Kaspar, Ruth Ahnert, David Beavan, Mariona Coll Ardanuy, Emma Griffin, et al. Living with Machines: Exploring Bias in the British Newspaper Archive.
  • Beelen, Kaspar, Federico Nanni, Mariona Coll Ardanuy, Kasra Hosseini, et al. “When Time Makes Sense: A Historically-Aware Approach to Targeted Sense Disambiguation.” Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021, Association for Computational Linguistics, 2021, pp. 2751–61, https://doi.org/10.18653/v1/2021.findings-acl.243.
  • Boyd Davis, Stephen, et al. “Can I Believe What I See? Data Visualization and Trust in the Humanities.” Interdisciplinary Science Reviews, vol. 46, no. 4, Oct. 2021, pp. 522–46, https://doi.org/10.1080/03080188.2021.1872874.
  • Coll Ardanuy, Mariona, Kasra Hosseini, et al. A Deep Learning Approach to Geographical Candidate Selection through Toponym Matching.
  • Coll Ardanuy, Mariona, Federico Nanni, et al. “Living Machines: A Study of Atypical Animacy.” Proceedings of the 28th International Conference on Computational Linguistics, International Committee on Computational Linguistics, 2020, pp. 4534–45, https://doi.org/10.18653/v1/2020.coling-main.400.
  • Coll Ardanuy, Mariona, Kaspar Beelen, et al. “Station to Station: Linking and Enriching Historical British Railway Data.” Proceedings of the Conference on Computational Humanities Research 2021, Proceedings of the Conference on Computational Humanities Research 2021.
  • Ryan, Yann, Mariona Coll Ardanuy, Daniel van Strien, Kasra Hosseini, Kaspar Beelen, James Hetherington, Katherine McDonough, Barbara McGillivray, Mia Ridge, Olivia Vane, and Daniel C. S. Wilson. Using Smart Annotations to Map the Geography of Newspapers. https://dh2020.adho.org/wp-content/uploads/2020/07/532_Usingsmartannotationstomapthegeographyofnewspapers.html. Accessed 13 July 2022.
  • Darby, Andrew, et al. AI Training Resources for GLAM: A Snapshot. arXiv, 10 May 2022, http://arxiv.org/abs/2205.04738.
  • Data Study Group Team. Data Study Group Final Report: The National Archives, UK. Zenodo, 18 June 2021, https://doi.org/10.5281/ZENODO.4981184.
  • —. Data Study Group Final Report: WWF. Zenodo, 5 June 2020, https://doi.org/10.5281/ZENODO.3878457.
  • De Toni, Francesco, et al. Entities, Dates, and Languages: Zero-Shot on Historical Texts with T0. arXiv:2204.05211, arXiv, 11 Apr. 2022, http://arxiv.org/abs/2204.05211.
  • Filgueira, Rosa, et al. “Defoe: A Spark-Based Toolbox for Analysing Digital Historical Textual Data.” 2019 15th International Conference on EScience (EScience), IEEE, 2019, pp. 235–42, https://doi.org/10.1109/eScience.2019.00033.
  • Hosseini, Kasra, Federico Nanni, et al. “DeezyMatch: A Flexible Deep Learning Approach to Fuzzy String Matching.” Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, Association for Computational Linguistics, 2020, pp. 62–69, https://doi.org/10.18653/v1/2020.emnlp-demos.9.
  • Hosseini, Kasra, Daniel C. S. Wilson, et al. MapReader: A Computer Vision Pipeline for the Semantic Exploration of Maps at Scale. arXiv:2111.15592, arXiv, 30 Nov. 2021, http://arxiv.org/abs/2111.15592.
  • Hosseini, Kasra, Katherine McDonough, et al. “Maps of a Nation? The Digitized Ordnance Survey for New Historical Research.” Journal of Victorian Culture, vol. 26, no. 2, May 2021, pp. 284–99, https://doi.org/10.1093/jvcult/vcab009.
  • Hosseini, Kasra, Kaspar Beelen, et al. Neural Language Models for Nineteenth-Century English. 24 May 2021, http://arxiv.org/abs/2105.11321.
  • McGillivray, Barbara, et al. The Challenges and Prospects of the Intersection of Humanities and Data Science: A White Paper from The Alan Turing Institute. Aug. 2020, https://doi.org/10.6084/M9.FIGSHARE.12732164.
  • McMillan-Major, Angelina, et al. Documenting Geographically and Contextually Diverse Data Sources: The BigScience Catalogue of Language Data and Resources. arXiv:2201.10066, arXiv, 24 Jan. 2022, http://arxiv.org/abs/2201.10066.
  • Ridge, Mia, et al. Historic Machines from “prams” to “Parliament”: New Avenues for Collaborative Linguistic Research. May 2022, https://doi.org/10.5281/zenodo.6578021.
  • Strien, Daniel van, et al. “An Introduction to AI for GLAM.” Proceedings of the Second Teaching Machine Learning and Artificial Intelligence Workshop, PMLR, 2022, pp. 20–24, https://proceedings.mlr.press/v170/strien22a.html.
  • van Strien, D., et al. Assessing the Impact of OCR Quality on Downstream NLP Tasks. 2020, https://doi.org/10.17863/CAM.52068.
  • Wilson, Daniel. “How We Got Here.” Time Travelers: Victorian Encounters with Time and History, edited by Adelene Buckland et al., University of Chicago Press, https://press.uchicago.edu/ucp/books/book/chicago/T/bo50699947.html. Accessed 13 July 2022.

Datasets, software and code underlying papers

  • British Library. 19th Century Books – Metadata with Additional Crowdsourced Annotations. British Library, 2021, https://doi.org/10.23636/BKHQ-0312.
  • British Library Labs, and British Library. Digitised Books. c. 1510 – c. 1900. JSONL (OCR Derived Text + Metadata). British Library, 2021, https://doi.org/10.23636/R7W6-ZY15.
  • Code and Supplementary Material for the Paper ‘A Deep Learning Approach to Geographical Candidate Selection through Toponym Matching.’ https://github.com/Living-with-machines/LwM_SIGSPATIAL2020_ToponymMatching.
  • Code for Atypical Animacy. https://github.com/Living-with-machines/AtypicalAnimacy/.
  • Code for Station to Station: Linking and Enriching Historical British Railway Data & StopsGB. https://github.com/Living-with-machines/station-to-station.
  • Code for Targeted Sense Disambiguation. https://github.com/Living-with-machines/TargetedSenseDisambiguation.
  • Code for the Paper “Assessing the Impact of OCR Quality on Downstream NLP Tasks.” https://github.com/Living-with-machines/lwm_ARTIDIGH_2020_OCR_impact_downstream_NLP_tasks.
  • Code for the Paper “Resolving Places, Past and Present: Toponym Resolution in Historical British Newspapers Using Multiple Resources.” https://github.com/Living-with-machines/lwm_GIR19_resolving_places.
  • Coll Ardanuy, Mariona, David Beavan, et al. “A Dataset for Toponym Resolution in Nineteenth-Century English Newspapers.” Journal of Open Humanities Data, vol. 8, Jan. 2022, p. 3, https://doi.org/10.5334/johd.56.
  • —. Dataset for Toponym Resolution in Nineteenth-Century English Newspapers.
  • Coll Ardanuy, Mariona, Kaspar Beelen, et al. StopsGB: Structured Timeline of Passenger Stations in Great Britain.
  • DeezyMatch. https://github.com/Living-with-machines/DeezyMatch.
  • Gh_orgstats. https://github.com/Living-with-machines/github_stats_report.
  • Hetherington, J., et al. Defoe – Analysis of Historical Books and Newspapers Data. 2020, https://github.com/alan-turing-institute/defoe.
  • Neural Language Models for Historical Research. https://github.com/Living-with-machines/histLM.
  • Nnanno. https://github.com/Living-with-machines/nnanno.
  • Press Directories // British Library. https://bl.iro.bl.uk/collections/580fe312-0e41-41fc-bb38-40122798cec1. Accessed 19 July 2022.
  • Ridge, Mia, et al. Living with Machines Alpha and Beta Zooniverse “accident” Task Data. British Library, 3 Sept. 2020, https://doi.org/10.23636/1197.
  • Tolfo, Giorgia, et al. Living Machines Atypical Animacy Dataset. British Library, 30 Oct. 2020, https://doi.org/10.23636/1215.
  • van Strien, Daniel. 19th Century United States Newspaper Advert Images with “illustrated” or “Non Illustrated” Labels. Zenodo, 14 Oct. 2021, https://doi.org/10.5281/ZENODO.5838410.
  • —. 19th Century United States Newspaper Images Predicted as Photographs with Labels for “Human”, “Animal”, “Human-Structure” and “Landscape.” Zenodo, 11 Jan. 2022, https://doi.org/10.5281/ZENODO.4487141.
  • —. British Library Books Genre Detection Model. Zenodo, 30 Sept. 2021, https://doi.org/10.5281/ZENODO.5245175.
  • van Strien, Daniel. Flyswot-Models: 20210922. Zenodo, 2021, https://doi.org/10.5281/ZENODO.5521125.
  • van Strien, Daniel. Images from Newspaper Navigator Predicted as Maps, with Human Corrected Labels. Zenodo, 30 Oct. 2020, https://doi.org/10.5281/ZENODO.4156510.

Podcast episodes

  • Beavan, David, and Kasra Hosseini. 5 August 2022. ‘Living with Machines’. The Turing Podcast. https://turing.podbean.com/e/ttp-lwm/.
  • Sudbery, Clare, and Mia Ridge. 31 August 2021. ‘Crowdsourcing, with Dr Mia Ridge – Made Tech’. Making Tech Better. https://www.madetech.com/resources/podcasts/episode-14-mia-ridge-2/.
  • Vilcins, S. “Archiving, Curating and Digging for Data.” Free Thinking, BBC Radio 3, 12 May 2021, https://www.bbc.co.uk/programmes/m000vydf.

Our Partners

Our Funders