Introducing the Sources Lab

Written by David Beavan, July 3, 2019

On the face of it, what the Sources Lab does sounds really straightforward: what do the historical sources say, and what new stories of the industrial revolution can we tell from them? Of course, it’s never quite as easy as it sounds.

We must fully understand the inherent biases in our sources (see ‘Transparency’) and how they can skew our understanding of history. For instance, newspapers in the past, like those today, were targeted at particular audiences, divided along lines of politics, place and socio-economic class. As a result, they contain differing viewpoints and voices. Creating a more representative account of the past requires first seeing, then balancing, these views so that we can explore the plurality of histories, including minority voices that are too easily lost in the mix.

Living with Machines is operating at an enormous scale. This means we cannot possibly read all our sources ourselves (good luck with 20 million pages of newspapers, let alone every census record and map) and must rely on digitised, machine-readable sources. This brings additional challenges to the project. What has been digitised, and what has not? Are the digital sources available to us a broadly representative sample of the whole? Or were they selected more opportunistically and are, therefore, skewed? Armed with answers to these questions, we can focus future digitisation efforts strategically. Of course, others are asking similar questions.

The digital versions we use often contain errors introduced by the digitisation process itself, as in the case of printed text, where an image of the page has been read by automated Optical Character Recognition (OCR) software. Text in small typefaces on cheap paper from 150 years ago can be difficult for a human to read, and it is just as hard for a computer. Living with Machines (LwM) will evaluate the extent of these inaccuracies across our corpus and assess when we can accept errors and when we cannot, depending on the task.
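One common way to put a number on OCR inaccuracy, offered here as an illustrative sketch rather than the project's own method, is the character error rate (CER): the edit (Levenshtein) distance between the OCR output and a hand-corrected transcription of the same text, divided by the length of that transcription. The function names below are hypothetical, and the sketch assumes a manually checked ground-truth transcription exists for a sample of pages.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn string a into string b."""
    # prev holds the edit distances from the previous prefix of a to every prefix of b
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


def character_error_rate(ocr_text: str, ground_truth: str) -> float:
    """Hypothetical helper: edit distance normalised by the length of the
    hand-corrected ground-truth text (assumed to be available)."""
    if not ground_truth:
        raise ValueError("ground truth must be non-empty")
    return levenshtein(ocr_text, ground_truth) / len(ground_truth)


if __name__ == "__main__":
    # Two misread characters ('b' for 'h', '1' for 'l') in a 19-character line
    print(character_error_rate("Tbe mill was c1osed", "The mill was closed"))
```

In this example the two substitutions over 19 characters give a CER of about 0.105. Whether a rate like that is acceptable is not fixed in advance: it depends on the task at hand.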

Then there’s the question of what happens when different sources are linked, connecting people in census records to maps, biographies and news, but that will have to wait for another blog post.

Overall, we want scholars to be informed about the biases in what they search and see, so that they can more easily understand what their findings are based on, and what they may be missing in the process.

