Introducing the Sources Lab
On the face of it, what Sources Lab does sounds really straightforward: what do the historical sources say, and what new stories of the industrial revolution can we tell from them? Of course, it’s never quite as easy as it sounds.
We must fully understand the inherent biases in our sources (see ‘Transparency’) and how they can skew our understanding of history. For instance, newspapers in the past, like those today, were targeted at particular audiences, divided along lines of politics, place and socio-economic class. As such, they contain differing viewpoints and voices. Creating a more representative account of the past requires first seeing and then balancing these views in order to explore the plurality of histories, including minority voices too easily lost in the mix.
Living with Machines is operating at an enormous scale. This means we cannot humanly read all our sources (good luck with 20 million pages of newspapers, let alone every census record and map) and must rely on digitised, machine-readable sources. This brings additional challenges to the project. What has been digitised and what has not? Are the digital sources we have available a broadly representative sample of the whole? Or might the selection criteria have been more opportunistic and, therefore, skewed? Once we have answers to these questions, we can focus new digitisation efforts strategically. Of course, others are asking similar questions.
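One simple way to start probing the representativeness question is to compare how categories are distributed in the whole collection against how they are distributed in the digitised subset. The sketch below is only an illustration under stated assumptions: the function names, the categories and every count are invented for the example and do not describe the project's actual data or method.

```python
def shares(counts):
    """Convert raw counts per category into proportions of the total."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("counts must contain at least one item")
    return {category: count / total for category, count in counts.items()}


def representation_gap(collection_counts, digitised_counts):
    """Digitised share minus collection share, for each category.

    A positive value means the category is over-represented in the
    digitised subset; a negative value means it is under-represented.
    """
    collection_shares = shares(collection_counts)
    digitised_shares = shares(digitised_counts)
    return {
        category: digitised_shares.get(category, 0.0) - share
        for category, share in collection_shares.items()
    }


# Hypothetical numbers of newspaper titles by political leaning.
# These figures are made up purely for illustration.
collection = {"liberal": 50, "conservative": 50}
digitised = {"liberal": 30, "conservative": 10}

for category, gap in representation_gap(collection, digitised).items():
    print(f"{category}: {gap:+.2f}")
```

A gap near zero for every category would suggest the digitised subset mirrors the whole along that one dimension. It says nothing about other dimensions, such as place or class, which would need the same comparison.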
The digital versions we use often contain errors introduced by the digitisation process itself, as in the case of printed text, where the image of the page has been read by automated Optical Character Recognition (OCR) software. Text in small typefaces on cheap paper from 150 years ago can be difficult for a human to read, and it is just as hard for a computer. Living with Machines will evaluate the extent of these inaccuracies across our corpus, and work out which tasks can tolerate OCR errors and which cannot.
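A common way to put a number on OCR inaccuracy is the character error rate: the edit distance between the OCR output and a hand-corrected transcription, divided by the length of the transcription. The sketch below assumes such a hand-corrected sample exists. The function names and the sample strings are ours, chosen for illustration, and this is not necessarily the measure the project will adopt.

```python
def edit_distance(reference, hypothesis):
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions and substitutions turning one string into the other."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            cost = 0 if ref_char == hyp_char else 1
            current.append(
                min(
                    previous[j] + 1,         # delete a reference character
                    current[j - 1] + 1,      # insert a hypothesis character
                    previous[j - 1] + cost,  # substitute, or match at no cost
                )
            )
        previous = current
    return previous[-1]


def character_error_rate(reference, hypothesis):
    """Edit distance divided by the length of the reference transcription."""
    if not reference:
        raise ValueError("reference transcription must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


# Invented example: OCR software often confuses 'm' with 'rn'.
truth = "machine"
ocr_output = "rnachine"
print(f"CER: {character_error_rate(truth, ocr_output):.3f}")
```

What counts as an acceptable rate depends on the task. A keyword search may cope with a few wrong characters per page, while matching personal names across sources may not.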
Then there’s what happens when different sources are linked, joining people from census records to maps to biographies and news, but that will have to wait for another blog post.
Overall, we want scholars to be informed about the biases in what they search and see, so that they can more easily understand what their findings are based on, and what they may be missing in the process.