First crowdsourced datasets available

Written by Mia Ridge, September 10, 2020

Thanks to all the online volunteers who contributed to our first crowdsourcing tasks, we’ve got two results datasets ready for sharing on the British Library’s research repository: https://doi.org/10.23636/1197. (See ‘Why is the Communities Lab asking people to read old news?’ for some background on this crowdsourcing work.) The datasets are published with an ‘open access’ public domain licence, which means that they can be re-used by anyone for any purpose.

The datasets contain the classifications created as members of the public looked at a selection of articles from 19th-century newspapers that mentioned machines and decided if they described an industrial accident. A further task asked participants to transcribe the personal, organisational and place names mentioned, and to add a brief summary of relevant accidents.

By itself it’s probably not the most useful dataset, but publishing the results from our alpha and beta tasks like this is an important step for the project. For example, it meant we had to work out various internal processes (such as redacting personal information like usernames and IP addresses), confirm copyright and licensing (CC0), convert the records from JSON to the more readable CSV so the file could be opened in common spreadsheet software, and figure out what documentation to provide. Now that we’ve done this once, it’ll be easier to repeat for additional datasets.
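For anyone preparing a similar release, the redaction and JSON-to-CSV steps might look something like the sketch below. This is an illustration rather than the script used for these datasets: the field names (`user_name`, `user_ip`, `classification_id`, `is_accident`) and the sample records are hypothetical, and it assumes the export is a JSON array of flat records.

```python
import csv
import io
import json

# Hypothetical field names; the real export's schema is not described in this post.
SENSITIVE_FIELDS = {"user_name", "user_ip"}


def redact(record, sensitive=SENSITIVE_FIELDS):
    """Return a copy of the record without the sensitive fields."""
    return {key: value for key, value in record.items() if key not in sensitive}


def json_to_csv(json_text, sensitive=SENSITIVE_FIELDS):
    """Convert a JSON array of flat records to CSV text, dropping sensitive fields."""
    records = [redact(r, sensitive) for r in json.loads(json_text)]
    # Collect column names in first-seen order so rows with missing keys still fit.
    fieldnames = []
    for record in records:
        for key in record:
            if key not in fieldnames:
                fieldnames.append(key)
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=fieldnames, restval="", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


# Made-up sample records, for illustration only.
sample = json.dumps([
    {"classification_id": 1, "user_name": "example_volunteer",
     "user_ip": "abc123", "is_accident": "yes"},
    {"classification_id": 2, "user_name": "another_volunteer",
     "user_ip": "def456", "is_accident": "no"},
])
print(json_to_csv(sample))
```

Dropping the sensitive columns entirely, rather than masking their values, keeps the published file simple. A real export with nested fields would need flattening before it could be written as CSV.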

Screenshot of our beta crowdsourcing task on the Zooniverse platform

I’ve posted on the ‘Talk’ forum so that our wonderful participants (or indeed anyone with a Zooniverse login) can comment or ask questions: https://www.zooniverse.org/projects/bldigital/living-with-machines/talk/2795/1607419. We’re working on providing access to the underlying OCR text transcription to provide further context for the classifications, and in the meantime we’d love to know whether it’s useful as it currently stands. Is the documentation (on the dataset page or forum post) useful? Is the context of its creation clear from the documentation?
