This question of commonly-seen tasks is one of the hardest for me, as a practitioner, to figure out how to convey usefully. For example, it’s *so* helpful to be able to point to a project using a similar type of data, with a similar goal, and say, “Here’s how this team did it, here’s what their output looked like, here’s how they processed their results,” so that teams can quickly get a sense of the resources they might need to run a project (resources here including staff time, data analysis skills, a long-term data management plan, etc.).
On the other hand, I don’t want to be overly prescriptive: by giving specific examples, we run the risk of implying that this is how a project *should* be set up, and we want there to be space for creativity, too. Ideally (in my head, anyway) these real-world examples would function as ‘templates’ to give people a starting point for further exploration and iteration. Perhaps it’s best to put the information out there and let research teams decide whether that type of resource is useful.
Mia makes a good point here, too, about novel approaches that may require development. I wonder whether it’s clear, from the perspective of project builders, what the stakes are for novel vs. out-of-the-box options, and how that choice affects project outcomes. For me, it’s really important to weigh the necessity of novel tech against 1) the cost; 2) the timeline; and 3) the potential for re-use (with number 3, I think, being the most important).
Anyway, I would love to hear more thoughts on this!
Thanks, Ben! I’d certainly agree with your assessment above.
We’re easing into some audio transcription projects this year on Zooniverse (which is very exciting!), but there’s still a ton of work to be done. I know I’d love to hear more about others’ experiences (you all are starting some audio efforts on FtP, if you haven’t already, yes?) and certainly more about the Smithsonian’s process of branching out into the audio world.
Tabular data is also high on our list of data formats to tackle, because it’s *so* common. People have approached it on Zooniverse using a variety of tools, but we don’t yet have an elegant solution.
Machine learning and support for OCR/HTR integration are huge ones for us, too. But this raises the very good question that’s been posed by Mia Ridge & others in recent months: is crowdsourcing a data creation/processing task, or one for engagement with collections/heritage materials? OCR verification/editing may be more efficient, but is it as fulfilling? How do we strike a balance between realistic project lifecycles and ethical practices with volunteer communities? These are the questions that keep me up at night, anyway…