digitise

Digitise! is our crowdsourcing initiative to scan, OCR, clean and proofread large corpuses of texts from the Humanities and the Social Sciences. We hope to get YOU – whether student, scholar, professor (active or retired) or enthusiastic layman – to help us with this huge but essential task.

What’s this all about?

One of the first and biggest tasks facing sdvig press is the digitisation of a very large number of texts to a very high, error-free standard. Even with good OCR software, it takes us about 30 hours to obtain a fully satisfactory digitised version of a monograph (250-300 pages). At this rate, it will take years if not decades to digitise a satisfactory number of texts, even if we find the money to pay some poor souls to do this full time. If, on the other hand, we can mobilise several hundred volunteers to give us a helping hand, we could have full corpuses ready by the end of 2014!
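
To give a feel for the arithmetic, here is a minimal Python sketch. Only the figure of 30 hours per monograph comes from our own estimate; the number of volunteers, the hours each one gives and the function name are illustrative assumptions, and the calculation optimistically treats every volunteered hour as fully productive.

```python
HOURS_PER_MONOGRAPH = 30  # our estimate for a 250-300 page monograph


def monographs_completed(volunteers: int, hours_each: float) -> int:
    """Whole monographs digitised, assuming every volunteered hour is productive."""
    total_hours = volunteers * hours_each
    return int(total_hours // HOURS_PER_MONOGRAPH)


# Assumed scenario: 300 volunteers giving 10 hours each.
print(monographs_completed(300, 10))  # 100
```

In practice the yield would be lower, since coordination and review take time too, but the order of magnitude shows why volunteers matter.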

Aren’t Google and the DPLA already doing this?

Yes, impressive digitisation efforts are being undertaken by Google, the DPLA and libraries such as the Bibliothèque Nationale de France, the Austrian National Library, etc. We will therefore probably seek collaborations with some of them in the future. At this point in time, though, none of them offers a satisfactory answer to our digitisation needs, mainly for the following reasons:

- They often do not allow free use of, or access to, their texts
- Their digitised texts have not been cleaned and are full of errors
- Many of the texts most relevant to us have not been digitised

Why not use Wikisource?

Wikisource is a great project, but it does not guarantee the level of editorial and bibliographical quality we are seeking to achieve. For instance, it is hard to reference and cite a Wikisource book in a scholarly publication. Our goal is to create an open access library of the highest scholarly quality and to exercise rigorous editorial and bibliographical control over our corpuses, in collaboration with expert scholars in the domain. Our repository will moreover offer powerful search and text-mining tools specifically geared towards scholarly research.

Why should you get involved?

As a student or a researcher, have you ever spent hours scanning or copying a book that you weren’t allowed to borrow from the university library? Spent days trying to find a reference you forgot to note down properly? Spent weeks waiting for the single copy of a book to become available again? Had to pay to get a book shipped from a library at the other end of the country? Well, all these inconveniences will be a thing of the past once all or most of the texts relevant to your field of study have been digitised. You won’t even need to go to the library: they will all be at hand on your PC or your mobile phone. Give us ten hours of your time and we’ll save you hundreds!

Why should you help sdvig press?

sdvig press will guarantee contractually that all the texts digitised through its Digitise! initiative will be made fully available in open access on its platform before they are exploited in any other way. Even better, we will license the digitised texts under CC-BY no later than two years after their publication on our website, meaning they will be freely available to and usable by anybody. That two-year delay, by the way, is not meant to restrict access, but to give us time to prepare commented print editions. Best of all, we are developing software to enable advanced search, dynamic visualisations and text-mining on the digitised texts. In other words, you should help us because we are doing this for you!

How will it work?

Simple. First you must register on the sdvig press website (registrations are not open at this time). This will give you access to a list of titles that we will have prepared (broken down into single chapters or articles). By clicking on a title, you will be able to download or upload a document and perform actions such as scanning, cleaning the OCR output or proofreading a cleaned copy. Once all the subparts of a text have been scanned, cleaned and proofread, we’ll put it up on the sdvig press website.
The text produced in this way will only be a clean, high-quality "copyedited" version, not a full critical edition. For some authors and texts, we might launch a critical edition later, but that will not be crowdsourced. The critical editions will be carried out under the supervision of a qualified editor, in cooperation with the relevant learned societies and with assistance from a selected editorial board. These new editions will also be made available in open access on the sdvig press platform.

Still confused? We’ll explain the detailed workflows and have someone on hand to help when the project starts in earnest!

What about copyright?

We will digitise only public domain texts, using public domain editions. Some difficulties might arise in determining copyright for some authors and texts, but in general works fall into the public domain 70 years after an author's death, and editions 50 years after publication.
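
The general rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function and constant names are ours, the terms are counted in whole calendar years (so a text becomes free in the year after the term has run out), and real cases depend on the jurisdiction and need to be checked individually.

```python
from datetime import date
from typing import Optional

AUTHOR_TERM_YEARS = 70   # years after the author's death (general rule)
EDITION_TERM_YEARS = 50  # years after the edition's publication (general rule)


def is_public_domain(author_death_year: int,
                     edition_year: int,
                     current_year: Optional[int] = None) -> bool:
    """Rough check: both the work and the edition must be out of copyright.

    Assumption: terms run to the end of the calendar year, hence the
    strict comparisons below.
    """
    year = current_year if current_year is not None else date.today().year
    work_free = year - author_death_year > AUTHOR_TERM_YEARS
    edition_free = year - edition_year > EDITION_TERM_YEARS
    return work_free and edition_free


# Hypothetical examples, evaluated for the year 2014:
print(is_public_domain(1900, 1950, 2014))  # True: both terms have run out
print(is_public_domain(1950, 1960, 2014))  # False: author died only 64 years earlier
```

A check like this can only flag likely candidates; the difficult cases mentioned above still require a human decision.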

How do you join?

You can’t officially register yet, but you can receive regular updates and show your support for the initiative by liking the Digitise! Facebook page or following Digitise! on Twitter (@sdvigdigitise). We’ll let you know as soon as we launch the pilot.

Spread the word!

Even before the pilot gets going, we are looking to gather support, both to test the waters and to have as many people as possible on board when we start. We are also applying for grants, so your support is more than welcome! Get the word out by sharing our Facebook page and tweeting about us!

Any other questions? Write to us at digitise[at]sdvigpress.org, contact us on Twitter (@sdvigdigitise) or write on our Facebook wall.