The blog Digital Scholarship in the Humanities has an interesting essay on this subject:
But how reliable are these electronic texts? Can researchers feel comfortable citing them and using them for text analysis? In my view, the quality of an electronic text and its appropriateness for use in scholarship depend on six factors:

* Quality of the scanning: Is the complete page captured? Is the image skewed or distorted? Is the image of sufficient resolution?
* Quality of the OCR/text conversion: Is full text provided? What method was used to produce the text: double-keying or OCR? How accurate is the text? Are the texts marked up in TEI (Text Encoding Initiative)? Are words joined across line breaks? Are running heads preserved?
* Quality of the metadata: Is the bibliographic information accurate? Is it clear what edition you are looking at? If there are multiple volumes, do you know which volume you are getting and how to locate the other volume(s)?
* Terms of use: What are you legally able to do with the digitized work? Can you download the full text and use tools to analyze it? Is the content freely and openly available, or do you have to pay for use?
* Convenience: Can you easily download the text and store it in your own collection? How much work do you have to do to convert the text into a format appropriate for use with text analysis tools? How hard is it to find the electronic text in the first place? Is there a Zotero translator for the collection?
* Reputation: Is the digital archive well-regarded in the scholarly community? If you cited the archive in your bibliography, would fellow researchers question your decision? Does the archive provide clear information about its process for selecting, digitizing, and preserving texts?

I focused my evaluation on the main collections that I plumbed for the primary source works in my dissertation bibliography: Google Books (GB), Open Content Alliance (OCA), Early American Fiction (EAF), Project Gutenberg (PG), and Making of America (MOA). I found the OCA works in the Internet Archive (they are marked as belonging to the "American Libraries" or "Canadian Libraries" collections). I apologize in advance for the length of this post, but I want to dig into the details. ...



