In a comment on Ryan Shaw's post, "Libraries Look a Gift Horse in the Mouth," a bright guy named Patrick elaborates on what could be done with the right image formats:
Patrick wrote: October 22nd, 2007 at 1:21 pm
I would make an additional distinction that I wish the libraries understood before they signed these deals: There is text, and there is text with some format hinting. Knowing that one line is larger than those following is a hint that it is a section heading. Knowing how much bigger it is, where it occurred on a page, etc. adds additional information that allows you to recreate the structure of the work and not just its text. This in turn allows you to understand the knowledge model used in the book. And this finally lets you mine textbooks and other works of nonfiction for ontology. This is hugely valuable, and generally overlooked in the whole exercise.
It can be argued that these old books are out of date, but wouldn't it be cool to compare the domain models for something like chemistry or EE across time? If you want to find the history of a concept, you need to be able to search for that concept in old texts *using the then-contemporary knowledge models* for that concept.
I actually argued in our IP law class that this raised additional IP issues for Google Print (that of derivative works) not covered by the 'we only expose snippets' excuse they hide behind. It was an interesting exchange with the Google Print IP counsel.
Your friendly neighborhood ontogeek -
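To make Patrick's point about format hinting concrete, here is a minimal sketch of how line sizes alone could recover an outline. It assumes a scan has already been turned into lines carrying a font size, which is exactly the information plain text throws away. The `Line` type, the `find_headings` function, and the 1.15 size ratio are all hypothetical, chosen for illustration and not taken from any real digitization pipeline.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class Line:
    text: str
    font_size: float  # points, as reported by an assumed OCR/layout step


def find_headings(lines, ratio=1.15):
    """Return (level, text) pairs for lines noticeably larger than body text.

    Body size is estimated as the median font size. Any line at least
    `ratio` times that size is treated as a heading, and the distinct
    heading sizes are ranked so the largest becomes level 1.
    """
    if not lines:
        return []
    body_size = median(line.font_size for line in lines)
    heading_sizes = sorted(
        {line.font_size for line in lines if line.font_size >= body_size * ratio},
        reverse=True,
    )
    level_of = {size: rank + 1 for rank, size in enumerate(heading_sizes)}
    return [
        (level_of[line.font_size], line.text)
        for line in lines
        if line.font_size in level_of
    ]


page = [
    Line("Chemistry", 24),
    Line("Acids", 16),
    Line("An acid donates protons.", 10),
    Line("Strong acids dissociate fully.", 10),
    Line("Bases", 16),
    Line("A base accepts protons.", 10),
    Line("Strong bases dissociate fully.", 10),
]

for level, text in find_headings(page):
    print("  " * (level - 1) + text)
```

With only the text, "Acids" and "Bases" are just two more lines. With the sizes, they become siblings under "Chemistry", which is the first step toward the knowledge model Patrick describes. A real pipeline would also need the position on the page, typeface, and spacing he mentions; this sketch uses size only.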