In a comment on Ryan Shaw's post, "Libraries Look a Gift Horse in the Mouth," a bright guy named Patrick elaborates on what could be done with the right image formats:

Patrick wrote: October 22nd, 2007 at 1:21 pm

I would make an additional distinction that I wish the libraries understood before they signed these deals: There is text, and there is text with some format hinting. Knowing that one line is larger than those following is a hint that it is a section heading. Knowing how much bigger it is, where it occurred on a page, etc. adds additional information that allows you to recreate the structure of the work and not just its text. This in turn allows you to understand the knowledge model used in the book. And this finally lets you mine textbooks and other works of nonfiction for ontology. This is hugely valuable, and generally overlooked in the whole exercise.

It can be argued that these old books are out of date, but wouldn't it be cool to compare the domain models for something like chemistry or EE across time? If you want to find the history of a concept, you need to be able to search for that concept in old texts *using the then-contemporary knowledge models* for that concept.

I actually argued in our IP law class that this raised additional IP issues for Google Print (that of derivative works) not covered by the 'we only expose snippets' excuse they hide behind. It was an interesting exchange with the Google Print IP counsel.

Your friendly neighborhood ontogeek -
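
To make Patrick's point about format hinting concrete, here is a minimal sketch of how relative type size alone might recover a book's outline. It assumes a layout-aware OCR step that reports a font size for every line; the `Line` type, the `infer_outline` function, and the `min_ratio` threshold are all hypothetical names invented for this illustration, not part of any scanning pipeline mentioned in the post.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Line:
    text: str
    font_size: float  # assumed to come from a layout-aware OCR step (hypothetical)


def infer_outline(lines, min_ratio=1.15):
    """Return (level, text) pairs: 0 is body text, 1 is the largest heading size."""
    if not lines:
        return []
    # Treat the most common font size on the page as the body size.
    body_size = Counter(line.font_size for line in lines).most_common(1)[0][0]
    # Every distinct size clearly larger than body text becomes a heading level,
    # with the largest size ranked first.
    heading_sizes = sorted(
        {line.font_size for line in lines if line.font_size >= body_size * min_ratio},
        reverse=True,
    )
    level_of = {size: rank + 1 for rank, size in enumerate(heading_sizes)}
    return [(level_of.get(line.font_size, 0), line.text) for line in lines]


# Made-up sample page, for illustration only.
page = [
    Line("Chemistry", 24),
    Line("Acids and Bases", 16),
    Line("An acid donates a proton.", 10),
    Line("A base accepts one.", 10),
    Line("Salts", 16),
    Line("Salts form when acids and bases react.", 10),
]
outline = infer_outline(page)
```

Plain text keeps only the second element of each pair. The level numbers are the structure Patrick is talking about, and a real system would presumably also weigh position on the page, spacing, and typeface, as he suggests.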

Comments (1)

the universities can do o.c.r. too.
and save whatever info they want...

and yes, styling info _can_ be very
important in determining structure.
but it _is_ possible to do that job
even without it...

-bowerbird

A book in progress by

Siva Vaidhyanathan


This blog, a collaboration between me and the Institute for the Future of the Book, explores the process of writing a critical interpretation of the actions and intentions behind the cultural behemoth that is Google, Inc. The book will answer three key questions: What does the world look like through the lens of Google? How is Google's ubiquity affecting the production and dissemination of knowledge? And how has the corporation altered the rules and practices that govern other companies, institutions, and states?

