«  Google's new 411 service Main The Open Library: A Very Important Step  »


How Google Works by David Carr is a great nuts-and-bolts guide to how Google was built and how it operates such a stunning amount of data capacity.

Check out the section on the file system. Amazing.

And this section, about "Google's secrets,"is very helpful:

... For all the papers it has published, Google refuses to answer many questions. "We generally don't talk about our strategy ... because it's strategic," Page told Time magazine when interviewed for a Feb. 20 cover story.

One of the technologies Google has made public, PageRank, is Page's approach to ranking pages based on the interlinked structure of the Web. It has become one of the most famous elements of Google's technology because he published a paper on it, including the mathematical formula. Stanford holds the patent, but through 2011 Google has an exclusive license to PageRank.

Still, Yahoo's research arm was able to treat PageRank as fair game for a couple of its own papers about how PageRank might be improved upon; for example, with its own TrustRank variation based on the idea that trustworthy sites tend to link to other trustworthy sites. Even if competitors can't use PageRank per se, the information Page published while still at Stanford gave competitors a starting point to create something similar.

"PageRank is well known because Larry published it—well, they'll never do that again," observes Simson Garfinkel, a postdoctoral fellow at Harvard's Center for Research on Computation and Society, and an authority on information security and Internet privacy. Today, Google seems to have created a very effective "cult of secrecy," he says. "People I know go to Google, and I never hear from them again."

Because Google, which now employs more than 6,800, is hiring so many talented computer scientists from academia—according to The Mercury News in San Jose, it hires on average 12 new employees a day and recently listed 1,800 open jobs—it must offer them some freedom to publish, Garfinkel says. He has studied the GFS paper and finds it "really interesting because of what it doesn't say and what it glosses over. At one point, they say it's important to have each file replicated on more than three computers, but they don't say how many more. At the time, maybe the data was on 50 computers. Or maybe it was three computers in each cluster." And although the GFS may be one important part of the architecture, "there are probably seven layers [of undisclosed technology] between the GFS system and what users are seeing."

One of Google's biggest secrets is exactly how many servers it has deployed. Officially, Google says the last confirmed statistic for the number of servers it operates was 10,000. In his 2005 book The Google Legacy, Infonortics analyst Stephen E. Arnold puts the consensus number at 150,000 to 170,000. He also says Google recently shifted from using about a dozen data centers with 10,000 or more servers to some 60 data centers, each with fewer machines. A New York Times report from June put the best guess at 450,000 servers for Google, as opposed to 200,000 for Microsoft.

The exact number of servers in Google's arsenal is "irrelevant," Garfinkel says. "Anybody can buy a lot of servers. The real point is that they have developed software and management techniques for managing large numbers of commodity systems, as opposed to the fundamentally different route Microsoft and Yahoo went." ...

arrow

Post a comment

We had to crank up the spam filter so it may take a little while to appear. Thanks.

A book in progress by

Siva Vaidhyanathan

Siva Vaidhyanathan

This blog, the result of a collaboration between myself and the Institute for the Future of the Book, is dedicated to exploring the process of writing a critical interpretation of the actions and intentions behind the cultural behemoth that is Google, Inc. The book will answer three key questions: What does the world look like through the lens of Google?; How is Google's ubiquity affecting the production and dissemination of knowledge?; and how has the corporation altered the rules and practices that govern other companies, institutions, and states? [more]

» Send links, questions and ideas:
siva [at] googlizationofeverything [dot] com

» To reach me for a press query, please write to SIVAMEDIA ut POBOX dut COM

» To reach me for a speaking invitation, please write to SIVASPEAK ut POBOX dut COM

» Visit my main blog: SIVACRACY.NET

» More about me

Topics

Like the Mind of God (41 posts)

All the World's Information (54 posts)

What If Big Ads Don't Work (18 posts)

Don't Be Evil (14 posts)

Is Google a Library? (71 posts)

Challenging Big Media (39 posts)

The Dossier (37 posts)

Global Google (9 posts)

Google Earth (4 posts)

A Public Utility? (29 posts)

About this Book (20 posts)

RSS Feed icon  RSS Feed


Powered by Movable Type 3.35