Welcome to our live blogging coverage from SMX Advanced in Seattle. Google’s Matt Cutts, head of the web spam team, is doing his traditional keynote that ends the first day of the conference. He’ll be sitting down with Search Engine Land’s Danny Sullivan for a “You & A” conversation that’s due to start at 5:00 pm PDT.
The room is packed, and we’re even videoconferencing the keynote to other parts of the building for folks who can’t find a seat.
Stay tuned for live blog coverage below in a matter of moments.
Danny and Matt have just put on life jackets for some reason that’s completely unknown to me. They’re making jokes about caffeine, which is a preview of the big announcement that’s coming … now.
We’ve published an in-depth look at this launch in Vanessa Fox’s article, Google’s New Indexing Infrastructure “Caffeine” Now Live.
Matt says Google Caffeine is now live at all data centers, in all locations and all languages. He says that back when AltaVista ruled the web, Google once went four months without an update. It used to take Google 7-10 days to index the web, and they didn’t have the capacity to update the index all at once. These were the days of the “Google Dance.”
In 2003, Google switched to an incremental indexing system. They would crawl a portion of the web every night and push that portion live. This was the “Fritz” update.
Now, Matt says, they have Caffeine. When Google crawls a document, it immediately puts it through indexing, so content can be pushed into search results much faster, almost immediately. His analogy: web documents used to travel by bus; now they ride in a limo.
Caffeine also increases Google’s capacity to index the web. Google can index many more documents, and the index is about 50% fresher.
Matt says it’s also easier to annotate documents with information. He’s talking about metadata. This goes beyond just associating links and anchor text with web documents. Caffeine lets Google process data on the order of 100 petabytes.
Danny asks about the impact on local search — citations, mentions of a business on web pages that don’t have links.
Matt talks about a tweet — it can have extra data like where the tweet was posted. Web documents can also have more data about them, like location, IP addresses, and so forth.
Matt says this is like changing the engine on a moving car. For now, it’s all about indexing. The suggestion is that later it could have impact on rankings.
Danny asks about the “Mayday” update from last month (which Vanessa wrote about in Google Confirms “Mayday” Update Impacts Long Tail Traffic).
Matt says Mayday is about looking at the state of web content in 2010 and deciding which signals to use to differentiate quality content from, for example, content farms. He says people affected by this update should step back and look at how much content they’re generating and how close it comes to being spam.
Danny asks which sites Mayday was designed to kill, and Matt says Google doesn’t want to make value judgments about individual sites, preferring to make algorithmic changes.
Matt reiterates that Google will be looking more at video sitemaps in the future, repeating what he said this morning during the SEO For Bing Vs. Google session.
Now they’re moving on to the public Q&A.
Matt says the Mayday update had no impact on Google News.
Danny asks about the Caffeine update and HTML5. A “really good question,” Matt says. He says HTML5 is completely unrelated to Caffeine, and Google doesn’t give bonus points for code that validates. But Google does have an HTML parser in the wings.
A question about the ranking data available in Webmaster Tools: will Google show data from real-time search results and other “blended/universal” results? Matt says it’s a good question, but he doesn’t know of any plans to do that. One of the Webmaster Tools team members is in the audience, though, so talk to him.
A question about paid links: Matt says they have a couple of new tools that are “really fun” for dealing with paid links. He describes them as “laser-guided scalpels.”
The next question is about nofollow tags and PageRank sculpting: what signals do site owners have to tell Google which pages are most important? Matt’s answer: the pages you link to from your root page, and the pages you link to in your site architecture, are the important pages. He mentions the tree-like site architecture of DMOZ.org as a good model.
Next comes a bunch of mostly off-topic chit-chat about browsers, Google Buzz and the like, including polling the audience on how much they use various browsers.
Question comes in accusing Google of using search results to promote Google properties. Matt mentions that, years ago, the YouTube engineers explained that they go out of their way not to favor YouTube results, and when you compare the quantity of videos on YouTube versus other sites, it’s not favored.
A question about rich snippets for e-commerce. Matt says they’re looking at rich snippets and may make them automatic, so you won’t have to apply to get rich snippets. He says it may be a matter of weeks before this happens.
Now the lightning round of Q&A:
Do we need separate sitemaps for Flash content? No, there’s no need for a separate standard.
Can Google tell if a page is positive, negative, or neutral? Matt says they have sentiment analysis, but “I don’t think we use that as a ranking signal.”
Bounce rate and rankings? Matt says Google Analytics is not used in the general ranking algorithm. “To the best of my knowledge, the rankings team does not use bounce rate in any way.” He tiptoed around this question a bit, choosing his words very carefully.
Danny goes on a rant about indented listings — can’t they go away? Danny yells “death to the indents” and half the audience boos!
And that’s it. Thanks for following along with our live blog coverage. We’ll be back tomorrow with more from day two of SMX Advanced.