Matt Cutts, a Google employee who mostly deals with search engine spam, held a Q&A session on his blog.

In it, he brought up a few interesting things. Here are some highlights:

  • The BigDaddy Update is fully deployed.
  • As part of the BigDaddy release, the old Googlebot will be phased out and replaced by the Mozilla Googlebot: the user-agent “Googlebot/2.1 (+http://www.google.com/bot.html)” will be replaced with “Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)”. Others have noticed that the new Googlebot crawls faster, has explored areas restricted by robots.txt, and has read files that previous bots did not (such as CSS and JS files).
  • The BigDaddy Update means more supplemental results and new PageRank values are on their way, but Matt wouldn’t say what else it will bring in the coming months.
  • On the datacenter at 64.233.185.104 acting differently: “As Bigdaddy cools down, that frees us up to do new/other things.”
  • Google crawls deeper based on PageRank and also prefers URLs with 1-2 parameters. This one is particularly interesting. Here is the full text:

    Q: “My sitemap has about 1350 urls in it. . . . . its been around for 2+ years, but I cannot seem to get all the pages indexed. Am I missing something here?”
    A: One of the classic crawling strategies that Google has used is the amount of PageRank on your pages. So just because your site has been around for a couple years (or that you submit a sitemap), that doesn’t mean that we’ll automatically crawl every page on your site. In general, getting good quality links would probably help us know to crawl your site more deeply. You might also want to look at the remaining unindexed urls; do they have a ton of parameters (we typically prefer urls with 1-2 parameters)? Is there a robots.txt? Is it possible to reach the unindexed urls easily by following static text links (no Flash, JavaScript, AJAX, cookies, frames, etc. in the way)? That’s what I would recommend looking at.

  • For those trying to get into Google’s head and guess at new features: “Any time you’re considering a new feature (e.g. our numrange search), you have to trade off how much the index would get bigger versus the utility of the feature.” I can sure relate to that! :-)
  • “[I]f you sell links, you should mark them with the nofollow tag. Not doing so can affect your reputation in Google.”
  • On targeting international or multinational audiences: “If you’ve only got a small number of pages, I might start out with subdomains, e.g. de.mydomain.eu or de.mydomain.com. Once you develop a substantial presence or number of pages in each language, that’s where it often makes sense to start developing separate domains.” Does this imply that ccTLDs rank highest in country-specific searches, with subdomains second and everything else after? If so, that’s no big surprise.
  • You can search by date range using an undocumented syntax. (Not mentioned: there is also a UI change, triggered by certain keywords.)
  • Google might be working out a way to deal with directories and shopping comparison sites, since some people see them as spam (“might” is the operative word here; nothing in Matt’s post says so explicitly). I only bring this up because Matt mentions he’s heard this complaint before, and, according to him, people think of spam as anything that increases noise.
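If you want to see which Googlebot is visiting your own site, you can tell the two user-agent strings quoted above apart in your server logs. Here is a minimal Python sketch, assuming you already have the user-agent field extracted from each log line. The two strings come from the post; the function names and the sample browser string are hypothetical. Both forms share the "Googlebot/2.1" token, so one pattern catches either, and a prefix check separates the new Mozilla-style form.

```python
import re

# The two user-agent strings quoted in the post.
OLD_UA = "Googlebot/2.1 (+http://www.google.com/bot.html)"
NEW_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

# Both strings share the "Googlebot/2.1" token, so one pattern matches either form.
GOOGLEBOT_TOKEN = re.compile(r"\bGooglebot/2\.1\b")


def is_googlebot_ua(user_agent: str) -> bool:
    """True if the user-agent claims to be Googlebot 2.1, in either form."""
    return GOOGLEBOT_TOKEN.search(user_agent) is not None


def is_mozilla_googlebot(user_agent: str) -> bool:
    """True only for the newer Mozilla-compatible Googlebot string."""
    return user_agent.startswith("Mozilla/5.0") and is_googlebot_ua(user_agent)


if __name__ == "__main__":
    # Hypothetical log sample: old bot, new bot, and an ordinary browser.
    for ua in (OLD_UA, NEW_UA, "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"):
        print(is_googlebot_ua(ua), is_mozilla_googlebot(ua), ua)
```

Keep in mind that a user-agent string is self-reported, so a match only means the client *claims* to be Googlebot. Anyone can send that string.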
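Two items on Matt's checklist for unindexed pages (too many URL parameters? blocked by robots.txt?) are easy to check yourself. Here is a rough Python sketch using only the standard library (urllib.parse and urllib.robotparser). It assumes “parameters” means key=value pairs in the query string and treats 2 as a hard cutoff. That threshold, the audit_urls helper, and the example.com URLs are hypothetical illustrations, not anything Google has published.

```python
from urllib.parse import parse_qsl, urlsplit
from urllib.robotparser import RobotFileParser

# Assumption: Matt's "1-2 parameters" read as a hard cutoff of 2 key=value pairs.
# This threshold is hypothetical, not a documented Google limit.
MAX_PARAMS = 2


def count_query_params(url: str) -> int:
    """Count key=value pairs in the URL's query string (blank values included)."""
    return len(parse_qsl(urlsplit(url).query, keep_blank_values=True))


def audit_urls(urls, robots_lines, user_agent="Googlebot", max_params=MAX_PARAMS):
    """Map each URL to a list of possible crawl obstacles (empty list = none found)."""
    robots = RobotFileParser()
    robots.parse(robots_lines)  # parse robots.txt rules supplied as a list of lines
    report = {}
    for url in urls:
        problems = []
        n = count_query_params(url)
        if n > max_params:
            problems.append(f"{n} query parameters")
        if not robots.can_fetch(user_agent, url):
            problems.append("blocked by robots.txt")
        report[url] = problems
    return report


if __name__ == "__main__":
    # Hypothetical robots.txt and sitemap URLs for illustration only.
    robots_txt = ["User-agent: *", "Disallow: /private/"]
    sitemap_urls = [
        "http://example.com/about",
        "http://example.com/private/page",
        "http://example.com/list?cat=3&sort=price&page=2",
    ]
    for url, problems in audit_urls(sitemap_urls, robots_txt).items():
        print(url, problems or "no obvious problems")
```

This does not cover Matt's other question, whether each page can be reached by following static text links. Answering that would mean crawling your own site's links.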