<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Richy&#039;s Random Ramblings &#187; Net: Search Engines</title>
	<atom:link href="http://blog.rac.me.uk/category/net-search-engines/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.rac.me.uk</link>
	<description>Random ramblings and ravings of Richy C</description>
	<lastBuildDate>Mon, 16 Jan 2012 12:16:14 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>Search Engines: Google&#8217;s 10 Commandments</title>
		<link>http://blog.rac.me.uk/2007/05/15/search-engines-googles-10-commandments/</link>
		<comments>http://blog.rac.me.uk/2007/05/15/search-engines-googles-10-commandments/#comments</comments>
		<pubDate>Tue, 15 May 2007 18:07:53 +0000</pubDate>
		<dc:creator>Richy C.</dc:creator>
				<category><![CDATA[Net: Search Engines]]></category>
		<category><![CDATA[google]]></category>

		<guid isPermaLink="false">http://blog.rac.me.uk/2007/05/15/search-engines-googles-10-commandments/</guid>
		<description><![CDATA[Did you know Google have their own &#8220;10 Commandments&#8221; (ok, more 10 Philosophy things &#8211; but my way sounds better). It&#8217;s always a bit interesting to see things from inside the Googleplex and it has lots of &#8220;odd jokes&#8221; &#8211; such as &#8220;It&#8217;s best to do one thing really, really well&#8230; Google does search&#8221;: Google [...]]]></description>
			<content:encoded><![CDATA[<p>Did you know Google have their own <a href="http://www.google.com/corporate/tenthings.html">&#8220;10 Commandments&#8221;</a> (ok, more 10 Philosophy things &#8211; but my way sounds better). It&#8217;s always a bit interesting to see things from inside the Googleplex and it has lots of &#8220;odd jokes&#8221; &#8211; such as &#8220;It&#8217;s best to do one thing really, really well&#8230; Google does search&#8221;: Google now do a lot more than search! And not to forget &#8220;Google believes in instant gratification&#8221; <img src='http://blog.rac.me.uk/wp-includes/images/smilies/icon_wink.gif' alt=';)' class='wp-smiley' /> </p>
]]></content:encoded>
			<wfw:commentRss>http://blog.rac.me.uk/2007/05/15/search-engines-googles-10-commandments/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>How to find a New Zealand street in Google Maps</title>
		<link>http://blog.rac.me.uk/2006/08/12/how-to-find-a-new-zealand-street-in-google-maps/</link>
		<comments>http://blog.rac.me.uk/2006/08/12/how-to-find-a-new-zealand-street-in-google-maps/#comments</comments>
		<pubDate>Sat, 12 Aug 2006 20:29:15 +0000</pubDate>
		<dc:creator>Richy C.</dc:creator>
				<category><![CDATA[Net: Search Engines]]></category>
		<category><![CDATA[google maps]]></category>
		<category><![CDATA[new zealand]]></category>

		<guid isPermaLink="false">http://blog.rac.me.uk/2006/08/12/how-to-find-a-new-zealand-street-in-google-maps/</guid>
		<description><![CDATA[I&#8217;ve found something that the all-knowing and all-seeing Google cannot find! Streets in New Zealand. This is despite these streets being in its database! So how can we pull up a nice Satellite view of a New Zealand street? Here&#8217;s how I did it. First of all, know the name of the street you are [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve found something that the all-knowing and all-seeing Google cannot find! Streets in New Zealand. This is despite these streets being in its database!</p>
<p>So how can we pull up a nice Satellite view of a New Zealand street? Here&#8217;s how I did it.</p>
<p>First of all, know the name of the street you are looking for. For this example, I will use &#8220;Reeves Road&#8221; in Christchurch (South Island).</p>
<p>Now go to <a href="http://www.wises.co.nz/">http://www.wises.co.nz/</a> and search for that road to make a map.</p>
<p>Now zoom out on the Wises map to get a good overview of the area.</p>
<p>Now go to Google maps at <a href="http://maps.google.co.uk/">http://maps.google.co.uk</a> and type in &#8220;Christchurch, New Zealand&#8221; to get as close as Google will allow you to search.</p>
<p>Compare the two maps until you can match up the major road names/locality names.</p>
<p>Zoom in on Google maps and lo and behold &#8211; there&#8217;s your street! With its street name!</p>
<p>If you want to then see it in Google Earth (which doesn&#8217;t have street names and has the same resolution photos), then click &#8220;Link to this page&#8221; and in the URL bar at the top of Firefox/Internet Explorer, copy the &amp;ll=-49&#8230;&#8230;. section until the next &amp; sign and paste that bit into Google Earth.</p>
<p>There you go &#8211; not that difficult, but fiddly.</p>
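<p>If you would rather not pick the coordinates out of the address bar by hand, the copy step can be sketched in a few lines of Python. This is only a sketch: the function name, the example link and its coordinates are made up for illustration, and it assumes the link carries an ll=latitude,longitude query parameter as described above.</p>

```python
from urllib.parse import urlsplit, parse_qs


def extract_lat_lon(maps_url):
    """Return (latitude, longitude) from the ll= query parameter of a map link."""
    query = parse_qs(urlsplit(maps_url).query)
    # parse_qs returns a list per parameter; the first ll value is the one we want.
    lat_text, lon_text = query["ll"][0].split(",")
    return float(lat_text), float(lon_text)


# Hypothetical link: the host, path and coordinates are placeholders.
example = "http://maps.example.com/maps?q=somewhere&ll=-12.345678,123.456789&z=15"
print(extract_lat_lon(example))
```

<p>The two numbers it prints are what you would then type into Google Earth.</p>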
]]></content:encoded>
			<wfw:commentRss>http://blog.rac.me.uk/2006/08/12/how-to-find-a-new-zealand-street-in-google-maps/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Search Engines: No Index Sections?</title>
		<link>http://blog.rac.me.uk/2004/01/04/search-engines-no-index-sections/</link>
		<comments>http://blog.rac.me.uk/2004/01/04/search-engines-no-index-sections/#comments</comments>
		<pubDate>Sun, 04 Jan 2004 22:32:51 +0000</pubDate>
		<dc:creator>Richy C.</dc:creator>
				<category><![CDATA[Net: Search Engines]]></category>

		<guid isPermaLink="false">http://blog.rac.me.uk/?p=590</guid>
		<description><![CDATA[A fellow blogger has suggested that a tag be introduced which would stop search engines such as Google from indexing certain sections of web pages. This would be extremely handy for all the blog comment spam which is currently going around (I&#8217;m personally using a combination of IP blocking [like Neil] and modification of /lib/MT/App/Comments.pm [...]]]></description>
			<content:encoded><![CDATA[<p>A fellow blogger has <a href="http://www.gungeralv.org/notes/archives/000561.php">suggested that a tag be introduced</a> which would stop search engines such as Google from indexing certain sections of web pages. This would be extremely handy for all the blog comment spam which is currently going around (I&#8217;m personally using a combination of IP blocking [<a href="http://www.neilturner.me.uk/2004/Jan/02/beware_of_ukranian_spammers.html">like Neil</a>] and modification of /lib/MT/App/Comments.pm to block certain words in submitted URLs), but instead of<br />
<code>&lt;!-- SearchEngine: Begin Anonymous Comment --&gt; / &lt;!-- SearchEngine: End Anonymous Comment --&gt;</code><br />
I would recommend something a bit more generalised such as:<br />
<code>&lt;!-- robots:noindex --&gt; / &lt;!-- /robots:noindex --&gt;</code></p>
<p>This would fit in with the already existing <a href="http://www.robotstxt.org/">robots.txt</a> and <a href="http://www.robotstxt.org/wc/meta-user.html">robots meta tag</a> (it also could be extended to things like &lt;!-- robots:nofollow --&gt; for sections of content).</p>
<p>This tag would be used to mark sections of web page content as being &#8220;not to index/search&#8221;: so if a spammer does manage to add their URL to a website, but the URL appears between the &lt;!-- robots:noindex --&gt; tags, then the search engines will ignore the listing, making the spam useless in regards to search engine placement/promotion.</p>
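<p>To make the idea concrete, here is a rough Python sketch of what an indexer might do with such markers before it looks at a page. The marker syntax is just the proposal from this post, not something any search engine is known to support, and the function name and sample page are invented for the example.</p>

```python
import re

# Hypothetical markers taken from the proposal above; not an existing standard.
NOINDEX_BLOCK = re.compile(
    r"<!--\s*robots:noindex\s*-->.*?<!--\s*/robots:noindex\s*-->",
    re.DOTALL,
)


def strip_noindex(html):
    """Drop everything between the proposed noindex comment markers."""
    return NOINDEX_BLOCK.sub("", html)


page = (
    "<p>Article text</p>"
    "<!-- robots:noindex --><a href='http://spam.example/'>spam</a><!-- /robots:noindex -->"
    "<p>Footer</p>"
)
print(strip_noindex(page))
```

<p>The non-greedy match keeps two separate marked sections from swallowing the normal content between them, although a real crawler would want a proper HTML parser rather than a regular expression.</p>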
<p>However, there are a number of drawbacks that I can see for this introduction to the search engine world:<br />
<span id="more-590"></span></p>
<ol>
<li>First thing is backwards compatibility. It&#8217;s conceivable that several Content Management Systems (CMS) may use comment tags starting robots: for internal markup purposes. In theory, these should be parsed out before the content is sent to the end user, but in practice that&#8217;s another matter. That said, I expect the number of sites currently using something like &lt;!-- robots&#8230; to be extremely low.</li>
<li>Second thing is backwards compatibility. But this time it&#8217;s more relating to existing sites that <i>should</i> use the tag. I estimate there&#8217;s somewhere in the region of 270,000 <a href="http://www.movabletype.org/">Movable Type</a> blog sites currently online (which compares well with Six Apart&#8217;s own <a href="http://www.movabletype.org/about_movable_type.shtml" class="broken_link">download figures</a> of a quarter of a million downloads), but then you&#8217;ve got to take into account all the other sites which allow third party comments which you may not want search engines indexing sections of (for example, for major news sites it may be preferable to just allow search engines to index/cache the headline and the first paragraph as after a few days the article may become &#8220;pay to read&#8221; and hence the publisher may not want it archived). But getting nearly 1 million webmasters to integrate the new tag in their site (and rebuild the entire site) could be problematic.</li>
<li>Third item is the potential abuse factor. As a search engine optimiser, I know full well how the existing HTML tags can be abused to make certain parts of web pages &#8220;invisible&#8221; to web spiders/robots/search engines (and the flip side, how to make content only visible to those and not &#8216;normal browsers&#8217;). I can see how a &lt;!-- robots:noindex --&gt; tag could be easily abused (think of Javascript redirects hidden in that section, or to &#8216;hide&#8217; the bulk of the page so the keyword density on the rest of the page stays &#8216;just right&#8217;).</li>
<li>Fourth factor is the &#8220;take up rate&#8221;. It&#8217;d be good if a major search engine such as <a href="http://www.google.com/">Google</a> were to use this tag, but ideally we need widespread saturation &#8211; ideally <a href="http://www.altavista.com/">Altavista</a>/<a href="http://www.alltheweb.com/">AllTheWeb</a> (both owned by <a href="http://www.overture.com/">Overture</a> which is now owned by <a href="http://www.yahoo.com/">Yahoo Inc</a>) also need to support it as well as &#8220;non search engines&#8221; such as <a href="http://www.archive.org/">The Web Archive</a>.</li>
</ol>
<p>But it&#8217;s a good idea and I do hope that it&#8217;s implemented in one manner or another in the very near future&#8230;</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.rac.me.uk/2004/01/04/search-engines-no-index-sections/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Search: Google Calculator II</title>
		<link>http://blog.rac.me.uk/2003/08/14/search-google-calculator-ii/</link>
		<comments>http://blog.rac.me.uk/2003/08/14/search-google-calculator-ii/#comments</comments>
		<pubDate>Thu, 14 Aug 2003 00:00:14 +0000</pubDate>
		<dc:creator>Richy C.</dc:creator>
				<category><![CDATA[Net: Search Engines]]></category>
		<category><![CDATA[google]]></category>
		<category><![CDATA[google calculator]]></category>

		<guid isPermaLink="false">http://blog.rac.me.uk/?p=523</guid>
		<description><![CDATA[LOL (laugh out loud!). Being a typical geek, I tried several calculations in the Google Calculator to get a screenshot showing 42: but I should have remembered my Douglas Adams and just searched for &#8220;answer to life, the universe and everything&#8221;. No kidding &#8211; Google returns &#8220;answer to life, the universe and everything = 42&#8221; [...]]]></description>
			<content:encoded><![CDATA[<p>LOL (laugh out loud!). Being a typical geek, I tried several calculations in the <a href="http://blog.rac.me.uk/archives/000520.html">Google Calculator</a> to get a screenshot showing 42: but I should have remembered my Douglas Adams and just searched for &#8220;answer to life, the universe and everything&#8221;.</p>
<p>No kidding &#8211; Google returns &#8220;answer to life, the universe and everything = 42&#8221; &#8211; <a href="http://www.google.com/search?hl=en&amp;lr=&amp;ie=UTF-8&amp;oe=UTF-8&amp;q=answer+to+life%2C+the+universe+and+everything&amp;btnG=Google+Search">try it yourself!</a></p>
<p>I now also know roughly how much I weigh in kilograms ( <a href="http://www.google.com/search?hl=en&amp;lr=&amp;ie=UTF-8&amp;oe=UTF-8&amp;q=13+stones+to+kilograms">13 stones to kilograms</a> : yes, Google does conversions as well), I&#8217;m sure that &#8220;<a href="http://www.google.com/search?hl=en&amp;lr=&amp;ie=UTF-8&amp;oe=UTF-8&amp;q=four+and+twenty">four and twenty</a>&#8221; does actually mean twenty four blackbirds, and I know how many Newtons I need to <a href="http://www.google.com/search?hl=en&amp;ie=UTF-8&amp;oe=UTF-8&amp;q=1.21+GW+%2F+88+mph">travel back in time</a> to a <a href="http://www.google.com/search?hl=en&amp;lr=&amp;ie=UTF-8&amp;oe=UTF-8&amp;q=fortnight+plus+17+days">fortnight plus 17 days ago</a> with my copy of &#8220;<a href="http://www.google.com/search?hl=en&amp;lr=&amp;ie=UTF-8&amp;oe=UTF-8&amp;q=100+leagues">555.6 kilometers under the sea</a>&#8221;.</p>
<p>Kewl! (well, it keeps me entertained).</p>
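<p>For anyone wanting to check the sums without a search engine, the conversions above can be reproduced in a few lines of Python. It is only a sketch: the function names are mine, and the league is assumed to be three nautical miles, which is the definition that gives the 555.6 kilometre figure.</p>

```python
# Conversion factors. The league value is an assumption (three nautical miles),
# chosen because it reproduces the 555.6 km figure quoted above.
KG_PER_POUND = 0.45359237
POUNDS_PER_STONE = 14
METRES_PER_MILE = 1609.344
KM_PER_NAUTICAL_MILE = 1.852


def stones_to_kg(stones):
    """Convert a weight in stones to kilograms."""
    return stones * POUNDS_PER_STONE * KG_PER_POUND


def leagues_to_km(leagues):
    """Convert leagues to kilometres, taking one league as three nautical miles."""
    return leagues * 3 * KM_PER_NAUTICAL_MILE


def force_newtons(power_watts, speed_mph):
    """Power divided by speed gives a force: watts / (metres per second) = newtons."""
    speed_metres_per_second = speed_mph * METRES_PER_MILE / 3600
    return power_watts / speed_metres_per_second


print(round(stones_to_kg(13), 2), "kg")
print(round(leagues_to_km(100), 1), "km")
print(round(force_newtons(1.21e9, 88)), "N")
print(14 + 17, "days in a fortnight plus 17 days")
```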
]]></content:encoded>
			<wfw:commentRss>http://blog.rac.me.uk/2003/08/14/search-google-calculator-ii/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Search: Google Calculator</title>
		<link>http://blog.rac.me.uk/2003/08/13/search-google-calculator/</link>
		<comments>http://blog.rac.me.uk/2003/08/13/search-google-calculator/#comments</comments>
		<pubDate>Wed, 13 Aug 2003 02:10:37 +0000</pubDate>
		<dc:creator>Richy C.</dc:creator>
				<category><![CDATA[Net: Search Engines]]></category>
		<category><![CDATA[google]]></category>
		<category><![CDATA[google calculator]]></category>

		<guid isPermaLink="false">http://blog.rac.me.uk/?p=520</guid>
		<description><![CDATA[Alright, own up &#8211; who erased my memory so that I knew nothing about the Google calculator function until jake at Utterly Boring blogged about it? To think, I won&#8217;t have to drop to the command line at work and do perl -e "print 6*7" when I want to do quick sums anymore (yes, yes, [...]]]></description>
			<content:encoded><![CDATA[<p><img alt="Google Calculator" src="http://blog.rac.me.uk/photos/2003/08/googlecalc.gif" width="110" height="110" border="0" align="left" />Alright, own up &#8211; who erased my memory so that I knew nothing about the <a href="http://www.google.com/help/features.html#calculator">Google calculator</a> function until <a href="http://utterlyboring.com/blog/archives/000885.php" class="broken_link">jake at Utterly Boring</a> blogged about it? To think, I won&#8217;t have to drop to the command line at work and do <code>perl -e "print 6*7"</code> when I want to do quick sums anymore (yes, yes, Windows has a calc.exe function &#8211; but I just don&#8217;t like it for a reason I&#8217;m unable to explain to myself).</p>
<p>And there I was thinking that <a href="http://www.google.com/newsalerts">Google News Alerts</a> was the latest thing to be offered by &#8220;the big G&#8221; (whose anti site <a href="http://www.google-watch.org/">Google Watch</a> now has a watch site of its own <a href="http://www.google-watch-watch.org/">Google Watch Watch</a> [as spotted by <a href="http://www.neilturner.me.uk/entries/001144.html">Neil</a>]).</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.rac.me.uk/2003/08/13/search-google-calculator/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Search: Choosing a good Search Engine Optimization Company</title>
		<link>http://blog.rac.me.uk/2003/08/04/search-choosing-a-good-search-engine-optimization-company/</link>
		<comments>http://blog.rac.me.uk/2003/08/04/search-choosing-a-good-search-engine-optimization-company/#comments</comments>
		<pubDate>Mon, 04 Aug 2003 20:09:45 +0000</pubDate>
		<dc:creator>Richy C.</dc:creator>
				<category><![CDATA[Net: Search Engines]]></category>
		<category><![CDATA[advert]]></category>
		<category><![CDATA[google]]></category>
		<category><![CDATA[search engine optimisation]]></category>

		<guid isPermaLink="false">http://blog.rac.me.uk/?p=510</guid>
		<description><![CDATA[Huh, just came across something that slightly cheered me up. I just saw an advertisement (provided by Google Adsense: see top of the page) for a company offering one of the services I do for a job (search engine optimisation and placement). I went to their site to see how good they were (as I [...]]]></description>
			<content:encoded><![CDATA[<p>Huh, just came across something that slightly cheered me up. I just saw an advertisement (provided by Google Adsense: see top of the page) for a company offering one of the services I do for a job (search engine optimisation and placement). I went to their site to see how good they were (as I hadn&#8217;t heard of them before) and&#8230;</p>
<p>They&#8217;ve got a &#8220;Google PR&#8221; value of &#8220;0&#8221;, I can&#8217;t find their site on Google search for their company name (never a good sign) and finally some of the techniques they &#8220;suggest&#8221; are good for a site (such as &#8220;dynamic meta tags&#8221;) would, most likely, get your site banned from the search engine. A good SEO (search engine optimizer) will optimise a site in such a way that it&#8217;ll work as a &#8220;static site&#8221; OR dynamically driven (ok, some &#8220;hacks&#8221; may be needed to avoid query strings). I also checked their &#8220;recommended client&#8221; site: no optimisation (bar the &#8220;now-redundant&#8221; meta tags) and site can&#8217;t easily be found in Google!</p>
<p>Therefore, I&#8217;d like to suggest the following to anyone considering employing a search engine optimization company: First of all, can you easily find them in the search engines for a) their company name and b) one of their keyphrases (sometimes this is harder to figure out as it may not be easy to see what they are targeting).</p>
<p>Secondly: Do your research. If they suggest creating hidden pages/links, cloaking/fast redirects, duplicate pages (or <a href="http://www.google.com/webmasters/guidelines.html">anything else on Google&#8217;s &#8220;Do not do&#8221; list</a>), then steer clear of them as the site could easily be <a href="http://www.google.com/contact/spamreport.html">reported for spam</a>.<br />
<span id="more-510"></span><br />
Thirdly: Do your research x2. If they list any clients (very few SEO companies will be willing to release details of many of their clients because it&#8217;s, unfortunately, quite common for rival companies to &#8220;client pinch&#8221;), find out how they are performing in Google (or any other search engine) for the main keyphrases. Testimonials may be good (we get quite a few unsolicited ones every week), but see if you can back it up with &#8220;hard evidence&#8221;.</p>
<p>Fourthly: Check <a href="http://www.google.com/webmasters/seo.html">Google&#8217;s SEO recommendations</a> (and, yes, even Google gets the &#8220;I visited your website&#8230;&#8221; style emails: I know I&#8217;ve received some for my own sites and yet the site is doing really well in the main search engines).</p>
<p>Fifthly: Find out which search engines they target. 10? 20? 200? 2,000? Here&#8217;s a secret: whilst there may be in the region of 5,000 &#8220;search engines&#8221; &#8211; they aren&#8217;t all unique. Most either get their results from Google, <a href="http://www.av.com/">Altavista</a>, <a href="http://www.inktomi.com/" class="broken_link">Inktomi</a> or Pay Per Click services such as <a href="http://www.espotting.com/">Espotting</a> or <a href="http://www.overture.com/">Overture</a>. We target between 30 and 50 search engines as we know that they get the visitors &#8211; what&#8217;s the point of paying a premium to get listed on an extra 1,500 search engines which are lucky to have a search per day?</p>
<p>Sixthly: Choose wisely. Does the search engine optimization company need to advertise? We rely on word of mouth and our search engine positions to get us customers &#8211; why does &#8220;company X&#8221; need to take out pay per click listings at $8 per click or great big banner advertisements? And where is that money for the adverts coming from?&#8230;</p>
<p>Seventhly: What are they offering? &#8220;Search engine registration&#8221; is good &#8211; but all they are doing is submitting your site to the search engines and that&#8217;s it. There are many programs which will do this for FREE! Do they actually offer &#8220;positions&#8221; or just &#8220;a listing&#8221;? Do they optimize your own site (which means make changes to it to make it more search engine friendly) or do they set up a gateway site which they own and promote that? If the latter, what happens when your contract expires: yep, they continue to get YOUR traffic.</p>
<p>Eighthly: Guarantee levels. We are willing to guarantee our work (with either a money back guarantee or a &#8220;pay on performance&#8221; style scheme) as we know we&#8217;ll get the positions for the customer. We give fixed guarantee levels (&#8220;we will get you X top ten results&#8221;) &#8211; but no vague &#8220;we will increase your presence&#8221; promises: how much will they increase your presence in the search engine? 1 place? 10? or actually get you near the top?</p>
<p>Ninthly: Contract length. When we sign up with a client, we aim to make the relationship last a minimum of a year (although the client is &#8220;free to walk&#8221; at anytime). Be wary of companies which offer a &#8220;one off fee&#8221; or &#8220;no contract&#8221; as what will happen in 3 months time when your positions start to drop after the search engines change their algorithms (a large portion of our time is taken up just following all the major changes to the search engines &#8211; did you know Google alone uses over 100 different variables to even do a &#8220;basic ranking&#8221; of a website). If a client starts to slip in the positioning, we reoptimize before it&#8217;s too late.</p>
<p>Tenthly: Link backs/PR building. It&#8217;s become very important in the past couple of years (especially since Google came on the scene) to have good, high quality links back to your site &#8211; will the company you are paying work on getting these links for you? And will they be &#8220;reciprocal&#8221; (which means your homepage could get dotted with &#8220;buttons&#8221; for many many other sites &#8211; oh, and reciprocal links, if detected, tend to be &#8220;downgraded&#8221; for the search engines).</p>
<p>Finally: Cost. How much are they charging? Our lower end prices start at less than &pound;200 per year, but then depending on the guarantee level and the competitiveness of the market it can go up to &pound;10,000! (It&#8217;s a lot easier to promote a site about &#8220;the inside of the knee&#8221; targeting a small village, than a mortgage or p-rn site aiming for international rankings. We actually won&#8217;t touch &#8220;adult sites&#8221; as it&#8217;s doubtful we&#8217;d get the client a very good return on investment.)</p>
<p>That&#8217;s my advice (even though this was only meant to be a &#8220;one paragraph&#8221; post), and I hope somebody finds it useful (the stories we hear about clients being ripped off by rival companies are scary!). If you are interested in getting your website optimized by the company I work for, then just leave an appropriate comment and I&#8217;ll pass the details on. Why no link to them? Well, it&#8217;ll be shameless self-promotion for starters and anyway, we&#8217;re doing extremely well in the search engines as it is <img src='http://blog.rac.me.uk/wp-includes/images/smilies/icon_wink.gif' alt=';)' class='wp-smiley' /> </p>
]]></content:encoded>
			<wfw:commentRss>http://blog.rac.me.uk/2003/08/04/search-choosing-a-good-search-engine-optimization-company/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Search: New Microsoft Search</title>
		<link>http://blog.rac.me.uk/2003/04/20/search-new-microsoft-search/</link>
		<comments>http://blog.rac.me.uk/2003/04/20/search-new-microsoft-search/#comments</comments>
		<pubDate>Sun, 20 Apr 2003 20:26:33 +0000</pubDate>
		<dc:creator>Richy C.</dc:creator>
				<category><![CDATA[Net: Search Engines]]></category>
		<category><![CDATA[microsoft]]></category>
		<category><![CDATA[microsoft search]]></category>
		<category><![CDATA[msn]]></category>
		<category><![CDATA[msn search]]></category>

		<guid isPermaLink="false">http://blog.rac.me.uk/?p=426</guid>
		<description><![CDATA[At the start of April, a Reuters news article came out which quoted Bob Visse, director of Marketing for MSN: We do view Google more and more as a competitor. We believe that we can provide consumers with a better product and a better user experience. Sounds ominous doesn&#8217;t it? Many people expected Microsoft to [...]]]></description>
			<content:encoded><![CDATA[<p><img alt="Microsoft Search" src="http://blog.rac.me.uk/photos/2003/04/microsoftgoogle.jpg" width="110" height="110" border="0" align="left" />At the start of April, a <a href="http://www.reuters.com/newsArticle.jhtml?type=internetNews&amp;storyID=2497128">Reuters news</a> article <a href="http://slashdot.org/article.pl?sid=03/04/02/2251204&amp;mode=nested&amp;tid=109&amp;tid=95">came out</a> which quoted Bob Visse, director of Marketing for MSN:</p>
<blockquote><p>We do view Google more and more as a competitor. We believe that we can provide consumers with a better product and a better user experience.</p></blockquote>
<p>Sounds ominous doesn&#8217;t it? <a href="http://www.webmasterworld.com/forum3/11160.htm">Many people</a> expected Microsoft to therefore create their own search engine (instead of just using Looksmart, Inktomi and Direct Hit), but it seems things have happened a bit quicker than expected!</p>
<p>Yep, a <a href="http://www.bitter-girl.com/archives/000776.html" class="broken_link">few people</a> have <a href="http://www.webmasterworld.com/forum11/1832.htm" class="broken_link">noticed</a> a <a href="http://www.microdocs-news.info/newsGoogle/2003/04/18.html">new robot</a> or crawler <a href="http://www.seochat.com/viewtopic.php?t=1318" class="broken_link">indexing the</a> internet and all signs point back to Microsoft at the moment.</p>
<p>Whilst it hasn&#8217;t yet hit my blog, I have been hit by it on one of my other sites with the following details:</p>
<blockquote><p>131.107.163.49 - - [20/Apr/2003:12:54:56 +0100] &quot;GET /robots.txt HTTP/1.1&quot; 200 763 &quot;-&quot; &quot;MicrosoftPrototypeCrawler (please report obnoxious behaviour to newbiecrawler@hotmail.com)&quot;</p></blockquote>
<p>The IP address 131.107.163.49 falls within the 131.107.0.0-131.107.255.255 (in other words a 131.107.0.0/16) netblock which is allocated to a certain Microsoft Corp of One Microsoft Way, Redmond, Washington, 98052, USA.</p>
<p>Using that information, I was then able to look at the logs again and saw quite a few page requests (I stopped counting after the 200th request made in the first 9 hours of today) from the IP address 131.107.65.225 (also owned by Microsoft) with the &#8220;Browser User-Agent&#8221; of &#8220;Mozilla/4.0+(compatible;+MSIE+6.0;+Windows+NT+5.2;+.NET+CLR+1.1.4322)&#8221;.</p>
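<p>Checking whether an address sits inside a netblock is easy to script if you have a lot of log entries to go through. A minimal Python sketch using the standard ipaddress module follows; the function name is mine, and the third address is not from my logs but is added purely as an example of one that falls outside the block.</p>

```python
import ipaddress

# Netblock quoted in the post above.
MICROSOFT_NETBLOCK = ipaddress.ip_network("131.107.0.0/16")


def in_netblock(address, network=MICROSOFT_NETBLOCK):
    """Return True when the dotted-quad address lies inside the network."""
    return ipaddress.ip_address(address) in network


# The first two addresses come from the log entries above; the last one is a
# made-up address just outside the /16 for contrast.
for ip in ("131.107.163.49", "131.107.65.225", "131.108.0.1"):
    print(ip, in_netblock(ip))
```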
<p>So, it would appear Microsoft <b>has</b> launched a new spider/robot out on the Internet and its name is <b>MicrosoftPrototypeCrawler</b>, but Microsoft want to keep it slightly quiet for now by mostly hiding the user-agent string (which states what sort of computer and web browser you are using) as being Microsoft Internet Explorer 6 on Windows NT 5.2 (Windows XP claims to be Windows NT 5.1, so I would guess the new crawler is pretending to be on Windows .NET or 2003).</p>
<p>It remains to be seen whether the results from the crawler will be made public (or whether they are just for internal Microsoft development for some reason), and what effect it&#8217;ll have on the Internet and the way people search &#8211; especially considering that according to <a href="http://www.alexa.com/site/ds/top_sites?ts_mode=global&amp;lang=none">Alexa Research</a>, MSN.com is the 2nd most popular site world wide (Google is only 5th). But I&#8217;m wondering why MSN/Microsoft is so concerned about trying to semi-hide the crawler for now and why they are using a @hotmail.com address instead of a @microsoft.com one (the former doesn&#8217;t really give a lot of &#8220;respect&#8221; on the internet due to the fact anybody can get them for free).</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.rac.me.uk/2003/04/20/search-new-microsoft-search/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Search: Changes, Talks and Flames</title>
		<link>http://blog.rac.me.uk/2003/01/30/search-changes-talks-and-flames/</link>
		<comments>http://blog.rac.me.uk/2003/01/30/search-changes-talks-and-flames/#comments</comments>
		<pubDate>Thu, 30 Jan 2003 23:54:04 +0000</pubDate>
		<dc:creator>Richy C.</dc:creator>
				<category><![CDATA[Net: Search Engines]]></category>

		<guid isPermaLink="false">http://blog.rac.me.uk/?p=304</guid>
		<description><![CDATA[Warning: extremely long post (2,838 words)! Well, a lot has been happening in the &#8220;World of the Open Directory Project&#8221; (a.k.a. DMoz) in the last couple of weeks. First of all, because of server load issues the internal editor forums have been moved to a new server (yippee!) that authenticates with the main server to [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://blog.rac.me.uk/photos/2003/01/roboodp.html"><img src="http://blog.rac.me.uk/photos/2003/01/roboodp-thumb.jpg" width="110" height="83" border="0" align="left" alt="[Robozilla and the ODP]" /></a>Warning: extremely long post (2,838 words)!</p>
<p>Well, a lot has been happening in the &#8220;World of the Open Directory Project&#8221; (a.k.a. <a href="http://dmoz.org/">DMoz</a>) in the last couple of weeks.</p>
<p>First of all, because of server load issues the internal editor forums have been moved to a new server (yippee!) that authenticates with the main server to ensure only valid users can log in. Good idea, but it&#8217;s had a few teething troubles (boo!).</p>
<p>Secondly, to help reduce the load on the main part of the ODP, editors have been given a &#8220;special&#8221; port number on which to connect to edit (hopefully reducing some of the overloading issues on the Apache webserver) &#8211; all good. Except if you are behind a corporate firewall and they block that port number <img src='http://blog.rac.me.uk/wp-includes/images/smilies/icon_sad.gif' alt=':(' class='wp-smiley' /> </p>
<p>Thirdly, the &#8220;mirror server&#8221; at <a href="http://ch.dmoz.org/">http://ch.dmoz.org/</a> (which is hosted by a fellow editor in Zurich, Switzerland) seems to have &#8220;taken off&#8221; a bit and is being used by a larger number of people now (mainly as it&#8217;s a lot faster) &#8211; it&#8217;s transferring around 19GB of data a month.</p>
<p>Fourthly, the ODP staff members have managed to produce &#8220;a&#8221; copy of the RDF dump. The RDF dump is, in fact, a big big big file which contains the URLs, titles and descriptions of all the (nearly) 4 million sites listed in the ODP. Due to a large number of technical issues, this dump hasn&#8217;t been correctly produced since September last year. The RDF dump is usually downloaded by organisations such as &#8216;Google&#8217; to produce localised copies of the ODP (for instance the &#8220;PR enhanced&#8221; listings at the <a href="http://directory.google.com/">Google Directory</a>). ODP staff have (this week in fact) managed to produce an RDF dump which is available via <a href="http://rdf.dmoz.org/">rdf.dmoz.org</a>: there&#8217;s only a slight problem. It doesn&#8217;t contain &#8220;catid&#8221;s (unique category identifier numbers) &#8211; this is because these numbers got &#8220;clobbered&#8221; during the technical problems and so ODP staff are having to manually correct these database problems. Hopefully they&#8217;ll be fixed soon &#8211; but at least the ODP search has now been updated (since that uses the RDF dump) and there is an RDF dump for others to download and play with (which I&#8217;m intending to do this weekend).<br />
<span id="more-304"></span><br />
Fifthly, one of the founding members of the ODP (Rich <a href="http://www.skrenta.com/">Skrenta</a>) has recently given a talk about the early years of the ODP to &#8220;The Internet Developer Group&#8221;. The talk is available for viewing in a series of large JPEG files in an <a href="http://www.inetdevgrp.org/20030121/">online presentation</a> &#8211; but for speed of access, I&#8217;ve transcribed them here:</p>
<ol>
<li>
<blockquote><strong>Genesis Of The Open Directory Project</strong><br />
Rich Skrenta</p>
<p>January 21, 2003</p></blockquote>
</li>
<li>
<blockquote><strong>March 1998</strong></p>
<ul>
<li>Work project was winding down</li>
<li>Going up and down Sand Hill road trying to get a web-calendar startup funded</li>
<li>Read Danny Sullivan&#8217;s report on Yahoo&#8217;s listing problems on Search Engine Watch</li>
</ul>
</blockquote>
</li>
<li>(image of Danny Sullivan&#8217;s Search Engine Watch&#8217;s &#8220;Yahoo Special Report&#8221; from <a href="http://www.searchenginewatch.com/sereport/97/09-yahoo.html" class="broken_link">http://www.searchenginewatch.com/sereport/97/09-yahoo.html</a>)</li>
<li>(image of Wired News&#8217;s &#8220;Does Yahoo Still Yahoo?&#8221; article from <a href="http://www.wired.com/news/print/0,1294,10236,00.html">http://www.wired.com/news/print/0,1294,10236,00.html</a>)</li>
<li>
<blockquote><strong>Idea for GnuHoo</strong></p>
<ul>
<li>Yahoo seemed to be ignoring their core asset &#8211; the directory</li>
<li>How could we build a competitor?</li>
<li>Didn&#8217;t want to pay an editorial staff &#8211; even a cheap one</li>
<li>Tequila + Brainstorming = GnuHoo</li>
</ul>
</blockquote>
</li>
<li>
<blockquote><strong>Idea for GnuHoo</strong></p>
<ul>
<li>Use volunteer editors to build a web directory like Yahoo&#8217;s</li>
<li>Volunteers would do a better job than paid generalists, since they would be experts about their area &amp; have a personal interest</li>
<li>Restrict editors to sub-branches of the directory, to limit the harm they could do</li>
</ul>
</blockquote>
</li>
<li>
<blockquote><strong>Original Goals</strong></p>
<ul>
<li>Thought if we could reach 1,000 editors the directory would be successful</li>
<li>Bootstrap problem was key &#8211; how to get the first 10,000 sites. The directory had to look &#8220;real&#8221; from Day 1</li>
<li>Figured we needed 1M sites for a competitive directory</li>
<li>Original get-off-the-couch motivational goal: We told ourselves that if we could get a story in <i>Wired</i> out of the effort, it would be worth doing</li>
</ul>
</blockquote>
</li>
<li>
<blockquote><strong>&#8220;Seed&#8221; Problem</strong></p>
<ul>
<li>Needed a hierarchy &amp; 10,000 sites to launch the directory</li>
<li>Briefly considered Dewey Decimal
<ul>
<li>good thing we didn&#8217;t, it&#8217;s not free</li>
<li>didn&#8217;t seem to fit the web</li>
</ul>
</li>
<li>Original GnuHoo hierarchy mirrored Usenet</li>
</ul>
</blockquote>
</li>
<li>(shows how various USENET groups mapped to the relevant GnuHoo categories)</li>
<li>(image of the &#8220;Original Homepage Mock-Up&#8221;)</li>
<li>
<blockquote><strong>Category Bootstrapping</strong></p>
<ul>
<li>Scanned URLs mentioned in newsgroups to find seed sites for the corresponding directory category</li>
<li>This yielded something that looked pretty good at a casual glance</li>
<li>&#8230;but a lot of the original seed URLs were bad sites or placed in the wrong category</li>
<li>The first editor in a category simply had to delete or move the bad entries, which left behind a good category</li>
</ul>
</blockquote>
</li>
<li>
<blockquote><strong>Coding &amp; Launch</strong></p>
<ul>
<li>Coded from April-June, 1998</li>
<li>Perl cgi and flat files</li>
<li>Simple HTML forms to add/edit/delete websites in the directory</li>
<li>Web pages served from static HTML files in a directory tree</li>
<li>HTML files regenerated whenever an edit was made</li>
</ul>
</blockquote>
</li>
<li>
<blockquote><strong>Simple Flat File Format</strong><br />
u: http://www.newhoo.com/<br />
t: NewHoo!<br />
d: The largest human-edited directory of the web<br />
c: Computers/Internet/Web_Directories</p></blockquote>
</li>
<li>
<blockquote><strong>Minimalist Design</strong></p>
<ul>
<li>Minimal locking, last-writer-wins semantics
<ul>
<li>flock() only used for category counts</li>
</ul>
</li>
<li>Write-with-append, rename() only safe operations</li>
<li>No big database</li>
<li>A few DBM files for minor stuff</li>
</ul>
</blockquote>
</li>
<li>
<blockquote><strong>Coding &amp; Launch</strong></p>
<ul>
<li>Used publicly-available software for keyword search of the directory; Originally Glimpse, later Isearch</li>
<li>First ran on BSDI, later moved to Linux
<ul>
<li>filesystem progression: ufs, ext2, vxfs</li>
</ul>
</li>
<li>Launched June 5, 1998</li>
<li>Acquired by Netscape in October, 1998</li>
</ul>
</blockquote>
</li>
<li>(image of the original NewHoo homepage)</li>
<li>(image of the Wired News &#8220;The Distributed Yahoo: &#8216;NewHoo&#8217;&#8221; news article from <a href="http://www.wired.com/news/print/0,1294,13625,00.html">http://www.wired.com/news/print/0,1294,13625,00.html</a>)</li>
<li>
<blockquote><strong>Early Press was Key to Growth</strong></p>
<ul>
<li>About 1% of the visitors to NewHoo applied to become editors</li>
<li>Some fraction of those would be accepted</li>
<li>The more traffic we got, the more editors we would get</li>
<li>We grubbed around for any hits we could in the beginning</li>
<li>Initial Slashdot, Netly, Wired, Red Herring stories were vital traffic sources</li>
<li>No matter what the story said, &#8220;Just spell our URL right&#8221;</li>
</ul>
</blockquote>
</li>
<li>(image of the &#8220;About the Open Directory Project&#8221; page from <a href="http://ch.dmoz.org/about.html" class="broken_link">http://ch.dmoz.org/about.html</a>)</li>
<li>
<blockquote><strong>Social Design of NewHoo</strong></p>
<ul>
<li>Not a free-for-all links page &#8211; every editor had to apply &amp; be approved</li>
<li>Every edit logged and possible to undo</li>
<li>Hierarchy of editors, with senior ones keeping an eye on the new ones</li>
<li>Emergent editing guidelines, enforced with peer review</li>
</ul>
</blockquote>
</li>
<li>
<blockquote><strong>Why Did You Apply to be a NewHoo Editor?</strong><br />
&#8220;There is a link to my old warwick uni account that has been dead for two years. As editor I could change it.&#8221;</p></blockquote>
</li>
<li>
<blockquote><strong>Why Did You Apply to be a NewHoo Editor?</strong><br />
&#8220;I&#8217;m already building Linux indexes and sites, better to have them all nicely integrated in computers/software/linux&#8221;</p></blockquote>
</li>
<li>
<blockquote><strong>Why Did You Apply to be a NewHoo Editor?</strong><br />
&#8220;We already maintain a site called CoinLink which lists over 800 coin related sites. We know the coin industry and could easily assist in building and maintaining this section of the index.&#8221;</p></blockquote>
</li>
<li>
<blockquote><strong>Why Did You Apply to be a NewHoo Editor?</strong><br />
&#8220;You have no category in Recreation/Collecting that focuses on Christmas ornament collecting. Ornament collecting is one of the fastest growing hobbies. I&#8217;ve collected ornaments for 25 years and feel I know many of the &#8220;best&#8221; web sites dealing with this subject.&#8221;</p></blockquote>
</li>
<li>
<blockquote><strong>Motivations to Edit</strong></p>
<ul>
<li>Same urge that makes you straighten a crooked picture you see on the wall</li>
<li>People were maintaining link lists on their own manually; they could do so more easily with NewHoo&#8217;s web forms</li>
<li>Didn&#8217;t need to see the whole directory finished to have their category be useful</li>
<li>&#8230;but knowing they were helping to build the pyramid was a warm fuzzy</li>
</ul>
</blockquote>
</li>
<li>
<blockquote><strong>Directory Editing is Amenable to Incremental Effort</strong></p>
<ul>
<li>First editor finds a good site and adds it</li>
<li>Second fixes a typo in the description</li>
<li>Third editor moves it to a more appropriate category</li>
<li>Fourth editor later notices the site moved and fixes the URL</li>
<li>Not as hard as writing device drivers; many can help</li>
<li>If you ask too much, results fall off quickly</li>
</ul>
</blockquote>
</li>
<li>
<blockquote><strong>The Free Use License</strong></p>
<ul>
<li>Netscape offered the data from the ODP under a free-use license</li>
<li>Directory data was adopted by Lycos, AltaVista, Google and other search engines</li>
<li>Only requirement was that the Add URL link point back to dmoz.org
<ul>
<li>helped keep dmoz authoritative &amp; prevent forks</li>
</ul>
</li>
</ul>
</blockquote>
</li>
<li>
<blockquote><strong>GnuHoo -&gt; NewHoo -&gt; ODP</strong></p>
<ul>
<li>FSF objected to the &#8220;Gnu&#8221;</li>
<li>Yahoo objected to the &#8220;Hoo&#8221;</li>
<li>Netscape renamed it to the Open Directory Project and hosted it on directory.mozilla.org</li>
<li>directory.mozilla.org was too long to type, so we shortened it to dmoz.org</li>
</ul>
</blockquote>
</li>
<li>
<blockquote><strong>Robozilla</strong></p>
<ul>
<li>Lloyd Tabb wrote a crawler to visit every site in the ODP to see if it was 404/301/302</li>
<li>Didn&#8217;t take action on its own, but alerted editors to potentially bad or moved sites</li>
<li>Brought bad sites in the ODP down to 0.25%</li>
<li>Our crawl of Yahoo showed 8% bad links</li>
</ul>
</blockquote>
</li>
<li>
<blockquote><strong>&#8220;That&#8217;s a Problem We Want to Have&#8221;</strong></p>
<ul>
<li>Design decisions were made in the interest of expediency. Why invest more time in the infrastructure if the site never takes off?</li>
<li>Still running much of the 1.0 code today, over 4 years later</li>
<li>Zillions of flat files in a gigantic VXFS filesystem</li>
<li>Were we wrong? No, I don&#8217;t think so</li>
</ul>
</blockquote>
</li>
<li>
<blockquote><strong>The ODP Won</strong></p>
<ul>
<li>55,000 total editors, probably 10,000 active</li>
<li>3.4M sites, 460K categories</li>
<li>Largest human-created taxonomy ever</li>
<li>Several times larger than competitors</li>
<li>Cited in 83 academic research papers<br />
(source: citeseer.nj.nec.com)</li>
</ul>
</blockquote>
</li>
<li>
<blockquote><strong>The ODP &#8220;Won&#8221;</strong><br />
&#8230;but directories no longer scale to the web for users:
<ul>
<li>small web: use a directory</li>
<li>big web: use keywords</li>
</ul>
<p>Everyone uses Google &#58;-&#41;</p></blockquote>
</li>
<li>
<blockquote><strong>&#8220;Lost Ark&#8221; Ending?</strong></p>
<ul>
<li>The traffic &amp; validation provided by Netscape was key to the ODP&#8217;s success</li>
<li>Possible future: lost server in an ops farm</li>
<li>What new idea can take the ODP to the next level?</li>
</ul>
</blockquote>
</li>
</ol>
<p>And, to round off this super long post &#8211; I&#8217;ve been flamed! Yep, the following was posted over at <a href="http://www.resource-zone.com/">Resource-Zone</a> by user &#8220;odpobserver&#8221; at 30/01/03 08:16 PM:</p>
<blockquote><p>&gt;&gt;we do not care if Google lists your site because we do or decided to exclude it. All we care about is the content on <a href="http://dmoz.org/">http://ch.dmoz.org/</a> . </p>
<p>Beebware,</p>
<p>Let me get this straight. You are an SEO/&#8220;unpaid ODP editor&#8221; and you dont care that Google uses the ODP. I would suggest that you do care, because that is how you would be successful in fulfilling your services of SEO consultant. Getting your clients in prime ODP categories would greatly enhance your performance in Google!</p>
<p>Just what services do you provide? When a client is unable to achieve ODP listing, do you negotiate behind closed doors with other &#8220;unpaid&#8221; ODP editors to obtain ODP inclusion? Do you consult the internal ODP forums unavailable to the submitter to obtain information that you provide the submitter for a fee?</p>
<p>Beebware this is a good deal for you, isn&#8217;t it. Unpaid volunteer, right!</p>
<p>Your credibility is in severe question when you have such a blatant conflict of interest!</p>
<p>It is not only important to avoid a conflict of interest, it is important to avoid the appearance of a conflict of interest. Like so many &#8220;unpaid&#8221; ODP editors/SEO consultants, you have avoided neither!</p></blockquote>
<p>My response to this was to basically re-iterate what I said <a href="http://blog.rac.me.uk/archives/000211.html">in an earlier blog entry</a>, but in a terse and &#8220;pointed&#8221; manner (mainly to ensure it was understood): (posted at 30/01/03 08:57 PM)</p>
<blockquote><p>First of all, I&#8217;ve been an ODP editor for over 3 years but an SEO for less than a month.</p>
<p>Secondly, my editor logs are open to all editors to view if there are ANY allegations of abuse against me.</p>
<p>Thirdly, I have declared (again viewable by all editors) any and ALL sites I have ANY connections with (past and present employers, sites I&#8217;ve designed, sites I&#8217;ve promoted etc etc) to be open about these purposes.</p>
<p>Fourthly, the company that I am employed by does NOT guarantee listings in ODP and (as I made clear at the interview) I will not compromise my editor position by placing our clients in the ODP. I actually thought I wouldn&#8217;t get the job because I wouldn&#8217;t compromise my position as an editor &#8211; but my new boss was totally understanding and hasn&#8217;t once even hinted at something like that. He did have an enquiry today about why a certain site isn&#8217;t listed in the ODP and I told him that it was likely because it was extremely similar to another site also actually owned by us. I have not and WILL NOT compromise my editor position. If you have ANY evidence at all that I have, please feel free to report it to a meta editor and they will remove me from the ODP.</p>
<p>Fifthly, a good SEO company can get high rankings in Google WITHOUT an ODP listing. Yes, most people think it is &#8220;essential&#8221; to be listed in the ODP to get high &#8220;PR&#8221; value &#8211; but the crux of the matter is, there&#8217;s a lot more to it than that. My employer owns and operates around half a dozen sites &#8211; only one of those is listed on the ODP (I have not edited _any_ of them except to add a note to the sites to indicate that I am affiliated/connected with them) &#8211; but all of the sites appear on the first page of results on Google for the targeted key phrases.</p>
<p>Yes, you could argue there _could be_ a conflict of interest: but only if I let there be. I have, in fact, actually REMOVED one of our clients&#8217; sites from the ODP (as it was a doorway site and should never have been listed in the first place) &#8211; I aim (as all editors should do) to treat all sites equally: if you feel that I haven&#8217;t, then (again) please report it to a meta editor and they will remove my editing rights in the ODP.</p>
<p>Next time you start throwing allegations about, please ensure that you have at least a minimal amount of proof to back them up&#8230;</p></blockquote>
<p>However, the moderators of the forum decided that the post was not appropriate to the forum and so they deleted odpobserver&#8217;s post and my reply (all in line with the <a href="http://www.resource-zone.com/guidelines.php">forum guidelines</a> where it states &#8220;Complaints about specific people working or volunteering their time at ODP.&#8221; / &#8220;Discussion of the ways in which ODP runs itself.&#8221; / &#8220;Discussion of how to use the ODP to optimize search engine rankings and site promotion. This ODP is not a search engine, and we don&#8217;t rank or optimize web sites.&#8221;)</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.rac.me.uk/2003/01/30/search-changes-talks-and-flames/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Search: Do The Google Dance</title>
		<link>http://blog.rac.me.uk/2003/01/28/search-do-the-google-dance/</link>
		<comments>http://blog.rac.me.uk/2003/01/28/search-do-the-google-dance/#comments</comments>
		<pubDate>Tue, 28 Jan 2003 21:06:45 +0000</pubDate>
		<dc:creator>Richy C.</dc:creator>
				<category><![CDATA[Net: Search Engines]]></category>

		<guid isPermaLink="false">http://blog.rac.me.uk/?p=298</guid>
		<description><![CDATA[Kuro5hin has an interesting article about what the &#8220;Google Dance&#8221; is and how it affects your ranking on the world&#8217;s most popular search engine. Long story short: Dance equals Data. Servers. Moving. New Results. A more complete answer is that the &#8220;Google Dance&#8221; is the nickname that has been given to the time of the [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://blog.rac.me.uk/photos/2003/01/googledance.html"><img src="http://blog.rac.me.uk/photos/2003/01/googledance-thumb.jpg" width="100" height="100" border="0" align="left" alt="[Google Dancer]" /></a>Kuro5hin has an <a href="http://www.kuro5hin.org/story/2003/1/27/221829/873">interesting article</a> about what the &#8220;Google Dance&#8221; is and how it affects your ranking on the world&#8217;s most popular search engine.</p>
<p>Long story short: Dance equals Data. Servers. Moving. New Results.</p>
<p>A more complete answer is that the &#8220;<a href="http://www.google-dance.com/HTML-about.html" class="broken_link">Google Dance</a>&#8221; is the nickname that has been given to the time of the month (usually around the 28th) when the data gathered by &#8220;<a href="http://www.google.com/bot.html">GoogleBot</a>&#8221; (<a href="http://www.google.com/">Google</a>&#8217;s little spider/robot that goes round &#8216;reading&#8217; the web) is introduced into the system. However, since Google has over <a href="http://www.google.com/press/highlights.html" class="broken_link">10,000 servers</a> it does take some time for the data to propagate around (&#8220;propagate&#8221; has now become my favourite and most used word for some reason). It has long been known that the start of the &#8220;Dance&#8221; can be found by watching when the data on the <a href="http://www2.google.com/" class="broken_link">www2</a> and <a href="http://www3.google.com/" class="broken_link">www3</a> servers starts &#8216;reading differently&#8217; from that on the main Google server (an illustration in the Kuro5hin article is to do a query for links to Yahoo!).<br />
<span id="more-298"></span><br />
One thing that has always bothered me about this is &#8211; obviously the main www server at Google points at multiple machines and it is extremely likely it is the same situation for www2+3, so why has Google separated these blocks? Surely it&#8217;d make more sense to allow zero public access to the 2+3 data centres and just let the data spread across the servers without the public&#8217;s knowledge. Why do &#8220;we&#8221; need to know when a dance starts?</p>
<p>Well, I asked the question, now I&#8217;ll answer it. A good reason for noticing when a dance starts is so that search engine optimisers/optimizers and search engine placement specialists can look at the new Google search results as soon as they become available and see if any changes in the Google <a href="http://www.google.com/">PageRank</a> algorithm have taken place. If site X drops in the rankings but site Y rises, find out why and fix it before the next visit of Googlebot. I personally know of a number of changes that could really wreak havoc across the SEO field when Google puts them in place (yes, I did say &#8220;when&#8221; and not &#8220;if&#8221; &#8211; I&#8217;m aware that Google knows about these &#8220;tricks of the SEO trade&#8221; and are just finalising how to work around them &#8211; it&#8217;ll probably be a couple more months before the code is in place though). Google employees are quite rightly proud of the technology behind the &#8216;Big G&#8217; and try not to mention what takes place or is planned, but sometimes you&#8217;ve just got to pay attention to what <b>isn&#8217;t</b> being said rather than what is.</p>
<p>Google is, at the time of writing, just coming to the end of one of its dances and I&#8217;ve already noticed a few minor changes. First of all, Google&#8217;s <a href="http://images.google.com/">image search</a> doesn&#8217;t seem as comprehensive as it was, they seem to have penalised a few more irrelevant sites (i.e. sites that included keywords somewhere on the page that had no relation to the content) and they also seem to have added a little extra &#8216;weighting&#8217; to sites from certain sources. It doesn&#8217;t look like a major change, just a few minor tweaks.</p>
<p>Oh &#8211; the good news is that the Google &#8220;FreshBot&#8221; seems to really like my blog. The &#8220;FreshBot&#8221; is the name for the Google crawler/spider/robot that visits sites more often than the usual &#8220;once a month&#8221;. FreshBot is updating Google with the contents of my blog at least once a week &#8211; sometimes as often as every other day! Go me!</p>
<p>There are also a couple of interesting blogs about Google that you may like to read: <a href="http://googlevillage.info/" class="broken_link">Google Village</a> and <a href="http://google.blogspace.com/">Google Weblog</a>. You may also find <a href="http://www.webweavertech.com/ovidiu/weblog/">Ovidiu Predescu&#8217;s</a> weblog worth keeping an eye on as he&#8217;s recently started working for &#8216;G&#8217;.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.rac.me.uk/2003/01/28/search-do-the-google-dance/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>Search: Meetup Arrangements Progressing</title>
		<link>http://blog.rac.me.uk/2003/01/14/search-meetup-arrangements-progressing/</link>
		<comments>http://blog.rac.me.uk/2003/01/14/search-meetup-arrangements-progressing/#comments</comments>
		<pubDate>Tue, 14 Jan 2003 22:15:38 +0000</pubDate>
		<dc:creator>Richy C.</dc:creator>
				<category><![CDATA[Net: Search Engines]]></category>

		<guid isPermaLink="false">http://blog.rac.me.uk/?p=231</guid>
		<description><![CDATA[As hinted at in a previous entry the arrangements for the 2003 Open Directory Project UK editors real life meetup (phew &#8211; what a mouthful!) are progressing. We&#8217;ve already got a likely date (that I suggested ) and voting has already started on the likely locations &#8211; Cambridge, Oxford, Leeds and Bournemouth are the most [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://blog.rac.me.uk/photos/odpmeetupkew.html"><img src="http://blog.rac.me.uk/photos/odpmeetupkew-thumb.jpg" width="110" height="99" border="0" alt="[ODP Meetup in Kew Gardens 2001]" align="left" /></a>As hinted at in <a href="http://blog.rac.me.uk/archives/000200.html">a previous entry</a> the arrangements for the 2003 Open Directory Project UK editors real life meetup (phew &#8211; what a mouthful!) are progressing. We&#8217;ve already got a likely date (that I suggested <img src='http://blog.rac.me.uk/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' />  ) and voting has already started on the likely locations &#8211; Cambridge, Oxford, Leeds and Bournemouth are the most likely &#8211; Bradford, Gloucester, Exeter and Leicester are next. But it looks like no one wants to be sent to Coventry &#8211; it&#8217;s only got one vote so far..</p>
<p>I&#8217;ve invited the other <a href="#20020114_odpsix">6 editors</a> of <a href="http://ch.dmoz.org/Regional/Europe/United_Kingdom/England/Leicestershire/" class="broken_link">Regional/Europe/United_Kingdom/England/Leicestershire/</a> (and, yes, I have memorised that nice long URL &#8211; hence why I keep on quoting it on places such as <a href="http://www.resource-zone.com/">Resource-Zone</a>) to the internal forum thread discussing it &#8211; bringing the total of editors contacted regarding it to over 100. The attendance last year was reasonable at 13 editors (location was the Briar Rose pub in Birmingham), but hopefully we&#8217;ll have a few more come this year.</p>
<p>If you are a UK ODP editor reading this (I know at least 3 people have found this blog via my private <a href="http://dmoz.org/profiles/beebware.html" class="broken_link">ODP editor profile</a> &#8211; the private ones are accessible to &#8216;editors only&#8217; and only contain a few little extra nuggets of information), then log in to the ODP (forgot your password? Then <a href="http://ch.dmoz.org/cgi-bin/forgot.cgi" class="broken_link">get a password reminder</a>!) and pop over to the internal forum &#8220;Penguin Cafe&#8221; and have a read of &#8220;A new year, a new UK editor get-together&#8221;.</p>
<p><a name="20020114_odpsix">Yes,</a> there are only 6 listed editors for the whole of Leicestershire &#8211; over 1,180 sites &#8211; but there are fewer than half-a-dozen unreviewed sites in the whole of Leicestershire: the majority of those are &#8220;dead&#8221; sites (i.e. sites that are returning 404 errors, DNS is currently unable to resolve etc.) that have been moved to unreviewed until they come &#8220;alive&#8221; again. Of course, any editors of <a href="http://ch.dmoz.org/Regional/Europe/United_Kingdom/England" class="broken_link">England/</a>, <a href="http://ch.dmoz.org/Regional/Europe/United_Kingdom/" class="broken_link">UK/</a>, <a href="http://ch.dmoz.org/Regional/Europe/" class="broken_link">Europe/</a> and <a href="http://ch.dmoz.org/Regional/" class="broken_link">Regional/</a> can also edit there &#8211; along with any editall or meta editor.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.rac.me.uk/2003/01/14/search-meetup-arrangements-progressing/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

