Day: 4 January 2004

Search Engines: No Index Sections?

A fellow blogger has suggested introducing a tag that would stop search engines such as Google from indexing certain sections of a web page. This would be extremely handy against all the blog comment spam currently going around (I’m personally using a combination of IP blocking [like Neil] and a modification of /lib/MT/App/Comments.pm to block certain words in submitted URLs), but instead of
<!-- SearchEngine: Begin Anonymous Comment --> / <!-- SearchEngine: End Anonymous Comment -->
I would recommend something a bit more generalised such as:
<!-- robots:noindex --> / <!-- /robots:noindex -->

This would fit in with the already existing robots.txt and robots meta tag (it could also be extended to things like <!-- robots:nofollow --> for sections of content).

These tags would be used to mark sections of web page content as “not to index/search”: if a spammer does manage to add their URL to a website, but the URL appears between the <!-- robots:noindex --> and <!-- /robots:noindex --> tags, then the search engines will ignore the listing, making the spam useless for search engine placement/promotion.
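To make the idea concrete, here is a rough sketch of how an indexer might drop the marked sections before looking at a page. It is only an illustration: the markers are the ones proposed above (no search engine is known to honour them), and the function name, the regular expression and the sample page are all my own assumptions.

```python
import re

# Hypothetical markers from the proposal above; this is not an existing standard.
# The pattern matches an opening marker, any content, and the nearest closing marker.
NOINDEX_SECTION = re.compile(
    r"<!--\s*robots:noindex\s*-->.*?<!--\s*/robots:noindex\s*-->",
    re.DOTALL | re.IGNORECASE,
)


def strip_noindex_sections(html: str) -> str:
    """Return the page with every complete noindex section removed."""
    return NOINDEX_SECTION.sub("", html)


# Made-up sample page: a post, one spammed comment, and a footer.
page = (
    "<p>Post body</p>"
    "<!-- robots:noindex -->"
    '<p>Comment: <a href="http://spam.example/">cheap pills</a></p>'
    "<!-- /robots:noindex -->"
    "<p>Footer</p>"
)

indexable = strip_noindex_sections(page)
print(indexable)
```

In this sketch a section with no closing marker is left alone, so a missing end tag would not hide the rest of the page; a real crawler could just as reasonably choose the opposite.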

However, there are a number of drawbacks that I can see to introducing this to the search engine world: