Combating Search Engine Spam

At this time the application of Bayesian filters to search results is not often discussed, but it does show promise. One obstacle to widespread use of Bayesian filtering may be that it is considered too processing-intensive to deploy as a large-scale solution. It also requires human intervention to set up initially, since the filter has to be trained on examples of spam and non-spam pages. Another problem is that what is spam to one web surfer may be a desirable result to another, and vice versa.

But the effort may be well worth it. Think how often a search turns up junk pages with no real content: pages filled with fragments of gibberish surrounded by ads. In the case of AdSense, these sites are sometimes referred to as "MFAs" (Made For AdSense). Presumably, when the perpetrators applied for AdSense they submitted a website with content that could pass a human review, then used their shiny new publisher ID to place ads on hundreds or thousands of junk pages in link farms. It is even more frustrating to see sites with good-quality content buried ten pages deep in the search results, well below MFAs like these, brimming with ads placed for the sole purpose of tricking the unwary into clicking on one of them.

At the time this article was written, one tactic for eliminating spam from search results was to filter out duplicate content, because many MFAs consisted of little more than fragments of text copied from other sites. However, unique content is not that difficult to produce en masse. Here is some example output from a simple tool, the 'Gibberish Generator', which generates content automatically; even though it is gibberish, it would pass the uniqueness test:
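The duplicate-content filter mentioned above, and the reason reshuffled gibberish slips past it, can be illustrated with a short Python sketch. It assumes one common near-duplicate technique: comparing sets of overlapping three-word "shingles" with Jaccard similarity. The article does not say which method search engines actually used, and the function names, the 0.5 threshold and the sample sentences are all hypothetical.

```python
import re


def shingles(text: str, k: int = 3) -> set[tuple[str, ...]]:
    """Return the set of overlapping k-word sequences ("shingles") in text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """Size of the intersection divided by size of the union (1.0 if both are empty)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def is_near_duplicate(page: str, known_pages: list[str], threshold: float = 0.5) -> bool:
    """True if the Jaccard similarity of page's shingles to any known page reaches threshold.

    The 0.5 threshold is an arbitrary, hypothetical choice.
    """
    page_shingles = shingles(page)
    return any(jaccard(page_shingles, shingles(other)) >= threshold for other in known_pages)


# Hypothetical sample pages.
original = "Forklift hydraulic pumps should be inspected every six months by a qualified technician."
copied = "Cheap deals! " + original  # copied text with a little padding in front
scrambled = "Qualified pumps inspected hydraulic every technician six by forklift months."

print(is_near_duplicate(copied, [original]))     # True  (Jaccard = 11/13)
print(is_near_duplicate(scrambled, [original]))  # False (no shared shingles)
```

The scrambled sentence uses the same words as the original yet shares no three-word sequence with it, so a shingle check treats it as unique content, which is why word-salad pages can pass this kind of filter.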
In fact, a simple PHP script can create an entire site of nothing but gibberish, as demonstrated on the Gibberish Demo Site.

The holy grail of computing, true artificial intelligence, may be a long way off. Until it arrives and the "I'm Feeling Lucky" button is changed to a "Do What I'm Thinking" button, using Bayesian filtering to rate websites may be a viable method. Shown below is a mock-up of the search results for 'forklift repair', showing what Bayesian-filter-rated search engine results could look like, where a higher rating indicates a less spammy site. At the time the mock-up was created the filter was only very lightly trained, but it still identified one site among the top ten results as having 0 value to someone actually trying to get a forklift repaired!
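A Bayesian rating of the kind shown in the mock-up could be sketched, under several assumptions, as a multinomial naive Bayes classifier trained by hand on a few labeled pages (the human set-up step mentioned earlier). The class name `NaiveBayesRater`, the tiny training set and the 0 to 10 scale (10 times the probability that a page is not spam, rounded) are hypothetical choices for illustration; the article does not describe the filter's internals.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


class NaiveBayesRater:
    """Hypothetical multinomial naive Bayes page rater with add-one (Laplace) smoothing."""

    def __init__(self) -> None:
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.doc_counts = {"spam": 0, "ham": 0}

    def train(self, text: str, label: str) -> None:
        """Add one hand-labeled page ("spam" or "ham") to the model."""
        self.word_counts[label].update(tokenize(text))
        self.doc_counts[label] += 1

    def _log_score(self, words: list[str], label: str, vocab_size: int) -> float:
        """log P(label) plus the sum of smoothed log P(word | label)."""
        score = math.log(self.doc_counts[label] / sum(self.doc_counts.values()))
        total = sum(self.word_counts[label].values())
        for w in words:
            score += math.log((self.word_counts[label][w] + 1) / (total + vocab_size))
        return score

    def rating(self, text: str) -> int:
        """Return 0..10, higher meaning less spammy (assumed scale: round(10 * P(ham)))."""
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        # Skip words never seen in training; they carry no evidence either way.
        words = [w for w in tokenize(text) if w in vocab]
        diff = self._log_score(words, "spam", len(vocab)) - self._log_score(words, "ham", len(vocab))
        # P(ham | text) = 1 / (1 + exp(diff)), written so exp() cannot overflow.
        if diff > 0:
            p_ham = math.exp(-diff) / (1 + math.exp(-diff))
        else:
            p_ham = 1 / (1 + math.exp(diff))
        return round(10 * p_ham)


# Hypothetical, very lightly trained model (each class needs at least one page).
rater = NaiveBayesRater()
rater.train("cheap forklift deals click here best prices click now", "spam")
rater.train("click here for cheap loans best deals online", "spam")
rater.train("forklift hydraulic repair guide replace the pump seal", "ham")
rater.train("how to repair a forklift mast chain and hydraulic cylinder", "ham")

ad_page = "click here for cheap forklift deals"
repair_page = "forklift hydraulic pump repair guide"
print(rater.rating(ad_page))      # 0
print(rater.rating(repair_page))  # 10
```

Working in log probabilities keeps long pages from underflowing to zero, and add-one smoothing stops a single word that was seen in only one class from forcing a probability of exactly 0 or 1.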
Hope you find what you're searching for!
© 2008 Michael Thompson