Search Engines And Spam
Posted on 21 Feb 2002
by Daniel Bazac (daniel_bazac)
What is Spam?
The word "spam," as it applies to search engines (SEs), refers to any illegal technique used to improve a page's ranking in the Search Engine Results Pages (SERPs). Who decides what's illegal? The SEs, who else?
Tons of articles have been written about search engines and spam. Why the need for another article? Because, believe it or not, at the end of 2001, MOST of the major search engines are still vulnerable to MANY unethical techniques used by malicious webmasters.
Spamming techniques have been used for years. Today, many SEs say they know all the tricks and penalize those pages. But as you will see, most SEs only SAY that they will punish spammers, but in fact they DON'T.
What is Search Engine Optimization?
Search Engine Optimization — placement, positioning, ranking or whatever you want to call it — is the process of designing a web page that can be easily indexed by the SEs, improving its chances of ranking highly in the SERPs.
My Experience with Search Engines
Recently, using some major U.S. SEs, I ran a search for a web design company in New York. Let me share with you my highly disappointing experience.
The search engine results pages in most of the search engines were full of web pages that used one or more spamming techniques. The most frequently encountered illegal strategies, used by many webmasters, were:
- Keyword Stuffing
- Page Redirect
- Mirror Domains
Keyword Stuffing
Keyword stuffing means adding many relevant, and sometimes irrelevant, words to the "keywords" META tag and to the page's visible text.
1) Keywords META tag
One of the criteria some search engines use to rank pages in the results list is the presence of relevant words in the keywords META tag. From use to abuse is a small step, so some webmasters insert many words, repeating them many times in the hope that the page will rank higher. The record I found was a page that had 1,150 words (no mistake, WORDS not characters) in the keywords META tag, among which the word "design" was repeated 209 times!
What are search engines' positions regarding this spamming technique? The submission guidelines of one of the web's most important search engines state that it will "exclude submissions" with "excessive keywords". Apparently 1,150 words are not "excessive" enough, because THAT SE and many others index the page.
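To make "excessive" concrete, here is a minimal Python sketch of how one could count the words in a page's keywords META tag. It is an illustration only, not how any search engine actually measures stuffing; the function name keyword_meta_stats and the sample page are made up for this example.

```python
from collections import Counter
from html.parser import HTMLParser
import re


class KeywordsMetaParser(HTMLParser):
    """Collects the content of every <meta name="keywords"> tag."""

    def __init__(self):
        super().__init__()
        self.keywords_content = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() == "keywords":
            self.keywords_content.append(attr_map.get("content", ""))


def keyword_meta_stats(html):
    """Return (total word count, Counter of words) for the keywords META tag."""
    parser = KeywordsMetaParser()
    parser.feed(html)
    words = re.findall(r"[a-z0-9]+", " ".join(parser.keywords_content).lower())
    return len(words), Counter(words)


# Hypothetical sample page, not one of the pages from my search.
sample = '<head><meta name="keywords" content="design, web design, design"></head>'
total, counts = keyword_meta_stats(sample)
print(total, counts.most_common(1))  # 4 [('design', 3)]
```

Where to draw the line between "many" and "excessive" is a judgment call; the sketch only produces the numbers.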
Besides stuffing words in the keywords META tag, some webmasters also add lots of words in the visible text.
2) Visible page
Another criterion some search engines use to rank sites in the results list is the so-called "word frequency": the more times a word is repeated in the content of a page, the higher the chances it will appear near the top of the results list.
Some webmasters abuse this criterion by repeating words or phrases many, MANY times, usually at the bottom of the page. That site with the 1,150-word META tag used this ploy, as well.
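As a rough illustration of "word frequency," the Python sketch below computes what share of a page's words is one given word. This is my own simplified reading of the idea, not any search engine's formula; the name word_density and the sample markup are hypothetical.

```python
from collections import Counter
from html.parser import HTMLParser
import re


class TextExtractor(HTMLParser):
    """Collects page text, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def word_density(html, word):
    """Return occurrences of `word` divided by the total number of words."""
    extractor = TextExtractor()
    extractor.feed(html)
    words = re.findall(r"[a-z0-9]+", " ".join(extractor.chunks).lower())
    if not words:
        return 0.0
    return Counter(words)[word.lower()] / len(words)


# Hypothetical page: a normal sentence followed by a stuffed line.
page = "<body><p>Web design New York</p><p>design design design design</p></body>"
print(word_density(page, "design"))  # 0.625
```

A page where one word makes up more than half of the text, as in the sample, is hardly written for human readers.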
Site visitors would think it odd to see such a collection of words. So webmasters work around this problem in two ways: "tiny text" and "hidden or invisible text."
"Tiny text" means that the webmaster formats the text in a very small size, most of the time hardly legible. "Invisible text" means the text is formatted in the same color as the page background. Users will NOT see the words, but the SEs' spiders (the programs that crawl your site) WILL, ranking the page higher than it deserves.
The search engines' positions regarding this spamming technique? Guidelines at one of the web's most important search engines state: "We must sometimes exclude submissions" of "pages with text that is not easily read, either because it is too small or is obscured by the background of the page." Another SE says it will "significantly downgrade a page's ranking ... if words cannot be read due to their small size or color."
Despite these statements, both SEs (as well as others) were found to have indexed pages having text in the same color as the background. One page I found has a whopping 936 keywords in BLACK text on a WHITE background, making a raw keyword list *visible*. So little respect for users.
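Spotting the simplest cases of tiny and invisible text does not look hard. The Python sketch below is a deliberately naive check that assumes old-style markup: a bgcolor attribute on the body tag and color or size attributes on font tags. It ignores style sheets, background images and color names versus hex codes, and the names SuspiciousTextParser and find_suspicious_text are invented for this example.

```python
from html.parser import HTMLParser


class SuspiciousTextParser(HTMLParser):
    """Flags text inside <font> tags that matches the background or is size 1."""

    def __init__(self):
        super().__init__()
        self.background = "#ffffff"  # assumption: white when <body> sets no bgcolor
        self.font_stack = []         # (color, size) for each open <font> tag
        self.hidden = []
        self.tiny = []

    def handle_starttag(self, tag, attrs):
        attr_map = {name: (value or "").strip().lower() for name, value in attrs}
        if tag == "body" and attr_map.get("bgcolor"):
            self.background = attr_map["bgcolor"]
        elif tag == "font":
            self.font_stack.append((attr_map.get("color"), attr_map.get("size")))

    def handle_endtag(self, tag):
        if tag == "font" and self.font_stack:
            self.font_stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if not text or not self.font_stack:
            return
        color, size = self.font_stack[-1]  # innermost <font> only
        if color == self.background:
            self.hidden.append(text)
        if size == "1":
            self.tiny.append(text)


def find_suspicious_text(html):
    """Return (invisible text runs, tiny text runs) found in the page."""
    parser = SuspiciousTextParser()
    parser.feed(html)
    return parser.hidden, parser.tiny


# Hypothetical page mixing invisible text, tiny text and normal text.
page = ('<body bgcolor="#FFFFFF"><font color="#ffffff">web design web design</font>'
        '<font size="1">cheap hosting</font><p>Welcome</p></body>')
print(find_suspicious_text(page))  # (['web design web design'], ['cheap hosting'])
```

A real check would have to handle many more ways of hiding text, but even this much would catch the pages I describe above.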
Page Redirect
With page redirection, a visitor requests a certain page, but the site immediately sends the visitor to a different page instead. The page can be redirected either by using the "refresh" META tag or by using server-level cloaking techniques. Why redirect a page? One of the legitimate reasons is if the site has a new URL (web address). But some webmasters abuse page redirection to obtain higher rankings.
Here's how it works:
1) Using the "refresh" META tag:
This technique consists of building two pages. The first is a highly "optimized" page (as in "spammed") with many, MANY words in the keywords and description META tags and also in the Title tag. Most of the time, the text of the page is also abnormally stuffed with keywords, often as invisible text.
These pages, called "doorway," "gateway," "entry" or "bridge" pages, generally display only a message that says "click here to enter the site" or simply "Enter" or sometimes "select Flash or HTML." The second page will appear after a predetermined number of seconds. If the time is set to "0" (zero), the viewer will NOT see the first page, and will effectively go directly to the second one.
But why two pages? The first one shown to SEs is highly "optimized" to help the page rank very high in the SERPs, cheating the SEs. The second one is "nicer," not too much spam, a good page for viewers.
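For readers who want to check a page themselves, here is a small Python sketch that looks for a "refresh" META tag and reports its delay and target. It assumes the common "seconds; url=address" form of the content attribute; the name find_meta_refresh and the example.com address are placeholders of mine.

```python
from html.parser import HTMLParser


class RefreshParser(HTMLParser):
    """Collects (delay in seconds, target URL) from <meta http-equiv="refresh">."""

    def __init__(self):
        super().__init__()
        self.refreshes = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("http-equiv", "").lower() != "refresh":
            return
        delay_part, _, rest = attr_map.get("content", "").partition(";")
        try:
            delay = float(delay_part.strip())
        except ValueError:
            return  # malformed content attribute, ignore it
        target = rest.strip()
        if target.lower().startswith("url="):
            target = target[4:].strip().strip("'\"")
        self.refreshes.append((delay, target or None))


def find_meta_refresh(html):
    """Return a list of (delay, target) pairs; an empty list means no refresh."""
    parser = RefreshParser()
    parser.feed(html)
    return parser.refreshes


# Hypothetical doorway page that forwards the visitor immediately.
doorway = '<head><meta http-equiv="Refresh" content="0; URL=http://www.example.com/home.html"></head>'
print(find_meta_refresh(doorway))  # [(0.0, 'http://www.example.com/home.html')]
```

A refresh alone proves nothing, since a site that has moved uses the same tag. A delay of zero combined with a keyword-stuffed first page is the pattern to look for.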
2) Using cloaking:
Cloaking is probably the most controversial spamming technique. Like page refreshing, it uses two pages, one for the SEs and another for the viewer. The big difference between these techniques is this: with refreshing, it is possible for the knowledgeable user to see the code of the first page, but with cloaking, the user cannot view the code of the page shown to search engines.
Hiding code from users, especially so that they can't see the list of keywords, can provide a huge advantage in today's highly competitive market. There are known cases of webmasters who have stolen competitors' keywords in an attempt to rank higher.
What are search engines' positions regarding redirecting? At the time of my analysis, search engines' own submission guidelines pages were explicit: do not submit "any site with an address that redirects to another address"; "your site cannot mirror or redirect to another Web site"; and "[we] may permanently ban from our index any sites or authors who engage in cloaking to distort their search rankings." One SE admonished simply: "Don't cloak."
Pretty clear, right? Yet, some webmasters insist on using cloaking techniques to hide their pages' code from prying eyes. Bad guys are not afraid to spam. Most of the time, SEs only SAY: "don't do that." But if you do, there will be NO punishment — or maybe very little.
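Since cloaking means one page for the SEs and another for the viewer, one rough way to look for it is to request the same address twice, once announcing yourself as a spider and once as a browser, and compare the answers. The Python sketch below does that under loud assumptions: both User-Agent strings are hypothetical, the 0.5 threshold is arbitrary, and a site that cloaks by the spider's IP address rather than its User-Agent will not be caught this way.

```python
import difflib
import urllib.request

# Hypothetical User-Agent strings, chosen only for illustration.
CRAWLER_AGENT = "ExampleBot/1.0"
BROWSER_AGENT = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"


def fetch_as(url, user_agent):
    """Download a page while sending the given User-Agent header."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def similarity(page_a, page_b):
    """Word-level similarity between two pages, from 0.0 to 1.0."""
    return difflib.SequenceMatcher(None, page_a.split(), page_b.split()).ratio()


def looks_cloaked(url, fetch=fetch_as, threshold=0.5):
    """True when the spider's copy and the browser's copy differ strongly."""
    crawler_page = fetch(url, CRAWLER_AGENT)
    browser_page = fetch(url, BROWSER_AGENT)
    return similarity(crawler_page, browser_page) < threshold
```

The fetch parameter exists so the comparison can be tried on saved copies of pages without touching the network. Pages with rotating ads or dates will differ a little on every request, which is why the sketch compares similarity against a threshold instead of demanding identical copies.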
Mirror Domains
Mirroring consists of building hundreds or even thousands of pages with the same content, but with altogether different URLs. The advantage is clear: by finding the "right" tricks to cheat the SE's algorithm, the marketer can "dominate" the SERPs with a multitude of listings, one page after another. One company had 62 pages in the Top 100 results list.
What are the search engines' positions regarding this spamming technique? "Do not submit mirror sites." "Your site cannot mirror or redirect to another Web site." "Do not submit ... the same pages from multiple domains." SEs don't like it, but in practice, most of them are vulnerable to this technique.
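Finding exact mirrors is, at least in principle, simple bookkeeping: fingerprint the content of each page and group the URLs that share a fingerprint. The Python sketch below shows the idea on pages already downloaded into a dictionary. It only catches copies that are identical after ignoring letter case and spacing, the name group_mirrors is my own, and the example addresses are placeholders.

```python
import hashlib
from collections import defaultdict


def fingerprint(html):
    """Hash the page after lowercasing it and collapsing whitespace."""
    normalized = " ".join(html.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def group_mirrors(pages):
    """Given {url: html}, return lists of URLs that serve the same content."""
    groups = defaultdict(list)
    for url, html in pages.items():
        groups[fingerprint(html)].append(url)
    return [urls for urls in groups.values() if len(urls) > 1]


# Hypothetical pages: two mirrors and one unrelated page.
pages = {
    "http://a.example/": "<p>Web  Design</p>",
    "http://b.example/": "<p>web design</p>",
    "http://c.example/": "<p>Something else</p>",
}
print(group_mirrors(pages))  # [['http://a.example/', 'http://b.example/']]
```

Mirrors that change a word or two per copy would slip past an exact fingerprint and would need a fuzzier comparison.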
In light of these findings, I have several more-than-rhetorical questions for the parties involved: Search Engines, Webmasters and Site Owners.
Questions for Search Engines
When are you going to be SERIOUS about your job?
When will you PUNISH the spammers? Have CLEAN indexes? It is not difficult. A dialogue with the webmaster community might result in opinions such as the following, which appears in the forums at webmasterworld.com:
"I can't understand why the search engines aren't professional enough to put their anti-spam efforts into a detailed agreement and have anyone that wants to be listed sign their agreement. Such an agreement would spell out very clearly what is and what isn't allowed. Anyone breaking the rules would be subject to specific penalties or banning, but would be notified and have a chance to fix the problem especially if the infraction was not too serious."
That would seem clear enough.
Why do SEs accept advertising from sites that use spamming techniques? With such advertising, if a page cannot achieve high rankings, for a few dollars it becomes a "featured" site or listing and voilà! It's at the top of the list. Is this the latest trick for a bad site to be listed highly? On one hand, in your guidelines you tell webmasters not to spam. On the other hand, if a spammer PAYS, that's no problem. Goodbye relevancy, hello profits! Here is a comment by a high-ranking official from a major SE: "the more we take payment for listings, the more you'll get great results." Excuse me? "Big" pockets' sites are NOT always more relevant than "poor" pockets' sites.
I know I'm not the first one to ask (see Commercial Alert Files Complaint Against Search Engines for Deceptive Ads), but when are you going to make a CLEAR distinction between paid listings and real results? The user needs to make an informed decision. The user needs to be helped, not confused.
Questions for Webmasters
Why use techniques seen by the SEs as spam? Why risk having your pages penalized, or even permanently banned from indexes? Two months of glory, then a new domain, then blacklisted again: a poor way to market any business.
Do you think it's impossible to get higher rankings WITHOUT using spam? My answer is NO, it is not impossible. Do your homework, read the SEs' guidelines thoroughly and abide by them, cross your fingers and you'll be #1.
Have you ever seen one of your pages ranking lower than a page which uses spamming techniques? I bet you have. Now, how did you react? Did you report it to the SE, or did you say "nah, that's no use anyway"? Or perhaps you thought it was not *nice* to turn in a webmaster. Okay, let me ask you something: how would you like it if, while you were waiting in line to buy a movie ticket, somebody cut in front of you?
Or perhaps THIS (see How Spamdexers Achieve Higher Search Engine Placement and Positions) is the solution?
- A site will be nominated and posted on a listserv which any member can second then the offender will be notified of his conviction by the spamdex police.
- They will have a week to clean it up and resubmit and remove the offending listing from the search engines.
- After a week the offender will be reported to the search engines. The search engines will have a month to act or they will be added to the list as an accomplice to the activity.
What do you think?
Questions for Site Owners
Are you paying sufficient attention to these issues? As recently reported (see Companies Lack Sound Search Engine Strategies), "nearly 46 percent of the marketers surveyed said they allocate less than 0.5 percent of their annual marketing budgets on search engine optimization (SEO) services." Please read that again.
What are your rankings? When was the last time you checked, if ever? Are you in the Top 30 results for your strategic keywords when the user runs a search? You're not? Then, in practice, you don't exist for your prospects. Still wonder why so many dot-coms CLOSE? Are you going to be next?
You don't need a Ph.D. to see that the SEO specialist — your web site's "salesman" — is more necessary than ever. One of these days, you *will* discover the power of search engine marketing: the cheapest yet most effective way to promote your site to more than 400 million prospects. I see a growing need for the SEO person, don't you?
When are you going to understand that search engine optimization involves more than optimizing the keywords META tag? SEO is a highly specialized, time-consuming and sometimes expensive task, but it is absolutely NECESSARY. It's both an art and a science to position a Web page near the top in SERPs. So, forget about your in-house SEO "expert," shop around and find a reputable SEO firm or consultant.
Dos and don'ts when you talk to the SEO specialist:
- If a web marketer suggests you should get rid of that flashy or framed home page, just DO IT! Otherwise, don't blame HIM (or her) for the consequences.
- If a search engine optimization expert suggests cloaking or doorways, run, and I mean it.
- Don't even think of telling a SEO expert that you'll pay him AFTER the pages show up in the search engines. A SEO expert can know search engines intimately, but cannot ultimately control what they will do.
- Don't be immature by asking him to GUARANTEE you Top 10 positions! No honest SEO will do it. The only thing a SEO can guarantee is an increase in the number of pages indexed and an improvement in the current rankings. With some luck your sales will skyrocket.
Last thing: please, avoid those submission tools and their hype: "submit your site to 500,000 SEs for $24.95." Submit your web site by hand. Period.
Since the summer of 1995, when I first worked with a search engine, I have seen some SEs disappear and lots of new SEs appear.
Competition among search engines is good, but users might be confused by thousands of them. Besides, not everybody knows sophisticated searching techniques (such as the Boolean operators), so there is a lot of frustration out there. If we add the irrelevancy of the results in most of the SEs, we have a pretty sad picture.
A clean index should be the main priority for any search engine. If a SE gives irrelevant results, the user will switch to another SE. Can a SE afford to lose users in today's fierce competition between SEs? I don't think so.
To wrap up, I am not saying here that ALL the search engines are vulnerable to ALL the known spamming techniques. All I am saying is that MOST of the search engines are vulnerable to MOST of the bad tricks.
I also believe that it is NOT important to list WHICH SEs are vulnerable to WHICH spamming techniques or WHICH sites spam WHICH SEs. The important thing is that it STILL happens.
And this applies not only to American search engines. I've also checked Spanish, German, French and Italian SEs. They have less spam, but it's still there.