A user agent acts on behalf of a user. Software agents include servers, proxies, spiders, browsers, and multimedia players.

W3C: Architecture of the World Wide Web, Volume One. From the Introduction (December 2004).

User Agents

The phrases user agent, user-agent, UA, browser, client, client application, and client software program all pretty much refer to the same thing.

Or maybe not.

Many people quickly assume this term is the jargon equivalent of browser

(there is a much better definition of user-agent on Wikipedia).

And while a browser may not represent you in the same way as a human agent, it does perform

an action on your behalf.

Or maybe not.

In other words, sometimes a user-agent can be malicious.

Agent in White

Most of the time you will initiate a request for a Web page, and in these cases a browser represents you in a very direct way: it fetches the resource for you and returns it so you can view it. Or perhaps so you can listen to it.

But what about a crawler/spider/robot? These are also user-agents, though you personally

didn't ask them to do anything (at least not when they made the crawl). But

they do make it possible to find things on the Web. Consider that the next time you search

for (and find) something on Google. So maybe not so much on your

behalf, as on our behalf.

In an even broader sense, a Web server is also a user-agent. Let's look at a typical client-server event cycle: the browser (on your behalf) requests a page from a Web server, which (if it has what you asked for) then fetches the document (also on your behalf, but also on behalf of any advertisers or other third parties whose content may be present on the page). It then returns the document to your browser (excuse me, user-agent), or to any other user-agent that requests the same resource, assuming there are no restrictions on who, or what, can request it.
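To make the request half of that cycle a little more concrete, here is a minimal Python sketch of how a client announces itself to a server through the User-Agent request header. The helper name build_request, the agent string ExampleAgent/0.1, and the example.com URL are made up for illustration; the request is only built, never sent.

```python
from urllib.request import Request


def build_request(url: str, agent_name: str) -> Request:
    """Build (but do not send) an HTTP request that identifies its user-agent."""
    return Request(url, headers={"User-Agent": agent_name})


# Hypothetical agent name and URL, used only for illustration.
request = build_request("http://example.com/", "ExampleAgent/0.1")

# urllib stores header names capitalized as "User-agent".
print(request.get_method(), request.full_url)
print("User-Agent:", request.get_header("User-agent"))
```

Whatever string the client puts there is what shows up in the server's logs, and nothing forces it to be truthful.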

Agent in Black

So how can a user-agent be malicious? In any of the same ways that people can be, naturally, since user-agents don't write themselves. This isn't a movie, folks.

A spider can be malicious because it doesn't follow the rules, or because it is only looking for one thing and cares nothing about adding to your Web experience. An email harvester is an example of this. All these little beasts do is search the Web looking for addresses to add to their owners' databases, so the owners can spam their victims until those victims want to scream every time they open their email program (also a user-agent). So a harvester is also an agent, only the user in this equation is the spammer.

A crawler can also ignore acceptable behavior. Say you, as a Web site owner, edit your robots.txt file to say: okay, you're allowed to look around here and there, but not here. The agent ignores this and pokes around anywhere it damn well pleases, often trying to look directly at the things it's not supposed to look at. And this can lead to some pretty interesting ideas from folks trying to combat the guys in the black hats. Try this simple experiment sometime: edit your robots.txt file and add a rule that disallows access to an arbitrary directory; it's not important whether it even exists. Now wait a few days and scan your log files for any agents that tried to access that directory. Hmm...

User-agent: *
Disallow: /agent_black/
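For the log-scanning half of that experiment, something like the following Python sketch might do. It assumes your server writes the common Apache/NGINX "combined" log format (adjust the pattern if yours differs); the function name find_trespassers and the sample log lines are invented for illustration.

```python
import re
from collections import Counter

# Assumption: "combined" access log format. Other formats need a different pattern.
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]*\] '
    r'"(?:[A-Z]+) (?P<path>\S+)[^"]*" \d{3} \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)


def find_trespassers(lines, trap_prefix="/agent_black/"):
    """Count (ip, user-agent) pairs that requested anything under trap_prefix."""
    hits = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match and match.group("path").startswith(trap_prefix):
            hits[(match.group("ip"), match.group("agent"))] += 1
    return hits


# Made-up sample lines; in practice, pass an open log file instead.
sample = [
    '192.0.2.10 - - [01/Jan/2024:00:00:01 +0000] "GET /index.html HTTP/1.1" 200 512 "-" "FriendlyBot/1.0"',
    '192.0.2.66 - - [01/Jan/2024:00:00:02 +0000] "GET /agent_black/ HTTP/1.1" 404 196 "-" "SneakyBot/0.1"',
    '192.0.2.66 - - [01/Jan/2024:00:00:03 +0000] "GET /agent_black/secret.html HTTP/1.1" 404 196 "-" "SneakyBot/0.1"',
]

for (ip, agent), count in find_trespassers(sample).most_common():
    print(f"{count:3d}  {ip}  {agent}")
```

Keep in mind that the user-agent string is whatever the client chose to send, so the IP address is usually the more telling column.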

You may also want to check out Project Honey Pot, a grassroots organization trying to at least slow the flood of this stuff, a problem that I personally think accounts for a measurable drain on the entire network. And for what ROI? I would love to look at the numbers: let's say that for every 100,000 people they piss off, maybe one clicks through to one of these dumb ads. And of those few who do, what percentage actually buy something? I suspect that spammers have to make a lot of people very angry to get a scant handful of sales. Sigh.

The list of bad guys goes on. What about these downloadable browser toolbars? They can certainly enhance your online experience. But just as many are deceptive and are really interested in sniffing around on your personal computer looking for credit card numbers, or in popping up ads in your face when you probably don't like ads popped up in your face. In this example, it's the Web server that's the bad guy. Or rather the people who configured it to deliver adware, spyware, or other such programs to your door. One toolbar in particular does not belong on this list: the Netcraft Toolbar has a number of useful features, and is also used by a community of alert members to help prevent fraud and phishing attacks.

And what about commercial browsers? Internet Explorer from Microsoft is a very popular user-agent,

but does it wear a white hat? While it never set out to be a bad guy, I would certainly lump it in

the category of not-so-nice user-agents. Why? Because the developers thumbed their noses at accepted

standards, and even worse, left the thing wide open for exploitation by the guys wearing the

really black hats.

Wear a White Hat

The Web is an amazing resource. It was built on openness and a free exchange of ideas and software, and by a lot of very hard-working people that you may never have heard of. Without Bill Joy, or Tim Berners-Lee, or Larry Wall, or any number of other people who didn't get rich, and never wanted to get rich, we wouldn't have the Web. Sadly, it's also awash with rats and thieves. Often, some of their techniques are so sophisticated I have to wonder why they don't expend some of that energy on legitimate enterprises. Even worse, many are mere children who exchange little scripts, don't even understand what they're doing, and think it's cool trying to bring down or deface someone's Web site.

Not cool at all.

In closing, I leave you with your very own three-line Perl user-agent:

#!/usr/bin/perl
use LWP::Simple;   # provides getprint()
getprint shift;    # fetch the URL given as the first argument and print it

And please, buy yourself a white hat.
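In that white-hat spirit, here is a rough Python sketch of the check a polite agent can make before fetching anything: ask robots.txt first. It uses the standard library's urllib.robotparser and feeds it the same two-line rule from the experiment above rather than fetching a real file; the agent name ExampleAgent and the example.com URLs are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# The rule from the experiment, supplied directly instead of fetched from a site.
rules = [
    "User-agent: *",
    "Disallow: /agent_black/",
]

parser = RobotFileParser()
parser.parse(rules)


def may_fetch(url: str, agent_name: str = "ExampleAgent") -> bool:
    """Return True if the parsed robots.txt rules allow agent_name to fetch url."""
    return parser.can_fetch(agent_name, url)


# Hypothetical URLs, used only to exercise the rule.
for url in ("http://example.com/index.html", "http://example.com/agent_black/x.html"):
    verdict = "allowed" if may_fetch(url) else "disallowed"
    print(f"{url}: {verdict}")
```

Nothing enforces this check, of course. It is a courtesy, which is exactly why the agents in black skip it.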

A humorous post on the history of this article is available on my blog. You are welcome to

submit questions and comments there.