Handy Little Perl Script


Jester uk


User since: December 22, 2001

Last login: December 22, 2001

Articles written: 6

In this article I'm going to show you a little script that makes working with forms and cookies in Perl a little easier. Once it has run, the form fields and any cookies present are available in variables matching their names, as in PHP.

Time Saving

I don't know about you, but when coding in Perl I used to get sick of breaking up the query string, POST input and cookie information so I could get at it. This little script can be included at the top of each of your Perl scripts to make that just a little bit easier. I know a lot of people probably already have something similar for their Perl web programs, but I'll put it up here anyway.

Here it is

Process GET/Query String

#!/usr/bin/perl

# Process form variables for both POST and GET methods.
# Below Will process all query string variables.

if ($ENV{'REQUEST_METHOD'} eq "GET" || $ENV{'QUERY_STRING'} ne "")
{
 @pairs = split(/\&/, $ENV{'QUERY_STRING'});
 foreach $pair (@pairs)
 {
  ($name, $value) = split(/=/, $pair); # Split into name and value.
  $$name = $value; # Assign value to scalar matching name.
  $$name =~ s/%(..)/chr(hex($1))/ge; # Decode encoded stuff.
  $$name =~ s/\+/ /g; # substitute +'s for spaces.
 }
}

If the request method is GET, the variables will be in the query string. This bit of code processes all query string information, whether it came from a form or not. We split the query string at each & and put the pairs into the @pairs array. Then we use a foreach loop to go through each pair, split it at the = and put the first part into $name and the second part into $value. Next we assign $value to the scalar variable whose name is stored in $name. Say the query string was ?action=mail; we would now have a variable $action containing "mail". Finally we decode anything that has been URL encoded, and replace any + signs with spaces. Remember to format your query string correctly when using this script. A bare http://host.com/index.pl?mail has no name=value pair and will cause weird things to happen; pass "mail" as the value of a named parameter instead, as in ?action=mail.
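To see what the loop produces, here is a small stand-alone run of the same code. The hard-coded query string and the field names action and subject are made up for the demonstration; in a real CGI request the web server sets $ENV{'QUERY_STRING'}.

```perl
#!/usr/bin/perl
# Demo only: a hard-coded query string stands in for the one the server provides.
# No "use strict" here, because the article's technique relies on symbolic references.
$ENV{'QUERY_STRING'} = 'action=mail&subject=Hello+world%21';

@pairs = split(/\&/, $ENV{'QUERY_STRING'});
foreach $pair (@pairs)
{
 ($name, $value) = split(/=/, $pair); # Split into name and value.
 $$name = $value;                     # Creates $action, then $subject.
 $$name =~ s/%(..)/chr(hex($1))/ge;   # %21 becomes "!".
 $$name =~ s/\+/ /g;                  # "+" becomes a space.
}

print "action  = $action\n";
print "subject = $subject\n";
```

After the loop, $action holds "mail" and $subject holds "Hello world!".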

Process POST

# Process POST form variables.

if ($ENV{'REQUEST_METHOD'} eq "POST")
{
 read(STDIN, $stuff, $ENV{'CONTENT_LENGTH'});
 @pairs = split(/\&/, $stuff);
 foreach $pair (@pairs)
 {
  ($name, $value) = split(/=/, $pair);
  $$name = $value;
  $$name =~ s/%(..)/chr(hex($1))/ge;
  $$name =~ s/\+/ /g;
 }
}

This reads the POST data into the variable $stuff and then processes it the same way we did the query string, assigning each value to a scalar variable matching the form input name.
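A single read() call may return fewer bytes than requested, so a slightly more defensive version of the read step might loop until CONTENT_LENGTH bytes have arrived. This is only a sketch: read_body is a hypothetical helper name, and an in-memory filehandle stands in for STDIN so the example can run outside a web server.

```perl
use strict;
use warnings;

# Hypothetical helper: read up to $length bytes from $fh, looping on short reads.
sub read_body {
    my ($fh, $length) = @_;
    my $body = '';
    while (length($body) < $length) {
        # The fourth argument is the offset in $body at which new data is stored.
        my $got = read($fh, $body, $length - length($body), length($body));
        die "read failed: $!" unless defined $got;
        last if $got == 0;   # input ended before $length bytes arrived
    }
    return $body;
}

# Demo: made-up POST data in a string, opened as a filehandle.
my $fake_post = 'name=hitherto&lang=perl';
open(my $in, '<', \$fake_post) or die "cannot open in-memory handle: $!";
my $stuff = read_body($in, length $fake_post);
close($in);

print "$stuff\n";
```

In a CGI script the call would presumably look like read_body(\*STDIN, $ENV{'CONTENT_LENGTH'}).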

Process Cookies

# Process cookies.

if ($ENV{'HTTP_COOKIE'} ne "")
{
 @pairs = split(/\; /, $ENV{'HTTP_COOKIE'});
 foreach $pair (@pairs)
 {
  ($name, $value) = split(/=/, $pair);
  $$name = $value;
  $$name =~ s/%(..)/chr(hex($1))/ge;
  $$name =~ s/\+/ /g;
 }
}

This time, if $ENV{'HTTP_COOKIE'} contains data, we split it at each "; " (a semicolon followed by a space) and then process it the same way we processed the POST and GET/query string information.
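As a rough illustration of the cookie split, the sketch below parses a made-up Cookie header. It differs from the script above in two labelled ways: it tolerates a missing space after the semicolon, and it stores the values in a hash (%cookie, a name chosen for this example) so that it runs under use strict.

```perl
use strict;
use warnings;

# Demo value; a real script would read $ENV{'HTTP_COOKIE'}.
my $header = 'session=abc123; theme=dark%20blue';

my %cookie;
foreach my $pair (split(/;\s*/, $header)) {
    # Limit of 2 keeps any "=" inside the value intact.
    my ($name, $value) = split(/=/, $pair, 2);
    next unless defined $name && length $name;
    $value = '' unless defined $value;
    $value =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;   # decode %XX escapes
    $cookie{$name} = $value;
}

print "$_ = $cookie{$_}\n" for sort keys %cookie;
```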

Include it

Just include the script at the top of each of your Perl files.

do "$ENV{'DOCUMENT_ROOT'}/script.pl";

The Full Script

#!/usr/bin/perl

# Process form variables for both POST and GET methods.
# Below Will process all query string variables.

if ($ENV{'REQUEST_METHOD'} eq "GET" || $ENV{'QUERY_STRING'} ne "")
{
 @pairs = split(/\&/, $ENV{'QUERY_STRING'});
 foreach $pair (@pairs)
 {
  ($name, $value) = split(/=/, $pair); # Split into name and value.
  $$name = $value; # Assign value to scalar matching name.
  $$name =~ s/%(..)/chr(hex($1))/ge; # Decode encoded stuff.
  $$name =~ s/\+/ /g; # substitute +'s for spaces.
 }
}

# Process POST form variables.

if ($ENV{'REQUEST_METHOD'} eq "POST")
{
 read(STDIN, $stuff, $ENV{'CONTENT_LENGTH'});
 @pairs = split(/\&/, $stuff);
 foreach $pair (@pairs)
 {
  ($name, $value) = split(/=/, $pair);
  $$name = $value;
  $$name =~ s/%(..)/chr(hex($1))/ge;
  $$name =~ s/\+/ /g;
 }
}

# Process cookies.

if ($ENV{'HTTP_COOKIE'} ne "")
{
 @pairs = split(/\; /, $ENV{'HTTP_COOKIE'});
 foreach $pair (@pairs)
 {
  ($name, $value) = split(/=/, $pair);
  $$name = $value;
  $$name =~ s/%(..)/chr(hex($1))/ge;
  $$name =~ s/\+/ /g;
 }
}

Expand It

If you're using Perl for CGI, why not use this little script and add to it? Maybe you could print the content-type header in the script. As you'll probably be outputting HTML, just add:

print "Content-type: text/html\n\n";

so you don't need to bother doing it in your scripts. Why not add a mailing subroutine into it (see MartinB's Article, A Simple CGI E-Mail Subroutine). Be creative.

Bibliography

Well, there are no links I can point you to really. Hmmm, how about some Perl stuff?

I just like messing around with web design stuff, just a hobby.

Particularly perl, PHP and SQL.

http://www.free2code.net/

use modules if you can...

Submitted by warpedjedi on January 24, 2002 - 07:56.

this is ok for smaller perl scripts, but you should also look into using the immense library of perl modules available at CPAN. I work with perl on a regular basis and perl modules are a lifesaver!

for example, the above can be done easily using the CGI module:

#!/usr/bin/perl

use strict;
use CGI;

my $q = new CGI;

# all parameters that are submitted to this page via 
# post/get/cookie will be available using the param function

# print out the values of the "action" field
print $q->param("action");

more information on using the CGI module can be found here. i suggest using the object oriented approach, it helps to keep things organized a little better.

have a nice day!

-wj


CPAN and modules

Submitted by hitherto on January 25, 2002 - 07:00.

I'd have to agree with warpedjedi that CPAN modules can often be a real timesaver - many, many functions that you might need are already implemented there. You can get a good idea of what's available from search.cpan.org

However, installing modules from CPAN is not always easy, or even possible if you're running a smallish website on a shared server. If you don't have telnet or ssh access to your box, for example, you're unlikely to be able to install them.

As long as the version of perl installed on your host is 5.004 or higher (if you have command-line access, "perl -v" will tell you, if not, perhaps your hosting company or sysadmin can), then CGI.pm will come installed as part of the standard distribution, and the code supplied by warpedjedi will work without any further effort on your part.


Script archives and the wider world of perl

Submitted by hitherto on January 25, 2002 - 07:38.

I'll write a fuller article on this when I get enough tuits of the round type, but here are some points worth noting now:

  • Beware of code you find online - Whilst every search engine, and many articles may point you towards the CGI Resource Index, or Matt's Script Archive, much of the code at these places is far from ideal. It rarely implements proper error checking, and can contain major security holes. What's more, best programming practices (such as using "strict" to ensure that you aren't making icky mistakes with variables) are found next to never in these scripts, so if you're using them in order to learn more about perl yourself, you are likely to become at best a bad programmer.
  • nms - a trustworthy source of code - Luckily, the perl user group london.pm has spent the past few months working on a set of scripts which can be "dropped in" on top of Matt Wright scripts such as formmail.pl. They provide the same functionality (you can upload them over existing scripts without any pain), but have been peer-reviewed by some of the best perl coders in the business, to ensure that they are clean, efficient and secure. What's more, the code conforms to best practices, so you can learn from their techniques with confidence. You can find out more, and download the scripts, at the project's sourceforge page.
  • Explore the wider world of perl - Perl has an incredibly vibrant community of programmers surrounding it, who provide a great deal of support for newcomers. There are many Perl Mongers user groups worldwide, who have regular real-world meetings (see www.pm.org for more details), as well as several lively websites teeming with ideas and advice for anyone, from a complete perl beginner to the sort of nutter who spends their weekends writing the internals of the next version of perl for fun. Some of the best sites include: Some of the content at these sites can be a little daunting, but go and explore. It's fun!


re: Script archives and the wider world of perl

Submitted by StOne on January 25, 2002 - 13:48.

Thanks, hitherto. I started looking for cgi-scripts and got a few of those from Matt's Scripting Archive and a couple of other sites, but they're just sitting on my hard drive for now. I recall reading somewhere that Matt's scripts should be avoided as they are not the best, but the version of formmail I downloaded (last modified 08/03/01) claims to have fixed security holes found in previous versions...on the other hand, I think the other scripts I found were through The CGI Resource Index. I've yet to actually try even using a Perl script, so I appreciate the advice and the links.


Multiparts?

Submitted by eych on January 26, 2002 - 02:33.

Two issues.

1. Using CGI.pm is really simple, as it handles everything and is installed by default with every Perl distribution (or am I wrong?), but it adds a big overhead: it is a 200+ kilobyte monster of a file that first has to be read from disk, then parsed, then compiled. It has some neat techniques for optimizing compilation time, but you can't skip the reading and parsing. So it generally adds about 0.15 seconds of overhead.

2. And the last one - the script described in the article can't handle multipart forms, and w/o them you can't do file uploads.


perl can be optimized...

Submitted by warpedjedi on January 26, 2002 - 09:23.

eych, if you are looking for increased performance you can compile mod_perl into apache. This will start up an instance of the perl interpreter with the apache process, so that it is immediately available for parsing your scripts. You can also put "use CGI;" in the mod_perl startup file and it will get rid of that nasty overhead you were talking about by keeping the CGI module available in memory. It's really sweet!

Optimizing perl could be another complete article. *hmmmmmmm

-wj


mod_perl is the king!

Submitted by eych on January 29, 2002 - 02:07.

warpedjedi,

Yes, mod_perl is what makes Perl a very good language for big, heavily loaded sites. In my experience, the speed gain from using mod_perl was around 400%. The latency of any Perl software running on mod_perl tends to zero. The price you pay, however, is memory: mod_perl's secret is duplicating the existing apache process in memory (with all the compiled software) for every new http request instead of starting a new process every time.

That is why, btw, PHP is a slower language than Perl by definition: it doesn't compile, it interprets the code every time.


keeping memory under wraps...

Submitted by warpedjedi on January 29, 2002 - 07:49.

and in the case that you inherit a site that is too much of a memory hog, you can always use Apache::SizeLimit to keep those processes in check!

-wj


Use CPAN instead

Submitted by ngauruhoe on February 2, 2002 - 10:08.

There's a whole bunch of errors in this code, and in terms of security it's also a very questionable practice to allow users to set arbitrary variables.
  • % encoding of variable names should be catered to
  • %encoding should match /%([0-9A-F][0-9A-F])/i
  • This code interprets %2b as ' ' where it should be '+'
  • if a value contains '=' then it gets truncated
More importantly this code allows a malicious user to write to any variable in any package's namespace, which makes it substantially more difficult to keep track of how they might twist the behaviour of your code.
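Two of the points above are easy to reproduce. The following sketch runs the article's split and decoding steps on a made-up pair whose intended value is "1+1=2"; the variable names are local to this example.

```perl
use strict;
use warnings;

my $pair = 'expr=1%2B1=2';                 # intended value: "1+1=2"

# Without a limit, split cuts at every "=", so the trailing "=2" is dropped.
my ($name, $value) = split(/=/, $pair);

# Same order as the article: percent-decode first, then turn "+" into spaces.
$value =~ s/%(..)/chr(hex($1))/ge;         # "1%2B1" becomes "1+1"
$value =~ s/\+/ /g;                        # the decoded "+" is now turned into a space

print "[$value]\n";                        # prints [1 1]
```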



Ever hear of Tim Toady?

Submitted by cache on March 11, 2002 - 02:39.

Sure enough, someone dares to think outside the tyranny of CPAN Orthodoxy and the jackals show up to tell him he can't do it. Ever hear of TMTOWTDI - (tim-toady) There's More Than One Way To Do It. It happens to be THE fundamental concept behind Perl, not just a catchy slogan. The CPAN way of doing things is just that, just another "way to do it", nothing more.

As for the heart of jesteruk's script, parsing form values into symbolic references of the form names is around 5 years old, popularized by Bill Weinman, author of The CGI Book, bw.org.

There are two small errors in the script. As ngauruhoe pointed out, the script will truncate a value containing an = sign. That's easily fixed by adding a limit argument to the split function: split(/=/, $pair, 2). But there is nothing at all wrong with the urlunencoding. s/%(..)/chr(hex($1))/ge produces the same results as matching /%([0-9A-F][0-9A-F])/i . The problem is that the urlunencoding takes place before the substitution that turns + signs into spaces. Those two lines need to be reversed in order.
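Put together, the two fixes described here might look like the sketch below. url_decode is a hypothetical helper name, and the hex-only pattern follows ngauruhoe's earlier suggestion rather than the (..) form; treat it as one possible arrangement, not the definitive one.

```perl
use strict;
use warnings;

# Hypothetical helper: "+" to space first, then %XX escapes.
sub url_decode {
    my ($text) = @_;
    $text =~ s/\+/ /g;                                # literal "+" means space
    $text =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;     # %2B now survives as "+"
    return $text;
}

# The limit of 2 keeps everything after the first "=" in the value.
my ($name, $value) = split(/=/, 'expr=1%2B1=2', 2);

print url_decode($name), ' => ', url_decode($value), "\n";   # prints expr => 1+1=2
```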

BTW, hitherto, adding a perl module to a server without telnet/ssh access is pretty simple. @INC also includes the directory tree that the script is located in. Place the module in the same directory as the script, or place it in a directory in that directory's path. Or, define a directory inside the domain space with use lib 'the/path/' above the use statement for the module. Or, push() the directory path into @INC before the use statement for the module. Your pm group seems to be focusing on scripts that get used by web developers who are not the system admin on their web server so that may be useful.
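A minimal sketch of the use lib approach follows. It assumes the modules were uploaded to a directory named lib next to the script (that directory name is made up for this example) and uses the core FindBin module to locate the script's own directory. Note that Perl 5.26 and later no longer put the current directory in @INC by default, so an explicit use lib is the more dependable of the options listed.

```perl
use strict;
use warnings;

use FindBin qw($Bin);    # $Bin is the directory containing the running script
use lib "$Bin/lib";      # hypothetical directory holding uploaded modules

# A "use Some::Module;" placed after this point would also search $Bin/lib.
print "$_\n" for @INC;
```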


thanks cache

Submitted by jesteruk on March 11, 2002 - 15:10.

The point of this script was that a lot of people are not admin of the server their scripts are stored on, so they have no control over perl modules and such. The server may not have all these fancy modules installed, and not everyone can use them. That is why I made this little script: on my old host a lot of modules I read about weren't there and I couldn't use them. This script just made life a little easier. It sucks, but it worked for me ;)

And yes cache, it's very useful to know I can use modules even though I don't have telnet access. I didn't know that.

I prefer PHP to perl these days and use it a lot more. And please, don't start a perl vs PHP war over this statement lol, it's just my preference.

peace,

-J


CPAN Orthodoxy, and why it ain't so bad

Submitted by hitherto on March 11, 2002 - 20:23.

cache,

I don't think anyone who has programmed with perl for more than a few weeks would doubt the power of TMTOWTDI. However, an important part of embracing TMTOWTDI is understanding that some Ways are better than others.

The modules on CPAN, and particularly those which have then been subsumed into the perl core, are heavily peer-reviewed and tested in all sorts of odd situations. So, the odds are high that they encapsulate one of the better ways of doing it.

Since CGI.pm has been in the core since perl 5.004 was released, if you come across a hosting service these days where it isn't installed, then avoid the hosting service - god only knows what other old software, with long-solved security glitches, will be lurking next to your site!

I don't believe that there is a CPAN "Orthodoxy" as such, beyond the fact that, for people who work heavily with perl, it works, is relatively safe, and saves them from having to reinvent wheels like CGI form parsing.

Whilst The CGI Book may think that using symbolic references is a good idea, I'd question it on two main counts.

There is a reason that the use strict pragma doesn't allow you to push values into symbolic references. There is a potential confusion, because perl also has hard references, analogous to pointers in C, which use similar syntax.

So, when you say $$var_name, does $var_name contain the string foo, and you want to access ${$var_name} (or, in fact, $foo), or are you dereferencing the hard reference $var_name? As a casual perl user, relatively new to the language, the distinction is pretty much irrelevant. Hard references are weird voodoo. But to those who come later to maintain your website, or to your more experienced self in a year's time, it may be far more important.

Worse, using symbolic references like this is allowing any arbitrary user to chuck any variable they like into your system. Don't forget that I can always pass arbitrary URLs to your script - you should expect to receive anything at all, and not just the nicely laid out fields in your form. Whilst I may not be able to bring your server down with this, what happens if I pass the variable string name=hitherto&,=b into your script?

This code will merrily pick up my second name/value pair and set the variable $, to b. It might not look like much, but $, is perl's output field separator, and for the rest of the script's execution print will now put a "b" between the items of every list it outputs, something you probably weren't expecting.
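The effect is easy to reproduce in isolation. This sketch feeds the hostile pair through the article's two assignment lines and then prints a short list; the variable $seen exists only so the example can record and then reset the separator.

```perl
#!/usr/bin/perl
# Deliberately no "use strict": the article's technique needs symbolic references.
$pair = ',=b';                          # the second pair from name=hitherto&,=b

($name, $value) = split(/=/, $pair);    # $name is ",", $value is "b"
$$name = $value;                        # assigns to $, (the output field separator)

$seen = $,;                             # remember what the attacker managed to set
print "one", "two", "three";            # outputs: onebtwobthree
undef $,;                               # restore the default before finishing
print "\n";
```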

There are other variables which, in the hands of someone with more time and malice, could wreak havoc on your system.

Of course, as a hassled, overworked webmaster, you shouldn't have to worry about any of this, which is why it's good to trust in tried and tested modules.

Finally, cache, that's a good tip about frobbing @INC to allow scripts to access modules which aren't fully installed on the server. Bear in mind, of course, that if the module is installed in your web directory then it is publically accessible, and possibly on a badly configured system, executable, so you should exercise some caution in making sure that nothing nefarious can be achieved with it directly.

I think the NMS project would probably prefer to keep things as simple as possible, only using modules that were in the core when perl 5.00.54 was released, but I'll certainly remind the team about the possibility of doing this.

Cheers,

h


Hashes instead of symbolic references

Submitted by hitherto on March 12, 2002 - 05:25.

Writing comments at 1:30am means you tend to forget things...

If you must roll your own CGI parsing scripts, there is a better way of passing CGI parameters into your script, using a hash instead of symbolic references.

Instead of this:

  ($name, $value) = split(/=/, $pair); # Split into name and value.
  $$name = $value; # Assign value to scalar matching name.

try:

  my (%query, $name, $value); # declared explicitly, for the sake of use strict

  ($name, $value) = split(/=/, $pair); # Split into name and value.
  $query{$name}=$value;

Now, you can iterate through the keys of your hash using a foreach loop, or access them directly, as, for example, $query{name}.
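For a fuller picture, here is a sketch of a whole query string parsed into a hash and then iterated. parse_query is a hypothetical name, and the decoding order and split limit follow the fixes discussed in the earlier comments. Because the names only ever become hash keys, parameters called "pair" or "," cannot clobber the script's own variables.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a flat name => value hash (last value wins).
sub parse_query {
    my ($query_string) = @_;
    my %query;
    foreach my $pair (split(/&/, $query_string)) {
        my ($name, $value) = split(/=/, $pair, 2);
        next unless defined $name && length $name;
        $value = '' unless defined $value;
        for ($name, $value) {            # decode both the name and the value
            s/\+/ /g;
            s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;
        }
        $query{$name} = $value;
    }
    return %query;
}

my %query = parse_query('name=hitherto&pair=x&%2C=b');

foreach my $key (sort keys %query) {
    print "$key = $query{$key}\n";
}
```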

One last point - you can minimise the damage any user-supplied data might do to your system if you run CGI scripts under perl's taint mode. It's easy to turn taint on, just add the flag -T to your script's shebang line (which reads #!/usr/bin/perl on most systems), so you end up with "#!/usr/bin/perl -T".

Taint mode stops perl doing unsafe things with user-supplied data, such as using it directly to supply a filename on disk, or running it as a system() call. You'll still be able to print, combine and do all sorts of other safe operations with the data, but you have to explicitly "ok" it before you can do anything else with it.
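Explicitly "ok"-ing a value usually means matching it against a strict pattern and using only the captured part, since perl treats regex captures as untainted. The sketch below shows that pattern for a filename; untaint_filename and its rule (letters, digits, underscore and hyphen, plus one optional extension) are assumptions made for this example, not a general recommendation. The code also runs without -T, in which case it simply acts as validation.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a vetted filename, or undef if the input is rejected.
sub untaint_filename {
    my ($input) = @_;
    return unless defined $input;
    # Anchored pattern: no slashes, no dots except one before an extension.
    if ($input =~ /\A([A-Za-z0-9_-]+(?:\.[A-Za-z0-9]+)?)\z/) {
        return $1;       # the captured text is what taint mode would accept
    }
    return;
}

for my $candidate ('report.txt', '../etc/passwd', 'a;rm -rf') {
    my $clean = untaint_filename($candidate);
    print defined $clean ? "ok: $clean\n" : "rejected: $candidate\n";
}
```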

There's more about taint, and about how to untaint data if you really need to, here


I've Heard of Tim Toady.

Submitted by davorg on March 12, 2002 - 07:55.

Sure I've heard of TMTOWTDI, but I'm not sure what the relevance is here. Just because there are many ways to do something, it doesn't mean you should automatically choose the worse possibility.

And it's not a CPAN issue either. CGI.pm has been included with Perl for over five years. Any version of Perl that is old enough not to include CGI.pm has CERT security advisories against it so you really shouldn't be using it.

Sure, CGI.pm makes it easier to deal with CGI parameters, but that's not the most important thing. The most important thing is that it deals with all those corner conditions that you might not have considered. What if your program is passed a multivalued CGI parameter? What if you get a request from a browser that supports the new CGI parameter separator of ';'? What if you get a multi-part form? CGI.pm will handle all of these. This code doesn't.
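To give a feel for two of these corner cases, the sketch below accepts both & and ; as separators and keeps every value of a repeated parameter. parse_multi is a hypothetical name and this is an illustration only, not a description of how CGI.pm does it; multi-part forms are not handled at all.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a reference to a hash of name => [values...].
sub parse_multi {
    my ($query_string) = @_;
    my %param;
    foreach my $pair (split(/[&;]/, $query_string)) {   # either separator
        my ($name, $value) = split(/=/, $pair, 2);
        next unless defined $name && length $name;
        $value = '' unless defined $value;
        for ($name, $value) {
            s/\+/ /g;
            s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;
        }
        push @{ $param{$name} }, $value;                # keep all values, in order
    }
    return \%param;
}

my $param = parse_multi('colour=red;colour=blue&size=L');

print "$_: @{ $param->{$_} }\n" for sort keys %$param;
```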

But not using CGI.pm is a choice you've made. A stupid choice IMO, but a valid one of course. That decision pales into insignificance, however, up against the choice to use symbolic references to set up the variables. I don't need to go into details as to why this is a bad idea as Mark Jason Dominus has already written three great articles that make this very clear. I'd like to ask one question tho'. What will your code do if someone passes it a CGI parameter called "pair" (or "name", or "value")? This is very fragile code and should be avoided completely.

I'd recommend you read this recent perl.com article about how to identify bad CGI scripts and the dangers in failing to do so. Yes, I wrote the article so I'm biased :)


Ever hear of Tim Toady?

Submitted by ngauruhoe on March 12, 2002 - 11:51.

cache claimed that:

... there is nothing at all wrong with the urlunencoding. s/%(..)/chr(hex($1))/ge produces the same results as matching /%([0-9A-F][0-9A-F])/i .

The distinction may not be important for URLs in normal use, but if an URL contains '%XY' then correct behaviour is to treat the % literally, not to replace the string with a "\0" byte.
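The difference shows up with a short test string. In this sketch the made-up input '100%XY' is decoded once with the article's (..) pattern and once with a hex-only pattern; the variable names are local to the example.

```perl
use strict;
use warnings;

my $input = '100%XY';

my $loose = $input;
{
    no warnings 'digit';                 # hex('XY') would warn about illegal digits
    $loose =~ s/%(..)/chr(hex($1))/ge;   # hex('XY') is 0, so "%XY" becomes chr(0)
}

my $hex_only = $input;
$hex_only =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;   # no match, string unchanged

printf "loose:    %d characters, last has code %d\n",
    length($loose), ord(substr($loose, -1));
print  "hex only: $hex_only\n";
```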

A more important issue for some sites (which I ignored earlier) is handling unicode URLs in the format that IE produces. eg '%u0101' . Whatever you think of microsoft's extension to URL syntax, it's not possible to use form input in utf-8 with IE unless you handle these.

CGI.pm handles these escape codes. I submitted the code to do it.



Evolt.org is an all-volunteer resource for web developers made up of a discussion list, a browser archive, and member-submitted articles. This article is the property of its author, please do not redistribute or use elsewhere without checking with the author.