Search Engine Friendly Urls With Php And Apache

Posted on 20 Aug 2001 in Code

by Garrett Coakley (garrett)


Quick 'N' Dirty Intro

Anyone who has built template-based sites that use query strings to select content will at some point have hit the problem of indexing (or the lack of it) by search engines. Many search engine spiders won't index pages whose URLs contain a query string, as they are worried about getting stuck in a maze of twisty URLs, all alike.

One way round this is to mask the fact that we're using a query string. Evolt itself uses this system (which you can see if you look at the current URL in your browser's location bar), albeit written in ColdFusion. I'm going to show you a quick 'n' dirty way to do this in PHP. It's not the most feature-complete way by any means, but it should at least give you a base to build on.

Question Marks Are So 20th Century

We want to turn this

http://www.somesite.co.uk/site.php?section=books&subsection=architecture

into this

http://www.somesite.co.uk/site/books/architecture

which involves masking the file extension for the processing file and then giving the query string a shave and a haircut.
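To make the "shave and a haircut" concrete, here is a minimal PHP sketch that rewrites the first URL into the second. The function name prettyURL() is hypothetical and not part of the article's download. It assumes flat query parameters listed in the order the path segments should take; the parameter names themselves (section, subsection) are dropped, so their meaning now comes purely from position.

```php
<?php
// Hypothetical helper: turn a query-string URL into the path-style form.
// Assumes flat (non-array) parameters listed in path order; the port,
// user info and fragment are ignored in this sketch.
function prettyURL(string $url): string {
    $parts = parse_url($url);
    // Drop the ".php" extension from the script name: /site.php -> /site
    $path = isset($parts['path']) ? $parts['path'] : '';
    $script = preg_replace('/\.php$/', '', $path);

    $params = array();
    if (isset($parts['query'])) {
        parse_str($parts['query'], $params);  // also URL-decodes the values
    }

    // Each value becomes one path segment; the keys are discarded
    $segments = array_map('rawurlencode', array_values($params));
    if (count($segments) > 0) {
        $script .= '/' . implode('/', $segments);
    }
    return $parts['scheme'] . '://' . $parts['host'] . $script;
}

echo prettyURL('http://www.somesite.co.uk/site.php?section=books&subsection=architecture'), "\n";
// prints http://www.somesite.co.uk/site/books/architecture
```

Dropping the keys is the trade-off of this scheme: the script has to know that the first segment after 'site' is the section and the second is the subsection, which is why processURI() hands back positional arg1 to argN rather than named variables.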

The Science Bit

Like all good things in life (Star Wars, cocktails), this little system is based on three ingredients:

Setting up Apache to process a file without an extension as a PHP file.

A function to process our URL and produce some variables.

Something that uses those variables to provide content.

First, setting up Apache. In your .htaccess file (if you don't have one, go ahead and create it) you want these lines:


<Files site>
	ForceType application/x-httpd-php
</Files>

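This tells Apache that if someone requests the file 'site', it should be treated as a PHP file. Because Apache will typically map a request such as /site/books/architecture onto the file 'site' and treat the rest as extra path information, every one of our new-style URLs ends up running this one script. Of course, you don't have to call it 'site'. If you were running a small online shop you might want to call it 'catalog'.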
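As a sketch of what the 'site' script sees once this is in place: with the Apache module set-up assumed here, the part of the path after the script name usually arrives in $_SERVER['PATH_INFO'] (for /site/books/architecture, that is /books/architecture). The helper segmentsFromPathInfo() below is hypothetical; it simply splits that value into its non-empty segments, as an alternative to splitting the whole request URI.

```php
<?php
// Hypothetical helper: split PATH_INFO ("/books/architecture") into
// its segments (["books", "architecture"]).
function segmentsFromPathInfo(string $pathInfo): array {
    // The 'strlen' callback drops empty strings left by leading, trailing
    // or doubled slashes but keeps a segment of "0"; array_values then
    // re-indexes the result from 0.
    return array_values(array_filter(explode('/', $pathInfo), 'strlen'));
}

// PATH_INFO is only set when there is extra path after the script name
$pathInfo = isset($_SERVER['PATH_INFO']) ? $_SERVER['PATH_INFO'] : '';
print_r(segmentsFromPathInfo($pathInfo));
```

Unlike splitting the full request URI, this leaves out the script name and any directory the script lives in (such as a /garrett/ prefix on a member site), so the segment positions stay the same if the script moves.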

Next up, we need something that will take our new URL and extract the information from it. We do that with this function, processURI():


// processURI():
// Takes the request URI and extracts the vars by splitting on the '/'
// Returns an array $url_array containing keys argN for each variable.
function processURI() {
	// $REQUEST_URI only exists as a global when register_globals is on
	// (removed in PHP 5.4), so read it from $_SERVER instead.
	$uri = $_SERVER['REQUEST_URI'];
	$array = explode("/", $uri);	// Explode the URI using '/'.
	$num = count($array);	// How many items in the array?
	$url_array = array();	// Init our new array

	// Insert each element from the request URI into $url_array
	// with a key of argN. We start $i at 1 because exploding the URI
	// gives us an empty ref in $array[0] (the URI starts with '/').
	// It's a hacky way of getting round it *:)
	for ($i = 1; $i < $num; $i++) {
		$url_array["arg".$i] = $array[$i];
	}

	return $url_array;  // return our new shiny array
}

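This is a pretty simple function. First it takes the request URI (basically everything after the server address, available as $REQUEST_URI when register_globals is on and as $_SERVER['REQUEST_URI'] otherwise) and splits it into an array on '/'. It then builds a new array ($url_array) with the keys arg1 to argN and their respective values. For http://www.somesite.co.uk/site/books/architecture that gives arg1 = 'site', arg2 = 'books' and arg3 = 'architecture'; note that the script name itself is arg1.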
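Because processURI() reads the request URI directly, it is awkward to try outside a web server. Here is a hypothetical variant, processURIFrom(), that takes the URI as a parameter and otherwise follows the same logic, with one assumed change: it uses parse_url() to drop any "?..." query string before splitting, so a stray parameter doesn't end up glued to the last argument.

```php
<?php
// Hypothetical variant of processURI() that takes the URI as a parameter
// instead of reading it from the request, so it can be run from the CLI.
function processURIFrom(string $uri): array {
    // Keep only the path: "/site/books?x=1" -> "/site/books"
    $path = (string) parse_url($uri, PHP_URL_PATH);
    $array = explode('/', $path);
    $url_array = array();
    // Start at 1: $array[0] is the empty string before the leading '/'
    for ($i = 1; $i < count($array); $i++) {
        $url_array['arg' . $i] = $array[$i];
    }
    return $url_array;
}

print_r(processURIFrom('/site/books/architecture'));
// Array
// (
//     [arg1] => site
//     [arg2] => books
//     [arg3] => architecture
// )
```

A trailing slash, as in /site/books/, produces an extra empty argument (arg3 => ''), so code that uses the result should treat empty values as missing.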

The final piece of the puzzle is creating the file 'site' and doing something with all these variables you've lovingly created. I've done just this at http://members.evolt.org/garrett/site/books/factual

This is just a quick example of pulling in content based on the variables extracted from the URL. It's a list of the books and CDs in my room. Not very interesting, but it was either that or a list of fruit and vegetables.

All the files can be downloaded from http://evolt.org/files/search_urls_php.tar.gz. This contains a .htaccess file and 'site', which has the processURI() function plus a roughly cobbled-together function, displayContent(), to show it in action. There is also a directory, 'content', which holds the files for inclusion.
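The download's displayContent() isn't reproduced here, but a lookup step of that kind might look like the following sketch. Everything in it is an assumption for illustration: the function name contentFileFor(), the content/<section>/<subsection>.txt layout, and the script living at the site root so that arg1 is 'site'. The important part is the whitelist check: the URL segments come straight from the visitor, so they shouldn't be pasted into a file path unchecked.

```php
<?php
// Hypothetical sketch of a content lookup in the spirit of displayContent().
// Assumed layout: <baseDir>/<section>/<subsection>.txt, with the script at
// the site root so that arg1 is 'site', arg2 the section, arg3 the subsection.
function contentFileFor(array $url_array, string $baseDir): ?string {
    $section = isset($url_array['arg2']) ? $url_array['arg2'] : '';
    $subsection = isset($url_array['arg3']) ? $url_array['arg3'] : '';

    foreach (array($section, $subsection) as $segment) {
        // Whitelist letters, digits, '_' and '-' so a segment such as '..'
        // can never climb out of $baseDir; the D modifier stops '$' from
        // also matching before a trailing newline.
        if (!preg_match('/^[A-Za-z0-9_-]+$/D', $segment)) {
            return null;
        }
    }

    $file = $baseDir . '/' . $section . '/' . $subsection . '.txt';
    return is_file($file) ? $file : null;
}

// Possible use inside the 'site' script (with processURI() defined there):
// $file = contentFileFor(processURI(), __DIR__ . '/content');
// if ($file === null) {
//     header('HTTP/1.0 404 Not Found');
//     exit;
// }
// readfile($file);
```

The usage comment sends the file with readfile() rather than include, so a content file is output as-is and never executed as PHP.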

That's A Wrap!

It's a very handy trick, and you only have to use evolt's archive search to see how effective it can be (evolt uses Google's database to search its archives). As I said, this is a very quick approach and it could be improved in a number of ways, but hopefully it's given you some ideas for your own site.

Garrett has been working on the 'net since 1992 (he still gets misty-eyed thinking about the first time he saw Mosaic) and now works for gencon as a developer / web standards monkey / Open Source advocate.

More of his ramblings and output can be found at his personal site.
