
Search Engine Friendly URLs (Part II)


Bruce Heerssen


This article is an extension of Garrett Coakley's excellent article Search Engine Friendly URLs with PHP and Apache.

The Intro

In part one, we learned how to use PHP and Apache to provide URLs that look like directories so that search engines would index them correctly. I also think they just look neat, much prettier than a bunch of ampersands and equal signs. This method also results in shorter URLs since there is no need for the variable names -- just the values. This is a definite plus, so I excitedly began to implement the method on my site.

I soon discovered, however, that all of the supporting pages lost their formatting. I quickly determined that this was due to having used relative URLs in my <link> tags. Further examination revealed that all of my internal links were broken as well, and for the same reason. This would have been easily fixed by simply specifying the absolute URL to the linked files, but I didn't want to do that because I would have to continually change them every time I updated the site from my development server. A more elegant solution was called for.
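To see why the relative links broke, here is a deliberately simplified PHP sketch of how a browser resolves a relative path against the page's URL path: everything after the last slash is dropped and the relative path is appended. It ignores `../` segments, query strings and full `http://` URLs, so treat it as an illustration rather than a complete resolver. The function name `resolve_relative()` and the example paths are hypothetical.

```php
<?php
// Simplified relative-URL resolution (illustration only): keep the
// "directory" part of the page path, up to and including the last slash,
// and append the relative reference. Dot segments, queries and fragments
// are not handled.
function resolve_relative(string $basePath, string $relative): string
{
    if ($relative !== '' && $relative[0] === '/') {
        return $relative; // already absolute on this host
    }
    $lastSlash = strrpos($basePath, '/');
    $directory = ($lastSlash === false) ? '/' : substr($basePath, 0, $lastSlash + 1);
    return $directory . $relative;
}

// On an ordinary page the stylesheet resolves as intended...
echo resolve_relative('/articles/page.html', 'css/site.css'), "\n";
// ...but behind a search-engine-friendly URL the browser treats the
// trailing values as directories, so the same link points elsewhere.
echo resolve_relative('/articles/index/news/42/', 'css/site.css'), "\n";
```

The second call yields `/articles/index/news/42/css/site.css`, a file that does not exist, which is exactly the lost formatting described above.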

The Script

```php
<?php
/* Read the request and script paths from $_SERVER (bare globals
   such as $REQUEST_URI only existed when register_globals was on). */
$REQUEST_URI = $_SERVER['REQUEST_URI'];
$SCRIPT_NAME = $_SERVER['SCRIPT_NAME'];

/* Assign the value of $SCRIPT_NAME to $base_href.
   This value is used at the beginning of all links in the site
   so that absolute hrefs are created. Relative hrefs will not
   work with this method. */
$base_href = $SCRIPT_NAME;

/* Create an array ($path) out of $base_href
   so that we can get the name of the current template. */
$path = explode("/", $base_href);

/* Pop the template name off the end of the array and assign it to $template. */
$template = array_pop($path);

/* Convert $path back to a string to use later in the template.
   This variable is used to reference CSS and JavaScript
   source files, since $base_href includes the template name. */
$path = implode("/", $path);

/* Extract the values from the end of the URI (ignoring any query string).
   Note that variable names are not included
   in the URI, just the values.
   It is therefore important that you always
   use the same order for your variables
   when creating links. The order itself is not
   important, so long as it's always the same. */
$vars = str_replace($SCRIPT_NAME, "", parse_url($REQUEST_URI, PHP_URL_PATH));

/* Create an array from the string $vars, then
   loop over the array, extract each value,
   and store it in $url_array under "arg1", "arg2", ... */
$array = explode("/", $vars);
$num = count($array); // How many items are in the array?
$url_array = array();
for ($i = 1; $i < $num; $i++) {
    $url_array["arg" . $i] = rawurldecode($array[$i]);
}

/* Since we know what order the values come
   in, we can assign each value to its correct
   variable. This part can also be done later in
   whichever script needs the info. Missing values default to "". */
$page = $url_array["arg1"] ?? "";
$message = $url_array["arg2"] ?? "";
$message2 = $url_array["arg3"] ?? "";
?>
```

For the purpose of assigning values to variables, $REQUEST_URI is great. But to give an absolute path to the current directory -- without hard coding the directory into the script -- it's not enough. To do that, $SCRIPT_NAME is used, but $SCRIPT_NAME also includes (wait for it) the script's file name. We just want the directory. So, we convert the value of $SCRIPT_NAME to an array then pop off the last value in the array (the file name) and convert the array back into a string. Voila! A dynamically generated path to the current directory.
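The directory step can be isolated in a small function for comparison with PHP's built-in `dirname()`. This is a sketch; the name `directory_of()` is hypothetical, and the results shown assume a Unix-like server. Note that the two approaches differ for a script at the web root: the explode/pop/implode approach gives "", so `"$path/style.css"` becomes `/style.css`, whereas `dirname()` gives "/", so the same expression becomes `//style.css`, which a browser reads as a protocol-relative URL for a host named `style.css`.

```php
<?php
// Directory part of a script path, derived by splitting on "/",
// dropping the last element (the file name), and re-joining.
function directory_of(string $scriptName): string
{
    $parts = explode('/', $scriptName);
    array_pop($parts);
    return implode('/', $parts);
}

echo directory_of('/articles/index'), "\n"; // /articles
echo dirname('/articles/index'), "\n";      // /articles

// For a script at the web root the two approaches differ:
var_dump(directory_of('/index')); // string(0) ""
var_dump(dirname('/index'));      // string(1) "/"
```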

In the above script, this path value is contained in the variable $path. The $path variable should be used for all internally referenced files, such as .css and .js files. This will ensure that all of your style sheets and JavaScript files work as expected. You can also use the $path variable in your links, but for convenience, I've created a variable $base_href that can be used instead. So, instead of <a href="<?php echo "$path/index/someVar/anotherVar/"; ?>">, you would have <a href="<?php echo "$base_href/someVar/anotherVar/"; ?>">.
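Writing every link by hand is error-prone, so a small helper can assemble them from an ordered list of values. The sketch below is hypothetical (`friendly_link()` is not part of the article's script); it assumes the values are always passed in the same order the parsing code expects, and it percent-encodes each value with `rawurlencode()` so spaces and other reserved characters are safe inside a path segment.

```php
<?php
// Hypothetical helper: build a link under $base_href from an ordered
// list of values, one path segment per value, with a trailing slash.
function friendly_link(string $baseHref, array $values): string
{
    if (count($values) === 0) {
        return $baseHref . '/';
    }
    // Percent-encode each value so it is safe as a single path segment.
    $encoded = array_map('rawurlencode', $values);
    return $baseHref . '/' . implode('/', $encoded) . '/';
}

echo friendly_link('/articles/index', ['someVar', 'anotherVar']), "\n";
// /articles/index/someVar/anotherVar/
echo friendly_link('/articles/index', ['news', 'a b']), "\n";
// /articles/index/news/a%20b/
```

In a template this would be used as `<a href="<?php echo friendly_link($base_href, [$page, $message]); ?>">`.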

Moving on

The rest of the script is the same as Garrett's, so we'll leave off further explanation, but there is one more improvement we can make, and it's to the .htaccess file. In Part One, Garrett showed us how to use .htaccess to force Apache to send our script to PHP for processing even though it does not have the .php file extension. I thought it would be nice to have the document listed as the default document for the directory so I added the following line to the .htaccess file:

DirectoryIndex index

This directive tells Apache to use the file "index" as the default document in this directory. Pretty sweet, huh?


Nice work

Submitted by garrett on November 7, 2001 - 04:26.

There are some cool extensions in there Bruce, way to go.

Does this mean the ball's in my court again? *:)


A couple more improvements

Submitted by simonc on November 8, 2001 - 06:41.

Just a couple more things you could do. Firstly, in PHP 4 there is a function called dirname() which you can use to get the directory part of the path without converting to and from an array. Secondly, instead of having to write the path into every link, image, CSS file, etc., why not make use of the <base> tag by including this in the document <head>:
<base href="<?php echo $path; ?>/">
That will make all links appear to be referenced from $path so you can continue to use relative URLs.
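A detail worth handling when generating the <base> element: the href must end in a slash, otherwise the last directory name is treated like a file name and dropped when relative URLs are resolved (a base of `/articles` sends `style.css` to `/style.css`). The sketch below is a hypothetical `base_tag()` helper built on `dirname()`; it strips any trailing slash first so that a script at the web root yields `/` rather than `//`.

```php
<?php
// Hypothetical helper: emit a <base> element for the directory that
// contains the current script, always ending in exactly one "/".
function base_tag(string $scriptName): string
{
    // dirname() returns "/" (or "\" on Windows) at the web root; trim it
    // so the appended slash does not produce "//".
    $dir = rtrim(dirname($scriptName), '/\\');
    $href = $dir . '/';
    return '<base href="' . htmlspecialchars($href, ENT_QUOTES) . '">';
}

echo base_tag('/articles/index'), "\n"; // <base href="/articles/">
echo base_tag('/index'), "\n";          // <base href="/">
```

In a page this would be called as `echo base_tag($_SERVER['SCRIPT_NAME']);` inside <head>.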


Thanks for the input, Simon.

Submitted by bheerssen on November 8, 2001 - 15:06.

The tip about the dirname() function is great! I've incorporated it into my script.

The Updated Script (sans comments)

```php
<?php
$REQUEST_URI = $_SERVER['REQUEST_URI'];
$SCRIPT_NAME = $_SERVER['SCRIPT_NAME'];
$path = $SCRIPT_NAME;
$base_href = dirname($path);
$vars = str_replace($path, "", parse_url($REQUEST_URI, PHP_URL_PATH));
$array = explode("/", $vars);
$num = count($array);
for ($i = 0; $i < $num; $i++) {
    $url_array["arg" . $i] = rawurldecode($array[$i]);
}
$page = $url_array["arg1"] ?? "";
?>
```

Note: I've switched $path and $base_href because it seems to make more sense that way.

However, the suggestion about the <base href> won't work in this circumstance. That's because all link paths used in the page must be absolute. If you use relative URLs, the browser will attempt to find linked documents based upon the URL in the address bar, which is wrong. So, if absolute URLs are used throughout the site, a <base href> becomes extraneous.
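To keep the parsing testable, the updated script can also be wrapped in a function. The sketch below is a hypothetical `parse_friendly_url()` that takes the script path and request URI as arguments (in a page you would pass `$_SERVER['SCRIPT_NAME']` and `$_SERVER['REQUEST_URI']`). Unlike the script, it returns the values as a plain list rather than under "arg1", "arg2" keys, ignores any query string, and only strips the script name when it is a whole path prefix.

```php
<?php
// Hypothetical, self-contained version of the parsing step: returns the
// directory (as base_href) and the positional values after the script name.
function parse_friendly_url(string $scriptName, string $requestUri): array
{
    // Ignore any query string such as "?x=1".
    $uriPath = (string) parse_url($requestUri, PHP_URL_PATH);

    // Only the part after the script name holds values. The next character
    // must be "/" (or nothing), so "/indexes" is not mistaken for "/index".
    $rest = '';
    $len = strlen($scriptName);
    if (strncmp($uriPath, $scriptName, $len) === 0
        && (strlen($uriPath) === $len || $uriPath[$len] === '/')) {
        $rest = substr($uriPath, $len);
    }

    // Drop leading/trailing slashes, then split; empty segments in the
    // middle are kept so that positions stay stable.
    $rest = trim($rest, '/');
    $values = ($rest === '') ? [] : array_map('rawurldecode', explode('/', $rest));

    return ['base_href' => dirname($scriptName), 'values' => $values];
}

print_r(parse_friendly_url('/articles/index', '/articles/index/news/42/?x=1'));
```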


About the <base> tag

Submitted by simonc on November 8, 2001 - 15:24.

It was my understanding that the <base> tag overrides the document's URI. At least, that is the impression I got from the W3C spec:
This attribute specifies an absolute URI that acts as the base URI for resolving relative URIs.

In HTML, links and references to external images, applets, form-processing programs, style sheets, etc. are always specified by a URI. Relative URIs are resolved according to a base URI, which may come from a variety of sources. The BASE element allows authors to specify a document's base URI explicitly.
I haven't tested it, but I'm fairly confident that it would work. Give it a go and let me know!


Deja Vu

Submitted by Girl_OSS on November 9, 2001 - 08:07.

When I read this article and the previous one, I experienced a sense of deja vu. True enough, I have read a similar article somewhere else. It is also entitled "Search Engine-Friendly URLs".


BASE HREF

Submitted by nathany on November 10, 2001 - 00:12.

Using BASE HREF="http://mydomain.com/" works fine for me. All URLs are relative to mydomain.com/ and shouldn't include an initial slash. The mydomain.com part can come from a variable set from SERVER_NAME, so it works on a local server too.

I've heard there are some problems with BASE HREF though - with some *NIX spiders that don't recognize it, resulting in a lot of errors in your web logs and a not-indexed site. Not sure which/any popular search engines have this problem though.
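A minimal sketch of building that absolute BASE HREF from the server variables, so the same template works on a local host and on the live domain. The helper name `base_href_from_server()` is hypothetical; it takes the server array as a parameter so it can be tested (in a page you would pass `$_SERVER`), and it assumes the site lives at the root of the host on the default port.

```php
<?php
// Hypothetical helper: "http(s)://<SERVER_NAME>/" from a $_SERVER-like array.
function base_href_from_server(array $server): string
{
    // Some servers set HTTPS to "off" rather than leaving it unset.
    $https = !empty($server['HTTPS']) && strtolower($server['HTTPS']) !== 'off';
    $scheme = $https ? 'https' : 'http';
    return $scheme . '://' . $server['SERVER_NAME'] . '/';
}

echo base_href_from_server(['SERVER_NAME' => 'mydomain.com']), "\n";
// http://mydomain.com/
echo base_href_from_server(['SERVER_NAME' => 'localhost', 'HTTPS' => 'on']), "\n";
// https://localhost/
```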


JavaScript note

Submitted by nathany on November 10, 2001 - 00:16.

Forgot to mention, JavaScript in some browsers doesn't recognize the BASE HREF tag (some do). So if you use window.location = 'someurl' it is best to make use of your $BASEHREF variable. That way the script will always work.

I've written my own versions of your friendly search engine URL thingies in both ColdFusion and PHP, and use BASE HREF "extensively" for the web sites I develop professionally.
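Following the JavaScript tip, the redirect target can be made absolute on the server so it does not depend on how a given browser treats <base href> for script-assigned URLs. This is a sketch with a hypothetical `js_redirect()` helper; `json_encode()` produces a correctly quoted JavaScript string, and `JSON_HEX_TAG` escapes `<` and `>` so a value cannot close an inline <script> element.

```php
<?php
// Hypothetical helper: emit "window.location = ...;" with an absolute target
// built from the base href and a relative path.
function js_redirect(string $baseHref, string $relative): string
{
    $target = $baseHref . ltrim($relative, '/');
    return 'window.location = '
        . json_encode($target, JSON_HEX_TAG | JSON_UNESCAPED_SLASHES) . ';';
}

echo js_redirect('http://mydomain.com/', 'news/2/'), "\n";
// window.location = "http://mydomain.com/news/2/";
```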


Re: Deja Vu

Submitted by garrett on November 10, 2001 - 08:48.

I hadn't seen that article, thanks for pointing it out.

When I wrote part one, it was based on some code that I saw on http://www.phpbuilder.com. It was tailored to a specific problem though, so I took those ideas and generalised them to produce an introductory tutorial.

The sitepoint article is quite interesting, because one of the methods he outlines is to use the 404 error method, which evolt.org uses in its CMS. Just goes to show, there's more than one way to skin a cat *:).


Nice

Submitted by mpgnet on November 12, 2001 - 15:32.

Good stuff.



Frustrated

Submitted by raphael on November 21, 2001 - 17:43.

Two of my main hosting solutions (France's Online and Nexen Services) prevent me from using this nifty solution: I cannot use the ForceType directive.

As a matter of fact, they prevent me from using any of the useful solutions listed in the article mentioned in Girl_OSS's comment: Online has Apache strip the query when using a custom error page, and Nexen prevents me from using a PHP file as a custom error page...

I'm left with http://mysite.com/page.php/param1/param2/, which is better than a query string (if Google indeed indexes pages constructed that way) but not altogether satisfying. Indeed, one of the key benefits of these tricks, aside from their "indexability", is to create nice-looking URLs, which people can pass around in emails, over the phone, on paper, etc.
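For URLs of the page.php/param1/param2/ form, Apache normally exposes the part after the script name as `$_SERVER['PATH_INFO']` when the PHP handler accepts path info (see Apache's `AcceptPathInfo` directive if it comes through empty). A minimal sketch of turning it into the same positional values, with a hypothetical `values_from_path_info()` helper:

```php
<?php
// Hypothetical helper: "/param1/param2/" -> ["param1", "param2"].
function values_from_path_info(string $pathInfo): array
{
    $trimmed = trim($pathInfo, '/');
    return ($trimmed === '') ? [] : array_map('rawurldecode', explode('/', $trimmed));
}

// In a page: values_from_path_info($_SERVER['PATH_INFO'] ?? '')
print_r(values_from_path_info('/param1/param2/'));
```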


Anyone tried using mod_rewrite for the same thing?

Submitted by stew on December 7, 2001 - 07:49.

I have managed quite well using this method. In a .htaccess file, put something like the following:

RewriteEngine on
RewriteRule ^index-(.*)\.html$ index.php?cat_id=$1

So any data in the above URL between the opening index- and the closing .html is captured as $1, and the URL is rewritten accordingly, i.e. index-001.html becomes index.php?cat_id=001.

If I need more variables, I simply separate the parts of the URL to be parsed as variables with commas, thus:

RewriteEngine on
RewriteRule ^index-(.*),(.*)\.html$ index.php?cat_id=$1&name=$2

I can also create virtual directory structures, putting keywords into the URL and then chopping them out and throwing them away when rewriting the URL, e.g.:

RewriteRule ^search/(.*)/get-details-(.*),(.*)\.html$ index.php?cat_id=$2&name=$3

The first (.*) would be created on the fly for the purpose of a meaningful URL, and the other two (.*) would contain the query.

This is probably easier to implement than it has been for me to explain. I got my inspiration from Tobias Ratschiller's excellent article here: PHP Wizard. Also try the Apache web site for more tips on mod_rewrite.

Stew :D
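To see how the captured groups end up in the query string, the two index- rules can be mimicked with the equivalent PCRE patterns in PHP. This is only an illustration with a hypothetical `rewrite()` function; mod_rewrite itself runs inside Apache. It also shows why rule order matters: `(.*)` matches commas too, so the one-value rule would otherwise swallow "001,carvings" as $1.

```php
<?php
// Hypothetical illustration: apply the first matching pattern, the way a
// rewrite rule maps a friendly file name onto a query string.
function rewrite(string $path): ?string
{
    // Two-value rule first; otherwise the one-value rule would also match
    // "index-001,carvings.html" and capture "001,carvings" as $1.
    $rules = [
        '/^index-(.*),(.*)\.html$/' => 'index.php?cat_id=$1&name=$2',
        '/^index-(.*)\.html$/'      => 'index.php?cat_id=$1',
    ];
    foreach ($rules as $pattern => $target) {
        if (preg_match($pattern, $path)) {
            return preg_replace($pattern, $target, $path);
        }
    }
    return null; // no rule matched
}

echo rewrite('index-001.html'), "\n";          // index.php?cat_id=001
echo rewrite('index-001,carvings.html'), "\n"; // index.php?cat_id=001&name=carvings
```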


How do I use the processURI() function in my PHP file?

Submitted by martinkuria on March 30, 2004 - 03:40.

How do I use the processURI() function in my PHP file? I am generating my links from a database query, and my links look like: ?catid=489&ses=e6ddde761378852041205069425fceac&kt=1&ctnm=Carvings. Where do I include the processURI() function? Please advise.


Dynamic content and virtual folders

Submitted by DaButcher on December 21, 2004 - 01:47.

Hi, I first tried Search_Engine_Friendly_URLs_Part_I, but to my despair, it rendered all my dynamic pages useless! It seemed that when I was in the virtual folders, the paths were all messed up, as described above. I have now tried Search_Engine_Friendly_URLs_Part_II on a test site, with the $base_href, and I think it works great, as far as I can tell. I used



Evolt.org is an all-volunteer resource for web developers made up of a discussion list, a browser archive, and member-submitted articles. This article is the property of its author; please do not redistribute or use elsewhere without checking with the author.