Search Engine Friendly SSI Image Gallery
Posted on 19 Sep 2001
by Morbus Iff (morbus)
Well, that's certainly a mouthful, eh? In this article, I'll show you how I created a search engine friendly dynamic image gallery, using only the features built into a core distribution of the Apache web server. This image gallery will support supplementary file information, as well as error damage control. You can see an example of this image gallery at Disobey.com's Ghost Sites. The techniques can be used for far more than just images; any type of downloadable file (movies, mp3s, etc.) could be used to create a gallery.
What You'll Need
- Apache web server (any current version).
- Server Side Includes turned on for your directory.
Note: It's outside the realm of this article to tell you how to enable Server Side Includes (SSI) for your web site. Please consult the Apache documentation or contact your web host for more information.
Definitions
- Dynamic: Hard coded pages are "static" - once you upload them to your web site, they don't change again until you manually make the changes yourself. "Dynamic" pages, on the other hand, react and change their behavior and appearance based on environment variables, user preference, direction of the sun, what Mishka's cat was fed (or wasn't fed), etc. This article describes how to create a dynamic image gallery.
- Environment Variables: Every time you visit a web page, Apache creates a number of "environment variables" such as PATH_INFO, DOCUMENT_ROOT, HTTP_USER_AGENT, and so forth. These environment variables contain helpful bits of information and can be used within SSI (as well as CGI-based scripts) to cater to the user in question. In this article, we use PATH_INFO to determine which image name is being sent to our SSI page.
- Search Engine Friendly: Search engines generally only index static pages (see "Dynamic", above), skipping over dynamic pages due to the assumption that the data changes frequently and the index would become out of date. Most search engines will skip over any URL that has a "?" in it (such as "http://www.example.com/?dynamic=1"). In this article, we trick the search engines into indexing our dynamic image gallery.
- Server Side Includes: SSI gives you a very limited control language, built into Apache, that can include one page in the body of another, test files for their date and file size, handle conditionals based on page data and environment, and so forth. They are a lifesaver for large sites. In this article, we use SSI to look at environment data sent to Apache, as well as some file tests.
Step 1: Creating Our Gallery Page
Our first step is to create our gallery page. This page will be used to show every single individual image in our gallery of delight. For this article, we're going to make a pretty ugly page to keep things brief. I'll save it as "show.shtml":
<html>
<title>My Ugly Image Gallery</title>
<body>
<img src="example.jpg" alt="beautiful image on ugly page" />
</body>
</html>
Pretty damn elite, if I do say so. We'll be ripping the above apart and making it dynamic in just a bit. For now, let's understand how an environment variable works, specifically PATH_INFO.
Step 2: Understanding PATH_INFO
When you load a normal old web page in a browser, you're loading a single page that has no added frills attached. A boring old URL would be something like "http://www.example.com/show.shtml". When you load pages like this, environment variables are created, but none that are all too exciting to work with.
A URL with added frills, however, could look like "http://www.example.com/show.shtml?image=jack.jpg". As mentioned above, this page would generally be ignored by search engines, but gives the "show.shtml" page something to work with, namely the added information of "image=jack.jpg". This information is stored in an environment variable called QUERY_STRING. What "show.shtml" does with QUERY_STRING is irrelevant in this article for one reason: search engines won't like us.
How do we get some engine love? Just like we can pass information to "show.shtml" by adding a question mark to the URL, we can also pass information by passing a slash to the page. So, to modify our above URL, we could use "http://www.example.com/show.shtml/jack.jpg". Even though the URL looks to have a subdirectory of "show.shtml" in which "jack.jpg" exists, that's not the case. You really have a page named "show.shtml" that was passed an environment variable of PATH_INFO, which contained "/jack.jpg".
And that's where our magic kicks in.
Since we're not using question marks in the URL, search engines will index us; since we passed extra information to the "show.shtml" via PATH_INFO, we now have something to react to. "http://www.example.com/show.shtml/jack.jpg" is different from "http://www.example.com/show.shtml/jill.jpg" by PATH_INFO alone - the same old "show.shtml" is just being passed different information. This allows us to construct any number of links to the same "show.shtml" page, just with different PATH_INFO.
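To make the split concrete, here is a rough Python sketch of how one URL breaks into the page, PATH_INFO and QUERY_STRING. It is only a model: the function name split_request is made up for illustration, and a real server locates the page by walking the filesystem rather than by searching for its name in the path.

```python
from urllib.parse import urlsplit


def split_request(url, page="show.shtml"):
    """Return (page path, PATH_INFO, QUERY_STRING) for a URL.

    Simplified model: we find the page by name instead of checking
    the filesystem the way a web server would.
    """
    parts = urlsplit(url)
    marker = "/" + page
    index = parts.path.find(marker)
    if index == -1:
        return parts.path, "", parts.query
    end = index + len(marker)
    rest = parts.path[end:]
    # Anything after the page name must start with a slash to count
    # as PATH_INFO; otherwise it is simply a different file name.
    if rest and not rest.startswith("/"):
        return parts.path, "", parts.query
    return parts.path[:end], rest, parts.query


if __name__ == "__main__":
    for url in (
        "http://www.example.com/show.shtml?image=jack.jpg",
        "http://www.example.com/show.shtml/jack.jpg",
        "http://www.example.com/show.shtml/jill.jpg",
    ):
        print(split_request(url))
```

The first URL carries its extra information after the "?", the other two carry it after the slash, and all three point at the same "show.shtml".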
Caveat: I've assumed that search engines will look at the above URLs, see the ".jpg" extension and not index the page. I've also assumed the same of browsers - that a browser will see the ".jpg" extension and try to display the result as an image, without consulting the server for the MIME type (if you don't understand MIME types, don't worry about it). So, we're going to drop the extension from our URLs, like so: "http://www.example.com/show.shtml/jill". This won't cause us any problems at all (see below).
Step 3: Reacting to PATH_INFO
So, we're passing PATH_INFO, but we haven't told our "show.shtml" page how to react to it. This is actually pretty dang'd simple, so let's modify the ugly HTML page we started with:
<html>
<title>My Ugly Image Gallery</title>
<body>
<h1>Your PATH_INFO is: <!--#echo var="PATH_INFO"--></h1>
<img src="example.jpg" alt="can you smell the flowers?" />
</body>
</html>
See that "echo" line? That's an SSI that just echoes the value of a variable - in this case, we're shoving the value of the PATH_INFO environment variable into a header for display. Loading "http://www.example.com/show.shtml/jill", our page would tell us that "Your PATH_INFO is: /jill". If you've uploaded this page to your Apache server and nothing seems to happen, then SSI probably is not enabled or configured. Again, this is outside the scope of this article.
Can you see where we're heading with this? A modified example:
<html>
<title>My Ugly Image Gallery</title>
<body>
<img src="<!--#echo var="PATH_INFO"-->.jpg" alt="sheer beauty!" />
</body>
</html>
When Apache displays this page, it'll replace PATH_INFO with "/jill":
<html>
<title>My Ugly Image Gallery</title>
<body>
<img src="/jill.jpg" alt="sheer beauty!" />
</body>
</html>
Caveat: Looking at the above result, it makes sense to think that if "jill.jpg" is in the same directory as "show.shtml" that everything should work fine. This isn't the case. Because the added slash of the URL *seems* to create an added hierarchy, we've actually got to tell our browser to go up one directory. Thus, change <img src=" to <img src=".. (that is, put ".." right after the opening quote) to achieve the desired effect (we've done this for the rest of our examples).
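If you'd like to see what the browser does with each form of the src, Python's standard urljoin follows the same URL resolution rules. This is only a sketch, and the "/gallery/" directory holding both "show.shtml" and "jill.jpg" is an assumption made for the example, not something from the gallery above.

```python
from urllib.parse import urljoin

# Hypothetical layout: show.shtml and jill.jpg both live in /gallery/.
page_url = "http://www.example.com/gallery/show.shtml/jill"
path_info = "/jill"


def resolve(src):
    """Resolve an <img src> value against the page URL, as a browser would."""
    return urljoin(page_url, src)


# src="/jill.jpg": the leading slash sends the browser to the site root.
rooted = resolve(path_info + ".jpg")

# src="../jill.jpg": steps back out of the apparent "show.shtml" directory.
relative = resolve(".." + path_info + ".jpg")

print(rooted)    # http://www.example.com/jill.jpg
print(relative)  # http://www.example.com/gallery/jill.jpg
```

Under that assumed layout, only the ".." version ends up pointing at the directory that actually holds the image.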
Further Caveat: Say you load "http://www.example.com/show.shtml" - what happens? Since no extra information was sent to the page, PATH_INFO isn't set, the echo prints "(none)" in its place, and your user will get a big old broken image with no added explanation, as the browser will try to load a file named "(none).jpg". We'll cater to this problem by cheating - read Step 4 (below).
At this point, you could call it a day. Simply create a page of different URLs, all directing to "show.shtml", and the SSI will slavishly load the passed image name. But what happens when the image isn't there? We should graciously tell the user, shouldn't we?
Step 4: Handling Errors
Your SSI image gallery is humming along quite nicely... when, suddenly, you mistype a link on your web page, and thousands of your users are trying to load "http://www.example.com/show.shtml/jock" instead of "jack". Instead of seeing that boyish child of our youth, they're seeing a dastardly broken image link. Thankfully, using the power of SSI, we can notify the user with a custom error message.
Modify your HTML to the following:
<html>
<title>My Ugly Image Gallery</title>
<body>
Image Last Modified: <!--#flastmod virtual="..$PATH_INFO.jpg"-->
<br />Image File Size: <!--#fsize virtual="..$PATH_INFO.jpg"-->
<img src="..<!--#echo var="PATH_INFO"-->.jpg" alt="wonderful plumage!" />
</body>
</html>
The above shows how to use SSI to get the "last modified" date of the image file, as well as its file size. While this information isn't too important for the normal user to know about, the side effect of using these SSI commands is. When "http://www.example.com/show.shtml/jill" is loaded in the browser, Apache will translate the SSI commands to:
<html>
<title>My Ugly Image Gallery</title>
<body>
Image Last Modified: Friday, 31-Aug-2001 02:06:21 EDT
<br />Image File Size: 70k
<img src="../jill.jpg" alt="wonderful plumage!" />
</body>
</html>
If, on the other hand, we load a broken URL like "http://www.example.com/show.shtml/jock", we get the following:
<html>
<title>My Ugly Image Gallery</title>
<body>
Image Last Modified: [an error occurred while processing this directive]
<br />Image File Size: [an error occurred while processing this directive]
<img src="../jock.jpg" alt="no curls here" />
</body>
</html>
Since the "last modified" and "file size" SSI commands make the server return info on a file, Apache will let us know when something goes wrong. This "something" could be the file not being there (as in the above case), permission problems, or some other weird thing. We can't know for sure without actually looking in our log files.
We can, however, modify the error message to be a bit friendlier:
<html>
<title>My Ugly Image Gallery</title>
<body>
<!--#config errmsg="This image does not exist!"-->
Image Last Modified: <!--#flastmod virtual="..$PATH_INFO.jpg"-->
<br />Image File Size: <!--#fsize virtual="..$PATH_INFO.jpg"-->
<img src="..<!--#echo var="PATH_INFO"-->.jpg" alt="failure coming, i bet" />
</body>
</html>
Our last example, I promise. Apache will now show the following when something goes wrong:
<html>
<title>My Ugly Image Gallery</title>
<body>
Image Last Modified: This image does not exist!
<br />Image File Size: This image does not exist!
<img src="../jock.jpg" alt="failure coming, i bet" />
</body>
</html>
Of course, you can modify your "errmsg" to anything you want, even including HTML if you wish.
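The error message is damage control for after a bad link goes live. To catch a "jock" before your users do, something like the following Python sketch could be run against your image directory first. The function name missing_images and the list of link names are made up for illustration; it simply checks that every name you plan to link has a matching ".jpg" file.

```python
from pathlib import Path


def missing_images(names, image_dir, extension=".jpg"):
    """Return the link names that have no matching image file in image_dir."""
    folder = Path(image_dir)
    return [
        name for name in names
        if not (folder / (name + extension)).is_file()
    ]


if __name__ == "__main__":
    # Hypothetical link names, checked against the current directory.
    for name in missing_images(["jack", "jill", "jock"], "."):
        print("show.shtml/" + name + " would show the error message")
```

This only covers the "file isn't there" case; permission problems and other weirdness would still need a look at the log files.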
We've really only scratched the surface of what Apache can do with SSI. One major feature we didn't touch on is conditionals (where we could say something like "if PATH_INFO equals jill, then make the background color blue"). I'll leave that as an exercise for the reader, however, both in diving into Apache's documentation and in experimenting with an oft-ignored feature that many people limit to "header / footer templating".
Caveat: Although I mention that we're making an "image gallery", we've really only made the "image display page". To make the gallery portion of your masterpiece, simply start making links to "show.shtml", passing different PATH_INFOs for each one of your images.
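If typing those links by hand sounds tedious, they can be generated. Here is a minimal Python sketch, assuming the images sit in one directory and the index page lives next to "show.shtml"; the function name gallery_links and the list markup are my own choices, not part of the gallery described above.

```python
from html import escape
from pathlib import Path
from urllib.parse import quote


def gallery_links(image_dir, page="show.shtml", extension=".jpg"):
    """Build one <li> link per image, passing the bare name as PATH_INFO."""
    names = sorted(p.stem for p in Path(image_dir).glob("*" + extension))
    return [
        '<li><a href="{0}/{1}">{2}</a></li>'.format(page, quote(name), escape(name))
        for name in names
    ]


if __name__ == "__main__":
    # Print a link list for the images in the current directory.
    print("<ul>")
    print("\n".join(gallery_links(".")))
    print("</ul>")
```

The extension is left off each link on purpose, matching the "show.shtml/jill" style of URL used throughout.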
Anal Caveat: Think "http://www.example.com/show.shtml/jack" looks ugly because it looks like a filename is being used as a directory name? Yeah, I did too. If you have .htaccess control in your Apache directory, then you can add the following to that file:
RedirectMatch /show/(.*) ../show.shtml/$1
and then refer to the URLs as "http://www.example.com/show/jack". Not only does this make the URL nicer looking, it also allows you to change the method of serving the files later on down the line, without changing links or causing 404s. It's an exercise to the user to find out what the hell I'm talking about (as it's a long explanation).
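To get a feel for what that pattern captures, you can try the same regular expression outside Apache. The Python sketch below models only the pattern-and-substitution part of the RedirectMatch line; it says nothing about how Apache actually issues the redirect, and the function name redirect_target is made up for illustration.

```python
import re

# Same pattern as in the RedirectMatch line; like Apache, we search
# anywhere in the path rather than anchoring at the start.
PATTERN = re.compile(r"/show/(.*)")


def redirect_target(request_path):
    """Return the substituted target, or None if the pattern doesn't match."""
    match = PATTERN.search(request_path)
    if match is None:
        return None
    # $1 in the Apache line is the first captured group.
    return "../show.shtml/" + match.group(1)


if __name__ == "__main__":
    for path in ("/show/jack", "/show.shtml/jack", "/index.html"):
        print(path, "->", redirect_target(path))
```

Note that "/show.shtml/jack" does not match, since the pattern wants a slash straight after "show", so the real page isn't redirected back onto itself.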