Searching With Sherlock Part 2: Producing A Plugin
Posted on 10 Nov 1999
by Martin Burns (MartinB)
Now that you've seen what Sherlock can do, it's time to think about how you can use it to drag Mac users to your site by producing a plugin. For this tutorial, we'll look at the process I went through to produce the plugin for evolt.org.
Plugins are very, very simple. They're just text files with HTML-like commands which tell Sherlock how to get and parse a search results page. All the user has to do is drop the file into their System Folder, and the next time they fire up Sherlock, your site is one of the options they'll have.
What you'll need
- A site with a search engine
Sherlock is just an interface which throws queries at search engines and presents the results. If you don't have a search engine, you're a bit screwed.
- A Macintosh computer, with OS8.5 or later
Sorry about this, but you're going to be creating special file formats which Windows doesn't do, and you'll need Sherlock for testing, of course.
- A text editor
I use BBEdit, but we're not doing enormous files with extensive validation, so SimpleText would do
- A utility to change the file type
The classic one is ResEdit (which also has uses for creating custom icons, which we'll go into next time), but simpler and safer is a Contextual Menu Module called More File Info. This is the one I'll use for this tutorial. If you know how to use ResEdit, you won't need the demonstration, so can get on with it yourself.
First go to the search page of your site, and save the source. Then enter a query, and save the source of the results page. These two HTML files are the absolute core of producing the plugin. This is because the plugin does two things for you - punches the data from the search form into the engine, and interprets the results for you.
Next, we'll have a look at the code for thesite's plugin:
# Created by evolt.org, with help from
# apple-donuts.com
<search
   name = "thesite.evolt.org"
   method = POST
   action = "http://www.evolt.org/index.cfm?menu=11"
   update = "http://www.easyweb.co.uk/tutorials/thesite.evolt.org.src.hqx"
   updateCheckDays = 3
>
<input name="criteria" accesskey="s" user>
<input type="Submit" name='Submit' value="go">
<input name="MaxRows" value='15'>
<interpret
   resultListStart = "<!-- start sherlock -->"
   resultListEnd = "<!-- end sherlock -->"
   resultItemStart="<TR bgcolor="
   resultItemEnd="</TR>"
   relevanceStart="0."
   relevanceEnd=" "
>
</search>
Part A - submitting the query
The first part of the plugin replicates the HTML form. It has to duplicate the fields, but nominate one as the prime one for user entry, as Sherlock only has one entry field. Here's the HTML of the search form (formatting & text removed for clarity):
<form action="index.cfm?menu=11" method="POST">
<input type="Text" title="criteria" size="10" accesskey="s">
<input type="Submit" value="go">
</form>
What we have to do is tell Sherlock to replicate this. So we take it and put it into Sherlock-ese, as a bunch of <input> tags and the properties of the <search> tag:
method = POST
This is whatever method the form uses. Evolt.org's engine uses POST, so that's what we'll use.
action = "http://www.evolt.org/index.cfm?menu=11"
...and this is where it posts it. Taken straight from the form.
<input name="criteria" accesskey="s" user>
The 'user' property tells Sherlock that this is where the input field is put. Not being a ColdFusion person, I haven't a clue what the accesskey="s" bit does, but it was in the form, so I'm not removing it (actually, I did try removing it, and the plugin stopped working, so working on the "if it ain't broke, don't fix it" theory, I recommend leaving well alone).
<input type="Submit" name='Submit' value="go">
This is again straight from the form.
<input name="MaxRows" value='15'>
This is an extra one, pinched from the version on the results page. Our search engine has the ability to specify how many results it returns, defaulting to 10. I thought that was a bit on the low side, so pushed it up to 15.
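To make the field mapping concrete, here's a minimal Python sketch of the POST that those <input> tags add up to. It isn't how Sherlock itself is written: build_query and the sample search term are invented for illustration, and the sketch only builds the request without contacting the server.

```python
from urllib.parse import urlencode

# Values copied from the plugin's <search> and <input> tags.
ACTION = "http://www.evolt.org/index.cfm?menu=11"
FIXED_FIELDS = {"Submit": "go", "MaxRows": "15"}
USER_FIELD = "criteria"  # the <input> flagged with 'user'


def build_query(user_text):
    """Return (url, body) for the POST a browser would make for this form.

    Hypothetical helper: the user's text goes into the 'user' field,
    and every other field is sent with its fixed value.
    """
    fields = {USER_FIELD: user_text}
    fields.update(FIXED_FIELDS)
    return ACTION, urlencode(fields)


url, body = build_query("style sheets")
print(url)
print(body)
```

Whatever the user types in Sherlock's single entry field lands in criteria; the other two fields go along unchanged every time.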
The other properties in this bit of the plugin are:
name = "thesite.evolt.org"
This is the name that Sherlock will use in the dialogue box, so it's a good idea to use it to give people an idea of what they're searching. You can also create custom icons, but that's beyond the scope of this tutorial.
update = "http://www.easyweb.co.uk/tutorials/thesite.evolt.org.src.hqx"
This tells Sherlock where to look for new versions of the plugin. After all, if you change how your search engine works, you'll need to change the plugin, and it's a real pain to find out everyone who's got a copy, and email it to them, with instructions for installation. This way, Sherlock will check the URL at the interval specified in updateCheckDays, and download and reinstall any updated versions it finds.
Part B - interpreting the results
The rest of the plugin is all about interpreting what the search engine returns. Remember that as far as the search engine is concerned, Sherlock is just another browser. It's received the query over HTTP, just as if you'd hit 'Submit' yourself, and returns the results as HTML. What you have to do is work out a consistent way of telling:
- Where the list starts and ends
- Where each result starts and ends
- Where any relevance score starts and ends for each result
If you've produced the search engine yourself, life's a bowl of cherries. You can have a results page which marks each point with HTML comments, or you can even do what Altavista do, and produce a special version for Sherlock, so average users don't even see this.
For the evolt.org search engine, I don't have access to the ColdFusion template the search engine uses, so in the time between proposing this series, and writing it, only limited changes were made. So I had to work round the problems. By the time you read this, more changes may have been made, so you'll be using a more recent version of the plugin which uses them. If you've already downloaded the original plugin, you'll be automatically updated (see above - I told you it was useful).
The interpretation information is all held as properties within an <interpret> tag.
Spotting the list
resultListStart = "<!-- start sherlock -->"
This tells Sherlock to start parsing here. If you don't have anything obvious, you could just use <html>. If your plugin doesn't work, this is probably a good default to go back to.
resultListEnd = "<!-- end sherlock -->"
Pretty much the same as above, it tells Sherlock that the list is done and not to bother going on from here.
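As a rough model of what these two markers do, here's a short Python sketch that cuts the result list out of a page. It's an approximation of the behaviour described here, not Sherlock's real code; slice_between and the sample page are made up for the example.

```python
def slice_between(text, start, end):
    """Return the text after the first `start` marker and before the next `end` marker.

    If `start` is missing, nothing is returned; if `end` is missing,
    everything after `start` is kept.
    """
    begin = text.find(start)
    if begin == -1:
        return ""
    begin += len(start)
    finish = text.find(end, begin)
    return text[begin:] if finish == -1 else text[begin:finish]


# Hypothetical results page, trimmed right down.
PAGE = (
    "<html><body><h1>Results</h1>"
    "<!-- start sherlock -->"
    "<TR bgcolor=#eee><TD>hit one</TD></TR>"
    "<!-- end sherlock -->"
    "<p>footer</p></body></html>"
)

result_list = slice_between(PAGE, "<!-- start sherlock -->", "<!-- end sherlock -->")
print(result_list)
```

Everything outside the two comments (page header, navigation, footer) is simply never looked at, which is why well-placed markers make the rest of the job easier.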
Grabbing each result
Sherlock grabs everything between the markers you specify for the start and end of each result. It puts whatever is the linked text in the list it returns, and displays the full text (less any HTML tags) when you click on one.
resultItemStart="<TR bgcolor="
Now you'd expect the sensible way to do this is to specify the whole tag which starts each row. However, if you look at the results page, you'll notice that for visual clarity, every other row is a subtly different colour. So we have to end the marker here, before the colour value. Sherlock will display all the text between here and the resultItemEnd, missing out obvious HTML. So the fact that the result is across several table cells won't make any difference, but we will get the trailing part of the <TR> tag in the result, along with the relevance score. Looks a bit poor, but there's not a lot you can do.
resultItemEnd="</TR>"
Naturally, this tells Sherlock where the result ends. If you've got a simple list, without irrelevant rubbish in between, you can miss this tag out.
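Here's a small Python sketch of the item-splitting idea, again as an approximation rather than Sherlock's actual parser. The helper names and the two sample rows are invented; the markers are the ones from the plugin.

```python
import re


def split_items(result_list, item_start, item_end):
    """Collect the text between each item_start / item_end pair."""
    items = []
    pos = 0
    while True:
        begin = result_list.find(item_start, pos)
        if begin == -1:
            break
        begin += len(item_start)
        finish = result_list.find(item_end, begin)
        if finish == -1:
            break
        items.append(result_list[begin:finish])
        pos = finish + len(item_end)
    return items


def strip_tags(fragment):
    """Crude tag stripper: replaces anything shaped like <...> with a space."""
    return re.sub(r"<[^>]*>", " ", fragment)


# Hypothetical rows with alternating background colours.
RESULT_LIST = (
    '<TR bgcolor="#ffffff"><TD><A HREF="/a">First hit</A></TD></TR>'
    '<TR bgcolor="#eeeeee"><TD><A HREF="/b">Second hit</A></TD></TR>'
)

items = split_items(RESULT_LIST, "<TR bgcolor=", "</TR>")
for item in items:
    # Collapse the whitespace left behind by the removed tags.
    print(" ".join(strip_tags(item).split()))
```

Because the start marker stops short of the colour value, each item begins with the tail of its <TR> tag, so that leftover fragment is what shows up in front of the text.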
Sherlock can interpret relevance scores expressed as numbers from 0-100, and use them as a ranking criterion. Results without relevance scores are ranked as if they have a relevance of 0, so will drop to the bottom of the list. All the major engines provide handy data for Sherlock to work with, so it's really worthwhile doing this if you can.
relevanceStart="0."
With the evolt.org results list as it stands at the moment, there's no convenient <!-- start relevance --> tag to mark where the relevance score starts. So we have to use a rather clunky workaround. Unless a result is 100% relevant, it will start with '0.' So that's what I've used as an interim measure. Now if you search on 'Oracle', or 'Frames', you'll get a 100% result, and this will appear as if it doesn't have a relevance, and will drop to the bottom of the list. At the moment, I have to live with that.
relevanceEnd=" "
Guess what? This tells Sherlock that if it hasn't found a number by now, it's not finding one. I could have used </TD>, but as there's a handy non-breaking space at the end of each score, I might as well use it.
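The '0.' workaround and its blind spot are easier to see in a quick Python sketch. This is only a model of the behaviour described above: extract_relevance and the sample rows are invented, and the rows assume the raw HTML spells the non-breaking space as the &nbsp; entity.

```python
def extract_relevance(item, rel_start, rel_end):
    """Return the digits between the two markers as an int, or None if absent."""
    begin = item.find(rel_start)
    if begin == -1:
        return None
    begin += len(rel_start)
    finish = item.find(rel_end, begin)
    if finish == -1:
        return None
    digits = item[begin:finish].strip()
    return int(digits) if digits.isdigit() else None


# Hypothetical result rows: one partial match and one 100% match.
ITEMS = [
    "<TD>CSS basics</TD><TD>0.87&nbsp;</TD>",
    "<TD>Oracle tips</TD><TD>1.00&nbsp;</TD>",
]

scores = [extract_relevance(item, "0.", "&nbsp;") for item in ITEMS]
print(scores)
```

The first row gives a usable score, but the second never contains '0.', so no relevance is found for it. That's the 100% result dropping to the bottom of the list.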
And that's pretty much all of your plugin text done. Save the file, and we'll make it work.
Part C - making it work
Unlike Windows machines, which use file extensions to tell the OS what application to open a file in, Macs use a couple of pieces of meta-data bundled into the file: a file type and a file creator. BBEdit uses TEXT/R*ch to save its files; Dreamweaver uses TEXT/DmWr. And Sherlock plugins are all issp/fndf. To hack the meta-data into a format Sherlock will recognise, we're going to use the More File Info CMM. Download the file and drop it into your closed System Folder (the OS will put it into the right place). Now control-click onto your plugin file, and go to the appropriate option (the first time you do this, you'll have to go into the 'Other' option and enter 'issp' as the file type and 'fndf' as the creator). The file will then turn into a generic Sherlock icon.
Your new plugin should be good to go now. Drop it into your System Folder and test it. All things being equal, it should work a treat. BinHex it with StuffIt Lite and you can stick it on your site as a download.
And that's about it. In a wee while, Part 3 of 2 will go into a bit more detail on some of Sherlock's advanced features, which will allow you to specify a banner to appear at the bottom of the Sherlock window.