DHTML Text Marker - An Experiment


Ashok Hariharan


User since: January 27, 2002

Last login: January 27, 2002

Articles written: 5

What is a text marker?

Anyone who uses Google's newsgroup search will have encountered a text marker, a.k.a. a highlighter. For instance, search http://groups.google.com for a recipe to make minestrone. View any of the search results and the word 'minestrone' will appear highlighted (see screen-cap below).

This feature makes a website's search engine much more user friendly: the user no longer has to scan the page visually for the part of the text that contains the 'minestrone' recipe. Instead, the user knows immediately which part of the content has it.

Google achieves this 'marking' of the text by wrapping the search keyword in a tag like this:

<span style="background-color:#color">

We will look at one particular way to implement this feature.

Screen shot: a page from Google Groups. I searched for a recipe to make 'minestrone' (a tasty Italian creation); this was one of the search result pages that I opened.

Various Implementation methods

There are a lot of "out of the box" search engine tools available that can be used with a website. A few of these come with some kind of search result text highlighting (I am aware of a couple: Lotus Domino search and Perlfect Search), but what about the rest? Some people also end up rolling their own site-specific search engine solutions.

A good way to implement this feature would be on the server side (as Google Groups search does). This involves script-based processing of the content, using the search keyword (in our example: minestrone) as the parameter, and then sending the tagged/highlighted content to the browser. The downside of this approach is that a different implementation might be needed for each server-side platform. Like many kinds of server-side programs, this could also turn out to be resource and processor intensive.

I opted for the quick 'n' dirty client-side JavaScript route, which I thought I could then re-use on any server-side platform or even for static HTML pages (though as you read on you will learn I wasn't exactly successful).

Compatibility Issues

There are a few compatibility issues in the JavaScript approach. The original (and so far only) guinea-pig implementation of the code is in a restricted intranet scenario (my workplace). Almost everybody there is required to use IE 6, so I basically wrote it without giving a damn about other browsers (now, is that evil or what?). But the good news is I got it to work with minimum fuss on Mozilla (yeah! I am a decent guy now). I have tested it successfully on the following browsers:

  • IE 6 (Windows versions)
  • Mozilla 1 RC1

I tested the script on IE 5.0 and IE crapped out completely: when the browser reached a particular regular expression function in the code, my CPU hit 100% utilization and IE came to a grinding halt. I guess the regular expression object in IE 5 isn't anywhere near as sturdy as the one in IE 6 and Mozilla.

I don't have IE 5.5, IE 5.01, Netscape 6.x installed, nor do I have MacOS running anywhere close by, so I have no idea of how the code would behave on those browsers. Though if the code is not working on IE 5, with IE 4.x your guess is as good as mine.

Theoretically it should be possible to make it work on NN4 or any browser that supports the innerHTML property (that is, if its regular expression object can handle the stress). I did try to make it work on Opera 6, but it doesn't support innerHTML :-(.

Which is why I have called this article an experiment! But at least the article is futuristic with its upward compatibility ;).

JavaScripting our implementation

Here is a summarized list of steps needed to implement this:

  • Enclose the main content portion of the page within a single named <div> tag.
  • Add an onLoad event to the <body> tag.
  • Call the highlighting function from the onLoad event, if a particular query string has been passed to the page.

Step 1 : <div>ing your content

To begin with, I designed all my content pages so that the whole content section of a page was wrapped inside a named <div> tag, something like this:

<div id="contentdiv"><!--begin content section-->
  <p> 
	  skajfl sa fjla safkljasl lkfasj ja akjflka .....<br>
	  <a href="dudu.com">my link</a> <br>
  </p>
  <ul><li>....</ul>
  <table> ......</table>
  <p>......</p>
</div><!--end content section-->

Note: this is the content area of the page, kind of similar to the article content part of a page on evolt.org. This doesn't include stuff like the evolt side bar or the top menu.

Why did I have to use a <div> tag?

Mostly for convenience. To access the entire HTML content area of a page through JavaScript, all I need to do now is:

var elemObj = document.getElementById('contentdiv');
var strInHtml = elemObj.innerHTML;

strInHtml is now a string containing all the HTML contained by <div id='contentdiv'>. I used the innerHTML property of the <div> tag to access the raw HTML. The good thing about the innerHTML property is that it is also settable.

I can do something like:

x.innerHTML = '<h2>My new stuff</h2>';

which will overwrite all the HTML within the <div> with my new stuff. Our implementation will be using this powerful little DOM property.

Note: There is a big ongoing debate about the pros and cons of using innerHTML. Read all about it! The fact remains that innerHTML is very convenient compared with the more complicated DOM methods, which seem to be the more politically correct approach.

Step 2 : the onLoad event

When a keyword on the page needs to be highlighted, the keyword is passed to the page using a query string. If the page is normally invoked like this:

http://server/page/index.asp

With the highlight query string it will be invoked like this:

http://server/page/index.asp?hilite=minestrone

Add an onLoad event in the <body> tag. Something like:

<body onLoad="onLoad();">

Step 3 : Calling the highlighting function from onLoad

Here I read the innerHTML property of the <div> as raw HTML into a string variable. Then I do a search and replace of every instance of the keyword with the highlighted version of the keyword. Finally, I write the replaced version of the string back into the <div>.

The highlighting is achieved using a simple <span> tag with style="background-color:yellow;".

There is a precaution to take when inserting the tags. Consider some content like this:

<p>
  .............
  <!--content-->............
  <a href="minestrone.com" title="link to minestrone home page">minestrone 
  home page</a>.
  .............
  <!--more content-->............
</p>

The replace function should not place highlight tags around the word minestrone found within the href or title attributes of the <a> tag, as that would break the HTML. It should replace only the text between the <a> and </a> tags.

I used the JavaScript RegExp object to filter out illegal matches of this kind.
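To see why the precaution matters, here is a minimal sketch of the naive approach, assuming a blind search and replace over the raw HTML string. The function name naiveMark and the sample string are made up for illustration and are not part of the article's code.

```javascript
// Hypothetical naive marker: replaces the keyword everywhere in the raw HTML,
// including inside tag attributes.
function naiveMark(html, keyword) {
  var span = '<span style="background-color:yellow;">' + keyword + '</span>';
  return html.split(keyword).join(span);
}

var sample = '<a href="minestrone.com" title="link to minestrone home page">' +
             'minestrone home page</a>';
var broken = naiveMark(sample, 'minestrone');

// The href and title attributes now contain <span> tags, so the markup is broken.
console.log(broken);
```

The link text gets highlighted as intended, but so do the two attribute values, which is exactly the kind of match that has to be filtered out.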

Just a couple of points before we dive into the actual code.

  • Instead of trying to match only text between an opening <*> and a closing </*> tag, I match everything to the right of any <*> tag. This is because not all tags come in opening and closing pairs, and people invariably forget to close the good old <p> tag.
  • The code assumes there are no <script> tags within the content. I didn't build in any checks for these tags.

There is some preliminary work that onLoad does which I will not explain here, as it has been dealt with in earlier articles by other people on this website:

  • Extracting the keyword to be highlighted from a query string using a JavaScript DOM property (document.location.href).
  • Extracting the innerHTML from the <div>, again using the DOM properties.
  • Writing back the highlighted text into the <div>'s innerHTML.
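For readers who want to see those three steps in one place, the following is a rough sketch and not the code from the working example. The helper name getQueryParam is an assumption made for illustration; the hilite parameter, the contentdiv id, and markText() come from the article itself.

```javascript
// Hypothetical helper: return the decoded value of a named query string
// parameter from a URL, or '' if it is absent.
function getQueryParam(href, name) {
  var queryStart = href.indexOf('?');
  if (queryStart === -1) { return ''; }
  var query = href.substring(queryStart + 1).split('#')[0];
  var pairs = query.split('&');
  for (var i = 0; i < pairs.length; i++) {
    var parts = pairs[i].split('=');
    if (parts[0] === name) {
      // '+' stands for a space in query strings
      return decodeURIComponent((parts[1] || '').replace(/\+/g, ' '));
    }
  }
  return '';
}

// Sketch of the onLoad handler: only runs the marker when ?hilite=... is present.
// markText() is the function examined in the rest of this article.
function onLoad() {
  var keyword = getQueryParam(document.location.href, 'hilite');
  if (keyword === '') { return; }
  var contentDivObj = document.getElementById('contentdiv');
  contentDivObj.innerHTML = markText(keyword, contentDivObj.innerHTML);
}
```

Keeping the query string parsing in its own function means it can be tried out with plain URL strings, without a browser.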

The actual function that applies the <span> highlighting tags to the innerHTML is quite small. Let's examine it part by part:

function markText(txtKeyword, inputHtml)
{
    var re; /*regex object*/
    var varMatches; /*matches array*/
    var outHtml; /*output html*/
    var replaceText; /*the keyword wrapped in the span tag, built in advance*/

    replaceText = '<span style="background-color:yellow;color:red;font-weight:bold;">' + txtKeyword + '</span>';

The function takes two parameters: the keyword to be highlighted (txtKeyword) and the raw HTML content string extracted using the innerHTML property (inputHtml).

All the necessary string and regular expression object variables are declared. The highlighted keyword string is built up in advance, in the last line of the snippet, by prefixing and suffixing it with a <span style=....> tag.

    re = new RegExp("(<[^>][^<]*>)([^<]*)", "g"); /*group 1: a tag; group 2: the text up to the next tag*/
    outHtml = ''; /*init html string*/
	

A new instance of the RegExp object is created. It matches each tag, from its opening < to its closing >, together with any non-tag text to the right of the tag. The second parameter to the RegExp constructor ("g") makes the match global: each call to exec() resumes from where the previous match ended, so the whole string can be walked one match at a time.

I had to slip the extra [^<] into the first part of the expression because the match sometimes bombed on encountering a non-visible character. The extra expression seemed to fix that.
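The effect of the "g" flag is easiest to see in isolation. The snippet below is only an illustration: it uses a deliberately simpler pattern than the one in markText() (tags only, no trailing text), and the variable names are made up.

```javascript
// With the "g" flag, exec() remembers where it stopped in re.lastIndex
// and continues from there on the next call.
var reTag = new RegExp("<[^>]+>", "g");
var str = "<b>one</b>";

var first = reTag.exec(str);       // first[0] is "<b>"
var afterFirst = reTag.lastIndex;  // 3, just past "<b>"
var second = reTag.exec(str);      // second[0] is "</b>"
var third = reTag.exec(str);       // null: no more matches, lastIndex resets to 0

console.log(first[0], afterFirst, second[0], third);
```

This is what lets a while loop around exec() visit every tag in the string and then stop on its own.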

	
    while ((varMatches = re.exec(inputHtml)) != null) /*exec sequentially to apply span tags*/
    {
        outHtml += varMatches[1]; /*html tag part*/
        outHtml += replaceMe(varMatches[2], txtKeyword, replaceText); /*call the search & replace function*/
    }
    return outHtml;
}

The innerHTML string is now evaluated against the regular expression object. The exec() method searches the string using the regular expression and returns an array (varMatches) containing the results of the search. Element 1 of the array (varMatches[1]) contains the matched HTML tag and element 2 (varMatches[2]) contains the non-tag text to the right of the matched tag.

For example, if the following is one of the matches:

<p class="xclass">hello there

then varMatches[1] would contain <p class="xclass"> and varMatches[2] would contain the string "hello there".
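The same split can be reproduced outside the browser. This is a small sketch that runs the article's pattern over a made-up sample string and collects each tag and text pair; the variable names are chosen for illustration only.

```javascript
// Same pattern as in markText(): group 1 is a tag, group 2 is the text after it.
var reSplit = new RegExp("(<[^>][^<]*>)([^<]*)", "g");
var sampleHtml = '<p class="xclass">hello there<br>more text</p>';

var pieces = [];
var match;
while ((match = reSplit.exec(sampleHtml)) !== null) {
  pieces.push({ tag: match[1], text: match[2] });
}

// pieces[0] is { tag: '<p class="xclass">', text: 'hello there' }
// pieces[1] is { tag: '<br>',               text: 'more text' }
// pieces[2] is { tag: '</p>',               text: '' }
console.log(pieces);
```

Note that the pattern only picks up text that follows a tag, so any text sitting before the very first tag in the input would not appear in any match.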

The string in varMatches[2] is now searched for the keyword to be highlighted and every instance of it is replaced with the <span> tagged keyword (using the replaceMe() function).
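The replaceMe() function itself is not listed in this article. A minimal stand-in, assuming a plain case-sensitive replacement of every occurrence, could look like the sketch below; the version in the working example may well differ.

```javascript
// Hypothetical stand-in for replaceMe(): replaces every occurrence of keyword
// in text with replaceText. split/join is used so that characters that are
// special in regular expressions (., *, ? and so on) need no escaping.
function replaceMe(text, keyword, replaceText) {
  if (keyword === '') { return text; } // nothing to search for
  return text.split(keyword).join(replaceText);
}

var marked = replaceMe('hello there, over there', 'there', '<b>there</b>');
console.log(marked);
```

Because the match is case-sensitive, a search for 'minestrone' would leave 'Minestrone' unmarked; handling that is left out of this sketch.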

Subsequently, the highlighted output string from the markText() function is written back to the <div> tag by setting the innerHTML property, something like:

contentDivObj.innerHTML = strOutputFromMarkText;

The sample code should be self-explanatory, and it is commented. I learnt most of the layer-writing methods, like reading and setting innerHTML, from ppk's website.

That's about it.

There is a working example available: DHTML marker sample

Some possible improvements / optimizations :

  • Portable code for other minor browsers.
  • Right now the code treats multiple keywords as a single phrase. Changing it to handle each word in the phrase individually shouldn't be hard.
  • I don't do character code conversions. For example, if someone searches for a word like bonnie&clyde, I don't convert it to bonnie&amp;clyde. So maybe this could be added.
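As a starting point for the last two improvements, here is an untested-in-browser sketch. Both helper names (escapeHtml and splitKeywords) are made up for illustration and are not part of the sample code.

```javascript
// Hypothetical helper: convert characters that are special in HTML into
// entities, so that bonnie&clyde can match bonnie&amp;clyde in innerHTML.
function escapeHtml(text) {
  return text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
}

// Hypothetical helper: split a search phrase on whitespace into individual,
// entity-escaped keywords, dropping empty entries.
function splitKeywords(phrase) {
  var words = phrase.split(/\s+/);
  var result = [];
  for (var i = 0; i < words.length; i++) {
    if (words[i] !== '') { result.push(escapeHtml(words[i])); }
  }
  return result;
}

console.log(splitKeywords(' minestrone  recipe '));
console.log(escapeHtml('bonnie&clyde'));
```

Each keyword returned by splitKeywords() could then be passed through markText() in turn. One thing to watch for with that approach is a keyword that is contained in another keyword, which would produce nested highlight tags.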

Ashok is based in Nairobi, Kenya. When not busy dodging vagrant matatus in Nairobi traffic, he keeps himself up to date by evolt-ing.

I guess I would stick to server-side..

Submitted by shanx24 on July 13, 2002 - 02:22.

..parsing of text if possible. But a good attempt at JS! Cheers.


The PHP alternative

Submitted by Markavian on July 17, 2002 - 14:46.

Hey good article, a little big to comprehend all of it. And I lost a bit of interest when you said it didn't work on all browsers.

However, when browsing through the PHP online documentation, I found a great function which allowed me to do the same thing.. but in PHP and server side. If you use PHP, take a read:

Evolt Article: PHP Text Marker

- Markavian


Evolt.org is an all-volunteer resource for web developers made up of a discussion list, a browser archive, and member-submitted articles. This article is the property of its author; please do not redistribute or use elsewhere without checking with the author.