PHP Text Marker : An alternative

Markis Brachel

Text Highlighting

Websites like google.com can highlight text on a page based on a search term.

Previous experiments by Ashok Hariharan (Junglee), in his article "DHTML Text Marker An Experiment", were brought to my attention just as I found a way of doing the same thing using PHP.

Read on to learn how to highlight any text on a page using a simple PHP script.

The Function

The following code comes straight from the php.net online manual entry for the ob_start() function. Basically, ob_start() buffers all page output until you call ob_end_flush(), which then outputs the whole page.

In this example a function called callback has been created, which searches through the entire output and replaces the word 'apples' with 'oranges'. The callback is registered by passing its name to ob_start(), e.g. ob_start("callback");, and it is called when the buffer is flushed by ob_end_flush().

The code is like so:

<?php
function callback($buffer) {
  // replace all the apples with oranges
  // (ereg_replace() was removed in PHP 7.0; str_replace() takes the same arguments here)
  return (ereg_replace("apples", "oranges", $buffer));
}

ob_start("callback");
?>

It's like comparing apples to oranges.

<?php
ob_end_flush();
?>

All the output is buffered. When the buffer is flushed, its contents are run through the callback function, which replaces all occurrences of 'apples' with 'oranges'.

At the end of the script, ob_end_flush(); outputs the filtered buffer contents.

Creating Useful Code

We want to search through a document, and highlight a specified search item.
We already have a search and replace function above. To highlight a word we simply surround the text with a <span class="highlight"> tag.

Firstly we need to set up the CSS class:

.highlight {
  background: #44AA44;
  color: white;
}

Next the code is altered to search for a specified search term and, when it is found, surround the text with the <span class="highlight"> tag.
The search term is held in a variable called $searchItem.
For the 'callback' function to work, $searchItem must be set.
NB: Note the line global $searchItem; inside the function, which is what lets callback() see the variable. If you don't know why, please see PHP variable scope for details.

The altered code follows:

<?php
// fall back to a string that is unlikely to appear on the page
if(!isset($searchItem)) { $searchItem = "randomstringset"; }

function callback($buffer) {
  global $searchItem;
  // surround search items with the highlight class
  return (ereg_replace($searchItem, "<span class='highlight'>$searchItem</span>", $buffer));
}

ob_start("callback");
?>
<style>
.highlight { background: #44AA44; color: white; }
</style>

It's like comparing bananas to oranges to apples to peaches.

<?php
ob_end_flush();
?>

Operator's Guide

Now if you send searchItem=oranges to the script, the word 'oranges' will be highlighted.
e.g. URL highlight.php?searchItem=oranges

If you've got PHP, go on, try it!
Also try changing the search item from oranges to apples, to peaches, etc., and see how it highlights the different words.
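
A note for anyone trying this on a current PHP install: the URL trick relies on the old register_globals setting, which turned query parameters into variables automatically and was removed in PHP 5.4. Below is a minimal sketch of the same idea reading the term from $_GET instead. It assumes the same searchItem parameter name; the helper names search_item_from() and highlight_callback_for() are made up for this example, and str_replace() stands in for ereg_replace().

```php
<?php
// Hypothetical helper: pull the search term out of a query array
// such as $_GET. Returns null when nothing usable was sent.
function search_item_from(array $query): ?string
{
    if (!isset($query['searchItem']) || !is_string($query['searchItem'])) {
        return null;
    }
    $term = trim($query['searchItem']);
    return $term === '' ? null : $term;
}

// Hypothetical helper: build the output-buffer callback for one term.
function highlight_callback_for(?string $term): callable
{
    return function (string $buffer) use ($term): string {
        if ($term === null) {
            return $buffer; // nothing to highlight
        }
        // Escape the term so it cannot inject markup into the page.
        $safe = htmlspecialchars($term, ENT_QUOTES, 'UTF-8');
        return str_replace($safe, "<span class='highlight'>$safe</span>", $buffer);
    };
}

// Usage in a page (shown as comments so the sketch has no side effects):
// ob_start(highlight_callback_for(search_item_from($_GET)));
// ... page output ...
// ob_end_flush();
```

Building the callback from a function, rather than using a global, keeps the search term out of the global scope, but that is a design preference and not something the original script needs.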

Extending the script

That's the basic principle. However, you may want an improved matching filter: 'oranges' isn't the same as 'Oranges', and maybe the user would prefer orange, Orange and Oranges all to be highlighted.
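
One possible way to get that looser matching is a case-insensitive regular expression that puts back whatever text it actually matched, so 'Oranges' keeps its capital letter. This is only a sketch: the function name highlight_any_case() is made up, it uses the PCRE functions (preg_quote() and preg_replace()) in place of the ereg family, and it still takes no notice of HTML tags.

```php
<?php
// Hypothetical helper: wrap every case-insensitive match of $term in a
// highlight span, keeping the capitalisation found in the text.
function highlight_any_case(string $text, string $term): string
{
    if ($term === '') {
        return $text; // an empty pattern would match everywhere
    }
    // preg_quote() escapes regex metacharacters in the term; the second
    // argument also escapes the "/" delimiter.
    $pattern = '/' . preg_quote($term, '/') . '/i';
    // $0 in the replacement stands for the text that actually matched.
    return preg_replace($pattern, '<span class="highlight">$0</span>', $text);
}

// Usage: searching for "orange" marks the start of both "Oranges" and "oranges".
// echo highlight_any_case('Oranges and oranges', 'orange');
```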

Also, there's a small snag: if someone searches for 'p', it also matches the p inside HTML tags such as <p>, which messes up the code a tad.
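
A rough way around the snag is to split the output into tags and the text between them, and only touch the text. The sketch below assumes fairly tidy HTML: a ">" inside an attribute value, an HTML comment, or the contents of script, style and title elements would all trip it up. The function name highlight_outside_tags() is made up for this example.

```php
<?php
// Hypothetical helper: highlight $term only in the text between tags.
function highlight_outside_tags(string $html, string $term): string
{
    if ($term === '') {
        return $html;
    }
    // Split into alternating text and tag pieces. With
    // PREG_SPLIT_DELIM_CAPTURE the captured tags land at odd indexes.
    $pieces = preg_split('/(<[^>]*>)/', $html, -1, PREG_SPLIT_DELIM_CAPTURE);
    $pattern = '/' . preg_quote($term, '/') . '/i';
    foreach ($pieces as $i => $piece) {
        if ($i % 2 === 0) { // even index: plain text, safe to mark up
            $pieces[$i] = preg_replace($pattern, '<span class="highlight">$0</span>', $piece);
        }
    }
    return implode('', $pieces);
}

// Usage: the <p> tags survive a search for "p".
// echo highlight_outside_tags('<p>pie</p>', 'p');
```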

Comments

Please try out the script, and comment on any successes / added features below. If you can think of any faster / better ways of doing this search / replace then please post.

Reference

And for your reference:

Previous experiments by Ashok Hariharan (Junglee) in his article "DHTML Text Marker An Experiment" - A semi-inspiration for writing this article.

www.php.net - Still the best source for PHP documentation.

PHP Function ob_start() - The all important 'buffer output' command.

PHP variable scope - A useful page about variable scope, variables inside functions, and things which aren't obvious to the newer programmer.

This author likes building websites, playing around with images, and making things that do something.
Style and functionality are what matter. Make it do something, make it look good, and you're on to a winner.
- Markavian

tag filtering

Submitted by Junglee on July 18, 2002 - 02:10.

I am actually pretty much a zero when it comes to PHP! Is the output buffering you have mentioned similar to response buffering in ASP?
BTW, I think you can get over that tag matching problem (the snag you mentioned, where it matches the HTML tag <p> if someone searches for 'p') by filtering out text appearing within HTML tags, using a regular expression around your replace search term (should be possible in PHP).
Something like: (<[^>][^<]*\>)([^<]*) (could be different in PHP)

Cheers !

ereg_replace

Submitted by luminosity on July 18, 2002 - 02:51.

You only need to use ereg_replace when you are using regular expressions. Whenever your search and replace just uses plain strings, you're better off with str_replace, which is easier to type and faster.

I would imagine that it'd also be easy (and a lot better for a cut and paste script) to write a function that only searches and replaces around tags as well, just as Ashok's JavaScript function did.

str_replace

Submitted by Markavian on July 18, 2002 - 03:32.

I blame the PHP manual for not using str_replace in the first place. I just blindly copied their code straight over. str_replace is considered to be a much faster search and replace function. I did notice that while I was reading about ob_start, but I forgot to change my code.

Junglee, I don't know how to program in ASP, so I wouldn't know. I might do some research later to see if I could implement your idea. It'd make my script more useful.

You can also move the ob_start() call so that it covers just the content you want filtered, such as just before a file include or database function. That'd save even more time and effort with the search and replace, and hopefully cut down on errors.
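
To illustrate buffering only one part of the page, here is a small sketch. It assumes the content comes from some function or include that echoes its output; render_highlighted() is a made-up name, and ob_get_clean() is used so that the captured section comes back as a string.

```php
<?php
// Hypothetical helper: run $render (anything that echoes content),
// capture only its output, and highlight $term in that part alone.
function render_highlighted(callable $render, string $term): string
{
    ob_start();                // start buffering just this section
    $render();                 // e.g. a file include or a database loop
    $section = ob_get_clean(); // fetch the buffer and stop buffering
    if ($term === '') {
        return $section;
    }
    // Escape the term so it cannot inject markup into the page.
    $safe = htmlspecialchars($term, ENT_QUOTES, 'UTF-8');
    return str_replace($safe, "<span class='highlight'>$safe</span>", $section);
}

// Usage: header and footer are printed untouched, only the middle is filtered.
// echo render_highlighted(function () { include 'content.php'; }, 'oranges');
```

The file name content.php in the usage comment is just a placeholder.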

str_replace

Submitted by webqs on July 18, 2002 - 21:33.

I agree - str_replace is much better and less intensive than ereg_replace for simple string replacements such as this, although I have a feeling that str_replace is case sensitive. I find eregi_replace good for a case-insensitive search and replace. rgds James

Something to use for your callback function

Submitted by endquote on July 19, 2002 - 01:03.

It was funny to read the DHTML highlighter tutorial, since I just made a similar thing in PHP, and here's another one on how to do it in PHP. Just using str_replace is a major problem, though, because if your highlight word is in a tag (or in a tag attribute, or an onmouseover argument, etc), things get way messed up. I tried to make a solution for all cases with regular expressions, but could never get it quite right, and it ran very slowly (when trying to highlight 200 different words in the text, for example - the project I'm working on currently has this requirement). The following is a function that will do it perfectly, and much quicker, but it's pretty long. Feel free to use the code (an email would be nice, though), and if you have any improvements or find any bugs, I'd love to hear about them.
/*
markup_words - A function to place tags around specified words in a string of HTML text. Returns the modified text.

$sText (string, required) - The HTML text to examine.

$aWords (array, required) - An associative array of words to place markup around. Each word is a key, with the tag to be placed before it as a value. Eg, array('word'->'')

$sCloseTag (string, required) - The tag to be placed after each of the words (eg, '')

$boolGlobal (boolean, required) - If true, replace all instances of the words in $aWords. If false, replace only the first one.

The function ignores instances of the words within HTML comments, HTML tags, and, if $sCloseTag = '', HTML links.

Dependencies - check_ignore().
*/
function markup_words($sText, $aWords, $sCloseTag, $boolGlobal) {
// by default, don't markup anything in comments, tags, or links
	if($sCloseTag == '') {
		$aIgnoreTokens = array('', '', '');
	} else {
		$aIgnoreTokens = array('', '');
	}	
// keep a lowercase copy of $sText for searching
	$sLower = strtolower($sText);

// store the sections of $sText that are within the $aIgnoreTokens tags into an array
// for checking later.
	$aIgnoreSections = array();
	for($i=0; $i= 1) { $sCharBefore = $sLower[$iPos-1]; } else { $sCharBefore = ''; }
				if($iPos  
  

Well that didn't work

Submitted by endquote on July 19, 2002 - 01:08.

The code in my previous comment got totally mangled, so let's try again.

Oh well.

Submitted by endquote on July 19, 2002 - 01:10.

Still mangled. Looks fine in preview, but most of it is missing on post. Someone oughta fix that.

Flush

Submitted by DomitianX on July 19, 2002 - 12:51.

Junglee, it is the same as response.flush. You could do this very easily in ASP with the replace function or regular expressions. Regular expressions would be much faster than replace though.

Bad plan

Submitted by dffuller on July 19, 2002 - 17:06.

You really shouldn't try to do this work using regular expressions. You should do this work using an HTML parser. Your work will either be slower or incorrect if you don't use a good parser. You aren't going to do better without spending a great deal of time on it and even then, you aren't likely to do as well as the excellent tools that are already available. I speak from a great deal of experience in this.

-

Submitted by Markavian on July 19, 2002 - 19:56.

dffuller, an HTML parser such as?

I think evolt.org could have helped posters out a bit more by changing carriage returns into <br> tags or <p> tags or something automatically. It'd save some hassle on our part spacing stuff out. Then you could write the code tag once, and you'd be laughing.

Also, those text areas are badly sized. When I _wrote_ my article I used a bit of CSS to expand the textarea to fit my code without needing to scroll.

HTML Parser

Submitted by dffuller on July 20, 2002 - 22:49.

Well, for a pure HTML parser I'd recommend the Perl package HTML::Parser. It's probably the best of that sorta thing out there. If you're willing to go with XHTML, you can use any XML parser. That's probably what I'd recommend. The problem with the current approach is that searching for something like "p" (already mentioned, also unlikely), "html", "head", "body", "title" (not mentioned, more likely) will break what would have been valid HTML code. You also wouldn't want to make replacements within the <title> tags or the <meta> tags, as this will cause funny things to appear in the browser's title bar. I'm sure that I'm missing 100 other nasty possibilities that exist when you look at this sorta thing. By using a parser correctly, you can easily avoid these issues. Well, at least more easily than by rolling your own parser. ;-)
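
For readers who want to stay in PHP, the parser approach can be sketched with PHP's bundled DOM extension instead of the Perl module named above. This is a swapped-in technique, not what the comment recommends, and it assumes the DOM extension is enabled. The function name highlight_with_dom() is made up. Only text nodes inside <body> are touched, so tag names, <title> and <meta> are left alone.

```php
<?php
// Hypothetical helper: highlight $term in body text nodes only.
function highlight_with_dom(string $html, string $term): string
{
    if ($term === '') {
        return $html;
    }
    $doc = new DOMDocument();
    // libxml is forgiving about real-world HTML; collect its warnings quietly.
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    // Text inside <body> only, skipping script and style contents.
    $query = '//body//text()[not(ancestor::script) and not(ancestor::style)]';
    $pattern = '/(' . preg_quote($term, '/') . ')/i';

    // Copy the node list first, since the loop edits the tree.
    foreach (iterator_to_array($xpath->query($query)) as $node) {
        $parts = preg_split($pattern, $node->nodeValue, -1, PREG_SPLIT_DELIM_CAPTURE);
        if (count($parts) < 2) {
            continue; // no match in this text node
        }
        $fragment = $doc->createDocumentFragment();
        foreach ($parts as $i => $part) {
            if ($i % 2 === 1) { // odd index: a captured match
                $span = $doc->createElement('span');
                $span->setAttribute('class', 'highlight');
                $span->appendChild($doc->createTextNode($part));
                $fragment->appendChild($span);
            } elseif ($part !== '') {
                $fragment->appendChild($doc->createTextNode($part));
            }
        }
        $node->parentNode->replaceChild($fragment, $node);
    }
    return $doc->saveHTML();
}
```

Note that saveHTML() returns a whole document, so a fragment passed in comes back wrapped in html and body elements, and non-ASCII input may need an explicit charset declaration to survive the round trip.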

What about plain content

Submitted by webqs on July 21, 2002 - 05:02.

The worry about breaking HTML by highlighting strings would not occur if the text was stored sans-HTML in a database. It would then simply be a matter of performing the required replacements prior to slotting the content into the HTML framework, or performing the highlight by wrapping the required text in HTML prior to db insert, although this wouldn't help if the highlight 'dictionary' was constantly being updated. Easy! rgds James

endquote's mangled solution

Submitted by emilyw on October 21, 2002 - 18:18.

Hello, I would love to see the code posted by mr/ms. endquote on 07/19/2002. If it cannot be posted with success on this site could you email it to me? Many many thanks. -Emma emilyw@altern.org

JavaScript to highlight words on page

Submitted by drstuey on April 30, 2003 - 19:29.

This can be done with JavaScript (in IE at least). Try out this "bookmarklet"

This script is from the Bookmarklets.com page on Navigation Tools.

If you are using IE, click the link and enter the word you want to highlight on this page. Or right-click, choose Add to Favourites and then you can access this tool for any webpage you currently have open. WOW!!!

Show Occurrences of Word... (Explorer version)

Here's hoping this code works in this comment....

Usability problems in highlighting words on a page

Submitted by drstuey on April 30, 2003 - 19:49.

Here is a semi-cheeky email I sent to Jakob Nielsen, about a usability failing on his site (www.useit.com) regarding highlighting search terms on the page of content after a search. Just something to bear in mind when you construct one of these highlight words on page things. i.e. does the method you use produce unusable URLs?

--------------------------------------------

Hi,

have you done any research on the usability of highlighting the search term searched for on the page of content?

It seems to me that this is poor usability because it breaks the short bookmarkable and emailable URL. Also, if I want to know where on the page the search term I searched for is, I can use the browser's "Find on Page" command.

for instance I went to:

http://useit.mondosearch.com/cgi-bin/MsmGo.exe?grab_id=51714975&EXTRA_ARG=GRAB_ID%3D50964906
%00%26EXTRA_ARG%3D%00%26HOST_ID%3D2%00%26PAGE_ID%3D219%00%26HIWORD%3D
BACK%2BBUTTON%2BBUTTONS%2BBACKED%2BBACKS%2BBACKING%2B&host_id=2&page_id=241
&query=Back+Button&hiword=BACK+BUTTON+BACKING+BUTTONS+BACKED+BACKS+


and to actually see what the real URL was I had to click the Next Alertbox link and then click the Previous Alertbox link to get to the page with no highlighting of search term.

I can see that highlighting the search term in a page of search engine results, where you show the context of the keyword, is a good thing, but highlighting the search term on the page of content causes more problems than the benefit it supposedly provides. At the very least, whenever you do a page like this, you should provide a link to the "real" page with a proper URL and no highlighting.

--------------------------------------------

Yes that IS the real URL. Try a search on www.useit.com and you'll see this in action. cheers

Highlight multiple words ignoring HTML

Submitted by DisasterMan on March 6, 2004 - 11:25.

My solution doesn't create the same problem that drstuey is finding.

This function can handle a string of several words and highlight them separately with a different background colour for each word (up to 14 - add more colours to the array if you REALLY need more! And, yes, they are gross colours - change them to your needs)

I use eregi_replace as it is case insensitive and requires no regular expression for the needle ($needle instead of "'($needle)'si"), but it is slow. Apparently PHP 5 CVS includes str_ireplace, which would be best, and less memory intensive, but that isn't running on my server...

I have gotten around the problem of mangling HTML by only buffering individual results, rather than a whole HTML chunk, which was buggering me up for ages! As the highlighter is a function you can apply it to any size chunk of code, and reuse it as many times as necessary.

I am using MATCH [sql field/s] AGAINST [search terms] technique for searching as it is quite powerful with built in boolean operators and can even give results relevance by the number of times the keywords appear. You can also use < and > to adjust the importance of a search word, although I have yet to implement this, if I ever will.
One downside is that the minimum search is 4 chars, so I provide a simple LIKE search option too, as I am searching a database with many short entries.
So, anyway I have to strip operators from my search string and turn it into an array of terms ready for highlighting.

see http://www.mysql.com/doc/en/Fulltext_Search.html for more on MATCH AGAINST - it's phat.

Below the highlight function is a sample of my code using the highlight on several results.

<?php
// convert search string to array minus operators
// $terms = explode(" ", $search);
// (use the line above instead if you don't need to strip anything from the string)
$terms = explode(" ", str_replace(array("+","-","*","~","\"","(",")","<",">","\\"), "", $search));

// highlight search terms function
function highlight($buffer) {
  global $terms;
  // create an array of colours for highlights
  $colors = array('80A000','922292','990000','707070','008080','999900','009999','1111FF','FF00FF','808080','008000','006666','800033','000080');
  // set $i - this controls which colour is used
  $i = 0;
  // do the replace for each word in the array
  foreach($terms as $needle) {
    // if you run out of colours, start over
    if ($i > count($colors)-1) { $i = 0; }
    // surround matches with highlight span; \\0 puts back the text that matched
    $buffer = eregi_replace($needle, "<span style='background-color:#$colors[$i]'>\\0</span>", $buffer);
    // increment $i so next colour is different
    $i++;
  }
  // send the results back to output
  return $buffer;
}
?>
insert head, css, java and the first part of your page here
<?php
// start highlighting search terms
ob_start("highlight");
?>
insert data to be highlighted here
<?php
// end highlighting search terms
ob_end_flush();
?>

And here are some table cells with the results waiting to be highlighted.

<td width="25%" align="left">File: <?php ob_start("highlight");
  echo $result["file"]; ob_end_flush(); ?></td>
<td width="25%" align="left">Timecode: <?php ob_start("highlight");
  echo $result["timecode"]; ob_end_flush(); ?></td>
<td width="25%" align="left">Title: <?php ob_start("highlight");
  echo $result["song_title"]; ob_end_flush(); ?></td>
<td width="25%" align="left">Ref: <?php ob_start("highlight");
  echo $result["ab_code"]; ob_end_flush(); ?></td>


I hope this is useful because I have wasted a lot of time on it!

The project I am working on (an archive of material by and about a friend who is sadly no longer with us) is not fully live yet, but should be soon (it all works, but the content needs to be checked). In the mean time, as a favor for my time here, take a look at anno.co.uk, check out some free music and amazing words, and come back soon to see the archive in action.

Keep on hacking away down the code mine!

DisasterMan = :)

Credit is due to everybody who you find when you google for PHP highlight search result and similar things!
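
For anyone on a PHP version where eregi_replace is no longer available, the multi-colour idea can be sketched with preg_replace_callback(). This is an illustration under assumptions, not the code from the comment above: highlight_terms() is a made-up name, it expects plain text with no tags in it, and it replaces every term in a single pass. The single pass means a later term such as "span" or "color" can never match inside markup that was inserted for an earlier term.

```php
<?php
// Hypothetical helper: give each search term its own background colour,
// cycling through $colors when there are more terms than colours.
function highlight_terms(string $text, array $terms, array $colors): string
{
    // Drop empty strings, which explode() can produce from double spaces.
    $terms = array_values(array_filter($terms, 'strlen'));
    if (!$terms || !$colors) {
        return $text;
    }
    // Map each lowercased term to its colour.
    $colorFor = [];
    foreach ($terms as $i => $term) {
        $colorFor[strtolower($term)] = $colors[$i % count($colors)];
    }
    // One alternation pattern covering all terms, matched case-insensitively.
    $quoted = array_map(function ($t) { return preg_quote($t, '/'); }, $terms);
    $pattern = '/' . implode('|', $quoted) . '/i';
    return preg_replace_callback($pattern, function (array $m) use ($colorFor, $colors): string {
        // Fall back to the first colour if case folding ever disagrees.
        $color = $colorFor[strtolower($m[0])] ?? $colors[0];
        return "<span style='background-color:#$color'>{$m[0]}</span>";
    }, $text);
}

// Usage:
// echo highlight_terms('Back button', ['back', 'button'], ['80A000', '922292']);
```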

Evolt.org is an all-volunteer resource for web developers made up of a discussion list, a browser archive, and member-submitted articles. This article is the property of its author; please do not redistribute or use elsewhere without checking with the author.