Some typical questions from customers who want a website developed are...

To address these questions, I decided to come up with a tagging scheme simpler than HTML: one that even a layman could understand, and that could easily be parsed into an HTML tag which the browser could understand. I will try to illustrate this with a simple example.

The main advantages to this method would be:

Recreating the <img> Tag

Let's take up the simple example of the image tag:

<img src="xyz001.jpg" alt="xyz image" align="right">

This is how an image tag typically appears. In a page-length article on a website, images usually sit within the text, so it makes sense to let the user place each image where they want and to place as many images as they want.

So what I did was reduce the image tag to:

<g n="index">l</g>

The tag, <g>, is not an HTML tag, but is our very own tag that our server-side script recognises.

Let's examine the various parts of our tag for this example:

<g> </g>

This just indicates to our parsing program the beginning and ending of our tag.

n="index"

This attribute indicates to the parser which image the user has requested, but the index need not be the image name. We will come to that later.

<g n="index">l</g>

The 'l' between the <g></g> is an alignment indicator which asks the image to align left ('l') or right ('r').

You may notice that the "alt" attribute is not part of our tag; we shall come to that later as well. Either way, this is definitely easier than entering a complete <img> tag.
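To make the anatomy of the tag concrete, here is a minimal sketch that splits one <g> tag into its index and its alignment. It uses a regular expression purely for brevity, which is not how the parser listed later in this article works (that one scans with indexOf). The class name GTagAnatomy and the method split are hypothetical names chosen for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration class; not part of the original parser.
class GTagAnatomy {
    // Group 1 is the index (the n="..." value), group 2 the alignment letter.
    private static final Pattern G_TAG =
            Pattern.compile("<g n=\"([^\"]*)\">([lr])</g>");

    // Returns {index, alignment} for a single tag, or null when the
    // text is not a well-formed <g> tag.
    static String[] split(String tag) {
        Matcher m = G_TAG.matcher(tag);
        if (!m.matches()) {
            return null;
        }
        String alignment = m.group(2).equals("l") ? "left" : "right";
        return new String[] { m.group(1), alignment };
    }

    public static void main(String[] args) {
        String[] parts = split("<g n=\"CEO\">l</g>");
        System.out.println(parts[0] + " -> " + parts[1]); // prints "CEO -> left"
    }
}
```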

A small clarification, which I decided to add after reading some of the comments below: I have used a tag called <g> in this example, and some people might consider that too cryptic. I could just as well have used something like <image name="x">left</image>. The parser described below will handle a tag like that simply by changing its input parameters.
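As a rough illustration of what "changing the input parameters" means, the sketch below takes the opening delimiter, the closing delimiter and the reference attribute as constructor arguments, so the same scanning code reads both the terse and the verbose form of the tag. ConfigurableTagSketch and firstTag are hypothetical names; the sketch only mirrors the idea and is not the parser from the listings.

```java
// Hypothetical illustration class; delimiters are plain strings,
// in the same spirit as the parser's constructor arguments.
class ConfigurableTagSketch {
    private final String begin;   // e.g. "<g" or "<image"
    private final String end;     // e.g. "</g>" or "</image>"
    private final String refAttr; // e.g. "n=\"" or "name=\""

    ConfigurableTagSketch(String begin, String end, String refAttr) {
        this.begin = begin;
        this.end = end;
        this.refAttr = refAttr;
    }

    // Returns {reference, property} for the first complete tag in text,
    // or null when no complete tag is found.
    String[] firstTag(String text) {
        int tagStart = text.indexOf(begin);
        if (tagStart == -1) {
            return null;
        }
        int tagEnd = text.indexOf(end, tagStart);
        if (tagEnd == -1) {
            return null;
        }
        int refStart = text.indexOf(refAttr, tagStart);
        if (refStart == -1 || refStart > tagEnd) {
            return null;
        }
        refStart += refAttr.length();
        int refEnd = text.indexOf('"', refStart);
        if (refEnd == -1) {
            return null;
        }
        // The '>' that closes the opening tag; the property follows it.
        int openClose = text.indexOf('>', refEnd);
        if (openClose == -1 || openClose > tagEnd) {
            return null;
        }
        return new String[] {
            text.substring(refStart, refEnd),
            text.substring(openClose + 1, tagEnd)
        };
    }

    public static void main(String[] args) {
        ConfigurableTagSketch g =
                new ConfigurableTagSketch("<g", "</g>", "n=\"");
        ConfigurableTagSketch image =
                new ConfigurableTagSketch("<image", "</image>", "name=\"");
        String[] a = g.firstTag("<g n=\"CEO\">l</g>");
        String[] b = image.firstTag("<image name=\"x\">left</image>");
        System.out.println(a[0] + "/" + a[1]); // prints "CEO/l"
        System.out.println(b[0] + "/" + b[1]); // prints "x/left"
    }
}
```

With the verbose form, the alignment arrives as "left" rather than "l", so whatever builds the <img> tag would need to accept that spelling too.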

Applying our Custom Tag

The content entry users have a form where they enter the content (I don't know what most people use; probably VB forms, Access forms or HTML forms like evolt.org's). Where I used this technique, the data store was a Domino database, so I used Domino forms (pretty rare, huh!). The same technique can easily be duplicated on other systems. Some typical content entered by the user would be:

X company reached IPO on Feb 14th , but the CEO was disbarred from attending the conference by the police as he had 6 arms
<g n="CEO">l</g>. But federal investigator Mr. Mulder came up with an alternate theory…. Blah… blah… Blah… blah… Blah… blah… Blah… blah… Blah… blah…which conclusively
<g n="Mulder">r</g> proved that the CEO was an alien.

The parsing program goes through this and transforms the <g> tags to meaningful HTML <img> tags with the correct image name, which is then rendered in the browser, so the converted output would be:

X company reached IPO on Feb 14th, but the CEO was disbarred from attending the conference by the police as he had 6 arms <img src="path/ceo.180x100.jpg" align="left" alt="CEOs at Lunch">. But federal investigator Mr. Mulder came up with an alternate theory…. Blah… blah… Blah… blah… Blah… blah… Blah… blah… Blah… blah… which conclusively <img src="path/mulder.jpg" align="right" alt="Agent Mulder"> proved that the CEO was an alien.
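The conversion above can be sketched end to end in a few lines. This is only an illustration under stated assumptions: the in-memory map stands in for the image store described next, the regular expression replaces the indexOf scanning used by the real parser, and GTagConversionSketch and convert are hypothetical names. The file names and alt texts are the ones shown in the converted output above.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration class; not the parser from the listings.
class GTagConversionSketch {
    private static final Pattern G_TAG =
            Pattern.compile("<g n=\"([^\"]*)\">([lr])</g>");

    // Stand-in for the image store: index name -> {file name, alt text}.
    private static final Map<String, String[]> IMAGES = new HashMap<>();
    static {
        IMAGES.put("CEO", new String[] { "ceo.180x100.jpg", "CEOs at Lunch" });
        IMAGES.put("Mulder", new String[] { "mulder.jpg", "Agent Mulder" });
    }

    static String convert(String content) {
        StringBuilder out = new StringBuilder();
        Matcher m = G_TAG.matcher(content);
        int copiedUpTo = 0;
        while (m.find()) {
            // Copy the plain text between the previous tag and this one.
            out.append(content, copiedUpTo, m.start());
            String[] image = IMAGES.get(m.group(1));
            if (image == null) {
                // Unknown index: leave the custom tag untouched.
                out.append(m.group());
            } else {
                String align = m.group(2).equals("l") ? "left" : "right";
                out.append("<img src=\"path/").append(image[0])
                   .append("\" align=\"").append(align)
                   .append("\" alt=\"").append(image[1]).append("\">");
            }
            copiedUpTo = m.end();
        }
        // Copy whatever follows the last tag.
        out.append(content, copiedUpTo, content.length());
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(convert("he had 6 arms <g n=\"CEO\">l</g>. But"));
    }
}
```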

The two main components that accomplish this are:

A Centralized Image Store

I have a database table and a form system (let's call it an image store) which is managed by the content entry users. They upload the image through a user-friendly form and enter the image attributes there, and both get stored in the backend database. The parser uses the n="index" attribute to locate the requested image in this table.

The image entry form

A typical procedure for entering the image would be like this:

  1. Add the image…

    The image can be put in a database or in a file system, though in the case of a file system, we would also store the path information. (In my case, it was in a database for the sake of ease of management and the database had to be replicated as well - 2 birds with 1 stone). Typically the image has an unfriendly name, like img001_180x100.jpg. Users find these names very difficult to remember or to search for.
  2. Specify an index name for the image…

    This is a simple, human-readable name that the content entry users put in our custom tag to identify the image: <g n='indexname'>.
  3. Specify an "alt" text for the image…

    This eases things for the user as they do not have to manually enter "alt=" text every time they put an image tag in the content. The parser automatically picks it up. There is an additional advantage, since alt texts are stored centrally. In the case of images which are used in many places on a website, changing alt texts globally for an image would just imply a change in one place.
  4. Enter a short description about the image…

    I found this, along with alt text, very useful, since it allows building of an intuitive image search on a website.

This is what my form would look like for entering images:

Picture of sample Form

The image store table

My table structure for the image manager would look like this:

Img_index | Image          | Image_AltText           | Image_desc
Fish      | catch8.jpg     | Fish catch at hemingway | Fishing for sailfish at hemingways in the bay of biscay
Ceo       | Ceo180x100.jpg | Mr. Ceo Laughing        | CEO of X corporation, Mr CEO

I used a global setting for attributes like "hspace" and "vspace" for images, even though these could have been incorporated into the form. I could also have calculated the height and width of the image when the user uploaded it, but let me leave that as another exercise.
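For readers not on Domino, the image store can be pictured as a small keyed collection. The sketch below is a minimal in-memory stand-in, assuming one record per index name with the four columns of the table above. It also shows the kind of search over alt text and description mentioned in step 4. ImageStoreSketch, ImageRecord, findByIndex and search are hypothetical names, and the description for the "Mulder" row is invented for the example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical in-memory stand-in for the image store table.
class ImageStoreSketch {
    // One row of the image store, mirroring the four table columns.
    static final class ImageRecord {
        final String index;       // Img_index
        final String fileName;    // Image
        final String altText;     // Image_AltText
        final String description; // Image_desc

        ImageRecord(String index, String fileName,
                    String altText, String description) {
            this.index = index;
            this.fileName = fileName;
            this.altText = altText;
            this.description = description;
        }
    }

    private final Map<String, ImageRecord> rows = new LinkedHashMap<>();

    void add(ImageRecord record) {
        rows.put(record.index, record);
    }

    // What the parser needs: look a row up by the n="index" value.
    ImageRecord findByIndex(String index) {
        return rows.get(index);
    }

    // Simple case-insensitive search over alt text and description.
    List<ImageRecord> search(String term) {
        String needle = term.toLowerCase(Locale.ROOT);
        List<ImageRecord> hits = new ArrayList<>();
        for (ImageRecord r : rows.values()) {
            String haystack =
                    (r.altText + " " + r.description).toLowerCase(Locale.ROOT);
            if (haystack.contains(needle)) {
                hits.add(r);
            }
        }
        return hits;
    }

    // Sample data: the "Ceo" row comes from the table above; the
    // "Mulder" description is made up for illustration.
    static ImageStoreSketch sample() {
        ImageStoreSketch store = new ImageStoreSketch();
        store.add(new ImageRecord("Ceo", "Ceo180x100.jpg",
                "Mr. Ceo Laughing", "CEO of X corporation, Mr CEO"));
        store.add(new ImageRecord("Mulder", "mulder.jpg",
                "Agent Mulder", "Federal investigator (hypothetical text)"));
        return store;
    }

    public static void main(String[] args) {
        ImageStoreSketch store = sample();
        System.out.println(store.findByIndex("Ceo").fileName); // prints "Ceo180x100.jpg"
        System.out.println(store.search("laughing").size());   // prints "1"
    }
}
```

Because the alt text lives in one record, changing it there changes it for every page that references the image by index.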

The Image Parser

I tried to make the parser flexible for my needs. In reality, I use it not only to handle <g> tags, but also a variety of other custom tags: I have one for handling links and one for embedded tables. So it is a generic parser in Java, with generic routines from which the other specialised parsers are extended.

How the parser works together with the Image store

Parser structure

Image, below: structure of parsers

Picture of Parser Structure

Source Code Listings

Source Code for baseParser

This is a stripped-down version of the baseParser. I removed a lot of application-specific code to reduce the size of the parser as much as possible. There are descriptive comments along with the source code.

public class baseParser
{
    private int nPreParseLength = 0;
    private String strBeginp;
    private String strEndp;
    private String strRefp;
    private String str;
    private int lnkBeginLength;
    private int lnkRefLength;
    private int lnkEndLength;

    //these 2 variables are used to keep track of tag position in the content string
    //during an iterative scan through the content string
    private int m_nLastIndex = 0;
    private int m_nPrevIndex = 0;

    public baseParser()
    {
        strBeginp = "";
        strEndp = "";
        strRefp = "";
        str = "";
        lnkBeginLength = 0;
        lnkRefLength = 0;
        lnkEndLength = 0;
    }

    //String toBeParsed - string with content+custom tags which requires parsing
    //beginP - beginning tag e.g. <g
    //endP - ending tag e.g. </g>
    //refP - ref.attribute tag n="
    public baseParser(String toBeParsed, String beginP, String endP, String refP)
    {
        str = toBeParsed;
        strBeginp = beginP;
        strEndp = endP;
        strRefp = refP;
        lnkBeginLength = strBeginp.length();
        lnkRefLength = strRefp.length();
        lnkEndLength = strEndp.length();
    }

    //iterator function which scans for tags sequentially
    public String parseUnit(int nBeginIndex)
    {
        //look for beginning
        int nPrevIndex = str.indexOf(strBeginp, nBeginIndex);
        //beginning tag not found..so end it
        if (nPrevIndex == -1)
            return "";
        //look for ending
        int nLastIndex = str.indexOf(strEndp, nPrevIndex);
        //ending tag not found..treat the unclosed tag as plain text and end it
        if (nLastIndex == -1)
            return "";
        //remember where this tag starts and where the next scan should resume
        m_nPrevIndex = nPrevIndex;
        m_nLastIndex = nLastIndex + lnkEndLength;
        return str.substring(nPrevIndex, m_nLastIndex);
    }

    //helper function for iterator
    public String parseUnit()
    {
        return parseUnit(lastUnit());
    }

    //parses out the reference attribute of the tag
    // in <tag ref="refattrib">prop</tag>
    //this parses out refattrib...
    public String parseRef(String strCur)
    {
        int nRefBegin = strCur.indexOf(strRefp);
        //reference attribute not found
        if (nRefBegin == -1)
            return "";
        int nRefEnd = strCur.indexOf('"', nRefBegin + lnkRefLength);
        //closing quote of the attribute not found
        if (nRefEnd == -1)
            return "";
        return strCur.substring(nRefBegin + lnkRefLength, nRefEnd);
    }

    //parses out the property of the tag
    // in <tag ref="refattrib">prop</tag>
    //this parses out prop...
    public String parseProp(String strCur)
    {
        int nLnkEnd = strCur.lastIndexOf(strEndp);
        //ending tag not found
        if (nLnkEnd == -1)
            return "";
        //scan backwards from the ending tag for the '>' that closes the opening tag
        int n = strCur.lastIndexOf('>', nLnkEnd);
        if (n == -1)
            return "";
        //the property is everything between that '>' and the ending tag
        return strCur.substring(n + 1, nLnkEnd);
    }

    //helper function for iterator
    public int prevUnit()
    {
        return m_nPrevIndex;
    }

    //helper function for iterator
    public int lastUnit()
    {
        if (str.indexOf(strBeginp, m_nLastIndex) == -1)
            return -1;
        else
            return m_nLastIndex;
    }
}

The image parser

The base parser gets used by the imgParser, as shown below. Again, I removed a lot of application-specific error checking and caching code to reduce the size.

import java.util.Vector;

public class imgParser
{
    private baseParser m_spa;
    private String m_strMain = "";
    private String m_strKey = "";

    public imgParser()
    {
        m_strMain = "";
        m_strKey = "";
        //seed the base parser with empty content so ParsedString() never hits a null parser
        m_spa = new baseParser("", "<g", "</g>", "n=\"");
    }

    //in reality I was also passing the db connection from the page
    //script to the parser...
    //this allowed me to reuse the connection I made to the db for
    //displaying the page
    //instead of creating new connections for each instance of the parser
    public imgParser(String toBeParsed)
    {
        m_strMain = toBeParsed;
        //seed the base parser with our required tags
        m_spa = new baseParser(toBeParsed, "<g", "</g>", "n=\"");
    }

    private Vector lookupImageDb(String sImageNo)
    {
        // this was a lotus domino routine which looked up the
        // index name in the database and returned a row of
        // information as a java Vector
        // for illustrative purposes...I am just returning a dummy vector
        // but your routine to query the db would come over here...
        // this returned a vector with 2 columns
        // 1st col - image filename
        // 2nd col - image alt text
        Vector v = new Vector(2);
        v.addElement("image.jpg");
        v.addElement("just a dummy image");
        return v;
    }

    private String getPath()
    {
        //I didn't hard code any paths...
        //this function simply calculated the path to the image inside the
        //database...since I was storing the image in the database
        //you could use it to calculate paths...
        //for illustrative purposes, I just return a dummy path...
        return "/images";
    }

    private static final int IMG_FILE_NAME_COL = 0;
    private static final int IMG_ALT_NAME_COL = 1;

    private String ParseImageTag(String sImageNo, String sAlign)
    {
        String sTemp = "";
        try
        {
            //lookup image info from image store
            Vector v = lookupImageDb(sImageNo);
            //get the alt text and the image file name (or handle in database)
            String strAltTag = (String)v.elementAt(IMG_ALT_NAME_COL);
            String strFileName = (String)v.elementAt(IMG_FILE_NAME_COL);
            //build the image tag
            sTemp += "<img src=\"" + getPath() + "/" + strFileName + "\"";
            //set alignment depending on l or r properties
            if (sAlign.equals("l"))
                sTemp += " align=\"left\"";
            else
                sTemp += " align=\"right\"";
            //other attributes
            //apply alt text only if it exists
            if (strAltTag.length() != 0)
                sTemp += " alt=\"" + strAltTag + "\"";
            sTemp += " hspace=\"8\" vspace=\"8\" ";
            sTemp += "border=\"0\" />";
        }
        catch (Exception e)
        {
            e.printStackTrace();
        }
        //return after the try/catch; returning from a finally block would also hide serious errors
        return sTemp;
    }

    //the only function called externally for this parser
    public String ParsedString()
    {
        StringBuffer sbOut = new StringBuffer();
        //position in m_strMain up to which text has already been copied
        int nCopied = 0;

        //call the base parser routine to iteratively scan for <g> tags
        String sRet = m_spa.parseUnit(0);
        //an empty result means no (further) complete <g> tag was found
        while (!sRet.equals(""))
        {
            int nPrev = m_spa.prevUnit();
            //copy the plain text that precedes the current <g></g>
            sbOut.append(m_strMain.substring(nCopied, nPrev));
            //parse out attribute and property, then build the <img> tag from the <g> tag
            String sRef = m_spa.parseRef(sRet);
            String sTxt = m_spa.parseProp(sRet);
            sbOut.append(ParseImageTag(sRef, sTxt));
            //skip to the position at the end of the current <g></g>
            nCopied = nPrev + sRet.length();
            //check if end
            if (m_spa.lastUnit() == -1)
                break;
            //parse the next <g> tag set
            sRet = m_spa.parseUnit();
        }
        //copy whatever text follows the last tag (or the whole string if there were no tags)
        sbOut.append(m_strMain.substring(nCopied));
        return sbOut.toString();
    }
}

Using the parser in script

I use the imgParser in my code in this manner:

//..other code on page
String strContent;
String strParsedContent;

strContent = get_Content_As_String_From_Content_Table_Field();

//in reality I pass my database connection handle of the page script
//to the parser, so I can reuse it
//create our image parser object and pass the content string to it
imgParser ipObj = new imgParser(strContent);

//strParsedContent contains the transformed string now...
strParsedContent = ipObj.ParsedString();

//...output parsed content to page
printwriterObj.print(strParsedContent);

Finally…

Just a few thoughts before I finish this…

I use quite a few parsers built around the base parser. I use a link parser, which basically uses a custom tag for links, that appends a particular class name and a target=_blank for links external to the site. I use another tag to embed a small table dynamically within the text with a specific alignment.
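To give a feel for the link parser, here is a rough sketch of the idea rather than my actual code. Everything specific in it is an assumption made for illustration: the tag name <k>, the class name "external", the site base URL passed to the constructor, and the rule that a link counts as external when it is absolute and does not start with that base URL. It uses a regular expression for brevity instead of the base parser.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration class; tag name and CSS class are assumptions.
class LinkTagSketch {
    // Hypothetical custom link tag: <k n="url">link text</k>
    private static final Pattern K_TAG =
            Pattern.compile("<k n=\"([^\"]*)\">(.*?)</k>");

    private final String siteBase; // e.g. "http://www.example.com/"

    LinkTagSketch(String siteBase) {
        this.siteBase = siteBase;
    }

    // Assumption: external means absolute and outside the site's base URL.
    boolean isExternal(String url) {
        boolean absolute =
                url.startsWith("http://") || url.startsWith("https://");
        return absolute && !url.startsWith(siteBase);
    }

    String convert(String content) {
        StringBuilder out = new StringBuilder();
        Matcher m = K_TAG.matcher(content);
        int copiedUpTo = 0;
        while (m.find()) {
            // Copy the plain text before this tag.
            out.append(content, copiedUpTo, m.start());
            String url = m.group(1);
            out.append("<a href=\"").append(url).append("\"");
            if (isExternal(url)) {
                out.append(" class=\"external\" target=\"_blank\"");
            }
            out.append(">").append(m.group(2)).append("</a>");
            copiedUpTo = m.end();
        }
        // Copy whatever follows the last tag.
        out.append(content, copiedUpTo, content.length());
        return out.toString();
    }

    public static void main(String[] args) {
        LinkTagSketch links = new LinkTagSketch("http://www.example.com/");
        System.out.println(links.convert(
                "Read <k n=\"http://evolt.org/\">evolt</k> today."));
    }
}
```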

There might be performance implications in heavily trafficked sites, since the parser runs every time someone hits the page. What I do for such cases is pre-parse the content periodically on the server and cache it in the database, so when the user accesses the page, instead of parsing every time, they see the cached version.
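The caching idea can be sketched as follows. This is a simplified illustration under stated assumptions: an in-memory map stands in for the database cache, the parser is passed in as a function (in practice it could wrap a call such as new imgParser(raw).ParsedString()), and refresh is assumed to be called by some periodic server job. ParsedContentCacheSketch, refresh, get and parseCount are hypothetical names.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.UnaryOperator;

// Hypothetical illustration class; an in-memory map replaces the database.
class ParsedContentCacheSketch {
    private final Map<String, String> cache = new ConcurrentHashMap<>();
    private final UnaryOperator<String> parser;
    private final AtomicInteger parseCount = new AtomicInteger();

    ParsedContentCacheSketch(UnaryOperator<String> parser) {
        this.parser = parser;
    }

    // Pre-parse (or re-parse) one page; meant for a periodic job,
    // not for the request path.
    void refresh(String pageId, String rawContent) {
        parseCount.incrementAndGet();
        cache.put(pageId, parser.apply(rawContent));
    }

    // Serve the cached HTML; null means the page was never pre-parsed.
    String get(String pageId) {
        return cache.get(pageId);
    }

    // Number of times the parser actually ran.
    int parseCount() {
        return parseCount.get();
    }

    public static void main(String[] args) {
        // Trivial stand-in parser used only for this demonstration.
        ParsedContentCacheSketch pages = new ParsedContentCacheSketch(
                raw -> raw.replace("<g n=\"CEO\">l</g>", "<img src=\"path/ceo.180x100.jpg\">"));
        pages.refresh("ipo-story", "6 arms <g n=\"CEO\">l</g>.");
        System.out.println(pages.get("ipo-story"));
        System.out.println(pages.get("ipo-story"));
        System.out.println("parsed " + pages.parseCount() + " time(s)"); // prints "parsed 1 time(s)"
    }
}
```

The trade-off is staleness: a page shows the old images or alt texts until the next refresh runs.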

The parser handles only a single attribute (the "n=" in <g n="">) by design. I wanted to make the tags as simple as possible for the user entering content. In almost all the cases where I used the parser, I never had to use more than a single attribute. Also, I never check for incorrect tags in the parser, because I was checking for them on the client side at the point of data entry.
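The kind of check done at data entry can be sketched like this. It is written in Java here only to stay in the same language as the listings, and it is not my actual client-side check. It assumes the strict form <g n="name">l</g> or <g n="name">r</g> and accepts content only when every opening and closing <g> tag belongs to a well-formed tag. GTagValidatorSketch and isValid are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration class; not the author's client-side check.
class GTagValidatorSketch {
    // A complete tag: non-empty index, alignment of l or r.
    private static final Pattern WELL_FORMED =
            Pattern.compile("<g n=\"[^\"<>]+\">[lr]</g>");
    // Any opening <g ...> or <g>, and any closing </g>.
    private static final Pattern OPENER = Pattern.compile("<g[ >]");
    private static final Pattern CLOSER = Pattern.compile("</g>");

    private static int count(Pattern pattern, String text) {
        Matcher m = pattern.matcher(text);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    // Valid when every opener and every closer is part of a
    // well-formed tag, i.e. all three counts agree.
    static boolean isValid(String content) {
        int wellFormed = count(WELL_FORMED, content);
        return count(OPENER, content) == wellFormed
                && count(CLOSER, content) == wellFormed;
    }

    public static void main(String[] args) {
        System.out.println(isValid("6 arms <g n=\"CEO\">l</g>. But")); // prints "true"
        System.out.println(isValid("6 arms <g n=\"CEO\">l. But"));     // prints "false"
    }
}
```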

Using this technique certainly saved time and heartburn for me. Other than dealing with clients, I also had to deal with a copy writer who liked getting acquainted with Gilbey's dry gin more than with "complicated" HTML tags. But, he actually enjoyed entering these custom tags along with his copy, since for the first time he had complete control over where to position images in the copy.