Using Custom Tags To Ease Content Management

Posted on 02 Mar 2002

in Site Development

by Ashok Hariharan (Junglee)


Some typical questions from customers who want a website developed are...

I want to manage the content myself. Will you have a system for it?

I don't know HTML or programming. Can I still manage my content?

I can't understand these HTML tags. Is there an easier way?

To address these questions, I came up with a tagging scheme simpler than HTML: one that even a layman could understand, and that could be easily parsed into an HTML tag the browser understands. I will illustrate this with a simple example.

The main advantages to this method would be:

A lay user can manage content and layout features very easily with very little knowledge of HTML.

Managing the content becomes much easier.

The same concept can be reused and replicated in other website projects.

Recreating the <img> Tag

Let's take up the simple example of the image tag:

<img src="xyz001.jpg" alt="xyz image" align="right">

This is how an image tag typically appears. In a page-length article on a website, images usually sit within the text, so it makes sense to let users place an image wherever they want and to use as many images as they want.

So what I did was reduce the image tag to:

<g n="index">l</g>

The tag, <g>, is not an HTML tag, but is our very own tag that our server-side script recognises.

Let's examine the various parts of our tag for this example:

<g> </g>: This just indicates to our parsing program the beginning and ending of our tag.

n="index": This attribute indicates to the parser which image the user has requested, but the index need not be the image name. We will come to that later.

<g n="index">l</g>: The 'l' between the <g></g> is an alignment indicator which asks the image to align left ('l') or right ('r').

You may notice that the "alt" attribute is not part of our tag; we shall come to that later as well. Otherwise, this is definitely easier than entering a complete <img> tag.

A small clarification, which I decided to add after reading some of the comments: I have used a tag called <g> in this example, which some people might consider too cryptic. I could just as well have used something like <image name="x">left</image>. The parser described below will handle a tag like that by changing its input parameters (the begin tag, end tag and attribute prefix passed to its constructor), although the alignment check in the image parser would then also need to accept "left" and "right" instead of 'l' and 'r'.
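To make the point about input parameters concrete, here is a minimal sketch. It is not the baseParser listed later in this article: the class name TagPattern and the use of java.util.regex are assumptions made purely for illustration. It shows the tag name and attribute name being passed in, so the same code recognises either <g n="x">l</g> or <image name="x">left</image>.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of the article's parser.
final class TagPattern {
    private final Pattern pattern;

    // tagName is e.g. "g" or "image"; attrName is e.g. "n" or "name"
    TagPattern(String tagName, String attrName) {
        String tag = Pattern.quote(tagName);
        String attr = Pattern.quote(attrName);
        // group 1 = attribute value, group 2 = text between the tags
        this.pattern = Pattern.compile(
            "<" + tag + "\\s+" + attr + "=\"([^\"]*)\"\\s*>(.*?)</" + tag + ">",
            Pattern.DOTALL);
    }

    // Returns {attributeValue, innerText} for the first tag found, or null.
    String[] firstMatch(String content) {
        Matcher m = pattern.matcher(content);
        if (!m.find()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2) };
    }

    public static void main(String[] args) {
        String[] a = new TagPattern("g", "n")
            .firstMatch("text <g n=\"CEO\">l</g> more");
        String[] b = new TagPattern("image", "name")
            .firstMatch("<image name=\"x\">left</image>");
        System.out.println(a[0] + " " + a[1]); // prints "CEO l"
        System.out.println(b[0] + " " + b[1]); // prints "x left"
    }
}
```

Only the two constructor arguments change between the cryptic and the verbose tag; the matching logic stays the same.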

Applying our Custom Tag

So the content entry users have a form (I don't know what most people use… probably VB forms, Access forms or HTML forms like evolt.org) where they enter the content. In the case where I used this technique, the data store was a Domino database, so I used Domino forms (pretty rare…huh…!). The same technique can be easily duplicated on other systems. Some typical content entered by the user would be:

X company reached IPO on Feb 14th , but the CEO was disbarred from attending the conference by the police as he had 6 arms
<g n="CEO">l</g>. But federal investigator Mr. Mulder came up with an alternate theory…. Blah… blah… Blah… blah… Blah… blah… Blah… blah… Blah… blah…which conclusively
<g n="Mulder">r</g> proved that the CEO was an alien.

The parsing program goes through this and transforms the <g> tags to meaningful HTML <img> tags with the correct image name, which is then rendered in the browser, so the converted output would be:

X company reached IPO on Feb 14th, but the CEO was disbarred from attending the conference by the police as he had 6 arms <img src="path/ceo.180x100.jpg" align="left" alt="CEOs at Lunch">. But federal investigator Mr. Mulder came up with an alternate theory…. Blah… blah… Blah… blah… Blah… blah… Blah… blah… Blah… blah… which conclusively <img src="path/mulder.jpg" align="right" alt="Agent Mulder"> proved that the CEO was an alien.

The two main components that accomplish this are:

A centralized image storage system.

An image parser.

A Centralized Image Store

I have a database table and form system (let's call it an image store…) which is managed by the content entry users where they upload the image in a user-friendly form and enter the image attributes there, which gets stored in the backend database. The parser uses the n="index" attribute to locate the requested image from this table.

The image entry form

A typical procedure for entering the image would be like this:

Add the image…
The image can be put in a database or in a file system, though in the case of a file system, we would also store the path information. (In my case, it was in a database for the sake of ease of management and the database had to be replicated as well - 2 birds with 1 stone). Typically the image has an unfriendly name, like img001_180x100.jpg. Users find these names very difficult to remember or to search for.

Specify an index name for the image…
This is a simple, human-readable name that content entry users will put in our custom tag to identify the image: <g n="indexname">.

Specify an "alt" text for the image…
This eases things for the user as they do not have to manually enter "alt=" text every time they put an image tag in the content. The parser automatically picks it up. There is an additional advantage, since alt texts are stored centrally. In the case of images which are used in many places on a website, changing alt texts globally for an image would just imply a change in one place.

Enter a short description about the image…
I found this, along with alt text, very useful, since it allows building of an intuitive image search on a website.

[Figure: the form used for entering images]

The image store table

My table structure for the image manager would look like this:

Img_index	Image	Image_AltText	Image_desc
Fishcatch	8.jpg	Fish catch at hemingway	Fishing for sailfish at hemingways in the bay of biscay
Ceo	Ceo180x100.jpg	Mr. Ceo Laughing	CEO of X corporation, Mr CEO
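As a rough illustration of this table, the sketch below models one row as a small Java class and keeps the rows in an in-memory map keyed by the index name. The names ImageStore and ImageRecord are hypothetical, the map merely stands in for the real database, and matching index names without regard to case is an assumption of this sketch rather than something the original system is known to do.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical in-memory stand-in for the image store table.
final class ImageStore {

    // One row: index name, image file name, alt text, description.
    static final class ImageRecord {
        final String index;
        final String fileName;
        final String altText;
        final String description;

        ImageRecord(String index, String fileName, String altText, String description) {
            this.index = index;
            this.fileName = fileName;
            this.altText = altText;
            this.description = description;
        }
    }

    private final Map<String, ImageRecord> rows = new HashMap<>();

    void add(ImageRecord record) {
        rows.put(key(record.index), record);
    }

    // Returns the row for an index name, or null when there is none.
    ImageRecord lookup(String index) {
        return rows.get(key(index));
    }

    // Assumption of this sketch: index names are compared case-insensitively.
    private static String key(String index) {
        return index.toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        ImageStore store = new ImageStore();
        store.add(new ImageRecord("Ceo", "Ceo180x100.jpg",
            "Mr. Ceo Laughing", "CEO of X corporation, Mr CEO"));
        System.out.println(store.lookup("CEO").fileName); // prints "Ceo180x100.jpg"
    }
}
```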

I used a global setting for things like "hspacing" and "vspacing" for images, even though this could have been incorporated into the form. I could have calculated things like height and width of the image when the user uploaded it… but let me leave it as another exercise.
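For anyone attempting that exercise, one possible approach is sketched here. It assumes the upload is available as an InputStream and uses the standard javax.imageio.ImageIO class to decode it; the class name ImageSize is hypothetical and this is not code from the original system.

```java
import java.awt.image.BufferedImage;
import java.io.IOException;
import java.io.InputStream;
import javax.imageio.ImageIO;

// Hypothetical helper: reads an uploaded image and reports its dimensions.
final class ImageSize {

    // Returns {width, height} in pixels, or null if the data
    // is not in an image format that ImageIO can decode.
    static int[] of(InputStream upload) throws IOException {
        BufferedImage image = ImageIO.read(upload);
        if (image == null) {
            return null;
        }
        return new int[] { image.getWidth(), image.getHeight() };
    }
}
```

The two numbers could then be stored alongside the other columns and emitted as width and height attributes when the <img> tag is built.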

The Image Parser

I tried to make the parser flexible enough for my needs. In practice, I use it not only to handle <g> tags but also a variety of other custom tags: I have one for handling links and one for embedded tables. So it is a generic parser in Java, with generic routines on which the other specialised parsers are built.

How the parser works together with the Image store

Get the whole content to be displayed in a string.

The string has all the text content with the embedded custom tags.

The parser then scans through the string, and when it locates a <g> tag, it extracts the tag attributes.

The parser then uses the index specified in n="" to locate the image name in the image store.

From the image store, it extracts all the associated info for the image, such as the "alt" attribute text.

The information between the <g> and </g> (the "l" or "r") is used to set the align="right" or align="left" attributes for the <img> tag.

With all this information, the <img> tag is finally built.

Lastly, the parser replaces the <g></g> tags in the content string with the <img> tag.

The content string is then further scanned for <g></g> tags and the same process happens again.

Finally, the transformed content string is output to the browser.
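The steps above can be sketched compactly as follows. This is not the baseParser/imgParser code from the source listings in this article: it swaps in java.util.regex for the hand-written scanning and a plain Map for the image store. The class name GTagSketch, the "/images" path and the choice to leave unknown tags untouched are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical condensed version of the scan / look up / replace loop.
final class GTagSketch {
    // group 1 = index name from n="...", group 2 = alignment letter
    private static final Pattern G_TAG =
        Pattern.compile("<g\\s+n=\"([^\"]*)\"\\s*>(.*?)</g>", Pattern.DOTALL);

    // store maps index name -> {file name, alt text}; a stand-in for the image store
    static String render(String content, Map<String, String[]> store) {
        Matcher m = G_TAG.matcher(content);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String[] row = store.get(m.group(1));
            String replacement;
            if (row == null) {
                // unknown index: leave the tag as it was (a choice of this sketch)
                replacement = m.group();
            } else {
                // 'l' means left; anything else is treated as right
                String align = "l".equals(m.group(2).trim()) ? "left" : "right";
                replacement = "<img src=\"/images/" + row[0] + "\" align=\""
                    + align + "\" alt=\"" + row[1] + "\">";
            }
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out); // copy the text after the last tag
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String[]> store = new HashMap<>();
        store.put("CEO", new String[] { "ceo.jpg", "Mr. Ceo" });
        System.out.println(render("6 arms <g n=\"CEO\">l</g>. The end", store));
    }
}
```

Note that the sketch does not HTML-escape the alt text; a real implementation would want to do so before writing it into an attribute.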

Parser structure

There is a generic parser called the baseParser which has generic routines for scanning for a specified tag in a content string.

There is a task specific parser, in this case an image parser called imgParser, which uses the iterative routines of the baseParser to extract its tags from a string. The task specific parser also has a function to output a custom tag in a particular way (in this case it's as an <img> tag).

[Figure: structure of the parsers]

Source Code Listings

Source Code for baseParser

This is a stripped-down version of the baseParser. I removed a lot of application-specific code to keep the parser as small as possible. There are descriptive comments along with the source code.


public class baseParser
{
private int nPreParseLength =0 ;
private String strBeginp;
private String strEndp;
private String strRefp;
private String str;
private int lnkBeginLength;
private int lnkRefLength;
private int lnkEndLength;
//these 2 variables are used to keep track of tag position in the content string
//during an iterative scan through the content string
private int m_nLastIndex=0;
private int m_nPrevIndex=0;
public baseParser()
{
strBeginp="";
strEndp="";
strRefp="";
str="";
lnkBeginLength=0;
lnkRefLength=0;
lnkEndLength=0;
}
//String toBeParsed - string with content+custom tags which requires parsing
//beginP - beginning tag e.g. <g
//endP - ending tag e.g. </g>
//refP - ref.attribute tag n="
public baseParser(String toBeParsed, String beginP, String endP, String refP)
{
str = toBeParsed;
strBeginp = beginP;
strEndp = endP;
strRefp = refP;
lnkBeginLength = strBeginp.length();
lnkRefLength = strRefp.length();
lnkEndLength = strEndp.length();
}
 
//iterator function which scans for tags sequentially
public String parseUnit(int nBeginIndex)
{
//look for beginning
int nPrevIndex = str.indexOf(strBeginp,nBeginIndex);
//beginning tag not found..so end it
if (nPrevIndex == -1)
return "";
//look for ending
m_nPrevIndex = nPrevIndex;
int nLastIndex = str.indexOf(strEndp,nPrevIndex);
m_nLastIndex = nLastIndex+lnkEndLength;
return str.substring(nPrevIndex, nLastIndex+lnkEndLength);
}
  
//helper function for iterator 
public String parseUnit()
{
return parseUnit(lastUnit());
}

//parses out the reference attribute of the tag
// in <tag ref="refattrib">prop</tag>
//this parses out refattrib...
public String parseRef(String strCur)
{
int nRefBegin=0;
int nRefEnd = 0;
nRefBegin=strCur.indexOf(strRefp);
nRefEnd = strCur.indexOf('"', nRefBegin+lnkRefLength);
return strCur.substring(nRefBegin+lnkRefLength,nRefEnd);
} 
  
//parses out the property of the tag 
// in <tag ref="refattrib">prop</tag>
//this parses out prop...
public String parseProp(String strCur)
{
int nLnkEnd = strCur.lastIndexOf(strEndp);
if (nLnkEnd == -1)
return ""; //no closing tag in this unit
//walk back from the closing tag to the '>' that ends the opening tag;
//the bounds check comes first so charAt is never called with n == -1
int n = nLnkEnd;
while (n >= 0 && strCur.charAt(n) != '>')
n--;
if (n < 0)
return "";
//everything between that '>' and the closing tag is the property
return strCur.substring(n+1, nLnkEnd);
}
  
//helper function for iterator 
public int prevUnit()
{
return m_nPrevIndex; 
}
//helper function for iterator
public int lastUnit()
{
if (str.indexOf(strBeginp, m_nLastIndex) == -1)
return -1;
else
return m_nLastIndex ;
}
};

The image parser

The base parser gets used by the imgParser, as shown below. Again, I removed a lot of application-specific error checking and caching code to reduce its size.


import java.util.Vector;
public class imgParser
{
private baseParser m_spa; 
private String m_strMain="";
private String m_strKey="";
public imgParser()
{
m_strMain = "";
m_strKey = "";
}
//in reality i was also passing the db connection from the page 
//script to the parser...
//this allowed me to reuse the connection i made to the db for 
//displaying the page
//instead of creating new connections for each instance of the parser
public imgParser(String toBeParsed)
{
m_strMain = toBeParsed;
//seed the base parser with our required tags
m_spa = new baseParser(toBeParsed,"<g","</g>","n=\"");
}
  private Vector lookupImageDb(String sImageNo )
  {
  
  // this was a lotus domino routine which looked up the
  // index name in the database and returned a row of 
  //information as a java Vector
  //for illustrative purposes...i am just returning a dummy vector
  //but your routine to query the db would come over here...
  //this returned a vector with 2 columns
  // 1st col - image filename
  // 2nd col - image alt text
  Vector v = new Vector(2);
  v.addElement(new String("image.jpg"));
  v.addElement(new String("just a dummy image"));
  return v; 
  }

  private String getPath()
  {
  //i didnt hard code any paths...
  //this function simply calculated the path to the image inside the
  //database...since i was storing the image in the database
  //you could use it to calculate paths...
  //for illustrative purposes, i just return a dummy path...
  return new String("/images") ;
  }
  
  private int IMG_FILE_NAME_COL=0;
  private int IMG_ALT_NAME_COL=1;
  
  private String ParseImageTag(String sImageNo, String sAlign)
  {
  String sTemp = "";
  try{
    //lookup image info from image store	
  Vector v = lookupImageDb(sImageNo);
  //string used to store alt tag
  String strAltTag=new String("");
 	//string use to store image file name or handle in database
  String strFileName = new String("");
  //get the alt tage and file name
  strAltTag = (String)v.elementAt(IMG_ALT_NAME_COL);
  strFileName = (String)v.elementAt(IMG_FILE_NAME_COL);
  //build the image tag
  sTemp+="<img src=\""+getPath()+"/"+strFileName +"\"";
     //set alignment depending on l or r properties
  if (sAlign.equals("l"))
  sTemp+=" align=\"left\"";
  else
  sTemp+=" align=\"right\"";
  //other tags
  //apply alt tag only if it exists
  if (strAltTag.length() != 0)
  sTemp += " alt=\""+strAltTag+"\"";
  sTemp+=" hspace=\"8\" vspace=\"8\" ";
  sTemp+="border=0 />";
  }
  catch(Exception e)
  {
  e.printStackTrace();
  }
  //return whatever was built; a return inside finally would mask errors
  return sTemp;
  }
  
  //the only function called externally for this parser
  public String ParsedString() 
  {
  String strTemp="";
  
  int nBegin=0;
  int nPrev = 0;
  int nLast = 0;
  //call the base parser routine to iteratively scan for <g> tags
  String sRet = m_spa.parseUnit(nBegin);
  if (sRet.equals("")) //no image tags....so return string as is
  return m_strMain;
  //<g> tag found parse out attributes and properties 
  String sRef = m_spa.parseRef(sRet);
  String sTxt = m_spa.parseProp(sRet);
  nPrev = m_spa.prevUnit();
  if (nPrev != -1)
  {
  //skip string to position at the end of <g></g>
  strTemp = m_strMain.substring(0,nPrev);
  //now build the <img> tag from the <g> tag 
  strTemp+=ParseImageTag(sRef, sTxt);
  }
  nLast = m_spa.lastUnit(); //check if end
  while (nLast!=-1)
  {
  //parse the next <g> tag set
  sRet = m_spa.parseUnit();
  //<g> tag found parse out attributes and properties 
  sRef = m_spa.parseRef(sRet);
  sTxt = m_spa.parseProp(sRet);
  nPrev = m_spa.prevUnit();
  if (nPrev != -1)
  {
  //skip string to position at the end of current <g></g> 
  strTemp+= m_strMain.substring(nLast,nPrev);
  //parse out next image tag...
  strTemp+=ParseImageTag(sRef, sTxt);
  }
  nLast = m_spa.lastUnit();
  } 
  strTemp+=m_strMain.substring(nPrev+sRet.length());
  return strTemp; 
  }
  };

Using the parser in script

I use the imgParser in my code in this manner:


//..other code on page
String strContent;
String strParsedContent;
strContent = get_Content_As_String_From_Content_Table_Field();
//in reality i pass my database connection handle of the page script 
//to the parser, so i can reuse it
//create our image parser object and pass the content string to it
imgParser ipObj = new imgParser(strContent);
//strParsedContent contains the transformed string now...
strParsedContent = ipObj.ParsedString();
//...output parsed content to page
printwriterObj.print(strParsedContent);

Finally…

Just a few thoughts before I finish this…

I use quite a few parsers built around the base parser. I use a link parser, which basically uses a custom tag for links, that appends a particular class name and a target=_blank for links external to the site. I use another tag to embed a small table dynamically within the text with a specific alignment.
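To give an idea of what such a link parser might do, here is a small sketch. The tag syntax <l n="url">text</l>, the class name "external", the LinkTagSketch class and the exact-host comparison are all invented for this illustration; the syntax of my actual link tag is not shown in this article.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical link-tag converter; tag syntax and class name are assumptions.
final class LinkTagSketch {
    // group 1 = link target from n="...", group 2 = link text
    private static final Pattern L_TAG =
        Pattern.compile("<l\\s+n=\"([^\"]*)\"\\s*>(.*?)</l>", Pattern.DOTALL);

    static String render(String content, String siteHost) {
        Matcher m = L_TAG.matcher(content);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String href = m.group(1);
            // external links get a class name and open in a new window
            String extra = isExternal(href, siteHost)
                ? " class=\"external\" target=\"_blank\"" : "";
            String anchor = "<a href=\"" + href + "\"" + extra + ">" + m.group(2) + "</a>";
            m.appendReplacement(out, Matcher.quoteReplacement(anchor));
        }
        m.appendTail(out);
        return out.toString();
    }

    // A link is external when it names a host other than our own.
    // Relative links have no host and therefore count as internal.
    static boolean isExternal(String href, String siteHost) {
        try {
            String host = new URI(href).getHost();
            return host != null && !host.equalsIgnoreCase(siteHost);
        } catch (URISyntaxException e) {
            return false; // unparsable target: treat as internal in this sketch
        }
    }
}
```

The host comparison is exact, so a real version would have to decide how to treat subdomains such as a "www." prefix.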

There might be performance implications in heavily trafficked sites, since the parser runs every time someone hits the page. What I do for such cases is pre-parse the content periodically on the server and cache it in the database, so when the user accesses the page, instead of parsing every time, they see the cached version.
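The caching idea can be sketched as below. In my case the cache lived in the database; this illustration uses an in-memory map instead, and the names ParsedContentCache, refresh and get are hypothetical. The parser is passed in as a function so that the sketch stays independent of any particular parser class.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.UnaryOperator;

// Hypothetical cache of pre-parsed page content, keyed by a page id.
final class ParsedContentCache {
    private final Map<String, String> cache = new ConcurrentHashMap<>();
    private final UnaryOperator<String> parser;

    // parser turns raw content (with custom tags) into HTML
    ParsedContentCache(UnaryOperator<String> parser) {
        this.parser = parser;
    }

    // Called by a periodic job, or whenever content is saved:
    // parse once and remember the result.
    void refresh(String pageId, String rawContent) {
        cache.put(pageId, parser.apply(rawContent));
    }

    // Called on every page hit: no parsing happens here.
    // Returns null if the page has not been pre-parsed yet.
    String get(String pageId) {
        return cache.get(pageId);
    }
}
```

With this split, the cost of parsing is paid once per refresh rather than once per visitor.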

The way the parser handles only single attributes (the "n=" in <g n="">) was by design. I wanted to make the tags as simple as possible for the user entering content. In almost all the cases where I used the parser, I never had to use more than a single attribute. Also I never check for incorrect tags. This was because I was checking it at the client side at the point of data entry of the content.
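For completeness, this is roughly the kind of check that can be applied to content at the point of entry. My own check ran on the client side and is not shown in this article; the Java sketch below, with its hypothetical GTagValidator class, only illustrates the rule being enforced: every <g ...> must have exactly the form <g n="name">l</g> or <g n="name">r</g>.

```java
import java.util.regex.Pattern;

// Hypothetical validator for content containing <g> tags.
final class GTagValidator {
    // A well-formed tag: one n="..." attribute and a body of 'l' or 'r'.
    private static final Pattern WELL_FORMED =
        Pattern.compile("<g n=\"[^\"<>]+\">[lr]</g>");

    // True when every "<g" and "</g>" in the content belongs to a well-formed tag.
    static boolean isValid(String content) {
        // Replace good tags with a space so their neighbours cannot
        // join up into something that looks like a new tag.
        String rest = WELL_FORMED.matcher(content).replaceAll(" ");
        return !rest.contains("<g") && !rest.contains("</g>");
    }

    public static void main(String[] args) {
        System.out.println(isValid("arms <g n=\"CEO\">l</g>. The end")); // prints "true"
        System.out.println(isValid("arms <g n=\"CEO\">l. The end"));     // prints "false"
    }
}
```

Like the base parser, the check treats any "<g" as the start of a custom tag, so other markup beginning with those characters would be rejected too.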

Using this technique certainly saved time and heartburn for me. Other than dealing with clients, I also had to deal with a copy writer who liked getting acquainted with Gilbey's dry gin more than with "complicated" HTML tags. But, he actually enjoyed entering these custom tags along with his copy, since for the first time he had complete control over where to position images in the copy.

Ashok is based in Nairobi, Kenya. When not busy dodging vagrant matatus in Nairobi traffic, he keeps himself up to date by evolt-ing.
