Automatic TOC Generation
Many one-page HTML documents need a simple Table of Contents
(abbreviated to TOC for the rest of this article). Writing it by
hand makes it difficult to maintain over time. To have an
automatically generated TOC I see two solutions:
- writing the document using a WYSIWYG editor that knows how to generate a TOC. This has the unacceptable disadvantage that the generated code is usually atrocious and in some cases does not even validate.
- writing the document without a TOC and having the TOC automatically generated through the wonders of JavaScript and the DOM.
In this article we will discuss the second method. So the motto is: let the browser do it.
What we get
First, let's just assume that we already have the script and see how we are going to use it. Basically, we want to be able to write the BODY tag like this:
<body onload="generate_TOC('toc');">
And somewhere inside the BODY content, where we want the TOC to appear, we should have the following:
<div id="toc"></div>
This DIV will be known as the parent of the TOC.
With this setup, when a browser which has proper support for JavaScript and the DOM loads our page, it will automatically generate the TOC inside the parent.
Problem overview
The generate_TOC function should walk through the document
and remember all the headlines (<Hx> tags where x is 1,
2, …, 6). For each headline it should extract the text inside, create a
link which has the headline as its target and add the link to the parent DIV.
The script will not handle styling. Styling can, and should, be done using external CSS, relying on the fact that the parent DIV has the ID "toc" and on a few other small details that we will discuss later.
Tasks
Here is the complete definition of tasks involved in TOC generation:
1. Retrieving text. We will create a function that retrieves the text from an element. While this function should work with any element type, we will only use it for retrieving text from headlines.
2. Finding the headlines. We will create a list of objects that contain a reference to the headline element, the text inside it and an integer variable representing the TOC item level (for proper indentation).
3. Creating the TOC. For all items retrieved at step 2 we will:
   - Check if the headline element has an ID assigned. If not we will add an automatically generated ID. This will allow us to create a link that targets the headline.
   - Create a DIV element (D) that has the class "levelx", where x is the level of the TOC element (1, 2, ..., 6). This will help us style the TOC using CSS.
   - Create a link element that displays the text retrieved at step 2 for the current item and has the ID of the current headline as its HREF (remember, if the ID did not exist then we have generated it). Append this link to the DIV created above.
   - Finally, append the (D) DIV into the TOC's parent DIV.
The Code
Now, for the most interesting part of this article, the code.
The text retrieval function
For retrieving text from a simple heading like
"<H2>Overview</H2>" we could use simple
code like "text = element.firstChild.data;".
However, we want to be able to retrieve text even from complicated headings,
like "<H2><b>This</b> is a <em>complicated</em> <u>heading</u></H2>".
The following recursive function handles this case:
function H_getText(el) {
  var text = "";
  for (var i = el.firstChild; i != null; i = i.nextSibling) {
    if (i.nodeType == 3 /* Node.TEXT_NODE, IE doesn't speak constants */)
      text += i.data;
    else if (i.firstChild != null)
      text += H_getText(i);
  }
  return text;
}
The only parameter to this function, el, is a reference to the
HTML element from which we need to extract the text. We can get one, for
instance, using document.getElementById().
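To see the recursion at work without a browser, the function can be run against plain objects that imitate the few DOM properties it reads (nodeType, data, firstChild, nextSibling). The helpers textNode and elementNode below are hypothetical stand-ins written for this sketch, not part of the DOM API; the function body is repeated so that the snippet runs on its own.

```javascript
// Function from the article, repeated so this snippet is self-contained.
function H_getText(el) {
  var text = "";
  for (var i = el.firstChild; i != null; i = i.nextSibling) {
    if (i.nodeType == 3 /* Node.TEXT_NODE */)
      text += i.data;
    else if (i.firstChild != null)
      text += H_getText(i);
  }
  return text;
}

// Hypothetical stand-ins for DOM nodes (assumption: only the properties
// that H_getText reads are modelled).
function textNode(data) {
  return { nodeType: 3, data: data, firstChild: null, nextSibling: null };
}
function elementNode(tagName, children) {
  var node = { nodeType: 1, tagName: tagName, firstChild: null, nextSibling: null };
  for (var k = 0; k < children.length; ++k) {
    if (k == 0) node.firstChild = children[k];
    else children[k - 1].nextSibling = children[k];
  }
  return node;
}

// <H2><b>This</b> is a <em>complicated</em> <u>heading</u></H2>
var heading = elementNode("H2", [
  elementNode("B", [textNode("This")]),
  textNode(" is a "),
  elementNode("EM", [textNode("complicated")]),
  textNode(" "),
  elementNode("U", [textNode("heading")])
]);

var headingText = H_getText(heading);
```

Here headingText ends up holding the concatenated text of all nested nodes, in document order. In a real page the argument would be an actual element, for instance one obtained with document.getElementById().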
A simple JavaScript object
JavaScript arrays are very powerful tools. However, they can only
contain one value for each element. If we need to store more
values inside, say, a[1], then we need a function that creates
a new object containing all of these values and then we can store that
object (or more correctly, a reference to it) in
a[1].
Note that another good approach would be to just store another
Array object, instead of a customized object. But using an
object customized for our problem is more elegant, not to mention allowing better
code readability.
Our object needs three properties: a reference to the headline element (element), the text inside it (text) and the item's TOC level (level). The code needed to create it is below.
function TOC_EL(el, text, level) {
  this.element = el;
  this.text = text;
  this.level = level;
}
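A short usage sketch: the constructor is called with new, the resulting object is stored in a[1], and the three values are then read back by property name. The fakeHeading object is a hypothetical placeholder for a real headline element.

```javascript
// Constructor from the article, repeated so this snippet is self-contained.
function TOC_EL(el, text, level) {
  this.element = el;
  this.text = text;
  this.level = level;
}

// Hypothetical placeholder for an <H2> element.
var fakeHeading = { tagName: "H2", id: "" };

// One array slot now carries three related values.
var a = [];
a[1] = new TOC_EL(fakeHeading, "Overview", 2);

var summary = a[1].text + " (level " + a[1].level + ")";
```

Compared with storing a nested array, a[1].level says what it holds, whereas something like a[1][2] would not.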
Retrieving headlines
A very simple solution for this problem is to use
document.getElementsByTagName("*"), which should return all
elements from the HTML document. Then, for each element, we compare the
tagName attribute with "H1", "H2", etc.
However, this method does not work for Internet Explorer 5.0 because the
browser doesn't properly understand the "*" parameter and
returns no elements. Therefore we created the following function to deal
with this problem. The function returns an Array object
containing all the headlines found in the document, as objects defined in
the previous section (thus also having the title and the TOC level). It
should be called with a reference to the <BODY>
element.
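function getHeadlines(el) {
  var l = new Array();
  // anchored so that only H1 .. H6 match, not a tag name that merely contains "H1"
  var rx = /^[hH]([1-6])$/;
  // internal recursive function that scans the DOM tree
  var rec = function (el) {
    for (var i = el.firstChild; i != null; i = i.nextSibling) {
      if (i.nodeType == 1 /* Node.ELEMENT_NODE */) {
        // read the level from the match result rather than the
        // global RegExp.$1, which any other match could overwrite
        var m = rx.exec(i.tagName);
        if (m)
          l[l.length] = new TOC_EL(i, H_getText(i), parseInt(m[1], 10));
        rec(i);
      }
    }
  };
  rec(el);
  return l;
}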
Some notes:
- getHeadlines() contains a nested function. This function doesn't have a name, but we "assign" it to the variable rec so that we can call it. This kind of construct is very useful to avoid putting too many parameters or local variables inside the recursive function, since it has access to variables defined in the containing function.
- We are using a RegExp to test if the current element is a headline, that is, if its tagName matches any of Hx, where x is 1 .. 6.
- A somewhat confusing construct is used to push an element into the array: "l[l.length] = ...". That is because IE5 does not have the push method in the Array object. I recently found a good article that shows how to solve such problems at a more general level.
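The nested-function pattern from the first note can be tried in isolation. The sketch below walks a hand-built tree of plain objects (hypothetical stand-ins for DOM elements, built with an illustrative helper named el) and collects heading levels. The inner function rec appends to the array l of the enclosing function instead of receiving it as a parameter. The function name collectHeadingLevels is also made up for this example, and the regular expression is anchored with ^ and $ so that only exact H1 .. H6 names match.

```javascript
// Hypothetical helper: builds a plain object with the DOM-like properties
// the walker reads (nodeType, tagName, firstChild, nextSibling).
function el(tagName, children) {
  var node = { nodeType: 1, tagName: tagName, firstChild: null, nextSibling: null };
  children = children || [];
  for (var k = 0; k < children.length; ++k) {
    if (k == 0) node.firstChild = children[k];
    else children[k - 1].nextSibling = children[k];
  }
  return node;
}

function collectHeadingLevels(root) {
  var l = [];                     // shared with the nested function below
  var rx = /^[hH]([1-6])$/;
  var rec = function (node) {
    for (var i = node.firstChild; i != null; i = i.nextSibling) {
      if (i.nodeType == 1 /* Node.ELEMENT_NODE */) {
        var m = rx.exec(i.tagName);           // null when the tag is not a headline
        if (m)
          l[l.length] = parseInt(m[1], 10);   // appends, like l.push(...)
        rec(i);                               // descend into the children
      }
    }
  };
  rec(root);
  return l;
}

var body = el("BODY", [
  el("H1"),
  el("DIV", [el("H2"), el("P"), el("H3")]),
  el("H2")
]);

var levels = collectHeadingLevels(body);
```

The levels come out in document order, including the headlines nested inside the DIV, because rec visits each element's children before moving on to its next sibling.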
The generate_TOC function
This function is the main entry point into this script. It is intended
to be called from the <body onload=""> handler with the ID of
some <div> element (or anything else) that will be the
parent of the TOC.
It will construct the list of all headings, using
getHeadlines(), then iterate through it and create elements
inside the parent for each headline. The created elements will
consist of a <div> having the class name
"levelX", where X is the level of indentation (1 to
6) of that TOC entry. This will allow easy customization through CSS.
function generate_TOC(parent_id) {
  var parent = document.getElementById(parent_id);
  var hs = getHeadlines(document.getElementsByTagName("body")[0]);
  for (var i = 0; i < hs.length; ++i) {
    var hi = hs[i];
    var d = document.createElement("div");
    if (hi.element.id == "")
      hi.element.id = "gen" + i;
    var a = document.createElement("a");
    a.href = "#" + hi.element.id;
    a.appendChild(document.createTextNode(hi.text));
    d.appendChild(a);
    d.className = "level" + hi.level;
    parent.appendChild(d);
  }
}
For example, here is the HTML code that the above function generates for this page. As a side note, I used a handy feature of Mozilla: if you select something on the page and then right-click, the menu that appears offers the option "View Selection Source". It shows the HTML code for the selected block as it is at that moment, so it also shows code generated by scripts.
<div id="toc">
  <div class="level1"><a href="#gen0">Automatic TOC Generation</a></div>
  <div class="level2"><a href="#wwg">What we get</a></div>
  <div class="level2"><a href="#gen2">Problem overview</a></div>
  <div class="level3"><a href="#gen3">Tasks</a></div>
  <div class="level2"><a href="#gen4">The Code</a></div>
  <div class="level3"><a href="#gettext">The text retrieval function</a></div>
  <div class="level3"><a href="#gen6">A simple JavaScript object</a></div>
  <div class="level3"><a href="#getheadlines">Retrieving headlines</a></div>
  <div class="level3"><a href="#generatetoc">The generate_TOC function</a></div>
  <div class="level2"><a href="#styling">Styling and indentation</a></div>
  <div class="level2"><a href="#gen10">Putting all together</a></div>
</div>
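The per-headline decisions made inside the loop (keep or generate an ID, build the link target, pick the class name) can also be written as a function that touches no DOM at all, which makes them easy to check. The name describeTocEntries and the plain { id, text, level } input records are assumptions made for this sketch; the article's script works on real elements instead.

```javascript
// Hypothetical helper: mirrors the ID, href and class rules of generate_TOC
// for a list of plain records instead of DOM elements.
function describeTocEntries(headlines) {
  var out = [];
  for (var i = 0; i < headlines.length; ++i) {
    var h = headlines[i];
    // Keep an existing ID, otherwise generate "gen" + position in the list.
    var id = (h.id == "") ? "gen" + i : h.id;
    out[out.length] = {
      href: "#" + id,
      text: h.text,
      className: "level" + h.level
    };
  }
  return out;
}

// The first three headlines of this article, as they appear in the listing above.
var entries = describeTocEntries([
  { id: "",    text: "Automatic TOC Generation", level: 1 },
  { id: "wwg", text: "What we get",              level: 2 },
  { id: "",    text: "Problem overview",         level: 2 }
]);
```

Generated IDs depend on the headline's position in the list, which is why the third headline gets "gen2" even though the second one already had an ID of its own.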
Styling and indentation
We can heavily style the TOC using external CSS and just knowing the ID of the parent DIV, and the fact that items of different levels will have different classes (starting from "level1" to "level6"). Indentation is also possible with CSS, so the script simply doesn't need to know how much to indent levels.
An example style is shown below. For a fancier look you can check this page.
#toc {
  float: right;
  font-size: 80%;
  border: 1px solid #000;
  margin: 0px 0px 20px 20px;
  padding: 5px;
  background: #ddd;
}
#toc .level2 { margin-left: 1em; }
#toc .level3 { margin-left: 2em; }
#toc .level4 { margin-left: 3em; }
#toc .level5 { margin-left: 4em; }
#toc .level6 { margin-left: 5em; }
Putting all together
Usage is simple: just put all these functions inside a ".js" file, load that file into the page that needs a TOC and do the simple setup described in section What we get.
To get a properly indented TOC you should also include a stylesheet, like the one above. Further customization is possible, e.g. a different background / color for different TOC levels, or a fancy hover / active style for links inside #toc, etc.



