
Automatic TOC Generation


Mihai Bazon


Many one-page HTML documents need a simple Table of Contents (abbreviated to TOC for the rest of this article). A TOC written by hand becomes difficult to maintain as the document changes. To have an automatically generated TOC I see two solutions:

  • writing the document in a WYSIWYG (What You See Is What You Get) editor that knows how to generate a TOC. This has the unacceptable disadvantage that the generated code is usually atrocious and in some cases does not even validate.

  • writing the document without a TOC and having the TOC generated automatically through the wonders of JavaScript and the DOM.

In this article we will discuss the second method. So the motto is:

let the browser do it.

What we get

First, let's just assume that we already have the script and see how we are going to use it. Basically, we want to be able to write the BODY tag like this:

<body onload="generate_TOC('toc');">

And somewhere inside the BODY content, where we want the TOC to appear,

we should have the following:

<div id="toc"></div>

This DIV will be known as the parent of the TOC.

With this setup, when a browser which has proper support for JavaScript and the DOM loads our page, it will automatically generate the TOC inside the parent.

Problem overview

The generate_TOC function should walk through the document

and remember all the headlines (<Hx> tags where x is 1,

2, …, 6). For each headline it should extract the text inside, create a

link which has the headline as its target and add the link to the parent DIV.

The script will not handle styling. Styling can (and should) be done using external CSS (Cascading Style Sheets), relying on the fact that the parent DIV has the ID "toc" and on some other small details that we will discuss later.

Tasks

Here is the complete definition of tasks involved in TOC generation:

  1. Retrieving text. We will create a function

    that retrieves the text from an element. While this function should work

    with any element type, we will only use it for retrieving text from

    headlines.

  2. Finding the headlines. We will create a

    list of objects that contain a reference to the headline element, the text

    inside it and an integer variable representing the TOC item level (for

    proper indentation).

  3. Creating the TOC. For all items retrieved at

    step 2 we will:

    • Check if the headline element has an ID assigned. If not we will

      add an automatically generated ID. This will allow us to create a link

      that targets the headline.

    • Create a DIV element (D) that has the class

      "levelx", where x is the level of the TOC

      element (1, 2, ..., 6). This will help us style the TOC using CSS.

    • Create a link element that displays the text retrieved at step

      2 for the current item and has the ID of the current

      headline as its HREF (remember, if the ID did not exist then we have generated it).

      Append this link to the DIV created above.

    • Finally, append the (D) DIV into the TOC's

      parent DIV.
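
The three operations of step 3 can be sketched on plain data, without touching the DOM. This is only an illustration under stated assumptions: buildTocEntries, the shape of the sample array and the HTML-string output are hypothetical names and choices made for this sketch. The script presented later in the article creates real DOM nodes instead of strings.

```javascript
// Hypothetical, DOM-free sketch of step 3. Plain objects stand in for
// the headline elements; the real script works on DOM nodes.
function buildTocEntries(headlines) {
    var out = [];
    for (var i = 0; i < headlines.length; ++i) {
        var h = headlines[i];
        // Generate an ID only when the headline does not have one.
        if (!h.id)
            h.id = "gen" + i;
        // One DIV per entry, classed by level, wrapping a link to the headline.
        // Note: the text is not HTML-escaped here; this is illustration only.
        out[out.length] = '<div class="level' + h.level + '">' +
            '<a href="#' + h.id + '">' + h.text + '</a></div>';
    }
    return out;
}

var sample = [
    { id: "",    text: "Automatic TOC Generation", level: 1 },
    { id: "wwg", text: "What we get",              level: 2 }
];
var tocLines = buildTocEntries(sample);
console.log(tocLines.join("\n"));
```

The first sample headline has no ID, so it receives a generated one; the second keeps the ID it already had.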

The Code

Now, for the most interesting part of this article, the code.

The text retrieval function

For retrieving text from a simple heading like "<H2>Overview</H2>" we could use simple code like "text = element.firstChild.data;". However, we want to be able to retrieve text even from complicated headings, like "<H2><b>This</b> is a <em>complicated</em> <u>heading</u></H2>". Headings may contain more than simple text, therefore we need a recursive function that walks through all child elements and accumulates every little piece of text it finds. The code is given below.

function H_getText(el) {
    var text = "";
    // walk through the direct children of el
    for (var i = el.firstChild; i != null; i = i.nextSibling) {
        if (i.nodeType == 3 /* Node.TEXT_NODE, IE doesn't speak constants */)
            text += i.data;           // plain text: accumulate it
        else if (i.firstChild != null)
            text += H_getText(i);     // anything with children: recurse
    }
    return text;
}

The only parameter to this function, el, is a reference to the

HTML element from which we need to extract the text. We can get one, for

instance, using document.getElementById().
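
To see the recursion at work outside a browser, the sketch below runs the function on stand-in nodes. The helpers fakeText and fakeElement are hypothetical and exist only for this example: they build plain objects carrying just the properties that H_getText reads (nodeType, data, firstChild, nextSibling). In a real page the element would come from the DOM instead.

```javascript
// Same function as in the article, repeated so the sketch runs on its own.
function H_getText(el) {
    var text = "";
    for (var i = el.firstChild; i != null; i = i.nextSibling) {
        if (i.nodeType == 3)
            text += i.data;
        else if (i.firstChild != null)
            text += H_getText(i);
    }
    return text;
}

// Hypothetical stand-in for a DOM text node.
function fakeText(data) {
    return { nodeType: 3, data: data, firstChild: null, nextSibling: null };
}

// Hypothetical stand-in for a DOM element; links the children together
// through firstChild / nextSibling, as the DOM does.
function fakeElement(tagName, children) {
    var el = { nodeType: 1, tagName: tagName, firstChild: null, nextSibling: null };
    for (var i = 0; i < children.length; ++i) {
        if (i == 0)
            el.firstChild = children[i];
        else
            children[i - 1].nextSibling = children[i];
    }
    return el;
}

// <H2><b>This</b> is a <em>complicated</em> <u>heading</u></H2>
var h2 = fakeElement("H2", [
    fakeElement("B", [fakeText("This")]),
    fakeText(" is a "),
    fakeElement("EM", [fakeText("complicated")]),
    fakeText(" "),
    fakeElement("U", [fakeText("heading")])
]);

var headingText = H_getText(h2);
console.log(headingText); // prints "This is a complicated heading"
```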

A simple JavaScript object

JavaScript arrays are very powerful tools. However, each array element holds a single value. If we need to store several values inside, say, a[1], then we need a function that creates a new object containing all of these values, and then we can store that object (or more correctly, a reference to it) in a[1].

Note that another good approach would be to just store another

Array object, instead of a customized object. But using an

object customized for our problem is more elegant, not to mention allowing better

code readability.

Our object needs three properties: a reference to the headline element (element), the text inside it (text) and the item's TOC level (level). The constructor needed to create it is below.

function TOC_EL(el, text, level) {
    this.element = el;
    this.text = text;
    this.level = level;
}
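
A short usage sketch of this constructor follows. The fakeHeadline object is a hypothetical stand-in for a real heading element, used so that the example runs without a browser; items is likewise just a name chosen for this illustration.

```javascript
// Same constructor as in the article, repeated so the sketch runs on its own.
function TOC_EL(el, text, level) {
    this.element = el;
    this.text = text;
    this.level = level;
}

// Hypothetical stand-in for an <H2> element found in the page.
var fakeHeadline = { tagName: "H2", id: "" };

// Each array slot holds one object that groups the three values.
var items = [];
items[items.length] = new TOC_EL(fakeHeadline, "What we get", 2);

console.log(items[0].text + " (level " + items[0].level + ")");
// prints "What we get (level 2)"
```

Because the object stores a reference to the element rather than a copy, a change made later through items[0].element (such as assigning an ID) is visible on the element itself.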

Retrieving headlines

A very simple solution for this problem is to use

document.getElementsByTagName("*"), which should return all

elements from the HTML document. Then, for each element, we compare the

tagName attribute with "H1", "H2", etc.
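
The tag-name comparison can be sketched as follows, assuming the elements have already been collected into a flat list. The allTags array stands in for the tag names of the elements that document.getElementsByTagName("*") would return, and headlineLevel is a hypothetical helper written for this illustration. It uses an anchored, case-insensitive regular expression instead of six separate string comparisons.

```javascript
// Hypothetical helper: returns the headline level (1 to 6) for a tag
// name, or 0 when the tag is not a headline.
function headlineLevel(tagName) {
    var m = /^h([1-6])$/i.exec(tagName);
    return m ? parseInt(m[1], 10) : 0;
}

// Stand-in for the tag names of all elements in a document.
var allTags = ["H1", "h3", "DIV", "H7", "HR", "TH"];

var levels = [];
for (var i = 0; i < allTags.length; ++i)
    levels[levels.length] = headlineLevel(allTags[i]);

console.log(levels.join(",")); // prints "1,3,0,0,0,0"
```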

However, this method does not work for Internet Explorer 5.0 because the browser doesn't properly understand the "*" parameter and returns no elements. Therefore we created the following function to deal with this problem. The function returns an Array object containing all the headlines found in the document, as objects defined in the previous section (thus also having the text and the TOC level). It should be called with a reference to the <BODY> element.

function getHeadlines(el) {
    var l = new Array;
    // anchored, so that only H1 .. H6 match
    var rx = /^[hH]([1-6])$/;
    // internal recursive function that scans the DOM tree
    var rec = function (el) {
        for (var i = el.firstChild; i != null; i = i.nextSibling) {
            if (i.nodeType == 1 /* Node.ELEMENT_NODE */) {
                var m = rx.exec(i.tagName);
                if (m)
                    l[l.length] = new TOC_EL(i, H_getText(i), parseInt(m[1], 10));
                rec(i);
            }
        }
    };
    rec(el);
    return l;
}

Some notes:

  • getHeadlines() contains a nested function. This function doesn't have a name, but we "assign" it to the variable rec so that we can call it. This kind of construct is very useful to avoid putting too many parameters or local variables inside the recursive function, because it has access to the variables defined in the containing function.

  • We are using a RegExp to test if the current element is a headline, that is, if its tagName matches any of Hx, where x is 1 .. 6.

  • A somewhat confusing construct is used to push an element into the array: "l[l.length] = ...". That is because IE5 does not have the push method in the Array object. I recently found a good article ("Remedial JavaScript") that shows how to solve such problems at a more general level.
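
The notes above can be tried outside a browser with the sketch below. It repeats the article's functions so that it runs on its own, and reads the captured level from the value returned by exec. The node helper is hypothetical: it builds plain objects that imitate only the DOM properties the functions read, and strings passed as children become stand-in text nodes.

```javascript
function TOC_EL(el, text, level) {
    this.element = el;
    this.text = text;
    this.level = level;
}

function H_getText(el) {
    var text = "";
    for (var i = el.firstChild; i != null; i = i.nextSibling) {
        if (i.nodeType == 3)
            text += i.data;
        else if (i.firstChild != null)
            text += H_getText(i);
    }
    return text;
}

function getHeadlines(el) {
    var l = [];
    var rx = /^[hH]([1-6])$/;
    // rec can see l and rx because it is defined inside getHeadlines
    var rec = function (el) {
        for (var i = el.firstChild; i != null; i = i.nextSibling) {
            if (i.nodeType == 1) {
                var m = rx.exec(i.tagName);
                if (m)
                    l[l.length] = new TOC_EL(i, H_getText(i), parseInt(m[1], 10));
                rec(i);
            }
        }
    };
    rec(el);
    return l;
}

// Hypothetical helper that builds a stand-in element.
function node(tagName, children) {
    var el = { nodeType: 1, tagName: tagName, firstChild: null, nextSibling: null };
    var prev = null;
    for (var i = 0; i < children.length; ++i) {
        var c = children[i];
        if (typeof c == "string")
            c = { nodeType: 3, data: c, firstChild: null, nextSibling: null };
        if (prev)
            prev.nextSibling = c;
        else
            el.firstChild = c;
        prev = c;
    }
    return el;
}

var body = node("BODY", [
    node("H1", ["Title"]),
    node("DIV", [
        node("H2", ["Nested ", node("EM", ["section"])]),
        node("P", ["Some text"])
    ]),
    node("H2", ["Last"])
]);

var found = getHeadlines(body);
var summary = [];
for (var k = 0; k < found.length; ++k)
    summary[summary.length] = found[k].level + ":" + found[k].text;
console.log(summary.join(" | ")); // prints "1:Title | 2:Nested section | 2:Last"
```

The headline nested inside the DIV is found as well, and the results come out in document order, which is what the TOC needs.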

The generate_TOC function

This function is the main entry point into this script. It is intended to be called from the <body onload=""> handler with the ID of some <div> element (or anything else) that will be the parent of the TOC.

It will construct the list of all headings, using

getHeadlines(), then iterate through it and create elements

inside the parent for each headline. The created elements will

consist of a <div> having the class name

"levelX", where X is the level of indentation (1 to

6) of that TOC entry. This will allow easy customization through CSS.

function generate_TOC(parent_id) {
    var parent = document.getElementById(parent_id);
    var hs = getHeadlines(document.getElementsByTagName("body")[0]);
    for (var i = 0; i < hs.length; ++i) {
        var hi = hs[i];
        // make sure the headline has an ID that a link can target
        if (hi.element.id == "")
            hi.element.id = "gen" + i;
        // the link that points to the headline
        var a = document.createElement("a");
        a.href = "#" + hi.element.id;
        a.appendChild(document.createTextNode(hi.text));
        // the wrapper DIV, classed by level so that CSS can indent it
        var d = document.createElement("div");
        d.className = "level" + hi.level;
        d.appendChild(a);
        parent.appendChild(d);
    }
}

For example, the following is the HTML code that the above function generates for this page. As a side note, I used a handy feature of Mozilla (http://mozilla.org): if you select something on the page and then right-click, the menu that appears offers the option "View Selection Source". It shows the HTML code for the selected block as it is at that moment, so it also shows code generated by JavaScript.

<div id="toc">

<div class="level1"><a href="#gen0">Automatic TOC Generation</a></div>

<div class="level2"><a href="#wwg">What we get</a></div>

<div class="level2"><a href="#gen2">Problem overview</a></div>

<div class="level3"><a href="#gen3">Tasks</a></div>

<div class="level2"><a href="#gen4">The Code</a></div>

<div class="level3"><a href="#gettext">The text retrieval function</a></div>

<div class="level3"><a href="#gen6">A simple JavaScript object</a></div>

<div class="level3"><a href="#getheadlines">Retrieving headlines</a></div>

<div class="level3"><a href="#generatetoc">The generate_TOC function</a></div>

<div class="level2"><a href="#styling">Styling and indentation</a></div>

<div class="level2"><a href="#gen10">Putting all together</a></div>

</div>

Styling and indentation

We can style the TOC heavily using external CSS, knowing only the ID of the parent DIV and the fact that items of different levels will have different classes (from "level1" to "level6"). Indentation is also possible with CSS, so the script simply doesn't need to know how much to indent each level.

An example style is shown below. For a fancier look you can check this page: http://students.infoiasi.ro/~mishoo/site/calendar.epl (the JS calendar on my personal site).

#toc {
    float: right;
    font-size: 80%;
    border: 1px solid #000;
    margin: 0px 0px 20px 20px;
    padding: 5px;
    background: #ddd;
}

#toc .level2 { margin-left: 1em; }

#toc .level3 { margin-left: 2em; }

#toc .level4 { margin-left: 3em; }

#toc .level5 { margin-left: 4em; }

#toc .level6 { margin-left: 5em; }

Putting all together

Usage is simple: just put all these functions inside a ".js" file, load that file into the page that needs a TOC and do the simple setup described in the section "What we get".

To get a properly indented TOC you should also include a stylesheet, like the one above. Further customization is possible, e.g. a different background or color for each TOC level, or a fancy hover / active style for links inside #toc.

More about mishoo in his personal home-page.


evolt.org Evolt.org is an all-volunteer resource for web developers made up of a discussion list, a browser archive, and member-submitted articles. This article is the property of its author, please do not redistribute or use elsewhere without checking with the author.