Automatic TOC Generation
Many one-page HTML documents need to have a simple Table of Contents (TOC). There are two ways to get one:

1. Writing the document using a WYSIWYG (What You See Is What You Get) editor that knows how to generate a TOC. This has the unacceptable disadvantage that the generated code is usually atrocious, and in some cases it does not even validate.
2. Writing the document without a TOC and having the TOC generated automatically through the wonders of JavaScript and the DOM.

In this article we will discuss the second method. So the motto is: let the browser do it.
What we get
First, let's assume that we already have the script and see how we are going to use it. Basically, we want to be able to write the BODY tag like this:

    <body onload="generate_TOC('toc')">

And somewhere inside the BODY content, where we want the TOC to appear, we should have the following:

    <div id="toc"></div>

This DIV will be known as the parent of the TOC. With this setup, when a browser with proper support for JavaScript and the DOM loads our page, it will automatically generate the TOC inside the parent.
Problem overview
The generate_TOC function should walk through the document, find the headlines (H1 to H6) and build the TOC from them inside the parent.
The script will not handle styling. Styling can, and should, be done using external CSS (Cascading Style Sheets), knowing that the parent DIV has the ID "toc", plus a few other small details that we will discuss later.
Tasks
Here is the complete definition of the tasks involved in TOC generation:

1. Retrieving text. We will create a function that retrieves the text from an element. While this function should work with any element type, we will only use it for retrieving text from headlines.
2. Finding the headlines. We will create a list of objects that contain a reference to the headline element, the text inside it and an integer representing the TOC item level (for proper indentation).
3. Creating the TOC. For all items retrieved at step 2 we will:
   - Check if the headline element has an ID assigned. If not, we will add an automatically generated ID. This will allow us to create a link that targets the headline.
   - Create a DIV element (D) that has the class "levelX", where X is the level of the TOC element (1, 2, ..., 6). This will help us style the TOC using CSS.
   - Create a link element that displays the text retrieved at step 2 for the current item and has "#" followed by the ID of the current headline as its HREF (remember, if the ID did not exist then we have generated it). Append this link to the DIV created above.
   - Finally, append the (D) DIV to the TOC's parent DIV.

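Step 3 is mostly bookkeeping, so its logic can be tried out without a browser. The sketch below is not the article's code: it assumes headline elements are modelled as plain objects with an `id` property, and the helper name `planTOC` is hypothetical. For each item from step 2 it generates an ID as "gen" plus the item's index where none exists, and returns the class name of the DIV plus the href and text of its link.

```javascript
// Hypothetical helper: describes what step 3 creates for each item.
// items: the records from step 2, i.e. { element, text, level }, where
// element is a plain stand-in object with an `id` property (assumption).
function planTOC(items) {
    var entries = [];
    for (var i = 0; i < items.length; i++) {
        var it = items[i];
        // Give the headline an ID if it has none (mutates the element).
        if (it.element.id == "")
            it.element.id = "gen" + i;
        entries.push({
            className: "level" + it.level,  // class of the DIV (D)
            href: "#" + it.element.id,      // target of the link
            text: it.text                   // text displayed by the link
        });
    }
    return entries;
}

var items = [
    { element: { id: "" },    text: "Automatic TOC Generation", level: 1 },
    { element: { id: "wwg" }, text: "What we get",              level: 2 },
    { element: { id: "" },    text: "Problem overview",         level: 2 }
];
var plan = planTOC(items);
for (var n = 0; n < plan.length; n++)
    console.log(plan[n].className, plan[n].href, plan[n].text);
// level1 #gen0 Automatic TOC Generation
// level2 #wwg What we get
// level2 #gen2 Problem overview
```

Headlines that already carry an ID (here "wwg") keep it, so links written by hand elsewhere in the page keep working.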
The Code
Now, for the most interesting part of this article, the code.
The text retrieval function
For retrieving text from a simple heading like "<H2>Overview</H2>" we could use simple code like "text = element.firstChild.data;". However, we want to be able to retrieve text even from complicated headings, like "<H2><b>This</b> is a <em>complicated</em> <u>heading</u></H2>". Headings might contain more than simple text, therefore we need a recursive function that walks through all child elements and accumulates every little piece of text it finds. The code is given below.

    function H_getText(el) {
        var text = "";
        // Visit every child node of el, left to right.
        for (var i = el.firstChild; i != null; i = i.nextSibling) {
            if (i.nodeType == 3 /* Node.TEXT_NODE, IE doesn't speak constants */)
                text += i.data;          // a text node: keep its content
            else if (i.firstChild != null)
                text += H_getText(i);    // a node with children: recurse into it
        }
        return text;
    }

The only parameter of H_getText(), el, is a reference to the HTML element from which we need to extract the text. We can get one, for instance, using document.getElementById().
A simple JavaScript object
JavaScript arrays are very powerful tools. However, they can only contain one value for each element. If we need to store more values inside, say, a[1], then we need a function that creates a new object containing all of these values, and then we can store that object (or, more correctly, a reference to it) in a[1].
Note that another good approach would be to just store another Array object instead of a customized object. But using an object customized for our problem is more elegant, not to mention that it makes the code easier to read.
Our object needs three properties: a reference to the headline element (element), the text inside it (text) and the item's TOC level (level). The function below creates it when called as a constructor, with new:

    function TOC_EL(el, text, level) {
        this.element = el;   // the headline element itself
        this.text = text;    // the text retrieved with H_getText()
        this.level = level;  // 1 for H1, 2 for H2, ..., 6 for H6
    }

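The two pieces seen so far can be tried together outside a browser, because H_getText() only reads the nodeType, data, firstChild and nextSibling properties. In the sketch below, plain objects stand in for DOM nodes; mockText() and mockElem() are hypothetical helpers written for this illustration, not DOM APIs. The mock tree models the complicated heading from the previous section (with spaces between the words), and the result is stored both as a TOC_EL and as a plain Array, to make the readability difference concrete.

```javascript
// Hypothetical mock-node helpers: only the properties H_getText() reads.
function mockText(data) {
    return { nodeType: 3, data: data, firstChild: null, nextSibling: null };
}
function mockElem(tagName, children) {
    // Chain the children together through nextSibling.
    for (var k = 0; k + 1 < children.length; k++)
        children[k].nextSibling = children[k + 1];
    return { nodeType: 1, tagName: tagName,
             firstChild: children.length ? children[0] : null,
             nextSibling: null };
}

// The two functions from the article.
function H_getText(el) {
    var text = "";
    for (var i = el.firstChild; i != null; i = i.nextSibling) {
        if (i.nodeType == 3)
            text += i.data;
        else if (i.firstChild != null)
            text += H_getText(i);
    }
    return text;
}
function TOC_EL(el, text, level) {
    this.element = el;
    this.text = text;
    this.level = level;
}

// <H2><b>This</b> is a <em>complicated</em> <u>heading</u></H2>
var heading = mockElem("H2", [
    mockElem("B", [mockText("This")]),
    mockText(" is a "),
    mockElem("EM", [mockText("complicated")]),
    mockText(" "),
    mockElem("U", [mockText("heading")])
]);

// Fields read by name...
var item = new TOC_EL(heading, H_getText(heading), 2);
// ...versus the same data in a nested Array, read by position.
var tuple = [heading, H_getText(heading), 2];

console.log(item.text + " / level " + item.level);  // This is a complicated heading / level 2
console.log(tuple[1] + " / level " + tuple[2]);     // This is a complicated heading / level 2
```

Both lines print the same thing, but item.text says what it holds, while tuple[1] only works if the reader remembers the field order.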
Retrieving headlines
A very simple solution for this problem is to use document.getElementsByTagName("*"), which should return all elements from the HTML document. Then, for each element, we compare the tagName property with "H1", "H2", etc.
However, this method does not work in Internet Explorer 5.0, because that browser doesn't properly understand the "*" parameter and returns no elements. Therefore we created the following function to deal with this problem. The function returns an Array object containing all the headlines found in the document, as objects defined in the previous section (thus also holding the text and the TOC level). It should be called with a reference to the <BODY> element.

    function getHeadlines(el) {
        var l = new Array();
        // Matches exactly H1 .. H6 (either case); the digit is captured.
        var rx = /^[hH]([1-6])$/;
        // internal recursive function that scans the DOM tree
        var rec = function (el) {
            for (var i = el.firstChild; i != null; i = i.nextSibling) {
                if (i.nodeType == 1 /* Node.ELEMENT_NODE */) {
                    var m = rx.exec(i.tagName);
                    if (m)
                        l[l.length] = new TOC_EL(i, H_getText(i), parseInt(m[1], 10));
                    rec(i);
                }
            }
        };
        rec(el);
        return l;
    }

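getHeadlines() can also be tried outside a browser by modelling the DOM with plain objects. In the sketch below, mockText() and mockElem() are hypothetical helpers, not DOM APIs; they build objects carrying only the properties these functions read. This copy of getHeadlines() uses an anchored regular expression, /^[hH]([1-6])$/, and reads the level from the array returned by exec() rather than from the legacy RegExp.$1 property.

```javascript
// Hypothetical mock-node helpers (assumption: no real DOM available).
function mockText(data) {
    return { nodeType: 3, data: data, firstChild: null, nextSibling: null };
}
function mockElem(tagName, children) {
    for (var k = 0; k + 1 < children.length; k++)
        children[k].nextSibling = children[k + 1];
    return { nodeType: 1, tagName: tagName,
             firstChild: children.length ? children[0] : null,
             nextSibling: null };
}

function H_getText(el) {
    var text = "";
    for (var i = el.firstChild; i != null; i = i.nextSibling) {
        if (i.nodeType == 3) text += i.data;
        else if (i.firstChild != null) text += H_getText(i);
    }
    return text;
}

function TOC_EL(el, text, level) {
    this.element = el;
    this.text = text;
    this.level = level;
}

function getHeadlines(el) {
    var l = new Array();
    var rx = /^[hH]([1-6])$/;
    // Depth-first walk: headlines are collected in document order.
    var rec = function (el) {
        for (var i = el.firstChild; i != null; i = i.nextSibling) {
            if (i.nodeType == 1) {
                var m = rx.exec(i.tagName);
                if (m)
                    l[l.length] = new TOC_EL(i, H_getText(i), parseInt(m[1], 10));
                rec(i);
            }
        }
    };
    rec(el);
    return l;
}

// A body with headlines at different depths and some non-headline elements.
var body = mockElem("BODY", [
    mockElem("H1", [mockText("Title")]),
    mockElem("DIV", [
        mockElem("H2", [mockText("Part "), mockElem("EM", [mockText("one")])]),
        mockElem("P", [mockText("Body text")])
    ]),
    mockElem("H3", [mockText("Details")])
]);

var hs = getHeadlines(body);
for (var n = 0; n < hs.length; n++)
    console.log(hs[n].level, hs[n].text);
// 1 Title
// 2 Part one
// 3 Details
```

The H2 nested inside the DIV is still found, because rec() descends into every element, not only into the direct children of BODY.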
Some notes:
- getHeadlines() contains a nested function. This function doesn't have a name, but we "assign" it to the variable rec so that we can call it. This kind of construct is very useful to avoid passing too many parameters or local variables to the recursive function: it has access to the variables defined in the containing function (here l and rx).
- We are using a RegExp to test whether the current element is a headline, that is, whether its tagName matches any of Hx, where x is 1 .. 6.
- A somewhat confusing construct is used to push an element into the array: "l[l.length] = ...". That is because IE5 does not have the push method in the Array object. I recently found a good article, "Remedial JavaScript", that shows how to solve such problems at a more general level.
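For the push problem, a common general fix is feature detection: define the missing method only when the browser lacks it, so that code can call l.push(...) everywhere. The sketch below shows that pattern; pushShim is a hypothetical name, and it is an assumption that the article mentioned above uses this exact approach.

```javascript
// Hypothetical fallback with the semantics of Array.prototype.push:
// append every argument at the end of `this` and return the new length.
function pushShim() {
    for (var k = 0; k < arguments.length; k++)
        this[this.length] = arguments[k];
    return this.length;
}

// Install the fallback only where no native push exists (e.g. IE 5.0).
// Modern engines keep their own implementation.
if (!Array.prototype.push)
    Array.prototype.push = pushShim;

// With either implementation, getHeadlines() could now write
// l.push(new TOC_EL(...)) instead of l[l.length] = new TOC_EL(...).
var l = [];
l.push("H1");
var len = l.push("H2", "H3");
console.log(len, l.join(","));  // 3 H1,H2,H3
```

Guarding the assignment keeps the native, usually faster, method wherever it exists.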
The generate_TOC function
This function is the main entry point into this script. It is intended to be called from the <body onload=""> handler with the ID of some <div> element (or any other element) that will be the parent of the TOC.
It constructs the list of all headings using getHeadlines(), then iterates through it and creates elements inside the parent for each headline. Each created element is a <div> with the class name "levelX", where X is the level of indentation (1 to 6) of that TOC entry, containing a link to the headline. This allows easy customization through CSS.

    function generate_TOC(parent_id) {
        var parent = document.getElementById(parent_id);
        if (parent == null)
            return;   // no such element: nothing to fill
        // Collect all H1..H6 elements below <body>.
        var hs = getHeadlines(document.getElementsByTagName("body")[0]);
        for (var i = 0; i < hs.length; ++i) {
            var hi = hs[i];
            var d = document.createElement("div");
            // Give the headline an ID if it has none, so the link can target it.
            if (hi.element.id == "")
                hi.element.id = "gen" + i;
            // The link shows the headline's text and points to its ID.
            var a = document.createElement("a");
            a.href = "#" + hi.element.id;
            a.appendChild(document.createTextNode(hi.text));
            d.appendChild(a);
            // The class name drives the indentation in the stylesheet.
            d.className = "level" + hi.level;
            parent.appendChild(d);
        }
    }

For example, here is the HTML code that the above function generates for this page. As a side note, I used a handy feature of Mozilla: if you select something on the page and then right-click, the menu that appears offers the option "View Selection Source". It shows the HTML code for the selected block as it is at that very moment; therefore, it also shows code generated by JavaScript.

    <div id="toc">
      <div class="level1"><a href="#gen0">Automatic TOC Generation</a></div>
      <div class="level2"><a href="#wwg">What we get</a></div>
      <div class="level2"><a href="#gen2">Problem overview</a></div>
      <div class="level3"><a href="#gen3">Tasks</a></div>
      <div class="level2"><a href="#gen4">The Code</a></div>
      <div class="level3"><a href="#gettext">The text retrieval function</a></div>
      <div class="level3"><a href="#gen6">A simple JavaScript object</a></div>
      <div class="level3"><a href="#getheadlines">Retrieving headlines</a></div>
      <div class="level3"><a href="#generatetoc">The generate_TOC function</a></div>
      <div class="level2"><a href="#styling">Styling and indentation</a></div>
      <div class="level2"><a href="#gen10">Putting all together</a></div>
    </div>

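The example output mixes IDs written by the author (such as wwg and gettext) with generated ones (gen0, gen2, ...). One caveat of the "gen" + i scheme in generate_TOC() is that it assumes no element on the page already uses such an ID: if some element had id="gen3" while the fourth headline had none, two elements would share that ID, which is invalid HTML and may send the TOC link to the wrong place. A minimal sketch of a safer generator follows; uniqueId and isTaken are hypothetical names, and the lookup is passed in as a function so the sketch runs outside a browser.

```javascript
// Hypothetical helper: returns "gen" + i, or a variant with a numeric
// suffix if that ID is already in use. isTaken reports whether an ID
// exists; in a browser it could be
//   function (id) { return document.getElementById(id) != null; }
// and, because generate_TOC() assigns each new ID to its headline,
// later lookups would also see the IDs generated so far.
function uniqueId(i, isTaken) {
    var id = "gen" + i;
    for (var n = 1; isTaken(id); n++)
        id = "gen" + i + "_" + n;
    return id;
}

// Simulate a page where the author already used id="gen3".
var used = { gen3: true };
function isTaken(id) { return used.hasOwnProperty(id); }

console.log(uniqueId(2, isTaken));  // gen2
console.log(uniqueId(3, isTaken));  // gen3_1
```

In generate_TOC() the line hi.element.id = "gen" + i; would become hi.element.id = uniqueId(i, isTaken); with the browser version of isTaken.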
Styling and indentation
We can heavily style the TOC using external CSS, knowing just the ID of the parent DIV and the fact that items of different levels have different classes (from "level1" to "level6"). Indentation is also possible with CSS, so the script simply doesn't need to know how much to indent each level.
An example style is shown below. For a fancier look you can check the JS calendar page on my personal site (http://students.infoiasi.ro/~mishoo/site/calendar.epl).

    #toc {
      float: right;
      font-size: 80%;
      border: 1px solid #000;
      margin: 0px 0px 20px 20px;
      padding: 5px;
      background: #ddd;
    }
    #toc .level2 { margin-left: 1em; }
    #toc .level3 { margin-left: 2em; }
    #toc .level4 { margin-left: 3em; }
    #toc .level5 { margin-left: 4em; }
    #toc .level6 { margin-left: 5em; }

Putting all together
Usage is simple: just put all these functions inside a ".js" file, load that file into the page that needs a TOC, and do the simple setup described in the section "What we get".
To get a properly indented TOC you should also include a stylesheet like the one above. Further customization is possible, e.g. a different background or color for each TOC level, or a fancy hover/active style for the links inside #toc.