A Simple Character Entity Chart

Adrian Roselli

For those who use characters in their copy that don't normally appear on the keyboard, it's always been a hit-and-miss game of tracking down the ISO character entity and choosing between the named and numeric value. In fact, so many books on HTML, as well as online resources, have provided the wrong entities for so long that few knew it until the W3C validator started throwing them back as errors. You can't imagine how many em-dashes I had to find and convert from &#151; to the correct &#8212;.

The W3C character entity reference is certainly definitive, but not practical. It offers no display of how these entities might be rendered, leaving you to copy and paste based on descriptions until you find the right one.

Well, I got sick of guessing, so I took the W3C documentation and turned it into a handy chart for my staff. Now it sits on our intranet for all to enjoy. Oh... and here... now... as well.

For those who might not normally use character entities, or who've never heard of The Elements of Typographic Style, you might not have been exposed to reasons why you would even care about characters not on your keyboard. I'd suggest you surf over to the A List Apart article The Trouble With EM 'n EN.

By the way, if you need to know what browsers support what character entities, just view this page and look for the characters in the tables. One column shows the character as called by its numeric entity, and one shows it as called by its named entity. This is the chart for HTML 4.x, but also applies to XHTML 1.x.

Character entity references in HTML 4

The charts below will allow you to copy and paste the appropriate character and numeric entities for your documents. To be sure a particular browser supports the entities (both named and numeric), simply open your browser to this page and view the charts. If the character you want doesn't appear in the target browser, it doesn't work (simple, huh?).
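If you want to build a similar test page for your own subset of entities, here is a minimal sketch. It assumes Python's standard `html.entities.name2codepoint` table (which covers the HTML 4 names); the function name `chart_rows` and the sample entity names are illustrative and not part of the original chart.

```python
from html.entities import name2codepoint

def chart_rows(names):
    """Build one HTML table row per entity name: both code forms plus both rendered forms."""
    rows = []
    for name in names:
        code = name2codepoint[name]  # e.g. "mdash" -> 8212
        rows.append(
            "<tr>"
            f"<td>&amp;{name};</td>"   # named reference shown as literal text
            f"<td>&amp;#{code};</td>"  # numeric reference shown as literal text
            f"<td>&{name};</td>"       # rendered by the browser from the name
            f"<td>&#{code};</td>"      # rendered by the browser from the number
            "</tr>"
        )
    return rows

# Hypothetical sample; any key of name2codepoint works here.
for row in chart_rows(["mdash", "euro", "diams"]):
    print(row)
```

Opening the generated rows in each target browser shows at a glance whether the named and numeric forms both render.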

What you'll find below is the copy of the character entity specification from the W3C with my tabled versions of the entities interspersed.

24.1 Introduction to character entity references

A character entity reference is an SGML construct that references a character of the document character set.

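HTML 4.x, as well as XHTML 1.x, supports several sets of character entity references:

  • ISO 8859-1 (Latin-1) characters
  • symbols, mathematical symbols, and Greek letters
  • markup-significant and internationalization characters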

The following sections present the complete lists of character entity references.

ISO 8859-1

24.2 Character entity references for ISO 8859-1 characters

The character entity references in this section produce characters whose numeric equivalents should already be supported by conforming HTML 2.0 user agents. Thus, the character entity reference &divide; is a more convenient form than &#247; for obtaining the division sign (÷).
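The equivalence between the named and numeric forms can be checked with a short sketch, assuming Python's standard `html` module (not something the W3C text itself uses):

```python
import html
from html.entities import name2codepoint

code = name2codepoint["divide"]          # code point behind the name "divide"
named = html.unescape("&divide;")        # decode the named reference
numeric = html.unescape(f"&#{code};")    # decode the numeric reference

print(code, named, numeric, named == numeric)  # 247 ÷ ÷ True
```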

To support these named entities, user agents need only recognize the entity names and convert them to characters that lie within the repertoire of [ISO88591].

Character 65533 (FFFD hexadecimal) is the last valid character in UCS-2. 65534 (FFFE hexadecimal) is unassigned and reserved as the byte-swapped version of ZERO WIDTH NON-BREAKING SPACE for byte-order detection purposes. 65535 (FFFF hexadecimal) is unassigned.
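To see why 65534 (FFFE) is described as the byte-swapped form of ZERO WIDTH NON-BREAKING SPACE, here is a small illustration in Python; it only demonstrates the arithmetic and is not taken from the specification:

```python
# U+FEFF is ZERO WIDTH NON-BREAKING SPACE, used as the byte-order mark.
bom_big_endian = "\ufeff".encode("utf-16-be")
swapped = bom_big_endian[::-1]  # the same two bytes read in the wrong order

print(bom_big_endian.hex(), swapped.hex())  # feff fffe
print(int.from_bytes(swapped, "big"))       # 65534
print(0xFFFD, 0xFFFE, 0xFFFF)               # 65533 65534 65535
```

A decoder that reads FFFE where it expected FEFF can conclude that the byte order is reversed, which is why FFFE is kept unassigned.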

24.2.1 The list of characters

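Character Entity | Numeric Entity | Char | Num | Description
&nbsp; | &#160; |  |  | non-breaking space
&iexcl; | &#161; | ¡ | ¡ | inverted exclamation mark
&cent; | &#162; | ¢ | ¢ | cent sign
&pound; | &#163; | £ | £ | pound sign
&curren; | &#164; | ¤ | ¤ | currency sign
&yen; | &#165; | ¥ | ¥ | yen sign
&brvbar; | &#166; | ¦ | ¦ | broken bar
&sect; | &#167; | § | § | section sign
&uml; | &#168; | ¨ | ¨ | diaeresis
&copy; | &#169; | © | © | copyright sign
&ordf; | &#170; | ª | ª | feminine ordinal indicator
&laquo; | &#171; | « | « | left-pointing double angle quotation mark
&not; | &#172; | ¬ | ¬ | not sign
&shy; | &#173; |  |  | soft hyphen
&reg; | &#174; | ® | ® | registered sign
&macr; | &#175; | ¯ | ¯ | macron
&deg; | &#176; | ° | ° | degree sign
&plusmn; | &#177; | ± | ± | plus-minus sign
&sup2; | &#178; | ² | ² | superscript two
&sup3; | &#179; | ³ | ³ | superscript three
&acute; | &#180; | ´ | ´ | acute accent
&micro; | &#181; | µ | µ | micro sign
&para; | &#182; | ¶ | ¶ | pilcrow sign
&middot; | &#183; | · | · | middle dot
&cedil; | &#184; | ¸ | ¸ | cedilla
&sup1; | &#185; | ¹ | ¹ | superscript one
&ordm; | &#186; | º | º | masculine ordinal indicator
&raquo; | &#187; | » | » | right-pointing double angle quotation mark
&frac14; | &#188; | ¼ | ¼ | vulgar fraction one quarter
&frac12; | &#189; | ½ | ½ | vulgar fraction one half
&frac34; | &#190; | ¾ | ¾ | vulgar fraction three quarters
&iquest; | &#191; | ¿ | ¿ | inverted question mark
&Agrave; | &#192; | À | À | latin capital letter A with grave
&Aacute; | &#193; | Á | Á | latin capital letter A with acute
&Acirc; | &#194; | Â | Â | latin capital letter A with circumflex
&Atilde; | &#195; | Ã | Ã | latin capital letter A with tilde
&Auml; | &#196; | Ä | Ä | latin capital letter A with diaeresis
&Aring; | &#197; | Å | Å | latin capital letter A with ring above
&AElig; | &#198; | Æ | Æ | latin capital letter AE
&Ccedil; | &#199; | Ç | Ç | latin capital letter C with cedilla
&Egrave; | &#200; | È | È | latin capital letter E with grave
&Eacute; | &#201; | É | É | latin capital letter E with acute
&Ecirc; | &#202; | Ê | Ê | latin capital letter E with circumflex
&Euml; | &#203; | Ë | Ë | latin capital letter E with diaeresis
&Igrave; | &#204; | Ì | Ì | latin capital letter I with grave
&Iacute; | &#205; | Í | Í | latin capital letter I with acute
&Icirc; | &#206; | Î | Î | latin capital letter I with circumflex
&Iuml; | &#207; | Ï | Ï | latin capital letter I with diaeresis
&ETH; | &#208; | Ð | Ð | latin capital letter ETH
&Ntilde; | &#209; | Ñ | Ñ | latin capital letter N with tilde
&Ograve; | &#210; | Ò | Ò | latin capital letter O with grave
&Oacute; | &#211; | Ó | Ó | latin capital letter O with acute
&Ocirc; | &#212; | Ô | Ô | latin capital letter O with circumflex
&Otilde; | &#213; | Õ | Õ | latin capital letter O with tilde
&Ouml; | &#214; | Ö | Ö | latin capital letter O with diaeresis
&times; | &#215; | × | × | multiplication sign
&Oslash; | &#216; | Ø | Ø | latin capital letter O with stroke
&Ugrave; | &#217; | Ù | Ù | latin capital letter U with grave
&Uacute; | &#218; | Ú | Ú | latin capital letter U with acute
&Ucirc; | &#219; | Û | Û | latin capital letter U with circumflex
&Uuml; | &#220; | Ü | Ü | latin capital letter U with diaeresis
&Yacute; | &#221; | Ý | Ý | latin capital letter Y with acute
&THORN; | &#222; | Þ | Þ | latin capital letter THORN
&szlig; | &#223; | ß | ß | latin small letter sharp s
&agrave; | &#224; | à | à | latin small letter a with grave
&aacute; | &#225; | á | á | latin small letter a with acute
&acirc; | &#226; | â | â | latin small letter a with circumflex
&atilde; | &#227; | ã | ã | latin small letter a with tilde
&auml; | &#228; | ä | ä | latin small letter a with diaeresis
&aring; | &#229; | å | å | latin small letter a with ring above
&aelig; | &#230; | æ | æ | latin small letter ae
&ccedil; | &#231; | ç | ç | latin small letter c with cedilla
&egrave; | &#232; | è | è | latin small letter e with grave
&eacute; | &#233; | é | é | latin small letter e with acute
&ecirc; | &#234; | ê | ê | latin small letter e with circumflex
&euml; | &#235; | ë | ë | latin small letter e with diaeresis
&igrave; | &#236; | ì | ì | latin small letter i with grave
&iacute; | &#237; | í | í | latin small letter i with acute
&icirc; | &#238; | î | î | latin small letter i with circumflex
&iuml; | &#239; | ï | ï | latin small letter i with diaeresis
&eth; | &#240; | ð | ð | latin small letter eth
&ntilde; | &#241; | ñ | ñ | latin small letter n with tilde
&ograve; | &#242; | ò | ò | latin small letter o with grave
&oacute; | &#243; | ó | ó | latin small letter o with acute
&ocirc; | &#244; | ô | ô | latin small letter o with circumflex
&otilde; | &#245; | õ | õ | latin small letter o with tilde
&ouml; | &#246; | ö | ö | latin small letter o with diaeresis
&divide; | &#247; | ÷ | ÷ | division sign
&oslash; | &#248; | ø | ø | latin small letter o with stroke
&ugrave; | &#249; | ù | ù | latin small letter u with grave
&uacute; | &#250; | ú | ú | latin small letter u with acute
&ucirc; | &#251; | û | û | latin small letter u with circumflex
&uuml; | &#252; | ü | ü | latin small letter u with diaeresis
&yacute; | &#253; | ý | ý | latin small letter y with acute
&thorn; | &#254; | þ | þ | latin small letter thorn
&yuml; | &#255; | ÿ | ÿ | latin small letter y with diaeresis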

Symbols, Mathematical Symbols, and Greek Letters

24.3 Character entity references for symbols, mathematical symbols, and Greek letters

The character entity references in this section produce characters that may be represented by glyphs in the widely available Adobe Symbol font, including Greek characters, various bracketing symbols, and a selection of mathematical operators such as gradient, product, and summation symbols.

To support these entities, user agents may support full [ISO10646] or use other means. Display of glyphs for these characters may be obtained by being able to display the relevant [ISO10646] characters or by other means, such as internally mapping the listed entities, numeric character references, and characters to the appropriate position in some font that contains the requisite glyphs.

When to use Greek entities. This entity set contains all the letters used in modern Greek. However, it does not include Greek punctuation, precomposed accented characters nor the non-spacing accents (tonos, dialytika) required to compose them. There are no archaic letters, Coptic-unique letters, or precomposed letters for Polytonic Greek. The entities defined here are not intended for the representation of modern Greek text and would not be an efficient representation; rather, they are intended for occasional Greek letters used in technical and mathematical works.

24.3.1 The list of characters

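Character Entity | Numeric Entity | Char | Num | Description
&fnof; | &#402; | ƒ | ƒ | latin small f with hook
&Alpha; | &#913; | Α | Α | greek capital letter alpha
&Beta; | &#914; | Β | Β | greek capital letter beta
&Gamma; | &#915; | Γ | Γ | greek capital letter gamma
&Delta; | &#916; | Δ | Δ | greek capital letter delta
&Epsilon; | &#917; | Ε | Ε | greek capital letter epsilon
&Zeta; | &#918; | Ζ | Ζ | greek capital letter zeta
&Eta; | &#919; | Η | Η | greek capital letter eta
&Theta; | &#920; | Θ | Θ | greek capital letter theta
&Iota; | &#921; | Ι | Ι | greek capital letter iota
&Kappa; | &#922; | Κ | Κ | greek capital letter kappa
&Lambda; | &#923; | Λ | Λ | greek capital letter lambda
&Mu; | &#924; | Μ | Μ | greek capital letter mu
&Nu; | &#925; | Ν | Ν | greek capital letter nu
&Xi; | &#926; | Ξ | Ξ | greek capital letter xi
&Omicron; | &#927; | Ο | Ο | greek capital letter omicron
&Pi; | &#928; | Π | Π | greek capital letter pi
&Rho; | &#929; | Ρ | Ρ | greek capital letter rho
&Sigma; | &#931; | Σ | Σ | greek capital letter sigma
&Tau; | &#932; | Τ | Τ | greek capital letter tau
&Upsilon; | &#933; | Υ | Υ | greek capital letter upsilon
&Phi; | &#934; | Φ | Φ | greek capital letter phi
&Chi; | &#935; | Χ | Χ | greek capital letter chi
&Psi; | &#936; | Ψ | Ψ | greek capital letter psi
&Omega; | &#937; | Ω | Ω | greek capital letter omega
&alpha; | &#945; | α | α | greek small letter alpha
&beta; | &#946; | β | β | greek small letter beta
&gamma; | &#947; | γ | γ | greek small letter gamma
&delta; | &#948; | δ | δ | greek small letter delta
&epsilon; | &#949; | ε | ε | greek small letter epsilon
&zeta; | &#950; | ζ | ζ | greek small letter zeta
&eta; | &#951; | η | η | greek small letter eta
&theta; | &#952; | θ | θ | greek small letter theta
&iota; | &#953; | ι | ι | greek small letter iota
&kappa; | &#954; | κ | κ | greek small letter kappa
&lambda; | &#955; | λ | λ | greek small letter lambda
&mu; | &#956; | μ | μ | greek small letter mu
&nu; | &#957; | ν | ν | greek small letter nu
&xi; | &#958; | ξ | ξ | greek small letter xi
&omicron; | &#959; | ο | ο | greek small letter omicron
&pi; | &#960; | π | π | greek small letter pi
&rho; | &#961; | ρ | ρ | greek small letter rho
&sigmaf; | &#962; | ς | ς | greek small letter final sigma
&sigma; | &#963; | σ | σ | greek small letter sigma
&tau; | &#964; | τ | τ | greek small letter tau
&upsilon; | &#965; | υ | υ | greek small letter upsilon
&phi; | &#966; | φ | φ | greek small letter phi
&chi; | &#967; | χ | χ | greek small letter chi
&psi; | &#968; | ψ | ψ | greek small letter psi
&omega; | &#969; | ω | ω | greek small letter omega
&thetasym; | &#977; | ϑ | ϑ | greek small letter theta symbol
&upsih; | &#978; | ϒ | ϒ | greek upsilon with hook symbol
&piv; | &#982; | ϖ | ϖ | greek pi symbol
&bull; | &#8226; | • | • | bullet
&hellip; | &#8230; | … | … | horizontal ellipsis
&prime; | &#8242; | ′ | ′ | prime (minutes)
&Prime; | &#8243; | ″ | ″ | double prime
&oline; | &#8254; | ‾ | ‾ | overline
&frasl; | &#8260; | ⁄ | ⁄ | fraction slash
&weierp; | &#8472; | ℘ | ℘ | script capital P
&image; | &#8465; | ℑ | ℑ | blackletter capital I
&real; | &#8476; | ℜ | ℜ | blackletter capital R
&trade; | &#8482; | ™ | ™ | trade mark sign
&alefsym; | &#8501; | ℵ | ℵ | alef symbol
&larr; | &#8592; | ← | ← | leftwards arrow
&uarr; | &#8593; | ↑ | ↑ | upwards arrow
&rarr; | &#8594; | → | → | rightwards arrow
&darr; | &#8595; | ↓ | ↓ | downwards arrow
&harr; | &#8596; | ↔ | ↔ | left right arrow
&crarr; | &#8629; | ↵ | ↵ | downwards arrow with corner leftwards
&lArr; | &#8656; | ⇐ | ⇐ | leftwards double arrow
&uArr; | &#8657; | ⇑ | ⇑ | upwards double arrow
&rArr; | &#8658; | ⇒ | ⇒ | rightwards double arrow
&dArr; | &#8659; | ⇓ | ⇓ | downwards double arrow
&hArr; | &#8660; | ⇔ | ⇔ | left right double arrow
&forall; | &#8704; | ∀ | ∀ | for all
&part; | &#8706; | ∂ | ∂ | partial differential
&exist; | &#8707; | ∃ | ∃ | there exists
&empty; | &#8709; | ∅ | ∅ | empty set
&nabla; | &#8711; | ∇ | ∇ | nabla
&isin; | &#8712; | ∈ | ∈ | element of
&notin; | &#8713; | ∉ | ∉ | not an element of
&ni; | &#8715; | ∋ | ∋ | contains as member
&prod; | &#8719; | ∏ | ∏ | n-ary product
&sum; | &#8721; | ∑ | ∑ | n-ary summation
&minus; | &#8722; | − | − | minus sign
&lowast; | &#8727; | ∗ | ∗ | asterisk operator
&radic; | &#8730; | √ | √ | square root
&prop; | &#8733; | ∝ | ∝ | proportional to
&infin; | &#8734; | ∞ | ∞ | infinity
&ang; | &#8736; | ∠ | ∠ | angle
&and; | &#8743; | ∧ | ∧ | logical and
&or; | &#8744; | ∨ | ∨ | logical or
&cap; | &#8745; | ∩ | ∩ | intersection
&cup; | &#8746; | ∪ | ∪ | union
&int; | &#8747; | ∫ | ∫ | integral
&there4; | &#8756; | ∴ | ∴ | therefore
&sim; | &#8764; | ∼ | ∼ | tilde operator
&cong; | &#8773; | ≅ | ≅ | approximately equal to
&asymp; | &#8776; | ≈ | ≈ | almost equal to
&ne; | &#8800; | ≠ | ≠ | not equal to
&equiv; | &#8801; | ≡ | ≡ | identical to
&le; | &#8804; | ≤ | ≤ | less-than or equal to
&ge; | &#8805; | ≥ | ≥ | greater-than or equal to
&sub; | &#8834; | ⊂ | ⊂ | subset of
&sup; | &#8835; | ⊃ | ⊃ | superset of
&nsub; | &#8836; | ⊄ | ⊄ | not a subset of
&sube; | &#8838; | ⊆ | ⊆ | subset of or equal to
&supe; | &#8839; | ⊇ | ⊇ | superset of or equal to
&oplus; | &#8853; | ⊕ | ⊕ | circled plus
&otimes; | &#8855; | ⊗ | ⊗ | circled times
&perp; | &#8869; | ⊥ | ⊥ | up tack
&sdot; | &#8901; | ⋅ | ⋅ | dot operator
&lceil; | &#8968; | ⌈ | ⌈ | left ceiling
&rceil; | &#8969; | ⌉ | ⌉ | right ceiling
&lfloor; | &#8970; | ⌊ | ⌊ | left floor
&rfloor; | &#8971; | ⌋ | ⌋ | right floor
&lang; | &#9001; | ⟨ | ⟨ | left-pointing angle bracket
&rang; | &#9002; | ⟩ | ⟩ | right-pointing angle bracket
&loz; | &#9674; | ◊ | ◊ | lozenge
&spades; | &#9824; | ♠ | ♠ | black spade suit
&clubs; | &#9827; | ♣ | ♣ | black club suit
&hearts; | &#9829; | ♥ | ♥ | black heart suit
&diams; | &#9830; | ♦ | ♦ | black diamond suit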

Markup-Significant and Internationalization Characters

24.4 Character entity references for markup-significant and internationalization characters

The character entity references in this section are for escaping markup-significant characters (these are the same as those in HTML 2.0 and 3.2) and for denoting spaces and dashes. Other characters in this section apply to internationalization issues such as the disambiguation of bidirectional text (see the section on bidirectional text for details).

Entities have also been added for the remaining characters occurring in CP-1252 which do not occur in the HTMLlat1 or HTMLsymbol entity sets. These all occur in the 128 to 159 range within the CP-1252 charset. These entities permit the characters to be denoted in a platform-independent manner.
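To make the 128 to 159 range concrete, the following sketch lists what each of those CP-1252 byte values stands for in Unicode and which HTML 4 entity name, if any, covers it. It assumes Python's built-in `cp1252` codec and the standard `html.entities.codepoint2name` table; the function name `cp1252_entities` is made up for this illustration.

```python
from html.entities import codepoint2name

def cp1252_entities():
    """Map each assigned CP-1252 byte in 128-159 to (Unicode code point, entity name or None)."""
    table = {}
    for byte in range(128, 160):
        try:
            char = bytes([byte]).decode("cp1252")
        except UnicodeDecodeError:
            continue  # this byte is unassigned in Python's cp1252 codec
        code = ord(char)
        table[byte] = (code, codepoint2name.get(code))
    return table

for byte, (code, name) in cp1252_entities().items():
    print(byte, code, name)
```

For example, byte 151 maps to code point 8212 (`mdash`) and byte 128 maps to 8364 (`euro`).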

To support these entities, user agents may support full [ISO10646] or use other means. Display of glyphs for these characters may be obtained by being able to display the relevant [ISO10646] characters or by other means, such as internally mapping the listed entities, numeric character references, and characters to the appropriate position in some font that contains the requisite glyphs.

24.4.1 The list of characters

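Character Entity | Numeric Entity | Char | Num | Description
&quot; | &#34; | " | " | quotation mark
&amp; | &#38; | & | & | ampersand
&lt; | &#60; | < | < | less-than sign
&gt; | &#62; | > | > | greater-than sign
&OElig; | &#338; | Œ | Œ | latin capital ligature OE
&oelig; | &#339; | œ | œ | latin small ligature oe
&Scaron; | &#352; | Š | Š | latin capital letter S with caron
&scaron; | &#353; | š | š | latin small letter s with caron
&Yuml; | &#376; | Ÿ | Ÿ | latin capital letter Y with diaeresis
&circ; | &#710; | ˆ | ˆ | modifier letter circumflex accent
&tilde; | &#732; | ˜ | ˜ | small tilde
&ensp; | &#8194; |  |  | en space
&emsp; | &#8195; |  |  | em space
&thinsp; | &#8201; |  |  | thin space
&zwnj; | &#8204; |  |  | zero width non-joiner
&zwj; | &#8205; |  |  | zero width joiner
&lrm; | &#8206; |  |  | left-to-right mark
&rlm; | &#8207; |  |  | right-to-left mark
&ndash; | &#8211; | – | – | en dash
&mdash; | &#8212; | — | — | em dash
&lsquo; | &#8216; | ‘ | ‘ | left single quotation mark
&rsquo; | &#8217; | ’ | ’ | right single quotation mark
&sbquo; | &#8218; | ‚ | ‚ | single low-9 quotation mark
&ldquo; | &#8220; | “ | “ | left double quotation mark
&rdquo; | &#8221; | ” | ” | right double quotation mark
&bdquo; | &#8222; | „ | „ | double low-9 quotation mark
&dagger; | &#8224; | † | † | dagger
&Dagger; | &#8225; | ‡ | ‡ | double dagger
&permil; | &#8240; | ‰ | ‰ | per mille sign
&lsaquo; | &#8249; | ‹ | ‹ | single left-pointing angle quotation mark
&rsaquo; | &#8250; | › | › | single right-pointing angle quotation mark
&euro; | &#8364; | € | € | euro sign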

A founder of evolt.org, Adrian Roselli (aardvark) is the Senior Usability Engineer at Algonquin Studios, located in Buffalo, New York.

Adrian has years of experience in graphic design, web design and multimedia design, as well as extensive experience in internet commerce and interface design and usability. He has been developing for the World Wide Web since its inception, and working the design field since 1993. Adrian is a founding member, board member, and writer to evolt.org. In addition, Adrian sits on the Digital Media Advisory Committee for a local SUNY college and a local private college, as well as the board for a local charter school.

You can see his brand-spanking-new blog at http://blog.adrianroselli.com/ as well as his new web site to promote his writing and speaking at AdrianRoselli.com

Adrian authored the usability case study for evolt.org in Usability: The Site Speaks for Itself, published by glasshaus. He has written three chapters for the book Professional Web Graphics for Non Designers, also published by glasshaus. Adrian also managed to get a couple chapters written (and published) for The Web Professional's Handbook before glasshaus went under. They were really quite good. You should have bought more of the books.

About Time

Submitted by rifferte on February 17, 2002 - 22:31.

I have suffered with finding/figuring out entity references for way too long! Not to mention I often thought about making a list such as this! Thanks so much - this is definitely going to be a bookmark for us designers/developers! - Ron

Thanks.

Submitted by haidary on February 18, 2002 - 04:21.

Thank you for all the work you put into making up these charts. A lot of help with no headache.

Thanks.

Submitted by riffola on February 18, 2002 - 13:01.

Only request I have is that the above list should also contain the Hex codes, like this character code chart. The problem with that chart is that it uses the numbers &#129; through &#159;.

mine ain't pretty

Submitted by aardvark on February 18, 2002 - 14:47.

Well, at least I made no assertions that mine was attractive, just that it's a better way to view the W3C specs.

Riffola, the one you link to makes the unfortunate mistake of using entities which won't validate, such as &#151; instead of &#8212; for the em-dash. The chart I've provided above shows only the valid numeric and named entities, so it leaves out entities that are illegal and won't validate. As far as I know, the hex values won't fly on a web page, so someone should tell me the best way to test that.

Oh no I think your chart is better.

Submitted by riffola on February 18, 2002 - 18:25.

What I meant was that the chart I linked has the Hex values, which would be a good addition to your chart. As I mentioned in my comment, I know that the chart I linked uses the wrong entity numbers (the 129 - 159 range).

As for the use of Hex values, I use them to make URLs which assign values to variables, and I am sure some people here do the same. And since your chart is great, I figured it would be nice to add the Hex values to it too. Makes for an even better reference.
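The hex values used in URLs are percent-escapes of the encoded bytes, which is a different mechanism from character entities. A brief sketch with Python's standard `urllib.parse` shows how the same character escapes differently depending on the encoding assumed; the sample character is arbitrary.

```python
from urllib.parse import quote, unquote

print(quote("é", encoding="latin-1"))      # %E9     (one byte in ISO 8859-1)
print(quote("é"))                          # %C3%A9  (two bytes in UTF-8, the default)
print(unquote("%E9", encoding="latin-1"))  # é
```

In ISO 8859-1 the hex escape happens to match the code point (E9 hexadecimal is 233 decimal, the same number as in &#233;), which is why a hex column is a natural companion to the numeric one.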

ahhhh....

Submitted by aardvark on February 18, 2002 - 19:52.

Sorry, I gotcha now, I tend to refer to those as escaped entities. And yeah, that's a darn good idea. I'll work on that.

Excellent

Submitted by riffola on February 19, 2002 - 02:10.

Thanks!

Cheers!

Submitted by Zaccix on February 19, 2002 - 07:25.

Thanks for sharing the chart with us, it's gone straight into my bookmarks.

One idea would be to add a “quick conversion chart” from invalid characters in the 127-159 range to their valid equivalents, to help people who are correcting their pages. That'd just be an additional bonus, though. The chart's great as-is. Thanks!
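A conversion along the lines suggested above could be sketched as follows. This assumes the invalid references were meant as Windows CP-1252 byte values, and it relies on Python's built-in `cp1252` codec; the function name `fix_cp1252_references` is hypothetical.

```python
import re

def fix_cp1252_references(markup):
    """Rewrite numeric references in 128-159 as the Unicode code points CP-1252 meant."""
    def repl(match):
        byte = int(match.group(1))
        if not 128 <= byte <= 159:
            return match.group(0)  # outside the problem range; keep as written
        try:
            char = bytes([byte]).decode("cp1252")
        except UnicodeDecodeError:
            return match.group(0)  # unassigned in CP-1252; leave untouched
        return f"&#{ord(char)};"
    return re.sub(r"&#(\d+);", repl, markup)

print(fix_cp1252_references("em&#151;dash and &#147;quotes&#148;"))
# em&#8212;dash and &#8220;quotes&#8221;
```

Only decimal references are handled here; anything the codec cannot map is passed through unchanged rather than guessed at.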

ahh!

Submitted by aardvark on February 19, 2002 - 08:13.

You're killing me!

Another great idea, one I'll work on. Don't expect to see that stuff this week, but I will get to it. When I update it, I'll leave a comment so y'all get notified.

It would be nice if...

Submitted by neoliminal on February 19, 2002 - 10:46.

You could also put the normal entities list here too. I would love to be able to see the whole thing on one page, rather than having to look all over for the different listings. ;-)

Excellent!

Submitted by Jay Blanchard on February 19, 2002 - 11:04.

I wish I had had this chart a few weeks ago, it would have saved me some major headaches. Thanks aardvark!

what about unicode?

Submitted by htd on February 20, 2002 - 05:28.

I think entities became useless since all major browsers now support UTF-8 and UTF-16 encoded pages. IE, NN6, Opera6, lynx just to name a few. It's easier to create and read unicode encoded pages, at least for people that have a different character set than english. just one thing that doesn't display yet even when correctly encoded is the Euro sign... so there's no way around than to use it.

uh oh...

Submitted by r937 on February 21, 2002 - 19:56.

so, how come i can see the character and numeric representations of &spades;, &clubs;, &hearts;, but not &diams;? what's up with that? presumably, it must have something to do with the page's character set, right? but this page uses iso-8859-1, so, um, what happened to the diamond? compare to another web page like http://pixels.pixelpark.com/~koch/chars/math/ which does not declare a character set -- the diamonds show up! at least, they do for me (win2kpro, ie5.5) -- what's going on?

Tormented in Toronto

curious

Submitted by aardvark on February 21, 2002 - 20:11.

Netscape 6 doesn't do this, and I can't figure out why IE does it. But I'm seeing the same thing in IE5.5/win2k as well. Opera 6 just fails to display any of the suits.

uh oh, part deux

Submitted by r937 on February 21, 2002 - 21:11.

i just fired this page up in win95 ie5 and the first thing that happened was, ie asked me if it was okay to download Uniscribe (whatever that is), but at 126k on a dialup modem, i always go "no thanks" -- is that perhaps what's doing it? so what is Uniscribe, and how's it different from iso-8859-1?

entities != encoding

Submitted by htd on February 22, 2002 - 04:20.

html entities are supposed to work no matter what encoding is used. What i tried to say is that for _normal_ characters that are used by languages different from english, it's easier to use utf-8 encoding than to replace e.g. hyphenated chars with entities.

For this list of math signs - opera 6.0 does not display all of them, but mozilla 0.9.8 does ...
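The distinction between entities and encodings can be shown in a few lines of Python. This is only an illustration with an arbitrary sample word: the raw bytes of an accented character depend on the page encoding, while a numeric character reference stays plain ASCII under any of them.

```python
text = "café"

print(text.encode("utf-8"))                       # b'caf\xc3\xa9'
print(text.encode("iso-8859-1"))                  # b'caf\xe9'
print(text.encode("ascii", "xmlcharrefreplace"))  # b'caf&#233;'
```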

reference images

Submitted by branko on February 28, 2002 - 06:50.

You write that one reason for making these charts is that other charts "[offer] no display of how these entities might be rendered". Then you go ahead and offer only entities, so that we only know how browser X might render the character entities. Perhaps in a next version you could provide bitmap images too.

images

Submitted by aardvark on February 28, 2002 - 08:45.

Only problem is, if I tried to, I couldn't include all the entities in an image anyway, since they don't all render in any of my browsers. It's something to think about, although it won't be soon in coming, since I'd have to find a good way to present it, and ensure it's readable and still fits within the article and site layout.

Excellent Reference - Cheers!

Submitted by ShugMiller on March 18, 2002 - 12:51.

I used to have a page of these with a WYSIWYG I got on a mag coverdisc, but I lost it when I got rid of the editor. I searched the web for exactly this kind of thing just last night and got scunnered after the first three pages promised lots but gave little. Thanks a lot, bookmarked this one :¬)

re: reference images

Submitted by francois on March 25, 2002 - 03:30.

There is already a reference providing a large number (although not all) of these characters as bitmaps:
http://www.htmlhelp.com/reference/charset/
I'm sure they won't mind you using them.

Quick question: Does anyone know, as a general rule, which format is most widely supported: numeric or character entities?

re: reference images

Submitted by aardvark on March 25, 2002 - 09:39.

francois, the main reason I made this chart is to allow developers to test support across all target browsers for their audience. I might say numeric are more widely supported, but if your audience is all using browsers that don't support numeric so well, then my assertion wouldn't work for you.

Your best bet is to view this page in all the browsers that you care about for a project, and make that determination on a case-by-case basis.

Form elements in IE 5 for Macintosh

Submitted by borgendorf on March 29, 2002 - 14:07.

Grasping at straws here, but I'd like to be able to use Greek characters in form elements. It seems to work "out of the box" in Windows Explorer, but any Greek characters I use in form elements on the Mac IE show up as question marks, even though they print in regular page elements. I can make other sequences (like [“] “) show up in forms, but not things like [Θ] Θ.

Has anyone dealt with this before? Alternatively, is there something bigger that I'm missing here? Do I even want to use these types of characters in form fields?

All comments are appreciated (except "Don't use Macs" :)

Browser support table

Submitted by francois on April 4, 2002 - 16:09.

In the spirit of Eric Meyer's CSS support table, I've made a start on an entity support table. Here's the preamble. I'm hoping to make the page remotely editable in the near future (if I can get it working the way I want it), which will allow other people to update it themselves.

entities support

Submitted by michelv on April 10, 2002 - 04:01.

More browsers support numeric entities, from what I witnessed. For example Netscape 4 does a great job at that, but throw named entities at it and it will barf.

Complete Character Entity List?

Submitted by linden on April 25, 2002 - 03:20.

I believe I have found a 13,000+ entries listing of character entity, numeric entity, hexadecimal values, and the rendering of the character on screen. http://www.hclrss.demon.co.uk/unicode/basic_latin.html

everybody's gettin into this...

Submitted by r937 on June 10, 2002 - 11:29.

The Web Developer's Virtual Library (part of the internet.com hegemony) has published HTML Special Characters and Browser Compatibility including a breakdown of entities which work in various browsers.

Detailed character chart from author of Em 'n En

Submitted by psheerin on June 19, 2002 - 18:13.

Adrian,

Thanks for referencing my article (The Trouble With Em 'n En [ALA version] [my version]). I'd also like to point you to a table I compiled that provides some additional detail, plus images:

http://www.petesguide.com/WebStandards/entities/

It's actually three tables, one for each section, because with all of the images and comments it became too large for one page. Your compact tables are great for simply referencing because they are much more compact than mine, which offer enough detail so as to be both wide and tall.

Some of the highlights:

  • For each displayable character there is an image of the glyph.
  • For each character in the DOS, Windows, or Mac character sets that is different in Unicode, I have provided cross references to the correct characters.
  • Each character is rendered using named references, numeric references, and as raw UTF-8.

The first table won't validate because I include the invalid characters to demonstrate what happens.

I could actually add another column for hexadecimal entities (as in &#x41; {I think; perhaps the # should not be there} for the letter A), but have not done so.

To answer others' questions, the numeric form is the most reliable method of inserting characters, and works all the way back to 4.0 browsers. Past that, and it's not pretty at all.
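On the hexadecimal question raised above: hexadecimal character references in HTML 4 take the form `&#x` followed by hex digits and a semicolon, so the `#` does belong there. A small sketch, assuming Python's standard `html` module; the helper name `hex_reference` is made up for the example:

```python
import html

def hex_reference(char):
    """Return the hexadecimal character reference for a single character."""
    return f"&#x{ord(char):X};"

print(hex_reference("A"))        # &#x41;
print(hex_reference("\u2014"))   # &#x2014;  (the em dash, decimal 8212)
print(html.unescape("&#x41;"))   # A
```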

what else can i say?

Submitted by aardvark on June 20, 2002 - 23:01.

Cool. And by cool, I mean totally sweet.

I think I'll just skip updating this article — there's no way I'm going to be able to do all the glyphs anytime soon anyway. Thanks for the links!

Steal away

Submitted by psheerin on June 21, 2002 - 10:39.

(Or should that be “lead away”, since we’re talking about type?)

You’re more than welcome to use the glyphs I made in your table, assuming their size is appropriate (I think mine are 25×25px). I’ll see if I can package them up in a zip file for you. Do you prefer GIF or PNG?

And yes, it did take a while to make them—you really don’t want the migraine duplicating the task will create. ☺

Say, I’m thinking about writing a follow-up article for ala—to cover such sticky-wickets as nice-looking fractions and superscripts. Does anyone have any pet-peeves about typography they just can’t figure out how to solve?

Some characters are boxes—something is wrong

Submitted by psheerin on June 21, 2002 - 21:17.

I’m using IE 6.0 to view this page, and from past experience, I know that it supports all of the characters in the tables above, yet many of them are displayed as boxes.

I’m not quite sure what is wrong, but it could be the use of a non-Unicode character set, or it could be the particular font that evolt uses. I can’t tell for sure, since I can find no way to view the style sheet content used, but my bet is on the latter.

Does anybody know what font evolt uses?

character boxen on IE6

Submitted by aardvark on June 22, 2002 - 00:08.

I'm using IE5.5, and resisting the IE6 upgrade. However, I can help you out with the CSS. The CSS file uses font-family: Verdana, Arial, Helvetica, sans-serif; throughout. As far as I know, Verdana has all the characters you need. The page encoding is iso-8859-1 and the language is English. It's most likely something specific to IE6.

them image glyphs

Submitted by aardvark on June 22, 2002 - 00:11.

Psheerin, .gif files would be fine. You can send 'em off to the address in my bio, or just email me with a URL. As for your ALA follow-up, I'd love to see a breakdown of support for all the text styling CSS, although that can be pretty tough. How about a side-by-side comparison of how you do in CSS what you just did in Quark / InDesign / Pagemaker / CorelDraw / FreeHand / Illustrator / etc.?

IE 5.5 doesn't show 'em all either

Submitted by psheerin on June 23, 2002 - 15:36.

I'm looking at this page using IE 5.5 on W2K, now, and the same missing glyphs are here that I saw on IE 6.0

Netscape 6.2 doesn't have any problem—I think what's going on is that one of those fonts doesn't actually have all the characters in it, and that Netscape is able to substitute those glyphs from another font, while IE can't.

And this reminds me that I have a table which attempts to demonstrate character support for different fonts. I'll have to go back and add those fonts that I'm missing, but I think you'll see that IE doesn't do very well on this test, and that NN 6.2 does well.

P.S.
I sent off a zip with the glyphs to you. Hope you find them useful.

Stuff for the follow-up article (Pet peeves etc.)

Submitted by francois on June 27, 2002 - 02:56.

It would be nice if the follow-up article can summarise recommendations about which characters are safe to use and which are best left alone. Practical tips on how to more easily use them would also be good. For example Dreamweaver modifications and a Textpad clip library (the former probably made obsolete by DW MX). I speculated that the MS Word word-processing environment is the most congenial for using certain common special characters (curly quotes, dashes and a few others), and that perhaps we just need to find a more reliable route from Word to HTML. As a start, I did a Textpad macro that converts quotes and dashes pasted into the editor from Word. This led to a javascript by gazingus.org that converts quotes and dashes in HTML textareas to the appropriate entities. This can be used in CMSs like Moveable Type, for example at Webgraphics.com. (I have some doubts about the appropriateness of inserting all these entity codes into a plaintext environment, though.) I also tried to do a similar font comparison table to yours, Peter, but that's ended up on the back burner. Sorry for all the self-plugs, but then, you enquired after pet peeves.

follow-up article?

Submitted by aardvark on June 27, 2002 - 08:28.

Francois, are you suggesting the follow-up for psheerin, or me? Frankly (pun intended), it sounds like you've already got a good rough outline and resources to support it. I'd suggest you consider writing it up for evolt.org.

Excellent…

Submitted by mattrix2k on September 11, 2002 - 14:13.

Thanks a lot, I find this page indispensable!

Quick Script: Code 2 Numeric Entity

Submitted by gkep on September 11, 2002 - 20:33.

Here's a quick PHP script to convert a string to numeric entities.

I use it for hiding email addresses from spambots and even used it to convert the below code so I could post it here!

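<form method="post">
<textarea name="str"></textarea>
<input type="submit">
</form>
<?php
// Read the posted string directly; the original relied on register_globals.
$str = isset($_POST['str']) ? $_POST['str'] : '';
$output = '';
// Turn each byte into a decimal numeric entity.
for ($i = 0; $i < strlen($str); $i++) {
    $output .= "&#" . ord($str[$i]) . ";";
}
print htmlspecialchars($str);
print '<br><br>' . htmlentities($output);
?>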

browser support for character entities

Submitted by twirlyhair on October 7, 2002 - 13:31.

sitesleuth has a chart of browser support for some entities. http://www.sitesleuth.com/display_tests/test_display.cfm?test_id=37&cat_id=text.special_characters has the general info, with links to the results of testing in different browsers. Seeing all of the data requires registration, which is free and only requires an email address and password.

Dude!

Submitted by gnarly on November 8, 2002 - 11:43.

Thanks again for this, Aardvark. I keep coming back to this - it's awesome. That's five times today! I needed a lozenge this time... ◊ :-)

Visibone, keeps, underlined descenders

Submitted by frankfarm on January 2, 2003 - 20:09.

Thanks, everyone, for the great discussion and collection of useful links.

Visibone Curious Web Characters

I have one to suggest as well: Curious Web Characters at Visibone. Notably, this reference lists numeric entities from 0 to 65535 on a single page.

I find François Jordaan's entity chart particularly useful because it seems to be the only one that answers the question: "If I use — instead of — or &emdash; what browsers will it look bad in?" In other words, what browser compatibility do I lose when I choose this over that?

Pet Peeve: Keeps

For Pete Sheerin (will you see this 6 months after your last post?)—a web typography pet peeve of mine is that there seems to be no way to specify a paragraph-level keep like you can in Quark or InDesign or PageMaker. CSS has orphans, widows, and page-break-before, but no way I could find to say, "Keep this whole paragraph (or div or span) together." white-space:nowrap isn't the same thing—I want the normal wrapping, but just keep together what I want to keep together. This is typically more applicable to printed pages, but it could also apply to the screen, e.g., if multi-column layouts for CSS are approved. (As of January 2003, this spec still seems to be in draft—does anyone know?)

Pet Peeve: Underlined Descenders

One more: When link text contains characters with descenders, browsers underline those characters such that with some fonts and at certain sizes underlined g characters look visually identical to underlined q characters making it ever-so-slightly harder to read. In other cases, the tail intersecting with the underline rule looks ugly to my eye. Removing the underlining with a span tag indicating text-decoration:none works (in most browsers), but I wish I didn't have to manually do this. I only do this for lowercase g characters because this fix doesn't look as good for other characters with descenders—too much underlining is removed and the visual effect becomes worse than with the underlining. Also, this workaround doesn't work perfectly with some filters (e.g., W3C Validator and AltaVista's Babelfish translator) which insert space characters around a span tag where there was none before, so "Programs" in a link would become "Pro g rams" and consequently those words don't get read or translated by the filter properly. From my point of view, they're not filtering properly since I'm using the span to affect only the visual appearance, but I'm not holding my breath for them to change their ways.

To see an example of removing the underline from link text with lowercase g characters, see my site: UCSF School of Pharmacy. This was the best workaround I could come up with. I feel this is an issue that browser vendors need to address, but they have a hard enough time getting everything else to work according to the standards that I can't bear to give them this one more seemingly insignificant task. In my opinion, the best way for them to handle these is to keep the underlining except for a few pixels to the left and right of any intersection points.

Workarounds

Submitted by francois on January 3, 2003 - 03:03.

Thanks for the Visibone reminder — I've long been a fan, but I haven't looked at his entity table yet.

Re keeps — besides paged media (which is addressed in CSS2 as you mention) and multi-column layouts (not yet approved), I can't really see where else wrapping would really be a problem... And one could say that multi-column layouts make more sense in print media than on the web, where vertical space is not in short supply. (Then again, with monitor resolutions ever increasing, the case for multi-column layouts will become stronger.)

Re underlined descenders — I agree the default underline style is extremely ugly and it's annoying that browser manufacturers don't address this in their text rendering. I've seen some people address this problem by using a CSS-border below (which can be positioned lower or higher) instead of text-decoration:underline. Another approach (which I tend to use) is to turn off the underlines on the hover state — that way, if the underline is impeding legibility, the user can always hover over the link to get a clearer view. The link is always "clarified" just before clicking. Your method is certainly a valiant effort in a good cause, but I fear it causes as many problems as it solves. I'm pretty sure it will trip up screen readers (e.g. JAWS) too.
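The two workarounds described above could look roughly like the following. This is only a sketch: the class name plain-hover is made up for illustration, and the pixel values are assumptions that would need tuning per font and size.

```css
/* Approach 1: swap the default underline for a bottom border. */
a:link,
a:visited {
  text-decoration: none;    /* drop the browser's own underline */
  border-bottom: 1px solid; /* border colour defaults to the text colour */
  padding-bottom: 1px;      /* moves the rule further below the descenders */
}

/* Approach 2: keep the normal underline, but remove it on hover.
   "plain-hover" is a hypothetical class name. */
a.plain-hover:hover {
  text-decoration: none;
}
```

The border approach leaves the link as one unbroken piece of text, so it should not split words the way a span around individual letters can.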

Widows and Orphans

Submitted by psheerin on January 3, 2003 - 15:13.

frankfarm—

There are a couple of properties that can do this, but they haven't been implemented well enough—and are hence out of CSS 2.1.

page-break-inside: avoid will keep an entire element together, preventing any page breaks inside it.

orphans: 2 specifies that at least 2 lines of text must be left at the bottom of the page.

widows: 2 specifies that at least 2 lines of text must be left at the top of a page.

There are a few other break-related properties in CSS2, in section 13.3—Page Breaks
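As a rough sketch of how these properties might be combined in a print style sheet (the class name keep is hypothetical, and browser support at the time of writing is patchy, as noted above):

```css
@media print {
  /* Ask the browser not to split this paragraph across pages.
     "keep" is a made-up class name for illustration. */
  p.keep {
    page-break-inside: avoid;
  }

  /* For ordinary paragraphs, allow breaks but avoid stranded single lines. */
  p {
    orphans: 2; /* at least 2 lines left at the bottom of a page */
    widows: 2;  /* at least 2 lines carried to the top of the next page */
  }
}
```

Browsers that don't implement these properties simply ignore them, so the rules are harmless as a progressive enhancement.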

end of waiting

Submitted by cyberdoc on April 14, 2007 - 23:16.

Thank you so much - the waiting is over (or rather, the hunting for a character by opening an HTML editor, typing it, and then looking at the code...)

Greek letters with diacriticals, HTML entities

Submitted by simplesimon on November 29, 2010 - 19:32.

Wonderful chart, I've bookmarked it. Any chance you could add the Greek letters with accents and breathing marks? We use these for interpreting the Greek letters in (public domain) books that we convert into e-books and post to the network of Project Gutenberg sites. In Canada, Distributed Proofreaders runs a fully UTF-8 compliant site, www.pgdpcanada.net, so we want to represent all the Greek possibilities. The books we do are very old, and the accents and breathing marks are important for accurate transcription. Thanks, simplesimon

Thank you for good

Submitted by thepwai on December 23, 2010 - 02:05.

Thank you for the good information. I will bookmark this to use in my work; it is very helpful.

Evolt.org is an all-volunteer resource for web developers made up of a discussion list, a browser archive, and member-submitted articles. This article is the property of its author; please do not redistribute or use elsewhere without checking with the author.