Unicode Mathematical Alphanumeric Characters

Selected Unicode Mathematical Alphanumeric Characters
Version of Wednesday 8 May 2024.
Dave Barber's other pages.

§1 Introduction. The Unicode standard includes hundreds of letters and digits in special fonts intended to be useful to mathematicians, residing largely in the character block of code point range 1D400-1D7FF. This report organizes a large subset of them, as well as related characters from elsewhere within Unicode.

Unicode Technical Report #25 provides guidance and rationales for use of mathematical characters.

The present author has a report that covers Unicode's rendering of the Arabic digits in various scripts.

§2 Listings of characters. Below are links to pages with detailed information, mainly in table format, about Unicode's mathematical alphanumeric characters.

§2A. Every Unicode character is represented by a number (or occasionally a short sequence of numbers) known as its code point. Code points are the definitive way to identify a character. Glyphs do not suffice for this purpose, because two different fonts might provide substantially different glyphs for the same character; or because two different characters might have glyphs that look similar or identical.
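The point about look-alike glyphs can be illustrated with a short sketch. It assumes Python's standard `unicodedata` module, and the three characters are chosen for illustration only; they are not taken from the tables of this report.

```python
import unicodedata

# Three capital letters from three scripts whose glyphs look alike in most fonts.
# The selection is illustrative only.
lookalikes = ["\u0041", "\u0391", "\u0410"]  # Latin A, Greek Alpha, Cyrillic A

def describe(ch):
    """Return the code point in U+ notation followed by the official character name."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"

for ch in lookalikes:
    print(describe(ch))
```

The glyphs may be indistinguishable on screen, yet the code points and names differ, which is why the code point is the reliable identifier.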

§2B. For each character in these listings, the code point is given in the context of a numeric character reference (NCR) as used in HTML. Although decimal numbers can be used in an NCR, this report always uses hexadecimal numbers, as is the practice in official Unicode documentation.

Further, the tables generally include character entity references (CERs) when they exist. For example, either the NCR &#x212C; or the CER &Bscr; yields the glyph ℬ (table L-5). Either kind of reference can be copied and pasted directly into HTML source code. In non-HTML contexts, this character is usually written as U+212C.
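As a rough check of the equivalence between NCRs and CERs, the sketch below decodes three spellings of the same character with `html.unescape` from Python's standard library. The decimal form `&#8492;` is included only to show the alternative mentioned in §2B; it is not used elsewhere in this report.

```python
import html

# Three spellings of the same character in HTML source:
# hexadecimal NCR, decimal NCR (0x212C == 8492), and CER.
references = ["&#x212C;", "&#8492;", "&Bscr;"]

decoded = [html.unescape(ref) for ref in references]

for ref, ch in zip(references, decoded):
    # Print the code point rather than the glyph, so the output does not
    # depend on the fonts available.
    print(f"{ref:10} -> U+{ord(ch):04X}")
```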

§2C. In the early days of Unicode, it was anticipated that 16 bits, equivalent to four hexadecimal digits, would be enough to represent all desired characters. From this arose the custom to write code points with four digits, adding leading zeroes as necessary. For example, the character whose code point is 61 ("a") would be denoted as U+0061. This report, however, omits leading zeroes in the NCRs.
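The two notations can be produced from the same number. The following is a minimal sketch; the helper names `u_plus` and `ncr` are hypothetical and exist only for this illustration.

```python
def u_plus(code_point):
    """U+ notation: at least four hexadecimal digits, padded with leading zeroes."""
    return f"U+{code_point:04X}"

def ncr(code_point):
    """Hexadecimal NCR without leading zeroes, the style used in this report."""
    return f"&#x{code_point:X};"

print(u_plus(0x61), ncr(0x61))
print(u_plus(0x1D76F), ncr(0x1D76F))
```

Padding applies only when fewer than four digits are present; a five-digit code point is written as is.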

Unicode has grown, and 21 bits are now necessary to encode all the currently defined characters. This means that many of the newer characters require five hexadecimal digits for writing their code points, such as U+1D76F (𝝯). Unicode has defined over 149,000 characters; few available fonts attempt to render them all, and many fonts render characters inconsistently. The code space currently ends at U+10FFFF, although the UTF-32 encoding form already stores each code point in 32 bits.
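The bit and digit counts can be verified with integer arithmetic. In this sketch the constant `MAX_CODE_POINT` (U+10FFFF, the upper limit of the code space as defined by the Unicode standard) is an assumption brought in from the standard rather than from this report, and `size_of` is a hypothetical helper.

```python
MAX_CODE_POINT = 0x10FFFF  # upper limit of the code space, per the Unicode standard

def size_of(code_point):
    """Return (bits needed, hexadecimal digits needed) for a code point."""
    return code_point.bit_length(), len(f"{code_point:X}")

for cp in (0x61, 0x212C, 0x1D76F, MAX_CODE_POINT):
    bits, digits = size_of(cp)
    print(f"U+{cp:04X}: {bits} bits, {digits} hexadecimal digits")
```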

§2D. Each entry in a table shows:

the character as it appears in the font of the user's browser;
the NCR;
one or more CERs if they exist.

Throughout the tables, the characters are generally displayed in columns which are tinted various colors for ease of reading, mostly in alphabetical or numerical order. In some tables, the plainest version of each character, which is not always suitable for precise mathematical typography, is included in a gray column at the left for comparison.

Yellow cells emphasize characters whose code point is out of numerical sequence; irregularities arise in part because Unicode is an evolving standard. Many categories of characters, once created in part, are subsequently expanded using whatever code numbers are still available at the later time.
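One well-known irregularity can serve as an illustration: the mathematical italic small letters run consecutively from U+1D44E, except that italic h was already encoded as U+210E (PLANCK CONSTANT) in the Letterlike Symbols block. The sketch below is a minimal mapping under that assumption; the function name and the exception table are hypothetical, and whether this cell is tinted yellow in table L-2 is not asserted here.

```python
import unicodedata

ITALIC_SMALL_A = 0x1D44E          # MATHEMATICAL ITALIC SMALL A
OUT_OF_SEQUENCE = {"h": 0x210E}   # PLANCK CONSTANT stands in for italic small h

def math_italic_small(letter):
    """Map a plain letter a-z to its mathematical italic counterpart."""
    if len(letter) != 1 or not "a" <= letter <= "z":
        raise ValueError("expected one letter from a to z")
    if letter in OUT_OF_SEQUENCE:
        return chr(OUT_OF_SEQUENCE[letter])
    # Regular case: a fixed offset from the start of the run.
    return chr(ITALIC_SMALL_A + ord(letter) - ord("a"))

for letter in "ghi":
    ch = math_italic_small(letter)
    print(f"{letter}: U+{ord(ch):04X} {unicodedata.name(ch)}")
```

A simple offset calculation therefore needs an exception table wherever a code point is out of sequence.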

In the tables, superscripts and subscripts are shown inside a pair of reverse brackets for comparison. Some other classes of characters receive special contexts explained at the point of use.

Links to detail pages

Latin letters

table L-1 — Latin letters, sans-serif — sample:

plain text		non-italic non-bold		non-italic bold		italic non-bold		italic bold
A `&#x41;`	a `&#x61;`	𝖠 `&#x1D5A0;`	𝖺 `&#x1D5BA;`	𝗔 `&#x1D5D4;`	𝗮 `&#x1D5EE;`	𝘈 `&#x1D608;`	𝘢 `&#x1D622;`	𝘼 `&#x1D63C;`	𝙖 `&#x1D656;`

table L-2 — Latin letters, avec-serif — sample:

plain text		italic non-bold		non-italic bold		italic bold
A `&#x41;`	a `&#x61;`	𝐴 `&#x1D434;`	𝑎 `&#x1D44E;`	𝐀 `&#x1D400;`	𝐚 `&#x1D41A;`	𝑨 `&#x1D468;`	𝒂 `&#x1D482;`

table L-3 — Latin letters, monospaced — sample:

ordinary		fullwidth
𝙰 `&#x1D670;`	𝚊 `&#x1D68A;`	Ａ `&#xFF21;`	ａ `&#xFF41;`

table L-4 — Latin letters, enclosed — sample:

Ⓐ `&#x24B6;`	ⓐ `&#x24D0;`	🅐 `&#x1F150;`	🄰 `&#x1F130;`	🅰 `&#x1F170;`	🄐 `&#x1F110;`	⒜ `&#x249C;`

table L-5 — Latin letters, miscellaneous — sample:

double-struck		script non-bold		script bold		fraktur non-bold		fraktur bold		outline Unicode 16.0
𝔸 `&#x1D538;` `&Aopf;`	𝕒 `&#x1D552;` `&aopf;`	𝒜 `&#x1D49C;` `&Ascr;`	𝒶 `&#x1D4B6;` `&ascr;`	𝓐 `&#x1D4D0;`	𝓪 `&#x1D4EA;`	𝔄 `&#x1D504;` `&Afr;`	𝔞 `&#x1D51E;` `&afr;`	𝕬 `&#x1D56C;`	𝖆 `&#x1D586;`	𜳖 `&#x1CCD6;`

Greek letters

table G-1 — Greek letters, main — sample:

plain text		avec-serif non-italic bold		avec-serif italic non-bold		avec-serif italic bold		sans-serif non-italic bold		sans-serif italic bold
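Α `&#x391;` `&Alpha;`	α `&#x3B1;` `&alpha;`	𝚨 `&#x1D6A8;`	𝛂 `&#x1D6C2;`	𝛢 `&#x1D6E2;`	𝛼 `&#x1D6FC;`	𝜜 `&#x1D71C;`	𝜶 `&#x1D736;`	𝝖 `&#x1D756;`	𝝰 `&#x1D770;`	𝞐 `&#x1D790;`	𝞪 `&#x1D7AA;`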

table G-2 — Greek letters, miscellaneous — sample:

plain text		avec-serif non-italic bold		avec-serif italic non-bold		avec-serif italic bold		sans-serif non-italic bold		sans-serif italic bold
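∇ `&#x2207;` `&nabla;`	∂ `&#x2202;` `&part;`	𝛁 `&#x1D6C1;`	𝛛 `&#x1D6DB;`	𝛻 `&#x1D6FB;`	𝜕 `&#x1D715;`	𝜵 `&#x1D735;`	𝝏 `&#x1D74F;`	𝝯 `&#x1D76F;`	𝞉 `&#x1D789;`	𝞩 `&#x1D7A9;`	𝟃 `&#x1D7C3;`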

Numerals

table N-1 — Numerals, general — sample:

plain text	sans-serif non-bold	sans-serif bold	avec-serif bold	ordinary monospace	fullwidth monospace	double-struck	superscript	subscript	outline Unicode 16.0	segmented Unicode 16.0
0 `&#x30;`	𝟢 `&#x1D7E2;`	𝟬 `&#x1D7EC;`	𝟎 `&#x1D7CE;`	𝟶 `&#x1D7F6;`	０ `&#xFF10;`	𝟘 `&#x1D7D8;`	]⁰[ `&#x2070;`	]₀[ `&#x2080;`	𜳰 `&#x1CCF0;`	🯰 `&#x1FBF0;`

Discussion of figure width

table N-2 — Enclosed numerals — sample:

➀ `&#x2780;`	⓵ `&#x24F5;`	❶ `&#x2776;`	➊ `&#x278A;`	⓫ `&#x24EB;`	⑴ `&#x2474;`	⑾ `&#x247E;`

table N-3 — Roman numerals — sample:

value	plain text		Roman			value	chars	value	chars
1	I `&#x49;`	i `&#x69;`	Ⅰ `&#x2160;`	ⅰ `&#x2170;`		500	ⅠↃ	1,000	ⅭⅠↃ

table N-4 — Greek numerals — sample:

1	Α `&#x391;` `&Alpha;`	α `&#x3B1;` `&alpha;`	10	Ι `&#x399;` `&Iota;`	ι `&#x3B9;` `&iota;`	100	Ρ `&#x3A1;` `&Rho;`	ρ `&#x3C1;` `&rho;`

table N-5 — Cyrillic numerals — sample:

1	А `&#x410;` `&Acy;`	10	І `&#x406;` `&Iukcy;`	Ї `&#x407;` `&YIcy;`	100	Р `&#x420;` `&Rcy;`

table N-6 — Fractions — sample:

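½ `&#xBD;` `&frac12;` `&half;`	⅓ `&#x2153;` `&frac13;`	¼ `&#xBC;` `&frac14;`	⅕ `&#x2155;` `&frac15;`	⅙ `&#x2159;` `&frac16;`	⅐ `&#x2150;`	⅛ `&#x215B;` `&frac18;`	⅑ `&#x2151;`	⅒ `&#x2152;`	⅟ `&#x215F;`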

Combining characters

table series D — Diacritics — sample:

`Ẏẏ`	`Ẏẏ`	`Ỵỵ`	`Ỵỵ`
`Ÿÿ`	`Ÿÿ`	`Y̤y̤`	`Y̤y̤`

§3 Other scripts. Latin and Greek letters receive extensive support for the font variations required by mathematicians: sans-serif, avec-serif, italic, bold, et cetera. By contrast, Cyrillic letters, which resemble Greek letters, receive little accommodation for the needs of mathematicians. Cyrillic numerals, which resemble Greek numerals, get only basic Unicode coverage.

Benetia et alii discuss Arabic mathematical symbols in Unicode.

Two systems of Braille notation for mathematics are Nemeth and Gardner–Salinas. Unicode Braille defines 256 dot patterns, but does not specify what they might mean.
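The 256 patterns arise because each of the eight dot positions is either raised or not, and the Braille Patterns block (2800-28FF) assigns one bit of the code point offset to each dot. A minimal sketch of that arrangement follows; the function name `braille_pattern` is hypothetical.

```python
import unicodedata

BRAILLE_BASE = 0x2800  # the blank pattern, with no dots raised

def braille_pattern(*dots):
    """Return the Braille Patterns character with the given dots (1 to 8) raised."""
    mask = 0
    for d in dots:
        if not 1 <= d <= 8:
            raise ValueError("dot numbers run from 1 to 8")
        mask |= 1 << (d - 1)  # dot n corresponds to bit n-1 of the offset
    return chr(BRAILLE_BASE + mask)

ch = braille_pattern(1, 4, 5)
print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
```

The character names record only which dots are raised, consistent with the observation that Unicode assigns the patterns no meaning.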

Unicode has special mathematical characters for the first four letters of the Hebrew alphabet, as below. No rationale is evident for omitting the rest of the alphabet. Hebrew is read from right to left.

Table H-1 Four Unicode Hebrew letters
plain text	ד `&#x5D3;`	ג `&#x5D2;`	ב `&#x5D1;`	א `&#x5D0;`
math use	ℸ `&#x2138;` `&daleth;`	ℷ `&#x2137;` `&gimel;`	ℶ `&#x2136;` `&beth;`	ℵ `&#x2135;` `&aleph;`
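One observable difference between the two rows of Table H-1 is the bidirectional class recorded in the Unicode Character Database: the ordinary Hebrew letters are right-to-left, while the mathematical symbols are left-to-right, so they do not reorder the surrounding formula. The sketch below inspects this with Python's `unicodedata` module; the helper name `compare` is hypothetical.

```python
import unicodedata

# (plain Hebrew letter, mathematical symbol) for alef, bet, gimel, dalet
pairs = [("\u05D0", "\u2135"), ("\u05D1", "\u2136"),
         ("\u05D2", "\u2137"), ("\u05D3", "\u2138")]

def compare(plain, math):
    """Return the bidi class of each character, and whether compatibility
    normalization (NFKC) folds the symbol back into the plain letter."""
    folds = unicodedata.normalize("NFKC", math) == plain
    return unicodedata.bidirectional(plain), unicodedata.bidirectional(math), folds

for plain, math in pairs:
    print(f"U+{ord(plain):04X} U+{ord(math):04X}", compare(plain, math))
```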

§4 What Unicode does not do. As Unicode is a character set, and not a markup language, it does not attempt to provide comprehensive support for the typography of superscripts, subscripts, and fractions. To provide an example of what might be done in a markup language, however, here are some ways of effecting these in HTML:

HTML source code	result
`base<sup>superscript</sup>`	base^superscript
`base<sub>subscript</sub>`	base_subscript
`<sup>numer</sup>&frasl;<sub>denom</sub>`	^numer⁄_denom

HTML allows superscripts and subscripts (hence numerators and denominators) to be nested, although the results may be difficult to read correctly. In fact, HTML allows almost anything to be nested when it makes logical sense.

Unfortunately, there is no guarantee that the three following ways of rendering the number one-half will yield identical glyphs:

HTML source code	result
`<sup>1</sup>&frasl;<sub>2</sub>`	¹⁄₂
`½`	½
`⅟<sub>2</sub>`	⅟₂
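The relationship between the single character ½ and its spelled-out form can be inspected outside HTML. This sketch, assuming Python's standard `unicodedata` and `html` modules, shows that the compatibility decomposition of U+BD is the digit 1, the fraction slash U+2044 (the character that `&frasl;` names), and the digit 2. Whether a font draws the two forms identically is a separate matter that code cannot settle.

```python
import html
import unicodedata

half = "\u00BD"  # VULGAR FRACTION ONE HALF

# Compatibility decomposition: '1', FRACTION SLASH, '2'
decomposed = unicodedata.normalize("NFKD", half)
fraction_slash = html.unescape("&frasl;")

print([f"U+{ord(c):04X}" for c in decomposed])
print(unicodedata.numeric(half))
```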

§5 Documentation of the Unicode standard.

Unicode blocks pertinent to this report

code points	subject
`0000- 007F`	Basic Latin
`0080- 00FF`	Latin-1 Supplement
`0100- 017F`	Latin Extended-A
`0180- 024F`	Latin Extended-B
`0250- 02AF`	IPA Extensions
`0300- 036F`	Combining Diacritical Marks
`0370- 03FF`	Greek and Coptic
`0400- 04FF`	Cyrillic
`0590- 05FF`	Hebrew
`1AB0- 1AFF`	Combining Diacritical Marks Extended
`1D00- 1D7F`	Phonetic Extensions
`1DC0- 1DFF`	Combining Diacritical Marks Supplement
`2000- 206F`	General Punctuation
`2070- 209F`	Superscripts and Subscripts
`20D0- 20FF`	Combining Diacritical Marks for Symbols
`2100- 214F`	Letterlike Symbols
`2150- 218F`	Number Forms
`2200- 22FF`	Mathematical Operators
`2300- 23FF`	Miscellaneous Technical
`2460- 24FF`	Enclosed Alphanumerics
`25A0- 25FF`	Geometric Shapes
`2700- 27BF`	Dingbats
`2800- 28FF`	Braille Patterns
`2C00- 2C5F`	Glagolitic
`3200- 32FF`	Enclosed CJK Letters and Months
`A640- A69F`	Cyrillic Extended-B
`A720- A7FF`	Latin Extended-D
`FB00- FB4F`	Alphabetic Presentation Forms
`FE20- FE2F`	Combining Half Marks
`FF00- FFEF`	Halfwidth and Fullwidth Forms
`10140-1018F`	Ancient Greek Numbers
`1D400-1D7FF`	Mathematical Alphanumeric Symbols
`1EE00-1EEFF`	Arabic Mathematical Alphabetic Symbols
`1F100-1F1FF`	Enclosed Alphanumeric Supplement
`1FB00-1FBFF`	Symbols for Legacy Computing

There are hundreds of non-pertinent blocks.
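Because blocks are contiguous, non-overlapping ranges, finding the block of a character is a range lookup. The sketch below uses a handful of rows copied from the table above; the `BLOCKS` list is deliberately incomplete, and `block_of` is a hypothetical helper rather than a standard library function.

```python
from bisect import bisect_right

# A few rows from the table above, sorted by first code point.
BLOCKS = [
    (0x0000, 0x007F, "Basic Latin"),
    (0x0370, 0x03FF, "Greek and Coptic"),
    (0x2100, 0x214F, "Letterlike Symbols"),
    (0x2150, 0x218F, "Number Forms"),
    (0x2800, 0x28FF, "Braille Patterns"),
    (0x1D400, 0x1D7FF, "Mathematical Alphanumeric Symbols"),
]
STARTS = [first for first, _, _ in BLOCKS]

def block_of(ch):
    """Return the block name for a character, or None if it is not in BLOCKS."""
    cp = ord(ch)
    i = bisect_right(STARTS, cp) - 1  # last block starting at or before cp
    if i >= 0 and cp <= BLOCKS[i][1]:
        return BLOCKS[i][2]
    return None

for cp in (0x61, 0x212C, 0x1D76F, 0x4E00):
    print(f"U+{cp:04X}", block_of(chr(cp)))
```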

Although the above documents do a thorough job of defining the standard, they are not always convenient for people seeking particular characters. Many web sites, including the present one, have sprung up to make such character searches easier; notable among them is Compart. Also, Wikibooks has a handy listing of mathematical characters.

§6 Miscellaneous.

Opinion.

Unicode is a very good thing. Of course it is not perfect, in part because a huge number of people have been involved in its development, and they have often had diverging views. Also, new insights occasionally emerge that tend to alter its direction of progress. Still, Unicode fully deserves the near-ubiquity it has achieved throughout the world's computers. In particular, Unicode's UTF-8 encoding has been a great success.

Colophon.

In the original work of year 2022, the pages containing the tables in the L-, G-, and N- series were generated by a custom-written C++ program, which also generated the samples that appear on this page. This was done in a Mac Xcode environment. Other parts of this page were created directly with a text editor. Then they were all combined with a Unix script. This indirect approach was necessary to manage the many characters to be treated and to devise a consistent format for organizing them, which in turn required seemingly endless tinkering.

In the extensive revisions of year 2024, design of the project had stabilized, simplifying further development. At this point the C++ program was discarded, with further changes being made directly in the HTML files using the Xcode text editor, which is HTML-aware.