§1 Introduction. The Unicode standard includes hundreds of letters and digits in special fonts intended to be useful to mathematicians, residing largely in the character block of code point range 1D400-1D7FF. This report organizes a large subset of them, as well as related characters from elsewhere within Unicode.
Unicode Technical Report #25 provides guidance and rationales for use of mathematical characters.
The present author has a separate report covering Unicode's representation of the Arabic digits in various scripts.
§2 Listings of characters. Below are links to pages with detailed information, mainly in table format, about Unicode's mathematical alphanumeric characters.
§2A. Every Unicode character is represented by a number (or occasionally a short sequence of numbers) known as its code point. Code points are the definitive way to identify a character. Glyphs do not suffice for this purpose: two different fonts may draw the same character quite differently, and two different characters may have glyphs that look similar or identical.
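As a small illustration of why glyphs cannot identify a character, the Python sketch below (Python and its standard unicodedata module are this example's own choice, not tools the report relies on) prints the code point and official Unicode name of three letters that many fonts draw identically: Latin A, Greek Alpha, and Cyrillic A. The helper name describe is hypothetical.

```python
import unicodedata

# Three distinct characters that many fonts render with the same glyph.
lookalikes = ["A", "\u0391", "\u0410"]

def describe(ch: str) -> str:
    """Hypothetical helper: return U+ notation and the official name of one character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"

for ch in lookalikes:
    print(describe(ch))
# U+0041 LATIN CAPITAL LETTER A
# U+0391 GREEK CAPITAL LETTER ALPHA
# U+0410 CYRILLIC CAPITAL LETTER A
```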
§2B. For each character in these listings, the code point is given in the context of a numeric character reference (NCR) as used in HTML. Although decimal numbers can be used in an NCR, this report always uses hexadecimal numbers, as is the practice in official Unicode documentation.
Further, the tables generally include character entity references (CERs) when they exist. For example, either the NCR &#x212C; or the CER &Bscr; yields the glyph ℬ (table L-5). Either kind of reference can be copied and pasted directly into HTML source code. In non-HTML contexts, this character is usually identified by the notation U+212C.
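To check that a hexadecimal NCR, a decimal NCR, and a CER all name the same character, one can decode them with Python's standard html.unescape. This is a minimal sketch for verification only; a browser performs the same resolution when it parses HTML.

```python
import html

# Three equivalent references to U+212C SCRIPT CAPITAL B:
# hexadecimal NCR, decimal NCR (0x212C == 8492), and the named CER.
references = ["&#x212C;", "&#8492;", "&Bscr;"]

for ref in references:
    # html.unescape resolves numeric and named references alike.
    print(ref, "->", html.unescape(ref))
```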
§2C. In the early days of Unicode, it was anticipated that 16 bits, equivalent to four hexadecimal digits, would be enough to represent all desired characters. From this arose the custom of writing code points with at least four digits, adding leading zeroes as necessary. For example, the character whose code point is hexadecimal 61 ("a") is denoted U+0061. This report, however, omits leading zeroes in the NCRs, writing &#x61; rather than &#x0061;.
Unicode has since grown: its codespace runs from U+0000 through U+10FFFF, and 21 bits are needed to span that range. As a result, many of the newer characters require five hexadecimal digits, such as U+1D76F (𝝯). Unicode has defined over 149,000 characters; few available fonts attempt to render them all, and many fonts render characters inconsistently. The codespace is fixed at 1,114,112 code points, which leaves ample room for growth without any move to 32-bit code points.
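The notational conventions above can be sketched in a few lines of Python. The helper names u_plus and ncr are hypothetical; the sketch also prints each code point's bit length and how many code units it needs in UTF-8 and UTF-16, which shows why five-digit code points such as U+1D76F no longer fit the original 16-bit design.

```python
def u_plus(ch: str) -> str:
    """Hypothetical helper: conventional U+ notation, zero-padded to at least four hex digits."""
    return f"U+{ord(ch):04X}"

def ncr(ch: str) -> str:
    """Hypothetical helper: hexadecimal NCR without leading zeroes, as this report writes them."""
    return f"&#x{ord(ch):X};"

for ch in ["a", "\u212C", "\U0001D76F"]:
    cp = ord(ch)
    print(u_plus(ch), ncr(ch),
          "bits:", cp.bit_length(),
          "UTF-8 bytes:", len(ch.encode("utf-8")),
          "UTF-16 units:", len(ch.encode("utf-16-be")) // 2)

# The last code point, U+10FFFF, needs 21 bits.
print(hex(0x10FFFF), (0x10FFFF).bit_length())
```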
§2D. For each entry in a table, the following are shown:
Throughout the tables, the characters are generally displayed in columns, mostly in alphabetical or numerical order, with the columns tinted in various colors for ease of reading. In some tables, the plainest version of each character, which is not always suitable for precise mathematical typography, is included in a gray column at the left for comparison.
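One plausible way to find the plain counterpart of a mathematical character programmatically (not necessarily how these tables were built) is Unicode compatibility normalization, NFKC, which folds a styled letter or digit back to its base character. A minimal Python sketch; the helper name plainest is hypothetical.

```python
import unicodedata

def plainest(ch: str) -> str:
    """Hypothetical helper: fold a styled mathematical character to its plain form via NFKC."""
    return unicodedata.normalize("NFKC", ch)

samples = [
    "\U0001D400",  # MATHEMATICAL BOLD CAPITAL A
    "\U0001D7D9",  # MATHEMATICAL DOUBLE-STRUCK DIGIT ONE
    "\u212C",      # SCRIPT CAPITAL B
    "\U0001D6C2",  # MATHEMATICAL BOLD SMALL ALPHA
]
for ch in samples:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)} -> {plainest(ch)!r}")
```

Because NFKC discards the styling, it is useful for searching and comparison but not for mathematical text, where bold A and italic A may denote different things.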
Yellow cells highlight characters whose code points are out of numerical sequence; such irregularities arise in part because Unicode is an evolving standard. Many categories of characters are created piecemeal and later extended using whatever code points remain available at the time.
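A well-known irregularity is the mathematical italic small h: its expected slot, U+1D455, is left unassigned because the same character had already been encoded as U+210E PLANCK CONSTANT. The Python sketch below maps ASCII letters into a few styles of the Mathematical Alphanumeric Symbols block and handles that one exception. The function name styled and the table names are hypothetical, and only styles with no other holes are included; script, fraktur, and double-struck have more exceptions and are omitted.

```python
import unicodedata

# Code point of capital A in a few styles; small a follows 26 positions later.
# Only styles whose 52 letters are all in sequence (apart from italic h) are listed.
STYLE_BASE = {
    "bold": 0x1D400,
    "italic": 0x1D434,
    "bold italic": 0x1D468,
    "sans-serif": 0x1D5A0,
    "sans-serif bold": 0x1D5D4,
    "monospace": 0x1D670,
}

# Letters encoded earlier elsewhere; their slots in the block are unassigned.
EXCEPTIONS = {("italic", "h"): 0x210E}  # PLANCK CONSTANT

def styled(letter: str, style: str) -> str:
    """Hypothetical helper: map one ASCII letter to its mathematical form in the given style."""
    if (style, letter) in EXCEPTIONS:
        return chr(EXCEPTIONS[(style, letter)])
    if "A" <= letter <= "Z":
        offset = ord(letter) - ord("A")
    elif "a" <= letter <= "z":
        offset = 26 + ord(letter) - ord("a")
    else:
        raise ValueError(f"not an ASCII letter: {letter!r}")
    return chr(STYLE_BASE[style] + offset)

print(styled("h", "italic"), unicodedata.name(styled("h", "italic")))
print(unicodedata.name(chr(0x1D455), "unassigned"))  # the hole left in the block
print("".join(styled(c, "bold") for c in "Unicode"))
```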
In the tables, superscripts and subscripts are shown inside a pair of reverse brackets for comparison. Some other classes of characters receive special contexts explained at the point of use.
Links to detail pages

Category | Table | Contents |
---|---|---|
Latin letters | L-1 | Latin letters, sans-serif |
Latin letters | L-2 | Latin letters, avec-serif |
Latin letters | L-3 | Latin letters, monospaced |
Latin letters | L-4 | Latin letters, enclosed |
Latin letters | L-5 | Latin letters, miscellaneous |
Greek letters | G-1 | Greek letters, main |
Greek letters | G-2 | Greek letters, miscellaneous |
Numerals | N-1 | Numerals, general |
Numerals | N-2 | Enclosed numerals |
Numerals | N-3 | Roman numerals |
Numerals | N-4 | Greek numerals |
Numerals | N-5 | Cyrillic numerals |
Numerals | N-6 | Fractions |
Combining characters | series D | Diacritics |
§3 Other scripts. Latin and Greek letters receive extensive support for the font variations required by mathematicians: sans-serif, avec-serif, italic, bold, et cetera. By contrast, Cyrillic letters, which resemble Greek letters, receive little accommodation for the needs of mathematicians. Cyrillic numerals, which resemble Greek numerals, get only basic Unicode coverage.
Benatia et alii discuss Arabic mathematical symbols in Unicode.
Two systems of Braille notation for mathematics are Nemeth and Gardner–Salinas. Unicode Braille defines 256 dot patterns, but does not specify what they might mean.
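Unicode's Braille Patterns block (U+2800 through U+28FF) is laid out so that dot n of the eight-dot cell corresponds to bit n−1 of the offset from U+2800, which is why there are exactly 2⁸ = 256 patterns. The Python sketch below computes a pattern from its raised dots; the function name braille is hypothetical, and, like Unicode itself, it says nothing about what a pattern means in Nemeth or Gardner–Salinas notation.

```python
BRAILLE_BASE = 0x2800  # BRAILLE PATTERN BLANK

def braille(*dots: int) -> str:
    """Hypothetical helper: return the Braille pattern with the given dots (1 to 8) raised.

    Dot n sets bit n-1 of the offset from U+2800.
    """
    offset = 0
    for d in dots:
        if not 1 <= d <= 8:
            raise ValueError(f"dot number out of range: {d}")
        offset |= 1 << (d - 1)
    return chr(BRAILLE_BASE + offset)

print(braille(1), braille(1, 2), braille(1, 4, 5))
print(f"U+{ord(braille(*range(1, 9))):04X}")  # all eight dots raised
```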
Unicode has special mathematical characters for the first four letters of the Hebrew alphabet, as below. No rationale is evident for omitting the rest of the alphabet. Hebrew is read from right to left.
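Table H-1

Four Unicode Hebrew letters | dalet | gimel | bet | alef |
---|---|---|---|---|
plain text | &#x5D3; ד | &#x5D2; ג | &#x5D1; ב | &#x5D0; א |
math use | &#x2138; &daleth; ℸ | &#x2137; &gimel; ℷ | &#x2136; &beth; ℶ | &#x2135; &aleph; ℵ |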
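The two rows of Table H-1 are related in the Unicode data: each math-use symbol (U+2135 through U+2138) has a compatibility decomposition to the corresponding Hebrew letter, and the symbols differ in bidirectional class. The Python sketch below checks both properties. The observation that the left-to-right class presumably suits use inside left-to-right formulas is an interpretation, not something this report states.

```python
import unicodedata

HEBREW = "\u05D0\u05D1\u05D2\u05D3"  # alef, bet, gimel, dalet (plain text)
MATH = "\u2135\u2136\u2137\u2138"    # ALEF SYMBOL through DALET SYMBOL (math use)

for plain, math in zip(HEBREW, MATH):
    folded = unicodedata.normalize("NFKC", math)
    print(unicodedata.name(math), "->", unicodedata.name(folded),
          "| bidi:", unicodedata.bidirectional(plain), "vs", unicodedata.bidirectional(math))
```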
§4 What Unicode does not do. As Unicode is a character set, and not a markup language, it does not attempt to provide comprehensive support for the typography of superscripts, subscripts, and fractions. To provide an example of what might be done in a markup language, however, here are some ways of effecting these in HTML:
HTML source code | result |
---|---|
base<sup>superscript</sup> | basesuperscript |
base<sub>subscript</sub> | basesubscript |
<sup>numer</sup>⁄<sub>denom</sub> | numer⁄denom |
HTML allows superscripts and subscripts (hence numerators and denominators) to be nested, although the results may be difficult to read correctly. In fact, HTML allows almost anything to be nested when it makes logical sense.
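Because the markup nests freely, it is convenient to generate it from small composable pieces. The Python sketch below is one hypothetical way to do so: leaf text is escaped once with html.escape, and the helpers text, sup, sub, and frac (all hypothetical names) wrap already-built HTML, so a fraction can contain a subscript and so on. The fraction uses &frasl;, the named reference for U+2044 FRACTION SLASH.

```python
import html

def text(s: str) -> str:
    """Escape plain text so it is safe inside HTML (leaf nodes only)."""
    return html.escape(s)

def sup(base_html: str, exp_html: str) -> str:
    """Base followed by a superscript; both arguments are already HTML."""
    return f"{base_html}<sup>{exp_html}</sup>"

def sub(base_html: str, index_html: str) -> str:
    """Base followed by a subscript; both arguments are already HTML."""
    return f"{base_html}<sub>{index_html}</sub>"

def frac(num_html: str, den_html: str) -> str:
    """Numerator, FRACTION SLASH (&frasl;), denominator."""
    return f"<sup>{num_html}</sup>&frasl;<sub>{den_html}</sub>"

print(sup(text("x"), text("2")))                   # x<sup>2</sup>
print(frac(text("a"), sub(text("b"), text("1"))))  # a subscript nested in the denominator
```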
Unfortunately, there is no guarantee that the following three ways of rendering the number one-half will yield identical glyphs:
HTML source code | result |
---|---|
<sup>1</sup>⁄<sub>2</sub> | 1⁄2 |
½ | ½ |
⅟<sub>2</sub> | ⅟2 |
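The three forms differ even at the character level. Compatibility normalization (NFKC) shows that the precomposed ½ (U+00BD) decomposes to 1, FRACTION SLASH, 2, and that ⅟ (U+215F FRACTION NUMERATOR ONE) decomposes to 1 and FRACTION SLASH alone. A font is free to draw each precomposed form as its own glyph, while the markup form is laid out by the browser. A minimal Python check:

```python
import unicodedata

for s in ["\u00BD", "\u215F"]:
    nfkc = unicodedata.normalize("NFKC", s)
    print(unicodedata.name(s), "->", " ".join(f"U+{ord(c):04X}" for c in nfkc))
```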
§5 Documentation of the Unicode standard.
Unicode blocks pertinent to this report |
---|
There are hundreds of non-pertinent blocks. |
Although the above documents do a thorough job of defining the standard, they are not always convenient for people seeking particular characters. Many web sites, including the present one, have sprung up to make such character searches easier; notable among them is Compart. Also, Wikibooks has a handy listing of mathematical characters.
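For quick offline searches, Python's unicodedata module can also help: unicodedata.lookup finds a character by its exact name, and a simple scan over a range finds names containing a given phrase. The sketch below (the function find is hypothetical) lists the double-struck capitals in the Letterlike Symbols range, several of which correspond to unassigned slots in the double-struck section of the Mathematical Alphanumeric Symbols block. It complements, rather than replaces, sites such as Compart.

```python
import sys
import unicodedata

def find(substring: str, start: int = 0, stop: int = sys.maxunicode + 1):
    """Hypothetical helper: yield (code point, name) for characters whose name contains substring."""
    needle = substring.upper()
    for cp in range(start, stop):
        name = unicodedata.name(chr(cp), "")
        if needle in name:
            yield cp, name

print(unicodedata.lookup("SCRIPT CAPITAL B"))  # exact-name lookup
for cp, name in find("DOUBLE-STRUCK CAPITAL", 0x2100, 0x2150):
    print(f"U+{cp:04X} {chr(cp)} {name}")
```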
§6 Miscellaneous.
Opinion.
Unicode is a very good thing. Of course it is not perfect, in part because a huge number of people have been involved in its development, and they have often had diverging views. Also, new insights occasionally emerge that tend to alter its direction of progress. Still, Unicode fully deserves the near-ubiquity it has achieved throughout the world's computers. In particular, the UTF-8 encoding form has been a great success.
Colophon.
In the original work of 2022, the pages containing the tables in the L-, G-, and N- series were generated by a custom-written C++ program, which also generated the samples that appear on this page. This was done in a Mac Xcode environment. Other parts of this page were created directly with a text editor. Then they were all combined with a Unix script. This indirect approach was needed to manage the many characters to be treated and to devise a consistent format for organizing them, which in turn required seemingly endless tinkering.
By the extensive revisions of 2024, the project's design had stabilized, simplifying further development. At this point the C++ program was discarded, and further changes were made directly in the HTML files using the Xcode text editor, which is HTML-aware.