Unicode defines a huge set of characters for use with computer displays and printers. Some characters denote entire words (as in many Asian languages), others denote alphabetic letters (as in many European languages), digits, or punctuation, and still others have sundry purposes. This report examines the Unicode characters in two categories, box drawing and block elements, which for convenience are here grouped together as geometric characters.
The hardware and software of most modern general-purpose computers support control of individual display pixels in a rectangular array known as a raster, and this capability is practically required for a graphical user interface. However, embedded computers often entail relatively simple output, and might perform satisfactorily with a simpler display constructed as a two-dimensional array of perhaps a thousand characters rather than a million pixels. Also, many computers built before 1990 used character-based displays because of the higher costs of hardware at that time; a standard size was 25 lines of 80 characters each, using a monospaced font.
One symbol can be displayed within each character cell, which is typically implemented as a small array of pixels. On some machines, the height of a pixel exceeds its width; this design reduces the number of pixels while preserving the taller-than-wide aspect exhibited by most letters and digits in traditional typography. However, there is a tendency in modern equipment to make pixels square.
A practical minimum for text in the English language is a character size of eight pixels wide by eight pixels high; some other languages need more pixels for legibility. In the following example, which uses tall pixels, each character resides in such an eight-by-eight region, but the rightmost column and bottommost row of pixels are left unused to provide spacing between characters. Hence the effective size is seven by seven. For this illustration, pixels not used by the characters are rendered in a checkerboard pattern to show the extent of each.
If only Arabic numerals are to be shown, a width of four pixels and a height of six pixels (with a usable size of three by five) will suffice. In the following example, square pixels happen to be used.
Character-based displays can be very efficient because the geometric details of how to draw each character are stored in the display device itself, and are compact: the pattern for each character would require no more than eight bytes of storage for the alphabetic example above, and three bytes for the numeric. To make a character appear, the controlling computer need send merely a brief code, often only eight bits, and then the display figures out how to draw the character. A tradeoff is that the user often has limited choice of factors that affect the appearance of the font, such as face, size or color.
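The storage figures quoted above can be checked with a little arithmetic. The following is a minimal sketch, assuming one bit per pixel and no padding between rows; the function names and the sample digit pattern are hypothetical and are not taken from any particular display device.

```python
import math

def glyph_bytes(width: int, height: int) -> int:
    """Bytes needed for a one-bit-per-pixel glyph, packed without row padding."""
    return math.ceil(width * height / 8)

def pack_glyph(rows: list[str]) -> bytes:
    """Pack rows of '#' (on) and '.' (off) pixels into bytes, high bit first."""
    bits = "".join("1" if ch == "#" else "0" for row in rows for ch in row)
    bits += "0" * (-len(bits) % 8)  # pad the last byte with off pixels
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))

# Hypothetical 4 x 6 cell holding a 3 x 5 digit "1"; the rightmost column
# and the bottom row stay blank to provide spacing between characters.
DIGIT_ONE = [
    ".#..",
    "##..",
    ".#..",
    ".#..",
    "###.",
    "....",
]

print(glyph_bytes(8, 8))           # alphabetic cell: 8 bytes
print(glyph_bytes(4, 6))           # numeric cell: 3 bytes
print(pack_glyph(DIGIT_ONE).hex())
```

At one byte per character, a full 25-line by 80-character screen then needs only 2000 bytes of character codes, whatever the size of the glyph patterns held in the display.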
More generally, nothing in Unicode requires that the display be organized into pixels; some other technology could be used instead, as long as the characters have a suitable appearance. For instance, vector-based displays have been built, but are rare today; for hard copy, plotters are more frequently seen.
The earliest devices generally had light-colored (white and green were common) characters on a dark background. The IBM 3279 display, frequently seen in the 1980s, was among the first to give a choice of colors, producing red, green, blue, yellow, cyan, magenta and white figures on a black background. Each character could be a different color, but colors were never mixed within a character.
Many displays have offered:
Meanwhile, printers will almost always yield dark characters on a paper-white background.
By the early 1980s, many desktop computers were extending their character sets by adding a variety of geometric characters. The most basic were straight lines, right angles, tees and crosses, but many other shapes were included. These figures could be arranged on the display to divide the screen into sections or to draw simple pictures. Early examples are PETSCII and code page 437. Videotex, an early communications service, also defined some geometric characters.
Motivating the establishment of Unicode was a lack of standardization among manufacturers' character sets, not only in the geometric characters but also in Latin alphabet letters with diacritical marks (such as accents and umlauts), and in non-Latin alphabets (such as Greek and Cyrillic). Moreover, in some languages (such as Chinese), words cannot be decomposed into sequences of letters, but rather are integral symbols; Unicode addresses these also.
Here is a demonstration of why a monospaced font is essential when using these geometric characters. This first screenshot has twelve characters per line, and is rendered in the font DejaVu Sans Mono, wherein all characters are the same width:
In the well-known Times New Roman font, each character is of its natural width, which on the one hand makes for easy reading of large blocks of prose, but on the other hand makes for difficulty in aligning columns of text. This font is used in the next screenshot, where alignment of the box drawing characters fails severely even though precisely the same characters are used as in the previous screenshot:
Adding or deleting characters gives at best an approximate fix:
A display that is arranged as a grid of characters automatically gives monospacing and averts this problem.
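As a rough illustration of how box drawing characters rely on a character grid, the sketch below surrounds some text with a plain frame. The function name `frame` is hypothetical, and the code assumes that every character occupies exactly one cell, which holds only on a monospaced display.

```python
def frame(lines: list[str]) -> str:
    """Surround the given lines with a plain box drawing frame."""
    width = max((len(line) for line in lines), default=0)
    top = "\u250C" + "\u2500" * width + "\u2510"     # ┌──┐
    bottom = "\u2514" + "\u2500" * width + "\u2518"  # └──┘
    # Pad every line to the same number of cells so the right edge lines up.
    body = ["\u2502" + line.ljust(width) + "\u2502" for line in lines]
    return "\n".join([top, *body, bottom])

# Ten characters of text plus two border characters gives twelve per line.
print(frame(["monospaced", "font"]))
```

In a proportional font the same string still contains twelve characters per line, but the right-hand border no longer falls in a straight column.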
Unicode's box drawing characters and block elements, in table one below, resemble many of the early geometric characters. Exhibited are sample glyphs for the 160 Unicode characters numbered 2500-259F (in hexadecimal) or 9472-9631 (in decimal). Although the standards (box drawing, block elements) do indicate the general appearance of the glyphs, implementors are granted flexibility in rendering them.
For each character of table one, four items appear:
The gif images are necessary for two reasons. First, many fonts support Unicode poorly. Second, some alternative glyphs that are not found in Unicode will appear later in the discussion, and the gifs allow all to be drawn in the same format.
In the charts of the Unicode standard the glyphs are shown as black on a white background, but this report uses instead a gray background to show how the figure meets the cell boundary. In the actual display, each pixel is either fully on or fully off, because Unicode does not support intermediate levels of brightness. In order to make details clearer, the gifs below (mostly 24 × 48 pixels) are considerably larger than what is usually found on computer displays.
Table one. | ||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Box drawing | 2500 9472 ─ | 2501 9473 ━ | 2502 9474 │ | 2503 9475 ┃ | 2504 9476 ┄ | 2505 9477 ┅ | 2506 9478 ┆ | 2507 9479 ┇ | 2508 9480 ┈ | 2509 9481 ┉ | 250A 9482 ┊ | 250B 9483 ┋ | 250C 9484 ┌ | 250D 9485 ┍ | 250E 9486 ┎ | 250F 9487 ┏ |
2510 9488 ┐ | 2511 9489 ┑ | 2512 9490 ┒ | 2513 9491 ┓ | 2514 9492 └ | 2515 9493 ┕ | 2516 9494 ┖ | 2517 9495 ┗ | 2518 9496 ┘ | 2519 9497 ┙ | 251A 9498 ┚ | 251B 9499 ┛ | 251C 9500 ├ | 251D 9501 ┝ | 251E 9502 ┞ | 251F 9503 ┟ | |
2520 9504 ┠ | 2521 9505 ┡ | 2522 9506 ┢ | 2523 9507 ┣ | 2524 9508 ┤ | 2525 9509 ┥ | 2526 9510 ┦ | 2527 9511 ┧ | 2528 9512 ┨ | 2529 9513 ┩ | 252A 9514 ┪ | 252B 9515 ┫ | 252C 9516 ┬ | 252D 9517 ┭ | 252E 9518 ┮ | 252F 9519 ┯ | |
2530 9520 ┰ | 2531 9521 ┱ | 2532 9522 ┲ | 2533 9523 ┳ | 2534 9524 ┴ | 2535 9525 ┵ | 2536 9526 ┶ | 2537 9527 ┷ | 2538 9528 ┸ | 2539 9529 ┹ | 253A 9530 ┺ | 253B 9531 ┻ | 253C 9532 ┼ | 253D 9533 ┽ | 253E 9534 ┾ | 253F 9535 ┿ | |
2540 9536 ╀ | 2541 9537 ╁ | 2542 9538 ╂ | 2543 9539 ╃ | 2544 9540 ╄ | 2545 9541 ╅ | 2546 9542 ╆ | 2547 9543 ╇ | 2548 9544 ╈ | 2549 9545 ╉ | 254A 9546 ╊ | 254B 9547 ╋ | 254C 9548 ╌ | 254D 9549 ╍ | 254E 9550 ╎ | 254F 9551 ╏ | |
2550 9552 ═ | 2551 9553 ║ | 2552 9554 ╒ | 2553 9555 ╓ | 2554 9556 ╔ | 2555 9557 ╕ | 2556 9558 ╖ | 2557 9559 ╗ | 2558 9560 ╘ | 2559 9561 ╙ | 255A 9562 ╚ | 255B 9563 ╛ | 255C 9564 ╜ | 255D 9565 ╝ | 255E 9566 ╞ | 255F 9567 ╟ | |
2560 9568 ╠ | 2561 9569 ╡ | 2562 9570 ╢ | 2563 9571 ╣ | 2564 9572 ╤ | 2565 9573 ╥ | 2566 9574 ╦ | 2567 9575 ╧ | 2568 9576 ╨ | 2569 9577 ╩ | 256A 9578 ╪ | 256B 9579 ╫ | 256C 9580 ╬ | 256D 9581 ╭ | 256E 9582 ╮ | 256F 9583 ╯ | |
2570 9584 ╰ | 2571 9585 ╱ | 2572 9586 ╲ | 2573 9587 ╳ | 2574 9588 ╴ | 2575 9589 ╵ | 2576 9590 ╶ | 2577 9591 ╷ | 2578 9592 ╸ | 2579 9593 ╹ | 257A 9594 ╺ | 257B 9595 ╻ | 257C 9596 ╼ | 257D 9597 ╽ | 257E 9598 ╾ | 257F 9599 ╿ | |
Block elements | 2580 9600 ▀ | 2581 9601 ▁ | 2582 9602 ▂ | 2583 9603 ▃ | 2584 9604 ▄ | 2585 9605 ▅ | 2586 9606 ▆ | 2587 9607 ▇ | 2588 9608 █ | 2589 9609 ▉ | 258A 9610 ▊ | 258B 9611 ▋ | 258C 9612 ▌ | 258D 9613 ▍ | 258E 9614 ▎ | 258F 9615 ▏ |
2590 9616 ▐ | 2591 9617 ░ | 2592 9618 ▒ | 2593 9619 ▓ | 2594 9620 ▔ | 2595 9621 ▕ | 2596 9622 ▖ | 2597 9623 ▗ | 2598 9624 ▘ | 2599 9625 ▙ | 259A 9626 ▚ | 259B 9627 ▛ | 259C 9628 ▜ | 259D 9629 ▝ | 259E 9630 ▞ | 259F 9631 ▟ |
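The code ranges of table one can be reproduced programmatically. This is a minimal sketch using Python's standard `unicodedata` module; the helper name `describe` is hypothetical, and the glyphs actually shown depend on the font in use.

```python
import unicodedata

FIRST, LAST = 0x2500, 0x259F  # 9472 and 9631 in decimal

codes = range(FIRST, LAST + 1)
box_drawing = [chr(cp) for cp in codes if cp <= 0x257F]
block_elements = [chr(cp) for cp in codes if cp >= 0x2580]

def describe(ch: str) -> str:
    """Hexadecimal code, decimal code, character and Unicode name."""
    return f"{ord(ch):04X} {ord(ch)} {ch} {unicodedata.name(ch)}"

print(len(codes), len(box_drawing), len(block_elements))
for ch in box_drawing[:4]:
    print(describe(ch))
```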
The display device should insert no gap between characters, or breaks will appear in what ought to be continuous lines.
Wrong | Right |
---|---|
Unicode does not attempt to supply every geometric character that might be needed for every purpose; rather, it provides a moderately sized set that fulfills many needs. Section 15.7 of the standard gives a rationale. Applications requiring fine control of appearance will need to use displays that allow control of individual pixels.
As an example of the variation in acceptable glyph appearance, the next table shows ten ways in which character 256C is rendered in the bitmapped fonts of Microsoft Windows XP SP2. Note that some are asymmetric.
Character size in pixels | Width | 4 | 5 | 6 | 7 | 8 | 8 | 10 | 12 | 16 | 16 |
---|---|---|---|---|---|---|---|---|---|---|---|
Height | 6 | 12 | 8 | 12 | 8 | 12 | 18 | 16 | 8 | 12 | |
Glyph magnified 2x |
Eighty of the characters in table one, together with a space, form an orthogonal set of 81 box drawing characters, here called the plain-heavy system. Orthogonal means that each of the four arms can be selected separately to be absent, plain, or heavy, giving 3 × 3 × 3 × 3 = 81 combinations. The combination with all four arms absent is a space; from many options, character 0020 was chosen to complete the chart.
This and later tables often contain a gif image and hexadecimal code only.
Table two. | |||||||||
---|---|---|---|---|---|---|---|---|---|
| | Col 1 | Col 2 | Col 3 | Col 4 | Col 5 | Col 6 | Col 7 | Col 8 | Col 9 |
Row 1 | 0020 | 2574 | 2578 | 2577 | 2510 | 2511 | 257B | 2512 | 2513 |
Row 2 | 2576 | 2500 | 257E | 250C | 252C | 252D | 250E | 2530 | 2531 |
Row 3 | 257A | 257C | 2501 | 250D | 252E | 252F | 250F | 2532 | 2533 |
Row 4 | 2575 | 2518 | 2519 | 2502 | 2524 | 2525 | 257D | 2527 | 252A |
Row 5 | 2514 | 2534 | 2535 | 251C | 253C | 253D | 251F | 2541 | 2545 |
Row 6 | 2515 | 2536 | 2537 | 251D | 253E | 253F | 2522 | 2546 | 2548 |
Row 7 | 2579 | 251A | 251B | 257F | 2526 | 2529 | 2503 | 2528 | 252B |
Row 8 | 2516 | 2538 | 2539 | 251E | 2540 | 2543 | 2520 | 2542 | 2549 |
Row 9 | 2517 | 253A | 253B | 2521 | 2544 | 2547 | 2523 | 254A | 254B |
Directory of table two. | |||
---|---|---|---|
Arm | Absent | Plain | Heavy |
North arm | Rows 1, 2, 3 | Rows 4, 5, 6 | Rows 7, 8, 9 |
East arm | Rows 1, 4, 7 | Rows 2, 5, 8 | Rows 3, 6, 9 |
South arm | Columns 1, 2, 3 | Columns 4, 5, 6 | Columns 7, 8, 9 |
West arm | Columns 1, 4, 7 | Columns 2, 5, 8 | Columns 3, 6, 9 |
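The directory implies a simple indexing rule: the north and east arms select the row, and the south and west arms select the column. The sketch below encodes table two as listed above and looks up a character from its four arm weights. The names `box_char`, `ABSENT`, `PLAIN` and `HEAVY` are illustrative only.

```python
ABSENT, PLAIN, HEAVY = 0, 1, 2

# Table two, row by row, as hexadecimal code points.
TABLE_TWO = [
    "0020 2574 2578 2577 2510 2511 257B 2512 2513",
    "2576 2500 257E 250C 252C 252D 250E 2530 2531",
    "257A 257C 2501 250D 252E 252F 250F 2532 2533",
    "2575 2518 2519 2502 2524 2525 257D 2527 252A",
    "2514 2534 2535 251C 253C 253D 251F 2541 2545",
    "2515 2536 2537 251D 253E 253F 2522 2546 2548",
    "2579 251A 251B 257F 2526 2529 2503 2528 252B",
    "2516 2538 2539 251E 2540 2543 2520 2542 2549",
    "2517 253A 253B 2521 2544 2547 2523 254A 254B",
]
GRID = [[chr(int(code, 16)) for code in row.split()] for row in TABLE_TWO]

def box_char(north=ABSENT, east=ABSENT, south=ABSENT, west=ABSENT) -> str:
    """Plain-heavy character with the given weight on each arm."""
    row = 3 * north + east     # north picks the band of rows, east the row within it
    column = 3 * south + west  # south picks the band of columns, west the column within it
    return GRID[row][column]

print(box_char(east=PLAIN, south=PLAIN))              # top-left corner
print(box_char(north=HEAVY, south=HEAVY, east=PLAIN)) # heavy vertical with plain right arm
```

Treating the four weights as digits of a base-three number is one way to see why the set has exactly 81 members.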
The 40 box drawing characters of code page 437 are retained in Unicode, but this plain-double system is not expanded into orthogonality. The sample glyphs in this report render a plain stroke of the plain-double system no differently from a plain stroke of the plain-heavy system. Below each image are its Unicode number (in hexadecimal) and its number (in decimal) from code page 437.
Table three. | ||||||
---|---|---|---|---|---|---|
250C 218 | 252C 194 | 2510 191 | 2500 196 | 2553 214 | 2565 210 | 2556 183 |
251C 195 | 253C 197 | 2524 180 | 255F 199 | 256B 215 | 2562 182 | |
2514 192 | 2534 193 | 2518 217 | 2559 211 | 2568 208 | 255C 189 | |
2502 179 | 0020 032 | 2551 186 | ||||
2552 213 | 2564 209 | 2555 184 | 2550 205 | 2554 201 | 2566 203 | 2557 187 |
255E 198 | 256A 216 | 2561 181 | 2560 204 | 256C 206 | 2563 185 | |
2558 212 | 2567 207 | 255B 190 | 255A 200 | 2569 202 | 255D 188 |
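The correspondence between code page 437 numbers and Unicode numbers in table three can be checked with Python's built-in `cp437` codec. This is only a sketch; it assumes the codec's mapping agrees with the table, which the few spot checks in the code are consistent with.

```python
# Code page 437 places its box drawing characters at 179 through 218.
cp437_codes = range(179, 219)
box_chars = bytes(cp437_codes).decode("cp437")

for code, ch in zip(cp437_codes, box_chars):
    print(f"{ord(ch):04X} {code} {ch}")

# Converting in the other direction: Unicode character to code page 437 number.
print("\u256C".encode("cp437")[0])
```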
A number preceded by the letter 'N' identifies a character that is not part of Unicode, but included in this report for purposes of comparison.
Some characters in the plain-heavy system | 2578 | 257F | 2527 | 2532 | 253D | 2545 | 2549 |
---|---|---|---|---|---|---|---|
Equivalent plain-double characters not in Unicode | N-11 | N-12 | N-13 | N-14 | N-15 | N-16 | N-17 |
Unicode also does not have box-drawing characters that mix heavy arms and double arms:
N-18 | N-19 |
Meanwhile, the curved figures offer neither a heavy nor a double option:
256D | 256E | 256F | 2570 |
Characters 2571, 2572 and 2573 involve diagonals.
2571 | 2572 | 2573 |
As with the curved characters, there are neither double strokes nor heavy strokes. Many combinations of arms do not appear:
N-21 | N-22 | N-23 | N-24 |
Strokes of the diagonal characters meet the edges of the bounding box at the corners, not the midpoints of the sides. In this regard, they are incompatible with the other 125 box-drawing characters, as strokes will not connect. This could have been addressed by making the diagonal characters similar to the examples below, which resemble characters 44 through 4B (hexadecimal) of Videotex:
N-25 | N-26 | N-27 | N-28 | N-29 |
In this report, glyphs for the dashed lines are asymmetric; one reason is that few displays will have enough pixels to allow all the dashed figures to be symmetric.
254C | 2504 | 2508 | 254E | 2506 | 250A |
254D | 2505 | 2509 | 254F | 2507 | 250B |
Although plain and heavy lines are supported, double lines are not.
N-31 | N-32 | N-33 | N-34 | N-35 | N-36 |
Block elements 2581-258F suggest that the definers of Unicode had in mind that a character cell would be eight pixels high and eight pixels wide, or some multiple of eight, as illustrated by these examples:
Character | 2581 | 2582 | 2583 | 2589 | 258A | 258B |
---|---|---|---|---|---|---|
Unicode Name | lower one-eighth block | lower one-quarter block | lower three-eighths block | left seven-eighths block | left three-quarters block | left five-eighths block |
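Because the eighth-block characters are numbered in order of size, the character for a given fill level can be computed directly from its code. The sketch below shows this for both families and uses the left-hand family to draw a horizontal bar with a resolution of one eighth of a cell. The function names are hypothetical.

```python
FULL_BLOCK = "\u2588"

def lower_block(eighths: int) -> str:
    """Character filling `eighths`/8 of the cell from the bottom (0 to 8)."""
    if not 0 <= eighths <= 8:
        raise ValueError("eighths must be between 0 and 8")
    return " " if eighths == 0 else chr(0x2580 + eighths)  # 2581 .. 2588

def left_block(eighths: int) -> str:
    """Character filling `eighths`/8 of the cell from the left (0 to 8)."""
    if not 0 <= eighths <= 8:
        raise ValueError("eighths must be between 0 and 8")
    return " " if eighths == 0 else chr(0x2590 - eighths)  # 258F .. 2588

def horizontal_bar(eighths: int) -> str:
    """Bar whose length is `eighths` eighths of a character cell."""
    full, rest = divmod(eighths, 8)
    return FULL_BLOCK * full + (left_block(rest) if rest else "")

for length in (3, 8, 12, 21):
    print(horizontal_bar(length))
```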
The Unicode characters allow drawing at the bottom or left-hand side of the character position. To draw on the top or right, inverse video is required:
Normal video | 2581 | 2582 | 2583 | 2584 | 2585 | 2586 | 2587 |
---|---|---|---|---|---|---|---|
Inverse video | 2581 | 2582 | 2583 | 2584 | 2585 | 2586 | 2587 |
Normal video | 2580 | 2594 |
Normal video | 2589 | 258A | 258B | 258C | 258D | 258E | 258F |
---|---|---|---|---|---|---|---|
Inverse video | 2589 | 258A | 258B | 258C | 258D | 258E | 258F |
Normal video | 2595 | 2590 |
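On a terminal, inverse video is commonly requested with control sequences. The sketch below is an assumption about the output device rather than anything defined by Unicode: it uses the ECMA-48 (ANSI) sequences for reverse video, and the name `drawn_inverted` is hypothetical. Where the tables above list a normal-video character with the same appearance, that character is preferred.

```python
REVERSE_ON = "\x1b[7m"    # ECMA-48 SGR 7: negative image
REVERSE_OFF = "\x1b[27m"  # ECMA-48 SGR 27: positive image

# Inverted blocks that already exist as normal-video characters.
NORMAL_EQUIVALENT = {
    "\u2584": "\u2580",  # lower half       -> upper half
    "\u2587": "\u2594",  # lower 7/8        -> upper 1/8
    "\u258C": "\u2590",  # left half        -> right half
    "\u2589": "\u2595",  # left 7/8         -> right 1/8
}

def drawn_inverted(ch: str) -> str:
    """Text that displays `ch` with drawn and undrawn portions swapped."""
    if ch in NORMAL_EQUIVALENT:
        return NORMAL_EQUIVALENT[ch]
    # Assumes the terminal honors the reverse-video sequences.
    return f"{REVERSE_ON}{ch}{REVERSE_OFF}"

# An upper seven-eighths block has no character of its own.
print(drawn_inverted("\u2581"))
```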
Within each character, the drawn portion and the undrawn portion are each contiguous; hence characters such as these are not included:
Normal video | N-41 | N-42 | N-43 | N-44 |
---|---|---|---|---|
Inverse video | N-45 | N-46 | N-47 | N-48 |
Retained from code page 437 are the shaded characters, each of which consumes the entire area of the cell:
Normal video | 0020 | 2591 | 2592 | 2593 | 2588 |
---|---|---|---|---|---|
Inverse video | 0020 | 2591 | 2592 | 2593 | 2588 |
Normal video | 2588 | 2593 | 2592 | 2591 | 0020 |
At the time that the Unicode standard was being developed, inverse video was available on some, but not all, displays. When inverse video is available, characters 2580, 2588, 2590, 2593 and 2594 are redundant. When, on the other hand, inverse video is not available, an irregular subset of the block element characters is unsupported. The inconsistency arises because Unicode incorporates some older standards in toto, even if those standards are not entirely compatible.
Note also that 259A is the inverse of 259E, 2598 is the inverse of 259F, et cetera.
259A | 2598 | 259D | 2597 | 2596 |
259E | 259F | 2599 | 259B | 259C |
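The inverse relationships among the quadrant characters can be expressed by treating each character as a set of four quadrants. The sketch below assigns one bit to each quadrant, following the Unicode character names, and takes the complement. The bit assignment and the function name are arbitrary choices made for illustration.

```python
UL, UR, LL, LR = 1, 2, 4, 8  # upper left, upper right, lower left, lower right

QUADRANTS = {
    " ": 0,
    "\u2598": UL, "\u259D": UR, "\u2596": LL, "\u2597": LR,
    "\u2580": UL | UR,       # upper half
    "\u2584": LL | LR,       # lower half
    "\u258C": UL | LL,       # left half
    "\u2590": UR | LR,       # right half
    "\u259A": UL | LR,
    "\u259E": UR | LL,
    "\u2599": UL | LL | LR,
    "\u259B": UL | UR | LL,
    "\u259C": UL | UR | LR,
    "\u259F": UR | LL | LR,
    "\u2588": UL | UR | LL | LR,  # full block
}
BY_MASK = {mask: ch for ch, mask in QUADRANTS.items()}

def inverse_quadrants(ch: str) -> str:
    """Character whose drawn quadrants are exactly those undrawn in `ch`."""
    return BY_MASK[QUADRANTS[ch] ^ 0b1111]

for ch in "\u259A\u2598\u259D\u2597\u2596":
    print(ch, inverse_quadrants(ch))
```

All sixteen combinations of quadrants are available, so the complement always exists within this group.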
Unicode has a wealth of other symbols similar to those discussed so far, including the terminal graphics characters:
23B8 ⎸ | 2502 │ | 23B9 ⎹ |
23BA ⎺ | 23BB ⎻ | 2500 ─ | 23BC ⎼ | 23BD ⎽ |
In some renderings, the glyph of 23B8 might not differ from that of 258F, 23BD from 2581, et cetera.
Certain of the Unicode geometric shapes are shown below. As drawn, each triangle spans the full width of the cell, but not the full height. The standard is silent on whether a narrower or taller triangle is acceptable.
25E2 ◢ | 25E3 ◣ | 25E4 ◤ | 25E5 ◥ |
Something that would combine well with the box elements is a tall triangle:
25E2 | 25E3 | 25E4 | 25E5 |