Critique of Unicode box-drawing and block-element characters.
Version of 28 July 2008
Dave Barber's other pages.

Unicode defines a huge set of characters for use with computer displays and printers. Some characters denote entire words (as in many Asian languages), alphabetic letters (as in many European languages), digits, and punctuation; while other characters have sundry purposes. This report examines the Unicode characters in two categories: box drawing and block elements, which here are for convenience aggregated as geometric.


The hardware and software of most modern general-purpose computers support control of individual display pixels in a rectangular array known as a raster, and this capability is practically required for a graphical user interface. However, embedded computers often entail relatively simple output, and might perform satisfactorily with a simpler display constructed as a two-dimensional array of perhaps a thousand characters rather than a million pixels. Also, many computers built before 1990 used character-based displays because of the higher costs of hardware at that time; a standard size was 25 lines of 80 characters each, using a monospaced font.

One symbol can be displayed within each character cell, which is typically implemented as a small array of pixels. On some machines, the height of a pixel exceeds its width; this design reduces the number of pixels while preserving the taller-than-wide aspect exhibited by most letters and digits in traditional typography. However, there is a tendency in modern equipment to make pixels square.

A practical minimum for text in the English language is a character size of eight pixels wide by eight pixels high; some other languages need more pixels for legibility. In the following example, which uses tall pixels, each character resides in such an eight-by-eight region, but the rightmost column and bottommost row of pixels are left unused to provide spacing between characters. Hence the effective size is seven by seven. For this illustration, pixels not used by the characters are rendered in a checkerboard pattern to show the extent of each.

If only Arabic numerals are to be shown, a width of four pixels and a height of six pixels (with a usable size of three by five) will suffice. In the following example, square pixels happen to be used.

Character-based displays can be very efficient because the geometric details of how to draw each character are stored in the display device itself, and are compact: the pattern for each character would require no more than eight bytes of storage for the alphabetic example above, and three bytes for the numeric. To make a character appear, the controlling computer need send merely a brief code, often only eight bits, and then the display figures out how to draw the character. A tradeoff is that the user often has limited choice of factors that affect the appearance of the font, such as face, size or color.
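The storage figures above follow from allotting one bit to each pixel. The arithmetic can be sketched in Python as below; the helper names `glyph_bytes` and `pack_glyph`, the row-major packing order, and the sample digit pattern are assumptions made for this illustration, not details of any particular display.

```python
def glyph_bytes(width: int, height: int) -> int:
    """Bytes needed for a one-bit-per-pixel glyph packed without padding rows."""
    bits = width * height
    return (bits + 7) // 8  # round up to a whole number of bytes


def pack_glyph(rows: list[str]) -> bytes:
    """Pack rows of '#' (on) and '.' (off) into bytes, most significant bit first."""
    bits = "".join("1" if ch == "#" else "0" for row in rows for ch in row)
    bits += "0" * (-len(bits) % 8)  # pad the final byte with off pixels
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


# A hypothetical 4-by-6 cell for the digit 1 (usable area three by five);
# the pattern is invented for this sketch, not copied from a real font.
DIGIT_ONE = [
    ".#..",
    "##..",
    ".#..",
    ".#..",
    "###.",
    "....",
]

print(glyph_bytes(8, 8))             # eight-by-eight alphabetic cell
print(glyph_bytes(4, 6))             # four-by-six numeric cell
print(pack_glyph(DIGIT_ONE).hex())   # the packed pattern as hexadecimal
```

An eight-by-eight cell holds 64 bits and a four-by-six cell holds 24 bits, which is where the eight-byte and three-byte figures come from.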

More generally, nothing in Unicode requires that the display be organized into pixels; some other technology could instead be used as long as the characters have a suitable appearance. For instance vector-based displays have been built, but are rare today; for hard copy, plotters are more frequently seen.


The earliest devices generally had light-colored (white and green were common) characters on a dark background. The IBM 3279 display, frequently seen in the 1980s, was among the first to give a choice of colors, producing red, green, blue, yellow, cyan, magenta and white figures on a black background. Each character could be a different color, but colors were never mixed within a character.

Many displays have offered:

Meanwhile, printers will almost always yield dark characters on a paper-white background.


By the early 1980s, many desktop computers were extending their character sets by adding a variety of geometric characters. The most basic were straight lines, right angles, tees and crosses, but many other shapes were included. These figures could be arranged on the display to divide the screen into sections or to draw simple pictures. Early examples are PETSCII and code page 437. Videotex, an early communications service, also defined some geometric characters.

Motivating the establishment of Unicode was a lack of standardization among manufacturers' character sets, not only in the geometric characters but also in Latin alphabet letters with diacritical marks (such as accents and umlauts), and in non-Latin alphabets (such as Greek and Cyrillic). Moreover, in some languages (such as Chinese) words cannot be decomposed into sequences of letters, but rather are integral symbols; Unicode addresses these also.


Here is a demonstration of why a monospaced font is essential when using these geometric characters. This first screenshot has twelve characters per line, and is rendered in the font Deja Vu Sans Mono, wherein all characters are the same width:

In the well-known Times New Roman font each character is of its natural width, which on the one hand makes for easy reading of large blocks of prose, but which on the other hand makes for difficulty in aligning columns of text. This font is used in the next screenshot, where alignment of the box drawing characters fails severely even though precisely the same characters are used as in the previous screenshot:

Adding or deleting characters gives at best an approximate fix:





A display that is arranged as a grid of characters automatically gives monospacing and averts this problem.
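As a rough illustration of the point, the Python sketch below builds a frame in which every line contains exactly twelve characters. The function name `framed` and the sample text are hypothetical, and the output does not reproduce the screenshots. On a character grid, or in any monospaced font, equal character counts suffice for the vertical strokes to line up; in a proportional font they do not.

```python
def framed(lines: list[str], width: int = 12) -> list[str]:
    """Surround text with plain box drawing characters, `width` characters per line."""
    inner = width - 2
    top = "\u250c" + "\u2500" * inner + "\u2510"      # corner, horizontal run, corner
    bottom = "\u2514" + "\u2500" * inner + "\u2518"
    # Each text line is cut or padded to the inner width, then given vertical sides.
    body = ["\u2502" + text[:inner].ljust(inner) + "\u2502" for text in lines]
    return [top, *body, bottom]


for row in framed(["Unicode", "box test"]):
    print(row)
```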


Unicode's box drawing characters and block elements, in table one below, resemble many of the early geometric characters. Exhibited are sample glyphs for the 160 Unicode characters numbered 2500-259F (in hexadecimal) or 9472-9631 (in decimal). Although the standards (box drawing, block elements) do indicate the general appearance of the glyphs, implementors are granted flexibility in rendering them.
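The range and its size can be checked with a few lines of Python. This is only a sketch: it relies on the standard `unicodedata` module for the character names, and the function name `geometric_characters` is an invention of this example.

```python
import unicodedata

FIRST, LAST = 0x2500, 0x259F  # box drawing starts at 2500, block elements end at 259F


def geometric_characters():
    """Yield (code point, character, Unicode name) for the whole range."""
    for code in range(FIRST, LAST + 1):
        char = chr(code)
        yield code, char, unicodedata.name(char)


rows = list(geometric_characters())
print(len(rows), "characters")
for code, char, name in rows[:4]:
    # hexadecimal code, decimal code, the character itself, and its name
    print(f"{code:04X} {code} {char} {name}")
```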

For each character of table one, four items appear:

The gif images are necessary for two reasons. First, many fonts support Unicode poorly. Second, some alternative glyphs that are not found in Unicode will appear later in the discussion, and the gifs allow all to be drawn in the same format.

In the charts of the Unicode standard the glyphs are shown as black on a white background, but this report uses instead a gray background to show how the figure meets the cell boundary. In the actual display, each pixel is either fully on or fully off, because Unicode does not support intermediate levels of brightness. In order to make details clearer, the gifs below (mostly 24 × 48 pixels) are considerably larger than what is usually found on computer displays.

Table one.
Each entry gives the hexadecimal code followed by the decimal code.

Box drawing
2500 9472    2501 9473    2502 9474    2503 9475
2504 9476    2505 9477    2506 9478    2507 9479
2508 9480    2509 9481    250A 9482    250B 9483
250C 9484    250D 9485    250E 9486    250F 9487
2510 9488    2511 9489    2512 9490    2513 9491
2514 9492    2515 9493    2516 9494    2517 9495
2518 9496    2519 9497    251A 9498    251B 9499
251C 9500    251D 9501    251E 9502    251F 9503
2520 9504    2521 9505    2522 9506    2523 9507
2524 9508    2525 9509    2526 9510    2527 9511
2528 9512    2529 9513    252A 9514    252B 9515
252C 9516    252D 9517    252E 9518    252F 9519
2530 9520    2531 9521    2532 9522    2533 9523
2534 9524    2535 9525    2536 9526    2537 9527
2538 9528    2539 9529    253A 9530    253B 9531
253C 9532    253D 9533    253E 9534    253F 9535
2540 9536    2541 9537    2542 9538    2543 9539
2544 9540    2545 9541    2546 9542    2547 9543
2548 9544    2549 9545    254A 9546    254B 9547
254C 9548    254D 9549    254E 9550    254F 9551
2550 9552    2551 9553    2552 9554    2553 9555
2554 9556    2555 9557    2556 9558    2557 9559
2558 9560    2559 9561    255A 9562    255B 9563
255C 9564    255D 9565    255E 9566    255F 9567
2560 9568    2561 9569    2562 9570    2563 9571
2564 9572    2565 9573    2566 9574    2567 9575
2568 9576    2569 9577    256A 9578    256B 9579
256C 9580    256D 9581    256E 9582    256F 9583
2570 9584    2571 9585    2572 9586    2573 9587
2574 9588    2575 9589    2576 9590    2577 9591
2578 9592    2579 9593    257A 9594    257B 9595
257C 9596    257D 9597    257E 9598    257F 9599

Block elements
2580 9600    2581 9601    2582 9602    2583 9603
2584 9604    2585 9605    2586 9606    2587 9607
2588 9608    2589 9609    258A 9610    258B 9611
258C 9612    258D 9613    258E 9614    258F 9615
2590 9616    2591 9617    2592 9618    2593 9619
2594 9620    2595 9621    2596 9622    2597 9623
2598 9624    2599 9625    259A 9626    259B 9627
259C 9628    259D 9629    259E 9630    259F 9631

The display device should insert no gap between characters, or breaks will appear in what ought to be continuous lines.

Wrong    Right

Unicode does not attempt to supply every geometric character that might be needed for every purpose; rather, it provides a moderately sized set that fulfills many needs. Section 15.7 of the standard gives a rationale. Applications requiring fine control of appearance will need to use displays that allow control of individual pixels.


As an example of the variation in acceptable glyph appearance, the next table shows ten ways in which character 256C is rendered in the bitmapped fonts of Microsoft Windows XP SP2. Note that some are asymmetric.

Character size in pixels
Width     4   5   6   7   8   8  10  12  16  16
Height    6  12   8  12   8  12  18  16   8  12
Glyph magnified 2x


Eighty of the characters in table one, together with a space character, form an orthogonal set of 81 box drawing characters, here called the plain-heavy system. Orthogonal means that each of the four arms can independently be absent, plain, or heavy, giving 3 × 3 × 3 × 3 = 81 combinations. The space character, which is needed for the combination in which all four arms are absent, is not among the characters of table one; character 0020 was chosen from many options.

This and later tables often contain a gif image and hexadecimal code only.

Table two.
        Col 1  Col 2  Col 3  Col 4  Col 5  Col 6  Col 7  Col 8  Col 9
Row 1   0020   2574   2578   2577   2510   2511   257B   2512   2513
Row 2   2576   2500   257E   250C   252C   252D   250E   2530   2531
Row 3   257A   257C   2501   250D   252E   252F   250F   2532   2533
Row 4   2575   2518   2519   2502   2524   2525   257D   2527   252A
Row 5   2514   2534   2535   251C   253C   253D   251F   2541   2545
Row 6   2515   2536   2537   251D   253E   253F   2522   2546   2548
Row 7   2579   251A   251B   257F   2526   2529   2503   2528   252B
Row 8   2516   2538   2539   251E   2540   2543   2520   2542   2549
Row 9   2517   253A   253B   2521   2544   2547   2523   254A   254B

Directory of table two.
            Absent             Plain              Heavy
North arm   Rows 1, 2, 3       Rows 4, 5, 6       Rows 7, 8, 9
East arm    Rows 1, 4, 7       Rows 2, 5, 8       Rows 3, 6, 9
South arm   Columns 1, 2, 3    Columns 4, 5, 6    Columns 7, 8, 9
West arm    Columns 1, 4, 7    Columns 2, 5, 8    Columns 3, 6, 9
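The directory amounts to reading the row and column numbers as two-digit numbers in base three. A minimal Python sketch of the lookup follows, with the grid transcribed from table two; the names `ABSENT`, `PLAIN`, `HEAVY` and `box_char` are assumptions of this example and belong to no standard library.

```python
ABSENT, PLAIN, HEAVY = 0, 1, 2

# Table two, one text line per row, as hexadecimal code points.
TABLE_TWO = """
0020 2574 2578 2577 2510 2511 257B 2512 2513
2576 2500 257E 250C 252C 252D 250E 2530 2531
257A 257C 2501 250D 252E 252F 250F 2532 2533
2575 2518 2519 2502 2524 2525 257D 2527 252A
2514 2534 2535 251C 253C 253D 251F 2541 2545
2515 2536 2537 251D 253E 253F 2522 2546 2548
2579 251A 251B 257F 2526 2529 2503 2528 252B
2516 2538 2539 251E 2540 2543 2520 2542 2549
2517 253A 253B 2521 2544 2547 2523 254A 254B
"""
GRID = [
    [int(cell, 16) for cell in line.split()]
    for line in TABLE_TWO.strip().splitlines()
]


def box_char(north: int, east: int, south: int, west: int) -> str:
    """Return the plain-heavy character whose four arms have the given weights."""
    row = 3 * north + east   # north selects the band of three rows, east the row within it
    col = 3 * south + west   # south selects the band of three columns, west the column within it
    return chr(GRID[row][col])


# A heavy top edge: corner, horizontal stroke, corner.
print(box_char(ABSENT, HEAVY, HEAVY, ABSENT)
      + box_char(ABSENT, HEAVY, ABSENT, HEAVY)
      + box_char(ABSENT, ABSENT, HEAVY, HEAVY))
```

Because every combination of arm weights is present, a program can compute the character for any junction from the weights of the four strokes meeting there, without special cases.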


Unicode retains the 40 box drawing characters of code page 437, here called the plain-double system, but does not expand the system into orthogonality. The sample glyphs in this report render a plain stroke of the plain-double system no differently from a plain stroke of the plain-heavy system. Below each image are its Unicode number (in hexadecimal) and its number (in decimal) from code page 437.

Table three.
Unicode (hexadecimal)   Code page 437 (decimal)
250C   218
252C   194
2510   191
2500   196
2553   214
2565   210
2556   183
251C   195
253C   197
2524   180
255F   199
256B   215
2562   182
2514   192
2534   193
2518   217
2559   211
2568   208
255C   189
2502   179
0020   032
2551   186
2552   213
2564   209
2555   184
2550   205
2554   201
2566   203
2557   187
255E   198
256A   216
2561   181
2560   204
256C   206
2563   185
2558   212
2567   207
255B   190
255A   200
2569   202
255D   188
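The correspondence in table three can be cross-checked against the `cp437` codec that ships with Python. The sketch below assumes that codec follows the usual code page 437 mapping, in which the 40 box drawing characters occupy the consecutive byte values 179 through 218; the function name `cp437_box_drawing` is hypothetical.

```python
def cp437_box_drawing() -> dict[int, int]:
    """Map code page 437 byte values 179..218 to Unicode code points."""
    return {
        byte: ord(bytes([byte]).decode("cp437"))
        for byte in range(179, 219)
    }


mapping = cp437_box_drawing()
for byte, code in mapping.items():
    # Unicode number in hexadecimal, then the code page 437 number in decimal
    print(f"{code:04X} {byte}")
```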

A number preceded by the letter 'N' identifies a character that is not part of Unicode, but included in this report for purposes of comparison.

Some characters in the plain-heavy system            2578  257F  2527  2532  253D  2545  2549
Equivalent plain-double characters not in Unicode    N-11  N-12  N-13  N-14  N-15  N-16  N-17

Unicode also does not have box-drawing characters that mix heavy arms and double arms:


N-18  N-19

Meanwhile, the curved figures offer neither a heavy nor a double option:


256D  256E  256F  2570


Characters 2571, 2572 and 2573 involve diagonals.


2571  2572  2573

As with the curved characters, there are neither double strokes nor heavy strokes. Many combinations of arms do not appear:


N-21  N-22  N-23  N-24

Strokes of the diagonal characters meet the edges of the bounding box at the corners, not the midpoints of the sides. In this regard, they are incompatible with the other 125 box-drawing characters, as strokes will not connect. This could have been addressed by making the diagonal characters similar to the examples below, which resemble characters 44 through 4B (hexadecimal) of Videotex:


N-25  N-26  N-27  N-28  N-29


In this report, glyphs for the dashed lines are asymmetric; one reason is that few displays will have enough pixels to allow all the dashed figures to be symmetric.


254C  2504  2508  254E  2506  250A  254D  2505  2509  254F  2507  250B

Although plain and heavy dashed lines are supported, double dashed lines are not.


N-31  N-32  N-33  N-34  N-35  N-36

Block elements 2581-258F suggest that the definers of Unicode had in mind that a character cell would be eight pixels high and eight pixels wide, or some multiple of eight, as illustrated by these examples:

Character   Unicode name
2581        lower one-eighth block
2582        lower one-quarter block
2583        lower three-eighths block
2589        left seven-eighths block
258A        left three-quarters block
258B        left five-eighths block
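Since the partial blocks step in eighths of a cell, their code points can be computed rather than looked up. The Python sketch below shows one way; the function names `lower_block`, `left_block` and `bar` are hypothetical, and the horizontal bar is merely one possible use.

```python
def lower_block(eighths: int) -> str:
    """Character filling the bottom `eighths` eighths of a cell (0 to 8)."""
    if not 0 <= eighths <= 8:
        raise ValueError("eighths must be between 0 and 8")
    # 2581 is the lower one-eighth block; 2588 is the full block.
    return " " if eighths == 0 else chr(0x2580 + eighths)


def left_block(eighths: int) -> str:
    """Character filling the leftmost `eighths` eighths of a cell (0 to 8)."""
    if not 0 <= eighths <= 8:
        raise ValueError("eighths must be between 0 and 8")
    # The left blocks run in the opposite direction: 258F is one eighth, 2588 is full.
    return " " if eighths == 0 else chr(0x2590 - eighths)


def bar(value: float, cells: int) -> str:
    """Horizontal bar for a value between 0 and 1, accurate to one eighth of a cell."""
    total = round(value * cells * 8)
    full, rest = divmod(total, 8)
    text = "\u2588" * full + (left_block(rest) if rest else "")
    return text.ljust(cells)


for value in (0.1, 0.35, 0.5, 0.9):
    print(f"{value:4.2f} |{bar(value, 10)}|")
```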

The Unicode characters allow drawing at the bottom or left-hand side of the character position. To draw on the top or right, inverse video is required:

Normal video     2581  2582  2583  2584  2585  2586  2587
Inverse video    2581  2582  2583  2584  2585  2586  2587
Normal video                       2580              2594

Normal video     2589  258A  258B  258C  258D  258E  258F
Inverse video    2589  258A  258B  258C  258D  258E  258F
Normal video     2595              2590
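A sketch in Python of how software might choose between a native character and inverse video follows. It assumes a terminal that honors the ANSI escape sequences for reverse video (SGR 7 to switch it on, SGR 27 to switch it off); the function names `upper_block` and `right_block` are hypothetical.

```python
INVERSE_ON = "\x1b[7m"    # ANSI SGR 7: swap foreground and background (assumed supported)
INVERSE_OFF = "\x1b[27m"  # ANSI SGR 27: return to normal video


def upper_block(eighths: int) -> str:
    """Text that fills the top `eighths` eighths of a cell (1 to 7)."""
    if not 1 <= eighths <= 7:
        raise ValueError("eighths must be between 1 and 7")
    if eighths == 4:
        return "\u2580"   # upper half block exists in normal video
    if eighths == 1:
        return "\u2594"   # upper one-eighth block exists in normal video
    lower = chr(0x2580 + (8 - eighths))   # the complementary lower block
    return INVERSE_ON + lower + INVERSE_OFF


def right_block(eighths: int) -> str:
    """Text that fills the rightmost `eighths` eighths of a cell (1 to 7)."""
    if not 1 <= eighths <= 7:
        raise ValueError("eighths must be between 1 and 7")
    if eighths == 4:
        return "\u2590"   # right half block exists in normal video
    if eighths == 1:
        return "\u2595"   # right one-eighth block exists in normal video
    left = chr(0x2590 - (8 - eighths))    # the complementary left block
    return INVERSE_ON + left + INVERSE_OFF


print("".join(upper_block(n) for n in range(1, 8)))
print("".join(right_block(n) for n in range(1, 8)))
```

Only the half and one-eighth sizes can be drawn at the top or right without inverse video; the other sizes fall back to the inverted complement.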

Within each character, the drawn portion and the undrawn portion are each contiguous; hence characters such as these are not included:

Normal video     N-41  N-42  N-43  N-44
Inverse video    N-45  N-46  N-47  N-48

Retained from code page 437 are the shaded characters, each of which consumes the entire area of the cell:

Normal video     0020  2591  2592  2593  2588
Inverse video    0020  2591  2592  2593  2588
Normal video     2588  2593  2592  2591  0020

At the time that the Unicode standard was being developed, inverse video was available on some, but not all, displays. When inverse video is available, characters 2580, 2588, 2590, 2593, 2594 and 2595 are redundant. When, on the other hand, inverse video is not available, an irregular subset of the block element characters is unsupported. The inconsistency arises because Unicode incorporates some older standards in toto, even if those standards are not entirely compatible.

Note also that 259A is the inverse of 259E, 2598 is the inverse of 259F, et cetera.


259A  2598  259D  2597  2596
259E  259F  2599  259B  259C
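The inverse relationship becomes mechanical if each quadrant is treated as one bit of a four-bit mask, so that inverting a character means complementing its mask. In the Python sketch below the bit assignment and the names `QUADRANTS`, `MASKS` and `inverse` are assumptions of this example; the half blocks, the full block and the space are included to complete the sixteen combinations.

```python
# One bit per quadrant: upper left, upper right, lower left, lower right.
UL, UR, LL, LR = 8, 4, 2, 1

QUADRANTS = {
    0: 0x0020,                   # nothing drawn: space
    LL: 0x2596,
    LR: 0x2597,
    UL: 0x2598,
    UL | LL | LR: 0x2599,
    UL | LR: 0x259A,
    UL | UR | LL: 0x259B,
    UL | UR | LR: 0x259C,
    UR: 0x259D,
    UR | LL: 0x259E,
    UR | LL | LR: 0x259F,
    UL | UR: 0x2580,             # upper half block
    LL | LR: 0x2584,             # lower half block
    UL | LL: 0x258C,             # left half block
    UR | LR: 0x2590,             # right half block
    UL | UR | LL | LR: 0x2588,   # full block
}
MASKS = {code: mask for mask, code in QUADRANTS.items()}


def inverse(code: int) -> int:
    """Code point of the character that draws exactly the quadrants `code` leaves blank."""
    return QUADRANTS[MASKS[code] ^ 0b1111]


for code in (0x259A, 0x2598, 0x259D, 0x2597, 0x2596):
    print(f"{code:04X} is the inverse of {inverse(code):04X}")
```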


Unicode has a wealth of other symbols similar to those discussed so far, including the terminal graphics characters:


23B8  2502  23B9

23BA  23BB  2500  23BC  23BD

In some renderings, the glyph of 23B8 might not differ from that of 258F, 23BD from 2581, et cetera.

Certain of the Unicode geometric shapes are shown below. As drawn, each triangle spans the full width of the cell, but not the full height. The standard is silent on whether a narrower or taller triangle is acceptable.


25E2  25E3  25E4  25E5

Something that would combine well with the box elements is a tall triangle:


25E2  25E3  25E4  25E5