The Unicode Consortium provides a file containing annotations on many Unicode characters. This library contains a compiled version of that file so that programs can access the data easily.
The library contains a very large (sparse) array with one entry for each Unicode code point (U+0000 - U+10FFFF). Each entry contains two strings, a name and an annotation. Either or both may be NULL. The library also contains a (much smaller) list of all the Unicode blocks.
struct unicode_block {
    int start, end;
    const char *name;
};

struct unicode_nameannot {
    const char *name, *annot;
};

extern const struct unicode_block UnicodeBlock[124];

#define UNICODE_NAME_MAX 94
#define UNICODE_ANNOT_MAX 372

extern const struct unicode_nameannot * const *const UnicodeNameAnnot[];

/* Index by: UnicodeNameAnnot[(uni>>16)&0x1f][(uni>>8)&0xff][uni&0xff] */

/* At the beginning of lines (after a tab) within the annotation string, a */
/*   * should be replaced by a bullet U+2022 */
/*   x should be replaced by a right arrow U+2192 */
/*   : should be replaced by an equivalent U+224D */
/*   # should be replaced by an approximate U+2245 */
/*   = should remain itself */
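As an illustration of how the block list might be used, here is a minimal sketch that finds the block containing a given code point. It assumes that start and end are both inclusive, which the declarations above do not state. The table sample_blocks and the function find_block are hypothetical names made up for this example; sample_blocks is a three-entry stand-in so the sketch compiles without the library, and its names are not necessarily the strings the library stores.

```c
#include <stddef.h>

/* Same layout as the struct declared in the header. */
struct unicode_block {
    int start, end;
    const char *name;
};

/* Stand-in sample data for illustration only, NOT the library's table. */
static const struct unicode_block sample_blocks[] = {
    { 0x0000, 0x007f, "Basic Latin" },
    { 0x0080, 0x00ff, "Latin-1 Supplement" },
    { 0x0100, 0x017f, "Latin Extended-A" },
};

/* Linear search for the block whose range contains uni.
 * Assumes start and end are inclusive bounds.
 * Returns NULL when no block in the table covers the code point. */
static const struct unicode_block *
find_block(const struct unicode_block *blocks, size_t count, int uni)
{
    for (size_t i = 0; i < count; i++) {
        if (uni >= blocks[i].start && uni <= blocks[i].end)
            return &blocks[i];
    }
    return NULL;
}
```

With the real library one would presumably include <uninameslist.h> instead of redeclaring the struct, and call find_block(UnicodeBlock, 124, uni), where 124 is the array size from the declaration above.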
This package consists of one header file and one library file. The header is <uninameslist.h>.
To find the name of a given Unicode character uni, use
UnicodeNameAnnot[(uni>>16)&0x1f][(uni>>8)&0xff][uni&0xff].name
while the annotation string is:
UnicodeNameAnnot[(uni>>16)&0x1f][(uni>>8)&0xff][uni&0xff].annot
Both strings are in US-ASCII, but the annotation string is intended to be modified slightly before display: any '*' character that immediately follows a tab at the start of a line should be converted to a bullet character (U+2022), and the other marker characters listed in the header comments are replaced in the same way.
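The marker conversion can be sketched as follows. This is one possible reading of the rule, not code from the library: a marker is replaced only when it is the character directly after a tab that is itself the first character of a line. The function names marker_utf8, put and annot_to_utf8 are hypothetical, and emitting UTF-8 is an assumption made for the example; a program using another encoding would substitute its own representation of the four characters.

```c
#include <stddef.h>

/* UTF-8 bytes for each marker; NULL means "leave the character alone". */
static const char *marker_utf8(char c)
{
    switch (c) {
    case '*': return "\xe2\x80\xa2"; /* U+2022 bullet */
    case 'x': return "\xe2\x86\x92"; /* U+2192 right arrow */
    case ':': return "\xe2\x89\x8d"; /* U+224D equivalent */
    case '#': return "\xe2\x89\x85"; /* U+2245 approximate */
    default:  return NULL;           /* '=' and everything else unchanged */
    }
}

/* Append len bytes to out, silently dropping what does not fit.
 * *n always advances, so it ends up as the untruncated length. */
static void put(char *out, size_t outsize, size_t *n,
                const char *s, size_t len)
{
    for (size_t i = 0; i < len; i++, (*n)++) {
        if (*n + 1 < outsize)
            out[*n] = s[i];
    }
}

/* Copy annot into out, replacing a marker that directly follows a tab
 * at the start of a line. Returns the length the full result needs,
 * excluding the terminating NUL. If that is >= outsize, the output was
 * truncated (possibly in the middle of a UTF-8 sequence). */
static size_t annot_to_utf8(const char *annot, char *out, size_t outsize)
{
    size_t n = 0;
    int at_line_start = 1;  /* next character begins a line */
    int after_lead_tab = 0; /* previous character was a line-leading tab */

    for (; *annot != '\0'; annot++) {
        char c = *annot;
        const char *rep = after_lead_tab ? marker_utf8(c) : NULL;

        after_lead_tab = at_line_start && c == '\t';
        at_line_start = (c == '\n');

        if (rep != NULL)
            put(out, outsize, &n, rep, 3); /* each replacement is 3 bytes */
        else
            put(out, outsize, &n, &c, 1);
    }
    if (outsize > 0)
        out[n < outsize ? n : outsize - 1] = '\0';
    return n;
}
```

Each replacement turns one byte into three, so a buffer of 3 * UNICODE_ANNOT_MAX + 1 bytes should be large enough, assuming UNICODE_ANNOT_MAX is the length of the longest annotation string. Since the annot pointer may be NULL, a caller would check it before passing it in.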
Installation and Build instructions
Download the source from SourceForge. Then:
$ tar xf libuninameslist*.tgz
or
$ gunzip libuninameslist*.tgz ; tar xf libuninameslist*.tar
$ cd libuninameslist
$ ./configure
$ make
$ su
# make install
See Also