
USCII: Character Codes With Meaning


USCII
USCII ("you-ski") stands for Universal Semiotic Coding for Information Interchange. It is a system for embedding pictures inside the numbers agreed upon to represent symbols and control codes. I was inspired to create it by the famous Arecibo Message, which attempted to convey humanity's physics knowledge without assuming a cultural context other than math.
For instance, instead of ASCII's encoding of 65 for "A" and 66 for "B", we might consider using the number 15621226033 for "A" and 32801506878 for "B". To see the bitmaps, you must first convert these values into binary, padding with leading zeros to 35 bits:
  • 15621226033 (base 10) = 01110100011000110001111111000110001 (base 2)
  • 32801506878 (base 10) = 11110100011000111110100011000111110 (base 2)
When transmitted in a medium which hints at the significance of a 35-bit pattern, the semiprime nature of 35 suggests decomposing it into the factors 7 and 5. Laying the bits out as 7 rows of 5 produces (small) images of an A and a B. Larger prime factor choices could be used to get more coverage in Unicode, such as a 23x23 font for Chinese.
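A minimal Python sketch of that decomposition, for readers who want to see the pictures without doing the conversion by hand. The function name `render` and the choice of `#` and `.` as pixel characters are made up for illustration; they are not part of any USCII tooling.

```python
def render(value, width=5, height=7):
    """Return the rows of a width x height bitmap packed into an integer."""
    size = width * height
    bits = format(value, "b").zfill(size)  # restore the leading zero bits
    if len(bits) != size:
        raise ValueError("value does not fit in %d bits" % size)
    # Cut the bit string into rows of `width` bits, top row first.
    rows = [bits[i:i + width] for i in range(0, size, width)]
    return [row.replace("1", "#").replace("0", ".") for row in rows]


A = 15621226033
B = int("11110100011000111110100011000111110", 2)

for value in (A, B):
    print("\n".join(render(value)))
    print()
```

Swapping the arguments (`width=7, height=5`) shows the other factorization, which gives noise rather than letters.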

Overview Video

Live Demo

There is an online encoder, as well as a decoder (both currently only for the standard "USCII-5x7-ENGLISH-C0"). The encoder walks you through how the standard works, so just try typing something in the input box and read the explanation there.
The decoder is descriptive as well, and explains the steps. But if you send a message to a friend who hasn't heard of USCII, I'd be interested to know how many quickly figure it out without using the decoder. Any stories about that are welcome.

Arecibo ASCII

I've developed a draft specification of the USCII variation "5x7-ENGLISH-C0". This uses 35 bits per character, and includes printable characters as well as the "C0 control codes". You can read the script that generates it, which contains comments on why I picked the bit patterns.
I've informally named this variant "Arecibo ASCII". That's because it is possible to losslessly convert a stream of conventional ASCII characters into USCII-5x7-ENGLISH-C0 (and back again). It's still a work in progress, but here's the table as it currently stands:
ASCII  Character                 Arecibo ASCII (35-bit binary)
0      Null character            10101010101010101010101010101010101
1      Start of Header           10101101111010110111101011011110101
2      Start of Text             11011111111101111111110111111111011
3      End of Text               11011110111101111011110111111111011
4      End of Transmission       11111111111111111111111111001110011
5      Enquiry                   11111111010000011101101110000010111
6      Acknowledgment            11111101011111111111011101000111111
7      Bell                      11011100011000110001000001111111011
8      Backspace                 11111110111011100000101111101111111
9      Horizontal Tab            00000000000000111101000010000000000
10     Line Feed                 11100001000010000100111110111000100
11     Vertical Tab              00100001000010000100001000000001110
12     Form feed                 11111011100010000000111110111000100
13     Carriage return           00001000010010101101111110110000100
14     Shift Out                 00100101111101111011110111110111100
15     Shift In                  11100111011101111011110111011100100
16     Data Link Escape          11111111110010001110001001111111111
17     Device Control 1          11111101111001110001100111011111111
18     Device Control 2          11011110110101001010011100111010001
19     Device Control 3          11111101011010110101101011010111111
20     Device Control 4          11111100011000110001100011000111111
21     Negative Acknowledgement  11111101011111111111100010111011111
22     Synchronous Idle          11111111111111111111111110101011111
23     End of Trans. Block       11111000000111001010011100000011111
24     Cancel                    10001000000101000100010100000010001
25     End of Medium             11111100010110001010001101000111111
26     Substitute                10001011101111011101110111111111011
27     Escape                    00011001100101011110011100111010001
28     File Separator            10101101011010110101101011010110101
29     Group Separator           11011110111101111011110111101111011
30     Record Separator          11110111101101010010000001001111011
31     Unit Separator            11111111111111111111100111101110111
32     Space                     00000000000000000000000000000000000
33     !                         00100001000010000100000000000000100
34     "                         01010010100101000000000000000000000
35     #                         01010010101111101010111110101001010
36     $                         00100011111010001110001011111000100
37     %                         11000110010001000100010001001100011
38     &                         01100100101010001000101011001001101
39     '                         01100001000100000000000000000000000
40     (                         00010001000100001000010000010000010
41     )                         01000001000001000010000100010001000
42     *                         00000001001010101110101010010000000
43     +                         00000001000010011111001000010000000
44     ,                         00000000000000000000011000010001000
45     -                         00000000000000011111000000000000000
46     .                         00000000000000000000000000110001100
47     /                         00000000010001000100010001000000000
48     0                         01110100011001110101110011000101110
49     1                         00100011000010000100001000010001110
50     2                         01110100010000100010001000100011111
51     3                         11111000100010000010000011000101110
52     4                         00010001100101010010111110001000010
53     5                         11111100001111000001000011000101110
54     6                         00110010001000011110100011000101110
55     7                         11111000010001000100010000100001000
56     8                         01110100011000101110100011000101110
57     9                         01110100011000101111000010001001100
58     :                         00000011000110000000011000110000000
59     ;                         00000011000110000000011000010001000
60     <                         00010001000100010000010000010000010
61     =                         00000000001111100000111110000000000
62     >                         01000001000001000001000100010001000
63     ?                         01110100010000100010001000000000100
64     @                         01110100010000101101101011010101110
65     A                         01110100011000110001111111000110001
66     B                         11110100011000111110100011000111110
67     C                         01110100011000010000100001000101110
68     D                         11100100101000110001100011001011100
69     E                         11111100001000011110100001000011111
70     F                         11111100001000011110100001000010000
71     G                         01110100011000010111100011000101111
72     H                         10001100011000111111100011000110001
73     I                         01110001000010000100001000010001110
74     J                         00111000100001000010000101001001100
75     K                         10001100101010011000101001001010001
76     L                         10000100001000010000100001000011111
77     M                         10001110111010110101100011000110001
78     N                         10001100011100110101100111000110001
79     O                         01110100011000110001100011000101110
80     P                         11110100011000111110100001000010000
81     Q                         01110100011000110001101011001001101
82     R                         11110100011000111110101001001010001
83     S                         01111100001000001110000010000111110
84     T                         11111001000010000100001000010000100
85     U                         10001100011000110001100011000101110
86     V                         10001100011000110001100010101000100
87     W                         10001100011000110001101011010101010
88     X                         10001100010101000100010101000110001
89     Y                         10001100011000101010001000010000100
90     Z                         11111000010001000100010001000011111
91     [                         01110010000100001000010000100001110
92     \                         00000100000100000100000100000100000
93     ]                         01110000100001000010000100001001110
94     ^                         00100010101000100000000000000000000
95     _                         00000000000000000000000000000011111
96     `                         01000001000001000000000000000000000
97     a                         00000000000111000001011111000101111
98     b                         10000100001000011110100011000111110
99     c                         00000000000111110000100001000001111
100    d                         00001000010000101111100011000101111
101    e                         00000000000111010001111111000001111
102    f                         00010001010010001110001000010000100
103    g                         00000000000111110001011110000111110
104    h                         10000100001000011110100011000110001
105    i                         00000001000000000100001000010000100
106    j                         00010000000001000010000101001001100
107    k                         01000010000100101010011000101001001
108    l                         01100001000010000100001000010001110
109    m                         00000000001101110101101011010110001
110    n                         00000000001011011001100011000110001
111    o                         00000000000111010001100011000101110
112    p                         00000000001111010001111101000010000
113    q                         00000000000111110001011110000100001
114    r                         00000000001011011001100001000010000
115    s                         00000000000111110000011100000111110
116    t                         00100001001111100100001000010100010
117    u                         00000000001000110001100011000101110
118    v                         00000000001000110001100010101000100
119    w                         00000000001000110001101011010101010
120    x                         00000000001000101010001000101010001
121    y                         00000000001000101010001000010001000
122    z                         00000000001111100010001000100011111
123    {                         00011001000010001000001000010000011
124    |                         00100001000010000000001000010000100
125    }                         11000001000010000010001000010011000
126    ~                         00001011101000000000000000000000000
127    Delete                    11111110001010100010101011100011111
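The lossless round trip mentioned above amounts to a one-to-one lookup in each direction. Here is a minimal Python sketch using only three rows excerpted from the table (H, i and !). The names `TABLE`, `to_uscii` and `from_uscii` are hypothetical, and the sketch assumes that every pattern in the full table is distinct, which is what makes the reverse lookup unambiguous.

```python
# Three rows excerpted from the table: ASCII code -> 35-bit pattern.
TABLE = {
    72: "10001100011000111111100011000110001",   # H
    105: "00000001000000000100001000010000100",  # i
    33: "00100001000010000100000000000000100",   # !
}

# Reverse lookup; only valid if no two codes share a pattern.
REVERSE = {bits: code for code, bits in TABLE.items()}
assert len(REVERSE) == len(TABLE), "patterns must be unique"


def to_uscii(text):
    """Map each ASCII character to its 35-bit pattern."""
    return [TABLE[ord(ch)] for ch in text]


def from_uscii(patterns):
    """Map each 35-bit pattern back to its ASCII character."""
    return "".join(chr(REVERSE[p]) for p in patterns)


encoded = to_uscii("Hi!")
print(encoded)
print(from_uscii(encoded))
```

This only covers the per-character mapping; how the patterns are laid out in a stream is a separate question.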
It isn't enough just to pick the character values, however. The Arecibo message was only sending one big bitmap, whereas a USCII string will consist of many characters. Two 35-bit characters in sequence suddenly have a length of 70 bits, and since 70 = 2 x 5 x 7 is no longer a product of exactly two primes, the semiprime hint is gone.
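To make the lost hint concrete, this small Python sketch lists the ways a bit count can be split into a rectangle with both sides larger than 1. The helper name `factor_pairs` is made up for illustration.

```python
def factor_pairs(n):
    """Return every (width, height) with width * height == n and both > 1."""
    return [(w, n // w) for w in range(2, n) if n % w == 0]


print(factor_pairs(35))  # [(5, 7), (7, 5)]
print(factor_pairs(70))  # [(2, 35), (5, 14), (7, 10), (10, 7), (14, 5), (35, 2)]
```

A receiver looking at 35 bits has only one rectangle to try (in two orientations), while 70 bits offers three different rectangles, none of which is 5x7.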
There are arguably a lot of ways to build a container format for USCII codes, but I picked a fairly general one, establishing what I call "meter" and "silence" enclosing the characters. For a WxH bitmap size choice, the layout is this:
  • leading silence: H repetitions of a sequence of WxH + W zero bits
  • leading meter: W repetitions of a sequence of WxH one bits, followed by W zero bits
  • message: each WxH character, individually followed by W zero bits
  • trailing meter: H repetitions of a sequence of WxH one bits, followed by W zero bits
  • trailing silence: W repetitions of a sequence of WxH + W zero bits
It's something easier to see in the online encoder and decoder than by trying to grasp it abstractly. But that's the formula. It has the added bonus of producing multiples of 8 bits for all string lengths with W=5 and H=7, or W=23 and H=23.
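The layout can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the code behind the online encoder: the function name `frame` is hypothetical, characters are passed in as strings of "0" and "1", and both the leading and trailing silence are treated as runs of zero bits.

```python
def frame(chars, w=5, h=7):
    """Wrap w*h-bit character patterns in silence and meter."""
    cell = w * h
    for c in chars:
        if len(c) != cell or set(c) - {"0", "1"}:
            raise ValueError("each character must be %d bits" % cell)
    gap = "0" * w                  # W zero bits after every unit
    silence = "0" * (cell + w)     # assumption: silence is all zero bits
    meter = "1" * cell + gap       # solid block of ones, then the gap
    message = "".join(c + gap for c in chars)
    return silence * h + meter * w + message + meter * h + silence * w


A = "01110100011000110001111111000110001"
B = "11110100011000111110100011000111110"
bits = frame([A, B])
print(len(bits), len(bits) % 8)
```

Every unit in the stream is WxH + W = W x (H + 1) bits long, which is 40 bits for 5x7 and 552 bits for 23x23. Both are multiples of 8, which is why the total comes out byte-aligned whatever the message length.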
The C0 codes are admittedly rather tricky, especially when it comes to depicting things like "Data Link Escape" or "Device Control 1"! It would be possible to use a larger bit size and get clearer images. But I'd like to see how far the 35-bit standard can go in cueing people who aren't familiar with ASCII into what the bitmaps signify...
Another thing I've increasingly been considering is that any meaningful and expressive picture will probably find its way into a symbolic font. This means that if Arecibo ASCII is to extend into "USCII Unicode", any pictures chosen for control codes are probably on the table as literal pictures. I'm not sure how to handle that problem.
Again, ideas are welcome!
Copyright (c) 2007-2018 hostilefork.com

Project names and graphic designs are All Rights Reserved, unless otherwise noted. Software codebases are governed by licenses included in their distributions. Posts on blog.hostilefork.com are licensed under the Creative Commons BY-NC-SA 4.0 license, and may be excerpted or adapted under the terms of that license for noncommercial purposes.