USCII ("you-ski") stands for Universal Semiotic Coding for Information Interchange. It is a system for embedding pictures inside the numbers agreed upon to represent symbols and control codes. I was inspired to create it by the famous Arecibo Message, which attempted to convey humanity's physics knowledge without assuming a cultural context other than math.
For instance, instead of ASCII's encoding of 65 for "A" and 66 for "B", we might use the number 15621226033 for "A" and 32801506878 for "B". To see the bitmaps, first convert these values into binary:
- 15621226033 (base 10) = 01110100011000110001111111000110001 (base 2)
- 32801506878 (base 10) = 11110100011000111110100011000111110 (base 2)
When the number is transmitted in a medium that hints at the significance of a 35-bit pattern, the semiprime nature of 35 suggests decomposing it into the factors 7 and 5. Arranging the bits as 7 rows of 5 produces (small) images of an A and a B. Larger prime factors could provide more Unicode coverage, such as a 23x23 font for Chinese.
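Here is a minimal Python sketch of that decoding step. It zero-pads a number to 35 bits and slices it into 7 rows of 5. It assumes the bits are laid out most significant first, row by row, which matches the binary strings above. The function name `to_bitmap` and the `#`/`.` rendering are illustrative choices and are not part of USCII. The "B" value is built from its binary string so that it matches the table exactly.

```python
def to_bitmap(value: int, width: int = 5, height: int = 7) -> list[str]:
    """Render a USCII code as rows of '#' (1 bit) and '.' (0 bit)."""
    size = width * height
    if not 0 <= value < 2 ** size:
        raise ValueError(f"value must fit in {size} bits")
    bits = format(value, f"0{size}b")          # zero-pad to exactly W*H bits
    rows = [bits[i:i + width] for i in range(0, size, width)]
    return [row.replace("1", "#").replace("0", ".") for row in rows]


A = 15621226033
B = int("11110100011000111110100011000111110", 2)  # the "B" bit pattern

for name, code in (("A", A), ("B", B)):
    print(name)
    print("\n".join(to_bitmap(code)))
```

The leading zero in the "A" pattern explains the zero-padding. Without it, Python's plain `bin()` would yield only 34 digits, and the rows would no longer line up.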
Live Demo
There is an online encoder, as well as a decoder (both currently support only the standard "USCII-5x7-ENGLISH-C0"). The encoder walks you through how the standard works, so just try typing something into its input box and read the explanation there.
The decoder is descriptive as well, and explains each step. But if you send a message to a friend who hasn't heard of USCII, I'd be interested to know whether they figure it out quickly without using the decoder. Any stories about that are welcome.
Arecibo ASCII
I've developed a draft specification of the USCII variant "5x7-ENGLISH-C0". It uses 35 bits per character and includes the printable characters as well as the "C0 control codes". The script that generates it contains comments on why I picked each bit pattern.
I've informally named this variant "Arecibo ASCII", because it is possible to losslessly convert a stream of conventional ASCII characters into USCII-5x7-ENGLISH-C0 (and back again). It's still a work in progress, but here's the table as it currently stands:
ASCII (decimal) | Character | Arecibo ASCII (35-bit binary) |
--- | --- | --- |
0 | Null character | 10101010101010101010101010101010101 |
1 | Start of Header | 10101101111010110111101011011110101 |
2 | Start of Text | 11011111111101111111110111111111011 |
3 | End of Text | 11011110111101111011110111111111011 |
4 | End of Transmission | 11111111111111111111111111001110011 |
5 | Enquiry | 11111111010000011101101110000010111 |
6 | Acknowledgment | 11111101011111111111011101000111111 |
7 | Bell | 11011100011000110001000001111111011 |
8 | Backspace | 11111110111011100000101111101111111 |
9 | Horizontal Tab | 00000000000000111101000010000000000 |
10 | Line Feed | 11100001000010000100111110111000100 |
11 | Vertical Tab | 00100001000010000100001000000001110 |
12 | Form Feed | 11111011100010000000111110111000100 |
13 | Carriage Return | 00001000010010101101111110110000100 |
14 | Shift Out | 00100101111101111011110111110111100 |
15 | Shift In | 11100111011101111011110111011100100 |
16 | Data Link Escape | 11111111110010001110001001111111111 |
17 | Device Control 1 | 11111101111001110001100111011111111 |
18 | Device Control 2 | 11011110110101001010011100111010001 |
19 | Device Control 3 | 11111101011010110101101011010111111 |
20 | Device Control 4 | 11111100011000110001100011000111111 |
21 | Negative Acknowledgment | 11111101011111111111100010111011111 |
22 | Synchronous Idle | 11111111111111111111111110101011111 |
23 | End of Trans. Block | 11111000000111001010011100000011111 |
24 | Cancel | 10001000000101000100010100000010001 |
25 | End of Medium | 11111100010110001010001101000111111 |
26 | Substitute | 10001011101111011101110111111111011 |
27 | Escape | 00011001100101011110011100111010001 |
28 | File Separator | 10101101011010110101101011010110101 |
29 | Group Separator | 11011110111101111011110111101111011 |
30 | Record Separator | 11110111101101010010000001001111011 |
31 | Unit Separator | 11111111111111111111100111101110111 |
32 | Space | 00000000000000000000000000000000000 |
33 | ! | 00100001000010000100000000000000100 |
34 | " | 01010010100101000000000000000000000 |
35 | # | 01010010101111101010111110101001010 |
36 | $ | 00100011111010001110001011111000100 |
37 | % | 11000110010001000100010001001100011 |
38 | & | 01100100101010001000101011001001101 |
39 | ' | 01100001000100000000000000000000000 |
40 | ( | 00010001000100001000010000010000010 |
41 | ) | 01000001000001000010000100010001000 |
42 | * | 00000001001010101110101010010000000 |
43 | + | 00000001000010011111001000010000000 |
44 | , | 00000000000000000000011000010001000 |
45 | - | 00000000000000011111000000000000000 |
46 | . | 00000000000000000000000000110001100 |
47 | / | 00000000010001000100010001000000000 |
48 | 0 | 01110100011001110101110011000101110 |
49 | 1 | 00100011000010000100001000010001110 |
50 | 2 | 01110100010000100010001000100011111 |
51 | 3 | 11111000100010000010000011000101110 |
52 | 4 | 00010001100101010010111110001000010 |
53 | 5 | 11111100001111000001000011000101110 |
54 | 6 | 00110010001000011110100011000101110 |
55 | 7 | 11111000010001000100010000100001000 |
56 | 8 | 01110100011000101110100011000101110 |
57 | 9 | 01110100011000101111000010001001100 |
58 | : | 00000011000110000000011000110000000 |
59 | ; | 00000011000110000000011000010001000 |
60 | < | 00010001000100010000010000010000010 |
61 | = | 00000000001111100000111110000000000 |
62 | > | 01000001000001000001000100010001000 |
63 | ? | 01110100010000100010001000000000100 |
64 | @ | 01110100010000101101101011010101110 |
65 | A | 01110100011000110001111111000110001 |
66 | B | 11110100011000111110100011000111110 |
67 | C | 01110100011000010000100001000101110 |
68 | D | 11100100101000110001100011001011100 |
69 | E | 11111100001000011110100001000011111 |
70 | F | 11111100001000011110100001000010000 |
71 | G | 01110100011000010111100011000101111 |
72 | H | 10001100011000111111100011000110001 |
73 | I | 01110001000010000100001000010001110 |
74 | J | 00111000100001000010000101001001100 |
75 | K | 10001100101010011000101001001010001 |
76 | L | 10000100001000010000100001000011111 |
77 | M | 10001110111010110101100011000110001 |
78 | N | 10001100011100110101100111000110001 |
79 | O | 01110100011000110001100011000101110 |
80 | P | 11110100011000111110100001000010000 |
81 | Q | 01110100011000110001101011001001101 |
82 | R | 11110100011000111110101001001010001 |
83 | S | 01111100001000001110000010000111110 |
84 | T | 11111001000010000100001000010000100 |
85 | U | 10001100011000110001100011000101110 |
86 | V | 10001100011000110001100010101000100 |
87 | W | 10001100011000110001101011010101010 |
88 | X | 10001100010101000100010101000110001 |
89 | Y | 10001100011000101010001000010000100 |
90 | Z | 11111000010001000100010001000011111 |
91 | [ | 01110010000100001000010000100001110 |
92 | \ | 00000100000100000100000100000100000 |
93 | ] | 01110000100001000010000100001001110 |
94 | ^ | 00100010101000100000000000000000000 |
95 | _ | 00000000000000000000000000000011111 |
96 | ` | 01000001000001000000000000000000000 |
97 | a | 00000000000111000001011111000101111 |
98 | b | 10000100001000011110100011000111110 |
99 | c | 00000000000111110000100001000001111 |
100 | d | 00001000010000101111100011000101111 |
101 | e | 00000000000111010001111111000001111 |
102 | f | 00010001010010001110001000010000100 |
103 | g | 00000000000111110001011110000111110 |
104 | h | 10000100001000011110100011000110001 |
105 | i | 00000001000000000100001000010000100 |
106 | j | 00010000000001000010000101001001100 |
107 | k | 01000010000100101010011000101001001 |
108 | l | 01100001000010000100001000010001110 |
109 | m | 00000000001101110101101011010110001 |
110 | n | 00000000001011011001100011000110001 |
111 | o | 00000000000111010001100011000101110 |
112 | p | 00000000001111010001111101000010000 |
113 | q | 00000000000111110001011110000100001 |
114 | r | 00000000001011011001100001000010000 |
115 | s | 00000000000111110000011100000111110 |
116 | t | 00100001001111100100001000010100010 |
117 | u | 00000000001000110001100011000101110 |
118 | v | 00000000001000110001100010101000100 |
119 | w | 00000000001000110001101011010101010 |
120 | x | 00000000001000101010001000101010001 |
121 | y | 00000000001000101010001000010001000 |
122 | z | 00000000001111100010001000100011111 |
123 | { | 00011001000010001000001000010000011 |
124 | \| | 00100001000010000000001000010000100 |
125 | } | 11000001000010000010001000010011000 |
126 | ~ | 00001011101000000000000000000000000 |
127 | Delete | 11111110001010100010101011100011111 |
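The lossless round trip only requires the table to be one-to-one: each ASCII code must map to a distinct 35-bit pattern. The sketch below uses a hypothetical four-row excerpt of the table. `TABLE`, `encode_ascii`, and `decode_ascii` are illustrative names and do not come from the USCII generator script.

```python
# Hypothetical excerpt of the table: ASCII code -> Arecibo ASCII bit pattern.
TABLE = {
    32: "0" * 35,                                # Space
    33: "00100001000010000100000000000000100",   # !
    72: "10001100011000111111100011000110001",   # H
    73: "01110001000010000100001000010001110",   # I
}

# Inverting the table is only lossless if no two codes share a pattern.
REVERSE = {bits: code for code, bits in TABLE.items()}
assert len(REVERSE) == len(TABLE), "bit patterns must be distinct"


def encode_ascii(text: str) -> list[str]:
    """Map each ASCII character to its 35-bit pattern."""
    return [TABLE[ord(ch)] for ch in text]


def decode_ascii(patterns: list[str]) -> str:
    """Map 35-bit patterns back to ASCII characters."""
    return "".join(chr(REVERSE[p]) for p in patterns)


message = "HI HI!"
assert decode_ascii(encode_ascii(message)) == message
```

Loaded with all 128 rows, the same `len(REVERSE) == len(TABLE)` check would confirm that the full mapping is invertible.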
It isn't enough just to pick the character values, however. The Arecibo message sent only one big bitmap, whereas a USCII string consists of many characters. Two 35-bit characters in sequence have a length of 70 bits, which is not a semiprime, so the hint is gone.
There are arguably many ways to build a container format for USCII codes, but I picked a fairly general one that encloses the characters in what I call "meter" and "silence". For a WxH bitmap size, the layout is:
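The arithmetic is easy to check. 35 has exactly two prime factors, so 5 x 7 (or 7 x 5) is the only non-trivial rectangle to try. By contrast, 70 = 2 x 5 x 7 admits several rectangles (2 x 35, 5 x 14, 7 x 10, ...). Below is a small trial-division check in Python; the helper names are illustrative.

```python
def prime_factors(n: int) -> list[int]:
    """Trial-division factorization, smallest factor first."""
    factors, d = [], 2
    while d * d <= n:
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 1
    if n > 1:
        factors.append(n)
    return factors


def is_semiprime(n: int) -> bool:
    """True when n is a product of exactly two primes (counting repeats)."""
    return len(prime_factors(n)) == 2


for bits in (35, 70, 23 * 23):
    print(bits, prime_factors(bits), is_semiprime(bits))
```

The 23x23 case (529 = 23 x 23) is also a semiprime, which is why it keeps the same unambiguous hint.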
- leading silence: H repetitions of a sequence of (WxH + W) zero bits
- leading meter: W repetitions of a sequence of WxH one bits, followed by W zero bits
- message: each WxH character, individually followed by W zero bits
- trailing meter: H repetitions of a sequence of WxH one bits, followed by W zero bits
- trailing silence: W repetitions of a sequence of (WxH + W) one bits
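The layout can be sketched in Python as a pair of functions over strings of "0" and "1". This is a minimal sketch under a few assumptions. The names `frame` and `unframe` are hypothetical. The receiver is assumed to already know W and H. The segment contents follow the listed layout literally, including its stated bit values for each segment.

```python
def frame(chars: list[str], w: int = 5, h: int = 7) -> str:
    """Wrap W*H-bit characters (strings of '0'/'1') in silence and meter."""
    for c in chars:
        if len(c) != w * h or set(c) - {"0", "1"}:
            raise ValueError(f"expected a {w * h}-bit string, got {c!r}")
    unit = w * h + w                          # every segment is made of these units
    meter_unit = "1" * (w * h) + "0" * w      # WxH one bits, then W zero bits
    leading_silence = "0" * unit * h          # H units of zero bits
    leading_meter = meter_unit * w            # W meter units
    message = "".join(c + "0" * w for c in chars)
    trailing_meter = meter_unit * h           # H meter units
    trailing_silence = "1" * unit * w         # W units of one bits, as listed
    return (leading_silence + leading_meter + message
            + trailing_meter + trailing_silence)


def unframe(stream: str, w: int = 5, h: int = 7) -> list[str]:
    """Recover the characters, assuming the receiver already knows W and H."""
    unit = w * h + w
    overhead = (h + w) * unit                 # silence + meter on each side
    body = stream[overhead:len(stream) - overhead]
    # Keep the first W*H bits of each unit; drop its W trailing zero bits.
    return [body[i:i + w * h] for i in range(0, len(body), unit)]
```

A real decoder would presumably also infer W and H from the meter and silence rather than taking them as parameters. That step is left out here.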
This is easier to see in the online encoder and decoder than to grasp abstractly, but that's the formula. As a bonus, it produces a multiple of 8 bits for every string length when W=5 and H=7, or W=23 and H=23.
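The byte alignment follows directly from the layout. Every segment is a whole number of units of WxH + W = W(H+1) bits, so a message of n characters has total length

$$
\begin{aligned}
L(n) &= \underbrace{H\,W(H+1)}_{\text{leading silence}} + \underbrace{W\,W(H+1)}_{\text{leading meter}} + \underbrace{n\,W(H+1)}_{\text{message}} + \underbrace{H\,W(H+1)}_{\text{trailing meter}} + \underbrace{W\,W(H+1)}_{\text{trailing silence}} \\
&= (n + 2H + 2W)\,W(H+1)
\end{aligned}
$$

With W=5 and H=7 the unit is 40 = 8 x 5 bits, giving L(n) = 40(n + 24). With W=23 and H=23 it is 552 = 8 x 69 bits, giving L(n) = 552(n + 92). Either way, L(n) is divisible by 8 for every n.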
Depicting the C0 codes is admittedly tricky, especially things like "Data Link Escape" or "Device Control 1"! A larger bitmap size would allow clearer images, but I'd like to see how far the 35-bit standard can go in cueing people who aren't familiar with ASCII into what the bitmaps signify...
Another thing I've increasingly been considering is that any meaningful and expressive picture will probably find its way into a symbolic font. This means that if Arecibo ASCII is to be extended into "USCII Unicode", any picture chosen for a control code is probably also wanted as a literal picture of some symbol. I'm not sure how to handle that conflict.
Again, ideas are welcome!