The Standards Blog

The Unicode 8.0: A Song of Praise for Unsung Heroes

Monday Witness

Unicode marks the most significant advance in writing systems since the Phoenicians
          James J. O'Donnell, Provost, Georgetown University

What is 11 1/8" x 8 3/4" x 2 1/4" and weighs 7.89 pounds? Among other things, the hardbound copy of the Unicode Standard 4.0, the Oxford English Dictionary of computerized language characters, numbers and symbols, contemporary and archaic, mainstream and obscure. The home of Khmer Lunar codes, Ogham alphabets and Cyrillic supplements. An alphanumeric expression of the means of human communication.
          Me, October 26, 2003, Savoring the Unicode

Two weeks ago, I got a call from a reporter who had stumbled on two pieces I wrote in praise of new releases of the Unicode, the first in 2003 (on the occasion of the release of Unicode 4.0, referred to above), and the second in 2006, two releases later. The reason for the call was the release of Unicode version 8.0 by its stewards, the Unicode Consortium.

What is the Unicode? In the driest sense, it is the computer code used to visually present characters on a screen or in a printout. But in a broader sense, it is what allows the richness of human history, culture, identity and knowledge to remain accessible as the fixed media of the past crumble into dust or (worse) are shredded into pulp as we race pell-mell into the digital future.

If that sounds a bit over the top, let me try again. Here's how I phrased it in 2006:

There are fundamental standards that are constantly in the news, such as XML (and its many offspring).  And there are standards development organizations, like the W3C, that enjoy a high profile in part because of the importance of the technical domains that they serve.  Some standards have even taken on socio-political significance, becoming pawns in international diplomacy, such as the root domains of the Internet, despite the fact that they are insignificant in size and design.

But there are other standards that go largely unheralded, and are developed by consortia that are virtually never in the news, despite the vast social and technical significance of the standard in question.  Perhaps chief among them is the Unicode, created and constantly extended by the Unicode Consortium, whose loyal and widely distributed team of contributors for the most part labor quietly in the background of information technology.

Notwithstanding the low profile of the Unicode and its creators, it is this standard that enables nearly all those living in the world today to communicate with each other in their native language character sets.  It even permits the words of many of those that lived in the past to become accessible to those alive today in electronic form, and in their original character sets as well.

So now it's 2015, and the Unicode Consortium has just issued a new release. Tragicomically, while previous releases of the Unicode received virtually no mention in the press whatsoever, the latest release of the Unicode garnered broad media attention. Why? Because version 8.0 includes new emojis - happy faces and their ilk. A typical headline read, Unicode Consortium Releases Unicode 8 With Taco, Cheese and Unicorn Face Emoji. The text below that headline included the following trenchant coverage:

Of the 37 new emoji, inclusions based on popular request include taco, cheese wedge, burrito, bottle with popping cork, hot dog, popcorn, turkey, and unicorn face. Missing sports symbols like badminton and volleyball are also included, as are several new faces: face with rolling eyes, zipper-mouth face, robot face, upside-down face, and hugging face.

So which is it? Foundational cultural tool, or cheese wedge? Here's what I told the reporter, and you can decide for yourself.

Q: Why was it important to create a standard for computerized characters?

A: Prior to computerization, knowledge could only be reliably preserved over generations and shared across continents by recording it in a fixed medium (parchment, papyrus, and eventually paper). That was a tedious, time-consuming process when performed by hand. But with the advent of mechanical printing, the preservation of and access to knowledge exploded - and the advancement of civilization increased apace. With the invention of the computer, and then the Internet and Web, the ability to record (by anyone, anywhere, anytime) and share (with anyone connected to the Internet) expanded yet again in dramatic fashion.

Computerized text is different, however. With a book, once the ink is on the paper the only non-physical barrier to sharing is language skills. With a computer, however, knowledge has to not only be entered into the computer, but it has to come out again in a human-readable form. Unless everyone is using exactly the same computer, a way needs to be provided for the receiving computer to not only understand what the source computer has sent it, but how to render it in human-readable form. That's where standards come in. In the case of the Unicode, the standard is used by the software application (whatever type it might be) on the source computer to turn the key stroke into data that another computer, employing the same standard, can turn back into the same character. It's one of those jobs - and a very large one at that - that is utterly essential and invariably taken for granted.
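To make that round trip concrete, here is a minimal sketch in Python. It is only an illustration, not a description of how any particular application works: the sample text is my own, and UTF-8 is assumed as the byte encoding (it is one of several ways to serialize Unicode code points, and is not named in the interview above).

```python
# A minimal sketch of the round trip described above.
# Assumptions: the sample message and the choice of UTF-8 are illustrative only.

# "Source computer": each typed character maps to a Unicode code point,
# and the code points are serialized to bytes for storage or transmission.
message = "Привет, мир"  # Cyrillic sample text
code_points = [ord(ch) for ch in message]
wire_bytes = message.encode("utf-8")

# "Receiving computer": applying the same standard turns the bytes
# back into the same characters.
received = wire_bytes.decode("utf-8")

print("code points:", " ".join(f"U+{cp:04X}" for cp in code_points))
print("bytes on the wire:", len(wire_bytes))
print("round trip intact:", received == message)
```

Because both sides agree on the same mapping, the receiving side needs to know nothing about the sender's keyboard, operating system, or software to recover the original characters.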

Q: What has this achievement allowed that would not have otherwise been possible?

A: It might be useful to start by saying what wouldn't be possible without the Unicode that we would also take for granted. Like the ability to communicate in your own language, if your language uses characters other than the familiar Roman alphabet. Until the good folks at the Unicode came along, many early users of the Internet were unable to communicate in their native languages, because their character sets were not yet supported. It took an enormous amount of labor to add Cyrillic, Farsi, and less well-known character sets to the Unicode so that computer users could communicate even with each other, much less with a relative abroad.

Here's another example, and one that illustrates the efforts to which the Unicode project has gone. Not only can you read modern languages, but extinct ones as well. Academics can therefore share the contents of precious documents directly, without having to scan delicate originals into non-editable and perhaps difficult to read forms.

There's also a quality of expression aspect. If you read eBooks, you'll see that years after they became popular your reading options are, to put it mildly, pitiful. Not only does Amazon not support the same document format (ePub) that the other vendors support (admittedly to varying degrees), but it only supports a small handful of fonts. Now, to be clear, fonts are another step up from Unicode characters, and take a great deal of additional work to create over and above the underlying Unicode that describes the characters themselves. But it does provide a rather stark contrast: the Unicode project has managed, on a shoestring, to provide an extraordinary number and variety of character sets to the world, while a multi-billion dollar industry continues to provide products with only the most limited number of fonts, rendering, and even formatting options.

Q: As you mentioned in your blog post, most people don't know what Unicode is or what it does. Is that a problem? Why or why not?

A: That's an interesting question, and one that my last example may help to illustrate.

The profound value that standards provide is to level the playing field, such that a broad variety of competitors can compete to create extra value in services and features above the standardized layer. Obviously, that result will be favored by many and undesirable to some (most obviously, already dominant vendors). In the case of eBooks, as with many formats previously (VHS vs. Betamax, CD and DVD formats, music formats, and so on), many companies have sought to either prevent the adoption of a common standard, or to make sure that a standard implementing patents that they control (and can charge for) prevails. The result is that product releases are delayed, innovation and competition are stifled, and consumers pay more and have fewer choices.

What format standards have in common is that they are very high up in the product design chain, and therefore have tremendous strategic and economic value. It's no surprise, for example, that after thirty years almost everyone still uses Microsoft Word to create documents, that there has been so little improvement in it, or that its price remains so high. It's also no surprise that Amazon is in no hurry to make it easy to read its eBooks on any device, and has no interest in helping (or even allowing) you to read a book you bought at Barnes & Noble on a Kindle.

It's much easier to create standards that are farther down the strategic stack, and that is certainly true for the humble Unicode. It's so far down, in fact, that not only have most people never heard of it, but most companies don't help fund it, either, even though every high tech company in existence depends completely on its existence.  Happily, the Unicode's stewards have been able to receive enough contributions to get by, and equally happily, because they are so far down the strategic chain, they can go about their business of making all knowledge, ever created or still to be created, anywhere, and by anyone, accessible without being blocked or dragooned by proprietary interests.

Q: In your opinion, what are the cultural implications of having an international standard for characters?

A: It lies at the very bedrock of equal access to the Internet, and equal opportunity to benefit from it by being able to influence others. As earlier mentioned, it gives voices to those who would otherwise be voiceless, except in the language of another culture. With it, the world is a level playing field. Without it, we would be stuck in an upgraded example of a colonial world, where historically first world nations continue to force their cultures and rules on emerging nations and their peoples.

Q: Does it matter that Unicode remain free to use and open to new voting members? Why or why not?

A: For all of the same reasons. Not because I think that the current stewards are likely to step back from the principles they have always supported, but because new members bring new ideas, new viewpoints and first-hand knowledge to the process, often representing other cultures directly. Standards also rely on having an unsullied reputation for "openness," and even if very few members ever learn about the Unicode or decide to dedicate time to it, the fact that they are free to do so remains important and valuable.

As I've said before and always appreciate the opportunity to say again, those that have contributed their time and energy to the Unicode Project are true unsung heroes of the modern age. Even though very few people are aware of their work, there is no one alive today that can read, much less use a computer, that has not benefited from their work. The world is truly a better place because the Unicode Project is part of it.

PS: Here's another new release, the second in my series of cybersecurity thrillers. If you enjoyed The Alexandria Project, I think you'll enjoy this one, too. It's called The Lafayette Connection, A Tale of Deception and Elections, and the plot line follows the hacking of a presidential election. The action starts right where we are in the current election run up, and if you're looking for a good summer read, let me humbly suggest that this may be it. Why not give it a try?