
Universal Coded Character Set
an attempt for multilingual environment

Language Diversity in Information Society
2001/03/10
Unesco Paris

This paper describes the importance of maintaining compatibility and global cooperation in IT standardization, an activity in which I have been involved. Based on my own experience, it pays particular attention to the language- and culture-dependent aspects of that work.

The paper discusses three major topics:

  1. Pax Americana syndrome.
  2. The Chinese way of standardization in an ISO activity.
  3. Standardization of the Myanmar script in the international coded character set.

[Pax Americana syndrome]
For some time I have been attending the Unicode Technical Committee, which maintains the Unicode Standard, as a representative of JUSTSYSTEM, a Japanese software vendor where I worked at that time. Throughout the entire history of the Unicode Technical Committee, no Japanese company other than JUSTSYSTEM has been a full member of the Unicode Consortium.

Unicode is a coded character set designed to cover a very large range of the world’s scripts, if not all of them. Some believe it is fairly practical; others say that Unicode is full of self-conceited arrogance, just like building the Tower of Babel. After a complex series of events, Unicode became virtually identical to ISO/IEC 10646, which made it an international de jure standard rather than a de facto standard. Since then the Unicode Consortium and the ISO/IEC working group have collaboratively developed and maintained the Unicode Standard and ISO/IEC 10646 in parallel.

The Unicode Consortium, which develops and promotes Unicode, is a non-profit organization based in California, USA. It consists of many global corporations of U.S. origin, such as IBM, Microsoft, Apple Computer, and Sun Microsystems.
The Unicode Technical Committee (hereafter UTC), the primary decision-making body within the Unicode Consortium, is responsible for the development and maintenance of the Unicode Standard.

Here I would like to share a surprising experience from the first time I attended a UTC meeting. The UTC meeting was held jointly with the meeting of NCITS/L2 (the U.S. National Committee for Information Technology Standards, L2). NCITS/L2 is the U.S. national body for ISO/IEC JTC1/SC2/WG2, the ISO/IEC working group in charge of ISO/IEC 10646, the international standard compatible with the Unicode Standard. It seemed awfully inappropriate to me that the two meetings were held jointly, justified only by the fact that most of the UTC attendees and the NCITS/L2 attendees were the same people. (I have to note that they still hold the UTC and NCITS/L2 meetings jointly as of today.) In my understanding, the two organizations in principle represent completely different interested parties and are chartered differently, even though most of the attendees are identical. Whether or not the members are identical cannot justify one country’s national body and an international organization mixing up their different positions and charters. Such apparent superpower arrogance was enough to make me want to stand up, but before I took any action I discovered that the arrogance went even further. They not only held the meetings jointly but also made no distinction between the two as decision-making bodies. I had hesitated to speak up in front of so many strangers, because my limited English meant I had to use an interpreter to express my opinion. Still, their conduct looked awfully inadequate for representatives of a national body and representatives of the members of an international standards consortium. In the end, that pushed me through the barrier.
I stood up and questioned their justification for taking the UTC resolution and the L2 resolution without distinction, when the UTC and L2 could well hold different positions, one being an international consortium and the other the U.S. national body. I also explained that I was not entitled, and did not want to be entitled, to vote for the U.S. national body.

After I came back to Japan, I told this episode to a friend, a sociologist who researches media theory. He responded immediately: “Mr. Kobayashi, this story is a stereotypical episode of Pax Americana syndrome.” This episode was the tip of the iceberg. Since then I have discovered, time after time, how American-centric they can be.

By the way, the same episode also let me experience American justice, and I would like to share that as well.
When I complained that the UTC and NCITS/L2 were mixing up the position of a national body with that of an international organization, I was actually very afraid. I expected a huge push-back from strangers I had never met, who had been entitled to run the meeting that way for a long time. What I actually got, however, was their words of appreciation.

“Thank you very much, Tatsuo. We had never been aware of this problem. We will separate the UTC and L2 meetings from now on. We will ballot only for UTC during the UTC meeting. After the UTC meeting we will hold the NCITS/L2 meeting, and even though a second ballot is a duplicate for most of the attendees, we promise we will demonstrate our discipline.”
It eventually became my first contribution to UTC.

We can see a typical example of this Pax Americana in the Unicode Standard itself. More than 35 scripts are already standardized in Unicode, but some of them were standardized without the involvement of native speakers.

America is great. Americans know everything. Americans know all scripts better than native speakers.

This symptom is not peculiar to Americans. I see the same symptom in Japan, China, and even Europe. Throughout my experience with international standardization activities I have encountered not only Pax Americana but also Pax Japonica, Pax Chinica, and a modern Pax Romana.

Let us take the UTC example to study this Pax Americana syndrome further, as a representative “Pax XXX” syndrome.

Actually, not all UTC members are Anglo-Americans. Rather, foreign-born members such as German-Americans, Austrian-Americans, Jordanian-Americans, Indians living in Canada, and a few Chinese from Taiwan make up the majority.
So it is not that they are simply ignorant of the situation of the world. These foreign-born citizens work for the IT industry in the U.S., and they believe in the technology and prosperity of the U.S.
We may say that the ground on which they make decisions is not where they physically live or were born, but which global company they work for and which leading-edge technologies they believe in.
For them, the United States is not a nation state in the traditional sense, but a united virtual entity of global companies, a nucleus that produces leading-edge technologies.

UTC members are confident in the technology they develop and believe that by developing it they contribute to the prosperity of the world.
They develop the technology in good faith and devote their lives to deploying it worldwide to make the world happier.

On the other hand, there are many languages, scripts, cultures, and senses of value in the world. They should always be mindful that if they misrepresent those whom they claim to represent, their good faith may only end up making someone miserable.

[Chinese characters are only for Chinese?]
Shortly after I joined UTC, I also joined the Japanese committees for ISO/IEC JTC1/SC2 and JTC1/SC2/WG2/IRG.
Now I wear several caps: the cap of JUSTSYSTEM in UTC, the cap of the Japanese national body in ISO/IEC JTC1/SC2 and SC2/WG2, and the cap of the Japanese head of delegation in SC2/WG2/IRG. The IRG is the Ideographic Rapporteur Group in SC2/WG2, which is in charge of developing the unified ideographs.

Before I talk about the second episode, I would like to explain in more detail what the IRG is. The IRG consists of national representatives of China, Japan, Korea, North Korea (Democratic People’s Republic of Korea), Taiwan, Singapore, Vietnam, Hong Kong SAR, and the USA, together with representatives of the Unicode Consortium as observers.
These countries and regions use Han ideographic characters, also known as Chinese characters.

The problems surrounding Han characters are very interesting from the viewpoint of language diversity and communication in a multilingual environment.

In the beginning, the predecessor of the IRG, called CJK-JRG (China, Japan, Korea Joint Research Group), was set up as a voluntary group to research the feasibility of unifying Chinese, Japanese, and Korean Han characters.
Han characters were originally used in China and spread to the Japanese archipelago, the Korean Peninsula, and other regions such as Viet Nam. Along with this dissemination, Han characters acquired diversity.
For example, the Japanese phonograms called Hiragana and Katakana are derived from Han characters. There are also many Han characters of Japanese origin that are not found in the Chinese original.
Korea and Viet Nam also have their own original Han characters. Above all, China created new shapes called “Simplified Han characters”.
This is how Han characters came to have such huge variation and diversity.

The initial objective of CJK-JRG was a feasibility study on unifying these Han characters by using modern technology.
This may be yet another attempt to build the Tower of Babel.

Although CJK-JRG did not reach any conclusion regarding its objectives, the IRG was formed as its successor. Its mission is, as it were, to build the Tower of Babel in order to find out whether it is possible to build the Tower of Babel.

The following is a typical example of unification of Han characters.

There are three groups of shapes: one used in mainland China, one in Taiwan and Hong Kong, and one in Japan and Korea.
Native users of Han ideographs can easily tell that these different shapes represent the same character identity.
In the ISO/IEC 10646 standard, these shapes are considered minor glyphic variations. They are therefore unified as the same character, and a single code point is assigned to the different shapes.
Distinguishing these minor shape variations becomes crucial when it comes to business. A product for China requires Chinese shapes, a product for Taiwan requires Taiwanese shapes, and a product for Japan requires Japanese shapes. Everybody in the IRG is supposed to know the importance of conserving these shape differences. However, members sometimes fall into the pitfall of national ego when discussing shapes. It is desirable for an international committee like the IRG that members respect each other’s culture and its diversity, help accommodate conflicting requirements, and move forward to develop technology usable by the people who share the Han ideographic script for communication.
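
The unification described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the character 直 (U+76F4) is chosen here as an example of a unified ideograph whose preferred shape is often said to differ by region, and it is not necessarily the character shown in the original talk. The market-to-language-tag table and the function names are hypothetical. The point is that the encoded text is identical for every market, so the regional shape has to be selected outside the character code, for example by a language tag that the rendering system uses to pick a font.

```python
import unicodedata

# Assumption: 直 (U+76F4) is used only as an illustration of a unified
# ideograph; it is not taken from the original paper.
UNIFIED = "\u76f4"

# Hypothetical mapping from target market to a BCP 47 language tag that a
# renderer could use to choose a regional font.
MARKET_LANGUAGE_TAGS = {
    "China": "zh-CN",
    "Taiwan": "zh-TW",
    "Hong Kong": "zh-HK",
    "Japan": "ja",
    "Korea": "ko",
}


def describe(ch: str) -> str:
    """Return the code point and Unicode character name of one character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


def tagged_run(text: str, market: str) -> tuple[str, str]:
    """Pair the text with the language tag for a market.

    The text itself is returned unchanged: only the tag differs by market.
    """
    return (MARKET_LANGUAGE_TAGS[market], text)


if __name__ == "__main__":
    print(describe(UNIFIED))
    for market in MARKET_LANGUAGE_TAGS:
        tag, text = tagged_run(UNIFIED, market)
        # The UTF-8 bytes are the same for every market.
        print(market, tag, text.encode("utf-8").hex())
```

In this sketch a product for Japan and a product for China store exactly the same bytes; whether the user sees the Japanese or the Chinese shape depends on the font selected from the tag, which is why conserving shape differences is a matter for fonts and rendering rather than for the code point.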

[Dream comes True]
For the past couple of years, I have participated in a committee called “Multi Lingual Information Environment Technology”, organized by CICC (Center for International Cooperation for Computerization).

CICC was established in June 1983 to cooperate with and assist developing countries in introducing computers and information technology, and thereby to promote computerization for their economic and social development.

*** foot notes ***

[CICC]
CICC is currently conducting the following cooperation programs:
  * Training on computerization for developing countries
  * Education and guidance on computerization for developing countries
  * Surveys, research, and R & D on computerization for developing countries
  * Collection and dissemination of information and data on computerization in developing countries
  * International interchanges related to cooperation for computerization
  * Other activities to achieve the objectives of CICC
[MLIT]
In the networked Information Society to come, most computers are going to be connected to worldwide networks such as the Internet. For developing electronic commerce and cultural exchange in the age of the global network, computers individually implemented for a single language are insufficient. To meet the needs of the worldwide network, information processing and information exchange via computers must be able to handle multiple languages (multilingual systems). This is the “Multilingual Information Processing Environment” that MLIT is aiming at.

** end of foot notes **

ISO/IEC 10646 now includes about 35 scripts. A single script such as Latin is shared among several languages, such as English, French, German, and Spanish. The Arabic and Indic scripts are also shared among several languages. So ISO/IEC 10646 can be used for a huge number of languages.
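
The sharing of one script among several languages can be illustrated with a small Python sketch. It rests on an assumption that should be stated plainly: the Python standard library does not expose the official Script property, so the hypothetical helper `rough_scripts` below takes the first word of each letter’s character name (LATIN, ARABIC, and so on) as a stand-in for the script. The sample words are also only illustrative.

```python
import unicodedata


def rough_scripts(text: str) -> set[str]:
    """Approximate the scripts used in text.

    Assumption: the first word of the Unicode character name is used as a
    proxy for the script. This is a heuristic, not the official Script
    property of the standard.
    """
    return {unicodedata.name(ch).split()[0] for ch in text if ch.isalpha()}


# Illustrative words from four languages that share the Latin script.
SAMPLES = {
    "English": "language",
    "French": "garçon",
    "German": "straße",
    "Spanish": "niño",
}

if __name__ == "__main__":
    for language, word in SAMPLES.items():
        print(language, word, rough_scripts(word))
    # An Arabic-script word, written with escapes to avoid bidi issues.
    print(rough_scripts("\u0633\u0644\u0627\u0645"))
```

Every sample word resolves to the single label LATIN even though the languages use different accented letters, which is the sense in which one encoded script serves many languages. It also hints at why each language needs its own specialist: the letters a language actually uses within the script differ from language to language.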

But ISO/IEC 10646 has many problems. The largest is that some scripts were developed without the participation of native speakers or specialists in the script. Since a single script may be shared among multiple languages and used differently in each, it would be ideal to have a native speaker or specialist for each language that uses the script, but the reality is often the opposite.

One reason for this problem is that participating in international standardization activities is costly.
It costs money not only to affiliate with ISO or IEC as a voting member, but also to attend committee meetings held one after another all over the world.

Another reason is that government staff and subject specialists in the nations or regions concerned are not familiar with the process and situation of ISO/IEC standardization. These days, especially in the IT field, it has become common that whoever leads the standardization activity gets a head start in the race of industrial development. This holds true not only for de jure standards but also for de facto standards.

IT has become a center of public interest in Southeast Asian countries as well.

A little before lunch at the MLIT symposium held in Ho Chi Minh City, Viet Nam, a worldwide company demonstrated an operating system that supports the Vietnamese language. The demonstration was not scheduled in the conference program. By special arrangement of the general and Minister of Information, the demonstration was inserted after the programs originally scheduled for the morning session had finished.
The operator of the demonstration spoke English fluently, probably the best among the attendees of the symposium, but the demonstration was very poor because of the quality of its Vietnamese support. Since this was noticeable even to a non-native like myself, it must have looked awful to native Vietnamese. I wondered why the general had inserted a demonstration of such poor-quality Vietnamese support. I still do not know for sure, but he may have thought it was a big step forward for Viet Nam to have its script supported in such a major operating system, regardless of the quality.

Actually, we can see such quality problems everywhere. As we know, the web browser is one of the most advanced multilingual-capable programs.
However, even web browsers do not reach a reasonable level of quality in language support. The quality of support for Western European languages is reasonable, but for other languages it is typically poor. The reason is that the browser was originally designed only for Western European languages, and its system architecture is based on Western typography and Western font design. Because the foundation of the system design is not internationalized, building ad hoc language support of good quality on top of it becomes very difficult. For example, the very basic and common idea of ascender and descender lines in Western typography does not exist in Japanese typography.
A Japanese font designer typically uses all the pixels to fill the entire display cell, assuming that every rendering system provides reasonable line spacing.

There are many scripts, and thus there are many senses of beauty.
However, if one sticks to a Western-centric view, one may not be able to see them.

Of course, there are global enterprises that are sensitive to cultural diversity.
A friend of mine, who lives in England and works for a famous copy machine company as a researcher of cultural differences, taught me that in Greece we should not use an icon with an open hand to indicate STOP.
An open-hand icon typically represents STOP in Western countries, but in Greece it is an antagonistic gesture. Therefore, his company uses a different icon for the Greek model. I believe such a sensitive company will be successful, because what it has done contributes to customer satisfaction in the global market, and a satisfied customer becomes a repeat customer of and an evangelist for the company.

Another possible reason why the demonstration at the MLIT symposium in Vietnam was poor is that the Vietnamese support was not developed together with the internationalization engineers at the U.S. headquarters, who had been involved with the team that developed the original Western version of the operating system. It is often seen that a local sales office patches the original to make its own single-language version using an ad hoc approach.

Such ad hoc localization can be found everywhere. For example, I have seen incompatibility problems among different language versions of the same word processor.
Even when the file contents are only Latin letters, such ad hoc localization breaks interoperability with the original.
The local sales offices of such global companies often say that time to market is the highest priority for boosting the local market, and that ad hoc localization is therefore justified. I, however, do not think it is a good idea, because such ad hoc localization causes compatibility problems across the product line and thus ends up diminishing the economy of the local market by isolating it from the global market.

I would like to quote a letter from the Myanmar representative to CICC.

The Dream Comes To A Reality
Introduction
This is neither a story of success nor a story of a model. It is only a story that tries to share our experience and to emphasize how important cooperation is in our region. In particular, I would like to share our experience of encoding Myanmar characters under the ISO scheme.

Background History
Starting from 1990, the use of personal computers in Myanmar has been increasing very rapidly. Naturally Myanmar Language Processing is much needed for widespread use of PCs in the country.

Many varieties of local implementations have been available since 1992. However, almost all of them are only implementations of font sets and cannot provide information processing. Moreover, we did not realize that the character codes should be standardized nationally and internationally.

In 1996, I was invited to participate in the symposium held in Tsukuba, Japan. Although I could present the structure of the Myanmar language, I had to admit that almost everything related to standardization discussed at the symposium was outside my scope. Terms such as glyph, encoding character set, ISO10646, and UNICODE were very strange to me. My first impression at that symposium was that I had been wrongly invited by ETL.
But I did recognize that those kinds of things are needed in our country, and that I had the responsibility to pass on what I had learned at the symposium to my colleagues.

Then I attended several seminars in several countries over the past few years. I have to express my many thanks to CICC and related Japanese organizations for kindly arranging for me to attend these seminars and for their patience with my slow progress. Whenever I came back from these seminars, I conveyed the information to my friends and suggested to higher authorities what we should do. Over the years, having Myanmar characters encoded internationally by Myanmar people became my dream. Of course, I had to admit that at that time it was very difficult to turn the dream into a reality, because we lacked technology, information, support, and budget. In 1997, I attended the MLIT meeting in Japan and learned from Mr. Sato that the Myanmar character code was going to be approved at ISO without the knowledge and involvement of Myanmar people. Mr. Sato encouraged us to participate in the ISO meeting and offered his help to make it possible. I reported this issue to the government and suggested forming a committee to carry out the standardization task immediately.
Our government realized the importance of this issue and formed the Myanmar IT Standardization Committee (MITSC). MITSC consists of several IT people and Myanmar language experts. We worked closely with the Myanmar Language Commission, the only organization responsible for setting the national standard of the Myanmar language. We tried our best to develop a revised proposal and sent it to ISO with Mr. Sato’s kind help. Moreover, with the kind contribution of Mr. Sato, Mr. Mikami, and other friends, we were able to attend the ISO/IEC/JTC1/SC2/WG2 meeting held in London in September 1998.

Ad-hoc Meetings at London
In fact, the encoding of Myanmar characters at ISO had already reached the PDAM stage at that time. It was almost the last chance to express our concerns and amend the proposal in line with our culture and tradition. Again we got a lot of support, and more importantly tactics for dealing with ISO people, from Mr. Sato. We met with several of the people concerned and conducted several ad-hoc meetings during that meeting, from 21 September 1998 through 24 September 1998.

Although we faced some misunderstandings and disagreements during the discussions, all the meetings were very productive and fruitful in general. We have to admit that we learned a lot from the ISO and UNICODE experts during the meetings. We all really appreciate their patience and understanding toward us. Finally they accepted some of our technical and culture-related comments and put them into the Final Proposed Draft Amendment (FPDAM). We reached a consensus among the participants and a short outline of the remaining points in contention. We agreed to implement a Myanmar processing system based on the FPDAM and to send comments on our findings to UNICODE and ISO.

Current Status
We have implemented and tested a Myanmar language processing system based on the agreed encoding scheme. We found it works as we expected. We do not think any major amendments to the FPDAM will be necessary. We are going to participate in the next ISO/IEC/JTC1/SC2/WG2 meeting, to be held in Fukuoka, Japan in March 1999, and discuss the outcome of our implementation. There may be some discussions and proposals for further amendments there. Hopefully we can resolve the contentions and go one step further toward becoming an international standard.

Our Dream Comes To A Reality
Finally our dream comes to a reality. This is not a story of great success.
But I believe most of us recognize that this is definitely the outcome of collaborative efforts by Myanmar people and Japanese friends. Of course, our government’s encouragement and support played a major role. We are pleased to see the Japanese initiatives and continued support in IT standardization for the Asia Pacific region.

Last but not least, I would like to express my sincere thanks to Mr. Mikami, Mr. Sato, Mr. Komurasaki, and all the people from CICC who contributed to turning our dream into a reality. Myanmar people will always appreciate and never forget their kind contributions to the IT development of our country.
Thaung Tin (Myanmar)
Managing Director, KMD Co. Ltd.
Executive Council Member, Myanmar IT Standardization Committee
Executive Council Member, Myanmar Computer Federation

After this letter was sent, the Draft Amendment for the Myanmar script was adopted as an official international standard at the ISO/IEC JTC1/SC2 meeting held in Fukuoka, Japan.
The Myanmar delegation attended the meeting as observers and expressed their belief that the standard would be fully satisfactory for the people of Myanmar.

In the end, the Myanmar government established a national committee for information technology standardization.

Note that in most cases the standardization of such scripts is not as successful as this example in correctly reflecting the interests of native users, because of politics among nations and/or member companies.

At the beginning of this year, I had an interesting discussion about the “Digital Divide” with a friend who lives in Hong Kong.
These days we often hear the new terms “Computer Literacy”, “IT Literacy”, and “Digital Divide”.

The digital divide means that such literacy can become the cause of yet another form of discrimination against those who do not have it.

She argued that IT is supposed to help people improve their literacy by providing easy-to-use technology simply as a tool, rather than requiring IT literacy in order to use the technology.

Literacy is not for IT but IT is for Literacy.

As a member of the international standards working group for the coded character set, I hope to see all of us benefit from information technology as a way of conserving language diversity rather than extinguishing it, and I would be grateful if I could contribute to the prosperity of humanity and a peaceful world.
