
   
Japanese Encoding Methods

If you are new to Japanese, get ready for a surprise: there is no single standard for Japanese encoding. The three main types of encoding are JIS, ShiftJIS, and EUC.

ShiftJIS is an 8-bit encoding that is standard on Macintosh computers and DOS/V machines. ShiftJIS text usually cannot be sent through the Internet as e-mail; like EUC, it must first be translated into JIS. EUC and JIS are double-byte codes: a Japanese character normally occupies two bytes. Linux usually displays EUC-encoded characters, which must be converted to JIS before they are sent out over the net.

EUC and ShiftJIS are the main encoding methods used when saving files to disk. JIS is commonly used in electronic data transmission. USENET and e-mail programs transmit messages as JIS conforming to the ISO-2022-JP standard. When the e-mail or news article is displayed, the mail or news reader automatically converts it to ShiftJIS or EUC.

The reason for this awkward process is that JIS could be transmitted reliably over network infrastructure that supported only 7-bit bytes. A JIS character is composed of two 7-bit bytes, which made JIS the most robust way to transport Japanese characters. For those with a technical interest in network transmission of Japanese characters, a sample Java program that sends JIS-encoded e-mail is included in the appendix (see section 15.1.4).
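To make the difference between the three encodings concrete, the following minimal sketch encodes the same short string in each of them and reports whether the result would survive a 7-bit channel. It is written in Python and relies on the standard codec names `iso2022_jp`, `shift_jis`, and `euc_jp`; the sample string and the helper name `encode_all` are illustrative assumptions and do not come from this document.

```python
# Sketch: one string, three Japanese encodings.
# "encode_all" and the sample text are hypothetical names chosen for illustration.

CODECS = {
    "JIS (ISO-2022-JP)": "iso2022_jp",
    "ShiftJIS": "shift_jis",
    "EUC": "euc_jp",
}


def encode_all(text):
    """Return a dict mapping each encoding label to the encoded bytes."""
    return {label: text.encode(codec) for label, codec in CODECS.items()}


def is_seven_bit(data):
    """True if every byte fits in 7 bits, i.e. the high bit is never set."""
    return all(b < 0x80 for b in data)


encoded = encode_all("日本語")

for label, data in encoded.items():
    hex_bytes = " ".join(f"{b:02x}" for b in data)
    print(f"{label:18} {hex_bytes}   7-bit: {is_seven_bit(data)}")
```

Only the JIS form consists entirely of 7-bit bytes. It pays for this with escape sequences that switch into and out of the two-byte character set, whereas the ShiftJIS and EUC forms set the high bit and so need an 8-bit clean channel.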

The figure below shows how a program like Emacs would normally handle Japanese encoding. Keyboard input and screen display use EUC, while e-mail sent to the Internet is encoded as JIS.

\includegraphics{encoding/images/emacs-mail.ps}
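The conversion step in this flow can be sketched in a few lines. This is not how Emacs implements it; it is only a hedged Python illustration of the same idea, with text held as EUC locally and converted to JIS at the boundary to the network. The function names `outgoing` and `incoming` and the sample text are hypothetical.

```python
# Sketch of the EUC <-> JIS boundary described above.
# "outgoing" and "incoming" are hypothetical helpers, not part of any mailer.


def outgoing(euc_bytes):
    """Convert an EUC-encoded buffer to JIS (ISO-2022-JP) for transmission."""
    return euc_bytes.decode("euc_jp").encode("iso2022_jp")


def incoming(jis_bytes):
    """Convert received JIS (ISO-2022-JP) bytes back to EUC for display."""
    return jis_bytes.decode("iso2022_jp").encode("euc_jp")


# The editor buffer holds EUC; the wire format is JIS.
buffer_euc = "こんにちは".encode("euc_jp")
wire = outgoing(buffer_euc)

# The wire form uses only 7-bit bytes and converts back without loss.
assert all(b < 0x80 for b in wire)
assert incoming(wire) == buffer_euc
```

Going through a decoded string in the middle keeps the sketch short; a real mail program would also have to label the message with the ISO-2022-JP character set in its headers, which is outside the scope of this example.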



 
Craig Toshio Oda
1998-05-07