

How Japanese E-mail is Different

There are two main sections of the mail message that the mail software needs to handle properly. The first is the message body. The second is the subject line of the mail message.

Message Body

Until the early 1990s, Internet mail was assumed to be a sequence of single-byte characters with only 7 bits of each 8-bit byte used. Most e-mail was ASCII, which fits this assumption. Japanese e-mail does not. The most commonly used encoding methods on personal computers are ShiftJIS and EUC, and both use the 8th bit. Most of the time this is not a problem, but some mail transport software mishandles such bytes. To avoid this, the standard for mail transport is JIS encoding, also known as iso-2022-jp. Each Japanese character in JIS consists of two bytes that each use only 7 bits. Using iso-2022-jp eliminates the problem of mail transport software incorrectly interpreting the 8th bit as a control character. However, it adds a layer of complexity to the software used to read and write mail.
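
To make the 8th-bit issue concrete, the following minimal sketch encodes the same short Japanese string three ways and checks whether any byte has its high bit set. It uses Python's built-in codecs (shift_jis, euc_jp, iso2022_jp); the sample text and the helper name uses_eighth_bit are illustrative assumptions, not part of any mail package.

```python
# Illustrative sample text; any Japanese string would do.
text = "日本語"

def uses_eighth_bit(data: bytes) -> bool:
    """Return True if any byte in data has its high (8th) bit set."""
    return any(b & 0x80 for b in data)

for codec in ("shift_jis", "euc_jp", "iso2022_jp"):
    encoded = text.encode(codec)
    # Print the codec name, the raw bytes in hex, and the 8th-bit check.
    print(codec, encoded.hex(), uses_eighth_bit(encoded))
```

Only the iso2022_jp result should consist entirely of 7-bit bytes, because it switches character sets with escape sequences instead of setting the high bit.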

The mail software must read in the JIS encoding and convert it to the local encoding, usually EUC on a Linux system. One of the things the mail software should check for is the Content-Type: definition of the incoming mail header. If it is defined as iso-2022-jp, the mail software should transparently do the appropriate conversions to allow the text to be displayed properly.

   Content-Type: Text/Plain; charset=iso-2022-jp
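
As a rough sketch of the incoming side, the code below parses a raw message with Python's standard email package, checks the charset declared in Content-Type:, and converts an iso-2022-jp body to a local encoding. The function name body_to_local, the EUC default, and the sample message are assumptions made for illustration; a real mailer would also have to cope with multipart mail and missing headers.

```python
import email

def body_to_local(raw_message: bytes, local_encoding: str = "euc_jp") -> bytes:
    """Convert an iso-2022-jp message body to the local encoding.

    Hypothetical helper: handles single-part messages only.
    """
    msg = email.message_from_bytes(raw_message)
    payload = msg.get_payload(decode=True)  # body as raw bytes
    # get_content_charset() returns the charset parameter in lower case.
    if msg.get_content_charset() == "iso-2022-jp":
        return payload.decode("iso2022_jp").encode(local_encoding)
    # No usable declaration: pass the bytes through unchanged.
    return payload

# Illustrative raw message built by hand for the example.
raw = (b"Content-Type: Text/Plain; charset=iso-2022-jp\r\n"
       b"\r\n" + "こんにちは".encode("iso2022_jp") + b"\r\n")

local = body_to_local(raw)
print(local.decode("euc_jp").strip())
```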

The mail software should also add the Content-Type: tag to outgoing messages and identify Japanese mail as iso-2022-jp. Before sending, the mail software needs to convert the text from the local encoding to iso-2022-jp.
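
The outgoing direction can be sketched the same way. This hypothetical helper, build_outgoing, converts a body held in the local encoding (EUC is assumed here) to iso-2022-jp and prepends the Content-Type: header. The MIME-Version and Content-Transfer-Encoding lines are the usual companions of that header and are included as an assumption; they are not discussed in the text above.

```python
def build_outgoing(local_body: bytes, local_encoding: str = "euc_jp") -> bytes:
    """Return header and body bytes ready for 7-bit transport (sketch only)."""
    text = local_body.decode(local_encoding)
    jis_body = text.encode("iso2022_jp")  # convert before sending
    headers = (b"MIME-Version: 1.0\r\n"
               b"Content-Type: text/plain; charset=iso-2022-jp\r\n"
               b"Content-Transfer-Encoding: 7bit\r\n")
    # A blank line separates the headers from the body.
    return headers + b"\r\n" + jis_body + b"\r\n"

# Illustrative body as it might be stored on a Linux system.
euc_body = "お元気ですか".encode("euc_jp")
message = build_outgoing(euc_body)
print(message)
```

Every byte of the result stays below 0x80, which is the property the transport software relies on.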

Unfortunately, only a few software packages, such as mew, handle the Content-Type: definition transparently and properly. Most of the e-mail sent over the Internet does not have a Content-Type: field. Also, mail sent as EUC or ShiftJIS without conversion may be sent and received properly by some mailers. This is due more to luck than to any robustness in the mail software. Since the transmission of Japanese Internet mail as iso-2022-jp is such a widely accepted standard, it is better to conform to the standard than to rely on luck.

Japanese Subject Line

The subject line is another area that is confusing. Subject line text sent as plain ShiftJIS, JIS, or EUC may be read by some mail software. However, this is not a reliable method of reading and sending the subject lines.

The standardized and recommended method is to convert the subject line into Base64, the same encoding method used for MIME messages. In addition to encoding the subject line as Base64, the header must name the character encoding, and a series of ? delimiters is used to separate the parts of the encoded subject.
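
A minimal sketch of that encoding, using only Python's base64 module and built-in iso2022_jp codec. The function name encode_subject and the sample text are illustrative assumptions, and the sketch does not split long subjects into several sections.

```python
import base64

def encode_subject(text: str) -> str:
    """Build a Subject: line from Japanese text (short subjects only)."""
    jis = text.encode("iso2022_jp")                # step 1: character encoding
    b64 = base64.b64encode(jis).decode("ascii")    # step 2: Base64 for transport
    # Layout: =? charset ? B ? encoded text ?=
    return "Subject: =?iso-2022-jp?B?" + b64 + "?="

header = encode_subject("これは私のテストです。")
print(header)
```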

A raw mail header looks like this:

  Subject: =?iso-2022-jp?B?GyRCJDMkbCRPO2QkTiVGJTklSCRHJDkhIxsoQg==?=

The mail software scans the header and picks out the Subject: keyword. Next, the =? characters signal the beginning of the encoded subject text. The iso-2022-jp identifies the character encoding of the subject line. This is usually the same as the encoding of the message body, but it does not have to be. The next ? character is a delimiter. The B identifies the transport encoding of the iso-2022-jp text as Base64. The subject is therefore encoded twice: first the Japanese characters are encoded into iso-2022-jp, then the iso-2022-jp bytes are encoded with Base64. After the next ?, the text of the subject continues until the closing ?=. Long subjects are split into smaller sections, and each section carries its own start and stop delimiters, character encoding specification, and transport encoding specification.
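
The parsing steps just described can be sketched as follows. The function name decode_subject is hypothetical, and the sketch assumes the whole subject consists of Base64 sections; it rejects plain-text words and other transport encodings rather than handling them.

```python
import base64

def decode_subject(header_line: str) -> str:
    """Decode a Subject: line made only of =?charset?B?...?= sections."""
    keyword, _, value = header_line.partition(":")
    if keyword.strip().lower() != "subject":
        raise ValueError("not a Subject: header")
    parts = []
    # Long subjects arrive as several whitespace-separated sections.
    for section in value.split():
        if not (section.startswith("=?") and section.endswith("?=")):
            raise ValueError("plain-text words are outside this sketch")
        charset, transport, payload = section[2:-2].split("?")
        if transport.upper() != "B":
            raise ValueError("only Base64 (B) is handled here")
        # Undo the two encodings in reverse order: Base64 first, then charset.
        parts.append(base64.b64decode(payload).decode(charset))
    return "".join(parts)

line = "Subject: =?iso-2022-jp?B?GyRCJDMkbCRPO2QkTiVGJTklSCRHJDkhIxsoQg==?="
print(decode_subject(line))
```

Python's standard email.header.decode_header function performs the same splitting and Base64 step in general form, so a real mailer would more likely build on that than on hand parsing.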

Craig Toshio Oda