MIME: What and Where is it?
by Julian Onions, NEXOR
Electronic mail has been part of the Internet almost since the network's inception. The base standard for Internet electronic mail is defined in Request for Comments (RFC) 822, published in mid-1982. RFC-822 defines a simple ASCII text format for the message header and stipulates that the remainder of the message, the body, is 7-bit ASCII text. However, technology has changed over the lifetime of the standard, and users demand more capabilities from their messaging systems.
This format, although suited to the vast majority of messages sent across the Internet, has a number of drawbacks:
- There is no support for transferring anything that is, or may contain, binary data.
- There is no support for character sets other than U.S. ASCII. This restricts users whose native language uses other characters.
These two restrictions were the primary driving force behind an enhancement of the RFC-822 standard known as Multi-purpose Internet Mail Extensions (MIME).
Multi-purpose Internet Mail Extensions
MIME extends and enhances the basic RFC-822 message in a manner that is backwards compatible. MIME messages can be viewed with non-MIME capable user agents. While the text may not always be meaningful, it will always be text.
The major impact of MIME is on the content of the message. The basic function of MIME is to impose a structure on the normally unstructured body of an RFC-822 message. This structure is indicated by the addition of new fields in the message header.
MIME Body Types
There are four basic categories of content supported by MIME: text format, multi-media, application, and structured. Each type is composed of a main type, a subtype and an optional list of parameters. This is encoded as:
<main-type>/<subtype>; parameters
The text format with subtype plain is the default format. This is equivalent to the normal RFC-822 content. It may have a parameter indicating that the plain text is composed in a character set other than U.S. ASCII. Other text subtypes have been defined to allow revisable formats of text with more attributes, such as rich text.
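As a rough illustration of the main-type/subtype/parameter layout, the following Python sketch uses the standard library's email package to split a Content-Type value into its components. The helper name describe_content_type and the sample header values are assumptions made for this example only.

```python
from email.message import Message


def describe_content_type(header_value):
    """Split a Content-Type value into main type, subtype and parameters."""
    msg = Message()
    msg["Content-Type"] = header_value
    # get_params() returns (name, value) pairs; the first pair is the
    # type itself, so only the remaining pairs are real parameters.
    params = dict(msg.get_params()[1:])
    return {
        "main_type": msg.get_content_maintype(),
        "subtype": msg.get_content_subtype(),
        "params": params,
    }


# Hypothetical header values, chosen only to show the three components.
info = describe_content_type('text/plain; charset="iso-8859-1"')
print(info)
print(describe_content_type("image/gif"))
```

The charset parameter here plays the role described above: it marks plain text that is composed in a character set other than U.S. ASCII.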
MIME defines a number of multi-media types, which indicate that the content is audio, an image or video. Each of these has applicable subtypes, such as image/gif and video/mpeg.
Application types are designed for the use of message-enabled applications and the exchange of application data. This allows proprietary word processor and spreadsheet data to be exchanged.
Finally, there are structured or multi-part types which do not carry data as such, but allow the combination and nesting of the above data types.
Structured MIME types include four subtypes: mixed, parallel, digest and alternative.
- Mixed is the most common, and allows a sequential collection of attachments to be constructed. It is typically used for messages composed of several body parts (e.g., text and image).
- Parallel is similar to mixed, but indicates that the components should be rendered together (e.g., audio and image).
- Digest is a simple mechanism for collecting messages. It is commonly used to send a single recipient one collection of messages instead of a series of separate messages. Its primary function is backwards compatibility with earlier digest formats.
- Alternative indicates that several versions of the same data are sent. It allows a plain text body part, a word processor format and a PostScript representation of the same information to be sent as one message. The recipient can then choose which version to view, depending on the capabilities of their environment.
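The nesting of structured types can be sketched with Python's standard email package. In this minimal example, a multipart/alternative part (plain text plus a PostScript rendering) ends up nested inside a multipart/mixed message that also carries an image. The addresses, subject and payload bytes are hypothetical placeholders, not real documents.

```python
from email.message import EmailMessage

msg = EmailMessage()
msg["From"] = "author@example.org"      # hypothetical address
msg["To"] = "reader@example.org"        # hypothetical address
msg["Subject"] = "Quarterly report"

# First version of the information: plain text.
msg.set_content("Plain text version of the report.")

# Second version of the same information; the message becomes
# multipart/alternative. The bytes are a placeholder, not valid PostScript.
msg.add_alternative(b"%!PS-Adobe-3.0\n", maintype="application",
                    subtype="postscript")

# Adding an attachment wraps the alternatives in multipart/mixed.
# The bytes are a placeholder, not a valid GIF image.
msg.add_attachment(b"GIF89a", maintype="image", subtype="gif",
                   filename="chart.gif")

# walk() visits the message and every nested part, depth first.
types = [part.get_content_type() for part in msg.walk()]
for content_type in types:
    print(content_type)
```

Printing the content types in walk order shows the tree: the mixed container first, then the alternative container with its two versions, then the image.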
In addition, there are other MIME types for forwarding individual messages, fragmenting messages into several partial messages and referencing external objects. The external reference is used to good effect on the Internet for the distribution of new RFCs. When an RFC is published, a message is circulated. The message normally consists of a text part with a synopsis and title of the new RFC, followed by an external reference (the FTP details) to an Internet host holding the new document. Users with MIME-capable user agents are therefore presented with the details of the new RFC and are then asked if they would like to fetch the document. If so, it can automatically be fetched and displayed.
These capabilities cater to most users who wish to send data of any format. Where they do not, there are extension mechanisms and private types that can be used for experimentation.
Encodings
There is still a problem. RFC-822 only allows 7-bit ASCII body parts, yet many of the more interesting types, such as video, application and audio, carry arbitrary binary data. To this end, each MIME body part can have an encoding associated with it. This allows the content to be reversibly mapped into 7-bit ASCII. There are three basic encoding mechanisms:
- 7-bit indicates that the data is encoded in 7-bit ASCII, and therefore is already suitable for transfer. This is the default type of format if no other is specified.
- Quoted-printable is a means to encode data that is mostly 7-bit ASCII. This leaves most ASCII characters unmapped, but contains an escape mechanism for characters outside the ASCII set to be individually encoded. The result remains mostly readable, with only a few escaped characters. It also allows recipients with non-MIME capable user agents to make sense of most of the message, with just a few unreadable characters.
- Base64 is a full encoding, whereby each group of 3 bytes is encoded into a group of 4 characters picked from a subset of ASCII that is known to pass through most mail gateways untouched. Base64 encoded messages are unreadable without being decoded.
Two other encodings are possible: 8-bit and binary. These are not legal encodings in normal RFC-822 messages, but may be used when it is known that the path in use supports full 8-bit messages. The difference between 8-bit and binary is a good way to test the MIME expertise level of any individual!
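The difference between quoted-printable and base64 can be seen in a short Python sketch using the standard quopri and base64 modules. The sample text and byte values are assumptions chosen for illustration.

```python
import base64
import quopri

# Mostly-ASCII text with one character (e acute, 0xE9 in ISO 8859-1)
# that falls outside 7-bit ASCII.
latin1_text = "Caf\u00e9 au lait".encode("iso-8859-1")

# Quoted-printable escapes only the non-ASCII byte as =E9.
qp = quopri.encodestring(latin1_text)
print(qp)       # b'Caf=E9 au lait'

# Base64 maps every group of 3 bytes to 4 ASCII characters.
b64 = base64.b64encode(b"abc")
print(b64)      # b'YWJj'

# Both encodings are reversible.
qp_round_trip = quopri.decodestring(qp)
b64_round_trip = base64.b64decode(b64)
```

The quoted-printable form can still be read by eye, while the base64 form cannot, which matches the trade-off described above.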
Putting all this together, here is an example of a MIME encoded RFC-822 message.
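As an illustrative stand-in, the following Python sketch holds a small hand-written MIME message in a string and decodes it with the standard email package. The addresses, subject, boundary string and part contents are all hypothetical, chosen only to show the header fields, the multipart structure and the two encodings working together.

```python
from email import message_from_string

# A hypothetical two-part message: quoted-printable text and base64 data.
RAW_MESSAGE = """\
From: sender@example.org
To: recipient@example.org
Subject: Example MIME message
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="example-boundary"

This preamble is only seen by non-MIME user agents.
--example-boundary
Content-Type: text/plain; charset=iso-8859-1
Content-Transfer-Encoding: quoted-printable

Caf=E9 au lait.
--example-boundary
Content-Type: application/octet-stream
Content-Transfer-Encoding: base64

AAECAwQF
--example-boundary--
"""

msg = message_from_string(RAW_MESSAGE)
text_part, binary_part = msg.get_payload()

# decode=True reverses the Content-Transfer-Encoding of each part.
charset = text_part.get_content_charset()
text = text_part.get_payload(decode=True).decode(charset).strip()
data = binary_part.get_payload(decode=True)

print(msg.get_content_type(), msg.get_boundary())
print(text)
print(list(data))
```

A non-MIME user agent would display the same message as plain text, boundary lines included, which is the backwards compatibility the extensions were designed around.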
Future Development
MIME development continues, with new formats and structures under investigation. In particular, formats for delivery reports, notifications and the support of EDI are all current topics. Also under development are extensions to SMTP, the protocol that transfers RFC-822 messages, to allow other facilities to be implemented.
Gateways
The advent of MIME has also greatly increased the ability of gateways to transfer messages between RFC-822 and other formats. In particular, much work has been done on conversions from X.400 contents to MIME. The conversion allows reversible mappings of the different common formats. Where no equivalent format is defined, a mechanism is in place to carry the type in a well-defined way. This allows MIME-aware gateways to provide a much better service to recipients on either side of the gateway. The mechanisms for the conversion are documented in RFC-1495 and RFC-1496.
Problems with MIME
While MIME has enjoyed broad success, there have been some problem areas and a few loose ends. For instance, it is not possible to stack encodings, which would allow data to be transferred both compressed and base64 encoded. There are issues around adding security to MIME messages (i.e., do you secure the message or the parts?). These are for the most part small issues. However, from our own experience, one of the biggest problems with MIME, particularly with MTAs and gateways, is not the facilities it gives, but the opportunity it gives for people to make mistakes. The Internet, unlike many other networks, such as telephone, X.25 or X.400 ADMDs, requires no certification to join. Therefore any individual can (and many do) write their own software. This allows great freedom for experimentation, but also causes great problems when running a service. It is very easy to generate illegal messages, and Internet mail systems are often very liberal about what they are willing to receive, often to the point of accepting any string of characters at random!
As an example, the Nexor MTA software routes RFC-822, MIME and X.400 messages, and converts between them. Before the advent of MIME, only the occasional broken RFC-822 message was received, because it is simple to construct a valid RFC-822 message. There are only three mandatory fields, and the syntax of most of them is quite straightforward. Ignoring the few incredibly strange addresses, most messages adhere to the published standard.
Now bring MIME onto the scene, which imposes a structure on the content of the message and greatly increases the complexity of RFC-822 messages. As mentioned above, this allows great things to happen when gatewaying to X.400, but does require examination of the whole message to construct an equivalent message in the new format. While there are only one or two ways to generate illegal RFC-822 messages (at least in a way that makes processing difficult), there are numerous clever and intricate ways to generate illegal MIME messages. Since we began running our gateway in production, it has been a constant battle to code around the weird and wonderful variants of MIME received which can break a message. These are a few examples of the way messages can be broken:
- A header indicating a multipart content, without a delimiter specified to delimit the parts. This means the gateway can't split apart the content, as it has no splitting token.
- Multiple, conflicting type definitions for the same part (e.g., text/plain and image/gif). Which definition is correct, and how can you tell?
- Unknown transfer encodings (e.g., 9-bit or xyz). It is impossible to decode the content, so what can you gateway?
As each new oddity appears, a mechanism to make the best attempt to deal with the error, yet still produce as output a legal MIME message, has to be fashioned.
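The three kinds of breakage listed above can be detected with a few explicit checks. The following Python sketch uses the standard email package. The function name find_problems, the wording of the reports and the test messages are assumptions for illustration. It only detects the errors and does not attempt the repair step that a production gateway needs.

```python
from email import message_from_string

# Transfer encodings named in the article; anything else is suspect.
KNOWN_ENCODINGS = {"7bit", "8bit", "binary", "quoted-printable", "base64"}


def find_problems(raw_message):
    """Return a list of human-readable problems found in a raw message."""
    msg = message_from_string(raw_message)
    problems = []
    for part in msg.walk():
        # Multiple, conflicting type definitions for the same part.
        declared = {value.strip().lower()
                    for value in part.get_all("Content-Type", [])}
        if len(declared) > 1:
            problems.append("conflicting Content-Type headers")

        # Multipart content without a delimiter to split the parts.
        if (part.get_content_maintype() == "multipart"
                and part.get_boundary() is None):
            problems.append("multipart content without a boundary")

        # Unknown transfer encoding; 7bit is the default when absent.
        encoding = str(part.get("Content-Transfer-Encoding", "7bit"))
        encoding = encoding.strip().lower()
        if encoding not in KNOWN_ENCODINGS:
            problems.append(f"unknown transfer encoding: {encoding}")
    return problems


# Hypothetical broken messages, one per failure mode.
no_boundary = "Content-Type: multipart/mixed\n\nbody\n"
two_types = "Content-Type: text/plain\nContent-Type: image/gif\n\nbody\n"
bad_encoding = ("Content-Type: text/plain\n"
                "Content-Transfer-Encoding: 9-bit\n\nbody\n")

for raw in (no_boundary, two_types, bad_encoding):
    print(find_problems(raw))
```

Collecting problems in a list, instead of raising on the first one, reflects the liberal-receiver stance described above: the gateway still has to decide what to do with the message.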
Summary
MIME allows many new formats of mail to be transferred across the Internet without disturbing the existing user base or transport service. It offers the ability to experiment with new formats and to support mail-enabled applications. It is particularly useful for mail gateways, allowing them to preserve more of the format of the original message.