Internet Messaging: From the Desktop to the Enterprise
(Originally published in Messaging Magazine, September/October 1998)
Following is an excerpt of Chapter 5 from Internet Messaging: From the Desktop to the Enterprise
by Marshall Rose and David Strom
Standards
There are two enterprise-specific standards for receiving mail: the Post Office Protocol (POP) and the Internet Message Access Protocol (IMAP). For historical (some say hysterical) reasons, there is a kind of imagined rivalry going on between the supporters of each protocol. In addition to these two standards, there are others that relate to Virtual Private Networks (VPN), which we will also examine.
The Post Office Protocol
The original POP was developed and deployed back in 1988. Since that time, there have been some minor embellishments to the protocol, but nothing that would prevent a client implementing the latest version of POP from interacting with a vintage server (or vice versa). This is both a strength and a weakness. While its stability has made POP attractive to implement in a wide range of products, the protocol has not evolved to meet new needs.
The design goals of POP are two-fold:
Hence, in addition to stability, an important design goal is that of scalability. A modest server can handle hundreds of simultaneous clients; delays are due to network latency, not server bottlenecks. Furthermore, because it is often easier to install, configure and maintain a single POP server than to do the same for full messaging servers on multiple workstations, this design trade-off can work for everyone from enterprise managers to ISPs.
It is important to appreciate that neither POP nor IMAP is a replacement for, or a subset of, the Simple Mail Transfer Protocol (SMTP). As Chapter 3 discusses, SMTP is responsible for sending messages; POP and IMAP are responsible for retrieving them. It is also important to understand that message envelope information is not available from either POP or IMAP. That is, when an SMTP server makes final delivery to a mailbox, the SMTP envelope is not placed in the mailbox; only the message being delivered is.
Protocol Interactions
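There are four parts to a POP session:
1. Exchange of greetings
2. Mailbox examination
3. Message retrieval and deletion
4. Session release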
The actual commands used in POP aren't particularly interesting. However, the minimalist functionality they provide is. POP is also unusual in that its command syntax is even simpler than SMTP's. Rather than using the three-digit reply codes from Chapter 2, a POP server that likes what it sees says "+OK"; otherwise it says "-ERR". Any text that follows on the same line is meant for human consumption. So, by simply looking at the first character of a response, the client can decide whether things are working. This minimalist syntax works because the protocol operations are very simple.
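To make the point concrete, here is a minimal Python sketch of a client that classifies POP replies by their first character alone. The function name pop_status is our own. The reply text in the example is illustrative, since real servers word the human-readable part however they like.

```python
def pop_status(line: str) -> bool:
    """Classify a POP reply by its first character: "+OK" or "-ERR"."""
    if line.startswith("+"):
        return True   # positive reply; the rest of the line is for humans
    if line.startswith("-"):
        return False  # negative reply
    raise ValueError(f"not a POP status line: {line!r}")


for reply in ["+OK 2 messages (320 octets)", "-ERR no such message"]:
    print("ok" if pop_status(reply) else "error", "|", reply)
```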
The exchange of greetings is used primarily to authenticate the user to the server. Unlike SMTP, in which no authentication occurs when relaying messages, access to a particular mailbox requires some kind of authentication. At present, there are four different authentication schemes of varying cryptographic complexity. (Indeed, the addition of these schemes is really the only change made to POP over the past decade.) The original scheme, PASS, uses plain text passwords (and, obviously, has no cryptographic complexity). Other authentication schemes include the exchange of cryptographic hashes based on shared secrets (APOP and AUTH), and Kerberos.
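APOP is simple enough to sketch in a few lines of Python. Under RFC 1939, the server's greeting carries a timestamp in angle brackets. The client proves it knows the shared secret by sending the MD5 digest of that timestamp followed by the secret, so the password itself never crosses the network. The greeting, user name and secret below are the sample values from RFC 1939. The helper apop_digest is hypothetical, and the sketch does not cover AUTH or Kerberos.

```python
import hashlib
import re


def apop_digest(greeting: str, shared_secret: str) -> str:
    """MD5 of the greeting's <timestamp> (brackets included) plus the secret, in lowercase hex."""
    match = re.search(r"<[^<>]*>", greeting)
    if match is None:
        raise ValueError("greeting carries no APOP timestamp")
    return hashlib.md5((match.group(0) + shared_secret).encode("ascii")).hexdigest()


greeting = "+OK POP3 server ready <1896.697170952@dbc.mtview.ca.us>"
print("APOP mrose", apop_digest(greeting, "tanstaaf"))
# APOP mrose c4c9334bac560ecc979e58001b3e22fb  (the digest shown in RFC 1939)
```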
Mailbox examination allows a client to determine how many messages are in the mailbox and the size of each message. This provides basic functionality for clients that wish to filter based on message size (e.g., "don't download anything over 50 kilobytes"). Messages are identified by positive integers (e.g., 1, 2, and so on), and sizes are expressed in octets, although they may be approximations to reduce the complexity of the server.
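A client-side size filter built on mailbox examination might look like the following Python sketch. It assumes the client has already collected the lines of a scan listing, as returned by POP's LIST command, where each line holds a message number and a size in octets. The listing values and helper names are hypothetical. Because servers may report approximate sizes, the 50-kilobyte limit is best treated as a rough policy.

```python
def parse_scan_listing(lines):
    """Turn scan lines such as "1 1205" into (message number, size) pairs."""
    listing = []
    for line in lines:
        number, size = line.split()[:2]
        listing.append((int(number), int(size)))
    return listing


def small_enough(listing, limit_octets=50 * 1024):
    """Message numbers whose reported (possibly approximate) size is within the limit."""
    return [number for number, size in listing if size <= limit_octets]


# Hypothetical body of a LIST reply (the lines between "+OK" and the final ".").
scan = ["1 1205", "2 73410", "3 4096"]
listing = parse_scan_listing(scan)
print(small_enough(listing))  # [1, 3]
```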
Message retrieval and deletion are straightforward. The client tells the server to send back an entire message. Typically, once a message is downloaded, the client tells the server to mark the message for deletion. Some servers may also support a "top N lines" command, which returns the headers of the message and the first N lines of the body. This is provided for clients that wish to filter based on values in the headers.
Session release tells the POP server to remove all messages marked for deletion (typically every message retrieved).
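The deletion contract is easy to misread, so here is a toy, in-memory Python model of it. The class ToyPopMailbox is hypothetical and ignores networking, authentication and multi-line responses. It shows two things: marking a message does not remove it, and removal happens only when the session is released.

```python
class ToyPopMailbox:
    """Hypothetical model of POP's retrieve / mark-for-deletion / release cycle."""

    def __init__(self, messages):
        self._messages = dict(enumerate(messages, start=1))  # numbers start at 1
        self._marked = set()

    def _exists(self, number):
        return number in self._messages and number not in self._marked

    def retr(self, number):
        return "+OK " + self._messages[number] if self._exists(number) else "-ERR no such message"

    def dele(self, number):
        if not self._exists(number):
            return "-ERR no such message"
        self._marked.add(number)  # only marked; still stored on the server
        return "+OK message marked"

    def quit(self):
        for number in self._marked:  # removal happens here, at session release
            del self._messages[number]
        removed = len(self._marked)
        self._marked.clear()
        return f"+OK {removed} messages removed"

    def remaining(self):
        return sorted(self._messages)


box = ToyPopMailbox(["first message", "second message"])
for n in (1, 2):
    print(box.retr(n))
    print(box.dele(n))
print(box.retr(1))      # -ERR no such message (already marked)
print(box.quit())       # +OK 2 messages removed
print(box.remaining())  # []
```

RFC 1939 specifies that if the connection drops before the client issues QUIT, the server does not remove the marked messages. An interrupted session therefore loses no mail.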
Limitations
Although POP has enjoyed much success over the past decade, its shortcomings are also evident. These typically fall into two areas:
1. Network latency
2. Decentralized operations
Let's look at each.
Although POP is highly optimized to allow for efficient implementations on host systems, its use of the network is suboptimal with respect to latency. The typical POP session consists of at least 4+2N round-trip interactions, where N is the number of messages retrieved: three round-trips to exchange greetings, one round-trip to release the session, and two round-trips for each message. Although the data carried in each round-trip may be very small, latency can be significant when accessing the Internet via a dialup connection or across several networks. If the POP client decides to filter messages (because they are large), the number of round-trip interactions may climb to as high as three per message, although this may be offset by not downloading large messages. To remedy this situation, the server would need to somehow apply the user's filters to the mailbox and then blast the selected messages down the network toward the client.
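The arithmetic is worth spelling out. The following Python sketch encodes the counts given above: three round-trips for greetings, one to release, and two or three per message. It then multiplies the total by an assumed round-trip time. The 0.25-second figure is our illustrative assumption for a slow dialup path, not a measurement, and transfer time for the message data is ignored.

```python
def pop_round_trips(messages: int, filtering: bool = False) -> int:
    """3 (greetings) + 1 (release) + 2 per message, or 3 per message when filtering."""
    per_message = 3 if filtering else 2
    return 3 + 1 + per_message * messages


def waiting_seconds(messages: int, rtt_seconds: float, filtering: bool = False) -> float:
    """Time spent purely waiting on round trips (no transfer time included)."""
    return pop_round_trips(messages, filtering) * rtt_seconds


ASSUMED_RTT = 0.25  # seconds; illustrative value for a slow dialup path
print(pop_round_trips(10))                  # 24
print(pop_round_trips(10, filtering=True))  # 34
print(waiting_seconds(10, ASSUMED_RTT))     # 6.0
```

At that assumed round-trip time, ten messages cost six seconds of pure waiting before any message data is counted.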
The issue of decentralized operations is considerably trickier, as its roots lie in a different style of user interaction. The POP model is that a single mailbox is split between the server and the client. The server's job is to put messages into the mailbox for later retrieval by the client. The client's job is to retrieve the messages and then let the user process them. Implicit in this shared contract is the notion that the client deletes messages on the server after they are retrieved. For users who process their mail on a single machine, the POP model is usually adequate.
However, some users may decide to use multiple machines to process their mail. For example, they might use a desktop system at the office, a laptop on the road and another system at home, all to process messages addressed to the same recipient. For those users, the POP model of centralized operations (all processing done on a single client system) is inadequate. Throughout the history of POP, there have been some attempts to support this additional model, but they have proven largely unsatisfactory. For example, some clients are able to talk to multiple POP servers and manage multiple mailboxes. This really doesn't provide for decentralized operations, but it does give users with multiple mailboxes a centralized mechanism for managing their mail. To deal with the decentralized model, we need to look at a different protocol.
The Internet Message Access Protocol
The original IMAP (IMAP2) was developed and deployed back in 1988. Since that time, the protocol has undergone substantive changes. As of this writing, the current version is termed IMAP4rev1. Although the continued development and refinement of IMAP can be viewed as a strength, its lack of stability has certainly hindered its deployment in operational systems. (The fact that there are no fewer than 10 documents in the IMAP standards suite doesn't help either.)
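The design goal of IMAP is to deal with the decentralized model. To do so, three new building blocks are present:
1. Management of multiple mailboxes on the server
2. Per-message attributes, including a unique, persistent identifier
3. Server-side searching and selective retrieval of message parts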
Because IMAP lets a client manage multiple mailboxes, the client can use the server as a bona fide replacement for its own message store. This means that users can decide how to organize messages and then have the client tell IMAP how to carry out the necessary operations on the server (e.g., moving messages from one mailbox to another). Nearly all desktop e-mail software provides this powerful feature, so by offering it as a protocol service, IMAP goes a long way toward usability as an access protocol. Unfortunately, the difficult part is that the IMAP client and server must agree on naming conventions, and the IMAP standards provide little guidance in this area. As a consequence, interoperability suffers.
The second building block needed by the user's e-mail software is the ability to associate properties or attributes with each message. Although a "deleted" attribute first comes to mind, there is actually a much more important one: a unique, persistent identifier for the message at the server. If messages can be uniquely identified, then it is possible for multiple clients to synchronize their behavior if they become disconnected from the server. In addition, entire messages can be uploaded back to the server.
For example, an IMAP client might connect to a server, download some messages, possibly delete them, and then release the IMAP session. Later, when the client reconnects to the server, it can determine which of those messages are still on the server, even if some other program accessed the mailbox and "did things" in the interim. It may decide to upload a message back to a different mailbox, and so on.
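The bookkeeping behind this kind of resynchronization can be sketched in Python as a comparison of two sets of identifiers. The UIDs below are hypothetical. The sketch also ignores one detail of IMAP4rev1: UIDs are guaranteed to persist only while the mailbox's UIDVALIDITY value is unchanged, so a real client must discard its cache if that value changes.

```python
def reconcile(cached_uids, server_uids):
    """Split UIDs into (still present, gone since last session, newly arrived)."""
    cached, current = set(cached_uids), set(server_uids)
    return (
        sorted(cached & current),  # still on the server
        sorted(cached - current),  # removed by some other program in the interim
        sorted(current - cached),  # arrived since the last session
    )


# Hypothetical: the laptop saw UIDs 101-104 last time; meanwhile the desktop
# client deleted 102 and two new messages arrived.
still_present, gone, new = reconcile([101, 102, 103, 104], [101, 103, 104, 105, 106])
print(still_present)  # [101, 103, 104]
print(gone)           # [102]
print(new)            # [105, 106]
```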
The final concept allows for more efficient use of network resources. IMAP provides a mechanism that allows a client to send search criteria to the server to select messages within a mailbox. The client can also examine the structure of a message in order to select different parts of the message for download (e.g., "send me the 1-kilobyte text part, but skip the 50-megabyte video clip").
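Here is a Python sketch of the client-side decision about which parts to download. It assumes the message structure has already been parsed into a flat list of parts. The Part class, its fields and the 64-kilobyte limit are hypothetical simplifications of the body-structure data IMAP actually returns. The last line builds a FETCH command that names only the chosen sections; the tag A005 and message number 7 are arbitrary. Searching works in the same spirit, with the server evaluating criteria such as SMALLER 51200.

```python
from dataclasses import dataclass


@dataclass
class Part:
    """Hypothetical, flattened view of one MIME part of a message."""
    section: str    # part specifier, e.g. "1" or "2"
    mime_type: str  # e.g. "text/plain"
    size: int       # octets


def parts_to_fetch(parts, max_octets=64 * 1024, wanted_prefix="text/"):
    """Keep small text parts; skip everything else (such as a large video clip)."""
    return [p.section for p in parts
            if p.mime_type.startswith(wanted_prefix) and p.size <= max_octets]


message = [
    Part("1", "text/plain", 1024),
    Part("2", "video/mpeg", 50 * 1024 * 1024),
]
sections = parts_to_fetch(message)
print(sections)  # ['1']
print("A005 FETCH 7 (" + " ".join(f"BODY[{s}]" for s in sections) + ")")
```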
Protocol Interactions
There are four parts to an IMAP session:
1. Exchange of greetings
2. Mailbox selection
3. Message management
4. Session release
However, unlike the typical POP session, the client may choose to iterate between steps 2 and 3, managing the messages in different mailboxes.
As might be expected, IMAP has a considerably richer command set than POP. What is unexpected, however, is that IMAP's syntax is considerably more complicated than that of POP or SMTP (or just about any other Internet application protocol). There are two reasons for this: to allow the client to have multiple IMAP commands in flight simultaneously, and to allow the server to send unsolicited information to the client (e.g., that new messages have arrived).
With respect to the first reason, both SMTP and POP operate in lock step: the client sends a command and awaits a response before issuing another command. IMAP allows the client to send multiple commands, and hence requires a mechanism for correlating responses with the original requests. Although this is an interesting concept in theory, practice is another story. First, if the server isn't multithreaded, it will process the requests serially. Second, clients have to be particularly clever in order to use such a feature. As a result, the overall protocol is considerably more complex, with dubious benefit in the field. The second reason, by contrast, reflects a useful feature of IMAP: it allows the server to tell the client when new messages have arrived, without the client having to continuously poll for this information.
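The correlation mechanism can be sketched in Python. Each command is prefixed with a tag chosen by the client, and the server's completion line repeats that tag. Lines beginning with "*" are untagged and may be unsolicited, such as an EXISTS count announcing new mail. The class TagCorrelator and the server lines in the example are hypothetical. The two commands deliberately complete out of order to show why tags are needed.

```python
import itertools


class TagCorrelator:
    """Match tagged IMAP completion lines to the commands that caused them."""

    def __init__(self):
        self._tags = itertools.count(1)
        self.pending = {}    # tag -> command still awaiting completion
        self.completed = {}  # tag -> (command, status text)
        self.untagged = []   # "*" lines: data for a command, or unsolicited news

    def send(self, command: str) -> str:
        """Assign the next tag and return the line to transmit."""
        tag = f"A{next(self._tags):03d}"
        self.pending[tag] = command
        return f"{tag} {command}"

    def receive(self, line: str) -> None:
        """File one server line under the command it belongs to."""
        if line.startswith("* "):
            self.untagged.append(line[2:])
        elif line.startswith("+"):
            pass  # continuation request; not modeled in this sketch
        else:
            tag, _, status = line.partition(" ")
            self.completed[tag] = (self.pending.pop(tag), status)


c = TagCorrelator()
print(c.send("FETCH 1 (FLAGS)"))  # A001 FETCH 1 (FLAGS)
# The second command goes out before the first one completes.
print(c.send("FETCH 2 (FLAGS)"))  # A002 FETCH 2 (FLAGS)
for line in ["* 1 FETCH (FLAGS (\\Seen))", "* 2 FETCH (FLAGS ())",
             "* 3 EXISTS",  # unsolicited: a new message has arrived
             "A002 OK FETCH completed", "A001 OK FETCH completed"]:
    c.receive(line)
print(c.completed["A001"])  # ('FETCH 1 (FLAGS)', 'OK FETCH completed')
print(c.untagged)
```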
However, setting aside this feature, the IMAP command syntax is simply baroque (read: "broke"). It mixes syntactic features from LISP (parenthesized lists), pre-MIME encodings (character counts), and so on. As a consequence, IMAP parsers are complex.
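To give a feel for what an IMAP parser must cope with, here is a deliberately simplified Python sketch. It handles atoms, NIL, quoted strings, parenthesized lists and {n} literals. It assumes ASCII text, whereas real literal counts are in octets. It also does not cover every production in the grammar: section specifiers that contain spaces, such as BODY[HEADER.FIELDS (SUBJECT)], would trip it up. The sample response data is illustrative.

```python
def parse_imap_value(text: str):
    """Parse one value: atom, NIL, quoted string, {n} literal, or (list)."""
    pos = 0

    def skip_spaces():
        nonlocal pos
        while pos < len(text) and text[pos] == " ":
            pos += 1

    def value():
        nonlocal pos
        skip_spaces()
        ch = text[pos]
        if ch == "(":                     # LISP-style parenthesized list
            pos += 1
            items = []
            skip_spaces()
            while text[pos] != ")":
                items.append(value())
                skip_spaces()
            pos += 1                      # consume ")"
            return items
        if ch == '"':                     # quoted string; backslash escapes the next char
            pos += 1
            chars = []
            while text[pos] != '"':
                if text[pos] == "\\":
                    pos += 1              # keep the escaped character itself
                chars.append(text[pos])
                pos += 1
            pos += 1                      # consume closing quote
            return "".join(chars)
        if ch == "{":                     # literal: {count} CRLF, then count characters
            close = text.index("}", pos)
            count = int(text[pos + 1:close])
            if text[close + 1:close + 3] != "\r\n":
                raise ValueError("literal count must be followed by CRLF")
            start = close + 3
            pos = start + count
            return text[start:pos]
        start = pos                       # atom, e.g. UID, 42, \Seen, BODY[]
        while pos < len(text) and text[pos] not in " ()":
            pos += 1
        atom = text[start:pos]
        return None if atom.upper() == "NIL" else atom

    return value()


data = ('(UID 42 FLAGS (\\Seen) INTERNALDATE "02-Sep-1998 10:00:00 -0700" '
        'RFC822.SIZE 5 BODY[] {5}\r\nhello)')
print(parse_imap_value(data))
```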
The exchange of greetings serves the same function as with POP: it allows the client to authenticate itself to the server. As with POP, different authentication mechanisms are possible. (In fact, the only time POP gets changed these days is when someone invents a new authentication scheme for IMAP and then decides to fit it onto POP!)
Mailbox selection allows the client to access whatever mailboxes are provisioned for the user. There is one hard-coded mailbox, termed INBOX, which is where incoming mail is received. Beyond that, the client is free to issue commands to create new mailboxes or to rename or delete existing ones. Mailboxes can be nested, so the client must ask the server which character it uses to separate levels of the hierarchy (a.k.a. the server's "hierarchy separator," such as the backslash in the Windows directory structure). As you might expect, IMAP provides a mechanism for the client to determine its list of available mailboxes. And, consistent with the IMAP philosophy of letting the server filter information before sending it to the client, the list can be a subset based on user-supplied search criteria.
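IMAP's LIST command takes a pattern with two wildcards. "*" matches any characters, including the hierarchy separator. "%" stops at the separator and so matches only one level. The following Python sketch shows that matching. It assumes the client already knows the separator; a LIST command with an empty mailbox name asks the server for it. The mailbox names and the "/" separator are hypothetical, and the sketch ignores the case-insensitivity that IMAP gives the name INBOX.

```python
import re


def list_pattern_to_regex(pattern: str, delimiter: str) -> "re.Pattern":
    """Translate a LIST pattern: "*" crosses levels, "%" stays within one level."""
    pieces = []
    for ch in pattern:
        if ch == "*":
            pieces.append(".*")
        elif ch == "%":
            pieces.append("[^" + re.escape(delimiter) + "]*")
        else:
            pieces.append(re.escape(ch))
    return re.compile("".join(pieces) + r"\Z")


def list_mailboxes(names, pattern, delimiter="/"):
    regex = list_pattern_to_regex(pattern, delimiter)
    return [name for name in names if regex.match(name)]


mailboxes = ["INBOX", "Projects", "Projects/IMAP", "Projects/IMAP/Drafts", "Archive/1998"]
print(list_mailboxes(mailboxes, "Projects/%"))  # ['Projects/IMAP']
print(list_mailboxes(mailboxes, "Projects/*"))  # ['Projects/IMAP', 'Projects/IMAP/Drafts']
print(list_mailboxes(mailboxes, "%"))           # ['INBOX', 'Projects']
```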
Message management revolves primarily around searching for messages that meet certain criteria, fetching them across the network and then updating the messages' properties back on the server (e.g., to set the \Deleted flag). Both searching and fetching allow for considerable flexibility. Rather than treating each message as a single unit, IMAP provides access to the headers, body parts and several IMAP-specific attributes, such as:
This final attribute shouldn't be confused with the SMTP envelope described in Chapter 3. Rather, in IMAP, an "envelope" is a collection of certain headers from the message, structured as a parenthesized, nested list. Presumably the purpose of this encoding is to minimize the cost of sending the headers over the network. In practice, it isn't particularly clear whether this results in a latency reduction. It is clear, however, that the client has to implement yet another parser in addition to the header parsing.
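For reference, IMAP4rev1 fixes the order of the envelope's fields: date, subject, from, sender, reply-to, to, cc, bcc, in-reply-to and message-id. Each address is itself a four-element list of personal name, source route, mailbox and host. The following Python sketch maps an envelope that has already been parsed into nested lists onto named fields. The envelope contents are hypothetical, and group addresses are not handled.

```python
ENVELOPE_FIELDS = ("date", "subject", "from", "sender", "reply_to",
                   "to", "cc", "bcc", "in_reply_to", "message_id")
ADDRESS_LISTS = ("from", "sender", "reply_to", "to", "cc", "bcc")


def format_address(address):
    """Render one address structure: (personal name, source route, mailbox, host)."""
    name, _route, mailbox, host = address
    email = f"{mailbox}@{host}"
    return f"{name} <{email}>" if name else email


def decode_envelope(envelope):
    """Name the ten envelope fields; NIL (None) address lists stay None."""
    if len(envelope) != len(ENVELOPE_FIELDS):
        raise ValueError("an IMAP envelope has exactly ten fields")
    decoded = dict(zip(ENVELOPE_FIELDS, envelope))
    for key in ADDRESS_LISTS:
        if decoded[key] is not None:
            decoded[key] = [format_address(a) for a in decoded[key]]
    return decoded


alice = ["Alice Example", None, "alice", "example.com"]
envelope = [
    "Wed, 02 Sep 1998 10:00:00 -0700", "Chapter 5 draft",
    [alice], [alice], [alice],             # from, sender, reply-to
    [[None, None, "bob", "example.net"]],  # to
    None, None, None,                      # cc, bcc, in-reply-to
    "<draft5@example.com>",                # message-id
]
decoded = decode_envelope(envelope)
print(decoded["from"])  # ['Alice Example <alice@example.com>']
print(decoded["to"])    # ['bob@example.net']
```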
Finally, session release closes the selected mailbox (if any) and terminates the session.
Virtual Private Networking
E-mail transactions can be made more secure by exchanging mail protocols such as POP and IMAP over a VPN. Depending on your perspective, there are either no stable standards for VPNs, or many different and sometimes contradictory standards. Part of the problem is that there isn't any generally accepted definition of the term virtual private network. As VPNs have become popular, vendors have twisted and stretched the acronym to fit product offerings.
Certainly the goal of all VPN products is to enable deployment of logical networks, independent of physical topology. That's the virtual part: allowing a geographically distributed group of hosts to interact and be managed as a single network, extending the end-user dynamics of a LAN or workgroup without regard to the true location of hosts within the VPN.
And private? Within the VPN market, this is generally interpreted as the provision of security characteristics such as privacy, integrity and trust among hosts participating in a virtual network. VPNs provide security by establishing a virtual connection called a "tunnel" over all or part of the network. For example, a roaming laptop may use a tunnel to obtain secure access to an enterprise mail server over the public Internet. Access controls ensure that tunnels are set up only between authorized endpoints, and data sent through the tunnel can be encrypted to protect it from prying eyes.
But every VPN product takes a different approach and supports a unique mixture of proprietary and (draft) standard protocols and algorithms. Until stable standards emerge, this makes interoperability between products difficult, if not impossible. Consequently, any corporation contemplating VPN solutions may want to purchase products from a single vendor or search carefully for complementary products that operate at different layers.
Layer 2 tunnels (named after the layer in the protocol stack with which they are associated) provide point-to-point data link connections between tunnel endpoints. Several competing approaches have been defined to support layer 2 tunnels, including Cisco's Layer 2 Forwarding (L2F) protocol and Microsoft's Point-to-Point Tunneling Protocol (PPTP). PPTP is included in Microsoft's Windows NT and Windows 95 operating systems, and it is catching on with vendors that sell access servers to ISPs. But PPTP is far from being a standard and, like anything else from Microsoft, is subject to change whenever the company releases a new operating system version.[12] Consequently, vendors in this market are working together to produce a common, interoperable Layer 2 Tunneling Protocol (L2TP) standard.
Layer 3 tunnels provide IP-based virtual connections. In this approach, normal IP packets are routed between tunnel endpoints that can lie anywhere, separated by any intervening network. Encapsulated within tunneled IP packets are headers that provide packet-level authentication, data integrity and/or confidentiality. A variety of draft international standards, too technical for us to be concerned with here, cover these headers. These standards, which go under the umbrella name of IPsec (IP security), are still under construction. This means that IPsec vendors' products may implement different versions of these standards and therefore may not work together.
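The encapsulation idea can be sketched in Python by wrapping an inner packet in an outer IPv4 header addressed from one tunnel endpoint to the other. This is plain IP-in-IP (protocol number 4) with no authentication or encryption; IPsec would insert its own headers (AH or ESP) to provide those. The endpoint addresses come from the documentation ranges. The inner packet is a placeholder byte string rather than a real IP packet.

```python
import struct


def ipv4_checksum(header: bytes) -> int:
    """Internet checksum: one's-complement of the one's-complement sum of 16-bit words."""
    total = 0
    for i in range(0, len(header), 2):
        total += (header[i] << 8) + header[i + 1]
    while total > 0xFFFF:
        total = (total & 0xFFFF) + (total >> 16)  # fold carries back in
    return ~total & 0xFFFF


def encapsulate(inner: bytes, tunnel_src: str, tunnel_dst: str, ttl: int = 64) -> bytes:
    """Prepend a 20-byte outer IPv4 header (protocol 4, IP-in-IP)."""
    def addr(dotted):
        return bytes(int(octet) for octet in dotted.split("."))

    header = struct.pack("!BBHHHBBH4s4s",
                         0x45, 0, 20 + len(inner),  # version 4 / IHL 5; TOS; total length
                         0, 0,                      # identification; flags + fragment offset
                         ttl, 4, 0,                 # TTL; protocol 4; checksum placeholder
                         addr(tunnel_src), addr(tunnel_dst))
    checksum = ipv4_checksum(header)
    return header[:10] + struct.pack("!H", checksum) + header[12:] + inner


def decapsulate(packet: bytes) -> bytes:
    """Verify the outer header at the far endpoint and return the inner packet."""
    ihl = (packet[0] & 0x0F) * 4
    if packet[9] != 4:
        raise ValueError("not an IP-in-IP packet")
    if ipv4_checksum(packet[:ihl]) != 0:  # a valid header sums to zero
        raise ValueError("outer header checksum mismatch")
    return packet[ihl:]


inner = b"placeholder inner packet"  # stands in for a real IP packet
packet = encapsulate(inner, "192.0.2.1", "198.51.100.7")
print(len(packet), packet[9])        # 44 4
print(decapsulate(packet) == inner)  # True
```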
Finally, there are layer 5 tunnels, supported by circuit-level proxies that sit between client and server applications. In this approach, a proxy server permits secure UDP- or TCP-based access to network resources. This means that traffic from clients outside the VPN must go through the proxy, which blocks unauthorized access to servers inside the VPN on a per-application and per-user basis. Many products in this market use the SOCKS v5 protocol for authenticated firewall traversal.
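As an illustration of the circuit-level approach, the following Python sketch builds the first messages a SOCKS v5 client sends. The greeting and CONNECT request follow RFC 1928, and username/password authentication follows RFC 1929. The host mail.example.com, port 110 (POP) and the credentials are hypothetical. The sketch only builds the byte strings; it neither opens sockets nor reads the proxy's replies.

```python
import struct


def socks5_greeting(methods=(0x02,)) -> bytes:
    """Version 5, number of methods, then method codes (0x02 = username/password)."""
    return bytes([0x05, len(methods), *methods])


def socks5_userpass(username: str, password: str) -> bytes:
    """RFC 1929 sub-negotiation: version 1, then length-prefixed user and password."""
    user, pw = username.encode("ascii"), password.encode("ascii")
    return bytes([0x01, len(user)]) + user + bytes([len(pw)]) + pw


def socks5_connect(host: str, port: int) -> bytes:
    """CONNECT request (CMD 0x01) naming the target by domain name (ATYP 0x03)."""
    name = host.encode("ascii")
    if not 0 < len(name) <= 255:
        raise ValueError("domain name must be 1-255 octets")
    return bytes([0x05, 0x01, 0x00, 0x03, len(name)]) + name + struct.pack("!H", port)


print(socks5_greeting().hex())  # 050102
print(socks5_connect("mail.example.com", 110).hex())
```

Once the proxy accepts the greeting and credentials and answers the CONNECT request with a success code, the client can speak POP over the same TCP connection as if it were connected directly.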
Of course, as you might expect, combining these approaches is rather popular and, to satisfy some security policies, absolutely necessary. For example, the TIS Gauntlet firewall supports IPsec tunnel protocols as well as application-level filters and content scanning. And VPN products can be layered on top of each other, complementing one vendor's authentication product with another vendor's encryption product, for example. We'll have more to say on specific VPN implementations and limitations in our next section.