Meta Directory: The Technology Differences
(Originally published in Messaging Magazine, March/April 1999)
By Ian Goldsmith and Tony Mulqueen, ISOCOR
In February of 1996 the Burton Group defined Meta directory as "the join of all the directories in the enterprise." Since then, the joining, or unification, of corporate information has become one of the hottest topics for many of the world's largest organizations. Before we can understand a Meta directory, we need to look at the background: the heterogeneous and often incompatible ways that we store the data we use to locate people and resources.
Computers have allowed us to store many kinds of structured data in many ways, in repositories we have come to know as databases. The data we store only becomes information when it is queried, and it only becomes knowledge when it is effectively queried in an intelligently selected context.
Broadly, there are two forms of query. We may wish to analyze the data, for example: "How many of our customers have been with us for longer than two years?" Alternatively, we may wish to locate somebody or something: "What is the e-mail address of the German marketing manager?" Databases come in two major flavors to meet these needs. Relational databases (RDBMS) are optimized for the first requirement, and hierarchical databases (nowadays more often referred to as directories) are optimized to meet the second. The ability to locate people and expertise quickly is a competitive differentiator, as can be seen by the attention given to topics such as Knowledge Management. Effective directory services are a key component of Knowledge Management.
What RDBMS systems have going for them is the speed with which they can conduct complex analysis and reporting, through techniques such as normalization and indexing. Directories, partly because of the hierarchical nature of the data, are better able to partition data logically across a number of locations. This is useful because directory information can be updated locally, but replicated widely throughout a distributed database. At the same time, directories can respond to the location-type queries for which they are designed, more effectively and rapidly than a relational database.
Today, almost every application that involves communication comes with some kind of directory. Human Resources and Enterprise Resource Planning (ERP) applications such as PeopleSoft and SAP are becoming critical in large organizations. Telephony applications running on PABX systems from vendors such as Siemens and Nortel maintain details that map user names against extensions and voice messaging data (passwords, stored messages and so on).
Messaging applications also have their directories, since no one can be expected to remember large numbers of e-mail addresses. The connectors that link groupware environments such as Notes and Exchange must keep track of at least two types of address (native and Internet) for each user.
On the network, the twin requirements of security and access are managed through directories, which define exactly who has access to what network resources. More directories, for example the Domain Name System (DNS), are used to provide name spaces and name resolution services that map easy-to-remember names for people and machines against the numeric data required by routing services. In short, directories are employed in a huge variety of ways by networked computers.
It is beginning to occur to organizations that the number and variety of directory types, and the difficulty of organizing and maintaining the data they store, is becoming a real problem. Moreover, the hodge-podge of directories that characterizes a large organization today looks, to even the most casual observer, like the polar opposite of knowledge management.
"Information Islands"
What are the problems caused by these "information islands"? They begin as soon as a new employee joins. Provisioning the new hire is slow and labor intensive, because all the facilities are accessed via different directories. This is even more painful if the new hire is a contractor, and valuable days wear away without location information and communication facilities properly in place. If the employee moves location, a fresh crop of problems arises. In addition, if an employee leaves, especially on bad terms, a crop of security issues arises during the time lag that is introduced by the directory problem.

An important new feature of the directory picture is the role of the Internet. In the past, directory was very much an internal problem, often focused around the need for a paper directory for employee use. It was accepted that providing a directory was worth the often-considerable cost. More painful than the cost was the reality that in any large organization a substantial percentage of the information would always be out of date. Internally, the main implication of Internet access is that it introduces yet more forms of directory information (Internet e-mail addresses and Web URLs) that need to be maintained and updated.
However, in today's networked world the directory focus is moving outwards. The Internet is also a powerful tool for sharing directory-style information with the public and/or with trading partners. The cornerstone of any successful Internet-based commercial application is an organization's ability to effectively select and manage the information from each "information island" that is required to communicate with employees, customers and trading partners. Managing the information is key to providing an effective "extranet."
Apart from making effective use of Internet technologies to bring the customer closer, there is a simple need to reduce costs. In a white paper devoted to analyzing the costs associated with multiple directory management and miscommunication, ISOCOR has arrived at a figure for potential savings of 4.5 million dollars annually for organizations with 10,000 employees. While the impact of a well-managed directory infrastructure may vary from one enterprise to another, there is no doubt that return on investment for this technology is easy to establish.

Directories are inseparable from the issue of security. Sensitive location details must be kept secret, so the directory itself must come with security mechanisms. However, the directory is also a security provider: it can be used to store the "certificate" details which people and applications can use to encrypt confidential data and to verify digital signatures.
Digital signatures are particularly important when we move to doing any form of commerce over open networks. Digital signatures provide essential commercial services, such as authentication (we can be sure who originated the transaction), non-alteration (we can be sure nobody changed the information connected to the transaction), and non-repudiation (the originator cannot deny that they originated the transaction). With these values in place, the transaction can go ahead with the normal confidence associated with paper documents, and far more swiftly and efficiently, enabling massive savings in areas such as Just In Time manufacturing and reduced inventory.
So far, so good. The problem for organizations is that historically, no one directory has been capable of delivering all the functionality described above. Directories are understandably designed and optimized for the primary environment they serve, whether that is an ERP environment, a network, or a messaging system. Vendors that offer to integrate everything often turn out to be offering a Yet Another Directory (YETA) solution. Having spoken eloquently of the perils of managing too many directories, they ask the customer to install just one more. Faced with the prospect of a YETA directory to manage, plus the logistics of migrating a large body of material from existing sources, the customer is inclined to reject the concept of a Grand Unified Corporate Directory. The fact is that most of the information required to enter the electronic marketplace already exists within most organizations. Using this data rather than starting from scratch can save significant political, administrative and financial costs.
The good news is that many of the standards for integrating directories into a Meta directory, without installing YETA, already exist. LDAP is widely accepted as the standard means of accessing directory data; it makes it easy for applications to look up directories in a consistent way, and it encourages developers to build directory support into their products. X.500 is a well-established model for directory-to-directory interactions, such as chained referrals and replication. X.500 is likely to continue to serve this purpose until LDAP has fully evolved its server-to-server service definitions. There are standard mechanisms to put and get data from proprietary directories inside products such as SAP, Siemens HiCom, SQL databases and so on.
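To make the idea of a consistent lookup concrete, here is a minimal sketch of how the earlier question ("What is the e-mail address of the German marketing manager?") might be phrased as an LDAP search filter. The helper names are hypothetical, and the choice of the `title` and `c` attributes for the person's entry is an assumption about the schema; only the filter syntax and the escaping rules come from the LDAP standard. The sketch builds the filter string and does not contact any server.

```python
def escape_filter_value(value):
    """Escape the characters that are special inside an LDAP search filter."""
    replacements = {
        "\\": r"\5c",
        "*": r"\2a",
        "(": r"\28",
        ")": r"\29",
        "\0": r"\00",
    }
    return "".join(replacements.get(ch, ch) for ch in value)


def and_filter(**conditions):
    """Build an AND filter from attribute=value pairs (hypothetical helper)."""
    parts = "".join(
        f"({attr}={escape_filter_value(val)})" for attr, val in conditions.items()
    )
    return f"(&{parts})"


# Assumed schema: the job title is held in 'title' and the country in 'c'.
query = and_filter(title="Marketing Manager", c="DE")
print(query)
```

An application would hand a filter like this to whatever LDAP client library it uses and ask for the `mail` attribute of the matching entries; the same filter works against any directory that speaks LDAP.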
So far, we have looked at a number of unwelcome alternatives: an incompatible conglomeration of directories is not acceptable in the long run, and neither is the notion of starting from scratch with YETA. The more workable solution is a Meta directory approach: a central directory (ideally, one that is already deployed) and the necessary middleware to effect a "join" of directory-style information located in other repositories. The join is the creation of a single directory entry that contains or references attributes from multiple related entries. By providing it, we solve the issue of disparate locations, while sidestepping the need to provide an enterprise directory from scratch.
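The join can be illustrated with a small sketch. It assumes, purely for illustration, that every source system carries a shared `employee_id` key; the source names, attribute names and values below are invented, and a real deployment would need matching rules for sources that lack such a key.

```python
# Hypothetical contents of three "information islands".
hr_system = [{"employee_id": "1001", "cn": "Anna Schmidt", "title": "Marketing Manager"}]
pabx_system = [{"employee_id": "1001", "telephoneNumber": "+49 30 555 0100"}]
mail_system = [{"employee_id": "1001", "mail": "anna.schmidt@example.com"}]


def join_entries(sources, key="employee_id"):
    """Merge entries that share the same key into one meta-directory entry."""
    joined = {}
    for entries in sources:
        for entry in entries:
            if key not in entry:
                continue  # an entry without the shared key cannot be joined
            target = joined.setdefault(entry[key], {})
            for attribute, value in entry.items():
                # The first source to supply an attribute wins (an assumed policy).
                target.setdefault(attribute, value)
    return joined


meta_directory = join_entries([hr_system, pabx_system, mail_system])
print(meta_directory["1001"])
```

The result is one entry per person that carries the title from the HR system, the extension from the PABX and the address from the mail system, which is the single view the join is meant to provide.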
What's Needed to Deliver a Meta Directory?
The requirements of the "directory in the middle" are straightforward. It must be able to communicate via LDAP, since this is the lingua franca of directory interaction. It must be capable of dealing with extensible schema. Schemas are the definitions of the structure and form of information held in a directory or database. Meta directory requires flexible schemas, yet many directories come with hard-coded data definitions. Finally, the "directory in the middle" must be capable of being polled for change, so that the Meta directory management module can provide an acceptable level of responsiveness. Polling for change means that only the elements that have been updated need to be reflected in the Meta directory. Three directory products that meet these requirements are Netscape Directory Server, Microsoft Active Directory, and ISOCOR Global Directory Server.
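Polling for change can be sketched as follows. The example assumes a source that numbers each change it records; the `SourceDirectory` class, its change log and the `poll` function are all hypothetical stand-ins, not the interface of any of the products named above.

```python
class SourceDirectory:
    """Toy source directory that records a numbered change for every update."""

    def __init__(self):
        self.entries = {}
        self.changelog = []  # list of (change_number, dn)

    def update(self, dn, attributes):
        self.entries[dn] = dict(attributes)
        self.changelog.append((len(self.changelog) + 1, dn))

    def changes_since(self, last_seen):
        return [(number, dn) for number, dn in self.changelog if number > last_seen]


def poll(source, meta, last_seen):
    """Copy only the entries changed after last_seen; return the new high-water mark."""
    for number, dn in source.changes_since(last_seen):
        meta[dn] = dict(source.entries[dn])
        last_seen = number
    return last_seen


source = SourceDirectory()
source.update("cn=Anna Schmidt", {"mail": "anna.schmidt@example.com"})
source.update("cn=Ben Jones", {"mail": "ben.jones@example.com"})

meta = {}
mark = poll(source, meta, 0)           # first poll copies both entries

source.update("cn=Anna Schmidt", {"mail": "a.schmidt@example.com"})
pending = source.changes_since(mark)   # only the one changed entry is pending
mark = poll(source, meta, mark)        # second poll copies just that entry
```

Because the poller remembers the last change number it processed, each cycle touches only the entries that changed since the previous one, rather than re-reading the whole source.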
The GartnerGroup has identified three different basic technologies that can be used to provide Meta directory solutions, each of which has its merits and disadvantages.

Replication Based (Attribute Copying)
The most common Meta directory solutions use a replication-based model: attributes are copied from each of the proprietary servers to build an entry in the central directory that contains all of them. This method has been implemented many times over the last several years and has many advantages:
Some of the issues that this type of approach can introduce are:
Virtual Directory
Virtual Directory refers to the concept of eliminating the central repository and building directory clients that are capable of accessing information from many different proprietary and standard directory sources.
This technology appeals because it removes the need for YETA in the enterprise. It does, however, mean that the client requires a great deal of intelligence, and it makes it very difficult to perform the join of the data into a single view for each logical entity in the organization.
Virtual Directory is facilitated by tools like the Active Directory Service Interfaces (ADSI) from Microsoft. ADSI does not offer a full virtual directory solution. It does, however, provide a framework that could be used to give a single client access to multiple different directory and database systems.
Virtual Directory is unlikely to be successful in the short term, because very few organizations have data with a single unique key that can be used to identify like records available in all their different data sources.
Directory Information Broker
A directory information broker is a directory server that does not keep copies of attributes locally, but retrieves them from their respective proprietary sources, using indexes or pointers, when they are required, and returns the values to the requester via standard directory protocols.
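The broker idea can be sketched in a few lines. Everything here is an assumption made for illustration: the `DirectoryBroker` class, the in-memory dictionaries standing in for proprietary sources, and the use of plain functions as "pointers". A real broker would sit behind a standard directory protocol and talk to the sources over their native interfaces.

```python
# In-memory stand-ins for two proprietary sources (invented data).
hr_records = {"E1001": {"title": "Marketing Manager"}}
pabx_records = {"4711": {"extension": "+49 30 555 0100"}}


class DirectoryBroker:
    """Holds pointers to attribute values instead of copies of them."""

    def __init__(self):
        self._pointers = {}  # dn -> {attribute name: function that fetches the value}

    def register(self, dn, attribute, fetch):
        self._pointers.setdefault(dn, {})[attribute] = fetch

    def lookup(self, dn, attributes):
        pointers = self._pointers.get(dn, {})
        # Each value is fetched from its source at the moment of the request.
        return {name: pointers[name]() for name in attributes if name in pointers}


broker = DirectoryBroker()
dn = "cn=Anna Schmidt,o=Example"
broker.register(dn, "title", lambda: hr_records["E1001"]["title"])
broker.register(dn, "telephoneNumber", lambda: pabx_records["4711"]["extension"])

before = broker.lookup(dn, ["title", "telephoneNumber"])
pabx_records["4711"]["extension"] = "+49 30 555 0199"  # change made at the source
after = broker.lookup(dn, ["telephoneNumber"])          # seen at once, nothing to synchronize
```

The sketch also hints at the trade-off: every lookup costs a round trip to each source that holds a requested attribute.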
The advantages of this approach are:
This mechanism does have some clear disadvantages:


It is clear from the above descriptions that the ideal Meta directory will combine the best attributes of the replication-based and information broker models. In the short term, the most practical model is a replication-based approach that offers a high-performance enterprise directory solution. There are several other key technology issues to consider. Probably the most important of these is the propagation delay that can be inherent in synchronization solutions. It is essential that a Meta directory deployed in a large-scale enterprise environment supports an incremental update model for change propagation. Some of the early solutions offer only a total update paradigm, which limits the Meta directory to daily or even weekly updates. This means that many of the benefits of the Meta directory are lost.
Another important issue is the scalability of the directory server itself. The Meta directory will become a core infrastructure application and must therefore be able to scale to meet the needs of a rapidly growing information system. The directory server vendors are making great strides in performance and scalability and it is essential that the Meta directory is able to take advantage of these advances.
LDAP vs. X.500
In the last couple of years LDAP has been the hottest thing in the directory industry. LDAP is definitely the directory access protocol of choice. However, it is important to recognize that LDAP does not present a total directory or Meta directory solution. As the standard evolves it is growing rapidly (to the point where the LDAP standards and drafts now span more pages than the X.500 specifications), and it is beginning to add elements, such as replication and chaining, that make it a more complete directory system.