Open Database Access and Interoperability

Antonio Gualtieri, TOG IOP TG Champion
GUIDE International
Kemper National Insurance Companies

White Paper 10/23/96


Table Of Contents

1. Introduction / The need

2. Summary

3. Task Force Members and Supporters

4. Discussion

5. Recommendations / Requirements


Introduction / The need

Organizations increasingly require access to data residing in multiple sources in multiple vendor DBMS throughout the enterprise. However, the ongoing diversity of SQL dialects and the lack of a function rich standardized database protocol for interoperability has made this difficult and costly. This has also inhibited downsizing and the move to Client/Server.

More and more companies are buying applications rather than database management systems. That is, database selection is more and more being driven by application availability and suitability. In many ways this is really good and what we have been striving for. However, it very quickly leads companies to a problem: they want and need to economically and easily integrate those databases in order to effectively use their IT resources. With today's standards and products, it is not as easy to do this as customers would like. Nor is it as economical as it should be when you factor in all the support and problem management costs. Users currently can not achieve the benefits of the open system's promise of vendor independence when it comes to database management systems.

The end users also want to be able to integrate their desktop tools and other applications in a consistent manner with whatever databases (or multiple databases concurrently) are in the enterprise. This also isn't easy due to SQL dialects, the use of gateways, etc.

For these reasons, The Open Group approved the formation of the Open Database Access and Interoperability Task Group in the Interoperability Program Group to evaluate existing business requirements that address these issues and to formulate a business case in support of them. The object of this white paper is to demonstrate the business case and the requirements and the need for Open Database Access and Database Interoperability.

Summary

The Task Group reviewed the issues and problems with open database access and interoperability. We then grouped those items into categories as follows:

6. APIs

7. Interoperability between applications and database management systems and between the database management systems of different vendors

8. Directory of Distributed Databases

9. Committing data simultaneously to more than one database - 2-Phase Commit

10. Transparent Replication

11. Consistent Distributed Security

12. Management / Ability to manage the environment

13. Manageable Cost Effective Environment

14. Problems with Internet access to databases (Java Bindings)

15. Problems with Transaction Manager interrelationships

16. Database transaction via ORBs

In order to help us analyze these issues and understand what interface areas need to be managed we developed a graphic that shows the various interface points and the services that are involved.

Figure 1

A larger version of the graphic is shown in the Section V. Recommendations / Requirements along with a description. The graphic does NOT show flow or connections but rather relationships between applications, Client, Server and various services in order to make it easier for us to position our discussion of each issue.

We determined that the relationships fell into two categories: Infrastructure / Connectivity and Service Components. Infrastructure / Connectivity items include SQL API, Database Protocol and Network Protocol. The Services include such things as Directory, Security, Transaction Management, Replication and Management.

As a result, we determined that the requirements for Open Database Access and Interoperability are:

1. Work on the consolidation and branding of the merged ODBC and XCLI should continue.

2. A standardized database protocol for multi-vendor interoperability should be specified.

3. Service specifications and interfaces that improve interoperability and facilitate vendor independence are required.

4. We should have an X/Open DBMS brand that is based on testing.

5. We need a standardized method of defining data replication requirements across heterogeneous servers.

6. The Security and System Management groups need to handle the security and system management issues identified above.

Please read Section V. Recommendations / Requirements to understand these requirements and sub-points included in them. Any focus on a subset of the problem only (e.g. SQL syntax or database access protocol) doesn't eliminate the need for multi-vendor solutions, will not address other semantic differences and will not solve the open database access and interoperability problem. However, unless we solve the Infrastructure / Connectivity issues the Service Components issues can't be solved. They should be solved in parallel and soon.

It should go without saying that whatever is done needs to work with existing standards like X.500, XA, DCE, etc.

From the users view point, these problems need to be fixed and fixed soon; until they are, we can not achieve open systems - vendor independence - in the database arena.

Task Force Members and Supporters

The members of the Open Database Access Task Force are:


Anwar Choudury			UK MoD  
James de Raeve           	The Open Group  
Alan Doniger                    POSC (Europe) Ltd 
Mike Elson                      HP 
Mark Farlin                     StarQuest Software, Inc.
Vic Goddard                     The Open Group
Michael Greenberg               NEC
Tony Gualtieri          	GUIDE International  
Reilly Hayes                    Merrill Lynch 
Peter Holdermann                Software AG 
George Hutfilz          	IBM  
Petr Janecek                    The Open Group  
Martin Joyce                    Amdahl  
Anthony Kay                     Amdahl  
Jim Keithley                    GUIDE International  
Dave King                       Barclays Bank  
Roy McKean                      The Open Group  
Mongi Miled                     Ministry of Finance  
John Minter                     Inland Revenue 
Carol Rajunas                   OSF Staff (US East)  
Paul Rampel                     StarQuest Software, Inc.  
Vilhelm Rosenqvist              NCR  
John Spencer                    The Open Group  
Walter Stahlecker               HP  
Rob Thompson                    Boeing  
Dan Wu                          US DOD  
Steve Zalewski                  Oracle Corporation  
Mel Zimowski                    IBM  




Companies and organizations that have reviewed this paper and are in agreement with its recommendations are:



COMPANY                                                 REPRESENTATIVE 
Kemper National Insurance Companies             Antonio Gualtieri  
UK Ministry of Defense                          Anwar Choudury  





(NOTE: after we get a good draft, I will send it out to many customers for comment and approval, this is now a place hold for the list that reply)

Discussion

Many customer environments include a significant amount of relational data that is managed on various hardware platforms by relational database management systems supplied by more than one vendor. To complicate matters even more, some of the needed information is still managed by older database management systems and using legacy data formats. The customers requirement is to access all of this data as painlessly as possible and within the simplest and most cost effective environment as possible.

Unfortunately, with today's environment, users are running into several obstacles that keep them from accomplish this goal:

17. SQL APIs provided by existing products are inconsistent and lack important features

The most commonly used SQL APIs for access to relational data are SQL and ODBC. However, current product implementations of these SQL APIs do not fully address customer interoperability requirements.

Although the ISO SQL92 standard provides a good SQL foundation, it lacks function that customers require. Examples are stored procedures and user-defined types. The emerging SQL3 standard promises to fill these gaps. In the meantime, various vendors are providing these capabilities through proprietary extensions to their SQL APIs. Further, not all vendors support all the features in SQL87 or SQL92 when they announce support for these standard. As a result, it is very difficult for users to write an SQL application program that is not dependent on the SQL syntax or features of a particular vendor implementation. This makes it difficult for users to reuse existing applications when the underlying DBMS changes or access to the DBMSs of additional vendors is required.

To make matters worse, applications that contain SQL must be recompiled for each hardware and operating system environment in which they will be used. ODBC applications, on the other hand, use a standard collection of library functions to execute SQL statements that eliminates the need for environment specific compiling. ODBC, however, currently lacks support for many of the advanced features of the SQL92 and emerging SQL3 standards such as large objects and abstract data types. The Open Group also has XCLI that accomplishes some of the same objective as ODBC. However, XCLI and ODBC are not consistent.

The syntax differences are relatively simple and can be accommodated. The biggest problem is that the semantics are different between vendors, and (worse) that the supported sets of functionality are different, so that no two vendor products all support the same features.

In addition, vendor's implementation of SQL are different and they evolve quickly to meet new functionality requirements. It will be difficult to allow their SQL to evolve and yet ensure interoperability.

In summary, the issues are:

18. Interoperability between applications and database management systems and between the database management systems of different vendors is difficult to achieve.

A number of mechanisms are required in order to achieve interoperability between the database management systems of different vendors. These include:

In a multi-vendor environment, remote data access is never seamless. Unfortunately, the ability to provide transparent data access to heterogeneous databases is exactly what large IS organizations desire most. Transparency means that in the user's view (the external schema of the three-schema architecture), the physical location of the data (the internal schema) is completely hidden; the enterprise database appears homogeneous.

For most organizations, the inability to provide transparency has led to a different view of distributed databases. Rather than provide users access to (or at least transparent views of) the data, organizations are now looking at how to replicate data: how to deliver frequently refreshed and updated copies of the data to the users' local servers or workstations. For users, replication means data at your service, but for IS, it means finding a solution to the difficult problem of reconciling different versions of the data.

In summary, the issues are:

19. Directory of Distributed Databases are not consistent

Directories are one of a corporation's key assets because they provide a common naming/addressing scheme for data, individuals and resources. Most organizations, however, have multiple directories: telephone lists, E-Mail directories, human resource lists, database and LAN directories. Much money is spent on maintaining these lists and keeping then in synchronization, and duplication runs wild. The lack of a global naming service has hampered the free flow of information. Now, with data warehouse and data marts directory services are becoming even more critical. Using multi-vendor DBMS can also lead to conflicting definitions, improper replication, and incorrect answers. Furthermore, good optimization can not be achieved unless the catalogs can be distributed and shared between DBMS from multiple vendors.

In summary, the issues are:

20. Committing data simultaneously to more than one database - 2-Phase Commit

Customer applications often access multiple database servers within the same transaction. Customers require that the results of such a transaction are either all committed or are all rolled back, that the transaction transforms a shared resource from one valid state to another valid state, that any change to a shared resource by one transaction is not visible outside that transaction until the transaction commits, and that the changes that result from the commitment of the transaction survive system or media failures. Within SQL environments, the coordinated commit and recovery services that provide these properties generally employ 2-phase commit protocols.

In some environments, the client that initiates a transaction on behalf of a client application may not have the capability or the resources to support recovery in the event of failures during 2-phase commit processing. For this reason, it should be possible to assign the responsibility for recovery to a participating site other than the initiating client.

The approach used should be consistent with the X/Open Distributed Transaction Processing model, so that XA compliant Transaction Managers can work with multiple XA Resource Managers regardless of vendor.

In summary, the issues are:

21. Transparent Replication

With distributed database processing, an organization's database can be strategically divided among multiple physical databases located on distinct database servers. One method of synchronizing these databases can be through the use of replication. However, not all replication products allow the user to replicate data from a database on one platform under the management of one vendor's DBMS to a database on another platform under the management of another vendor's DBMS. This need is becoming mandatory.

Organizations frequently maintain multiple copies of the same data throughout their environment to improve response time, minimize communication costs, or provide uninterrupted operation. Different data elements typically have different requirements for acceptable propagation delays, transaction semantics, updatability and recovery. Simple periodic snapshots of full databases are no longer an acceptable solution. Managing and synchronizing these redundant data copies to ensure appropriate access to accurate, consistent, timely instances regardless of network and server failures has in the past been left as an exercise for the user. Transition to a distributed, client/server architecture involving open systems from many vendors requires improved methods for managing replicated data throughout the enterprise.

Existing X/Open specifications do not allow system administrators to specify their data replication requirements for automatic management by the system. From experience with common replication scenarios, it appears that there are a small number of classes of processing characteristics typically needed to manage replicated data. A common model of replicated data will help data administrators design systems with a minimum of specialized application coding, using standardized tools and procedures (perhaps automated within the database manager) for the most common cases.

Keep in mind that data formats are the LEAST of the problems in this area. Replication semantics are much more complex than data transport.

In summary, the issues are:

22. Consistent Distributed Security

First, when using multiple DBMS, it is not uncommon to be faced with then need to Log-On to each one you use separately rather than simply logging onto the system once.

Once you are Logged-On, the application of security rules vary by vendor, so that using multiple DBMS's can lead to inconsistent security between databases. A user may have access to a data element in one DB but not in another.

In summary, the issues are:

23. Management / Ability to manage the environment

It was recognized that this item is in a different category than items 1-6 above, being less concerned with Application Interoperability, but of no less importance.

Although almost all the DBMS provide the features need to manage the DBMS environment. one thing that becomes obvious very soon after trying to support DBMS from multiple vendors is that the user interface or methodology for those feature vary considerably. The users need a consistent way to manage their databases. Have a single way to invoke or schedule backups, security rules, get and process alerts, etc. In fact, it would be even better if all the database in the enterprise could be managed from a single point.

In summary, the issues are:

24. Manageable Cost Effective Environment

In the world of proprietary database protocols, each DBMS supports its own set of proprietary protocols for communicating with clients. In this type of environment, each client workstation or PC that desires access to data at a DBMS typically has a driver that supports one of the proprietary protocols for that DBMS. In the worst case, the client workstation or PC has a driver for each DBMS available within the environment. This results in client environments that are complex, difficult to manage, and difficult to use. Every time an additional layer of software or hardware is added, problem determination become harder. This results in server environments that are complex, difficult to manage, costly, and do not perform as well as they could..

If all database servers supported the same services and database protocol, then only a single driver need be installed at a client workstation or PC. Eliminating the myriad of drivers and limiting the number of gateways that are currently required would reduce the complexity of user environments, making it easier for customers to configure and manage client environments. This results in server environments that are less complex, easier to manage, and more affordable.

RDA would be a solution to this issue, however, RDA does not have enough functionality to make it useful. For example, 2-phase commit is not supported, TCP/IP is not supported, the use of OSI standards for the description of data may make it impossible to handle the special data types possible with SQL3, stored procedures are not supported, etc.

Users would prefer to take interoperability for granted rather than be forced to consciously consider interoperability as a key software product purchase criterion. A standardized open database protocol for heterogeneous database interoperability supports this objective by making vendor client and server implementations more interchangeable, allowing users to choose between product offerings based on end-user capabilities rather than on infrastructure.

In summary, the issues are:

25. Internet access to databases and with Java Bindings

If you are going to run SQL over the Internet you will need a lot of patience!!!

(If anyone is aware of any problems in this area related to Open DB Access or Multi-Vendor RDBMS Interoperability, please send me a note. We will leave this category here until the final paper is published as a reminder.)

26. Are there problems with Transaction Manager interrelationships?

(If anyone is aware of any problems in this area related to Open DB Access or Multi-Vendor RDBMS Interoperability, please send me a note. We will leave this category here until the final paper is published as a reminder.)

27. Are there issues with database transaction via ORBs?

A challenge for the industry is that it is not clear that existing database access methodology currently available in the marketplace will be the right vehicle for communication with objects as defined in SQL3, or that a standard in this area will evolve fast enough to accommodate today's rapidly changing database products.

(If anyone is aware of any problems in this area related to Open DB Access or Multi-Vendor RDBMS Interoperability, please send me a note. We will leave this category here until the final paper is published as a reminder.)

Recommendations / Requirements

The impact of the above is that users must spend more time than necessary in order to learn to use a new DBMS and to manage a mutli-vendor DBMS environment. But more importantly, trouble shooting is more dificult and the users become locked into a vendor. As a result users do not realize the benefits of Open Systems. In addition, today's tool and query vendors need to write modules that know how to convert SQL and protocols requests from one vendor to another. This simply increases the cost of products, slows the development of new version and versions of the same product for new platforms.

In order to help us analyze the above issues and understand what interface areas need to be managed we developed the following diagram that shows the various interface points and the services that are involved.

FIGURE 2

The above diagram does NOT show flow or connections but rather relationships between applications, Client, Server and various services in order to make it easier to position the discussion of each issue.

An application program accesses one or more databases using an SQL API; from the applications point of view, it thinks it is accessing the database directly. However, each SQL request is processed by the Client component. The Client parses the SQL request and passes the request to the Server component to do the physical access of the data. In some database systems, the Client and the Server are a logical distinction, for others they can be physical module. The Client communicates with the Server using a database protocol. The Client and Server can be in the same machine or in different machines.

For all this to work, the SQL must be transmitted to the Server using an agreed upon Database Protocol which in turn must be transmitted over the communication wires using an agreed upon Network Protocol. We must have agreement on each of these - SQL API, Database Protocol and Network Protocol to have a functional connection and interoperability. These items are referred to as the Infrastructure / Connectivity interfaces.

The "services" are invoked locally by the Client and Server components as needed during the processing of the SQL.

In a multi-vendor environment, when a DBMS "A" needs to access data that is maintained by another DBMS "B", a Client component at DBMS "A" would interact with a Server component at DBMS "B". In our diagram, the DBMS "A" would in effect take the place of the Application Program.

The following are our requirements for improving the open database access and interoperability problem.

28. Work on the consolidation and branding of the merged ODBC and XCLI should continue

Providing a consistent API for all open databases in and of itself does not solve the interoperability problems, however, from a user perspective, we can't have interoperability if we are required to use different APIs to access data managed by different DBMS. We should also be able to submit a query that spans DBs using one SQL request. That request should perform the same function on each DBMS. This requirement addresses the Application Program to DB interface point in our diagram.
29. A standardized database protocol for multi-vendor interoperability should be specified
We need a way for databases to interoperate with each other and to allow a transaction to span databases managed by DBMS from different vendors without the use of a gateway. RDA is a good start but it does not provide all the needed functionality required by the users especially in a multi-tiered environment. The standard database protocol should support:

This requirement addresses the Client to Server interface point - the Database Protocol - in our diagram.

30. Service specifications and interfaces that improved interoperability and facilitate vendor independence are required

The client and server components rely on the availability of directory, security, transaction management, replication, and management services within their respective environments. In order to achieve complete interoperability in a multi-vendor environment, these services must be consistent and must be universally available across these client and server environments. For example, the same directory and security services should be available at each client and server node within the network. Further, standardized service APIs should be specified that allow for the easy integration of these service components within the environments of the client and server components. That is, if all clients and servers used these standard Service APIs for each of the services along with the Infrastructure/Connectivity interfaces, then interoperability between an application and one or more databases or between a database and one or more databases would be possible.

Client and server support for the service APIs would allow the services to be "componentized", which will enhance vendor independence. For example, this would allow a customer that already had DCE security to integrate their DCE security server within this client and server environment even though the client or server may also support (or come with) other types of authentication.

There are two parts to this requirement:

This requirement fosters complete interoperability through consistent and universally available services and addresses the interaction of the services components with the client and server components in our diagram.

31. We should have an X/Open DBMS brand

Users need some way of knowing that a DBMS follows all The Open Group specifications. The best way to do this is by providing an X/Open brand for DBMS. The brand would test the APIs, protocols, etc. This is even more important now that the NIST has decided to stop it's testing.

32. We need a standardised method of defining their data replication requirements across heterogeneous servers.

Users need a standardised method of defining their data replication requirements across heterogeneous servers (including non-relational databases) through a set of replication services that are invisible to the application users. Application programs themselves should not need to be aware of the existing replication characteristics of the data (replication transparency). Programs should be presented with a uniform data view, whether the data is local or remote, distributed across many systems, fully or partially replicated, or managed by products from different vendors (distribution transparency).

In addition, replication should have the ability to replicate all or part of the database and in either direction. During that replication, we need to be able to change the order or formats of fields in addition to coding or decoding the information in the field.

33.The Security and System Management groups need to handle the security and system management issues identified above

The existing programs like Security and Systems Management should review their scope and make sure that the above items that are specific to their area are covered in their program. These items should not be covered elsewhere.

For example, Security needs to provide requirements that address such things as:

System Management needs to provide requirements that address such things as:

Summary

We determined that the interoperability points that need to be managed fell into two categories: Infrastructure / Connectivity and Service Components. Infrastructure / Connectivity items include SQL API, Database Protocol and Network Protocol. The Services include such things as Directory, Security, Transaction Management, Replication Management, etc.

Any focus on a subset of the problem (e.g. SQL syntax or database access protocol):

However, unless we solve the Infrastructure / Connectivity issues the Service Components issues can't be solved. Since both are needed as soon as possible they should be solved in parallel.

As a final comment it should be noted that other approaches (such as using CORBA ORBs to "encapsulate" or "wrap" existing legacy applications) are likely to also be required. Existing applications wouldn't magically become protocol-savvy. In addition, and maybe it should go without saying, whatever is done in any of these areas needs to work with existing standards like X.500, XA, DCE, etc.



  Antonio Gualtieri         
  VMail: (847) 320-3126                  
  External EMail: IBMMAIL(US2B72BN)      
  Internet ID: US2B72BN@IBMMAIL.COM      
  FAX:  (847) 320-4628 or (4464)