3. Task Force Members and Supporters
5. Recommendations /
Requirements
Organizations increasingly require
access to data residing in multiple sources in multiple vendor
DBMS throughout the enterprise. However, the ongoing diversity of
SQL dialects and the lack of a function rich standardized
database protocol for interoperability has made this difficult
and costly. This has also inhibited downsizing and the move to
Client/Server.
More and more companies are buying applications rather than database management systems. That is, database selection is more and more being driven by application availability and suitability. In many ways this is really good and what we have been striving for. However, it very quickly leads companies to a problem: they want and need to economically and easily integrate those databases in order to effectively use their IT resources. With today's standards and products, it is not as easy to do this as customers would like. Nor is it as economical as it should be when you factor in all the support and problem management costs. Users currently can not achieve the benefits of the open system's promise of vendor independence when it comes to database management systems.
The end users also want to be able to integrate their desktop tools and other applications in a consistent manner with whatever databases (or multiple databases concurrently) are in the enterprise. This also isn't easy due to SQL dialects, the use of gateways, etc.
For these reasons, The Open Group
approved the formation of the Open Database Access and
Interoperability Task Group in the Interoperability Program Group
to evaluate existing business requirements that address these
issues and to formulate a business case in support of them. The
object of this white paper is to demonstrate the business case
and the requirements and the need for Open Database Access and
Database Interoperability.
The Task Group reviewed the issues and problems with open database access and interoperability. We then grouped those items into categories as follows:
6. APIs
7. Interoperability between applications and database management systems and between the database management systems of different vendors
8. Directory of Distributed Databases
9. Committing data simultaneously to more than one database - 2-Phase Commit
10. Transparent Replication
11. Consistent Distributed Security
12. Management / Ability to manage the environment
13. Manageable Cost Effective Environment
14. Problems with Internet access to databases (Java Bindings)
15. Problems with Transaction Manager interrelationships
16. Database transaction via ORBs
In order to help us analyze these issues and understand what interface areas need to be managed we developed a graphic that shows the various interface points and the services that are involved.
Figure 1
A larger version of the graphic is shown in the Section V. Recommendations / Requirements along with a description. The graphic does NOT show flow or connections but rather relationships between applications, Client, Server and various services in order to make it easier for us to position our discussion of each issue.
We determined that the relationships fell into two categories: Infrastructure / Connectivity and Service Components. Infrastructure / Connectivity items include SQL API, Database Protocol and Network Protocol. The Services include such things as Directory, Security, Transaction Management, Replication and Management.
As a result, we determined that the requirements for Open Database Access and Interoperability are:
1. Work on the consolidation and branding of the merged ODBC and XCLI should continue.
2. A standardized database protocol for multi-vendor interoperability should be specified.
3. Service specifications and interfaces that improve interoperability and facilitate vendor independence are required.
4. We should have an X/Open DBMS brand that is based on testing.
5. We need a standardized method of defining data replication requirements across heterogeneous servers.
6. The Security and System Management groups need to handle the security and system management issues identified above.
Please read Section V. Recommendations / Requirements to understand these requirements and sub-points included in them. Any focus on a subset of the problem only (e.g. SQL syntax or database access protocol) doesn't eliminate the need for multi-vendor solutions, will not address other semantic differences and will not solve the open database access and interoperability problem. However, unless we solve the Infrastructure / Connectivity issues the Service Components issues can't be solved. They should be solved in parallel and soon.
It should go without saying that whatever is done needs to work with existing standards like X.500, XA, DCE, etc.
From the users view point, these
problems need to be fixed and fixed soon; until they are, we can
not achieve open systems - vendor independence - in the database
arena.
The members of the Open Database Access Task Force are:
Anwar Choudury UK MoD James de Raeve The Open Group Alan Doniger POSC (Europe) Ltd Mike Elson HP Mark Farlin StarQuest Software, Inc. Vic Goddard The Open Group Michael Greenberg NEC Tony Gualtieri GUIDE International Reilly Hayes Merrill Lynch Peter Holdermann Software AG George Hutfilz IBM Petr Janecek The Open Group Martin Joyce Amdahl Anthony Kay Amdahl Jim Keithley GUIDE International Dave King Barclays Bank Roy McKean The Open Group Mongi Miled Ministry of Finance John Minter Inland Revenue Carol Rajunas OSF Staff (US East) Paul Rampel StarQuest Software, Inc. Vilhelm Rosenqvist NCR John Spencer The Open Group Walter Stahlecker HP Rob Thompson Boeing Dan Wu US DOD Steve Zalewski Oracle Corporation Mel Zimowski IBM
Companies and organizations that have reviewed this paper and are in agreement with its recommendations are:
COMPANY REPRESENTATIVE Kemper National Insurance Companies Antonio Gualtieri UK Ministry of Defense Anwar Choudury
(NOTE: after we get a good
draft, I will send it out to many customers for comment and
approval, this is now a place hold for the list that reply)
Many customer environments
include a significant amount of relational data that is managed
on various hardware platforms by relational database management
systems supplied by more than one vendor. To complicate matters
even more, some of the needed information is still managed by
older database management systems and using legacy data formats.
The customers requirement is to access all of this data as
painlessly as possible and within the simplest and most cost
effective environment as possible.
Unfortunately, with today's
environment, users are running into several obstacles that keep
them from accomplish this goal:
17. SQL APIs provided by existing products are inconsistent and lack important features
The most commonly used SQL APIs
for access to relational data are SQL and ODBC. However, current
product implementations of these SQL APIs do not fully address
customer interoperability requirements.
Although the ISO SQL92 standard provides a good SQL foundation,
it lacks function that customers require. Examples are stored
procedures and user-defined types. The emerging SQL3 standard
promises to fill these gaps. In the meantime, various vendors are
providing these capabilities through proprietary extensions to
their SQL APIs. Further, not all vendors support all the features
in SQL87 or SQL92 when they announce support for these standard.
As a result, it is very difficult for users to write an SQL
application program that is not dependent on the SQL syntax or
features of a particular vendor implementation. This makes it
difficult for users to reuse existing applications when the
underlying DBMS changes or access to the DBMSs of additional
vendors is required.
To make matters worse, applications that contain SQL must be
recompiled for each hardware and operating system environment in
which they will be used. ODBC applications, on the other hand,
use a standard collection of library functions to execute SQL
statements that eliminates the need for environment specific
compiling. ODBC, however, currently lacks support for many of the
advanced features of the SQL92 and emerging SQL3 standards such
as large objects and abstract data types. The Open Group also has
XCLI that accomplishes some of the same objective as ODBC.
However, XCLI and ODBC are not consistent.
The syntax differences are relatively simple and can be
accommodated. The biggest problem is that the semantics are
different between vendors, and (worse) that the supported sets of
functionality are different, so that no two vendor products all
support the same features.
In addition, vendor's implementation of SQL are different and
they evolve quickly to meet new functionality requirements. It
will be difficult to allow their SQL to evolve and yet ensure
interoperability.
In summary, the issues are:
18. Interoperability between applications and database management systems and between the database management systems of different vendors is difficult to achieve.
A number of mechanisms are
required in order to achieve interoperability between the
database management systems of different vendors. These include:
In a multi-vendor environment, remote data access is never seamless. Unfortunately, the ability to provide transparent data access to heterogeneous databases is exactly what large IS organizations desire most. Transparency means that in the user's view (the external schema of the three-schema architecture), the physical location of the data (the internal schema) is completely hidden; the enterprise database appears homogeneous.
For most organizations, the
inability to provide transparency has led to a different view of
distributed databases. Rather than provide users access to (or at
least transparent views of) the data, organizations are now
looking at how to replicate data: how to deliver frequently
refreshed and updated copies of the data to the users' local
servers or workstations. For users, replication means data at
your service, but for IS, it means finding a solution to the
difficult problem of reconciling different versions of the data.
In summary, the issues are:
19. Directory of Distributed Databases are not consistent
Directories are one of a
corporation's key assets because they provide a common
naming/addressing scheme for data, individuals and resources.
Most organizations, however, have multiple directories: telephone
lists, E-Mail directories, human resource lists, database and LAN
directories. Much money is spent on maintaining these lists and
keeping then in synchronization, and duplication runs wild. The
lack of a global naming service has hampered the free flow of
information. Now, with data warehouse and data marts directory
services are becoming even more critical. Using multi-vendor DBMS
can also lead to conflicting definitions, improper replication,
and incorrect answers. Furthermore, good optimization can not be
achieved unless the catalogs can be distributed and shared
between DBMS from multiple vendors.
In summary, the issues are:
20. Committing data
simultaneously to more than one database - 2-Phase Commit
Customer applications often
access multiple database servers within the same transaction.
Customers require that the results of such a transaction are
either all committed or are all rolled back, that the transaction
transforms a shared resource from one valid state to another
valid state, that any change to a shared resource by one
transaction is not visible outside that transaction until the
transaction commits, and that the changes that result from the
commitment of the transaction survive system or media failures.
Within SQL environments, the coordinated commit and recovery
services that provide these properties generally employ 2-phase
commit protocols.
In some environments, the client that initiates a transaction on
behalf of a client application may not have the capability or the
resources to support recovery in the event of failures during
2-phase commit processing. For this reason, it should be possible
to assign the responsibility for recovery to a participating site
other than the initiating client.
The approach used should be
consistent with the X/Open Distributed Transaction Processing
model, so that XA compliant Transaction Managers can work with
multiple XA Resource Managers regardless of vendor.
In summary, the issues are:
21. Transparent Replication
With distributed
database processing, an organization's database can be
strategically divided among multiple physical databases located
on distinct database servers. One method of synchronizing these
databases can be through the use of replication. However, not all
replication products allow the user to replicate data from a
database on one platform under the management of one vendor's
DBMS to a database on another platform under the management of
another vendor's DBMS. This need is becoming mandatory.
Organizations frequently maintain multiple copies of the same data throughout their environment to improve response time, minimize communication costs, or provide uninterrupted operation. Different data elements typically have different requirements for acceptable propagation delays, transaction semantics, updatability and recovery. Simple periodic snapshots of full databases are no longer an acceptable solution. Managing and synchronizing these redundant data copies to ensure appropriate access to accurate, consistent, timely instances regardless of network and server failures has in the past been left as an exercise for the user. Transition to a distributed, client/server architecture involving open systems from many vendors requires improved methods for managing replicated data throughout the enterprise.
Existing X/Open specifications do not allow system administrators to specify their data replication requirements for automatic management by the system. From experience with common replication scenarios, it appears that there are a small number of classes of processing characteristics typically needed to manage replicated data. A common model of replicated data will help data administrators design systems with a minimum of specialized application coding, using standardized tools and procedures (perhaps automated within the database manager) for the most common cases.
Keep in mind that data formats
are the LEAST of the problems in this area. Replication semantics
are much more complex than data transport.
In summary, the issues are:
22. Consistent Distributed
Security
First, when using multiple DBMS, it is not uncommon to be faced with then need to Log-On to each one you use separately rather than simply logging onto the system once.
Once you are Logged-On, the application of security rules vary by vendor, so that using multiple DBMS's can lead to inconsistent security between databases. A user may have access to a data element in one DB but not in another.
In summary, the issues are:
23. Management / Ability to
manage the environment
It was recognized that this item is in a
different category than items 1-6 above, being less concerned
with Application Interoperability, but of no less importance.
Although almost all the DBMS provide the features need to manage
the DBMS environment. one thing that becomes obvious very soon
after trying to support DBMS from multiple vendors is that the
user interface or methodology for those feature vary
considerably. The users need a consistent way to manage their
databases. Have a single way to invoke or schedule backups,
security rules, get and process alerts, etc. In fact, it would be
even better if all the database in the enterprise could be
managed from a single point.
In summary, the issues are:
24. Manageable Cost
Effective Environment
In the world of proprietary
database protocols, each DBMS supports its own set of proprietary
protocols for communicating with clients. In this type of
environment, each client workstation or PC that desires access to
data at a DBMS typically has a driver that supports one of the
proprietary protocols for that DBMS. In the worst case, the
client workstation or PC has a driver for each DBMS available
within the environment. This results in client environments that
are complex, difficult to manage, and difficult to use. Every
time an additional layer of software or hardware is added,
problem determination become harder. This results in server
environments that are complex, difficult to manage, costly, and
do not perform as well as they could..
If all database servers supported the same services and database
protocol, then only a single driver need be installed at a client
workstation or PC. Eliminating the myriad of drivers and limiting
the number of gateways that are currently required would reduce
the complexity of user environments, making it easier for
customers to configure and manage client environments. This
results in server environments that are less complex, easier to
manage, and more affordable.
RDA would be a solution to this issue, however, RDA does not have
enough functionality to make it useful. For example, 2-phase
commit is not supported, TCP/IP is not supported, the use of OSI
standards for the description of data may make it impossible to
handle the special data types possible with SQL3, stored
procedures are not supported, etc.
Users would prefer to take
interoperability for granted rather than be forced to consciously
consider interoperability as a key software product purchase
criterion. A standardized open database protocol for
heterogeneous database interoperability supports this objective
by making vendor client and server implementations more
interchangeable, allowing users to choose between product
offerings based on end-user capabilities rather than on
infrastructure.
In summary, the issues are:
25. Internet access to
databases and with Java Bindings
If you are going to run SQL over the
Internet you will need a lot of patience!!!
(If anyone is aware of any problems in
this area related to Open DB Access or Multi-Vendor RDBMS
Interoperability, please send me a note. We will leave this
category here until the final paper is published as a reminder.)
26. Are there problems with
Transaction Manager interrelationships?
(If anyone is aware of any
problems in this area related to Open DB Access or Multi-Vendor
RDBMS Interoperability, please send me a note. We will leave this
category here until the final paper is published as a reminder.)
27. Are there issues with
database transaction via ORBs?
A challenge for the industry is that it is not clear that existing database access methodology currently available in the marketplace will be the right vehicle for communication with objects as defined in SQL3, or that a standard in this area will evolve fast enough to accommodate today's rapidly changing database products.
(If anyone is aware of any problems in
this area related to Open DB Access or Multi-Vendor RDBMS
Interoperability, please send me a note. We will leave this
category here until the final paper is published as a reminder.)
The impact of the above is that users
must spend more time than necessary in order to learn to use a
new DBMS and to manage a mutli-vendor DBMS environment. But more
importantly, trouble shooting is more dificult and the users
become locked into a vendor. As a result users do not realize the
benefits of Open Systems. In addition, today's tool and query
vendors need to write modules that know how to convert SQL and
protocols requests from one vendor to another. This simply
increases the cost of products, slows the development of new
version and versions of the same product for new platforms.
In order to help us analyze the above issues and understand what interface areas need to be managed we developed the following diagram that shows the various interface points and the services that are involved.
FIGURE 2
The above diagram does NOT show flow or connections but rather relationships between applications, Client, Server and various services in order to make it easier to position the discussion of each issue.
An application program accesses one or more databases using an SQL API; from the applications point of view, it thinks it is accessing the database directly. However, each SQL request is processed by the Client component. The Client parses the SQL request and passes the request to the Server component to do the physical access of the data. In some database systems, the Client and the Server are a logical distinction, for others they can be physical module. The Client communicates with the Server using a database protocol. The Client and Server can be in the same machine or in different machines.
For all this to work, the SQL must be transmitted to the Server using an agreed upon Database Protocol which in turn must be transmitted over the communication wires using an agreed upon Network Protocol. We must have agreement on each of these - SQL API, Database Protocol and Network Protocol to have a functional connection and interoperability. These items are referred to as the Infrastructure / Connectivity interfaces.
The "services" are invoked locally by the Client and Server components as needed during the processing of the SQL.
In a multi-vendor environment, when a DBMS "A" needs to access data that is maintained by another DBMS "B", a Client component at DBMS "A" would interact with a Server component at DBMS "B". In our diagram, the DBMS "A" would in effect take the place of the Application Program.
The following are our requirements for
improving the open database access and interoperability problem.
28. Work on the
consolidation and branding of the merged ODBC and XCLI should
continue
Providing a consistent API for
all open databases in and of itself does not solve the
interoperability problems, however, from a user perspective, we
can't have interoperability if we are required to use different
APIs to access data managed by different DBMS. We should also be
able to submit a query that spans DBs using one SQL request. That
request should perform the same function on each DBMS. This
requirement addresses the Application Program to DB interface
point in our diagram.
29. A standardized
database protocol for multi-vendor interoperability should be
specified
We need a way for databases to interoperate with each other
and to allow a transaction to span databases managed by DBMS from
different vendors without the use of a gateway. RDA is a good
start but it does not provide all the needed functionality
required by the users especially in a multi-tiered environment.
The standard database protocol should support:
This requirement addresses the Client to Server interface point - the Database Protocol - in our diagram.
30. Service specifications and interfaces that improved interoperability and facilitate vendor independence are required
The client and server components rely on the availability of directory, security, transaction management, replication, and management services within their respective environments. In order to achieve complete interoperability in a multi-vendor environment, these services must be consistent and must be universally available across these client and server environments. For example, the same directory and security services should be available at each client and server node within the network. Further, standardized service APIs should be specified that allow for the easy integration of these service components within the environments of the client and server components. That is, if all clients and servers used these standard Service APIs for each of the services along with the Infrastructure/Connectivity interfaces, then interoperability between an application and one or more databases or between a database and one or more databases would be possible.
Client and server support for the service APIs would allow the services to be "componentized", which will enhance vendor independence. For example, this would allow a customer that already had DCE security to integrate their DCE security server within this client and server environment even though the client or server may also support (or come with) other types of authentication.
There are two parts to this requirement:
This requirement fosters complete
interoperability through consistent and universally available
services and addresses the interaction of the services components
with the client and server components in our diagram.
31. We should have an X/Open
DBMS brand
Users need some way of knowing
that a DBMS follows all The Open Group specifications. The best
way to do this is by providing an X/Open brand for DBMS. The
brand would test the APIs, protocols, etc. This is even more
important now that the NIST has decided to stop it's testing.
32. We need a standardised
method of defining their data replication requirements across
heterogeneous servers.
Users need a standardised method
of defining their data replication requirements across
heterogeneous servers (including non-relational databases)
through a set of replication services that are invisible to the
application users. Application programs themselves should not
need to be aware of the existing replication characteristics of
the data (replication transparency). Programs should be presented
with a uniform data view, whether the data is local or remote,
distributed across many systems, fully or partially replicated,
or managed by products from different vendors (distribution
transparency).
In addition, replication should have the ability to replicate all
or part of the database and in either direction. During that
replication, we need to be able to change the order or formats of
fields in addition to coding or decoding the information in the
field.
33.The Security and System
Management groups need to handle the security and system
management issues identified above
The existing programs like
Security and Systems Management should review their scope and
make sure that the above items that are specific to their area
are covered in their program. These items should not be covered
elsewhere.
For example, Security needs to provide requirements that address such things as:
System Management needs to provide requirements that address such things as:
We determined that the interoperability points that need to be managed fell into two categories: Infrastructure / Connectivity and Service Components. Infrastructure / Connectivity items include SQL API, Database Protocol and Network Protocol. The Services include such things as Directory, Security, Transaction Management, Replication Management, etc.
Any focus on a subset of the problem (e.g. SQL syntax or database access protocol):
However, unless we solve the Infrastructure / Connectivity issues the Service Components issues can't be solved. Since both are needed as soon as possible they should be solved in parallel.
As a final comment it should be noted
that other approaches (such as using CORBA ORBs to
"encapsulate" or "wrap" existing legacy
applications) are likely to also be required. Existing
applications wouldn't magically become protocol-savvy. In
addition, and maybe it should go without saying, whatever is done
in any of these areas needs to work with existing standards like
X.500, XA, DCE, etc.
Antonio Gualtieri VMail: (847) 320-3126 External EMail: IBMMAIL(US2B72BN) Internet ID: US2B72BN@IBMMAIL.COM FAX: (847) 320-4628 or (4464)