OSF DCE SIG                                        S. Dietzen (Transarc)
Request For Comments: 53.0                              R. Fleming (DEC)
                                                            January 1994

            REQUIREMENTS FOR TRANSACTION PROCESSING WITH DCE

      (Report of the DCE SIG Transaction Processing Working Group)


1. INTRODUCTION

This paper specifies a set of requirements for the support of distributed transaction processing built on the Distributed Computing Environment (DCE). It reflects the discussions conducted within the Transaction Processing Working Group of the DCE SIG and the mail exchanged by the members of this group. The scope of these requirements is limited to an initial offering, but the paper also lays out a framework for further work in this area.

2. JUSTIFICATION

For DCE to be successful in the commercial marketplace, it must address the requirements for building distributed transaction processing (DTP) systems. Transaction integrity is essential to commercial applications (e.g., banking, financial, and airline applications) that cannot afford any loss or inconsistency of data. Such applications need TP methodologies even when they are centralized. As applications become more distributed and the number of points of failure increases, transaction integrity becomes even more important.

In the absence of DTP techniques, application designers and implementors are left to ensure data integrity on their own. This places an unacceptable burden on the builders of small systems and makes the construction and maintenance of large ones infeasible. While DCE provides an excellent technology base, it is by no means sufficient for DTP applications. Large commercial systems in the key industries cannot give up the transaction protection they currently have, and hence will be unable to adopt distributed computing until DTP is available.
Through the adoption of core transaction processing support, the OSF enables independent application developers and TP product vendors to build and interconnect transactional clients and servers, resource managers, logs, TP monitors, alternative communication gateways, additional higher-level programming interfaces, management tools, and so on. Incorporating the basic DTP extensions proposed here into DCE facilitates interoperability among the broad range of DCE-based DTP technologies that will evolve over time.

Dietzen, Fleming                                                  Page 1

DCE-RFC 53.0         Transaction Processing with DCE        January 1994

This paper describes how to augment existing DCE systems with transaction semantics in a way that is consistent with the relevant standards in this area, in particular those of X/Open (see [TxRPC]). It recommends use of the X/Open TxRPC API and protocols as described in Section 3.2.

3. REQUIREMENTS

3.1. Transactional Semantics

The primary purpose of extending DCE RPC is to provide transactional semantics, thus creating a transactional RPC (TxRPC). That is, the At-Most-Once semantics currently supported by DCE RPC will be augmented so that a unit of work distributed over RPCs can possess the ACID properties, which are defined as follows:

(a) Atomicity -- Transactions are "all or nothing". As a result, neither programmers nor users need be concerned that failures could leave data in an inconsistent state.

(b) Consistency -- The data updates made by transactions preserve the integrity of data by mapping one consistent state to another.

(c) Isolation (or serializability) -- Concurrent transactions behave as if they were executed in series. Application developers are therefore freed from considering potentially complex interleavings of their concurrent computations, and users can be more confident in the correctness of the resulting software.

(d) Durability (or permanence) -- Once committed, transactions are not undone. Users and developers are therefore assured that critical modifications to data will not be lost in subsequent system failures.

These properties are preserved across many types of failures, including communications outages, system crashes, and application failures.

Under a typical scenario for DCE DTP, a client application begins a transaction, performs multiple accesses to some number of servers using transactional RPCs, and then ends the transaction. The client and participating servers may also access local resource managers. When the client application ends a distributed transaction, the system polls each of the participating servers (via a commit protocol) to determine whether the updates associated with that transaction should be made permanent. If any participant is unable to complete the required computation, all the work is undone (i.e., aborted, or rolled back). Otherwise, the transaction and the associated updates are committed.

Note that the use of transactional semantics is an option, not a requirement. Programmers can choose to use the existing At-Most-Once semantics instead of transactional semantics if they so desire.

3.2. X/Open TxRPC and DCE RPC

The X/Open Distributed Transaction Processing Group (XTP) has specified a transactional RPC consisting of:

(a) An API that is OSF DCE RPC with extensions to the IDL.

(b) Use of the OSI TP protocol.

Complete details of the API and protocol may be found in [TxRPC].

Because of the large and growing installed base of DCE customers, a DCE-based protocol for transaction processing is also needed. Some customers will accept an OSI-based solution, but existing customers would prefer a DCE-based protocol that works with all existing DCE implementations. A DCE-based protocol leverages DCE RPC, security, and naming to flow transactions. Examples of such implementations already exist in the market.
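The commit decision in the typical scenario of Section 3.1 can be illustrated with a small, in-memory Python sketch. It is a simulation under stated assumptions, not the TX, XA, or TxRPC interface: every class and method name (`Participant`, `Transaction`, `TransactionManager`, `begin`, `call`, `end`) is hypothetical, the "transactional RPC" is an ordinary method call that carries a transaction identifier and enlists the callee, and logging and coordinator failures are not modeled. What it shows is the all-or-nothing rule: work is committed only if every enlisted participant votes yes in the first phase; otherwise it is rolled back everywhere.

```python
"""Hypothetical in-memory sketch of the Section 3.1 commit decision.

None of these names come from the X/Open TX, XA, or TxRPC specifications.
The "transactional RPC" is an ordinary method call carrying a transaction
id, and logging and coordinator failures are not modeled.
"""
import itertools
from dataclasses import dataclass, field


class Participant:
    """A hypothetical transactional server with a key/value store."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit  # simulates the outcome of prepare
        self.data = {}                # committed state
        self.pending = {}             # tid -> tentative updates

    def update(self, tid, key, value):
        # Work done on behalf of a transaction stays tentative until commit.
        self.pending.setdefault(tid, {})[key] = value

    def prepare(self, tid):
        # Phase 1 vote. In real two-phase commit a "yes" voter gives up the
        # right to abort unilaterally; this sketch does not model that wait.
        return self.can_commit

    def commit(self, tid):
        self.data.update(self.pending.pop(tid, {}))

    def abort(self, tid):
        self.pending.pop(tid, None)


@dataclass
class Transaction:
    tid: int
    participants: list = field(default_factory=list)


class TransactionManager:
    _ids = itertools.count(1)

    def begin(self):
        return Transaction(next(self._ids))

    def call(self, txn, server, key, value):
        # Stand-in for a transactional RPC: the transaction id travels with
        # the request, and the first call to a server enlists it.
        if server not in txn.participants:
            txn.participants.append(server)
        server.update(txn.tid, key, value)

    def end(self, txn):
        # Phase 1: poll every enlisted participant for its vote.
        votes = [p.prepare(txn.tid) for p in txn.participants]
        # Phase 2: commit only if every vote is yes; otherwise roll back
        # the work at every participant.
        decision = all(votes)
        for p in txn.participants:
            if decision:
                p.commit(txn.tid)
            else:
                p.abort(txn.tid)
        return "committed" if decision else "aborted"
```

For example, if one of two enlisted servers votes no in the first phase, `end` returns "aborted" and neither server's committed state changes, even though the other server was able to do its part of the work.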
The OSF DCE requirements for distributed transaction processing are:

(a) An API as specified in the X/Open TxRPC specification.

(b) Support for a DCE-based protocol for transactional RPC.

(c) When market demand and resources exist, support for an OSI TP-based protocol for transactional RPC as specified in the X/Open TxRPC specification.

3.3. Structure and Completeness

The extensions that make DCE RPC transactional must be complete and modular. It must be possible to use DCE TxRPC with an X/Open Distributed Transaction Processing (DTP) compliant Resource Manager (RM) without developing any additional software except, of course, the application. Alternatively, it must be possible to integrate DCE TxRPC with existing Transaction Processing (TP) systems. To achieve the latter, the design must be modular and the interfaces to the modules must be rigorously specified.

Figure 1 shows a diagram of a system structured to meet these requirements. Shown in Figure 1 are:

(a) Application (AP) -- The user-provided application, which uses the transaction manager to define transactions and uses DCE RPC facilities to ship requests to servers.

(b) Resource Manager (RM) -- The manager of resources (e.g., a database system) provided by the local system.

(c) RPC Stubs -- The DCE RPC stubs generated from the IDL source code. These stubs are augmented with provisions for shipping transactional context over TxRPC interfaces. Such changes are upwardly compatible with the present DCE RPC.

(d) RPC Runtime -- Runtime support for the RPC stubs.

(e) Transaction Manager (TM) -- The manager of transactions, which guarantees the ACID properties over distributed applications. The TM provides the transaction context that is piggybacked on transactional RPCs, as well as the context included within the two-phase commit flows.

(f) Communications Manager (CM) -- The manager of communications. The CM provides for the sending of transactional RPCs and is responsible for shipping the two-phase commit protocol messages among transaction participants.

Note: The TM is completely independent of the communications mechanism used, while the CM is independent of the transaction state information it ships. (As an aside, the TM should be able to operate with multiple CMs so that alternative communication technologies, such as peer-to-peer, can be easily integrated.)

(g) Logging component -- The stable storage abstraction for recording transaction state and outcomes. The logging facility must be optimized to provide state-of-the-art performance on both lightly and heavily loaded systems. It is not necessary that this logging facility be capable of providing RM logging. However, the design should be open in that it permits "common" logging, wherein another log (e.g., an RM log) is used in place of the DCE log for efficiency.

(h) Recovery service -- The manager for system restart and abort processing, which replays the log to invoke the appropriate undo/redo logic within the participating resource managers. The recovery service may be viewed as an internal component of the TM, but a clearly defined TM/recovery interface permits the integration of specialized recovery mechanisms beyond the provisions of the XA interface (XA is the name of the transactional interface between the RM and the TM).

The structure just described meets both requirements: it is complete, and it lends itself to integration with an existing TP system. Other structures are possible.

In Figure 1, the components that are provided by applications are marked with "*"; all other components are provided by the system.
FIGURE 1

+--------------------------------------------------------+
|                                                        |
|                   Application (AP*)                    |
|                                                        |
+----+-------------------+--------------+----------------+
     |                   |              |    Enhanced    |
     |                   |              |   RPC Stubs    |
+----+-----+      +------+------+       +----------------+
| Resource |  XA  | Transaction |       |  Transaction   |
| Manager  +------+   Manager   +-------+ Communications |
|  (RM*)   |      |    (TM)     |       |  Manager (CM)  |
+----------+      +-------------+       +----------------+
                  |  Recovery   |       |  RPC runtime   |
                  +-------------+       +----------------+
                  |   Logging   |
                  +-------------+

3.4. Interface Specifications

The interfaces in Figure 1 that must be formally documented are discussed in this section. Specified interfaces allow DCE TxRPC to be integrated into a system as a single unit, and also allow it to be integrated with an existing system by augmenting DCE with the new components illustrated in Figure 1. The interfaces requiring formal specifications are:

(a) TM/RM interface (see [XA]).

(b) TM/AP interface (see [TX]).

(c) TM/CM interface.

(d) RPC Stubs interface to CM and RPC Runtime.

(e) Selected interfaces within the TM. If the TM is structured with a separate Recovery Service (RS), the TM/RS and the RS/Logging service interfaces must be specified. If the RS is embedded within the TM and/or logging service, it is necessary to specify the TM/Logging service interface.

(f) The distributed transaction service should provide a management interface for querying transaction state and for resolving blocked transactions by making heuristic decisions. The TM should also be extensible, in that systems can associate special computations (e.g., application callbacks) with transaction state transitions (prepare, commit, and abort).

4. DESIRABLE FEATURES

The following are enhancements that go beyond what is specified by XTP. Since these enhancements are also under consideration in XTP, and since the goal is a standards-based solution, OSF is encouraged to work with XTP to address the features described below.

4.1. Nested Transactions

In programming multi-threaded applications on top of DCE, the developer must ensure that concurrent threads do not conflict in their access to shared data. This suggests a synergistic relationship between the transaction construct and threading primitives, since the transaction model defines concurrency control constructs that support the ACID properties. If a transaction (or a sequence of transactions) is assigned to each individual thread, resource contention within an application is handled in the same manner as contention between applications.

The nested transaction model also protects multi-threaded applications that cooperate in the work of a concurrently executing transaction. Consider that a given multi-threaded procedure may itself be an atomic unit of work that concurrently shares data with other applications. What is needed in this situation is a hierarchy of transactions in which subtransactions individually contend for resources. Nested transactions provide that hierarchy: a nested transaction commits only relative to its parent, and if the parent aborts, any work associated with a child is rolled back as well.

Another reason to support nested transactions is failure isolation. As TP applications become highly distributed, it becomes more likely that a failure within a particular server or communication link will cause a global rollback of computation. To avoid such scenarios, nested transactions allow particular servers to isolate and recover from failures locally without affecting other transaction participants. Through nested transactions, developers have a uniform means by which to trap exception conditions. This is most clearly seen when servers are themselves the clients of other servers: without nested transactions, the programmer of such a server cannot use the transaction concept unless he or she can guarantee that every invoking client is non-transactional. Even then, the same restriction would apply to any other servers invoked by a transactional server.

As the above discussion illustrates, nesting becomes very important as distributed transactions become "deeper" (i.e., when servers act as the clients of other servers) and "broader" (i.e., as greater numbers of servers participate). Without nesting, the designer must develop alternative, ad hoc failure recovery mechanisms. Nested transactions offer a common facility for addressing the complexities associated with concurrency and failure recovery in a distributed environment.

4.2. Coordinator Migration

There exists a window of vulnerability in the two-phase commit protocol: after a server has prepared and before it is informed of the transaction outcome, it has given up its ability to abort the transaction, and thereby control over its data (i.e., it has agreed to keep all relevant data locked pending the transaction outcome). Should the transaction coordinator become unreachable during this window, access to potentially critical data is blocked.

This vulnerability can be substantially reduced by migrating the role of transaction coordinator from clients to more reliable servers, which reduces the likelihood that the coordinating machine will be inadvertently shut down or rebooted at an inappropriate time. Moreover, this allows "ephemeral" clients (those without any facility for logging transaction results, such as diskless PCs and workstations) to begin and end transactions.
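One possible coordinator-selection policy in the spirit of the migration described above is sketched in Python. The `Node` type, its fields, the `choose_coordinator` function, and the preference order are assumptions made for illustration; neither DCE nor X/Open TxRPC defines them. The policy hands the coordinator role to a participating server that can log outcomes durably, lets the client coordinate only when no such server exists and the client itself can log, and otherwise refuses to proceed.

```python
"""Hypothetical coordinator-selection policy illustrating coordinator migration.

The Node fields and the preference order are assumptions for illustration;
they are not defined by DCE or X/Open TxRPC.
"""
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str
    is_server: bool       # long-lived server rather than an end-user client
    has_stable_log: bool  # can durably record transaction outcomes


def choose_coordinator(client, participants):
    """Return the node that should run two-phase commit for a transaction."""
    # Prefer a participating server that can log outcomes: it is less likely
    # than a client to be shut down while prepared participants are waiting.
    for node in participants:
        if node.is_server and node.has_stable_log:
            return node
    # Otherwise the client may coordinate, but only if it can log outcomes.
    if client.has_stable_log:
        return client
    # An ephemeral client with no logging server available cannot safely
    # coordinate the commit.
    raise RuntimeError("no node can durably record the transaction outcome")
```

With a diskless PC as the client, for instance, the function returns a logging server from the participant list, which is the case of an ephemeral client that could not coordinate on its own.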
Additionally, through coordinator migration, transaction participants can suggest or require that a particular server act as the coordinator. Hence, a server managing critical data can demand to serve as the coordinator so that it never needs to relinquish control of that data.

5. ACKNOWLEDGEMENTS

Members of the DCE SIG Transaction Processing Working Group contributed substantially to this document, both at SIG meetings and over the sig-dce-tp@osf.org mailgroup. In fact, the named authors of this document are largely placeholders: although we produced the initial draft and took care of editorial matters, the members of the Working Group were largely responsible for the substance of the final version.

REFERENCES

[TX]     Distributed Transaction Processing: The TX (Transaction Demarcation) Specification, X/Open Preliminary Specification, ISBN 1-872630-65-0, P209, October 1992, X/Open Company Ltd. (Apex Plaza, Forbury Rd., Reading, Berkshire, RG11AX, UK; Internet email: XoSpecs@xopen.co.uk).

[TxRPC]  Distributed Transaction Processing: The TxRPC Specification, X/Open Preliminary Specification, ISBN 1-85912-000-8, July 1993, X/Open Company Ltd. (Apex Plaza, Forbury Rd., Reading, Berkshire, RG11AX, UK; Internet email: XoSpecs@xopen.co.uk).

[XA]     Distributed Transaction Processing: The XA Specification, X/Open CAE Specification, ISBN 1-872630-24-3, C193, October 1991, X/Open Company Ltd. (Apex Plaza, Forbury Rd., Reading, Berkshire, RG11AX, UK; Internet email: XoSpecs@xopen.co.uk).

AUTHORS' ADDRESSES

Scott Dietzen                     Internet email: dietzen@transarc.com
Transarc Corp.                    Telephone: +1-412-338-4439
707 Grant Street
Pittsburgh, PA 15219
USA

Robert Fleming                    Internet email: fleming@olcrow.enet.dec.com
Digital Equipment Corp.           Telephone: +1-508-952-4267
151 Taylor Street
Littleton, MA 01460
USA