OSF DCE SIG M. Karuzis (HP) Request For Comments: 31.0 December 1992 SUPPORTING THREADLESS DCE CLIENTS 1. INTRODUCTION A threadless implementation of the DCE library is one which does not introduce additional threads into an RPC based application. That is, the runtime library is constrained to performing all call-related processing within the context of the application thread which initiates a remote call. The following discussion is limited to supporting the RPC DG protocol in a threadless environment. As such, all references to the 'runtime' should be understood to mean the 'DG runtime.' It's likely that running the CN protocol in the absence of threads, if possible, will require changes unrelated to those discussed here. Also, this paper deals only with the prospect of developing single threaded clients; there is no consideration of, or impact on, the server side of a remote call. 2. HOW THE RUNTIME USES THREADS The current implementation of the runtime creates 'helper' threads to perform the following tasks: (a) Waiting for, and reading in, socket data (the listener thread). (b) Detecting timeouts, and performing garbage collection (the timer thread). (c) Maintaining liveness for applications that use context handles (the INDY thread). 2.1. The Listener Thread The listener thread is responsible for reading data from all open sockets, determining to which call the data is being sent, and signalling that call thread that data is available. The need for a separate thread to read socket data is a consequence of the fact that concurrent calls (over the same protocol family) share a single socket. This implementation choice necessitates the de-multiplexing of incoming data packets. Karuzis Page 1 DCE-RFC 31.0 Threadless DCE Clients December 1992 2.2. The Timer Thread The timer thread has two primary responsibilities. First, it periodically checks to make sure that calls are making progress (and will initiate corrective action if they are not). Second, it is responsible for keeping track of internal data structures and freeing up resources that are no longer in use. 2.3. The INDY Thread Applications which use context handles require that servers be informed if a connection to a client is lost (for example, if the client exits prematurely). This mechanism gives servers an opportunity to clean up state being maintained on behalf of a client that is no longer active. The DG protocol implements this facility by periodically sending keep-alive messages from client to server. The sending of these keep-alive messages occurs within a separate thread dedicated to this task. 3. ALTERNATIVES TO USING THREADS 3.1. The Listener Thread One of the performance enhancements under consideration for DCE 1.1 is the introduction of private client sockets. Briefly, a call thread using a private socket is given exclusive access to that socket. One of the features of such a scheme is that the call thread need not be concerned about de-multiplexing data received on the socket. As such, the call thread can read the socket directly, circumventing the need for a listener thread. Assuming that private client sockets are implemented for DCE 1.1, it will be a feature of every client that the listener thread will not be started until the application needs to use a shared socket. Since the use of shared sockets will only occur in applications that make concurrent RPCs (and are therefore already multi-threaded), single threaded clients will never require the use of the listener thread. 3.2. The Timer Thread The timer thread awakens periodically and consults a queue of timer requests to see if any have reached their activation time. All such requests are then serviced by calling a routine which was registered along with the timer request. This process is called 'running the timer chain.' To avoid creating a thread to perform this function, we must find some other means of ensuring that timer requests are acted upon. In addition, it is highly desirable that any new scheme retain much of the existing timer service structure, on which the runtime relies Karuzis Page 2 DCE-RFC 31.0 Threadless DCE Clients December 1992 heavily. As mentioned above, the timer thread really serves two major functions. We'll look at them separately to see what the requirements of each are. 3.2.1. Detecting timeouts Client call threads register a timer handler for each new call that's started. The registered routine is then called periodically to inspect the state of the call, and determine if there are any timeout conditions that need to be handled. Rather than having this periodic checking done by the timer thread, we can arrange for it happen along with the rest of the normal call thread processing. This actually works out quite nicely, since the call thread always know exactly when the next timeout can occur (as opposed to the timer thread which just checks all active calls indiscriminately.) Consider that all timeout conditions are the result of not receiving some expected network packet (a fack, working, or response packet). The call blocks trying to read data from the network, and at some later time the timer thread determines that it's been waiting too long and takes some corrective action. A more elegant way to do this would be for the call to decide how long it is willing to wait for data, and then to block for only that long; if nothing is received, the call thread could handle the situation on its own behalf. Of course, to leverage off the current implementation, the timeout could be handled by simply having the call thread run the timer chain (by calling the timer thread's base routine directly). Since the call thread will have previously registered a timer routine, this will result in that routine being run, and the timeout condition will be handled appropriately. There are at least three reasonable ways to bound the wait for socket data: (a) Use timer interrupts to abort the recvfrom system call. (b) Add an augmented recvfrom() into KRPC. (c) Use select(). Using a timer interrupt is probably not viable since it uses a scarce resource that might be required by the application. Implementing an augmented recvfrom(), which takes a timer parameter, would be ideal for platforms that support KRPC, since it would avoid the overhead of a call to select() followed by a call to recvfrom(). Karuzis Page 3 DCE-RFC 31.0 Threadless DCE Clients December 1992 Using select is the most general solution to the problem, despite its inefficiency. 3.2.2. Garbage collection The second major task of the timer thread is to run garbage collectors. To ensure that this facility still works correctly will require that the runtime periodically call through the timer chain, perhaps once every couple of minutes. To accomplish this, a threadless runtime could keep a clock stamp indicating the last time that the timer chain was run. Of course, there's no guarantee that the runtime will be able to run the timer chain regularly, but this would only occur if the application went for a long period of time without using RPC. In such a case, it's unlikely that the accumulation of stale data structures would present a problem. The best place from which to make the call into the timer chain is in transceive(), immediately after pushing out the last arguments to a call, but before beginning to wait for a response. Doing the timer processing here will keep it out of the executing call's fast-path. 3.2.3. The dual nature of the timer chain It's worth noting that, in a threadless environment, the two uses of the timer chain are slightly incongruous. The detection of timeouts is essentially a synchronous activity, the timer routine gets run at the time that a timeout is detected. On the other hand, garbage collection is essentially an asynchronous activity; it must be done periodically regardless of whatever else the runtime is doing. The result is that whenever a timeout is detected, along with running the call thread's timer routine, the garbage collectors will also get run. Likewise, whenever the runtime does garbage collection, any active call timer routine will (unnecessarily) be run. This situation is deemed acceptable based on the following observations: (a) This structure makes sense for a threaded runtime, where the listener thread does not have enough information to anticipate timeouts, and so timeouts really are asynchronous. (b) Running the garbage collectors when a timeout occurs will not affect performance (we're already dealing with a timeout). (c) Running a call thread's timer routine when we do garbage collection will not affect performance since it will only happen once every few minutes. Karuzis Page 4 DCE-RFC 31.0 Threadless DCE Clients December 1992 3.2.4. Other timer thread issues 3.2.4.1. The runtime clock The runtime keeps its idea of the current time in a global variable. Call threads use this value to time-stamp various data structures and activities to aid in detecting timeout conditions. Since the global time variable is updated once each time the timer thread is run, it exhibits a granularity equal to that frequency. In particular, between updates, all requests for a time-stamp see the same value. This implementation choice trades off clock accuracy for a reduction in system call overhead. However, without a timer thread it will no longer be possible to make this tradeoff. Each request for the current time will need to be satisfied by consulting the actual system time. This will have some impact on performance, although it is anticipated that the impact will be minor. 3.2.4.2. Delayed ACKS One other function performed by the call thread timer routine is the sending of delayed acknowledgements. This is a performance optimization which relies on the fact that each remote call implicitly acknowledges all preceding calls. If a client is making back to back calls, we can allow the calls to acknowledge each other, and avoid sending out a specific acknowledgement for each one. The runtime effects this behavior by delaying the sending of acknowledgements, in the anticipation that a new call will soon be started. After a period of time, detected by the timer thread, if a new call has not been started, a delayed acknowledgement will be sent. There is no way to mimic this behavior in a single threaded environment. As such, we are forced to forego the optimization, and send out an acknowledgement for every completed call. 3.3. The INDY Thread Since the runtime is not guaranteed that it will be called with any regularity in this environment, it is not possible for it to guarantee that it can regularly send keep-alive messages to a server. Under these conditions it does not seem possible to provide support for context handles. However, it may be possible to provide this service by requiring that the runtime be allowed to set up a timer interrupt. This may be an acceptable compromise for applications that want to be single- threaded but require the use of context handles Karuzis Page 5 DCE-RFC 31.0 Threadless DCE Clients December 1992 4. USER VISIBLE EFFECTS OF RUNNING SINGLE THREADED 4.1. Simplified Debugging Environment One significant benefit to developers of single threaded clients will be in the area of debugging. In the absence of thread-aware debuggers, debugging a multi-threaded program can be a frustrating experience. And even with a reasonable debugger, application developers who are not experienced with threading can find themselves in a foreign environment, developing even the simplest application. This situation can be especially frustrating for a developer whose application becomes threaded as a result of calling into a third party library that uses RPC. 4.2. Memory and Disk Requirements Whether or not there will be any difference in memory or disk usage depends on whether we support two DCE libraries. This proposition is discussed below in the section titled 'DCE Lite.' 4.3. Performance It is not expected that a single threaded client would see any performance increase over a similar application linked with a threaded DCE library. In fact, it is likely that performance will decrease slightly. While it is true that a threadless client will not carry the overhead of a timer thread running several times a second, this will be more than compensated for by the need to call select() before reading each network packet. 5. DCE LITE? The discussion thus far has assumed that the existing RPC runtime would be modified to provide both threadless and threaded support. In a client application, the runtime would refrain from creating auxiliary threads unless and until they were needed. Single threaded client applications would thus remain single threaded. At the other end of the spectrum, we might consider providing two DCE libraries, the second providing only the functionality necessary for writing single threaded client applications. The motivation here would be to reduce the disk/memory requirements of DCE-based applications. For example, in such an environment most of the CMA library code could be stubbed out. Besides the obvious fact that a threadless client wouldn't need any of the code which directly supports threading, it could also do without the code that supports mutual exclusion and thread synchronization. Bypassing the use of mutex Karuzis Page 6 DCE-RFC 31.0 Threadless DCE Clients December 1992 locks and condition variables would presumably also provide a modest increase in performance. Also, the DG code has been fairly well partitioned between modules that support clients only, servers only, or both. Removing server support would reduce the DG code size by approximately 40%. The question here is whether there are platforms that would require/benefit from such a partitioning of the DCE functionality. The answer is probably 'no' for any platform which supports dynamic/shared libraries. In such an environment, the only disk savings will be the difference in size between the full-DCE and DCE- lite libraries; this difference might only be 20-30%; and to benefit from this small decrease in disk usage the system must give up its ability to run servers or multi-threaded clients. AUTHOR'S ADDRESS Mark Karuzis Internet email: markar@apollo.hp.com Distributed Object Computing Program Telephone: +1-508-436-4337 Hewlett-Packard Co. 250 Apollo Drive Chelmsford, MA 01824 USA Karuzis Page 7