OSF DCE SIG                                       J. Wrabetz (Aggregate)
Request For Comments: 16.0                                September 1992

                     DISTRIBUTED RESOURCE SELECTION

1. INTRODUCTION

This paper describes a set of capabilities which are currently not available as part of the Distributed Computing Environment (DCE) or the Distributed Management Environment (DME), but which are needed to support a wide variety of distributed applications. Many of the capabilities described herein have been or are being developed in application specific forms in order to facilitate the development of business solutions in the absence of standard services.

The membership of the OSF DCE Special Interest Group recognizes the need to simplify distributed application development and system administration. In the absence of any OSF direction in this area, many incompatible and overlapping services will be developed and propagated. This will, as it has in the past, cause industry confusion and slow the development of applications. Therefore, the Resource Management Working Group of the OSF DCE SIG has defined a set of common services which can be added as DCE/DME enhancements, and upon which developers can build business solutions with minimum overlap and with simple administration.

The services defined in this paper are used for identifying, locating, filtering, and choosing among resources in a network. The initial target applications are particularly concerned with the use of processing resources. Information about many resources is required to support the process of locating and accessing processing capability over the network. The applications developed using these services will share resources on the network, getting more productive use of all of the machines on the network, and achieving better performance through use of the entire network.

There are already several industries in which target applications are being developed.
In particular, the CAD/CAE market, the financial analysis market, and the CASE tools market are already actively developing tools that use the whole network to perform their applications.

The known applications which require resource selection services include:

(a) Distributed CASE utilities (e.g., "distributed make").
(b) Network backup and restore applications.
(c) Cooperative processing tools.
(d) Distributed batch processing applications.
(e) Resource sharing tools.
(f) Distributed CAD/CAE tools (e.g., Valid Logic's Distributed Analog Workbench, CADENCE Design System's Distributed Dracula, GenRad's distributed CAE tools, and Intergraph's distributed simulation).

Anticipated future applications include:

(a) Other application specific network parallel applications.
(b) Load balancing tools.
(c) Replication management tools.
(d) Other system management tools.
(e) Dynamic reconfiguration tools.

2. SERVICES

A set of resource selection services is required which would provide a basic mechanism for applications to select and use network-wide resources. Decision making in the use of network resources can vary widely between applications, and any resource selection services provided must have the necessary information upon which to make decisions. These services and some of the system applications that use them are loosely organized as shown in Figure 1. Discussion in this paper is limited to a core set of services required to support a wide variety of applications, and a limited set of system resource selection applications that will utilize these services immediately to enhance the flexibility of the DCE itself.

A resource information service is defined to provide the core of the services for making intelligent decisions about network-wide resources. This Resource Information Service (RIS) provides information about known network resources.
Other DCE and DME services may also be consulted in the selection process, including the DCE directory service and the DME Management Information Base (MIB). There is no intention to duplicate these services with the RIS. The RIS is intended to provide the structure of information necessary for decision making applications.

------------------------------------------------------------------------

+-----------------------+  +-----------------------+
|       Resource        |  |        Service        |
|       Selection       |  |     Instantiation     |
|     Applications      |  |    Facility (SIF)     |
+-----------------------+  +-----------------------+

+--------------+  +--------------+  +--------------+
|              |  |              |  |    Remote    |
|     RPC      |  |     ORB      |  |  Execution   |
|              |  |              |  |   Service    |
+--------------+  +--------------+  +--------------+

+--------------------------------------------------+
|                                                  |
|            Decision Making Interface             |
|                                                  |
+--------------------------------------------------+

+--------------------------------------------------+
|                                                  |
|        Resource Information Service (RIS)        |
|                                                  |
+--------------------------------------------------+

+-----------------------+  +-----------------------+
|                       |  |                       |
|          DCE          |  |          DME          |
|                       |  |                       |
+-----------------------+  +-----------------------+

                     Figure 1.

------------------------------------------------------------------------

A set of filtering or "decision making" interfaces is defined; these manipulate the RIS information in order to support intelligent applications. Decision making interfaces allow applications to "choose" resources on the network for various application functions.

In addition to these "core" services, a set of binding services can incorporate use of the "core" resource selection services to provide flexible, location independent binding, using dynamic resource information. These binding services support remote invocation services. Eventually binding services should be provided for RPC, ORB, and other remote execution services.
A Remote Execution Service has been defined because one does not exist in the current DCE framework. It was felt that such a service will be critical to the future success of DCE. Similarly, DCE must address object management services. However, this paper does not define those services; it considers only their interface to the resource selection services, which provides flexible binding to objects.

Finally, this paper addresses one resource selection application which is of interest within the DCE and DME frameworks themselves. A "service instantiator" service is defined: an automated administration service that determines at runtime where and when to instantiate services.

In the remainder of this paper, each service is defined in more detail and some general requirements are described.

3. RESOURCE INFORMATION SERVICE (RIS)

Resource selection is a decision making task. Its goal is to support improved usage of the resources in the environment. As with any decision making task, it should be based upon information about the resources being managed. The function of the Resource Information Service (or RIS) is to provide uniform access to this information, which can be used in making resource selection decisions. In this section we identify a set of requirements for the RIS, with a focus on large, distributed, and heterogeneous computing environments based upon the OSF DCE.

One of the primary elements that the RIS must provide is a uniform interface for defining new resources and updating their definitions. Resource naming must conform to a global naming scheme (such as X.500/CDS) so that resources may be scoped and accessed uniformly. In addition, the RIS must allow resources to be added and removed without an interruption of the service. It is desirable that the implementation of the RIS provide extensible definitions of common resources in a distributed environment.
It is highly desirable that the implementation follow an object model. Different access methods may be specified to obtain such information. For example, the resource may contain a binding to an agent that can provide that information.

The RIS must provide APIs for collecting and querying the information about the resources, and these APIs must be remotely accessible using the DCE RPC. These interfaces must be rich enough to support manipulation and filtering of information in the upper layers of the resource selection framework. The RIS implementation itself is not responsible for filtering of any sort. It is also desirable that the RIS have a user interface through which the information in the RIB may be accessed.

No extensive information recovery mechanisms are needed, as resource information is dynamic and has a short valid lifetime. The RIS may maintain the age and the lifespan of information about each resource and make this available via the APIs.

It is desirable that the RIS be highly available. Applications that use this service must be designed so that non-availability of the RIS does not prevent access to resources, although their use may then be sub-optimal.

The RIS implementation must support a high access and update rate to a large amount of resource information. It should scale to large networks containing thousands of nodes, with graceful degradation of quality of service. Some of the metrics to be used in this regard are the response time and the freshness of information, measured against the size of the information being maintained by the RIS. It should optimize access and update of local information as opposed to non-local or infrequent references.

The RIS must also implement authentication and access control on the resource objects. It is desirable that this implementation use DCE security mechanisms or provide interfaces consistent with those of the DCE.
Lastly, management of the RIS as a whole must be possible within the DME framework. In addition, since the RIS is an information management service, it should be consistent with the DME model.

Optionally, the RIS may allow collections of resources to be defined. It may allow applications to define trigger conditions on the state of a resource and may inform or signal the application on the occurrence of that condition. It may also actively collect resource information.

4. DECISION MAKING SUPPORT INTERFACE

The decision making support interface is a well defined API which allows resource selection applications to query the resource information service (RIS). The API must allow resource selection applications to express their management policies.

Queries to the interface will take the form of a set of conditions on the resource selection information that compare RIS data. RIS data must be typed in order to allow these comparisons. For example, a load balancing application might set the policy that only machines with a load of fewer than five jobs are eligible to receive the next job. So, if the load level is stored by the RIS as an integer, the load balancing application could ask the interface to compare load levels with the integer five and to return a list of machines whose load is less than five.

Potentially, the queries generated by resource selection applications could involve the entire contents of the RIS. In a distributed environment, this is not always practical. Thus, the interface must allow resource selection applications to limit the time or cost (in dollars) of a particular query. Likewise, the number of answers to a query must be controlled by the interface. This will allow an application to specify the number of answers it requires.

The resource selection application needs to know how "good" the RIS data is in order to make well informed decisions.
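
The load balancing query described in this section can be sketched in a few lines. This is only an illustration under stated assumptions, not a proposed API: the names `ResourceRecord`, `load_below`, and `select` are hypothetical, as are the machine names and load values. It shows a typed integer comparison used as a query condition, together with a caller-specified cap on the number of answers.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ResourceRecord:
    """Hypothetical RIS entry: a machine name and an integer load level."""
    name: str
    load: int  # number of jobs currently on the machine (assumed meaning)


def load_below(limit: int) -> Callable[[ResourceRecord], bool]:
    """Build a typed condition: integer load strictly less than `limit`."""
    def condition(record: ResourceRecord) -> bool:
        return record.load < limit
    return condition


def select(records: List[ResourceRecord],
           condition: Callable[[ResourceRecord], bool],
           max_answers: int) -> List[str]:
    """Return at most `max_answers` machine names satisfying `condition`."""
    answers: List[str] = []
    for record in records:
        if len(answers) >= max_answers:
            break  # the caller asked for no more answers than this
        if condition(record):
            answers.append(record.name)
    return answers


# Made-up sample data standing in for the contents of an RIS.
records = [
    ResourceRecord("alpha", 2),
    ResourceRecord("beta", 7),
    ResourceRecord("gamma", 5),
    ResourceRecord("delta", 4),
    ResourceRecord("epsilon", 1),
]

eligible = select(records, load_below(5), max_answers=2)
print(eligible)  # ['alpha', 'delta']
```

The sketch trusts every stored load value; judging how good such values are is a separate concern.
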
For example, two important indications of the goodness of the data are freshness and accuracy. Freshness indicates the current age of the data with respect to its lifetime. Accuracy indicates, for a difficult-to-measure value, the difference between the measured value and the actual value. Goodness values should be stored as characteristics of the RIS data. This will allow the decision making support interface to use these characteristics to disallow "bad" data. In addition, resource selection applications should be able to specify their own goodness limits.

An application may wish to be notified when a particular condition on RIS data becomes true. To support this, the interface must be able to express "trigger" conditions.

Furthermore, not every RIS is guaranteed to contain the same data items. As a result, condition expressions may contain "unknown" values, and the interface must handle this case cleanly.

5. DISTRIBUTED PROCESSING SERVICE

Distributed processing services may be provided at a number of different levels. We require that a minimum level of services be provided suitable for a heterogeneous environment. Basic service should integrate remote execution and batch job submission (with redirection of I/O) with resource selection. The integration should be available in both transparent and non-transparent modes, with both command line and API interfaces, including policy specification for use in the decision making process by lower level services. Distributed processing services should be integrated with appropriate DCE services to incorporate security and access control.

Additional higher levels of distributed processing services may be provided, similarly integrated with the resource selection system.
For example, moving along the spectrum from heterogeneous towards homogeneous systems, support for Unix signals and process groups may be offered, or at the extreme end of the spectrum, support for transparent process migration across similar hardware/software systems.

6. BINDING APPLICATIONS

Binding applications are important consumers of the resource selection services. Each "binding application" may require its own stylized interfaces.

6.1. RPC Interface

The DCE RPC is potentially a major and critical consumer of the resource selection services. A resource selection system must be well integrated with the DCE RPC, providing RPC client applications with the option of either completely transparent integration of resource selection or explicit integration.

Resource selection may be part of the RPC binding process. In particular, if a client application uses the auto-handle method of binding, or uses the explicit NSI interfaces that resolve requests via profiles or groups, automatic resource selection may be used. The underlying NSI operations will automatically use the resource selection services, unless this automatic use has been disabled by local policy. However, if the client application specifies a particular server, the resource selection system shall not interfere with it. If the resource selection system is unavailable or for any other reason cannot assist in the binding selection, the behavior must revert to the standard NSI behavior.

Alternatively, RPC client applications must be able to make explicit requests to the resource selection system to assist in their binding decisions. For example, a client shall be able to query the reported load average of a set of candidate servers. A well-defined API shall be provided for such purposes, and it shall be capable of returning RPC binding handles for direct use in subsequent RPCs.

Two alternative forms of policy input are required.
An API shall be provided for RPC applications to specify local resource management policy. A management interface shall also be provided to allow specification of local resource selection policy for RPC applications. These policy specification interfaces may be used regardless of whether the RPC application is using resource selection transparently or non-transparently. Default policies shall be specified.

6.2. Object Request Broker Interface

An "Object Request Broker" is also a provider of a style of binding services to client applications. An appropriately stylized interface to the resource selection services should be provided.

6.3. Remote Execution Service Interface

Another form of "binding application" is distributed processing services, where binding may be either implicit or explicit. Again, appropriately stylized interfaces to the resource selection services should be provided.

Distributed processing services must be able to explicitly or implicitly make use of resource selection services, through multiple levels of interfaces with varying degrees of integration with resource selection services. A well defined API shall be provided for such purposes, and it shall be capable of returning the appropriate binding handles for direct use in subsequent remote invocations.

7. SERVICE INSTANTIATION FACILITY (SIF)

In the current DCE, a server will be accessible only if it is running. If a server is not running, all objects, services, and ACLs provided and managed by that server will be inaccessible until an instance of the server is started. Thus not all objects, ACLs, and servers known to DCE users will always be accessible. If a server managing several objects crashes, or is never started to begin with, the objects and probably their ACLs as well will be inaccessible until the server crash is detected and the server is restarted.
However, we cannot realistically expect all servers to be running all the time. Some servers may be too lightly used or may impose too high a resource drain to be run continuously. Moreover, servers can crash. Nonetheless, the DCE environment would be significantly easier to work in if DCE users, client programs, and client servers could expect that most if not all servers would behave as if they were running all the time. Moreover, since the resource selection facility assumes that resource attributes and data can be accessed at any time, all servers that control resources under the care of a resource selection facility must appear to be running at all times for the resource selection facility to function properly.

Servers can be made to appear to be running at all times through the use of a "Service Instantiation Facility" (SIF). The job of the SIF is to start a requested server before a request is delivered to the server. Servers need not be constantly running; they can be left shut down and will be started up when needed. That leaves memory, swap space, and OS resources free for productive use.

The SIF must be able to determine how to correctly start up a server that is down. This implies that the SIF must know about (or be able to access information on) each server that it can start up. We will therefore assume that some servers are managed by the SIF and others are not.

There must be a mechanism by which a server can be made known to the SIF. We call this mechanism "SIF Registration". In order to prevent denial-of-service attacks, SIF Registration must be able to detect and prevent unauthorized registration of server information. Otherwise, users could substitute invalid server start-up information in place of the actual information.
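
The registration and start-up behavior described so far can be sketched as a small in-memory registry. This is an illustration under stated assumptions rather than a design for the SIF: the class `SifRegistry`, its methods, the user names, and the start-up command path are all hypothetical, and the `launcher` callable stands in for whatever mechanism would actually start a process. The `running` set is likewise an assumption about how a down server is recognized.

```python
from typing import Callable, Dict, Iterable, List, Set


class SifRegistry:
    """Hypothetical in-memory SIF registry (illustration only)."""

    def __init__(self, authorized_users: Iterable[str]) -> None:
        self._authorized: Set[str] = set(authorized_users)
        self._start_commands: Dict[str, List[str]] = {}
        self.running: Set[str] = set()  # servers believed to be up

    def register(self, user: str, server: str, start_command: List[str]) -> None:
        """Record start-up information, rejecting unauthorized callers."""
        if user not in self._authorized:
            raise PermissionError(f"{user} may not register {server}")
        self._start_commands[server] = start_command

    def ensure_started(self, server: str,
                       launcher: Callable[[List[str]], bool]) -> bool:
        """Try to start a registered server that is not running."""
        if server in self.running:
            return True
        command = self._start_commands.get(server)
        if command is None:
            return False  # not SIF-managed, so nothing is attempted
        if launcher(command):
            self.running.add(server)
        return server in self.running


launched: List[List[str]] = []


def fake_launcher(command: List[str]) -> bool:
    """Stand-in launcher: records the command instead of running it."""
    launched.append(command)
    return True


sif = SifRegistry(authorized_users={"cell_admin"})
sif.register("cell_admin", "backup_server", ["/opt/example/bin/backupd"])

try:
    # An unauthorized user tries to substitute bogus start-up information.
    sif.register("mallory", "backup_server", ["/tmp/bogus"])
    rejected = False
except PermissionError:
    rejected = True

started = sif.ensure_started("backup_server", fake_launcher)
unmanaged = sif.ensure_started("unknown_server", fake_launcher)
```

Here the set of authorized users is simply handed to the registry; who belongs in that set is a policy question.
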
Authorized users will generally consist of the server authors (in the case of private programs), server administrators (in the case of more widely used programs), the cell administrator(s), and perhaps the node administrator if one exists. The SIF will only attempt to restart servers that are properly SIF registered.

When services are accessed via RPC, the SIF must be invoked when a DCE program performs an RPC to a SIF-managed server on a partially bound (partially-resolved) RPC handle for that server. A SIF invocation consists of a request to the SIF to start up a particular server. The SIF must be capable of managing any server that uses DCE RPC. Correct SIF operation when fully-bound handles are used in an RPC is desirable but optional.

The SIF must make at least one attempt to correctly start up the proper SIF-managed server per SIF invocation. (The SIF will guarantee only to attempt to start up the server. The SIF cannot and will not guarantee that the server successfully starts up.)

The SIF shall not attempt to prevent multiple instances of the same server from running on the same machine. Solving that problem (if it is a problem for that server) should be left to the individual servers.

The existence and functioning of the SIF must be transparent to the DCE programs that access SIF-managed servers. The SIF shall require no changes to the existing RPC API except for additions necessary to implement SIF registration.

Detection of a down server may or may not be part of the SIF. For example, given the requirements above, it should be possible to implement the detection and invocation of the SIF in the rpcd. If that is the case, the SIF need not detect whether a server is down. The rpcd will detect the absence of the requested server and invoke the SIF.

8. CONCLUSION

The demand for capabilities that allow applications to make intelligent and dynamic use of network-wide resources is already great and will continue to grow.
Unfortunately, many application developers who might build these applications do not realize that DCE does not provide that capability today and are likely to be disappointed. Further, the cost and expertise required to develop these capabilities for an individual application are prohibitive.

This paper describes a set of services which can significantly enhance the usability of DCE services, and which can widen the applicability of DCE to markets that require more sophisticated use of network resources, while making less direct use of low level distributed computing services. These services are proposed as enhancements to DCE not only because they enhance the usability of DCE itself, but because they are best provided in a common application independent form. Providing a common set of standard services in this area allows for greatly reduced administrative burden for application users, and greatly reduced network activity in support of the capability, because both administration and overhead activities are shared by many applications using the services.

AUTHOR'S ADDRESS

Joan Wrabetz
Aggregate Computing
300 S. Hwy. 169, Suite 400
Minneapolis, Minnesota 55426
USA

Internet email: jmw@aggregate.com
Telephone: +1-612-546-5579