OSF DCE SIG                                       J. Wrabetz (Aggregate)
   Request For Comments: 16.0                                September 1992


                        DISTRIBUTED RESOURCE SELECTION


   1. INTRODUCTION

      This paper describes a set of capabilities which are currently not
      available as part of the Distributed Computing Environment (DCE) or
      the Distributed Management Environment (DME), but which are needed to
      support a wide variety of distributed applications.  Many of the
      capabilities described herein have been or are being developed in
      application specific forms in order to facilitate the development of
      business solutions in the absence of standard services.

      The membership of the OSF DCE Special Interest Group recognizes the
      need to simplify distributed application development and system
      administration.  The development and propagation of many incompatible
      and overlapping services will occur in the absence of any OSF
      direction in this area.  This will, as it has in the past, cause
      industry confusion, and slow the development of applications.
      Therefore, the Resource Management Working Group of the OSF DCE SIG
      has defined a set of common services which can be added as DCE/DME
      enhancements upon which developers can build business solutions with
      minimum overlap, and with simple administration.

      The services defined in this paper are used for identifying,
      locating, filtering, and choosing among resources in a network.  The
      initial target applications are particularly concerned with the use
      of processing resources.  Information about many resources is
      required to support the process of locating and accessing processing
      capability over the network.

      The applications developed using these services will share resources
      on the network, getting more productive use of all of the machines on
      the network, and achieving better performance through use of the
      entire network.  There are already several industries in which target
      applications are being developed.  In particular, the CAD/CAE,
      financial analysis, and CASE tools markets are already actively
      developing tools that use the whole network to run their
      applications.

      The known applications which require resource selection services
      include:

        (a) Distributed CASE utilities (e.g., "distributed make").


   Wrabetz                                                           Page 1


   DCE-RFC 16.0         Distributed Resource Selection       September 1992


        (b) Network backup and restore applications.

        (c) Cooperative processing tools.

        (d) Distributed batch processing applications.

        (e) Resource sharing tools.

        (f) Distributed CAD/CAE tools (e.g., Valid Logic's Distributed
            Analog Workbench, CADENCE Design System's Distributed Dracula,
            GenRad's distributed CAE tools, and Intergraph's distributed
            simulation).

      Anticipated future applications include:

        (a) Other application specific network parallel applications.

        (b) Load balancing tools.

        (c) Replication management tools.

        (d) Other system management tools.

        (e) Dynamic reconfiguration tools.


   2. SERVICES

      A set of resource selection services is required to provide a basic
      mechanism for applications to select and use network-wide resources.
      Decision making in the use of network resources can vary widely
      between applications, and any resource selection services provided
      must have the necessary information upon which to make decisions.

      These services and some of the system applications that use them are
      loosely organized as shown in Figure 1.

      Discussion in this paper is limited to a core set of services
      required to support a wide variety of applications and a limited set
      of system resource selection applications that will utilize these
      services immediately to enhance the flexibility of the DCE itself.

      A resource information service is defined to provide the core of the
      services for making intelligent decisions about network-wide
      resources.  This Resource Information Service (RIS) provides
      information about known network resources.  Other DCE and DME
      services may also be consulted in the selection process, including
      the DCE directory service and the DME Management Information Base
      (MIB).  There is no intention to duplicate these services with the
      RIS.  The RIS is intended to provide the structure of information
      necessary for decision making applications.

   ------------------------------------------------------------------------

             +-----------------------+  +-----------------------+
             |     Resource          |  |    Service            |
             |     Selection         |  |    Instantiation      |
             |     Applications      |  |    Facility (SIF)     |
             +-----------------------+  +-----------------------+

             +--------------+  +--------------+  +--------------+
             |              |  |              |  |  Remote      |
             |     RPC      |  |     ORB      |  |  Execution   |
             |              |  |              |  |  Service     |
             +--------------+  +--------------+  +--------------+

             +--------------------------------------------------+
             |                                                  |
             |            Decision Making Interface             |
             |                                                  |
             +--------------------------------------------------+

             +--------------------------------------------------+
             |                                                  |
             |        Resource Information Service (RIS)        |
             |                                                  |
             +--------------------------------------------------+

             +-----------------------+  +-----------------------+
             |                       |  |                       |
             |          DCE          |  |          DME          |
             |                       |  |                       |
             +-----------------------+  +-----------------------+


                                   Figure 1.

   ------------------------------------------------------------------------

      A set of filtering or "decision making" interfaces is defined to
      manipulate the RIS information in support of intelligent
      applications.  Decision making interfaces allow applications to
      "choose" resources on the network for various application functions.

      In addition to these "core" services, a set of binding services can
      incorporate use of the "core" resource selection services to provide
      flexible, location independent binding, using dynamic resource
      information.  These binding services support remote invocation
      services.  Eventually binding services should be provided for RPC,
      ORB, and other remote execution services.  A Remote Execution Service
      has been defined because one does not exist in the current DCE
      framework.  It was felt that such a service will be critical to the
      future success of DCE.  Similarly, DCE must address object management
      services.  This paper does not define those services; it considers
      only their interface to the resource selection services, which
      provides flexible binding to objects.

      Finally, this paper addresses one resource selection application
      which is of interest within the DCE and DME frameworks themselves.  A
      "service instantiator" is defined: an automated administration
      service that determines, at runtime, where and when to instantiate
      services.

      In the remainder of this paper, each service is defined in more
      detail and some general requirements are described.


   3. RESOURCE INFORMATION SERVICE (RIS)

      Resource selection is a decision making task.  Its goal is to support
      improved usage of the resources in the environment.  As with any
      decision making task, it should be based upon information about the
      resources being managed.  The function of the Resource Information
      Service (or RIS) is to provide uniform access to this information,
      which can be used in making resource selection decisions.  In this
      section we identify a set of requirements for the RIS, with a focus
      on large, distributed, and heterogeneous computing environments,
      based upon the OSF DCE.

      One of the primary elements that the RIS must provide is a uniform
      interface for defining new resources and updating their definitions.
      Resource naming must conform to a global naming scheme (such as
      X.500/CDS) so that resources may be scoped and accessed uniformly.
      In addition, the RIS must allow resources to be added and removed
      without an interruption of the service.
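
      As a minimal sketch of such a definition interface, assuming an
      in-memory table keyed by CDS-style names, the Python fragment below
      shows resources being defined, updated, scoped by name prefix, and
      removed while the registry stays in service.  The class, its method
      names, and the "/.:/..." entries are illustrative assumptions and
      are not part of any DCE API.

```python
import threading


class ResourceRegistry:
    """Hypothetical stand-in for the RIS resource definition interface."""

    def __init__(self):
        self._lock = threading.Lock()
        self._resources = {}  # global resource name -> attribute dict

    def define(self, name, **attributes):
        """Define a new resource, or update an existing definition."""
        if not name.startswith("/"):
            raise ValueError("resource names must be absolute: %r" % name)
        with self._lock:
            self._resources.setdefault(name, {}).update(attributes)

    def remove(self, name):
        """Remove a resource; other definitions are unaffected."""
        with self._lock:
            self._resources.pop(name, None)

    def lookup(self, name):
        """Return a copy of the attributes recorded for one resource."""
        with self._lock:
            return dict(self._resources[name])

    def scope(self, prefix):
        """Return the names of all resources below a naming-tree prefix."""
        prefix = prefix.rstrip("/") + "/"
        with self._lock:
            return sorted(n for n in self._resources if n.startswith(prefix))


registry = ResourceRegistry()
registry.define("/.:/hosts/alpha", kind="cpu", load=2)
registry.define("/.:/hosts/beta", kind="cpu", load=7)
registry.define("/.:/printers/lp0", kind="printer")
registry.define("/.:/hosts/alpha", load=3)   # update in place
registry.remove("/.:/hosts/beta")            # no restart of the registry
print(registry.scope("/.:/hosts"))
```

      The lock is one simple way to let definitions change while lookups
      continue; a real RIS would be distributed and would need a
      different mechanism.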

      It is desirable that the implementation of the RIS provide extensible
      definitions of common resources in a distributed environment.  It is
      highly desirable that the implementation follow an object model.
      Different access methods may be specified to obtain information
      about a resource.  For example, the resource definition may contain
      a binding to an agent that can provide that information.

      The RIS must provide APIs for collecting and querying the information
      about the resources and these APIs must be remotely accessible using
      the DCE RPC.  These interfaces must be rich enough to support
      manipulation and filtering of information in the upper layers of the
      resource selection framework.  It is not up to the RIS implementation
      to perform filtering of any sort.  It is also desirable that the RIS
      have a user interface through which the information in the RIS may be
      accessed.

      No extensive information recovery mechanisms are needed, as resource
      information is dynamic and has a short valid lifetime.  The RIS may
      maintain the age and the lifespan of information about each resource
      and make this available via the APIs.  It is desirable that the RIS
      be highly available.  Applications that use this service must be
      designed so that the non-availability of the RIS does not prevent
      access to resources, although their use may be sub-optimal.
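
      The availability requirement on applications can be sketched as
      follows.  This is an illustration only: query_ris is a hypothetical
      callable that returns load levels by host name and is assumed to
      raise ConnectionError when the RIS cannot be reached.

```python
def choose_host(candidates, query_ris):
    """Return the least-loaded candidate, or any candidate if the RIS is down.

    query_ris is a hypothetical callable mapping host names to load levels;
    it is assumed to raise ConnectionError when the RIS cannot be reached.
    """
    try:
        loads = query_ris(candidates)
    except ConnectionError:
        # Sub-optimal but still functional: take the first candidate.
        return candidates[0]
    known = [host for host in candidates if host in loads]
    if not known:
        return candidates[0]
    return min(known, key=lambda host: loads[host])


def healthy_ris(hosts):
    return {"alpha": 4, "beta": 1, "gamma": 6}


def unreachable_ris(hosts):
    raise ConnectionError("RIS not available")


hosts = ["alpha", "beta", "gamma"]
print(choose_host(hosts, healthy_ris))
print(choose_host(hosts, unreachable_ris))
```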

      The RIS implementation must support a high access and update rate to
      a large amount of resource information.  It should scale to large
      networks containing thousands of nodes, with graceful degradation of
      quality of service.  Metrics to be used in this regard include the
      response time and the freshness of information, measured against
      the amount of information being maintained by the RIS.  The RIS
      should favor access to and update of local information over
      non-local or infrequent references.

      The RIS must also implement authentication and access control on the
      resource objects.  It is desirable that this implementation use DCE
      security mechanisms or provide interfaces consistent with those of
      the DCE.

      Lastly, management of the RIS as a whole must be possible within the
      DME framework.  In addition, since the RIS is an information
      management service, it should be consistent with the DME model.

      Optionally, the RIS may allow collections of resources to be defined.
      It may allow applications to define trigger conditions on the state
      of a resource and may inform or signal the application on the
      occurrence of that condition.  It may also actively collect resource
      information.


   4. DECISION MAKING SUPPORT INTERFACE

      The decision making support interface is a well defined API which
      allows resource selection applications to query the resource
      information service (RIS).  The API must allow resource selection
      applications to express their management policies.  Queries to the
      interface will take the form of a set of conditions on the resource
      selection information that compare RIS data.  RIS data must be typed
      in order to allow these comparisons.  For example, a load balancing
      application might set the policy that only machines with load less
      than five jobs are eligible to receive the next job.  So, if load
      level was stored by the RIS as an integer, the load balancing
      application could ask the interface to compare load levels with the
      integer five and to return a list of machines whose load is less than
      five.
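
      The load balancing example above might be expressed as shown in
      the following sketch.  The Condition type, the select function, and
      the sample machine table are assumptions made for illustration;
      the paper does not define a concrete query syntax.

```python
import operator
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Condition:
    """One typed comparison against a named RIS data item."""
    attribute: str
    compare: Callable[[Any, Any], bool]
    operand: Any

    def matches(self, record):
        value = record.get(self.attribute)
        # RIS data is typed: only compare values of the operand's type.
        if type(value) is not type(self.operand):
            return False
        return self.compare(value, self.operand)


def select(resources, conditions):
    """Return names of resources that satisfy every condition."""
    return [name for name, record in resources.items()
            if all(c.matches(record) for c in conditions)]


# Illustrative RIS contents; load level is stored as an integer.
machines = {
    "alpha": {"load": 2},
    "beta": {"load": 5},
    "gamma": {"load": 9},
    "delta": {"load": "unknown"},
    "epsilon": {"load": 4},
}
policy = [Condition("load", operator.lt, 5)]
eligible = select(machines, policy)
print(eligible)
```

      Here a value whose type differs from the operand simply fails the
      comparison; an interface could equally report a type error.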

      Potentially, the queries generated by resource selection applications
      could involve the entire contents of the RIS.  In a distributed
      environment, this is not always practical.  Thus, the interface must
      allow resource selection applications to limit the time or cost (in
      dollars) of a particular query.  Likewise, the number of answers to a
      query must be controlled by the interface.  This will allow an
      application to specify the number of answers it requires.
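
      A minimal sketch of such limits follows, assuming a query that
      scans candidates one at a time.  Only the time limit and the answer
      count are modeled; monetary cost is omitted.  The function name and
      its parameters are hypothetical.

```python
import time


def bounded_query(candidates, predicate, max_answers, time_limit,
                  clock=time.monotonic):
    """Collect matches until enough are found or the time budget is spent."""
    deadline = clock() + time_limit
    answers = []
    for candidate in candidates:
        if len(answers) >= max_answers or clock() >= deadline:
            break
        if predicate(candidate):
            answers.append(candidate)
    return answers


loads = {"alpha": 2, "beta": 7, "gamma": 1, "delta": 3, "epsilon": 0}
first_two = bounded_query(loads, lambda host: loads[host] < 5,
                          max_answers=2, time_limit=1.0)
print(first_two)
```

      The clock parameter exists only so that the time limit can be
      exercised deterministically, for example with a counter in place of
      the real clock.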

      The resource selection application needs to know how "good" the RIS
      data is in order to make well informed decisions.  For example, two
      important indications of the goodness of the data are freshness and
      accuracy.  Freshness indicates the current age of the data with
      respect to its lifetime.  Accuracy indicates, for a value that is
      hard to measure, the difference between the measured value and the
      actual value.  Goodness values should be stored as characteristics of
      the RIS data.  This will allow the decision making support interface
      to use these characteristics to disallow "bad" data.  In addition,
      resource selection applications should be able to specify their own
      goodness limits.
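
      One possible representation of these goodness characteristics is
      sketched below.  The field names are assumptions, as is the choice
      to express freshness as the fraction of the lifetime that remains
      (1.0 for new data, 0.0 for expired data); the paper fixes neither.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reading:
    """An RIS value with its goodness characteristics (names assumed)."""
    value: float
    age: float        # seconds since the value was measured
    lifetime: float   # seconds for which the value is considered valid
    error: float      # estimated |measured - actual|, in units of value

    @property
    def freshness(self):
        """Fraction of the lifetime still remaining, clamped to [0, 1]."""
        return min(1.0, max(0.0, 1.0 - self.age / self.lifetime))


def is_good(reading, min_freshness=0.0, max_error=float("inf")):
    """Reject expired data, then apply the application's own limits."""
    if reading.freshness <= 0.0:
        return False
    return reading.freshness >= min_freshness and reading.error <= max_error


readings = {
    "alpha": Reading(value=2.0, age=10.0, lifetime=60.0, error=0.1),
    "beta": Reading(value=1.0, age=55.0, lifetime=60.0, error=0.1),
    "gamma": Reading(value=0.5, age=90.0, lifetime=60.0, error=0.1),
    "delta": Reading(value=0.2, age=5.0, lifetime=60.0, error=2.0),
}
usable = [h for h, r in readings.items() if is_good(r)]
strict = [h for h, r in readings.items()
          if is_good(r, min_freshness=0.5, max_error=0.5)]
print(usable, strict)
```

      The default call disallows only expired data, while the second
      call shows an application supplying its own goodness limits.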

      An application may wish to be notified when a particular condition on
      RIS data becomes true.  To support this, the interface must be able
      to express "trigger" conditions.  Furthermore, not every RIS is
      guaranteed to contain the same data items, so condition expressions
      may refer to "unknown" values.  The
      interface must cleanly handle this problem.
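
      As an illustration, one way to handle both requirements is to let
      a condition evaluate to true, false, or unknown, and to fire a
      trigger only when the result changes to true.  The names below are
      hypothetical, and the three-valued treatment is an assumption
      rather than something the paper prescribes.

```python
def evaluate(record, attribute, predicate):
    """Return True or False, or None when the data item is unknown."""
    if attribute not in record:
        return None
    return bool(predicate(record[attribute]))


class Trigger:
    """Calls back when a condition on RIS data becomes true."""

    def __init__(self, attribute, predicate, callback):
        self.attribute = attribute
        self.predicate = predicate
        self.callback = callback
        self._last = None  # no evaluation yet, treated as unknown

    def update(self, name, record):
        result = evaluate(record, self.attribute, self.predicate)
        # Fire only on a transition to True; unknown never fires.
        if result is True and self._last is not True:
            self.callback(name, record)
        self._last = result
        return result


events = []
idle = Trigger("load", lambda load: load < 1,
               lambda name, record: events.append(name))
idle.update("alpha", {"load": 3})      # False
idle.update("alpha", {"memory": 64})   # None: this RIS holds no load item
idle.update("alpha", {"load": 0})      # True, callback fires
idle.update("alpha", {"load": 0})      # still True, no second callback
print(events)
```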


   5. DISTRIBUTED PROCESSING SERVICE

      Distributed processing services may be provided at a number of
      different levels.  We require that a minimum level of services be
      provided suitable for a heterogeneous environment.  Basic service
      should integrate remote execution and batch job submission (with
      redirection of IO) with resource selection.  The integration should
      be available in both transparent and non-transparent modes, with both
      command line and API interfaces, including policy specification for
      use in the decision making process by lower level services.
      Distributed processing services should be integrated with appropriate
      DCE services to incorporate security and access control.

      Additional higher levels of distributed processing services may be
      provided, similarly integrated with the resource selection system.
      For example, moving along the spectrum from heterogeneous towards
      homogeneous systems, support for Unix signals and process groups may
      be offered, or at the extreme end of the spectrum, support for
      transparent process migration across similar hardware/software
      systems.


   6. BINDING APPLICATIONS

      Binding applications are important consumers of the resource
      selection services.  Each "binding application" may require its own
      stylized interfaces.

   6.1. RPC Interface

      The DCE RPC is potentially a major and critical consumer of the
      resource selection services.  A resource selection system must be
      well integrated with the DCE RPC, providing RPC client applications
      with the option of either completely transparent integration of
      resource selection or explicit integration.  Resource selection may
      be part of the RPC binding process.  In particular, if a client
      application uses the auto-handle method of binding, or uses the
      explicit NSI interfaces that resolve requests via profiles or groups,
      automatic resource selection may be used.  The underlying NSI
      operations will automatically use the resource selection services,
      unless this automatic use has been disabled by local policy.
      However, if the client application specifies a particular server, the
      resource selection system shall not interfere with it.  If the
      resource selection system is unavailable or for any other reason
      cannot assist in the binding selection, the behavior must revert to
      the standard NSI behavior.
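
      The precedence among these rules can be sketched as follows.  This
      is not DCE code: resolve_binding and its parameters are
      hypothetical, server names stand in for binding handles, and
      first_candidate is only a placeholder for whatever the standard NSI
      operations would return.

```python
def first_candidate(candidates):
    """Stand-in for the standard NSI choice; real NSI semantics differ."""
    return candidates[0]


def resolve_binding(candidates, explicit_server=None, selector=None,
                    selection_enabled=True, standard_nsi=first_candidate):
    """Choose a server following the fallback rules described above."""
    if explicit_server is not None:
        return explicit_server          # never override the client's choice
    if selection_enabled and selector is not None:
        try:
            choice = selector(candidates)
        except Exception:
            choice = None               # selection system unavailable
        if choice in candidates:
            return choice
    return standard_nsi(candidates)     # revert to standard behavior


loads = {"srv_a": 6, "srv_b": 1, "srv_c": 3}
servers = list(loads)


def least_loaded(candidates):
    return min(candidates, key=loads.__getitem__)


def broken_selector(candidates):
    raise RuntimeError("resource selection system unavailable")


print(resolve_binding(servers, selector=least_loaded))
print(resolve_binding(servers, selector=broken_selector))
```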

      Alternatively, RPC client applications must be able to make explicit
      requests to the resource selection system to assist in their binding
      decisions.  For example, a client shall be able to query the reported
      load average of a set of candidate servers.  A well-defined API shall
      be provided for such purposes, and it shall be capable of returning
      RPC binding handles for direct use in subsequent RPCs.

      Two alternative forms of policy input are required.  An API shall be
      provided for RPC applications to specify local resource management
      policy.  A management interface shall also be provided to allow
      specification of local resource selection policy for RPC
      applications.  These policy specification interfaces may be used
      regardless of whether the RPC application is using resource selection
      transparently or non-transparently.  Default policies shall be
      specified.

   6.2. Object Request Broker Interface

      An "Object Request Broker" is also a provider of a style of binding
      services to client applications.  An appropriately stylized interface
      to the resource selection services should be provided.

   6.3. Remote Execution Service Interface

      Another form of "binding application" is distributed processing
      services, where binding may be either implicit or explicit.  Again,
      appropriately stylized interfaces to the resource selection services
      should be provided.

      Distributed processing services must be able to explicitly or
      implicitly make use of resource selection services, through multiple
      levels of interfaces with varying degrees of integration with
      resource selection services.  A well defined API shall be provided
      for such purposes, and it shall be capable of returning the
      appropriate binding handles for direct use in subsequent remote
      invocations.


   7. SERVICE INSTANTIATION FACILITY (SIF)

      In the current DCE, a server will be accessible only if it is
      running.  If a server is not running, all objects, services, and ACLs
      provided and managed by that server will be inaccessible until an
      instance of the server is started.  Thus not all objects, ACLs, and
      servers known to DCE users will always be accessible.  If a server
      managing several objects crashes, or is never started to begin with,
      the objects and probably their ACLs as well will be inaccessible
      until the server crash is detected and the server is restarted.

      However, we cannot realistically expect all servers to be running all
      the time.  Some servers may be too lightly used or may impose too
      high a resource drain to be run continuously.  Moreover, servers can
      crash.

      Nonetheless, the DCE environment would be significantly easier to
      work in if DCE users, client programs, and client servers could
      expect that most if not all servers would behave as if they were
      running all the time.  Moreover, since the resource selection
      facility assumes that resource attributes and data can be accessed at
      any time, all servers that control resources under the care of a
      resource selection facility must appear to be running at all times
      for the resource selection facility to function properly.

      Servers can be made to appear to be running at all times through the
      use of a "Service Instantiation Facility" (SIF).  The job of the SIF
      is to start a requested server before a request is delivered to the
      server.  Servers need not be constantly running; they can be left
      shut down and will be started up when needed.  That leaves memory,
      swap space, and OS resources free for productive use.

      The SIF must be able to determine how to correctly start up a server
      that is down.  This implies that the SIF must know about (or be able
      to access information on) each server that it can start up.  We will
      therefore assume that some servers are managed by the SIF and others
      are not.

      There must be a mechanism by which a server can be made known to the
      SIF.  We call this mechanism "SIF Registration".  In order to prevent
      denial-of-service attacks, SIF Registration must be able to detect
      and prevent unauthorized registration of server information.
      Otherwise, users could substitute invalid server start-up information
      in place of the actual information.  Authorized users will generally
      consist of the server authors (in the case of private programs),
      server administrators (in the case of more widely used programs), the
      cell administrator(s), and perhaps the node administrator if one
      exists.  The SIF will only attempt to restart servers that are
      properly SIF registered.
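
      A minimal sketch of SIF Registration with an authorization check
      follows.  It assumes that principals have already been
      authenticated elsewhere (for example by DCE security), that the
      first registrant of a server name is its author, and that only the
      author or an administrator may later change the entry.  The class,
      the principal names, and the paths are all hypothetical.

```python
class SifRegistry:
    """Hypothetical table of start-up information for SIF-managed servers."""

    def __init__(self, administrators):
        self._administrators = frozenset(administrators)
        self._entries = {}  # server name -> (owner, start-up command)

    def register(self, principal, server, command):
        """Record start-up information if the principal is authorized."""
        entry = self._entries.get(server)
        owner = entry[0] if entry else principal
        if principal != owner and principal not in self._administrators:
            raise PermissionError(
                "%s may not register %s" % (principal, server))
        self._entries[server] = (owner, tuple(command))

    def start_command(self, server):
        """Return the registered command, or None if not SIF-managed."""
        entry = self._entries.get(server)
        return entry[1] if entry else None


sif = SifRegistry(administrators={"cell_admin"})
sif.register("joan", "render_server", ["/opt/render/bin/renderd"])
try:
    sif.register("mallory", "render_server", ["/tmp/fake"])
except PermissionError as error:
    print("rejected:", error)
sif.register("cell_admin", "render_server",
             ["/opt/render/bin/renderd", "-v"])
print(sif.start_command("render_server"))
```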

      When services are accessed via RPC, the SIF must be invoked when a
      DCE program performs an RPC to a SIF-managed server on a partially
      bound (partially-resolved) RPC handle for that server.  A SIF
      invocation consists of a request to the SIF to start up a particular
      server.  The SIF must be capable of managing any server that uses DCE
      RPC.  Correct SIF operation when fully-bound handles are used in an
      RPC is desirable but optional.

      The SIF must make at least one attempt to correctly start up the
      proper SIF-managed server per SIF invocation.  (The SIF will
      guarantee only to attempt to start up the server.  The SIF cannot and
      will not guarantee that the server successfully starts up.)  The SIF
      shall not attempt to prevent multiple instances of the same server
      from running on the same machine.  Solving that problem (if it is a
      problem for that server) should be left to the individual servers.

      The existence and functioning of the SIF must be transparent to the
      DCE programs that access SIF-managed servers.  The SIF shall require
      no changes to the existing RPC API except for additions necessary to
      implement SIF registration.

      Detection of a down server may or may not be part of the SIF.  For
      example, given the requirements above, it should be possible to
      implement the detection and invocation of the SIF in the rpcd.  If
      that is the case, the SIF need not detect whether a server is down.
      The rpcd will detect the absence of the requested server and invoke
      the SIF.
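
      The invocation rules above, together with detection performed
      outside the SIF, might look like the following sketch.  The
      functions are hypothetical and do not reflect the actual rpcd
      interface.  The launch parameter defaults to subprocess.Popen, but
      the example passes a recording stub so that nothing is started.

```python
import subprocess


def invoke_sif(start_commands, server, launch=subprocess.Popen):
    """Make one start-up attempt for a SIF-registered server.

    Returns True if the start command was launched without an immediate
    error.  True does not mean that the server came up successfully.
    """
    command = start_commands.get(server)
    if command is None:
        return False                    # not SIF registered: do nothing
    try:
        launch(list(command))
    except OSError:
        return False                    # the single attempt failed
    return True


def deliver(request, server, running, start_commands, launch):
    """Hypothetical rpcd-style hook: start an absent server before delivery."""
    if server not in running:
        invoke_sif(start_commands, server, launch)
    return (server, request)


launched = []
commands = {"render_server": ("/opt/render/bin/renderd",)}
deliver("draw", "render_server", running=set(),
        start_commands=commands, launch=launched.append)
deliver("draw", "print_server", running=set(),
        start_commands=commands, launch=launched.append)
print(launched)
```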


   8. CONCLUSION

      The demand for capabilities that allow applications to make
      intelligent and dynamic use of network-wide resources is already
      great and will continue to grow.  Unfortunately, many application
      developers who might build these applications do not realize that DCE
      does not provide that capability today and are likely to be
      disappointed.  Further, the cost and expertise required to develop
      these capabilities for an individual application are prohibitive.

      This paper describes a set of services which can significantly
      enhance the usability of DCE services, and which can widen the
      applicability of DCE to markets that require more sophisticated use
      of network resources, while making less direct use of low level
      distributed computing services.

      These services are proposed as enhancements to DCE not only because
      they enhance the usability of DCE itself, but because they are best
      provided in a common application independent form.  Providing a
      common set of standard services in this area allows for greatly
      reduced administrative burden for application users, and greatly
      reduced network activity in support of the capability because both
      administration and overhead activities are shared by many
      applications using the services.


   AUTHOR'S ADDRESS

   Joan Wrabetz                           Internet email: jmw@aggregate.com
   Aggregate Computing                           Telephone: +1-612-546-5579
   300 S. Hwy. 169, Suite 400
   Minneapolis, Minnesota 55426
   USA

