Open Software Foundation                              R. Friedrich (HP)
Request For Comments: 33.0                             S. Saunders (HP)
July 1995                                           G. Zaidenweber (HP)
                                                       D. Bachmann (IBM)
                                                       S. Blumson (CITI)

         STANDARDIZED PERFORMANCE INSTRUMENTATION AND INTERFACE
        SPECIFICATION FOR MONITORING DCE-BASED APPLICATIONS

1. INTRODUCTION

Distributed systems offer advantages in flexibility, capacity,
price-performance, availability and resource sharing.  Distributed
applications can provide user productivity improvements through ease
of use and access to distributed data.  However, managing
applications in a distributed environment is a complex task, and the
lack of performance measurement facilities is an impediment to
large-scale deployment.

This document describes performance instrumentation and measurement
interface specifications that support performance related tasks such
as configuration planning, application tuning, bottleneck analysis,
and capacity planning.  These performance measurement capabilities
are a necessary component of any commercially viable computer
technology, and are currently insufficient in DCE.  Specifically, to
provide high-level analysis software with the data to compute
correlated resource utilization across nodes in a network, this
document describes the:

(a) Functional specifications for a performance measurement access
    and control interface.

(b) Content of performance instrumentation within the DCE RPC
    runtime library and stubs.

(c) Extensions to support instrumentation of applications and other
    middleware technologies based on DCE.

The guiding philosophy is to define a set of _standardized
performance instrumentation_ that is consistently collected, reported
and interpreted in a heterogeneous environment.  Furthermore, these
measurement capabilities are compiled into the core DCE services for
use at customer sites.  To support pervasive deployment, the
instrumentation must impose minimal overhead on applications and
services.

A companion RFC, RFC 32.0, discusses the requirements for performance
monitoring, the metrics that are of interest for performance analysis
and performance management, and the instrumentation necessary to
collect performance data [RFC 32].  Consequently, the requirements
for instrumentation are not described in this document.

1.1. Minimum Content for DCE 1.2

We recommend deploying core instrumentation with DCE Release 1.2 and
then rolling out additional instrumentation in later releases.  The
following summarizes the minimum content for DCE release 1.2:

(a) Define and implement critical RPC *RTL* (runtime library,
    `libdce') instrumentation.

(b) Define common access and collection interfaces for application
    servers and clients.

(c) Recompile/relink all DCE services to utilize the RPC
    instrumentation for Naming, Security, Time and DFS exported
    interfaces.

(d) Recompile/relink middleware with these instrumented DCE services.

(e) Link the performance measurement facilities (*observer* and
    *NPCS* (networked performance collection service, defined below))
    with the standard instrumentation to allow monitoring the
    measurement system.

1.2. Terminology and Concepts

To ensure consistent meaning, the following terms and concepts are
defined for use in this document.  A more detailed discussion of some
of these concepts is found in later sections of this document.

1.2.1. Metrics

*Metrics* define measurable quantities that provide data to evaluate
the performance of a system under study.
They may consist of raw information (such as events) or derived
quantities such as statistical measures or rates.  Examples are
response time, throughput, and utilization.  These metrics, and more,
are described in detail in section 4.

1.2.2. Instrumentation

*Instrumentation* consists of specialized software components
incorporated into programs to provide mechanisms for measuring data
that is used to calculate the relevant performance metrics.  The
basic measurement techniques are counting, timing and tracing.  The
objective of instrumentation is to provide measures of resource
utilization (such as CPU, memory, I/O, network, etc.) and processing
time (such as service time, queuing time, etc.).  These measures are
delivered to a *performance monitor* as statistical measures or as
frequency and time histograms.  From here on we will often refer to
instrumentation as *sensors*.

1.2.3. Sensors

*Sensors* are the logical instantiations of the instrumentation
necessary to collect data for a particular, single metric.  Sensors
consist of aggregations of *probes* located at well-defined *probe
points*.  Sensors contain internal state that satisfies the
definition of a particular metric.  For example, a "response time
sensor" will consist of two probes (a begin-timer and end-timer
probe) but appear to the user as a single, logical entity.  In
object-oriented language, the sensors are the objects that
encapsulate the data and functions provided by the instrumentation
primitives.

A conceptual model of a sensor is illustrated in Figure 1.  A sensor
is a "software IC (integrated circuit)" that has input, output and
control functions.  The input to a sensor is provided by an event
measured by a probe.  The sensor provides output data, internal error
conditions, and registration data so that the sensor can be
identified by the measurement system.  A sensor is controlled by
several functions, including initialization, getting data, and
modifying the sensor configuration.  A sensor maintains internal
state such as its identification, statistical data, and possibly some
small algorithms that support threshold, histogram and trace
functions.

There are three *types* of sensors:

(a) *Counter* sensors support the counting of events.

(b) *Timer* sensors support the timing of events (or functions).

(c) *Pass-thru* sensors support accessing data already available
    within a service that is not provided by a probe, or allow
    arbitrary structures to be passed.

The first two sensor types support *threshold detection* to minimize
data transmitted across the network by supplying data only when a
user-specified threshold criterion is met.  All three sensor types
support a *fast-path* option that does not set locks during sensor
data update operations.

There are two *categories* of sensors, each of which supports all
three sensor types described above:

(a) *Standard* sensors are those defined and implemented by the core
    DCE services and are automatically available for an application
    with no source modifications.  These sensors are
    statically-defined for a particular release of DCE.

(b) *Custom* sensors are specialized sensors created by application
    or middleware developers to count, time or pass-thru data
    specific to the application.  Custom sensors are created within
    the process address space and are integrated within the
    measurement infrastructure.

Since DCE application environments are multi-threaded, all sensors
must be re-entrant (in the case of custom sensors, this is the
application-programmer's responsibility).  Sensors are described in
detail in section 5.

[Figure not available in ASCII version of this document.]

*Figure 1.*  A sensor is conceptually illustrated here.  A sensor can
be thought of as a "software IC" that has input, control and output
functions.  In addition the sensor contains some internal state
including sensor identifier, statistical metric data, metric
computation algorithms, and other actions.  The set of input, control
and output functions is described in detail in sections 10 and 11.
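To make the object analogy concrete, the following sketch shows one
plausible shape for the state a counter or timer sensor encapsulates.
It is illustrative only -- the type and field names are hypothetical,
and the normative data structures are those defined in section 7.3.2.

    /* Illustrative only: a plausible layout for the state
     * encapsulated by a counter or timer sensor.  The names are
     * hypothetical; the normative structures appear in 7.3.2.
     */
    typedef struct example_sensor {
        unsigned32      sensor_id;    /* identity given at registration */
        unsigned32      info_set;     /* statistics level (see Table 1) */
        unsigned32      count;        /* event count                    */
        unsigned32      sum;          /* simple sum                     */
        unsigned32      min;          /* minimum for this interval      */
        unsigned32      max;          /* maximum for this interval      */
        unsigned32      sum_squares;  /* 2nd moment (info set 0x10)     */
        int             fast_path;    /* skip locking on update?        */
        pthread_mutex_t lock;         /* unused when fast_path is set   */
    } example_sensor_t;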
1.2.4. Probes

*Probes* are the basic primitives from which sensors are constructed.
Probes provide data input, control, and data access (output).  For
example, a probe might define the functions necessary to
increment/decrement a counter.  In general, probes do not contain
local state, but only access global sensor data.  (An exception is
for timer probes, where the start-time must be maintained locally.)
Probes are pre-defined as _macros_ to ensure consistency in
implementation of sensors and to ease instrumenting source code.  The
macro definitions are presented in section 6.

Probes provide input to a sensor.  It is possible to place these
probes in non-DCE services to obtain measures of interest (for
example in the C library to collect data on sockets), but this
specification focuses on DCE-based middleware and application
software.

1.2.5. Probe points

*Probe points* are the locations within a program's flow of control
where significant event transitions occur, and are thus candidates
for the placement of probes.  For example, when a client program
issues an RPC, a state transition occurs from "user code" to "runtime
library", and this transition is an excellent place for placing
instrumentation software to record counts or elapsed times.

The use of probes placed at probe points to construct a timer sensor
is illustrated in Figure 2.  Although the `probe_point_B' shown there
is within the same scope as `functionN()', it is not restricted to
the same scope as `probe_point_A'.

[Figure not available in ASCII version of this document.]

*Figure 2.*  The implementation of a timer sensor is illustrated in
this figure for an arbitrary `functionN()'.  The probes are located
at the beginning and ending of the function.  These probe points
provide input data into the sensor for starting and stopping an
elapsed time clock.

1.2.6. Performance information sets

Requirements for different data capture granularities and subsets
require that the measurement system have a controllable capability to
obtain only the required amount of data with minimum overhead.
Consequently, we have defined varying data collection *information
sets* that control the detail (statistics) of the collected data,
providing increasing detail as the information set grows.
Under the best scenario there is no overhead incurred by the
measurement system when no observations are required.  Increasing the
size of performance information sets increases the number of data
components of the collected data, providing a more comprehensive
picture of operational behavior, but at the cost of increasing
resource utilization.  Information set control is done on a
per-sensor, and not a per-process, basis.

Furthermore, for minimal overhead during continuous monitoring,
metric *thresholds* are set, such that the measurement system will
only report data when it exceeds the value of the specified
thresholds.  Minimizing resource consumption requires that
*filtering* take place as close to the sensors as possible.  This
specification adopts the philosophy that the sensors themselves are
simple and very efficient and that filtering tasks would complicate
them needlessly.  Consequently, filtering is done on the node, by the
NPCS (rather than in the sensors).

Table 1 summarizes the sensor data information sets and their
characteristics.

+-----------+-------------------------------------+----------------+
| Info Set  |                                     | New Statistics |
| Value     | Description                         | Per Metric     |
+===========+=====================================+================+
| 0         | Minimum overhead, no data needed.   | None.          |
+-----------+-------------------------------------+----------------+
| 0x01      | Provides simple utilizations, usage | Counts, Simple |
|           | counts, error counts, mean times,   | sums, Minimums,|
|           | mean rates ONLY if a user-specified | Maximums.      |
|           | threshold has been exceeded.        |                |
|           | Otherwise, no data is returned from |                |
|           | the NPCS.                           |                |
+-----------+-------------------------------------+----------------+
| 0x02,     | Provides simple utilizations, usage | Counts, Simple |
| 0x04, 0x08| counts, error counts, mean times,   | sums, Minimums,|
|           | mean rates.                         | Maximums.      |
+-----------+-------------------------------------+----------------+
| 0x10      | Provides 2nd moments so that        | Sum of squares.|
|           | analysis can yield variance.        |                |
+-----------+-------------------------------------+----------------+
| 0x20      | Provides 3rd moments so that        | Sum of cubes.  |
|           | analysis can yield skew.            |                |
+-----------+-------------------------------------+----------------+

*Table 1.*  Performance Information Sets

*Event tracing* is necessary to provide events in a time-ordered
causal relationship.  Due to scalability concerns and overhead in a
production environment, this is not a part of the specification.

1.2.7. Reporting interval

The *reporting interval* is the time interval, measured in seconds,
over which metrics are collected and statistics are summarized and
then reported.  To minimize performance measurement overhead, single
events are not collected.  Rather, the sensors summarize data over a
reporting interval (currently 5 seconds minimum), and only report
interval statistics to the higher level performance monitor.  This
interval is adjustable to decrease collection overhead.

1.2.8. Thresholds

Support of *threshold* sensors can dramatically reduce the amount of
data collected and transmitted through the network environment, since
only "exception cases" are reported.  This supports the "management
by exception" philosophy of network management.

Thresholds are defined on a per-sensor basis, with a minimum value, a
maximum value, or both (i.e., a range).  The NPCS then processes
incoming sensor data, and when a sensor's interval minimum or maximum
crosses the configured threshold, the sensor data from this reporting
interval is reported to the PMA (*performance management
application*) at the next NPCS reporting interval.  Supporting
threshold detection in the NPCS simplifies the sensors and allows
multiple PMAs to configure a specific sensor with different threshold
values.
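As a sketch of this division of labor (all names hypothetical), the
NPCS-side threshold test might look like the following; per section
5.2.5, detection is based on a sensor's interval minimum and maximum
values.

    /* Hypothetical sketch of NPCS-side threshold filtering: sensor
     * data for an interval is forwarded to the PMA only when the
     * interval minimum or maximum crosses the configured value.
     */
    typedef struct example_threshold {
        unsigned32 value;   /* configured threshold value              */
        int        above;   /* nonzero: test for values above `value'; */
                            /* zero: test for values below it          */
    } example_threshold_t;

    static int threshold_exceeded(example_threshold_t *t,
                                  unsigned32 interval_min,
                                  unsigned32 interval_max)
    {
        return t->above ? (interval_max > t->value)
                        : (interval_min < t->value);
    }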
1.2.9. Network node

This document distinguishes the hardware from the software process
for clients and servers.  For the purpose of this paper, the physical
hardware that clients and servers execute on is referred to as a
*network node*.  (Many management applications define a "server" as
the hardware device that is providing the service.  This is different
from our definition.)

1.2.10. DCE client

A *DCE client* is a software process/thread executing on a particular
network node, that makes RPC requests.  This definition includes a
custom-developed application that issues RPC requests to a DCE
server, as well as a DCE system-level service making a request of
another DCE server.

1.2.11. DCE server

A *DCE server* is a software process/thread executing on a particular
network node that receives (and usually responds to) RPC requests.
This definition includes system-level DCE services (such as the
`dced') as well as custom-developed application services.  Note that
a "server" in this document is a software process and not the
physical hardware (see definition of network node, above).

1.2.12. Performance monitor and performance management application
(PMA)

A *performance monitor* (or just *monitor*) is a process that
provides on-going collection and reporting of performance data for
evaluation by system managers, application designers and capacity
planners.  A specific instance of a monitor that also supports
management functions is called a *performance management application
(PMA)*.

1.2.13. DCE Measurement System (DMS)

By the *DCE Measurement System* we mean the framework of sensors,
standard interfaces, and monitoring processes that initialize,
control, access, and present performance data, as defined within this
specification.  Figure 4 in section 7 provides a block diagram for
these components and their relationships.

The following processing elements are shown in figure 4:

(a) *Performance Management Application (PMA)* -- The distributed
    application measurement system supports a single, logical view of
    the distributed application via a distributed application
    monitor.  The most important views of the data provided by the
    PMA are discussed in more detail in section 1.3.2.  It supports
    the *NPRI* interface.  There are two special case PMAs: a
    *client-only PMA (COP)* that does not support the NPRI, and an
    *SNMP-agent PMA (SAP)* that interfaces with SNMP.

(b) *Sensors* are located throughout the application's address space,
    and may reside in application and stub source code, and in
    libraries such as the DCE RTL.  Sensors are described in detail
    in section 5.

(c) *Observer* is a mechanism within the process's address space that
    manages the sensors and optimizes the transfer of data outside
    the address space.  It pushes the sensor data to the NPCS once
    per reporting interval, using the *PRI* interface.  This library
    functionality resides within the DCE RTL and supports the PMI
    interface.
    It is described in detail in section 12.

(d) *NPCS* is the networked performance collection service.  There is
    one per node, and it supports access and control requests for
    distributed application performance data over the heterogeneous
    network.  It supports the NPMI and PRI interfaces.  It is
    described in detail in section 12.

(e) *Encapsulated library* is the vendor-specific library that
    supports communication between the observer and NPCS.  This
    library implements the platform-specific version of the standard
    PMI and PRI interfaces.  It is described in detail in section
    7.2.

The following standard interfaces are also shown in figure 4:

(a) *Networked Performance Measurement Interface (NPMI)* -- The
    standard interface to a DCE-based node-level service (NPCS) for
    accessing and controlling performance data collected by the
    measurement system in a heterogeneous network.  This interface is
    used to access and control sensor data from components of a
    distributed application and then construct correlated information
    about the application.  A novel feature of the NPMI is that it
    supports locating _client_ processes (that today are not
    locatable using standard DCE services).  It is described in
    detail in section 8.

(b) *Networked Performance Reporting Interface (NPRI)* -- The
    standard interface to a PMA, used by the NPCS for reporting
    sensor data.  It is described in detail in section 9.

(c) *Performance Measurement Interface (PMI)* -- The standard
    interface to a DCE-based service, for accessing and controlling
    performance data collected by the measurement system in a
    heterogeneous network.  The interface is provided automatically
    by DCE for all DCE client and server processes.  It is described
    in detail in section 10.

(d) *Performance Reporting Interface (PRI)* -- The standard interface
    to a DCE-based node-level service (NPCS), for reporting sensor
    data collected by each process.  It is described in detail in
    section 11.

1.3. A Vision of a Distributed Application Monitor

This section describes a vision of a performance measurement
infrastructure that efficiently supports distributed application
performance monitoring.  It describes the need for a pervasive
measurement infrastructure, the PMA presentation requirements, and
the estimated design center impact.

1.3.1. Pervasive measurement infrastructure

The requirements for a distributed measurement system are described
in detail in [RFC 32] and supplemented in section 3.  The present
section discusses a vision of a software system that realizes these
requirements.  The components of the measurement capability described
later in this document satisfy the requirements of this vision of a
monitor for distributed applications.

Performance instrumentation should provide data for various users and
uses:

(a) System designers need data to understand complex, dynamic system
    behavior.

(b) Application designers must evaluate resource consumption of
    designs.

(c) System managers require data for system sizing and acquisition,
    monitoring performance goals and service levels, load balancing
    and planning for future capacity.

(d) System analysts collect data to determine input parameters to
    application models.

(e) System vendors can use this data to evaluate workload demands on
    the services they provide.
From these users' perspectives, different vendor solutions should
converge to provide a seamless, single, "logical" view of the
behavior of the distributed environment.  This demands that a
distributed measurement system collect heterogeneous data from all
vendor systems (nodes) and present it for analysis in a consistent
manner.  Therefore the specification of a distributed measurement
system must define a common set of performance metrics and
instrumentation to ensure consistent collection and reporting across
heterogeneous platforms, define standard APIs to ensure pervasive
support in heterogeneous environments, and utilize self-describing
data to ensure accessibility, extensibility and customizability of
the measurement architecture in heterogeneous environments.

For ease of use the measurement system should support concurrent
measurement system requests with different configurations and
sampling intervals, allow enabling/disabling the instrumentation on a
running system without disrupting an active application environment,
and support custom application-defined metrics and instrumentation.
Collected data should also be accessible by third-party performance
monitors and application clients.

A performance measurement system, although not a system management
service in and of itself, is an important aspect of any system
management capability.  Therefore, the measurement system should
converge wherever possible with relevant measurement standards and
node-based measurement facilities.  It should also provide a closed
feedback loop, so that changes in a distributed application
environment are evaluated using the data collected by the measurement
system.

The measurement system should provide a correlated view of resource
consumption across heterogeneous network nodes.  It should also
provide an infrastructure for integrating disparate performance
measurement interfaces from the host operating system, networking,
and major subsystems in the distributed systems infrastructure.

Figure 3 illustrates our notion of a measurement infrastructure that
is closely integrated with a distribution infrastructure.
Instrumentation (depicted by measurement "meters") is dispersed
throughout the software components.  These components, when grouped
in a logical manner, constitute a distributed application.  The
measurement system collects, transmits, reduces and correlates data
from all relevant constituent components.  These components include
the distribution infrastructure (such as DCE), the host platform (an
instrumented operating system such as HP-UX or AIX, or a
non-instrumented operating system such as those found on PCs), other
middleware components (such as Distributed Objects or Transarc's
Encina transaction manager), as well as the application-developed
client and server code.

[Figure not available in ASCII version of this document.]

*Figure 3.*  A measurement infrastructure for the performance
monitoring of distributed applications.  A well-designed measurement
infrastructure should provide a "centralized" view of distributed
objects and measure all aspects of the distributed application, not
just the distribution infrastructure.
It is crucial to support a centralized view of the distributed
application, regardless of the physical location of the components.
For maximum flexibility, this centralized view is available from any
node (assuming proper authorization).  Finally, the instrumentation
needs to provide a logical-to-physical mapping of the sensor names,
as known by the user and stored by the measurement system.

The alternative to the approach illustrated in Figure 3 is to use
several different performance tools, each running in a unique window,
different for each platform in the network, presenting non-correlated
and sometimes contradictory data.  This approach is cumbersome,
error-prone, inefficient, and ultimately useless, since distributed
applications consist of interactions between logical groupings of
software services.  These logical groupings are impossible to capture
and present without standardized instrumentation.  Unfortunately,
without standard performance instrumentation this is the only
realizable alternative.

The efficiency of the infrastructure is important.  If enabling
performance monitoring excessively perturbs the environment then it
is useless.  The measurement system should minimize in-line overhead
(the overhead in the direct dynamic path of the application) by
deferring processing to outside of the application's direct path
whenever possible.  This technique still consumes CPU on the node,
but minimizes the negative effect on application response time.

Creating variable-size information sets (with increasing resource
consumption) was described in section 1.2.6.  Such variable
information sets allow a person to "dial in" only the necessary
monitoring data collection level (which minimizes overhead).  A goal
of the measurement system is to minimize network bandwidth consumed
by the transmission of collected data.  This is accomplished by
summarizing data over intervals (instead of reporting every
individual data item as it occurs), and supporting bulk retrieval
interfaces.  Transmitted data may contain confidential information on
application components or location, and so requires a secure network
communication channel to prevent interception or modification.

In summary, standardized, pervasive performance instrumentation
provides the following benefits:

(a) Supports monitoring of services on heterogeneous nodes.

(b) Ensures consistent metrics for interpretation.

(c) Provides fine grained view of server operations.

(d) Provides correlated views of client and server performance.

1.3.2. Possible PMA presentation views

The instrumentation and measurement system described by this RFC can
provide data to support the following graphical and tabular
presentation views of the PMA:

(a) *Summary Application View* -- Display the response time and
    throughput of the application, by monitoring all or a subset of
    the application clients in the DCE Cell.

(b) *Summary Application Server View* -- Display the response time,
    throughput, and CPU utilization of all or a subset of the
    application servers in the DCE Cell.

(c) *Summary Application View By Network Node* -- Display the
    response time and throughput of the application, by monitoring
    all or a subset of the application clients executing on a
    particular network node.
(d) *Summary Application Server View By Network Node* -- Display the
    response time, throughput, and CPU utilization of all or a subset
    of the application servers executing on a particular network
    node.

(e) *Component Application View* -- Display the response time or
    throughput components of the application by monitoring all or a
    subset of the application clients in the DCE Cell.

(f) *Component Application Server View* -- Display the response time,
    throughput, and CPU utilization components of all or a subset of
    the application servers in the DCE Cell.  This includes
    fine-grain measurements at the level of per-interface summaries,
    and per-manager operation summaries.

However, a PMA is not required to support _all_ of these views, or
_only_ these views.

2. SCOPE OF PROPOSAL

As we investigated the need and requirements for DCE performance
instrumentation, we discovered that there exist several related
activities and uses of performance data.  How this specification
incorporates these requirements is discussed in this section.

2.1. Scope

(a) *Performance Instrumentation* -- The specific requirements for
    instrumentation are described in RFC 32.0 [RFC 32].  The
    requirements presented here supplement those outlined in RFC
    32.0.

(b) *Managed Objects* -- The DCE Management SIG is defining a set of
    managed objects for the DCE [RFC 38].  We have reviewed their
    proposal and are working with the team to incorporate performance
    metrics into the managed object definitions.

(c) *Event Tracing* -- The generalized tracing of events to collect
    performance data is an inherently non-scalable approach.
    Consequently it is not described in this document.  A generalized
    event tracing mechanism for DCE is described in [RFC 11].

(d) *Computer Measurement Group PMWG Measurement Interface* -- This
    group has proposed a standard OS performance measurement
    interface definition [CMG], and submitted it to X/Open.  We
    support this effort but do not address it directly due to its
    current state as a submitted (as contrasted with accepted) X/Open
    draft.

(e) *Performance Management* -- The instrumentation described herein
    forms the basis for a performance management system, but a
    management system _per se_ is not described.  That work should
    remain in the domain of management application products.

(f) *SNMP/CMIP and Network Management* -- These techniques focus on
    network device management, in contrast to the application
    performance management described within this document.  We
    support a "polling" function for the NPMI interface that can be
    used by an SNMP agent to collect performance measures from this
    instrumentation.

(g) *Accounting* -- The instrumentation provides some data necessary
    for accounting purposes (such as charge-back), but does not
    describe an accounting system _per se_.

(h) *Fault/Error Detection* -- Errors within an environment can have
    a serious performance impact, because of aborts or retries.  The
    measurement system described here counts error conditions for
    RPCs.

2.2. Users

The following users of the performance instrumentation were
identified.

2.2.1. Highest importance

(a) Operational and Administration Management.  Performance sensors
    should yield the critical information to enable dynamic control
    of a distributed application to improve its performance.
    Capacity planning and modeling are involved here as well, since
    they utilize this data as input parameters.

2.2.2. Medium importance

(a) Resource Accounting (partially an auditing function; not only
    performance data needed here).  A goal is to provide resource
    consumption data that accounting requires, to eliminate redundant
    collection mechanisms.  This proposal is not intended to be a
    competitive or complete mechanism for all of accounting's needs.
    Some information is outside of the capabilities described in this
    paper (e.g., a strict accounting of "which client called which
    server method", and all the network, CPU, memory, and disk
    resources for that RPC).

(b) Tracing of Transactions and Events (for modeling or auditing).
    Required for topology and application understanding.  No event
    trace facility is provided by this proposal.

2.2.3. Lowest importance

(a) Detailed System S/W observation (tuning/troubleshooting).  There
    will always be a role for "lab tools", which by virtue of high
    overhead on the system or proprietary low-level nature, are not
    feasible in the production environment of an end-user.  Lab tools
    will continue to exist but this specification does not explicitly
    address their requirements.  However, this proposal does not
    preclude their use.  Tools built on top of this proposed
    infrastructure can be used in the lab to provide basic
    information that is easily obtained (much as `vmstat()' and
    `iostat()' serve for sanity checking in some internal
    benchmarks).

3. MEASUREMENT SYSTEM REQUIREMENTS

The following are the basic requirements that we agreed are necessary
for the success of this specification.  When we ranked them, only a
few were ranked less than "MUSTS".

(a) Extensibility of architecture:

    (i)   Allow dynamic creation of new sensors.

    (ii)  Extends to data store (self-describing data).

    (iii) Basic sensor types provide most functionality.

    This specification does not aspire to recognize every sensor that
    might ever be needed for distributed systems.  As a result, the
    architecture must have extensibility as its core, to accommodate
    new sensors throughout its collection, naming, and display
    capabilities.  As new applications are developed, middleware
    versions are released, or current runtime libraries are enhanced,
    the need for additional sensors must be accommodated.

(b) Dynamic Control of sensors:

    (i)   Enable/disable sensors (i.e., instrumentation can be
          dynamically disabled such that overhead is negligible
          (~ 0%), when sensors are off).

    (ii)  Select amount of sensor data (sums, means, variance,
          histograms).

    (iii) Deliver sensor data periodically, or only at thresholds.

    In the interests of operational efficiency, only the overhead
    associated with the currently required sensors should be imposed
    on the system.  Even for a particular sensor, there needs to be
    the capability of providing simple sums or means when this
    information is sufficient, as well as the capability to supply
    higher statistical moments or distributions when necessary.

(c) Pervasive instrumentation:

    (i)   No application source changes required for instrumentation.

    (ii)  No application recompilation necessary to enable sensors.

    (iii) Environment is pre-populated with basic sensors.
    This requirement assures the DCE customer that his/her
    application is monitorable, independent of the hardware platforms
    on which it is running.

(d) Measurements available in production systems:

    (i)   Sensor overhead under strict architecture constraints.

    (ii)  Dynamic control of sensors.

    This requirement assures the DCE customer that his/her
    application is monitorable in a production system, since the
    architecture specification has strict guidelines to minimize
    overhead.

(e) Administration ease of handling sensor meta-data:

    (i)   Naming, classification, and registration.

    (ii)  Easily controlled sensor status.

    Sensors are more complex than simple counters.  The architecture
    which prescribes their naming, organization and control is
    therefore critical to implementation and deployment.

(f) Consistency of sensor metrics:

    (i)   Definitions (agreement on specifics and names as described
          in RFC 32.0 [RFC 32]).

    (ii)  Results (all vendor implementations).

    Pervasive instrumentation also requires consistently defined
    metrics, so that valid operations can be performed on sensors
    implemented in a heterogeneous environment.

(g) Security:

    (i)   Controlled access to interfaces.

    (ii)  Protected performance data on the network.

    Provide user-configurable access and data protection for sensor
    names and data.

(h) Validation Suite (at implementation):

    (i)   Adherence to the sensor performance spec.

    (ii)  "Branding" of conformance to the functional specification
          set.

    Ensure that metrics are valid from release to release.

(i) Compatibility -- Interplay with other performance tools:

    (i)   Higher importance:

          [a] X/Open DCI.

    (ii)  Lesser importance:

          [a] SNMP.

          [b] 3rd party tools (e.g., PerfVIEW (HP) and Toolkit/6000
              (IBM)).

    Ease access to performance data for new and legacy application
    and system management tools.

4. PERFORMANCE METRICS AND STATISTICS

This section describes the metrics and statistics that guide the
design and placement of performance instrumentation.  Performance
metrics are provided for a client perspective (end user) and for a
server perspective.  A detailed description of the sensors that
collect these performance metrics is found in section 12.

4.1. Fundamental Performance Metrics

The following metrics define the quantities and the notation that are
used throughout the remainder of the document.  The metrics and
notation have been derived from [Laz].

(a) *T* -- The length of *time* that observations (measurements) were
    made.

(b) *A* -- The number of request *arrivals* observed.

(c) *C* -- The number of request *completions* observed.

(d) l -- The *arrival rate* of requests: l = *A / T*.  (The standard
    notation is the lower-case Greek letter lambda, instead of "l".)

(e) *X* -- The *throughput* of completions: *X* = *C / T*.

(f) *B* -- The length of time that a single resource was *busy*.

(g) *U* -- The *utilization* of a resource: *U* = *B / T* = *X * S*.

(h) *S* -- The average *service requirement* per request:
    *S* = *B / C*.

(i) *N* -- The average *number of requests* in the system:
    *N* = *X * R*.

(j) *R* -- The average system *response/residence time* per request.

(k) *Z* -- The average user *think time*.

(l) *Vk* -- The average number of *visits* that a system level
    request makes to resource *k*.

(m) *Dk* -- The *service demand* at resource *k*:
    *Dk* = *Vk * Sk* = *Bk / C*.

(n) *Qk* -- The average *queue length* at resource *k*.

(o) *Wk* -- The average *waiting time* at resource *k*.

(p) *Lk* -- The average count of *locking contention* (unsatisfied
    lock requests) at resource *k*.

In general, a metric with an annotation of "*k*" is for a particular
resource *k*.  Non-annotated metrics are for the system as a whole.
The above non-annotated metrics can also be defined for a particular
resource.  For example, "l*k*" is the arrival rate of requests at
resource *k*.
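As a worked illustration, the fragment below transcribes the basic
relationships above into C for a single 60-second observation period;
the input values are invented.

    /* Deriving metrics from raw observations, in the notation of
     * section 4.1.  The input values are invented for illustration.
     */
    double T = 60.0;     /* observation length, seconds        */
    double A = 1200.0;   /* arrivals observed                  */
    double C = 1180.0;   /* completions observed               */
    double B = 42.0;     /* busy time of one resource, seconds */

    double lambda = A / T;  /* arrival rate: 20 requests/second   */
    double X      = C / T;  /* throughput: ~19.7 completions/sec  */
    double S      = B / C;  /* service requirement: ~0.036 sec    */
    double U      = B / T;  /* utilization: 0.70; equals X * S    */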
4.2. Client Performance Metrics

The following metrics are collected or derived from a client
perspective:

(a) Response time.

(b) Number of server request completions.

(c) Service demand.

(d) Think time.

(e) Number of active clients in system.

(f) Length of measurement interval.

4.3. Server Performance Metrics

The following metrics are collected or derived from a server
perspective:

(a) Number of arrivals.

(b) Arrival rates.

(c) Number of completions (only non-error RPCs are counted).

(d) Throughput.

(e) Service requirement.

(f) Residence time.

(g) Visit count (includes error conditions).

(h) Waiting (queue) time.

(i) Queue length.

(j) Utilization.

(k) Measure of locking contention (count).

(l) Length of measurement interval.

4.4. Collected Statistics

The instrumentation must provide analysis software with the data
required to compute the following statistical quantities:

(a) _Minimum_, during a sensor reporting interval.

(b) _Maximum_, during a sensor reporting interval.

(c) _Sum_, since sensor enabled for collection.

(d) _Mean_, since sensor enabled for collection.

(e) _Variance_, since sensor enabled for collection.

5. STANDARD AND CUSTOM SENSORS

This section describes how sensors are named in the cell, and their
high level functions.  The macro primitives used to construct these
sensors are described in section 6.  This section focuses on the
standard (default) sensors in the distribution infrastructure (i.e.,
DCE), and custom sensors usable by other middleware technologies and
application developers.

5.1. Sensor Naming

This section describes the semantics and syntax of sensor naming.

5.1.1. Terms of interest

Several terms are used in sensor naming and are described as follows:

(a) A *metric* is an abstraction without physical meaning, e.g.,
    marshalling time.  This is the concept of interest to the
    performance analyst.

(b) An *instance* is a physical manifestation of the metric, e.g.,
    marshalling time for inbound parameters for interface
    `interface_0' and its `manager_operation_2()' operation.

(c) A *sensor* is the implementation that measures an instance of a
    metric in a particular process's address space on a particular
    host.

Consequently, metrics are not dynamic, but instances are.  The
dynamic instances are those aspects that may not be known at process
link or load time, such as interface (since a server can register and
unregister interfaces) or fileset (since filesets can be moved
between DFS servers).  The sensor name should have the dynamic
elements as the suffix to allow naming into SNMP MIBs.

The full name of a sensor consists of three parts:

(a) The process name.

(b) The metric name.

(c) The instance.

The process name is used by the performance management application
(the NPMI client) to locate the correct NPCS and tell it what sensors
are of interest.
The metric name and instance are converted by NPCS into the
corresponding sensor identifier which is used to access the right
sensor.  The data structures that implement naming are described in
section 7.3.2.

5.1.2. The process name

The process name identifies which process on which host is being
queried.  A process may have more than one name, e.g., a CDS server
can be named by

    /.:/hosts/dceperf.node101.osf.org/cds-server

as well as by

    /.:/hosts/dceperf.node101.osf.org/perf-server/cdsd

or by

    /.:/hosts/dceperf.node101.osf.org/perf-server/11345

A `dfsbind' (client-side DFS helper) could be named as

    /.:/hosts/orion.node42.osf.org/perf-server/dfsbind

or

    /.:/hosts/orion.node42.osf.org/perf-server/14316

The process name is used by the NPMI client to bind to the
appropriate NPCS, thus any naming scheme that can be used by DCE
clients to bind to DCE servers will work for NPMI clients as well.
For current DCE implementations, that is the DCE Cell Directory
Service (CDS).  In the future this may be Federated Naming or other
schemes.

The names used to specify a particular process to the NPCS can be
either process IDs or executable names.  The process ID is guaranteed
to be unique, but requires first somehow finding out the ID, either
by querying NPCS or other means.  It may not have meaning on some
platforms.  The program name is more user-friendly, but may not be
unique, especially in the case of clients on multi-user machines.
The process ID is also more suitable for use by numeric naming
schemes such as SNMP.  Both the process name and service name allow
for continuity in time despite server restarts.  They also avoid the
problem of recycling of process IDs by the OS.

5.1.3. The metric name

The second part of the sensor name is the name of the particular
metric (e.g., `rpc_calls').  The third part specifies the instance,
e.g., protocol or interface and manager.

A metric has only one name, which is specified in this section for
standard sensors, and made public via some similar mechanism for
custom sensors.  To avoid collisions these start with a domain
identifier, where domain is the name of the DCE-based service domain
(e.g., Encina, DFS, User, DCE, Security, ...).  These domains should
be registered with the OSF and documented in an OSF-RFC.

The metric name has two forms, a human-readable list of
slash-separated names (e.g., `dce/packets-out/protseq'), and a
dot-separated list of numbers or *object ID (OID)* (e.g., `1.3.4').
These names are then suffixed with the name identifying the instance,
giving, say, `dce/packets-out/protseq/ncadg_ip_udp' and `1.3.4.1'.
It is expected that users will typically specify a sensor by the
human-readable name, while programs are more likely to use the object
ID notation amongst themselves.  Also, when SNMP agents are mapping
the metric namespace into the MIB, the OID for the sensor will be the
name used in the MIB.

For efficiency, the data provided by a sensor is treated as atomic,
and any subparts are not nameable.  The entire set of data is
accessed as a whole via both the PMI and NPMI.

5.2. General Sensor Functions

This section describes functions supported by all sensors.

5.2.1. Fast-path

The fast-path option supports non-locking updates, to minimize update
cost for those sensors where losing an update is considered
acceptable.  Note that this option cannot result in decreased
reliability of a DCE process or service.
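A sketch of the trade-off, reusing the illustrative sensor structure
from section 1.2.3 (names hypothetical): with the fast-path option
the update takes no lock, so a concurrent update may occasionally be
lost; with locking, updates are exact but cost more on the critical
path.

    /* Hypothetical counter update illustrating the fast-path option.
     * Unlocked updates may occasionally be lost under concurrency,
     * which is acceptable for sensors configured this way.
     */
    void example_counter_add(example_sensor_t *s, unsigned32 n)
    {
        if (s->fast_path) {
            s->count += n;               /* no lock: update may be lost */
        } else {
            pthread_mutex_lock(&s->lock);
            s->count += n;
            pthread_mutex_unlock(&s->lock);
        }
    }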
5.2.2. Information sets

Selectable statistical levels are supported for each sensor, namely,
the minimum, maximum, sum, mean, and variance are collected, based on
the collection information set.

5.2.3. Reporting interval

A selectable reporting interval allows modifying the interval (in
seconds) over which the sensor summarizes and reports data.  Larger
intervals reduce the amount of data transmitted across the network
while reducing the granularity of the events measured.  Summarization
intervals will range from a minimum of 5 seconds to a maximum of 60
minutes.

5.2.4. Counter overflow

Counters are 32-bits (unsigned).  This provides support for an
activity that executes at the rate of 1.19 million operations per
second for a maximum summarization interval of 1 hour.  Overflow is a
concern only if the counter's value wraps twice in a single
summarization interval.  This is not likely.  Consequently, overflow
will be handled by the PMAs, since the data is cumulative and can be
extracted.  Sensors do not have to worry about overflow.

5.2.5. Threshold detection

Threshold detection and notification occurs for counter and timer
sensors when a threshold condition is true.  A threshold condition is
a value range and a flag that specifies whether the threshold test
should occur for values above or below this configured value.  For
example, a response time sensor set to detect thresholds would report
data only when a user-configured threshold condition is true (for
example, maximum response times are greater than 20 seconds).  It is
important to note that threshold detection is based on minimum or
maximum values.

5.2.6. Minimum and maximum values

During a reporting interval, the minimum and maximum values are
retained and returned.  At the end of each reporting interval, the
minimum and maximum are reset.  This provides insights into the
variation of the metric for a single interval (and not over the long
term; it is a responsibility of the PMA to keep track of long term
minimum and maximum behavior).

5.2.7. Histograms

Histograms provide distribution frequencies for a monitored event.
They are not supported in this version of the specification, but are
a candidate for future support.

5.2.8. Registration

Standard and custom sensors register with NPCS using the data
structures and functions described in sections 7.3.2 and 6.2.  Custom
sensors also require a utility to load their specific metric
attributes into the DCE CDS for use throughout the cell.  This
utility is not defined by the specification.

5.2.9. Metric types

The specification defines a wide range of metric attributes that are
described in detail in section 7.3.7.
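Tying sections 5.2.3 and 5.2.4 together, a hedged sketch (names
hypothetical) of the PMA-side arithmetic: because reported counts are
cumulative, the monitor differences two successive reports and
divides by the reporting interval, and 32-bit unsigned subtraction
yields the correct delta even across a single counter wrap.

    /* Hypothetical PMA-side rate computation from cumulative counts.
     * Unsigned subtraction absorbs a single 32-bit wrap between two
     * successive reports (section 5.2.4).
     */
    double example_rate(unsigned32 prev_count,
                        unsigned32 cur_count,
                        double interval_seconds)
    {
        unsigned32 delta = cur_count - prev_count;  /* modulo 2^32 */
        return (double) delta / interval_seconds;
    }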
5.3. Counter Sensors

Based on the client and server metrics described in sections 4.2 and
4.3, the following counter sensors are implemented for each client
process and for each server RPC interface.  For each sensor the
minimum, maximum, sum, mean, and variance are collected based on the
collection information set.

5.3.1. Standard client counter sensors

(a) Calculate the _client RPC throughput rate_.

    This measures the client's total RPC throughput rate as
    determined by the number of successful completions of client RPC
    requests per unit time.  Collect the data to compute the
    following:

    (i)   Total for all servers invoked.

    (ii)  Total by server.

    (iii) Total by server-interface.

    (iv)  Total by server-interface-operation.

    Note that throughput is a rate.  The sensor keeps track only of
    request completions, thus higher-level software must divide this
    by the current measurement interval to compute the rate.

(b) Count the number of _RPC calls initiated_ by the client.

    This measures the frequency of client requests.  Collected for
    each RPC server interface invoked by the client.

(c) Count of _total RPC packets sent_ by the client.

    This metric measures the number of packets sent by the client,
    and should be collected per protocol sequence (i.e., the number
    of packets passed to the network transport -- not necessarily the
    number of network packets).  Collected for each RPC server
    interface invoked by the client.

(d) Count of _total RPC packets received_ by the client.

    This metric measures the number of packets received by the
    client, and should be collected per protocol sequence (i.e., the
    number of packets passed to the network transport -- not
    necessarily the number of network packets).  Collected for each
    RPC server interface invoked by the client.

(e) Number of _total bytes sent per RPC call_ from the client to the
    server.

    This metric provides information about the size of the data
    transferred from the client to the server.  Collected for each
    RPC server interface invoked by the client.

(f) Number of _total bytes received per RPC call_ from the server to
    the client.

    This metric provides information about the size of the data
    transferred from the server to the client.  Collected for each
    RPC server interface invoked by the client.

(g) Count the _number of RPC call errors and failures_.

    This information, although not a "performance" metric properly
    so-called, provides insight into the operational environment, and
    whether error conditions might be causing performance problems.

(h) Count the number of _lock request waits_.

    Count the number of DCE thread lock requests that could not be
    satisfied, and so resulted in thread waits.  Note that the lock
    path is a high-frequency, performance-critical path, and extra
    care must be employed to instrument it without resulting in a
    performance degradation.

(i) Count the number of server _binding lookup requests_.

    Count the number of NSI (or, perhaps in the future, XFN) binding
    look-ups and imports.  Collected for each RPC server interface
    invoked by the client.

(j) Count the number of _NSI entities returned_.

    Count the number of NSI (or XFN) entities returned from look-ups
    and imports.  Collected for each RPC server interface invoked by
    the client.

5.3.2. Standard server counter sensors

(a) Calculate the _server throughput_ rate.

    This measures the server's total RPC throughput rate, as
    determined by the number of successful completions of client RPC
    requests per unit time.  Collect the data to compute the
    following:

    (i)   Total by server.

    (ii)  Total by server-interface.

    (iii) Total by server-interface-operation.

    Note that throughput is a rate.  The sensor keeps track only of
    request completions, thus higher-level software must divide this
    by the current measurement interval to compute the rate.

(b) Count of _total RPC packets sent_ by the server.
    This metric measures the number of packets sent by the server for
    all clients.  This metric should count packets sent by the server
    including nested RPCs sent to other servers.  Collected for each
    RPC server interface.

(c) Count of _total RPC packets received_ by the server.

    This metric measures the number of packets received by the server
    for all clients.  This metric should count packets received by
    the server including nested RPCs received from other servers.
    Collected for each RPC server interface.

(d) Number of _total bytes sent per RPC call_ from the server to the
    client.

    This metric provides information about the size of the data
    transferred from the server to the client.  Collected for each
    RPC server interface.

(e) Number of _total bytes received per RPC call_ from the client to
    the server.

    This metric provides information about the size of the data
    transferred from the client to the server.  Collected for each
    RPC server interface.

(f) _Queue length_ at the server.

    This metric provides information about the queue length of RPC
    calls at the server, due to a lack of available call threads.
    This differs from calls queued (see next item), by providing a
    distribution of queue length.

(g) Count the number of _RPC calls queued_ at the server.

    This metric provides information about the number of RPC calls
    that were queued at the server, due to a lack of available call
    threads.  This differs from queue length (see previous item) by
    providing only a count of calls queued.

(h) Count the number of _active call threads_.

    This metric provides information about the utilization of the
    server's thread pool, by counting the number of active (non-idle)
    threads.

(i) Count the number of _RPC call errors and failures_.

    This information, although not a "performance" metric properly
    so-called, provides insight into the operational environment and
    whether error conditions are causing performance problems.

(j) Count the number of _lock request waits_.

    Count the number of DCE thread lock requests that could not be
    satisfied, and resulted in thread waits.  Collected for each RPC
    server interface.

5.3.3. Custom counter sensors

The following custom sensors are available to the application
developer to use for specific application events.

(a) _Counter sensor_.

    This measures the total count of an application-specified event
    during the previous measurement interval.

5.4. Timer Sensors

Based on the client and server metrics described in sections 4.2 and
4.3, the following timer sensors are implemented for each client
process and for each server RPC interface.  For each sensor the
minimum, maximum, sum, mean, and variance are collected based on the
collection information set.

5.4.1. Standard client timer sensors

(a) _Response time per RPC call_ from the client perspective.

    This measures the total elapsed time, including server processing
    time and delay/queueing, for a client routine that invokes a
    particular DCE server.  Collect the following data:

    (i)   Total for all servers invoked.

    (ii)  Total by server.

    (iii) Total by server-interface.

    (iv)  Total by server-interface-operation.

    Measure the elapsed time per RPC call, from the time the client's
    runtime initiates the call until the last packet has been
    received by and unmarshalled at the client.
    This should include nested RPC call elapsed times if other DCE
    servers, such as the security service, are invoked (the nested
    RPC call time is optionally broken out).  RPCs that result in DCE
    errors should be reported in a separate category, not included in
    this one.  Note that this time will not include client
    application or user interface response time, since those are
    outside ("above") the scope of the DCE services.

(b) _Service requirement at client_ for all RPCs.

    This measures the service requirement at the client, including
    operating system and network software CPU processing time,
    required to satisfy a client's RPC request.  This request may
    consist of multiple RPC packets, but only one RPC call.  This
    requires that the host operating system support a performance
    measurement system and that DCE servers use it to gather CPU
    service time.  The implementation of this sensor is thus host OS
    dependent.  Data is collected on a per-server-interface basis.

(c) _Marshalling time at client_ for all RPCs.

    This measures the marshalling time of RPC parameters at the
    client required to satisfy a client's RPC request.  Data is
    collected on a per-server-interface basis.

(d) _Unmarshalling time at client_ for all RPCs.

    This measures the unmarshalling time of RPC parameters at the
    client required to satisfy a client's RPC request.  Data is
    collected on a per-server-interface basis.

(e) _RPC network delay_.

    This measures the delay of the network between a particular
    client and server node, as measured between client and server
    runtime libraries.  Consequently, it measures the latency of the
    networking software transport, in addition to the physical
    network wire.  The data is collected per transport protocol
    sequence.  (DTS may already capture this "DCE ping" time, and if
    so, then it should be used.)

5.4.2. Standard server timer sensors

(a) _Residence time per RPC call_ from the server perspective.

    This measures the total elapsed time, including server processing
    time and delay/queueing, required for the server to satisfy a
    client request.  Collect the following data:

    (i)   Total by server.

    (ii)  Total by server-interface.

    (iii) Total by server-interface-operation.

    Measure the elapsed time per RPC call, from the time the server
    runtime receives the call until the last packet has been
    marshalled by the server and sent.  This should include nested
    RPC call elapsed times if other DCE servers, such as the security
    service, are invoked (the nested RPC call times are optionally
    broken out).  RPCs that result in DCE errors should be reported
    in a separate category, not included in this one.  Note that the
    elapsed time does not begin to accumulate until a thread from the
    call-thread pool is dispatched on behalf of this incoming
    request; consequently, this does not include call-thread queueing
    time prior to the first call thread dispatch.  This queueing time
    is collected by the initial queueing time sensor at the server
    (see next item).

(b) _Initial queueing time at server_ for all RPCs.

    This measures the queueing time of an incoming RPC request if no
    call-thread is available to dispatch.  See residence time
    (previous item) for the complementary elapsed-time measure.

(c) _Service requirement at server_ per client request.
This measures the service requirement at the server, including operating system and network software CPU processing time, required to satisfy a client's request. This request may consist of multiple RPC packets, but only one RPC call. This requires that the host operating system support a performance measurement system and that DCE servers use it to gather CPU service time. Data is collected on a per-server-interface-operation basis.

(d) _Marshalling time at server_ for all RPCs.

This measures the marshalling time of RPC parameters at the server required to satisfy a client's RPC request. Data is collected on a per-server-interface-operation basis.

(e) _Unmarshalling time at server_ for all RPCs.

This measures the unmarshalling time of RPC parameters at the server required to satisfy a client's RPC request. Data is collected on a per-server-interface-operation basis.

(f) _Interarrival time at server_ for all RPCs.

This measures the interarrival time of incoming RPC requests. Data is collected on a per-server-interface-operation basis.

5.4.3. Custom timer sensors

The following custom sensors are available to the application developer to use for specific application events.

(a) _Timer sensor_. This measures the total elapsed time, including processing time and delay/queueing, for an event as determined by the application developer.

5.5. Pass-thru Sensors

Custom sensors can be defined that pass opaque data through the measurement system. These sensors merely copy data from existing internal data structures. These sensor data types are opaque, and require supporting pickling routines, which are supplied at sensor registration time.

The DCE 1.1 IDL compiler supports "pickling", i.e., encoding and decoding data types to and from a byte-stream format. A sensor may take advantage of this pickling process to encode data into the opaque array of bytes, which it is able to transmit via the standard interfaces. This allows sensors to be created with elaborate data types and provides a mechanism for that data to be marshalled.
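To make the pickling contract concrete, the following sketch shows a routine matching the `dms_data_pickle_fn_t' callback type defined in section 6.2. It is illustrative only and not part of the specification: the `app_stats' structure, the flat memcpy() encoding, and all names are assumptions, and a production sensor would normally use the IDL encoding services instead.

    #include <string.h>

    /* Hypothetical application data to be passed through opaquely. */
    struct app_stats {
        unsigned long requests;
        unsigned long cache_hits;
    };

    static unsigned char app_stats_pickled[2 * sizeof(unsigned long)];

    /* Matches dms_data_pickle_fn_t: encode `data' into the opaque
     * byte array that travels through the standard interfaces.  A
     * trivial memcpy() encoding stands in for real pickling here. */
    void app_stats_pickle(void *data, unsigned32 *st)
    {
        struct app_stats *s = (struct app_stats *) data;

        memcpy(app_stats_pickled, &s->requests, sizeof(s->requests));
        memcpy(app_stats_pickled + sizeof(s->requests),
               &s->cache_hits, sizeof(s->cache_hits));
        *st = 0;    /* 32-bit DCE-format status; 0 == success */
    }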
5.6. Standard Operating System Sensors

Collecting host-specific resource consumption (such as service demand) requires accessing the host operating system's measurement system. Specifically, each DCE host's operating system should provide the following application-specific metrics via a standard interface:

(a) CPU utilization (system + user).
(b) File/disk I/Os per second.
(c) Paging I/Os per second.
(d) Network packets per second.
(e) OS dispatcher queue length and average queue time.
(f) Process physical main memory usage.
(g) Process virtual memory usage.

These host OS performance metrics can be reported by the observer as process-global metrics. The X/Open DCI [CMG] is a good candidate to provide a standard interface to operating system measures. If the host OS does not support the DCI, then these sensors will require porting to the proprietary OS measurement interface.

6. SENSOR PROBE MACROS AND FUNCTIONS

This section describes the macros that are used at various probe points to construct sensors. These probes, implemented as a set of macros, are used to build each sensor consistently and to decrease implementation time for DCE developers and application writers.

6.1. Sensor Data Flow

During process initialization, various process-wide sensors, such as `rpc_call_thread_utilization' and `rpc_queue_utilization', are initialized and registered with the observer, using the functions in section 6.2.

The sensors associated with specific server interface operations are not registered until the server registers the interface via the RTL call to `rpc_server_register_if()'. Probes defining these sensors are located in the execution path of the RPC and store their data into a structure that "travels" with the RPC call. At the end of the call, after the call response has been sent to the client, all probe data is "tallied", and the global sensor data structure is updated. Some sensors are updated directly by the probe that executes during the event being sensed.

When the observer thread executes, it checks for entries on its tally queue and updates those sensors. Then it searches the lists of registered sensors and builds a batch of updates to send to the PRI.

6.2. Sensor Registration and Data Functions

Functions for registering and unregistering sensors and for queueing sensor data are described in this section.

    /* These function-pointer definitions allow a subsystem
     * designer to provide callbacks to the observer for
     * controlling a subsystem and its sensors.  The functions
     * which are referenced must be re-entrant, as the code
     * updating the sensors and/or subsystems from the
     * middleware/application will be asynchronous with respect
     * to the observer.  Each function defines a pointer to a
     * control block defined by the function writer as an [in]
     * parameter, and a 32-bit DCE format status value as an
     * [out] parameter.  These may be passed in as NULL values,
     * but this will prevent any control information from being
     * passed back up to the subsystem/sensor from PMAs.
     */

    typedef void (*dms_subsys_ctl_fn_t) (void *ctlblock, unsigned32 *st);
    typedef void (*dms_sensor_ctl_fn_t) (void *ctlblock, unsigned32 *st);
    typedef void (*dms_data_pickle_fn_t) (void *data, unsigned32 *st);
    /* The following structure is for describing a sensor's data
     * format.
     */

    typedef struct dms_data_descriptor {
        size_t               datasize;
        void                 *data;
        dms_data_pickle_fn_t data_fn;
    } dms_data_descriptor_t, *dms_data_descriptor_p_t;

    /* This structure contains information about individual sensors
     * which the observer needs to construct its persistent storage
     * of sensor data and for registering sensors through the PRI.
     * These structures may be chained into the sensors field of
     * the subsystem descriptor to batch sensor registrations.
     *
     * The following fields may be set to 0 (or NULL) to disable
     * the respective functionality:
     *     ctl_fn
     *     millisec
     *     attrs
     */

    typedef struct dms_sensor_descriptor {
        uuid_t                  sensor_id;
        void                    *sensor_handle;
        int                     op_num;
        dms_sensor_ctl_fn_t     ctl_fn;
        char                    *sensorname;
        int                     millisec;  /* sampling interval; 0 if event-sampled */
        dms_data_descriptor_p_t sensor_data;
        void                    *attrs[dms_HIGHEST_ATTRIBUTE];
    } dms_sensor_descriptor_t, *dms_sensor_descriptor_p_t;

    /* This structure contains information about a subsystem which
     * the observer may use to construct its persistent storage --
     * it is patterned on the information needed for an RPC
     * interface, but may be used for any type of subsystem defined
     * by a middleware or application designer.  Note the
     * presumption that all operations have the same properties and
     * are instrumented with the same number of sensors per
     * operation.  This functionality is for batching registrations.
     * Sensor registration may be performed individually.
     *
     * The array of sensor descriptors is defined with dimension 1
     * to accommodate certain compiler limitations.  Nonetheless,
     * the array may be allocated at any size.  For example, one
     * may allocate an appropriately sized subsystem descriptor
     * with the following malloc call:
     *
     *     ssd = (dms_subsys_descriptor_p_t) malloc (
     *               (size_t) (sizeof(struct dms_subsys_descriptor) +
     *                         (n_ops * sizeof(struct dms_sensor_descriptor))
     *               ));
     *
     * The array does not need to be null-terminated.
     */

    typedef struct dms_subsys_descriptor {
        uuid_t                  subsys_uuid;
        void                    *subsys_handle;
        dms_subsys_ctl_fn_t     ctl_fn;
        int                     n_ops;
        int                     n_sensors_per_op;
        char                    *subsysname;
        dms_sensor_descriptor_t sensors[1];
    } dms_subsys_descriptor_t, *dms_subsys_descriptor_p_t;

    /* For registering interfaces or custom subsystems. */

    void dms_obs_register_subsys (
        dms_subsys_descriptor_t *subsys,
        void                    **subsys_handle,
        unsigned32              *st
    );

    /* Opposite of register_subsys. */

    void dms_obs_unregister_subsys(
        void       *subsys_handle,
        unsigned32 *st
    );

    /* For registering sensors. */

    void dms_obs_register_sensor(
        dms_sensor_descriptor_t *sensor,
        void                    *subsys_handle,
        void                    **sensor_handle,
        unsigned32              *st
    );

    /* Opposite of register_sensor. */

    void dms_obs_unregister_sensor(
        void       *sensor_handle,
        unsigned32 *st
    );

    void dms_obs_queue_data(
        void                    *sensor_handle,
        dms_sensor_descriptor_t *sensor,
        unsigned32              *st
    );
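The following sketch (illustrative only, not part of the specification) registers a hypothetical single-operation subsystem with one custom counter sensor using the functions above; the names, the UUID handling, and the omitted error checking are assumptions.

    #include <stdlib.h>
    #include <string.h>
    #include <dce/uuid.h>

    static void *app_subsys_handle;
    static void *app_sensor_handle;
    static long txn_count;                 /* updated by a counter probe */
    static dms_data_descriptor_t txn_data;
    static dms_subsys_descriptor_p_t ssd;

    void app_dms_init(void)
    {
        unsigned32 st;
        size_t sz;
        dms_sensor_descriptor_t *sd;

        /* One operation with one sensor per operation; sizing as in
         * the malloc example above. */
        sz = sizeof(struct dms_subsys_descriptor) +
             (1 * sizeof(struct dms_sensor_descriptor));
        ssd = (dms_subsys_descriptor_p_t) malloc(sz);
        memset(ssd, 0, sz);          /* 0/NULL disables optional fields */

        uuid_create(&ssd->subsys_uuid, &st);   /* standard DCE UUID call */
        ssd->n_ops = 1;
        ssd->n_sensors_per_op = 1;
        ssd->subsysname = "my_app";
        dms_obs_register_subsys(ssd, &app_subsys_handle, &st);

        txn_data.datasize = sizeof(txn_count);
        txn_data.data = &txn_count;
        txn_data.data_fn = NULL;     /* plain counter: no pickling */

        sd = &ssd->sensors[0];
        uuid_create(&sd->sensor_id, &st);
        sd->op_num = 0;
        sd->sensorname = "transactions_completed";
        sd->millisec = 0;            /* event-sampled */
        sd->sensor_data = &txn_data;
        dms_obs_register_sensor(sd, app_subsys_handle,
                                &app_sensor_handle, &st);
    }

    /* Later, on the (hypothetical) event path, updated data can be
     * queued for the observer. */
    void app_txn_completed(void)
    {
        unsigned32 st;

        txn_count++;    /* stands in for a counter-probe update */
        dms_obs_queue_data(app_sensor_handle, &ssd->sensors[0], &st);
    }

Note that the descriptor storage must remain valid for the lifetime of the registration, since the observer refers to it when tallying and reporting.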
6.3. Sensor Probe Macros

This section describes the probe macros used to create sensors. For each macro, only the function signature (pseudo-prototype) is provided. The macro body has been excluded in the interest of brevity. Note that the sensor data location is passed into each relevant macro.

    /* Utility function: zero out the values in a timestamp.
     * Pseudo-prototype:
     *     void DMSTIMEZERO(struct dms_timestamp *);
     */

    /**************************************************************
     * For those cases where interval times are deemed more
     * appropriate, the following data and macro definitions may be
     * used.
     */

    /* An interval timer data structure allows preservation of both
     * begin and end timestamps, returning the interval in a new
     * timeval structure.
     */

    typedef struct dms_itimer {
        struct dms_timestamp intervalstart;
        struct dms_timestamp intervalstop;
        struct timeval       interval;
    } dms_itimer_t;

    /* Start interval timer.
     * Pseudo-prototype:
     *     void DMS_INTERVALSTART(struct dms_itimer);
     */

    /* Stop interval timer, and calculate wallclock time.
     * Pseudo-prototype:
     *     void DMS_INTERVALEND(struct dms_itimer);
     */

    /**************************************************************
     * Counter and MIN/MAX Probe Data structures
     */

    /* Counter element. */

    struct dms_probe_cnt {
        long counter;            /* local value maintained by probe */
    };

    /* Minimum/Maximum element. */

    struct dms_probe_mm {
        int           reset;     /* reset command from sensor */
        unsigned long value;     /* value maintained by probe */
        unsigned long *datum;    /* ptr to comparison datum */
    };

    /* Pass-through probe datatypes: to be used for sensing
     * counters and/or timers (in gettimeofday() format) and/or
     * amorphous data chunks maintained elsewhere.
     */

    struct dms_probe_vpt {
        unsigned long localval;  /* local value maintained by probe */
        unsigned long *value;    /* pointer to value fetched by probe */
    };

    struct dms_probe_tpt {
        struct timeval localval; /* local value maintained by probe */
        struct timeval *value;   /* pointer to value fetched by probe */
    };

    /**************************************************************
     * Counter Probe.
     *
     * This probe will add any value to its counter.  The second
     * argument may be a reference to a delta value maintained
     * elsewhere or to a constant.
     */

    /* Pseudo-prototype:
     *     void CNTPINIT(struct dms_probe_cnt A);
     */

    #define CNTPINIT(A) (A).counter = 0;

    /* Pseudo-prototype:
     *     void CNTPROBE(struct dms_probe_cnt A, long valp);
     *
     * This probe may need to be protected by an appropriate mutex,
     * but is often used in conjunction with another probe also
     * needing the same mutex lock.  Therefore, the code
     * instantiating this macro is responsible for explicitly
     * locking and unlocking the appropriate mutex if desired:
     *     RPC_MUTEX_[UN]LOCK((X)->m);
     */
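As an illustrative sketch (the "bytes sent" event and function names are assumptions, and locking is left to the caller per the note above), a counter probe might be used as follows.

    /* Probe storage; one instance per sensor. */
    static struct dms_probe_cnt bytes_sent_probe;

    void bytes_sent_init(void)
    {
        CNTPINIT(bytes_sent_probe);          /* counter = 0 */
    }

    /* Called on the event path; `nbytes' is the delta to add.
     * Any mutex protection is the caller's responsibility. */
    void bytes_sent_event(long nbytes)
    {
        CNTPROBE(bytes_sent_probe, nbytes);
    }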
    /* Minimum/Maximum probes.
     *
     * These probes store the minimum [maximum] value of their
     * current value and a value stored elsewhere at the time they
     * execute.
     *
     * They are implemented to allow resetting.  The process for
     * resetting utilizes a "reset flag" in the probe structure.
     * When the controlling thread, usually the observer or a
     * thread under its control, wants to reset the probe, it
     * unconditionally writes a non-zero value to the reset flag.
     * When the probe actually executes it checks this flag for
     * non-zero and branches based on its value:
     *     If zero, it executes the minimum [maximum] function.
     *     If non-zero, it sets the data value to the current value
     *     of the data and then clears the reset flag.  Once the
     *     reset flag is clear, the controlling thread may consider
     *     the data valid again.
     * This procedure is designed to minimize exposure to a case of
     * multiple threads trying to write data to the value location,
     * resulting in lost data.
     */

    /* Pseudo-prototype:
     *     void MAXPINIT(struct dms_probe_mm A, long *datp);
     * `datp' points to a long which is the comparison value in
     * this and the following probes.
     */

    /* Pseudo-prototype:
     *     void MINPINIT(struct dms_probe_mm A, long *datp);
     */

    /* Pseudo-prototype:
     *     void MAXPRESET(struct dms_probe_mm A);
     */

    /* Pseudo-prototype:
     *     void MINPRESET(struct dms_probe_mm A);
     */

    /* The minimum probe will store the minimum of its present
     * value and the datum it is sensing to its value.  The maximum
     * probe simply reverses the comparison clause of the ternary
     * operation.  The value is an unsigned long, the datum is a
     * pointer to unsigned long.
     */

    /* Pseudo-prototype:
     *     void MAXPROBE(struct dms_probe_mm A);
     *
     * This probe may need to be protected by an appropriate mutex,
     * but is often used in conjunction with another probe also
     * needing the same mutex lock.  Therefore, the code
     * instantiating this macro is responsible for explicitly
     * locking and unlocking the appropriate mutex if desired:
     *     RPC_MUTEX_[UN]LOCK((X)->m);
     */

    /* Pseudo-prototype:
     *     void MINPROBE(struct dms_probe_mm A);
     *
     * This probe may need to be protected by an appropriate mutex,
     * but is often used in conjunction with another probe also
     * needing the same mutex lock.  Therefore, the code
     * instantiating this macro is responsible for explicitly
     * locking and unlocking the appropriate mutex if desired:
     *     RPC_MUTEX_[UN]LOCK((X)->m);
     */

    /* Pseudo-prototype:
     *     void PASSPROBE(struct dms_probe_vpt A);
     *
     * The function of this probe macro is to snapshot a dynamic
     * value stored outside the context of the DMS to a local value
     * in order to lessen concurrency issues and hopefully provide
     * more stable readings.  Its use is not mandatory.
     *
     * This macro should work fine for either value or time
     * pass-throughs.
     *
     * This probe may need to be protected by an appropriate mutex,
     * but is often used in conjunction with another probe also
     * needing the same mutex lock.  Therefore, the code
     * instantiating this macro is responsible for explicitly
     * locking and unlocking the appropriate mutex if desired:
     *     RPC_MUTEX_[UN]LOCK((X)->m);
     */

6.4. Sensor Timer Functions

Timestamps play a crucial role in instrumentation but can also have high overhead. To resolve this, the specification defines several high-speed timer functions.

    /**************************************************************
     * TIME functions.
     *
     * The DCE runtime maintains a correlation between the value
     * returned by dms_gettime() and that returned by
     * gettimeofday().  The clocks should be presumed to be stable
     * and accurate and to remain exactly correlated over the
     * periodic re-correlation interval.  The re-correlation
     * interval should be a fairly small fraction of the
     * dms_gettime() wrap interval.  For instance, a 200 MHz
     * machine for which the time is maintained as a 32-bit value
     * of system clock ticks will wrap in about 20 seconds.
     *
     * We recommend a re-correlation interval of 5 seconds.  This
     * should be a small enough fraction of the wrap time, yet
     * infrequent enough to avoid unnecessarily increasing the
     * gettimeofday() overhead.
     */

    #include <limits.h>

    /* The following should be available from <limits.h>. */

    #ifndef ULONG_MAX
    #   define ULONG_MAX 0xFFFFFFFFUL
    #endif
    #ifndef UINT_MAX
    #   define UINT_MAX 0xFFFFFFFFU
    #endif
    #ifndef INT_MAX
    #   define INT_MAX 0x7FFFFFFF
    #endif

    #define USEC_PER_SEC 1000000

    typedef unsigned long dms_time_offset_t;

    typedef struct dms_timestamp {
        struct timeval    base_wallclock;
        dms_time_offset_t base_ticks;
        dms_time_offset_t current_ticks;
    } dms_timestamp_t;

    /**************************************************************
     * DMS_TIMESTAMP() retrieves the information necessary for
     * computing an accurate timestamp (later) without calling
     * gettimeofday() inline.  It is structured to preserve the
     * information which will be required for later, out-of-line
     * calculation of time intervals.  This macro must be passed a
     * valid pointer to struct dms_timestamp.
     * Pseudo-prototype:
     *     void DMS_TIMESTAMP(struct dms_timestamp *);
     */

    /**************************************************************
     * DMS_TICKS_TO_USEC() converts system-clock ticks to
     * microseconds.  This macro must be passed a valid
     * dms_time_offset_t.
     * It is not normally invoked directly by user code.
     * Pseudo-prototype:
     *     unsigned long DMS_TICKS_TO_USEC(dms_time_offset_t);
     */

    /**************************************************************
     * DMS_TS_TO_TV() converts the time stored in a dms_timestamp
     * structure to the format of timeval.  Both input pointer
     * parameters must be valid.  It is not normally invoked by
     * user code.
     * Pseudo-prototype:
     *     void DMS_TS_TO_TV(struct dms_timestamp *, struct timeval *);
     */

    /**************************************************************
     * DMS_SUB_TIME() stores the difference between two timestamps
     * into a timeval structure.
     * Pseudo-prototype:
     *     void DMS_SUB_TIME(
     *         struct dms_timestamp *,
     *         struct dms_timestamp *,
     *         struct timeval *);
     * If the timestamp for the end time is earlier than the
     * timestamp for the begin time, this macro will compute a
     * negative interval, which may cause problems.  Therefore, the
     * caller must check for the error condition (negative seconds
     * field -- the microseconds field is unsigned).
     */

    /* DMS_GETTIMEOFDAY() fills in a struct timeval with the "real,
     * current" wallclock time without calling gettimeofday().
     * Pseudo-prototype:
     *     void DMS_GETTIMEOFDAY(struct timeval *);
     * This macro requires a valid pointer-to-struct-timeval.
     */

    /**************************************************************
     * dms_gettime_int() is a fast, implementation-specific
     * function which returns an unsigned long with a
     * machine-dependent resolution.  Each implementor must provide
     * this system-specific function and the conversion factor
     * specifying the relationship of this number to a standard
     * time unit such as seconds or microseconds.
     */

    extern dms_time_offset_t dms_gettime_int(void);
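A sketch of how these macros combine to time a region of code follows; the measured region and the argument order of DMS_SUB_TIME() are assumptions made for illustration.

    void time_region_example(void)
    {
        struct dms_timestamp begin_ts, end_ts;
        struct timeval       elapsed;

        DMS_TIMESTAMP(&begin_ts);    /* cheap: no inline gettimeofday() */
        /* ... region being measured ... */
        DMS_TIMESTAMP(&end_ts);

        /* Out-of-line interval computation; the argument order
         * (end, begin, result) is assumed here. */
        DMS_SUB_TIME(&end_ts, &begin_ts, &elapsed);

        /* Per the DMS_SUB_TIME() note, check for a negative seconds
         * field: the end timestamp may precede the begin timestamp. */
        if ((long) elapsed.tv_sec < 0) {
            /* discard this sample */
        }
    }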
The PMI provides the interface between the NPCS and DCE client and server processes. These processes contain the performance instrumentation, sensors. Basically PMA's use the NPMI to discover and request/receive data from sensors. The NPCS uses the PMI to gain knowledge of DCE client and server processes, control the configuration of sensors, and receive data from sensors. The NPMI and NPRI interfaces are RPC interfaces to leverage security and naming features of DCE. The PMI and PRI are node-local and can use any relevant IPC mechanism, including RPC, implemented in the encapsulated library described in section 7.2. # ####### ### ##### # # ###### ####### # # # # # # # # # # # # # # # # # # # # # # # ##### # # #### # # ###### ##### # # # # # # # # # # # # # # # # # # # # # # # # # ### ##### ##### # # ####### # [Figure not available in ASCII version of this document.] *Figure 4.* NPRI, NPMI, NPCS, PMI, PRI and sensor relationships. Also, the NPCS is shown as an independent mechanism. Whether it is an independent process or part of another process is implementation- specific. The NPMI, PMI, NPCS, and sensors exist and operate to provide PMA's with DCE performance instrumentation in the manner described below. The PRI and NPRI provide the communication channel to efficiently return sensor data to the PMA using a push protocol. During the steady-state, runtime sensors collect specific metrics within the DCE environment whenever a thread executes their set of probes. Probes are the (inline) code sequences that capture the data needed to produce a metric, e.g., timestamps for a response time metric. This relationship is illustrated in Figure 5. During the execution of a distributed application, the flow of control passes from the client code into the client stub into the DCE runtime library (RTL1), possibly across a network, into the DCE runtime library (RTL2), into the server stub and into the server code. The thread of execution returns in a reverse manner. As it passes through RTL2 it encounters two probes, a begin-response-probe and an end-response-probe. After it passes through the end-response-probe the appropriate sensor is located and updated. Friedrich, Saunders, Zaidenweber, Bachmann, Blumson Page 42 OSF-RFC 33.0 DCE Performance Instrumentation July 1995 # ####### ### ##### # # ###### ####### # # # # # # # # # # # # # # # # # # # # # # # ##### # # #### # # ###### ##### # # # # # # # # # # # # # # # # # # # # # # # # # ### ##### ##### # # ####### # [Figure not available in ASCII version of this document.] *Figure 5.* Flow of control, probes and sensors shown for a response time sensor in the DCE run time library. Probes are not restricted to the RTL and can also occur in client or server source or stubs. Sensors provide metrics by supplying the component values necessary to calculate intervalized metric values in probes and store sensor data in a process accessible structure. The component values provided by sensors are in the form of cumulative totals, for example. A sensor with the purpose of providing a response time metric (ignoring location) would make available a total number of responses (R), and a total of the time spans to produce those responses (RT). These values could be taken from the sensor at the beginning and ending of a time interval and the mean response time for that interval. The observer (also known as the "address space helper thread") periodically captures the metric component values for each sensor that has been configured. 
The capture periodicity is specific to each sensor. The observer will then communicate the captured metric component values and a timestamp to the NPCS through the PRI interface.

The NPCS provides a consistent node-level view of all DCE performance instrumentation on a given node. It maintains a registry of sensors and observers provided to it through the PRI interface. It responds to queries against that registry made through the NPMI interface. It maintains a (single) copy of the latest captured metric component values for all registered sensors, communicated to it through the PRI interface. It maintains a registry of the collection of sensors that each PMA has configured through the NPMI interface. Based on the configurations requested by all PMAs, the NPCS configures individual sensors through the PMI interface. It communicates the component metric values of any sensors that have been active during the requested (PMA-specific) interval through the NPRI interface.

7.2. Encapsulated Library

The connection between the NPCS and the instrumented DCE processes is a critical one; it is very high volume, so its performance is a major factor in minimizing the impact of instrumentation on the overall performance of a node. Because of this, the connection is specified as two interfaces whose implementation is deliberately left vendor-specific; the goal is to allow full use of any available system-specific mechanisms to minimize the overall cost of transfers. The central focus is on the actual reporting of collected data, since this will be the greatest volume and the most likely to occur during normal operation.

[Figure not available in ASCII version of this document.]

*Figure 6.* The architecture of the Encapsulated Library.

The model is illustrated in Figure 6. It provides two libraries which use/support the PRI and PMI interfaces. Servers of PRI and clients of PMI would link with `npcs_lib'; servers of PMI and clients of PRI would link with `observer_lib'. It is worthwhile to emphasize that there is only one NPCS server of PRI per node. The subset of the PRI functions that is reachable through the PMI is denoted in the diagram as _pri2_. The point is that `npcs_lib' defines the functions (entry point symbols) named in the PMI specification and `observer_lib' defines the functions named in the PRI specification. This is very analogous to DCE RPC client and server stubs.

The libraries may create threads needed to support asynchronous communication. The _pmi_talker_ and _pri_talker_ threads are shown in Figure 6, and are named "talker" to contrast with RPC listener threads. The middle region labeled _IPC_ represents an intra-node IPC mechanism whose choice is unspecified as long as the PRI and PMI interfaces provide the connecting mechanisms described in this API section. This flexibility will permit many implementation approaches without requiring ANY modification to the NPCS or DCE processes.

The interface is made independent of the underlying IPC mechanism by the use of procedures provided by the recipient (server) of a request, which are invoked whenever a (client) request is made.
This is analogous to an RPC, but to allow for a more general implementation the procedure names are passed to the libraries as procedure-valued parameters to the initialization calls: `dms_pmi_el_initialize' in section 10.2, and `dms_pri_el_initialize' in section 11.2.

The subset of the PRI functions passed to the PMI is denoted as _pri2_ in Figure 6. These functions perform local initialization, and then take whatever steps are required to open a communication path between the processes. The exact nature of these steps depends on the particular implementation of the PMI/PRI interface. Possibilities include, but are not limited to:

(a) Creating a pair of named pipes (fifos).

(b) Calling `dciInitialize()'/`dciRegister()' (see the discussion regarding the DCI in [CMG]).

(c) Initializing a DCE RPC interface which accepts the needed procedures as RPCs.

(d) Creating a shared memory segment and initializing it with appropriate structures, the monitor threads to dequeue input messages, and semaphores to control access to the message queues.

The encapsulated library requires several utility functions for library initialization and cleanup. These are described in detail in sections 10 and 11, and summarized here. The `dms_pmi_el_initialize()' and `dms_pri_el_initialize()' functions are used to initialize the library and underlying IPC mechanisms. The `dms_pmi_el_free_outputs()' and `dms_pri_el_free_outputs()' functions are used for freeing up memory resources, and encapsulate RPC free routines if necessary.

7.3. Important State Information

This section summarizes the important state maintained or passed via the standard interfaces.

7.3.1. Sensor data and reporting data structures

Sensor data components are described by `sensor_data' of type `dms_datum_t'. These types allow a wide range of sensor data representations, including opaque data structures for extensibility. Sensor data is reported using the `sensor_report_list' of type `dms_observations_data_t'.
    typedef struct dms_opaque {
        unsigned long size;
        [size_is(size)] byte bytes[];
    } dms_opaque_t;

    typedef enum {
        dms_LONG,
        dms_HYPER,
        dms_FLOAT,
        dms_DOUBLE,
        dms_BOOLEAN,
        dms_CHAR,
        dms_STRING,
        dms_BYTE,
        dms_OPAQUE,
        dms_DATA_STATUS
    } dms_datum_type_t;

    typedef union dms_datum switch (dms_datum_type_t type) {
        case dms_LONG:        long           long_v;
        case dms_HYPER:       hyper          hyper_v;
        case dms_FLOAT:       float          float_v;
        case dms_DOUBLE:      double         double_v;
        case dms_BOOLEAN:     boolean        boolean_v;
        case dms_CHAR:        char           char_v;
        case dms_STRING:      dms_string_t   *string_p;
        case dms_BYTE:        byte           byte_v;
        case dms_OPAQUE:      dms_opaque_t   *opaque_p;
        case dms_DATA_STATUS: error_status_t status_v;
    } dms_datum_t;

    typedef struct dms_sensor_data {
        dms_sensor_id_t sensor_id;
        unsigned long   count;
        [size_is(count)] dms_datum_t sensor_data[];
    } dms_sensor_data_t;

    typedef struct dms_timevalue {
        unsigned long sec;
        unsigned long usec;
    } dms_timevalue_t;

    typedef struct dms_observation_data {
        dms_timevalue_t end_timestamp;
        unsigned long   count;
        [size_is(count)] dms_sensor_data_t* sensor[];
    } dms_observation_data_t;

    typedef struct dms_observations_data {
        unsigned long count;
        [size_is(count)] dms_observation_data_t* observation[];
    } dms_observations_data_t;

7.3.2. Sensor naming and registration data structures

Sensors are registered using the `sensor_register_list' of type `dms_instance_dir_t'. Sensors in the sensor registry are named using the `registry_list' of type `dms_instance_dir_t'.

    /* This interface defines the data structures that represent
     * the dms namespace.  There are two forms of names that can be
     * represented, a simple string-only form, and a fully
     * decorated form.
     */

    typedef struct dms_name_node* dms_name_node_p_t;

    typedef struct dms_name_nodes {
        unsigned long count;
        [size_is(count)] dms_name_node_p_t names[];
    } dms_name_nodes_t;

    typedef struct dms_name_node {
        dms_string_t*    name;       /* "*" == wildcard */
        dms_name_nodes_t children;
    } dms_name_node_t;

    typedef struct dms_attr {
        dms_string_t* attr_name;
        dms_datum_t   attr_value;
    } dms_attr_t;

    typedef struct dms_attrs {
        unsigned long count;
        [size_is(count)] dms_attr_t* attrs[];
    } dms_attrs_t;

    typedef struct dms_sensor {
        dms_sensor_id_t sensor_id;
        dms_attrs_t*    attributes;
        unsigned short  count;
        [size_is(count)] small metric_id[];
    } dms_sensor_t;

    typedef struct dms_instance_leaf {
        unsigned long count;
        [size_is(count)] dms_sensor_t* sensors[];
    } dms_instance_leaf_t;

    typedef struct dms_instance_node* dms_instance_node_p_t;

    typedef struct dms_instance_dir {
        unsigned long count;
        [size_is(count)] dms_instance_node_p_t children[];
    } dms_instance_dir_t;

    typedef enum {
        dms_DIRECTORY,
        dms_LEAF,
        dms_NAME_STATUS
    } dms_select_t;

    typedef union dms_instance_data switch (dms_select_t data_type) {
        case dms_DIRECTORY:   dms_instance_dir_t*  directory;
        case dms_LEAF:        dms_instance_leaf_t* leaf;
        case dms_NAME_STATUS: error_status_t       status;
    } dms_instance_data_t;

    typedef struct dms_instance_node {
        dms_string_t*       name;
        dms_datum_t*        alternate_name;
        dms_instance_data_t data;
    } dms_instance_node_t;
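To illustrate how a PMA might consume these structures, the following sketch recursively prints a returned registry subtree. It is written against the IDL field names above; the exact C binding of the encapsulated union (typically wrapped in a `tagged_union' member by DCE stubs) is implementation-dependent, and the cast of `dms_string_t' to `char *' is an assumption.

    #include <stdio.h>

    void print_registry(dms_instance_dir_t *dir, int depth)
    {
        unsigned long i;

        for (i = 0; i < dir->count; i++) {
            dms_instance_node_t *node = dir->children[i];

            printf("%*s%s\n", depth * 2, "", (char *) node->name);

            switch (node->data.data_type) {
            case dms_DIRECTORY:       /* descend into the subtree */
                print_registry(node->data.directory, depth + 1);
                break;
            case dms_LEAF:            /* sensors live at this level */
                /* node->data.leaf->sensors[0 .. count-1] */
                break;
            case dms_NAME_STATUS:     /* entry could not be supplied */
                break;
            }
        }
    }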
The naming data structure is illustrated in Figure 7.

[Figure not available in ASCII version of this document.]

*Figure 7.* Sensor naming data structure. This example uses the parameters defined in the function `dms_npmi_get_registry()', and shows the structures supporting the names `root/dce/...' and `root/dfs/...', where "root" refers to the local network node where the NPCS resides. The depth parameter limits searches of subtrees.

7.3.3. Sensor configuration data structures

Sensor configuration data is returned in the `sensor_config_list' of type `dms_configs_t'.

    const unsigned long dms_NO_METRIC_COLLECTION = 0;
    const unsigned long dms_THRESHOLD_CHECKING   = 0x00000001;
    const unsigned long dms_COLLECT_MIN_MAX      = 0x00000002;
    const unsigned long dms_COLLECT_TOTAL        = 0x00000004;
    const unsigned long dms_COLLECT_COUNT        = 0x00000008;
    const unsigned long dms_COLLECT_SUM_SQUARES  = 0x00000010;
    const unsigned long dms_COLLECT_SUM_CUBES    = 0x00000020;
    const unsigned long dms_COLLECT_SUM_X_TO_4TH = 0x00000040;
    const unsigned long dms_CUSTOM_INFO_SET      = 0x80000000;

    typedef unsigned long dms_info_set_t;

    typedef struct dms_threshold_values {
        dms_datum_t lower_value;
        dms_datum_t upper_value;
    } dms_threshold_values_t;

    typedef union dms_threshold switch (boolean have_values) {
        case TRUE:  dms_threshold_values_t values;
        case FALSE: ;
    } dms_threshold_t;

    typedef struct dms_config {
        dms_sensor_id_t  sensor_id;
        dms_timevalue_t  reporting_interval;  /* 0 == infinite */
        dms_info_set_t   info_set;
        dms_threshold_t* threshold;
        error_status_t   status;
    } dms_config_t;

    typedef struct dms_configs {
        unsigned long count;
        [size_is(count)] dms_config_t config[];
    } dms_configs_t;

7.3.4. DMS binding data structures

Several "handles" are defined to bind elements, speed up searching and decrease communication costs.

(a) *Sensor ID* -- To speed up searching for sensors in the NPCS registries, the specification defines a "handle", i.e., a shorthand, 32-bit reference that is unique per NPCS (and hence per node). This handle is called a sensor ID, and it is assigned by the NPCS at the time of initial sensor registration. This same handle is then provided to the PMA for its use.

(b) *Process index* -- Shorthand provided by NPCS to speed observer/NPCS communication. Allows NPCS to search for all sensors for a particular process identifier (PID).

(c) *NPCS index* -- Shorthand provided by PMA to speed PMA/NPCS communication. Allows PMA to rapidly identify sensor data reported by a particular NPCS.

(d) *PMA index* -- Shorthand provided by NPCS to speed PMA/NPCS communication. Allows NPCS to rapidly identify requests of a particular PMA.

    /* This interface defines the data structures used to represent
     * relationships between entities (sensors, processes, nodes)
     * within DMS.  Some are transparent, meaning that a user of
     * that structure can manipulate its contents.  Some are
     * opaque, meaning that only the creating entity can manipulate
     * its contents.
     */

    /* TRANSPARENT BINDING TYPES */

    typedef [string] unsigned char dms_string_t[];
    typedef unsigned long dms_protect_level_t;   /* see rpc.h */
    typedef [string] unsigned char dms_string_binding_t[];

    /* OPAQUE BINDING TYPES */

    typedef unsigned long dms_pma_index_t;
    typedef unsigned long dms_npcs_index_t;
    typedef unsigned long dms_process_index_t;
    typedef unsigned long dms_sensor_id_t;

    typedef struct dms_sensor_ids {
        unsigned long count;
        [size_is(count)] dms_sensor_id_t ids[];
    } dms_sensor_ids_t;

7.3.5. Sensor registry

The *sensor registry* contains descriptive information about the sensors located on a particular node. This registry is maintained by the NPCS. An entry contains:

(a) Sensor name (full sensor name in both string and OID format; this includes node, process, metric, and instance names).

(b) Sensor help text that describes the collected metric.

There is no explicit interface for obtaining modifications to the sensor registry. The PMA must periodically request the sensors of interest and compare the results with previous requests.

7.3.6. Sensor configuration registry

A *configuration registry* contains configuration state about the sensors located on a particular node. This registry is maintained by the NPCS. An entry contains:

(a) Sensor name (full sensor name in both string and OID format; this includes node, process, metric, and instance names).

(b) Sensor information set.

(c) Sensor threshold values.

(d) Sensor summarization interval.

This may be combined with the sensor registry within the NPCS. There is no explicit interface for obtaining modifications to the sensor configuration registry.

7.3.7. Sensor and metric attributes

There are several sensor and metric attributes. These include:

(a) Threshold.
(b) Units (e.g., kilobytes, seconds, etc.).
(c) Metric identifier.
(d) Metric name.
(e) Help text.
(f) Information sets supported.
(g) Sensor value subcomponent.

    typedef enum {
        dms_METRIC_ID,
        dms_METRIC_DATUM_TYPE,
        dms_DATA_LENGTH,
        dms_METRIC_TYPE,
        dms_METRIC_NAME_INDEX,
        dms_HELP_TEXT_INDEX,
        dms_INFO_SET_SUPPORT,
        dms_SENSOR_UNITS,
        dms_LAST_ATTRIBUTE   /* this should remain last */
    } dms_attribute_t;

Runtime behavior for sensor value subcomponent attributes is described below:

(a) Minimum and maximum are RESET for each reporting interval.

(b) Counters and timers are accumulated continuously.

(c) Thresholds can support above, below, or a range of values to check against. Since the NPCS performs this test, multiple threshold values can be set for each sensor.

7.3.8. OSF global sensor registry

OSF must maintain a *global sensor registry* similar to the IETF SNMP registry [Rose], allowing vendors to provide globally known metrics and sensors but preserving local (vendor) autonomy and number assignment. This registry should be divided into domains analogous to the sensor naming described in section 5.1, to ease administration and interpretation of the sensors.

These "official" sensors are registered within the CDS when the DCE cell is brought up, and updates are registered as new versions of DCE are started within the cell. A user branch must be available in the global sensor registry so that application developers may place well-known metrics and sensors there.
An experimental branch should be supported, to be used however deemed appropriate in each cell. The specification proposes that this registry have the following tree structure (note that each entry level listed below represents a subdirectory; object identifiers are shown in parentheses following names):

(a) internet (1)

    (i) osf (5)

        [a] dce (1)
        [b] dfs (2)
        [c] security (3)
        [d] cds (4)
        [e] user (5)
        [f] experimental (6)
        [g] vendor (7)

            [i] digital (1)
            [ii] gradient (2)
            [iii] hp (3)
            [iv] hitachi (4)
            [v] ibm (5)
            [vi] informix (6)
            [vii] microsoft (7)
            [viii] novell (8)
            [ix] oracle (9)
            [x] sun (10)
            [xi] transarc (11)

The above tree ignores the other branches already in use with the Internet SNMP community. We have added a branch for OSF with object identifier 5 (this value requires verification with the IETF). Under the OSF branch are several subtrees for various DCE services. The user branch is unique to each customer's cell, and contains the results of custom sensors registered by user applications as described in section 7.4. The experimental subtree is for temporary use within a cell. The vendor subtree allows vendors the autonomy to assign and manage their custom sensors without requiring intervention from OSF. These vendor sensors must be registered within the cell in the same way as user custom sensors. The OSF needs to work with the Internet Assigned Numbers Authority to register sensors and attributes.

7.4. Storing Custom Sensor Attributes in a Global Repository

Custom sensor attributes must be registered and stored in the CDS so that they are available to all PMAs in the cell. This specification recommends that they be stored in the CDS with the form:

    /.:/dms/sensors/<domain>

where `<domain>' is one of `dce', `dfs', `security', `cds', `user', `experimental' or `vendor'.

7.5. Security

It is a requirement to provide secure network transmission of performance data if mandated by local administrative policies. This allows protection against unauthorized users obtaining cleartext names of server processes, interfaces, operations or binding handles; falsifying client or server identities; or modifying transported data.

What are the implications for the four interfaces defined here? The two control interfaces, NPMI and PMI, must be protected by access control to ensure that configuration data is modified only by those with proper authorization. The two data transport interfaces, PRI and NPRI, must be free from eavesdropping. This specification assumes that intra-node communication via the PMI and PRI is secured by the host OS or the communication mechanism used. Consequently, it is not addressed further here.

To ensure that clients and servers are authentic, this specification recommends the creation of a new DCE security group, `perf_admin', and the enrollment of each host in this group. Principals for this group must be added to the security registry, and both the PMA and NPCS must log in and execute as one of the principals (refreshing credentials programmatically as necessary). The host key is already available on the node and is automatically changed every 30 minutes. The benefit of making `perf_admin' a group is that the performance principal on each host (node) can change passwords independently of other hosts (nodes). The NPCS must be able to execute as the owner of the performance principal's keytab file.
Running the NPCS as root would allow it to assume the identity of the host; however, this specification does not recommend that the NPCS run as root, but rather under a separate identity with sufficient capabilities to utilize the DCE security services. This does not solve the problem of users who can become root on a local host, and thereby become a member of the `perf_admin' group. Implementations of the measurement system should not preclude an extension supporting several performance administration groups to address this security hole, when needed in "hostile" environments.

Authorization must be handled through the use of a reference monitor hard-coded into the manager routines of the NPMI and NPRI. The security policy enforced via this reference monitor is that clients with the `perf_admin' principal identity are authorized to invoke an NPMI or NPRI function. Client requests with any other principal identity should be rejected. This reference monitor is universally enforced across all functions of the NPMI and NPRI. (It is possible to create an ACL manager that provides a much richer set of authorization capabilities, but that is beyond the scope of this version of the specification.)

The reference monitor does not require support from IDL parameters, since the reference monitor code obtains security information directly from the local RTL prior to processing the NPMI or NPRI function. (Note that the X/Open DCI uses a security key as a parameter. The PMI and PRI routines do not explicitly refer to this parameter, since it is an implementation detail encapsulated by the PMI and PRI, and should be transparent to the calling process.)
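A minimal sketch of such a reference monitor check, placed at the top of each NPMI/NPRI manager routine, follows. Only the RTL inquiry shown (`rpc_binding_inq_auth_client()') is standard DCE; the policy details are assumptions, and the `perf_admin' group-membership test is elided.

    #include <dce/rpc.h>

    /* Returns 1 if the calling client may proceed, 0 otherwise. */
    static int caller_is_perf_admin(handle_t h)
    {
        rpc_authz_handle_t privs;
        unsigned_char_t    *server_princ;
        unsigned32         protect_level, authn_svc, authz_svc, st;

        /* Obtain the caller's security information directly from
         * the local RTL, as described above. */
        rpc_binding_inq_auth_client(h, &privs, &server_princ,
                                    &protect_level, &authn_svc,
                                    &authz_svc, &st);
        if (st != rpc_s_ok || authn_svc == rpc_c_authn_none)
            return 0;    /* unauthenticated request: reject */

        /* `privs' carries the caller's credentials (e.g., a PAC);
         * the `perf_admin' membership test would be applied here. */
        return 1;
    }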
Authenticated RPCs are used to address eavesdropping. Parameters in string form can appear for both NPMI and NPRI functions. The RPC data protection level is specified by the PMA when it first registers with the NPCS. Because all NPCSs may not support the same maximum protection level (for example, some data encryption algorithms may not be available world-wide due to international export laws), the NPCS responds to the PMA request with the actual protection level that it can support. The PMA may unregister from this NPCS if the actual protection level is insufficient. The actual protection level can be set during sensor registration by specifying a minimum data protection level. This allows application developers and system managers to jointly specify the data protection level on an application basis if necessary. The policy enforced by the NPCS is the maximum of the PMA request and the sensor specified. The NPCS may also refuse service to a PMA that does not meet its minimum security requirements.

The use of a keytab file is also required (to hold the encryption key) for authenticated RPC, and implies that the NPCS executes with a dedicated user identifier to protect the keytab from unauthorized users. Although not recommended, unauthenticated RPC requests can be optionally supported by an NPCS on an implementation-dependent basis (enabling this requires a configuration or command-line parameter).

The security policy outlined here does not prevent a PMA from accessing another PMA's NPRI interface. Since this is an interface for trusted users (i.e., the `perf_admin' principal), it is expected that PMA developers will not invoke another PMA's NPRI. PMAs that support cross-cell monitoring must use cross-cell authentication mechanisms prior to contacting an NPCS in a separate cell.

7.6. Error Conditions

Errors are described for each of the four APIs. Error conditions are returned in the `error_status_t' function return parameter. A general engineering philosophy is that error conditions should not be used to convey non-error-related state. This will assure efficient use of exception handling code for future implementations that decide to use C++. These function errors are described in detail in appendix I.

7.7. DMS Naming Convention

The following naming conventions are used in this specification:

(a) APIs are prefaced with the lower-case acronym of the distributed measurement system (DMS) concatenated with the interface name; e.g., `dms_pmi_'.

(b) API names use verbs and nouns separated by underscores; e.g., `dms_pmi_get_sensor_data()'.

(c) API names use the SNMP GET and SET verbs when applicable. This specification uses the verb REPORT for those interfaces that are push-based.

(d) Parameter names are in lower-case, separated by underscores. Names should make it clear whether a variable is a value or a pointer to a value, by using the suffix `_p' for pointers. Type names should end with the suffix `_t'. String names will end with the suffix `_str'.

7.8. API Description Format

The next four sections describe the standard APIs:

(a) The NPMI is described in section 8.
(b) The NPRI is described in section 9.
(c) The PMI is described in section 10.
(d) The PRI is described in section 11.

Each of the functions is described with the following format:

(a) The description provides a programmer's overview of the function's actions.

(b) The IDL provides the function's input and output parameters and types.

(c) The function input briefly describes each input parameter and its use (see section 7.3 for details on primary data structures).

(d) The function output briefly describes each output parameter and its use (see section 7.3 for details on primary data structures).

(e) The possible errors are summarized with a likely cause identified.

(f) The engineering notes provide explicit recommendations to the implementor (and not the user) of the function.

8. NPMI INTERFACE

The NPMI and NPRI interfaces are used by the PMAs to access and control sensors on any node in a DCE cell. The NPMI is supplied by the NPCS on each node. The NPRI is an optional, although recommended, interface provided by the PMA. The NPMI is described in this section, and the NPRI in section 9.

The NPMI interface provides each PMA with its own view of the sensors on a node in the DCE environment. Each PMA communicates with the NPCS to arrange delivery of sensor data via the NPMI or NPRI interfaces. The NPMI interface requires that PMAs explicitly discover and enable (configure) sensors, and then receive changed sensor data as it is pushed to them by the NPCS via the PMA's NPRI server interface. Specifically, the NPMI supports registering and unregistering PMAs interested in local sensors, getting and setting sensor configuration, and getting sensor data in a polled manner.

The NPMI is an RPC interface that is exported by the NPCS. Since this interface is accessed over the network, a non-RPC implementation is not recommended for security reasons.
The NPMI functions pass parameters that local system administration policies may require to be protected from reading or modification over a network. Therefore, the use of RPC data protection is supported for all NPMI functions (except for the initial act of registering a PMA).

Figure 8 illustrates the relationship between the physical sensors in an instrumented process and the PMA's logical view of sensors that is supported through the NPMI. Sensors are located in distinct processes and communicate with the NPCS via the observer. Each PMA, however, is only aware of the NPCS and sensors; the observer is transparent to the PMA.

[Figure not available in ASCII version of this document.]

*Figure 8.* PMA versus NPCS view of sensors.

A PMA's view of a sensor is limited to its own configuration request. The NPCS maintains the configuration state of all sensors on its node for all interested PMAs. In this example there are four sensors: _s1_, _s2_, _s3_, _s4_, and three PMAs: _PMA1_, _PMA2_, _PMA3_. For sensor _s1_, _PMA1_ and _PMA3_ have it enabled, while _PMA2_ does not. Similarly, for sensor _s2_, _PMA1_ does not have it enabled, while _PMA2_ and _PMA3_ do. The observer in each process (_obs1_ and _obs2_) controls requests and data flowing between the NPCS and the sensors.

8.1. NPMI IDL

The complete IDL file is provided in appendix E.

8.2. dms_npmi_register_pma()

8.2.1. Description

This interface is provided by the NPCS to allow PMAs to establish a connection. A PMA uses this interface to register its existence, the binding handle of its NPRI, and to establish data protection levels. Any PMA that requests a greater protection level than the NPCS can grant will have to decide whether to continue (see `granted_protect' below). The protection level will be applied to parameters of all function calls and to ALL sensor data transported from this node to the PMA via the NPRI. This may cause excessive overhead, so it should be used with caution.

If a new instrumented process begins execution and requires a higher protection level than that in place when a PMA previously registered with the NPCS, then the NPCS must not make any of that process's sensor data available to the PMA until the PMA re-registers with the proper protection level.

8.2.2. Function signature

    error_status_t dms_npmi_register_pma (
        [in ]    handle_t              handle,
        [in,ptr] dms_string_binding_t* npri_binding,
                                       /* null == client-only PMA */
        [in ]    dms_npcs_index_t      npcs_index,
        [in ]    dms_protect_level_t   requested_protect,
        [ out]   dms_pma_index_t*      pma_index,
        [ out]   dms_protect_level_t*  granted_protect
    );

8.2.3. Function input

(a) `handle' -- RPC binding handle of NPMI.

(b) `npri_binding' -- Pointer to a string binding handle of the PMA's NPRI interface. If this is NULL, then the PMA does not support an NPRI.

(c) `npcs_index' -- Unique identifier assigned by the PMA that provides a shorthand for future NPCS-to-PMA communication.

(d) `requested_protect' -- The PMA's requested level of RPC data protection for use in subsequent NPMI calls, or when data is returned via NPRI functions.
8.2.4. Function output

(a) `pma_index' -- Unique identifier assigned by the NPCS that provides a shorthand for future PMA-to-NPCS communication.

(b) `granted_protect' -- The NPCS's granted level of RPC data protection used by the NPCS when returning data via NPRI functions, or for subsequent NPMI functions. It might not be the same as that requested by the PMA. It is established by the system manager at NPCS execution time.

(c) _Function return value_ -- `dms_status' -- Status of call; non-zero if call encountered an error.

8.2.5. Errors

(a) `REGISTER_FAILED' -- NPCS unable to complete registration.

(b) `ALREADY_REGISTERED' -- PMA previously registered.

(c) `PROTECT_LEVEL_NOT_SUPPORTED' -- Requested data protection level not supported; `granted_protect' will be used.

(d) `ILLEGAL_BINDING' -- Binding handle illegal.

8.2.6. Engineering notes

(a) Datagram RPC communication to the NPRI interface is recommended. This eliminates the overhead of TCP/IP connection setup/teardown for infrequent communication. The rest of the infrastructure has been designed to minimize the effects of lost packets, should they occur.

(b) The PMA client code must inform its NPRI server of the granted data protection level used by the NPCS for subsequent NPRI invocations. The reference monitor of the NPRI controls whether requests with the granted data protection level specified by the NPCS are acceptable, based on its supported minimum protection level.

(c) It is not necessary for the NPCS to register the NPMI in the CDS. Instead, the UUID can be converted to a string, concatenated with the NPCS node IP address to form a string binding, and a call made that the `dced' endpoint mapper will deliver. The UUID of the NPMI is specified in section 8.1.

(d) The use of context handles between the PMA and NPCS is not recommended, because some PMAs will be client-only or single-threaded, and the amount of "still alive" traffic between the RTLs must be minimized. The failure modes and recovery actions described in section 13.8 should be implemented instead of context handles.

(e) It is possible for a PMA to register multiple times with the same NPCS. This allows the PMA to support different NPRI interfaces in the same or different processes. The NPCS should return a unique `pma_index' for each of these registrations.

(f) This function supports non-idempotent semantics.
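For illustration, a PMA registration might look like the following sketch; the string binding, the `npcs_index' value, and the requested protection level are placeholders, and error handling is reduced to the protection-level decision described above.

    /* Hypothetical NPRI string binding for this PMA. */
    static unsigned char npri_binding[] = "ncadg_ip_udp:10.0.0.1[5001]";

    void pma_register_example(handle_t npmi_h)
    {
        dms_pma_index_t     pma_index;
        dms_protect_level_t granted;
        error_status_t      st;

        st = dms_npmi_register_pma(
                 npmi_h,
                 (dms_string_binding_t *) npri_binding,
                 42,                              /* npcs_index chosen by PMA */
                 rpc_c_protect_level_pkt_integ,   /* requested protection */
                 &pma_index,
                 &granted);

        if (st != 0) {
            /* e.g., PROTECT_LEVEL_NOT_SUPPORTED: `granted' holds the
             * level the NPCS will actually use, and the PMA must
             * decide whether to continue or to unregister. */
        }
    }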
Any more generalized query processing is delegated to the PMA, which must then translate its queries into requests to this function. If the requested `depth_limit' is greater than the implicit `depth_limit' of the `request_list', then this function returns the sensors at a depth equal to that of the `request_list'. Otherwise, only the requested `depth_limit' of the registry is returned. Requests can only be made with string sensor instance names.

8.3.2. Function signature

    error_status_t dms_npmi_get_registry (
        [in ]    handle_t             handle,
        [in ]    dms_pma_index_t      pma_index,
        [in,ptr] dms_name_nodes_t*    request_list,
                                      /*null == entire registry*/
        [in ]    long                 depth_limit,  /*0 == infinity*/
        [ out]   dms_instance_dir_t** registry_list
    );

8.3.3. Function input

(a) `handle' -- RPC binding handle of NPMI.

(b) `pma_index' -- Unique identifier assigned by the NPCS that provides a shorthand for NPCS-to-PMA communication. This also provides a test to determine whether the NPCS has terminated and restarted since the last `dms_npmi_get_registry()' call, because a new NPCS won't know this value.

(c) `request_list' -- A pointer to a tree of sensor names that the PMA is interested in. This parameter uses a tree structure that contains one or more subtrees. If the pointer is NULL, then the entire registry is returned.

(d) `depth_limit' -- This limits the search depth, and consequently the number of subtrees, returned by the NPCS. This value is the number of nodes starting with the "root" node of the NPCS sensor registry. If this value is 0, then all subtrees are returned.

8.3.4. Function output

(a) `registry_list' -- Registry data for one or more sensors that satisfy the `request_list'. The sensor identifiers contained within this structure are used by the PMA for subsequent configuration actions, and to identify sensor data reported via the NPRI.

(b) _Function return value_ -- `dms_status' -- Status of call; non-zero if call encountered an error.

8.3.5. Errors

(a) `UNKNOWN_PMA' -- PMA not registered.

(b) `UNKNOWN_SENSOR' -- One or more sensors included in the `request_list' were not registered.

8.3.6. Engineering notes

(a) Since RPC parameters include sensor names, this interface must have the option of supporting RPC data protection. This is accomplished via the `dms_npmi_register_pma()' call.

(b) If a DCE service-oriented view of a process name is used (e.g., `/.:/sec'), then the PMA must translate this to a legal sensor name before contacting the NPCS.

(c) This function supports idempotent semantics.

8.4. dms_npmi_set_sensor_config()

8.4.1. Description

This interface is provided by the NPCS to allow PMAs to configure which sensor metric components to collect, and the reporting frequency. This view of the sensor is unique to each requesting PMA (`pma_index'), and conflicts, if any, are arbitrated by the NPCS.

Requested configuration changes are set on a sensor-by-sensor basis. A list of `sensor_configs' is used to request configuration, and to return configuration status. Only sensors that could not be set to the requested configuration state are returned, along with their current configuration state. If a sensor cannot be set to one or more of the requested parameters, then no configuration changes are made to that sensor. No sensor data will be reported for sensors that were not successfully configured. The PMA must re-invoke this function with acceptable configuration parameters before data will be returned for such a sensor.

The PMA also uses this function to disable sensors it is no longer interested in collecting data on. It does this by providing a list of sensors in `sensor_configs' with the `info_set' value set to 0. There is no explicit support in this specification for getting sensor configuration data, since this function can satisfy this need.

8.4.2. Function signature

    error_status_t dms_npmi_set_sensor_config (
        [in ]    handle_t        handle,
        [in ]    dms_pma_index_t pma_index,
        [in,out] dms_configs_t** sensor_configs
    );

8.4.3. Function input

(a) `handle' -- RPC binding handle of NPMI.

(b) `pma_index' -- Unique identifier assigned by the NPCS that provides a shorthand for NPCS-to-PMA communication.

(c) `sensor_configs' -- A list of sensor identifiers and configuration state that the PMA is interested in.

8.4.4. Function output

(a) `sensor_configs' -- A list of sensor identifiers, status of configuration request, and configuration state returned by the NPCS. Only sensors that could not be configured as requested are returned in this structure.

(b) _Function return value_ -- `dms_status' -- Status of call; non-zero if call encountered an error.

8.4.5. Errors

(a) `UNKNOWN_PMA' -- PMA not registered.

(b) `UNKNOWN_SENSOR' -- One or more sensors included in `sensor_configs' were not registered.

(c) `NO_SENSOR_REQUESTED' -- `sensor_configs' contains no sensors.

(d) `FUNCTION_FAILED' -- The set operation failed due to one or more specified parameters conflicting with a previous request. No sensor configuration modifications were made.

(e) `UNKNOWN_INFO_SET' -- Information set level out of range.

(f) `UNKNOWN_THRESHOLD_LEVEL' -- Threshold level out of range.

8.4.6. Engineering notes

(a) The NPCS must arbitrate conflicting PMA requests for reporting interval, sensor information sets, and sensor threshold values, as described in section 12.2 on NPCS functions.

(b) No partial sensor configuration changes are supported. If a sensor cannot be set to all requested configuration values, then NONE of them will be set (i.e., leave sensor state unchanged).

(c) This function does not support idempotent semantics, since sensor registry changes may occur during a requested set operation.

8.5. dms_npmi_get_sensor_data()

8.5.1. Description

This interface is provided by the NPCS to permit a poll of metric data without waiting for the next reporting interval. The sensor data is returned as an [out] parameter of the RPC. Users of this interface include SNMP agents, PMAs with a monitoring policy of an occasional "one-shot" request, client-only PMAs, and special monitors for benchmarking or load-balancing that capture state before and after a workload's execution.

To access the current content of a sensor, set the `bypass_cache' flag to TRUE. This forces the NPCS to collect the requested sensor data by invoking `dms_pmi_get_sensor_data()' for each requested process. This provides current sensor data, but is very costly. When the flag is FALSE, the NPCS returns the latest complete version of sensor data from its internal cache. The NPCS never returns data from a "partial interval", only the latest complete interval. This is much more efficient, but may provide "old" sensor data, depending on the sensor reporting interval.
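As an illustration, the following minimal sketch shows a client-only PMA performing a cached one-shot poll (`bypass_cache' == FALSE). The header name and the provenance of `sensor_id_list' are assumptions of this sketch; the call signature is that of section 8.5.2.

    #include "dms_npmi.h"   /* assumed header generated from the NPMI IDL */

    /* One-shot poll of previously discovered sensors; sensor_id_list is
     * assumed to have been filled in from dms_npmi_get_registry(). */
    error_status_t poll_cached_data(handle_t npmi_handle,
                                    dms_pma_index_t pma_index,
                                    dms_sensor_ids_t *sensor_id_list)
    {
        dms_observations_data_t *sensor_data = NULL;
        boolean                  bypass_cache = 0;  /* FALSE: use cache */
        error_status_t           status;

        /* With bypass_cache == FALSE the NPCS returns the latest complete
         * interval from its cache rather than disturbing the sensors. */
        status = dms_npmi_get_sensor_data(npmi_handle, pma_index,
                                          sensor_id_list, bypass_cache,
                                          &sensor_data);
        if (status == 0 && sensor_data != NULL) {
            /* ... consume sensor_data, then release it via the
             * implementation's mechanism for RPC output parameters ... */
        }
        return status;
    }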
If the "bypass_cache" flag is TRUE, then this function has the side- effects of resetting all sensor minimum and maximum values. This is because the action of a poll, by definition, results in the termination of the current summarization interval. The observer's next scheduled reporting interval, if there is one, is not affected. To prevent these side-effects from affecting other PMAs that receive this data, a PMA using this function must first set the sensor reporting interval to `NO_REPORT_INTERVAL'. This interval value is also used by the NPCS to ensure that only one PMA in the cell can access this sensor using this function, since this mode assumes that only one PMA "owns" the sensor and wants no interference from other PMA requests. All other PMAs are then prevented from modifying the sensors configuration, although they can access its data. These side-effects do not occur if the "bypass_cache" flag is FALSE. This get operation will fail if the PMA has not previously registered and set the sensor configuration correctly. In this failing case, a NULL list of `sensor_data' is returned. The use of this "polling" interface is discouraged, since it requires significant network bandwidth. 8.5.2. Function signature Friedrich, Saunders, Zaidenweber, Bachmann, Blumson Page 65 OSF-RFC 33.0 DCE Performance Instrumentation July 1995 error_status_t dms_npmi_get_sensor_data ( [in ] handle_t handle, [in ] dms_pma_index_t pma_index, [in ] dms_sensor_ids_t* sensor_id_list, [in ] boolean bypass_cache, [ out] dms_observations_data_t** sensor_data ); 8.5.3. Function input (a) `handle' -- RPC binding handle of NPMI. (b) `pma_index' -- Unique identifier assigned by NPCS that provides a shorthand for NPCS-to-PMA communication; this handle is NULL for client only PMAs. (c) `sensor_id_list' -- A list of sensor identifiers that the PMA is interested in. (d) `bypass_cache' -- A flag that when TRUE forces the NPCS to collect requested sensor data directly from each sensor. This provides current sensor data, but is very costly. When the flag is FALSE, the NPCS returns the latest version of sensor data from the NPCS internal cache. This is much more efficient, but may provide "old" sensor data depending on the sensor reporting interval. 8.5.4. Function output (a) `sensor_data' -- One or more sensor identifiers and corresponding data are returned. (b) _Function return value' -- `dms_status' -- Status of call; non-zero if call encountered an error 8.5.5. Errors (a) `UNKNOWN_PMA' -- PMA not registered. (b) `UNKNOWN_SENSOR' -- One or more sensors included in `sensor_list' were not registered. (c) `NO_SENSOR_REQUESTED' -- `sensor_list' contained no sensors. (d) `BYPASS_NOT_ALLOWED' -- Sensor configuration does not allow cache bypass, due to conflict with another PMA. Friedrich, Saunders, Zaidenweber, Bachmann, Blumson Page 66 OSF-RFC 33.0 DCE Performance Instrumentation July 1995 8.5.6. Engineering notes (a) Since RPC parameters include sensor data, this interface must have the option of supporting RPC data protection. This RPC data protection level was set via the `dms_npmi_register_pma()' call. (b) This interfaces does not support idempotent semantics. 8.6. dms_npmi_unregister_pma() 8.6.1. Description This interface is provided by a NPCS to break the connection between a PMA and a NPCS, and free up NPCS resources. All sensors that have been configured by this PMA are disabled if the NPCS arbitration rules permit. PMAs use this interface to permanently break a connection. 
There is no support in this specification for a PMA temporarily suspending a connection. Client-only PMAs (COPs) must use this interface to minimize resources unnecessarily consumed by the NPCS. The NPCS will maintain COP requests for a maximum interval of one between COP requests for getting sensor data. 8.6.2. Function signature error_status_t dms_npmi_unregister_pma ( [in ] handle_t handle, [in ] dms_pma_index_t pma_index ); 8.6.3. Function input (a) `handle' -- RPC binding handle of NPMI. (b) `pma_index' -- Unique identifier assigned by NPCS that provides a shorthand for NPCS-to-PMA communication. 8.6.4. Function output (a) _Function return value_ -- `dms_status' -- Status of call; non-zero if call encountered an error. 8.6.5. Errors (a) `UNKNOWN_PMA' -- PMA not registered. Friedrich, Saunders, Zaidenweber, Bachmann, Blumson Page 67 OSF-RFC 33.0 DCE Performance Instrumentation July 1995 8.6.6. Engineering notes (a) Must use the `granted_protect' returned in `dms_npmi_register_pma()' call -- this may cause problems for international users whose PMAs and NPCS are in different countries with different export controls on the use of authenticated RPC. This issue is beyond the scope of this RFC. (b) All unregister requests result in the NPCS freeing up resources and re-setting sensors to a quiescent state wherever that does not conflict with other PMA requests. (c) The NPCS should conduct a sanity check on the RPC binding handle (using string binding conversion), to disallow PMA1 from unregistering an NPCS request of PMA2. (d) This interface does not support idempotent semantics. 9. NPRI INTERFACE The NPRI's primary purpose is to provide a data transport channel so that a PMA can receive sensor data from an NPCS without the need to poll for each update. Specifically, this interface supports network reporting of a node's sensor data. All PMAs must implement this interface to receive data from an NPCS without the need to poll for it. However, a polling interface, `dms_npmi_get_sensor_data()', is provided by the NPMI for simple or client-only PMAs (COPs). All other state information about NPCS and sensors is obtained explicitly by invoking the NPMI routines. To simplify the design the NPCS does not notify the PMA of changes in sensor or NPCS state. The NPRI is an RPC interface that is a part of the PMA. Since this interface is accessed over the network a non-RPC implementation is not recommended, due to security issues. The PMA sets the data protection level of this interface in the `dms_npmi_register_pma()' call. 9.1. NPRI IDL The complete IDL is located in appendix F. 9.2. dms_npri_report_sensor_data() 9.2.1. Description This interface is provided by the PMAs to assimilate updated sensor metric components without the need for polling. All sensor data that has changed within the last reporting interval is packaged together by the NPCS and reported in a single report. Friedrich, Saunders, Zaidenweber, Bachmann, Blumson Page 68 OSF-RFC 33.0 DCE Performance Instrumentation July 1995 The state diagram in Figure 9 illustrates when data is pushed from the NPCS to the PMA. All state transitions occur only at PMA- specified reporting interval boundaries, with the exception of the reconfiguration state transition, which occurs asynchronously with respect to reporting intervals. The nesting of state indicates a separate state machine for each PMA's view of a sensor configured. 
# ####### ### ##### # # ###### ####### # # # # # # # # # # # # # # # # # # # # # # # ##### # # #### # # ###### ##### # # # # # # # # # # # # # # # # # # # # # # # # # ### ##### ##### # # ####### # [Figure not available in ASCII version of this document.] *Figure 9.* `dms_npri_report_sensor_data()' sensor state machine. Sensor data is pushed to the PMA by the NPCS only if it was modified during the current reporting interval. This call requires the PMA to have previously registered with the NPCS, and provided a binding to its NPRI interface. Data will not flow to the NPRI until the PMA enables sensors using the `dms_npmi_set_sensor_config()' function. The NPCS will return a NULL sensor data list of there is no sensor data to report for this interval. This serves as a "still-alive" message to the PMA during periods of application (and hence sensor) inactivity or when no thresholds were exceeded. 9.2.2. Function signature error_status_t dms_npri_report_sensor_data ( [in ] handle_t handle, [in ] dms_npcs_index_t npcs_index, [in,ptr] dms_observations_data_t* sensor_data /*null == keep-alive*/ ); 9.2.3. Function input (a) `handle' -- The RPC binding handle of the NPRI. (b) `npcs_index' -- Unique identifier assigned by PMA by function `dms_npmi_register_pma()' that provides a shorthand for NPCS- to-PMA communication. (c) `sensor_data' -- A structure containing one or more sensors and the data components as configured by this PMA. See section 7.3.1 for details. May be NULL if no sensor data to report in this interval. Friedrich, Saunders, Zaidenweber, Bachmann, Blumson Page 69 OSF-RFC 33.0 DCE Performance Instrumentation July 1995 9.2.4. Function output (a) _Function return value_ -- `dms_status' -- Status of call; non-zero if call encountered an error. 9.2.5. Errors (a) `UNKNOWN_SENSOR' -- Reported sensor not requested by PMA; PMA should call `dms_npmi_set_sensor_config()' and disable this sensor. (b) `UNKNOWN_NPCS' -- Reporting NPCS not recognized; PMA should re-register with this NPCS to reestablish a valid `npcs_index'. 9.2.6. Engineering notes (a) NPRI routine not called for PMAs that register a NULL `npri_binding' handle in `dms_npmi_register_pma'. (b) Our philosophy is to minimize the data sent across the network; consequently, the NPCS maintains a directory of sensor configurations by PMAs, and only sends requested sensors with requested configurations. (c) This call returns a synchronous output, so that the NPCS can determine if the PMA is still executing. If the call times out, then the NPCS should restart, based on section 13.8. (d) For efficiency use idempotent RPC semantics for this call. 10. PMI INTERFACE The PMI and PRI are the two low-level interfaces. These interfaces are used by the observer and NPCS to control sensors and transmit state. These interfaces are provided by the DCE vendor and are transparent to the PMA developer. The PMI's primary purpose is to provide a control and access interface to sensors located within a process that supports DCE instrumented services. An NPCS uses the PMI routines to set sensor configuration state, get sensor data state, and initialize and terminate the connection to the NPCS. The PMI is implemented in the encapsulated library as described in section 7.2. The actual communication is implemented as either an RPC interface or as an implementation-specific IPC mechanism. The encapsulated library hides the actual communication mechanism from the programmer. 
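The following minimal sketch outlines how an NPCS might drive the PMI, assuming a C binding with the signatures given in this section. The `npcs_*' callback names are assumptions of this sketch; they stand for the NPCS's own implementations of the PRI procedures (section 11), held as pointers of the callback types from the PMI IDL.

    #include "dms_pmi.h"   /* assumed header for the PMI side of the
                            * encapsulated library (IDL in appendix G) */

    /* The NPCS's PRI implementations, declared with the typedef'd
     * callback types from the dms_pmi_el_initialize() signature. */
    extern dms_pri_reg_proc_fp_t     npcs_register_process;
    extern dms_pri_reg_sensor_fp_t   npcs_register_sensor;
    extern dms_pri_report_data_fp_t  npcs_report_sensor_data;
    extern dms_pri_unreg_sensor_fp_t npcs_unregister_sensor;
    extern dms_pri_unreg_proc_fp_t   npcs_unregister_process;

    void npcs_pmi_lifecycle(void)
    {
        error_status_t status;

        /* Open the communication path to instrumented processes and
         * hand the encapsulated library the NPCS's PRI callbacks. */
        status = dms_pmi_el_initialize(npcs_register_process,
                                       npcs_register_sensor,
                                       npcs_report_sensor_data,
                                       npcs_unregister_sensor,
                                       npcs_unregister_process);
        if (status != 0)
            return;

        /* ... service PMA requests: dms_pmi_set_sensor_config() to
         * apply configurations, dms_pmi_get_sensor_data() for cache-
         * bypass polls, dms_pmi_el_free_outputs() to release lists ... */

        /* Planned shutdown: return observers to a quiescent state. */
        (void) dms_pmi_terminate();
    }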
10.1. PMI IDL

The complete IDL is located in appendix G.

10.2. dms_pmi_el_initialize()

10.2.1. Description

This utility function is necessary to initialize the encapsulated library. It records the PRI procedures in private variables, and takes whatever steps are required to open a communication path for processes to communicate with the NPCS. The exact nature of these steps depends on the particular implementation of the PMI/PRI interface. Possibilities include, but are not limited to:

(a) Creating a FIFO of known name and opening it for reading.

(b) Calling `dciInitialize()'.

(c) Initializing a DCE RPC interface, and creating a talker thread that accepts the PRI procedures as RPCs.

(d) Creating a shared memory segment and initializing it with appropriate structures, creating the PRI talker thread to dequeue input messages from instrumented processes, and creating a semaphore to control access to the queue.

10.2.2. Function signature

    error_status_t dms_pmi_el_initialize (
        [in ] dms_pri_reg_proc_fp_t     pri_register_process,
        [in ] dms_pri_reg_sensor_fp_t   pri_register_sensor,
        [in ] dms_pri_report_data_fp_t  pri_report_sensor_data,
        [in ] dms_pri_unreg_sensor_fp_t pri_unregister_sensor,
        [in ] dms_pri_unreg_proc_fp_t   pri_unregister_process
    );

10.2.3. Function input

(a) `pri_register_process'

(b) `pri_register_sensor'

(c) `pri_report_sensor_data'

(d) `pri_unregister_sensor'

(e) `pri_unregister_process'

These are all callback (local) procedures exported by the NPCS, invoked by the encapsulated library whenever the corresponding PRI procedure is invoked by an instrumented process. These procedures have identical signatures to their corresponding PRI procedures.

10.2.4. Function output

(a) _Function return value_ -- `dms_status' -- Status of call; non-zero if call encountered an error.

10.2.5. Errors

(a) `FUNCTION_FAILED' -- Initialization function failed due to an internal encapsulated library error.

10.2.6. Engineering notes

(a) The details necessary to support the specific IPC mechanism are implementation-dependent, and transparent to this function.

10.3. dms_pmi_el_free_outputs()

10.3.1. Description

This utility function is necessary to free output data allocated by the encapsulated library. It encapsulates the RPC free-memory functions and eliminates possible memory leaks.

10.3.2. Function signature

    error_status_t dms_pmi_el_free_outputs (
        [in,ptr] dms_configs_t*          sensor_config_list,
                                         /*null == absent*/
        [in,ptr] dms_observation_data_t* sensor_report_list
                                         /*null == absent*/
    );

10.3.3. Function input

(a) `sensor_config_list' -- A pointer to the sensor configuration list whose allocated memory the programmer desires to free. Set this to NULL if no list is to be freed.

(b) `sensor_report_list' -- A pointer to the sensor reporting list whose allocated memory the programmer desires to free. Set this to NULL if no list is to be freed.

10.3.4. Function output

(a) _Function return value_ -- `dms_status' -- Status of call; non-zero if call encountered an error.

10.3.5. Errors

(a) `FUNCTION_FAILED' -- Free operation failed due to an internal encapsulated library error.

10.3.6. Engineering notes

(a) The details necessary to support the specific free-memory mechanisms are implementation-dependent, and transparent to this function.

10.4. dms_pmi_terminate()

10.4.1. Description

This function disconnects the NPCS from all registered observers, and is useful for planned shutdowns of the NPCS. The function undoes the actions of the `dms_pmi_el_initialize()' function. The specific actions are implementation-dependent. The observer's response to this request is to return all sensors to a quiescent state.

There is no comparable call from the NPMI, so a PMA cannot cause this action. This call should be supported via the normal DCE control programs (such as `dcecp').

10.4.2. Function signature

    error_status_t dms_pmi_terminate ( void );

10.4.3. Function input

None.

10.4.4. Function output

(a) _Function return value_ -- `dms_status' -- Status of call; non-zero if call encountered an error.

10.4.5. Errors

(a) `FUNCTION_FAILED' -- Terminate action failed due to an internal encapsulated library error.

10.4.6. Engineering notes

(a) The implementation-specific encapsulated library must provide a mechanism that ensures that an observer calling any PRI function prior to receiving the `dms_pmi_terminate()' call can determine that the NPCS has stopped execution, and should then invoke its internal clean-up routines.

10.5. dms_pmi_set_sensor_config()

10.5.1. Description

This interface is provided to select which metric components (information set, etc.) a sensor supplies, and the interval between sensor summarizing and reporting of those components. The NPCS uses this interface to set sensors on a per-process basis (i.e., for one observer at a time). Consequently, setting the sensors in _N_ processes requires _N_ invocations of this function (one call each to _N_ observers).

All requested operations are done on a sensor-by-sensor basis only, for sensors requested in the `sensor_config_list'. No global sensor configurations are supported.

This function does not return verification status about each sensor configured. It returns status only on sensors that were not modified. Sensors are never left in a "partially modified" state. If any of the requested configuration states for a sensor could not be applied, then none of that sensor's state is modified, and its current state is returned as function output with the appropriate error status. If "all or nothing" semantics are required across sensors, then the application must explicitly reset all sensors that were successfully set.

10.5.2. Function signature

    error_status_t dms_pmi_set_sensor_config (
        [in ]    dms_process_index_t process_index,
        [in,out] dms_configs_t**     sensor_config_list
    );

10.5.3. Function input

(a) `process_index' -- Shorthand provided by the NPCS via `dms_pri_register_process()'.

(b) `sensor_config_list' -- A list of sensor identifiers and requested configuration states.

10.5.4. Function output

(a) `sensor_config_list' -- A list of sensor identifiers and resulting configuration states for sensors that could NOT be set to the requested level.

(b) _Function return value_ -- `dms_status' -- Status of call; non-zero if call encountered an error.

10.5.5. Errors

(a) `CHECK_INTERNAL_STATUS' -- Sensor configuration not changed due to a non-existent sensor, an illegal request, or a previous state that is mutually exclusive with the requested state. The status of each failing sensor request is returned in the internal status fields of the `sensor_config_list'.

(b) Individual sensor errors are summarized in section 7.6.

10.5.6. Engineering notes

(a) The `process_index' is an input parameter for use by the encapsulated library to identify the requested observer.

(b) This function input can support setting sensors to a threshold level, even though this version of the specification requires that this be a function of the NPCS for standard sensors. However, custom sensors might support thresholds that the NPCS cannot. Consequently, if no threshold is settable on the sensor of interest, then return `NO_THOLD' as an error.

10.6. dms_pmi_get_sensor_data()

10.6.1. Description

This function is provided as a "polling" interface that obtains current sensor data as function output. The function returns data for each sensor requested, whether or not the sensor data has changed in the last interval. A timestamp is also returned so that this data can be correlated with other measurements in the cell. This function is not directly callable by a PMA, but is only invoked when the `dms_npmi_get_sensor_data()' function is invoked with the `bypass_cache' flag set to TRUE.

This function has the side-effects of resetting all sensor minimum and maximum values. The observer's next scheduled reporting interval, if there is one, is not affected. To prevent these side-effects from affecting other PMAs that receive their data in the recommended way, a PMA using this function must first set the sensor reporting interval to `NO_REPORT_INTERVAL'. This interval value is also used by the NPCS to ensure that only one PMA in the cell can access this sensor using this function.

This function is not the recommended method of obtaining sensor data, but is provided for compatibility with existing management applications (such as SNMP), and to support client-only PMAs. The recommended mode of access is using the PRI `dms_pri_report_sensor_data()' function, which is more efficient and scalable.

10.6.2. Function signature

    error_status_t dms_pmi_get_sensor_data (
        [in ]  dms_process_index_t      process_index,
        [in ]  dms_sensor_ids_t*        sensor_id_list,
        [ out] dms_observation_data_t** sensor_report_list
    );

10.6.3. Function input

(a) `process_index' -- Shorthand provided by the NPCS via `dms_pri_register_process()'.

(b) `sensor_id_list' -- A list of sensor identifiers as assigned by the NPCS via the `dms_pri_register_sensor()' function.

10.6.4. Function output

(a) `sensor_report_list' -- Returns a list of sensors and individual values, and a timestamp that corresponds to when the observer returned the data.

(b) _Function return value_ -- `dms_status' -- Status of call; non-zero if call encountered an error.

10.6.5. Errors

(a) `UNKNOWN_SENSOR' -- Sensor does not exist, or unknown sensor identifier.

(b) `SENSOR_NOT_CONFIGURED' -- Sensor not configured to collect data.

(c) `SENSOR_CONFIG_CONFLICT' -- Sensor not configured for access via this method, since its reporting interval was not set to `NO_REPORT_INTERVAL'.

10.6.6. Engineering notes

(a) This function returns data in the same format as supplied by `dms_pri_report_sensor_data()'. The IPC mechanism for non-RPC implementations of the encapsulated library is implementation-dependent, but must support this function's input and output parameters.

(b) This function output does not include the sensor data component containing the metric threshold value for this reporting interval, since that is a property of the NPCS for standard sensors.

(c) The timestamp returned in the `sensor_report_list' is obtained by the observer at the end of the reporting interval, i.e., after it has prepared sensor data for transport but just prior to actually transporting the data.

11. PRI INTERFACE

The PRI's primary purpose is to provide an efficient, interprocess data transport channel for observer-to-NPCS communication. Specifically, the PRI supports routines to register processes (observers) and sensors, transmit (push) sensor data between the instrumented process's address space and the NPCS's, and unregister processes (observers) and sensors. The observer is the only DMS element allowed to invoke these routines. The registration routine is invoked prior to providing any data collection or support of PMI routines. The PRI is implemented as either an RPC server interface exported by the NPCS, or as an IPC mechanism.

11.1. PRI IDL

The complete IDL is located in appendix H.

11.2. dms_pri_el_initialize()

11.2.1. Description

This utility function is necessary to initialize the encapsulated library. It records the PMI procedures in private variables, and takes whatever steps are required to locate the communication path used to communicate with the instrumented process. The exact nature of these steps depends on the particular implementation of the PMI/PRI interface. Possibilities include, but are not limited to:

(a) Opening a FIFO of known name for writing.

(b) Calling `dciRegister()' (see this function's description in [CMG]).

(c) Obtaining a binding to the NPCS DCE RPC interface which accepts the PRI procedures as RPCs.

(d) Attaching to the shared memory segment created by the NPCS, and creating the PMI talker thread to monitor an input queue of messages from the NPCS.

11.2.2. Function signature

    error_status_t dms_pri_el_initialize (
        [in ] dms_pmi_set_config_fp_t pmi_set_sensor_config,
        [in ] dms_pmi_get_data_fp_t   pmi_get_sensor_data,
        [in ] dms_pmi_terminate_fp_t  pmi_terminate
    );

11.2.3. Function input

(a) `pmi_set_sensor_config'

(b) `pmi_get_sensor_data'

(c) `pmi_terminate'

These are all callback (local) procedures provided by the instrumented process that are invoked by the encapsulated library whenever the corresponding PMI procedure is invoked by the NPCS. These procedures have identical signatures to their corresponding PMI procedures.

11.2.4. Function output

(a) _Function return value_ -- `dms_status' -- Status of call; non-zero if call encountered an error.

11.2.5. Errors

(a) `FUNCTION_FAILED' -- Initialization function failed due to an internal encapsulated library error.

11.2.6. Engineering notes

(a) The details necessary to support the specific IPC mechanism are implementation-dependent, and transparent to this function.

11.3. dms_pri_el_free_outputs()

11.3.1. Description

This utility function is necessary to free output data allocated by the encapsulated library. It encapsulates the RPC free-memory functions and eliminates possible memory leaks.

11.3.2. Function signature

    error_status_t dms_pri_el_free_outputs (
        [in,ptr] dms_instance_dir_t* sensor_register_list
                                     /*null == absent*/
    );

11.3.3. Function input

(a) `sensor_register_list' -- A pointer to the sensor registration list whose allocated memory the programmer desires to free.

11.3.4. Function output

(a) _Function return value_ -- `dms_status' -- Status of call; non-zero if call encountered an error.

11.3.5. Errors

(a) `FUNCTION_FAILED' -- Free operation failed due to an internal encapsulated library error.

11.3.6. Engineering notes

(a) The details necessary to support the specific free-memory mechanisms are implementation-dependent, and transparent to this function.

11.4. dms_pri_register_process()

11.4.1. Description

This interface is invoked by instrumented DCE processes to provide the NPCS with the data necessary to build and maintain the node-level sensor registry. The observer in a DCE process uses this interface to register process-specific state.

11.4.2. Function signature

    error_status_t dms_pri_register_process (
        [in ]  dms_string_t*        process_name,
        [in ]  long                 process_pid,
        [ out] dms_process_index_t* process_index
    );

11.4.3. Function input

(a) `process_name' -- A string that contains the `argv[0]' value of the instrumented DCE process.

(b) `process_pid' -- The value returned by `getpid()'.

Note that these function inputs are described for an operating system exporting a POSIX-conformant interface.

11.4.4. Function output

(a) `process_index' -- Shorthand reference for future observer-to-NPCS communication; assigned and maintained by the NPCS.

(b) _Function return value_ -- `dms_status' -- Status of call; non-zero if call encountered an error.

11.4.5. Errors

(a) No errors are returned for this call. The observer is blocked until this call successfully returns. This supports the start/restart policies described in section 13.8.

11.4.6. Engineering notes

(a) The process identifier (PID) must be returned in an operating-system-independent fashion.

(b) The `process_index' is used by the encapsulated library to determine which PMI/observer requested NPCS action using the PRI.

(c) Non-RPC implementations must be able to provide secure control and communication mechanisms if necessary. Not all IPC mechanisms support the secure one-reader/_N_-writer model that is required for the NPCS and the _N_ observers on the node.

(d) The lack of a properly executing NPCS must not reduce the availability or reliability of the instrumented DCE process.

(e) The instrumentation must not impact the instrumented process's execution state or functional behavior. The observer must invoke `dms_pri_register_process()' prior to invoking `dms_pri_register_sensor()'. This ensures proper behavior of the registration process in environments where all of DCE or the DMS is not yet executing. In addition, an observer blocked in `dms_pri_register_process()', or an observer that has not yet invoked `dms_pri_register_process()', must not prevent sensors from calling their registration macros in a non-blocking fashion. The registration macros must enqueue the registration data so that it is available to the observer after it is unblocked. (A sketch of this registration sequence appears after these notes.)

(f) An observer is the only element allowed to invoke the PRI routines. Sensors must use the sensor macros that will trigger out-of-line observer actions.
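The following minimal sketch illustrates the observer's registration sequence using the PRI signatures of sections 11.4 and 11.5. The header name and the queue-draining helper `dequeue_pending_registrations()' are assumptions of this sketch, not part of this specification.

    #include <unistd.h>     /* getpid() */
    #include "dms_pri.h"    /* assumed header generated from the PRI IDL */

    /* Hypothetical helper: returns the sensor registrations enqueued by
     * the sensor macros while the observer was blocked (note (e)). */
    extern dms_instance_dir_t *dequeue_pending_registrations(void);

    void observer_startup(dms_string_t *process_name)
    {
        dms_process_index_t process_index;
        dms_instance_dir_t *sensor_register_list;
        error_status_t      status;

        /* Blocks until the NPCS services the registration (section
         * 11.4.5); sensor macros keep enqueueing registrations. */
        (void) dms_pri_register_process(process_name, (long) getpid(),
                                        &process_index);

        /* Drain the queued registrations and register them in bulk, as
         * recommended in section 11.5.1. */
        sensor_register_list = dequeue_pending_registrations();
        status = dms_pri_register_sensor(process_index,
                                         &sensor_register_list);
        if (status != 0) {
            /* CHECK_INTERNAL_STATUS: inspect each entry's
             * registration_status field (section 11.5.5). */
        }
    }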
11.5. dms_pri_register_sensor()

11.5.1. Description

This function allows observers to provide the data to the NPCS to build the node-level sensor registry. Standard and custom sensors within the process address space are registered by the observer using this function. The NPCS returns a sensor identifier that is used for all subsequent references to the registered sensor.

Sensors can be registered singly or in bulk. For efficiency, bulk registration should be used wherever possible. Since most DCE processes will contain dozens to hundreds of sensors, a bulk registration significantly reduces the RPC/IPC access overhead.

It is our assumption that the standard sensors (i.e., client, server, and global sensors) reside in the DCE RTL, stubs, and DCE services (such as `secd' and `cdsd'). The custom sensors are those added by middleware component providers (such as Encina and DFS), and application client or server developers.

11.5.2. Function signature

    error_status_t dms_pri_register_sensor (
        [in ]    dms_process_index_t  process_index,
        [in,out] dms_instance_dir_t** sensor_register_list
    );

11.5.3. Function input

(a) `process_index' -- Shorthand provided by the NPCS via `dms_pri_register_process()'.

(b) `sensor_register_list' -- Specifies one or more sensors to register. Configuration data includes sensor name, sensor attributes and metric attributes.

11.5.4. Function output

(a) `sensor_register_list' -- The structure passed as input is returned with the sensor identifier and registration status fields set.

(b) _Function return value_ -- `dms_status' -- Status of call; non-zero if call encountered an error.

11.5.5. Errors

Returned for the entire call (i.e., summarizing results for all sensors that requested registration).

(a) `CHECK_INTERNAL_STATUS' -- One or more sensors failed to register; check the status contained within the returned structure for details.

(i) `registration_status' -- Registration results for this particular sensor; one of:

[a] `STATUS_OK' -- Sensor registered with no problems.

[b] `DUPLICATE_SENSOR' -- Sensor already registered.

[c] `ILLEGAL_NAME' -- Sensor name not legal.

[d] `ILLEGAL_CLASS' -- Unknown sensor class.

[e] `ILLEGAL_METRIC' -- Unknown metric identifier.

(b) `UNKNOWN_PROCESS' -- Process has never registered.

(c) `NO_NPCS' -- NPCS not present. Unlike the `dms_pri_register_process()' function, the observer does not block if the NPCS is not present. On receipt of this error, the observer should initiate the restart policy described in section 13.8.

11.5.6. Engineering notes

(a) The observer in the instrumented DCE process should minimize the number of times it utilizes this expensive IPC mechanism by using bulk registration wherever possible.

(b) Standard sensor metric IDs must be defined and consistently maintained for each release of the instrumentation system.

(c) Any PMA that requests a greater protection level than that specified by the `minimum_protection_level' will have to decide whether to continue (see `dms_npmi_register_pma()'). The highest `minimum_protection_level' requested during the registration of sensors will be applied to ALL sensor data transported from this node to the PMA via the NPRI. This may cause excessive overhead, so use with caution.

(d) For an application that desires to support "all or nothing" semantics for registering a group of sensors, in the case of failure, all sensors with a `registration_status' of `STATUS_OK' should be immediately unregistered, using the `dms_pri_unregister_sensor()' function.

(e) For non-RPC interfaces, the encapsulated library might generate a UUID, associate it with the sensor, and store it as its internal representation for the sensor identifier. Note that this specification does not require that the sensor identifier be unique for the cell, just unique for the node.

(f) Descriptive strings are necessary to name server interfaces and operations when presenting data to the end user. These `friendly names' require extensions to IDL to support a new structure in the stub or RTL that contains the string names. An API to retrieve these via the RTL must also be specified. The details of this are beyond the scope of this specification, but must be supported in the encapsulated library.

(g) Custom sensor registration requires a global repository for storing this data. The use of the DCE CDS to store metric name and instance, metric type, and help text is recommended. The utilities necessary to store this are beyond the scope of this specification. `metric_id' numbers for custom sensors must be unique within the process. This requires a utility function (not described in this specification), `get_metric_id()', that returns a unique `metric_id' each time it is invoked. Additional details regarding the need for a global repository are described in section 7.4.

11.6. dms_pri_report_sensor_data()

11.6.1. Description

The observer uses this NPCS interface to report (push) sensor data modified during the last reporting interval. This allows the observer to report sensor data in an efficient manner, since it does not require the NPCS to poll for the next request, and it returns sensor data in bulk.

To speed up the performance of the steady-state path, it is not required that this function return errors synchronously with each call. Errors are guaranteed to be returned no later than the next invocation of this function. Any data associated with bad status may be lost.

11.6.2. Function signature

    error_status_t dms_pri_report_sensor_data (
        [in ] dms_process_index_t     process_index,
        [in ] dms_observation_data_t* sensor_report_list
    );

11.6.3. Function input

(a) `process_index' -- Shorthand provided by the NPCS via `dms_pri_register_process()'.

(b) `sensor_report_list' -- One or more sensors and their component values are contained in this structure. See section 7.3.1 for additional details.

11.6.4. Function output

(a) _Function return value_ -- `dms_status' -- Status of call; non-zero if call encountered an error.

11.6.5. Errors

(a) `REPORT_FAILED' -- Unknown error prevented the NPCS from updating sensor data values (possible causes include lack of resources or execution time of the NPCS).

(b) `NO_NPCS' -- NPCS not present; observer should begin clean-up process.

11.6.6. Engineering notes

(a) The state diagram in Figure 10 shows the behavior of the observer with respect to providing data to the NPCS. All the state transitions occur only at interval boundaries, with the exception of the NoMod -> Config, and Data Modified -> Config state transitions, which occur asynchronously with respect to intervals. A copy of this state machine exists for each sensor. The input to the state machine is a modification flag set by probes and cleared by the observer. The objective is to report only non-zero or modified sensor data for an interval. This is in keeping with our philosophy to report only the minimum required data using the PRI and NPRI interfaces.

(b) To ease implementation of the encapsulated library and speed the performance of the steady-state path, it is not required for the function to return errors in a synchronous manner. It is only required that errors be returned at some future point in time (but no later than by the end of the next invocation of this function). Any data associated with bad status may be lost.

[Figure not available in ASCII version of this document.]

*Figure 10.* `dms_pri_report_sensor_data()' sensor state machine. Sensor data is pushed to the NPCS only if it was modified during the current reporting interval.

11.7. dms_pri_unregister_sensor()

11.7.1. Description

The observer uses this NPCS interface to notify the NPCS that one or more sensors can be removed from the node-level sensor registry. This allows the NPCS to free resources associated with these sensors. In most cases, groups of sensors are unregistered only in the (unlikely) event of a server unregistering an interface.

11.7.2. Function signature

    error_status_t dms_pri_unregister_sensor (
        [in ] dms_process_index_t process_index,
        [in ] dms_sensor_ids_t*   sensor_id_list
    );

11.7.3. Function input

(a) `process_index' -- Shorthand provided by the NPCS via `dms_pri_register_process()'.

(b) `sensor_id_list' -- A list of sensor identifiers to unregister.

11.7.4. Function output

(a) _Function return value_ -- `dms_status' -- Status of call; non-zero if call encountered an error.

11.7.5. Errors

(a) `NOT_REGISTERED' -- One or more sensors were never registered.

(b) `NO_NPCS' -- NPCS not present.

11.7.6. Engineering notes

(a) As sensors are unregistered, the NPCS should use a "recycling" algorithm that does not attempt to re-use recently freed sensor identifiers. This will minimize the chance that PMAs will confuse cached but "stale" sensor identifiers with the incarnation of a new sensor.

11.8. dms_pri_unregister_process()

11.8.1. Description

An observer uses this NPCS interface to notify the NPCS to remove all of the sensors in the instrumented DCE process from the node-level sensor registry. This allows the NPCS to free resources associated with the unregistering process.

11.8.2. Function signature

    error_status_t dms_pri_unregister_process (
        [in ] dms_process_index_t process_index
    );

11.8.3. Function input

(a) `process_index' -- Shorthand provided by the NPCS via `dms_pri_register_process()'.

11.8.4. Function output

(a) _Function return value_ -- `dms_status' -- Status of call; non-zero if call encountered an error.

11.8.5. Errors

(a) `NOT_REGISTERED' -- Observer was never registered.

(b) `NO_NPCS' -- NPCS not present.

11.8.6. Engineering notes

(a) None.

12. ADDITIONAL OBSERVER AND NPCS FUNCTIONS

This section describes additional functions supplied by the two standard mechanisms: the observer and the NPCS. Core functions were described in the relevant API sections. This section focuses on additional functionality that the implementor of the measurement system must provide.

12.1. Observer Functions

Core observer functions were described in sections 7, 10 and 11. The additional responsibilities are expressed in terms of an idealized implementation. It is possible that the responsibilities outlined here might require, or benefit from, multiple observer threads.

(a) _Intervalized Capture of Raw Sensor Data._ A snapshot of the raw data for each "active" sensor in an address space (process) must be made at the end of each summarization interval, by the data intervalizer executing on the observer thread. An "active" sensor is any sensor that has reached the end of its summarization interval, and has had execution of some thread pass through its final probe point during that interval (i.e., the sensor has produced some raw data from which its metric can be computed). This frees sensors from any direct responsibility for interval summarization, and provides the basis for time-correlated metrics.

(b) _Computation of Intervalized Sensor Metrics._ All sensor metric computations that are performed once per summarization interval are made on the snapshot raw data, by the metric calculator executing on the observer thread. This helps to minimize in-line sensor overhead. An example of this is the computation of mean response time, where the observer calculates the mean by dividing the cumulative response time by the number of completions.

(c) _Probes/Sensors for Process Global Sensors._ Any interval sensor, i.e., a sensor that has probes executed once and only once each summarization interval, independent of any ("normal") thread, will precede the data intervalizer's execution on the observer thread. This provides the means of supplying process global metrics that are independent of any other sensors, and minimizes overhead by collecting them out of the application's in-line path. Most of these sensors are described in section 5.6.

12.2. NPCS Functions

A Performance Management Application (PMA) is the "value-added" performance management and display application supplied by a vendor or third party. The PMA interacts with the NPCS from across the network. The NPCS is a "trusted process", but is used only for the collection and control of performance data. It should run as a non-privileged user.

Core NPCS functions were described in sections 7, 8, 9, 10 and 11.

(a) _Multiplexer._ The NPCS is a many-to-one funnel for sensors on a node. It fulfills a similar function for the users of the data as well. While there may be many management stations wanting information, the NPCS buffers these requests so the sensors in the application server or client process do not have to manage multiple logical connections. The local sensor mechanism needs only to move the latest information to the (single) NPCS at the required rate, and for the requested information set. The NPCS then satisfies the various demands of the management stations requesting information. As such, it handles the state structures required to most efficiently assemble and move requested information to the performance management applications.

(b) _Unused State Recovery (Garbage Collection)._ The NPCS may be implemented as a long-running daemon. Memory leaks in any form would be debilitating for a standard, required daemon. The NPCS must have measures to identify sensors which have disappeared for whatever reason (e.g., the process containing the sensors is killed or crashes). The memory and state associated with these sensors must be completely recovered. Similarly, the state associated with defunct or uninterested PMAs must be recovered when the connection with the PMA is broken or unused.

(c) _LCD Time Management._ As part of the NPCS's role as multiplexer, it instructs the sensors in processes on the local node to report at the "least common denominator" (LCD) time interval needed to handle the requests from performance management applications. A bound would be selected that limits the time intervals that can be requested. For those performance management applications requesting relatively longer time intervals, the NPCS summarizes multiple reports from the servers/clients reporting information on that node, and transmits only the data requested by the PMA, at the lower rate. This is in keeping with our philosophy of transmitting the minimum data necessary across interfaces.

(d) _Transmitting Bulk Data for Efficiency._ In the steady state, the NPCS will be supplying data to a PMA for several dozen, or even hundreds, of sensors. If each sensor's data were provided in a separate communication (RPC), the measurement system specification goals could not be met. Thus the NPCS batches data at regular intervals from numerous sensors bound for a particular PMA.

(e) _Non-POSIX and Partial-DCE Implementations._ On systems which are not fully DCE-compliant, or which have some RPC mechanism of interest (but not truly DCE), a form of NPCS must be made available if data is to be collected. Perhaps, through its translation capability, the NPCS can make such data available to management stations, even those running on a PC or non-POSIX operating system.

13. ENGINEERING ISSUES

This section documents all engineering issues related to the measurement system that were not described elsewhere in this document.

13.1. Conformance

The minimum functionality that is required to support this specification is:

(a) Standard sensors described in section 5 (custom sensor support is optional).

(b) NPMI API described in section 8.

(c) NPRI API described in section 9.

(d) PMI API described in section 10.

(e) PRI API described in section 11.

(f) The PMI/PRI encapsulated library mechanism described in section 7.1, or an RPC interface.

(g) Security as described in section 7.5.

(h) Internationalization as described in section 13.9.

(i) Supplemental observer functions described in section 12.1.

(j) Supplemental NPCS functions described in section 12.2.

13.2. Encapsulated Library

The requirements on the underlying implementation of the encapsulated library are that it correctly implements the various functions. A few points are emphasized here:

(a) "Who starts first" issues should be resolved so that the `dms_pri_register_process()' call by a process observer thread of an instrumented process will block until it has "executed" in the NPCS. The case of single-threaded DCE clients could be handled by immediately returning a "no NPCS yet" status. The CMA value `cma__g_thrdcnt' can be checked to determine whether multiple threads are supported. Sensors being registered by other threads in the process will need to be queued for later registration with the NPCS, but these threads cannot be blocked, because the NPCS may never appear.

(b) Since the library is emulating a procedure call mechanism, calls should be synchronous and return accurate status. The exception to this is `dms_pri_report_sensor_data()'. Because this is a bulk data transfer mechanism, it can return immediately, improving its efficiency. Note that the caller of `dms_pri_report_sensor_data()' must be permitted to deallocate the input `dms_observation_data_t' data structures as soon as the call returns. This implies that either the return must be delayed, the data must be copied before returning, or some other (more complicated) PMI deallocation callback must be added, if the underlying implementation permits, to allow more data to be queued. Errors may be reported later, on subsequent calls. Also, the possibility exists that a failing NPCS will cause a `dms_pmi_terminate()' callback, rather than bad status on a subsequent PRI call.

(c) When the NPCS or an instrumented process fails, the library should emulate a call (i.e., invoke the appropriate "server" procedure) to `dms_pmi_terminate()' or `dms_pri_unregister_process()', to allow the other end to clean up.

(d) The observer thread must be prepared to replay its sensor registrations in the event of a crash and restart of the NPCS. It should wait for the NPCS to restart by recalling `dms_pri_register_process()'.

(e) The library will need to monitor the process identifiers assigned by the NPCS and returned by `dms_pri_register_process()', in order to maintain a mapping from them to communication paths.

(f) The library will provide functions, `dms_pmi_el_free_outputs()' and `dms_pri_el_free_outputs()', to handle the deallocation of output data structures. This permits the underlying memory management mechanisms to be the responsibility of the allocating module (NPCS, `npcs_lib', `observer_lib', DCE process). This also implies that in/out parameters need to be handled correctly to avoid memory leaks (i.e., save a copy of input pointers).

(g) The underlying IPC mechanism in this library must never block an entire process when used by either the NPCS or the observer. Blocking a single thread is acceptable.

(h) If an observer is blocked in `dms_pri_register_process()', then sensors must be allowed to continue to invoke the registration macros. Individual sensor data is then enqueued until the observer is unblocked and able to process the sensor registration requests. The observer then processes these sensor registrations in bulk using the `dms_pri_register_sensor()' call.

13.3. DCE RTL

The DCE RTL needs to support a mechanism that allows client processes to be identified and contacted if necessary for monitoring purposes. Additional investigation is necessary to understand how to collect and report data for "nested RPCs" (i.e., an RPC that invokes a server, which causes the server to act as a client and invoke a different server).

13.4. DCE IDL

The DCE IDL must support a structure in the stub that contains data to construct "friendly named" sensors, since the RTL knows server operations only by a UUID and an operation number (which is not very meaningful to a system administrator).

13.5. Other DCE Services

After the RTL is instrumented, all DCE core services should be recompiled to incorporate the instrumented `libdce'.

13.6. Sensor Information Sets

Since this capability is represented by a set, and individual sensors can support subsets, it is the policy that all sensor data value components be returned in order of their definition in `dms_info_set_t'. If a particular sensor does not support a given set component, it must return NULL values in this sensor data value component location in `dms_sensor_data_t'. This also allows new set components to be defined and processed for future versions, as long as no set value is ever reused.

13.7. Application and DCE Availability

The non-existence or errors of the elements of the instrumentation must not decrease the availability of applications or DCE core services. This restriction reinforces the notion that the instrumentation is an aid to management, and not a hindrance.

13.8. Instrumentation Initialization and Restart

The instrumentation system must not decrease the availability of DCE applications or core services. Initialization and recovery of the measurement system are controlled to minimize impact on applications and core services. Thus this specification addresses a measurement system that supplements application and DCE core service functionality, and simplifies the design by eliminating recoverable data state mechanisms such as checkpoints.

Start-up dependencies are a crucial issue that must be addressed to ensure a robust implementation. An example illustrates the challenge: If the NPCS starts execution on a node prior to the security or naming services, then the NPCS cannot provide secure communications (since this requires a DCE login context that is not available without a security service). And if the NPCS on the same node as the security server starts execution after the security server, then the observer in the security process cannot register sensors (since this requires an NPCS supporting the PRI functions).

To resolve this dependency problem, we recommend a lazy connection strategy that allows elements to defer initialization and registration when the requested server component is not currently available. For the example in the previous paragraph, the security service defers registering sensors until the NPCS is available. The observer maintains registration context and periodically tests until the NPCS is available to complete registration. The NPCS has less of an issue, since it responds to observer requests and does not initiate them. This technique has the benefit of allowing upgraded or failed NPCSs to be restarted in a live environment with no impact on application availability (although no performance data is available during the interval of NPCS inactivity).

Specifically, the following scenarios must be supported in conforming implementations. For each scenario the implementation policies are described.

13.8.1. Cell/node/process start-up

In this scenario, the cell, node, and instrumented DCE process start up for the first time.
13.7. Application and DCE Availability

The non-existence of, or errors in, the elements of the instrumentation must not decrease the availability of applications or DCE core services. This restriction reinforces the notion that the instrumentation is an aid to management, and not a hindrance.

13.8. Instrumentation Initialization and Restart

The instrumentation system must not decrease the availability of DCE applications or core services. Initialization and recovery of the measurement system are controlled to minimize impact on applications and core services. Thus this specification addresses a measurement system that supplements application and DCE core service functionality, and simplifies the design by eliminating recoverable data state mechanisms such as checkpoints.

Start-up dependencies are a crucial issue that must be addressed to ensure a robust implementation. An example illustrates the challenge: If the NPCS starts execution on a node prior to the security or naming services, then the NPCS cannot provide secure communications (since this requires a DCE login context that is not available without a security service). And if the NPCS on the same node as the security server starts execution after the security server, then the observer in the security process cannot register sensors (since this requires an NPCS supporting the PRI functions).

To resolve this dependency problem, a lazy connection strategy is recommended, allowing elements to defer initialization and registration if the requested server component is not currently available. For the example in the previous paragraph, the security service defers registering sensors until the NPCS is available. The observer maintains registration context and periodically tests until the NPCS is available to complete registration. The NPCS has less of an issue, since it responds to observer requests and does not initiate them. This technique has the benefit of allowing upgraded or failed NPCSs to be restarted in a live environment with no impact on application availability (although no performance data is available during the interval of NPCS inactivity).

Specifically, the following scenarios must be supported in conforming implementations. For each scenario the implementation policies are described.

13.8.1. Cell/node/process start-up

In this scenario, the cell, node, and instrumented DCE process start up for the first time.

Assumptions/requirements:

(a) NPCS requires access to security services and a DCE login context to support secure NPMI/NPRI functions.

(b) Security services registering sensors require an executing NPCS.

Recommendation:

(a) Start the DCE core services in normal order.

(b) Observers within DCE core services block in `dms_pri_register_process()' since no NPCS exists on the node. While the observer is blocked, sensors must still be able to register within the process (but no calls to `dms_pri_register_sensor()' are allowed until the observer unblocks on `dms_pri_register_process()'). The observer is a separate thread, so there is no impact on the instrumented application.

(c) Start the NPCS and authenticate with the security service.

(d) Blocked observers are serviced by the NPCS, they unblock, and then they register sensors using the `dms_pri_register_sensor()' function.

(e) The PMA can log in, authenticate, and begin monitoring.

13.8.2. Node restart

In this scenario, the node is restarting after a planned or unplanned shutdown.

Assumptions/requirements:

(a) Sensors are initialized when processes restart.

(b) NPCS state of sensors on the node is lost.

(c) PMA is not aware that the node has restarted.

Recommendation:

(a) For NPCS, observer and sensor:

(i) Follow the policy in cell/node/process start-up.

(b) For PMA:

(i) The PMA stops hearing from the NPCS via the NPRI. This does not apply to client-only PMAs (COPs), since they do not support the NPRI.

(ii) Invoking any NPMI routine results in an RPC communication failure error (if the NPCS is not executing), or results in a "who-are-you" RPC status (if the NPCS has restarted but the PMA has not re-registered).

(iii) The PMA resets its internal sensor configuration state for all sensors on this node (since the observer will return all sensors to a quiescent state).

(iv) After a user-configurable time, the PMA re-registers with the NPCS.

13.8.3. PMA terminate and restart

In this scenario, the PMA unexpectedly terminates and restarts. The NPCS and sensors are unaware of this event.

Assumptions/requirements:

(a) PMA state of sensors on the node is lost.

(b) NPCS and sensors are not aware that the PMA has failed/restarted.

Recommendation:

(a) The NPCS invokes NPRI functions that result in an RPC communication failure. This does not apply to client-only PMAs, since they do not support the NPRI.

(b) After a user-configurable time:

(i) For non-COPs: The NPCS ceases to invoke NPRI routines and resets sensors configured only by this PMA to a quiescent state.

(ii) For COPs: Since there is no direct mechanism for the NPCS to test COP liveness, the NPCS periodically checks when the last request was made by this PMA, and resets sensors configured only by this PMA to a quiescent state if the PMA has not made a recent request. The maximum period for client-only PMA inactivity is 7 days. This allows COPs to sample the instrumentation on a low-frequency basis, while minimizing resource consumption in the NPCS's internal tables. (A sketch of this sweep follows this section.)

(c) PMAs re-register after restarting.
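The liveness check in (b)(ii) above reduces to a periodic sweep over the NPCS's PMA table. A minimal sketch, with the table layout and the helper name `quiesce_sensors_for_pma()' purely hypothetical:

    #include <time.h>

    #define COP_MAX_IDLE_SECS (7 * 24 * 60 * 60)  /* 7-day maximum inactivity */

    typedef struct {
        int    in_use;
        int    is_cop;        /* client-only PMA: no NPRI to test liveness */
        time_t last_request;  /* updated on every NPMI call from this PMA */
    } pma_entry_t;

    extern void quiesce_sensors_for_pma(int pma_slot);  /* hypothetical */

    /* Hypothetical sweep run periodically by the NPCS. */
    void sweep_cop_table(pma_entry_t *table, int n, time_t now)
    {
        int i;
        for (i = 0; i < n; i++) {
            if (table[i].in_use && table[i].is_cop &&
                now - table[i].last_request > COP_MAX_IDLE_SECS) {
                /* Reset sensors configured only by this PMA to a
                 * quiescent state and release the table slot. */
                quiesce_sensors_for_pma(i);
                table[i].in_use = 0;
            }
        }
    }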
13.8.4. NPCS shutdown and restart

In this scenario, the NPCS process gracefully exits.

Assumptions/requirements:

(a) The NPCS's state of sensors on the node is discarded.

(b) PMA is not aware that the NPCS has terminated.

(c) Observer and sensors are informed that the NPCS has terminated.

Recommendation:

(a) For NPCS:

(i) The NPCS invokes `dms_pmi_terminate()' prior to exiting. This informs the encapsulated library that the NPCS is no longer available.

(b) For observer and sensors:

(i) Same as NPCS crash/restart, described below.

(c) For PMA:

(i) Same as NPCS crash/restart, described below.

13.8.5. NPCS crash and restart

In this scenario, the NPCS unexpectedly terminates and restarts. The PMA and sensors are unaware of this event.

Assumptions/requirements:

(a) NPCS state of sensors on the node is lost.

(b) PMA, observer and sensors are not aware that the NPCS has failed/restarted.

Recommendation:

(a) For PMA:

(i) Same as node restart, described above.

(b) For observer:

(i) On receipt of a PRI function call that results in an error communicating with the local NPCS, the encapsulated library must set a global flag that informs all observers that the NPCS has terminated. Note that the encapsulated library must provide a synchronous mechanism to notify observers that the NPCS has terminated. Otherwise, an observer that is not currently reporting data will be "lost" and not reachable when the NPCS restarts. (A sketch of this notification flag follows section 13.8.8.)

(ii) Observers reset all sensors to a quiescent state.

(iii) Observers unregister (this must break the current connection with the encapsulated library, and clean up any encapsulated library state related to this observer).

(iv) Observers re-register. This is like node start-up.

(c) For NPCS:

(i) Same as node start-up.

13.8.6. DCE process shutdown

In this scenario, the instrumented DCE process gracefully exits.

Assumptions/requirements:

(a) Sensors within the process are deleted.

(b) NPCS is informed of the sensor deletions so that it can free resources.

(c) PMA is not informed of the sensor deletions.

Recommendation:

(a) The observer invokes the PRI unregister sensor function to communicate sensor termination.

(b) The NPCS removes these sensors from its registry.

(c) The PMA is informed implicitly by errors returned on explicit NPMI get and set operations.

(d) The PMA removes these sensors from its registry.

13.8.7. DCE process crash and restart

In this scenario, the instrumented DCE process unexpectedly terminates.

Assumptions/requirements:

(a) Sensors within the process are deleted.

(b) NPCS is not informed of the sensor deletions.

(c) PMA is not informed of the sensor deletions.

Recommendation:

(a) The observer terminates before it can invoke the PRI unregister sensor function. This requires that the encapsulated library provide an implementation-dependent mechanism for detecting observers and sensors that are no longer executing.

(b) An encapsulated-library-dependent routine informs the NPCS of the observer termination. The NPCS removes all of the observer's sensors from its registry.

(c) The PMA is informed implicitly by errors returned on explicit NPMI get and set operations.

(d) The PMA removes these sensors from its registry.

(e) After the instrumented DCE process is restarted, the situation is the same as cell/node/process start-up.

13.8.8. Network partition

In this scenario, the PMA and NPCS are separated by a network partition.

Assumptions/requirements:

(a) The network partition is not directly detectable by either the PMA or the NPCS.

Recommendation:

(a) For the NPCS, same as the PMA crash/restart, described above.

(b) For the PMA, same as the NPCS crash/restart, described above.
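For illustration, the following sketch shows one possible shape for the synchronous notification flag of section 13.8.5, step (b). The helper names are hypothetical and a real observer_lib may structure this differently.

    #include <pthread.h>

    /* Hypothetical global state inside observer_lib: set synchronously when
     * a PRI call fails to reach the local NPCS (13.8.5, step (b)(i)). */
    static int             npcs_down = 0;
    static pthread_mutex_t npcs_lock = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t  npcs_cond = PTHREAD_COND_INITIALIZER;

    extern void reset_sensors_quiescent(void);  /* hypothetical: step (b)(ii)  */
    extern void unregister_local_state(void);   /* hypothetical: step (b)(iii) */
    extern void register_with_npcs(void);       /* hypothetical: step (b)(iv)  */

    void note_npcs_failure(void)
    {
        pthread_mutex_lock(&npcs_lock);
        npcs_down = 1;
        pthread_cond_broadcast(&npcs_cond);  /* reach observers not currently reporting */
        pthread_mutex_unlock(&npcs_lock);
    }

    /* Observer recovery loop: quiesce, drop old state, then re-register as
     * in node start-up.  register_with_npcs() wraps dms_pri_register_process()
     * and the bulk dms_pri_register_sensor() replay. */
    void observer_recover(void)
    {
        pthread_mutex_lock(&npcs_lock);
        while (!npcs_down)
            pthread_cond_wait(&npcs_cond, &npcs_lock);
        npcs_down = 0;
        pthread_mutex_unlock(&npcs_lock);

        reset_sensors_quiescent();
        unregister_local_state();
        register_with_npcs();   /* blocks until the NPCS is back */
    }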
13.9. Internationalization

Sensors contain cleartext descriptions that assist the end-user in interpreting the metric values. These descriptions are contained in a help text string. This string must support internationalization conventions as described in the various DCE RFCs on internationalization. Sensor names conform to the DCE portable character set.

13.10. Integration with Host OS; X/Open Data Capture Interface (DCI)

The DCI provides a standard interface to operating system performance data. The specification was submitted to X/Open in early 1994. That technology was evaluated for support by the functions in this specification. However, due to concerns about availability and the uncertainty of the final shape of that standard, this specification does not explicitly support the DCI. But the following areas have been influenced by the DCI X/Open standard proposal:

(a) Namespace.

(b) Security.

(c) Node data communication and storage (between observer and NPCS).

A list of DCE instrumentation requirements was provided to the authors of the DCI, for possible incorporation into the X/Open specification.

13.11. Instrumenting the Instrumentation System

It may be desirable to collect performance measures on the four APIs themselves. The activities associated with these APIs should not be included in the totals for the process, but optionally they should be measurable by a PMA just like any other interface. The implementations of the observer, the NPCS, and the four APIs must support self-instrumentation.

13.12. Design Rationale

This section describes several factors that influenced our design and recommendations.

13.12.1. Considerations of scale

The measurement infrastructure must perform efficiently over a wide range of network topologies and cell sizes. While our design supports monitoring across cells, the primary monitoring functions will align with the administrative domain of the cell.

Table 2 illustrates the scale of the measurement system from a server perspective (clients are not included, although they represent a potentially larger pool). The table estimates the following quantities to gauge the demands placed on the measurement system (DCE-specific terminology is used):

(a) The number of sensors per server manager operation.

(b) The number of manager operations per manager.

(c) The number of managers per server interface.

(d) The number of interfaces per server.

(e) The number of application servers per network node.

(f) The number of network nodes per DCE cell.

The table then summarizes the two derived quantities:

(g) The number of sensors per network node.

(h) The number of sensors per DCE cell.

The number of operational sensors on a single node is large (500-8,000), and the number in a cell is very large (50,000-8,000,000 or more). (Note that transaction processing and distributed object applications may support a dozen or more interfaces. This may increase the actual number of sensors in a cell.) These estimates, however, are probably pessimistic with respect to the number of active sensors, since cells will contain a large number of different applications in different domains that are managed separately and therefore require fewer active sensors.
        +---------------------+-------------+------------+
        |                     |  "Typical"  |   "Large"  |
        |                     | Application | Application|
        +=====================+=============+============+
        |Sensors / Operation  |          10 |          20|
        +---------------------+-------------+------------+
        |Operations / Manager |           5 |          10|
        +---------------------+-------------+------------+
        |Managers / Interface |           1 |           1|
        +---------------------+-------------+------------+
        |Interfaces / Server  |           1 |           2|
        +---------------------+-------------+------------+
        |Server / Node        |          10 |          20|
        +---------------------+-------------+------------+
        |Nodes / Cell         |         100 |       1,000|
        +---------------------+-------------+------------+
        |Sensors / Node       |         500 |       8,000|
        +---------------------+-------------+------------+
        |Sensors / Cell       |      50,000 |   8,000,000|
        +---------------------+-------------+------------+

        *Table 2.*  Instrumentation Scale Considerations.

Having control over the sensor state is crucial for meeting measurement system overhead goals. This is accomplished by the end-user judiciously selecting the information sets for the sensors of interest. Only the sensors of interest need be enabled and collected.

The above estimates do not include the number of active client sensors. This specification expects that only rarely will all clients have active instrumentation, since doing so would excessively load node and network alike. To improve scalability of the measurement system, it is expected that only a few clients per application are monitored at any time, in order to gather status and response times as proxies for the others on the same node or in the same network. One final practical limitation for clients is that DCE does not support an identification mechanism for locating clients (only servers register with the CDS).
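The two derived rows of Table 2 are simply the products of the six rows above them. For the "typical" column: 10 x 5 x 1 x 1 x 10 = 500 sensors per node, and 500 x 100 nodes = 50,000 sensors per cell. For the "large" column: 20 x 10 x 1 x 2 x 20 = 8,000 sensors per node, and 8,000 x 1,000 nodes = 8,000,000 sensors per cell.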
13.12.2. Transporting data: pushing versus polling

A major implementation issue for the measurement system was whether to transport data by periodically "pushing" it across the network, or by forcing PMAs to explicitly request or "poll" for data, similar to the SNMP philosophy. After significant discussion, it was decided to require NPCSs to push data to PMAs. The reasons why the push model was selected for implementation follow:

(a) _Scalability._ Since the situation is really a large number of servers (sensors) pushing to a smaller number of NPCSs (e.g., 1 per system), which in turn push to a very small number of PMAs (maybe 1-10 per enterprise), pushing scales better than polling potentially thousands of sensors to find only those with new data. In fact, keeping the amount of data sent small is very important for network utilization and scalability. Pushing also allows thresholds to be used, which significantly reduces the amount of data sent, even for the largest of systems.

(b) _State._ In the push case, the pusher needs to keep state information about all its consumers (PMAs). It needs to know who, where and when. It also needs to know if a data item has not been delivered. Moreover, only the NPCSs know exactly when the data for the PMA is available. Storing this state is simpler for NPCSs, because of the small number of PMAs registered at any one moment. With pull, the NPCSs would still not be able to discard this state information. Since no real saving in state is possible, the push case minimizes the state kept for PMAs. The PMAs receive cumulative data, so they won't lose information if a sample is dropped, and they can tell from timestamps whether a sample was dropped or is stale.

(c) _Serialization._ Although push is inherently serial, NPCSs can start multiple threads to push; an NPCS thread is blocked during the push (it may take some time for the PMA to respond). Most important, since there are often practical limits to the number of active threads, the NPCSs need only a few active threads for push, while PMAs would need a large number of threads for parallel pulls. For scalability, NPCSs would use a limited pool of threads to push. There would normally be enough to dedicate one per PMA, but a pool removes any hard limit.

(d) _Storage._ This is an advantage: since the NPCS controls the flow of data, it can discard data that has been delivered to all interested parties. It also does not need to maintain a queue of requests. However, it does need to maintain a table of state information on ALL PMAs. In addition, the assumption was made that all data for a sample to a PMA would be packaged together into a single push.

(e) _Traffic._ Because of the need to ensure (if not guarantee) delivery of the data to PMAs, the push is at least a data/ACK pair. Pulls would require one more message. In addition, to minimize traffic, only data is sent, packaged into one response per sample to the PMA. A stateless pull (like NFS) would require state information in the pull, which increases traffic.

(f) _Scheduling._ Since the sensors have an observer thread that is pushing to the NPCS, the timing of when to send the sample data to the PMA is precisely known only to the NPCS. That makes the scheduling of the data send time easy for the NPCS. Most important, for thresholds where data is only sent when a value is exceeded, the NPCS is the ONLY place that knows when this occurs and that a data send is required. A pull would require the NPCS to wait and collect all the information anyway. There is still an issue for the scheduling of the PMA's data reduction, and for correlation with the data arriving asynchronously from many NPCSs. However, since that is the highest level of the measurement system, and is the element with the least time sensitivity in the measurement system, it was considered an acceptable requirement. There may be several receiver threads, or one simply collecting data.

(g) _Error handling._ In the push model, data is flowing to the PMAs from the NPCSs. By providing timestamps and cumulative data, the PMAs can deal with missing data by extrapolating, skipping, or another "make right" strategy. As for dealing with failures, the NPCSs know who and where they were sending data to, so the lack of a PMA ACK indicates a failed PMA, which allows the NPCS to free up the resources belonging to that PMA.

NOTE: Even though the steady-state system is push-based, it was decided that a polling request function would be included in the NPMI to support special PMAs. This allows the flexibility of something like a pull, if used infrequently. The reason this is required is for SNMP support, client-only PMAs, and PMAs that register only thresholds but have not seen any data for a while. A pull request allows the PMA to see the current data even if no thresholds were exceeded. (A sketch of this pull path follows.)
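For illustration, a client-only PMA's pull path is a single NPMI call (Appendix E). The sketch below assumes the IDL-generated header (here called `dms.h'); binding handle acquisition is elided, and TRUE for bypass_cache is assumed to request current values rather than the most recently cached observation.

    #include <dce/rpc.h>
    #include "dms.h"    /* hypothetical IDL-generated header for dms_npmi */

    error_status_t cop_pull(handle_t npcs, dms_pma_index_t pma_index,
                            dms_sensor_ids_t *ids,
                            dms_observations_data_t **data)
    {
        /* One request/response pair; the NPCS may refuse the cache bypass
         * with dms_BYPASS_NOT_ALLOWED (Appendix I). */
        return dms_npmi_get_sensor_data(npcs, pma_index, ids,
                                        1 /* bypass_cache */, data);
    }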
13.12.3. Sensor placement

This section describes sensor implementation issues and placement locations within the RPC runtime library (RTL). The fundamental implementation question regards the placement of the sensors: Are they generated by the IDL compiler and placed in the stubs, or are they an integrated part of the DCE kernel (runtime library)?

Instrumenting the stubs using IDL has merit. Coupled with an internal tracing tool, these form a very powerful application development/debugging utility. Unfortunately, for performance monitoring of arbitrary applications in a large environment, the IDL approach has several shortcomings. First, sensors within stubs are visible to application developers, and thus modifiable by them. This is not safe for standard functions. Sensors within the RTL are not modifiable by the application writer. Second, supporting standard libraries is a pragmatic software engineering technique that minimizes implementation divergence in production environments. It also provides extensibility without the need to recompile an application's source code (users dislike recompilation because it almost always causes something to break). If sensors are in the RTL, then merely relinking the application with `libdce' provides the new sensors. The requirement to relink (instead of recompile) also makes it easier to instrument other DCE services (CDS, Security) and middleware (Encina and CICS).

Other issues also influenced this direction. First is the lack of control over the granularity of collection (all or nothing) in a stub-based architecture, and the resulting deluge of data that is generated (especially if generated for all clients). (The scalability of this approach is unacceptably poor in large environments.) The RTL, by contrast, is dynamically configurable to collect only the minimum amount of data that is requested. Finally, the need for pervasive support of these sensors requires a standard interface to them, and creating a standard performance interface to a stub is problematic.

Because of these arguments, we have chosen a hybrid implementation of the standard sensors. Most are located in the RTL, but some are located in the stubs to capture stub-specific processing.

13.12.4. Threshold detection

To minimize the amount of data transferred across the network, counter and timer sensors support a threshold level detection mechanism. For example, a response time sensor with a threshold set would report data only when a user-configured threshold condition is TRUE (for example, when the maximum response time exceeds 20 seconds). In practice, we simplified the sensor implementation by having the NPCS analyze the incoming data from the sensor to detect thresholds. This allows different PMAs to configure the same sensor with different threshold values, while still minimizing the amount of data transported across the network. It is important to note that sensors report summarized data; thus the threshold detection is based on integrated values (mean, minimum or maximum) over a sampling interval. (A sketch of this check follows.)
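Because the NPCS, not the sensor, evaluates thresholds, each PMA can hold its own bounds for the same sensor. The per-PMA check can be sketched as follows; the structure mirrors dms_threshold_values_t (Appendix B), the names are hypothetical, and the summarized interval value is assumed to have been extracted from the sensor report already.

    typedef struct {
        int    have_threshold;  /* FALSE == no threshold: always forward */
        double lower;           /* cf. lower_value in dms_threshold_values_t */
        double upper;           /* cf. upper_value in dms_threshold_values_t */
    } pma_threshold_t;

    /* Returns nonzero when this PMA's threshold condition is TRUE for the
     * summarized interval value (e.g., the interval maximum), i.e., when
     * the observation should be forwarded to that PMA. */
    int should_forward(const pma_threshold_t *t, double interval_value)
    {
        if (!t->have_threshold)
            return 1;
        return interval_value < t->lower || interval_value > t->upper;
    }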
13.12.5. Time units

Two distinct timer sensors, each with a different granularity, were proposed: seconds and nanoseconds. This provides sufficient resolution, and allows future growth for the next 5-7 years. Note that overflow concerns may require that sum-of-squares terms have a coarser granularity.

To ensure timer resolution and efficient timestamp access, the specification defines a function that returns the time from the host OS with the proper granularity, and that is implemented as efficiently as possible (this eliminates the problems with the POSIX `gettimeofday()' function). This implementation-specific routine is described in section 6.4.

13.12.6. Generic

IDL pickling is used to support pass-thru sensors. This has several advantages:

(a) It allows for large, unknown data structures.

(b) It allows sensors with arbitrary data to be added without requiring a modification to this specification.

(c) It allows the observer to transmit data without knowing the sensor's structure.

The use of pickling also raises several issues:

(a) _Efficiency_ -- Sensor reporting must not be slowed down by pickling overhead when pickling is not necessary. The specification therefore provides a keyed union to allow for generic sensor data: a long value, an array of long values, or opaque bytes (which may be used for pickling).

(b) _Registering metrics_ -- It has been proposed that the pickling information be sent across with the data. But in order for the pickled data to be of any use to the PMA, the PMA must have been compiled with the header file (probably output by an IDL compilation of the sensor pickling functions). Thus the PMA must already have an idea of which custom metrics it plans to use.

(A sketch of the keyed union carrying pickled bytes follows.)
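As a sketch, wrapping an already-pickled buffer for a pass-thru sensor selects the opaque arm of the keyed union (Appendix C). The `tagged_union' member name follows the usual DCE IDL C mapping for encapsulated unions and must be checked against the actual generated header; `dms.h' is hypothetical.

    #include "dms.h"    /* hypothetical IDL-generated header (Appendix C) */

    dms_datum_t wrap_pickled_bytes(dms_opaque_t *pickled)
    {
        dms_datum_t d;
        d.type = dms_OPAQUE;             /* keyed union arm for opaque bytes */
        d.tagged_union.opaque_p = pickled;  /* pickled bytes travel untouched */
        return d;
    }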
13.13. Future Items for Specification

The following items have been deferred for a future working group:

(a) Investigate how to provide sensors for PC clients running Windows 95 and NT. This may require supporting an interface to the Desktop Management Interface.

(b) Histograms providing distribution frequencies for a monitored event. They are not supported in this version of the specification, but are a candidate for future support as a natural extension of the sensor information set.

(c) Event tracing can provide explicit cause-and-effect information for application characterization. Sensors to support this capability need to be investigated.

(d) Resource accounting and charge-back are crucial management functions. This document describes a specification for measurement that can be extended to support resource accounting. We strongly recommend that a future resource accounting system NOT be designed with a redundant measurement infrastructure, since this would only result in increased overhead.

(e) There is a need to optimize the notification of reporting changes to the NPCS sensor registry. We debated between two alternatives: a mechanism that would create a version number for each unique version of the registry, and allow queries using a comparison version number; or a mechanism that would notify PMAs of modifications that impact the currently configured sensors.

(f) Multiple views of sensors: Although the instrumentation name space is organized in a hierarchy, there are many circumstances in which a consumer of instrumentation data will want to group the data in different ways. A system administrator might, for example, want to simultaneously observe the performance of all machines on which security daemons are running, or the CDS daemons which serve a particular clearinghouse. This specification does not provide an explicit mechanism for doing this; we believe that the definition and maintenance of such groupings is a function best left to the individual performance management applications which will make use of the data this specification describes. At the same time, we hope that developers of performance management applications will develop common mechanisms for storing and transferring group definitions, so that users of different applications will be able to observe the same data with a minimum of manual re-configuration.

(g) Extend `dms_pri_register_sensor()' to allow a process to specify its minimum data protection level, to automatically control the RPC data protection level used for PMA and NPCS communication. This feature eases system administration by allowing application clients or servers to establish the protection level during the development phase.

(h) All DCE core services should be instrumented (the CDS metrics are described in RFC 32.0 [RFC 32]), to capture logical events and other service-specific concerns.

(i) The performance measurement interface should become part of a standard server management interface that is available for all DCE-based processes.

(j) The authors' collective experience with previous projects has led them to conclude that software instrumentation is subject to the second law of thermodynamics: over time, the instrumentation tends towards a more disordered state. This disorder is a result of defect repair and new functionality that changes the behavior of the instrumented software, and consequently the precise location (and meaning) of the instrumentation probe points. This has significant ramifications for maintaining the accuracy and the utility of the instrumentation. To resolve this, a validation suite to certify the instrumentation must be defined and implemented. A validation suite is required to ensure the correctness of the initial implementation of the sensors, and to provide a test case to demonstrate future correctness. Furthermore, an interoperability test for the interfaces is required to ensure interface compatibility.

14. ACKNOWLEDGMENTS

This document is the result of many individuals who contributed their time and expertise:

Rich Friedrich, Joe Martinka, Steve Saunders, Gary Zaidenweber, Tracy Sienknecht, Dave Glover (Hewlett-Packard Company).

Dave Bachmann, Ellen Stokes, Robert Berry (International Business Machines, Inc.).

Barry Wolman, Dimitris Varotsis, David Van Ryzin (Transarc).

Sarr Blumson (CITI (Center for Information Technology Integration), University of Michigan).

Art Gaylord (Project Pilgrim, University of Massachusetts).

15. REFERENCES

[CMG] Computer Measurement Group -- Performance Management Working Group, _Requirements for a Performance Measurement Data Pool_, Revision 2.3, May 1993.

[Laz] E. Lazowska, et al., _Quantitative System Performance_, Prentice Hall, Inc., Englewood Cliffs, NJ, 1984.

[RFC 11] M. Hubbard, _DCE SIG Serviceability Requirements Document_, OSF DCE-RFC 11.0, August 1992.

[RFC 32] R. Friedrich, _Requirements for Performance Instrumentation of DCE RPC and CDS Services_, OSF DCE-RFC 32.0, June 1993.

[RFC 38] _DME/DCE Managed Objects Requirements Document_, OSF DCE-RFC 38.1, 1994 (to appear).

[Rose] M. Rose, _The Simple Book -- An Introduction to Management of TCP/IP Based Internets_, Prentice Hall, Inc., Englewood Cliffs, NJ, 1991.
APPENDIX A. dms_binding.idl

[ version(2.2) ]
interface dms_binding
/*
 * This interface defines the data structures used to represent
 * relationships between entities (sensors/processes/nodes) within
 * DMS.  Some are "transparent", meaning that a user of that
 * structure can manipulate its contents.  Some are "opaque", meaning
 * that only the creating entity can manipulate its contents.
 */
{
    /* TRANSPARENT BINDING TYPES */

    typedef [string] unsigned char  dms_string_t[];
    typedef unsigned long           dms_protect_level_t;   /*see rpc.h*/
    typedef [string] unsigned char  dms_string_binding_t[];

    /* OPAQUE BINDING TYPES */

    typedef unsigned long           dms_pma_index_t;
    typedef unsigned long           dms_npcs_index_t;
    typedef unsigned long           dms_process_index_t;
    typedef unsigned long           dms_sensor_id_t;

    typedef struct dms_sensor_ids {
        unsigned long       count;
        [size_is(count)]
        dms_sensor_id_t     ids[];
    } dms_sensor_ids_t;
}

APPENDIX B. dms_config.idl

[ version(2.3), pointer_default(ptr) ]
interface dms_config
/*
 * This interface defines the sensor configuration data structures
 * for specifying the configuration of individual sensors.
 */
{
    import "dms_binding.idl", "dms_data.idl", "dms_status.idl";

    const unsigned long dms_NO_METRIC_COLLECTION = 0;
    const unsigned long dms_THRESHOLD_CHECKING   = 0x00000001;
    const unsigned long dms_COLLECT_MIN_MAX      = 0x00000002;
    const unsigned long dms_COLLECT_TOTAL        = 0x00000004;
    const unsigned long dms_COLLECT_COUNT        = 0x00000008;
    const unsigned long dms_COLLECT_SUM_SQUARES  = 0x00000010;
    const unsigned long dms_COLLECT_SUM_CUBES    = 0x00000020;
    const unsigned long dms_COLLECT_SUM_X_TO_4TH = 0x00000040;
    const unsigned long dms_CUSTOM_INFO_SET      = 0x80000000;

    typedef unsigned long dms_info_set_t;

    typedef struct dms_threshold_values {
        dms_datum_t     lower_value;
        dms_datum_t     upper_value;
    } dms_threshold_values_t;

    typedef union dms_threshold switch (boolean have_values) {
        case TRUE:  dms_threshold_values_t values;
        case FALSE: ;
    } dms_threshold_t;

    typedef struct dms_config {
        dms_sensor_id_t     sensor_id;
        dms_timevalue_t     reporting_interval;    /*0 == infinite*/
        dms_info_set_t      info_set;
        dms_threshold_t*    threshold;
        error_status_t      status;
    } dms_config_t;

    typedef struct dms_configs {
        unsigned long   count;
        [size_is(count)]
        dms_config_t    config[];
    } dms_configs_t;
}
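For illustration, a PMA might assemble a single sensor configuration from the Appendix B types as follows. The values and the header name `dms.h' are illustrative only; dms_timevalue_t is defined in Appendix C.

    #include "dms.h"    /* hypothetical IDL-generated header (Appendices B, C) */

    dms_config_t make_config(dms_sensor_id_t id, dms_threshold_t *threshold)
    {
        dms_config_t c;
        c.sensor_id = id;
        c.reporting_interval.sec  = 60;   /* report once a minute */
        c.reporting_interval.usec = 0;
        c.info_set = dms_COLLECT_COUNT | dms_COLLECT_MIN_MAX |
                     dms_COLLECT_TOTAL;   /* count, min/max and total only */
        c.threshold = threshold;          /* may be NULL == no threshold */
        c.status = 0;                     /* error_status_ok */
        return c;
    }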
*/ { import "dms_binding.idl", "dms_status.idl"; typedef struct dms_opaque { unsigned long size; [size_is(size)] byte bytes[]; } dms_opaque_t; Friedrich, Saunders, Zaidenweber, Bachmann, Blumson Page 108 OSF-RFC 33.0 DCE Performance Instrumentation July 1995 typedef enum { dms_LONG, dms_HYPER, dms_FLOAT, dms_DOUBLE, dms_BOOLEAN, dms_CHAR, dms_STRING, dms_BYTE, dms_OPAQUE, dms_DATA_STATUS } dms_datum_type_t; typedef union dms_datum switch (dms_datum_type_t type) { case dms_LONG: long long_v; case dms_HYPER: hyper hyper_v; case dms_FLOAT: float float_v; case dms_DOUBLE: double double_v; case dms_BOOLEAN: boolean boolean_v; case dms_CHAR: char char_v; case dms_STRING: dms_string_t *string_p; case dms_BYTE: byte byte_v; case dms_OPAQUE: dms_opaque_t *opaque_p; case dms_DATA_STATUS: error_status_t status_v; } dms_datum_t; typedef struct dms_sensor_data { dms_sensor_id_t sensor_id; unsigned long count; [size_is(count)] dms_datum_t sensor_data[]; } dms_sensor_data_t; typedef struct dms_timevalue { unsigned long sec; unsigned long usec; } dms_timevalue_t; typedef struct dms_observation_data { dms_timevalue_t end_timestamp; unsigned long count; Friedrich, Saunders, Zaidenweber, Bachmann, Blumson Page 109 OSF-RFC 33.0 DCE Performance Instrumentation July 1995 [size_is(count)] dms_sensor_data_t* sensor[]; } dms_observation_data_t; typedef struct dms_observations_data { unsigned long count; [size_is(count)] dms_observation_data_t* observation[]; } dms_observations_data_t; } APPENDIX D. dms_naming.idl [ uuid(5e542624-e9d6-11cd-a3a9-080009273eb9), version(2.2), pointer_default(ptr) ] interface dms_naming /* * This interface defines the data structures that represent the dms * namespace. There are two forms of names that can be represented, * a simple string only form and a fully decorated form. */ { import "dms_binding.idl", "dms_data.idl", "dms_status.idl"; typedef struct dms_name_node* dms_name_node_p_t; typedef struct dms_name_nodes { unsigned long count; [size_is(count)] dms_name_node_p_t names[]; } dms_name_nodes_t; typedef struct dms_name_node { dms_string_t* name; /*"*" == wildcard*/ dms_name_nodes_t children; } dms_name_node_t; typedef struct dms_attr { dms_string_t* attr_name; dms_datum_t attr_value; } dms_attr_t; typedef struct dms_attrs { unsigned long count; [size_is(count)] dms_attr_t* attrs[]; } dms_attrs_t; typedef struct dms_sensor { dms_sensor_id_t sensor_id; dms_attrs_t* attributes; unsigned short count; Friedrich, Saunders, Zaidenweber, Bachmann, Blumson Page 110 OSF-RFC 33.0 DCE Performance Instrumentation July 1995 [size_is(count)] small metric_id[]; } dms_sensor_t; typedef struct dms_instance_leaf { unsigned long count; [size_is(count)] dms_sensor_t* sensors[]; } dms_instance_leaf_t; typedef struct dms_instance_node* dms_instance_node_p_t; typedef struct dms_instance_dir { unsigned long count; [size_is(count)] dms_instance_node_p_t children[]; } dms_instance_dir_t; typedef enum { dms_DIRECTORY, dms_LEAF, dms_NAME_STATUS } dms_select_t; typedef union dms_instance_data switch (dms_select_t data_type) { case dms_DIRECTORY: dms_instance_dir_t* directory; case dms_LEAF: dms_instance_leaf_t* leaf; case dms_NAME_STATUS: error_status_t status; } dms_instance_data_t; typedef struct dms_instance_node { dms_string_t* name; dms_datum_t* alternate_name; dms_instance_data_t data; } dms_instance_node_t; } APPENDIX E. 
APPENDIX E. dms_npmi.idl

[ uuid(e8f6e46e-e9d7-11cd-be13-080009273eb9), version(2.2),
  pointer_default(ptr) ]
interface dms_npmi
/*
 * This interface defines the operations provided to a PMA by a NPCS.
 * The interface can be utilized by two styles of PMA, full-function
 * and client-only PMA.  A full-function PMA must support the
 * dms_npri interface, and can either have sensor data pushed to it,
 * or pull sensor data from a NPCS.  The client-only PMA (COP) will
 * not support the dms_npri interface, and must pull sensor data from
 * a NPCS.
 */
{
    import "dms_status.idl", "dms_binding.idl", "dms_data.idl",
           "dms_config.idl", "dms_naming.idl";

    error_status_t dms_npmi_register_pma (
        [in    ] handle_t               handle,
        [in,ptr] dms_string_binding_t*  npri_binding,
                                        /*null == client-only PMA*/
        [in    ] dms_npcs_index_t       npcs_index,
        [in    ] dms_protect_level_t    requested_protect,
        [   out] dms_pma_index_t*       pma_index,
        [   out] dms_protect_level_t*   granted_protect
    );

    [idempotent]
    error_status_t dms_npmi_get_registry (
        [in    ] handle_t               handle,
        [in    ] dms_pma_index_t        pma_index,
        [in,ptr] dms_name_nodes_t*      request_list,
                                        /*null == entire registry*/
        [in    ] long                   depth_limit,    /*0 == infinity*/
        [   out] dms_instance_dir_t**   registry_list
    );

    error_status_t dms_npmi_set_sensor_config (
        [in    ] handle_t           handle,
        [in    ] dms_pma_index_t    pma_index,
        [in,out] dms_configs_t**    sensor_configs
    );

    error_status_t dms_npmi_get_sensor_data (
        [in    ] handle_t                   handle,
        [in    ] dms_pma_index_t            pma_index,
        [in    ] dms_sensor_ids_t*          sensor_id_list,
        [in    ] boolean                    bypass_cache,
        [   out] dms_observations_data_t**  sensor_data
    );

    error_status_t dms_npmi_unregister_pma (
        [in    ] handle_t           handle,
        [in    ] dms_pma_index_t    pma_index
    );
}

APPENDIX F. dms_npri.idl

[ uuid(ee7599b2-e9d7-11cd-8e49-080009273eb9), version(2.2),
  pointer_default(ptr) ]
interface dms_npri
/*
 * This interface defines the operation provided to a NPCS by a PMA
 * to receive sensor data from that NPCS.  This interface is not
 * provided by a client-only PMA (COP).
 */
{
    import "dms_status.idl", "dms_binding.idl", "dms_data.idl";

    [idempotent]
    error_status_t dms_npri_report_sensor_data (
        [in    ] handle_t                   handle,
        [in    ] dms_npcs_index_t           npcs_index,
        [in,ptr] dms_observations_data_t*   sensor_data
                                            /*null == keep-alive*/
    );
}
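For illustration, a full-function PMA's registration with one NPCS is a single dms_npmi_register_pma() call. The sketch assumes the hypothetical generated header `dms.h'; binding acquisition and the construction of the PMA's own NPRI string binding are elided, and the chosen protection level is merely an example.

    #include <dce/rpc.h>
    #include "dms.h"    /* hypothetical IDL-generated header (Appendix E) */

    error_status_t attach_to_npcs(handle_t npcs,
                                  dms_string_binding_t *my_npri_binding,
                                  dms_npcs_index_t my_index_for_npcs,
                                  dms_pma_index_t *my_index_at_npcs)
    {
        dms_protect_level_t granted;
        error_status_t st = dms_npmi_register_pma(
            npcs,
            my_npri_binding,        /* NULL here would mean a client-only PMA */
            my_index_for_npcs,      /* index identifying this NPCS on NPRI
                                       reports (cf. Appendix F) */
            rpc_c_protect_level_pkt_integ,  /* requested protection */
            my_index_at_npcs,
            &granted);
        /* A conforming PMA should check that `granted' is acceptable. */
        return st;
    }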
*/ { import "dms_status.idl", "dms_binding.idl", "dms_data.idl", "dms_config.idl", "dms_naming.idl"; typedef [ref] error_status_t (*dms_pri_reg_proc_fp_t) ( [in] dms_string_t* process_name, [in] long process_pid, [out] dms_process_index_t* process_index ); typedef [ref] error_status_t (*dms_pri_reg_sensor_fp_t) ( [in] dms_process_index_t process_index, [in] dms_protect_level_t min_protect_level, [in,out] dms_instance_dir_t** sensor_register_list ); Friedrich, Saunders, Zaidenweber, Bachmann, Blumson Page 113 OSF-RFC 33.0 DCE Performance Instrumentation July 1995 typedef [ref] error_status_t (*dms_pri_report_data_fp_t) ( [in] dms_process_index_t process_index, [in] dms_observation_data_t* sensor_report_list ); typedef [ref] error_status_t (*dms_pri_unreg_sensor_fp_t) ( [in] dms_process_index_t process_index, [in] dms_sensor_ids_t* sensor_id_list ); typedef [ref] error_status_t (*dms_pri_unreg_proc_fp_t) ( [in] dms_process_index_t process_index ); /* * The following functions are needed to encapsulated the dms_pmi and * dms_pri interfaces in a library (npcs_lib). */ error_status_t dms_pmi_el_initialize ( [in ] dms_pri_reg_proc_fp_t pri_register_process, [in ] dms_pri_reg_sensor_fp_t pri_register_sensor, [in ] dms_pri_report_data_fp_t pri_report_sensor_data, [in ] dms_pri_unreg_sensor_fp_t pri_unregister_sensor, [in ] dms_pri_unreg_proc_fp_t pri_unregister_process ); error_status_t dms_pmi_el_free_outputs ( [in,ptr] dms_configs_t* sensor_config_list, /*null == absent*/ [in,ptr] dms_observation_data_t* sensor_report_list /*null == absent*/ ); /* * The following functions provide the basic dms_pmi functionality. */ error_status_t dms_pmi_set_sensor_config ( [in ] dms_process_index_t process_index, [in,out] dms_configs_t** sensor_config_list ); error_status_t dms_pmi_get_sensor_data ( [in ] dms_process_index_t process_index, [in ] dms_sensor_ids_t* sensor_id_list, [ out] dms_observation_data_t** sensor_report_list ); error_status_t dms_pmi_terminate ( void Friedrich, Saunders, Zaidenweber, Bachmann, Blumson Page 114 OSF-RFC 33.0 DCE Performance Instrumentation July 1995 ); } APPENDIX H. dms_pri.idl [ local, version(2.3) ] interface dms_pri /* * This interface defines the operations provided to an instrumented * process by the encapsulating library (observer_lib). Additionally * the operations that must be provided to observer_lib by an * instrumented process are specified. */ { import "dms_status.idl", "dms_binding.idl", "dms_data.idl", "dms_config.idl", "dms_naming.idl"; typedef [ref] error_status_t (*dms_pmi_set_config_fp_t) ( [in] dms_process_index_t process_index, [in,out] dms_configs_t** sensor_configs ); typedef [ref] error_status_t (*dms_pmi_get_data_fp_t) ( [in] dms_process_index_t process_index, [in] dms_sensor_ids_t* sensor_id_list, [out] dms_observation_data_t** sensor_report_list ); typedef [ref] error_status_t (*dms_pmi_terminate_fp_t) ( void ); /* * The following functions are needed to encapsulated the dms_pri and * dms_pmi interfaces in a library (observer_lib). */ error_status_t dms_pri_el_initialize ( [in ] dms_pmi_set_config_fp_t pmi_set_sensor_config, [in ] dms_pmi_get_data_fp_t pmi_get_sensor_data, [in ] dms_pmi_terminate_fp_t pmi_terminate ); error_status_t dms_pri_el_free_outputs ( [in,ptr] dms_instance_dir_t* sensor_register_list /*null == absent*/ ); Friedrich, Saunders, Zaidenweber, Bachmann, Blumson Page 115 OSF-RFC 33.0 DCE Performance Instrumentation July 1995 /* * The following functions provide the basic dms_pri functionality. 
APPENDIX H. dms_pri.idl

[ local, version(2.3) ]
interface dms_pri
/*
 * This interface defines the operations provided to an instrumented
 * process by the encapsulating library (observer_lib).  Additionally
 * the operations that must be provided to observer_lib by an
 * instrumented process are specified.
 */
{
    import "dms_status.idl", "dms_binding.idl", "dms_data.idl",
           "dms_config.idl", "dms_naming.idl";

    typedef [ref] error_status_t (*dms_pmi_set_config_fp_t) (
        [in]     dms_process_index_t    process_index,
        [in,out] dms_configs_t**        sensor_configs
    );

    typedef [ref] error_status_t (*dms_pmi_get_data_fp_t) (
        [in]  dms_process_index_t       process_index,
        [in]  dms_sensor_ids_t*         sensor_id_list,
        [out] dms_observation_data_t**  sensor_report_list
    );

    typedef [ref] error_status_t (*dms_pmi_terminate_fp_t) (
        void
    );

    /*
     * The following functions are needed to encapsulate the dms_pri
     * and dms_pmi interfaces in a library (observer_lib).
     */

    error_status_t dms_pri_el_initialize (
        [in    ] dms_pmi_set_config_fp_t    pmi_set_sensor_config,
        [in    ] dms_pmi_get_data_fp_t      pmi_get_sensor_data,
        [in    ] dms_pmi_terminate_fp_t     pmi_terminate
    );

    error_status_t dms_pri_el_free_outputs (
        [in,ptr] dms_instance_dir_t*    sensor_register_list
                                        /*null == absent*/
    );

    /*
     * The following functions provide the basic dms_pri
     * functionality.
     */

    error_status_t dms_pri_register_process (
        [in    ] dms_string_t*          process_name,
        [in    ] long                   process_pid,
        [   out] dms_process_index_t*   process_index
    );

    error_status_t dms_pri_register_sensor (
        [in    ] dms_process_index_t    process_index,
        [in,out] dms_instance_dir_t**   sensor_register_list
    );

    error_status_t dms_pri_report_sensor_data (
        [in    ] dms_process_index_t        process_index,
        [in    ] dms_observation_data_t*    sensor_report_list
    );
    /*Note: return (status) may correspond to previous call!*/

    error_status_t dms_pri_unregister_sensor (
        [in    ] dms_process_index_t    process_index,
        [in    ] dms_sensor_ids_t*      sensor_id_list
    );

    error_status_t dms_pri_unregister_process (
        [in    ] dms_process_index_t    process_index
    );
}

APPENDIX I. dms_status.idl

[ version(2.4) ]
interface dms_status
/*
 * This interface defines the set of (resulting) status values for
 * all the operations and data structures defined in DMS.
 */
{
    import "dce/nbase.idl";

    const error_status_t dms_STATUS_BASE     = 0x114b2001;
    const error_status_t dms_STATUS_OK       = error_status_ok;
    const error_status_t dms_NOT_IMPLEMENTED = dms_STATUS_BASE + 0;
    const error_status_t dms_UNKNOWN_SENSOR  = dms_STATUS_BASE + 1;
    const error_status_t dms_UNKNOWN_PROCESS = dms_STATUS_BASE + 2;
    const error_status_t dms_UNKNOWN_INFO_SET = dms_STATUS_BASE + 3;
    const error_status_t dms_UNKNOWN_THRESHOLD_LEVEL = dms_STATUS_BASE + 4;
    const error_status_t dms_UNKNOWN_NPCS    = dms_STATUS_BASE + 5;
    const error_status_t dms_UNKNOWN_PMA     = dms_STATUS_BASE + 6;
    const error_status_t dms_ILLEGAL_NAME    = dms_STATUS_BASE + 7;
    const error_status_t dms_ILLEGAL_METRIC  = dms_STATUS_BASE + 8;
    const error_status_t dms_ILLEGAL_SENSORID = dms_STATUS_BASE + 9;
    const error_status_t dms_ILLEGAL_VALUE   = dms_STATUS_BASE + 10;
    const error_status_t dms_ILLEGAL_BINDING = dms_STATUS_BASE + 11;
    const error_status_t dms_SENSOR_CONFIG_CONFLICT = dms_STATUS_BASE + 12;
    const error_status_t dms_SENSOR_NOT_CONFIGURED = dms_STATUS_BASE + 13;
    const error_status_t dms_SENSOR_NOT_MODIFIED = dms_STATUS_BASE + 14;
    const error_status_t dms_DUPLICATE_SENSOR = dms_STATUS_BASE + 15;
    const error_status_t dms_NO_SENSOR_REQUESTED = dms_STATUS_BASE + 16;
    const error_status_t dms_NO_NPCS         = dms_STATUS_BASE + 17;
    const error_status_t dms_NO_THRESHOLD    = dms_STATUS_BASE + 18;
    const error_status_t dms_REPORT_FAILED   = dms_STATUS_BASE + 19;
    const error_status_t dms_FUNCTION_FAILED = dms_STATUS_BASE + 20;
    const error_status_t dms_NOT_REGISTERED  = dms_STATUS_BASE + 21;
    const error_status_t dms_REGISTER_FAILED = dms_STATUS_BASE + 22;
    const error_status_t dms_ALREADY_REGISTERED = dms_STATUS_BASE + 23;
    const error_status_t dms_PROTECT_LEVEL_NOT_SUPPORTED = dms_STATUS_BASE + 24;
    const error_status_t dms_BYPASS_NOT_ALLOWED = dms_STATUS_BASE + 25;
    const error_status_t dms_NO_OUTPUTS_FREED = dms_STATUS_BASE + 26;
    const error_status_t dms_CHECK_INTERNAL_STATUS = dms_STATUS_BASE + 27;
    const error_status_t dms_BAD_STATUS      = dms_STATUS_BASE + 28;
}

AUTHORS' ADDRESSES

Rich Friedrich                     Internet email: richf@hpl.hp.com
Hewlett-Packard Company            Telephone: +1-415-857-1501
1501 Page Mill Road, Mailstop 1U-14
Palo Alto, CA 94304
USA

Steve Saunders                     Internet email: saunders@cup.hp.com
Hewlett-Packard Company            Telephone: +1-408-725-8900
11000 Wolfe Road, Mailstop 42U
Cupertino, CA 95014
USA
Gary Zaidenweber                   Internet email: gaz@ch.hp.com
Hewlett-Packard Company            Telephone: +1-508-256-6600
300 Apollo Drive
Chelmsford, MA 01824
USA

Dave Bachmann                      Internet email: bachmann@austin.ibm.com
International Business Machines, Inc.
                                   Telephone: +1-512-838-3170
11500 Burnet Road, MS 9132
Austin, TX 78758
USA

Sarr Blumson                       Internet email: sarr@citi.umich.edu
CITI, University of Michigan       Telephone: +1-313-764-0253
519 W William
Ann Arbor, MI 48103
USA