Open Software Foundation                              R. Friedrich (HP)
Request For Comments: 33.0                             S. Saunders (HP)
July 1995                                           G. Zaidenweber (HP)
                                                       D. Bachman (IBM)
                                                      S. Blumson (CITI)
Distributed systems offer advantages in flexibility, capacity, price-performance, availability and resource sharing. Distributed applications can provide user productivity improvements through ease of use and access to distributed data. However, managing applications in a distributed environment is a complex task, and the lack of performance measurement facilities is an impediment to large-scale deployment.
This document describes performance instrumentation and measurement interface specifications that support performance related tasks such as configuration planning, application tuning, bottleneck analysis, and capacity planning. These performance measurement capabilities are a necessary component of any commercially viable computer technology, and are currently insufficient in DCE.
Specifically, to provide high-level analysis software with the data needed to compute correlated resource utilization across nodes in a network, this document describes the:
The guiding philosophy is to define a set of standardized performance instrumentation that is consistently collected, reported and interpreted in a heterogeneous environment. Furthermore, these measurement capabilities are compiled into the core DCE services for use at customer sites. To support pervasive instrumentation the instrumentation must have minimal overhead on applications and services.
A companion RFC, RFC 32.0, discusses the requirements for performance monitoring, the metrics that are of interest for performance analysis and performance management, and the instrumentation necessary to collect performance data [RFC 32]. Consequently, the requirements for instrumentation are not described in this document.
We recommend deployment of core instrumentation with the DCE Release 1.2 and then roll out additional instrumentation in later releases.
The following summarizes the minimum content for DCE release 1.2:
DCE runtime library (libdce) instrumentation.
To ensure consistent meaning the following terms and concepts are defined for use in this document. A more detailed discussion of some of these concepts is found in later sections of this document.
Metrics define measurable quantities that provide data to evaluate the performance of a system under study. They may consist of raw information (such as events) or derived quantities such as statistical measures or rates. Examples are response time, throughput, and utilization. These metrics, and more, are described in detail in section 4.
Instrumentation consists of specialized software components incorporated into programs to provide mechanisms for measuring the data used to calculate the relevant performance metrics. The basic measurement techniques are counting, timing and tracing. The objective of instrumentation is to provide measures of resource utilization (such as CPU, memory, I/O, network, etc.) and processing time (such as service time, queuing time, etc.). These measures are delivered to a performance monitor as statistical measures or as frequency and time histograms. From here on we will often refer to instrumentation as sensors.
Sensors are the logical instantiations of the instrumentation necessary to collect data for a particular, single metric. Sensors consist of aggregations of probes located at well defined probe points. Sensors contain internal state that satisfies the definition of a particular metric. For example, a response time sensor will consist of two probes (a begin-timer and end-timer probe) but appear to the user as a single, logical entity. In object-oriented language, the sensors are the objects that encapsulate the data and functions provided by the instrumentation primitives.
A conceptual model of a sensor is illustrated in Figure 1. A sensor is a software IC (integrated circuit) that has input, output and control functions. The input to a sensor is provided by an event measured by a probe. The sensor provides output data, internal error conditions, and registration data so that the sensor can be identified by the measurement system. A sensor is controlled by several functions, including initialization, getting data, and modifying the sensor configuration. A sensor maintains internal state such as its identification, statistical data, and possibly some small algorithms that support threshold, histogram and trace functions.
There are three types of sensors:
The first two sensor types support threshold detection to minimize data transmitted across the network by supplying data only when a user specified threshold criteria is met. All three sensor types support a fast-path option that does not set locks during sensor data update operations.
There are two categories of sensors, each of which support all three sensor types described above:
Since DCE application environments are multi-threaded, all sensors must be re-entrant (in the case of custom sensors, this is the application-programmer's responsibility). Sensors are described in detail in section 5.
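As an illustrative sketch only (the specification's actual sensor layout is defined by the primitives in sections 5 and 6, and the field names here are assumptions), a sensor's encapsulated internal state and its update and reset operations might look like this in C:

```c
#include <float.h>

/* Hypothetical sketch of a sensor's internal state: identifier,
 * interval statistics, and extrema for the current reporting interval. */
typedef struct {
    unsigned int  id;       /* sensor (metric + instance) identifier */
    unsigned long count;    /* observations this reporting interval  */
    double        sum;      /* simple sum (yields means)             */
    double        sum_sq;   /* sum of squares (higher info sets)     */
    double        min, max; /* extrema per reporting interval        */
} sensor_t;

/* Record one observation.  Callers supply their own locking unless
 * the fast-path (lock-free) option described above is in effect. */
void sensor_update(sensor_t *s, double value)
{
    s->count++;
    s->sum    += value;
    s->sum_sq += value * value;
    if (value < s->min) s->min = value;
    if (value > s->max) s->max = value;
}

/* Clear interval statistics at the start of a reporting interval. */
void sensor_reset(sensor_t *s)
{
    s->count = 0;
    s->sum = s->sum_sq = 0.0;
    s->min = DBL_MAX;
    s->max = -DBL_MAX;
}
```

Because the update path is a handful of arithmetic operations, it is compatible with the minimal-overhead goal stated in section 1.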
[Figure not available in ASCII version of this document.]
Figure 1. A sensor is conceptually illustrated here. A sensor can be thought of as a software IC that has input, control and output functions. In addition the sensor contains some internal state including sensor identifier, statistical metric data, metric computation algorithms, and other actions. The set of input, control and output functions are described in detail in sections 10 and 11.
Probes are the basic primitives from which sensors are constructed. Probes provide data input, control, and data access (output). For example, a probe might define the functions necessary to increment/decrement a counter. In general, probes do not contain local state, but only access global sensor data. (An exception is for timer probes, where the start-time must be maintained locally.) Probes are pre-defined as macros to ensure consistency in implementation of sensors and to ease instrumenting source code. The macro definitions are presented in section 6.
Probes provide input to a sensor. It is possible to place these probes in non-DCE services to obtain measures of interest (for example in the C library to collect data on sockets), but this spec focuses on DCE-based middleware and application software.
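The probe idea can be sketched as C macros. These names, the sensor layout, and the stub clock are illustrative assumptions; the normative macro definitions appear in section 6. Note how the timer probe keeps its start time in a local variable at the probe point, per the exception described above:

```c
/* Minimal sensor state touched by the probes below (illustrative). */
typedef struct { unsigned long count; double sum; } sensor_t;

/* Stub clock so the sketch is self-contained; a real implementation
 * would read a high-resolution monotonic timer. */
static double fake_now;
static double current_time(void) { return fake_now; }

/* Counter probe: no local state, only global sensor data. */
#define PROBE_COUNT(s)          ((s)->count++)

/* Timer probes: start time lives in a local at the probe point. */
#define PROBE_TIMER_BEGIN(t0)   ((t0) = current_time())
#define PROBE_TIMER_END(s, t0)  ((s)->sum += current_time() - (t0))
```

Defining probes as macros, as the specification does, keeps sensor implementations consistent and lets instrumented source compile to nothing when measurement is disabled.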
Probe points are the locations within a program's flow of control where significant event transitions occur, and are thus candidates for the placement of probes. For example, when a client program issues an RPC, a state transition occurs from user code to the runtime library; this transition is an excellent place for instrumentation software that records counts or elapsed times. The use of probes placed at probe points to construct a timer sensor is illustrated in Figure 2. Although the probe_point_B shown there is within the same scope as functionN(), it is not restricted to the same scope as probe_point_A.
[Figure not available in ASCII version of this document.]
Figure 2. The implementation of a sensor timer is illustrated in this figure for an arbitrary functionN(). The probes are located at the beginning and ending of the function. These probe points provide input data into the sensor for starting and stopping an elapsed time clock.
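A hedged C sketch of the instrumented functionN() of Figure 2 (the sensor layout, clock choice, and names are assumptions for illustration, not the specification's probe macros):

```c
#include <time.h>

/* Global sensor state for functionN()'s elapsed-time metric. */
typedef struct { unsigned long count; double elapsed; } timer_sensor_t;

static timer_sensor_t fn_timer;

/* CPU time in seconds; a real sensor would use a high-resolution
 * monotonic clock. */
static double now(void) { return (double)clock() / CLOCKS_PER_SEC; }

void functionN(void)
{
    double t0 = now();               /* probe_point_A: begin timer */

    /* ... body of functionN() ... */

    fn_timer.elapsed += now() - t0;  /* probe_point_B: end timer   */
    fn_timer.count++;
}
```

The local t0 is the only per-invocation state, which is why timer probes are the exception to the "probes hold no local state" rule in the previous subsection.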
Requirements for different data capture granularities and subsets require that the measurement system have a controllable capability to obtain only the required amount of data with minimum overhead. Consequently, we have defined varying data collection information sets to provide increasing detail in the collected data. This controls the statistical detail of the collected data. In the best case, the measurement system incurs no overhead when no observations are required.
Increasing the size of performance information sets increases the number of data components of the collected data, providing a more comprehensive picture of operational behavior, but at the cost of increasing resource utilization. Information set control is done on a per-sensor, and not a per-process, basis.
Furthermore, for minimal overhead during continuous monitoring, metric thresholds are set such that the measurement system reports data only when it exceeds the specified threshold values. Minimizing resource consumption requires that filtering take place as close to the sensors as possible. This specification adopts the philosophy that the sensors themselves are simple and very efficient, and that filtering tasks would complicate them needlessly. Consequently, filtering is done on the node, but by the NPCS rather than by the sensors themselves.
Table 1 summarizes the sensor data information sets and their characteristics.
+-----------+-------------------------------------+----------------+ | Info Set | | New Statistics | | Value | Description | Per Metric | +===========+=====================================+================+ |0 | Minimum overhead, no data needed. | None. | +-----------+-------------------------------------+----------------+ |0x01 | Provides simple utilizations, usage | Counts, Simple | | | counts, error counts, mean times, | sums, Minimums,| | | mean rates ONLY if a user-specified | Maximums. | | | threshold has been exceeded. | | | | Otherwise, no data is returned from | | | | the NPCS. | | +-----------+-------------------------------------+----------------+ |0x02, | Provides simple utilizations, usage | Counts, Simple | |0x04, 0x08 | counts, error counts, mean times, | sums, Minimums,| | | mean rates. | Maximums. | +-----------+-------------------------------------+----------------+ |0x10 | Provides 2nd moments so that | Sum of squares.| | | analysis can yield variance. | | +-----------+-------------------------------------+----------------+ |0x20 | Provides 3rd moments so that | Sum of cubes. | | | analysis can yield skew. | | +-----------+-------------------------------------+----------------+ Table 1. Performance Information Sets
Event tracing is necessary to provide events in a time-ordered causal relationship. Due to scalability concerns and overhead in a production environment, this is not a part of the specification.
The reporting interval is the time interval, measured in seconds, over which metrics are collected and statistics are summarized and then reported. To minimize performance measurement overhead, single events are not collected. Rather, the sensors summarize data over a reporting interval (currently 5 seconds minimum), and only report interval statistics to the higher level performance monitor. This interval is adjustable to decrease collection overhead.
Support of threshold sensors can dramatically reduce the amount of data collected and transmitted through the network environment, since only exception cases are reported. This supports the management-by-exception philosophy of network management. Thresholds are defined on a per-sensor basis, with a minimum value, a maximum value, or both (i.e., a range). The NPCS then processes incoming sensor data, and when a sensor's value falls outside the configured threshold range, the sensor data from this reporting interval is reported to the PMA (performance management application) at the next NPCS reporting interval. Supporting threshold detection in the NPCS simplifies the sensors and allows multiple PMAs to configure a specific sensor with different threshold values.
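A minimal sketch of the NPCS-side threshold check just described (the type and function names are hypothetical; the specification defines thresholds only as a per-sensor minimum/maximum range):

```c
#include <stdbool.h>

/* Per-PMA threshold configuration for one sensor: report only when
 * the interval value leaves the [min, max] range. */
typedef struct { double min, max; } threshold_t;

/* Return true if this interval's sensor value should be forwarded
 * to the PMA at the next NPCS reporting interval. */
bool npcs_should_report(const threshold_t *t, double value)
{
    return value < t->min || value > t->max;
}
```

Because the check runs in the NPCS rather than in the sensor, each PMA can register its own threshold_t for the same sensor without complicating the sensor's fast path.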
This document distinguishes the hardware from the software process for clients and servers. For the purpose of this paper, the physical hardware that clients and servers execute on is referred to as a network node. (Many management applications define a server as the hardware device that is providing the service. This is different from our definition).
A DCE client is a software process/thread executing on a particular network node, that makes RPC requests. This definition includes a custom-developed application that issues RPC requests to a DCE server, as well as a DCE system-level service making a request of another DCE server.
A DCE server is a software process/thread executing on a particular network node that receives (and usually responds to) RPC requests. This definition includes system-level DCE services (such as the dced) as well as custom-developed application services. Note that a server in this document is a software process and not the physical hardware (see definition of network node, above).
A performance monitor (or just monitor) is a process that provides on-going collection and reporting of performance data for evaluation by system managers, application designers and capacity planners. A specific instance of a monitor that also supports management functions is called a performance management application (PMA).
By the DCE Measurement System we mean the framework of sensors, standard interfaces, and monitoring processes that initialize, control, access, and present performance data, as defined within this specification. Figure 4 in section 7 provides a block diagram of these components and their relationships.
The following processing elements are shown in figure 4:
The following standard interfaces are also shown in figure 4:
This section describes a vision of a performance measurement infrastructure that efficiently supports distributed application performance monitoring. It describes the need for a pervasive measurement infrastructure, the PMA presentation requirements, and the estimated design center impact.
The requirements for a distributed measurement system are described in detail in [RFC 32] and supplemented in section 3. The present section discusses a vision of a software system that realizes these requirements. The components of the measurement capability described later in this document satisfy the requirements of this vision of a monitor for distributed applications.
Performance instrumentation should provide data for various users and uses:
From these users' perspectives, different vendor solutions should converge to provide a seamless, single, logical view of the behavior of the distributed environment. This demands that a distributed measurement system must collect heterogeneous data from all vendor systems (nodes) and present it for analysis in a consistent manner. Therefore the specification of a distributed measurement system must define a common set of performance metrics and instrumentation to ensure consistent collection and reporting across heterogeneous platforms, define standard APIs to ensure pervasive support in heterogeneous environments, and utilize self-describing data to ensure accessibility, extensibility and customizability of the measurement architecture in heterogeneous environments.
For ease of use the measurement system should support concurrent measurement system requests with different configurations and sampling intervals, allow enabling/disabling the instrumentation on a running system without disrupting an active application environment, and support custom application-defined metrics and instrumentation. Collected data should also be accessible by third-party performance monitors and application clients.
A performance measurement system, although not a system management service in and of itself, is an important aspect of any system management capability. Therefore, the measurement system should converge wherever possible with relevant measurement standards and node-based measurement facilities. It should also provide a closed feedback loop, so that changes in a distributed application environment are evaluated using the data collected by the measurement system.
The measurement system should provide a correlated view of resource consumption across heterogeneous network nodes. It should also provide an infrastructure for integrating disparate performance measurement interfaces from the host operating system, networking, and major subsystems in the distributed systems infrastructure.
Figure 3 illustrates our notion of a measurement infrastructure that is closely integrated with a distribution infrastructure. Instrumentation (depicted by measurement meters) is dispersed throughout the software components. These components, when grouped in a logical manner, constitute a distributed application. The measurement system collects, transmits, reduces and correlates data from all relevant constituent components. These components include the distribution infrastructure (such as DCE), the host platform (an instrumented operating system such as HP-UX or AIX, or a non-instrumented operating system such as those found on PCs), other middleware components (such as Distributed Objects or Transarc's Encina transaction manager), as well as the application developed client and server code.
[Figure not available in ASCII version of this document.]
Figure 3. A measurement infrastructure for the performance monitoring of distributed applications. A well-designed measurement infrastructure should provide a centralized view of distributed objects and measure all aspects of the distributed application, not just the distribution infrastructure.
It is crucial to support a centralized view of the distributed application, regardless of the physical location of the components. For maximum flexibility, this centralized view is available from any node (assuming proper authorization). Finally, the instrumentation needs to provide a logical-to-physical mapping of the sensor names, as known by the user and stored by the measurement system.
The alternative to the approach illustrated in Figure 3 is to use several different performance tools, each running in a unique window, different for each platform in the network, presenting non-correlated and sometimes contradictory data. This approach is cumbersome, error-prone, inefficient, and ultimately useless, since distributed applications consist of interactions between logical groupings of software services. These logical groupings are impossible to capture and present without standardized instrumentation. Unfortunately, without standard performance instrumentation this is the only realizable alternative.
The efficiency of the infrastructure is important. If enabling performance monitoring excessively perturbs the environment then it is useless. The measurement system should minimize in-line overhead (the overhead in the direct dynamic path of the application) by deferring processing to outside of the application's direct path whenever possible. This technique still consumes CPU on the node, but minimizes the negative effect on application response time. Creating variable-size information sets (with increasing resource consumption) was described in section 1.2.6. Such variable information sets allow a person to dial in only the necessary monitoring data collection level (which minimizes overhead). A goal of the measurement system is to minimize network bandwidth consumed by the transmission of collected data. This is accomplished by summarizing data over intervals (instead of reporting every individual data item as it occurs), and supporting bulk retrieval interfaces.
Transmitted data may contain confidential information on application components or location, and requires a secure network communication channel to eliminate interception or modification.
In summary, standardized, pervasive performance instrumentation provides the following benefits:
The instrumentation and measurement system described by this RFC can provide data to support the following graphical and tabular presentation views of the PMA:
However, a PMA is not required to support all of these views, or only these views.
As we investigated the need and requirements for DCE performance instrumentation, we discovered that there exist several related activities and uses of performance data. How this specification incorporates these requirements is discussed in this section.
The following users of the performance instrumentation were identified.
Performance sensors should yield the critical information to enable dynamic control of a distributed application to improve its performance. Capacity planning and modeling are involved here as well since they utilize this data as input parameters.
A goal is to provide the resource consumption data that accounting requires, to eliminate redundant collection mechanisms. This proposal is not intended to be a competitive or complete mechanism for all of accounting's needs; some information (e.g., a strict accounting of which client called which server method, together with all the network, CPU, memory, and disk resources consumed by that RPC) is outside the capabilities described in this paper.
Required for topology and application understanding. No event trace facility is provided by this proposal.
There will always be a role for lab tools, which by virtue of high overhead on the system or proprietary low-level nature are not feasible in an end-user's production environment. Lab tools will continue to exist, but this specification does not explicitly address their requirements. However, this proposal does not preclude their use. Tools built on top of this proposed infrastructure can be used in the lab to provide basic, easily obtained information (much as vmstat() and iostat() serve for sanity checking in some internal benchmarks).
The following are the basic requirements that we agreed are necessary for the success of this specification. When we ranked them, only a few were ranked less than MUSTS.
This specification does not aspire to recognize every sensor that might ever be needed for distributed systems. As a result, the architecture must have extensibility at its core, to accommodate new sensors throughout its collection, naming, and display capabilities. As new applications are developed, middleware versions are released, or current runtime libraries are enhanced, the need for additional sensors must be accommodated.
In the interests of operational efficiency, only the overhead associated with the currently required sensors should be imposed on the system. Even for a particular sensor, there must be the capability to provide simple sums or means when that information is sufficient, and also to supply higher statistical moments or distributions when necessary.
This requirement assures the DCE customer that his/her application is monitorable, independent of the hardware platforms on which it is running.
This requirement assures the DCE customer that his/her application is monitorable in a production system, since the architecture specification has strict guidelines to minimize overhead.
Sensors are more complex than simple counters. The architecture which prescribes their naming, organization and control is thereby critical to implementation and deployment.
Pervasive instrumentation also requires consistently defined metrics, so that valid operations can be performed on sensors implemented in a heterogeneous environment.
Provide user configurable access and data protection for sensor names and data.
Ensure that metrics are valid from release to release.
Ease access to performance data for new and legacy application and system management tools.
This section describes the metrics and statistics that guide the design and placement of performance instrumentation. Performance metrics are provided for a client perspective (end user) and for a server perspective. A detailed description of the sensors that collect these performance metrics is found in section 12.
The following metrics define the quantities and the notation that are used throughout the remainder of the document. The metrics and notation have been derived from [Laz].
In general, a metric with an annotation of k is for a particular resource k. Non-annotated metrics are for the system as a whole. The above non-annotated metrics can also be defined for a particular resource. For example, lambda_k is the arrival rate of requests at resource k.
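The interval-based metrics derived from [Laz] obey the standard operational laws. As an illustrative sketch (not part of the specification), the basic relationships over one reporting interval can be written as:

```c
/* Operational laws over one reporting interval of length T seconds,
 * given C completions and B busy seconds at a resource:
 *   throughput   X = C / T
 *   utilization  U = B / T
 *   service time S = B / C
 * from which U = X * S (the utilization law). */
double throughput(double completions, double interval)  { return completions / interval; }
double utilization(double busy, double interval)        { return busy / interval; }
double service_time(double busy, double completions)    { return busy / completions; }
```

These identities are what let a monitor derive utilizations and mean times from the simple counts and sums a sensor reports, without collecting individual events.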
The following metrics are collected or derived from a client perspective:
The following metrics are collected or derived from a server perspective:
The instrumentation must provide analysis software with the data required to compute the following statistical quantities:
This section describes how sensors are named in the cell, and their high level functions. The macro primitives used to construct these sensors are described in section 6. This section focuses on the standard (default) sensors in the distribution infrastructure (i.e., DCE), and custom sensors usable by other middleware technologies and application developers.
This section describes the semantics and syntax of sensor naming.
Several terms are used in sensor naming and are described as follows:
interface_0 and its manager_operation_2() operation.
Consequently, metrics are not dynamic, but instances are. The dynamic instances are those aspects that may not be known at process link or load time, such as interface (since a server can register and unregister interfaces) or fileset (since filesets can be moved between DFS servers). The sensor name should have the dynamic elements as the suffix to allow naming into SNMP MIBs.
The full name of a sensor consists of three parts:
The process name is used by the performance management application (the NPMI client) to locate the correct NPCS and tell it what sensors are of interest. The metric name and instance are converted by NPCS into the corresponding sensor identifier which is used to access the right sensor. The data structures that implement naming are described in section 7.3.2.
The process name identifies which process on which host is being queried. A process may have more than one name, e.g., a CDS server can be named by
/.:/hosts/dceperf.node101.osf.org/cds-server
as well as by
/.:/hosts/dceperf.node101.osf.org/perf-server/cdsd
or by
/.:/hosts/dceperf.node101.osf.org/perf-server/11345
A dfsbind (client-side DFS helper) could be named as
/.:/hosts/orion.node42.osf.org/perf-server/dfsbind
or
/.:/hosts/orion.node42.osf.org/perf-server/14316
The process name is used by the NPMI client to bind to the appropriate NPCS, thus any naming scheme that can be used by DCE clients to bind to DCE servers will work for NPMI clients as well. For current DCE implementations, that is the DCE Cell Directory Service (CDS). In the future this may be Federated Naming or other schemes.
The names used to specify a particular process to the NPCS can be either process IDs or executable names. The process ID is guaranteed to be unique, but requires first somehow finding out the ID, either by querying NPCS or other means. It may not have meaning on some platforms. The program name is more user-friendly, but may not be unique, especially in the case of clients on multi-user machines. The process ID is also more suitable for use by numeric naming schemes such as SNMP.
Both the process name and service name allow for continuity in time despite server restarts. They also avoid the problem of recycling of process IDs by the OS.
The second part of the sensor name is the name of the particular metric (e.g., rpc_calls). The third part specifies the instance, e.g., protocol or interface and manager.
A metric has only one name, which is specified in this section for standard sensors, and made public via some similar mechanism for custom sensors. To avoid collisions these start with a domain identifier, where domain is the name of the DCE-based service domain (e.g., Encina, DFS, User, DCE, Security, ...). These domains should be registered with the OSF and documented in an OSF-RFC.
The metric name has two forms: a human-readable list of slash-separated names (e.g., dce/packets-out/protseq), and a dot-separated list of numbers, or object ID (OID) (e.g., 1.3.4). These names are then suffixed with the name identifying the instance, giving, say, dce/packets-out/protseq/ncadg_ip_udp and 1.3.4.1.
It is expected that users will typically specify a sensor by the human-readable name, while programs are more likely to use the object ID notation amongst themselves. Also, when SNMP agents are mapping the metric namespace into the MIB, the OID for the sensor will be the name used in the MIB.
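The two naming forms can be kept in a simple registry that resolves a human-readable metric name to its OID and appends the instance suffix. The following sketch is illustrative only: the table entries and the helper name full_oid are our own assumptions, not part of the specification (only the dce/packets-out/protseq to 1.3.4 pairing appears in the text above).

    #include <stdio.h>
    #include <string.h>

    /* Hypothetical pairing of human-readable metric names with
     * dot-notation OIDs.  Actual assignments would come from the
     * registered domain documents. */
    struct metric_map {
        const char *name;   /* slash-separated human-readable name */
        const char *oid;    /* dot-separated object ID */
    };

    static const struct metric_map metric_table[] = {
        { "dce/packets-out/protseq", "1.3.4" },
        { "dce/packets-in/protseq",  "1.3.5" },   /* assumed OID */
    };

    /* Build the full sensor name by suffixing the instance onto the
     * OID form.  Returns 0 on success, -1 on lookup/overflow failure. */
    int full_oid(const char *metric_name, int instance,
                 char *buf, size_t buflen)
    {
        size_t i;
        for (i = 0; i < sizeof metric_table / sizeof metric_table[0]; i++) {
            if (strcmp(metric_table[i].name, metric_name) == 0) {
                int n = snprintf(buf, buflen, "%s.%d",
                                 metric_table[i].oid, instance);
                return (n > 0 && (size_t)n < buflen) ? 0 : -1;
            }
        }
        return -1;
    }
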
For efficiency, the data provided by a sensor is treated as atomic, and any subparts are not nameable. The entire set of data is accessed as a whole via both the PMI and NPMI.
This section describes functions supported by all sensors.
The fast-path option supports non-locking updates, to minimize update cost for those sensors where losing an occasional update is acceptable. Note that this option must not result in decreased reliability of a DCE process or service.
Selectable statistical levels are supported for each sensor, namely, the minimum, maximum, sum, mean, and variance are collected, based on the collection information set.
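The statistical levels above can be maintained incrementally within a reporting interval. The following is a minimal sketch of such an accumulator; the type and function names are ours, not from the specification, and the variance shown is the population variance derived from the running sum and sum of squares.

    #include <float.h>

    /* Per-interval statistics a sensor might maintain so that
     * minimum, maximum, sum, mean, and variance can be reported. */
    typedef struct stat_acc {
        unsigned long n;
        double min, max, sum, sumsq;
    } stat_acc_t;

    void stat_reset(stat_acc_t *s)
    {
        s->n = 0; s->sum = s->sumsq = 0.0;
        s->min = DBL_MAX; s->max = -DBL_MAX;
    }

    void stat_update(stat_acc_t *s, double x)
    {
        s->n++;
        s->sum += x;
        s->sumsq += x * x;
        if (x < s->min) s->min = x;
        if (x > s->max) s->max = x;
    }

    double stat_mean(const stat_acc_t *s)
    {
        return s->n ? s->sum / s->n : 0.0;
    }

    double stat_variance(const stat_acc_t *s)
    {
        double m;
        if (s->n < 2) return 0.0;
        m = stat_mean(s);
        return s->sumsq / s->n - m * m;   /* population variance */
    }
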
Selectable reporting intervals allow modifying the interval (in seconds) over which the sensor summarizes and reports data. Larger intervals reduce the amount of data transmitted across the network, at the cost of coarser granularity in the events measured. Summarization intervals range from a minimum of 5 seconds to a maximum of 60 minutes.
Counters are 32 bits (unsigned). This provides support for an activity that executes at a rate of 1.19 million operations per second over the maximum summarization interval of 1 hour. Overflow is a concern only if the counter's value wraps twice in a single summarization interval, which is unlikely. Consequently, overflow is handled by the PMAs, since the data is cumulative and the delta can be extracted. Sensors do not have to worry about overflow.
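The PMA's handling of a single wrap falls out of modulo-2^32 arithmetic on the cumulative counter, as this small sketch (our own illustration) shows:

    #include <stdint.h>

    /* Recover the per-interval delta from a cumulative 32-bit
     * counter.  Unsigned subtraction is modulo 2^32, so a single
     * wrap between samples is handled automatically; two wraps in
     * one interval (ruled out above) would be undetectable. */
    uint32_t counter_delta(uint32_t prev, uint32_t curr)
    {
        return curr - prev;   /* modulo-2^32 arithmetic */
    }
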
Threshold detection and notification occurs for counter and timer sensors when a threshold condition is true. A threshold condition is a value range and a flag that specifies whether the threshold test should occur for values above or below the configured value. For example, a response time sensor set to detect thresholds would report data only when a user-configured threshold condition is true (for example, maximum response times are greater than 20 seconds). It is important to note that threshold detection is based on minimum or maximum values.
During a reporting interval, the minimum and maximum values are retained and returned. At the end of each reporting interval, the minimum and maximum are reset. This provides insights into the variation of the metric for a single interval (and not over the long term; it is a responsibility of the PMA to keep track of long term minimum and maximum behavior).
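A threshold test of the kind described above can be sketched as follows. The type and names are hypothetical (not from the specification); the test uses the interval's retained minimum or maximum, per the note that threshold detection is based on those values.

    #include <stdbool.h>

    /* A configured limit plus a flag selecting whether values
     * above or below it trigger reporting. */
    typedef struct threshold {
        bool enabled;
        bool above;        /* true: trigger when max exceeds limit */
        double limit;
    } threshold_t;

    /* Returns true when the sensor should report this interval,
     * based on the interval's minimum or maximum value. */
    bool threshold_triggered(const threshold_t *t,
                             double interval_min, double interval_max)
    {
        if (!t->enabled)
            return true;               /* no threshold: always report */
        return t->above ? (interval_max > t->limit)
                        : (interval_min < t->limit);
    }
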
Histograms provide distribution frequencies for a monitored event. They are not supported in this version of the specification, but are a candidate for future support.
Standard and custom sensors register with NPCS using the data structures and functions described in sections 7.3.2 and 6.2.
Custom sensors also require a utility to load their specific metric attributes into the DCE CDS for use throughout the cell. This utility is not defined by the specification.
The specification defines a wide range of metric attributes that are described in detail in section 7.3.7.
Based on the client and server metrics described in sections 4.2 and 4.3, the following counter sensors are implemented for each client process and for each server RPC interface.
For each sensor the minimum, maximum, sum, mean, and variance are collected based on the collection information set.
This measures the client's total RPC throughput rate as determined by the number of successful completions of client RPC requests per unit time.
Collect the data to compute the following:
Note that throughput is a rate. The sensor keeps track only of request completions, thus higher-level software must divide this by the current measurement interval to compute the rate.
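The division the higher-level software must perform is trivial but worth pinning down; this one-liner (our own sketch) shows the completion count being converted to a rate over the current measurement interval:

    /* Throughput as a rate: completions per second over the
     * current measurement interval. */
    double rpc_throughput(unsigned long completions, double interval_sec)
    {
        return interval_sec > 0.0 ? completions / interval_sec : 0.0;
    }
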
This measures the frequency of client requests. Collected for each RPC server interface invoked by the client.
This metric measures the number of packets sent by the client, and should be collected per protocol sequence (i.e., the number of packets passed to the network transport -- not necessarily the number of network packets). Collected for each RPC server interface invoked by the client.
This metric measures the number of packets received by the client, and should be collected per protocol sequence (i.e., the number of packets passed to the network transport -- not necessarily the number of network packets). Collected for each RPC server interface invoked by the client.
This metric provides information about the size of the data transferred from the client to the server. Collected for each RPC server interface invoked by the client.
This metric provides information about the size of the data transferred from the server to the client. Collected for each RPC server interface invoked by the client.
This information, although not a performance metric properly so-called, provides insight into the operational environment, and whether error conditions might be causing performance problems.
Count the number of DCE thread lock requests that could not be satisfied, and so resulted in thread waits. Note that the lock path is a high-frequency, performance-critical path, and extra care must be employed to instrument it without resulting in a performance degradation.
Count the number of NSI (or, perhaps in the future, XFN) binding look-ups and imports. Collected for each RPC server interface invoked by the client.
Count the number of NSI (or XFN) entities returned from look-ups and imports. Collected for each RPC server interface invoked by the client.
This measures the server's total RPC throughput rate, as determined by the number of successful completions of client RPC requests per unit time.
Collect the data to compute the following:
Note that throughput is a rate. The sensor keeps track only of request completions, thus higher-level software must divide this by the current measurement interval to compute the rate.
This metric measures the number of packets sent by the server for all clients.
This metric should count packets sent by the server including nested RPCs sent to other servers. Collected for each RPC server interface.
This metric measures the number of packets received by the server for all clients.
This metric should count packets received by the server including nested RPCs received from other servers. Collected for each RPC server interface.
This metric provides information about the size of the data transferred from the server to the client. Collected for each RPC server interface.
This metric provides information about the size of the data transferred from the client to the server. Collected for each RPC server interface.
This metric provides information about the queue length of RPC calls at the server, due to a lack of available call threads. This differs from calls queued (see next item), by providing a distribution of queue length.
This metric provides information about the number of RPC calls that were queued at the server, due to a lack of available call threads. This differs from queue length (see previous item) by providing only a count of calls queued.
This metric provides information about the utilization of the server's thread pool, by counting the number of active (non-idle) threads.
This information, although not a performance metric properly so-called, provides insight into the operational environment and whether error conditions are causing performance problems.
Count the number of DCE thread lock requests that could not be satisfied, and resulted in thread waits. Collected for each RPC server interface.
The following custom sensors are available to the application developer to use for specific application events.
This measures the total count of an application-specified event during the previous measurement interval.
Based on the client and server metrics described in sections 4.2 and 4.3, the following timer sensors are implemented for each client process and for each server RPC interface.
For each sensor the minimum, maximum, sum, mean, and variance are collected based on the collection information set.
This measures the total elapsed time, including server processing time and delay/queueing, for a client routine that invokes a particular DCE server.
Collect the following data:
Measure the elapsed time per RPC call, from the time the client's runtime initiates the call until the last packet has been received by and unmarshalled at the client. This should include nested RPC call elapsed times if other DCE servers, such as the security service, are invoked (the nested RPC call time is optionally broken out). RPCs that result in DCE errors should be reported in a separate category, not included in this one.
Note that this time will not include client application or user interface response time, since those are outside (above) the scope of the DCE services.
This measures the service requirement at the client, including operating system and network software CPU processing time, required to satisfy a client's RPC request. This request may consist of multiple RPC packets, but only one RPC call. This requires that the host operating system support a performance measurement system and that DCE servers use it to gather CPU service time. The implementation of this sensor is thus host OS dependent. Data is collected on a per-server interface.
This measures the marshalling time of RPC parameters at the client required to satisfy a client's RPC request. Data is collected on a per server interface.
This measures the unmarshalling time of RPC parameters at the client required to satisfy a client's RPC request. Data is collected on a per server interface.
This measures the delay of the network between a particular client and server node, as measured between client and server runtime libraries. Consequently, it measures the latency of the networking software transport, in addition to the physical network wire. The data is collected per transport protocol sequence. (DTS may already capture this DCE ping time, and if so, then it should be used.)
This measures the total elapsed time, including server processing time and delay/queueing, required for the server to satisfy a client request.
Collect the following data:
Measure the elapsed time per RPC call, from the time the server runtime receives the call until the last packet has been marshalled by the server and sent. This should include nested RPC call elapsed times if other DCE servers, such as the security service, are invoked (the nested RPC call times are optionally broken out). RPCs that result in DCE errors should be reported in a separate category, not included in this one.
Note that the elapsed time does not begin to accumulate until a thread from the call-thread pool is dispatched on behalf of this incoming request; consequently, this does not include call-thread queueing time prior to the first call-thread dispatch. That queueing time is collected separately, as the initial queueing time at the server (see next item).
This measures the queueing time of an incoming RPC request when no call thread is available to dispatch it. See residence time (previous item) for the complementary elapsed-time measure.
This measures the service requirement at the server, including operating system and network software CPU processing time, required to satisfy a client's request. This request may consist of multiple RPC packets, but only one RPC call. This requires that the host operating system support a performance measurement system and that DCE servers use it to gather CPU service time. Data collected on a per-server-interface-operation basis.
This measures the marshalling time of RPC parameters at the server required to satisfy a client's RPC request. Data collected on a per-server interface operation basis.
This measures the unmarshalling time of RPC parameters at the server required to satisfy a client's RPC request. Data collected on a per-server-interface-operation basis.
This measures the interarrival time of incoming RPC requests. Data collected on a per-server-interface-operation basis.
The following custom sensors are available to the application developer to use for specific application events.
This measures the total elapsed time, including processing time and delay/queueing, for an event as determined by the application developer.
Custom sensors can be defined that pass opaque data through the measurement systems. These sensors merely copy data from existing internal data structures. These sensor data types are opaque, and require pickling routines for support which are supplied at sensor registration time.
The DCE 1.1 IDL compiler supports pickling, i.e., support for encoding and decoding data types to and from a byte stream format. A sensor may take advantage of this pickling process to encode data into the opaque array of bytes, which it is able to transmit via the standard interfaces. This allows sensors to be created with elaborate data types and provides a mechanism for that data to be marshalled.
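The shape of an encode/decode pair for an opaque sensor datum can be sketched as below. This is illustrative only: in practice the byte stream would be produced by DCE 1.1 IDL-generated pickling routines, whereas this memcpy-based stand-in is not portable across heterogeneous byte orders (which is precisely why IDL pickling is preferred). The struct and function names are ours.

    #include <string.h>

    /* A hypothetical custom sensor datum. */
    typedef struct custom_datum {
        unsigned long events;
        double mean_latency;
    } custom_datum_t;

    /* Encode the datum into an opaque byte array; returns bytes
     * written, or 0 if the buffer is too small. */
    size_t datum_encode(const custom_datum_t *d,
                        unsigned char *buf, size_t buflen)
    {
        if (buflen < sizeof *d) return 0;
        memcpy(buf, d, sizeof *d);
        return sizeof *d;
    }

    /* Decode the opaque byte array back into a datum. */
    size_t datum_decode(custom_datum_t *d,
                        const unsigned char *buf, size_t buflen)
    {
        if (buflen < sizeof *d) return 0;
        memcpy(d, buf, sizeof *d);
        return sizeof *d;
    }
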
Collecting host-specific resource consumption (such as service demand) requires accessing the host operating system's measurement system. Specifically, each DCE host's operating system should provide the following application specific metrics via a standard interface:
These host OS performance metrics can be reported by the observer as a process global metric. The X/Open DCI [CMG] is a good candidate to provide a standard interface to operating system measures. If the host OS does not support the DCI, then these sensors will require porting to the proprietary OS measurement interface.
This section describes the macros that are used at various probe points to construct sensors. These software probes, implemented as a set of macros, implement each sensor to ensure consistency and decrease implementation time for DCE developers and application writers.
During process initialization, various process-wide sensors, such as rpc_call_thread_utilization and rpc_queue_utilization, are initialized and registered with the observer, using the functions in section 6.2.
The sensors associated with specific server interface operations are not registered until the server registers this interface via the RTL call to rpc_server_register_if(). Probes defining these sensors are located in the execution path of the RPC and store their data into a structure that travels with the RPC call. At the end of the call, after the call response has been sent to the client, all probe data is tallied, and the global sensor data structure is updated. Some sensors are updated directly by the probe that executes during the event being sensed.
When the observer thread executes it checks for entries on its tally queue and updates those sensors. Then it searches the lists of registered sensors and builds a batch of updates to send to the PRI.
This section describes the functions for registering and unregistering sensors and for queueing sensor data.
    /* These function-pointer definitions allow a subsystem
     * designer to provide callbacks to the observer for
     * controlling a subsystem and its sensors.  The functions
     * which are referenced must be re-entrant, as the code
     * updating the sensors and/or subsystems from the
     * middleware/application will be asynchronous from the
     * observer.  Each function defines a pointer to a control
     * block defined by the function writer as an [in] parameter,
     * and a 32-bit DCE format status value as an [out] parameter.
     * These may be passed in as NULL values, but this will prevent
     * any control information from being passed back up to the
     * subsystem/sensor from PMAs. */
    typedef void (*dms_subsys_ctl_fn_t) (void *ctlblock, unsigned32 *st);
    typedef void (*dms_sensor_ctl_fn_t) (void *ctlblock, unsigned32 *st);
    typedef void (*dms_data_pickle_fn_t) (void *data, unsigned32 *st);

    /* This structure contains information about a subsystem which
     * the observer may use to construct its persistent storage --
     * it's patterned from the information needed for an RPC
     * interface, but may be used for any type of subsystem defined
     * by a middleware or application designer.  Note the
     * presumption that all operations have the same properties and
     * are instrumented with the same number of sensors per
     * operation.  This functionality is for batching registrations.
     * Sensor registration may be performed individually.
     *
     * The array of sensor descriptors is defined with dimension 1
     * to accommodate certain compiler limitations.  Nonetheless,
     * the array may be allocated at any size.  For example, one
     * may allocate an appropriately sized subsystem descriptor
     * with the following malloc call:
     *
     *     ssd = (dms_subsys_descriptor_p_t) malloc (
     *         (size_t) (sizeof(struct dms_subsys_descriptor) +
     *                   (n_ops * sizeof(struct dms_sensor_descriptor))
     *         ));
     *
     * The array does not need to be null-terminated.
     */
    typedef struct dms_subsys_descriptor {
        uuid_t                  subsys_uuid;
        void                   *subsys_handle;
        dms_subsys_ctl_fn_t     ctl_fn;
        int                     n_ops;
        int                     n_sensors_per_op;
        char                   *subsysname;
        dms_sensor_descriptor_t sensors[1];
    } dms_subsys_descriptor_t, *dms_subsys_descriptor_p_t;

    /* This structure contains information about individual sensors
     * which the observer needs to construct its persistent storage
     * of sensor data and for registering sensors through the PRI.
     * These structures may be chained into the sensors field of
     * the subsystem descriptor to batch sensor registrations.
     *
     * The following fields may be set to 0 (or NULL) to disable
     * the respective functionality:
     *     ctl_fn
     *     millisec
     *     attrs */
    typedef struct dms_sensor_descriptor {
        uuid_t                   sensor_id;
        void                    *sensor_handle;
        int                      op_num;
        dms_sensor_ctl_fn_t      ctl_fn;
        char                    *sensorname;
        int                      millisec;  /* sampling interval; 0 if event-sampled */
        dms_data_descriptor_p_t  sensor_data;
        void                    *attrs[dms_HIGHEST_ATTRIBUTE];
    } dms_sensor_descriptor_t, *dms_sensor_descriptor_p_t;

    /* The following structure is for describing a sensor's data
     * format. */
    typedef struct dms_data_descriptor {
        size_t                datasize;
        void                 *data;
        dms_data_pickle_fn_t  data_fn;
    } dms_data_descriptor_t, *dms_data_descriptor_p_t;

    /* For registering interfaces or custom subsystems. */
    void dms_obs_register_subsys (
        dms_subsys_descriptor_t *subsys,
        void                   **subsys_handle,
        unsigned32              *st
    );

    /* Opposite of register_subsys. */
    void dms_obs_unregister_subsys(
        void        *subsys_handle,
        unsigned32  *st
    );

    /* For registering sensors. */
    void dms_obs_register_sensor(
        dms_sensor_descriptor_t *sensor,
        void                    *subsys_handle,
        void                   **sensor_handle,
        unsigned32              *st
    );

    /* Opposite of register_sensor. */
    void dms_obs_unregister_sensor(
        void        *sensor_handle,
        unsigned32  *st
    );

    void dms_obs_queue_data(
        void                    *sensor_handle,
        dms_sensor_descriptor_t *sensor,
        unsigned32              *st
    );
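The open-ended sensors[1] array and its sizing arithmetic are worth demonstrating. The sketch below uses simplified stand-in structures (the uuid and callback fields are elided, and the names are ours) purely to show the allocation pattern from the comment above; it is not the specification's API.

    #include <stdlib.h>
    #include <string.h>

    /* Simplified mirrors of the spec structures, for illustration. */
    typedef struct sensor_desc {
        int         op_num;
        const char *sensorname;
    } sensor_desc_t;

    typedef struct subsys_desc {
        int            n_ops;
        int            n_sensors_per_op;
        const char    *subsysname;
        sensor_desc_t  sensors[1];   /* really n_ops * n_sensors_per_op */
    } subsys_desc_t;

    /* Allocate a subsystem descriptor with room for all of its
     * sensor descriptors (assumes at least one sensor overall). */
    subsys_desc_t *alloc_subsys(const char *name, int n_ops,
                                int n_sensors_per_op)
    {
        int total = n_ops * n_sensors_per_op;
        subsys_desc_t *ssd = malloc(sizeof(subsys_desc_t)
                                    + (total - 1) * sizeof(sensor_desc_t));
        if (!ssd) return NULL;
        ssd->n_ops = n_ops;
        ssd->n_sensors_per_op = n_sensors_per_op;
        ssd->subsysname = name;
        memset(ssd->sensors, 0, total * sizeof(sensor_desc_t));
        return ssd;
    }
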
This section describes the probe macros used to create sensors. For each macro, only the function signature (pseudo-prototype) is provided. The macro body has been excluded in the interest of brevity. Note that the sensor data location is passed into each relevant macro.
    /* Utility function: Zero out the values in a timestamp.
     * Pseudo-prototype:
     *     void DMSTIMEZERO(struct dms_timestamp *);
     */

    /**************************************************************
     * For those cases where interval times are deemed more
     * appropriate, the following data and macro definitions may be
     * used.
     */

    /* An interval timer data structure allows preservation of both
     * begin and end timestamps, returning the interval in a new
     * timeval structure. */
    typedef struct dms_itimer {
        struct dms_timestamp intervalstart;
        struct dms_timestamp intervalstop;
        struct timeval       interval;
    } dms_itimer_t;

    /* Start interval timer.
     * Pseudo-prototype:
     *     void DMS_INTERVALSTART(struct dms_itimer);
     */

    /* Stop interval timer, and calculate wallclock time.
     * Pseudo-prototype:
     *     void DMSINTERVALEND(struct dms_itimer);
     */

    /**************************************************************
     * Counter and MIN/MAX Probe Data structures
     */

    /* Counter element. */
    struct dms_probe_cnt {
        long counter;           /* local value maintained by probe */
    };

    /* Minimum/Maximum element. */
    struct dms_probe_mm {
        int            reset;   /* reset command from sensor */
        unsigned long  value;   /* value maintained by probe */
        unsigned long *datum;   /* ptr to comparison datum */
    };

    /* Pass-through probe datatypes: to be used for sensing
     * counters and/or timers (in gettimeofday() format) and/or
     * amorphous data chunks maintained elsewhere. */
    struct dms_probe_vpt {
        unsigned long  localval; /* local value maintained by probe */
        unsigned long *value;    /* pointer to value fetched by probe */
    };
    struct dms_probe_tpt {
        struct timeval  localval; /* local value maintained by probe */
        struct timeval *value;    /* pointer to value fetched by probe */
    };

    /**************************************************************
     * Counter Probe.
     *
     * This probe will add any value to its counter.  The second
     * argument may be a reference to a delta value maintained
     * elsewhere or to a constant.
     */

    /* Pseudo-prototype:
     *     void CNTPINIT(struct dms_probe_cnt A);
     */
    #define CNTPINIT(A) (A).counter = 0;

    /* Pseudo-prototype:
     *     void CNTPROBE(struct dms_probe_cnt A, long valp);
     *
     * This probe may need to be protected by an appropriate mutex,
     * but is often used in conjunction with another probe also
     * needing the same mutex lock.  Therefore, the code
     * instantiating this macro is responsible for explicitly
     * locking and unlocking the appropriate mutex if desired:
     *     RPC_MUTEX_[UN]LOCK((X)->m);
     */

    /* Minimum/Maximum probes.
     *
     * These probes store the minimum [maximum] value of their
     * current value and a value stored elsewhere at the time they
     * execute.
     *
     * They are implemented to allow resetting.  The process for
     * resetting utilizes a "reset flag" in the probe structure.
     * When the controlling thread, usually the observer or a
     * thread under its control, wants to reset the probe, it
     * unconditionally writes a non-zero value to the reset flag.
     * When the probe actually executes it checks this flag and
     * branches based on its value:
     *     If zero, it executes the minimum [maximum] function.
     *     If non-zero, it sets the probe value to the current value
     *     of the datum and then clears the reset flag.  Once the
     *     reset flag is clear, the controlling thread may consider
     *     the data valid again.
     * This procedure is designed to minimize exposure to a case of
     * multiple threads trying to write data to the value location,
     * resulting in lost data. */

    /* Pseudo-prototype:
     *     void MAXPINIT(struct dms_probe_mm A, long *datp);
     * `datp' points to a long which is the comparison value in
     * this and the following probes.
     */

    /* Pseudo-prototype:
     *     void MINPINIT(struct dms_probe_mm A, long *datp);
     */

    /* Pseudo-prototype:
     *     void MAXPRESET(struct dms_probe_mm A);
     */

    /* Pseudo-prototype:
     *     void MINPRESET(struct dms_probe_mm A);
     */

    /* The minimum probe will store the minimum of its present
     * value and the datum it is sensing to its value.  The maximum
     * probe simply reverses the comparison clause of the ternary
     * operation.  The value is an unsigned long; the datum is a
     * pointer to unsigned long. */

    /* Pseudo-prototype:
     *     void MAXPROBE(struct dms_probe_mm A);
     * (Mutex handling is the caller's responsibility, as described
     * under CNTPROBE above.)
     */

    /* Pseudo-prototype:
     *     void MINPROBE(struct dms_probe_mm A);
     * (Mutex handling is the caller's responsibility, as described
     * under CNTPROBE above.)
     */

    /* Pseudo-prototype:
     *     void PASSPROBE(dms_probe_vpt);
     * The function of this probe macro is to snapshot a dynamic
     * value stored outside the context of the DMS to a local value
     * in order to lessen concurrency issues and hopefully provide
     * more stable readings.  Its use is not mandatory.
     *
     * This macro should work fine for either value or time
     * pass-throughs.
     * (Mutex handling is the caller's responsibility, as described
     * under CNTPROBE above.)
     */
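The macro bodies are excluded from the specification, but the described semantics (including the reset-flag protocol) are concrete enough to reconstruct. The following is a plausible sketch only, not the actual DCE implementation; locking is left to the instantiating code, as the specification states.

    #include <limits.h>

    struct dms_probe_cnt { long counter; };
    struct dms_probe_mm {
        int            reset;
        unsigned long  value;
        unsigned long *datum;
    };

    #define CNTPINIT(A)       ((A).counter = 0)
    #define CNTPROBE(A, val)  ((A).counter += (val))

    #define MAXPINIT(A, datp) ((A).value = 0, (A).reset = 0, (A).datum = (datp))
    #define MINPINIT(A, datp) ((A).value = ULONG_MAX, (A).reset = 0, (A).datum = (datp))
    #define MAXPRESET(A)      ((A).reset = 1)
    #define MINPRESET(A)      ((A).reset = 1)

    /* If the reset flag is set, seed the value from the datum and
     * clear the flag; otherwise retain the maximum (minimum). */
    #define MAXPROBE(A)                                              \
        ((A).reset ? (void)((A).value = *(A).datum, (A).reset = 0)   \
                   : (void)((A).value = (*(A).datum > (A).value      \
                                         ? *(A).datum : (A).value)))

    #define MINPROBE(A)                                              \
        ((A).reset ? (void)((A).value = *(A).datum, (A).reset = 0)   \
                   : (void)((A).value = (*(A).datum < (A).value      \
                                         ? *(A).datum : (A).value)))
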
Timestamps play a crucial role in instrumentation but can also impose high overhead. To address this, the specification defines several high-speed timer functions.
    /**************************************************************
     * TIME functions.
     *
     * The DCE runtime maintains a correlation between the value
     * returned by dms_gettime() and that returned from
     * gettimeofday().  The clocks should be presumed to be stable
     * and accurate and to remain exactly correlated over the
     * periodic re-correlation interval.  The re-correlation
     * interval should be a fairly small fraction of the
     * dms_gettime() wrap interval.  For instance, a 200 MHz
     * machine for which the time is maintained as a 32-bit value
     * of system clock ticks will wrap in about 20 seconds.
     *
     * We recommend a re-correlation interval of 5 seconds.  This
     * should be a small enough fraction of the wrap time, yet
     * infrequent enough to avoid unnecessarily increasing the
     * gettimeofday() overhead.
     */

    #include <limits.h>

    /* The following should be available from <limits.h>. */
    #ifndef ULONG_MAX
    #  define ULONG_MAX 0xFFFFFFFFUL
    #endif
    #ifndef UINT_MAX
    #  define UINT_MAX 0xFFFFFFFFU
    #endif
    #ifndef INT_MAX
    #  define INT_MAX 0x7FFFFFFF
    #endif

    #define USEC_PER_SEC 1000000

    typedef unsigned long dms_time_offset_t;

    typedef struct dms_timestamp {
        struct timeval    base_wallclock;
        dms_time_offset_t base_ticks;
        dms_time_offset_t current_ticks;
    } dms_timestamp_t;

    /**************************************************************
     * DMS_TIMESTAMP() retrieves the information necessary for
     * computing an accurate timestamp (later) without calling
     * gettimeofday() inline.  It is structured to preserve the
     * information which will be required for later, out-of-line
     * calculation of time intervals.  This macro must be passed a
     * valid pointer to struct dms_timestamp.
     * Pseudo-prototype:
     *     void DMS_TIMESTAMP(struct dms_timestamp *);
     */

    /**************************************************************
     * DMS_TICKS_TO_USEC() converts system-clock ticks to
     * microseconds.  This macro must be passed a valid
     * dms_time_offset_t.  It is not normally invoked directly by
     * user code.
     * Pseudo-prototype:
     *     unsigned long DMS_TICKS_TO_USEC(dms_time_offset_t);
     */

    /**************************************************************
     * DMS_TS_TO_TV() converts the time stored in a dms_timestamp
     * structure to the format of timeval.  Both input pointer
     * parameters must be valid.  It is not normally invoked by
     * user code.
     * Pseudo-prototype:
     *     void DMS_TS_TO_TV(struct dms_timestamp *, struct timeval *);
     */

    /**************************************************************
     * DMS_SUB_TIME() returns the difference between two timestamps
     * into a timeval structure.
     * Pseudo-prototype:
     *     void DMS_SUB_TIME(
     *         struct dms_timestamp *,
     *         struct dms_timestamp *,
     *         struct timeval *);
     * If the timestamp for the end time is earlier than the
     * timestamp for the begin time, this macro will compute a
     * negative interval which may cause problems.  Therefore, the
     * caller must check for the error condition (negative seconds
     * field -- the microseconds field is unsigned).
     */

    /* DMS_GETTIMEOFDAY() fills in a struct timeval with the "real,
     * current" wallclock time without calling gettimeofday().
     * Pseudo-prototype:
     *     void DMS_GETTIMEOFDAY(struct timeval *);
     * This macro requires a valid pointer-to-struct-timeval.
     */

    /**************************************************************
     * dms_gettime_int() is a fast, implementation-specific
     * function which returns an unsigned long with a
     * machine-dependent resolution.  Each implementor must provide
     * this system-specific function and the conversion factor
     * specifying the relationship of this number to a standard
     * time unit such as seconds or microseconds.
     */
    extern dms_time_offset_t dms_gettime_int(void);
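The correlation scheme can be illustrated with a sketch of how a DMS_TS_TO_TV-style conversion might reconstruct wallclock time from a timestamp: gettimeofday() is called only at re-correlation time, and the cheap tick counter supplies the offset in between. The lowercase function name and the TICKS_PER_USEC constant are our stand-ins for the machine-dependent dms_gettime_int() conversion factor; this is not the specification's implementation.

    #include <sys/time.h>

    #define TICKS_PER_USEC 1UL   /* stand-in conversion factor */

    typedef struct dms_timestamp {
        struct timeval base_wallclock;  /* gettimeofday() at re-correlation */
        unsigned long  base_ticks;      /* tick counter at re-correlation */
        unsigned long  current_ticks;   /* tick counter at the timestamp */
    } dms_timestamp_t;

    /* Reconstruct a wallclock timeval: base wallclock plus the
     * tick delta converted to microseconds, with carry into the
     * seconds field. */
    void dms_ts_to_tv(const dms_timestamp_t *ts, struct timeval *tv)
    {
        unsigned long usec = (ts->current_ticks - ts->base_ticks)
                             / TICKS_PER_USEC;
        tv->tv_sec  = ts->base_wallclock.tv_sec  + usec / 1000000;
        tv->tv_usec = ts->base_wallclock.tv_usec + usec % 1000000;
        if (tv->tv_usec >= 1000000) {
            tv->tv_sec += 1;
            tv->tv_usec -= 1000000;
        }
    }
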
To achieve pervasiveness in a heterogeneous environment, the measurement system must support standardized interfaces that support access and control of both server and client sensors. This section provides an overview of the standard application programming interfaces (APIs), data structures and related capabilities. The official IDL files are located in the appendices, and they supersede the discussion in this section.
The standard interfaces of this spec provide the framework for inter-node and intra-node DCE performance instrumentation control and data transfer. Four APIs provide for the relationships diagramed in Figure 4 for each node in a DCE cell.
These four interfaces are the:
There are two categories of APIs: APIs at the DCE process level (the PMI and PRI), and APIs at the node (machine) level (the NPMI and NPRI). The NPMI and the NPRI are used by the PMA developer. The PMI and PRI are used by the DCE vendor and the NPCS developer.
The NPMI provides the interface between the NPCS and any Performance Management Applications (PMAs) that wish to access DCE performance instrumentation. The PMI provides the interface between the NPCS and DCE client and server processes; these processes contain the performance instrumentation (the sensors). Basically, PMAs use the NPMI to discover sensors and to request and receive data from them. The NPCS uses the PMI to gain knowledge of DCE client and server processes, control the configuration of sensors, and receive data from sensors.
The NPMI and NPRI interfaces are RPC interfaces to leverage security and naming features of DCE. The PMI and PRI are node-local and can use any relevant IPC mechanism, including RPC, implemented in the encapsulated library described in section 7.2.
[Figure not available in ASCII version of this document.]
Figure 4. NPRI, NPMI, NPCS, PMI, PRI and sensor relationships.
Also, the NPCS is shown as an independent mechanism. Whether it is an independent process or part of another process is implementation-specific.
The NPMI, PMI, NPCS, and sensors exist and operate to provide PMAs with DCE performance instrumentation in the manner described below. The PRI and NPRI provide the communication channel to efficiently return sensor data to the PMA using a push protocol.
During steady-state runtime, sensors collect specific metrics within the DCE environment whenever a thread executes their set of probes. Probes are the (inline) code sequences that capture the data needed to produce a metric, e.g., timestamps for a response time metric. This relationship is illustrated in Figure 5. During the execution of a distributed application, the flow of control passes from the client code into the client stub, into the DCE runtime library (RTL1), possibly across a network, into the DCE runtime library (RTL2), into the server stub and into the server code. The thread of execution returns in the reverse manner. As it passes through RTL2 it encounters two probes, a begin-response probe and an end-response probe. After it passes through the end-response probe, the appropriate sensor is located and updated.
[Figure not available in ASCII version of this document.]
Figure 5. Flow of control, probes and sensors shown for a response time sensor in the DCE run time library. Probes are not restricted to the RTL and can also occur in client or server source or stubs.
Sensors supply the component values necessary to calculate intervalized metric values; probes capture these values and store the sensor data in a process-accessible structure. The component values provided by sensors are typically in the form of cumulative totals.
A sensor with the purpose of providing a response time metric (ignoring location) would make available a total number of responses (R) and a total of the time spans to produce those responses (RT). These values could be taken from the sensor at the beginning and end of a time interval and used to compute the mean response time for that interval.
The observer (also known as the address space helper thread) periodically captures the metric component values for each sensor that has been configured. The capture period is specific to each sensor. The observer then communicates the captured metric component values and a timestamp to the NPCS through the PRI interface.
The NPCS provides a consistent node-level view of all DCE performance instrumentation on a given node. It maintains a registry of sensors and observers provided to it through the PRI interface. It responds to queries against that registry made through the NPMI interface. It maintains a (single) copy of the latest captured metric component values for all registered sensors, communicated to it through the PRI interface. It maintains a registry of the collection of sensors that each PMA has configured through the NPMI interface. Based on the configurations requested by all PMAs, NPCS configures individual sensors through the PMI interface. It communicates the component metric values of any sensors that have been active during the requested (PMA-specific) interval through the NPRI interface.
The connection between the NPCS and the instrumented DCE processes is a critical one; it is very high volume, so its performance is a major factor in minimizing the impact of instrumentation on the overall performance of a node. Because of this, the connection is specified as two interfaces whose implementation is deliberately left vendor-specific; the goal is to allow full use of any available system-specific mechanisms to minimize the overall cost of transfers. The central focus is on the actual reporting of collected data, since this will be the greatest volume and the most likely to occur during normal operation.
[Figure not available in ASCII version of this document.]
Figure 6. The architecture of the Encapsulated Library.
The model is illustrated in Figure 6. It provides two libraries which support the PRI and PMI interfaces. Servers of PRI and clients of PMI link with npcs_lib. It is worth emphasizing that there is only one NPCS server of PRI per node. This is denoted in the diagram as pri2, indicating the subset of PRI[B] functions specified through the PMI[A]. The point is that npcs_lib defines the functions (entry point symbols) named in the PMI specification, and observer_lib defines the functions named in the PRI specification. Servers of PMI and clients of PRI link with observer_lib. This is closely analogous to DCE RPC client and server stubs. The libraries may create threads needed to support asynchronous communication. The pmi_talker and pri_talker threads are shown in Figure 6, and are named talker to contrast with RPC listener threads.

The middle region labeled IPC represents an intra-node IPC mechanism whose choice is unspecified as long as the PRI and PMI interfaces provide the connecting mechanisms described in this API section. This flexibility permits many implementation approaches without requiring ANY modification to the NPCS or DCE processes. The interface is made independent of the underlying IPC mechanism by the use of procedures provided by the recipient (server) of a request, which are invoked whenever a (client) request is made. This is analogous to an RPC, but to allow for a more general implementation the procedure names are passed to the libraries as procedure-valued parameters to the initialization calls: dms_pmi_el_initialize in section 10.2, and dms_pri_el_initialize in section 11.2.
The subset of the PRI functions passed to the PMI is denoted as pri2 in Figure 6. These functions perform local initialization, and then take whatever steps are required to open a communication path for the processes to communicate. The exact nature of these steps depends on the particular implementation of the PMI/PRI interface. Possibilities include, but are not limited to:
    dciInitialize()/dciRegister() (see the discussion regarding the DCI in [CMG]).
The dms_pmi_el_initialize() and dms_pri_el_initialize() functions are used to initialize the libraries and underlying IPC mechanisms. The dms_pmi_el_free_outputs() and dms_pri_el_free_outputs() functions are used to free memory resources, and encapsulate RPC free routines if necessary.
This section summarizes the important state maintained or passed via the standard interfaces.
Sensor data components are described by sensor_data of type dms_datum_t. These types allow a wide range of sensor data representations, including opaque data structures for extensibility. Sensor data is reported using the sensor_report_list of type dms_observations_data_t.
typedef struct dms_opaque {
    unsigned long size;
    [size_is(size)] byte bytes[];
} dms_opaque_t;

typedef enum {
    dms_LONG,
    dms_HYPER,
    dms_FLOAT,
    dms_DOUBLE,
    dms_BOOLEAN,
    dms_CHAR,
    dms_STRING,
    dms_BYTE,
    dms_OPAQUE,
    dms_DATA_STATUS
} dms_datum_type_t;

typedef union dms_datum switch (dms_datum_type_t type) {
    case dms_LONG:        long            long_v;
    case dms_HYPER:       hyper           hyper_v;
    case dms_FLOAT:       float           float_v;
    case dms_DOUBLE:      double          double_v;
    case dms_BOOLEAN:     boolean         boolean_v;
    case dms_CHAR:        char            char_v;
    case dms_STRING:      dms_string_t    *string_p;
    case dms_BYTE:        byte            byte_v;
    case dms_OPAQUE:      dms_opaque_t    *opaque_p;
    case dms_DATA_STATUS: error_status_t  status_v;
} dms_datum_t;

typedef struct dms_sensor_data {
    dms_sensor_id_t sensor_id;
    unsigned long count;
    [size_is(count)] dms_datum_t sensor_data[];
} dms_sensor_data_t;

typedef struct dms_timevalue {
    unsigned long sec;
    unsigned long usec;
} dms_timevalue_t;

typedef struct dms_observation_data {
    dms_timevalue_t end_timestamp;
    unsigned long count;
    [size_is(count)] dms_sensor_data_t* sensor[];
} dms_observation_data_t;

typedef struct dms_observations_data {
    unsigned long count;
    [size_is(count)] dms_observation_data_t* observation[];
} dms_observations_data_t;
Sensors are registered using the sensor_register_list of type dms_instance_dir_t. Sensors in the sensor registry are named using the registry_list of type dms_instance_dir_t.
/* This interface defines the data structures that represent
 * the dms namespace.  There are two forms of names that can be
 * represented, a simple string-only form, and a fully
 * decorated form.
 */

typedef struct dms_name_node* dms_name_node_p_t;

typedef struct dms_name_nodes {
    unsigned long count;
    [size_is(count)] dms_name_node_p_t names[];
} dms_name_nodes_t;

typedef struct dms_name_node {
    dms_string_t* name;            /* "*" == wildcard */
    dms_name_nodes_t children;
} dms_name_node_t;

typedef struct dms_attr {
    dms_string_t* attr_name;
    dms_datum_t attr_value;
} dms_attr_t;

typedef struct dms_attrs {
    unsigned long count;
    [size_is(count)] dms_attr_t* attrs[];
} dms_attrs_t;

typedef struct dms_sensor {
    dms_sensor_id_t sensor_id;
    dms_attrs_t* attributes;
    unsigned short count;
    [size_is(count)] small metric_id[];
} dms_sensor_t;

typedef struct dms_instance_leaf {
    unsigned long count;
    [size_is(count)] dms_sensor_t* sensors[];
} dms_instance_leaf_t;

typedef struct dms_instance_node* dms_instance_node_p_t;

typedef struct dms_instance_dir {
    unsigned long count;
    [size_is(count)] dms_instance_node_p_t children[];
} dms_instance_dir_t;

typedef enum {
    dms_DIRECTORY,
    dms_LEAF,
    dms_NAME_STATUS
} dms_select_t;

typedef union dms_instance_data switch (dms_select_t data_type) {
    case dms_DIRECTORY:   dms_instance_dir_t*  directory;
    case dms_LEAF:        dms_instance_leaf_t* leaf;
    case dms_NAME_STATUS: error_status_t       status;
} dms_instance_data_t;

typedef struct dms_instance_node {
    dms_string_t* name;
    dms_datum_t* alternate_name;
    dms_instance_data_t data;
} dms_instance_node_t;
The naming data structure is illustrated in Figure 7.
[Figure not available in ASCII version of this document.]
Figure 7. Sensor naming data structure. This example uses the parameters defined in the function dms_npmi_get_registry(), and shows the structures supporting the names root/dce/... and root/dfs/..., where root refers to the local network node where the NPCS resides. The depth parameter limits searches of subtrees.
Sensor configuration data is returned in the sensor_config_list of type dms_configs_t.
const unsigned long dms_NO_METRIC_COLLECTION = 0;
const unsigned long dms_THRESHOLD_CHECKING   = 0x00000001;
const unsigned long dms_COLLECT_MIN_MAX      = 0x00000002;
const unsigned long dms_COLLECT_TOTAL        = 0x00000004;
const unsigned long dms_COLLECT_COUNT        = 0x00000008;
const unsigned long dms_COLLECT_SUM_SQUARES  = 0x00000010;
const unsigned long dms_COLLECT_SUM_CUBES    = 0x00000020;
const unsigned long dms_COLLECT_SUM_X_TO_4TH = 0x00000040;
const unsigned long dms_CUSTOM_INFO_SET      = 0x80000000;

typedef unsigned long dms_info_set_t;

typedef struct dms_threshold_values {
    dms_datum_t lower_value;
    dms_datum_t upper_value;
} dms_threshold_values_t;

typedef union dms_threshold switch (boolean have_values) {
    case TRUE:  dms_threshold_values_t values;
    case FALSE: ;
} dms_threshold_t;

typedef struct dms_config {
    dms_sensor_id_t sensor_id;
    dms_timevalue_t reporting_interval;    /* 0 == infinite */
    dms_info_set_t info_set;
    dms_threshold_t* threshold;
    error_status_t status;
} dms_config_t;

typedef struct dms_configs {
    unsigned long count;
    [size_is(count)] dms_config_t config[];
} dms_configs_t;
Several handles are defined to bind elements, speed up searching and decrease communication costs.
/* This interface defines the data structures used to represent
 * relationships between entities (sensors, processes, nodes)
 * within DMS.  Some are transparent, meaning that a user of
 * that structure can manipulate its contents.  Some are
 * opaque, meaning that only the creating entity can manipulate
 * its contents.
 */

/* TRANSPARENT BINDING TYPES */
typedef [string] unsigned char dms_string_t[];
typedef unsigned long dms_protect_level_t;    /* see rpc.h */
typedef [string] unsigned char dms_string_binding_t[];

/* OPAQUE BINDING TYPES */
typedef unsigned long dms_pma_index_t;
typedef unsigned long dms_npcs_index_t;
typedef unsigned long dms_process_index_t;
typedef unsigned long dms_sensor_id_t;

typedef struct dms_sensor_ids {
    unsigned long count;
    [size_is(count)] dms_sensor_id_t ids[];
} dms_sensor_ids_t;
The sensor registry contains descriptive information about the sensors located on a particular node. This registry is maintained by the NPCS. An entry contains:
There is no explicit interface for obtaining modifications to the sensor registry. The PMA must periodically request the sensors of interest and compare the results with previous requests.
A configuration registry contains configuration state about the sensors located on a particular node. This registry is maintained by the NPCS. An entry contains:
This may be combined with the sensor registry within the NPCS.
There is no explicit interface for obtaining modifications to the sensor configuration registry.
There are several sensor and metric attributes. These include:
typedef enum {
    dms_METRIC_ID,
    dms_METRIC_DATUM_TYPE,
    dms_DATA_LENGTH,
    dms_METRIC_TYPE,
    dms_METRIC_NAME_INDEX,
    dms_HELP_TEXT_INDEX,
    dms_INFO_SET_SUPPORT,
    dms_SENSOR_UNITS,
    dms_LAST_ATTRIBUTE    /* this should remain last */
} dms_attribute_t;
Runtime behavior for sensor value subcomponent attributes is described below:
OSF must maintain a global sensor registry similar to the IETF SNMP registry [Rose], allowing vendors to provide globally known metrics and sensors but preserving local (vendor) autonomy and number assignment. This registry should be divided into domains analogous to the sensor naming described in section 5.1, to ease administration and interpretation of the sensors.
These official sensors are registered within the CDS when the DCE cell is brought up, and updates are registered as new versions of DCE are started within the cell.
A user branch must be available in the global sensor registry so that application developers may place well-known metrics and sensors there. An experimental branch should be supported, to be used as each cell sees fit.
The specification proposes that this registry have the following tree structure (note that each entry level listed below represents a subdirectory; object identifiers are shown in parentheses following names):
The above tree ignores the other branches already in use within the Internet SNMP community. We have added a branch for OSF with object identifier 5 (this value requires verification with the IETF). Under the OSF branch are several subtrees for the various DCE services. The user branch is unique to each customer's cell, and contains the results of custom sensors registered by user applications as described in section 7.4. The experimental subtree is for temporary use within a cell. The vendor subtree allows vendors the autonomy to assign and manage their custom sensors without requiring intervention from OSF. These vendor sensors must be registered within the cell in the same way as user custom sensors.
The OSF needs to work with the Internet Assigned Numbers Authority to register sensors and attributes.
Custom sensor attributes must be registered and stored in the CDS so that they are available to all PMAs in the cell. This specification recommends that they be stored in the CDS with the form:
    /.:/dms/sensors/domain

where domain is one of dce, dfs, security, cds, user, experimental or vendor.
It is a requirement to provide secure network transmission of performance data if mandated by local administrative policies. This allows protection against unauthorized users obtaining cleartext names of server processes, interfaces, operations or binding handles; falsifying client or server identities; or modifying transported data.
What are the implications for the four interfaces defined here? The two control interfaces, NPMI and PMI, must be protected by access control to ensure that configuration data is modified only by those with proper authorization. The two data transport interfaces, PRI and NPRI, must be free from eavesdropping.
This specification assumes that intra-node communication via the PMI and PRI is secured by the host OS or the communication mechanism used. Consequently, it is not addressed further here.
To ensure that clients and servers are authentic, this specification recommends creating a new DCE security group, perf_admin, and enrolling each host in this group. Principals for this group must be added to the security registry, and both the PMA and NPCS must log in and execute as one of the principals (refreshing credentials programmatically as necessary). The host key is already available on the node and is automatically changed every 30 minutes. The benefit of making perf_admin a group is that the performance principal on each host (node) can change passwords independently of other hosts (nodes).
The NPCS must be able to execute as the owner of the performance principal's keytab file. Since the NPCS must be able to assume the identity of the host, it could run as root; however, this specification recommends instead that the NPCS run under a separate identity with sufficient capabilities to utilize DCE security services.
This does not solve the problem of users who can become root on a local host and thereby become a member of the perf_admin group. Implementations of the measurement system should not preclude extension to support several performance administration groups to address this security hole, when needed in hostile environments.
Authorization must be handled through the use of a reference monitor hard-coded into the manager routines of the NPMI and NPRI. The security policy enforced via this reference monitor is that clients with the perf_admin principal identity are authorized to invoke an NPMI or NPRI function. Client requests with any other principal identity should be rejected.
This reference monitor is universally enforced across all functions of the NPMI and NPRI. (It is possible to create an ACL manager that provides a much richer set of authorization capabilities, but that is beyond the scope of this version of the specification.) The reference monitor does not require support from IDL parameters, since the reference monitor code obtains security information directly from the local RTL prior to processing the NPMI or NPRI function. (Note that the X/Open DCI uses a security key as a parameter. The PMI and PRI routines do not explicitly refer to this parameter, since it is an implementation detail encapsulated by the PMI and PRI, and should be transparent to the calling process.)
Authenticated RPCs are used to address eavesdropping. Parameters in string form can appear for both NPMI and NPRI functions. The RPC data protection level is specified by the PMA when it first registers with the NPCS. Because all NPCSs may not support the same maximum protection level (for example, some data encryption algorithms may not be available world-wide due to international export laws), the NPCS responds to the PMA request with the actual protection level that it can support. The PMA may unregister from this NPCS if the actual protection level is insufficient. The actual protection level can be set during sensor registration by specifying a minimum data protection level. This allows application developers and system managers to jointly specify the data protection level on an application basis if necessary. The policy enforced by the NPCS is the maximum of the PMA request and the sensor-specified minimum. The NPCS may also refuse service to a PMA that does not meet its minimum security requirements.
The use of a keytab file is also required (to hold the encryption key) for authenticated RPC, and implies that the NPCS executes with a dedicated user identifier to protect the keytab from unauthorized users. Although not recommended, unauthenticated RPC requests can be optionally supported by an NPCS on an implementation-dependent basis (this requires a configuration or command line parameter to enable).
The security policy outlined here does not prevent a PMA from accessing another PMA's NPRI interface. Since this is an interface for trusted users (i.e., perf_admin principals), it is expected that PMA developers will not invoke another PMA's NPRI.
PMAs that support cross-cell monitoring must use cross-cell authentication mechanisms prior to contacting an NPCS in a separate cell.
Errors are described for each of the four APIs. Error conditions are returned in the error_status_t function return parameter. A general engineering philosophy is that error conditions should not be used to convey non-error-related state. This will assure efficient use of exception handling code for future implementations that decide to use C++. These function errors are described in detail in appendix I.
The following naming conventions are used in this specification:
Function names carry the interface name as a prefix, e.g., dms_pmi_, as in dms_pmi_get_sensor_data(). Pointer names end with the suffix _p. Type names should end with the suffix _t. String names will end with the suffix _str.
The next four sections describe the standard APIs:
Each of the functions is described with the following format:
The NPMI and NPRI interfaces are used by the PMAs to access and control sensors on any node in a DCE cell. The NPMI is supplied by the NPCS on each node. The NPRI is an optional, although recommended, interface provided by the PMA. The NPMI is described in this section, and the NPRI in section 9.
The NPMI interface provides each PMA with its own view of the sensors on a node in the DCE environment. Each PMA communicates with the NPCS to arrange delivery of sensor data via the NPMI or NPRI interfaces. The NPMI interface requires that PMAs explicitly discover and enable (configure) sensors, and then receive changed sensor data as it is pushed to them by the NPCS via the PMA's NPRI server interface. Specifically, the NPMI supports registering and unregistering PMAs interested in local sensors, getting and setting sensor configuration, and getting sensor data in a polled manner.
The NPMI is an RPC interface that is exported by the NPCS. Since this interface is accessed over the network, a non-RPC implementation is not recommended for security reasons. The NPMI functions pass parameters that local system administration policies may require protection from reading or modifying over a network. Therefore, the use of RPC data protection is supported for all NPMI functions (except for the initial act of registering a PMA).
Figure 8 illustrates the relationship between the physical sensors in an instrumented process and the PMA's logical view of sensors that is supported through the NPMI. Sensors are located in distinct processes and communicate with the NPCS via the observer. Each PMA, however, is only aware of the NPCS and sensors; the observer is transparent to the PMA.
[Figure not available in ASCII version of this document.]
Figure 8. PMA versus NPCS view of sensors. A PMA's view of a sensor is limited to its own configuration request. The NPCS maintains the configuration state of all sensors on its node for all interested PMAs. In this example there are four sensors: s1, s2, s3, s4, and three PMAs: PMA1, PMA2, PMA3. For sensor s1, PMA1 and PMA3 have it enabled, while PMA2 does not. Similarly, for sensor s2, PMA1 does not have it enabled, while PMA2 and PMA3 do. The observer in each process (obs1 and obs2) controls requests and data between the NPCS and the sensors.
The complete IDL file is provided in appendix E.
This interface is provided by the NPCS to allow PMAs to establish a connection. A PMA uses this interface to register its existence and the binding handle of its NPRI, and to establish data protection levels.
Any PMA that requests a greater protection level than specified by the minimum_protection_level will have to decide whether to continue. The protection level will be applied to parameters of all function calls and to ALL sensor data transported from this node to the PMA via the NPRI. This may cause excessive overhead, so it should be used with caution.
If a new instrumented process begins execution and requires a higher protection level than that in place when a PMA previously registered with the NPCS, then the NPCS must not make any of this sensor data available to the PMA until the PMA re-registers with the proper protection level.
error_status_t dms_npmi_register_pma (
    [in ]    handle_t              handle,
    [in,ptr] dms_string_binding_t* npri_binding,      /* null == client-only PMA */
    [in ]    dms_npcs_index_t      npcs_index,
    [in ]    dms_protect_level_t   requested_protect,
    [out]    dms_pma_index_t*      pma_index,
    [out]    dms_protect_level_t*  granted_protect
);
handle -- RPC binding handle of the NPMI.

npri_binding -- Pointer to a string binding handle of the PMA's NPRI interface. If this is NULL, then the PMA does not support an NPRI.

npcs_index -- Unique identifier assigned by the PMA that provides a shorthand for future NPCS-to-PMA communication.

requested_protect -- The PMA's requested level of RPC data protection for use in subsequent NPMI calls, or when data is returned via NPRI functions.

pma_index -- Unique identifier assigned by the NPCS that provides a shorthand for future PMA-to-NPCS communication.

granted_protect -- The NPCS's granted level of RPC data protection, used by the NPCS when returning data via NPRI functions and for subsequent NPMI functions. It might not be the same as that requested by the PMA; it is established by the system manager at NPCS execution time.

dms_status -- Status of call; non-zero if the call encountered an error.
REGISTER_FAILED -- NPCS unable to complete registration.

ALREADY_REGISTERED -- PMA previously registered.

PROTECT_LEVEL_NOT_SUPPORTED -- Requested data protection level not supported; granted_protect will be used.

ILLEGAL_BINDING -- Binding handle illegal.
dced will deliver. The UUID of the NPMI is specified in section 8.1.

pma_index for each of these registrations.
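The registration handshake above can be sketched in C. This is a minimal mock, not the DCE implementation: the DMS typedefs and protection-level constants below are invented stand-ins for the IDL types, and the mock NPCS simply grants the level fixed by the system manager at NPCS execution time, leaving the PMA to decide whether to continue.

```c
#include <assert.h>

/* Hypothetical stand-ins for the DMS types (the real definitions live
 * in the NPMI IDL, appendix E). */
typedef unsigned long dms_protect_level_t;
typedef unsigned long dms_pma_index_t;
typedef unsigned long error_status_t;

#define DMS_PROTECT_NONE    0UL   /* illustrative levels, lowest first */
#define DMS_PROTECT_INTEG   2UL
#define DMS_PROTECT_PRIVACY 3UL
#define STATUS_OK           0UL

/* Mock of dms_npmi_register_pma(): the granted level is fixed by the
 * system manager at NPCS execution time, regardless of the request. */
static error_status_t mock_register_pma(dms_protect_level_t requested,
                                        dms_pma_index_t *pma_index,
                                        dms_protect_level_t *granted)
{
    (void)requested;              /* granted level does not track it */
    *granted = DMS_PROTECT_INTEG; /* system-manager-configured level */
    *pma_index = 1;               /* NPCS-assigned shorthand */
    return STATUS_OK;
}

/* A PMA granted less protection than it requested must decide whether
 * to continue; this policy continues only when granted >= requested. */
int pma_accepts(dms_protect_level_t requested, dms_protect_level_t granted)
{
    return granted >= requested;
}
```

A stricter PMA could instead unregister and retry with a lower request, accepting that all NPRI data traffic then carries the lower protection level.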
This interface is provided by the NPCS to supply all or part of the node-level sensor registry. PMAs use this interface to discover the available sensors and their current configuration state.
The NPCS does not support a synchronous event function to notify a PMA of changes to the sensor registry, namely the addition and deletion of individual sensors. The PMA must periodically invoke this function with the sensors of interest in the request_list and compare the results with previous calls to determine what changes have occurred within the sensor registry. This function should be used sparingly for this purpose, to minimize network resource utilization.
The registry structure is defined by the data structures in section 7.3.2, and is illustrated in Figure 7.
There is no support for wildcards using regular expressions. Rather, the tree of interest is provided in the request_list with a depth_limit, and all subtrees matching these constraints are returned. This bulk-input parameter allows multiple sensors to be requested in a single call to this function. However, a more generalized query processor is delegated to the PMA, which must then translate requests to this function.
If the requested depth_limit is greater than the implicit depth_limit of the request_list, then this function returns the sensors at a depth equal to that of the request_list. Otherwise, only the requested depth_limit of the registry is returned.
Requests can only be made with string sensor instance names.
error_status_t dms_npmi_get_registry (
    [in ]    handle_t             handle,
    [in ]    dms_pma_index_t      pma_index,
    [in,ptr] dms_name_nodes_t*    request_list,   /* null == entire registry */
    [in ]    long                 depth_limit,    /* 0 == infinity */
    [out]    dms_instance_dir_t** registry_list
);
handle -- RPC binding handle of the NPMI.

pma_index -- Unique identifier assigned by the NPCS that provides a shorthand for NPCS-to-PMA communication. This also provides a test to determine whether the NPCS has terminated and restarted since the last dms_npmi_get_registry() call, because a new NPCS won't know this value.

request_list -- A pointer to a tree of sensor names that the PMA is interested in. This parameter uses a tree structure that contains one or more subtrees. If the pointer is NULL, then the entire registry is returned.
depth_limit -- Limits the search depth, and consequently the number of subtrees, returned by the NPCS. This value is the number of nodes starting with the root node of the NPCS sensor registry. If this value is 0, then all subtrees are returned.

registry_list -- Registry data for one or more sensors that satisfy the request_list. The sensor identifiers contained within this structure are used by the PMA for subsequent configuration actions, and to identify sensor data reported via the NPRI.
dms_status -- Status of call; non-zero if the call encountered an error.

UNKNOWN_PMA -- PMA not registered.

UNKNOWN_SENSOR -- One or more sensors included in sensor_list were not registered.
dms_npmi_register_pma() call.

/.:/sec), then the PMA must translate this to a legal sensor name before contacting the NPCS.
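The depth_limit clamping rule described above fits in a few lines of C. This is a hypothetical helper, assuming 0 is the "infinity" sentinel from the IDL comment and that the implicit depth of the deepest subtree in the request_list has already been computed:

```c
#include <assert.h>

/* Effective search depth for dms_npmi_get_registry():
 *   - depth_limit == 0 means "infinite", so the request_list depth wins;
 *   - a depth_limit deeper than the request_list is clamped to it;
 *   - otherwise the requested depth_limit is honored as-is. */
unsigned effective_depth(unsigned depth_limit, unsigned request_depth)
{
    if (depth_limit == 0 || depth_limit > request_depth)
        return request_depth;
    return depth_limit;
}
```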
This interface is provided by the NPCS to allow PMAs to configure which sensor metric components to collect, and the reporting frequency. This view of the sensor is unique to each requesting PMA (pma_index), and conflicts, if any, are arbitrated by the NPCS. Requested configuration changes are set on a sensor-by-sensor basis.
A list of sensor_configs is used to request configuration, and to return configuration status. Only sensors that could not be set to the requested configuration state are returned, along with their current configuration state. If a sensor cannot be set to one or more of the requested parameters, then no configuration changes are made to that sensor. No sensor data will be reported for sensors that were not successfully configured. The PMA must re-invoke this function with acceptable configuration parameters before data will be returned for a sensor.

The PMA also uses this function to disable sensors it is no longer interested in collecting data on. It does this by providing a list of sensors in sensor_configs with the info_set value set to 0.
There is no explicit support in this specification for getting sensor configuration data since this function can satisfy this need.
error_status_t dms_npmi_set_sensor_config (
    [in ]     handle_t         handle,
    [in ]     dms_pma_index_t  pma_index,
    [in,out]  dms_configs_t**  sensor_configs
);
handle -- RPC binding handle of the NPMI.

pma_index -- Unique identifier assigned by the NPCS that provides a shorthand for NPCS-to-PMA communication.

sensor_configs (in) -- A list of sensor identifiers and the configuration state that the PMA is interested in.

sensor_configs (out) -- A list of sensor identifiers, the status of the configuration request, and the configuration state returned by the NPCS. Only sensors that could not be configured as requested are returned in this structure.
dms_status -- Status of call; non-zero if the call encountered an error.

UNKNOWN_PMA -- PMA not registered.

UNKNOWN_SENSOR -- One or more sensors included in sensor_list were not registered.

NO_SENSOR_REQUESTED -- sensor_configs contains no sensors.

FUNCTION_FAILED -- The set operation failed due to one or more specified parameters conflicting with a previous request. No sensor configuration modifications were made.

UNKNOWN_INFO_SET -- Information set level out of range.

UNKNOWN_THRESHOLD_LEVEL -- Threshold level out of range.
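The failure-reporting convention above (only unconfigurable sensors come back, and a failing sensor is left entirely unmodified) can be illustrated with a miniature mock. The sensor_config struct and the single UNKNOWN_INFO_SET check are simplifications for illustration, not the real dms_configs_t:

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for one entry of dms_configs_t. */
typedef struct {
    unsigned sensor_id;
    unsigned info_set;   /* 0 == disable this sensor */
    int      status;     /* filled in for failing sensors only */
} sensor_config;

/* Mock NPCS set-config: rejects any info_set above the level it
 * supports, leaves such sensors unmodified, and compacts the list down
 * to just the failures. Returns the number of failing sensors
 * (0 == every sensor configured as requested). */
size_t mock_set_sensor_config(sensor_config *cfgs, size_t n,
                              unsigned max_info_set)
{
    size_t failed = 0;
    for (size_t i = 0; i < n; i++) {
        if (cfgs[i].info_set > max_info_set) {
            cfgs[failed] = cfgs[i];
            cfgs[failed].status = -1; /* stand-in for UNKNOWN_INFO_SET */
            failed++;
        }
    }
    return failed;
}
```

A PMA can then iterate only the returned failures, fix their parameters, and re-invoke the call before any data flows for those sensors.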
This interface is provided by NPCS to permit a poll of metric data without waiting for the next reporting interval. The sensor data is returned as an [out] parameter of the RPC.
Users of this interface include SNMP agents, PMAs with a monitoring policy of an occasional one-shot request, client-only PMAs, and special monitors for benchmarking or load-balancing that capture state before and after a workload's execution.
To access the current content of a sensor, set the bypass_cache flag to TRUE. This forces the NPCS to collect the requested sensor data by invoking dms_pmi_get_sensor_data() for each requested process. This provides current sensor data, but is very costly. When the flag is FALSE, the NPCS returns the latest complete version of sensor data from its internal cache. The NPCS never returns data from a partial interval, only the latest complete interval. This is much more efficient, but may provide old sensor data, depending on the sensor reporting interval.
If the bypass_cache flag is TRUE, then this function has the side effect of resetting all sensor minimum and maximum values. This is because the action of a poll, by definition, results in the termination of the current summarization interval. The observer's next scheduled reporting interval, if there is one, is not affected. To prevent this side effect from affecting other PMAs that receive this data, a PMA using this function must first set the sensor reporting interval to NO_REPORT_INTERVAL. This interval value is also used by the NPCS to ensure that only one PMA in the cell can access this sensor using this function, since this mode assumes that only one PMA owns the sensor and wants no interference from other PMA requests. All other PMAs are then prevented from modifying the sensor's configuration, although they can access its data. These side effects do not occur if the bypass_cache flag is FALSE.
This get operation will fail if the PMA has not previously registered and set the sensor configuration correctly. In this failing case, a NULL list of sensor_data is returned.
The use of this polling interface is discouraged, since it requires significant network bandwidth.
error_status_t dms_npmi_get_sensor_data (
    [in ]  handle_t                  handle,
    [in ]  dms_pma_index_t           pma_index,
    [in ]  dms_sensor_ids_t*         sensor_id_list,
    [in ]  boolean                   bypass_cache,
    [out]  dms_observations_data_t** sensor_data
);
handle -- RPC binding handle of the NPMI.

pma_index -- Unique identifier assigned by the NPCS that provides a shorthand for NPCS-to-PMA communication; this handle is NULL for client-only PMAs.

sensor_id_list -- A list of sensor identifiers that the PMA is interested in.

bypass_cache -- A flag that when TRUE forces the NPCS to collect the requested sensor data directly from each sensor. This provides current sensor data, but is very costly. When the flag is FALSE, the NPCS returns the latest version of sensor data from the NPCS internal cache. This is much more efficient, but may provide old sensor data depending on the sensor reporting interval.

sensor_data -- One or more sensor identifiers and corresponding data are returned.
dms_status -- Status of call; non-zero if the call encountered an error.

UNKNOWN_PMA -- PMA not registered.

UNKNOWN_SENSOR -- One or more sensors included in sensor_list were not registered.

NO_SENSOR_REQUESTED -- sensor_list contained no sensors.

BYPASS_NOT_ALLOWED -- Sensor configuration does not allow cache bypass, due to a conflict with another PMA.

dms_npmi_register_pma() call.
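The ownership check behind BYPASS_NOT_ALLOWED can be sketched as follows, under two illustrative assumptions not taken from the IDL: NO_REPORT_INTERVAL is a distinguished interval value, and the NPCS tracks at most one owning PMA per sensor.

```c
#include <assert.h>

#define NO_REPORT_INTERVAL 0u   /* assumed sentinel value */

/* Toy per-sensor state as an NPCS might keep it. */
typedef struct {
    unsigned report_interval;   /* NO_REPORT_INTERVAL == poll-only mode */
    unsigned owner_pma;         /* 0 == no PMA owns this sensor yet */
} sensor_state;

/* Nonzero if `pma` may call dms_npmi_get_sensor_data() with
 * bypass_cache == TRUE on this sensor: the reporting interval must be
 * NO_REPORT_INTERVAL and no other PMA may own the sensor. */
int bypass_allowed(const sensor_state *s, unsigned pma)
{
    return s->report_interval == NO_REPORT_INTERVAL
        && (s->owner_pma == 0 || s->owner_pma == pma);
}
```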
This interface is provided by the NPCS to break the connection between a PMA and an NPCS, and to free up NPCS resources. All sensors that have been configured by this PMA are disabled if the NPCS arbitration rules permit. PMAs use this interface to permanently break a connection. There is no support in this specification for a PMA temporarily suspending a connection.
Client-only PMAs (COPs) must use this interface to minimize resources unnecessarily consumed by the NPCS. The NPCS will maintain COP requests for a maximum interval of one between COP requests for getting sensor data.
error_status_t dms_npmi_unregister_pma (
    [in ]  handle_t         handle,
    [in ]  dms_pma_index_t  pma_index
);
handle -- RPC binding handle of the NPMI.

pma_index -- Unique identifier assigned by the NPCS that provides a shorthand for NPCS-to-PMA communication.

dms_status -- Status of call; non-zero if the call encountered an error.

UNKNOWN_PMA -- PMA not registered.

granted_protect returned in the dms_npmi_register_pma() call -- this may cause problems for international users whose PMAs and NPCS are in different countries with different export controls on the use of authenticated RPC. This issue is beyond the scope of this RFC.
The NPRI's primary purpose is to provide a data transport channel so that a PMA can receive sensor data from an NPCS without the need to poll for each update. Specifically, this interface supports network reporting of a node's sensor data. All PMAs must implement this interface to receive data from an NPCS. However, a polling interface, dms_npmi_get_sensor_data(), is provided by the NPMI for simple or client-only PMAs (COPs). All other state information about the NPCS and sensors is obtained explicitly by invoking the NPMI routines. To simplify the design, the NPCS does not notify the PMA of changes in sensor or NPCS state.
The NPRI is an RPC interface that is a part of the PMA. Since this interface is accessed over the network, a non-RPC implementation is not recommended, due to security issues. The PMA sets the data protection level of this interface in the dms_npmi_register_pma() call.
The complete IDL is located in appendix F.
This interface is provided by the PMAs to assimilate updated sensor metric components without the need for polling. All sensor data that has changed within the last reporting interval is packaged together by the NPCS and reported in a single report.
The state diagram in Figure 9 illustrates when data is pushed from the NPCS to the PMA. All state transitions occur only at PMA-specified reporting interval boundaries, with the exception of the reconfiguration state transition, which occurs asynchronously with respect to reporting intervals. The nesting of states indicates a separate state machine for each PMA's view of each configured sensor.
[Figure not available in ASCII version of this document.]
Figure 9. dms_npri_report_sensor_data() sensor state machine. Sensor data is pushed to the PMA by the NPCS only if it was modified during the current reporting interval.
This call requires the PMA to have previously registered with the NPCS and provided a binding to its NPRI interface. Data will not flow to the NPRI until the PMA enables sensors using the dms_npmi_set_sensor_config() function.
The NPCS will return a NULL sensor data list if there is no sensor data to report for this interval. This serves as a still-alive message to the PMA during periods of application (and hence sensor) inactivity, or when no thresholds were exceeded.
error_status_t dms_npri_report_sensor_data (
    [in ]    handle_t                 handle,
    [in ]    dms_npcs_index_t         npcs_index,
    [in,ptr] dms_observations_data_t* sensor_data   /* null == keep-alive */
);
handle -- The RPC binding handle of the NPRI.

npcs_index -- Unique identifier assigned by the PMA in the dms_npmi_register_pma() function that provides a shorthand for NPCS-to-PMA communication.

sensor_data -- A structure containing one or more sensors and the data components as configured by this PMA. See section 7.3.1 for details. May be NULL if there is no sensor data to report in this interval.
dms_status -- Status of call; non-zero if the call encountered an error.

UNKNOWN_SENSOR -- Reported sensor not requested by the PMA; the PMA should call dms_npmi_set_sensor_config() and disable this sensor.

UNKNOWN_NPCS -- Reporting NPCS not recognized; the PMA should re-register with this NPCS to reestablish a valid npcs_index.

npri_binding handle in dms_npmi_register_pma.
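A PMA-side handler for this call might treat a NULL sensor_data list purely as a liveness signal, as the specification requires, while accepting observations from non-NULL reports. The types below are hypothetical stand-ins for the IDL structures:

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for dms_observations_data_t. */
typedef struct { unsigned nsensors; } dms_observations_data_t;

/* Toy PMA state: when the NPCS last proved it was alive, and how many
 * observations have been accepted so far. */
typedef struct {
    unsigned long last_alive;
    unsigned long stored;
} pma_state;

/* Sketch of the dms_npri_report_sensor_data() body on the PMA side:
 * every call (NULL or not) refreshes liveness; only non-NULL reports
 * carry sensor data. Returns 0 (STATUS_OK). */
unsigned long report_sensor_data(pma_state *pma, unsigned long now,
                                 const dms_observations_data_t *data)
{
    pma->last_alive = now;          /* any call is a still-alive message */
    if (data != NULL)
        pma->stored += data->nsensors;
    return 0;
}
```

The PMA can then flag a dead NPCS when last_alive falls more than a few reporting intervals behind the current time.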
The PMI and PRI are the two low-level interfaces. These interfaces are used by the observer and NPCS to control sensors and transmit state. These interfaces are provided by the DCE vendor and are transparent to the PMA developer.
The PMI's primary purpose is to provide a control and access interface to sensors located within a process that supports DCE instrumented services. An NPCS uses the PMI routines to set sensor configuration state, get sensor data state, and initialize and terminate the connection to the NPCS.
The PMI is implemented in the encapsulated library as described in section 7.2. The actual communication is implemented as either an RPC interface or as an implementation-specific IPC mechanism. The encapsulated library hides the actual communication mechanism from the programmer.
The complete IDL is located in appendix G.
This utility function is necessary to initialize the encapsulated library. It records the PRI procedures in private variables, and takes whatever steps are required to open a communication path for processes to communicate with the NPCS. The exact nature of these steps depends on the particular implementation of the PMI/PRI interface. Possibilities include, but are not limited to:
dciInitialize().
error_status_t dms_pmi_el_initialize (
    [in ]  dms_pri_reg_proc_fp_t     pri_register_process,
    [in ]  dms_pri_reg_sensor_fp_t   pri_register_sensor,
    [in ]  dms_pri_report_data_fp_t  pri_report_sensor_data,
    [in ]  dms_pri_unreg_sensor_fp_t pri_unregister_sensor,
    [in ]  dms_pri_unreg_proc_fp_t   pri_unregister_process
);
pri_register_process
pri_register_sensor
pri_report_sensor_data
pri_unregister_sensor
pri_unregister_process
These are all callback (local) procedures exported by the NPCS, invoked by the encapsulated library whenever the corresponding PRI procedure is invoked by an instrumented process. These procedures have identical signatures to their corresponding PRI procedures.
dms_status -- Status of call; non-zero if the call encountered an error.

FUNCTION_FAILED -- Initialization function failed due to an internal encapsulated library error.
This utility function frees output data in the encapsulated library. It encapsulates the RPC free-memory functions and eliminates possible memory leaks.
error_status_t dms_pmi_el_free_outputs (
    [in,ptr] dms_configs_t*          sensor_config_list,  /* null == absent */
    [in,ptr] dms_observation_data_t* sensor_report_list   /* null == absent */
);
sensor_config_list -- A pointer to the sensor configuration list whose allocated memory the programmer desires to free. Set this to NULL if no list is to be freed.

sensor_report_list -- A pointer to the sensor reporting list whose allocated memory the programmer desires to free. Set this to NULL if no list is to be freed.

dms_status -- Status of call; non-zero if the call encountered an error.

FUNCTION_FAILED -- Free operation failed due to an internal encapsulated library error.
This function disconnects the NPCS from all registered observers, and is useful for planned shutdowns of the NPCS. The function undoes the actions of the dms_pmi_el_initialize() function. The specific actions are implementation-dependent. The observer's response to this request is to return all sensors to a quiescent state.

There is no comparable call from the NPMI, so a PMA cannot cause this action. This call should be supported via the normal DCE control programs (such as dcecp).
error_status_t dms_pmi_terminate ( void );
None.
dms_status -- Status of call; non-zero if the call encountered an error.

FUNCTION_FAILED -- Terminate action failed due to an internal encapsulated library error.
An observer receiving the dms_pmi_terminate() call can determine that the NPCS has stopped execution and should invoke its internal clean-up routines.
This interface is provided to select which metric components (information set, etc.) a sensor supplies, and the interval between sensor summarizing and reporting those components. The NPCS uses this interface to set sensors on a per-process basis (i.e., for one observer at a time). Consequently, to set the sensors in N processes requires N invocations of this function (one call each to N observers).
All requested operations are done on a sensor-by-sensor basis only, for sensors requested in the sensor_config_list. No global sensor configurations are supported.
This function does not return verification status about each sensor configured; it returns status only on sensors that were not modified. Sensors are never left in a partially modified state: if any of the requested configuration parameters for a sensor cannot be applied, then none of that sensor's state is modified, and its current state is returned as function output with the appropriate error status. If all-or-nothing semantics are required, then the application must explicitly reset all sensors that were successfully set.
error_status_t dms_pmi_set_sensor_config (
    [in ]     dms_process_index_t process_index,
    [in,out]  dms_configs_t**     sensor_config_list
);
process_index -- Shorthand provided by the NPCS via dms_pri_register_process().

sensor_config_list (in) -- A list of sensor identifiers and requested configuration states.

sensor_config_list (out) -- A list of sensor identifiers and resulting configuration states for sensors that could NOT be set to the requested level.

dms_status -- Status of call; non-zero if the call encountered an error.

CHECK_INTERNAL_STATUS -- Sensor configuration not changed due to a non-existent sensor, an illegal request, or a previous state that is mutually exclusive of the requested state. The status of each failing sensor request is returned in the internal status fields of the sensor_config_list.

process_index is an input parameter for use by the encapsulated library to identify the requested observer.

NO_THOLD as an error.
This function is provided as a polling interface that obtains current sensor data as function output. The function returns data for each sensor requested, whether the sensor data has changed in the last interval or not. A timestamp is also returned so that this data can be correlated with other measurements in the cell. This function is not directly callable by a PMA, but is only invoked when the dms_npmi_get_sensor_data() function is invoked with the bypass_cache flag set to TRUE.
This function has the side effect of resetting all sensor minimum and maximum values. The observer's next scheduled reporting interval, if there is one, is not affected. To prevent this side effect from affecting other PMAs that receive their data in the recommended way, a PMA using this function must first set the sensor reporting interval to NO_REPORT_INTERVAL. This interval value is also used by the NPCS to ensure that only one PMA in the cell can access this sensor using this function.
This function is not the recommended method of obtaining sensor data, but is provided for compatibility with existing management applications (such as SNMP), and to support client-only PMAs. The recommended mode of access is via the PRI dms_pri_report_sensor_data() function, which is more efficient and scalable.
error_status_t dms_pmi_get_sensor_data (
    [in ]  dms_process_index_t      process_index,
    [in ]  dms_sensor_ids_t*        sensor_id_list,
    [out]  dms_observation_data_t** sensor_report_list
);
process_index -- Shorthand provided by the NPCS via dms_pri_register_process().

sensor_id_list -- A list of sensor identifiers as assigned by the NPCS via the dms_pri_register_sensor() function.

sensor_report_list -- Returns a list of sensors and individual values, and a timestamp that corresponds to when the observer returned the data.
dms_status -- Status of call; non-zero if the call encountered an error.

UNKNOWN_SENSOR -- Sensor does not exist, or unknown sensor identifier.

SENSOR_NOT_CONFIGURED -- Sensor not configured to collect data.

SENSOR_CONFIG_CONFLICT -- Sensor not configured for access via this method, since its reporting interval was not set to NO_REPORT_INTERVAL.
dms_pri_report_sensor_data(). The IPC mechanism for non-RPC implementations of the encapsulated library is implementation-dependent, but must support this function's input and output parameters.

sensor_report_list is obtained by the observer at the end of the reporting interval, i.e., after it has prepared sensor data for transport but just prior to actually transporting the data.
The PRI's primary purpose is to provide an efficient, interprocess data transportation channel for observer-to-NPCS communication. Specifically, the PRI supports routines to register processes (observers) and sensors, transmit (push) sensor data between the instrumented process's address space and the NPCS's, and unregister processes (observers) and sensors. The observer is the only DMS element allowed to invoke these routines. The registration routine is invoked prior to providing any data collection or support of PMI routines.
The PRI is implemented as either an RPC server interface exported by the NPCS, or as an IPC mechanism.
The complete IDL is located in appendix H.
This utility function is necessary to initialize the encapsulated library. It records the PMI procedures in private variables, and takes whatever steps are required to locate the communication path to communicate with the instrumented process. The exact nature of these steps depends on the particular implementation of the PMI/PRI interface. Possibilities include, but are not limited to:
error_status_t dms_pri_el_initialize (
    [in ]  dms_pmi_set_config_fp_t pmi_set_sensor_config,
    [in ]  dms_pmi_get_data_fp_t   pmi_get_sensor_data,
    [in ]  dms_pmi_terminate_fp_t  pmi_terminate
);
pmi_set_sensor_config
pmi_get_sensor_data
pmi_terminate
These are all callback (local) procedures provided by the instrumented process that are invoked by the encapsulated library whenever the corresponding PMI procedure is invoked by the NPCS. These procedures have identical signatures to their corresponding PMI procedures.
dms_status -- Status of call; non-zero if the call encountered an error.

FUNCTION_FAILED -- Initialization function failed due to an internal encapsulated library error.
This utility function frees output data in the encapsulated library. It encapsulates the RPC free-memory functions and eliminates possible memory leaks.
error_status_t dms_pri_el_free_outputs (
    [in,ptr] dms_instance_dir_t* sensor_register_list  /* null == absent */
);
sensor_register_list -- A pointer to the sensor registration list whose allocated memory the programmer desires to free.

dms_status -- Status of call; non-zero if the call encountered an error.

FUNCTION_FAILED -- Free operation failed due to an internal encapsulated library error.
This interface is invoked by instrumented DCE processes to provide the NPCS with the data necessary to build and maintain the node-level sensor registry. The observer in a DCE process uses this interface to register process specific state.
error_status_t dms_pri_register_process (
    [in ]  dms_string_t*        process_name,
    [in ]  long                 process_pid,
    [out]  dms_process_index_t* process_index
);
process_name -- A string that contains the argv[0] value of the instrumented DCE process.

process_pid -- The value returned by getpid().
Note that these function inputs are described for an operating system exporting a POSIX-conformant interface.
process_index -- Shorthand reference for future observer-to-NPCS communication; assigned and maintained by the NPCS.

dms_status -- Status of call; non-zero if the call encountered an error.

process_index is used by the encapsulated library to determine which PMI/observer requested NPCS action using the PRI.
The observer must invoke dms_pri_register_process() prior to invoking dms_pri_register_sensor(). This ensures proper behavior of the registration process in environments where all of DCE or DMS is not yet executing. In addition, an observer blocked in dms_pri_register_process(), or an observer that has not yet invoked dms_pri_register_process(), must not prevent sensors from calling their registration macros in a non-blocking fashion. The registration macros must enqueue the registration data so that it is available to the observer after it is unblocked.
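The non-blocking registration macros described above could enqueue into a fixed-size buffer that the observer drains once dms_pri_register_process() returns. This is an illustrative single-threaded sketch; a real encapsulated library would need locking, its own queue sizing, and the actual dms_instance_dir_t entries rather than this toy reg_entry:

```c
#include <assert.h>
#include <stddef.h>

#define REG_QUEUE_MAX 128   /* illustrative capacity */

/* Toy stand-in for one pending sensor registration. */
typedef struct { const char *name; unsigned metric_id; } reg_entry;

static reg_entry reg_queue[REG_QUEUE_MAX];
static size_t    reg_count;

/* Called by a sensor registration macro: never blocks, even while the
 * observer is still stuck inside dms_pri_register_process(). */
int enqueue_registration(const char *name, unsigned metric_id)
{
    if (reg_count >= REG_QUEUE_MAX)
        return -1;                      /* drop rather than block */
    reg_queue[reg_count].name = name;
    reg_queue[reg_count].metric_id = metric_id;
    reg_count++;
    return 0;
}

/* Called by the observer once it is unblocked: copies out the pending
 * registrations (for bulk dms_pri_register_sensor()) and empties the
 * queue. Returns the number of entries drained. */
size_t drain_registrations(reg_entry *out, size_t max)
{
    size_t n = reg_count < max ? reg_count : max;
    for (size_t i = 0; i < n; i++)
        out[i] = reg_queue[i];
    reg_count = 0;
    return n;
}
```

Draining in one pass also gives the bulk registration the next section recommends, amortizing the RPC/IPC overhead over many sensors.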
This function allows observers to provide the data to the NPCS to build the node level sensor registry. Standard and custom sensors within the process address space are registered by the observer using this function. The NPCS returns a sensor identifier that is used for all subsequent references to the registered sensor.
Sensors can be registered singly or in bulk. For efficiency, bulk registration should be used wherever possible. Since most DCE processes will contain dozens to hundreds of sensors, a bulk registration significantly reduces the RPC/IPC access overhead.
It is our assumption that the standard sensors (i.e., client, server, and global sensors) reside in the DCE RTL, stubs, and DCE services (such as secd and cdsd). The custom sensors are those added by middleware component providers (such as Encina and DFS), and by application client or server developers.
error_status_t dms_pri_register_sensor (
    [in ]     dms_process_index_t  process_index,
    [in,out]  dms_instance_dir_t** sensor_register_list
);
process_index -- Shorthand provided by the NPCS via dms_pri_register_process().

sensor_register_list (in) -- Specifies one or more sensors to register. Configuration data includes sensor name, sensor attributes, and metric attributes.

sensor_register_list (out) -- The structure passed as input is returned with the sensor identifier and registration status fields set.

dms_status -- Status of call; non-zero if the call encountered an error.
Returned for entire call (i.e., summarizes results for all sensors that requested registration).
CHECK_INTERNAL_STATUS
-- One or more sensors failed to
register (see individual status for details). Check the status contained
within the returned structure for details.
registration_status
-- Registration results for this
particular sensor; one of:
STATUS_OK -- Sensor registered with no problems.
DUPLICATE_SENSOR -- Sensor already registered.
ILLEGAL_NAME -- Sensor name not legal.
ILLEGAL_CLASS -- Unknown sensor class.
ILLEGAL_METRIC -- Unknown metric identifier.
UNKNOWN_PROCESS -- Process has never registered.
NO_NPCS -- NPCS not present. Unlike the dms_pri_register_process() function, the observer does not block if the NPCS is not present. On receipt of this error, the observer should initiate the restart policy described in section 13.8.
Note regarding minimum_protection_level: the PMA will have to decide whether to continue (see dms_npmi_register_pma()). The highest minimum_protection_level requested during the registration of sensors will be applied to ALL sensor data transported from this node to the PMA via the NPRI. This may cause excessive overhead, so use with caution.
registration_status of STATUS_OK should be immediately unregistered, using dms_pri_unregister_sensor().
Friendly names require extensions to IDL to support a new structure in the stub or RTL that contains the string names. An API to retrieve these via the RTL must also be specified. The details of this are beyond the scope of the specification, but must be supported in the encapsulated library.
metric_id numbers for custom sensors must be unique within the process. This requires a utility function (not described in this spec), get_metric_id(), that returns a unique metric_id each time it is invoked. Additional details regarding the need for a global repository are described in section 7.4.
The observer uses this NPCS interface to report (push) modified sensor data during the last reporting interval. This allows the observer to report sensor data in an efficient manner, since it does not require the NPCS to poll for the next request and returns sensor data in bulk.
To speed up the performance of the steady-state path, this function is not required to return errors synchronously with each call. Errors are guaranteed to be returned no later than the next invocation of this function. Any data associated with bad status may be lost.
error_status_t dms_pri_report_sensor_data (
    [in ] dms_process_index_t     process_index,
    [in ] dms_observation_data_t* sensor_report_list
);
process_index -- Shorthand provided by NPCS via dms_pri_register_process().
sensor_report_list -- One or more sensors and their component values are contained in this structure. See section 7.3.1 for additional details.
dms_status -- Status of call; non-zero if call encountered an error.
REPORT_FAILED -- Unknown error prevented NPCS from updating sensor data values (possible causes include lack of resources or execution time of NPCS).
NO_NPCS -- NPCS not present; observer should begin clean-up process.
[Figure not available in ASCII version of this document.]

Figure 10. dms_pri_report_sensor_data() sensor state machine. Sensor data is pushed to the NPCS only if it was modified during the current reporting interval.
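The modified-only push of the sensor state machine can be sketched as a selection pass over the process's sensor slots. The types and function below are hypothetical illustrations (the real observation structures are defined in the DMS IDL); only the modified/not-modified gating is from the specification.

```c
#include <stddef.h>

/* Hypothetical per-sensor record; "modified" mirrors the NoMod /
 * Modified-Data states in the sensor state machine. */
typedef struct {
    unsigned long sensor_id;
    int           modified;   /* raw data touched this interval? */
    double        value;      /* snapshot of the summarized metric */
} sensor_slot_t;

/* Select only the sensors whose data changed during the current
 * reporting interval; these are the ones pushed to the NPCS via
 * dms_pri_report_sensor_data().  Returns the number selected and
 * clears each modified flag for the next interval. */
size_t collect_modified(sensor_slot_t *slots, size_t n,
                        size_t *out_idx /* capacity >= n */)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (slots[i].modified) {
            out_idx[count++] = i;
            slots[i].modified = 0; /* back to the NoMod state */
        }
    }
    return count;
}
```

Idle sensors thus cost nothing on the reporting path, which is what keeps the bulk push efficient for processes with hundreds of mostly quiet sensors.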
The observer uses this NPCS interface to notify the NPCS that one or more sensors can be removed from the node-level sensor registry. This allows the NPCS to free resources associated with these sensors.
In most cases, groups of sensors are unregistered only in the (unlikely) event of a server unregistering an interface.
error_status_t dms_pri_unregister_sensor (
    [in ] dms_process_index_t process_index,
    [in ] dms_sensor_ids_t*   sensor_id_list
);
process_index -- Shorthand provided by NPCS via dms_pri_register_process().
sensor_id_list -- A list of sensor identifiers to unregister.
dms_status -- Status of call; non-zero if call encountered an error.
NOT_REGISTERED -- One or more sensors were never registered.
NO_NPCS -- NPCS not present.
An observer uses this NPCS interface to notify the NPCS to remove all of the sensors in the instrumented DCE process from the node-level sensor registry. This allows the NPCS to free resources associated with the unregistering process.
error_status_t dms_pri_unregister_process (
    [in ] dms_process_index_t process_index
);
process_index -- Shorthand provided by NPCS via dms_pri_register_process().
dms_status -- Status of call; non-zero if call encountered an error.
NOT_REGISTERED -- Observer was never registered.
NO_NPCS -- NPCS not present.
This section describes additional functions supplied by the two standard mechanisms: the observer and the NPCS. Core functions were described in the relevant API sections. This section focuses on additional functionality that an implementor of the measurement system must provide.
Core observer functions were described in sections 7, 10 and 11.
The additional responsibilities are expressed in terms of an idealized implementation. It is possible that the responsibilities outlined here might require, or benefit from, multiple observer threads.
A snapshot of the raw data for each active sensor in an address space (process) must be made at the end of each summarization interval, by the data intervalizer executing on the observer thread. An active sensor is any sensor that has reached the end of its summarization interval, and has had execution of some thread pass through its final probe point during that interval (i.e., the sensor has produced some raw data from which its metric can be computed). This frees sensors from any direct responsibility for interval summarization, and provides the basis for time-correlated metrics.
All sensor metric computations that are performed once per summarization interval are made on the snapshot raw data, by the metric calculator executing on the observer thread. This helps to minimize in-line sensor overhead. An example of this is the computation of mean response time, where the observer calculates the mean by dividing the cumulative response time by the number of completions.
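The mean-response-time example above reduces to a small once-per-interval computation on the snapshot. The struct and function names here are illustrative, not from the specification; the division-by-completions (with an idle-interval guard) is the point.

```c
/* The observer's metric calculator derives once-per-interval metrics
 * from the snapshot raw data; the sensor itself only accumulates.
 * Names are illustrative, not part of the specification. */
typedef struct {
    double        cumulative_response; /* sum of response times (s) */
    unsigned long completions;         /* number of completed calls */
} response_snapshot_t;

/* Mean response time for the interval; 0.0 when nothing completed,
 * so an idle interval never divides by zero. */
double mean_response_time(const response_snapshot_t *snap)
{
    if (snap->completions == 0)
        return 0.0;
    return snap->cumulative_response / (double)snap->completions;
}
```

Keeping the division out of the in-line probe path is exactly the overhead saving the paragraph describes: probes only add to the two accumulators.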
Any interval sensor, i.e., a sensor whose probes execute once and only once each summarization interval, independent of any (normal) thread, executes before the data intervalizer on the observer thread. This provides the means of supplying process-global metrics that are independent of any other sensors, and minimizes overhead by collecting them out of the application's in-line path. Most of these sensors are described in section 5.6.
A Performance Management Application (PMA) is the value-added performance management and display application supplied by a vendor or third party. The PMA interacts with NPCS from across the network.
The NPCS is a trusted process, but is used only for the collection and control of performance data. It should run as a non-privileged user.
Core NPCS functions were described in sections 7, 8, 9, 10 and 11.
The NPCS is a many-to-one funnel for sensors on a node. It fulfils a similar function for the users of the data as well. While there may be many management stations wanting information, the NPCS buffers these requests so the sensors in the application server or client process do not have to manage multiple logical connections. The local sensor mechanism needs only to move the latest information to the (single) NPCS at the required rate, and for the requested information set. Then, NPCS will satisfy the various demands of the management stations requesting information. As such, it handles the state structures required to most efficiently assemble and move requested information to the performance management applications.
NPCS may be implemented as a long-running daemon. Memory leaks in any form would be debilitating for a standard, required daemon. NPCS must have measures to identify sensors which have disappeared for whatever reason (e.g., process containing the sensors is killed or crashes). The memory and state associated with these sensors must be completely recovered. Similarly the state associated with defunct or disinterested PMAs must be recovered when the connection with the PMA is broken or unused.
As part of NPCS's role as multiplexer, it instructs the sensors in processes on the local node to report at the least common denominator (LCD) time interval to handle the requests from performance management applications. A bound would be selected that limits the time intervals that can be selected. For those performance management applications requesting relatively longer time intervals, NPCS summarizes multiple reports from the servers/clients reporting information on that node at the lower rate, and transmits only the data requested by the PMA. This is in keeping with our philosophy of transmitting the minimum data necessary across interfaces.
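One plausible realization of the "least common denominator" interval is the greatest common divisor of all PMA-requested intervals: every requested interval is then an exact multiple that the NPCS can reconstruct by summarizing reports. This is a sketch of that reading, not a mandated algorithm; the function names are illustrative.

```c
/* Classic Euclidean GCD on unsigned longs. */
static unsigned long gcd_ul(unsigned long a, unsigned long b)
{
    while (b != 0) {
        unsigned long t = a % b;
        a = b;
        b = t;
    }
    return a;
}

/* Base reporting interval (in seconds) the NPCS would instruct the
 * local sensors to use, given each PMA's requested interval.
 * Returns 0 when no PMA has registered an interest. */
unsigned long common_reporting_interval(const unsigned long *requested,
                                        unsigned long count)
{
    unsigned long g = 0;
    for (unsigned long i = 0; i < count; i++)
        g = gcd_ul(g, requested[i]);
    return g;
}
```

For requests of 30, 45, and 60 seconds the sensors would report every 15 seconds, and the NPCS would sum 2, 3, or 4 reports respectively before forwarding to each PMA. An implementation would also apply the bound the text mentions, rejecting interval choices that drive the base rate too high.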
In the steady state, the NPCS will be supplying data to a PMA for several dozen, or even hundreds, of sensors. If each sensor is provided in a separate communication (RPC), the measurement system specification goals cannot be met. Thus the NPCS batches data at regular intervals from numerous sensors bound for a particular PMA.
On systems that are DCE-compliant, or that have some RPC mechanism of interest (but are not truly DCE), a form of NPCS must be made available if data is to be collected. Through its translation capability, the NPCS can perhaps be made available to management stations even when running on a PC or a non-POSIX operating system.
This section documents all engineering issues related to the measurement system that were not described elsewhere in this document.
The minimum functionality that is required to support this specification is:
The requirements on the underlying implementation of the encapsulated library are that it correctly implements the various functions. A few points are emphasized here:
A dms_pri_register_process() call by a process observer thread of an instrumented process will block until it has executed in the NPCS. The case of single-threaded DCE clients could be handled by immediately returning a no-NPCS-yet status. Checking the CMA value cma__g_thrdcnt can be used to determine multi-threaded support. Sensors being registered by other threads in the process will need to be queued for later registration with the NPCS, but these threads cannot be blocked because the NPCS may never appear.
dms_pri_report_sensor_data(). Because this is a bulk data transfer mechanism, it can return immediately, improving its efficiency. Note that the caller of dms_pri_report_sensor_data() must be permitted to deallocate the input dms_observation_data_t data structures as soon as the call returns. This implies that either the return must be delayed, the data must be copied before returning, or some other (more complicated) PMI deallocation callback must be added if the underlying implementation permits, to allow more data to be queued. Errors may be reported later, on subsequent calls. Also, the possibility exists that a failing NPCS will cause a dms_pmi_terminate() callback, rather than bad status on a subsequent (PRI) call.
dms_pmi_terminate() or dms_pri_unregister_process(), to allow the other end to clean up.
dms_pri_register_process().
dms_pri_register_sensor(), in order to maintain a mapping from them to communications paths.
dms_pmi_el_free_outputs() and dms_pri_el_free_outputs(), to handle the deallocation of output data structures. This permits the underlying memory management mechanisms to be the responsibility of the allocating module (NPCS, npcs_lib, observer_lib, DCE process). This also implies that in/out parameters need to be handled correctly to avoid memory leaks (i.e., save a copy of input pointers).
dms_pri_register_process(), then sensors must be allowed to continue to invoke the registration macros. Individual sensor data is then enqueued until the observer is unblocked and able to process the sensor registration requests. The observer then processes these sensor registrations in bulk using the dms_pri_register_sensor() call.
The DCE RTL needs to support a mechanism that allows client processes to be identified and contacted if necessary for monitoring purposes.
Additional investigation is necessary to understand how to collect and report data for nested RPCs (i.e., an RPC that invokes a server, that causes the server to act as a client and invoke a different server).
The DCE IDL must support a structure in the stub that contains data to construct friendly sensor names, since the RTL knows server operations only by a UUID and an operation number (which is not very meaningful to a system administrator).
After the RTL is instrumented, all DCE core services should be recompiled to incorporate the instrumented libdce.
Since this capability is represented by a set, and individual sensors can support subsets, it is the policy that all sensor data value components be returned in order of their definition in dms_info_set_t. If a particular sensor does not support a given set component, it must return NULL values in that sensor data value component location in dms_sensor_data_t. This also allows new set components to be defined and processed for future versions, as long as no set value is ever reused.
The absence of, or errors in, the elements of the instrumentation must not decrease the availability of applications or DCE core services. This restriction reinforces the notion that the instrumentation is an aid to management, and not a hindrance.
The instrumentation system must not decrease the availability of DCE applications or core services. Initialization and recovery of the measurement system are controlled to minimize impact on applications and core services. Thus this specification addresses a measurement system that supplements application and DCE core service functionality, and simplifies the design by eliminating recoverable data state mechanisms such as checkpoints.
Start-up dependencies are a crucial issue that must be addressed to ensure a robust implementation. An example of the problem illustrates the challenge: If the NPCS starts execution on a node prior to security or naming services, then the NPCS cannot provide secure communications (since this requires using a DCE login context that is not available without a security service). And if the NPCS on the same node as the security server starts execution after the security server, then the observer in the security process cannot register sensors (since this requires an NPCS supporting the PRI functions).
To resolve this dependency problem, a lazy connection strategy that allows elements to defer initialization and registration if the requested server component is not currently available is recommended. For the example in the previous paragraph, the security service defers registering sensors until the NPCS is available. The observer maintains registration context and periodically tests until the NPCS is available to complete registration. The NPCS has less of an issue since it responds to observer requests and does not initiate them. This technique has the benefit of allowing upgraded or failed NPCSs to be restarted in a live environment with no impact on application availability (although no performance data is available during the interval of NPCS inactivity).
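The lazy-connection policy reduces to a small state machine in the observer: defer registration while the NPCS is absent, complete it when the NPCS appears, and fall back (keeping the registration context) if the NPCS later dies. The states and step function below are an illustrative sketch, not part of the specification.

```c
/* A minimal sketch of the lazy-connection policy.  The names are
 * hypothetical; the commented-out calls mark where the real DMS
 * functions would be invoked. */
typedef enum {
    OBS_UNREGISTERED,   /* no NPCS contact yet; sensor regs queued */
    OBS_REGISTERED      /* process and queued sensors registered */
} observer_state_t;

/* One pass of the observer's periodic retry: if the NPCS is now
 * reachable, complete the deferred process + sensor registration;
 * otherwise keep the saved context and try again next period. */
observer_state_t observer_step(observer_state_t s, int npcs_available)
{
    if (s == OBS_UNREGISTERED && npcs_available) {
        /* here: dms_pri_register_process(), then flush the queued
         * dms_pri_register_sensor() calls in bulk */
        return OBS_REGISTERED;
    }
    if (s == OBS_REGISTERED && !npcs_available) {
        /* NPCS died: revert, retaining registration context so a
         * restarted NPCS can be rejoined with no application impact */
        return OBS_UNREGISTERED;
    }
    return s;
}
```

Because the observer runs on its own thread and merely cycles this step, an NPCS restart costs only a gap in performance data, never application availability.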
Specifically, the following scenarios must be supported in conforming implementations. For each scenario the implementation policies are described.
In this scenario, the cell, node, and instrumented DCE process start up for the first time.
Assumptions/requirements:
Recommendation:
dms_pri_register_process(), since no NPCS exists on the node. While the observer is blocked, sensors must still be able to register within the process (but no calls to dms_pri_register_sensor() are allowed until the observer unblocks on dms_pri_register_process()). The observer is a separate thread, so there is no impact on the instrumented application.
dms_pri_register_sensor() function.
In this scenario, the node is restarting after a planned or unplanned shutdown.
Assumptions/requirements:
Recommendation:
In this scenario, the PMA unexpectedly terminates and restarts. The NPCS and sensors are unaware of this event.
Assumptions/requirements:
Recommendation:
In this scenario, the NPCS process gracefully exits.
Assumptions/requirements:
Recommendation:
dms_pmi_terminate() prior to exiting. This informs the encapsulated library that the NPCS is no longer available.
In this scenario, the NPCS unexpectedly terminates and restarts. The PMA and sensors are unaware of this event.
Assumptions/requirements:
Recommendation:
Note that the encapsulated library must provide a synchronous mechanism to notify observers that the NPCS has terminated. Otherwise, an observer that is not currently reporting data will be lost and not reachable when the NPCS restarts.
In this scenario, the instrumented DCE process gracefully exits.
Assumptions/requirements:
Recommendation:
In this scenario, the instrumented DCE process unexpectedly terminates.
Assumptions/requirements:
Recommendation:
In this scenario, the PMA and NPCS are separated by a network partition.
Assumptions/requirements:
Recommendation:
Sensors contain cleartext descriptions that assist the end-user in interpreting the metric values. These descriptions are contained in a help text string. This string must support internationalization conventions as described in the various DCE RFCs on internationalization. Sensor names conform to the DCE portable character set.
The DCI provides a standard interface to operating system performance data. The spec was submitted to X/Open in early 1994. That technology was evaluated for support by the functions in this specification. However, due to the concerns of availability and the uncertainty of the final shape of that standard, this specification does not explicitly support the DCI. But the following areas have been influenced by the DCI X/Open standard proposal:
A list of DCE instrumentation requirements was provided to the authors of the DCI, for possible incorporation into the X/Open spec.
It may be desirable to collect performance measures on the four APIs themselves. The activities associated with these APIs should not be included in the totals for the process. Optionally, they should be measurable by a PMA just like any other interface. The implementations of the observer, NPCS, and the four APIs must support self-instrumentation.
This section describes several factors that influenced our design and recommendations.
The measurement infrastructure must perform efficiently over a wide range of network topologies and cell sizes. While our design supports monitoring across cells, the primary monitoring functions will align with the administrative domain of the cell. Table 2 illustrates the scale of the measurement system from a server perspective (clients are not included, although they represent a potentially larger pool). The table estimates the following quantities to gauge the demands placed on the measurement system (DCE specific terminology is used):
The number of operational sensors on a single node is large (500-8,000), and the number in a cell is very large (50,000-8,000,000 or more). (Note that transaction processing and distributed object applications may support a dozen or more interfaces. This may increase the actual number of sensors in a cell.) These estimates, however, are probably pessimistic with respect to the number of active sensors, since cells will contain a large number of different applications in different domains that are managed separately and therefore require fewer active sensors.
+----------------------+-------------+-------------+
|                      |  "Typical"  |   "Large"   |
|                      | Application | Application |
+======================+=============+=============+
| Sensors / Operation  |          10 |          20 |
+----------------------+-------------+-------------+
| Operations / Manager |           5 |          10 |
+----------------------+-------------+-------------+
| Managers / Interface |           1 |           1 |
+----------------------+-------------+-------------+
| Interfaces / Server  |           1 |           2 |
+----------------------+-------------+-------------+
| Servers / Node       |          10 |          20 |
+----------------------+-------------+-------------+
| Nodes / Cell         |         100 |       1,000 |
+----------------------+-------------+-------------+
| Sensors / Node       |         500 |       8,000 |
+----------------------+-------------+-------------+
| Sensors / Cell       |      50,000 |   8,000,000 |
+----------------------+-------------+-------------+

Table 2. Instrumentation Scale Considerations.
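The per-node and per-cell totals in Table 2 are simple products of the rows above them. A quick arithmetic check (struct and function names are illustrative):

```c
/* Table 2's totals are products of the per-level factors. */
typedef struct {
    unsigned long sensors_per_op, ops_per_mgr, mgrs_per_iface,
                  ifaces_per_server, servers_per_node, nodes_per_cell;
} scale_t;

unsigned long sensors_per_node(const scale_t *s)
{
    return s->sensors_per_op * s->ops_per_mgr * s->mgrs_per_iface
         * s->ifaces_per_server * s->servers_per_node;
}

unsigned long sensors_per_cell(const scale_t *s)
{
    return sensors_per_node(s) * s->nodes_per_cell;
}
```

For the "typical" column: 10 x 5 x 1 x 1 x 10 = 500 sensors per node, times 100 nodes = 50,000 per cell; the "large" column gives 8,000 and 8,000,000 the same way.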
Having control over the sensor state is crucial for meeting measurement system overhead goals. This is accomplished by the end-user judiciously selecting the information sets for the sensors of interest. Only sensors of interest can be enabled and collected.
The above estimates do not include the number of active client sensors. This specification expects that only rarely will all clients have active instrumentation, due to excessive loading of node and network alike. To improve scalability of the measurement system it is expected that only a few clients are monitored at any time per application, in order to gather status and response times as proxies for others on the same node or in the same network. One final practical limitation for clients is that DCE does not support an identification mechanism for locating clients (only servers that register with the CDS).
A major implementation issue of the measurement system was whether to transport data by periodically pushing it across the network, or by forcing PMAs to explicitly request or poll for data, similar to the SNMP philosophy. After significant discussion, it was decided to require NPCSs to push data to PMAs. The reasons why the push model was selected for implementation follow:
Since the situation is really a large number of servers (sensors) pushing to a smaller number of NPCSs (e.g., 1 per system), which in turn pushes to a very small number of PMAs (maybe 1-10 per enterprise), then pushing scales better than polling potentially thousands of sensors to find only those with new data. In fact, keeping the amount of data sent small is very important for network utilization and scalability. Pushing also allows thresholds to be used, and significantly reduces the amount of data sent, even for the largest of systems.
In the push case, the pusher needs to keep state information about all its consumers (PMAs). It needs to know who, where and when. It also needs to know if a data item has not been delivered. Moreover, only the NPCSs know exactly when the data for the PMA is available. Storing this state is simpler for NPCSs, because of the small number of PMAs registered at any one moment.
In the pull case, the NPCSs would not be able to ignore the state information. Since no real saving in state is possible, the push case minimizes the state for PMAs. The PMAs will get cumulative data so they won't lose information if a sample is dropped, and they can tell if a sample is dropped or stale from timestamps.
Although push is inherently serial, NPCSs can start multiple threads to push, but an NPCS thread is blocked during the push. (It may take some time for the PMA to respond.) Most important, since there are often practical limits to the number of active threads, the NPCSs would have very few active threads for push, while PMAs would have to have a large number of threads for parallel pulls. For scalability issues, NPCSs would have a limited pool of threads to push. There would normally be enough to dedicate one per PMA, but a pool would remove any hard limit.
Since NPCS controls the flow of data, it can discard data that has been delivered to all interested parties; this is an advantage. It also does not need to maintain a queue of requests. However, it does need to maintain a table of state information on ALL PMAs. In addition, the assumption was made that all data for a sample to a PMA would be packaged together into a single push.
Because of the need to ensure (if not guarantee) delivery of the data to PMAs, the push is at least a data/ACK message pair. Pulls would require one more message. In addition, to minimize traffic, only data is sent, packaged into one response per sample to the PMA. A stateless pull (like NFS) would require state information in the pull, which increases traffic.
Since the sensors have an observer thread that is pushing to the NPCS, the timing of when to send the sample data to the PMA is only precisely known to the NPCS. That makes the scheduling of the data send time easy for the NPCS. Most important, for thresholds where data is only sent when a value is exceeded, the NPCS is the ONLY place that knows when this occurs, and that a data send is required. A pull would require the NPCS to wait and collect all the information anyway.
There is still an issue for scheduling of the PMA's data reduction, and correlation with the data arriving asynchronously from many NPCSs. However, since that is the highest level of the measurement system, and is the element with the least time sensitivity in the measurement system, it was considered an acceptable requirement. There may be several receiver threads, or one simply collecting data.
For the push model, data is flowing to the PMAs from the NPCSs. By providing timestamps and cumulative data, the PMAs can deal with missing data by either extrapolating, skipping, or another make right strategy. As far as dealing with failures, the NPCSs would know who and where they were sending data to, so the lack of a PMA ACK indicates a failed PMA, which allows the NPCS to free up resources belonging to that PMA.
NOTE: Even though the steady-state system is push-based, it was decided that a polling request function would be included in the NPMI to support special PMAs. This allows flexibility for something like a pull if used infrequently. The reason this is required is for SNMP support, client-only PMAs, and PMAs that register only thresholds but have not seen any data for awhile. A pull request allows the PMA to see the current data even if no thresholds were exceeded.
This section describes sensor implementation issues and placement locations within the RPC runtime library (RTL).
The fundamental implementation question regards the placement of the sensors: Are they generated by the IDL compiler and placed in the stubs, or are they an integrated part of the DCE kernel (runtime library)?
Instrumenting the stubs using IDL has merit. Coupled with an internal tracing tool, these form a very powerful application development/debugging utility. Unfortunately, for performance monitoring of arbitrary applications in a large environment, the IDL approach has several shortcomings.
First, sensors within stubs are visible to application developers, and thus modifiable by them. This is not safe for standard functions. Sensors within the RTL are not modifiable by the application writer. Second, supporting standard libraries is a pragmatic software engineering technique that minimizes implementation divergence in production environments. It also provides extensibility without the need to recompile an application's source code (users dislike recompilation because it almost always causes something to break). If sensors are in the RTL, then merely relinking the application with libdce provides new sensors. The requirement to relink (instead of recompile) also makes it easier to instrument other DCE services (CDS, Security) and middleware (Encina and CICS).
Other issues also influenced this direction. First is a lack of control over the granularity of collection (all or nothing), and the resulting deluge of data that is generated (especially for all clients) with a stub-based architecture. (The scalability of this approach is unacceptably poor in large environments.) The RTL is dynamically configurable to collect only the minimum amount of data that is requested. Finally, the need for pervasive support of this sensor requires a standard interface to sensors. Creating a standard performance interface to a stub is problematic.
Because of these arguments, we have chosen a hybrid implementation of the standard sensors. Most are located in the RTL but some are located in the stubs to capture stub specific processing.
To minimize the amount of data transferred across the network by counter and timer sensors, we support a threshold level detection mechanism. For example, a response time sensor with a threshold set would report data only when a user-configured threshold condition is TRUE (for example, when the maximum response time exceeds 20 seconds). In practice, we simplified the sensor implementation, and have the NPCS analyze the incoming data from the sensor to detect thresholds. This allows different PMAs to configure the same sensor with different threshold values, and still minimizes the amount of data transported across the network. It is important to note that sensors report summarized data, thus the threshold detection is based on integrated values (mean, minimum or maximum) over a sampling interval.
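The NPCS-side check reduces to a band comparison against the summarized interval value. The struct below mirrors the lower_value/upper_value pair of dms_threshold_values_t in the dms_config IDL, with plain doubles standing in for dms_datum_t; the function name is illustrative.

```c
/* Threshold detection as performed by the NPCS on summarized sensor
 * data.  Each PMA can hold its own thresholds for the same sensor;
 * the sensor itself always reports summarized data. */
typedef struct {
    double lower_value;
    double upper_value;
} threshold_values_t;

/* Returns non-zero when the interval's summarized value (e.g., the
 * maximum response time over the interval) falls outside the band,
 * i.e., the data should be forwarded to this PMA. */
int threshold_exceeded(const threshold_values_t *t, double summarized)
{
    return summarized < t->lower_value || summarized > t->upper_value;
}
```

With a band of (0, 20) seconds, a 25-second maximum response time triggers a send while a 5-second one is suppressed, which is how the mechanism keeps steady-state network traffic low.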
Two distinct timer sensors, each with a different granularity, were proposed: seconds and nanoseconds. This will provide sufficient resolution, and future growth for the next 5-7 years. Note that overflow concerns may require that sum-of-squares terms have a coarser granularity.
To ensure timer resolution and efficient timestamp access, the spec defines a function that returns the time from the host OS with the proper granularity, and is implemented as efficiently as possible (this eliminates the problems with the POSIX gettimeofday() function). This implementation-specific routine is described in section 6.4.
IDL pickling is used to support pass-thru sensors. This results in several advantages:
The use of pickling results in several issues:
The following items have been deferred for a future working group:
dms_pri_register_sensor() to allow a process to specify its minimum data protection level, to automatically control the RPC data protection level used for PMA and NPCS communication. This feature eases system administration by allowing application clients or servers to establish the protection level during the development phase.
A validation suite is required to ensure the correctness of the initial implementation of the sensors, and to provide a test case to demonstrate future correctness. Furthermore, an interoperability test for the interfaces is required to ensure interface compatibility.
This document is the result of many individuals who contributed their time and expertise.
Rich Friedrich, Joe Martinka, Steve Saunders, Gary Zaidenweber, Tracy Sienknecht, Dave Glover (Hewlett-Packard Company).
Dave Bachmann, Ellen Stokes, Robert Berry (International Business Machines, Inc.).
Barry Wolman, Dimitris Varotsis, David Van Ryzin (Transarc).
Sarr Blumson (CITI (Center for Information Technology Integration), University of Michigan).
Art Gaylord (Project Pilgrim, University of Massachusetts).
[
    version(2.2)
]
interface dms_binding
/*
 * This interface defines the data structures used to represent
 * relationships between entities (sensors/processes/nodes) within
 * DMS.  Some are "transparent", meaning that a user of that
 * structure can manipulate its contents.  Some are "opaque", meaning
 * that only the creating entity can manipulate its contents.
 */
{
    /* TRANSPARENT BINDING TYPES */
    typedef [string] unsigned char  dms_string_t[];
    typedef unsigned long           dms_protect_level_t;  /* see rpc.h */
    typedef [string] unsigned char  dms_string_binding_t[];

    /* OPAQUE BINDING TYPES */
    typedef unsigned long           dms_pma_index_t;
    typedef unsigned long           dms_npcs_index_t;
    typedef unsigned long           dms_process_index_t;
    typedef unsigned long           dms_sensor_id_t;

    typedef struct dms_sensor_ids {
        unsigned long                    count;
        [size_is(count)] dms_sensor_id_t ids[];
    } dms_sensor_ids_t;
}
[
    version(2.3),
    pointer_default(ptr)
]
interface dms_config
/*
 * This interface defines the sensor configuration data structures
 * for specifying the configuration of individual sensors.
 */
{
    import "dms_binding.idl", "dms_data.idl", "dms_status.idl";

    const unsigned long dms_NO_METRIC_COLLECTION = 0;
    const unsigned long dms_THRESHOLD_CHECKING   = 0x00000001;
    const unsigned long dms_COLLECT_MIN_MAX      = 0x00000002;
    const unsigned long dms_COLLECT_TOTAL        = 0x00000004;
    const unsigned long dms_COLLECT_COUNT        = 0x00000008;
    const unsigned long dms_COLLECT_SUM_SQUARES  = 0x00000010;
    const unsigned long dms_COLLECT_SUM_CUBES    = 0x00000020;
    const unsigned long dms_COLLECT_SUM_X_TO_4TH = 0x00000040;
    const unsigned long dms_CUSTOM_INFO_SET      = 0x80000000;

    typedef unsigned long dms_info_set_t;

    typedef struct dms_threshold_values {
        dms_datum_t lower_value;
        dms_datum_t upper_value;
    } dms_threshold_values_t;

    typedef union dms_threshold switch (boolean have_values) {
        case TRUE:  dms_threshold_values_t values;
        case FALSE: ;
    } dms_threshold_t;

    typedef struct dms_config {
        dms_sensor_id_t  sensor_id;
        dms_timevalue_t  reporting_interval;  /* 0 == infinite */
        dms_info_set_t   info_set;
        dms_threshold_t* threshold;
        error_status_t   status;
    } dms_config_t;

    typedef struct dms_configs {
        unsigned long                 count;
        [size_is(count)] dms_config_t config[];
    } dms_configs_t;
}
[
    version(2.2),
    pointer_default(ptr)
]
interface dms_data
/*
 * This interface defines the data structures that represent the
 * (sensor & attribute) data values communicated through DMS.
 */
{
    import "dms_binding.idl", "dms_status.idl";

    typedef struct dms_opaque {
        unsigned long         size;
        [size_is(size)] byte  bytes[];
    } dms_opaque_t;

    typedef enum {
        dms_LONG, dms_HYPER, dms_FLOAT, dms_DOUBLE, dms_BOOLEAN,
        dms_CHAR, dms_STRING, dms_BYTE, dms_OPAQUE, dms_DATA_STATUS
    } dms_datum_type_t;

    typedef union dms_datum switch (dms_datum_type_t type) {
        case dms_LONG:        long           long_v;
        case dms_HYPER:       hyper          hyper_v;
        case dms_FLOAT:       float          float_v;
        case dms_DOUBLE:      double         double_v;
        case dms_BOOLEAN:     boolean        boolean_v;
        case dms_CHAR:        char           char_v;
        case dms_STRING:      dms_string_t*  string_p;
        case dms_BYTE:        byte           byte_v;
        case dms_OPAQUE:      dms_opaque_t*  opaque_p;
        case dms_DATA_STATUS: error_status_t status_v;
    } dms_datum_t;

    typedef struct dms_sensor_data {
        dms_sensor_id_t              sensor_id;
        unsigned long                count;
        [size_is(count)] dms_datum_t sensor_data[];
    } dms_sensor_data_t;

    typedef struct dms_timevalue {
        unsigned long sec;
        unsigned long usec;
    } dms_timevalue_t;

    typedef struct dms_observation_data {
        dms_timevalue_t                     end_timestamp;
        unsigned long                       count;
        [size_is(count)] dms_sensor_data_t* sensor[];
    } dms_observation_data_t;

    typedef struct dms_observations_data {
        unsigned long                            count;
        [size_is(count)] dms_observation_data_t* observation[];
    } dms_observations_data_t;
}
[
    uuid(5e542624-e9d6-11cd-a3a9-080009273eb9),
    version(2.2),
    pointer_default(ptr)
]
interface dms_naming
/*
 * This interface defines the data structures that represent the dms
 * namespace.  There are two forms of names that can be represented,
 * a simple string only form and a fully decorated form.
 */
{
    import "dms_binding.idl", "dms_data.idl", "dms_status.idl";

    typedef struct dms_name_node* dms_name_node_p_t;

    typedef struct dms_name_nodes {
        unsigned long                      count;
        [size_is(count)] dms_name_node_p_t names[];
    } dms_name_nodes_t;

    typedef struct dms_name_node {
        dms_string_t*    name;  /* "*" == wildcard */
        dms_name_nodes_t children;
    } dms_name_node_t;

    typedef struct dms_attr {
        dms_string_t* attr_name;
        dms_datum_t   attr_value;
    } dms_attr_t;

    typedef struct dms_attrs {
        unsigned long               count;
        [size_is(count)] dms_attr_t* attrs[];
    } dms_attrs_t;

    typedef struct dms_sensor {
        dms_sensor_id_t        sensor_id;
        dms_attrs_t*           attributes;
        unsigned short         count;
        [size_is(count)] small metric_id[];
    } dms_sensor_t;

    typedef struct dms_instance_leaf {
        unsigned long                 count;
        [size_is(count)] dms_sensor_t* sensors[];
    } dms_instance_leaf_t;

    typedef struct dms_instance_node* dms_instance_node_p_t;

    typedef struct dms_instance_dir {
        unsigned long                          count;
        [size_is(count)] dms_instance_node_p_t children[];
    } dms_instance_dir_t;

    typedef enum {
        dms_DIRECTORY, dms_LEAF, dms_NAME_STATUS
    } dms_select_t;

    typedef union dms_instance_data switch (dms_select_t data_type) {
        case dms_DIRECTORY:   dms_instance_dir_t*  directory;
        case dms_LEAF:        dms_instance_leaf_t* leaf;
        case dms_NAME_STATUS: error_status_t       status;
    } dms_instance_data_t;

    typedef struct dms_instance_node {
        dms_string_t*       name;
        dms_datum_t*        alternate_name;
        dms_instance_data_t data;
    } dms_instance_node_t;
}
[
    uuid(e8f6e46e-e9d7-11cd-be13-080009273eb9),
    version(2.2),
    pointer_default(ptr)
]
interface dms_npmi
/*
 * This interface defines the operations provided to a PMA by a NPCS.
 * The interface can be utilized by two styles of PMA, full-function
 * and client-only PMA.  A full-function PMA must support the
 * dms_npri interface, and can either have sensor data pushed to it,
 * or pull sensor data from a NPCS.  The client-only PMA (COP) will
 * not support the dms_npri interface, and must pull sensor data from
 * a NPCS.
 */
{
    import "dms_status.idl", "dms_binding.idl", "dms_data.idl",
           "dms_config.idl", "dms_naming.idl";

    error_status_t dms_npmi_register_pma (
        [in ]    handle_t              handle,
        [in,ptr] dms_string_binding_t* npri_binding,  /* null == client-only PMA */
        [in ]    dms_npcs_index_t      npcs_index,
        [in ]    dms_protect_level_t   requested_protect,
        [ out]   dms_pma_index_t*      pma_index,
        [ out]   dms_protect_level_t*  granted_protect
    );

    [idempotent]
    error_status_t dms_npmi_get_registry (
        [in ]    handle_t             handle,
        [in ]    dms_pma_index_t      pma_index,
        [in,ptr] dms_name_nodes_t*    request_list,  /* null == entire registry */
        [in ]    long                 depth_limit,   /* 0 == infinity */
        [ out]   dms_instance_dir_t** registry_list
    );

    error_status_t dms_npmi_set_sensor_config (
        [in ]    handle_t        handle,
        [in ]    dms_pma_index_t pma_index,
        [in,out] dms_configs_t** sensor_configs
    );

    error_status_t dms_npmi_get_sensor_data (
        [in ]    handle_t                  handle,
        [in ]    dms_pma_index_t           pma_index,
        [in ]    dms_sensor_ids_t*         sensor_id_list,
        [in ]    boolean                   bypass_cache,
        [ out]   dms_observations_data_t** sensor_data
    );

    error_status_t dms_npmi_unregister_pma (
        [in ]    handle_t        handle,
        [in ]    dms_pma_index_t pma_index
    );
}
[
    uuid(ee7599b2-e9d7-11cd-8e49-080009273eb9),
    version(2.2),
    pointer_default(ptr)
]
interface dms_npri
/*
 * This interface defines the operation provided to a NPCS by a PMA
 * to receive sensor data from that NPCS.  This interface is not
 * provided by a client-only PMA (COP).
 */
{
    import "dms_status.idl", "dms_binding.idl", "dms_data.idl";

    [idempotent]
    error_status_t dms_npri_report_sensor_data (
        [in ]    handle_t                 handle,
        [in ]    dms_npcs_index_t         npcs_index,
        [in,ptr] dms_observations_data_t* sensor_data  /* null == keep-alive */
    );
}
[
    local,
    version(2.2)
]
interface dms_pmi
/*
 * This interface defines the operations provided to a NPCS by the
 * encapsulating library (npcs_lib).  Additionally the operations
 * that must be provided to npcs_lib by a NPCS are specified.
 */
{
    import "dms_status.idl", "dms_binding.idl", "dms_data.idl",
           "dms_config.idl", "dms_naming.idl";

    typedef [ref] error_status_t (*dms_pri_reg_proc_fp_t) (
        [in]  dms_string_t*        process_name,
        [in]  long                 process_pid,
        [out] dms_process_index_t* process_index
    );

    typedef [ref] error_status_t (*dms_pri_reg_sensor_fp_t) (
        [in]     dms_process_index_t  process_index,
        [in]     dms_protect_level_t  min_protect_level,
        [in,out] dms_instance_dir_t** sensor_register_list
    );

    typedef [ref] error_status_t (*dms_pri_report_data_fp_t) (
        [in] dms_process_index_t     process_index,
        [in] dms_observation_data_t* sensor_report_list
    );

    typedef [ref] error_status_t (*dms_pri_unreg_sensor_fp_t) (
        [in] dms_process_index_t process_index,
        [in] dms_sensor_ids_t*   sensor_id_list
    );

    typedef [ref] error_status_t (*dms_pri_unreg_proc_fp_t) (
        [in] dms_process_index_t process_index
    );

    /*
     * The following functions are needed to encapsulate the dms_pmi and
     * dms_pri interfaces in a library (npcs_lib).
     */
    error_status_t dms_pmi_el_initialize (
        [in ] dms_pri_reg_proc_fp_t     pri_register_process,
        [in ] dms_pri_reg_sensor_fp_t   pri_register_sensor,
        [in ] dms_pri_report_data_fp_t  pri_report_sensor_data,
        [in ] dms_pri_unreg_sensor_fp_t pri_unregister_sensor,
        [in ] dms_pri_unreg_proc_fp_t   pri_unregister_process
    );

    error_status_t dms_pmi_el_free_outputs (
        [in,ptr] dms_configs_t*          sensor_config_list,  /* null == absent */
        [in,ptr] dms_observation_data_t* sensor_report_list   /* null == absent */
    );

    /*
     * The following functions provide the basic dms_pmi functionality.
     */
    error_status_t dms_pmi_set_sensor_config (
        [in ]    dms_process_index_t process_index,
        [in,out] dms_configs_t**     sensor_config_list
    );

    error_status_t dms_pmi_get_sensor_data (
        [in ]  dms_process_index_t      process_index,
        [in ]  dms_sensor_ids_t*        sensor_id_list,
        [ out] dms_observation_data_t** sensor_report_list
    );

    error_status_t dms_pmi_terminate ( void );
}
[
    local,
    version(2.3)
]
interface dms_pri
/*
 * This interface defines the operations provided to an instrumented
 * process by the encapsulating library (observer_lib).  Additionally
 * the operations that must be provided to observer_lib by an
 * instrumented process are specified.
 */
{
    import "dms_status.idl", "dms_binding.idl", "dms_data.idl",
           "dms_config.idl", "dms_naming.idl";

    typedef [ref] error_status_t (*dms_pmi_set_config_fp_t) (
        [in]     dms_process_index_t process_index,
        [in,out] dms_configs_t**     sensor_configs
    );

    typedef [ref] error_status_t (*dms_pmi_get_data_fp_t) (
        [in]  dms_process_index_t      process_index,
        [in]  dms_sensor_ids_t*        sensor_id_list,
        [out] dms_observation_data_t** sensor_report_list
    );

    typedef [ref] error_status_t (*dms_pmi_terminate_fp_t) ( void );

    /*
     * The following functions are needed to encapsulate the dms_pri and
     * dms_pmi interfaces in a library (observer_lib).
     */
    error_status_t dms_pri_el_initialize (
        [in ] dms_pmi_set_config_fp_t pmi_set_sensor_config,
        [in ] dms_pmi_get_data_fp_t   pmi_get_sensor_data,
        [in ] dms_pmi_terminate_fp_t  pmi_terminate
    );

    error_status_t dms_pri_el_free_outputs (
        [in,ptr] dms_instance_dir_t* sensor_register_list  /* null == absent */
    );

    /*
     * The following functions provide the basic dms_pri functionality.
     */
    error_status_t dms_pri_register_process (
        [in ]  dms_string_t*        process_name,
        [in ]  long                 process_pid,
        [ out] dms_process_index_t* process_index
    );

    error_status_t dms_pri_register_sensor (
        [in ]    dms_process_index_t  process_index,
        [in,out] dms_instance_dir_t** sensor_register_list
    );

    error_status_t dms_pri_report_sensor_data (
        [in ] dms_process_index_t     process_index,
        [in ] dms_observation_data_t* sensor_report_list
    );
    /* Note: return (status) may correspond to previous call! */

    error_status_t dms_pri_unregister_sensor (
        [in ] dms_process_index_t process_index,
        [in ] dms_sensor_ids_t*   sensor_id_list
    );

    error_status_t dms_pri_unregister_process (
        [in ] dms_process_index_t process_index
    );
}
[
    version(2.4)
]
interface dms_status
/*
 * This interface defines the set of (resulting) status values for
 * all the operations and data structures defined in DMS.
 */
{
    import "dce/nbase.idl";

    const error_status_t dms_STATUS_BASE                 = 0x114b2001;
    const error_status_t dms_STATUS_OK                   = error_status_ok;
    const error_status_t dms_NOT_IMPLEMENTED             = dms_STATUS_BASE + 0;
    const error_status_t dms_UNKNOWN_SENSOR              = dms_STATUS_BASE + 1;
    const error_status_t dms_UNKNOWN_PROCESS             = dms_STATUS_BASE + 2;
    const error_status_t dms_UNKNOWN_INFO_SET            = dms_STATUS_BASE + 3;
    const error_status_t dms_UNKNOWN_THRESHOLD_LEVEL     = dms_STATUS_BASE + 4;
    const error_status_t dms_UNKNOWN_NPCS                = dms_STATUS_BASE + 5;
    const error_status_t dms_UNKNOWN_PMA                 = dms_STATUS_BASE + 6;
    const error_status_t dms_ILLEGAL_NAME                = dms_STATUS_BASE + 7;
    const error_status_t dms_ILLEGAL_METRIC              = dms_STATUS_BASE + 8;
    const error_status_t dms_ILLEGAL_SENSORID            = dms_STATUS_BASE + 9;
    const error_status_t dms_ILLEGAL_VALUE               = dms_STATUS_BASE + 10;
    const error_status_t dms_ILLEGAL_BINDING             = dms_STATUS_BASE + 11;
    const error_status_t dms_SENSOR_CONFIG_CONFLICT      = dms_STATUS_BASE + 12;
    const error_status_t dms_SENSOR_NOT_CONFIGURED       = dms_STATUS_BASE + 13;
    const error_status_t dms_SENSOR_NOT_MODIFIED         = dms_STATUS_BASE + 14;
    const error_status_t dms_DUPLICATE_SENSOR            = dms_STATUS_BASE + 15;
    const error_status_t dms_NO_SENSOR_REQUESTED         = dms_STATUS_BASE + 16;
    const error_status_t dms_NO_NPCS                     = dms_STATUS_BASE + 17;
    const error_status_t dms_NO_THRESHOLD                = dms_STATUS_BASE + 18;
    const error_status_t dms_REPORT_FAILED               = dms_STATUS_BASE + 19;
    const error_status_t dms_FUNCTION_FAILED             = dms_STATUS_BASE + 20;
    const error_status_t dms_NOT_REGISTERED              = dms_STATUS_BASE + 21;
    const error_status_t dms_REGISTER_FAILED             = dms_STATUS_BASE + 22;
    const error_status_t dms_ALREADY_REGISTERED          = dms_STATUS_BASE + 23;
    const error_status_t dms_PROTECT_LEVEL_NOT_SUPPORTED = dms_STATUS_BASE + 24;
    const error_status_t dms_BYPASS_NOT_ALLOWED          = dms_STATUS_BASE + 25;
    const error_status_t dms_NO_OUTPUTS_FREED            = dms_STATUS_BASE + 26;
    const error_status_t dms_CHECK_INTERNAL_STATUS       = dms_STATUS_BASE + 27;
    const error_status_t dms_BAD_STATUS                  = dms_STATUS_BASE + 28;
}
Rich Friedrich
Hewlett-Packard Company
1501 Page Mill Road, Mailstop 1U-14
Palo Alto, CA 94304
USA
Internet email: richf@hpl.hp.com
Telephone: +1-415-857-1501

Steve Saunders
Hewlett-Packard Company
11000 Wolfe Road, Mailstop 42U
Cupertino, CA 95014
USA
Internet email: saunders@cup.hp.com
Telephone: +1-408-725-8900

Gary Zaidenweber
Hewlett-Packard Company
300 Apollo Drive
Chelmsford, MA 01824
USA
Internet email: gaz@ch.hp.com
Telephone: +1-508-256-6600

Dave Bachmann
International Business Machines, Inc.
11500 Burnet Road, MS 9132
Austin, TX 78758
USA
Internet email: bachmann@austin.ibm.com
Telephone: +1-512-838-3170

Sarr Blumson
CITI, University of Michigan
519 W William
Ann Arbor, MI 48103
USA
Internet email: sarr@citi.umich.edu
Telephone: +1-313-764-0253