This draft also defines a Diagnostics Usage, which can be used to
obtain diagnostic information about a peer in the overlay. The
Diagnostics Usage is of interest both to administrators monitoring
the overlay and to overlay algorithms that base their
decisions on the capabilities and current load of nodes in the
overlay.
defines a diagnostic
usage for obtaining information about node performance.
The Diagnostics Usage allows a node to report various statistics about
itself that may be useful for diagnostics or performance management. It
can be used to discover information such as the software version,
uptime, routing table, stored resource-objects, and performance
statistics of a peer. The usage defines several new kinds that can be
retrieved to obtain these statistics, and it also allows retrieval of
other kinds that a node stores. In essence, the usage allows querying a
node's state, such as its storage and network state, to obtain the
relevant information.
Additional diagnostic capabilities have been proposed in .
The access control model for all kinds is a local policy defined by
the peer or by the overlay policy. The peer may be configured with a
list of users for whom it is willing to return the information, and may
restrict access to the users named on that list. Unless a specific
policy overrides it, data SHOULD NOT be returned for users not on the
list. Access control can also be determined on a per-kind basis. For
example, a node may be willing to return the software version to any
user while declining to return specific information about performance.
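The access control rule above can be sketched as a small policy check. This is only an illustration under stated assumptions: the class DiagnosticsPolicy, the method may_return, the kind labels, and the user names are all hypothetical and are not defined by this draft.

```python
from dataclasses import dataclass, field


@dataclass
class DiagnosticsPolicy:
    """Hypothetical local policy for returning diagnostic kinds."""

    # Users for whom the peer is willing to return diagnostic data.
    allowed_users: set = field(default_factory=set)
    # Kinds the peer is willing to return to any user (per-kind policy).
    public_kinds: set = field(default_factory=set)

    def may_return(self, user: str, kind: str) -> bool:
        # A per-kind policy overrides the default user list.
        if kind in self.public_kinds:
            return True
        # Default: data is not returned for users not on the list.
        return user in self.allowed_users


# Assumed kind labels, used for illustration only.
policy = DiagnosticsPolicy(
    allowed_users={"admin@example.org"},
    public_kinds={"SOFTWARE_VERSION"},
)
```

With this policy, any user may read the software version, while performance kinds such as EWMA_BYTES_SENT are returned only to users on the list.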
TODO - need to explain how this is addressed to node-id. [TODO: Do we
need a DIAGNOSTIC method? Access control mechanisms for DIAGNOSTIC may
be different from a Fetch.]
The following kinds are defined:
A single value element containing
an unsigned 32-bit integer representing the number of peers in the
peer's routing table.
A single value element containing a
US-ASCII string that identifies the manufacturer, model, and version
of the software.
A single value element containing an
unsigned 64-bit integer specifying the time the node has been up in
seconds.
A single value element containing an
unsigned 64-bit integer specifying the time the p2p application has
been up in seconds.
A single value element containing an
unsigned 32-bit integer representing the memory footprint of the
peer program in kilobytes.
[[What's a kilo byte? 1000 or 1024? -- Cullen]]
[[Good question. 1000 seems like not quite enough room but 1024 is
too much? -- EKR]]
An unsigned 64-bit integer
representing the number of bytes of data being stored by this
node.
An array element containing the
number of instances of each kind stored. The array is indexed by
Kind-ID. Each entry is an unsigned 64-bit integer.
An array element containing the
number of messages sent and received. The array is indexed by method
code. Each entry in the array is a pair of unsigned 64-bit integers
(packed end to end) representing sent and received.
A single value element containing an
unsigned 32-bit integer representing an exponentially weighted moving
average of bytes sent per second by this peer, computed as:
   sent = alpha x sent_present + (1 - alpha) x sent
where sent_present represents the bytes sent per second since the
last calculation and sent on the right-hand side represents the result
of the last calculation of bytes sent per second. A suitable value for
alpha is 0.8. This value is calculated every five seconds.
A single value element containing an
unsigned 32-bit integer representing an exponentially weighted moving
average of bytes received per second by this peer. The calculation is
the same as for bytes sent.
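The weighted-average update used for the bytes-sent and bytes-received kinds can be sketched as follows. The function names update_ewma and as_uint32 are hypothetical, and rounding and clamping the result to fit an unsigned 32-bit integer is an assumption; the draft does not say how the value is converted.

```python
ALPHA = 0.8           # suggested weight for the most recent interval
INTERVAL_SECONDS = 5  # the value is recalculated every five seconds


def update_ewma(previous, bytes_in_interval,
                alpha=ALPHA, interval=INTERVAL_SECONDS):
    """Return the new average in bytes per second.

    previous          -- result of the last calculation (bytes/second)
    bytes_in_interval -- bytes sent (or received) since the last calculation
    """
    rate_present = bytes_in_interval / interval
    return alpha * rate_present + (1 - alpha) * previous


def as_uint32(value):
    """Round and clamp to an unsigned 32-bit integer (assumed conversion)."""
    return max(0, min(round(value), 0xFFFFFFFF))


# Two consecutive intervals with 5000 bytes sent in each.
first = update_ewma(0.0, 5000)
second = update_ewma(first, 5000)
```

Because alpha is large, the average converges quickly toward the current rate: after two identical intervals at 1000 bytes per second, the value is already close to 1000.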
[[TODO: We would like some sort of bandwidth measurement, but we're
kind of unclear on the units and representation.]]
(OPEN QUESTION: any other metrics?)
Below, we sketch how these metrics can be used. A peer can use the
EWMA_BYTES_SENT and EWMA_BYTES_RCVD values of another peer to infer
whether that peer is acting as a media relay. It may then choose not to
forward any media relay requests to that peer. Similarly, among the
various candidates for filling its routing table, a peer may prefer a
candidate with a large UPTIME value, a small RTT, and a small
LAST_CONTACT value.
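A minimal sketch of both uses follows. The Candidate type, the RELAY_THRESHOLD value, and the strict ordering of preferences (UPTIME first, then RTT, then LAST_CONTACT) are assumptions made for illustration; the draft does not specify thresholds or how the three preferences are weighed against each other.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    node_id: str
    uptime: int           # UPTIME, in seconds
    rtt_ms: float         # measured round-trip time (assumed milliseconds)
    last_contact: int     # LAST_CONTACT (assumed seconds since last contact)
    ewma_bytes_sent: int  # EWMA_BYTES_SENT, bytes per second
    ewma_bytes_rcvd: int  # EWMA_BYTES_RCVD, bytes per second


# Hypothetical threshold; a real deployment would have to tune this.
RELAY_THRESHOLD = 50_000


def looks_like_media_relay(c, threshold=RELAY_THRESHOLD):
    """Guess that a peer is relaying media if both rates are high."""
    return c.ewma_bytes_sent >= threshold and c.ewma_bytes_rcvd >= threshold


def rank_candidates(candidates):
    """Order candidates: larger UPTIME, then smaller RTT, then smaller LAST_CONTACT."""
    return sorted(candidates,
                  key=lambda c: (-c.uptime, c.rtt_ms, c.last_contact))


peers = [
    Candidate("a", uptime=3600, rtt_ms=40.0, last_contact=10,
              ewma_bytes_sent=1_000, ewma_bytes_rcvd=900),
    Candidate("b", uptime=86400, rtt_ms=80.0, last_contact=5,
              ewma_bytes_sent=120_000, ewma_bytes_rcvd=118_000),
    Candidate("c", uptime=86400, rtt_ms=20.0, last_contact=30,
              ewma_bytes_sent=2_000, ewma_bytes_rcvd=2_500),
]
ranked = [c.node_id for c in rank_candidates(peers)]
busy = [c.node_id for c in peers if looks_like_media_relay(c)]
```

A weighted score could replace the strict ordering if no single metric should dominate; the ordering here is only the simplest choice.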