aerospike_helpers.metrics package

Classes used for metrics.

ConnectionStats, NamespaceMetrics, Node, and Cluster do not have a constructor because they are not meant to be created by the user. Instances are only provided to MetricsListeners callbacks, for reading data about the server and client.

NodeStats and ClusterStats also do not have a constructor because they are meant to be returned using a Python client API method.

class aerospike_helpers.metrics.Cluster

Bases: object

Cluster of server nodes.

cluster_name

Expected cluster name for all nodes. May be None.

Type:: Optional[str]

invalid_node_count

Count of add node failures in the most recent cluster tend iteration.

Type:: int

command_count

Command count. The value is cumulative and not reset per metrics interval.

Type:: int

retry_count

Command retry count. There can be multiple retries for a single command. The value is cumulative and not reset per metrics interval.

Type:: int

nodes

Active nodes in cluster.

Type:: list[Node]

__weakref__: list of weak references to the object

class aerospike_helpers.metrics.ClusterStats

Bases: object

Cluster statistics.

nodes

Statistics for all nodes.

Type:: list[aerospike_helpers.metrics.NodeStats]

retry_count

Count of command retries since cluster was started.

Type:: int

thread_pool_queued_tasks

Count of sync batch/scan/query tasks awaiting execution. If the count is greater than zero, then all threads in the thread pool are active.

Type:: int

recover_queue_size

Count of sync sockets currently in timeout recovery.

Type:: int

__weakref__: list of weak references to the object

class aerospike_helpers.metrics.ConnectionStats

Bases: object

Connection statistics.

in_use

Connections actively being used in database commands on this node. There can be multiple pools per node. This value is a summary of those pools on this node.

Type:: int

in_pool

Connections residing in pool(s) on this node. There can be multiple pools per node. This value is a summary of those pools on this node.

Type:: int

opened

Total number of node connections opened since node creation.

Type:: int

closed

Total number of node connections closed since node creation.

Type:: int

recovered

Total number of recovered connections since node creation. A recovered connection is a connection that timed out on a socket read and then independently drained (read all incoming data) so the connection can be put back into the connection pool. The recovery process is attempted when the timeout_delay policy is greater than zero.

Type:: int

aborted

Total number of aborted connections since node creation. An aborted connection is a connection that timed out on a socket read and whose drain (read all incoming data) failed. The drain failure is most likely due to a downed node and results in the connection being closed. The recovery process is attempted when the timeout_delay policy is greater than zero.

Type:: int

__weakref__: list of weak references to the object

class aerospike_helpers.metrics.MetricsListeners(enable_listener: Callable[[], None], snapshot_listener: Callable[[Cluster], None], node_close_listener: Callable[[Node], None], disable_listener: Callable[[Cluster], None])

Bases: object

Metrics listener callbacks.

All callbacks must be set.

enable_listener

Periodic extended metrics have been enabled for the given cluster.

Type:: Callable[[], None]

snapshot_listener

A metrics snapshot has been requested for the given cluster.

Type:: Callable[[Cluster], None]

node_close_listener

A node is being dropped from the cluster.

Type:: Callable[[Node], None]

disable_listener

Periodic extended metrics have been disabled for the given cluster.

Type:: Callable[[Cluster], None]

__init__(enable_listener: Callable[[], None], snapshot_listener: Callable[[Cluster], None], node_close_listener: Callable[[Node], None], disable_listener: Callable[[Cluster], None])

__weakref__: list of weak references to the object

class aerospike_helpers.metrics.MetricsPolicy(metrics_listeners: MetricsListeners | None = None, report_dir: str = '.', report_size_limit: int = 0, interval: int = 30, latency_columns: int = 7, latency_shift: int = 1, labels: dict[str, str] = {})

Bases: object

Client periodic metrics configuration.

metrics_listeners

Listeners that handle metrics notification events. If set to None, the default listener implementation is used; it writes each metrics snapshot to a file that a separate offline application can later read and forward to OpenTelemetry. Otherwise, all listeners set in the class instance are used.

The listener could be overridden to send the metrics snapshot directly to OpenTelemetry.

Type:: Optional[MetricsListeners]

report_dir

Directory path to write metrics log files for listeners that write logs.

Type:: str

report_size_limit

Metrics file size soft limit in bytes for listeners that write logs. When report_size_limit is reached or exceeded, the current metrics file is closed and a new metrics file is created with a new timestamp. If report_size_limit is zero, the metrics file size is unbounded and the file will only be closed when disable_metrics() or close() is called.

Type:: int

interval

Number of cluster tend iterations between metrics notification events. One tend iteration is defined as "tend_interval" in the client config plus the time to tend all nodes.

Type:: int

latency_columns

Number of elapsed time range buckets in latency histograms.

Type:: int

latency_shift

Power of 2 multiple between each range bucket in latency histograms starting at column 3. The bucket units are in milliseconds. The first 2 buckets are “<=1ms” and “>1ms”.

Example:

# latency_columns=7 latency_shift=1
# <=1ms >1ms >2ms >4ms >8ms >16ms >32ms

# latency_columns=5 latency_shift=3
# <=1ms >1ms >8ms >64ms >512ms

Type:: int

labels

Name/value labels that are applied when exporting metrics.

Type:: dict[str, str]

__init__(metrics_listeners: MetricsListeners | None = None, report_dir: str = '.', report_size_limit: int = 0, interval: int = 30, latency_columns: int = 7, latency_shift: int = 1, labels: dict[str, str] = {})

__weakref__: list of weak references to the object

class aerospike_helpers.metrics.NamespaceMetrics

Bases: object

Namespace metrics.

Each command group has its own histogram (i.e., a list of latency buckets). Latency histogram counts are cumulative and not reset on each metrics snapshot interval.

ns

Name of the namespace.

Type:: str

bytes_in

Bytes received from the server.

Type:: int

bytes_out

Bytes sent to the server.

Type:: int

error_count

Command error count since node was initialized. If the error is retryable, multiple errors per command may occur.

Type:: int

timeout_count

Command timeout count since node was initialized. If the timeout is retryable (i.e., socket_timeout), multiple timeouts per command may occur.

Type:: int

key_busy_count

Command key busy error count since node was initialized.

Type:: int

conn_latency

Type:: list[int]

write_latency

Type:: list[int]

read_latency

Type:: list[int]

batch_latency

Type:: list[int]

query_latency

Type:: list[int]

__weakref__: list of weak references to the object

class aerospike_helpers.metrics.Node

Bases: object

Server node representation.

name

The name of the node.

Type:: str

address

The IP address / host name of the node (not including the port number).

Type:: str

port

Port number of the node’s address.

Type:: int

conns

Synchronous connection stats on this node.

Type:: ConnectionStats

metrics

Node/namespace metrics.

Type:: list[NamespaceMetrics]

__weakref__: list of weak references to the object

class aerospike_helpers.metrics.NodeStats

Bases: object

Node statistics.

name

The name of the node.

Type:: str

address

The IP address / host name of the node (not including the port number).

Type:: str

port

Port number of the node’s address.

Type:: int

conns

Synchronous connection stats on this node.

Type:: aerospike_helpers.metrics.ConnectionStats

error_count

Command error count since node was initialized. If the error is retryable, multiple errors per command may occur.

Type:: int

timeout_count

Command timeout count since node was initialized. If the timeout is retryable (i.e., socket_timeout), multiple timeouts per command may occur.

Type:: int

key_busy_count

Command key busy error count since node was initialized.

Type:: int

__weakref__: list of weak references to the object