Prometheus Metrics for On-prem Users
BuildBuddy exposes Prometheus metrics that allow monitoring the four golden signals: latency, traffic, errors, and saturation.
Prometheus metrics are exposed under the path /metrics on port 9090 by default.
To view these metrics in a live-updating dashboard, we recommend using a tool like Grafana.
Invocation build event metrics
All invocation metrics are recorded at the end of each invocation.
buildbuddy_invocation_count
(Counter)
The total number of invocations whose logs were uploaded to BuildBuddy.
Labels
- invocation_status: Invocation status: "success", "failure", "disconnected", or "unknown".
- bazel_exit_code: Exit code of a completed bazel command.
- bazel_command: Command provided to the Bazel daemon: "run", "test", "build", "coverage", "mobile-install", ...
Examples
# Number of invocations per second by invocation status
sum by (invocation_status) (rate(buildbuddy_invocation_count[5m]))
# Invocation success rate
sum(rate(buildbuddy_invocation_count{invocation_status="success"}[5m]))
/
sum(rate(buildbuddy_invocation_count[5m]))
buildbuddy_invocation_duration_usec
(Histogram)
The total duration of each invocation, in microseconds.
Labels
- invocation_status: Invocation status: "success", "failure", "disconnected", or "unknown".
- bazel_command: Command provided to the Bazel daemon: "run", "test", "build", "coverage", "mobile-install", ...
Examples
# Median invocation duration in the past 5 minutes
histogram_quantile(
0.5,
sum(rate(buildbuddy_invocation_duration_usec_bucket[5m])) by (le)
)
buildbuddy_invocation_open_streams
(Gauge)
Number of build event streams currently being handled by the server.
buildbuddy_invocation_build_event_count
(Counter)
Number of build events uploaded to BuildBuddy.
Labels
- status: Status code as defined by grpc/codes. This is a numeric value; any non-zero code indicates an error.
Examples
# Build events uploaded per second
sum(rate(buildbuddy_invocation_build_event_count[5m]))
# Approximate error rate of build event upload handler
sum(rate(buildbuddy_invocation_build_event_count{status!="0"}[5m]))
/
sum(rate(buildbuddy_invocation_build_event_count[5m]))
buildbuddy_invocation_stats_recorder_workers
(Gauge)
Number of invocation stats recorder workers currently running.
buildbuddy_invocation_stats_recorder_duration_usec
(Histogram)
How long it took to finalize an invocation's stats, in microseconds.
This includes the time required to wait for all BuildBuddy apps to flush their local metrics to Redis (if applicable) and then record the metrics to the DB.
buildbuddy_invocation_webhook_invocation_lookup_workers
(Gauge)
Number of webhook invocation lookup workers currently running.
buildbuddy_invocation_webhook_invocation_lookup_duration_usec
(Histogram)
How long it took to look up an invocation before posting to the webhook, in microseconds.
buildbuddy_invocation_webhook_notify_workers
(Gauge)
Number of webhook notify workers currently running.
buildbuddy_invocation_webhook_notify_duration_usec
(Histogram)
How long it took to post an invocation proto to the webhook, in microseconds.
Remote cache metrics
NOTE: Cache metrics are recorded at the end of each invocation, which means that these metrics provide approximate real-time signals.
buildbuddy_remote_cache_events
(Counter)
Number of cache events handled.
Labels
- cache_type: Cache type: "action" for action cache, "cas" for content-addressable storage.
- cache_event_type: Cache event type: "hit", "miss", or "upload".
- group_id: Group (organization) ID associated with the request.
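The cache_type and cache_event_type labels can be combined into a hit-rate query. The following is a sketch that relies only on the label values listed above; the 5m window is an arbitrary choice, and the result is approximate because cache metrics are recorded at the end of each invocation.

```promql
# Approximate action cache hit rate over the past 5 minutes
# (hits divided by hits plus misses; uploads are excluded)
sum(rate(buildbuddy_remote_cache_events{cache_type="action", cache_event_type="hit"}[5m]))
/
sum(rate(buildbuddy_remote_cache_events{cache_type="action", cache_event_type=~"hit|miss"}[5m]))
```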
buildbuddy_remote_cache_download_size_bytes
(Histogram)
Number of bytes downloaded from the remote cache in each download.
Use the _sum suffix to get the total downloaded bytes and the _count suffix to get the number of downloaded files.
Labels
- cache_type: Cache type: "action" for action cache, "cas" for content-addressable storage.
- server_name: Describes the name of the server that handles a client request, such as "byte_stream_server" or "cas_server".
Examples
# Cache download rate (bytes per second)
sum(rate(buildbuddy_remote_cache_download_size_bytes_sum[5m]))
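Dividing the _sum series by the _count series gives an average size per download. This is a sketch using the standard Prometheus histogram suffixes; the 5m window is an assumption.

```promql
# Average size of each download (bytes) over the past 5 minutes
sum(rate(buildbuddy_remote_cache_download_size_bytes_sum[5m]))
/
sum(rate(buildbuddy_remote_cache_download_size_bytes_count[5m]))
```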
buildbuddy_remote_cache_download_duration_usec
(Histogram)
Download duration for each file downloaded from the remote cache, in microseconds.
Labels
- cache_type: Cache type: "action" for action cache, "cas" for content-addressable storage.
Examples
# Median download duration for content-addressable store (CAS)
histogram_quantile(
0.5,
sum(rate(buildbuddy_remote_cache_download_duration_usec_bucket{cache_type="cas"}[5m])) by (le)
)
buildbuddy_remote_cache_upload_size_bytes
(Histogram)
Number of bytes uploaded to the remote cache in each upload.
Use the _sum suffix to get the total uploaded bytes and the _count suffix to get the number of uploaded files.
Labels
- cache_type: Cache type: "action" for action cache, "cas" for content-addressable storage.
- server_name: Describes the name of the server that handles a client request, such as "byte_stream_server" or "cas_server".
Examples
# Cache upload rate (bytes per second)
sum(rate(buildbuddy_remote_cache_upload_size_bytes_sum[5m]))
buildbuddy_remote_cache_upload_duration_usec
(Histogram)
Upload duration for each file uploaded to the remote cache, in microseconds.
Labels
- cache_type: Cache type: "action" for action cache, "cas" for content-addressable storage.
Examples
# Median upload duration for content-addressable store (CAS)
histogram_quantile(
0.5,
sum(rate(buildbuddy_remote_cache_upload_duration_usec_bucket{cache_type="cas"}[5m])) by (le)
)
buildbuddy_remote_cache_disk_cache_last_eviction_age_usec
(Gauge)
The age of the item most recently evicted from the cache, in microseconds.
Labels
- partition_id: The ID of the disk cache partition this event applied to.
- cache_name: Cache name: Custom name to describe the cache, like "pebble-cache".
buildbuddy_remote_cache_disk_cache_eviction_age_msec
(Histogram)
Age of items evicted from the cache, in milliseconds.
Labels
- partition_id: The ID of the disk cache partition this event applied to.
- cache_name: Cache name: Custom name to describe the cache, like "pebble-cache".
buildbuddy_remote_cache_disk_cache_num_evictions
(Counter)
Number of items evicted.
Labels
- partition_id: The ID of the disk cache partition this event applied to.
- cache_name: Cache name: Custom name to describe the cache, like "pebble-cache".
buildbuddy_remote_cache_disk_cache_partition_size_bytes_evicted
(Counter)
Number of bytes in the partition evicted.
Labels
- partition_id: The ID of the disk cache partition this event applied to.
- cache_name: Cache name: Custom name to describe the cache, like "pebble-cache".
buildbuddy_remote_cache_disk_cache_partition_size_bytes
(Gauge)
Number of bytes in the partition.
Labels
- partition_id: The ID of the disk cache partition this event applied to.
- cache_name: Cache name: Custom name to describe the cache, like "pebble-cache".
buildbuddy_remote_cache_disk_cache_partition_group_size_bytes
(Gauge)
Number of bytes in the partition, by group ID.
Labels
- partition_id: The ID of the disk cache partition this event applied to.
- cache_name: Cache name: Custom name to describe the cache, like "pebble-cache".
- group_id: Group (organization) ID associated with the request.
buildbuddy_remote_cache_disk_cache_partition_capacity_bytes
(Gauge)
Capacity of the partition, in bytes.
Labels
- partition_id: The ID of the disk cache partition this event applied to.
- cache_name: Cache name: Custom name to describe the cache, like "pebble-cache".
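The partition size and capacity gauges can be combined to show how full each partition is. This sketch assumes both gauges are exported with identical label sets (partition_id and cache_name, plus any target labels), so that one-to-one vector matching applies.

```promql
# Fraction of each disk cache partition currently in use (0-1)
buildbuddy_remote_cache_disk_cache_partition_size_bytes
/
buildbuddy_remote_cache_disk_cache_partition_capacity_bytes
```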
buildbuddy_remote_cache_disk_cache_partition_num_items
(Gauge)
Number of items in the partition.
Labels
- partition_id: The ID of the disk cache partition this event applied to.
- cache_name: Cache name: Custom name to describe the cache, like "pebble-cache".
- cache_type: Cache type: "action" for action cache, "cas" for content-addressable storage.
buildbuddy_remote_cache_disk_cache_duplicate_writes
(Counter)
Number of writes for digests that already exist.
Labels
- cache_name: Cache name: Custom name to describe the cache, like "pebble-cache".
buildbuddy_remote_cache_disk_cache_added_file_size_bytes
(Histogram)
Size of artifacts added to the file cache, in bytes.
Labels
- cache_name: Cache name: Custom name to describe the cache, like "pebble-cache".
buildbuddy_remote_cache_disk_cache_filesystem_total_bytes
(Gauge)
Total size of the underlying filesystem.
Labels
- cache_name: Cache name: Custom name to describe the cache, like "pebble-cache".
buildbuddy_remote_cache_disk_cache_filesystem_avail_bytes
(Gauge)
Available bytes in the underlying filesystem.
Labels
- cache_name: Cache name: Custom name to describe the cache, like "pebble-cache".
Examples
# Total number of duplicate writes.
sum(buildbuddy_remote_cache_disk_cache_duplicate_writes)
buildbuddy_remote_cache_disk_cache_duplicate_writes_bytes
(Counter)
Number of bytes written that already existed in the cache.
Labels
- cache_name: Cache name: Custom name to describe the cache, like "pebble-cache".
buildbuddy_remote_cache_distributed_cache_peer_lookups
(Histogram)
Number of peers consulted (including the 'local peer') for a distributed cache read before returning a response.
For batch requests, one observation is recorded for each digest in the request.
Labels
- op: Distributed cache operation name, such as "FindMissing" or "Get".
- cache_status: Cache lookup result. One of "hit", "miss", "partial" (for batched RPCs where part of a request was cached), or "uncacheable" (for e.g. encrypted resources).
buildbuddy_remote_cache_migration_not_found_error_count
(Counter)
Number of not found errors from the destination cache during a cache migration.
Labels
- type: Describes the type of cache request
- group_id: Group (organization) ID associated with the request.
buildbuddy_remote_cache_migration_double_read_hit_count
(Counter)
Number of double reads where the source and destination caches hold the same digests during a cache migration.
Labels
- type: Describes the type of cache request
- group_id: Group (organization) ID associated with the request.
buildbuddy_remote_cache_migration_copy_chan_size
(Gauge)
Number of digests queued to be copied during a cache migration.
buildbuddy_remote_cache_migration_bytes_copied
(Counter)
Number of bytes copied from the source to destination cache during a cache migration.
Labels
- cache_type: Cache type: "action" for action cache, "cas" for content-addressable storage.
- group_id: Group (organization) ID associated with the request.
buildbuddy_remote_cache_migration_blobs_copied
(Counter)
Number of blobs copied from the source to destination cache during a cache migration.
Labels
- cache_type: Cache type: "action" for action cache, "cas" for content-addressable storage.
- group_id: Group (organization) ID associated with the request.
buildbuddy_remote_cache_tree_cache_lookup_count
(Counter)
Total number of TreeCache lookups.
Labels
- status: The TreeCache status: hit/miss/invalid_entry.
- level: TreeCache directory depth: 0 for the root dir, 1 for a direct child of the root dir, and so on.
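Grouping by the level label shows how the TreeCache hit rate varies with directory depth. A sketch using the status values listed above; the 5m window is an assumption.

```promql
# TreeCache hit rate by directory depth over the past 5 minutes
sum by (level) (rate(buildbuddy_remote_cache_tree_cache_lookup_count{status="hit"}[5m]))
/
sum by (level) (rate(buildbuddy_remote_cache_tree_cache_lookup_count[5m]))
```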
buildbuddy_remote_cache_tree_cache_split_lookup_count
(Counter)
Total number of TreeCache split lookups.
Labels
- status: The TreeCache split lookup status: hit/miss/failure
buildbuddy_remote_cache_tree_cache_split_write_count
(Counter)
Total number of splits written to TreeCache.
buildbuddy_remote_cache_tree_cache_set_count
(Counter)
Total number of TreeCache sets.
Labels
- status: The TreeCache set status: success/deadline_exceeded/other_error
buildbuddy_remote_cache_tree_cache_bytes_transferred
(Counter)
Number of bytes written to or read from the tree cache.
Labels
- op: TreeCache operation "read" or "write"
buildbuddy_remote_execution_get_tree_directory_lookup_count
(Counter)
Number of directories fetched by GetTree calls, split by where the directory was found.
Labels
- location: Where a directory from a GetTree request was found: one of "uncached", "filecache", "remote"
buildbuddy_remote_execution_get_tree_filecache_trees_read
(Counter)
Number of trees read from the local filecache.
buildbuddy_remote_execution_get_tree_filecache_bytes_read
(Counter)
Total size in bytes of trees read from the local filecache.
buildbuddy_remote_execution_get_tree_filecache_trees_written
(Counter)
Number of trees written to the local filecache.
buildbuddy_remote_execution_get_tree_filecache_bytes_written
(Counter)
Total size in bytes of trees written to the local filecache.
buildbuddy_remote_cache_lookaside_cache_lookup_count
(Counter)
Total number of Lookaside Cache lookups.
Labels
- status: The Lookaside cache status: hit/miss.
buildbuddy_remote_cache_lookaside_cache_eviction_age_msec
(Histogram)
Age of items evicted from the cache, in milliseconds.
Labels
- eviction_reason: The reason an item was evicted from the lookaside cache. One of: "expired" or "size"
Remote execution metrics
buildbuddy_remote_execution_count
(Counter)
Number of actions executed remotely.
This only includes actions which reached the execution phase. If an action fails before execution (for example, if it fails authentication) then this metric is not incremented.
Labels
- exit_code: Process exit code of an executed action.
- status: Status code as defined by grpc/codes in human-readable format, such as "OK" or "NotFound".
- isolation: Effective workload isolation type used for an executed task, such as "docker", "podman", "firecracker", or "none".
Examples
# Total number of actions executed per second
sum(rate(buildbuddy_remote_execution_count[5m]))
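Because the status label is human-readable here, failed executions can be broken down by gRPC status name. A sketch, assuming successful executions are reported with status "OK" as in the label description above:

```promql
# Executions per second that finished with a non-OK gRPC status, by status
sum by (status) (rate(buildbuddy_remote_execution_count{status!="OK"}[5m]))
```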
buildbuddy_remote_execution_tasks_started_count
(Counter)
Number of tasks started remotely, but not necessarily completed.
Includes retry attempts of the same task.
buildbuddy_remote_execution_executed_action_metadata_durations_usec
(Histogram)
Time spent in each stage of action execution, in microseconds.
Queries should filter or group by the stage label, taking care not to aggregate different stages.
Labels
- stage: Executed action stage. Action execution is split into stages corresponding to the timestamps defined in ExecutedActionMetadata: "queued", "input_fetch", "execution", and "output_upload". An additional stage, "worker", includes all stages during which a worker is handling the action, which is all stages except the "queued" stage.
- group_id: Group (organization) ID associated with the request.
Examples
# Median duration of all command stages
histogram_quantile(
0.5,
sum(rate(buildbuddy_remote_execution_executed_action_metadata_durations_usec_bucket[5m])) by (le, stage)
)
# p90 duration of just the command execution stage
histogram_quantile(
0.9,
sum(rate(buildbuddy_remote_execution_executed_action_metadata_durations_usec_bucket{stage="execution"}[5m])) by (le)
)
buildbuddy_remote_execution_task_pressure_stall_duration_fraction
(Histogram)
Linux PSI stall time as a fraction of each action's execution duration (0-1).
Labels
- resource: System resource: "cpu", "memory", or "io".
- stall_type: Pressure stall type: "some" (task is partially stalled on the resource) or "full" (task is completely stalled on the resource).
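As with the other histograms on this page, quantiles are computed from the _bucket series. The sketch below looks at one resource and stall type; the choice of "memory", "full", and the 0.9 quantile is illustrative only.

```promql
# p90 fraction of action execution time spent fully stalled on memory
histogram_quantile(
  0.9,
  sum by (le) (
    rate(buildbuddy_remote_execution_task_pressure_stall_duration_fraction_bucket{resource="memory", stall_type="full"}[5m])
  )
)
```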
buildbuddy_remote_execution_task_size_read_requests
(Counter)
Number of read requests to the task sizer, which estimates action resource usage based on historical execution stats.
Labels
- status: Status of the task size read request: "hit", "miss", or "error".
- isolation: Effective workload isolation type used for an executed task, such as "docker", "podman", "firecracker", or "none".
- os: OS associated with the request.
- arch: CPU architecture associated with the request.
- group_id: Group (organization) ID associated with the request.
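The status label makes it possible to estimate how often the task sizer already has historical stats for an action. A sketch, with the 5m window as an assumption:

```promql
# Fraction of task size read requests that were hits
sum(rate(buildbuddy_remote_execution_task_size_read_requests{status="hit"}[5m]))
/
sum(rate(buildbuddy_remote_execution_task_size_read_requests[5m]))
```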
buildbuddy_remote_execution_task_size_write_requests
(Counter)
Number of write requests to the task sizer, which estimates action resource usage based on historical execution stats.
Labels
- status: Status of the task size write request: "ok", "missing_stats", or "error".
- isolation: Effective workload isolation type used for an executed task, such as "docker", "podman", "firecracker", or "none".
- os: OS associated with the request.
- arch: CPU architecture associated with the request.
- group_id: Group (organization) ID associated with the request.
buildbuddy_remote_execution_task_size_prediction_duration_usec
(Histogram)
Task size prediction model request duration in microseconds.
Labels
- status: Status code as defined by grpc/codes in human-readable format, such as "OK" or "NotFound".
buildbuddy_remote_execution_enqueued_task_milli_cpu
(Histogram)
Milli-CPU prediction of enqueued tasks.
buildbuddy_remote_execution_enqueued_task_memory_bytes
(Histogram)
Memory prediction of enqueued tasks.
buildbuddy_remote_execution_waiting_execution_result
(Gauge)
Number of execution requests for which the client is actively waiting for results.
Labels
- group_id: Group (organization) ID associated with the request.
Examples
# Total number of execution requests with client waiting for result.
sum(buildbuddy_remote_execution_waiting_execution_result)
buildbuddy_remote_execution_requests
(Counter)
Number of execution requests received.
Labels
- group_id: Group (organization) ID associated with the request.
- os: OS associated with the request.
- arch: CPU architecture associated with the request.
buildbuddy_remote_execution_executor_registration_count
(Counter)
Number of executor registrations on the scheduler.
Labels
- version: Binary version. Example: "v2.0.0".
Examples
# Rate of new execution requests by OS/Arch.
sum(rate(buildbuddy_remote_execution_requests[1m])) by (os, arch)
buildbuddy_remote_execution_merged_actions
(Counter)
Number of identical execution requests that have been merged.
Labels
- group_id: Group (organization) ID associated with the request.
buildbuddy_remote_execution_hedged_actions
(Counter)
Number of identical execution requests that were merged and for which a hedged execution was run in the background.
Labels
- group_id: Group (organization) ID associated with the request.
buildbuddy_remote_execution_merged_actions_per_execution
(Histogram)
Distribution of how many actions were submitted and merged against a single, canonical execution over the lifetime of that canonical execution.
Note that this metric is recorded once per merged-action, so distribution values are cumulative, or recorded n-times per canonical execution.
Labels
- group_id: Group (organization) ID associated with the request.
buildbuddy_remote_execution_merged_action_submit_time_offset_usec
(Histogram)
The offset, in microseconds of wall-time, between the time when a merged action was submitted to the execution server and when the original action was submitted to the execution server.
Labels
- group_id: Group (organization) ID associated with the request.
Examples
# Rate of merged actions by group.
sum(rate(buildbuddy_remote_execution_merged_actions[1m])) by (group_id)
buildbuddy_remote_execution_queue_length
(Gauge)
Number of actions currently waiting in the executor queue.
Labels
- group_id: Group (organization) ID associated with the request.
Examples
# Median queue length across all executors
quantile(0.5, buildbuddy_remote_execution_queue_length)
buildbuddy_remote_execution_tasks_executing
(Gauge)
Number of tasks currently being executed by the executor.
Labels
- stage: Executed action stage. Action execution is split into stages corresponding to the timestamps defined in ExecutedActionMetadata: "queued", "input_fetch", "execution", and "output_upload". An additional stage, "worker", includes all stages during which a worker is handling the action, which is all stages except the "queued" stage.
Examples
# Fraction of idle executors
count(buildbuddy_remote_execution_tasks_executing == 0)
/
count(buildbuddy_remote_execution_tasks_executing)
buildbuddy_remote_execution_assigned_ram_bytes
(Gauge)
Estimated RAM on the executor that is currently allocated for task execution, in bytes.
buildbuddy_remote_execution_assigned_and_queued_estimated_ram_bytes
(Gauge)
Estimated RAM on the executor that is currently allocated for queued or executing tasks, in bytes.
Note that this is a fuzzy estimate because there's no guarantee that tasks queued on a machine will be handled by that machine.
buildbuddy_remote_execution_assignable_ram_bytes
(Gauge)
Maximum total RAM that can be allocated for task execution, in bytes.
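Comparing assigned RAM against assignable RAM gives a rough saturation signal for the executor fleet. This is a sketch that sums across all executors; both inputs are estimates, so the ratio is approximate.

```promql
# Fraction of assignable executor RAM currently allocated to running tasks
sum(buildbuddy_remote_execution_assigned_ram_bytes)
/
sum(buildbuddy_remote_execution_assignable_ram_bytes)
```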
buildbuddy_remote_execution_assigned_milli_cpu
(Gauge)
Estimated CPU time on the executor that is currently allocated for task execution, in milliCPU (CPU-milliseconds per second).
buildbuddy_remote_execution_assigned_and_queued_estimated_milli_cpu
(Gauge)
Estimated CPU time on the executor that is currently allocated for queued or executing tasks, in milliCPU (CPU-milliseconds per second).
Note that this is a fuzzy estimate because there's no guarantee that tasks queued on a machine will be handled by that machine.
buildbuddy_remote_execution_assignable_milli_cpu
(Gauge)
Maximum total CPU time on the executor that can be allocated for task execution, in milliCPU (CPU-milliseconds per second).
buildbuddy_remote_execution_cpu_utilization_milli_cpu
(Gauge)
Approximate current CPU utilization of tasks executing, in milli-CPU (CPU-milliseconds per second).
This allows for much higher granularity than using a rate() on the used_milli_cpu metric.
buildbuddy_remote_execution_file_download_count
(Histogram)
Number of files downloaded during remote execution.
buildbuddy_remote_execution_file_download_size_bytes
(Histogram)
Total number of bytes downloaded during remote execution.
buildbuddy_remote_execution_file_download_duration_usec
(Histogram)
Per-file download duration during remote execution, in microseconds.
buildbuddy_remote_execution_file_upload_count
(Histogram)
Number of files uploaded during remote execution.
buildbuddy_remote_execution_file_upload_size_bytes
(Histogram)
Total number of bytes uploaded during remote execution.
buildbuddy_remote_execution_skipped_output_bytes
(Counter)
Total number of output bytes that weren't uploaded after remote execution.
buildbuddy_remote_execution_file_upload_duration_usec
(Histogram)
Per-file upload duration during remote execution, in microseconds.
buildbuddy_remote_execution_networking_command_duration_usec
(Histogram)
Duration of networking commands, in microseconds.
Labels
- command: Command being run. Specific arguments to the command are omitted to reduce metric cardinality.
buildbuddy_remote_execution_vfs_cas_files_count
(Counter)
Total number of CAS files in VFS filesystems.
buildbuddy_remote_execution_vfs_cas_files_accessed_count
(Counter)
Number of CAS files in VFS filesystems that were accessed by the action.
buildbuddy_remote_execution_vfs_cas_files_size_bytes
(Counter)
Total size of CAS files in VFS filesystems.
buildbuddy_remote_execution_vfs_cas_files_accessed_bytes
(Counter)
Size of CAS files in VFS filesystems that were accessed by the action.
buildbuddy_remote_execution_networking_command_cpu_usage_usec
(Histogram)
CPU usage of networking commands, in CPU-microseconds.
Labels
- command: Command being run. Specific arguments to the command are omitted to reduce metric cardinality.
buildbuddy_firecracker_stage_duration_usec
(Histogram)
The total duration of each firecracker stage, in microseconds.
Labels
- stage: Generic label to describe the stage the metric is capturing
Stage label values
- "init": Time for the VM to start up (either a new VM or from a snapshot)
- "exec": Time to run the command inside the container
- "task_lifecycle": Time from when the task is first assigned to the VM (beginning of init) to after it has finished execution. This roughly represents how long a customer waits for the task to complete after it has been scheduled to a firecracker runner.
- "pause": Time to pause the VM, save a snapshot, and cleanup resources
Examples
# P95 workflow lifecycle duration in the past 5 minutes, grouped by group_id
histogram_quantile(
0.95,
sum by(le, group_id) (
rate(buildbuddy_firecracker_stage_duration_usec_bucket{job="executor-workflows", stage="task_lifecycle"}[5m])
)
)
buildbuddy_firecracker_exec_dial_duration_usec
(Histogram)
Time taken to dial the VM guest execution server after it has been started or resumed, in microseconds.
Labels
- created_from_snapshot: CreatedFromSnapshot indicates if a firecracker execution used a snapshot.
buildbuddy_firecracker_snapshot_remote_cache_upload_size_bytes
(Counter)
After a copy-on-write snapshot has been used, the total count of compressed bytes written to the cache (i.e. will be 0 if the artifact is already cached).
Labels
- file_name: Name of a file.
buildbuddy_firecracker_cow_snapshot_dirty_chunk_ratio
(Histogram)
After a copy-on-write snapshot has been used, the ratio of dirty/total chunks.
Labels
- file_name: Name of a file.
Examples
# To view how many elements fall into each bucket
# Visualize with the Bar Gauge type
# Legend: {{le}}
# Format: Heatmap
sum(increase(buildbuddy_firecracker_cow_snapshot_dirty_chunk_ratio_bucket[5m])) by(le)
buildbuddy_firecracker_cow_snapshot_dirty_bytes
(Counter)
After a copy-on-write snapshot has been used, the total count of bytes dirtied.
Labels
- file_name: Name of a file.
buildbuddy_firecracker_cow_snapshot_empty_chunk_ratio
(Histogram)
After a copy-on-write snapshot has been used, the ratio of empty (i.e. all 0s) to total chunks.
Labels
- file_name: Name of a file.
buildbuddy_firecracker_cow_snapshot_chunk_source_ratio
(Histogram)
After a copy-on-write snapshot has been used, the percentage of chunks that were initialized by the given source.
Labels
- file_name: Name of a file.
- chunk_source: For chunked snapshot files, describes the initialization source of the chunk (e.g. "remote_cache" or "local_filecache").
buildbuddy_firecracker_cow_snapshot_skipped_remote_bytes
(Counter)
The number of uncompressed bytes that were not written to the remote cache due to only writing locally.
Labels
- file_name: Name of a file.
buildbuddy_firecracker_cow_snapshot_memory_mapped_bytes
(Gauge)
Total number of bytes currently memory-mapped.
Labels
- file_name: Name of a file.
buildbuddy_firecracker_cow_snapshot_page_fault_total_duration_usec
(Histogram)
For a snapshotted VM, total time spent fulfilling page faults.
Labels
- stage: Generic label to describe the stage the metric is capturing
buildbuddy_firecracker_cow_snapshot_chunk_operation_duration_usec
(Histogram)
For a COW snapshot, cumulative time spent on an operation type.
Labels
- file_name: Name of a file.
- name: The name used to identify the type of an unexpected event.
- stage: Generic label to describe the stage the metric is capturing
buildbuddy_firecracker_cow_bytes_read
(Counter)
Total number of bytes read from COW chunked files.
Labels
- file_name: Name of a file.
buildbuddy_firecracker_cow_bytes_written
(Counter)
Total number of bytes written to COW chunked files.
Labels
- file_name: Name of a file.
buildbuddy_firecracker_workspace_conversion_disk_write_ops
(Counter)
Total number of disk write operations performed converting Firecracker action workspaces to/from ext4 images.
Labels
- command: Command being run. Specific arguments to the command are omitted to reduce metric cardinality.
buildbuddy_remote_execution_max_recyclable_resource_usage_event
(Counter)
Counter for firecracker runners that reach max disk/memory usage and won't get recycled.
Labels
- group_id: Group (organization) ID associated with the request.
- name: The name used to identify the type of an unexpected event.
- recycled_runner_status: For remote execution runners, describes the recycling status (Ex. 'clean' if the runner is not recycled or 'recycled')
buildbuddy_remote_execution_recycle_runner_requests
(Counter)
Number of execution requests with runner recycling enabled (via the platform property recycle-runner=true).
Labels
- status: Status of the recycle runner request: "hit" if the executor assigned a recycled runner to the action; "miss" otherwise.
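The hit and miss statuses can be turned into a recycling hit rate. A sketch, assuming a 5m window:

```promql
# Fraction of recycle-runner requests that were assigned a recycled runner
sum(rate(buildbuddy_remote_execution_recycle_runner_requests{status="hit"}[5m]))
/
sum(rate(buildbuddy_remote_execution_recycle_runner_requests[5m]))
```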
buildbuddy_remote_execution_runner_pool_count
(Gauge)
Number of command runners that are currently pooled (and available for recycling).
buildbuddy_remote_execution_runner_pool_evictions
(Counter)
Number of command runners removed from the pool to make room for other runners.
buildbuddy_remote_execution_runner_pool_failed_recycle_attempts
(Counter)
Number of failed attempts to add runners to the pool.
Labels
- reason: Reason for a runner not being added to the runner pool.
buildbuddy_remote_execution_runner_pool_memory_usage_bytes
(Gauge)
Total memory usage of pooled command runners, in bytes.
Currently only supported for Docker-based executors.
buildbuddy_remote_execution_runner_pool_disk_usage_bytes
(Gauge)
Total disk usage of pooled command runners, in bytes.
buildbuddy_remote_execution_file_cache_requests
(Counter)
Number of local executor file cache requests.
Labels
- status: Status of the file cache request: "hit" if found in cache, "miss" otherwise.
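The same pattern gives the hit rate of the executor's local file cache. A sketch, with the 5m window as an assumption:

```promql
# Local executor file cache hit rate over the past 5 minutes
sum(rate(buildbuddy_remote_execution_file_cache_requests{status="hit"}[5m]))
/
sum(rate(buildbuddy_remote_execution_file_cache_requests[5m]))
```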
buildbuddy_remote_execution_file_cache_link_latency_usec
(Histogram)
Latency of individual file cache link operations.
buildbuddy_remote_execution_file_cache_last_eviction_age_usec
(Gauge)
Age of the last entry evicted from the executor's local file cache (relative to when it was added to the cache), in microseconds.
buildbuddy_remote_execution_file_cache_added_file_size_bytes
(Histogram)
Size of artifacts added to the file cache, in bytes.
buildbuddy_remote_execution_file_cache_added_file_bytes_count
(Counter)
Total number of bytes written to the file cache, by group ID.
Labels
- group_id: Group (organization) ID associated with the request.
Blobstore metrics
"Blobstore" refers to the backing storage that BuildBuddy uses to store objects in the cache, as well as certain pieces of temporary data (such as invocation events while an invocation is in progress).
buildbuddy_blobstore_read_count
(Counter)
Number of files read from the blobstore.
Labels
- status: Status code as defined by grpc/codes. This is a numeric value; any non-zero code indicates an error.
- blobstore_type: "gcs" (Google Cloud Storage), "aws_s3", or "disk".
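Since any non-zero status code indicates an error, a read error rate can be sketched as follows; the 5m window and the per-backend grouping are illustrative choices.

```promql
# Fraction of blobstore reads that returned an error, by blobstore type
sum by (blobstore_type) (rate(buildbuddy_blobstore_read_count{status!="0"}[5m]))
/
sum by (blobstore_type) (rate(buildbuddy_blobstore_read_count[5m]))
```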
buildbuddy_blobstore_read_size_bytes
(Histogram)
Number of bytes read from the blobstore per file.
Labels
- blobstore_type: "gcs" (Google Cloud Storage), "aws_s3", or "disk".
Examples
# Bytes downloaded per second
sum(rate(buildbuddy_blobstore_read_size_bytes_sum[5m]))
### **`buildbuddy_blobstore_read_duration_usec`** (Histogram)
Duration per blobstore file read, in **microseconds**.
#### Labels
- **blobstore_type**: `gcs` (Google Cloud Storage), `aws_s3`, or `disk`.
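Tail latency per storage backend can be estimated from the histogram buckets. A sketch, assuming the standard `_bucket` suffix and an illustrative 0.99 quantile:

```promql
# p99 blobstore read duration (microseconds), by blobstore type
histogram_quantile(
  0.99,
  sum by (le, blobstore_type) (
    rate(buildbuddy_blobstore_read_duration_usec_bucket[5m])
  )
)
```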
### **`buildbuddy_blobstore_write_count`** (Counter)
Number of files written to the blobstore.
#### Labels
- **status**: Status code as defined by [grpc/codes](https://godoc.org/google.golang.org/grpc/codes#Code). This is a numeric value; any non-zero code indicates an error.
- **blobstore_type**: `gcs` (Google Cloud Storage), `aws_s3`, or `disk`.
# Bytes uploaded per second
sum(rate(buildbuddy_blobstore_write_size_bytes_sum[5m]))