
Domino Monitoring

Monitoring Domino involves tracking several key application metrics. These metrics reveal the health of the application and can provide advance warning of any issues or failures of Domino components.

Metrics

Domino recommends tracking these metrics in priority order:

Metric: Latency to /health
Suggested threshold: 1000 ms
Notes: Measures the time taken to receive a response to a request to the Domino API server. Measure it by sending requests to the Domino application at the /health path. A response time above the threshold suggests that the system is unhealthy and that user experience might be affected.

Metric: Dispatcher pod availability from metrics server
Suggested threshold: nucleus-dispatcher pods available = 0 for > 10 minutes
Notes: If the number of available pods in the nucleus-dispatcher deployment is 0 for more than 10 minutes, it indicates a critical issue that Domino will not automatically recover from, and functionality will be degraded.

Metric: Frontend pod availability from metrics server
Suggested threshold: nucleus-frontend pods available < 2 for > 10 minutes
Notes: If the number of available pods in the nucleus-frontend deployment is less than 2 for more than 10 minutes, it indicates a critical issue that Domino will not automatically recover from, and functionality will be degraded.
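As an illustration of the latency metric, the following is a minimal Python sketch of a probe that times one request to the /health path and compares the result with the suggested 1000 ms threshold. The function names, the `fetch` injection parameter, the 5 second timeout, and the example hostname are assumptions made for this sketch; they are not part of Domino.

```python
import time
import urllib.request

# Suggested threshold for latency to /health, taken from the metrics above.
LATENCY_THRESHOLD_MS = 1000.0


def measure_latency_ms(url, fetch=None, timeout=5.0):
    """Return the elapsed time in milliseconds for one GET request to url.

    fetch is a hypothetical hook taking (url, timeout) so the timing logic
    can be exercised without a network. By default it uses urllib, which
    raises an exception on connection errors, timeouts, and HTTP error
    statuses; callers should treat any exception as a failed probe.
    """
    if fetch is None:
        def fetch(target, timeout):
            with urllib.request.urlopen(target, timeout=timeout) as response:
                response.read()  # include the time to receive the body
    start = time.perf_counter()
    fetch(url, timeout)
    return (time.perf_counter() - start) * 1000.0


def latency_exceeds_threshold(latency_ms, threshold_ms=LATENCY_THRESHOLD_MS):
    """Return True when the measured latency is above the threshold."""
    return latency_ms > threshold_ms
```

A probe could then call `measure_latency_ms("https://domino.example.com/health")`, where the hostname is a placeholder for your own deployment, and pass the result to `latency_exceeds_threshold`. In practice a monitoring tool would run such a probe on a schedule and record the values over time rather than act on a single sample.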

There are many application monitoring tools you can use to track these metrics, including:

  • New Relic

  • Splunk

  • Datadog

Alerts

Configure alerts that notify your application administrators when any of the thresholds listed previously is breached. These alerts indicate potential resourcing issues or unusual usage patterns that warrant investigation. To gather additional information, see the Domino application logs, the Domino administration UI, and the Domino Control Center.
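The pod availability thresholds only apply when the condition persists for more than 10 minutes, so an alert rule needs to look at a series of samples rather than a single reading. The sketch below shows one way to express that rule in Python. It assumes the samples have already been collected (for example, by your monitoring tool) as a list of `(timestamp, available_pods)` pairs sorted oldest first; that data shape and all function names are hypothetical.

```python
from datetime import datetime, timedelta

# Duration a condition must persist before alerting, per the thresholds above.
SUSTAINED = timedelta(minutes=10)


def sustained_breach(samples, is_breached, window=SUSTAINED):
    """Return True if is_breached has held continuously for longer than window.

    samples is an assumed list of (timestamp, available_pods) pairs, sorted
    oldest first. The breach must still be in effect at the latest sample.
    """
    breach_start = None
    for timestamp, available in samples:
        if is_breached(available):
            if breach_start is None:
                breach_start = timestamp  # first sample of the current breach
        else:
            breach_start = None  # condition cleared, so reset the clock
    if breach_start is None:
        return False
    return samples[-1][0] - breach_start > window


def dispatcher_alert(samples):
    """nucleus-dispatcher pods available = 0 for > 10 minutes."""
    return sustained_breach(samples, lambda pods: pods == 0)


def frontend_alert(samples):
    """nucleus-frontend pods available < 2 for > 10 minutes."""
    return sustained_breach(samples, lambda pods: pods < 2)
```

Most monitoring tools offer an equivalent "condition holds for N minutes" setting on their alert rules, which is usually preferable to hand-written logic; the sketch is intended only to make the rule explicit.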

Copyright © 2022 Domino Data Lab. All rights reserved.