Monitoring the Cluster
Monitoring the Kubernetes cluster
Dgraph exposes Prometheus metrics to monitor the state of the various components in the cluster, including Dgraph Alpha and Zero nodes. You can set up Prometheus monitoring for your cluster.
You can use Helm to install the kube-prometheus-stack chart. This Helm chart is a collection of Kubernetes manifests, Grafana dashboards, and Prometheus rules, combined with scripts to provide monitoring with Prometheus using the Prometheus Operator. The chart also installs Grafana, node_exporter, and kube-state-metrics.
Before you begin
- Install the Kubernetes command-line tool, kubectl.
- Ensure that you have a production-ready Kubernetes cluster with at least three worker nodes running in a cloud provider of your choice.
- Install Helm.
Install using the Helm chart
- Create a YAML file named `dgraph-prometheus-operator.yaml` and edit the values as appropriate for adding endpoints, adding alert rules, adjusting the Alertmanager configuration, adding Grafana dashboards, and so on. For more information, see Dgraph Helm chart values.
- Create a YAML file named `secrets.yaml` that contains the credentials for Grafana.
- Add the `prometheus-operator` Helm chart:
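The commands for this step were not preserved here; assuming the chart comes from the prometheus-community repository (the standard home of kube-prometheus-stack), they are typically:

```shell
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
```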
- Install kube-prometheus-stack with the release name `<MY-RELEASE-NAME>` in the namespace named `monitoring`:
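The install command was elided here; it might look like the following sketch, reusing the values files created in the earlier steps (passing `secrets.yaml` as a second values file is an assumption about how the Grafana credentials are wired in):

```shell
helm install <MY-RELEASE-NAME> prometheus-community/kube-prometheus-stack \
  --namespace monitoring \
  --values dgraph-prometheus-operator.yaml \
  --values secrets.yaml
```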
- Check the list of services in the `monitoring` namespace using `kubectl get svc -n monitoring`.
- Use `kubectl port-forward svc/dgraph-prometheus-release-prometheus -n monitoring 9090` to access Prometheus at `localhost:9090`.
- Use `kubectl --namespace monitoring port-forward svc/grafana 3000:80` to access Grafana at `localhost:3000`.
- Log in to Grafana using the password that you set in the `secrets.yaml` file.
- In the Dashboards menu of Grafana, select Import.
- In the Dashboards/Import page, copy the contents of the `dgraph-kubernetes-grafana-dashboard.json` file into Import via panel JSON, and then click Load.

You can visualize all Dgraph Alpha and Zero Kubernetes Pods using the regular expression pattern `/dgraph-.*-[0-9]*$/`. You can change this in the dashboard configuration by editing the Pod variable. For example, if you have multiple releases and only want to visualize the current release named `my-release-3`, change the regular expression pattern to `/my-release-3.*dgraph-.*-[0-9]*$/` in the Pod variable of the dashboard configuration. By default, the Prometheus that you installed is configured as the `Datasource` in Grafana.
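To see which Pod names the default pattern selects, you can approximate Grafana's regex matching with `grep -E` (the Pod names below are hypothetical):

```shell
# Hypothetical Pod names; the dashboard's default pattern matches only the
# Dgraph Pods, not other Pods in the namespace such as Grafana.
MATCHES=$(printf '%s\n' dgraph-alpha-0 dgraph-zero-1 grafana-0 \
  | grep -E 'dgraph-.*-[0-9]*$')
echo "$MATCHES"
```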
Kubernetes storage
The Kubernetes configurations in the previous sections are set up to run Dgraph with any storage type (`storage-class: anything`). In common cloud environments such as AWS, Google Cloud, and Azure, the default storage types are slow disks, such as hard disks or low-IOPS SSDs. We highly recommend using faster disks for ideal performance when running Dgraph.
Local storage
The AWS storage-optimized i-class instances provide locally attached NVMe-based SSD storage, which provides consistent, very high IOPS. The Dgraph team uses i3.large instances on AWS to test Dgraph.
You can create a Kubernetes `StorageClass` object to provision a specific type of storage volume, which you can then attach to your Dgraph Pods. You can set up your cluster with local SSDs by using Local Persistent Volumes. This Kubernetes feature was in beta at the time of writing (Kubernetes v1.13.1). You can first set up an EC2 instance with locally attached storage. Once it's formatted and mounted properly, you can create a StorageClass to access it:
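The manifest was elided here; a minimal sketch looks like the following (the class name `local-storage` is an assumption; `kubernetes.io/no-provisioner` is the standard provisioner for local volumes):

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: local-storage          # hypothetical name
provisioner: kubernetes.io/no-provisioner
volumeBindingMode: WaitForFirstConsumer
```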
Currently, Kubernetes doesn't allow automatic provisioning of local storage, so you must create a PersistentVolume with a specific mount path:
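A hedged sketch of such a PersistentVolume, assuming the SSD is mounted at `/mnt/disks/ssd0` and a `local-storage` StorageClass exists (the volume name, size, and node name are placeholders):

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: local-pv-0             # hypothetical name
spec:
  capacity:
    storage: 400Gi             # illustrative size
  accessModes:
  - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: local-storage
  local:
    path: /mnt/disks/ssd0      # where the local SSD is mounted
  nodeAffinity:                # pin the volume to the node that owns the disk
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values:
          - <NODE-NAME>
```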
Then, in the StatefulSet configuration, you can claim this local storage in `.spec.volumeClaimTemplates`:
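For example, the claim template in the Alpha StatefulSet might look like this sketch (the volume name `datadir` and the 400Gi size are illustrative):

```yaml
# Inside the StatefulSet spec:
volumeClaimTemplates:
- metadata:
    name: datadir              # mounted by the Alpha container
  spec:
    accessModes:
    - ReadWriteOnce
    storageClassName: local-storage
    resources:
      requests:
        storage: 400Gi
```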
You can repeat these steps for each instance that’s configured with local node storage.
Non-local persistent disks
EBS volumes on AWS and persistent disks (PDs) on Google Cloud are network-attached persistent disks that can be used with Dgraph. Their performance is much lower than locally attached storage, but it can be sufficient for some workloads, such as testing environments.
When using EBS volumes on AWS, we recommend using Provisioned IOPS SSD EBS volumes (the `io1` disk type), which provide consistent IOPS. The available IOPS for AWS EBS volumes is based on the total disk size. With Kubernetes, you can request that `io1` disks be provisioned with 50 IOPS/GB using the `iopsPerGB` parameter:
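The StorageClass was elided here; a sketch using the in-tree AWS EBS provisioner looks like this (the class name is an assumption):

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: io1-ssd                # hypothetical name
provisioner: kubernetes.io/aws-ebs
parameters:
  type: io1
  iopsPerGB: "50"              # 50 IOPS provisioned per GB of disk size
  fsType: ext4
```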
Example: requesting a disk size of 250Gi with this storage class would provide 12.5K IOPS.
Removing a Dgraph pod
In the event that you need to completely remove a Pod (for example, the disk became corrupted and the data can't be recovered), you can use the `/removeNode` API to remove the node from the cluster. With a Kubernetes StatefulSet, you need to remove the node in this order:
- On the Zero leader, call `/removeNode` to remove the Dgraph instance from the cluster (see More about Dgraph Zero). The removed instance immediately stops running, and any further attempt by that instance to join the cluster fails, because it has been removed.
- Remove the PersistentVolumeClaim associated with the Pod to delete its data. This prepares the Pod to rejoin with a clean state.
- Restart the Pod. This creates a new PersistentVolumeClaim, which in turn creates new data directories.
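The steps above can be sketched as follows. The Zero leader address, group, and Raft ID are hypothetical; verify the real values via Zero's `/state` endpoint before removing anything:

```shell
# Hypothetical values for the Alpha instance being removed.
ZERO=localhost:6080   # HTTP address of the Zero leader
GROUP=1               # group the Alpha belongs to
ID=3                  # Raft ID of the Alpha
URL="http://${ZERO}/removeNode?group=${GROUP}&id=${ID}"
echo "$URL"
# Against a live cluster you would then run:
#   curl -s "$URL"                       # remove the node via the Zero leader
#   kubectl delete pvc <pvc-of-the-pod>  # delete its data
#   kubectl delete pod <the-pod>         # restart with a clean state
```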
When an Alpha Pod restarts in a replicated cluster, it joins as a new member of the cluster, is assigned a group and an unused index by Zero, and receives the latest snapshot from the Alpha leader of its group.
When a Zero Pod restarts, it must join the existing group with an unused index ID. You set the index ID with the `idx` option of the `--raft` superflag. This might require you to update the StatefulSet configuration.
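For example, if index IDs 1 through 3 are already in use by the existing Zero group, the replacement Zero might be started as follows (the index value 4 and the address are hypothetical):

```shell
dgraph zero --raft "idx=4" --my=<POD-ADDRESS>:5080
```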
Kubernetes and Bulk Loader
You may want to initialize a new cluster with an existing data set, such as data from the Dgraph Bulk Loader. You can use Init Containers to copy the data to the Pod volume before the Alpha process runs.
See the `initContainers` configuration in dgraph-ha.yaml to learn more.
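The pattern is roughly the following sketch: an init container shares the data volume with the Alpha container and blocks until you copy the bulk-loaded `p` directory in (for example, with `kubectl cp`) and create a marker file. The names, image, and marker path here are illustrative, not taken from dgraph-ha.yaml:

```yaml
initContainers:
- name: init-alpha
  image: dgraph/dgraph:latest
  command:
  - bash
  - -c
  - |
    # Wait until the operator copies the bulk-loaded "p" directory into
    # /dgraph and signals completion by creating the marker file.
    echo "Copy the p directory in, then create /dgraph/doneinit."
    until [ -f /dgraph/doneinit ]; do sleep 2; done
  volumeMounts:
  - name: datadir        # same volume the Alpha container mounts
    mountPath: /dgraph
```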