We’re overhauling Dgraph’s docs to make them clearer and more approachable. If you notice any issues during this transition or have suggestions, please let us know.

Each Dgraph data node exposes profiling data over the /debug/pprof endpoint and metrics over the /debug/vars endpoint. Each Dgraph data node has its own profiling and metrics information. Below is a list of the debugging information exposed by Dgraph and the corresponding commands to retrieve it.

Metrics Information

If you are collecting these metrics from outside the Dgraph instance, you need to pass the --expose_trace=true flag; otherwise, the metrics can only be collected by connecting to the instance over localhost.

curl http://<IP>:<HTTP_PORT>/debug/vars

Metrics can also be retrieved in the Prometheus format at /debug/prometheus_metrics. See the Metrics section for the full list of metrics.
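
For example, to fetch metrics in the Prometheus format:

curl http://<IP>:<HTTP_PORT>/debug/prometheus_metrics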

Profiling Information

Profiling information is available via go tool pprof, the profiling tool built into Go. The “Profiling Go programs” Go blog post should help you get started with pprof. Each Dgraph Zero and Dgraph Alpha exposes a debug endpoint at /debug/pprof/<profile> via its HTTP port.

go tool pprof http://<IP>:<HTTP_PORT>/debug/pprof/heap
Fetching profile from ...
Saved Profile in ...

The output of the command shows the location where the profile is saved.

In the interactive pprof shell, you can use commands like top to get a listing of the top functions in the profile, web to get a visual graph of the profile opened in a web browser, or list to display a code listing with profiling information overlaid.
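
An illustrative interactive session (the function name is a placeholder; the web command requires Graphviz to be installed):

go tool pprof http://<IP>:<HTTP_PORT>/debug/pprof/heap
(pprof) top 10
(pprof) list <FunctionName>
(pprof) web
(pprof) quit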

CPU profile

go tool pprof http://<IP>:<HTTP_PORT>/debug/pprof/profile

Memory profile

go tool pprof http://<IP>:<HTTP_PORT>/debug/pprof/heap

Block profile

By default, Dgraph doesn’t collect the block profile. To enable it, Dgraph must be started with --profile_mode=block and --block_rate=<N>, with N > 1.
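
For example (other Alpha startup flags omitted; adjust to your deployment):

dgraph alpha --profile_mode=block --block_rate=2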

go tool pprof http://<IP>:<HTTP_PORT>/debug/pprof/block

Goroutine stack

The HTTP page /debug/pprof/ is available at the HTTP port of a Dgraph Zero or Dgraph Alpha. This page links to the “full goroutine stack dump” (for example, on a Dgraph Alpha this page would be at http://localhost:8080/debug/pprof/goroutine?debug=2). Looking at the full goroutine stack can be useful for understanding goroutine usage at that moment.
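
For example, to save the full goroutine stack dump of a local Alpha to a file (the output file name is arbitrary):

curl "http://localhost:8080/debug/pprof/goroutine?debug=2" > goroutine_dump.txt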

Profiling Information with debuginfo

Instead of sending a request to the server for each CPU, memory, and goroutine profile, you can use the debuginfo command to collect all of these profiles, along with several metrics.

You can run the command like this:

dgraph debuginfo -a <alpha_address:port> -z <zero_address:port> -d <path_to_dir_to_store_profiles>

Your output should look like:

I0311 14:13:53.243667   32654 run.go:118] using directory /tmp/dgraph-debuginfo037351492 for debug info dump.
I0311 14:13:53.243864   32654 debugging.go:68] fetching information over HTTP from http://localhost:8080/debug/pprof/heap
I0311 14:13:53.243872   32654 debugging.go:70] please wait... (30s)
I0311 14:13:53.245338   32654 debugging.go:58] saving heap metric in /tmp/dgraph-debuginfo037351492/alpha_heap.gz
I0311 14:13:53.245349   32654 debugging.go:68] fetching information over HTTP from http://localhost:8080/debug/pprof/profile?seconds=30
I0311 14:13:53.245357   32654 debugging.go:70] please wait... (30s)
I0311 14:14:23.250079   32654 debugging.go:58] saving cpu metric in /tmp/dgraph-debuginfo037351492/alpha_cpu.gz
I0311 14:14:23.250148   32654 debugging.go:68] fetching information over HTTP from http://localhost:8080/state
I0311 14:14:23.250173   32654 debugging.go:70] please wait... (30s)
I0311 14:14:23.255467   32654 debugging.go:58] saving state metric in /tmp/dgraph-debuginfo037351492/alpha_state.gz
I0311 14:14:23.255507   32654 debugging.go:68] fetching information over HTTP from http://localhost:8080/health
I0311 14:14:23.255528   32654 debugging.go:70] please wait... (30s)
I0311 14:14:23.257453   32654 debugging.go:58] saving health metric in /tmp/dgraph-debuginfo037351492/alpha_health.gz
I0311 14:14:23.257507   32654 debugging.go:68] fetching information over HTTP from http://localhost:8080/jemalloc
I0311 14:14:23.257548   32654 debugging.go:70] please wait... (30s)
I0311 14:14:23.259009   32654 debugging.go:58] saving jemalloc metric in /tmp/dgraph-debuginfo037351492/alpha_jemalloc.gz
I0311 14:14:23.259055   32654 debugging.go:68] fetching information over HTTP from http://localhost:8080/debug/pprof/trace?seconds=30
I0311 14:14:23.259091   32654 debugging.go:70] please wait... (30s)
I0311 14:14:53.266092   32654 debugging.go:58] saving trace metric in /tmp/dgraph-debuginfo037351492/alpha_trace.gz
I0311 14:14:53.266152   32654 debugging.go:68] fetching information over HTTP from http://localhost:8080/metrics
I0311 14:14:53.266181   32654 debugging.go:70] please wait... (30s)
I0311 14:14:53.276357   32654 debugging.go:58] saving metrics metric in /tmp/dgraph-debuginfo037351492/alpha_metrics.gz
I0311 14:14:53.276414   32654 debugging.go:68] fetching information over HTTP from http://localhost:8080/debug/vars
I0311 14:14:53.276439   32654 debugging.go:70] please wait... (30s)
I0311 14:14:53.278295   32654 debugging.go:58] saving vars metric in /tmp/dgraph-debuginfo037351492/alpha_vars.gz
I0311 14:14:53.278340   32654 debugging.go:68] fetching information over HTTP from http://localhost:8080/debug/pprof/trace?seconds=30
I0311 14:14:53.278366   32654 debugging.go:70] please wait... (30s)
I0311 14:15:23.286770   32654 debugging.go:58] saving trace metric in /tmp/dgraph-debuginfo037351492/alpha_trace.gz
I0311 14:15:23.286830   32654 debugging.go:68] fetching information over HTTP from http://localhost:8080/debug/pprof/goroutine?debug=2
I0311 14:15:23.286886   32654 debugging.go:70] please wait... (30s)
I0311 14:15:23.291120   32654 debugging.go:58] saving goroutine metric in /tmp/dgraph-debuginfo037351492/alpha_goroutine.gz
I0311 14:15:23.291164   32654 debugging.go:68] fetching information over HTTP from http://localhost:8080/debug/pprof/block
I0311 14:15:23.291192   32654 debugging.go:70] please wait... (30s)
I0311 14:15:23.304562   32654 debugging.go:58] saving block metric in /tmp/dgraph-debuginfo037351492/alpha_block.gz
I0311 14:15:23.304664   32654 debugging.go:68] fetching information over HTTP from http://localhost:8080/debug/pprof/mutex
I0311 14:15:23.304706   32654 debugging.go:70] please wait... (30s)
I0311 14:15:23.309171   32654 debugging.go:58] saving mutex metric in /tmp/dgraph-debuginfo037351492/alpha_mutex.gz
I0311 14:15:23.309228   32654 debugging.go:68] fetching information over HTTP from http://localhost:8080/debug/pprof/threadcreate
I0311 14:15:23.309256   32654 debugging.go:70] please wait... (30s)
I0311 14:15:23.313026   32654 debugging.go:58] saving threadcreate metric in /tmp/dgraph-debuginfo037351492/alpha_threadcreate.gz
I0311 14:15:23.385359   32654 run.go:150] Debuginfo archive successful: dgraph-debuginfo037351492.tar.gz

When the command finishes, debuginfo returns the tarball’s file name. If no destination is specified, the file is created in the directory from which you ran the debuginfo command.

The following files contain the metrics collected by the debuginfo command:

dgraph-debuginfo639541060
├── alpha_block.gz
├── alpha_goroutine.gz
├── alpha_health.gz
├── alpha_heap.gz
├── alpha_jemalloc.gz
├── alpha_mutex.gz
├── alpha_profile.gz
├── alpha_state.gz
├── alpha_threadcreate.gz
├── alpha_trace.gz
├── zero_block.gz
├── zero_goroutine.gz
├── zero_health.gz
├── zero_heap.gz
├── zero_jemalloc.gz
├── zero_mutex.gz
├── zero_profile.gz
├── zero_state.gz
├── zero_threadcreate.gz
└── zero_trace.gz

Command parameters

  -a, --alpha string       Address of running dgraph alpha. (default "localhost:8080")
  -x, --archive            Whether to archive the generated report (default true)
  -d, --directory string   Directory to write the debug info into.
  -h, --help               help for debuginfo
  -m, --metrics strings    List of metrics & profiles to dump in the report. (default [heap,cpu,state,health,jemalloc,trace,metrics,vars,trace,goroutine,block,mutex,threadcreate])
  -s, --seconds uint32     Duration for time-based metric collection. (default 30)
  -z, --zero string        Address of running dgraph zero.
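
For example, the following illustrative invocation collects 60-second time-based profiles from a local Alpha and Zero and writes the report to ./dgraph-debug (localhost:6080 is the assumed Zero HTTP address; adjust both addresses to your deployment):

dgraph debuginfo -a localhost:8080 -z localhost:6080 -s 60 -d ./dgraph-debug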

The metrics flag (-m)

By default, debuginfo collects:

  • heap
  • cpu
  • state
  • health
  • jemalloc
  • trace
  • metrics
  • vars
  • trace
  • goroutine
  • block
  • mutex
  • threadcreate

If needed, you can collect only a subset of them. For example, this command collects only the jemalloc and health profiles:

dgraph debuginfo -m jemalloc,health

Profile details

  • cpu: The CPU profile determines where a program spends its time while actively consuming CPU cycles (as opposed to sleeping or waiting for I/O).

  • heap: Heap profile reports memory allocation samples; used to monitor current and historical memory usage, and to check for memory leaks.

  • threadcreate: The thread creation profile reports the sections of the program that lead to the creation of new OS threads.

  • goroutine: Goroutine profile reports the stack traces of all current goroutines.

  • block: Block profile shows where goroutines block waiting on synchronization primitives (including timer channels).

  • mutex: The mutex profile reports lock contentions. Use this profile when you suspect your CPU isn’t fully utilized because of mutex contention.

  • trace: The execution trace captures a wide range of runtime events. The execution tracer is a tool for detecting latency and utilization problems. You can examine how well the CPU is utilized, and whether networking or syscalls cause goroutines to be preempted. The tracer is useful for identifying poorly parallelized execution, understanding some of the core runtime events, and seeing how your goroutines execute.
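
The heap and CPU profiles collected by debuginfo come from the /debug/pprof endpoints, so they can typically be analyzed locally with go tool pprof (file names as in the listing above):

go tool pprof alpha_heap.gz
go tool pprof alpha_cpu.gz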

Using the debug tool

To debug a running Dgraph cluster, first copy the postings (“p”) directory to another location. If the Dgraph cluster isn’t running, you can use the postings directory directly with the debug tool. If the “p” directory has been encrypted, the debug tool needs the --keyfile <path-to-keyfile> option. This file must contain the same key that was used to encrypt the “p” directory.
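
For example, assuming the Alpha’s postings directory is at /var/lib/dgraph/p (adjust the paths to your deployment):

cp -r /var/lib/dgraph/p /tmp/p-copy
dgraph debug --postings /tmp/p-copy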

The dgraph debug tool can be used to inspect Dgraph’s posting list structure. You can use the debug tool to inspect the data, schema, and indices of your Dgraph cluster.

Some scenarios where the debug tool is useful:

  • Verify that mutations committed to Dgraph have been persisted to disk.
  • Verify that indices are created.
  • Inspect the history of a posting list.
  • Parse a Badger key into a meaningful struct.

Example

Debug the p directory.

dgraph debug --postings ./p

Debug the p directory, not opening in read-only mode. This is typically necessary when the database wasn’t closed properly.

dgraph debug --postings ./p --readonly=false

Debug the p directory, only outputting the keys for the predicate 0-name. Note that 0 is the namespace and name is the predicate.

dgraph debug --postings ./p --readonly=false --pred=0-name

Debug the p directory, looking up a particular key:

dgraph debug --postings ./p --lookup 01000000000000000000046e616d65

Debug the p directory, inspecting the history of a particular key:

dgraph debug --postings ./p --lookup 01000000000000000000046e616d65 --history

Debug an encrypted p directory with the key in a local file at the path ./key_file:

dgraph debug --postings ./p --encryption=key-file=./key_file

The key file contains the key used to decrypt/encrypt the database. This key should be kept secret. As best practices:

  • Don’t store the key file on the disk permanently. Back it up in a safe place and delete it after using it with the debug tool.

  • If this isn’t possible, make sure correct privileges are set on the key file. Only the user who owns the Dgraph process should be able to read or write it (chmod 600), as shown below.
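
For example, assuming the key file is at ./key_file:

chmod 600 ./key_file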

Debug tool output

Let’s go over an example with a Dgraph cluster that has the following schema (with a term index and a full-text index) and two separately committed mutations:

$ curl localhost:8080/alter -d '
  name: string @index(term) .
  url: string .
  description: string @index(fulltext) .
'
$ curl -H "Content-Type: application/rdf" "localhost:8080/mutate?commitNow=true" -d '{
  set {
    _:dgraph <name> "Dgraph" .
    _:dgraph <dgraph.type> "Software" .
    _:dgraph <url> "https://github.com/hypermodeinc/dgraph" .
    _:dgraph <description> "Fast, Transactional, Distributed Graph Database." .
  }
}'
$ curl -H "Content-Type: application/rdf" "localhost:8080/mutate?commitNow=true" -d '{
  set {
    _:badger <name> "Badger" .
    _:badger <dgraph.type> "Software" .
    _:badger <url> "https://github.com/hypermodeinc/badger" .
    _:badger <description> "Embeddable, persistent and fast key-value (KV) database written in pure Go." .
  }
}'

After stopping Dgraph, you can run the debug tool to inspect the postings directory:

The debug output can be very large. Typically, you would redirect the debug tool’s output to a file for easier analysis.
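
For example (the output file name is arbitrary):

dgraph debug --postings ./p > debug_output.txt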

dgraph debug --postings ./p
Opening DB: ./p

prefix =
{d} ns: 0x0  attr: url uid: 1  ts: 5 item: [79, b0100] sz: 79 dcnt: 1 key: 000000000000000000000375726c000000000000000001
{d} ns: 0x0  attr: url uid: 2  ts: 8 item: [108, b1000] sz: 108 dcnt: 0 isz: 187 icount: 2 key: 000000000000000000000375726c000000000000000002
{d} ns: 0x0  attr: name uid: 1  ts: 5 item: [51, b0100] sz: 51 dcnt: 1 key: 00000000000000000000046e616d65000000000000000001
{d} ns: 0x0  attr: name uid: 2  ts: 8 item: [80, b1000] sz: 80 dcnt: 0 isz: 131 icount: 2 key: 00000000000000000000046e616d65000000000000000002
{i} ns: 0x0  attr: name term: [1] [badger]  ts: 8 item: [41, b1000] sz: 41 dcnt: 0 isz: 79 icount: 2 key: 00000000000000000000046e616d650201626164676572
{i} ns: 0x0  attr: name term: [1] [dgraph]  ts: 5 item: [38, b0100] sz: 38 dcnt: 1 key: 00000000000000000000046e616d650201646772617068
{d} ns: 0x0  attr: description uid: 1  ts: 5 item: [100, b0100] sz: 100 dcnt: 1 key: 000000000000000000000b6465736372697074696f6e000000000000000001
{d} ns: 0x0  attr: description uid: 2  ts: 8 item: [156, b1000] sz: 156 dcnt: 0 isz: 283 icount: 2 key: 000000000000000000000b6465736372697074696f6e000000000000000002
{i} ns: 0x0  attr: description term: [8] [databas]  ts: 8 item: [49, b1000] sz: 49 dcnt: 0 isz: 141 icount: 3 key: 000000000000000000000b6465736372697074696f6e020864617461626173
{i} ns: 0x0  attr: description term: [8] [distribut]  ts: 5 item: [48, b0100] sz: 48 dcnt: 1 key: 000000000000000000000b6465736372697074696f6e0208646973747269627574
{i} ns: 0x0  attr: description term: [8] [embedd]  ts: 8 item: [48, b1000] sz: 48 dcnt: 0 isz: 93 icount: 2 key: 000000000000000000000b6465736372697074696f6e0208656d62656464
{i} ns: 0x0  attr: description term: [8] [fast]  ts: 8 item: [46, b1000] sz: 46 dcnt: 0 isz: 132 icount: 3 key: 000000000000000000000b6465736372697074696f6e020866617374
{i} ns: 0x0  attr: description term: [8] [go]  ts: 8 item: [44, b1000] sz: 44 dcnt: 0 isz: 85 icount: 2 key: 000000000000000000000b6465736372697074696f6e0208676f
{i} ns: 0x0  attr: description term: [8] [graph]  ts: 5 item: [44, b0100] sz: 44 dcnt: 1 key: 000000000000000000000b6465736372697074696f6e02086772617068
{i} ns: 0x0  attr: description term: [8] [kei]  ts: 8 item: [45, b1000] sz: 45 dcnt: 0 isz: 87 icount: 2 key: 000000000000000000000b6465736372697074696f6e02086b6569
{i} ns: 0x0  attr: description term: [8] [kv]  ts: 8 item: [44, b1000] sz: 44 dcnt: 0 isz: 85 icount: 2 key: 000000000000000000000b6465736372697074696f6e02086b76
{i} ns: 0x0  attr: description term: [8] [persist]  ts: 8 item: [49, b1000] sz: 49 dcnt: 0 isz: 95 icount: 2 key: 000000000000000000000b6465736372697074696f6e020870657273697374
{i} ns: 0x0  attr: description term: [8] [pure]  ts: 8 item: [46, b1000] sz: 46 dcnt: 0 isz: 89 icount: 2 key: 000000000000000000000b6465736372697074696f6e020870757265
{i} ns: 0x0  attr: description term: [8] [transact]  ts: 5 item: [47, b0100] sz: 47 dcnt: 1 key: 000000000000000000000b6465736372697074696f6e02087472616e73616374
{i} ns: 0x0  attr: description term: [8] [valu]  ts: 8 item: [46, b1000] sz: 46 dcnt: 0 isz: 89 icount: 2 key: 000000000000000000000b6465736372697074696f6e020876616c75
{i} ns: 0x0  attr: description term: [8] [written]  ts: 8 item: [49, b1000] sz: 49 dcnt: 0 isz: 95 icount: 2 key: 000000000000000000000b6465736372697074696f6e02087772697474656e
{d} ns: 0x0  attr: dgraph.type uid: 1  ts: 5 item: [60, b0100] sz: 60 dcnt: 1 key: 000000000000000000000b6467726170682e74797065000000000000000001
{d} ns: 0x0  attr: dgraph.type uid: 2  ts: 8 item: [88, b1000] sz: 88 dcnt: 0 isz: 148 icount: 2 key: 000000000000000000000b6467726170682e74797065000000000000000002
{i} ns: 0x0  attr: dgraph.type term: [2] [Software]  ts: 8 item: [50, b1000] sz: 50 dcnt: 0 isz: 144 icount: 3 key: 000000000000000000000b6467726170682e747970650202536f667477617265
{s} ns: 0x0  attr: url ts: 3 item: [23, b0001] sz: 23 dcnt: 0 isz: 23 icount: 1 key: 010000000000000000000375726c
{s} ns: 0x0  attr: name ts: 3 item: [33, b0001] sz: 33 dcnt: 0 isz: 33 icount: 1 key: 01000000000000000000046e616d65
{s} ns: 0x0  attr: description ts: 3 item: [51, b0001] sz: 51 dcnt: 0 isz: 51 icount: 1 key: 010000000000000000000b6465736372697074696f6e
{s} ns: 0x0  attr: dgraph.type ts: 1 item: [50, b0001] sz: 50 dcnt: 0 isz: 50 icount: 1 key: 010000000000000000000b6467726170682e74797065
{s} ns: 0x0  attr: dgraph.drop.op ts: 1 item: [45, b0001] sz: 45 dcnt: 0 isz: 45 icount: 1 key: 010000000000000000000e6467726170682e64726f702e6f70
{s} ns: 0x0  attr: dgraph.graphql.xid ts: 1 item: [64, b0001] sz: 64 dcnt: 0 isz: 64 icount: 1 key: 01000000000000000000126467726170682e6772617068716c2e786964
{s} ns: 0x0  attr: dgraph.graphql.schema ts: 1 item: [59, b0001] sz: 59 dcnt: 0 isz: 59 icount: 1 key: 01000000000000000000156467726170682e6772617068716c2e736368656d61
{s} ns: 0x0  attr: dgraph.graphql.p_query ts: 1 item: [71, b0001] sz: 71 dcnt: 0 isz: 71 icount: 1 key: 01000000000000000000166467726170682e6772617068716c2e705f7175657279
 ns: 0x0  attr: dgraph.graphql ts: 1 item: [98, b0001] sz: 98 dcnt: 0 isz: 98 icount: 1 key: 020000000000000000000e6467726170682e6772617068716c
 ns: 0x0  attr: dgraph.graphql.persisted_query ts: 1 item: [105, b0001] sz: 105 dcnt: 0 isz: 105 icount: 1 key: 020000000000000000001e6467726170682e6772617068716c2e7065727369737465645f7175657279

Found 34 keys

Each line in the debug output contains a prefix indicating the type of the key:

  • {d}: data key
  • {i}: index key
  • {c}: count key
  • {r}: reverse key
  • {s}: schema key

In the preceding debug output, we see data keys, index keys, and schema keys.

Each index key has a corresponding index type. For example, in attr: name term: [1] [dgraph] the [1] shows that this is the term index (0x1). In attr: description term: [8] [fast], the [8] shows that this is the full-text index (0x8). These IDs match the index IDs in tok.go.

Key lookup

Every key can be inspected further by passing it to the --lookup flag.

dgraph debug --postings ./p --lookup 000000000000000000000b6465736372697074696f6e020866617374
Opening DB: ./p

Key: 000000000000000000000b6465736372697074696f6e020866617374 Length: 2 Is multi-part list? false Uid: 1 Op: 0
 Uid: 2 Op: 0

For data keys, a lookup shows its type and value. Below, we see that the key for attr: url uid: 1 is a string value.

dgraph debug --postings ./p --lookup 000000000000000000000375726c000000000000000001
Opening DB: ./p

Key: 000000000000000000000375726c000000000000000001 Length: 1 Is multi-part list? false Uid: 18446744073709551615 Op: 1  Type: STRING.  String Value: "https://github.com/hypermodeinc/dgraph"

For index keys, a lookup shows the UIDs that are part of this index. Below, we see that the fast index for the <description> predicate has UIDs 0x1 and 0x2.

dgraph debug --postings ./p --lookup 000000000000000000000b6465736372697074696f6e020866617374
Opening DB: ./p
Key: 000000000000000000000b6465736372697074696f6e020866617374 Length: 2 Is multi-part list? false Uid: 1 Op: 0
 Uid: 2 Op: 0

Key history

You can also look up the history of values for a key using the --history option.

dgraph debug --postings ./p --lookup 000000000000000000000b6465736372697074696f6e020866617374 --history
Opening DB: ./p

==> key: 000000000000000000000b6465736372697074696f6e020866617374. PK: UID: 0, Attr: 0-description, IsIndex: true, Term: 0
ts: 8 {item}{discard}{complete}
 Num uids = 2. Size = 16
 Uid = 1
 Uid = 2

ts: 7 {item}{delta}
 Uid: 2 Op: 1

ts: 5 {item}{delta}
 Uid: 1 Op: 1

Above, we see that UID 0x1 was committed to this index at ts 5, and UID 0x2 was committed to this index at ts 7.

The debug output also shows UserMeta information:

  • {complete}: Complete posting list
  • {uid}: UID posting list
  • {delta}: Delta posting list
  • {empty}: Empty posting list
  • {item}: Item posting list
  • {deleted}: Delete marker

Parse key

You can parse a key into its constituent components using --parse_key. This doesn’t require a p directory.

dgraph debug --parse_key 000000000000000000000b6467726170682e74797065000000000000000001
{d} Key: UID: 1, Attr: 0-dgraph.type, Data key