Live Import (Live Loader)
You can import data into a running Dgraph instance (which may hold prior data) using the Dgraph CLI command `dgraph live`, referred to as Live Loader. Live Loader sends mutations to a Dgraph cluster and has options to handle the assignment of unique IDs and to update existing data.
Live Loader accepts RDF N-Quad/Triple data or JSON, in plain or gzipped format. Refer to data migration to see how to convert other data formats.
Before you begin
Verify that you have a local folder `<local-path-to-data>` containing:
- at least one data file, in RDF or JSON, in plain or gzip format, with the data to import
- an optional schema file

These files are typically generated by an export or by a data migration tool.
Batch upserts
You can use Live Loader to update existing data, either to modify existing predicates or to add new predicates to existing nodes. To do so, use the `-U, --upsertPredicate` flag or the `-x, --xidmap` flag.
upsertPredicate flag
Use the `-U, --upsertPredicate` flag to specify the predicate name in your data that serves as a unique identifier.
For example:
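A minimal sketch, assuming the unique-identifier predicate is named `xid` (as in the example below); the file paths are placeholders:

```sh
dgraph live --files <local-path-to-data>/data.rdf.gz \
  --schema <local-path-to-data>/data.schema \
  --upsertPredicate xid
```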
The upsert predicate used must be present in the Dgraph instance or in the schema file, and must be indexed.
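For instance, assuming the upsert predicate is `xid`, the schema file could declare it as an indexed string:

```
xid: string @index(exact) .
```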
For each node, Live Loader uses the node name provided in the data file as the upsert predicate value. For example, if your data file contains the following triple:
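```rdf
<my.org/customer/1> <firstName> "John" .
```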
The previous command creates or updates the node with predicate `xid` equal to `my.org/customer/1` and sets the predicate `firstName` to the value `John`.
xidmap flag
Live Loader uses the `-x, --xidmap` directory to look up the `uid` value for each node name used in the data file, or to store the mapping between the node names and the generated `uid` for every new node.
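A minimal sketch, assuming the mapping is stored in a local `./xidmap` directory; the file path is a placeholder:

```sh
dgraph live --files <local-path-to-data>/data.rdf.gz \
  --xidmap ./xidmap
```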
Import data on Dgraph self-hosted
Run the Live Loader using the `-a, --alpha` flag as follows:
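A minimal sketch, assuming a single Alpha at the default address; the data and schema paths are placeholders:

```sh
dgraph live --files <local-path-to-data> \
  --schema <local-path-to-data>/data.schema \
  --alpha localhost:9080
```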
The `--alpha` default value is `localhost:9080`. You can specify a comma-separated list of alpha addresses in the same cluster to distribute the load.

Load multiple data files by passing a directory to the `-f, --files` option: all files ending in `.rdf`, `.rdf.gz`, `.json`, and `.json.gz` are loaded. Be sure that your schema file has another extension (`.txt` or `.schema`, for example).
Load from S3
To live load from Amazon S3 (Simple Storage Service), you must either have permissions to access the S3 bucket from the system performing the live load (see IAM setup below) or explicitly set the following AWS credentials via environment variables:
| Environment Variable | Description |
|---|---|
| `AWS_ACCESS_KEY_ID` or `AWS_ACCESS_KEY` | AWS access key with permissions to write to the destination bucket. |
| `AWS_SECRET_ACCESS_KEY` or `AWS_SECRET_KEY` | AWS secret key with permissions to write to the destination bucket. |
IAM setup
In AWS, you can accomplish this by doing the following:

1. Create an IAM Role with an IAM Policy that grants access to the S3 bucket.
2. Depending on whether you want to grant access to an EC2 instance or to a pod running on EKS, use one of these options:
   - Instance Profile can pass the IAM Role to an EC2 instance.
   - IAM Roles for Amazon EC2 to attach the IAM Role to a running EC2 instance.
   - IAM roles for service accounts to associate the IAM Role with a Kubernetes Service Account.
Once your setup is ready, you can execute the live load from S3. For example:
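A minimal sketch of both URL forms; the region, bucket name, and path are placeholders:

```sh
# Long form: the bucket's S3 endpoint, with a double slash after s3:
dgraph live --files s3://s3.us-west-2.amazonaws.com/<bucket-name>/<path-to-data>

# Short form: bucket name only, with a triple slash after s3:
dgraph live --files s3:///<bucket-name>/<path-to-data>
```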
The short form of the S3 URL requires the URL to be prefixed with `s3:///` (note the triple slash `///`). The long form for S3 buckets requires a double slash (`s3://`).
Load from MinIO
To live load from MinIO, you must have the following MinIO credentials set via environment variables:
| Environment Variable | Description |
|---|---|
| `MINIO_ACCESS_KEY` | MinIO access key with permissions to write to the destination bucket. |
| `MINIO_SECRET_KEY` | MinIO secret key with permissions to write to the destination bucket. |
Once your setup is ready, you can execute the live load from MinIO:
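A minimal sketch, assuming a MinIO server at `127.0.0.1:9000`; the bucket name and path are placeholders:

```sh
dgraph live --files minio://127.0.0.1:9000/<bucket-name>/<path-to-data>
```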
Enterprise features
Multi-tenancy
Since multi-tenancy requires ACL, you must provide the login credentials using the `--creds` flag when using the Live Loader.
By default, Live Loader loads the data into the user’s namespace. Guardians of the Galaxy can load the data into multiple namespaces. Using `--force-namespace`, a Guardian can load the data into the namespace specified in the data and schema files.

The Live Loader requires that the namespaces used in the data and schema files exist before loading the data.
For example, to preserve the namespaces while loading data, first create the namespace(s) and then run the Live Loader command:
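A minimal sketch, assuming a Guardian of the Galaxy logging in to the root namespace (`0`); the file paths and credentials are placeholders, and a negative `--force-namespace` value is assumed to preserve the namespaces found in the files:

```sh
dgraph live --files <local-path-to-data>/data.rdf.gz \
  --schema <local-path-to-data>/data.schema \
  --creds "user=groot;password=password;namespace=0" \
  --force-namespace -1
```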
A Guardian of the Galaxy can also load data into a specific namespace. For example, to force the data loading into namespace `123`:
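A minimal sketch along the same lines, with the same placeholders as above:

```sh
dgraph live --files <local-path-to-data>/data.rdf.gz \
  --schema <local-path-to-data>/data.schema \
  --creds "user=groot;password=password;namespace=0" \
  --force-namespace 123
```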
Encrypted imports
The Live Loader has an `--encryption key-file=value` option, which is required to decrypt encrypted export data and schema files. Once the export files are decrypted, the Live Loader streams the data to a live Alpha instance. Alternatively, starting with v20.07.0, the `vault_*` options can be used to decrypt the encrypted export and schema files.
If the live Alpha instance has encryption turned on, the `p` directory is encrypted. Otherwise, the `p` directory is unencrypted.
For example, to load an encrypted RDF/JSON file and schema via Live Loader:
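A minimal sketch; the data file, schema file, and encryption key file paths are placeholders:

```sh
dgraph live --files <local-path-to-data>/data.rdf.gz \
  --schema <local-path-to-data>/data.schema \
  --encryption key-file=./enc_key_file
```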
You can import your encrypted data into a new Dgraph Alpha node without encryption enabled.
Other Live Loader options
- `--new_uids` (default: `false`): assign new UIDs instead of using the existing UIDs in data files. This is useful to avoid overriding the data in a DB already in operation.
- `--format`: specify the file format (`rdf` or `json`) instead of deriving it from the filenames. This is useful if you need to define a strict format manually.
- `-b, --batch` (default: `1000`): number of N-Quads to send as part of a mutation.
- `-c, --conc` (default: `10`): number of concurrent requests to make to Dgraph. Don’t confuse this with `-C`.
- `-C, --use_compression` (default: `false`): enable compression for connections to and from the Alpha server.
- The `--vault` superflag’s options specify the Vault server address, role ID, secret ID, and the field that contains the encryption key required to decrypt the encrypted export.
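For instance, a sketch combining several of these options (the file path is a placeholder):

```sh
dgraph live --files <local-path-to-data>/data.rdf.gz \
  --format rdf --batch 500 --conc 20 --use_compression
```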