Live Import (Live Loader)
You can import data into a running Dgraph instance (which may hold prior data) using the Dgraph CLI command `dgraph live`, referred to as Live Loader. Live Loader sends mutations to a Dgraph cluster and has options to handle the assignment of unique IDs and to update existing data.
Live Loader accepts RDF N-Quad/Triple data or JSON, in plain or gzipped format. Refer to data migration to see how to convert other data formats.
Before you begin
Verify that you have a local folder `<local-path-to-data>` containing:
- at least one data file, in RDF or JSON, in plain or gzip format, with the data to import
- an optional schema file

These files are typically generated by an export or by a data migration tool.
Batch upserts
You can use Live Loader to update existing data, either to modify existing predicates or to add new predicates to existing nodes. To do so, use the `-U, --upsertPredicate` flag or the `-x, --xidmap` flag.
upsertPredicate flag
Use the `-U, --upsertPredicate` flag to specify the predicate name in your data that serves as a unique identifier.
For example:
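A minimal sketch, assuming the unique-identifier predicate is named `xid` (as in the example below); the file paths are placeholders:

```sh
dgraph live --files <local-path-to-data>/data.rdf.gz \
  --schema <local-path-to-data>/data.schema \
  --upsertPredicate xid
```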
The upsert predicate used must be present in the Dgraph instance or in the schema file, and must be indexed.
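For instance, assuming the upsert predicate is `xid`, the schema file could declare it as an indexed string:

```
xid: string @index(exact) .
```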
For each node, Live Loader uses the node name provided in the data file as the upsert predicate value. For example, if your data file contains the following triple:
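```rdf
<my.org/customer/1> <firstName> "John" .
```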
The previous command creates or updates the node with predicate `xid` equal to `my.org/customer/1` and sets the predicate `firstName` to the value `John`.
xidmap flag
Live Loader uses the `-x, --xidmap` directory to look up the `uid` value for each node name used in the data file, or to store the mapping between the node names and the generated `uid` for every new node.
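A minimal sketch, assuming the mapping is stored in a local `./xidmap` directory; the file path is a placeholder:

```sh
dgraph live --files <local-path-to-data>/data.rdf.gz \
  --xidmap ./xidmap
```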
Import data on Dgraph self-hosted
Run the Live Loader using the `-a, --alpha` flag as follows:
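A minimal sketch, assuming a single Alpha at the default address; the data and schema paths are placeholders:

```sh
dgraph live --files <local-path-to-data> \
  --schema <local-path-to-data>/data.schema \
  --alpha localhost:9080
```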
The `--alpha` default value is `localhost:9080`. You can specify a comma-separated list of alpha addresses in the same cluster to distribute the load.

Load multiple data files by passing a directory to the `-f, --files` option: all files ending in `.rdf`, `.rdf.gz`, `.json`, and `.json.gz` are loaded. Be sure that your schema file has another extension (`.txt` or `.schema`, for example).
Load from S3
To live load from Amazon S3 (Simple Storage Service), you must either have permissions to access the S3 bucket from the system performing the live load (see IAM setup below) or explicitly set the following AWS credentials via environment variables:
| Environment Variable | Description |
|---|---|
| `AWS_ACCESS_KEY_ID` or `AWS_ACCESS_KEY` | AWS access key with permissions to write to the destination bucket. |
| `AWS_SECRET_ACCESS_KEY` or `AWS_SECRET_KEY` | AWS secret key with permissions to write to the destination bucket. |
IAM setup
In AWS, you can accomplish this by doing the following:

1. Create an IAM Role with an IAM Policy that grants access to the S3 bucket.
2. Depending on whether you want to grant access to an EC2 instance or to a pod running on EKS, use one of these options:
   - Instance Profile can pass the IAM Role to an EC2 instance.
   - IAM Roles for Amazon EC2 to attach the IAM Role to a running EC2 instance.
   - IAM roles for service accounts to associate the IAM Role with a Kubernetes Service Account.
Once your setup is ready, you can execute the live load from S3. For example:
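A minimal sketch of both URL forms; the region, bucket name, and path are placeholders:

```sh
# Long form: the bucket's S3 endpoint, with a double slash after s3:
dgraph live --files s3://s3.us-west-2.amazonaws.com/<bucket-name>/<path-to-data>

# Short form: bucket name only, with a triple slash after s3:
dgraph live --files s3:///<bucket-name>/<path-to-data>
```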
The short form of the S3 URL requires the URL to be prefixed with `s3:///` (note the triple slash `///`). The long form for S3 buckets requires a double slash (`s3://`).
Load from MinIO
To live load from MinIO, you must have the following MinIO credentials set via environment variables:
| Environment Variable | Description |
|---|---|
| `MINIO_ACCESS_KEY` | MinIO access key with permissions to write to the destination bucket. |
| `MINIO_SECRET_KEY` | MinIO secret key with permissions to write to the destination bucket. |
Once your setup is ready, you can execute the live load from MinIO:
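A minimal sketch, assuming a MinIO server at `127.0.0.1:9000`; the bucket name and path are placeholders:

```sh
dgraph live --files minio://127.0.0.1:9000/<bucket-name>/<path-to-data>
```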
Enterprise features
Multi-tenancy
Since multi-tenancy requires ACL, you must provide the login credentials using the `--creds` flag when using the Live Loader.
By default, Live Loader loads the data into the user’s namespace. Guardians of the Galaxy can load the data into multiple namespaces. Using `--force-namespace`, a Guardian can load the data into the namespace specified in the data and schema files.

The Live Loader requires that the namespaces used in the data and schema files exist before loading the data.
For example, to preserve the namespaces while loading data, first create the namespace(s) and then run the Live Loader command:
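A minimal sketch, assuming a Guardian of the Galaxy logging in to the root namespace (`0`); the file paths and credentials are placeholders, and a negative `--force-namespace` value is assumed to preserve the namespaces found in the files:

```sh
dgraph live --files <local-path-to-data>/data.rdf.gz \
  --schema <local-path-to-data>/data.schema \
  --creds "user=groot;password=password;namespace=0" \
  --force-namespace -1
```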
A Guardian of the Galaxy can also load data into a specific namespace. For example, to force the data loading into namespace `123`:
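A minimal sketch along the same lines, with the same placeholders as above:

```sh
dgraph live --files <local-path-to-data>/data.rdf.gz \
  --schema <local-path-to-data>/data.schema \
  --creds "user=groot;password=password;namespace=0" \
  --force-namespace 123
```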
Encrypted imports
The Live Loader has an `--encryption key-file=value` option, which is required to decrypt encrypted export data and schema files. Once the export files are decrypted, the Live Loader streams the data to a live Alpha instance. Alternatively, starting with v20.07.0, the `vault_*` options can be used to decrypt the encrypted export and schema files.
If the live Alpha instance has encryption turned on, the `p` directory is encrypted. Otherwise, the `p` directory is unencrypted.
For example, to load an encrypted RDF/JSON file and schema via Live Loader:
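A minimal sketch; the data file, schema file, and encryption key file paths are placeholders:

```sh
dgraph live --files <local-path-to-data>/data.rdf.gz \
  --schema <local-path-to-data>/data.schema \
  --encryption key-file=./enc_key_file
```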
You can import your encrypted data into a new Dgraph Alpha node without encryption enabled.
Other Live Loader options
- `--new_uids` (default: `false`): assign new UIDs instead of using the existing UIDs in data files. This is useful to avoid overriding the data in a DB already in operation.
- `--format`: specify the file format (`rdf` or `json`) instead of deriving it from the filenames. This is useful if you need to define a strict format manually.
- `-b, --batch` (default: `1000`): number of N-Quads to send as part of a mutation.
- `-c, --conc` (default: `10`): number of concurrent requests to make to Dgraph. Don’t confuse this with `-C`.
- `-C, --use_compression` (default: `false`): enable compression for connections to and from the Alpha server.
- The `--vault` superflag’s options specify the Vault server address, role ID, secret ID, and the field that contains the encryption key required to decrypt the encrypted export.
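For instance, a sketch combining several of these options (the file path is a placeholder):

```sh
dgraph live --files <local-path-to-data>/data.rdf.gz \
  --format rdf --batch 500 --conc 20 --use_compression
```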