Schema Migration
This document describes all the things that you need to take care while doing a schema update or migration.
We’re overhauling Dgraph’s docs to make them clearer and more approachable. If you notice any issues during this transition or have suggestions, please let us know.
In every app’s development lifecycle, there’s a point where the underlying schema doesn’t fit the requirements and must be changed for good. That requires a migration for both schema and the underlying data. This article guides you through common migration scenarios you can encounter with Dgraph and help you avoid any pitfalls around them.
These are the most common scenarios that can occur:
- Renaming a type
- Renaming a field
- Changing a field’s type
- Adding
@id
to an existing field
As long as you can avoid migration, avoid it. Because there can be scenarios where you might need to update downstream clients, which can be hard. So, its always best to try out things first, once you are confident enough, then only push them to production.
Renaming a type
Let’s say you had the following schema:
and you had your app working fine with it. Now, you feel that the name AppUser
would be more sensible than the name User
because User
seems a bit generic
to you. Then you are in a situation where you need migration.
This can be handled in a couple of ways:
- Migrate all the data for type
User
to use the new nameAppUser
. OR, - Just use the
@dgraph(type: ...)
directive to maintain backward compatibility with the existing data.
Depending on your use-case, you might find option 1 or 2 better for you. For
example, if you have accumulated very little data for the User
type till now,
then you might want to go with option #1. But, if you have an active app with a
very large dataset then updating the node of each user may not be a thing you
might want to commit to, as that can require some maintenance downtime. So,
option #2 could be a better choice in such conditions.
Option #2 makes your new schema compatible with your existing data. Here’s an example:
So, no downtime required. Migration is done by just updating your schema. Fast, easy, and simple.
Note that, irrespective of what option you choose for migration on Dgraph side,
you still need to migrate your GraphQL clients to use the new name in
queries/mutations. For example, the query getUser
would now be renamed to
getAppUser
. So, your downstream clients need to update that bit in the code.
Renaming a field
Just like renaming a type, let’s say you had the following working schema:
and now you figured that it would be better to call phone
as tel
. You need
migration.
You have the same two choices as before:
- Migrate all the data for the field
phone
to use the new nametel
. OR, - Just use the
@dgraph(pred: ...)
directive to maintain backward compatibility with the existing data.
Here’s an example if you want to go with option #2:
Again, note that, irrespective of what option you choose for migration on Dgraph side, you still need to migrate your GraphQL clients to use the new name in queries/mutations. For example, the following query:
would now have to be changed to:
So, your downstream clients need to update that bit in the code.
Changing a field’s type
There can be multiple scenarios in this category:
- List -> Single item
String
->Int
- Any other combination you can imagine
It is strictly advisable that you figure out a solid schema before going in production, so that you don’t have to deal with such cases later. Nevertheless, if you ended up in such a situation, you have to migrate your data to fit the new schema. There is no easy way around here.
An example scenario is, if you initially had this schema:
and later you decided that you want an owner to have only one to do at a time. So, you want to make your schema look like this:
If you try updating your schema, you may end up getting an error like this:
This is a red flag. As the error message says, you should revert to the old schema to make your clients work correctly. In such cases, you should have migrated your data to fit the new schema before applying the new schema. The steps for such a data migration varies from case to case, and so can’t all be listed down here, but you need to migrate your data first, is all you need to keep in mind while making such changes.
Adding @id
to an existing field
Let’s say you had the following schema:
and now you think that username
must be unique for every user. So, you change
the schema to this:
Now, here’s the catch: with the old schema, it was possible that there could
have existed multiple users with the username Alice
. If that was true, then
the queries would break in such cases. Like, if you run this query after the
schema change:
Then it might error out saying:
So, while making such a schema change, you need to make sure that the underlying data really honors the uniqueness constraint on the username field. If not, you need to do a data migration to honor such constraints.
Unused fields
For example, let’s assume that you have deployed the following schema:
Then you create a TestDataMigration
with someOtherInfo
value.
Then you update the Schema and remove the field.
The data you have previously created is still in the graph database !
Moreover if you delete the TestDataMigration
object using its id
, the
GraphQL API delete operation is successful.
If you followed the GraphQL - DQL Schema mapping, you
understand that Dgraph has used the list the known list of predicates (id
,
someInfo
) and removed them. In fact, Dgraph also removed the
dgraph.type
predicate and so thisTestDataMigration
node isn’t visible anymore
to the GraphQL API.
The point is that a node with this uid
exists and has a predicate
someOtherInfo
. This is because this data has been created initially and
nothing in the process of deploying a new version and then using a delete
operation by ID instructed Dgraph to delete this predicate.
You end up with a node without type (i.e without a dgraph.type
predicate) and
with an old predicate value which is ‘invisible’ to your GraphQL API!
When doing a GraphQL schema deployment, you must take care of the data cleaning and data migration. The good news is that DQL offers you the tools to identify (search) potential issues and to correct the data (mutations).
In the previous case, you can alter the database and completely delete the predicate or you can write an ‘upsert’ DQL query that searches the nodes of interest and delete the unused predicate for those nodes.
New non-nullable field
Another obvious example appears if you deploy a new version containing a new non-nullable field for an existing type. The existing ‘nodes’ of the same type in the graph don’t have this predicate. A GraphQL query reaching those nodes returns a list of errors. You can easily write an ‘upsert’ DQL mutation to find all node of this type not having the new predicate and update them with a default value.
Was this page helpful?