Functions
Functions allow filtering based on properties of nodes or variables. Functions can be applied in the query root or in filters.
Comparison functions (eq, ge, gt, le, lt) in the query root (func:) can only be applied on indexed predicates. Comparison functions can be used on @filter directives even on predicates that haven’t been indexed. Filtering on non-indexed predicates can be slow for large datasets, as it requires iterating over all of the possible values at the level where the filter is being used.
All other functions, in the query root or in the filter, can only be applied to indexed predicates.
For functions on string valued predicates, if no language preference is given, the function is applied to all languages and strings without a language tag. If a language preference is given, the function is applied only to strings of the given language.
Term matching
allofterms
Syntax Example: allofterms(predicate, "space-separated term list")
Schema Types: string
Index Required: term
Matches strings that have all specified terms in any order, case-insensitive.
Usage at root
Query Example: all nodes that have name containing terms indiana and jones, returning the English name and genre in English.
Usage as filter
Query Example: all Steven Spielberg films that contain the words indiana and jones. The @filter(has(director.film)) removes nodes with name Steven Spielberg that aren’t the director; the data also contains a character in a film called Steven Spielberg.
anyofterms
Syntax Example: anyofterms(predicate, "space-separated term list")
Schema Types: string
Index Required: term
Matches strings that have any of the specified terms in any order; case insensitive.
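The difference between allofterms and anyofterms can be sketched in a few lines of Python. This is only an illustration of the matching rule described here, not how Dgraph evaluates it: the helper names are hypothetical, and splitting on whitespace is a simplifying assumption standing in for the term index.

```python
def terms(text):
    # Lowercase and split on whitespace (a simplification of a real term tokenizer).
    return set(text.lower().split())

def allofterms(value, term_list):
    # Every requested term must appear in the value, in any order.
    return terms(term_list) <= terms(value)

def anyofterms(value, term_list):
    # At least one requested term must appear in the value.
    return bool(terms(term_list) & terms(value))

names = ["Indiana Jones and the Temple of Doom", "Jones Beach", "Poison Ivy"]
print([n for n in names if allofterms(n, "jones indiana")])
print([n for n in names if anyofterms(n, "poison peacock")])
```

In this sketch, "Jones Beach" fails allofterms for "jones indiana" because only one of the two terms is present.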
Usage at root
Query Example: All nodes that have a name containing either poison or peacock. Many of the returned nodes are movies, but people like Joan Peacock also meet the search terms because without a cascade directive the query doesn’t require a genre.
Usage as filter
Query Example: All Steven Spielberg movies that contain war or spies. The @filter(has(director.film)) removes nodes with name Steven Spielberg that aren’t the director; the data also contains a character in a film called Steven Spielberg.
Regular expressions
Syntax Examples: regexp(predicate, /regular-expression/) or case insensitive regexp(predicate, /regular-expression/i)
Schema Types: string
Index Required: trigram
Matches strings by regular expression. The regular expression language is that of Go regular expressions.
Query Example: At root, match nodes with Steven Sp at the start of name, followed by any characters. For each such matched UID, match the films containing ryan. Note the difference from allofterms, which would match only the whole term ryan, whereas regular expression search also matches within terms, such as bryan.
Technical details
A trigram is a substring of three consecutive runes. For example, Dgraph has trigrams Dgr, gra, rap, aph.
To keep regular expression matching efficient, Dgraph uses trigram indexing. Dgraph converts the regular expression to a trigram query, uses the trigram index and trigram query to find possible matches, and applies the full regular expression search only to those candidates.
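The following Python sketch illustrates the idea of trigram extraction and of using a trigram index to narrow the candidates before running the full regular expression. It is a toy model under stated assumptions, not Dgraph's implementation: the function names are hypothetical, and the required trigrams are picked by hand rather than derived from the pattern.

```python
import re
from collections import defaultdict

def trigrams(s):
    # Every substring of three consecutive characters (Python str items are code points).
    return [s[i:i + 3] for i in range(len(s) - 2)]

def build_index(values):
    # Map each trigram to the set of values that contain it.
    index = defaultdict(set)
    for v in values:
        for t in trigrams(v):
            index[t].add(v)
    return index

def regexp_search(index, required, pattern):
    # required: trigrams every match must contain (hand-picked in this sketch).
    candidates = set.intersection(*(index.get(t, set()) for t in required))
    # Only the candidates are checked against the full regular expression.
    return sorted(v for v in candidates if re.search(pattern, v))

values = ["saving private ryan", "bryan's song", "rya yan", "jurassic park"]
index = build_index(values)
print(trigrams("Dgraph"))
print(regexp_search(index, ["rya", "yan"], r"ryan"))
```

Here "rya yan" contains both required trigrams and so survives the index lookup, but the full regular expression rejects it. That is the role of the final verification step.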
Writing efficient regular expressions and limitations
Keep the following in mind when designing regular expression queries.
- At least one trigram must be matched by the regular expression (patterns shorter than 3 runes aren’t supported) since Dgraph requires regular expressions that can be converted to a trigram query.
- The number of alternative trigrams matched by the regular expression should be as small as possible ([a-zA-Z][a-zA-Z][0-9] isn’t a good idea). Many possible matches means the full regular expression is checked against many strings, whereas if the expression forces more trigrams to match, Dgraph can make better use of the index and check the full regular expression against a smaller set of possible matches.
- Thus, the regular expression should be as precise as possible. Matching longer strings means more required trigrams, which helps to effectively use the index.
- If repeat specifications (*, +, ?, {n,m}) are used, the entire regular expression must not match the empty string or any string: for example, * may be used like [Aa]bcd* but not like (abcd)* or (abcd)|((defg)*)
- Repeat specifications after bracket expressions (e.g. [fgh]{7}, [0-9]+ or [a-z]{3,5}) are often considered as matching any string because they match too many trigrams.
- If the partial result (for subset of trigrams) exceeds 1000000 UIDs during index scan, the query is stopped to prohibit expensive queries.
Fuzzy matching
Syntax: match(predicate, string, distance)
Schema Types: string
Index Required: trigram
Matches predicate values by calculating the Levenshtein distance to the string, also known as fuzzy matching. The distance parameter must be greater than zero (0). Using a greater distance value can yield more but less accurate results.
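As a rough illustration of the distance involved, here is a textbook dynamic-programming Levenshtein distance in Python. The match helper is a hypothetical stand-in that compares whole strings case-sensitively; both of those choices are assumptions made for the sketch and may differ from how Dgraph applies the function.

```python
def levenshtein(a, b):
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]

def match(values, target, distance):
    # Keep values whose edit distance to target is at most `distance`.
    return [v for v in values if levenshtein(v, target) <= distance]

print(levenshtein("Stephen", "Steven"))
print(match(["Steven", "Stephan", "Sam"], "Stephen", 2))
```

A larger distance admits more values, which is why greater distances return more but less accurate results.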
Query Example: At root, fuzzy match nodes similar to Stephen, with a distance value of less than or equal to 8.
Same query with a Levenshtein distance of 3.
Vector Similarity Search
Syntax Examples: similar_to(predicate, 3, "[0.9, 0.8, 0, 0]")
Alternatively, the vector can be passed as a variable: similar_to(predicate, 3, $vec)
This function finds the nodes that have predicate close to the provided vector. The search is based on the distance metric specified in the index (cosine, euclidean, or dotproduct). A shorter distance indicates greater similarity. The second parameter, 3, specifies that the top 3 matches be returned.
Schema Types: float32vector
Index Required: hnsw
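To make the "top k by distance" idea concrete, the sketch below ranks a handful of made-up vectors by exhaustive scan in Python. It is an assumption-laden illustration: the node IDs and vectors are invented, the function names are hypothetical, and a real hnsw index performs an approximate search rather than scanning every vector.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm

def similar_to(vectors, k, query, metric=euclidean):
    # vectors: dict of node id -> vector. Sort ids by distance and keep the k closest.
    ranked = sorted(vectors, key=lambda uid: metric(vectors[uid], query))
    return ranked[:k]

# Invented data for illustration only.
vectors = {
    "0x1": [0.9, 0.8, 0.0, 0.0],
    "0x2": [0.0, 0.1, 0.9, 0.7],
    "0x3": [0.8, 0.9, 0.1, 0.0],
    "0x4": [0.1, 0.0, 0.0, 0.9],
}
print(similar_to(vectors, 3, [0.9, 0.8, 0, 0]))
print(similar_to(vectors, 3, [0.9, 0.8, 0, 0], metric=cosine_distance))
```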
Full-Text Search
Syntax Examples: alloftext(predicate, "space-separated text") and anyoftext(predicate, "space-separated text")
Schema Types: string
Index Required: fulltext
Apply full-text search with stemming and stop words to find strings matching all or any of the given text.
The following steps are applied during index generation and to process full-text search arguments:
- Tokenization (according to Unicode word boundaries).
- Conversion to lowercase.
- Unicode-normalization (to Normalization Form KC).
- Stemming using language-specific stemmer (if supported by language).
- Stop words removal (if supported by language).
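The steps above can be sketched as a small Python pipeline. This is a simplified model, not the bleve analyzers Dgraph relies on: the regex tokenizer only approximates Unicode word boundaries, and the suffix-stripping stemmer and two-word stop list are toy assumptions chosen to keep the example short.

```python
import re
import unicodedata

STOP_WORDS = {"the", "which"}  # tiny illustrative list, not a real stop word list

def toy_stem(word):
    # Toy stemmer (an assumption for this sketch): strip a trailing "ing" or "s".
    if word.endswith("ing") and len(word) > 5:
        return word[:-3]
    if word.endswith("s") and len(word) > 3:
        return word[:-1]
    return word

def analyze(text):
    tokens = re.findall(r"\w+", text)                            # 1. tokenization (approximate)
    tokens = [t.lower() for t in tokens]                         # 2. lowercase
    tokens = [unicodedata.normalize("NFKC", t) for t in tokens]  # 3. NFKC normalization
    tokens = [toy_stem(t) for t in tokens]                       # 4. stemming
    return [t for t in tokens if t not in STOP_WORDS]            # 5. stop word removal

print(analyze("The dog which barks"))
print(analyze("dogs barking"))
```

Because the same pipeline runs on both the indexed strings and the search text, "dogs barking" and "The dog which barks" reduce to the same tokens in this sketch.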
Dgraph uses bleve for its full-text search indexing. See also the bleve language specific stop word lists.
The following table lists all supported languages, their corresponding country codes, and their stemming and stop word filtering support.
Language | Country Code | Stemming | Stop words |
---|---|---|---|
Arabic | ar | ✓ | ✓ |
Armenian | hy | ✓ | |
Basque | eu | ✓ | |
Bulgarian | bg | ✓ | |
Catalan | ca | ✓ | |
Chinese | zh | ✓ | ✓ |
Czech | cs | ✓ | |
Danish | da | ✓ | ✓ |
Dutch | nl | ✓ | ✓ |
English | en | ✓ | ✓ |
Finnish | fi | ✓ | ✓ |
French | fr | ✓ | ✓ |
Gaelic | ga | ✓ | |
Galician | gl | ✓ | |
German | de | ✓ | ✓ |
Greek | el | ✓ | |
Hindi | hi | ✓ | ✓ |
Hungarian | hu | ✓ | ✓ |
Indonesian | id | ✓ | |
Italian | it | ✓ | ✓ |
Japanese | ja | ✓ | ✓ |
Korean | ko | ✓ | ✓ |
Norwegian | no | ✓ | ✓ |
Persian | fa | ✓ | |
Portuguese | pt | ✓ | ✓ |
Romanian | ro | ✓ | ✓ |
Russian | ru | ✓ | ✓ |
Spanish | es | ✓ | ✓ |
Swedish | sv | ✓ | ✓ |
Turkish | tr | ✓ | ✓ |
Query Example: All names that have dog, dogs, bark, barks, barking, etc. Stop word removal eliminates the and which.
Inequality
equal to
Syntax Examples:
eq(predicate, value)
eq(val(varName), value)
eq(predicate, val(varName))
eq(count(predicate), value)
eq(predicate, [val1, val2, ..., valN])
eq(predicate, [$var1, "value", ..., $varN])
Schema Types: int, float, bool, string, dateTime
Index Required: An index is required for the eq(predicate, ...) forms (see table below) when used at query root. For count(predicate) at the query root, the @count index is required. For variables the values have been calculated as part of the query, so no index is required.
Type | Index Options |
---|---|
int | int |
float | float |
bool | bool |
string | exact, hash, term, fulltext |
dateTime | dateTime |
Tests whether a predicate or variable equals a value, or is found in a list of values.
The boolean constants are true and false, so with eq this becomes, for example, eq(boolPred, true).
Query Example: Movies with exactly thirteen genres.
Query Example: Directors called Steven who have directed 1, 2 or 3 movies.
less than, less than or equal to, greater than and greater than or equal to
Syntax Examples: for inequality IE
IE(predicate, value)
IE(val(varName), value)
IE(predicate, val(varName))
IE(count(predicate), value)
With IE replaced by
- le less than or equal to
- lt less than
- ge greater than or equal to
- gt greater than
Schema Types: int, float, string, dateTime
Index Required: An index is required for the IE(predicate, ...) forms (see table below) when used at query root. For count(predicate) at the query root, the @count index is required. For variables the values have been calculated as part of the query, so no index is required.
Type | Index Options |
---|---|
int | int |
float | float |
string | exact |
dateTime | dateTime |
Query Example: Ridley Scott movies released before 1980.
Query Example: Movies whose directors have Steven in name and have directed more than 100 actors.
Query Example: A movie in each genre that has over 30000 movies. Because there is no order specified on genres, the order will be by UID. The count index records the number of edges out of nodes and makes such queries more efficient.
Query Example: Directors called Steven and their movies which have initial_release_date greater than that of the movie Minority Report.
between
Syntax Example: between(predicate, startDateValue, endDateValue)
Schema Types: Scalar types, including dateTime, int, float and string
Index Required: dateTime, int, float, and exact on strings
Returns nodes that match an inclusive range of indexed values. The between keyword performs a range check on the index to improve query efficiency, helping to prevent a wide-ranging query on a large set of data from running slowly.
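A minimal Python sketch of an inclusive range check over a sorted index follows. It only illustrates why a sorted index makes the range lookup cheap. The parallel-list layout, the UIDs, and the dates are assumptions invented for the example and say nothing about Dgraph's actual index format.

```python
from bisect import bisect_left, bisect_right

def between(keys, uids, start, end):
    # keys is sorted; uids[i] is the node whose indexed value is keys[i].
    lo = bisect_left(keys, start)   # first position with key >= start
    hi = bisect_right(keys, end)    # first position with key > end
    return uids[lo:hi]              # both bounds are inclusive

# Invented index data: ISO dates sort correctly as strings.
keys = ["1976-11-21", "1977-01-01", "1977-05-25", "1977-12-31", "1978-01-01"]
uids = ["0x1", "0x2", "0x3", "0x4", "0x5"]
print(between(keys, uids, "1977-01-01", "1977-12-31"))
```

The two binary searches locate the range boundaries without visiting values outside the range.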
A common use case for the between keyword is to search within a dataset indexed by dateTime. The following example query demonstrates this use case.
Query Example: Movies initially released in 1977, listed by genre.
uid
Syntax Examples:
q(func: uid(<uid>))
predicate @filter(uid(<uid1>, ..., <uidn>))
predicate @filter(uid(a)) for variable a
q(func: uid(a,b)) for variables a and b
q(func: uid($uids)) for multiple uids in DQL Variables. You have to set the value of this variable as a string (e.g. "[0x1, 0x2, 0x3]") in queryWithVars.
Filters nodes at the current query level to only nodes in the given set of UIDs.
For query variable a, uid(a) represents the set of UIDs stored in a. For value variable b, uid(b) represents the UIDs from the UID to value map. With two or more variables, uid(a,b,...) represents the union of all the variables.
uid(<uid>), like an identity function, will return the requested UID even if the node does not have any edges.
If the UID of a node is known, values for the node can be read directly.
Query Example: The films of Priyanka Chopra by known UID.
Query Example: The films of Taraji Henson by genre.
Query Example: Taraji Henson films ordered by number of genres, with genres listed in order of how many films Taraji has made in each genre.
uid_in
Syntax Examples:
q(func: ...) @filter(uid_in(predicate, <uid>))
predicate1 @filter(uid_in(predicate2, <uid>))
predicate1 @filter(uid_in(predicate2, [<uid1>, ..., <uidn>]))
predicate1 @filter(uid_in(predicate2, uid(myVariable) ))
Schema Types: UID
Index Required: none
While the uid function filters nodes at the current level based on UID, function uid_in allows looking ahead along an edge to check that it leads to a particular UID. This can often save an extra query block and avoids returning the edge.
uid_in cannot be used at root. It accepts multiple UIDs as its argument, and it accepts a UID variable (which can contain a map of UIDs).
Query Example: The collaborations of Marc Caro and Jean-Pierre Jeunet (UID 0x99706). If the UID of Jean-Pierre Jeunet is known, querying this way removes the need to have a block extracting his UID into a variable and the extra edge traversal and filter for ~director.film.
You can also query for Jean-Pierre Jeunet if you don’t know his UID and use it in a UID variable.
type
Query Example: all nodes of type “Animal”
type(Animal) is equivalent to eq(dgraph.type,"Animal")
type() can also be used as a filter:
has
Syntax Examples: has(predicate)
Schema Types: all
Determines if a node has a particular predicate.
Query Example: First five directors and all their movies that have a release date recorded. Directors have directed at least one film, which gives equivalent semantics to gt(count(director.film), 0).
Geolocation
Dgraph currently supports indexing only the Point, Polygon and MultiPolygon geometry types. However, Dgraph can store other types of geolocation data.
Note that for geo queries, any polygon with holes is replaced with the outer loop, ignoring holes. Also, as of version 0.7.7, polygon containment checks are approximate.
Mutations
To make use of the geo functions, you need an index on your predicate.
Here is how you would add a Point.
Here is how you would associate a Polygon with a node. Adding a MultiPolygon is similar.
The above examples have been picked from our SF Tourism dataset.
Query
near
Syntax Example: near(predicate, [long, lat], distance)
Schema Types: geo
Index Required: geo
Matches all entities where the location given by predicate is within distance meters of geojson coordinate [long, lat].
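To give a feel for what "within distance meters" means for [long, lat] pairs, the Python sketch below filters points by great-circle distance using the haversine formula. This is an assumption made for illustration: the source does not say how Dgraph computes distances, and the place names, coordinates, and Earth radius constant are all hypothetical choices.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius, an assumption of this sketch

def haversine_m(lon1, lat1, lon2, lat2):
    # Great-circle distance in meters between two (longitude, latitude) points.
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def near(points, center, distance):
    # points: dict of name -> (long, lat), following the GeoJSON coordinate order.
    lon, lat = center
    return [name for name, (plon, plat) in points.items()
            if haversine_m(plon, plat, lon, lat) <= distance]

# Invented sample points for illustration only.
points = {
    "nearby spot": (-122.4665, 37.7712),
    "downtown spot": (-122.4194, 37.7749),
}
print(near(points, (-122.469829, 37.771935), 1000))
```

Note the coordinate order: longitude comes first, as in the syntax above.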
Query Example: Tourist destinations within 1000 meters (1 kilometer) of a point in Golden Gate Park in San Francisco.
within
Syntax Example: within(predicate, [[[long1, lat1], ..., [longN, latN]]])
Schema Types: geo
Index Required: geo
Matches all entities where the location given by predicate lies within the polygon specified by the geojson coordinate array.
Query Example: Tourist destinations within the specified area of Golden Gate Park, San Francisco.
contains
Syntax Examples: contains(predicate, [long, lat]) or contains(predicate, [[long1, lat1], ..., [longN, latN]])
Schema Types: geo
Index Required: geo
Matches all entities where the polygon describing the location given by predicate contains geojson coordinate [long, lat] or given geojson polygon.
Query Example: All entities that contain a point in the flamingo enclosure of San Francisco Zoo.
intersects
Syntax Example: intersects(predicate, [[[long1, lat1], ..., [longN, latN]]])
Schema Types: geo
Index Required: geo
Matches all entities where the polygon describing the location given by predicate intersects the given geojson polygon.