Table of Contents
Titan supports two different kinds of indexing to speed up query processing: graph indexes and vertex-centric indexes. Most graph queries start the traversal from a list of vertices or edges that are identified by their properties. Graph indexes make these global retrieval operations efficient on large graphs. Vertex-centric indexes speed up the actual traversal through the graph, in particular when traversing through vertices with many incident edges.
Graph indexes are global index structures over the entire graph which allow efficient retrieval of vertices or edges by their properties for sufficiently selective conditions. For instance, consider the following queries
The first query asks for all vertices with the name
hercules. The second asks for all edges where the property reason contains the word
loves. Without a graph index answering those queries would require a full scan over all vertices or edges in the graph to find those that match the given condition which is very inefficient and infeasible for huge graphs.
Titan distinguishes between two types of graph indexes: composite and mixed indexes. Composite indexes are very fast and efficient but limited to equality lookups for a particular, previously-defined combination of property keys. Mixed indexes can be used for lookups on any combination of indexed keys and support multiple condition predicates in addition to equality depending on the backing index store.
Both types of indexes are created through the Titan management system and the index builder returned by
TitanManagement.buildIndex(String,Class) where the first argument defines the name of the index and the second argument specifies the type of element to be indexed (e.g.
Vertex.class). The name of a graph index must be unique.
Graph indexes built against newly defined property keys, i.e. property keys that are defined in the same management transaction as the index, are immediately available. Graph indexes built against property keys that are already in use require the execution of a reindex procedure to ensure that the index contains all previously added elements. Until the reindex procedure has completed, the index will not be available. It is encouraged to define graph indexes in the same transaction as the initial schema.
In the absence of an index, Titan will default to a full graph scan in order to retrieve the desired list of vertices. While this produces the correct result set, the graph scan can be very inefficient and lead to poor overall system performance in a production environment. Enable the
Composite indexes retrieve vertices or edges by one or a (fixed) composition of multiple keys. Consider the following composite index definitions.
mgmt = g.openManagement() name = mgmt.makePropertyKey('name').dataType(String.class).make() age = mgmt.makePropertyKey('age').dataType(Integer.class).make() mgmt.buildIndex('byName',Vertex.class).addKey(name).buildCompositeIndex() mgmt.buildIndex('byNameAndAge',Vertex.class).addKey(name).addKey(age).buildCompositeIndex() mgmt.commit()
First, two property keys
age are defined. Next, a simple composite index on just the name property key is built. Titan will use this index to answer the following query.
The second composite graph index includes both keys. Titan will use this index to answer the following query.
Note, that all keys of a composite graph index must be found in the query’s equality conditions for this index to be used. For example, the following query cannot be answered with either of the indexes because it only contains a constraint on
age but not
Also note, that composite graph indexes can only be used for equality constraints like those in the queries above. The following query would be answered with just the simple composite index defined on the
name key because the age constraint is not an equality constraint.
Composite indexes do not require configuration of an external indexing backend and are supported through the primary storage backend. Hence, composite index modifications are persisted through the same transaction as graph modifications which means that those changes are atomic and/or consistent if the underlying storage backend supports atomicity and/or consistency.
A composite index may comprise just one or multiple keys. A composite index with just one key is sometimes referred to as a key-index.
Composite indexes can also be used to enforce property uniqueness in the graph. If a composite graph index is defined as
unique() there can be at most one vertex or edge for any given concatenation of property values associated with the keys of that index.
For instance, to enforce that names are unique across the entire graph the following composite graph index would be defined.
mgmt = g.openManagement() name = mgmt.makePropertyKey('name').dataType(String.class).make() mgmt.buildIndex('byName',Vertex.class).addKey(name).unique().buildCompositeIndex() mgmt.commit()
To enforce uniqueness against an eventually consistent storage backend, the consistency of the index must be explicitly set to enabling locking.
Mixed indexes retrieve vertices or edges by any combination of previously added property keys. Mixed indexes provide more flexibility than composite indexes and support additional condition predicates beyond equality. On the other hand, mixed indexes are slower for most equality queries than composite indexes.
Unlike composite indexes, mixed indexes require the configuration of an indexing backend and use that indexing backend to execute lookup operations. Titan can support multiple indexing backends in a single installation. Each indexing backend must be uniquely identified by name in the Titan configuration which is called the indexing backend name.
mgmt = g.openManagement() name = mgmt.makePropertyKey('name').dataType(String.class).make() age = mgmt.makePropertyKey('age').dataType(Integer.class).make() mgmt.buildIndex('nameAndAge',Vertex.class).addKey(name).addKey(age).buildMixedIndex("search") mgmt.commit()
The example above defines a mixed index containing the property keys
age. The definition refers to the indexing backend name
search so that Titan knows which configured indexing backend it should use for this particular index.
While this index definition looks similar to the composite index above, it provides greater query support and can answer any of the following queries.
g.V.has('name',CONTAINS,'hercules').interval('age',20,50) g.V.has('name',CONTAINS,'hercules') g.V.has('age',lt,50)
Mixed indexes support full-text search, range search, geo search and others. Refer to Chapter 19, Search Predicates and Data Types for a list of predicates supported by a particular indexing backend.
Unlike composite indexes, mixed indexes do not support uniqueness.
Property keys can be added to an existing mixed index which allows subsequent queries to include this key in the query condition.
mgmt = g.openManagement() location = mgmt.makePropertyKey('location').dataType(Geoshape.class).make() nameAndAge = mgmt.getGraphIndex('nameAndAge') mgmt.addIndexKey(nameAndAge,location) mgmt.commit()
To add a newly defined key, we first retrieve the existing index from the management transaction by its name and then invoke the
addIndexKey method to add the key to this index.
If the added key is defined in the same management transaction, it will be immediately available for querying. If the property key has already been in use, adding the key requires the execution of a reindex procedure to ensure that the index contains all previously added elements. Until the reindex procedure has completed, the key will not be available in the mixed index.
When adding a property key to a mixed index - either through the index builder or the
addIndexKey method - a list of parameters can be optionally specified to adjust how the property value is mapped into the indexing backend. Refer to the mapping parameters overview for a complete list of parameter types supported by each indexing backend.
The order in which the results of a graph query are returned can be defined using the
orderBy() directive. The
orderBy() method expects two parameters:
- The name of the property key by which to order the results. The results will be ordered by the value of the vertices or edges for this property key.
- The sort order: either increasing
For example, the query
g.V.has('name',CONTAINS,'hercules').orderBy('age',decr).limit(10) retrieves the ten oldest individuals with hercules in their name.
orderBy() it is important to note that:
- Composite graph indexes do not natively support ordering search results. All results will be retrieved and then sorted in-memory. For large result sets, this can be very expensive.
- Mixed indexes support ordering natively and efficiently. However, the property key used in the orderBy method must have been previously added to the mixed indexed for native result ordering support. This is important in cases where the orderBy key is different from the query keys. If the property key is not part of the index, then sorting requires loading all results into memory.
In many cases it is desirable to only index vertices or edges with a particular label. For instance, one may want to index only gods by their name and not every single vertex that has a name property.
When defining an index it is possible to restrict the index to a particular vertex or edge label using the
indexOnly method of the index builder. The following creates a composite index for the property key
name that indexes only vertices labeled
mgmt = g.openManagement() name = mgmt.makePropertyKey('name').dataType(String.class).make() god = mgmt.getVertexLabel('god') mgmt.buildIndex('byName',Vertex.class).addKey(name).indexOnly(god).buildCompositeIndex() mgmt.commit()
Label restrictions similarly apply to mixed indexes. When a composite index with label restriction is defined as unique, the uniqueness constraint only applies to properties on vertices or edges for the specified label.
Use a composite index for exact match index retrievals. Composite indexes do not require configuring or operating an external index system and are often significantly faster than mixed indexes.
- As an exception, use a mixed index for exact matches when the number of distinct values for query constraint is relatively small or if one value is expected to be associated with many elements in the graph (i.e. in case of low selectivity).
- Use a mixed indexes for numeric range, full-text or geo-spatial indexing. Also, using a mixed index can speed up orderBy queries.
Vertex-centric indexes are local index structures built individually per vertex. In large graphs vertices can have thousands of incident edges. Traversing through those vertices can be very slow because a large subset of the incident edges has to be retrieved and then filtered in memory to match the conditions of the traversal. Vertex-centric indexes can speed up such traversals by using localized index structures to retrieve only those edges that need to be traversed.
Suppose that Hercules battled hundreds of monsters in addition to the three captured in the introductory Graph of the Gods. Without a vertex-centric index, a query asking for those monsters battled between time point
20 would require retrieving all
battled edges even though there are only a handful of matching edges.
h = g.V.has('name','hercules').next() h.outE('battled').interval('time',10,20).inV
Building a vertex-centric index by time speeds up such traversal queries.
mgmt = g.openManagement() time = mgmt.makePropertyKey('time').dataType(Integer.class).make() battled = mgmt.makeEdgeLabel('battled').make() mgmt.buildEdgeIndex(battled,'battlesByTime',Direction.BOTH,Order.decr,time);
This example builds a vertex-centric index which indexes
battled edges in both direction by time in decreasing order.
A vertex-centric index is built against a particular edge label which is the first argument to the index construction method
TitanManagement.buildEdgeIndex(). The index only applies to edges of this label -
battled in the example above. The second argument is a unique name for the index. The third argument is the edge direction in which the index is built. The index will only apply to traversals along edges in this direction. In this example, the vertex-centric index is built in both direction which means that time restricted traversals along
battled edges can be served by this index in both the
OUT direction. Titan will maintain a vertex-centric index on both the in- and out-vertex of
battled edges. Alternatively, one could define the index to apply to the
OUT direction only which would speed up traversals from Hercules to the monsters but not in the reverse direction. This would only require maintaining one index and hence half the index maintenance and storage cost.
The last two arguments are the sort order of the index and a list of property keys to index by. The sort order is optional and defaults to ascending order (i.e.
Order.ASC). The list of property keys must be non-empty and defines the keys by which to index the edges of the given label. A vertex-centric index can be defined with multiple keys.
mgmt = g.openManagement() time = mgmt.makePropertyKey('time').dataType(Integer.class).make() rating = mgmt.makePropertyKey('rating').dataType(Decimal.class).make() battled = mgmt.makeEdgeLabel('battled').make() mgmt.buildEdgeIndex(battled,'battlesByRatingAndTime',Direction.OUT,Order.decr,rating,time);
This example extends the schema by a
rating property on
battled edges and builds a vertex-centric index which indexes
battled edges in the out-going direction by rating and time in decreasing order. Note, that the order in which the property keys are specified is important because vertex-centric indexes are prefix indexes. This means, that
battled edges are indexed by
rating first and
h.outE('battled').has('rating',LARGER_THAN,3.0).inV h.outE('battled').has('rating',5.0).interval('time',10,50).inV h.outE('battled').interval('time',10,50).inV
battlesByLocationAndTime index can speed up the first two but not the third query.
Multiple vertex-centric indexes can be built for the same edge label in order to support different constraint traversals. Titan’s query optimizer attempts to pick the most efficient index for any given traversal. Vertex-centric indexes only support equality and range/interval constraints.
The property keys used in a vertex-centric index must have an explicitly defined data type (i.e. not
If the vertex-centric index is built against an edge label that is defined in the same management transaction, the index will be immediately available for querying. If the edge label has already been in use, building a vertex-centric index against it requires the execution of a reindex procedure to ensure that the index contains all previously added edges. Until the reindex procedure has completed, the index will not be available.
Titan automatically builds vertex-centric indexes per edge label and property key. That means, even with thousands of incident
Vertex-centric indexes cannot speed up unconstrained traversals which require traversing through all incident edges of a particular label. Those traversals will become slower as the number of incident edges increases. Often, such traversals can be rewritten as constrained traversals that can utilize a vertex-centric index to ensure acceptable performance at scale.
The following queries specify an order in which the incident edges are to be traversed. Use the
localLimit command to retrieve a subset of the edges (in a given order) for EACH vertex that is traversed.
The first query asks for the names of the 10 most recently battled monsters by Hercules. The second query asks for the places of the 10 most recent battles of Hercules that are rated 5 stars. In both cases, the query is constrained by an order on a property key with a limit on the number of elements to be returned.
Such queries can also be efficiently answered by vertex-centric indexes if the order key matches the key of the index and the requested order (i.e. increasing or decreasing) is the same as the one defined for the index. The
battlesByTime index would be used to answer the first query and
battlesByLocationAndTime applies to the second. Note, that the
battlesByLocationAndTime index cannot be used to answer the first query because an equality constraint on
rating must be present for the second key in the index to be effective.
Ordered vertex queries are a Titan extension to Gremlin which causes the verbose syntax and requires the