Chapter 5. Schema and Data Modeling

Each Titan graph has a schema composed of the edge labels, property keys, and vertex labels used therein. A Titan schema can be defined either explicitly or implicitly. Users are encouraged to define the graph schema explicitly during application development. An explicitly defined schema is an important component of a robust graph application and greatly improves collaborative software development. Note that a Titan schema can evolve over time without interrupting normal database operations: extending the schema does not slow down query answering and does not require database downtime.

The schema type (edge label, property key, or vertex label) is assigned to an element in the graph (edge, property, or vertex, respectively) when the element is first created. The assigned schema type cannot be changed for a particular element. This ensures a stable type system that is easy to reason about.

Beyond the schema definition options explained in this section, schema types provide performance tuning options that are discussed in Chapter 25, Advanced Schema.

5.1. Defining Edge Labels

Each edge connecting two vertices has a label which defines the semantics of the relationship. For instance, an edge labeled friend between vertices A and B encodes a friendship between the two individuals.

To define an edge label, call makeEdgeLabel(String) on an open graph or management transaction and provide the name of the edge label as the argument. Edge label names must be unique in the graph. This method returns a builder for edge labels that can be used to define the label's multiplicity. The multiplicity of an edge label defines a constraint on all edges of this label, that is, a maximum number of edges between pairs of vertices. Titan recognizes the following multiplicity settings.

5.1.1. Edge Label Multiplicity

Multiplicity Settings

  • MULTI: Allows multiple edges of the same label between any pair of vertices. In other words, the graph is a multigraph with respect to such a label. There is no constraint on edge multiplicity.
  • SIMPLE: Allows at most one edge of such label between any pair of vertices. In other words, the graph is a simple graph with respect to the label. Ensures that edges are unique for a given label and pairs of vertices.
  • MANY2ONE: Allows at most one outgoing edge of such label on any vertex in the graph but places no constraint on incoming edges. The edge label mother is an example with MANY2ONE multiplicity since each person has at most one mother but mothers can have multiple children.
  • ONE2MANY: Allows at most one incoming edge of such label on any vertex in the graph but places no constraint on outgoing edges. The edge label winnerOf is an example with ONE2MANY multiplicity since each contest is won by at most one person but a person can win multiple contests.
  • ONE2ONE: Allows at most one incoming and one outgoing edge of such label on any vertex in the graph. The edge label marriedTo is an example with ONE2ONE multiplicity since a person is married to at most one other person.

The default multiplicity is MULTI. The definition of an edge label is completed by calling the make() method on the builder, which returns the defined edge label, as shown in the following example.

mgmt = graph.openManagement()
follow = mgmt.makeEdgeLabel('follow').multiplicity(MULTI).make()
mother = mgmt.makeEdgeLabel('mother').multiplicity(MANY2ONE).make()
mgmt.commit()
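
The remaining multiplicity settings are declared the same way. The following is a minimal sketch that assumes the same Gremlin console session and the mother label from the example above. The labels winnerOf and marriedTo are taken from the list of settings; the vertices child, alice, and beth are hypothetical. The specific exception type raised on a violation is not stated in this chapter, so the sketch catches a generic Exception.

```groovy
mgmt = graph.openManagement()
// Each contest has at most one incoming winnerOf edge; a person may win many contests
winnerOf = mgmt.makeEdgeLabel('winnerOf').multiplicity(ONE2MANY).make()
// At most one incoming and one outgoing marriedTo edge per vertex
marriedTo = mgmt.makeEdgeLabel('marriedTo').multiplicity(ONE2ONE).make()
mgmt.commit()

// Hypothetical vertices used only to exercise the MANY2ONE constraint on 'mother'
child = graph.addVertex()
alice = graph.addVertex()
beth = graph.addVertex()
child.addEdge('mother', alice)
try {
    // A second outgoing 'mother' edge on the same vertex should violate MANY2ONE
    child.addEdge('mother', beth)
} catch (Exception e) {
    println "rejected: ${e.message}"
}
// Discard the illustrative vertices and edges
graph.tx().rollback()
```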

5.2. Defining Property Keys

Properties on vertices and edges are key-value pairs. For instance, the property name='Daniel' has the key name and the value 'Daniel'. Property keys are part of the Titan schema and can constrain the allowed data types and cardinality of values.

To define a property key, call makePropertyKey(String) on an open graph or management transaction and provide the name of the property key as the argument. Property key names must be unique in the graph. This method returns a builder for the property keys.

5.2.1. Property Key Data Type

Use dataType(Class) to define the data type of a property key. Titan will enforce that all values associated with the key have the configured data type and thereby ensures that data added to the graph is valid. For instance, one can define that the name key has a String data type.

Define the data type as Object.class in order to allow any (serializable) value to be associated with a key. However, it is encouraged to use concrete data types whenever possible. Configured data types must be concrete classes and not interfaces or abstract classes. Titan enforces class equality, so adding a sub-class of a configured data type is not allowed.
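
As a brief illustration of the two options described above, the following sketch defines one key with a concrete data type and one with Object.class. The key names title and payload are hypothetical and chosen only so that they do not collide with keys defined elsewhere in this chapter.

```groovy
mgmt = graph.openManagement()
// Concrete type: Titan enforces that every value of 'title' is a String
title = mgmt.makePropertyKey('title').dataType(String.class).make()
// Object.class: any serializable value is accepted, so no type checking is gained
payload = mgmt.makePropertyKey('payload').dataType(Object.class).make()
mgmt.commit()
```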

Titan natively supports the following data types.

Table 5.1. Native Titan Data Types

Name        Description
String      Character sequence
Character   Individual character
Boolean     true or false
Byte        byte value
Short       short value
Integer     integer value
Long        long value
Float       4 byte floating point number
Double      8 byte floating point number
Decimal     Number with 3 decimal digits
Precision   Number with 6 decimal digits
Date        Date
Geoshape    Geographic shape like point, circle or box
UUID        UUID

5.2.2. Property Key Cardinality

Use cardinality(Cardinality) to define the allowed cardinality of the values associated with the key on any given vertex.

Cardinality Settings

  • SINGLE: Allows at most one value per element for such key. In other words, the key→value mapping is unique for all elements in the graph. The property key birthDate is an example with SINGLE cardinality since each person has exactly one birth date.
  • LIST: Allows an arbitrary number of values per element for such key. In other words, the key is associated with a list of values allowing duplicate values. Assuming we model sensors as vertices in a graph, the property key sensorReading is an example with LIST cardinality to allow lots of (potentially duplicate) sensor readings to be recorded.
  • SET: Allows multiple values but no duplicate values per element for such key. In other words, the key is associated with a set of values. The property key name has SET cardinality if we want to capture all names of an individual (including nickname, maiden name, etc.).

The default cardinality setting is SINGLE. Note that property keys used on edges and properties always have cardinality SINGLE. Attaching multiple values for a single key on an edge or property is not supported.

mgmt = graph.openManagement()
birthDate = mgmt.makePropertyKey('birthDate').dataType(Long.class).cardinality(Cardinality.SINGLE).make()
name = mgmt.makePropertyKey('name').dataType(String.class).cardinality(Cardinality.SET).make()
sensorReading = mgmt.makePropertyKey('sensorReading').dataType(Double.class).cardinality(Cardinality.LIST).make()
mgmt.commit()
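
To see how the three settings behave on a vertex, consider the following sketch. It assumes the keys defined in the example above and assumes that setting a property without an explicit cardinality argument falls back to the cardinality declared in the schema. The vertex and the values are hypothetical.

```groovy
v = graph.addVertex()

// SET: distinct values accumulate; the repeated 'Daniel' should not be stored twice
v.property('name', 'Daniel')
v.property('name', 'Dan')
v.property('name', 'Daniel')

// LIST: every reading is kept, including the duplicate
v.property('sensorReading', 21.5d)
v.property('sensorReading', 21.5d)

// SINGLE: the second value should replace the first
v.property('birthDate', 0L)
v.property('birthDate', 86400000L)

// Inspect what is stored under each key
names = v.values('name').toList()
readings = v.values('sensorReading').toList()
birthDates = v.values('birthDate').toList()
graph.tx().commit()
```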

5.3. Relation Types

Edge labels and property keys are jointly referred to as relation types. Names of relation types must be unique in the graph, which means that a property key and an edge label cannot have the same name. The Titan API provides methods to check for the existence of relation types and to retrieve them; these methods cover both property keys and edge labels.

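mgmt = graph.openManagement()
// Check whether any relation type (property key or edge label) is named 'name'
if (mgmt.containsRelationType('name'))
    name = mgmt.getPropertyKey('name')
// Retrieve all relation types that are edge labels
mgmt.getRelationTypes(EdgeLabel.class)
mgmt.commit()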
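
Because property keys and edge labels share one namespace, containsRelationType can also guard a schema setup script so that it can be run more than once. The following sketch reuses the follow and birthDate definitions from earlier in this chapter; the idea of a re-runnable setup script is an illustration, not something the Titan documentation prescribes.

```groovy
mgmt = graph.openManagement()
// A single existence check covers both property keys and edge labels
if (!mgmt.containsRelationType('follow')) {
    mgmt.makeEdgeLabel('follow').multiplicity(MULTI).make()
}
if (!mgmt.containsRelationType('birthDate')) {
    mgmt.makePropertyKey('birthDate').dataType(Long.class).cardinality(Cardinality.SINGLE).make()
}
mgmt.commit()
```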

5.4. Defining Vertex Labels

Like edges, vertices have labels. Unlike edge labels, vertex labels are optional. Vertex labels are useful to distinguish different types of vertices, e.g. user vertices and product vertices.

For compatibility with Blueprints, Titan provides differently-named methods for adding labeled and unlabeled vertices:

  • addVertexWithLabel
  • addVertex

Although labels are optional at the conceptual and data model level, Titan assigns all vertices a label as an internal implementation detail. Vertices created by the addVertex methods use Titan’s default label.

To create a label, call makeVertexLabel(String).make() on an open graph or management transaction and provide the name of the vertex label as the argument. Vertex label names must be unique in the graph.

mgmt = graph.openManagement()
person = mgmt.makeVertexLabel('person').make()
mgmt.commit()
// Create a labeled vertex
person = graph.addVertex(label, 'person')
// Create an unlabeled vertex
v = graph.addVertex()
graph.tx().commit()

5.5. Automatic Schema Maker

If an edge label, property key, or vertex label has not been defined explicitly, it will be defined implicitly when it is first used, that is, when an edge or vertex is added or a property is set. The DefaultSchemaMaker configured for the Titan graph defines such types.

By default, implicitly created edge labels have multiplicity MULTI and implicitly created property keys have cardinality SINGLE and data type Object.class. Users can control automatic schema element creation by implementing and registering their own DefaultSchemaMaker.

It is strongly encouraged to explicitly define all schema elements and to disable automatic schema creation by setting schema.default=none in the Titan graph configuration.
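
The following sketch shows one way this setting might be applied when opening a graph programmatically. The in-memory storage backend is an assumption made only to keep the example self-contained, and nickname is a hypothetical undefined key. The sketch catches a generic Exception because the exact error raised for an undefined key is not stated here.

```groovy
// Open a graph with automatic schema creation disabled
graph = TitanFactory.build().
    set('storage.backend', 'inmemory').
    set('schema.default', 'none').
    open()

// Only 'name' is defined explicitly
mgmt = graph.openManagement()
mgmt.makePropertyKey('name').dataType(String.class).make()
mgmt.commit()

v = graph.addVertex()
v.property('name', 'Daniel')        // defined key: accepted
try {
    v.property('nickname', 'Dan')   // undefined key: should be rejected
} catch (Exception e) {
    println "rejected: ${e.message}"
}
graph.tx().rollback()
graph.close()
```

The same setting can instead be placed in the graph's properties file as the line schema.default=none.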

5.6. Changing Schema Elements

The definition of an edge label, property key, or vertex label cannot be changed once it is committed into the graph. However, the names of schema elements can be changed via TitanManagement.changeName(TitanSchemaElement, String) as shown in the following example where the property key place is renamed to location.

mgmt = graph.openManagement()
place = mgmt.getPropertyKey('place')
mgmt.changeName(place, 'location')
mgmt.commit()

Note that schema name changes may not be immediately visible in currently running transactions and in other Titan graph instances in the cluster. While schema name changes are announced to all Titan instances through the storage backend, it may take a while for the changes to take effect, and an instance restart may be required if certain failure conditions, such as network partitions, coincide with the rename. Hence, the user must ensure that one of the following holds:

  • The renamed label or key is not currently in active use (i.e. written or read) and will not be in use until all Titan instances are aware of the name change.
  • Running transactions actively accommodate the brief intermediate period where either the old or new name is valid based on the specific Titan instance and status of the name-change announcement. For instance, that could mean transactions query for both names simultaneously.
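
As a rough sketch of the second option, a transaction can read the property under both names during the transition. This uses the place and location names from the rename example above. It assumes that v is an existing vertex and that looking up a name unknown to the current instance yields no values rather than an error.

```groovy
// v is assumed to be a vertex that carries the renamed property
// Passing both names returns the values found under whichever name is currently valid
locations = v.values('place', 'location').toList()
```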

Should the need arise to re-define an existing schema type, it is recommended to change the name of this type to a name that is not currently (and will never be) in use. After that, a new label or key can be defined with the original name, thereby effectively replacing the old one. However, note that this would not affect vertices, edges, or properties previously written with the existing type. Redefining existing graph elements is not supported online and must be accomplished through a batch graph transformation.
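
The rename-then-redefine procedure described above might look like the following sketch. The key weight, the retired name weight_v1_retired, and the new Double data type are all hypothetical. The two steps use separate management transactions, and the second step should only run once every Titan instance has seen the rename.

```groovy
// Step 1: move the old definition to a name that will never be used again
mgmt = graph.openManagement()
old = mgmt.getPropertyKey('weight')
mgmt.changeName(old, 'weight_v1_retired')
mgmt.commit()

// Step 2: define a new key under the original name
mgmt = graph.openManagement()
mgmt.makePropertyKey('weight').dataType(Double.class).cardinality(Cardinality.SINGLE).make()
mgmt.commit()
```

Properties written before the rename remain attached to the retired key and are not converted by these steps.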