Chapter 13. Common Questions

13.1. Accidental type creation

By default, Titan will automatically create property keys and edge labels when a new type is encountered. Users are strongly encouraged to explicitly define their schema, as documented in Chapter 5, Schema and Data Modeling, before loading any data, and to disable automatic type creation by setting the option schema.default = none.

Automatic type creation can cause problems in multi-threaded or highly concurrent environments. Since Titan needs to ensure that types are unique, multiple concurrent attempts at creating the same type will lead to locking or other exceptions. It is generally recommended to create all needed types up front, or in one batch whenever new property keys and edge labels are needed.
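The option itself is an ordinary entry in the graph's properties file. As a minimal sketch, assuming the configuration is kept in a standard Java properties file, a bulk loader could refuse to start while automatic type creation is still enabled. The class SchemaConfigCheck and its methods are hypothetical helpers, not part of Titan:

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical pre-flight check for a bulk loader: verifies that a Titan
// properties file disables automatic type creation (schema.default=none).
final class SchemaConfigCheck {

    private SchemaConfigCheck() {}

    /** Parses properties-file text, e.g. the contents of a graph's .properties file. */
    static Properties parse(String text) {
        Properties config = new Properties();
        try {
            config.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected for an in-memory reader
        }
        return config;
    }

    /** True only if schema.default is explicitly set to "none". */
    static boolean autoTypeCreationDisabled(Properties config) {
        String value = config.getProperty("schema.default");
        // A missing key means Titan's default behavior: types are created automatically.
        return value != null && value.trim().equals("none");
    }

    /** Fails fast if the configuration would still let Titan create types on the fly. */
    static void requireExplicitSchema(Properties config) {
        if (!autoTypeCreationDisabled(config)) {
            throw new IllegalStateException(
                "Define the schema up front and set schema.default=none before loading data");
        }
    }
}
```

Calling requireExplicitSchema before opening the graph turns a forgotten setting into an immediate, explicit failure rather than a set of silently created types.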

13.2. Custom Class Datatype

Titan supports arbitrary objects as attribute values on properties. To use a custom class as a data type in Titan, either register a custom serializer for it, or ensure that the class has a no-argument constructor and implements the equals method, since Titan verifies that it can successfully serialize and deserialize objects of that class. Please see Chapter 31, Datatype and Attribute Serializer Configuration for more information.
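A minimal sketch of a class that meets both requirements, assuming no custom serializer is registered. GeoCell is a hypothetical example; overriding hashCode alongside equals is standard Java practice rather than a requirement stated above, and the actual serialization mechanism is covered in Chapter 31:

```java
import java.util.Objects;

// Hypothetical attribute class: has a no-argument constructor and
// value-based equals, so a deserialized copy compares equal to the original.
final class GeoCell {

    private int row;
    private int column;

    // No-argument constructor, allowing the class to be instantiated
    // reflectively during deserialization.
    public GeoCell() {}

    public GeoCell(int row, int column) {
        this.row = row;
        this.column = column;
    }

    public int getRow() { return row; }

    public int getColumn() { return column; }

    // Equality based on field values, not object identity.
    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (!(other instanceof GeoCell)) return false;
        GeoCell that = (GeoCell) other;
        return row == that.row && column == that.column;
    }

    // Kept consistent with equals.
    @Override
    public int hashCode() {
        return Objects.hash(row, column);
    }
}
```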

13.3. Transactional Scope for Edges

Edges should not be accessed outside the scope of the transaction in which they were originally created or retrieved.

13.4. Locking Exceptions

When defining unique types with locking enabled (i.e. requesting that Titan ensure uniqueness), locking exceptions of type PermanentLockingException are likely to be encountered under concurrent modifications to the graph.

Such exceptions are to be expected: if a value that a transaction read earlier has since been modified by another transaction, the state of the transaction may no longer be valid, and Titan cannot know how to recover from it. In most cases it is sufficient to simply re-run the transaction. If locking exceptions are very frequent, try to analyze and remove the source of contention.
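Re-running a transaction is usually wrapped in a small retry loop. The following is a minimal, hypothetical sketch rather than a Titan API: the unit of work is passed as a Supplier, and the exception type that signals a locking failure is a parameter. Because a locking failure may reach the caller wrapped in another exception when the transaction commits (an assumption; check what your Titan version actually throws), the sketch inspects the whole cause chain:

```java
import java.util.function.Supplier;

// Hypothetical retry helper: re-runs a unit of work when it fails because
// of lock contention, and propagates every other failure immediately.
final class TransactionRetry {

    private TransactionRetry() {}

    /** True if t or any of its causes is an instance of type. */
    static boolean causedBy(Throwable t, Class<? extends Throwable> type) {
        for (Throwable c = t; c != null; c = c.getCause()) {
            if (type.isInstance(c)) {
                return true;
            }
        }
        return false;
    }

    static <T> T runWithRetries(Supplier<T> work,
                                Class<? extends Throwable> lockingFailure,
                                int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        RuntimeException lastFailure = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                // Each attempt should open a fresh transaction, do its reads
                // and writes, and commit; a failed attempt's state is discarded.
                return work.get();
            } catch (RuntimeException e) {
                if (!causedBy(e, lockingFailure)) {
                    throw e; // unrelated failure: retrying would not help
                }
                lastFailure = e; // lock contention: run the transaction again
            }
        }
        throw lastFailure; // still contended after maxAttempts
    }
}
```

In practice the supplied work would open a new transaction, perform its reads and writes, and commit, rolling back on failure. Adding a short randomized pause between attempts is a common refinement that further reduces contention.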

13.5. Floating point numbers in Vertex-centric Indices

Titan does not allow property keys with Double or Float data type to be part of a vertex-centric index, because their serialization does not support index creation. Use Titan's custom, fixed-digit data types Decimal (3 decimal digits) or Precision (6 decimal digits) instead.
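To illustrate what "fixed-digit" means, the sketch below shows the general fixed-point technique: scale a value by 10^digits, round it, and keep the result as a long. This is an illustration of the idea, not the implementation of Titan's Decimal or Precision types; the class FixedDigits is hypothetical:

```java
// Illustration of fixed-digit (fixed-point) encoding. Hypothetical helper;
// this is the general technique, not Titan's Decimal/Precision serializer.
final class FixedDigits {

    private FixedDigits() {}

    /** 10^digits as an exact long (digits between 0 and 18). */
    static long scale(int digits) {
        if (digits < 0 || digits > 18) {
            throw new IllegalArgumentException("digits must be between 0 and 18");
        }
        long scale = 1;
        for (int i = 0; i < digits; i++) {
            scale *= 10;
        }
        return scale;
    }

    /** Encodes value with a fixed number of decimal digits; extra digits are rounded away. */
    static long encode(double value, int digits) {
        double scaled = value * scale(digits);
        // Reject NaN, infinities, and values whose scaled form does not fit in a long.
        if (Double.isNaN(scaled) || Math.abs(scaled) >= 0x1p63) {
            throw new ArithmeticException("value out of range for " + digits + " digits");
        }
        return Math.round(scaled);
    }

    /** Recovers the (rounded) decimal value from its encoded form. */
    static double decode(long encoded, int digits) {
        return encoded / (double) scale(digits);
    }
}
```

With 3 digits, 3.14159 is stored as 3142 and read back as 3.142; with 6 digits it is stored as 3141590. Because every value shares the same scale, comparing the encoded longs gives the same order as comparing the rounded decimals, which is one reason a fixed number of digits is convenient for sorted index structures.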

13.6. Ghost Vertices

When the same vertex is concurrently removed in one transaction and modified in another, both transactions will commit successfully on eventually consistent storage backends, and the vertex will still exist with only the modified properties or edges. This is referred to as a ghost vertex. It is possible to guard against ghost vertices on eventually consistent backends using key uniqueness, but this is prohibitively expensive in most cases. A more scalable approach is to allow ghost vertices temporarily and to clear them out at regular intervals, for instance using Titan tools.

Another option is to detect them at read time using the transaction option checkInternalVertexExistence() documented in Section 9.8, “Transaction Configuration”.

13.7. Debug-level Logging Slows Execution

When the log level is set to DEBUG, Titan produces a lot of logging output, which is useful for understanding how particular queries get compiled, optimized, and executed. However, the output is so large that it noticeably impacts query performance. Hence, use INFO severity or higher for production systems and for benchmarking.

13.8. Titan OutOfMemoryError or Excessive Garbage Collection

If you experience memory issues or excessive garbage collection while running Titan, it is likely that the caches are configured incorrectly. If the caches are too large, the heap may fill up with cache entries. Try reducing the size of the transaction-level cache before tuning the database-level cache, in particular if you have many concurrent transactions, since each open transaction maintains its own cache. See Chapter 10, Titan Cache for more information.
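A back-of-envelope estimate shows why concurrency matters: the heap held by transaction-level caches grows roughly with the number of concurrent transactions times the number of vertices each one caches. The sketch below is hypothetical, and the average cost per cached vertex is an assumed input; real costs depend on how many properties and edges are loaded per vertex:

```java
// Rough, hypothetical estimate of heap held by transaction-level caches:
// each open transaction caches its own vertices, so the total scales with
// the number of concurrent transactions.
final class TxCacheEstimate {

    private TxCacheEstimate() {}

    /**
     * @param concurrentTransactions transactions open at the same time
     * @param cachedVerticesPerTx    vertices each transaction keeps in its cache
     * @param bytesPerVertex         assumed average heap cost of one cached vertex
     */
    static long estimateBytes(int concurrentTransactions,
                              long cachedVerticesPerTx,
                              long bytesPerVertex) {
        // multiplyExact throws ArithmeticException instead of silently overflowing.
        long entries = Math.multiplyExact((long) concurrentTransactions, cachedVerticesPerTx);
        return Math.multiplyExact(entries, bytesPerVertex);
    }
}
```

With 200 concurrent transactions, 20,000 cached vertices each, and an assumed 1,024 bytes per vertex, the estimate is 4,096,000,000 bytes (about 3.8 GiB); halving the per-transaction cache halves that figure, whereas a single transaction with the same settings accounts for only about 20 MB.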

13.9. JAMM Warning Messages

When launching Titan with embedded Cassandra, the following warnings may be displayed:

958 [MutationStage:25] WARN org.apache.cassandra.db.Memtable - MemoryMeter uninitialized (jamm not specified as java agent); assuming liveRatio of 10.0. Usually this means cassandra-env.sh disabled jamm because you are using a buggy JRE; upgrade to the Sun JRE instead

Cassandra uses MemoryMeter, part of the JAMM (Java Agent for Memory Measurements) library, to measure the actual memory use of an object, including JVM overhead. Because JAMM runs as a Java agent, the path to the JAMM jar must be specified in the Java -javaagent parameter when launching the JVM (e.g. -javaagent:path/to/jamm.jar) through titan.sh, gremlin.sh, or Rexster, for instance:

export TITAN_JAVA_OPTS=-javaagent:$TITAN_HOME/lib/jamm-0.2.5.jar
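To confirm that the agent was actually picked up, the running JVM's arguments can be inspected through the standard java.lang.management API (RuntimeMXBean.getInputArguments()). A minimal sketch, with hypothetical class and method names, that looks for a -javaagent option whose jar file name starts with "jamm":

```java
import java.lang.management.ManagementFactory;
import java.util.List;
import java.util.Locale;

// Hypothetical startup check: reports whether the JVM was launched with
// a -javaagent option pointing at a JAMM jar.
final class JammAgentCheck {

    private static final String AGENT_PREFIX = "-javaagent:";

    private JammAgentCheck() {}

    /** True if any argument has the form -javaagent:<path>/jamm*.jar[=options]. */
    static boolean hasJammAgent(List<String> jvmArguments) {
        for (String arg : jvmArguments) {
            if (!arg.startsWith(AGENT_PREFIX)) {
                continue;
            }
            String spec = arg.substring(AGENT_PREFIX.length());
            int eq = spec.indexOf('=');                      // strip agent options, if any
            String jarPath = eq >= 0 ? spec.substring(0, eq) : spec;
            int sep = Math.max(jarPath.lastIndexOf('/'), jarPath.lastIndexOf('\\'));
            String fileName = jarPath.substring(sep + 1).toLowerCase(Locale.ROOT);
            if (fileName.startsWith("jamm") && fileName.endsWith(".jar")) {
                return true;
            }
        }
        return false;
    }

    /** Applies the check to the arguments of the currently running JVM. */
    static boolean currentJvmHasJammAgent() {
        return hasJammAgent(ManagementFactory.getRuntimeMXBean().getInputArguments());
    }
}
```

Logging the result of currentJvmHasJammAgent() once at startup makes it easy to tell whether a missing TITAN_JAVA_OPTS setting is behind the warning.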

13.10. Cassandra Connection Problem

By default, Titan uses the Astyanax library to connect to Cassandra clusters. On EC2 and Rackspace, users have reported that Astyanax was unable to establish a connection to the cluster. In those cases, switching the backend by setting storage.backend=cassandrathrift solved the problem.

13.11. ElasticSearch OutOfMemoryError

When numerous clients are connecting to ElasticSearch, an OutOfMemoryError is likely to occur. This is not due to a lack of memory, but to the operating system not allowing the user running ElasticSearch to spawn more threads; the JVM reports this condition as java.lang.OutOfMemoryError: unable to create new native thread. To circumvent this issue, increase the number of processes allowed for the user running ElasticSearch, for example by raising ulimit -u from the default of 1024 to 10024.

13.12. Blueprints Index Creation

Indexes in Titan must be created through Titan’s management system, as documented in Chapter 8, Indexing for better Performance; they cannot be created through the Blueprints API.