Table of Contents
- 13.1. Accidental type creation
- 13.2. Custom Class Datatype
- 13.3. Transactional Scope for Edges
- 13.4. Locking Exceptions
- 13.5. Floating point numbers in Vertex-centric Indices
- 13.6. Ghost Vertices
- 13.7. Debug-level Logging Slows Execution
- 13.8. Titan OutOfMemoryException or excessive Garbage Collection
- 13.9. JAMM Warning Messages
- 13.10. Cassandra Connection Problem
- 13.11. ElasticSearch OutOfMemoryException
13.1. Accidental type creation

By default, Titan will automatically create property keys and edge labels when a new type is encountered. It is strongly encouraged that users explicitly define their schemata as documented in Chapter 5, Schema and Data Modeling before loading any data, and disable automatic type creation by setting the option `schema.default = none`.
Automatic type creation can cause problems in multi-threaded or highly concurrent environments. Since Titan needs to ensure that types are unique, multiple attempts at creating the same type will lead to locking or other exceptions. It is generally recommended to create all needed types up front or in one batch when new property keys and edge labels are needed.
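In the graph's properties file, disabling automatic type creation looks like this (a minimal fragment; any other options in your configuration stay as they are):

```properties
# Disable automatic type creation; all property keys and edge labels
# must be defined explicitly before they are used
schema.default = none
```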
13.2. Custom Class Datatype

Titan supports arbitrary objects as attribute values on properties. To use a custom class as a data type in Titan, either register a custom serializer or ensure that the class has a no-argument constructor and implements the `equals` method, because Titan will verify that it can successfully de-/serialize objects of that class. Please see Chapter 31, Datatype and Attribute Serializer Configuration for more information.
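A minimal sketch of a class that satisfies these requirements (the class name and fields are illustrative, not part of Titan):

```java
// A custom attribute class suitable for automatic serialization:
// it has a no-argument constructor and overrides equals()
// (overriding hashCode() alongside equals() is standard Java practice).
public class GeoPoint {
    private double latitude;
    private double longitude;

    // No-argument constructor, required so the class can be instantiated
    // during deserialization
    public GeoPoint() {}

    public GeoPoint(double latitude, double longitude) {
        this.latitude = latitude;
        this.longitude = longitude;
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (!(other instanceof GeoPoint)) return false;
        GeoPoint p = (GeoPoint) other;
        return latitude == p.latitude && longitude == p.longitude;
    }

    @Override
    public int hashCode() {
        return Double.hashCode(latitude) * 31 + Double.hashCode(longitude);
    }
}
```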
13.3. Transactional Scope for Edges

Edges should not be accessed outside the scope in which they were originally created or retrieved.
13.4. Locking Exceptions

When defining unique types with locking enabled (i.e. requesting that Titan ensure uniqueness), it is likely that locking exceptions of type `PermanentLockingException` will be encountered under concurrent modifications to the graph.

Such exceptions are to be expected: Titan cannot know how to recover from a transactional state in which an earlier read value has been modified by another transaction, since this may invalidate the state of the transaction. In most cases it is sufficient to simply re-run the transaction. If locking exceptions are very frequent, try to analyze and remove the source of congestion.
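Re-running the transaction is commonly done with a bounded retry loop. The sketch below is self-contained: it uses a stand-in `LockingException` and simulated work so it runs on its own; in a real application the work would be the Titan transaction logic and the caught type would be `PermanentLockingException`.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class RetryDemo {
    // Stand-in for Titan's PermanentLockingException (illustrative only)
    static class LockingException extends RuntimeException {}

    // Retry a unit of work up to maxAttempts times on a locking conflict
    static void runWithRetry(Runnable txWork, int maxAttempts) {
        for (int attempt = 1; ; attempt++) {
            try {
                txWork.run();   // in Titan: open tx, mutate, commit
                return;         // committed successfully
            } catch (LockingException e) {
                if (attempt >= maxAttempts) throw e;  // give up, surface the error
                // optionally back off briefly here before retrying
            }
        }
    }

    public static void main(String[] args) {
        AtomicInteger calls = new AtomicInteger();
        // Simulated work that hits a lock conflict twice, then succeeds
        runWithRetry(() -> {
            if (calls.incrementAndGet() < 3) throw new LockingException();
        }, 5);
        System.out.println("succeeded after " + calls.get() + " attempts");
    }
}
```

If the retry budget is exhausted, the exception propagates to the caller, which matches the advice above: frequent failures indicate congestion that should be removed rather than retried around.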
13.5. Floating point numbers in Vertex-centric Indices

Titan does not allow property keys with the `Float` data type to be part of a vertex-centric index because their serialization does not support index creation. Use the custom, fixed-digit data types `Decimal` (3 decimal digits) or `Precision` (6 decimal digits) instead.
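To illustrate the idea behind fixed-digit types (a sketch only, not Titan's actual `Decimal` implementation): a value limited to 3 decimal digits can be stored as a scaled long, and longs serialize in an order-preserving way that is suitable for indexing.

```java
public class FixedDigit {
    static final long SCALE = 1000L; // 3 decimal digits

    // Convert a double to its fixed-digit representation (scaled long)
    static long toFixed(double value) {
        return Math.round(value * SCALE);
    }

    // Recover the decimal value from its fixed-digit representation
    static double fromFixed(long fixed) {
        return fixed / (double) SCALE;
    }

    public static void main(String[] args) {
        long a = toFixed(3.14159);        // stored as 3142
        System.out.println(fromFixed(a)); // prints 3.142
    }
}
```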
13.6. Ghost Vertices

When the same vertex is concurrently removed in one transaction and modified in another, both transactions will commit successfully on eventually consistent storage backends, and the vertex will continue to exist with only the modified properties or edges. This is referred to as a ghost vertex. It is possible to guard against ghost vertices on eventually consistent backends using key uniqueness, but this is prohibitively expensive in most cases. A more scalable approach is to allow ghost vertices temporarily and clear them out at regular intervals, for instance using Titan tools. Another option is to detect them at read time using the option `checkInternalVertexExistence()` documented in Section 9.8, “Transaction Configuration”.
13.7. Debug-level Logging Slows Execution

When the log level is set to `DEBUG`, Titan produces a large amount of logging output, which is useful for understanding how particular queries are compiled, optimized, and executed. However, the output is so large that it noticeably impacts query performance. Hence, use the `INFO` severity or higher for production systems and benchmarking.
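Assuming a Log4j-based setup (which Titan distributions typically use), the log level is raised in the `log4j.properties` file on the classpath; the appender name here is illustrative and should match whatever your configuration already defines:

```properties
# log4j.properties — keep Titan quiet in production
log4j.rootLogger=INFO, stdout
```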
13.8. Titan OutOfMemoryException or excessive Garbage Collection

If you experience memory issues or excessive garbage collection while running Titan, it is likely that the caches are configured incorrectly. If the caches are too large, the heap may fill up with cache entries. Try reducing the size of the transaction-level cache before tuning the database-level cache, in particular if you have many concurrent transactions. See Chapter 10, Titan Cache for more information.
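As a sketch, cache sizes are tuned in the graph configuration file. The option names below follow the Titan configuration reference, but the values are illustrative starting points, not recommendations:

```properties
# Shrink the transaction-level cache first if many transactions run concurrently
cache.tx-cache-size = 10000

# Then tune the database-level cache
cache.db-cache = true
cache.db-cache-size = 0.25
```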
13.9. JAMM Warning Messages

When launching Titan with embedded Cassandra, the following warning may be displayed:

958 [MutationStage:25] WARN org.apache.cassandra.db.Memtable - MemoryMeter uninitialized (jamm not specified as java agent); assuming liveRatio of 10.0. Usually this means cassandra-env.sh disabled jamm because you are using a buggy JRE; upgrade to the Sun JRE instead

Cassandra uses a Java agent called MemoryMeter which allows it to measure the actual memory use of an object, including JVM overhead. To use JAMM (Java Agent for Memory Measurements), the path to the JAMM jar must be specified in the `-javaagent` parameter when launching the JVM (e.g. `-javaagent:path/to/jamm.jar`), whether through titan.sh, gremlin.sh, or Rexster.
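For example, assuming the launch script honors the `JAVA_OPTIONS` environment variable (the jar path is a placeholder for wherever JAMM is installed):

```shell
# Pass the JAMM agent to the JVM before starting Titan
export JAVA_OPTIONS="$JAVA_OPTIONS -javaagent:/path/to/jamm.jar"
bin/titan.sh start
```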
13.10. Cassandra Connection Problem

By default, Titan uses the Astyanax library to connect to Cassandra clusters. On EC2 and Rackspace, it has been reported that Astyanax was unable to establish a connection to the cluster. In those cases, changing the backend to `storage.backend=cassandrathrift` solved the problem.
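In the graph configuration file, the switch looks like this (the hostname is a placeholder for your cluster):

```properties
# Use the Thrift-based Cassandra backend instead of Astyanax
storage.backend = cassandrathrift
storage.hostname = 127.0.0.1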
13.11. ElasticSearch OutOfMemoryException

When numerous clients are connecting to ElasticSearch, it is likely that an `OutOfMemoryException` occurs. This is not due to a memory issue, but to the OS not allowing more threads to be spawned by the user running ElasticSearch. To circumvent this issue, increase the number of processes allowed to that user, for example by raising the `ulimit -u` value from the default 1024 to 10024.
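For example (the username and the use of `/etc/security/limits.conf` are assumptions about the target system; on systems using pam_limits this is the usual place to persist the setting):

```shell
# Raise the max user processes for the current shell session
ulimit -u 10024

# Or persist it for the user running ElasticSearch
echo "elasticsearch  soft  nproc  10024" >> /etc/security/limits.conf
```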