Chapter 1. The Benefits of Titan

Titan is designed to support the processing of graphs so large that they require storage and computational capacities beyond what a single machine can provide. Scaling graph data processing for real-time traversals and analytical queries is Titan’s foundational benefit. This section discusses the specific benefits of Titan and of the persistence solutions it supports.

1.1. General Titan Benefits

  • Support for very large graphs. Titan graphs scale with the number of machines in the cluster.
  • Support for very many concurrent transactions and operational graph processing. Titan’s transactional capacity scales with the number of machines in the cluster, and Titan answers complex traversal queries on huge graphs in milliseconds.
  • Support for global graph analytics and batch graph processing through the Hadoop framework.
  • Support for geo, numeric range, and full-text search for vertices and edges on very large graphs.
  • Native support for the popular property graph data model exposed by TinkerPop.
  • Native support for the graph traversal language Gremlin.
  • Easy integration with the Gremlin graph server for programming-language-agnostic connectivity.
  • Numerous graph-level configurations provide knobs for tuning performance.
  • Vertex-centric indices provide vertex-level querying to alleviate the infamous super node problem.
  • An optimized disk representation allows efficient use of storage and fast access.
  • Open source under the liberal Apache 2 license.
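
The vertex-centric index bullet above can be illustrated with a toy model. The following Python sketch is not Titan's implementation and uses no Titan API; the `Vertex` class and its `add_edge`, `scan`, and `lookup` methods are hypothetical names chosen for illustration. It assumes each vertex keeps its own edges sorted by a key per label, so a query on a super node can binary-search a small range instead of touching every incident edge.

```python
import bisect
from collections import defaultdict


class Vertex:
    """Toy vertex that keeps a per-vertex (vertex-centric) index over its own edges."""

    def __init__(self, name):
        self.name = name
        # Unindexed edge list: (label, sort_key, target)
        self.edges = []
        # Hypothetical index: label -> list of (sort_key, target), kept sorted
        self.index = defaultdict(list)

    def add_edge(self, label, sort_key, target):
        self.edges.append((label, sort_key, target))
        bisect.insort(self.index[label], (sort_key, target))

    def scan(self, label, lo, hi):
        """Without the index: inspect every incident edge of the vertex."""
        return sorted(
            t for (l, k, t) in self.edges if l == label and lo <= k < hi
        )

    def lookup(self, label, lo, hi):
        """With the index: binary-search the sorted entries of one label."""
        entries = self.index[label]
        # (lo,) sorts before any (lo, target), so this finds the first key >= lo
        start = bisect.bisect_left(entries, (lo,))
        end = bisect.bisect_left(entries, (hi,))
        return sorted(t for (_, t) in entries[start:end])


# A "super node" with many incident edges of one label.
hub = Vertex("hub")
for i in range(1000):
    hub.add_edge("follows", i, f"user{i}")
hub.add_edge("likes", 11, "post1")

# Both strategies return the same edges; only the amount of work differs.
assert hub.lookup("follows", 10, 13) == hub.scan("follows", 10, 13)
```

In this sketch `scan` does work proportional to the total number of incident edges, while `lookup` does work proportional to the logarithm of the edges for one label plus the size of the result, which is the kind of saving the bullet refers to.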

1.2. Benefits of Titan with Cassandra

  • Continuously available with no single point of failure.
  • No read/write bottlenecks to the graph as there is no master/slave architecture.
  • Elastic scalability allows for the introduction and removal of machines.
  • Caching layer ensures that continuously accessed data is available in memory.
  • Increase the size of the cache by adding more machines to the cluster.
  • Integration with Hadoop.
  • Cassandra is open source under the liberal Apache 2 license.

1.3. Benefits of Titan with HBase

  • Tight integration with the Hadoop ecosystem.
  • Native support for strong consistency.
  • Linear scalability with the addition of more machines.
  • Strictly consistent reads and writes.
  • Convenient base classes for backing Hadoop MapReduce jobs with HBase tables.
  • Support for exporting metrics via JMX.
  • HBase is open source under the liberal Apache 2 license.

1.4. Titan and the CAP Theorem

 

"Despite your best efforts, your system will experience enough faults that it will have to make a choice between reducing yield (i.e., stop answering requests) and reducing harvest (i.e., giving answers based on incomplete data). This decision should be based on business requirements."

-- Coda Hale

When choosing a database, consider the CAP theorem carefully (C = Consistency, A = Availability, P = Partition tolerance). Titan is distributed with three supported storage backends: Cassandra, HBase, and BerkeleyDB. Their tradeoffs with respect to the CAP theorem are represented in the diagram below. Note that BerkeleyDB is a non-distributed database and, as such, is typically used with Titan only for testing and exploration.

[Figure: Titan storage backends and the CAP theorem]

HBase gives preference to consistency at the expense of yield, i.e., the probability of completing a request. Cassandra gives preference to availability at the expense of harvest, i.e., the completeness of the answer to a query (data available / complete data).
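
The two quantities can be made concrete with a small arithmetic sketch. The function names and the scenario numbers below are illustrative assumptions, not measurements from HBase or Cassandra, and the model is deliberately simplified: data is split into four equally sized partitions, one of which is unreachable, and 25 of 100 requests need the unreachable partition.

```python
def request_yield(completed_requests, total_requests):
    """Yield: fraction of requests that are completed."""
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    return completed_requests / total_requests


def harvest(data_available, complete_data):
    """Harvest: data available / complete data for an answered query."""
    if complete_data <= 0:
        raise ValueError("complete_data must be positive")
    return data_available / complete_data


# Assumed scenario: 4 partitions, 1 unreachable, 100 requests, 25 need the lost one.

# A store that prefers consistency refuses the 25 requests it cannot answer
# completely; every answer it does give is based on complete data.
consistency_first = {
    "yield": request_yield(75, 100),
    "harvest": harvest(4, 4),
}

# A store that prefers availability answers all 100 requests; a query that
# spans all four partitions only sees three of them.
availability_first = {
    "yield": request_yield(100, 100),
    "harvest": harvest(3, 4),
}

print(consistency_first)
print(availability_first)
```

Under these assumptions the consistency-first store gives up yield and keeps harvest, while the availability-first store does the opposite, which mirrors the tradeoff described above.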