Chapter 17. BerkeleyDB


Berkeley DB enables the development of custom data management solutions, without the overhead traditionally associated with such custom projects. Berkeley DB provides a collection of well-proven building-block technologies that can be configured to address any application need from the hand-held device to the datacenter, from a local storage solution to a world-wide distributed one, from kilobytes to petabytes.

 -- BerkeleyDB Homepage

The Oracle BerkeleyDB storage backend runs in the same JVM as Titan and provides local persistence on a single machine. Hence, the BerkeleyDB storage backend requires that all of the graph data fits on the local disk and all of the frequently accessed graph elements fit into main memory. This imposes a practical limitation of graphs with 10-100s million vertices on commodity hardware. However, for graphs of that size the BerkeleyDB storage backend exhibits high performance because all data can be accessed locally within the same JVM.

17.1. BerkeleyDB Setup

Since BerkeleyDB runs in the same JVM as Titan, connecting the two only requires a simple configuration and no additional setup:

TitanGraph g =
set("storage.backend", "berkeleyje").
set("", "/tmp/graph").

In the Gremlin shell, you can not define the type of the variables conf and g. Therefore, simply leave off the type declaration.

17.2. BerkeleyDB Specific Configuration

Refer to Chapter 12, Configuration Reference for a complete listing of all BerkeleyDB specific configuration options in addition to the general Titan configuration options.

When configuring BerkeleyDB it is recommended to consider the following BerkeleyDB specific configuration options:

  • transactions: Enables transactions and detects conflicting database operations. CAUTION: While disabling transactions can lead to better performance it can cause to inconsistencies and even corrupt the database if multiple Titan instances interact with the same instance of BerkeleyDB.
  • cache-percentage: The percentage of JVM heap space (configured via -Xmx) to be allocated to BerkeleyDB for its cache. Try to give BerkeleyDB as much space as possible without causing memory problems for Titan. For instance, if Titan only runs short transactions, use a value of 80 or higher.

17.3. Ideal Use Case

The BerkeleyDB storage backend is best suited for small to medium size graphs with up to 100 million vertices on commodity hardware. For graphs of that size, it will likely deliver higher performance than the distributed storage backends. Note, that BerkeleyDB is also limited in the number of concurrent requests it can handle efficiently because it runs on a single machine. Hence, it is not well suited for applications with many concurrent users mutating the graph, even if that graph is small to medium size.

Since BerkeleyDB runs in the same JVM as Titan, this storage backend is ideally suited for unit testing of application code using Titan.

Note, that BerkeleyDB has a more restrictive license than Titan. Please review BerkleyDB’s license to determine if it can accommodate your use case.

17.4. Global Graph Operations

Titan backed by BerkeleyDB supports global graph operations such as iterating over all vertices or edges. However, note that such operations need to scan the entire database which can require a significant amount of time for larger graphs.

In order to not run out of memory, it is advised to disable transactions (storage.transactions=false) when iterating over large graphs. Having transactions enabled requires BerkeleyDB to acquire read locks on the data it is reading. When iterating over the entire graph, these read locks can easily require more memory than is available.