What is Crux?

Crux—to use Martin Kleppmann’s phrase—is an unbundled database.

What do we have to gain from turning the database inside out? Simpler code, better scalability, better robustness, lower latency, and more flexibility for doing interesting things with data.

Crux embodies this principle using a combination of:

  • Apache Kafka for the primary (default) storage of transactions and documents as semi-immutable logs

  • RocksDB or LMDB to host indexes for rich query support

  • Clojure protocols for pluggability (e.g. swap Kafka for SQLite)

This decoupled design enables Crux to be maintained as a small and efficient core that can be scaled and evolved to support a large variety of use-cases.

Crux Kafka Node Diagram

Crux is an open source document database with bitemporal graph queries.

Document database with graph queries


Crux is a bitemporal database that stores transaction time and valid time histories for all data:

  • Transaction time provides essential auditability and horizontal scaling

  • Valid time provides an optional ability to model temporal domain concepts and integrate upstream sources of temporal data

Many databases support various levels of "time travel" queries across transaction time (i.e. the transactional sequence of database states from the moment of database creation to its current state); however, such capabilities are typically complex to use and have practical limitations. By contrast, Crux provides an always-on capability for point-in-time querying of past transactional states, as well as across the valid time axis.

Bitemporal modelling is broadly useful for event-based architectures and is a critical requirement for systems in any industry with strong auditing regulations, where you need to be able to answer the question "what did you know and when did you know it?".

Read more about Bitemporality in Crux or specifically the known uses for these capabilities.


Crux supports a Datalog query interface for traversing graph relationships across your documents. Query results are lazily streamed from the underlying Key-Value indexes.

Crux is ultimately a store of versioned EDN documents. The fields within these documents are automatically indexed as Entity-Attribute-Value triples to support efficient graph queries.
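
As a sketch of how these triples support joins (the documents and ids here are hypothetical, and a started node bound to `node` is assumed):

```clojure
;; Each field below becomes an EAV triple; :manager holds another entity's id
(crux/await-tx node
  (crux/submit-tx node
    [[:crux.tx/put {:crux.db/id :alice :name "Alice"}]
     [:crux.tx/put {:crux.db/id :bob :name "Bob" :manager :alice}]]))

;; Traverse the graph edge from Bob to his manager's name
(crux/q (crux/db node)
        '{:find [manager-name]
          :where [[e :name "Bob"]
                  [e :manager m]
                  [m :name manager-name]]})
;; => #{["Alice"]}
```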


Crux does not enforce any schema for the documents it stores. One reason for this is that data might come from many different places, and may not ultimately be owned by the service using Crux to query the data. This design enables schema-on-write and/or schema-on-read to be achieved outside of the core of Crux, to meet the exact application requirements.


Nodes can come and go, with local indexes stored in a Key/Value store such as RocksDB, whilst reading and writing master data to central log topics (hosted by Kafka or a JDBC data store such as Postgres). Queries are not distributed and there is no sharding of data or indexes across nodes.

Crux can also run in a non-distributed "standalone" mode, where the transaction and document logs exist only inside of a local Key/Value store such as RocksDB or SQLite. This is appropriate for non-critical usage where availability and durability requirements do not warrant additional infrastructure services like Kafka or Postgres.


Crux supports eviction of active and historical data to assist with technical compliance for information privacy regulations.

The main transaction log contains only hashes and is immutable. All document content is stored in a dedicated document store which supports eviction, such as a Kafka topic with compacted tombstones.
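
A sketch of an evict transaction (the entity id is hypothetical; a started node bound to `node` is assumed):

```clojure
;; Permanently erase every version of the document's content;
;; only content hashes remain in the immutable transaction log
(crux/submit-tx node [[:crux.tx/evict :user/jane]])
```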



Crux is optimised for efficient and globally consistent point-in-time queries using a pair of transaction-time and valid-time timestamps.

Ad-hoc systems for bitemporal recordkeeping typically rely on explicitly tracking either valid-from and valid-to timestamps or range types directly within relations. The bitemporal document model that Crux provides is very simple to reason about and is universal across the entire database, so it does not require you to decide upfront which historical information is worth storing in special "bitemporal tables".

One or more documents may be inserted into Crux via a put transaction at a specific valid-time, defaulting to the transaction time (i.e. now), and each document remains valid until explicitly updated with a new version via put or deleted via delete.


The rationale for bitemporality is also explained in this blog post.

A baseline notion of time that is always available is transaction-time; the point at which data is transacted into the database.

Bitemporality is the addition of another time-axis: valid-time.

Table 1. Time Axes
Time               Purpose
transaction-time   Used for audit purposes, technical requirements such as event sourcing.
valid-time         Used for querying data across time, historical analysis.

transaction-time represents the point at which data arrives into the database. This gives us an audit trail and we can see what the state of the database was at a particular point in time. You cannot write a new transaction with a transaction-time that is in the past.

valid-time is an arbitrary time that can originate from an upstream system, or by default is set to transaction-time. Valid time is what users will typically use for query purposes.

Writes can be made in the past of valid-time as retroactive operations. Users will normally ask "what is the value of this entity at valid-time?" regardless if this history has been rewritten several times at multiple transaction-times. Writes can also be made in the future of valid-time as proactive operations.
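
A sketch of both cases (the entity and prices are hypothetical; a started node bound to `node` is assumed):

```clojure
;; Retroactive: recorded now, but valid from a point in the past
(crux/submit-tx node
  [[:crux.tx/put {:crux.db/id :product/widget :price 100}
    #inst "2020-06-01"]])

;; Proactive: recorded now, but only becomes the current version
;; for queries made with a valid-time of 2021-01-01 or later
(crux/submit-tx node
  [[:crux.tx/put {:crux.db/id :product/widget :price 120}
    #inst "2021-01-01"]])
```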

Typically you only need to consider using both transaction-time and valid-time for ensuring globally consistent reads across nodes or to query for audit reasons.

In Crux, when transaction-time isn’t specified, it is set to now. When writing data without a specific valid-time available, valid-time defaults to the transaction-time, so both take the same value.

Valid Time

In situations where your database is not the ultimate owner of the data—where corrections to data can flow in from various sources and at various times—use of transaction-time is inappropriate for historical queries.

Imagine you have a financial trading system and you want to perform calculations based on the official 'end of day', which occurs each day at 17:00. Does all the data arrive into your database at exactly 17:00? Or does the data arrive from an upstream source where we have to allow for data arriving out of order, with some perhaps always arriving after 17:00?

This can often be the case with high throughput systems where there are clusters of processing nodes, enriching the data before it gets to our store.

In this example, we want our queries to include the straggling bits of data for our calculation purposes, and this is where valid-time comes in. When data arrives into our database, it can come with an arbitrary time-stamp that we can use for querying purposes.

We can tolerate data arriving out of order, as we’re not completely dependent on transaction-time.

In an ecosystem of many systems, where one cannot control the ultimate timeline or other systems' ability to write into the past, one needs bitemporality to ensure evolving but consistent views of the data.

Transaction Time

For audit reasons, we might wish to know with certainty the value of a given entity-attribute at a given tx-instant. In this case, we want to exclude the possibility of the valid past being amended, so we need a pre-correction view of the data, relying on tx-instant.

To achieve this you can use as-of using ts (valid-time) and tx-ts (transaction-time).
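
For instance, a sketch against a started node bound to `node` (the entity id is hypothetical):

```clojure
;; Open the database "as at" a valid-time (ts) and "as of" a
;; transaction-time (tx-ts); this pre-correction view ignores any
;; retroactive writes transacted after tx-ts
(def audit-db
  (crux/db node
           #inst "2019-01-02"    ; ts (valid-time)
           #inst "2019-01-03"))  ; tx-ts (transaction-time)

;; Queries and entity lookups against audit-db see only what was
;; known at tx-ts about the state of the world at ts
(crux/entity audit-db :some-entity)
```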

Only if you want to ensure consistent reads across nodes or to query for audit reasons, would you want to consider using both transaction-time and valid-time.

Domain Time

Valid time is valuable for tracking a consistent view of the entire state of the database, however, unless you explicitly include a timestamp or other temporal component within your documents you cannot currently use this information about valid time inside of your Datalog queries.

Domain time or "user-defined" time is simply the storing of any additional time-related information within your documents, for instance valid-time, duration or timestamps relating to additional temporal life-cycles (e.g. decision, receipt, notification, availability).

Queries that use domain times do not automatically benefit from any kind of native indexes to support efficient execution, however Crux encourages you to build additional layers of functionality to do so. See decorators for examples.
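
A sketch of storing and filtering on a domain time (the document and attribute names are hypothetical; a started node bound to `node` is assumed):

```clojure
;; A document carrying a domain timestamp as an ordinary attribute
(crux/submit-tx node
  [[:crux.tx/put {:crux.db/id :order/o-123
                  :receipt-time #inst "2020-03-01T10:00:00.000-00:00"}]])

;; Filter on the domain timestamp with an ordinary predicate clause
(crux/q (crux/db node)
        '{:find [o]
          :where [[o :receipt-time t]
                  [(< t #inst "2020-04-01")]]})
```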

Known Uses

Recording bitemporal information with your data is essential when dealing with lag, corrections, and efficient auditability:

  • Lag is found wherever there is risk of non-trivial delay until an event can be recorded. This is common between systems that communicate over unreliable networks.

  • Corrections are needed as errors are uncovered and as facts are reconciled.

  • Ad-hoc auditing is an otherwise intensive and slow process requiring significant operational complexity.

With Crux you retain visibility of all historical changes whilst compensating for lag, making corrections, and performing audit queries. By default, deleting data only erases visibility of that data from the current perspective. You may of course still evict data completely as the legal status of information changes.
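
For example, a delete transaction only ends an entity's visibility from a given valid-time onwards (the id is hypothetical; a started node bound to `node` is assumed):

```clojure
;; :person-1 is no longer visible at valid-times >= 2019-01-03,
;; but earlier valid-time queries still see the full history
(crux/submit-tx node [[:crux.tx/delete :person-1 #inst "2019-01-03"]])
```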

These capabilities are known to be useful for:

  • Event Sourcing (e.g. retroactive and scheduled events and event-driven computing on evolving graphs)

  • Ingesting out-of-order temporal data from upstream timestamping systems

  • Maintaining a slowly changing dimension for decision support applications

  • Recovering from accidental data changes and application errors (e.g. billing systems)

  • Auditing all data changes and performing data forensics when necessary

  • Responding to new compliance regulations and audit requirements

  • Avoiding the need to set up additional databases for historical data and improving end-to-end data governance

  • Building historical models that factor in all historical data (e.g. insurance calculations)

  • Accounting and financial calculations (e.g payroll systems)

  • Development, simulation and testing

  • Live migrations from legacy systems using ad-hoc batches of backfilled temporal data

  • Scheduling and previewing future states (e.g. publishing and content management)

  • Reconciling temporal data across eventually consistent systems

Applied industry-specific examples include:

  • Legal Documentation – maintain visibility of all critical dates relating to legal documents, including what laws were known to be applicable at the time, and any subsequent laws that may be relevant and applied retrospectively

  • Insurance Coverage – assess the level of coverage for a beneficiary across the lifecycle of care and legislation changes

  • Reconstruction of Trades – readily comply with evolving financial regulations

  • Adverse Events in Healthcare – accurately record a patient’s records over time and mitigate human error

  • Intelligence Gathering – build an accurate model of currently known information to aid predictions and understanding of motives across time

  • Criminal Investigations – efficiently organise analysis and evidence whilst enabling a simple retracing of investigative efforts

Example Queries

Crime Investigations

This example is based on an academic paper.

"Indexing temporal data using existing B+-trees", Cheng Hian Goh, Hongjun Lu, Kian-Lee Tan, published in Data & Knowledge Engineering, 1996.
See section 7, "Support for complex queries in bitemporal databases".

During a criminal investigation it is critical to be able to refine a temporal understanding of past events as new evidence is brought to light, errors in documentation are accounted for, and speculation is corroborated. The paper referenced above gives the following query example:

Find all persons who are known to be present in the United States on day 2 (valid time), as of day 3 (transaction time)

The paper then lists a sequence of entry and departure events at various United States border checkpoints. We as the investigator will step through this sequence to monitor a set of suspects. These events will arrive in an undetermined chronological order based on how and when each checkpoint is able to manually relay the information.

Day 0

Assuming Day 0 for the investigation period is #inst "2018-12-31", the initial documents are ingested using the Day 0 valid time:

  {:crux.db/id :p2
   :entry-pt :SFO
   :arrival-time #inst "2018-12-31"
   :departure-time :na}

  {:crux.db/id :p3
   :entry-pt :LA
   :arrival-time #inst "2018-12-31"
   :departure-time :na}
  #inst "2018-12-31"

The first document shows that Person 2 was recorded entering via :SFO and the second document shows that Person 3 was recorded entering :LA.

Day 1

No new recorded events arrive on Day 1 (#inst "2019-01-01"), so there are no documents available to ingest.

Day 2

A single event arrives on Day 2 showing Person 4 arriving at :NY:

  {:crux.db/id :p4
   :entry-pt :NY
   :arrival-time #inst "2019-01-02"
   :departure-time :na}
  #inst "2019-01-02"
Day 3

Next, we learn on Day 3 that Person 4 departed from :NY, which is represented as an update to the existing document using the Day 3 valid time:

  {:crux.db/id :p4
   :entry-pt :NY
   :arrival-time #inst "2019-01-02"
   :departure-time #inst "2019-01-03"}
  #inst "2019-01-03"
Day 4

On Day 4 we begin to receive events relating to the previous days of the investigation.

First we receive an event showing that Person 1 entered :NY on Day 0, which we must ingest using the Day 0 valid time #inst "2018-12-31":

  {:crux.db/id :p1
   :entry-pt :NY
   :arrival-time #inst "2018-12-31"
   :departure-time :na}
  #inst "2018-12-31"

We then receive an event showing that Person 1 departed from :NY on Day 3, so again we ingest this document using the corresponding Day 3 valid time:

  {:crux.db/id :p1
   :entry-pt :NY
   :arrival-time #inst "2018-12-31"
   :departure-time #inst "2019-01-03"}
  #inst "2019-01-03"

Finally, we receive two events relating to Day 4, which can be ingested using the current valid time:

  {:crux.db/id :p1
   :entry-pt :LA
   :arrival-time #inst "2019-01-04"
   :departure-time :na}

  {:crux.db/id :p3
   :entry-pt :LA
   :arrival-time #inst "2018-12-31"
   :departure-time #inst "2019-01-04"}
  #inst "2019-01-04"
Day 5

On Day 5 there is an event showing that Person 2, having arrived on Day 0 (which we already knew), departed from :SFO on Day 5.

  {:crux.db/id :p2
   :entry-pt :SFO
   :arrival-time #inst "2018-12-31"
   :departure-time #inst "2019-01-05"}
  #inst "2019-01-05"
Day 6

No new recorded events arrive on Day 6 (#inst "2019-01-06"), so there are no documents available to ingest.

Day 7

On Day 7 two documents arrive. The first document corrects the previous assertion that Person 3 departed on Day 4, which was misrecorded due to human error. The second document shows that Person 3 has only just departed on Day 7, which is how the previous error was noticed.

  {:crux.db/id :p3
   :entry-pt :LA
   :arrival-time #inst "2018-12-31"
   :departure-time :na}
  #inst "2019-01-04"

  {:crux.db/id :p3
   :entry-pt :LA
   :arrival-time #inst "2018-12-31"
   :departure-time #inst "2019-01-07"}
  #inst "2019-01-07"
Day 8

Two documents have been received relating to new arrivals on Day 8. Note that Person 3 has arrived back in the country again.

  {:crux.db/id :p3
   :entry-pt :SFO
   :arrival-time #inst "2019-01-08"
   :departure-time :na}
  #inst "2019-01-08"

  {:crux.db/id :p4
   :entry-pt :LA
   :arrival-time #inst "2019-01-08"
   :departure-time :na}
  #inst "2019-01-08"
Day 9

On Day 9 we learn that Person 3 also departed on Day 8.

  {:crux.db/id :p3
   :entry-pt :SFO
   :arrival-time #inst "2019-01-08"
   :departure-time #inst "2019-01-08"}
  #inst "2019-01-09"
Day 10

A single document arrives showing that Person 5 entered at :LA earlier that day.

  {:crux.db/id :p5
   :entry-pt :LA
   :arrival-time #inst "2019-01-10"
   :departure-time :na}
  #inst "2019-01-10"
Day 11

Similarly to the previous day, a single document arrives showing that Person 7 entered at :NY earlier that day.

  {:crux.db/id :p7
   :entry-pt :NY
   :arrival-time #inst "2019-01-11"
   :departure-time :na}
  #inst "2019-01-11"
Day 12

Finally, on Day 12 we learn that Person 6 entered at :NY that same day.

  {:crux.db/id :p6
   :entry-pt :NY
   :arrival-time #inst "2019-01-12"
   :departure-time :na}
  #inst "2019-01-12"
Question Time

Let’s review the question we need to answer to aid our investigations:

Find all persons who are known to be present in the United States on day 2 (valid time), as of day 3 (transaction time)

We are able to easily express this as a query in Crux:

  {:find [p entry-pt arrival-time departure-time]
   :where [[p :entry-pt entry-pt]
           [p :arrival-time arrival-time]
           [p :departure-time departure-time]]}
  #inst "2019-01-03"                    ; `as of` transaction time
  #inst "2019-01-02"                    ; `as at` valid time

The answer given by Crux is a simple set of the three relevant people along with the details of their last entry and confirmation that none of them were known to have yet departed at this point:

  #{[:p2 :SFO #inst "2018-12-31" :na]
    [:p3 :LA #inst "2018-12-31" :na]
    [:p4 :NY #inst "2019-01-02" :na]}

Retroactive Data Structures

At a theoretical level Crux has similar properties to retroactive data structures, which are data structures that support "efficient modifications to a sequence of operations that have been performed on the structure […] modifications can take the form of retroactive insertion, deletion or updating of an operation that was performed at some time in the past".

Crux’s bitemporal indexes are partially persistent due to the immutability of transaction time. This allows you to query any previous version, but only update the latest version. The efficient representation of valid time in the indexes makes Crux "fully retroactive", which is analogous to partial persistence in the temporal dimension, and enables globally-consistent reads.

Crux does not natively implement "non-oblivious retroactivity" (i.e. persisted queries and cascading corrections), although this is an important area of investigation for event sourcing applications, temporal constraints, and reactive bitemporal queries.

In summary, the Crux indexes as a whole could be described as a "partially persistent and fully retroactive data structure".

Get Started


This guide contains simple steps showing how to transact data and run a simple query. However, there are a few topics you might benefit from learning about before you get too far with attempting to use Crux:

  • EDN – the extensible data notation format used throughout the Crux APIs, see Essential EDN for Crux.

  • The Datalog query language – Crux supports an EDN-flavoured version of Datalog. The Queries section within this documentation provides a good overview. You can also find an interactive tutorial for EDN-flavoured Datalog here.

  • Clojure – whilst a complete Java and HTTP API is provided, a basic understanding of Clojure is recommended. Clojure is a succinct and pragmatic data-oriented language with strong support for immutability and parallelism.

Setting Up

Follow the steps below to quickly set up a Crux playground…

Project Dependency

First add the crux-core module as a project dependency:

  juxt/crux-core {:mvn/version "20.07-1.9.2-beta"}

Start a Crux node

(require '[crux.api :as crux]
         '[clojure.java.io :as io])

(defn start-standalone-node ^crux.api.ICruxAPI [storage-dir]
  (crux/start-node {:crux.node/topology '[crux.standalone/topology]
                    :crux.kv/db-dir (str (io/file storage-dir "db"))}))

(comment ; which can be used as
  (def node (start-standalone-node "crux-store")))

For the purposes of this "Hello World" we are using the simplest configuration of Crux, which only requires the crux-core module. All of the logs and indexes are held purely in-memory, so your data won’t be persisted across restarts. This is useful when testing and experimenting as there is no additional complexity or stateful use of Kafka or RocksDB to think about.

Once started, the node gives us access to an empty Crux database instance and an API for running transactions and issuing queries. Depending on how you configure your topology, multiple nodes may share the same database, but in this case your node exclusively owns the database instance. If a Kafka or JDBC module topology were used instead, the same database instance would be available across multiple nodes to provide fault-tolerant durability, high availability and horizontal scaling.


(crux/submit-tx
 node
 [[:crux.tx/put
   {:crux.db/id :dbpedia.resource/Pablo-Picasso ; id
    :name "Pablo"
    :last-name "Picasso"}
   #inst "2018-05-18T09:20:27.966-00:00"]]) ; valid time


A query executes against a db context. This context represents the database "as a value" at a fixed, consistent point in time, against which the query and entity APIs can be used. The db context should be thought of as a lightweight reference, not a container for a resource or lock. The point in time is implicitly "now" unless otherwise specified, in which case the context represents the latest view of the database following the most recently processed transaction. The main query API is eager, but you may also consume the results lazily, as the entire Crux query engine and index structure is lazy by design.

(crux/q (crux/db node)
        '{:find [e]
          :where [[e :name "Pablo"]]})

You should get:

  #{[:dbpedia.resource/Pablo-Picasso]}

An entity query would be:

(crux/entity (crux/db node) :dbpedia.resource/Pablo-Picasso)

You should get:

{:crux.db/id :dbpedia.resource/Pablo-Picasso
 :name "Pablo"
 :last-name "Picasso"}

Next Steps

Now you know the basics of how to interact with Crux you may want to dive into our tutorials below. Otherwise, let’s take a look at further options for setting up Crux.


Space Adventure

The "choose your own adventure" style tutorial series is an interactive story that you can follow along with and complete assignments.

For the no-install browser-based tutorial, follow the Nextjournal edition.

Otherwise, visit the original version in our blog post.

A Bitemporal Tale

For an interactive no-install browser-based tutorial see the Nextjournal edition of the Tale.

Otherwise, see the original version in our blog post.

Essential EDN

edn (Extensible Data Notation)

; Comments start with a semicolon.
; Anything after the semicolon is ignored.

nil         ; also known in other languages as null

; Booleans
true
false

; Strings are enclosed in double quotes
"time travel is fun"
"time traveller's fun"

; Keywords start with a colon. They behave like enums. Kind of
; like symbols in Ruby.
:time-travel
:bitemporality

; Symbols are used to represent identifiers.
; You can namespace symbols by using /. Whatever precedes / is
; the namespace of the symbol.
kitchen/spoon ; not the same as spoon
github/fork   ; you can't eat with this

; Underscore is a valid symbol identifier that has a special
; meaning in Crux Datalog where it is treated like a wildcard
; that prevents binding/unification. These are called "blanks".

; Integers and floats
42
3.14159

; Lists are sequences of values
(:widget :sprocket 9 "some text!")

; Vectors allow random access. Kind of like arrays in JavaScript.
[:first 1 2 :fourth]

; Maps are associative data structures that associate the key with its value
{:avocado     2
 :pepper      1
 :lemon-juice 3.5}

; You may use commas for readability. They are treated as whitespace.
{:avocado 2, :pepper 1, :lemon-juice 3.5}

; Sets are collections that contain unique elements.
#{:a :b 88 "huat"}

; Quoting is used by languages to prevent evaluation of an edn data
; structure. In Clojure the apostrophe is used as the short-hand for
; quoting and it enables you to easily construct complex Crux queries.
; Without the apostrophes inside this map, Clojure would expect `a`,
; `b`, and `c` to be valid symbols.
{:find '[a b c]
 :where [['a 'b 'c]]}

; Adapted from
; License
; © 2019 Jason Yeo, Jonathan D Johnston

For further information on EDN, see Official EDN Format



To start a Crux node, use the Java API or the Clojure crux.api.

Within Clojure, we call start-node from within crux.api, passing it a set of options for the node. There are a number of different configuration options a Crux node can have, grouped into topologies.

Table 2. Crux Topologies
Name              Transaction Log
crux.standalone   Uses local event log
crux.kafka        Uses Kafka
crux.jdbc         Uses JDBC event log


Use a Kafka node when horizontal scalability is required or when you want the guarantees that Kafka offers in terms of resiliency, availability and retention of data.

Multiple Kafka nodes participate in a cluster with Kafka as the primary store and as the central means of coordination.

The JDBC node is useful when you don’t want the overhead of maintaining a Kafka cluster. Read more about the motivations of this setup here.

The Standalone node is a single Crux instance which has everything it needs locally. This is good for experimenting with Crux and for small to medium sized deployments, where running a single instance is permissible.

Crux nodes implement the ICruxAPI interface and are the starting point for making use of Crux. Nodes also implement java.io.Closeable and can therefore be lifecycle managed.


The following properties are within the topology used as a base for the other topologies, crux.node:

Table 3. crux.node configuration
Property Default Value



From version 20.01-1.7.0-alpha-SNAPSHOT, the kv-store should be specified by including an extra module in the node’s topology vector. For example, a RocksDB backend looks like {:crux.node/topology '[crux.standalone/topology crux.kv.rocksdb/kv-store]}.

The following set of options are used by KV backend implementations, defined within crux.kv:

Table 4. crux.kv options
Property                                Description
:crux.kv/db-dir                         Directory to store K/V files
:crux.kv/sync?                          Sync the KV store to disk after every write?
:crux.kv/check-and-store-index-version  Check and store index version upon start?


Standalone Node

Using a Crux standalone node is the best way to get started. Once you’ve started a standalone Crux instance as described below, you can then follow the getting started example.

Local Standalone Mode
Table 5. Standalone configuration
Property                              Description
:crux.standalone/event-log-kv-store   Key/Value store to use for standalone event-log persistence
:crux.standalone/event-log-dir        Directory used to store the event-log and used for backup/restore, e.g. "data/eventlog-1"
:crux.standalone/event-log-sync?      Sync the event-log backend KV store to disk after every write?


Project Dependency

  juxt/crux-core {:mvn/version "20.07-1.9.2-beta"}

Getting started

The following code creates a default crux.standalone node which runs completely within memory (with both the event-log store and db store using crux.kv.memdb/kv):

(require '[crux.api :as crux]
         '[clojure.java.io :as io])

(defn start-standalone-node ^crux.api.ICruxAPI [storage-dir]
  (crux/start-node {:crux.node/topology '[crux.standalone/topology]
                    :crux.kv/db-dir (str (io/file storage-dir "db"))}))

(comment ; which can be used as
  (def node (start-standalone-node "crux-store")))

You can later stop the node if you wish:

(.close node)


RocksDB is often used as Crux’s primary store (in place of the in-memory kv store in the example above). In order to use RocksDB within Crux, however, you must first add RocksDB as a project dependency:

Project Dependency

  juxt/crux-rocksdb {:mvn/version "20.07-1.9.2-beta"}

Starting a node using RocksDB

(defn start-rocks-node [storage-dir]
  (crux/start-node {:crux.node/topology '[crux.standalone/topology
                                          crux.kv.rocksdb/kv-store]
                    :crux.kv/db-dir (str (io/file storage-dir "db"))}))

You can create a node with custom RocksDB options by passing extra keywords in the topology. These are:

  • :crux.kv.rocksdb/disable-wal?, which takes a boolean (if true, disables the write ahead log)

  • :crux.kv.rocksdb/db-options, which takes a RocksDB 'Options' object (see more here, from the RocksDB javadocs)
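
For instance, a sketch that disables the write-ahead log on a standalone RocksDB topology (the function name and storage path are illustrative):

```clojure
(defn start-tuned-rocks-node [storage-dir]
  (crux/start-node {:crux.node/topology '[crux.standalone/topology
                                          crux.kv.rocksdb/kv-store]
                    :crux.kv/db-dir (str (io/file storage-dir "db"))
                    ;; faster ingest, at the cost of durability on crash
                    :crux.kv.rocksdb/disable-wal? true}))
```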

To include RocksDB metrics in monitoring, crux.kv.rocksdb/kv-store-with-metrics should be included in the topology vector instead of the above.


An alternative to RocksDB, LMDB provides faster queries in exchange for a slower ingest rate.

Project Dependency

  juxt/crux-lmdb {:mvn/version "20.07-1.9.2-alpha"}

Starting a node using LMDB

(defn start-lmdb-node [storage-dir]
  (crux/start-node {:crux.node/topology '[crux.standalone/topology
                                          crux.kv.lmdb/kv-store]
                    :crux.kv/db-dir (str (io/file storage-dir "db"))}))

Kafka Nodes

When using Crux at scale it is recommended to use multiple Crux nodes connected via a Kafka cluster.

Local Cluster Mode

Kafka nodes have the following properties:

Table 6. Kafka node configuration
Property                            Description
:crux.kafka/bootstrap-servers       URL for connecting to Kafka
:crux.kafka/tx-topic                Name of Kafka transaction log topic
:crux.kafka/doc-topic               Name of Kafka documents topic
:crux.kafka/create-topics           Option to automatically create Kafka topics if they do not already exist
:crux.kafka/doc-partitions          Number of partitions for the document topic
:crux.kafka/replication-factor      Number of times to replicate data on Kafka
:crux.kafka/kafka-properties-file   File to supply Kafka connection properties to the underlying Kafka API
:crux.kafka/kafka-properties-map    Map to supply Kafka connection properties to the underlying Kafka API

Project Dependencies

  juxt/crux-core {:mvn/version "20.07-1.9.2-beta"}
  juxt/crux-kafka {:mvn/version "20.07-1.9.2-beta"}

Getting started

Use the API to start a Kafka node, configuring it with the bootstrap-servers property in order to connect to Kafka:

(defn start-cluster [kafka-port storage-dir]
  (crux/start-node {:crux.node/topology '[crux.kafka/topology crux.kv.rocksdb/kv-store]
                    :crux.kafka/bootstrap-servers (str "localhost:" kafka-port)
                    :crux.kv/db-dir (str (io/file storage-dir "db"))}))

If you don’t specify a kv-store then by default the Kafka node will use RocksDB. You will need to add RocksDB to your list of project dependencies.

You can later stop the node if you wish:

(.close node)

Embedded Kafka

Crux is ready to work with an embedded Kafka for when you don’t have an independently running Kafka available to connect to (such as during development).

Project Dependencies

  juxt/crux-core {:mvn/version "20.07-1.9.2-beta"}
  juxt/crux-kafka-embedded {:mvn/version "20.07-1.9.2-beta"}

Getting started

(require '[crux.kafka.embedded :as ek])

(defn start-embedded-kafka [kafka-port storage-dir]
  (ek/start-embedded-kafka {:crux.kafka.embedded/zookeeper-data-dir (str (io/file storage-dir "zk-data"))
                            :crux.kafka.embedded/kafka-log-dir (str (io/file storage-dir "kafka-log"))
                            :crux.kafka.embedded/kafka-port kafka-port}))

You can later stop the Embedded Kafka if you wish:

(.close embedded-kafka)

JDBC Nodes

JDBC Nodes use next.jdbc internally and pass through the relevant configuration options that you can find here.

Local Cluster Mode

Below is the minimal configuration you will need:

Table 7. Minimal JDBC Configuration
Property            Description
:crux.jdbc/dbtype   One of: postgresql, oracle, mysql, h2, sqlite
:crux.jdbc/dbname   Database Name

Depending on the type of JDBC database used, you may also need some of the following properties:

Table 8. Other JDBC Properties
Property              Description
:crux.jdbc/dbdir      For h2 and sqlite
:crux.jdbc/host       Database Host
:crux.jdbc/user       Database Username
:crux.jdbc/password   Database Password

Project Dependencies

  juxt/crux-core {:mvn/version "20.07-1.9.2-beta"}
  juxt/crux-jdbc {:mvn/version "20.07-1.9.2-beta"}

Getting started

Use the API to start a JDBC node, configuring it with the required parameters:

(defn start-jdbc-node []
  (crux/start-node {:crux.node/topology '[crux.jdbc/topology]
                    :crux.jdbc/dbtype "postgresql"
                    :crux.jdbc/dbname "cruxdb"
                    :crux.jdbc/host "<host>"
                    :crux.jdbc/user "<user>"
                    :crux.jdbc/password "<password>"}))


Crux can be used programmatically as a library, but it also ships with an embedded HTTP server that allows clients to use the API remotely via REST.

Remote Cluster Mode

Set the server-port configuration property on a Crux node to expose an HTTP port that will accept REST requests:

Table 9. HTTP Nodes Configuration
Property                Description
:crux.http-server/port  Port for Crux HTTP Server e.g. 8080

Visit the guide on using the REST api for examples of how to interact with Crux over HTTP.

Starting an HTTP Server

Project Dependency

  juxt/crux-http-server {:mvn/version "20.07-1.9.2-alpha"}

You can start up an HTTP server on a node by including crux.http-server/module in your topology, optionally passing the server port:

(defn start-standalone-http-node [port storage-dir]
  (crux/start-node {:crux.node/topology '[crux.standalone/topology crux.http-server/module]
                    :crux.kv/db-dir (str (io/file storage-dir "db"))
                    :crux.http-server/port port
                    ;; by default, the HTTP server is read-write - set this flag to make it read-only
                    :crux.http-server/read-only? false}))

Using a Remote API Client

Project Dependency

  juxt/crux-http-client {:mvn/version "20.07-1.9.2-beta"}

To connect to a pre-existing remote node, you need a URL to the node and the above on your classpath. We can then call crux.api/new-api-client, passing the URL. If the node was started on localhost:3000, you can connect to it by doing the following:

(defn start-http-client [port]
  (crux/new-api-client (str "http://localhost:" port)))
The remote client requires valid and transaction time to be specified for all calls to crux/db.


If you wish to use Crux with Docker (no JVM/JDK/Clojure install required!) we have the following:

  • Crux HTTP Node: An image of a standalone Crux node (using an in-memory kv-store by default) & HTTP server, useful if you wish to run a freestanding Crux node accessible over HTTP, only having to use Docker.


Alongside the various images available on Dockerhub, there are a number of artifacts available for getting started quickly with Crux. These can be found on the latest release of Crux. Currently, these consist of a number of common configuration uberjars and a custom artifact builder.

To create your own custom artifacts for Crux, do the following:

  • Download and extract the crux-builder.tar.gz from the latest release

  • You can build an uberjar using either Clojure’s deps.edn or Maven (whichever you’re more comfortable with)

    • For Clojure, you can add further Crux dependencies in the deps.edn file, set the node config in crux.edn, and run the build script

    • For Maven, it’s the same, but dependencies go in pom.xml

  • Additionally, you can build a Docker image using the script in the docker directory.

Backup and Restore

Crux provides utility APIs for local backup and restore when you are using the standalone mode.

An additional example of backup and restore, which only applies to a stopped standalone node, is provided here.

In a clustered deployment, only Kafka’s official backup and restore functionality should be relied on to provide safe durability. The standalone mode’s backup and restore operations can instead be used for creating operational snapshots of a node’s indexes for scaling purposes.


Crux can display metrics through a variety of interfaces. Internally, it uses Dropwizard's Metrics library to register all the metrics and then passes the registry to reporters, which expose the data to a suitable monitoring application.

Project Dependency

In order to use any of the crux-metrics reporters, you will need to include the following dependency on crux-metrics:

  juxt/crux-metrics {:mvn/version "20.07-1.9.2-alpha"}

The various types of metric reporters bring in their own sets of dependencies, so we expect these to be provided by the user in their own project (in order to keep the core of crux-metrics as lightweight as possible). Reporters requiring further dependencies will have an 'additional dependencies' section.

Getting Started

By default, indexer and query metrics are included. It is also possible to add RocksDB metrics when RocksDB is being used. These arguments can be used whenever any of the metrics-reporting topologies are included.

Table 10. Registry arguments
  • Includes indexer metrics in the metrics registry (enabled by default)
  • Includes query metrics in the metrics registry (enabled by default)

RocksDB metrics

To include the RocksDB metrics when monitoring, the 'crux.kv.rocksdb/kv-store-with-metrics module should be included in the topology map (in place of 'crux.kv.rocksdb/kv-store):

(api/start-node {:crux.node/topology ['crux.standalone/topology
                                      'crux.kv.rocksdb/kv-store-with-metrics]})


Crux currently supports the following outputs:

  • Console
  • CSV
  • JMX
  • Prometheus
  • AWS Cloudwatch

Console

This component logs metrics to sysout at regular intervals.

(api/start-node {:crux.node/topology ['crux.standalone/topology
                                      ;; console reporter module name assumed
                                      'crux.metrics.dropwizard.console/reporter]})

Table 11. Console metrics arguments
  • Interval in seconds between output dumps
  • Unit in which rates are displayed
  • Unit in which durations are displayed


CSV

This component logs metrics to a csv file at regular intervals. Only the file name is required.

(api/start-node {:crux.node/topology ['crux.standalone/topology
                                      ;; csv reporter module name assumed
                                      'crux.metrics.dropwizard.csv/reporter]
                 :crux.metrics.dropwizard.csv/file-name "csv-out"})

Table 12. CSV metrics arguments
  • :crux.metrics.dropwizard.csv/file-name (required): Output folder name (must already exist)
  • Interval in seconds between file writes
  • Unit in which rates are displayed
  • Unit in which durations are displayed


JMX

Provides JMX mbeans output.

Additional Dependencies

You will need to add the following dependencies, alongside crux-metrics, in your project:

   [io.dropwizard.metrics/metrics-jmx "4.1.2"]

Getting Started

(api/start-node {:crux.node/topology ['crux.standalone/topology
                                      ;; jmx reporter module name assumed
                                      'crux.metrics.dropwizard.jmx/reporter]})

Table 13. JMX metrics arguments
  • Change metrics domain group
  • Unit in which rates are displayed
  • Unit in which durations are displayed


Prometheus

Additional Dependencies

You will need to add the following dependencies, alongside crux-metrics, in your project:

   [org.dhatim/dropwizard-prometheus "2.2.0"]
   [io.prometheus/simpleclient_pushgateway "0.8.1"]
   [io.prometheus/simpleclient_dropwizard "0.8.1"]
   [io.prometheus/simpleclient_hotspot "0.8.1"]
   [clj-commons/iapetos "0.1.9"]

The prometheus http exporter starts a standalone server hosting prometheus metrics by default at http://localhost:8080/metrics. The port can be changed with an argument, and jvm metrics can be included in the dump.

Getting Started

(api/start-node {:crux.node/topology ['crux.standalone/topology
                                      ;; http-exporter module name assumed
                                      'crux.metrics.dropwizard.prometheus/http-exporter]})

Table 14. Prometheus exporter metrics arguments
  • Desired port number for the prometheus client server. Defaults to 8080
  • If true, JVM metrics are included in the metrics dump


This component pushes prometheus metrics to a specified pushgateway at regular durations (by default 1 second).

Getting Started

(api/start-node {:crux.node/topology ['crux.standalone/topology
                                      ;; pushgateway reporter module name assumed
                                      'crux.metrics.dropwizard.prometheus/reporter]
                 :crux.metric.dropwizard.prometheus/pushgateway "localhost:9090"})

Table 15. Prometheus reporter metrics arguments
  • :crux.metric.dropwizard.prometheus/pushgateway (required): Address of the prometheus server
  • Time in ISO-8601 duration format between metrics pushes. Defaults to "PT1S"
  • Prefix all metric titles with this string

AWS Cloudwatch metrics

Pushes metrics to Cloudwatch. This is intended to be used with a Crux node running inside an EBS/Fargate instance. It attempts to get the relevant credentials through system variables. Crux uses this in its AWS benchmarking system, which can be found here.

Additional Dependencies

You will need to add the following dependencies, alongside crux-metrics, in your project:

   [io.github.azagniotov/dropwizard-metrics-cloudwatch "2.0.3"]
   [ "2.10.61"]

Getting Started

(api/start-node {:crux.node/topology ['crux.standalone/topology
                                      ;; cloudwatch reporter module name assumed
                                      'crux.metrics.dropwizard.cloudwatch/reporter]})

Table 16. Cloudwatch metrics arguments
  • Time between metrics pushes
  • When true, the reporter outputs to clojure.tools.logging/log* instead of pushing
  • Should JVM metrics be included in the pushed metrics?
  • Cloudwatch region for uploading metrics. Not required inside an EBS/Fargate instance but needed for local testing
  • A list of strings to ignore specific metrics, in gitignore format, e.g. ["crux.tx" "!crux.tx.ingest"] would ignore crux.tx.*, except crux.tx.ingest

Tips for running

To upload metrics to Cloudwatch locally, the desired region needs to be specified with :crux.metrics.dropwizard.cloudwatch/region, and your AWS credentials at ~/.aws/credentials need to be visible (if run in Docker, mount these as a volume).

When run on AWS, if using CloudFormation, the node needs the 'cloudwatch:PutMetricData' permission. For an example see Crux's benchmarking system here.



There are four transaction (write) operations:

Table 17. Write Operations
Operation        Purpose
:crux.tx/put     Write a version of a document
:crux.tx/delete  Deletes the specific document at a given valid time
:crux.tx/match   Check the document state against the given document
:crux.tx/evict   Evicts a document entirely, including all historical versions

A document looks like this:

{:crux.db/id :dbpedia.resource/Pablo-Picasso
 :name "Pablo"
 :last-name "Picasso"}

In practice when using Crux, one calls crux.api/submit-tx with a sequence of transaction operations:

(crux/submit-tx node
                [[:crux.tx/put
                  {:crux.db/id :dbpedia.resource/Pablo-Picasso
                   :name "Pablo"
                   :last-name "Picasso"}
                  #inst "2018-05-18T09:20:27.966-00:00"]])

If the transaction contains pre-conditions, all pre-conditions must pass or the entire transaction is aborted. This happens at the query node during indexing, and not when submitting the transaction.

For operations containing documents, the id and the document are hashed, and the operation and hash is submitted to the tx-topic in the event log. The document itself is submitted to the doc-topic, using its content hash as key. In Kafka, the doc-topic is compacted, which enables later deletion of documents.
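
To illustrate the content-hash idea, here is a small sketch in plain Clojure. It is an illustration only: Crux canonicalises and hashes documents internally, so real content hashes will not match this output.

```clojure
(import 'java.security.MessageDigest)

(defn sha1-hex
  "Return the SHA-1 digest of a string as lowercase hex."
  [^String s]
  (->> (.digest (MessageDigest/getInstance "SHA-1") (.getBytes s "UTF-8"))
       (map #(format "%02x" %))
       (apply str)))

;; the serialised document acts as the key: identical content always
;; yields an identical hash, so the compacted doc-topic stores each
;; distinct version exactly once
(sha1-hex (pr-str {:crux.db/id :dbpedia.resource/Pablo-Picasso
                   :name "Pablo" :last-name "Picasso"}))
```

Because the key is derived purely from content, re-submitting an identical document is naturally idempotent.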

Valid IDs

The following types of :crux.db/id are allowed:

  • Keyword (e.g. {:crux.db/id :my-id} or {:crux.db/id :dbpedia.resource/Pablo-Picasso})

  • UUID (e.g. {:crux.db/id #uuid "6f0232d0-f3f9-4020-a75f-17b067f41203"} or {:crux.db/id #crux/id "6f0232d0-f3f9-4020-a75f-17b067f41203"})

  • URI (e.g. {:crux.db/id #crux/id "mailto:crux@juxt.pro"})

  • URL (e.g. {:crux.db/id #crux/id "https://github.com/juxt/crux"}), including http, https, ftp and file protocols

  • Maps (e.g. {:crux.db/id #crux/id {:this :id-field}}) (Note: see issue #362).

The #crux/id reader literal will take any string and attempt to coerce it into a valid ID. Use of #crux/id with a valid ID type will also work (e.g. {:crux.db/id #crux/id :my-id}).

URIs and URLs are interpreted using Java classes (java.net.URI and java.net.URL respectively) and therefore you can also use these directly.



Put

Puts a document into Crux. If a document already exists with the given :crux.db/id, a new version of this document will be created at the supplied valid time.

[:crux.tx/put
 {:crux.db/id :dbpedia.resource/Pablo-Picasso :first-name :Pablo} (1)
 #inst "2018-05-18T09:20:27.966-00:00"] (2)

  1. The document itself. Note that the ID must be included as part of the document.

  2. valid time

Note that valid time is optional and defaults to transaction time, which is taken from the Kafka log.

Crux puts into the past at a single point by default, so to overwrite several versions across a range in valid time, you can either submit a transaction containing several operations or supply a third argument to specify an end valid time. This period is inclusive-exclusive, such that the start of the validity period is included in the validity range, while the end is excluded.
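
For example, a put that bounds a version to a validity range might look like the following (the dates and document are illustrative):

[:crux.tx/put
 {:crux.db/id :dbpedia.resource/Pablo-Picasso :first-name :Pablo}
 #inst "2018-05-18T09:20:27.966-00:00"  ;; start valid time (inclusive)
 #inst "2018-06-18T09:20:27.966-00:00"] ;; end valid time (exclusive)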


Delete

Deletes a document at a given valid time. Historical versions of the document will still be available.

[:crux.tx/delete :dbpedia.resource/Pablo-Picasso
#inst "2018-05-18T09:21:52.151-00:00"]


Match

Match operations check the current state of an entity - if the entity doesn’t match the provided doc, the transaction will not continue. You can also pass nil to check that the entity doesn’t exist prior to your transaction.

[:crux.tx/match
 :ivan (1)
 {..} (2)
 #inst "2018-05-18T09:21:31.846-00:00"] (3)

  1. Entity id

  2. Document (or nil)

  3. (optional) valid time


Evict

Evicts a document from Crux. Historical versions of the document will no longer be available.

[:crux.tx/evict :dbpedia.resource/Pablo-Picasso]

Transaction functions

Transaction functions are user-supplied functions that run on the individual Crux nodes when a transaction is being ingested. They can take any number of parameters, and return normal transaction operations which are then indexed as above. If they return false or throw an exception, the whole transaction will roll back.

Transaction functions can be used, for example, to safely check the current database state before applying a transaction, for integrity checks, or to patch an entity.

Transaction functions are created/updated by submitting a document to Crux with a crux.db/fn key. These functions are passed a 'context' parameter, which can be used to obtain a database value using db or open-db.

(crux/submit-tx node [[:crux.tx/put {:crux.db/id :increment-age
                                     ;; note that the function body is quoted.
                                     :crux.db/fn '(fn [ctx eid]
                                                    (let [db (crux.api/db ctx)
                                                          entity (crux.api/entity db eid)]
                                                      [[:crux.tx/put (update entity :age inc)]]))}]])

You can then invoke these transaction functions by submitting a :crux.tx/fn operation:

(crux/submit-tx node [[:crux.tx/put {:crux.db/id :ivan, :age 40}]])
(crux/submit-tx node [[:crux.tx/fn :increment-age :ivan]])

;; once those transactions have been indexed

(crux/entity (crux/db node) :ivan)
;; => {:crux.db/id :ivan, :age 41}


You can subscribe to Crux events using the (crux.api/listen node event-opts f) function. Currently we expose one event type, :crux/indexed-tx, called when Crux indexes a transaction.

(require '[crux.api :as crux])

(crux/listen node {:crux/event-type :crux/indexed-tx, :with-tx-ops? true}
  (fn [ev]
    (println "event received!")
    (clojure.pprint/pprint ev)))

(crux/submit-tx node [[:crux.tx/put {:crux.db/id :ivan, :name "Ivan"}]])


event received!
{:crux/event-type :crux/indexed-tx,
 :crux.tx/tx-id ...,
 :crux.tx/tx-time #inst "...",
 :committed? true,
 :crux/tx-ops [[:crux.tx/put {:crux.db/id :ivan, :name "Ivan"}]]}

You can .close the return value from (crux.api/listen ...) to detach the listener, should you need to.



Crux is a document database that provides you with a comprehensive means of traversing and querying across all of your documents and data without any need to define a schema ahead of time. This is possible because Crux is "schemaless" and automatically indexes the top-level fields in all of your documents to support efficient ad-hoc joins and retrievals. With these capabilities you can quickly build queries that match directly against the relations in your data without worrying too much about the shape of your documents or how that shape might change in future.

Crux is also a graph database. The central characteristic of a graph database is that it can support arbitrary-depth graph queries (recursive traversals) very efficiently by default, without any need for schema-level optimisations. Crux gives you the ability to construct graph queries via a Datalog query language and uses graph-friendly indexes to provide a powerful set of querying capabilities. Additionally, when Crux’s indexes are deployed directly alongside your application you are able to easily blend Datalog and code together to construct highly complex graph algorithms.

This page walks through many of the more interesting queries that run as part of Crux’s default test suite. See test/crux/query_test.clj for the full suite of query tests and how each test listed below runs in the wider context.

Extensible Data Notation (edn) is used as the data format for the public Crux APIs. To gain an understanding of edn see Essential EDN for Crux.

Note that all Crux Datalog queries run using a point-in-time view of the database which means the query capabilities and patterns presented in this section are not aware of valid times or transaction times.

Basic Query

A Datalog query consists of a set of variables and a set of clauses. The result of running a query is a result set of the possible combinations of values that satisfy all of the clauses at the same time. These combinations of values are referred to as "tuples".

The possible values within the result tuples are derived from your database of documents. The documents themselves are represented in the database indexes as "entity–attribute–value" (EAV) facts. For example, a single document {:crux.db/id :myid :color "blue" :age 12} is transformed into two facts [[:myid :color "blue"][:myid :age 12]].
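
That decomposition can be sketched in a few lines of plain Clojure. This is a simplification for illustration (it ignores how Crux also decomposes vector values, described later) and is not Crux's actual indexing code:

```clojure
(defn doc->eav
  "Decompose a document map into [entity attribute value] facts."
  [doc]
  (let [eid (:crux.db/id doc)]
    (vec (for [[a v] (dissoc doc :crux.db/id)]
           [eid a v]))))

(doc->eav {:crux.db/id :myid :color "blue" :age 12})
;; => [[:myid :color "blue"] [:myid :age 12]]
```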

In the most basic case, a Datalog query works by searching for "subgraphs" in the database that match the pattern defined by the clauses. The values within these subgraphs are then returned according to the list of return variables requested in the :find vector within the query.

Our first query runs on a database that contains the following 3 documents which get broken apart and indexed as "entities":

        [{:crux.db/id :ivan
          :name "Ivan"
          :last-name "Ivanov"}

         {:crux.db/id :petr
          :name "Petr"
          :last-name "Petrov"}

         {:crux.db/id :smith
          :name "Smith"
          :last-name "Smith"}]

Note that :ivan, :petr and :smith are edn keywords, which may be used as document IDs in addition to UUIDs.

The following query has 3 clauses, represented as edn vectors within the :where vector. These clauses constrain the result set to match only the entity (or subgraph of interconnected entities) that satisfy all 3 clauses at once:

 '{:find [p1]
   :where [[p1 :name n]
           [p1 :last-name n]
           [p1 :name "Smith"]]}

Let’s try to work out what these 3 clauses do…​

p1 and n are logical variables. Logic variables are often prefixed with ? for clarity but this is optional.

[p1 :name n] is looking for all entities that have a value under the attribute of :name and then binds the corresponding entity ID to p1 and the corresponding value to n. Since all 3 entities in our database have a :name attribute, this clause alone will simply return all 3 entities.

[p1 :last-name n] reuses the variable n from the previous clause which is significant because it constrains the query to only look for entities where the value of :name (from the first clause) is equal to the value of :last-name (from the second clause). Looking at the documents in our database, only one entity can be returned, because only it has the same value for :name and :last-name.

[p1 :name "Smith"] only serves to reinforce the conclusion from the previous two clauses which is that the variable n can only be matched against the string "Smith" within our database.

…​so what is the actual result of the query? Well that is defined by the :find vector which states that only the values corresponding to p1 should be returned, which in this case is simply :smith (the keyword database ID for the document relating to our protagonist "Smith Smith"). Results are returned as an edn set, which means duplicate results will not appear.

Passing the query into crux.api/q (see how to submit a query in 'Get Started'), you get an edn result set containing the value :smith
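
In full, such a call might look like the following (assuming a started node bound to node):

(require '[crux.api :as crux])

(crux/q (crux/db node)
        '{:find [p1]
          :where [[p1 :name n]
                  [p1 :last-name n]
                  [p1 :name "Smith"]]})
;; => #{[:smith]}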



For the next set of queries we will again use the same set of documents for our database as used in the previous section:

        [{:crux.db/id :ivan
          :name "Ivan"
          :last-name "Ivanov"}

         {:crux.db/id :petr
          :name "Petr"
          :last-name "Petrov"}

         {:crux.db/id :smith
          :name "Smith"
          :last-name "Smith"}]

Query: "Match on entity ID and value"

 {:find '[n]
  :where '[[e :name n]]
  :args [{'e :ivan
          'n "Ivan"}]}

Our first query supplies two arguments to the query via a map within the :args vector. The effect of this is to make sure that regardless of whether other :name values in the database also equal "Ivan", that only the entity with an ID matching our specific :ivan ID is considered within the query. Use of arguments means we can avoid hard-coding values directly into the query clauses.

Result Set:

#{["Ivan"]}

Query: "Match entities with given values"

 {:find '[e]
  :where '[[e :name n]]
  :args [{'n "Ivan"}
         {'n "Petr"}]}

This next query shows how multiple argument values can be mapped to a single field. This allows us to usefully parameterise the input to a query such that we do not have to rerun a single query multiple times (which would be significantly less efficient!).

Result Set:

#{[:petr] [:ivan]}

Query: "Match entities with given value tuples"

 {:find '[e]
  :where '[[e :name n]
           [e :last-name l]]
  :args [{'n "Ivan" 'l "Ivanov"}
         {'n "Petr" 'l "Petrov"}]}

Here we see how we can extend the parameterisation to match using multiple fields at once.

Result Set:

#{[:petr] [:ivan]}

Query: "Use range constraints with arguments"

 {:find '[age]
  :where '[[(>= age 21)]]
  :args [{'age 22}]}

Finally we can see how we can return an argument that passes all of the predicates by including it in the :find vector. This essentially bypasses any interaction with the data in our database.

Result Set:

#{[22]}


Something else we can do with arguments is apply predicates to them directly within the clauses. Predicates return either true or false but all predicates used in clauses must return true in order for the given combination of field values to be part of the valid result set:

 {:find '[n]
  :where '[[(re-find #"I" n)]
           [(= l "Ivanov")]]
  :args [{'n "Ivan" 'l "Ivanov"}
         {'n "Petr" 'l "Petrov"}]}

In this case only :name "Ivan" satisfies [(re-find #"I" n)] (which returns a truthy match for any value containing "I"). Any fully qualified Clojure function that returns a boolean can be used in place of re-find, for example:

   {:find '[age]
    :where '[[(odd? age)]]
    :args [{'age 22} {'age 21}]}



Valid time travel

Congratulations! You already know enough about queries to build a simple CRUD application with Crux. However, your manager has just told you that the new CRUD application you have been designing needs to backfill the historical document versions from the legacy CRUD application. Luckily Crux makes it easy for your application to both insert and retrieve these old versions.

Here we will see how you are able to run queries at a given point in the valid time axis against, implicitly, the most recent transaction time.

First, we transact a very old document into the database with the ID :malcolm and the :name "Malcolm", and specify the valid time instant at which this document became valid in the legacy system: #inst "1986-10-22".

(api/submit-tx
  my-crux-system
  [[:crux.tx/put
    {:crux.db/id :malcolm :name "Malcolm" :last-name "Sparks"}
    #inst "1986-10-22"]])

Next we transact a slightly more recent (though still very old!) revision of that same document where the :name has been corrected to "Malcolma", again using a historical timestamp extracted from the legacy system.

(api/submit-tx
  my-crux-system
  [[:crux.tx/put
    {:crux.db/id :malcolm :name "Malcolma" :last-name "Sparks"}
    #inst "1986-10-24"]])

We are then able to query at different points in the valid time axis to check for the validity of the correction. We define a query q:

(def q
  '{:find [e]
    :where [[e :name "Malcolma"]
            [e :last-name "Sparks"]]})

Firstly we can verify that "Malcolma" was unknown at #inst "1986-10-23".

; Using Clojure: `(api/q (api/db my-crux-system #inst "1986-10-23") q)`

Result Set:

#{}

We can then verify that "Malcolma" is the currently known :name for the entity with ID :malcolm by simply not specifying a valid time alongside the query. This will be the case so long as there are no newer versions (in the valid time axis) of the document that affect the current valid time version.

; Using Clojure: `(api/q (api/db my-crux-system) q)`

Result Set:

#{[:malcolm]}


Query: "Join across entities on a single attribute"

Given the following documents in the database

        [{:crux.db/id :ivan :name "Ivan"}
         {:crux.db/id :petr :name "Petr"}
         {:crux.db/id :sergei :name "Sergei"}
         {:crux.db/id :denis-a :name "Denis"}
         {:crux.db/id :denis-b :name "Denis"}]

We can run a query to return a set of tuples that satisfy the join on the attribute :name

 '{:find [p1 p2]
   :where [[p1 :name n]
           [p2 :name n]]}

Result Set:

#{[:ivan :ivan]
  [:petr :petr]
  [:sergei :sergei]
  [:denis-a :denis-a]
  [:denis-b :denis-b]
  [:denis-a :denis-b]
  [:denis-b :denis-a]}

Note that every person joins with themselves once, and the two distinct entities sharing the name "Denis" also match each other in both orders, giving 2 additional tuples.

Query: "Join with two attributes, including a multi-valued attribute"

Given the following documents in the database

      [{:crux.db/id :ivan :name "Ivan" :last-name "Ivanov"}
       {:crux.db/id :petr :name "Petr" :follows #{"Ivanov"}}]

We can run a query to return a set of entities that :follows the set of entities with the :name value of "Ivan"

 '{:find [e2]
   :where [[e :last-name l]
           [e2 :follows l]
           [e :name "Ivan"]]}

Result Set:

#{[:petr]}

Note that because Crux is schemaless there is no need to have elsewhere declared that the :follows attribute may take a value of edn type set.

Ordering and Pagination

A Datalog query naturally returns a result set of tuples; however, the tuples can also be consumed as a sequence, and therefore you will always have an implicit order available. Ordinarily this implicit order is not meaningful because the join order and result order are unlikely to correlate.

The :order-by option is available for use in the query map to explicitly control the result order.

'{:find [time device-id temperature humidity]
  :where [[c :condition/time time]
          [c :condition/device-id device-id]
          [c :condition/temperature temperature]
          [c :condition/humidity humidity]]
  :order-by [[time :desc] [device-id :asc]]}

Use of :order-by will typically require that results are fully-realised by the query engine, however this happens transparently and it will automatically spill to disk when sorting large result sets.

Basic :offset and :limit options are supported however typical pagination use-cases will need a more comprehensive approach because :offset will naively scroll through the initial result set each time.

'{:find [time device-id temperature humidity]
  :where [[c :condition/time time]
          [c :condition/device-id device-id]
          [c :condition/temperature temperature]
          [c :condition/humidity humidity]]
  :order-by [[device-id :asc]]
  :limit 10
  :offset 90}

Pagination relies on efficient retrieval of explicitly ordered documents and this may be achieved using a user-defined attribute with values that get sorted in the desired order. You can then use this attribute within your Datalog queries to apply range filters using predicates.

{:find '[time device-id temperature humidity]
 :where '[[c :condition/time time]
          [c :condition/device-id device-id]
          [(>= device-id my-offset)]
          [c :condition/temperature temperature]
          [c :condition/humidity humidity]]
 :order-by '[[device-id :asc]]
 :limit 10
 :args [{'my-offset 990}]}

Additionally, since Crux stores documents and can traverse arbitrary keys as document references, you can model the ordering of document IDs with vector values, e.g. {:crux.db/id :zoe :closest-friends [:amy :ben :chris]}

More powerful ordering and pagination features may be provided in the future. Feel free to open an issue or get in touch to discuss your requirements.


This example of a rule demonstrates a recursive traversal of entities that are connected to a given entity via the :follow attribute.

'{:find [?e2]
  :where [(follow ?e1 ?e2)]
  :args [{?e1 :1}]
  :rules [[(follow ?e1 ?e2)
           [?e1 :follow ?e2]]
          [(follow ?e1 ?e2)
           [?e1 :follow ?t]
           (follow ?t ?e2)]]}

Streaming Queries

Query results can also be streamed, which is particularly useful for queries whose results may not fit into memory. For these, we use crux.api/open-q, which returns a Closeable sequence.

We’d recommend using with-open to ensure that the sequence is closed properly. Additionally, ensure that the sequence (as much of it as you need) is eagerly consumed within the with-open block - attempting to use it outside (either explicitly, or by accidentally returning a lazy sequence from the with-open block) will result in undefined behaviour.

(with-open [res (crux/open-q (crux/db node)
                             '{:find [p1]
                               :where [[p1 :name n]
                                       [p1 :last-name n]
                                       [p1 :name "Smith"]]})]
  (doseq [tuple (iterator-seq res)]
    (prn tuple)))

History API

Full Document History

Crux allows you to retrieve all versions of a document:

(api/submit-tx
  system
  [[:crux.tx/put
    {:crux.db/id :ids.persons/Jeff
     :person/name "Jeff"
     :person/wealth 100}
    #inst "2018-05-18T09:20:27.966"]
   [:crux.tx/put
    {:crux.db/id :ids.persons/Jeff
     :person/name "Jeff"
     :person/wealth 1000}
    #inst "2015-05-18T09:20:27.966"]])

{:crux.tx/tx-id 1555314836178,
 :crux.tx/tx-time #inst "2019-04-15T07:53:56.178-00:00"}

(api/history system :ids.persons/Jeff)

; yields
[{:crux.db/id ; sha1 hash of document id
  :crux.db/content-hash ; sha1 hash of document contents
  :crux.db/valid-time #inst "2018-05-18T09:20:27.966-00:00",
  :crux.tx/tx-time #inst "2019-04-15T07:53:55.817-00:00",
  :crux.tx/tx-id 1555314835817}
 {:crux.db/id "c7e66f757f198e08a07a8ea6dfc84bc3ab1c6613",
  :crux.db/content-hash "a95f149636e0a10a78452298e2135791c0203529",
  :crux.db/valid-time #inst "2015-05-18T09:20:27.966-00:00",
  :crux.tx/tx-time #inst "2019-04-15T07:53:56.178-00:00",
  :crux.tx/tx-id 1555314836178}]

Document History Range

Retrievable document versions can be bounded by four time coordinates:

  • valid-time-start

  • tx-time-start

  • valid-time-end

  • tx-time-end

All coordinates are inclusive. All coordinates can be null.

(api/history-range system :ids.persons/Jeff
  #inst "2015-05-18T09:20:27.966"  ; valid-time start or nil
  #inst "2015-05-18T09:20:27.966"  ; transaction-time start or nil
  #inst "2020-05-18T09:20:27.966"  ; valid-time end or nil, inclusive
  #inst "2020-05-18T09:20:27.966") ; transaction-time end or nil, inclusive.

; yields
({:crux.db/id ; sha1 hash of document id
  :crux.db/content-hash  ; sha1 hash of document contents
  :crux.db/valid-time #inst "2015-05-18T09:20:27.966-00:00",
  :crux.tx/tx-time #inst "2019-04-15T07:53:56.178-00:00",
  :crux.tx/tx-id 1555314836178}
  {:crux.db/id "c7e66f757f198e08a07a8ea6dfc84bc3ab1c6613",
   :crux.db/content-hash "6ca48d3bf05a16cd8d30e6b466f76d5cc281b561",
   :crux.db/valid-time #inst "2018-05-18T09:20:27.966-00:00",
   :crux.tx/tx-time #inst "2019-04-15T07:53:55.817-00:00",
   :crux.tx/tx-id 1555314835817})

(api/entity (api/db system) "c7e66f757f198e08a07a8ea6dfc84bc3ab1c6613")

; yields
{:crux.db/id :ids.persons/Jeff,
 :person/name "Jeff",
 :person/wealth 100}

Clojure Tips


Quoting

Logic variables used in queries must always be quoted in the :find and :where clauses, which in the most minimal case could look like the following:

(crux/q db
  {:find ['?e]
   :where [['?e :event/employee-code '?code]]})

However it is often convenient to quote entire clauses or even the entire query map rather than each individual use of every logical variable, for instance:

(crux/q db
  '{:find [?e]
    :where [[?e :event/employee-code ?code]]})

Confusion may arise when you later attempt to introduce references to Clojure variables within your query map, such as when using :args. This can be resolved by introducing more granular quoting for specific parts of the query map:

(let [my-code 101214]
  (crux/q db
    {:find '[?e]
     :where '[[?e :event/employee-code ?code]]
     :args [{'?code my-code}]}))

Maps and Vectors in data

Say you have a document like so and you want to add it to a Crux db:

{:crux.db/id :me
 :list ["carrots" "peas" "shampoo"]
 :pockets {:left ["lint" "change"]
           :right ["phone"]}}

Crux breaks down vectors into individual components, so the query engine can see every element at the base level. As a result, the query engine never has to traverse nested structures or fall back on any other kind of search algorithm that would slow the query down. The same principle should be applied to maps: instead of nesting values as :pockets {:left thing :right thing}, put them under a namespace, structuring the data as :pockets/left thing :pockets/right thing so that everything sits at the base level. Like so:

    {:crux.db/id :me
     :list ["carrots" "peas" "shampoo"]
     :pockets/left ["lint" "change"]
     :pockets/right ["phone"]}

    {:crux.db/id :you
     :list ["carrots" "tomatoes" "wig"]
     :pockets/left ["wallet" "watch"]
     :pockets/right ["spectacles"]}

To query inside these vectors the code would be:

(crux/q (crux/db node) '{:find [e l]
                         :where [[e :list l]]
                         :args [{l "carrots"}]})
;; => #{[:you "carrots"] [:me "carrots"]}

(crux/q (crux/db node) '{:find [e p]
                         :where [[e :pockets/left p]]
                         :args [{p "watch"}]})
;; => #{[:you "watch"]}

Note that l and p are each returned as a single element, as Crux decomposes the vectors

DataScript Differences

This list is not necessarily exhaustive and is based on the partial re-usage of DataScript’s query test suite within Crux’s query tests.

Crux does not support:

  • vars in the attribute position, such as [e ?a "Ivan"] or [e _ "Ivan"]

Crux does not yet support:

  • ground, get-else, get-some, missing?, missing? back-ref

  • destructuring

  • source vars, e.g. function references passed into the query via :args

Note that many of these not yet supported query features can be achieved via simple function calls since you can currently fully qualify any function that is loaded. In future, limitations on available functions may be introduced to enforce security restrictions for remote query execution.
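For instance, filtering that might otherwise call for get-else or similar built-ins can often be expressed with a fully qualified predicate call in a :where clause. This is a minimal sketch, assuming an illustrative :age attribute and an already-obtained db value:

```clojure
;; filter entities using a fully qualified Clojure function as a predicate -
;; any function currently loaded on the classpath can be referenced this way
(crux/q db
  '{:find [?e ?age]
    :where [[?e :age ?age]
            [(clojure.core/>= ?age 18)]]})
```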

Test queries from DataScript such as "Rule with branches" and "Mutually recursive rules" work correctly with Crux and demonstrate advanced query patterns. See the Crux tests for details.



Please consult the Javadocs for the official Crux API.



Crux offers a lightweight REST API layer in the crux-http-server module that allows you to send transactions and run queries over HTTP. For instance, you could deploy your Crux nodes along with Kafka into a Kubernetes pod running on AWS and interact with Crux from your application purely via HTTP. Using Crux in this manner is a valid use-case but it cannot support all of the features and benefits that running the Crux node inside of your application provides, in particular the ability to efficiently combine custom code with multiple in-process Datalog queries.

Your application only needs to communicate with one Crux node when using the REST API. Multiple Crux nodes can be placed behind a HTTP load balancer to spread the writes and reads over a horizontally-scaled cluster, transparently to the application. Each Crux node in such a cluster will be independently catching up with the head of the transaction log, and since different queries might go to different nodes, you have to be slightly conscious of read consistency when designing your application to use Crux in this way. Fortunately, you can readily achieve read-your-writes consistency with the ability to query consistent point-in-time snapshots using specific temporal coordinates.
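To sketch the read-your-writes pattern over the REST API: capture the tx-time returned by the write endpoint and pin subsequent reads to it. The transact-time parameter name is assumed from the /entity endpoint description below; adapt as needed:

```shell
# submit a write; the response includes the assigned tx-time
curl -X POST -H "Content-Type: application/edn" \
     -d '[[:crux.tx/put {:crux.db/id :ivan, :name "Ivan"}]]' \
     $nodeURL/tx-log

# pin the read to that tx-time, so any node that has indexed at least
# this far will return a snapshot that includes the write
curl -X GET -H "Content-Type: application/edn" \
     "$nodeURL/entity/:ivan?transact-time=2019-01-07T16:14:19.675Z"
```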

The REST API also provides an experimental endpoint for SPARQL 1.1 Protocol queries under /sparql/, rewriting the query into the Crux Datalog dialect. Only a small subset of SPARQL is supported and no other RDF features are available.

Using the HTTP API

The HTTP interface is provided as a Ring middleware in a Clojure namespace, located at crux/crux-http-server/src/crux/http_server.clj. There is an example of using this middleware in a full example HTTP server configuration.

Whilst CORS may be easily configured for use while prototyping a Single Page Application that uses Crux directly from a web browser, it is currently NOT recommended to expose Crux directly to any untrusted endpoints (including web browsers) in production since the default query API does not sandbox or otherwise restrict the execution of queries.


Table 18. API
uri                       method    description

/                         GET       returns various details about the state of the database

/document/[content-hash]  GET/POST  returns the document for a given hash

/documents                GET/POST  returns a map of document ids and respective documents for a given set of content hashes submitted in the request body

/entity/[:key]            GET       returns an entity for a given ID and optional valid-time/transaction-time co-ordinates

/entity-tx                GET       returns the transaction that most recently set a key

/entity-history/[:key]    GET       returns the history of the given entity and optional valid-time/transaction-time co-ordinates

/query                    POST      takes a datalog query and returns its results

/sync                     GET       waits until the Kafka consumer's lag is back to 0

/tx-log                   GET       returns a list of all transactions

/tx-log                   POST      the "write" endpoint, to post transactions


Returns various details about the state of the database. Can be used as a health check.

curl -X GET $nodeURL/
{:crux.kv/kv-store "crux.kv.rocksdb/kv",
 :crux.kv/estimate-num-keys 92,
 :crux.kv/size 72448,
   {:crux.tx/tx-id 19,
    :crux.tx/tx-time #inst "2019-01-08T11:06:41.869-00:00"}
 :crux.zk/zk-active? true,
      {:next-offset 25,
       :time #inst "2019-01-08T11:06:41.867-00:00",
       :lag 0},
      {:next-offset 19,
       :time #inst "2019-01-08T11:06:41.869-00:00",
       :lag 0}}}
estimate-num-keys is an (over)estimate of the number of transactions in the log (each of which is a key in RocksDB). RocksDB does not provide an exact key count.
GET/POST /document/[content-hash]

Returns the document stored under that hash, if it exists.

curl -X GET $nodeURL/document/7af0444315845ab3efdfbdfa516e68952c1486f2
{:crux.db/id :foobar, :name "FooBar"}
Hashes for older versions of a document can be obtained with /entity-history, under the :crux.db/content-hash keys.
GET/POST /documents

Returns a map from document ids to documents for the given set of ids. Map keys can be returned as #crux/id literals if the preserve-crux-ids parameter is set to "true"

curl -X POST $nodeURL/documents \
     -H "Content-Type: application/edn" \
     -d '#{"7af0444315845ab3efdfbdfa516e68952c1486f2"}'
{"7af0444315845ab3efdfbdfa516e68952c1486f2" {:crux.db/id :foobar, :name "FooBar"}}
GET /entity/[:key]

Takes a key and, optionally, a :valid-time and/or :transact-time (defaulting to now). Returns the value stored under that key at those times.

See Bitemporality for more information.

curl -X GET \
     -H "Content-Type: application/edn" \
{:crux.db/id :tommy, :name "Tommy", :last-name "Petrov"}
curl -X GET \
     -H "Content-Type: application/edn" \
GET /entity-tx

Takes a key and, optionally, :valid-time and/or :transact-time (defaulting to now). Returns the :put transaction that most recently set that key at those times.

See Bitemporality for more information.

curl -X GET \
     -H "Content-Type: application/edn" \
{:crux.db/id "8843d7f92416211de9ebb963ff4ce28125932878",
 :crux.db/content-hash "7af0444315845ab3efdfbdfa516e68952c1486f2",
 :crux.db/valid-time #inst "2019-01-08T16:34:47.738-00:00",
 :crux.tx/tx-id 0,
 :crux.tx/tx-time #inst "2019-01-08T16:34:47.738-00:00"}
GET /entity-history/[:key]

Returns the history for the given entity

curl -X GET $nodeURL/entity-history/:ivan?sort-order=desc

Also accepts the following optional query parameters:

  • with-corrections - includes bitemporal corrections in the response, inline, sorted by valid-time then transaction-time (default false)

  • with-docs - includes the documents in the response sequence, under the :crux.db/doc key (default false)

  • start-valid-time, start-transaction-time - bitemporal co-ordinates to start at (inclusive, default unbounded)

  • end-valid-time, end-transaction-time - bitemporal co-ordinates to stop at (exclusive, default unbounded)

[{:crux.db/id "a15f8b81a160b4eebe5c84e9e3b65c87b9b2f18e",
  :crux.db/content-hash "c28f6d258397651106b7cb24bb0d3be234dc8bd1",
  :crux.db/valid-time #inst "2019-01-07T14:57:08.462-00:00",
  :crux.tx/tx-id 14,
  :crux.tx/tx-time #inst "2019-01-07T16:51:55.185-00:00"
  :crux.db/doc {...}}
 ...]

POST /query

Takes a Datalog query and returns its results.

curl -X POST \
     -H "Content-Type: application/edn" \
     -d '{:query {:find [e] :where [[e :last-name "Petrov"]]}}' \

Note that you are able to add :full-results? true to the query map to easily retrieve the source documents relating to the entities in the result set. For instance to retrieve all documents in a single query:

curl -X POST \
     -H "Content-Type: application/edn" \
     -d '{:query {:find [e] :where [[e :crux.db/id _]] :full-results? true}}' \
GET /sync

Wait until the Kafka consumer’s lag is back to 0 (i.e. when it no longer has pending transactions to write). Timeout is 10 seconds by default, but can be specified as a parameter in milliseconds. Returns the transaction time of the most recent transaction.

curl -X GET $nodeURL/sync?timeout=500
#inst "2019-01-08T11:06:41.869-00:00"
GET /tx-log

Returns a list of all transactions, from oldest to newest transaction time.

curl -X GET $nodeURL/tx-log
({:crux.tx/tx-time #inst "2019-01-07T15:11:13.411-00:00",
  :crux.api/tx-ops [[
    :crux.tx/put "c28f6d258397651106b7cb24bb0d3be234dc8bd1"
    #inst "2019-01-07T14:57:08.462-00:00"]],
  :crux.tx/tx-id 0}

 {:crux.tx/tx-time #inst "2019-01-07T15:11:32.284-00:00",
  ...})
POST /tx-log

Takes a vector of transactions (any combination of :put, :delete, :match, and :evict) and executes them in order. This is the only "write" endpoint.

curl -X POST \
     -H "Content-Type: application/edn" \
     -d '[[:crux.tx/put {:crux.db/id :ivan, :name "Ivan" :last-name "Petrov"}],
          [:crux.tx/put {:crux.db/id :boris, :name "Boris" :last-name "Petrov"}],
          [:crux.tx/delete :maria  #inst "2012-05-07T14:57:08.462-00:00"]]' \
{:crux.tx/tx-id 7, :crux.tx/tx-time #inst "2019-01-07T16:14:19.675-00:00"}


(ns crux.api)

crux.api exposes a union of methods from ICruxAPI and ICruxDatasource, with a few lifecycle members added.

    [node ^Date valid-time]
    [node ^Date valid-time ^Date transaction-time]
    "When a valid time is specified then the returned db value contains only those
     documents whose valid time is before the specified time.

     When both valid and transaction time are specified returns a db value as of
     the valid time and the latest transaction time indexed at or before the
     specified transaction time.

     If the node hasn't yet indexed a transaction at or past the given
     transaction-time, this throws NodeOutOfSyncException")
    [node ^Date valid-time]
    [node ^Date valid-time ^Date transaction-time]
    "When a valid time is specified then the returned db value contains only those
     documents whose valid time is before the specified time.

     When both valid and transaction time are specified returns a db value as of
     the valid time and the latest transaction time indexed at or before the
     specified transaction time.

     If the node hasn't yet indexed a transaction at or past the given
     transaction-time, this throws NodeOutOfSyncException

     This DB opens up shared resources to make multiple requests faster - it must
     be `.close`d when you've finished using it (for example, in a `with-open`
     block).")
  (document [node content-hash]
    "Reads a document from the document store based on its
    content hash.")
  (documents [node content-hash-set]
    "Reads the set of documents from the document store based on their
    respective content hashes. Returns a map content-hash->document")
  (status [node]
    "Returns the status of this node as a map.")
  (submit-tx [node tx-ops]
    "Writes transactions to the log for processing.
     tx-ops is a vector of datalog-style transaction operations.
     Returns a map with details about the submitted transaction,
     including tx-time and tx-id.")
  (tx-committed? [node submitted-tx]
    "Checks if a submitted tx was successfully committed.
     submitted-tx must be a map returned from `submit-tx`.
     Returns true if the submitted transaction was committed,
     false if the transaction was not committed, and throws `NodeOutOfSyncException`
     if the node has not yet indexed the transaction.")
    [node tx]
    [node tx ^Duration timeout]
    "Blocks until the node has indexed a transaction that is at or past the
  supplied tx. Will throw on timeout. Returns the most recent tx indexed by the
  node.")
    [node ^Date tx-time]
    [node ^Date tx-time ^Duration timeout]
    "Blocks until the node has indexed a transaction that is past the supplied
  txTime. Will throw on timeout. The returned date is the latest index time when
  this node has caught up as of this call.")
    [node ^Duration timeout]
    "Blocks until the node has caught up indexing to the latest tx available at
  the time this method is called. Will throw an exception on timeout. The
  returned date is the latest transaction time indexed by this node. This can be
  used as the second parameter in (db valid-time, transaction-time) for
  consistent reads.

  timeout – max time to wait, can be nil for the default.
  Returns the latest known transaction time.")
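As a sketch of the consistent-read pattern described in the docstring above (assuming an already-started node):

```clojure
(import 'java.time.Duration)

;; block until this node has caught up, then query at the returned
;; transaction time for a consistent snapshot
(let [tx-time (crux/sync node (Duration/ofSeconds 10))]
  (crux/q (crux/db node tx-time tx-time)
          '{:find [e]
            :where [[e :crux.db/id _]]}))
```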
  (listen ^java.lang.AutoCloseable [node event-opts f]
    "Attaches a listener to Crux's event bus.

  `event-opts` should contain `:crux/event-type`, along with any other options the event-type requires.

  We currently only support one public event-type: `:crux/indexed-tx`.
  Supplying `:with-tx-ops? true` will include the transaction's operations in the event passed to `f`.

  `(.close ...)` the return value to detach the listener.

  This is an experimental API, subject to change.")
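A minimal usage sketch of the listener API described above, printing each indexed transaction:

```clojure
;; attach a listener for indexed transactions, including their operations
(def listener
  (crux/listen node
               {:crux/event-type :crux/indexed-tx
                :with-tx-ops? true}
               (fn [event]
                 (prn event))))

;; later, detach the listener
(.close listener)
```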
  (open-tx-log ^ICursor [this after-tx-id with-ops?]
    "Reads the transaction log. Optionally includes operations, which allow
     the contents under the :crux.api/tx-ops key to be piped into
     (submit-tx tx-ops) of another Crux instance.

     after-tx-id  optional transaction id to start after.
     with-ops?    should the operations with documents be included?

     Returns a cursor over the TxLog.")
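For instance, the transaction log of one node can be replayed into another, along the lines of this sketch (node and other-node are assumed to be started nodes):

```clojure
;; pipe every transaction's operations from one node into another
(with-open [log (crux/open-tx-log node nil true)]
  (doseq [{:keys [crux.api/tx-ops]} (iterator-seq log)]
    (crux/submit-tx other-node tx-ops)))
```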
  (attribute-stats [node]
    "Returns frequencies of indexed attributes")

Represents the database as of a specific valid and transaction time.

  (entity [db eid]
    "queries a document map for an entity.
    eid is an object which can be coerced into an entity id.
    returns the entity document map.")
  (entity-tx [db eid]
    "returns the transaction details for an entity. Details
    include tx-id and tx-time.
    eid is an object that can be coerced into an entity id.")
    [db query]
    "q[uery] a Crux db.
    query param is a datalog query in map, vector or string form.
    Returns a vector of result tuples.")
    [db query]
    "lazily q[uery] a Crux db.
      query param is a datalog query in map, vector or string form.

     This function returns a Closeable sequence of result tuples - once you've consumed
     as much of the sequence as you need to, you'll need to `.close` the sequence.
     A common way to do this is using `with-open`:

     (with-open [res (crux/open-q db '{:find [...]
                                       :where [...]})]
       (doseq [row res]
         ...))

     Once the sequence is closed, attempting to iterate it is undefined.")
    [db eid sort-order]
    [db eid sort-order {:keys [with-docs? with-corrections?]
                        {start-vt :crux.db/valid-time, start-tt :crux.tx/tx-time} :start
                        {end-vt :crux.db/valid-time, end-tt :crux.tx/tx-time} :end}]
    "Eagerly retrieves entity history for the given entity.

    * `sort-order`: `#{:asc :desc}`
    * `:with-docs?`: specifies whether to include documents in the entries
    * `:with-corrections?`: specifies whether to include bitemporal corrections in the sequence, sorted first by valid-time, then transaction-time.
    * `:start` (nested map, inclusive, optional): the `:crux.db/valid-time` and `:crux.tx/tx-time` to start at.
    * `:end` (nested map, exclusive, optional): the `:crux.db/valid-time` and `:crux.tx/tx-time` to stop at.

    No matter what `:start` and `:end` parameters you specify, you won't receive
    results later than the valid-time and transact-time of this DB value.

    Each entry in the result contains the following keys:
     * `:crux.db/valid-time`,
     * `:crux.tx/tx-time`,
     * `:crux.tx/tx-id`,
     * `:crux.db/content-hash`
     * `:crux.db/doc` (see `with-docs?`).")
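Putting the options together, a call might look like the following sketch (the entity id and dates are illustrative):

```clojure
;; ascending history for an entity, with documents, bounded by valid time
(crux/entity-history db
                     :ids.persons/Jeff
                     :asc
                     {:with-docs? true
                      :start {:crux.db/valid-time #inst "2015-01-01"}
                      :end   {:crux.db/valid-time #inst "2020-01-01"}})
```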
    [db eid sort-order]
    [db eid sort-order {:keys [with-docs? with-corrections?]
                        {start-vt :crux.db/valid-time, start-tt :crux.tx/tx-time} :start
                        {end-vt :crux.db/valid-time, end-tt :crux.tx/tx-time} :end}]
    "Lazily retrieves entity history for the given entity.
    Don't forget to close the cursor when you've consumed enough history!
    See `entity-history` for all the options")
  (valid-time [db]
    "returns the valid time of the db.
    If valid time wasn't specified at the moment of the db value retrieval
    then valid time will be the time of the latest transaction.")
  (transaction-time [db]
    "returns the time of the latest transaction applied to this db value.
    If a tx time was specified when db value was acquired then returns
    the specified time."))
Lifecycle members
(defn start-node ^ICruxAPI [options])
Requires any dependencies that the configured Crux modules may need to be available on the classpath.


{:crux.node/topology ['crux.standalone/topology]}

Options are specified as keywords using their long format name, like :crux.kafka/bootstrap-servers etc. See the individual modules used in the specified topology for option descriptions.

Returns a node which implements ICruxAPI and java.io.Closeable. The latter allows the node to be stopped by calling (.close node).

throws IndexVersionOutOfSyncException if the index needs rebuilding. throws NonMonotonicTimeException if the clock has moved backwards since last run. Only applicable when using the event log.

(defn new-api-client ^ICruxAPI [url])

Creates a new remote API client ICruxAPI. The remote client requires valid and transaction time to be specified for all calls to db.

requires either clj-http or http-kit on the classpath, see crux.remote-api-client/internal-http-request-fn for more information.

Param url the URL to a Crux HTTP end-point.

Returns a remote API client.
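A usage sketch, noting that the remote client requires explicit temporal coordinates for all calls to db (the URL and tx-time are illustrative):

```clojure
(let [client (crux/new-api-client "http://localhost:3000")
      tx-time #inst "2019-01-07T16:14:19.675-00:00"]
  ;; both valid time and transaction time must be supplied
  (crux/q (crux/db client tx-time tx-time)
          '{:find [e]
            :where [[e :last-name "Petrov"]]}))
```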

(defn new-ingest-client ^ICruxAsyncIngestAPI [options])

Starts an ingest client for transacting into Kafka without running a full local node with index.

For valid options, see crux.kafka/default-options. Options are specified as keywords using their long format name, like :crux.kafka/bootstrap-servers etc.


{:crux.kafka/bootstrap-servers "kafka-cluster-kafka-brokers.crux.svc.cluster.local:9092"
 :crux.kafka/group-id "group-id"
 :crux.kafka/tx-topic "crux-transaction-log"
 :crux.kafka/doc-topic "crux-docs"
 :crux.kafka/create-topics true
 :crux.kafka/doc-partitions 1
 :crux.kafka/replication-factor 1}

Returns a crux.api.ICruxAsyncIngestAPI component that also implements java.io.Closeable, allowing the client to be stopped by calling close.
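A usage sketch of the ingest client, combining a subset of the options above with a single put:

```clojure
;; transact into Kafka without running a full local node with index
(with-open [ingest-client
            (crux/new-ingest-client
             {:crux.kafka/bootstrap-servers "localhost:9092"
              :crux.kafka/tx-topic "crux-transaction-log"
              :crux.kafka/doc-topic "crux-docs"})]
  (crux/submit-tx ingest-client
                  [[:crux.tx/put {:crux.db/id :ivan, :name "Ivan"}]]))
```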


Queries (Advanced)

Racket Datalog

Several Datalog tests from the Racket Datalog examples have been translated and re-used within Crux’s query tests.

  • tutorial.rkt

  • path.rkt

  • revpath.rkt

  • bidipath.rkt

  • sym.rkt

Datalog Research

Several Datalog examples from a classic Datalog paper have been translated and re-used within Crux’s query tests.

What you Always Wanted to Know About Datalog (And Never Dared to Ask)
Stefano Ceri, Georg Gottlob, Letizia Tanca, Published in IEEE Trans. Knowl. Data Eng. 1989


  • "sgc"

  • 3 examples of "stratified Datalog"

WatDiv SPARQL Tests

Waterloo SPARQL Diversity Test Suite

WatDiv has been developed to measure how an RDF data management system performs across a wide spectrum of SPARQL queries with varying structural characteristics and selectivity classes.

Benchmarking has been performed against the WatDiv test suite. These tests demonstrate comprehensive RDF subgraph matching. Note that Crux does not natively implement the RDF specification and only a simplified subset of the RDF tests have been translated for use in Crux. See the Crux tests for details.

LUBM Web Ontology Language (OWL) Tests

Lehigh University Benchmark

The Lehigh University Benchmark is developed to facilitate the evaluation of Semantic Web repositories in a standard and systematic way. The benchmark is intended to evaluate the performance of those repositories with respect to extensional queries over a large data set that commits to a single realistic ontology. It consists of a university domain ontology, customizable and repeatable synthetic data, a set of test queries, and several performance metrics.

Benchmarking has been performed against the LUBM test suite. These tests demonstrate extreme stress testing for subgraph matching. See the Crux tests for details.

Kafka Connect Crux


A Kafka Connect plugin for transferring data between Crux nodes and Kafka.

The Crux source connector will publish transactions on a node to a Kafka topic, and the sink connector can receive transactions from a Kafka topic and submit them to a node.

Table 19. Currently supported data formats
Data format Sink/Source









To get started with the connector, there are two separate guides (depending on whether you are using a full Confluent Platform installation, or a basic Kafka installation):

Confluent Platform Quickstart

Installing the connector

Use confluent-hub install juxt/kafka-connect-crux:20.07-1.9.2-beta to download and install the connector from Confluent hub. The downloaded connector is then placed within your confluent install’s 'share/confluent-hub-components' folder.

The connector can be used as either a source or a sink. In either case, there should be an associated Crux node to communicate with.

Creating the Crux node

To use our connector, you must first have a Crux node connected to Kafka. To do this, we start by adding the following dependencies to a project:

juxt/crux-core {:mvn/version "20.07-1.9.2-beta"}
juxt/crux-kafka {:mvn/version "20.07-1.9.2-beta"}
juxt/crux-http-server {:mvn/version "20.07-1.9.2-alpha"}
juxt/crux-rocksdb {:mvn/version "20.07-1.9.2-beta"}

Ensure first that you have a running Kafka broker to connect to. We import the dependencies into a file or REPL, then create our Kafka connected 'node' with an associated http server for the connector to communicate with:

(require '[crux.api :as crux]
         '[crux.http-server :as srv])
(import (crux.api ICruxAPI))

(def ^crux.api.ICruxAPI node
  (crux/start-node {:crux.node/topology '[crux.kafka/topology crux.http-server/module]
                    :crux.kafka/bootstrap-servers "localhost:9092"
                    :crux.http-server/port 3000}))

Sink Connector

Run the following command within the base of the Confluent folder, to create a worker which connects to the 'connect-test' topic, ready to send messages to the node. This also makes use of connect-file-source, checking for changes in a file called 'test.txt':

./bin/connect-standalone etc/kafka/ share/confluent-hub-components/juxt-kafka-connect-crux/etc/ etc/kafka/

Run the following within your Confluent directory, to add a line of JSON to 'test.txt':

echo '{"crux.db/id": "415c45c9-7cbe-4660-801b-dab9edc60c84", "value": "baz"}' >> test.txt

Now, verify that this was transacted within your REPL:

(crux/entity (crux/db node) "415c45c9-7cbe-4660-801b-dab9edc60c84")
{:crux.db/id #crux/id "415c45c9-7cbe-4660-801b-dab9edc60c84", :value "baz"}

Source Connector

Run the following command within the base of the Confluent folder, to create a worker which connects to the 'connect-test' topic, ready to receive messages from the node. This also makes use of 'connect-file-sink', outputting transactions to your node within 'test.sink.txt':

./bin/connect-standalone etc/kafka/ share/confluent-hub-components/juxt-kafka-connect-crux/etc/ etc/kafka/

Within your REPL, transact an element into Crux:

(crux/submit-tx node [[:crux.tx/put {:crux.db/id #crux/id "415c45c9-7cbe-4660-801b-dab9edc60c82", :value "baz-source"}]])

Check the contents of 'test.sink.txt' using the command below, and you should see that the transactions were outputted to the 'connect-test' topic:

tail test.sink.txt
[[:crux.tx/put {:crux.db/id #crux/id "415c45c9-7cbe-4660-801b-dab9edc60c82", :value "baz-source"} #inst "2019-09-19T12:31:21.342-00:00"]]

Kafka Quickstart

Installing the connector

Download the connector from Confluent hub, then unzip the downloaded folder:


Navigate into the base of the Kafka folder, then run the following commands:

cp $CONNECTOR_PATH/lib/*-standalone.jar $KAFKA_HOME/libs
cp $CONNECTOR_PATH/etc/*.properties $KAFKA_HOME/config

The connector can be used as either a source or a sink. In either case, there should be an associated Crux node to communicate with.

Creating the Crux node

To use our connector, you must first have a Crux node connected to Kafka. To do this, we start by adding the following dependencies to a project:

juxt/crux-core {:mvn/version "20.07-1.9.2-beta"}
juxt/crux-kafka {:mvn/version "20.07-1.9.2-beta"}
juxt/crux-http-server {:mvn/version "20.07-1.9.2-alpha"}
juxt/crux-rocksdb {:mvn/version "20.07-1.9.2-beta"}

Ensure first that you have a running Kafka broker to connect to. We import the dependencies into a file or REPL, then create our Kafka connected 'node' with an associated http server for the connector to communicate with:

(require '[crux.api :as crux]
         '[crux.http-server :as srv])
(import (crux.api ICruxAPI))

(def ^crux.api.ICruxAPI node
  (crux/start-node {:crux.node/topology '[crux.kafka/topology crux.http-server/module]
                    :crux.kafka/bootstrap-servers "localhost:9092"
                    :crux.http-server/port 3000}))

Sink Connector

Run the following command within the base of the Kafka folder, to create a worker which connects to the 'connect-test' topic, ready to send messages to the node. This also makes use of connect-file-source, checking for changes in a file called 'test.txt':

./bin/ config/ config/ config/

Run the following within your Kafka directory, to add a line of JSON to 'test.txt':

echo '{"crux.db/id": "415c45c9-7cbe-4660-801b-dab9edc60c84", "value": "baz"}' >> test.txt

Now, verify that this was transacted within your REPL:

(crux/entity (crux/db node) "415c45c9-7cbe-4660-801b-dab9edc60c84")
{:crux.db/id #crux/id "415c45c9-7cbe-4660-801b-dab9edc60c84", :value "baz"}

Source Connector

Run the following command within the base of the Kafka folder, to create a worker which connects to the 'connect-test' topic, ready to receive messages from the node. This also makes use of 'connect-file-sink', outputting transactions to your node within 'test.sink.txt':

./bin/ config/ config/ config/

Within your REPL, transact an element into Crux:

(crux/submit-tx node [[:crux.tx/put {:crux.db/id #crux/id "415c45c9-7cbe-4660-801b-dab9edc60c82", :value "baz-source"}]])

Check the contents of 'test.sink.txt' using the command below, and you should see that the transactions were outputted to the 'connect-test' topic:

tail test.sink.txt
[[:crux.tx/put {:crux.db/id #crux/id "415c45c9-7cbe-4660-801b-dab9edc60c82", :value "baz-source"} #inst "2019-09-19T12:31:21.342-00:00"]]

Source Configuration

  • Destination URL of Crux HTTP end point

  • Type: String

  • Importance: High

  • Default: "http://localhost:3000"

  • The Kafka topic to publish data to

  • Type: String

  • Importance: High

  • Default: "connect-test"

  • Format to send data out as: edn, json or transit

  • Type: String

  • Importance: Low

  • Default: "edn"

  • Mode to use: tx or doc

  • Type: String

  • Importance: Low

  • Default: "tx"

  • The maximum number of records the Source task can read from Crux at one time.

  • Type: Int

  • Importance: Low

  • Default: 2000

Sink Configuration

  • Destination URL of Crux HTTP end point

  • Type: String

  • Importance: High

  • Default: "http://localhost:3000"

  • Record key to use as :crux.db/id

  • Type: String

  • Importance: Low

  • Default: "crux.db/id"

About Crux


JUXT has been working on Crux since 2017, following a set of experiences where bitemporality proved challenging to implement at-scale using existing off-the-shelf technologies. Crux has been built by a very small core team who have, by necessity, had to keep the requirements and implemented scope to a minimum. Crux represents a strong foundation for further research and development efforts and provides a basis on which JUXT can release and support a broad variety of open source software products.


Crux ultimately aims to make the use of bitemporal modelling intuitive and accessible to as wide an audience as possible. Whilst bitemporality is only considered absolutely essential for a small number of use-cases, we believe that broad usage of bitemporality could significantly reduce the hidden complexity in our global information systems.


See our contributors.


Please contact us if you would like to discuss your support requirements:

Managed Hosting

JUXT offers a Managed Hosting service for Crux to accelerate your development and provide you with a secure and reliable service.


JUXT currently offers bespoke support packages for Crux with SLAs that can be customised to meet your requirements.


Are you looking to implement Crux as part of a solution for your client(s)? JUXT can provide Software Support and Managed Hosting on your behalf. Please consider joining our reseller program.


Engage JUXT for Deployment Services Support if you need help with the initial process of installing and configuring Crux in your environment.



Crux would not exist without the community of vibrant open source projects on which it depends, and we hope that the Crux community will serve to extend and reflect our gratitude.


We currently use GitHub Issues to work on near-term changes to the Crux codebase and documentation. Please see the issues labelled with "good first issue" if you are looking for ideas to help push Crux forward.

PRs with fixes and improvements to these docs are very welcome.


Please strive to follow the best-practices for commit messages which are outlined here.


A Contributor License Agreement (CLA) is necessary for us to ensure that we can support a healthy Crux ecosystem over the indefinite future. Please complete the very short .rtf template linked below and email it to us, along with a reference to your current PR on GitHub.


For technical changes, see the Changelog.


Do I need to think about bitemporality to make use of Crux?

Not at all. Many users don’t have an immediate use for business-level time travel queries, in which case transaction time is typically regarded as "enough". However, use of valid time also enables operational advantages such as backfilling and other simple methods for migrating data between live systems in ways that aren't easy when relying on transaction time alone (i.e. where logs must be replayed, merged and truncated to achieve the same effect). Therefore, it is sensible to use valid time in case you have these operational needs in the future. Valid time is recorded by default whenever you submit transactions.
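For example, backfilling is just a put with an explicit valid time in the past, as in this sketch:

```clojure
;; record that :ivan was valid from 2018-01-01, even though the
;; transaction itself is submitted now
(crux/submit-tx node
                [[:crux.tx/put
                  {:crux.db/id :ivan, :name "Ivan"}
                  #inst "2018-01-01T00:00:00.000-00:00"]])
```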


How does Datalog compare to SQL?

Datalog is a well-established deductive query language that combines facts and rules during execution to achieve the same expressive power as relational algebra with recursion (e.g. SQL with Common Table Expressions). Datalog makes heavy use of efficient joins over granular indexes, which removes any need to think about upfront normalisation and query shapes. Datalog already has significant traction in both industry and academia.
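To illustrate the recursive power mentioned above, here is a sketch of a transitive reachability query using Crux's Datalog rules (the :link attribute and :node-a id are illustrative):

```clojure
;; find everything reachable from :node-a by following :link edges,
;; analogous to a recursive CTE in SQL
(crux/q db
  '{:find [?end]
    :where [(connected? :node-a ?end)]
    :rules [[(connected? ?start ?end)
             [?start :link ?end]]
            [(connected? ?start ?end)
             [?start :link ?t]
             (connected? ?t ?end)]]})
```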

The EdgeDB team wrote a popular blog post outlining the shortcomings of SQL, and Datalog is the only broadly-proven alternative to it. Additionally, the use of EDN Datalog from Clojure makes queries "much more programmable" than the equivalent of building SQL strings in any other language, as explained in this blog post.

We plan to provide limited SQL/JDBC support for Crux in the future, potentially using Apache Calcite.

How does Crux compare to Datomic (On-Prem)?

At a high level Crux is bitemporal, document-centric, schemaless, and designed to work with Kafka as an "unbundled" database. Bitemporality provides a user-assigned "valid time" axis for point-in-time queries in addition to the underlying system-assigned "transaction time". The main similarities are that both systems support EDN Datalog queries (though they are not compatible), are written using Clojure, and provide elegant use of the database "as a value".

In the excellent talk "Deconstructing the Database" by Rich Hickey, he outlines many core principles that informed the design of both Datomic and Crux:

  1. Declarative programming is ideal

  2. SQL is the most popular declarative programming language but most SQL databases do not provide a consistent "basis" for running these declarative queries because they do not store and maintain views of historical data by default

  3. Client-server considerations should not affect how queries are constructed

  4. Recording history is valuable

  5. All systems should clearly separate reaction and perception: a transactional component that accepts novelty and passes it to an indexer that integrates novelty into the indexed view of the world (reaction) + a query support component that accepts questions and uses the indexes to answer the questions quickly (perception)

  6. Traditionally a database was a big complicated thing, it was a special thing, and you only had one. You would communicate to it with a foreign language, such as SQL strings. These are legacy design choices

  7. Questions dominate in most applications, or in other words, most applications are read-oriented. Therefore arbitrary read-scalability is a more general problem to address than arbitrary write-scalability (if you need arbitrary write-scalability then you inevitably have to sacrifice system-wide transactions and consistent queries)

  8. Using a cache for a database is not simple and should never be viewed as an architectural necessity: "When does the cache get invalidated? It’s your problem!"

  9. The relational model makes it challenging to record historical data for evolving domains and therefore SQL databases do not provide an adequate "information model"

  10. Accreting "facts" over time provides a real information model and is also simpler than recording relations (composite facts) as seen in a typical relational database

  11. RDF is an attempt to create a universal schema for information using [subject predicate object] triples as facts. However RDF triples are not sufficient because these facts do not have a temporal component (e.g. timestamp or transaction coordinate)

  12. Perception does not require coordination and therefore queries should not affect concurrently executing transactions or cause resource contention (i.e. "stop the world")

  13. "Reified process" (i.e. transaction metadata and temporal indexing) should enable efficient historical queries and make interactive auditing practical

  14. Enabling the programmer to use the database "as a value" is dramatically less complex than working with typical databases in a client-server model and it very naturally aligns with functional programming: "The state of the database is a value defined by the set of facts in effect at a given moment in time."

Rich then outlines how these principles are realised in the original design for Datomic (now "Datomic On-Prem") and this is where Crux and Datomic begin to diverge:

  1. Datomic maintains a global index which can be lazily retrieved by peers from shared "storage". Conversely, a Crux node represents an isolated coupling of local storage and local indexing components together with the query engine. Crux nodes are therefore fully independent aside from the shared transaction log and document log

  2. Both systems rely on existing storage technologies for the primary storage of data. Datomic’s covering indexes are stored in a shared storage service with multiple back-end options. Crux, when used with Kafka, uses basic Kafka topics as the primary distributed store for content and transaction logs.

  3. Datomic peers lazily read from the global index and therefore automatically cache their dynamic working sets. Crux does not use a global index and currently does not offer any node-level sharding either so each node must contain the full database. In other words, each Crux node is like an unpartitioned replica of the entire database, except the nodes do not store the transaction log locally so there is no "master". Crux may support manual node-level sharding in the future via simple configuration. One benefit of manual sharding is that both the size of the Crux node on disk and the long-tail query latency will be more predictable

  4. Datomic uses an explicit "transactor" component, whereas the role of the transactor in Crux is fulfilled by a passive transaction log (e.g. a single-partition Kafka topic) where unconfirmed transactions are optimistically appended, and therefore a transaction in Crux is not confirmed until a node reads from the transaction log and confirms it locally

  5. Datomic’s transactions and transaction functions are processed via a centralised transactor which can be configured for High-Availability using standby transactors. Centralised execution of transaction functions is effectively an optimisation that is useful for managing contention whilst minimising external complexity, and the trade-off is that the use of transaction functions will ultimately impact the serialised transaction throughput of the entire system. Crux does not currently provide a standard means of creating transaction functions but it is an area we are keen to see explored. If transaction functions and other kinds of validations of constraints are needed then it is recommended to use a gatekeeper pattern which involves electing a primary Crux node (e.g. using ZooKeeper) to execute transactions against, thereby creating a similar effect to Datomic’s transactor component
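The passive transaction log arrangement described in points 4 and 5 can be sketched as follows. All names here are illustrative, not Crux's API: transactions are optimistically appended to a shared ordered log, and every node confirms and indexes them independently by consuming that log in order.

```python
# Hypothetical sketch of a passive transaction log (illustrative; not Crux's
# implementation). There is no central transactor: a tx is appended
# unconfirmed, and each node confirms it locally when it consumes the log.
class TxLog:
    def __init__(self):
        self.entries = []

    def append(self, tx_ops):
        self.entries.append(tx_ops)    # unconfirmed at this point
        return len(self.entries) - 1   # tx-id = log offset

class Node:
    def __init__(self, log):
        self.log, self.indexed_up_to, self.db = log, 0, {}

    def poll(self):
        # consume any new log entries, in log order, confirming them locally
        while self.indexed_up_to < len(self.log.entries):
            for eid, doc in self.log.entries[self.indexed_up_to]:
                self.db[eid] = doc
            self.indexed_up_to += 1

log = TxLog()
node_a, node_b = Node(log), Node(log)
log.append([("alice", {"name": "Alice"})])
node_a.poll()   # node_a has now indexed the tx; node_b lags until it polls
```

Because every node applies the same log in the same order, all nodes converge on the same indexed view, even though they may be at different points at any given moment.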

Other differences compared to Crux:

  1. Datomic’s datom model provides a very granular and comprehensive interface for expressing novelty through the assertion and retraction of facts. Crux instead uses documents (i.e. schemaless EDN maps) which are atomically ingested and processed as groups of facts that correspond to top-level fields within each document. This design choice simplifies bitemporal indexing (i.e. the use of valid time + transaction time coordinates) whilst satisfying typical requirements and improving the ergonomics of integration with other document-oriented systems. Additionally, the ordering of fields using the same key in a document is naturally preserved and can be readily retrieved, whereas Datomic requires explicit modelling of order for cardinality-many attributes. The main downside of Crux’s document model is that re-transacting entire documents to update a single field can be considered inefficient, but this could be mitigated using lower-level compression techniques and content-addressable storage. Retractions in Crux are implicit and deleted documents are simply replaced with empty documents

  2. Datomic enforces a simple information schema for attributes including explicit reference types and cardinality constraints. Crux is schemaless as we believe that schema should be optional and be implemented as higher level "decorators" using a spectrum of schema-on-read and/or schema-on-write designs. Since Crux does not track any reference types for attributes, Datalog queries simply attempt to evaluate and navigate attributes as reference types during execution

  3. Datomic’s Datalog query language is more featureful and has more built-in operations than Crux’s equivalent, however Crux also returns results lazily and can spill to disk when sorting large result sets. Both systems provide powerful graph query possibilities

Note that Datomic Cloud is a separate technology platform that is designed from the ground up to run on AWS, and it is out of scope for this comparison.

In summary, Datomic (On-Prem) is a proven technology with a well-reasoned information model and sophisticated approach to scaling. Crux offloads primary scaling concerns to distributed log storage systems like Kafka (following the "unbundled" architecture) and to standard operational features within platforms like Kubernetes (e.g. snapshotting of nodes with pre-built indexes for rapid horizontal scaling). Unlike Datomic, Crux is document-centric and uses a bitemporal information model to enable business-level use of time-travel queries.


Is Crux eventually consistent? Strongly consistent? Or something else?

An easy answer is that Crux is "strongly consistent" with ACID semantics.

What consistency does Crux provide?

A Crux ClusterNode system provides sequential consistency by default due to the use of a single unpartitioned Kafka topic for the transaction log. Transactions are executed non-interleaved (i.e. a serial schedule) on every Crux node independently. Being able to read your writes when using the HTTP interface requires stickiness to a particular node. For a cluster of nodes to be linearizable as a whole would require that every node always sees the result of every transaction immediately after it is written. This could be achieved at the cost of non-trivial additional latency. Further reading: Highly Available Transactions: Virtues and Limitations, Sequential Consistency.

How is consistency provided by Crux?

Crux does not try to enforce consistency among nodes. All nodes consume the log in the same order, but nodes may be at different points. A client using the same node will have a consistent view. Reading your own writes can be achieved by providing the transaction details from the transaction log (returned from crux.api/submit-tx), in a call to crux.api/await-tx. This will block until this transaction time has been seen by the cluster node.
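The read-your-writes mechanism can be sketched like this. This is a single-node simulation with illustrative names, not the real crux.api: submitting a transaction returns a tx "receipt" immediately, and awaiting it blocks until the node has indexed at least that transaction.

```python
# Illustrative sketch of submit/await semantics (not the real crux.api).
# submit_tx returns immediately with a tx-id; await_tx blocks until the
# node's indexer has caught up to that tx, giving read-your-writes.
class Node:
    def __init__(self):
        self.log, self.indexed, self.db = [], 0, {}

    def submit_tx(self, eid, doc):
        self.log.append((eid, doc))
        return len(self.log)          # tx-id; the tx is not yet indexed

    def index_next(self):             # normally driven by a background consumer
        eid, doc = self.log[self.indexed]
        self.db[eid] = doc
        self.indexed += 1

    def await_tx(self, tx_id):
        while self.indexed < tx_id:   # "block" until tx_id has been indexed
            self.index_next()

node = Node()
tx = node.submit_tx("alice", {"name": "Alice"})
node.await_tx(tx)                     # after this call, the write is visible
```

Note that this guarantee is per node: a query against a different node that has not yet consumed the log up to `tx` would not see the write.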

Write consistency across nodes is provided via the :crux.db/match operation. The user needs to include a match operation in their transaction, wait for the transaction time (as above), and check that the transaction committed. More advanced algorithms can be built on top of this. As mentioned above, all match operations in a transaction must pass for the transaction to proceed and get indexed, which enables one to enforce consistency across documents.
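A conceptual sketch of the match semantics, heavily simplified and not Crux's implementation: the transaction carries the document the submitter expects to be current, and if the node's current value differs when the transaction is indexed, the whole transaction is skipped.

```python
# Illustrative sketch of :crux.db/match semantics (simplified; not Crux's
# implementation). A tx bundles a match precondition with its puts; if the
# precondition fails at indexing time, none of the puts are applied.
def index_tx(db, match, puts):
    eid, expected = match
    if db.get(eid) != expected:
        return False                  # match failed: tx is not indexed
    db.update(puts)                   # all ops apply atomically
    return True

db = {"account": {"balance": 100}}
ok = index_tx(db, ("account", {"balance": 100}),
              {"account": {"balance": 90}})
# This second tx was prepared against the stale balance of 100, so it aborts:
stale = index_tx(db, ("account", {"balance": 100}),
                 {"account": {"balance": 80}})
```

Because every node indexes the log in the same order, every node reaches the same verdict for each match, which is what makes this a usable compare-and-swap primitive across the cluster.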

Will a lack of schema lead to confusion?

It of course depends.

While Crux does not enforce a schema, the user may do so in a layer above to achieve the semantics of schema-on-read (per node) and schema-on-write (via a gateway node). Crux only requires that the data can be represented as valid EDN documents. Data ingested from different systems can still be assigned qualified keys, which avoids collisions without requiring a shared schema to be defined. Defining such a common schema up front might be prohibitive, and Crux instead aims to enable exploration of the data from different sources early. This exploration can also help discover and define the common schema of interest.

Crux only indexes top-level attributes in a document, so to avoid indexing certain attributes, one can currently move them down into a nested map, as nested values aren’t indexed. This is useful both to increase throughput and to save disk space. A smaller index also leads to more efficient queries. We are considering eventually giving more explicit control over what is indexed.
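The indexing rule above can be sketched as follows. This is an illustration of the behaviour described, not Crux's actual index layout: top-level attributes become entity/attribute/value index entries (with collection values indexed per element), while nested maps are stored with the document but produce no entries.

```python
# Illustrative sketch (not Crux's index layout): derive index entries from a
# document. Only top-level attributes are indexed; nested maps are skipped.
def index_entries(eid, doc):
    entries = []
    for attr, value in doc.items():
        if isinstance(value, dict):
            continue                  # nested maps: stored but not indexed
        values = value if isinstance(value, (list, set)) else [value]
        for v in values:              # collection values index each element
            entries.append((eid, attr, v))
    return entries

doc = {"name": "Alice",
       "skills": ["clojure", "kafka"],
       "audit": {"raw-payload": "..."}}   # moved into a nested map: not indexed
```

Here `index_entries("alice", doc)` yields entries for `name` and each element of `skills`, but nothing for the `audit` map, so the payload never inflates the index.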

How does Crux deal with time?

The valid time can be set manually per transaction operation, and might already be defined by an upstream system before reaching Crux. This also makes it possible to deal with integration concerns such as a message queue being down and data arriving later than it should.

If not set, Crux defaults valid time to the transaction time, which is the LogAppendTime assigned by the Kafka broker to the transaction record. This time is taken from the local clock of the Kafka broker, which acts as the master wall clock time.

Crux does not rely on clock synchronisation or try to make any guarantees about valid time. Assigning valid time manually needs to be done with care: there must either be a clear owner of the clock, or the exact valid time ordering between different nodes must not strictly matter for the data in question. NTP can mitigate clock drift, potentially to an acceptable degree, but it cannot fully guarantee ordering between nodes.

Feature Support

Does Crux support RDF/SPARQL?

No. We have a simple ingestion mechanism for RDF data in crux.rdf but this is not a core feature. There is also a query translator for a subset of SPARQL. RDF and SPARQL support could eventually be written as a layer on top of Crux as a module, but there are no plans for this by the core team.

Does Crux provide transaction functions?

Not directly, currently. You may use a "gatekeeper" pattern to enforce the desired level of transaction function consistency required.

As the log is ingested in the same order at all nodes, purely functional transformations of the tx-ops are possible. Experimental support for transaction functions, which is subject to change and undocumented, can be enabled via the environment variable feature flag `CRUX_ENABLE_TX_FNS`.

Does Crux support the full Datomic/DataScript dialect of Datalog?

No. There is no support for Datomic’s built-in functions, or for accessing the log and history directly. There is also no support for variable bindings or multiple source vars.

Other differences include that :rules and :args (a relation represented as a list of maps which is joined with the query) are provided in the same query map as the :find and :where clauses. Crux additionally supports the built-in == for unification as well as !=. Both unification operators can also take sets of literals as arguments, requiring at least one to match, which is essentially a form of or.
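Assuming the set semantics described above, the == operator's behaviour can be sketched like this (an illustration of the described semantics, not Crux's query engine):

```python
# Illustrative sketch (not Crux's query engine): == unifies a value with a
# literal, and a set argument means "at least one member must match",
# i.e. a form of `or`.
def unify_eq(value, literal):
    if isinstance(literal, (set, frozenset)):
        return value in literal       # any member of the set may match
    return value == literal

rows = ["clojure", "kafka", "rocksdb"]
matching = [v for v in rows if unify_eq(v, {"clojure", "kafka"})]
```

Here `matching` keeps only the values unifiable with one of the set's literals.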

Many of these aspects may be subject to change, but compatibility with other Datalog databases is not a goal for Crux.

Any plans for Datalog, Cypher, Gremlin or SPARQL support?

The goal is to support different languages, and decouple the query engine from its syntax, but this is not currently the case. There is a query translator for a subset of SPARQL in crux.sparql.

Does Crux support sharding?

Not currently. We are considering support for sharding the document topic as this would allow nodes to easily consume only the documents they are interested in. At the moment the tx-topic must use a single partition to guarantee transaction ordering. We are also considering support for sharding this topic via partitioning or by adding more transaction topics. Each partition / topic would have its own independent timeline, but Crux would still support cross-shard queries. Sharding is mainly useful to increase throughput.

Does Crux support pull expressions?

No. As each Crux node is its own document store, the documents are local to the query node and can easily be accessed directly via the lower level read operations. We aim to make this more convenient soon.

We are also considering support for remote document stores via the crux.db.ObjectStore interface, mainly to support larger data sets, but there would still be a local cache. The indexes would stay local as this is key to efficient queries.

Do you have any benchmarks?

We are releasing a public benchmark dashboard in the near future. In the meantime feel free to run your own local tests using the scripts in the /test directory. The RocksDB project has performed some impressive benchmarks which give a strong sense of how far a single Crux node backed by RocksDB can confidently scale. LMDB is generally faster for reads and RocksDB is generally faster for writes.