RethinkDB reviewed by a MongoDB fan

By Martin Rusev

I have been using MongoDB in almost all of my projects since 2012. In the last year or so the name RethinkDB has frequently popped up in a lot of Mongo-related discussions - as a direct (but better) competitor and an almost drop-in replacement. RethinkDB hit version 2.0, which according to its creators is production ready, so I decided to give it a go and share my findings.

My MongoDB Experience

I have been using MongoDB for the last 3 years as a JSON storage database for medium/large datasets. My biggest MongoDB database was 17GB, with over 2 million documents of crawling data. I've been using it in Amon as well, for storing huge amounts of various unstructured metrics. I consider MongoDB my first choice for storing and analyzing datasets above 2GB.

RethinkDB - First Impressions

RethinkDB has repositories for all the popular distros. Installation is a simple, straightforward process. One thing I would like to mention: by default, RethinkDB doesn't start automatically on system boot - you have to make some config tweaks to enable this.
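On Ubuntu/Debian the tweak boils down to giving the init script an instance config to pick up. A sketch, assuming the paths used by the official packages (check your distro - they may differ):

```shell
# Copy the sample config so the init script starts an instance on boot
sudo cp /etc/rethinkdb/default.conf.sample /etc/rethinkdb/instances.d/instance1.conf
sudo /etc/init.d/rethinkdb restart
```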

RethinkDB has great documentation - it is detailed, well written and has enough examples for every command.

Overall, I have great first impressions and my instance was up and running in less than 5 minutes.

Client Libraries

The people behind RethinkDB provide official libraries for Python, JavaScript and Ruby. There are also community-provided libraries for all the other popular languages like Go, PHP, etc. Using those is not as easy as using the official libraries - they are not well documented (if at all), and you have to guess how to execute a particular command.

In RethinkDB, you have to create databases and tables manually, and it will raise an exception if they already exist. Compared to MongoDB, that could be an inconvenience for some (me included) - one of the things I find appealing in MongoDB is the fluid interaction with databases (use db) and collections (db.collection('metrics')): if a database or collection doesn't exist, it is created on the fly.
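Since the create calls throw when the database or table is already there, a tiny wrapper can restore some of Mongo's fluidity. A minimal sketch - the helper name and the error-matching by message are my own; the assumption is that the driver's "already exists" failure carries that phrase in its message:

```python
def ensure(create, already_exists_error=Exception, marker='already exists'):
    """Run a create operation; swallow 'already exists', re-raise anything else."""
    try:
        create()
        return True           # freshly created
    except already_exists_error as e:
        if marker not in str(e):
            raise
        return False          # was already there

# Usage against a live RethinkDB connection would look roughly like:
#   ensure(lambda: r.db_create('rethinkvsmongo').run(conn))
#   ensure(lambda: r.db('rethinkvsmongo').table_create('metrics').run(conn))
```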

Performance Metrics

In MongoDB you can get metrics for almost anything of importance - db size (db.stats()), collection size (db.collection.stats()), slow queries (db['system.profile'].find()), etc. - with a single command or, worst case, a couple of lines of code. At the time of writing, RethinkDB doesn't programmatically expose any performance metrics like table/db/index size or slow queries.

The Admin Interface

Continuing my point about the lack of performance metrics - I have to mention the Admin panel. RethinkDB comes with a web app where you can check the state of your instance - existing databases, tables, sharding status, number of documents. A nice chart shows you the number of reads/writes in real time. The admin panel is really well designed.

I was a little disappointed by the data explorer. I expected to be able to click on a table and see the rows inside - like all the tools out there for Mongo, MySQL, PostgreSQL, etc. That is not the case here - you get a command prompt with auto-complete, which is far from ideal. I haven't found any good third-party admin UIs (that could be a potential niche), so keep in mind that data browsing could be cumbersome.

MongoDB vs RethinkDB: Benchmarks

Here comes my favorite part - comparing MongoDB and RethinkDB head to head. Please keep in mind that these are not "professional" benchmarks - I just wrote a small script that executes CRUD operations and compared the results. If you want, you can take a look at the source here, and you can also run the benchmarks on your own server/dev machine by following the instructions in the readme. Let's start with the most basic operation for every DB:
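The tables below report average, min and max over several runs. The harness behind them boils down to something like this (a sketch - the function name is mine, not from the benchmark script):

```python
import time

def bench(fn, runs=3):
    """Time `fn` several times; return (avg, min, max) in milliseconds."""
    timings = []
    for _ in range(runs):
        start = time.time()
        fn()
        timings.append((time.time() - start) * 1000)
    return sum(timings) / len(timings), min(timings), max(timings)

# Example: measure a ~10ms workload
avg_ms, min_ms, max_ms = bench(lambda: time.sleep(0.01))
```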

Inserts

I did a lot of tweaks while testing, but every single time RethinkDB was slower than MongoDB. 100 000 documents ended up taking 37MB of storage in MongoDB and 124MB in RethinkDB. In MongoDB I got that number from db.collection.stats(); Rethink doesn't give programmatic access to stats data, so I had to run ls -lh /var/lib/rethinkdb/instance1/data instead.

import time

import pymongo
from pymongo import MongoClient
import rethinkdb as r

unix_timestamp = int(time.time())

metrics_document = {
    "table_name": "django_session",
    "cumulative_pct_reads": 69.859,
    "cache_hit_rate": 90.2,
    "last_update": '',
    "reads": 51,
    "index_hit_rate": 37.5,
    "size": 1
}

# MongoDB
mongo_connection = MongoClient(host='mongodb://localhost')
mongo_db = mongo_connection['rethinkvsmongo']
mongo_collection = mongo_db['metrics']
mongo_collection.ensure_index([('last_update', pymongo.DESCENDING)], background=True)

for i in range(0, 100000):
    data = metrics_document.copy()  # Adds some memory overhead
    data['last_update'] = unix_timestamp + i
    mongo_collection.insert(data)

# RethinkDB
r.connect("localhost", 28015, db='rethinkvsmongo').repl()
r.db_create('rethinkvsmongo').run()
r.db('rethinkvsmongo').table_create('metrics').run()
r.table('metrics').index_create('last_update').run()

for i in range(0, 100000):
    metrics_document['last_update'] = unix_timestamp + i
    r.table('metrics').insert(metrics_document, durability='hard').run()

100 000 documents:

            MongoDB              RethinkDB (durability='hard')   RethinkDB (durability='soft')
Average     16573ms (16.5sec)    207573ms (207sec)               66639ms (66sec)
Min         16747ms              197515ms                        65394ms
Max         17204ms              219420ms                        67584ms

Updates


import random

# Helpers assumed by the benchmark script (their definitions are my sketch)
def random_int():
    return random.randint(0, 100)

def random_timestamp():
    # A timestamp from the range inserted above
    return unix_timestamp + random.randint(0, 99999)

data = {'size': random_int(), 'reads': random_int(), 'cache_hit_rate': random_int()}

# MongoDB
for i in range(0, 100000):
    mongo_collection.update({"last_update": {"$in": [random_timestamp()]}},
                            {"$set": data})

# RethinkDB
for i in range(0, 100000):
    r.table('metrics').filter({"last_update": random_timestamp()}).update(
        data, durability='hard').run()

100 000 documents:

            MongoDB              RethinkDB (durability='hard')   RethinkDB (durability='soft')
Average     21899ms (22sec)      61449ms (61sec)                 61784ms (61sec)
Min         21830ms              60969ms                         61784ms
Max         23984ms              63922ms                         62114ms

Read/Find

Update: 27.05.2015 - Results updated to reflect the comment from coffeemug

Of all the CRUD operations, RethinkDB is at its best when reading and filtering documents. Still, it is slow compared to MongoDB:


# MongoDB
for i in range(0, 100000):
    mongo_collection.find_one({"last_update": {"$in": [random_timestamp()]}})

# RethinkDB
for i in range(0, 100000):
    r.table('metrics').filter({"last_update": random_timestamp()}).run()

100 000 documents:

            MongoDB              RethinkDB
Average     15193ms (15sec)      42939ms (42sec)
Min         15137ms              43815ms
Max         16112ms              44856ms

I am going to rerun these tests in the future, but at this point, at least for me, it is clear that I am not going to choose Rethink over Mongo for performance.

RethinkDB Specific Features

All your interactions with RethinkDB go through ReQL (the RethinkDB query language), which feels like a mix between Python and Underscore.js. With ReQL you can filter your results (just like in any other database), but you can also do some lightweight manipulation of the results, potentially reducing the amount of boilerplate code. Apart from that, at least for me, it doesn't bring anything truly revolutionary to the table - you can already do these things in an SQL database like Postgres, and in Mongo as well.

list(r.table('users').filter(lambda user: user['age'] > 30).run())

Streaming changes

This is the one feature that separates RethinkDB from MongoDB or any other SQL/NoSQL database. If I had to sell/use/recommend RethinkDB in one sentence, it would be:

RethinkDB is a great JSON DB backend for real-time web/mobile apps

r.table('chat').orderBy('last_update').changes()
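Every document a changefeed delivers carries an old_val and a new_val, which is enough to tell inserts, updates and deletes apart. A minimal sketch - the classify helper is mine; the feed loop is commented out since it needs a live server:

```python
def classify(change):
    """Map a changefeed document to the kind of write that produced it.
    RethinkDB delivers {'old_val': ..., 'new_val': ...} for every change:
    old_val is None for inserts, new_val is None for deletes."""
    if change.get('old_val') is None:
        return 'insert'
    if change.get('new_val') is None:
        return 'delete'
    return 'update'

# Consuming a feed against a live server would look roughly like:
#   feed = r.table('chat').changes().run(conn)
#   for change in feed:           # blocks, yielding changes as they happen
#       print(classify(change), change['new_val'])
```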

Until now, one of the more convenient ways to build real-time apps was to use Firebase as a database back-end. Firebase is nice, simple and powerful, but it lives in the cloud and was recently acquired by Google - which, at least for me, doesn't make it a reliable choice for apps I would write now and support for years to come.

RethinkDB could be a viable Firebase alternative - it offers similar functionality, it is open source and well funded.

r.http

r.http('https://api.github.com/repos/rethinkdb/rethinkdb/stargazers')

With r.http you can consume APIs directly from your database, manipulate the results and then store some or all of them. A great feature if your app relies on a lot of external APIs and you want to reduce boilerplate code.
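The typical pipeline is fetch, trim, store. Client-side the trimming is just a projection; server-side the whole thing can be one query. A sketch - pick mirrors what ReQL's pluck() does, and the ReQL chain in the comment is an untested sketch:

```python
def pick(rows, *fields):
    """Keep only the given fields from each row - what ReQL's pluck() does."""
    return [{f: row[f] for f in fields if f in row} for row in rows]

# Server-side, the same shape - fetch, trim, store - in a single query:
#   r.table('stargazers').insert(
#       r.http('https://api.github.com/repos/rethinkdb/rethinkdb/stargazers')
#        .pluck('login', 'id')
#   ).run(conn)

sample = [{'login': 'coffeemug', 'id': 1, 'avatar_url': '...'}]
trimmed = pick(sample, 'login', 'id')
```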

Conclusion

After spending almost a week experimenting with RethinkDB, I think the project has great potential, especially for real-time web and mobile apps, for consuming APIs, and for developers who want to minimize database-related boilerplate code. Directly compared to MongoDB, at this point it is clearly less mature and slower.

RethinkDB Pros

- Great documentation and a quick, painless installation
- A well-designed admin panel with real-time read/write charts
- Changefeeds - a great fit for real-time web/mobile apps
- r.http for consuming external APIs directly from the database
- Open source and well funded

RethinkDB Cons

- Slower than MongoDB in all of my CRUD benchmarks, with a bigger on-disk footprint
- No programmatic access to performance metrics like table/db/index size or slow queries
- Databases and tables have to be created manually
- Data browsing through the data explorer's command prompt is cumbersome

I really hope you enjoyed this post. Please let me know in the comments below what you think about RethinkDB, and whether you would like to see more detailed posts about it in the future.