NoSQL | Which NoSQL type database to use and their features | Internet Excerpt

Below is an Internet Excerpt detailing which NoSQL solution should be used based on various requirement scenarios.

Applications are getting bigger

Web applications keep growing in scale. We have to store more data, serve more users and provide more computing capacity, and to handle that we have to scale. We can scale in two ways. We can scale up, that is, buy better machines with more disk, more memory and so on. Or we can scale out, that is, buy many small machines and run them as a cluster. For big applications scaling up is not an option: bigger machines are more expensive and they have a limit, since no single machine can handle the traffic of Google or Facebook.

Given this context, we need new databases, because relational databases were not designed to run on clusters. Clustered relational databases do exist, but they typically work by sharing a disk, which isn’t the scenario we want when we’re building a cluster. Companies that need to handle a lot of traffic, such as Google, Facebook and Amazon, started to develop databases designed to run on clusters, and this was the beginning of the NoSQL era.


Nowadays there are many NoSQL databases: MongoDB, Redis, Riak, HBase, Cassandra and so on. Each one has at least one of these characteristics:

  • NoSQL databases don’t use SQL, although some of them, like MongoDB and Cassandra, have their own query languages
  • Usually they are open-source projects
  • They were built to run on clusters
  • They are schemaless: there is no rigid schema defining the data structure

Types of NoSQL

NoSQL databases can be divided into 4 types: Key-Value, Document-Oriented, Column-Family and Graph-Oriented databases. Let’s see what each of these types is, what its characteristics are and where we should use it.

Key-Value Databases

What they are: A key-value store works like the simple hash table we are used to in traditional programming languages. You can add, retrieve and delete data through keys. Since access is always by primary key, these stores tend to perform well and scale easily.
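
As a rough illustration of the add, retrieve and delete operations, here is a minimal in-memory sketch in Python. The class name KeyValueStore and its put/get/delete methods are hypothetical and do not mirror the API of Riak, Redis or any other real product; a real store would also add persistence and distribution across a cluster.

```python
from typing import Any, Dict, Optional


class KeyValueStore:
    """Hypothetical in-memory key-value store backed by a Python dict."""

    def __init__(self) -> None:
        self._data: Dict[str, Any] = {}

    def put(self, key: str, value: Any) -> None:
        # Add a new value, or overwrite the existing one for this key.
        self._data[key] = value

    def get(self, key: str) -> Optional[Any]:
        # Look the value up by key; return None when the key is absent.
        return self._data.get(key)

    def delete(self, key: str) -> bool:
        # Remove the key and report whether it was present.
        if key in self._data:
            del self._data[key]
            return True
        return False


store = KeyValueStore()
store.put("session:42", {"user": "alice", "cart": ["book"]})
session = store.get("session:42")
removed = store.delete("session:42")
```

Every operation goes through the key. The store never inspects the value, which is why lookups stay cheap and why finding an entry by its value would mean scanning everything.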

Examples: Riak, Redis, Memcached, Amazon’s Dynamo, Project Voldemort

Who’s using: GitHub (Riak), Best Buy (Riak), Twitter (Redis and Memcached), Stack Overflow (Redis), Instagram (Redis), YouTube (Memcached), Wikipedia (Memcached).

When we should use it:

  • To store user information such as sessions, profiles, preferences, shopping carts and so on. This information is usually associated with an ID (the key), which is exactly the scenario a key-value database handles best.

When we shouldn’t use it:

  • If we need to query the data by value instead of by key. A plain key-value database offers no way to query by value.
  • If we need to store relationships between data. We can’t relate the data under two or more keys in a key-value database.
  • If we need transactions. In a key-value database we generally can’t roll back an operation if a failure occurs.

Document-Oriented Databases

What they are: Document-oriented databases store data as documents. A document can be defined as a set of maps, collections and scalar values. Documents are like rows, but unlike rows, which all have to follow the same schema, documents can differ completely from one another. These documents can be stored in formats such as XML, JSON or BSON.
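
To make the point about documents with different shapes concrete, here is a small sketch using plain Python dicts and the standard json module. The "logs" collection, the field names and the values are made up for illustration, and no real document database API is used.

```python
import json

# Two documents in the same hypothetical "logs" collection.
# They share no fixed schema: each one carries only the fields it needs.
logs = [
    {"_id": 1, "app": "billing", "level": "ERROR", "message": "card declined"},
    {"_id": 2, "app": "search", "query": "nosql", "latency_ms": 37,
     "tags": ["slow", "cache-miss"]},
]

# Each document serializes to JSON independently of the others.
serialized = [json.dumps(doc, sort_keys=True) for doc in logs]

# A field present in one document may be missing in another,
# so optional fields are read defensively with dict.get.
latencies = [doc.get("latency_ms") for doc in logs]
```

The flexibility has a cost: since nothing enforces a schema, the code that reads the documents has to cope with missing fields.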

Examples: MongoDB, CouchDB, RavenDB

Who’s using: SAP (MongoDB), Codecademy (MongoDB), Foursquare (MongoDB), NBC News (RavenDB)

When we should use it:

  • Logging. In an enterprise environment, each application logs different information. Document-oriented databases don’t have a fixed schema, so we can use them to store all of these different kinds of log entries.
  • Analytics. Since they are schemaless, we can store different metrics, and new metrics can be added without schema changes.

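When we shouldn’t use it:

  • If we need transactions that span several documents. Document-oriented databases have traditionally guaranteed atomicity only within a single document, and even where multi-document transactions exist (recent MongoDB versions offer them, for example) they are not the model’s strong point. If the application depends on them, a document database is probably the wrong choice.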

Column-Family Databases

What they are: Column-family databases store data in column families. A column family is a group of related data that is often queried together. For example, given a Person class, we often access a person’s name and age together, but not the salary. In this case, name and age belong to one column family and salary belongs to another.
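
The Person example can be sketched with nested Python dicts, mapping a row key to column families and each family to its columns. The family names "profile" and "payroll", the row key and the helper read_family are assumptions made for this illustration; this is not how Cassandra or HBase expose their data.

```python
from typing import Any, Dict

# row key -> column family -> column name -> value
Row = Dict[str, Dict[str, Any]]

people: Dict[str, Row] = {
    "person:1": {
        "profile": {"name": "Ana", "age": 31},  # columns usually read together
        "payroll": {"salary": 5000},            # column read far less often
    },
}


def read_family(table: Dict[str, Row], row_key: str, family: str) -> Dict[str, Any]:
    # Fetch a single column family of one row; other families are not touched.
    return table[row_key].get(family, {})


profile = read_family(people, "person:1", "profile")
```

Reading the profile never touches the salary, which is the benefit of grouping columns by how they are queried.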

Examples: Cassandra, HBase

Who’s using: eBay (Cassandra), Instagram (Cassandra), NASA (Cassandra), Twitter (Cassandra and HBase), Facebook (HBase), Yahoo! (HBase)

When we should use it:

  • Logging. Since rows can have different columns, each application can write its info into its own column families.
  • Blogging Platforms. We can store each kind of information in a different column family: for example, tags in one family, categories in another, posts in another and so on.

When we shouldn’t use it:

  • If we need ACID transactions. Cassandra, for example, doesn’t offer general ACID transactions.
  • Prototyping. Cassandra’s data structures are designed around the patterns in which we expect to retrieve the data. While prototyping we can’t predict what the query patterns will be, and every time they change we have to change the column-family design.

Graph-Oriented Databases

What they are: Graph databases allow us to store data as graphs. Entities are represented as vertices and the relationships between them as edges. For example, we could have 3 entities: Steve Jobs, Apple and NeXT, plus two edges labelled “Founded by”, one relating Apple to Steve Jobs and one relating NeXT to Steve Jobs.
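
The same example can be written down as a tiny edge list in Python. This is only a sketch of the data model: the Edge tuple layout and the helpers vertices and sources_of are hypothetical, and a real graph database such as Neo4j would store and query the graph through its own engine and query language.

```python
from typing import List, Set, Tuple

# Each edge is (source vertex, relationship label, target vertex).
Edge = Tuple[str, str, str]

edges: List[Edge] = [
    ("Apple", "Founded by", "Steve Jobs"),
    ("NeXT", "Founded by", "Steve Jobs"),
]


def vertices(edges: List[Edge]) -> Set[str]:
    # Collect every entity that appears at either end of an edge.
    found: Set[str] = set()
    for source, _label, target in edges:
        found.add(source)
        found.add(target)
    return found


def sources_of(edges: List[Edge], label: str, target: str) -> List[str]:
    # Follow edges with the given label backwards from the target vertex.
    return [src for src, lbl, dst in edges if lbl == label and dst == target]


founded = sources_of(edges, "Founded by", "Steve Jobs")
```

Asking "which companies were founded by Steve Jobs?" becomes a matter of following edges rather than joining tables.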

Examples: Neo4j, InfiniteGraph, OrientDB

Who’s using: Adobe (Neo4j), Cisco (Neo4j), T-Mobile (Neo4j)

When we should use it:

  • Connected Data. If our data is connected through relationships, we have a good case for a graph database. The vertices can be people, cities or companies, and the edges can be “lives in”, “employed by” and so on.
  • Recommendation Engines. Data represented as a graph can be used to make recommendations such as “people who bought this item also bought these items”, as Amazon and Netflix do.
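
A "people who bought this item also bought" recommendation can be sketched as a two-step walk over "bought" edges: from the item to its buyers, then from those buyers to their other items. The purchase data and the function also_bought below are invented for illustration, and the code uses plain Python structures rather than any graph database API.

```python
from collections import Counter
from typing import Dict, List, Set

# Hypothetical "bought" edges: customer vertex -> set of item vertices.
purchases: Dict[str, Set[str]] = {
    "ana": {"laptop", "mouse"},
    "bob": {"laptop", "mouse", "keyboard"},
    "cid": {"laptop", "mouse", "monitor"},
    "dee": {"phone"},
}


def also_bought(item: str, purchases: Dict[str, Set[str]]) -> List[str]:
    # Step 1: find the customers who bought the item.
    # Step 2: count the other items those same customers bought.
    counts: Counter = Counter()
    for items in purchases.values():
        if item in items:
            counts.update(items - {item})
    # Most frequently co-purchased first; ties are broken alphabetically.
    ranked = sorted(counts.items(), key=lambda pair: (-pair[1], pair[0]))
    return [name for name, _count in ranked]


recommended = also_bought("laptop", purchases)
```

With this sample data, the mouse ranks first for laptop buyers because all three of them bought one. A graph database would typically run this kind of traversal natively instead of looping over every customer.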

When we shouldn’t use it:

  • When the data model doesn’t fit. Most use cases are not a good match for graph databases, because operations that involve the whole graph are not trivial.