NoSQL Database Overview

What is NoSQL

When talking about NoSQL database, there are many concepts are based on or compare to SQL database concepts. SQL database have much longer history and it is very mature.

Here are some points for NoSQL database:

NoSQL databases store data differently than relational tables.
It come out mostly because storage is getting cheaper.
It store data in a more natural and flexible way than relational tables.
The main types are document, key-value, wide-column, and graph.
You can say it is Schemaless Database.
They are BASE compliance. And some can be ACID compliance as well.

Brief history of NoSQL databases

NoSQL databases emerged in the late 2000s as the cost of storage dramatically decreased. Gone were the days of needing to create a complex, difficult-to-manage data model in order to avoid data duplication. NoSQL databases optimized for developer productivity.

In the early 2000s, a paper published by Google on BigTable, the wide-column database, explored the wide range of possibilities for a distributed storage system. 2009 saw a major rise in NoSQL databases, with two key document-oriented databases, MongoDB and CouchDB, coming into the picture.

Features

At a high level, NoSQL databases typically have the following features:

Distributed computing
Easy to scale out
Flexible schemas and rich query language
Ease of use for developers
Partition tolerance
High availability

BASE compliance

NoSQL databases are BASE compliant, i.e., (Basic Availability Soft state Eventual consistency).

Basic availability refers to the ability of the system to tolerate a partial failure (like a loss of a node).
Soft state means that the system allows temporary inconsistencies before eventually achieving consistency automatically over time.

BASE compliance ensures high availability, faster data processing, scalability, and flexibility. However, MongoDB can also be configured to provide multi-document ACID compliance(Atomicity, Consistency, Isolation, and Durability).

Partition

In NoSQL databases, a partition (also known as a shard in MongoDB) is a way to divide the database into smaller, more manageable pieces. Each partition contains a subset of the data and is managed by a specific server or set of servers. This approach helps in distributing the data across multiple servers, which can improve performance, scalability, and fault tolerance.

In MongoDB, a sharded database is an example of a partitioned database. Here, the data is divided into shards, and each shard is managed by a set of servers. Each shard has a primary server that handles all the writes and reads, and secondary servers that replicate the data for redundancy and failover.

Partition Key

The partition key (or shard key) is used to determine how the data is distributed across the shards. This key is crucial for ensuring an even distribution of data and optimizing query performance.

For example, in a MongoDB sharded cluster, each shard maintains exclusive control of its partition of the data. If the primary server of a shard fails, one of the secondary servers is automatically elected to become the primary, ensuring high availability.

In summary, a partition in NoSQL databases is a method to distribute data across multiple servers to enhance performance, scalability, and reliability. The partition key is essential for distributing data across the cluster and ensuring efficient query performance.

When selecting a shard key, it’s important to consider the following:

The shard key should be chosen to ensure an even distribution of data across the shards.
The shard key should support the most common query patterns to optimize performance.
The shard key should be immutable if it includes the _id field.

Common Misconception

NoSQL databases don’t store relationship data well

NoSQL databases can store relationship data — they just store it differently than relational databases do. In fact, many find modeling relationship data in NoSQL databases to be easier than in relational databases because related data doesn’t have to be split between tables. NoSQL data models allow related data to be nested within a single data structure.

NoSQL databases don’t support ACID transactions

Some NoSQL databases, like MongoDB, do, in fact, support ACID transactions.

Schemaless Database

Schemaless database is compare to traditional SQL database. A schemaless database, like MongoDB, does not have these up-front constraints, mapping to a more ‘natural’ database. Any data, formatted or not, can be stored in a non-tabular NoSQL type of database.

How does a schemaless database work?

In schemaless databases, information is stored in JSON-style documents which can have varying sets of fields with different data types for each field. So, a collection could look like this:

    { 
        name : “Joe”, age : 30, interests : ‘football’ }
    { 
        name : “Kate”, age : 25 }

As you can see, the data itself normally has a fairly consistent structure. With the schemaless MongoDB database, there is some additional structure — the system namespace contains an explicit list of collections and indexes. Collections may be implicitly or explicitly created — indexes must be explicitly declared.

MongoDB

The way of using MongoDB is similar to SQL database like MySql. It provide MongoDB Compass to operate your database. MongoDB Shell can run it’s APIs just like running SQL for you SQLDB.

Terms

MongoDB Compass: It is a software which is database window interface, something like ‘MySQLWorkbench’.
MongoDB Shell (mongosh): Can run commands, Like db.movies.find( { "title": "Titanic" } ).
Collection: similar to table in SQL DB
Documents: similar to row in SQL DB
Field: similar to column in SQL DB
Views: A MongoDB view is a read-only queryable object whose contents are defined by an aggregation pipeline on other collections or views.
Query APIs: comprises two ways to query data in MongoDB:
- find() Operations
- Aggregation pipelines
CRUD APIs: like updateOne(), updateMany() insertOne(), deleteOne(), find(), etc.
Triggers: support Triggers.

find()

When you get to know how to use find(), other CRUD APIs are similar.

Use find command: db.movies.find( { "title": "Titanic" } ).

You can ‘Project Fields’ for your return by using ‘Project Selectors’.
You can filter document for your return by using ‘Query Selectors’.
You can sort, limit return.

Matches any of the values specified in an array or NOT in array

$in and $nin

For example: db.movies.find( { rated: { $in: [ "PG", "PG-13" ] } } ) This operation corresponds to the following SQL statement: SELECT * FROM movies WHERE rated in ("PG", "PG-13")

Matches values that are equal or greater or less to a specified value.

$eq
$gt
$gte
$lt
$lte
$ne

For example: db.movies.find( { countries: "Mexico","imdb.rating": { $gte: 7 } } )

Logical Operators

$and
$or

For example: db.movies.find( { $and: [ {countries: "Mexico"} , {"imdb.rating": { $gte: 7 }} ] } )

count()

Returns the count of documents that would match a find() query for the collection or view.
The db.collection.count() method does not perform the find() operation but instead counts and returns the number of results that match a query.
db.collection.count(query, options)
Example: db.collection.find( { a: 5, b: 5 } ).count()

Aggregation

Aggregation operations process multiple documents and return computed results. You can use aggregation operations to:

Group values from multiple documents together.
Perform operations on the grouped data to return a single result.
Analyze data changes over time.

Aggregate Pipelines Example:

db.orders.aggregate( [
   // Stage 1: Filter pizza order documents by pizza size
   {
      $match: { size: "medium" }
   },
   // Stage 2: Group remaining documents by pizza name and calculate total quantity
   {
      $group: { _id: "$name", totalQuantity: { $sum: "$quantity" } }
   }
] )

The `$match` stage:

Filters the pizza order documents to pizzas with a size of medium. Passes the remaining documents to the $group stage.

The `$group` stage:

Groups the remaining documents by pizza name. Uses $sum to calculate the total order quantity for each pizza name. The total is stored in the totalQuantity field returned by the aggregation pipeline.

What is Aggregate Pipelines

An aggregation pipeline consists of one or more stages that process documents:

Each stage performs an operation on the input documents. For example, a stage can filter documents, group documents, and calculate values.
The documents that are output from a stage are passed to the next stage.
An aggregation pipeline can return results for groups of documents. For example, return the total, average, maximum, and minimum values.

Aggregate Pipelines Stages

Each States have a name, like $match, $group, $sort etc.

distinct()

db.collection.distinct() Finds the distinct values for a specified field across a single collection or view and returns the results in an array.

This can help you to understand the data in the collection.

Indexes

Indexes support the efficient execution of queries in MongoDB.

Concept Index in NoSQL is different from SQL database. In MongoDB, developer usually will create many indexes to improve queries. And when you create indexes MongoDB will product index size for them.

Index Features

Although indexes improve query performance, adding an index has negative performance impact for write operations.
Indexes don’t have to be unique.
Compound index is common.
Indexes store a small portion of the collection’s data set in an easy to traverse form.
The index stores the value of a specific field or set of fields, ordered by the value of the field.
You can easily create indexex for a collection.
TTL indexes are special single-field indexes that MongoDB can use to automatically remove documents from a collection after a certain amount of time or at a specific clock time.

Compound Indexing Strategies: The ESR (Equality, Sort, Range) Rule

In order to improve query performance, There is a rule when using index! It is ESR Rule.

Understand ESR Rule:

For example, In a CAR collection, your query is: db.cars.find( { model: "Cordoba" } ) And you want it to be sorted by model: db.cars.find( { manufacturer: "GM" } ).sort( { model: 1 } ) And also you have some range filter: like cost more than $20000: db.cars.find( { manufacturer: 'Ford', cost: { $gt:10000 } } ).sort( { model: 1 } )

What is ‘Range’? Range refer to Range query, Range filter. using $gte or $lt etc.

Following the ESR rule, the optimal index for the example query is: db.cars.createIndex( { manufacturer: 1, model: 1, cost: 1 } )

Key points about the ESR rule:

Equality first: When creating a compound index, the field that is most frequently used in equality comparisons (like filtering on a specific value) should be placed first.
Sort next: Following the equality field, the field that is often used for sorting operations (like ordering results by a certain criteria) should be added to the index.
Range last: Finally, if a range query is needed on a field (like finding values within a specific range), that field should be placed last in the compound index.

Query Plan

MongoDB have Query Plan concept

For any given query, the MongoDB query planner chooses and caches the most efficient query plan given the available indexes. To evaluate the efficiency of query plans, the query planner runs all candidate plans during a trial period. In general, the winning plan is the query plan that produces the most results during the trial period while performing the least amount of work.

To view the query plan information for a given query, you can use: db.collection.explain().find({manufacturer: 'Ford'})

You can check your Index effection by view the Query Plan. For example, when you find({manufacturer: 'Ford'}) before the create the manufacturer index, MongoDB is using COLLSCAN. But when you create the index, MonboDB is using IXSCAN.

Data types

Object ID − This datatype is used to store the document’s ID. Like: _id = ObjectId()
String − This is the most commonly used datatype to store the data. String in MongoDB must be UTF-8 valid.
Array − This type is used to store arrays or list or multiple values into one key.
Integer − This type is used to store a numerical value. Integer can be 32 bit or 64 bit depending upon your server.
Boolean − This type is used to store a boolean (true/ false) value.
Double − This type is used to store floating point values.
Timestamp − ctimestamp. This can be handy for recording when a document has been modified or added.
Object − This datatype is used for embedded documents.
Null − This type is used to store a Null value.
Date − This datatype is used to store the current date or time in UNIX time format. You can specify your own. Like new Date(”2002-01-30”)

NoSQL Database Overview

What is NoSQL

Brief history of NoSQL databases

Features

BASE compliance

Partition

Partition Key

Common Misconception

NoSQL databases don’t store relationship data well

NoSQL databases don’t support ACID transactions

Schemaless Database

How does a schemaless database work?

MongoDB

Terms

find()

Matches any of the values specified in an array or NOT in array

Matches values that are equal or greater or less to a specified value.

Logical Operators

count()

Aggregation

Aggregate Pipelines Example:

The `$match` stage:

The `$group` stage:

What is Aggregate Pipelines

Aggregate Pipelines Stages

distinct()

Indexes

Index Features

Compound Indexing Strategies: The ESR (Equality, Sort, Range) Rule

Key points about the ESR rule:

Query Plan

Data types

References

Post Detail

NoSQL Database Overview

What is NoSQL

Brief history of NoSQL databases

Features

BASE compliance

Partition

Partition Key

Common Misconception

NoSQL databases don’t store relationship data well

NoSQL databases don’t support ACID transactions

Schemaless Database

How does a schemaless database work?

MongoDB

Terms

find()

Matches any of the values specified in an array or NOT in array

Matches values that are equal or greater or less to a specified value.

Logical Operators

count()

Aggregation

Aggregate Pipelines Example:

The $match stage:

The $group stage:

What is Aggregate Pipelines

Aggregate Pipelines Stages

distinct()

Indexes

Index Features

Compound Indexing Strategies: The ESR (Equality, Sort, Range) Rule

Key points about the ESR rule:

Query Plan

Data types

References

Post Detail

The `$match` stage:

The `$group` stage: