In this video I talk about a new, upcoming database called Noria.
It’s drastically different from traditional database systems.

Find me elsewhere:
Instagram: https://www.instagram.com/kamiyaa91/
Github: https://github.com/kamiyaa/

But it’s not actually a database.
It’s a dataflow engine that acts like one.

Noria mainly targets web applications. Specifically, ones that are read-heavy, which most web apps are. These web apps usually talk to some sort of database, and the majority of accesses are reads. Recomputing the same SQL SELECT statement over and over wastes CPU cycles and adds unnecessary latency.
To combat this, developers either precompute these values ahead of time, or use a caching system like Redis or memcached to minimize database lookups and increase performance.
The problem with this approach is that the application logic gets much more complicated. Developers now need to know when to evict cache entries, when to invalidate them and when to update them. There is also the problem where all clients miss the cache at the same time and then all query the database at once, which can overload it or even crash it.
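To make the extra application logic concrete, here is a minimal Python sketch of a hand-rolled cache in front of a query. Everything in it is made up for illustration (the CachedVoteCounts class, the votes list standing in for a table, the dict standing in for Redis or memcached); it is not code from Noria or any real cache library.

```python
class CachedVoteCounts:
    """Cache-aside wrapper: the application, not the database, keeps the cache in sync."""

    def __init__(self):
        self.votes = []      # stands in for a base table of (story_id, user) rows
        self.cache = {}      # stands in for Redis/memcached
        self.db_reads = 0    # counts how often the "expensive" query runs

    def _query_count(self, story_id):
        # Stand-in for: SELECT COUNT(*) FROM votes WHERE story_id = ?
        self.db_reads += 1
        return sum(1 for sid, _ in self.votes if sid == story_id)

    def get_count(self, story_id):
        if story_id not in self.cache:
            # Cache miss: fall back to the database and remember the result.
            self.cache[story_id] = self._query_count(story_id)
        return self.cache[story_id]

    def add_vote(self, story_id, user):
        self.votes.append((story_id, user))
        # The developer has to remember this line on every write path,
        # otherwise readers keep seeing a stale count.
        self.cache.pop(story_id, None)


store = CachedVoteCounts()
store.add_vote(1, "alice")
store.add_vote(1, "bob")
first = store.get_count(1)     # miss, runs the query
second = store.get_count(1)    # hit, no query
store.add_vote(1, "carol")     # invalidates the cached entry
third = store.get_count(1)     # miss again, query re-runs from scratch
```

Note that the invalidation throws the whole cached result away, so the next read pays for a full recount even though only one row changed.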

Another solution is to use a stream-processing system like Apache Spark, where data is processed as it comes in and the computed values are updated along the way. But these solutions are not very flexible. They usually require developers to know which queries to optimize for ahead of time, and the queries can’t be changed on the fly. In addition, these systems usually use a ‘windowed’ model where they only keep track of the most recent data to limit memory use, for example this week’s weather or today’s news.
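As a rough illustration of what ‘windowed’ means, here is a small Python sketch of a counter that only remembers events from the last N time units. The WindowedCounter class and its parameters are hypothetical and are not taken from Spark or any other real stream processor; it also assumes events arrive in timestamp order.

```python
from collections import deque


class WindowedCounter:
    """Counts events per key, but only over the last `window` time units."""

    def __init__(self, window):
        self.window = window
        self.events = deque()   # (timestamp, key) pairs, oldest first
        self.counts = {}        # key -> count inside the current window

    def add(self, timestamp, key):
        # Assumes timestamps never go backwards.
        self.events.append((timestamp, key))
        self.counts[key] = self.counts.get(key, 0) + 1
        self._expire(timestamp)

    def _expire(self, now):
        # Drop everything that has fallen out of the window to bound memory.
        while self.events and self.events[0][0] <= now - self.window:
            _, old_key = self.events.popleft()
            self.counts[old_key] -= 1
            if self.counts[old_key] == 0:
                del self.counts[old_key]

    def count(self, key):
        return self.counts.get(key, 0)


counter = WindowedCounter(window=7)
counter.add(1, "rain")
counter.add(2, "rain")
counter.add(9, "sun")   # the events at t=1 and t=2 are now outside the window
```

Memory stays bounded, but once data leaves the window there is no way to ask about it anymore.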

Noria basically combines all of these into one. And if you think about it, the database already knows your query patterns, so it can efficiently maintain a cache for you. Noria came out of a 2018 research paper and is written entirely in Rust. It describes itself as a new streaming dataflow system that acts as a fast storage backend for read-heavy web applications. There are two main ways of interfacing with Noria: through its Rust API or through its MySQL adapter. With the adapter, you can also query Noria manually using any MySQL client.

Noria’s dataflow system is implemented as a directed acyclic graph of relational operators (aggregations, joins, filters, etc.). Base tables are the roots and views are the leaves. Updates start at the base tables and propagate downwards through the operators to the views. An operator can be either partially stateful or fully stateful depending on the data it is holding. While the base tables are kept in persistent storage, the views and operator state are kept in memory.
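Here is a toy Python sketch of that idea: a base table at the root, a filter and a count aggregation in the middle, and a view at the leaf, with each write pushed down the graph as a small update. All class names (Node, BaseTable, Filter, CountBy, View) are invented for this example and only handle inserts; Noria’s real operators are written in Rust and are far more involved.

```python
class Node:
    """A node in the dataflow graph; updates flow from parents to children."""

    def __init__(self):
        self.children = []

    def connect(self, child):
        self.children.append(child)
        return child   # returning the child lets us chain connect() calls

    def emit(self, delta):
        for child in self.children:
            child.on_delta(delta)


class BaseTable(Node):
    """Root of the graph: stores rows and pushes each insert downstream."""

    def __init__(self):
        super().__init__()
        self.rows = []

    def insert(self, row):
        self.rows.append(row)
        self.emit(row)


class Filter(Node):
    """Stateless operator: forwards only the rows matching the predicate."""

    def __init__(self, predicate):
        super().__init__()
        self.predicate = predicate

    def on_delta(self, row):
        if self.predicate(row):
            self.emit(row)


class CountBy(Node):
    """Stateful operator: keeps a running count per group key."""

    def __init__(self, key):
        super().__init__()
        self.key = key
        self.counts = {}

    def on_delta(self, row):
        group = row[self.key]
        self.counts[group] = self.counts.get(group, 0) + 1
        self.emit((group, self.counts[group]))


class View(Node):
    """Leaf of the graph: holds precomputed results, so reads are plain lookups."""

    def __init__(self):
        super().__init__()
        self.results = {}

    def on_delta(self, delta):
        group, count = delta
        self.results[group] = count

    def lookup(self, group):
        return self.results.get(group, 0)


# Roughly: SELECT story_id, COUNT(*) FROM votes WHERE up GROUP BY story_id
votes = BaseTable()
view = (votes.connect(Filter(lambda row: row["up"]))
             .connect(CountBy("story_id"))
             .connect(View()))

votes.insert({"story_id": 1, "up": True})
votes.insert({"story_id": 1, "up": False})
votes.insert({"story_id": 1, "up": True})
votes.insert({"story_id": 2, "up": True})
```

The work happens on the write path, one row at a time, so a read never has to re-run the query.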

In Jon’s talk, he said there were 3 major challenges that Noria needed to overcome:

  1. The problem of limited memory
    You can’t just keep all query results in memory because it’s going to use too much memory. So Noria needs a way of evicting unused results.

  2. Application changes
    As the application continues to run, its queries will change. New data becomes old data, old data becomes data that is rarely accessed. So Noria needs a way to seamlessly transition to a new dataflow.

  3. Read/Write concurrency
    Being able to read and write concurrently is key to performance.
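For the first challenge, one way to picture the idea is a view that only keeps some keys in memory, evicts the least recently used ones, and recomputes a missing key from the base table when somebody asks for it. The Python sketch below is only an illustration under that assumption; the PartialCountView class and its LRU policy are made up here and are not Noria’s actual eviction logic.

```python
from collections import OrderedDict


class PartialCountView:
    """A count-per-key view that keeps at most `capacity` keys in memory."""

    def __init__(self, capacity):
        self.base_rows = []          # stands in for the persistent base table
        self.capacity = capacity
        self.state = OrderedDict()   # key -> count, least recently used first
        self.recomputes = 0          # how often a missing key had to be rebuilt

    def insert(self, key):
        self.base_rows.append(key)
        # Only keys that are currently in memory get updated incrementally.
        # Evicted keys are simply recomputed the next time they are read.
        if key in self.state:
            self.state[key] += 1

    def lookup(self, key):
        if key in self.state:
            self.state.move_to_end(key)   # mark as recently used
            return self.state[key]
        # Missing key: rebuild just this one result from the base table.
        self.recomputes += 1
        value = sum(1 for k in self.base_rows if k == key)
        self.state[key] = value
        if len(self.state) > self.capacity:
            self.state.popitem(last=False)   # evict the least recently used key
        return value


view = PartialCountView(capacity=2)
for key in ["a", "a", "b"]:
    view.insert(key)

first_a = view.lookup("a")    # computed on demand
view.insert("a")              # "a" is in memory, so it is updated in place
b = view.lookup("b")          # computed on demand
view.insert("c")              # "c" is not in memory, nothing to update
c = view.lookup("c")          # computed on demand, evicts "a"
again_a = view.lookup("a")    # recomputed from the base table, evicts "b"
```

Reads of evicted keys are slower because they go back to the base table, but memory use stays bounded and the results stay correct.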

Trade-offs:
Noria is still young and lacks some features.
The fault tolerance for sharding is currently not very good. If a shard goes down, all of its state is lost and needs to be recomputed. Depending on your needs, this may or may not work for you. Noria also doesn’t support certain SQL keywords like range or soundex. And Noria’s consistency guarantees are weaker than what you may be used to. A completed write does not necessarily mean that a read issued right after it will reflect the change. Noria only guarantees that reads will eventually see it.
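The consistency trade-off can be pictured with a tiny Python sketch where a write is acknowledged as soon as it is queued, and the view only catches up once the update has been propagated. The EventuallyConsistentCounter class and its propagate() method are hypothetical; in a real system the propagation happens in the background rather than through an explicit call.

```python
from collections import deque


class EventuallyConsistentCounter:
    """Writes are acknowledged before the read-side view has been updated."""

    def __init__(self):
        self.pending = deque()   # updates that have not reached the view yet
        self.view = {}           # what readers see

    def write(self, key):
        # The write "completes" here, before the view changes.
        self.pending.append(key)

    def read(self, key):
        return self.view.get(key, 0)

    def propagate(self):
        # Stand-in for the background dataflow catching up.
        while self.pending:
            key = self.pending.popleft()
            self.view[key] = self.view.get(key, 0) + 1


counter = EventuallyConsistentCounter()
counter.write("story-1")
stale = counter.read("story-1")    # the write is done, but the view is behind
counter.propagate()
fresh = counter.read("story-1")    # now the read reflects the write
```

If your application needs to read its own writes immediately, this is the behaviour to watch out for.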

Hope you guys enjoyed the video!

Resources:

Noria Source Code
https://github.com/mit-pdos/noria

Noria Research Paper
https://jon.tsp.io/papers/osdi18-noria.pdf

Jon Gjengset’s talk on Noria
https://www.youtube.com/watch?v=s19G6n0UjsM

Jon Gjengset’s Channel
https://www.youtube.com/channel/UC_iD0xppBwwsrM9DegC5cQQ

Podcast with Jon Gjengset
https://corecursive.com/030-rethinking-databases-with-jon-gjengset/

Query Graph
https://dl.acm.org/doi/10.1145/582353.582400

evmap Source Code
https://github.com/jonhoo/rust-evmap

Music:
Daily by lukrembo
Michikusa by PeriTune (Licensed under https://creativecommons.org/licenses/by/4.0/)

#sql #database #noria

The Fastest SQL Database Ever