2014/05/17

Join you some Riak for great good!

I have been playing around with Presto since Facebook open-sourced it. Riak is a highly available, operationally friendly database that tolerates network partitions. It is categorized as a "NoSQL" database because it has neither an SQL interface nor transaction processing with ACID semantics, which is a consequence of focusing on the AP side of CAP (although there is still a big conceptual gap between the C of ACID and the C of CAP).

Intrinsically, though, SQL does not have to come bundled with transactions. Riak can have SQL. One option has been to put a SQL query language inside Riak, but query processing IS as difficult a problem as transaction processing. Riak has riak_pipe inside, which is a very cool distributed processing system, but it cannot do smart optimization because Riak does not look inside the data it stores; it just treats values as blobs. That leaves little room for meaningful optimization.
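
To make the blob problem concrete, here is a minimal sketch in Python. It is not Riak code: the bucket is a plain dict, and the JSON encoding, key names, and helper functions are all assumptions made up for illustration. The point is only that a store which cannot see inside its values has to hand back every blob, so filtering happens after decoding everything.

```python
import json

# Hypothetical key/value bucket: the store only sees opaque bytes.
# JSON is an assumed encoding; the store itself knows nothing about it.
bucket = {
    "log-1": json.dumps({"status": 301, "accessor": 1}).encode(),
    "log-2": json.dumps({"status": 200, "accessor": 5}).encode(),
    "log-3": json.dumps({"status": 503, "accessor": 4}).encode(),
}


def scan_blobs(store):
    """All a blob store can offer without knowing the value format."""
    return list(store.values())


def query_status(store, status):
    """The query layer decodes every blob first, then filters."""
    decoded = [json.loads(blob) for blob in scan_blobs(store)]
    matching = [row for row in decoded if row["status"] == status]
    return matching, len(decoded)


rows, decoded_count = query_status(bucket, 200)
print(rows, decoded_count)
```

One row matches, yet all three values had to be fetched and decoded. A store that understood its own data could skip the other two.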

That was the situation until last year, when Presto came out as open source. It has a good SPI (service provider interface) that lets third-party plugins act as data backends. This is what makes Presto great: it pulls apart the problems of transaction processing and query processing, which had historically been tightly coupled.

So, presto-riak lets you run SQL queries over data stored in Riak, via Presto, in a distributed and scalable manner. As Presto is going to be compatible with standard ANSI SQL, even joins can be processed, which was impossible before. There are a lot of hacks inside presto-riak, so I'll reveal them incrementally as it gets stable.

See how well it works:
presto:default> show tables;
 Table 
-------
 logs  
 users 
(2 rows)

Query 20140517_135143_00003_n8wgm, FINISHED, 1 node
Splits: 2 total, 2 done (100.00%)
0:00 [2 rows, 43B] [6 rows/s, 150B/s]

presto:default> select * from logs cross join users where logs.accessor = users.id;
      timestamp      | method | status | accessor | id | name |   army    
---------------------+--------+--------+----------+----+------+-----------
 2014-04-15-00:04:00 | GET    |    301 |        1 |  1 | Fett | Freelance 
 2014-04-15-00:04:00 | GET    |    200 |        5 |  5 | Solo | Freelance 
 2014-04-15-00:04:00 | GET    |    200 |        2 |  2 | Solo | Freelance 
 2014-04-12-00:03:00 | GET    |    200 |        0 |  0 | Solo | Freelance 
 2014-04-12-00:03:00 | GET    |    204 |        5 |  5 | Solo | Freelance 
 2014-04-12-00:03:00 | GET    |    503 |        4 |  4 | Fett | Freelance 
 2014-04-12-00:03:00 | GET    |    404 |        2 |  2 | Solo | Freelance 
(7 rows)

Query 20140517_135148_00004_n8wgm, FINISHED, 1 node
Splits: 8 total, 8 done (100.00%)
0:01 [6 rows, 258B] [8 rows/s, 370B/s]
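
The query above is a cross join filtered by an equality predicate, which is semantically the same as an inner join on that predicate. As a rough sketch of that equivalence, the snippet below uses SQLite from Python's standard library instead of Presto (an assumption made purely so it runs anywhere). The table contents are a small hypothetical subset modeled on the output above, plus one made-up log row (accessor 9) that matches no user.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER, name TEXT, army TEXT);
    CREATE TABLE logs (timestamp TEXT, method TEXT, status INTEGER, accessor INTEGER);
    INSERT INTO users VALUES
        (1, 'Fett', 'Freelance'),
        (2, 'Solo', 'Freelance');
    INSERT INTO logs VALUES
        ('2014-04-15-00:04:00', 'GET', 301, 1),
        ('2014-04-15-00:04:00', 'GET', 200, 2),
        ('2014-04-12-00:03:00', 'GET', 404, 2),
        ('2014-04-12-00:03:00', 'GET', 500, 9);  -- hypothetical row, no matching user
""")

ORDER = " ORDER BY logs.timestamp, users.id"

# Same shape as the query in the session above.
cross_rows = conn.execute(
    "SELECT * FROM logs CROSS JOIN users WHERE logs.accessor = users.id" + ORDER
).fetchall()

# Equivalent explicit inner join.
inner_rows = conn.execute(
    "SELECT * FROM logs INNER JOIN users ON logs.accessor = users.id" + ORDER
).fetchall()

for row in cross_rows:
    print(row)
```

Both queries return the same three rows here; the log row with accessor 9 drops out because no user has that id.
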

presto-riak is now open source under the Apache 2.0 license, the same as Riak and Presto. It is still very young: it just about runs, and only works at a very small scale. There is a lot of work left before it is reliable enough for production, but I will make time for it and gradually work through the list. Contributions and feedback are welcome: come open an issue, or send me mail via my GitHub profile.