Join you some Riak for great good!

I have been playing around with Presto since Facebook open-sourced it. Riak is a highly available, operationally friendly database that tolerates network partitions. It is categorized as a "NoSQL" database because it has neither an SQL interface nor transaction processing with ACID semantics, which is a consequence of focusing on the AP side of CAP (although there is still a big conceptual gap between the C of ACID and the C of CAP).

But intrinsically, SQL does not have to come bundled with transactions. Riak can have SQL. One choice would have been to put an SQL query language inside Riak, but query processing IS as difficult a problem as transaction processing. Riak has riak_pipe inside, which is a very cool distributed processing system, but it cannot do smart optimization because Riak does not look inside its data and just treats it as a blob. Thus there is not much room for sufficient optimization.

That was the situation until last year, when Presto was released as open source. It has a good SPI (service provider interface) which lets third-party plugins act as data backends. This is what makes Presto great: it separates the problems of transaction processing and query processing, which had historically been tightly coupled.
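
To make that separation concrete, here is a minimal Python sketch of the idea. It is not Presto's actual SPI: the names `Connector`, `DictConnector` and `select` are hypothetical and exist only for this illustration. The storage side exposes nothing but tables and row scans, and a generic engine does the query processing on top.

```python
from typing import Callable, Dict, Iterable, Iterator, List, Protocol

Row = Dict[str, object]


class Connector(Protocol):
    """Hypothetical storage-side interface: no query logic lives here."""

    def tables(self) -> List[str]: ...

    def scan(self, table: str) -> Iterable[Row]: ...


class DictConnector:
    """Toy backend that keeps rows in memory, standing in for a real store."""

    def __init__(self, data: Dict[str, List[Row]]) -> None:
        self._data = data

    def tables(self) -> List[str]:
        return sorted(self._data)

    def scan(self, table: str) -> Iterable[Row]:
        # Hand out copies so the engine cannot mutate stored rows.
        return (dict(row) for row in self._data[table])


def select(conn: Connector, table: str,
           where: Callable[[Row], bool]) -> Iterator[Row]:
    """Engine-side filtering: works with any backend that can scan."""
    return (row for row in conn.scan(table) if where(row))


backend = DictConnector({
    "users": [{"id": 1, "name": "Fett"}, {"id": 2, "name": "Solo"}],
})
solos = list(select(backend, "users", lambda r: r["name"] == "Solo"))
print(solos)
```

Swapping `DictConnector` for another backend would not touch `select` at all, which is the point of the split.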

So, presto-riak lets you query data stored in Riak with SQL, via Presto, in a distributed and scalable manner. As Presto is going to be compatible with standard ANSI SQL, even joins can be processed, which was impossible before. There are a lot of hacks inside presto-riak, so I'll reveal them incrementally as it gets stable.

See how well it works:
presto:default> show tables;
(2 rows)

Query 20140517_135143_00003_n8wgm, FINISHED, 1 node
Splits: 2 total, 2 done (100.00%)
0:00 [2 rows, 43B] [6 rows/s, 150B/s]

presto:default> select * from logs cross join users where logs.accessor = users.id;
      timestamp      | method | status | accessor | id | name |   army    
 2014-04-15-00:04:00 | GET    |    301 |        1 |  1 | Fett | Freelance 
 2014-04-15-00:04:00 | GET    |    200 |        5 |  5 | Solo | Freelance 
 2014-04-15-00:04:00 | GET    |    200 |        2 |  2 | Solo | Freelance 
 2014-04-12-00:03:00 | GET    |    200 |        0 |  0 | Solo | Freelance 
 2014-04-12-00:03:00 | GET    |    204 |        5 |  5 | Solo | Freelance 
 2014-04-12-00:03:00 | GET    |    503 |        4 |  4 | Fett | Freelance 
 2014-04-12-00:03:00 | GET    |    404 |        2 |  2 | Solo | Freelance 
(7 rows)

Query 20140517_135148_00004_n8wgm, FINISHED, 1 node
Splits: 8 total, 8 done (100.00%)
0:01 [6 rows, 258B] [8 rows/s, 370B/s]
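
The query above is written as a cross join plus a WHERE filter, which yields the same rows as an inner join on logs.accessor = users.id. Here is a minimal Python sketch of that equivalence. The rows are made-up samples, not the data behind the output above, and the hash-join variant assumes user ids are unique.

```python
from itertools import product

# Made-up sample rows; not the data stored in Riak in the session above.
logs = [
    {"status": 301, "accessor": 1},
    {"status": 200, "accessor": 2},
    {"status": 404, "accessor": 9},  # no matching user
]
users = [
    {"id": 1, "name": "Fett"},
    {"id": 2, "name": "Solo"},
]

# Cross join, then filter: every (log, user) pair is built before filtering.
cross_then_filter = [
    {**log, **user}
    for log, user in product(logs, users)
    if log["accessor"] == user["id"]
]

# Hash join: index users by id once, then probe with each log row.
users_by_id = {user["id"]: user for user in users}
hash_join = [
    {**log, **users_by_id[log["accessor"]]}
    for log in logs
    if log["accessor"] in users_by_id
]

assert cross_then_filter == hash_join
print(hash_join)
```

Both forms give the same result; the second avoids building every pair, which is the kind of choice a query planner can make once the join condition is visible to it.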

presto-riak is now open source under the Apache 2.0 license, the same as Riak and Presto. It is still very young: it just runs and works at a very small scale. A lot of work remains before it is reliable enough for production, but I will certainly spend time on this and gradually clear those items. I am waiting for your contributions and feedback, so come open an issue, or send me mail via my GitHub profile.


Imagine your life

We are living in a world of acceptance, diversity and mutual respect, as long as our lives are not in danger or under threat. This is common sense, right?

Imagine a person who doesn't wear clothes in bed.

Imagine a person who walks barefoot.

Imagine a woman who loves women.

Imagine a man who loves men.

Imagine a car with five tires.

Imagine a person who lives in a tent.

Imagine a person who sleeps in their car.

Imagine a person who drinks spaghetti meatball.


Now working for Basho

It has been more than two years since I posted an article here. It has been a long way to get here: now I am working for Basho with Erlang/OTP! I'm neither a language guru nor a script kiddie, but I really love working in a functional language. Basho Japan was established in September 2012 and I am its first employee in Japan.

A lot has changed since then. I participated in RICON 2012, which was very exciting and made me devote myself to distributed systems. Riak is still emerging in Japan but is already the No.1 commercial NoSQL database with solid technical support in Japan; other databases have market share, but with painful stories and no commercial support. I'll keep pushing myself, not only in dev and support, but also ... anyway, stay tuned!


Great time in Kanda, Tokyo: Erlounge

Yesterday (9/23) was a great day: the Erlang Workshop was held as a satellite event of the ICFP/ACM SIGPLAN international conference. Although I did not participate in the workshop, I joined the party because Francesco Cesarini and Ryosuke Nakai told me to come. Seeing the living legends from Europe (who are really just community members in Western countries) was very exciting.
I was introduced by Kenji Rikitake (AKA @kenji_rikitake) as the author of the MessagePack Erlang port. That made me think I should put out more and more for the open source community and the Erlang/OTP community. Until that day I had been thinking of giving up reading, writing and saying anything about Erlang/OTP because of babysitting (many thanks to my wife for helping me work for the community and study things that do not make money to live on) and some personal disgust about my work... But their energy, the amount of beer they drank, the hour at which the party finished, their cheerful way of speaking English and their positive attitude made me feel positive about keeping in touch with Erlang.
Don't ask permission. Ask for forgiveness.
All thanks to the Erlangers who were in Kanda, Tokyo on 2011/9/23.


Use MessagePack/Erlang and write message queue in an hour

I wrote a piece of toy software within an hour (plus additional debugging time): a message queue server accessible from clients written in many kinds of languages, such as C, C++, Ruby, Java and Python. Erezrdfh (pronounced "e-re' zerd f") is a simple, in-memory message queue with the nine-nines availability of Erlang/OTP. It doesn't need a particular client library; users can write a client in a minute with MessagePack-RPC. A Ruby one-liner is as follows:
c = MessagePack::RPC::Client.new(host,port); c.call(:push, "name", "message"); c.call(:pop, "name");

and C++ code is like this:
msgpack::rpc::client c(host,port);
c.call("push", "name", "message").get<bool>();
c.call("pop", "name").get<std::string>();

and Java code is like:
Client c = new Client(host,port);
c.callApply("push", new Object[]{"name","message"});
c.callApply("pop", new Object[]{"name"});

MessagePack is a software suite consisting of a serializer, RPC and an IDL compiler. It is a great library thanks to its performance, simplicity and language diversity. Erlang is also great software that promises scalability, simplicity and solidness. Don't miss these great technologies!

Its performance is also so great that I can't believe it is less than 250 LOC. With the load-generation tool and the erezrdfh server running together on my quad-core Phenom machine, push/pop performance was 20000 qps. Thanks to Erlang/OTP's scalability, erezrdfh should scale further if you install it on a dedicated machine with more cores. The source code includes a basho_bench driver, so just try it!
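
The push/pop calls used by the clients above map onto a very small data structure: a set of named FIFO queues held in memory. The following Python sketch shows only that semantics. It is not the erezrdfh source and has no RPC layer; the class name `NamedQueues` and the choice to return `None` for an empty queue are assumptions made for illustration.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, Optional


class NamedQueues:
    """In-memory FIFO queues addressed by name (illustrative only)."""

    def __init__(self) -> None:
        self._queues: Dict[str, Deque[str]] = defaultdict(deque)

    def push(self, name: str, message: str) -> bool:
        # Append to the tail of the queue, creating it on first use.
        self._queues[name].append(message)
        return True

    def pop(self, name: str) -> Optional[str]:
        # Take from the head; an empty or unknown queue yields None.
        queue = self._queues.get(name)
        if not queue:
            return None
        return queue.popleft()


q = NamedQueues()
q.push("name", "first")
q.push("name", "second")
print(q.pop("name"))  # first
print(q.pop("name"))  # second
print(q.pop("name"))  # None
```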


Had a talk on 4th Tokyo Erlang Workshop

(This post is a translation of my Japanese post)
I participated in the 4th Tokyo Erlang Workshop at Aoyama Oracle Center Tokyo, Japan. I gave a talk about Yatce in the sessions and went to the party. The organizer was cooldaemon, who contributed greatly to preparing the workshop and the party. Everyone there, including me, very much appreciated his contribution. I also thank Oracle Japan, Inc. for providing its great conference room and for its understanding of open-source communities.

The workshop:
Takeru Inoue's "(maybe) useful Algorithms for distributed storage" was the first and the most difficult talk. He talked about 'Sinfonia', which took the best paper award at SOSP'07, beating Amazon's famous Dynamo paper. This algorithm is very difficult but revolutionary because it shows a method that is much faster than 2-phase commit and much more consistent than vector clocks. He also introduced BDD and ZDD. ZDD is the only algorithm found by a Japanese researcher that appears in Knuth's "The Art of Computer Programming".

Higepon's talk about his implementation of a SkipGraph key-value store was also excellent. I've been very interested in his simple design for concurrent joins in the SkipGraph ring. It tolerates three kinds of broken state in the skip list while reading, and gains read throughput in return. Moreover, he had created a sample bulletin board application, whose CSS design was cool. Stay tuned for mio!
My only question was about mio's design for fault tolerance and replication (and I forgot to ask it). And what he said was "Good programmers never forget automation of unit-testing," ...

My talk on Yatce and general Erlang-C bindings was like this (partially Japanese, partially English):

I have a few additional topics: a linked-in driver may be the best choice because ERTS's prim_file and prim_inet (the backends of file I/O and network I/O) are implemented as linked-in drivers. Usually a linked-in driver is suitable for I/O-intensive tasks and a NIF seems suitable for CPU-intensive tasks. That will become the style. And my talk was Ust'ed.

@sleepy_yoshi's "Badly-educated guy seems in tutorial of Erlang" brought great laughter to a hard-boiled workshop (which consisted of hard algorithms, hard software design and hard implementation). Of course he is far from badly educated.

The party was a great time because there were many great hackers from around Erlang and other cool technologies such as the Linux kernel, Linux distributions, OCaml and Python.


Re: Toke

Matthew replied to me with some objections. I'm posting my answer on my blog both because I failed to post a comment on the original blog post and because it's a bit long, late and not so cool.
I do take objection to your comment in your blog post: “It’s a good example case that not expressing nor publishing in English leads products/opinions shall be ignored, and bad practice”. The simple fact is that I was not aware of yatce, and indeed, googling for “erlang tokyo cabinet” does not return any result for yatce until the 3rd page. If I had known about yatce in advance, then I would have tried to use it in preference to writing Toke: we have absolutely no desire to reinvent the wheel unnecessarily.
Yes, I meant 'ignore' as not being aware, just because I'm not so good at English. I'm sorry.

I don’t however understand your question about 16 tables. I’ve not come across limits on Erlang drivers and ports but maybe there’s something I’ve missed?
No, I had a memory of some limitation, from seeing that the crypto module uses 16 drivers for more calculation speed. But I searched around the Erlang documents again and found no particular statement about a limit on the number of linked-in drivers.

If anything, it’s slightly slower than the current code (as well as being slightly more messy). It would seem the context switch is not hurting me at all. Even for tests which I would have thought would most expose control as being faster (eg lots of gets), it’s in fact no faster at all.
Yes, I was wrong. I thought too simply that context switching makes ERTS slower, but context switching breaks some optimization (I don't know the details of what it is) of the Erlang runtime schedulers. They're using spinlocks...

After all, you're totally right, Matthew.


Toke: an alternative to Yatce

Toke, a TokyoCabinet driver for Erlang, has just been announced by LShift Inc., as Voluntas informed me. LShift Inc. is famous for its relationship with Rabbit Tech. and their excellent product RabbitMQ.

The most famous and earliest TC-Erlang driver is tcerl. Their claim is that tcerl is buggy, hard to get working, slow (because it is a port driver), and even seems unmaintained. I agree. And they say they still want to use TC because bare TC is blazingly fast. That's all.

I've been developing yatce since April 2009, and it has been getting more stable since around August 2009. For readers of my Erlang diary since then, Toke seems to be nothing but a reinvention of the wheel... It's a good example case that not expressing nor publishing in English leads products/opinions shall be ignored, and bad practice. Even though @kenji_rikitake introduced me on erlang-questions. orz...

(P.S.: Matthew, the writer of Toke, answered me that he was not ignoring me; he just wasn't aware of yatce. Ignoring and not being aware are different. Sorry for my poor English, and thanks for replying!)

BTW, without looking at any specs I went straight to the source code of Toke and found it surprisingly simple. Pros are:
  • very fast because it uses a linked-in driver
  • the code and interface are so simple that it should be stable and have fewer bugs
  • reliable from a maintenance viewpoint because Erlang professionals are dedicated full-time to Toke (and other Erlang products)
And as the creator of much the same software, I know the problem well (Con):
  • There is only one port per table, so Toke can't realize TC's real potential in a multi-threaded environment (as they say, only 1/3 of the performance is achieved).
I know how we can go multi-threaded: doing it the way the crypto module does will be the answer. But NIFs will blow them away... As a rival with much the same software, I see many differences:
  • pure difference:
    • Toke requires binary as key/value while yatce can deal with raw term.
    • Yatce uses port_control for driver-Erlang communication and seems superior in latency (at the design level) because port_control doesn't cause any context switch, while Toke uses port_command, which seems good for throughput (as its insert_async interface indicates... no doubt for RabbitMQ!). Eventual consistency.
  • toke is better:
    • putcat, fold, insert_async and tuning options are supported.
    • good error codes (for indicating errors), beautiful interface.
    • Reliable maintainers, more than me:P
  • yatce is better:
    • size, sync, more TC-like interface.
    • It uses TCADB, so it supports on-disk B+-tree, on-memory hash and on-memory B+-tree (while Toke supports only hash)
    • Multiple-tables in one driver (is it good?!)
    • also a NIF'd version!!!
This post is a translation of my Japanese post.


l10n of "CouchDB: the Definitive Guide"

Now I've started translation work, inspired by Voluntas, one of my Japanese/pythonista/Erlang friends. As he says "CouchDB rocks! smells! Smell of Money!", I started reading the Definitive Guide. Andalso I'm interested in the internals of CouchDB. At first I started reading the source code, but it was too complicated, so I thought of reading the external specs before surveying the internals. That should be the 'good' way. But just reading documents is not enough for non-native people like me, because we skip sentences and concentrate less. To avoid skipping sentences and to force myself to concentrate on the English, translating is a much better way, as another Japanese pythonista, shibukawa, says (and he is also an evangelist of sphinx).
Moreover, I'm interested in its data scalability, so now I'm translating Part IV, "Deploying CouchDB". It's difficult to translate the nuance of Erlangers' and Pythonistas' (sometimes ironic) rhetoric. I'll show it as the translation grows.

The license? The book draft is published under CC Attribution 3.0 Unported. I believe in it.


I'm back!

After a long absence from this blog, I'm back to explain my open-source programs. See bitbucket and github for my code. Stay tuned!