Wednesday, February 25, 2009

HIVE

Hive is a data warehousing system developed by Facebook.
Built on top of Hadoop, Hive enables developers (or analysts) to write quick queries to examine large datasets.
Even though Hive's features are relatively simple compared with heavier languages (e.g. Pig-Latin, DryadLINQ, etc.), it seems to work in many cases. If a user needs more sophisticated analysis, he/she can write his/her own MapReduce code; for a simple task, it is better to run a Hive query instead - Hive is a good fit for that scenario. This is a 'good enough' system for many cases, and a stepping stone to higher-profile languages, which will take time to mature.
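
For instance, a simple aggregation that would take a page of MapReduce boilerplate becomes a one-line SQL-style query in Hive. A rough sketch of the contrast - the table and column names here are made up for illustration:

```python
# Illustration of why Hive is "good enough" for simple analyses.
# The table and column names below are made up for this example.

# In Hive's SQL-like language, a per-status page-view count is one query:
hive_query = """
SELECT status, COUNT(1)
FROM page_views
GROUP BY status
"""

# Hand-written MapReduce for the same task boils down to this logic,
# plus all the job setup, serialization, and cluster plumbing:
from collections import Counter

def map_phase(records):
    for record in records:
        yield record["status"], 1

def reduce_phase(pairs):
    counts = Counter()
    for status, one in pairs:
        counts[status] += one
    return dict(counts)

records = [{"status": 200}, {"status": 200}, {"status": 404}]
print(reduce_phase(map_phase(records)))  # {200: 2, 404: 1}
```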

Wednesday, February 18, 2009

Dynamo

This paper describes Dynamo, Amazon's highly available key-value store.
As the title says, Dynamo is a simple key-value store that runs on a large number of servers, treats failure as a normal event, and makes trade-offs in favor of availability.
As we've seen so far, many traditional data stores don't work well in super-large configurations. GFS and BigTable represent Google's response to the issue, and Dynamo is Amazon's.
The most interesting point is that they prioritize writes over reads - usually the opposite is the case.
This is due to how Amazon's main application - e-commerce - works.
Another interesting point is the emphasis on SLAs and how they are expressed - at the 99.9th percentile rather than as an average.
Dynamo is highly engineered toward meeting Amazon's needs.
Even though Dynamo lets users choose some trade-offs, there are already a lot of assumptions about workload characteristics baked in. (Of course, this is also the case with GFS and BigTable.)
I guess it is time for another general solution to emerge. (SCADS?)
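
The main knob the paper exposes is the (N, R, W) replication configuration. A minimal sketch of the quorum-overlap condition, using the common (3, 2, 2) setting mentioned in the paper:

```python
# Sketch of Dynamo-style quorum parameters (illustrative only).
# N = number of replicas, R = read quorum, W = write quorum.
# If R + W > N, every read quorum overlaps every write quorum, so a
# read sees at least one up-to-date replica (ignoring failures).

def is_overlapping(n: int, r: int, w: int) -> bool:
    return r + w > n

# The paper reports (N, R, W) = (3, 2, 2) as a common configuration.
print(is_overlapping(3, 2, 2))  # True: reads and writes overlap
print(is_overlapping(3, 1, 1))  # False: favors latency/availability over consistency
```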

Wednesday, February 11, 2009

Chubby

Chubby offers a distributed lock service to be used by other services.

The most interesting part of this work is that Chubby grew out of practical reasons - programmers are familiar with locking, and often add scale-out features retroactively, so a distributed lock service is more suitable for them than something like a Paxos library.
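
A typical coarse-grained use described in the paper is electing a primary: whoever acquires the lock becomes primary and advertises itself in a small file. A minimal sketch of that pattern, using a made-up in-memory lock service rather than Chubby's real API:

```python
# Minimal sketch of lock-based leader election. The client API here is
# hypothetical; it only illustrates the pattern, not Chubby itself.

class FakeLockService:
    """In-memory stand-in for a Chubby-like service (illustrative)."""
    def __init__(self):
        self.locks = {}   # lock path -> owner
        self.files = {}   # file path -> contents

    def try_acquire(self, path, owner):
        if path not in self.locks:
            self.locks[path] = owner
            return True
        return False

    def write(self, path, data):
        self.files[path] = data

    def read(self, path):
        return self.files.get(path)

def elect(ls, my_address):
    # Whoever wins the lock becomes primary and advertises itself;
    # the others read the file to discover the primary.
    if ls.try_acquire("/service/leader-lock", my_address):
        ls.write("/service/primary", my_address)
        return ("primary", my_address)
    return ("replica", ls.read("/service/primary"))

ls = FakeLockService()
print(elect(ls, "10.0.0.1:4170"))  # ('primary', '10.0.0.1:4170')
print(elect(ls, "10.0.0.2:4170"))  # ('replica', '10.0.0.1:4170')
```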

This implies an important lesson. A large-scale distributed system evolves over a long time period, and lots of programmers (with different skill levels) write code for it. This means that being 'easy to use' is often more essential than super-efficiency or an elegant implementation.

Also, this kind of 'infrastructure service' that supports other services will become more prevalent, as it is something like 'middleware for the cloud'!

BigTable

BigTable is a simplified structured data store for large distributed systems.
As opposed to the Google File System, which is mainly for write-once read-many data, BigTable supports more frequent updates. It is also engineered toward Google's specific needs; for example, uninterpreted strings are the only data type supported.
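
The data model the paper describes is a sparse, sorted map from (row key, column key, timestamp) to an uninterpreted string. A rough sketch of just that model (not of BigTable's implementation or client API):

```python
# Rough sketch of BigTable's data model as described in the paper:
# (row key, column key, timestamp) -> uninterpreted string.
# This in-memory dictionary only illustrates the model.

table = {}

def put(row, column, timestamp, value: str):
    table[(row, column, timestamp)] = value

def get_latest(row, column):
    versions = [(ts, v) for (r, c, ts), v in table.items()
                if r == row and c == column]
    return max(versions)[1] if versions else None

# The paper's Webtable example: rows are reversed URLs, columns are
# grouped into families such as "contents:" and "anchor:".
put("com.cnn.www", "contents:", 1, "<html>...v1...</html>")
put("com.cnn.www", "contents:", 2, "<html>...v2...</html>")
print(get_latest("com.cnn.www", "contents:"))  # newest version wins
```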

This is Google's answer to their persistent data storage needs, which no existing database supports well. As systems scale, there are many applications (or services) that traditional database systems don't consider, or the database system seems too general for the specific purpose. The trend is moving back toward building one's own storage system, at least for very large services.

Google File System

The Google File System solves Google's need to 1) store large volumes of data 2) on commodity hardware.
Before the Google File System (or even now), people usually purchased expensive storage systems (NAS, SAN, etc.) to store large volumes of data safely. But this is costly and inefficient in terms of resource utilization, as there is a lot of idle storage space on commodity servers. Google came up with a way to utilize it.

Their design is based on practical reasons. Having a single master is a good example - it is definitely a single point of failure, but they chose ease of management and simplicity over a robust but complex solution. At least, it is useful enough for their usage pattern.
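
One reason the single master stays out of the way is that it serves only metadata; data flows directly between clients and chunkservers. A rough sketch of that read path, with invented names and structures:

```python
# Rough sketch of the GFS read path described in the paper:
# the client asks the single master only for metadata (which
# chunkservers hold a chunk), then reads data directly from a
# chunkserver. Names and structures here are illustrative.

CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB chunks, as in the paper

# Master metadata: (file name, chunk index) -> list of chunkserver replicas
chunk_locations = {
    ("/logs/crawl-2009-02", 0): ["cs17", "cs42", "cs88"],
}

def read(filename, offset, master=chunk_locations):
    chunk_index = offset // CHUNK_SIZE
    replicas = master[(filename, chunk_index)]   # metadata from the master
    chunkserver = replicas[0]                    # pick any replica
    # The actual data transfer goes to the chunkserver, not the master.
    return f"read chunk {chunk_index} of {filename} from {chunkserver}"

print(read("/logs/crawl-2009-02", 10 * 1024 * 1024))
```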

Also, they have made several trade-off choices to better support their first-class customer - write-once, read-many data. This implies that many other storage systems should appear to support other needs; BigTable is a good example. This leaves a question - is the era of the general solution (the database) gone? Is it better to devise a different system for each need?

Monday, February 9, 2009

eBay's Scaling Odyssey & All I Need is Scale!

These two slide decks present eBay's principles & practices for distributed computing and cloud computing.

The principles described there are classical, but it is important to put them together and keep them in mind whenever we design systems to scale out the way eBay does. These slides are valuable because they come from actual experience! However, it would have been more interesting if they had included a more detailed description of how the various trade-offs were made, as that is easy to say but hard to generalize - everything is case by case.

For them, it is natural to see opportunity in cloud computing, for economic and operational reasons. As their architecture is already built to scale out, it should be relatively easy to adopt cloud computing. To do so, they would like to figure out the differences between an internal DC, an internal cloud, and 3rd-party clouds, but it is not yet clear what the dominant 3rd-party clouds will look like.

Wednesday, February 4, 2009

A Policy-aware Switching Layer for Data Centers

This paper tries to make middleboxes first-class citizens in the data center environment.
As network configurations in data centers get more complex, making sure all packets go through the right middleboxes gets harder. This paper introduces a policy-aware switch that applies middlebox policies. This adds some overhead at the switch level, but the operational benefit should be large enough to justify it. As the operational cost of data centers becomes increasingly significant, this kind of effort to make things simple & effective will get more attention.
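
Conceptually, the switch classifies each frame against a policy and forwards it along the required middlebox sequence. A toy sketch of that idea - the policy format here is invented for illustration, not the paper's actual specification:

```python
# Toy sketch of policy-aware forwarding: classify a frame, then send it
# through the middlebox sequence its policy requires. The policy format
# below is invented for illustration; see the paper for the real design.

policies = [
    # (match on destination port) -> ordered middlebox traversal
    {"dst_port": 80,  "sequence": ["firewall", "load_balancer"]},
    {"dst_port": 443, "sequence": ["firewall"]},
]

def next_hop(frame, already_traversed):
    for p in policies:
        if frame["dst_port"] == p["dst_port"]:
            remaining = [m for m in p["sequence"] if m not in already_traversed]
            return remaining[0] if remaining else "final_destination"
    return "final_destination"   # no policy matched

frame = {"dst_port": 80}
print(next_hop(frame, []))                              # firewall
print(next_hop(frame, ["firewall"]))                    # load_balancer
print(next_hop(frame, ["firewall", "load_balancer"]))   # final_destination
```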

DCell: A Scalable and Fault-Tolerant Network Structure for Data Centers

This paper tries to solve a problem similar to that of the previous paper. The approach looks similar to the hypercube approach in the supercomputing world. DCell introduces much more wiring complexity than the fat-tree approach. Even adding a single computer is not simple - we can't just add a server or a rack and plug it in; we always have to consider the topology. This also makes it hard to reason about locality.
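
The paper's recursive construction shows both the appeal and the wiring pain: a DCell_k is built from t_{k-1} + 1 copies of DCell_{k-1}, so the server count grows doubly exponentially while each server needs k + 1 links. A quick sketch of that recurrence:

```python
# Sketch of the DCell server-count recurrence from the paper:
# a DCell_k is built from (t_{k-1} + 1) copies of DCell_{k-1},
# so t_k = t_{k-1} * (t_{k-1} + 1), with t_0 = n servers per mini-switch.

def dcell_servers(n: int, k: int) -> int:
    t = n
    for _ in range(k):
        t = t * (t + 1)
    return t

# With n = 6 servers per DCell_0 and k = 3 levels, DCell already reaches
# millions of servers - but every server needs k + 1 links, which is
# where the wiring complexity comes from.
print(dcell_servers(6, 3))  # 3263442
```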

A Scalable, Commodity Data Center Network Architecture

This paper describes a fat-tree topology for using commodity switches in data center networking.
As high-end switches are very expensive, many people have tried to use commodity hardware to reduce cost. This design also tries to increase bisection bandwidth, which is becoming more important due to MapReduce-type applications. But it does not come for free. In particular, it makes load balancing and wiring more complex, which in turn may make the solution impractical.
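
The sizing argument from the paper is easy to reproduce: a k-ary fat-tree built from k-port commodity switches supports k^3/4 hosts at full bisection bandwidth. A quick sketch of those counts:

```python
# Sketch of k-ary fat-tree sizing from the paper: k pods, each with
# (k/2) edge and (k/2) aggregation switches, plus (k/2)^2 core switches,
# supporting k^3/4 hosts at full bisection bandwidth.

def fat_tree_sizes(k: int):
    hosts = k ** 3 // 4
    core = (k // 2) ** 2
    edge = agg = k * (k // 2)
    return {"hosts": hosts, "core": core, "edge": edge, "agg": agg,
            "total_switches": core + edge + agg}

# The paper's example uses commodity 48-port switches.
print(fat_tree_sizes(48))
# {'hosts': 27648, 'core': 576, 'edge': 1152, 'agg': 1152, 'total_switches': 2880}
```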

Monday, February 2, 2009

Designing a highly available directory service

As data center failures are common, it is important to have software systems that cope with them. The basic strategy is replication, so that we can fail over.

An interesting point in this article is the replication & recovery scenarios for different configurations - 1, 2, 3, and 5 data centers.
More replication makes the system more reliable, and also more complicated - even in these somewhat idealized scenarios.
Real-life scenarios may be far more complex and need more manual fine-tuning to achieve adequate performance & reliability, as the topology & capability of data centers & links may be diverse.
It seems hard to generalize the solution.
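
One way to see why more replication also means more complication: if every replica keeps replication agreements with every other replica (a fully meshed multi-master setup, used here purely as an illustration and not necessarily the article's exact topology), the number of agreements grows quadratically:

```python
# Illustration (not taken from the article): in a fully meshed
# multi-master setup, each ordered pair of replicas needs a
# replication agreement, so the count grows quadratically.

def replication_agreements(replicas: int) -> int:
    return replicas * (replicas - 1)

for n in (1, 2, 3, 5):   # the data center counts discussed in the article
    print(n, replication_agreements(n))
# 1 -> 0, 2 -> 2, 3 -> 6, 5 -> 20
```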

Failure Trends in a Large Disk Drive Population

The virtue of this paper: REAL numbers.

As systems grow in size, only a handful of large companies are able to see what's going on at that scale, and this makes life harder for researchers without access to large resources. Cloud computing would help in conducting experiments at larger scale & making them more convincing.

This paper shows some interesting statistics on disk drive failure. It is not surprising that failure is strongly correlated with the drive model, but this suggests that the trends could change as technology evolves. It would be interesting to see what the numbers look like for SSDs, which have different characteristics from HDDs. In the case of SSDs, we may want to look at more factors, e.g. write patterns / locality, since SSDs wear out.
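
The paper's headline metric is the annualized failure rate (AFR). A quick sketch of how such a rate is computed from a drive population - the sample numbers below are made up, not figures from the paper:

```python
# Sketch of computing an annualized failure rate (AFR) for a drive
# population, the metric the paper reports by drive age and utilization.
# The sample numbers below are made up, not figures from the paper.

def annualized_failure_rate(failures: int, drive_days: float) -> float:
    """Failures per drive-year, expressed as a percentage."""
    drive_years = drive_days / 365.0
    return 100.0 * failures / drive_years

# e.g. 300 failures observed over 10,000 drives running for a full year
print(f"{annualized_failure_rate(300, 10_000 * 365):.1f}% AFR")  # 3.0% AFR
```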

Failure stories

These stories tell us that "physical" data center failures happen.
Data center experts should spend more time thinking about 'unthinkable' failure scenarios to make data centers more reliable.