This paper describes WheelFS, a wide-area distributed file system.
It exposes a POSIX interface, and developers can control various trade-offs.
As applications that run across multiple datacenters emerge, easy-to-use underlying infrastructure becomes crucial, and the file system is one of its most important pieces. WheelFS looks well suited to this role because it offers both a standard interface and flexibility. More systems built around the multi-datacenter model will follow: storage, databases, data processing, and so on.
Since there are many trade-offs to choose from, and the right choice is often application-specific, letting developers specify cues is a natural direction. However, when there are too many options, developers often end up just using the defaults, and sometimes the cues they do provide turn out to be inappropriate. It would be great if the system could make smart decisions on its own, for example based on profiling.
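To make the idea of cues concrete: the WheelFS paper expresses cues as extra pathname components, such as `.EventualConsistency`. The Python sketch below assumes that syntax and a small, hypothetical subset of cue names with made-up defaults; it is not WheelFS code, only an illustration of how a client could pull cues out of a path and fall back to defaults when none are given.

```python
# Minimal sketch: separating WheelFS-style cues from a path.
# KNOWN_CUES and DEFAULTS are illustrative assumptions, not WheelFS's real cue table.
KNOWN_CUES = {"EventualConsistency", "MaxTime", "Site"}
DEFAULTS = {"EventualConsistency": False}  # hypothetical: strict unless asked otherwise


def split_cues(path):
    """Return (plain_path, cues), where cues maps cue name -> value."""
    cues = dict(DEFAULTS)
    parts = []
    for comp in path.strip("/").split("/"):
        name, has_value, value = comp[1:].partition("=")
        if comp.startswith(".") and name in KNOWN_CUES:
            # ".Name" is a flag cue; ".Name=value" carries a parameter.
            cues[name] = value if has_value else True
        else:
            # Ordinary component, including hidden files such as ".bashrc".
            parts.append(comp)
    return "/" + "/".join(parts), cues


plain, cues = split_cues("/wfs/.EventualConsistency/.MaxTime=200/cache/page.html")
print(plain)  # /wfs/cache/page.html
print(cues)   # {'EventualConsistency': True, 'MaxTime': '200'}
print(split_cues("/wfs/data/log.txt"))  # ('/wfs/data/log.txt', {'EventualConsistency': False})
```

A path with no cues silently gets the defaults, which is exactly why the defaults end up deciding the behavior when developers do not bother to write cues.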
Monday, April 20, 2009
Scaling Out
Running multiple datacenters is becoming more important, both for better service quality and for fault tolerance. This article from Facebook illustrates the needs and problems of expanding a web service across multiple datacenters.
To solve the cache consistency problem, they hacked MySQL so that each datacenter knows when and what data has been replicated; the corresponding cache item is updated (actually, deleted) once the replicated data arrives.
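A toy Python model of that idea is below. Dicts stand in for each datacenter's MySQL copy and memcached tier, and `ReplicationEvent`, which carries the cache keys to delete alongside the write, is a hypothetical simplification rather than the actual format of Facebook's MySQL hack. What it shows is the ordering: the replica deletes its cache entry only after the replicated write has been applied locally.

```python
from dataclasses import dataclass


@dataclass
class ReplicationEvent:
    """One replicated write plus the cache keys it makes stale (hypothetical format)."""
    key: str
    value: str
    stale_cache_keys: tuple


class Datacenter:
    def __init__(self, name):
        self.name = name
        self.db = {}     # stands in for the local MySQL copy
        self.cache = {}  # stands in for the local memcached tier

    def read(self, key):
        if key not in self.cache:  # cache miss: fill from the local database
            self.cache[key] = self.db.get(key)
        return self.cache[key]

    def write_master(self, key, value):
        """Apply a write on the master and return the event to ship to replicas."""
        self.db[key] = value
        self.cache.pop(key, None)
        return ReplicationEvent(key, value, (key,))

    def apply_replicated(self, event):
        # Update the local database first, then delete the stale cache entries.
        self.db[event.key] = event.value
        for k in event.stale_cache_keys:
            self.cache.pop(k, None)


california = Datacenter("california")
virginia = Datacenter("virginia")
for dc in (california, virginia):
    dc.db["user:1:name"] = "old"

print(virginia.read("user:1:name"))  # old (now cached in Virginia)
event = california.write_master("user:1:name", "new")
print(virginia.read("user:1:name"))  # old: the write has not been replicated yet
virginia.apply_replicated(event)
print(virginia.read("user:1:name"))  # new
```

Deleting the Virginia cache entry as soon as California is written would not be enough: a read in Virginia could refill the cache from the not-yet-updated local database and leave a stale entry there indefinitely. Tying the deletion to the arrival of the replicated data closes that gap.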
Another problem was routing. Since writes were allowed only on the California databases, a user who had just written data had to keep being served from California until the write was replicated to the other datacenters; otherwise the user could be confused by not seeing their own update. They picked a fixed window of 20 seconds for this.
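The 20-second rule can be sketched as a small router. The datacenter names and the in-memory dict holding each user's last write time are hypothetical stand-ins for whatever per-user state the real system kept; the injectable clock exists only so the behavior can be checked without actually waiting.

```python
import time

PIN_SECONDS = 20.0  # the fixed window Facebook chose


class Router:
    """Sends a user to the master until PIN_SECONDS after that user's last write."""

    def __init__(self, master="california", replica="virginia", clock=time.monotonic):
        self.master = master
        self.replica = replica
        self.clock = clock
        self.last_write = {}  # user_id -> time of last write (hypothetical in-memory store)

    def route(self, user_id, is_write):
        now = self.clock()
        if is_write:
            # All writes go to the master, and each write (re)starts the pin.
            self.last_write[user_id] = now
            return self.master
        wrote_at = self.last_write.get(user_id)
        if wrote_at is not None and now - wrote_at < PIN_SECONDS:
            return self.master  # the replica may not have this user's write yet
        return self.replica


fake_now = [0.0]
router = Router(clock=lambda: fake_now[0])
print(router.route("alice", is_write=True))   # california
fake_now[0] = 5.0
print(router.route("alice", is_write=False))  # california: still pinned
print(router.route("bob", is_write=False))    # virginia
fake_now[0] = 25.0
print(router.route("alice", is_write=False))  # virginia: pin expired
```

The window is a fixed guess rather than a signal from replication: if replication ever lags more than 20 seconds, the user is routed back to a replica that still lacks their write.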
Facebook's solutions look practical enough, but they are far from a general or formal solution. What happens when a third datacenter comes online, for example? Their solution only works for their particular setup and applications.
Wednesday, April 1, 2009
Erlang
Erlang is a language developed at Ericsson.
Like Java, it runs on a virtual machine, which lets it spawn very large numbers of lightweight processes.
The language was designed with massive parallelism and failure in mind from the beginning.
As a result, it is a good fit for large-scale distributed systems.
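Here is a rough Python imitation of that model (not Erlang itself). Each `Process` owns its state, is reachable only through a mailbox, and a crash ends that one process without touching the others. The class, the `counter` handler, and the message shapes are made up for illustration, and Python threads are far heavier than Erlang processes, so this shows the programming model rather than the VM's scalability.

```python
import queue
import threading


class Process:
    """Toy stand-in for an Erlang process: private state, a mailbox, one thread."""

    def __init__(self, handler, state):
        self._mailbox = queue.Queue()
        self._handler = handler  # handler(state, msg) -> new state
        self._state = state      # only this process's own thread touches it
        self.exit_reason = None
        self._thread = threading.Thread(target=self._loop, daemon=True)
        self._thread.start()

    def send(self, msg):
        self._mailbox.put(msg)  # the only way other code can interact with it

    def join(self, timeout=None):
        self._thread.join(timeout)

    def _loop(self):
        try:
            while True:
                msg = self._mailbox.get()
                if msg == "stop":
                    self.exit_reason = "normal"
                    return
                self._state = self._handler(self._state, msg)
        except Exception as exc:  # a crash ends only this process
            self.exit_reason = exc


def counter(state, msg):
    kind, arg = msg
    if kind == "add":
        return state + arg
    if kind == "get":
        arg.put(state)  # reply by sending a message back
        return state
    raise ValueError(f"unexpected message: {kind}")


def call(proc, timeout=1.0):
    """Send a request and wait for the reply, like a synchronous call."""
    reply = queue.Queue()
    proc.send(("get", reply))
    return reply.get(timeout=timeout)


a = Process(counter, 0)
b = Process(counter, 0)
a.send(("add", 2))
b.send(("add", 40))
a.send(("bogus", None))  # crashes process a
a.join()
print(type(a.exit_reason).__name__)  # ValueError
print(call(b))  # 40: b is unaffected
```

Real Erlang goes one step further with supervisors that restart crashed processes; that part is left out to keep the sketch short.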
It has proven useful: several Ericsson products and other open-source projects are written in Erlang and work well.
However, people complain about its syntax: as a functional language, Erlang is harder to pick up than traditional imperative languages. Being functional seems necessary to avoid side effects, which are harmful in concurrent programs, but it still keeps ordinary programmers from using the language.