Mesa leverages common Google infrastructure and services, such as Colossus (Google’s next-generation distributed file system), BigTable, and MapReduce.
Characteristics and Goals
- To achieve storage scalability and availability, data is horizontally partitioned and replicated.
- To achieve consistent and repeatable queries during updates, the underlying data is multi-versioned.
- To achieve update scalability, data updates are batched, assigned a new version number, and periodically (e.g., every few minutes) incorporated into Mesa.
- To achieve update consistency across multiple data centers, Mesa uses a distributed synchronization protocol based on Paxos.
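The multi-versioning and batching goals above can be illustrated with a small Python sketch. It is only a toy model under stated assumptions, not Mesa's actual storage layout: the `VersionedTable` class, its method names, and the `ad_1`/`ad_2` keys are all hypothetical, and summation stands in for whatever aggregation a real table uses.

```python
class VersionedTable:
    """Toy multi-versioned table (hypothetical; not Mesa's real design)."""

    def __init__(self):
        self.batches = {}        # version -> {key: delta}
        self.latest_version = 0

    def apply_batch(self, updates):
        """Incorporate one batch of updates under the next version number."""
        version = self.latest_version + 1
        self.batches[version] = dict(updates)
        self.latest_version = version
        return version

    def query(self, key, version=None):
        """Sum the deltas for `key` in versions 1..version (default: latest)."""
        if version is None:
            version = self.latest_version
        return sum(self.batches[v].get(key, 0) for v in range(1, version + 1))


table = VersionedTable()
table.apply_batch({"ad_1": 10, "ad_2": 3})      # becomes version 1
pinned = table.latest_version                   # a query pins version 1

before = table.query("ad_1", version=pinned)
table.apply_batch({"ad_1": 5})                  # version 2 lands meanwhile
after = table.query("ad_1", version=pinned)     # same answer as `before`
latest = table.query("ad_1")                    # sees version 2 as well
```

A query pinned to a version returns the same result no matter how many batches arrive afterwards, which is the sense in which the reads are consistent and repeatable.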
How it is different from existing Google tools
- Megastore, Spanner, and F1 are all intended for online transaction processing. They provide strong consistency across geo-replicated data, but they do not support the peak update throughput needed by clients of Mesa.
- Mesa does leverage BigTable and the Paxos technology underlying Spanner for metadata storage and maintenance.
What to learn
Schema changes for a large number of tables can be performed dynamically and efficiently without affecting the correctness or performance of existing applications.
How it works
- Values in tables are aggregated using associative and commutative functions.
- While a new version of the data is being computed, the old version continues to serve applications.
- Once all computation is complete, the version is incremented and users issue queries against the new version.
- Upstream systems generate updated data in batches
- The committer assigns each update batch a new version number and publishes all metadata associated with the update (e.g., the locations of the files containing the update data) to the versions database, a globally replicated and consistent data store built on top of the Paxos consensus algorithm.
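The steps above can be sketched in Python. This is a minimal illustration under stated assumptions, not Mesa's implementation: the `merge` function, the `Committer` class, the choice of SUM and MAX as aggregation functions, and the file paths are all hypothetical, and an in-memory dict stands in for the Paxos-backed versions database.

```python
from dataclasses import dataclass, field
from functools import reduce
from itertools import permutations


def merge(a, b):
    """Combine two {key: (clicks, max_cost)} maps using SUM and MAX.

    Both functions are associative and commutative, so the order in
    which batches are merged does not change the result.
    """
    out = dict(a)
    for key, (clicks, max_cost) in b.items():
        if key in out:
            old_clicks, old_max = out[key]
            out[key] = (old_clicks + clicks, max(old_max, max_cost))
        else:
            out[key] = (clicks, max_cost)
    return out


@dataclass
class Committer:
    """Hypothetical committer; a dict stands in for the versions database."""
    versions_db: dict = field(default_factory=dict)   # version -> metadata
    current_version: int = 0

    def commit(self, file_locations):
        """Assign the next version number and publish the batch metadata."""
        version = self.current_version + 1
        self.versions_db[version] = {"files": list(file_locations)}
        # Queries move to the new version only after the metadata is published.
        self.current_version = version
        return version


batches = [
    {"ad_1": (10, 4)},
    {"ad_1": (5, 9), "ad_2": (1, 2)},
    {"ad_2": (7, 1)},
]

# Merging the batches in every possible order gives the same aggregate.
results = [reduce(merge, order, {}) for order in permutations(batches)]
order_independent = all(r == results[0] for r in results)

committer = Committer()
v1 = committer.commit(["/hypothetical/updates/batch-0001"])
v2 = committer.commit(["/hypothetical/updates/batch-0002"])
```

Because the aggregation is order-independent, batches can be combined whenever convenient, while the single `current_version` counter decides which committed state queries observe.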