Thursday, 2 January 2014

GPS : A Parallel Graph Processing Framework

As soon as google announced Pregel , Many open source started to flow in . Some were gone with the wind like Phoebus,HipG,GoldenOrb where as some stands tall because of there robust framework. One such Pregel implementation is GPS. A parallel Graph Processing Framework. I wont call it a Pregel Clone because of the added features it has.
In GPS the vertices of graphs are distributed across machines and message are exchanged between vertices to perform the computation. Each computation is devided into iterations called superstep where at each vertex vertex.compute function is applied.Unlike Pregel GPS brings with it some new features that makes the execution really fast .
1) One such feature of GPS is master.compute() function. Many algorithms like pagerank,shortest path can be implemented with vertex.compute() function but some algorithms like K-Means are combination of global and vertex computations . For those functions GPS has something called master.compute() function.With the present pregel implementation we had to consider one vertex as master and had to put the logic of global computation inside that function resulting in a sharp increase in execution time .In GPS pregels  API is extended by including master.compute() function. Subclass the master class and implement the master.compute() function. Master class has access to all global functions and also update the global object map.
2)Dynamic Repartitioning :  To reduce the number of messages exchanged during algorithm computation GPS reassigns vertex to different workers dynamically
3)LALP : large adjacency list partitioning (LALP) :  Adjacency list of high degree vertices are distributed across different workers rather then storing at single worker resulting in again lesser message exchange.

No comments:

Post a Comment