Monday, 14 October 2013

SPAR Review

Paper Link :
SPAR 


This paper presents a new system(SPAR) of doing Partition on social Networks. SPUR helps in minimizing the replication overhead by maintaining data locality  .When experiments were conducted on Twitter and Facebook data sets ,using various algorithm and compared with each other it was clearly seen that SPAR outperforms other algorithms. So seeing the huge demand of  social network, I think using SPUR will bring significant improvement in data replication .The paper is also well written and conveys the idea clearly.

8. Detailed comments
A. Summery
1) Scalability of real systems is a complex field. It becomes more difficult for Social networks as the data is not disjoint. The paper proposes a middleware for partition and replication for Social Networks called SPAR. SPAR works on joint partitioning and replication. Author explains replication with a graph containing 10 nodes on 2 servers and gives overview of replication using DHT,Full and SPAR. In case of SPAR the queries are resoled locally on the server as a result the throughput was high.  SPAR also gives user flexibility to select its datasource. Spar is a online algorithm so it is highly useful for dynamic social graphs as these graphs requires recomputation of partitions.The author describes the Min replication problem and gives a solution based on greedy optimization. The algorithm is triggered by any of add/removal of node,sever,edge.All the 6 cases are analyzed. Addition of user happens at the partition with minimum replicas. When a user is deleted master and its slaves are deleted.In the case of new relation a edge is created between 2 users . Algorithm checks if the two masters are co located if so no action is required. If not then it calculates minimum number of replicas to be created.The author explains the edge addition with various cases.It was illustrated that minimizing the nodes was not the only condition for minimizing replication .The cases of server addition and removal is also discussed in the paper.In case of server addition it was seen that SPAR was able to achieve stable state ,irrespective of how servers were added.  Extensive experiments were conducted on data from Facebook and Twitter and SPAR was compared against various other algorithms like MOTIS,MO+,Random. The experiments were conducted with 0 and 2 replicas and computed movement cost across 4 servers to 512 servers. It was seen that overhead was minimum in case of SPAR. COV for read and write operation in case of SPAR was 0.37 and 0.0019 ,indicating spars efficient efficient handling of read and write in terms of balncing them across servers.Using Twitter clone Statusnet performance of SPAR was studied on top of MYSQL and Casandra and it was seen that SPAR reduces network traffic by a factor of 8.

3)It is not very clear from the paper how the new edges are added. How they deal with edges if there is metadata attached to it.
4)Local load balancing was not addressed.
5)Formulation of Solution is very vague and could have been explained in a better way.

No comments:

Post a Comment