The idea of Consistent Hashing dates back to the 90's when David Karger and others presented a paper on Consistent Hashing and Random Trees: Distributed Caching Protocols for Relieving Hot Spots on the World Wide Web .
In Amazon Dynamo the database partition scheme relies on Consistent Hashing.
So what is Consistent Hashing ?
Suppose you have a simple web application , your traffic increased suddenly and your db started to slow down. So you decided to cache some result so that your webpages load quickly.If you have n machine the common way to load balance will be to put URL U in cache machine number Hash Function : U+5 Mod N. If N = 23 URL mapping will be U + 5 Mod 23 . .As we are using mod n so all objects will be distributed equally and then the new objects will be added.Problem occur when you decide to remove or add a cache machine .
Add a machine ,N will increase and so all your cache will become invalid.New URL = U + 5 Mod 24.
What Can be the best Way ?
I would say and also advisable will be if a machine is added it will take share of objects from all the rest machine to equally distribute the load and when cache is removed its better to share its object with remaining machine. Thats what Consistent Hashing does.. Consistently maps objects to the same cache machine, as far as is possible. Each item should be mapped to only a small number of machines, and in such a way that all machines get roughly the same load of items.
Choose constant Hash function that maps URL to range [0,1]Assume it to be a unit circle and similarly map Cache to Unit Circle.
URL is assigned to closest cache going clockwise . 1,2,3 Item mapped to A. and B mapped to 4,5 When a new cache is added the only URLs that are reassigned are those closest to the new cache going clockwise around the circle. You start where resource A is and head clockwise on the ring until you hit a server. If that server is down, you go to the next one, and so on and so forth. Suppose if Cache C is added ,Only item adjacent to Cache is added in this case 1,2 added to C .If suppose C is removed then items 2,3 move back to A again.
In Amazon Dynamo the database partition scheme relies on Consistent Hashing.
So what is Consistent Hashing ?
Suppose you have a simple web application , your traffic increased suddenly and your db started to slow down. So you decided to cache some result so that your webpages load quickly.If you have n machine the common way to load balance will be to put URL U in cache machine number Hash Function : U+5 Mod N. If N = 23 URL mapping will be U + 5 Mod 23 . .As we are using mod n so all objects will be distributed equally and then the new objects will be added.Problem occur when you decide to remove or add a cache machine .
Add a machine ,N will increase and so all your cache will become invalid.New URL = U + 5 Mod 24.
I would say and also advisable will be if a machine is added it will take share of objects from all the rest machine to equally distribute the load and when cache is removed its better to share its object with remaining machine. Thats what Consistent Hashing does.. Consistently maps objects to the same cache machine, as far as is possible. Each item should be mapped to only a small number of machines, and in such a way that all machines get roughly the same load of items.
Choose constant Hash function that maps URL to range [0,1]Assume it to be a unit circle and similarly map Cache to Unit Circle.
URL is assigned to closest cache going clockwise . 1,2,3 Item mapped to A. and B mapped to 4,5 When a new cache is added the only URLs that are reassigned are those closest to the new cache going clockwise around the circle. You start where resource A is and head clockwise on the ring until you hit a server. If that server is down, you go to the next one, and so on and so forth. Suppose if Cache C is added ,Only item adjacent to Cache is added in this case 1,2 added to C .If suppose C is removed then items 2,3 move back to A again.
No comments:
Post a Comment