Images are one of the key components of a social network. An image store needs to be highly scalable, redundant and highly available, and it must be able to grow in size. It also needs to be efficient, so that disk space and processing power requirements are kept to a minimum.
Tuenti's image storage uses a Content Delivery Network (CDN) as a web cache, which allows us to meet high throughput requirements. When an image is not cached in the CDN, it is requested from the Image Routing Layer (IRL), which is in charge of finding its physical location. If the IRL cannot retrieve the image from one of its locations, it fetches it from one of the other available copies, so that neither the CDN nor the user notices the miss. If the requested size is not available in the storage, the IRL automatically resizes the best available size and serves the result. Expensive operations, such as finding the physical location or resizing, are performed only when there is a cache miss on the CDN.
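The read path on a CDN miss can be sketched roughly as follows. This is a minimal illustration, not Tuenti's actual code: the in-memory `Replica` dictionaries, the function names and the `resize` callable are all hypothetical, and "best available size" is assumed here to mean the smallest stored width that is at least as large as the requested one, falling back to the largest stored width otherwise.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical model: each replica maps (image_id, width) -> image bytes.
Replica = Dict[Tuple[str, int], bytes]
# Hypothetical resizer: (source bytes, source width, target width) -> bytes.
Resizer = Callable[[bytes, int, int], bytes]


def fetch_from_replicas(
    replicas: List[Replica], image_id: str, width: int
) -> Optional[bytes]:
    """Try each copy in order and return the first one found, else None."""
    for replica in replicas:
        data = replica.get((image_id, width))
        if data is not None:
            return data
    return None


def serve_image(
    replicas: List[Replica],
    image_id: str,
    width: int,
    stored_widths: List[int],
    resize: Resizer,
) -> Optional[bytes]:
    """Serve the requested size, resizing from another stored size if needed."""
    # 1. Exact size, falling back across replicas if one location fails.
    data = fetch_from_replicas(replicas, image_id, width)
    if data is not None:
        return data

    # 2. Find which other sizes of this image exist on any replica.
    available = [
        w
        for w in stored_widths
        if fetch_from_replicas(replicas, image_id, w) is not None
    ]
    if not available:
        return None  # the image is not stored anywhere

    # 3. Assumed policy: prefer downscaling from the closest larger size.
    larger = [w for w in available if w >= width]
    source_width = min(larger) if larger else max(available)
    source = fetch_from_replicas(replicas, image_id, source_width)
    if source is None:
        return None
    return resize(source, source_width, width)
```

Because this function would only run behind the CDN, the replica lookups and the resize cost are paid once per cache miss rather than once per request.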
The physical storage is split into homogeneous buckets that are spread across the storage servers. The growth strategy is to add more storage servers and to rebalance buckets onto them. Rebalancing not only frees space on full servers but also increases the available upload bandwidth, because each server holds fewer buckets and therefore receives fewer uploads.
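As an illustration of the growth strategy, the sketch below moves buckets from the fullest servers to the emptiest ones until the bucket counts differ by at most one. The `rebalance` function, the server names and the greedy policy are assumptions made for this example; the real rebalancing logic is not described here and may weigh other factors, such as disk usage or the cost of moving data.

```python
from typing import Dict, List


def rebalance(
    assignment: Dict[str, List[int]], new_servers: List[str]
) -> Dict[str, List[int]]:
    """Return a new server -> buckets mapping that includes new_servers.

    Greedy policy (an assumption for this sketch): repeatedly move one
    bucket from the server with the most buckets to the server with the
    fewest, until the counts differ by at most one.
    """
    # Work on a copy so the caller's mapping is left untouched.
    result = {server: list(buckets) for server, buckets in assignment.items()}
    for server in new_servers:
        result.setdefault(server, [])
    if not result:
        return result

    while True:
        fullest = max(result, key=lambda s: len(result[s]))
        emptiest = min(result, key=lambda s: len(result[s]))
        if len(result[fullest]) - len(result[emptiest]) <= 1:
            return result
        result[emptiest].append(result[fullest].pop())


before = {"s1": [0, 1, 2, 3], "s2": [4, 5, 6, 7]}
after = rebalance(before, ["s3", "s4"])
```

In this example each server goes from four buckets to two, so uploads that were concentrated on two servers end up spread over four.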