Memcache with quicklz

Memcache is a fast, distributed key-value store. These days you would call it a non-persistent NoSQL store, but hey, memcache has been around since 2003 (initially written by Brad Fitzpatrick), long before the NoSQL hype. Memcache has a very broad user base that includes quite a few top 100 sites. You can connect to the memcache server (called memcached) via an ASCII or a binary protocol, but for most programming languages there are already client APIs that make memcache easy to use. For PHP there are even two libraries: PECL memcached, which uses libmemcached, and PECL memcache. I will use PECL memcache for this article. But hey, that’s the easy stuff; now it's time to go on.
Compressing the cache
OK, so it would be great to compress the cache contents to fit even more data into the cache. It also makes sense to do the compression in the client library: that way you not only save space in the cache, you also speed up the transfer to the cache.
And guess what, PECL memcache already has this feature built in. If you add the MEMCACHE_COMPRESSED flag to your store operation, the library will compress your data using zlib. You can even define a threshold so that every entry above a certain size gets compressed automatically. PHP even sets this threshold to a certain value initially, but in my view this is a bug.
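To make the threshold idea concrete, here is a minimal sketch in Python of what a client library might do before storing a value. It is not the PECL memcache code: the flag value, the default threshold, the minimum-savings rule and the function names are all assumptions made up for this illustration.

```python
import zlib

# Hypothetical values for this sketch, not taken from PECL memcache.
FLAG_COMPRESSED = 0x2
DEFAULT_THRESHOLD = 20000   # bytes; entries below this are stored as-is
DEFAULT_MIN_SAVINGS = 0.2   # keep the compressed form only if it saves 20%


def maybe_compress(value, threshold=DEFAULT_THRESHOLD, min_savings=DEFAULT_MIN_SAVINGS):
    """Return (payload, flags); compress only when it is worth it."""
    if len(value) < threshold:
        return value, 0
    packed = zlib.compress(value)
    # Fall back to the raw value when compression does not save enough.
    if len(packed) <= len(value) * (1.0 - min_savings):
        return packed, FLAG_COMPRESSED
    return value, 0


def restore(payload, flags):
    """Undo maybe_compress using the flags stored next to the payload."""
    if flags & FLAG_COMPRESSED:
        return zlib.decompress(payload)
    return payload


if __name__ == "__main__":
    payload, flags = maybe_compress(b"<abstract>some text</abstract>" * 5000)
    print(len(payload), flags)
```

The flag is stored together with the value, so the reader knows whether it has to decompress. Small or poorly compressible entries skip compression entirely and do not pay for it.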
Compression is always an architectural tradeoff. With compression you gain speed on the transmission and space in the storage, but you lose time compressing and decompressing on the client. In nearly all cases, or at least in the cases where your caching is designed well, you gain much more from compression than you lose.
Compressing the cache even better
Ever since Urban’s talk about quicklz at Webtuesday I have wanted to implement the memcache compression with this algorithm. Due to the high workload at ETH, the project got delayed over and over again, and I also got stuck at benchmarking memcache correctly (partly because of the above-mentioned bug).
So I added quicklz to PECL memcache 3.0.5, which was not too hard. The compression happens in the file memcache_pool.c in the function mmc_compress, and the decompression in mmc_uncompress. I added the files quicklz.c and quicklz.h to the folder and changed the functions accordingly (see the patch here). As I’m a bit lazy, I ignored most of the error handling. (Compression shouldn’t fail, right!?) Don’t forget to add quicklz.c to config9.m4 so that it gets compiled as well. As the PECL build & install did not work properly for me, I did the installation “manually”: “phpize; ./configure; make; make install” in the main source folder does the job. Thanks a lot to Alvaro Videla for the help; seems like this Twitter thingy is useful for something after all.
Benchmarking it
It was a bit hard to get a good benchmark. I used the abstract dump from Wikipedia, which is a big XML document. I split it into chunks of 512k (“split -b 512k -a 3 enwiki-latest-abstract1.xml wiki_”, Unix is awesome). I then uploaded all these files into memcached with a small PHP script and downloaded all of them in random order three times in a row. I repeated these steps for 10 iterations. I started memcached on the local machine with 10 GB of storage and the debugging option “-vv” on an extra-large Amazon EC2 instance.
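The procedure above can be sketched roughly like this in Python. This is not the PHP script I used: a plain dict stands in for memcached (so no network or server cost is measured), the key naming only mimics the split output, and the sample data is synthetic.

```python
import random
import time
import zlib

CHUNK_SIZE = 512 * 1024  # same chunk size as the split command


def split_chunks(data, size=CHUNK_SIZE):
    """Cut the input into fixed-size chunks; the last one may be shorter."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def run_benchmark(chunks, compress=None, decompress=None, read_rounds=3, seed=0):
    """Write all chunks once, then read them back in random order read_rounds times."""
    store = {}  # stand-in for memcached (assumption of this sketch)
    start = time.perf_counter()
    for i, chunk in enumerate(chunks):
        store["wiki_%03d" % i] = compress(chunk) if compress else chunk
    write_time = time.perf_counter() - start

    rng = random.Random(seed)
    keys = list(store)
    start = time.perf_counter()
    for _ in range(read_rounds):
        rng.shuffle(keys)
        for key in keys:
            raw = store[key]
            _value = decompress(raw) if decompress else raw
    read_time = time.perf_counter() - start

    stored_bytes = sum(len(v) for v in store.values())
    return write_time, read_time, stored_bytes


if __name__ == "__main__":
    sample = b"<doc><title>Example</title><abstract>text</abstract></doc>\n" * 50000
    chunks = split_chunks(sample)
    print("raw :", run_benchmark(chunks))
    print("zlib:", run_benchmark(chunks, zlib.compress, zlib.decompress))
```

Passing the compress and decompress functions as parameters makes it easy to swap in another algorithm without touching the timing code.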
Uncompressed, the files needed 642 MB of cache storage; the average write time was 3.89 seconds and the read time 5.26 seconds (keep in mind that the read time always covers three read passes). Compressed with zlib, the storage space sinks to 87 MB, but at the same time the read and write times increase significantly: the average write time is now 27.66 seconds and the read time goes up to 19.55 seconds. But now to the result we are really interested in. I used quicklz level one and did not see a big difference with level two; with level three I got crashes, so I could not take any measurements. The result for quicklz: the read time of 6.68 seconds is pretty good, as it comes close to the uncompressed case, but what is really impressive is the write performance, which at 3.54 seconds is even better than uncompressed. Unfortunately quicklz, at 118 MB, uses way more storage than I expected.
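To put these numbers side by side, here is a small Python sketch that expresses each variant relative to the uncompressed run. The figures are the ones reported above; the dictionary layout and the function name are just choices made for this sketch.

```python
# Measured values from the benchmark: storage in MB, times in seconds.
RESULTS = {
    "uncompressed": {"size_mb": 642, "write_s": 3.89, "read_s": 5.26},
    "zlib": {"size_mb": 87, "write_s": 27.66, "read_s": 19.55},
    "quicklz": {"size_mb": 118, "write_s": 3.54, "read_s": 6.68},
}


def relative_to_baseline(results, baseline="uncompressed"):
    """Divide every metric by the corresponding baseline metric."""
    base = results[baseline]
    return {
        name: {metric: value / base[metric] for metric, value in row.items()}
        for name, row in results.items()
    }


if __name__ == "__main__":
    for name, row in relative_to_baseline(RESULTS).items():
        print("%-12s size x%.2f  write x%.2f  read x%.2f"
              % (name, row["size_mb"], row["write_s"], row["read_s"]))
```

In relative terms, zlib shrinks the data to roughly 14% of its original size but makes writes about 7 times slower, while quicklz ends up at roughly 18% of the size with writes slightly faster than uncompressed.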

Keep in mind that these measurements were done in an unusual setting: the machine was way faster than a usual web box, so the compression of course benefited from this. On the other hand, the memcached servers (or at least not all of them) are usually not on the same box, so the network speed matters. If you have a slow network, the size of the cache content matters a lot. Also, in the end you will not compress as much data as I did, so compression speed should not make a huge difference in your web app. And if you have a smart caching strategy, caching will always be faster than calculating the data, no matter how fast the compression is.
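As a back-of-the-envelope illustration of the network effect, the following Python sketch adds a simple transfer time to the measured local write times. This is a deliberately crude model and not something I measured: the bandwidth values are assumptions, 1 MB is taken as 8 Mbit, and simply adding local time and transfer time ignores latency and any overlap between the two.

```python
# (size in MB, local write time in seconds) as reported in the benchmark.
MEASURED = {
    "uncompressed": (642, 3.89),
    "zlib": (87, 27.66),
    "quicklz": (118, 3.54),
}


def estimated_write_seconds(size_mb, local_seconds, bandwidth_mbit):
    """Local write time plus the time to push size_mb over the wire."""
    transfer_seconds = size_mb * 8 / bandwidth_mbit
    return local_seconds + transfer_seconds


def fastest_variant(bandwidth_mbit):
    """Name of the variant with the lowest estimated write time."""
    return min(
        MEASURED,
        key=lambda name: estimated_write_seconds(*MEASURED[name], bandwidth_mbit),
    )


if __name__ == "__main__":
    for bandwidth in (10, 100, 1000):  # hypothetical link speeds in Mbit/s
        print(bandwidth, "Mbit/s ->", fastest_variant(bandwidth))
```

Under this simplified model the smaller zlib output only pays off on very slow links (around 10 Mbit/s); on faster links the cheaper compression of quicklz comes out ahead of both alternatives.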
Conclusion
Well, in the end I’m not really happy with the size of the data quicklz produces, even if the speed of the algorithm is impressive. At the end of the day, for most web apps the cache size matters much more than the compression speed, because most of the speed improvement comes from the cache itself. That’s the reason why I did not improve the patch to the point where one could use it in production.






