Etsy site outage caused by multicast rsync
Etsy · Solr Index Replication
Etsy experienced significant challenges with its Solr search index replication as the index size and number of replica servers grew. The standard HTTP replication method scaled linearly with the number of replicas, leading to increasingly long replication times, with a 4 GB index taking over an hour to replicate to 11 servers. This bottleneck was exacerbated by network interface limitations and the need for nightly index optimizations.
In an attempt to improve replication speed, Etsy engineers experimented with multicast rsync to send index data to all replicas simultaneously. However, this deployment in production led to an “epic failure” where the multicast traffic saturated the CPU on core network switches, causing the entire Etsy site to become unreachable for several minutes.
Following the multicast rsync incident, Etsy decided to integrate BitTorrent into Solr for index replication. This involved selecting and modifying an existing Java BitTorrent library (ttorrent) to handle Solr’s specific requirements, such as multi-file torrents and large file support. The new system allows replica servers to seed index files, significantly speeding up the process of bringing new or stale replicas online.
The BitTorrent replication process involves the primary server hashing index files, creating a torrent, and notifying replicas. Replicas then download the torrent file, verify existing index files, and use the BitTorrent protocol to download missing parts. This approach ensures index consistency through hash verification and provides a clearer overview of replication status compared to the previous HTTP method.
While BitTorrent replication dramatically reduced transfer times (e.g., a 9 GB index to 23 replicas now takes less than an hour), it introduced a lag due to SHA1 hash calculations for large indexes. Etsy addressed this by creating a thread-safe SHA1 implementation and a DigestPool interface for parallel hashing, reducing the lag from 2 minutes to about 32 seconds. Future improvements include creating torrents per index segment to further reduce lag and supporting replication of arbitrary related files.