I’ve set up a RabbitMQ cluster (well, just one node for now) and put an AWS Elastic Load Balancer in front of it to handle traffic in case I need to add more nodes and deal with increased traffic. Unfortunately, ELB can’t really handle serving RabbitMQ traffic without causing all sorts of strange issues.
My specific issue is that I was using the ELB endpoint as the connection, but I would often get errors such as “Could not establish TCP connection to any of the configured hosts” and “Empty response received from the server”. I tried a number of different things such as changing the timeout settings on the load balancer, messing with the connection strings and various configurations, but nothing worked. Enabling logging for the load balancer also proved fruitless.
Digging around, I found a number of different posts about people having issues with this. It seems to have to do with how ELB deals with checking if connections are alive and heartbeats and the like. So after struggling with this for some time, I went a totally different direction: to use HAProxy instead of the elastic load balancer.
So I spun up an instance of HAProxy, configured it a bit, and everything works. Really well, actually; I’ve not had any issues (knocks on wood) since I made the change. The biggest thing to note was in the haproxy config:
listen rabbitmq 0.0.0.0:5672 mode tcp log global retries 3 timeout client 3h timeout server 3h option clitcpka option tcplog option persist balance leastconn server rabbitmq1 10.1.2.3:5672 check
That is, to have the timeout on HAProxy be longer than the default tcp_keepalive_time on the server.