For those who have no idea what node.js is: it is evented I/O for V8, Google's super fast JavaScript engine that powers Chrome. Trust me, it's a completely different paradigm of coding. I wrote this simple node.js script that acts as a proxy to multiple webservers, and used multi-node to spawn multiple processes so that all the cores on the machine are used.
```javascript
var http = require('http'),
    request = require('request');

var port = "8080";
// multiple node.js webservers delivering content with a delay of 200 ms
var address = ["1.1.1.1", "1.1.1.2"];

var server = http.createServer(function (req, res) {
  var curr = address.shift(); // round robin the webservers
  request.get({
    uri: "http://" + curr + ":" + port + "/",
    maxSockets: 100 // the dreaded LINE which I did not include initially
  }, function (err, response, html) {
    res.end(html);
  });
  address.push(curr);
});

var nodes = require("multi-node").listen({
  port: 8090,
  nodes: 30 // no. of child processes
}, server);
```

So now starts the long analysis, in which small mistakes would lead me into various pitfalls. Obviously, benchmarking this script was essential. ApacheBench stubbornly ends on the first error, so I turned to siege as the benchmarking tool to emulate real-time traffic. I ran this on a server with 16 cores and 32 GB RAM.
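| Nodes | requestsPerSecond | LoadAvg on the ProxyServer | Notes |
|-------|-------------------|----------------------------|-------|
| 30    | 1902.79           | 8.93                       |       |
| 60    | 3724.89           | 33.26                      |       |
| 100   | 6553.07           | 78.10                      |       |
| 263   | 6560.79           | 223.43                     | had to increase ulimit beyond 70000; started getting EMFILE "too many open files" errors |
| 300   | 6800.36           | 260.87                     |       |
| 500   | 6124.38           | 345.38                     | ulimit at 300000 |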
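To make the trend in these numbers easier to see, here is a small sketch that divides each measured RPS by the number of processes. The rows are copied from the results above; the variable names are just for illustration.

```javascript
// Rows copied from the benchmark results: [number of processes, requests per second]
const results = [
  [30, 1902.79],
  [60, 3724.89],
  [100, 6553.07],
  [263, 6560.79],
  [300, 6800.36],
  [500, 6124.38],
];

// Requests per second contributed by each process, on average
const perProcess = results.map(([nodes, rps]) => ({
  nodes,
  rpsPerProcess: rps / nodes,
}));

perProcess.forEach(({ nodes, rpsPerProcess }) => {
  console.log(nodes + ' processes: ' + rpsPerProcess.toFixed(2) + ' RPS per process');
});
```

Up to 100 processes, each process contributes a fairly steady 62 to 66 RPS. Beyond that the total stops growing and the per-process figure drops sharply.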
And we see that with more processes the RPS grows, and so does the LoadAvg. Node is supposed to provide non-blocking I/O, so an increase in performance with more processes than cores was not expected. A "ps ax | grep node" proved there were multiple processes running, and lsof on each process showed me that all the processes were listening on the same port. Hey, wait a minute: I was under the impression that only one process can bind to a socket for listening (mistake !!!). After a ton of reading socket manuals, it turned out I was right: only one process can bind to a socket, but forked "child_process" children can listen on the same port. The OS acts as a load balancer, delegating the requests coming in on the port. So I added the "f" flag and:
```
jknair@linuxbox:~/Projects/NodeHttpProxy$ ps axf | grep Node
 8783 pts/0    S      0:00  \_ node NodeHttpProxy.js
 8784 pts/0    S      0:00  |   \_ node /home/jknair/Projects/NodeHttpProxy/NodeHttpProxy.js
 8785 pts/0    S      0:00  |   \_ node /home/jknair/Projects/NodeHttpProxy/NodeHttpProxy.js
```

Now that I knew they were child processes, I was still trying to figure out why the RPS increased. After going beyond 200 nodes, "Too many open files" errors started popping up, and lsof on each pid showed many open UNIX sockets. The high loadavg could be caused by processor contention from these UNIX sockets trying to maintain some state among all the processes. But reading through the underlying libraries didn't reveal any such state-maintaining features.
Having run out of ideas, I posted the problem on Google Groups, and then came a solution to all the problems. Apparently http.Agent defaults to a connection pool of 5 sockets for making the HTTP requests to the backend servers. So a change to that dreaded line, increasing maxSockets, gave 4.5K RPS on 15 processes and 6.5K RPS using 30 processes, with loadavg below 10. Figuring out the underlying issue was really fun.
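A rough way to see why the pool size mattered: if every backend request takes about 200 ms and each process can only hold a fixed number of sockets open, then the only way to get more requests in flight is to add processes. The sketch below is a back-of-the-envelope model, not a measurement. The helper name `estimateMaxRps` is made up for illustration, and it assumes the pool limit applies per backend host. The last lines show how a larger pool can be set with the built-in `http.Agent` in current Node.js releases (where, as far as I know, the default is no longer 5); the script in this post set the `maxSockets` option on `request` instead.

```javascript
const http = require('http');

// Hypothetical helper: rough upper bound on requests per second when each
// proxy process may hold at most `maxSockets` connections to each backend
// (assumption: the limit is per backend host) and each backend request
// takes about `delayMs` milliseconds.
function estimateMaxRps({ processes, backends, maxSockets, delayMs }) {
  const concurrentRequests = processes * backends * maxSockets;
  return concurrentRequests * (1000 / delayMs);
}

// Numbers from the post: 30 processes, 2 backends, 200 ms delay.
const withDefaultPool = estimateMaxRps({ processes: 30, backends: 2, maxSockets: 5, delayMs: 200 });
const withLargerPool = estimateMaxRps({ processes: 30, backends: 2, maxSockets: 100, delayMs: 200 });
console.log('ceiling with 5 sockets:', withDefaultPool);
console.log('ceiling with 100 sockets:', withLargerPool);

// With the built-in http module, a larger pool can be requested explicitly.
const agent = new http.Agent({ maxSockets: 100 });
// The agent is then passed per request, for example:
//   http.get({ host: '1.1.1.1', port: 8080, path: '/', agent: agent }, callback);
console.log('agent.maxSockets =', agent.maxSockets);
```

The measured figures are not reproduced exactly by this model (the real backend latency and other factors are not accounted for), but the roughly linear growth of RPS with process count is what a per-process pool limit would produce.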
UPDATE: after my colleague Harish went through this post, he pointed out a lot of things I had failed to mention.
1) In the last tests I was maxing out the network bandwidth.
2) After removing the delay from the webservers I was getting 14K RPS.
3) I still have to push the proxy to max out the CPU, and I will report test results when I am able to achieve that.
Dude, 6.5k qps on a 16-core machine is not much. You can achieve that with any web server if you are doing something trivial. Can you specify what you are doing as part of each request?
Do you have any examples of a proxy server implementation that can give you higher RPS?
Try hitting it from 5 clients at a concurrency of 1000 each; I am sure Apache will die, and Tomcat without the NIO optimisation will also give up. Mind you, I was maxing out the network bandwidth, so I still have to test with more bandwidth.