Thursday, November 24, 2011

ZMQ socket creation test

Working with Harish, aka the ZMQ guru :)

We needed to use ZMQ for a lot of backend queueing operations from the serving PHP code, so I started inspecting the ZMQ-PHP bindings (docs : http://php.zero.mq/ ).

On closer inspection we found that the sockets do not persist over multiple PHP requests***, though a socket can be reused for multiple pushes to its queue within the same request. So on every request we have to create as many new sockets as there are queues we will be connecting to. To test whether this is workable, Harish suggested the following test:

<?php
// Fork 10,000 children; each child creates a ZMQ context and a PUSH
// socket, connects it and exits, while the parent waits for it.
for ($i = 0; $i < 10000; $i++) {

    $pid = pcntl_fork();
    if ($pid == -1) {

        die('Fork failed');

    } elseif ($pid == 0) {

        // Child: create and connect a socket, then exit.
        $context = new ZMQContext();
        $socket = new ZMQSocket($context, ZMQ::SOCKET_PUSH, 'mySock');

        $socket->setSockOpt(ZMQ::SOCKOPT_HWM, 5);

        $dsn = "tcp://1.1.1.1:22222";

        $socket->connect($dsn);

        exit(0);

    } else {
        // Parent: reap the child before forking the next one.
        pcntl_waitpid($pid, $status);
    }
}
?>

We timed it using :

jknair@linuxbox:~/Projects/queue$ time php zmqFork.php 

real 1m32.174s
user 0m26.186s
sys 0m49.355s

and then without the ZMQ stuff :

<?php
// Control run: fork 10,000 children that exit immediately, to measure
// the cost of forking alone.
for ($i = 0; $i < 10000; $i++) {

    $pid = pcntl_fork();
    if ($pid == -1) {

        die('Fork failed');

    } elseif ($pid == 0) {

        exit(0); // Child: do nothing.

    } else {
        pcntl_waitpid($pid, $status);
    }
}
?>

After timing it :

jknair@linuxbox:~/Projects/queue$ time php zmqFork.php 

real 1m23.990s
user 0m10.629s
sys 0m37.450s

That is approximately 10 extra seconds for 10,000 ZMQ sockets,
i.e. about 1 ms per socket !!! No problemo :)


***Update : the sockets persist within the same preforked apache/nginx process, which can serve multiple PHP requests, but they do not persist across multiple processes.
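
For the record, here's a minimal sketch of how that per-process reuse looks with the php-zmq bindings, assuming the persistent_id argument (the 'mySock' string in the test above) is what keys the reuse; 'queueA' and the endpoint are placeholders. The on_new_socket callback only fires when the socket is actually created, so the connect should run once per worker process rather than once per request:

<?php
// Sketch, assuming php-zmq persistent contexts/sockets: within one
// preforked worker process, a socket created with the same
// persistent_id ('queueA', a made-up name) is handed back on every
// later request instead of being recreated.
$context = new ZMQContext(1, true); // 1 io-thread, persistent context
$socket = new ZMQSocket($context, ZMQ::SOCKET_PUSH, 'queueA',
    function (ZMQSocket $s) {
        // Runs only when the socket is newly created in this process.
        $s->setSockOpt(ZMQ::SOCKOPT_HWM, 5);
        $s->connect("tcp://1.1.1.1:22222"); // placeholder endpoint
    });
$socket->send("job payload");
?>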

Sunday, August 7, 2011

Node.js multi-node Proxy

For those who have no idea what node.js is : evented I/O for V8, Google's super fast JavaScript engine that powers Chrome. Trust me, it's a completely different paradigm of coding. I wrote this simple node.js script that acts as a proxy to multiple webservers, and used multi-node to spawn multiple processes so it uses all the cores on the machine.

var http = require('http'),
    request = require('request');

var port = "8080";

// multiple node.js webservers delivering content with a delay of 200 ms
var address = ["1.1.1.1", "1.1.1.2"];

var server = http.createServer(function (req, res) {

    var curr = address.shift(); // round-robin the webservers

    request.get({
        uri: "http://" + curr + ":" + port + "/",
        maxSockets: 100 // the dreaded LINE which I did not include initially
    }, function (err, response, html) {
        res.end(html);
    });

    address.push(curr);
});

var nodes = require("multi-node").listen({
    port: 8090,
    nodes: 30 // no. of child processes to spawn
}, server);


So now starts the long analysis, in which small mistakes led me into various pitfalls. Benchmarking this script was obviously essential. ApacheBench stubbornly stops at the first error, so I turned to siege as the benchmarking tool to emulate real-time traffic. I ran this on a server with 16 cores and 32 GB RAM.

Nodes   Requests/sec   LoadAvg on the proxy server

30      1902.79        8.93
60      3724.89        33.26
100     6553.07        78.10
263     6560.79        223.43   (had to raise ulimit beyond 70000; started getting EMFILE "too many open files" errors)
300     6800.36        260.87
500     6124.38        345.38   (ulimit at 300000)

And we see that with more nodes the RPS grows, and so does the loadavg. Node is supposed to provide non-blocking I/O, so a performance increase with more processes than cores was not expected. A "ps ax | grep node" proved there were multiple processes running, and an lsof of each process showed that all of them were listening on the same port. Hey, wait a minute: I was under the impression that only one process can bind to a socket for listening (mistake !!!). After a ton of reading of the socket manuals, it turns out I was right: only one process can bind to a socket, but forked child processes inherit the descriptor and can listen on the same port, with the OS acting as a load balancer that delegates the requests coming in on that port. So I added the "f" flag to ps:
jknair@linuxbox:~/Projects/NodeHttpProxy$ ps axf | grep Node
 8783 pts/0    S      0:00      \_ node NodeHttpProxy.js
 8784 pts/0    S      0:00      |   \_ node /home/jknair/Projects/NodeHttpProxy/NodeHttpProxy.js
 8785 pts/0    S      0:00      |   \_ node /home/jknair/Projects/NodeHttpProxy/NodeHttpProxy.js


Now that I knew they were child processes, I was still trying to figure out the increase in RPS. Going beyond 200 nodes, the "Too many open files" error started popping up, and lsof-ing each pid showed many open UNIX sockets. The loadavg could have come from processor contention caused by these UNIX sockets trying to maintain some state among all the processes, but reading through the underlying libraries didn't reveal any such state-keeping features.
Out of ideas, I posted it on the Google group, and back came a solution to all the problems. Apparently http.Agent defaults to a connection pool of 5 sockets for making the HTTP requests to the backend servers. After changing that dreaded line to raise maxSockets: 4.5K RPS with 15 nodes and 6.5K RPS with 30 nodes, with loadavg below 10. It was really fun working out the underlying issue.


UPDATE : after my colleague Harish went through this post, he pointed out a few things I had missed mentioning:


1) In the last tests I was maxing out the network bandwidth.
2) After removing the delay from the webservers I was getting 14K RPS.
3) I still have to push the proxy until it maxes out the CPU, and I will report test results when I am able to do that.

Saturday, October 23, 2010

to dB or not to dB

Personally I am not a big fan of database technologies. I have only been exposed to PostgreSQL, MySQL and Oracle, and haven't done anything advanced enough to call a skill. From what I remember, the screwiest query I have ever executed was importing and exporting about 2 gigs of some DBpedia file in Postgres, and I still screwed up the DB pretty well.

Exploring as usual, I stumbled upon MongoDB; what caught my eye was "JSON-style documents with dynamic schemas offer simplicity and power". Already aware of the JSON format, and having worked on JSON-RPC kind of modules, I assumed it was some sort of retrieval format for data. Clicking leads me to the BSON spec (scroll to the examples and drag the mouse over the BSON format). Going back to MongoDB, I go to the obvious tutorial section, which leads me here : http://try.mongodb.org/ . What an awesome tutorial !!!
soon :
sudo apt-get install mongodb

sudo easy_install pymongo

pymongo didn't work so easily; I had to get help with something about "error while loading shared libraries: libmozjs.so". I am guessing the xulrunner package issues.
So here goes :

Have fun playing around with this new schemaless, document-oriented NoSQL database and its abundant array of drivers.
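
To give a flavour of the schemaless part, here's a minimal sketch using the PHP Mongo driver instead of the pymongo installed above (to match the other snippets on this blog); the database, collection and field names are all made up, and a mongod is assumed to be running on localhost:

<?php
// Sketch, assuming the PHP Mongo extension and a mongod on
// localhost:27017. Databases and collections spring into existence on
// first insert; no schema is declared anywhere.
$m = new Mongo();
$posts = $m->test->posts; // database "test", collection "posts"

// Documents are plain arrays, and two documents need not share fields.
$posts->insert(array('title' => 'to dB or not to dB', 'tags' => array('mongodb')));
$posts->insert(array('title' => 'second post', 'draft' => true));

var_dump($posts->findOne(array('draft' => true)));
?>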

PS : I found a link to http://expressjs.com/ ... I have read a few things about Node.js, but what is this now !!!

Tuesday, October 5, 2010

Javascript top 11 websites

REBLOGGED FROM Angus Croll's blog http://javascriptweblog.wordpress.com/

These are the JavaScript blogs that I am repeatedly drawn to. Their emphasis is on the language itself. Real code, real situations. Enjoy!

Dmitry Baranovskiy, author of famed Raphael vector graphics engine. Updated fairly infrequently but always high quality
http://dmitry.baranovskiy.com/

Juriy Zaytsev (aka kangax) part of the Prototype.js team and all-round JavaScript good-guy. JavaScript wisdom from the master. Also best URL ever.
http://perfectionkills.com/

Alex Young et al. My first stop every morning. True to the URL – new stuff daily. Crystal clear JavaScript commentary and new product info. Added bonus – every Thursday Alex talks you through building his very own JavaScript framework.
http://dailyjs.com/

Peter Michaux. I came across this site fairly recently but already finding plenty of absorbing content. This one, for example, is a gem
http://peter.michaux.ca/

Andrea Giammarchi. Sometimes hard to follow – often controversial, but the product of a brilliant and fiercely independent mind. Well worth the effort
http://webreflection.blogspot.com/

Dion Almaer et al. My other daily fix. I hesitated to add this at first because it's less JavaScript-centric than the other sites, but it's the best site for keeping up with the latest HTML5 and CSS trickery, and it also shares some very nifty JavaScript solutions.
http://ajaxian.com/

Ben Cherry. Great coder, intrepid investigator, skillful communicator, great attitude. I learned a lot.
http://www.adequatelygood.com/

Oliver Steele. Mr Functional JavaScript himself – a huge inspiration to me, he seems to be on the front end of almost every javascript-lambda pattern out there. Discovering his Functional.js library was like finding gold. Plenty of other goodies in here too.
http://osteele.com/

Nicholas C. Zakas. Author of a slew of excellent JavaScript books. A new gem every week, and he takes the time to explain some of those pesky JavaScript quizzes that are popping up everywhere (including his own).
http://www.nczonline.net/

Robert Nyman. I came here for this (read it!) and stayed for the rest. Look for “JavaScript series” in the sidebar (there’s much more of interest besides)
http://robertnyman.com/

Thanks Angus for the awesome list, and yeah, the 11th is his blog itself: http://javascriptweblog.wordpress.com/

Saturday, October 2, 2010

Stackoverflow

My first question on Stackoverflow !!!

http://stackoverflow.com/questions/3846631/c-vs-python-precision


I got the solution minutes later !!! Thanks TOR VALAMO.

AWK, SED, GREP, FIND

The basics of these 4 utilities are a must-know !!! I have found a few tutorials which at least taught me the basics, and that HELPS a lot !!!

GREP : http://www.readylines.com/linux/grep/basics . You will find a lot on grep, but this is one of the most basic ones I could find.


Well, after all this I couldn't resist googling "python" appended to each of these commands, and, well, got a few more things to read ....

Friday, October 1, 2010

MozGnowser

Our final year project, MozGnowser, was done under the guidance of Dr. Nagarjuna (chairman of FSF India). His blog http://gnowgi.org/ is a must-visit. After completing a mini project, a Mozilla toolbar to connect to http://www.gnowledge.org/ , we were given the task of making a GUI for Gnowsys (http://lab.gnowledge.org/Software) as a Mozilla extension.

Gnowsys : Gnowledge Networking and Organizing System (GNOWSYS) is the flagship software project. Its main objective is to implement a theory of meaning (the neighbourhood theory of meaning) represented in the form of a structure of memory.

Initially, all the attempts to make cross-site XML-RPC requests from a remote client were fruitless. Writing a JSON-RPC server on the client side and serving it from a subdomain was the next choice. While googling for Python and XPCOM, I landed on a beautiful solution called PyXPCOM (http://pyxpcomext.mozdev.org/index.html). Creating Python stubs and using jQuery and jQuery UI, we were able to make all the requests, and Python's power at the backend helped create a powerful application. We faced issues with rendering the SVG, which was the most crucial part of the project, and with making it interactive with the rest of the UI. The SVG was generated on the server side by the Gnowsys app itself. We stumbled onto yet another SVG library, http://www.dotuscomus.com/svg/lib/library.html , which took a lot of modification to make it suitable for MozGnowser. We used the Python backend to completely rewrite the SVG to make it fit the DOM interactions we needed.

We also implemented a proxy settings feature, and used the thread manager to run the JS scripts in parallel so that slower connections no longer caused UNRESPONSIVE SCRIPT warnings. Then the GnowQL shell, where developers can write Python GnowQL commands and obtain the pythonised results, was added as well.


Some snapshots of the project: