Monday, November 30, 2009

Communication Link Failure , Mysql,jdbc

Communication Link Failure

So, I was indexing a 10+ million records in MySQL into Solr and kept coming across the following odd MySQL exception:

com.mysql.jdbc.exceptions.jdbc4.CommunicationsException: Communications
link failure
Last packet sent to the server was 4467745 ms ago
...

com.mysql.jdbc.SQLError.createCommunicationsException(SQLError.java:1074)
at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:2985) at
com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:2871) at
In my code, I loop over a JDBC ResultSet and add the records to Solr per the Solr field schema, mapping columns to fields, etc. This would happen after getting through something like 9M+ records. After some tracking down, hypothesizing, talking with others, we came to the conclusion that the issue was a combination of having the autocommit value in Solr set and MySQL timing out the ResultSet, such that when Lucene had to do a large merge (even in the background), Solr had to wait for said merge to finish, thus keeping the ResultSet open too long w/o activity. Now, these large merges can take some time. They can happen in the background, but Solr can’t refresh it’s IndexReader until the merge finishes, AIUI. Thus, we’re stuck in the middle of a ResultSet loop, holding the cursor open past MySQL’s default setting (600 seconds, more on that later), causing MySQL to kill the connection, and rightfully so. On the MySQL side of things, we are streaming the results, since it’s JDBC driver does not support setFetchSize() (ugh!). As it turns out MySQL has a Streaming timeout value named netTimeoutForStreamingResults (see http://dev.mysql.com/doc/refman/5.0/en/connector-j-reference-configuration-properties.html) which defaults to 600 seconds.

Long story short, I have at least two options:

Turn off autocommit, meaning user’s won’t be able to see documents as soon as they may like
Increase the netTimoutForStreamingResults value. This is great for MySQL and I have verified it works, but is not a generic value for other DBs, which our code supports
I am still deciding on what to do, and also thinking of some other options that can decouple DB retrieval from the indexing process. At any rate, I wanted to post the cause of my seeing this exception, because I did not see anyone else with this exception whose cause was due to a timeout during ResultSet processing and hopefully it will save them some time.

7 comments:

  1. Do you have found a solution? I've the same problem...plyzzzzzzzzzzzz

    ReplyDelete
  2. Hi I am facing the same problem
    Did you get any solution for it



    Thank you,
    Vijayant Kumar
    Software Engineer
    Website Toolbox Inc.
    http://www.websitetoolbox.com
    1-800-921-7803 x211

    ReplyDelete
  3. I solved the problem by commentig out the 'innodb_lock_wait_timeout' entry under the section '[mysqld]' in the configuration file 'my.cnf' like this:

    #innodb_lock_wait_timeout = 50

    ReplyDelete
  4. Currently your mysql was listening to your local machine, so go to your my.cnf file and comment "bind-address" or make it to listen on all address.

    ReplyDelete
  5. @Karthik thanks very much! Just saved me!

    bind-address= localhost
    service mysql restart

    ReplyDelete
  6. Colocar a lib do mysql connector na pasta

    $ /usr/lib/jvm/java-6-openjdk/jre/lib/ext

    Editar o arquivo
    $ sudo vi /etc/mysql/hosts.allow

    Adicionar a linha:
    mysqld : 127.0.0.1 : allow

    Reiniciar apache : service apache2 restart

    Reiniciar mysql : service mysql restart

    ReplyDelete