PyMySQL throws 'BrokenPipeError' after making frequent reads

TheNationalSarcasmSociety picture TheNationalSarcasmSociety · Jan 14, 2017 · Viewed 8.4k times · Source

I have written a script to help me work with a database. Specifically, I am trying to work with files on disk and add the result of this work to my database. I have copied the code below, but removed most of the logic which isn't related to my database to try to keep this question broad and helpful.

I used the code to operate on the files and add the result to the database, overwriting any files with the same identifier as the one I was working on. Later, I modified the script to ignore documents which have already been added to the database, and now whenever I run it I get an error:

pymysql.err.OperationalError: (2006, "MySQL server has gone away (BrokenPipeError(32, 'Broken pipe'))")

It seems like the server is rejecting the requests, possibly because I have written my code poorly? I have noticed that the error always occurs at the same place in the list of files, which doesn't change. If I re-run run the code, replacing the file list with a list of only the file on which the program crashes, it works fine. This makes me think that after making a certain number of requests, the database just bottoms out.

I'm using Python 3 and MySQL Community Edition Version 14.14 on OS X.

Code (stripped of stuff that doesn't have to do with the database):

import pymysql

# Stars for user-specific stuff
connection = pymysql.connect(host='localhost',
                             user='root',
                             password='*******',
                             db='*******',
                             use_unicode=True, 
                             charset="utf8mb4",
                             )
cursor = connection.cursor()

f_arr = # An array of all of my data objects

def convertF(file_):
    # General layout: Try to work with input and add it the result to DB. The work can raise an exception
    # If the record already exists in the DB, ignore it
    # Elif the work was already done and the result is on disk, put it on the database
    # Else do the work and put it on the database - this can raise exceptions
    # Except: Try another way to do the work, and put the result in the database. This can raise an error
    # Second (nested) except: Add the record to the database with indicator that the work failed

    # This worked before I added the initial check on whether or not the record already exists in the database. Now, for some reason, I get the error:
    # pymysql.err.OperationalError: (2006, "MySQL server has gone away (BrokenPipeError(32, 'Broken pipe'))")

    # I'm pretty sure that I have written code to work poorly with the database. I had hoped to finish this task quickly instead of efficiently.
    try:
        # Find record in DB, if text exists just ignore the record
        rc = cursor.execute("SELECT LENGTH(text) FROM table WHERE name = '{0}'".format(file_["name"]))
        length = cursor.fetchall()[0][0] # Gets the length
        if length != None and length > 4:
            pass
        elif ( "work already finished on disk" ): 
            # get "result_text" from disk
            cmd = "UPDATE table SET text = %s, hascontent = 1 WHERE name = %s"
            cursor.execute(cmd, ( pymysql.escape_string(result_text), file_["name"] ))
            connection.commit()
        else:
            # do work to get result_text
            cmd = "UPDATE table SET text = %s, hascontent = 1 WHERE name = %s"
            cursor.execute(cmd, ( pymysql.escape_string(result_text), file_["name"] ))
            connection.commit()
    except:
        try: 
            # Alternate method of work to get result_text
            cmd = "UPDATE table SET text = %s, hascontent = 1 WHERE name = %s"
            cursor.execute(cmd, ( pymysql.escape_string(result_text), file_["name"] ))
            connection.commit()
        except:
            # Since the job can't be done, tell the database
            cmd = "UPDATE table SET text = %s, hascontent = 0 WHERE name = %s"
            cursor.execute(cmd, ( "NO CONTENT", file_["name"]) )
            connection.commit()

for file in f_arr:
    convertF(file)

Answer

e4c5 picture e4c5 · Jan 14, 2017

Mysql Server Has Gone Away

This problem is described extensively at http://dev.mysql.com/doc/refman/5.7/en/gone-away.html the usual cause is that the server has disconnected for whatever reason and the usual remedy is to retry the query or to reconnect and retry.

But why this breaks your code is because of the way you have written your code. See below

Possibly because I have written my code poorly?

Since you asked.

rc = cursor.execute("SELECT LENGTH(text) FROM table WHERE name = '{0}'".format(file_["name"]))

This is a bad habit. The manually explicitly warns you against doing this to avoid SQL injections. The correct way is

 rc = cursor.execute("SELECT LENGTH(text) FROM table WHERE name = %s", (file_["name"],))

The second problem with the above code is that you don't need to check if a value exists before you try to update it. You can delete the above line and it's associated if else and jump straight to the update. Besides, our elif and else seem to do exactly the same thing. So your code can just be

try:
        cmd = "UPDATE table SET text = %s, hascontent = 1 WHERE name = %s"
        cursor.execute(cmd, ( pymysql.escape_string(result_text), file_["name"] ))
        connection.commit()
except:  # <-- next problem.

And we come to the next problem. Never ever catch generic exceptions like this. you should always catch specific exceptions like TypeError, AttributeError etc. When catching generic exceptions is unavoidable, you should at least log it.

For example, here you could catch connection errors and attempt to reconnect to the database. Then the code will not stop executing when your server gone away problem happens.