Losing connections due to ADO.NET bug using Transactions


Environment: ASP.NET 2.0/C#/SQL Server 2005 (SP1 on live servers and SP2 on some development and test servers)

So we’ve been developing an application for many months and finally went live with it. Once going live, seemingly randomly we started getting the following problem:

Fill: SelectCommand.Connection property has not been initialized.
Exception Details: System.InvalidOperationException: Fill: SelectCommand.Connection property has not been initialized.

Once we got this, we remembered seeing it somewhat unpredictably during development:

<backflash>
We were using the MyGenerations OR Mapper for our data tier. This is an open source product so we were compiling it in release mode on our own then just pointing to the DLL. Because it was open source, one of us added the MyGenerations project to our solution to debug the problem, but doing so made the problem go away. We ended up adopting this as a workaround to this problem so we could move on (mistake #1).

Just before deploying our application to the live servers, we had a conversion application that was written to convert data from the legacy system to the new system. Everything worked fine until we tried doing the true conversion – we had this same problem. Somebody stumbled upon a workaround by simply running the conversion utility in debug mode and this worked. Because all systems were down and this conversion tool was a 1-run program before it got thrown away, this was an acceptable solution. However, nobody looked further into the problem (mistake #2).
</backflash>

So now back to the present day. Once going live, it seemed every couple days we’d get this problem. Once this problem occurred, ALL connections (it seemed) were hosed and the entire site was down until we did an IISReset. Because it happened so infrequently and we had no clue how to reproduce it, we just monitored the situation with this defect (while working on many other defects). Over a period of a couple weeks, this problem grew and grew until it was happening 10 times a day (again, we didn’t know how to reproduce it but we were logging the hell out of things from every angle we could think of).

Oh, and one more thing – our database is being hit by 3 independent web sites/servers and they were only going down 1-at-a-time so it was most likely not any issue with the database server itself.

Now the obvious first guess at the problem was that we weren’t closing our connections and the connection pool was running out of them or perhaps we had hanging transactions that were causing problems, or something along those lines. We code reviewed the MyGenerations code up and down, left and right to no avail. Then we started reviewing tons of our code, again, no luck. However, one thing we did learn was that simply doing nothing other than compiling the solution in debug mode made the problem go away!

We finally opened a ticket with Microsoft. After a couple weeks of working with them and one of our guys stumbling upon this post, we finally came to the source of the problem – a bug in .NET 2.0′s ADO.NET. Before I show you the fix, read this snippet from an email from Microsoft after we fixed it that explains the problem at a high level:

Based on your information, I did locate a bug on this. The development team does not have any plans right now for a hot fix, but they are looking at fixing this in the future as the forum post indicates. The issue only occurs when you use the transaction as the sole (or last) reference to the connection. Then there is a brief window when the transaction only has a weak reference to the connection and if a Garbage Collection occurs before it is converted to a strong reference, then it will collect the connection object. The reason it doesn’t show the problem in the debug build is that variables are not freed as early as they are when an application is compiled for Release mode, and so we delay enough the window is a non issue. The workaround is fairly simple once you understand the nature of the problem and that is to keep an external reference to the connection that is outside the transaction object.

So the fix was just as he said it was, keep an external reference to the connection that is outside the transaction object. So in our case, with the help of the MyGenerations developers, we modified the MyGenerations code to do this and our problem is gone, at last! So expect an update to their software very soon to work around this .NET bug. And if you hit this problem with your own code, now you know how to fix it. :)

-Jax

2 Comments

  1. Comment by Brian on December 12, 2007 12:30 am

    Would you mind posting the code changes that fixed this problem? Thanks for your help.

  2. Comment by Shane on December 12, 2007 12:49 am

    I don’t have the code handy but let’s see if I can help ya’.

    I THINK we might have done something like creating a singleton List(of SqlClient.SqlConnection) object that we just added our connection object to as soon as we possibly could, and then removed them after we were finished with the transaction. But that was many months ago, a couple projects ago, so I could be wrong. But I think that would circumvent the bug – simply having some object of your own somewhere that references the connection object and that doesn’t get garbage-collected while the connection is open. So going with that, something like the following would work:

    Bad code (well, good code but may potentially trigger the bug)
    | Dim t as SqlClient.SqlTransaction
    | … do stuff to t …
    | t.Connection = New SqlConnection(…) // Note that NOTHING else has a reference to the connection

    Good code:
    | Dim t as SqlClient.SqlTransaction
    | … do stuff to t …
    |
    | // Instantiate your connection with c as a temp variable (this isn’t good enough to fix the bug)
    | Dim c as New SqlClient.SqlConnection(…)
    |
    | // Add a reference to your connection to a “permanent” (until you clear it) datastructure so you
    | // have a “hard reference” to the connection so it doesn’t get garbage-collected.
    | MySingletonClass.ListSingletonReference.Add(c)
    |
    | // Set the connection to your Transaction now.
    | t.Connection = c

    Note that while your c variable (read pointer to the connection) may be garbage-collected, the actual connection cannot be garbage-collected because your singleton is still pointing to it with a “hard reference” to it.

    I do recall that at some poing within the transaction, the buggy “soft reference” became a “hard reference” so we deemed it sufficient (and verified with ample traffic/testing) to drop our own reference to the connection upon committing/rolling back the transaction.

    Hope this helps! Let me know if it doesn’t and I’ll dig up that other code.

    -Shane

Comments RSS TrackBack Identifier URI

Leave a comment


Jaxidian Update is proudly powered by WordPress and themed by Mukkamu