Environment: ASP.NET 2.0/C#/SQL Server 2005 (SP1 on live servers and SP2 on some development and test servers)
So we’ve been developing an application for many months and finally went live with it. Once going live, seemingly randomly we started getting the following problem:
Fill: SelectCommand.Connection property has not been initialized.
Exception Details: System.InvalidOperationException: Fill: SelectCommand.Connection property has not been initialized.
Once we got this, we remembered seeing it somewhat unpredictably during development:
We were using the MyGenerations OR Mapper for our data tier. This is an open source product so we were compiling it in release mode on our own then just pointing to the DLL. Because it was open source, one of us added the MyGenerations project to our solution to debug the problem, but doing so made the problem go away. We ended up adopting this as a workaround to this problem so we could move on (mistake #1).
Just before deploying our application to the live servers, we had a conversion application that was written to convert data from the legacy system to the new system. Everything worked fine until we tried doing the true conversion – we had this same problem. Somebody stumbled upon a workaround by simply running the conversion utility in debug mode and this worked. Because all systems were down and this conversion tool was a 1-run program before it got thrown away, this was an acceptable solution. However, nobody looked further into the problem (mistake #2).
So now back to the present day. Once going live, it seemed every couple days we’d get this problem. Once this problem occurred, ALL connections (it seemed) were hosed and the entire site was down until we did an IISReset. Because it happened so infrequently and we had no clue how to reproduce it, we just monitored the situation with this defect (while working on many other defects). Over a period of a couple weeks, this problem grew and grew until it was happening 10 times a day (again, we didn’t know how to reproduce it but we were logging the hell out of things from every angle we could think of).
Oh, and one more thing – our database is being hit by 3 independent web sites/servers and they were only going down 1-at-a-time so it was most likely not any issue with the database server itself.
Now the obvious first guess at the problem was that we weren’t closing our connections and the connection pool was running out of them or perhaps we had hanging transactions that were causing problems, or something along those lines. We code reviewed the MyGenerations code up and down, left and right to no avail. Then we started reviewing tons of our code, again, no luck. However, one thing we did learn was that simply doing nothing other than compiling the solution in debug mode made the problem go away!
We finally opened a ticket with Microsoft. After a couple weeks of working with them and one of our guys stumbling upon this post, we finally came to the source of the problem – a bug in .NET 2.0’s ADO.NET. Before I show you the fix, read this snippet from an email from Microsoft after we fixed it that explains the problem at a high level:
Based on your information, I did locate a bug on this. The development team does not have any plans right now for a hot fix, but they are looking at fixing this in the future as the forum post indicates. The issue only occurs when you use the transaction as the sole (or last) reference to the connection. Then there is a brief window when the transaction only has a weak reference to the connection and if a Garbage Collection occurs before it is converted to a strong reference, then it will collect the connection object. The reason it doesn’t show the problem in the debug build is that variables are not freed as early as they are when an application is compiled for Release mode, and so we delay enough the window is a non issue. The workaround is fairly simple once you understand the nature of the problem and that is to keep an external reference to the connection that is outside the transaction object.
So the fix was just as he said it was, keep an external reference to the connection that is outside the transaction object. So in our case, with the help of the MyGenerations developers, we modified the MyGenerations code to do this and our problem is gone, at last! So expect an update to their software very soon to work around this .NET bug. And if you hit this problem with your own code, now you know how to fix it.