A long running SQL batch in asp.net with feedback

Another story from the app with the sprocs. That was a classical 2-tier CS application which we transformed into an asp.net 1.1 app. Once a year the application has to copy and transform huge amounts of data. This is done by running a batch of stored procedures which run in one huge transaction after which the database is ready for the next year. It takes quite a lot of time, up till now 8 hours was not unusual. The growing usage of the app will only increase the amount of time needed. Some feedback on the progress would be nice. For a windows client all of this is not that difficult to build, to get something likewise in asp.net is a different ballgame. We managed to get something working. It does have a drawback though; more on that in the end.


The outline of our approach:



  • Create a table with a progress message and a timestamp

  • Start a transaction and enlist all sprocs

  • Start the sprocs one by one

  • In between add progress records to the temp tables

  • Wrap the process up in a method which is started in a new thread

  • Start the thread and navigate to another page which periodically queries the progress table

Some essential parts of the code


A helper function kopieSproc wraps up an individual stored procedure in a SqlCommand, sets the parameters and sets the timeout of the command. Setting up the transaction and enlisting the procs:



SqlCommand cmdPRKOPIEJRBDGAFD = kopieSproc(“dbo.PRKOPIEJRBDGAFD”, idFaculteit, vanJaar, naarJaar);


// More sprocs


SqlCommand cmdPRKOPIEROOSTER = kopieSproc(“dbo.PRKOPIEROOSTER”, -1, vanJaar, -1);


 


sqlConnection1.Open();


SqlTransaction trans = sqlConnection1.BeginTransaction();


 


// Alle sprocs in 1 transaction


cmdPRKOPIEJRBDGAFD.Transaction = trans;


// More sprocs


cmdPRKOPIEROOSTER.Transaction = trans;


A  helper function reportStatus fires a sqlcommand to write a status record to the table. This command should not enlist in the transaction. The transaction is meant to get an all or nothing result, changes are not visible until the transaction is committed. The status row has to be visible the moment it is added to the table.


Running the transaction:



try


{


    reportStatus(cmdStatus, “Start kopiëren cursusjaar”);   


 


    reportStatus(cmdStatus, “Start kopieren jaarbudget school”);


    rc = cmdPRKOPIEJRBDGAFD.ExecuteNonQuery();


    reportStatus(cmdStatus, string.Format(“{0} regels toegevoegd”, rc));


 


    // More sprocs


   


    reportStatus(cmdStatus, “Start kopieren roostergegevens”);


    rc = cmdPRKOPIEROOSTER.ExecuteNonQuery();


    reportStatus(cmdStatus, string.Format(“{0} regels toegevoegd”, rc));


 


    reportStatus(cmdStatus, “Commit naar database”);


 


    trans.Commit();


 


    reportStatus(cmdStatus, “Kopieren voltooid”);


}


catch(Exception ex)


{


    trans.Rollback();


    reportStatus(cmdStatus, ex.Message);


    reportStatus(cmdStatus, string.Format(ApplicationMessages.CursusJaarKanFaculteitNietKopieren, idFaculteit));


}


finally


{


    sqlConnection1.Close();


    reportStatus(cmdStatus, “Connectie met DB gesloten”);


}


So these  lines of code can take hours and hours to complete. After each successful step, or after an exception a row is added to the table. Never mind the Dutch, the idea should be clear.


The whole process is wrapped up in a method KopieerFaculteit, which is a member of a component in the data layer. A web page starts a new thread for the method and navigates away.



private void maakKopie()


{


    int idVanJaar = int.Parse(DropDownListVan.SelectedValue);


    int idNaarJaar = int.Parse(DropDownListNaar.SelectedValue);


    JaarData.KopieerFaculteit(int.Parse(DropDownListFaculteit.SelectedValue), idVanJaar, idNaarJaar);


}


 


private void Button1_Click(object sender, System.EventArgs e)


{


    System.Threading.Thread t = new System.Threading.Thread(new System.Threading.ThreadStart(maakKopie));


    t.Start();


    Response.Redirect(“../SysteemBeheer/KopieerStatus.aspx”);


 


}


The maakKopie method collects the parameters from the form and will keep running for quite some time on its own thread. The user is redirected to the status page.


The follow the status, the status page has to requery the database periodically. Refreshing the page is automated by setting the (browser’s) requery interval in the head of the page to 30 seconds.



<HEAD>


    <title>KopieerStatus</title>


    <meta name=”GENERATOR” Content=”Microsoft Visual Studio .NET 7.1″>


    <meta name=”CODE_LANGUAGE” Content=”C#”>


    <meta name=”vs_defaultClientScript” content=”JavaScript”>


    <meta name=”vs_targetSchema” content=”http://schemas.microsoft.com/intellisense/ie5″>


    <meta HTTP-EQUIV=”REFRESH” CONTENT=”30″>


</HEAD>


The result is that the application is responsive, none of the responses takes long to generate. The application is also informative, the user will be kept up to date about the status of the process. The application will stay alive, as it has to handle a request every 30 seconds.


A small drawback is debugging. Any exception, however futile (like an error message which is to long to fit in the status field) trashes the thread with a quite non-informative “system error” message. So before running the code in a new thread you’ll have to try it on the main thread. And now you have to set the page time-out to something large. Which does have it’s drawbacks.


But, as said in the beginning of this story, there is a larger drawback. The thread running the sql batch lives in the IIS application. As long as the application is alive the thread will keep running. But there are a lot of ways the application can be restarted. In .net terms the appdomain is recycled and your thread dies.



  • IIS is restarted

  • A content file of the application, like an aspx page or the web.config, is touched

  • Spontaneously, IIS restarts the application to free up resources.

You can control the first two, but you cannot (afaik) control the latter one. Googling around on recycling an appdomain will give you a lot of information but no solution. The only one is to limit the amount of work you do in one go. Which does make a lot of sense but in our case that would be a major re-architecting of the application. All the BL is in the sprocs and their interaction. It works and it’s result are exactly what’s desired. So better stay away from changing that.


Any suggestions are welcome. For the moment we’re happy with this.

This entry was posted in ASP.NET, Data. Bookmark the permalink. Follow any comments here with the RSS feed for this post.
  • http://todotnet.com Sander

    The ‘spontaneous’ restart of the application is not really that unpredictable. Recycling (that’s what it’s called) of the application pool is based on settings you can manipulate yourself, such as: every x minutes, at certain specified times, when consumed memory reaches a certain point. Check the settings for the specified application pool for that webapp.

    Nonetheless, it’s not recommended design to count on whether or not the application pool recycles. I would personally run the longrunning action in a seperate process and have a field in a table somewhere telling me whether it’s still running. You can then query that status-table anytime.

    Given that it’s a once-a-year-process, I can imagine not spending too much time perfecting it.

  • Wyatt Barnett

    Is there any particular reason you did not take this huge batch process offline and run it from the command line?

  • http://codebetter.com/blogs/peter.van.ooijen/ pvanooijen

    @ (hi) sander
    Thanks for the info. No it’s not a recommended design. The good thing is that the operation is one transaction. When the thread fails the transaction will not commit. No damage, just (a lot of) time lost.

    @Wyatt
    A major requirement was to push the database away. It can only be reached by the web-server. The user can only reach the webserver over HTTP(s). Not much choice..

  • btompkins

    I have a solution for long running processes using ATLAS to update the UI here:

    http://codebetter.com/blogs/brendan.tompkins/archive/2006/04/18/143041.aspx

  • http://codebetter.com/blogs/peter.van.ooijen/ pvanooijen

    @Brendan, that’s cool! But my app is an 1.1 app, which would mean to a lot of do it yourself Ajax. The other problem is that I want to be notified several times during the process.

  • Wyatt Barnett

    I understand that requirement, and deal with it all the time. That is what scheduled tasks are for. Or, if it cannot be scheduled, use a fire-and-forget web service to kick off the background process.

  • http://www.codebetter.com/blogs/eric.wise ewise

    I agree with Wyatt, if your users have time to sit around watching a screen auto-refresh, they need a manager to give them something else to work on.

    You can still use a fire and forget web service that updates the log tables and bring that up in a asp .net page on request without having to have someone sit around and babysit it.

  • http://codebetter.com/blogs/peter.van.ooijen/ pvanooijen

    A webservice, be it fire and forget or one which reports also runs in IIS. So that’s no difference. As a kick-off could be an option.

    It’s not a matter of scheduling it, but the user has to start the proces by hand, after doing some preparations. Besides that you need (a lot of) rights to schedule a task on a remote machine.

    The main organisational problem we are facing is permissions. The IT department is very strict (and also short-sighted). They open some ports and that’s just it. The user does not have the time to baby-sit, to busy battling IT :) Walking past a machine should give a quick status how the process is doing.

  • Colin Blair

    Assuming this is MS SQL Server, could you let SQL Agent run this as a job instead? If you aren’t really doing any logic beyond running stored procedures and writing to a log file, you can do that in a job. Your web application then just calls sp_start_job and then monitors your tracking tables using read uncommited.

  • http://codebetter.com/blogs/peter.van.ooijen/ pvanooijen

    I have to admit I’m blank on SQL Agent. It could be done provided a SQL agent job can be started as fire and forget from asp.net and that should not require openeing an extra port or fiddeling with an user setting.
    Worth investigating, thanks.

  • Colin Blair

    SQL Agent is part of SQL Server, so ports will not be an issue. Any jobs will be running inside the SQL Agent process so it doesn’t matter what you are calling sp_start_job from. You just need to make sure that the job that you create is “owned” by the user who kicks it off. That way they have the correct permissions to run it. To make your life easier, write one stored procedure to own the transaction, run the other stored procedures, and write to the log table. Then just write your job to execute that stored procedure.

    If you wanted to get fancy, you can even have your application create the job, run it, and then delete the job afterwards. Permissions for job creation default to the public role. Read sp_add_job, sp_delete_job, sp_start_job, and sp_stop_job in SQL Server Books Online. However, you are better off creating the job through Enterprise Manager and then scripting it out from there instead of trying to type it out yourself. Just right click on the job, All Tasks-> Generate Sql Script.

  • http://codebetter.com/blogs/peter.van.ooijen/ pvanooijen

    Thanks Colin, that looks quite clear, pretty cool and quite usable.
    The C# method firing all the sprocs can be converted to a sql script.

    In our case I see two problems
    - It requires the sql server agent to be running
    - It’s harder to update the job as it’s not a member of the database itself.

    We’re using sql-compare to create stup and update scripts and pass the result to the IT-department. Who need monkey proof scripts they can run. And they stiil manage to run those against the wrong (even master) database. So we should go for creating the job on the fly.

    In case the current solution, running in an IIS thread, starts giving trouble this looks like the way to go. Given IT will cooperate :)

  • http://payday-on-line.biz/ Thea

    Greeting. Whatever you do, do it to the purpose; do it thoroughly, not superficially. Go to the bottom of things. Any thing half done, or half known, is in my mind, neither done nor known at all. Nay, worse, for it often misleads.
    I am from Benin and , too, and now am writing in English, give true I wrote the following sentence: “Rates on asia hotels, cheap asian airline tickets, vacation packages,last minute asia tickets shows.”

    With respect :-( , Thea.

  • Mitchell Stokely

    I have also run into this problem. Your typical “old school” desktop application developer, trying their hand at Web Sites (ASP.NET), will think this way, and try running a gigantic heavily dependent OS Service, or command line process, or some other tedious and inconvenient process manipulating threads and AppDomains.

    If you need to run something like this from the web site interface, as more people are doing, then your best design is to break apart your SQL Process into small chunks and return small results, but avoid the 8 completed process. Thats a lofty redesign but if you start thinking in terms of processed versus unprocessed work loads and records, you might find you have a universal framework that not only works to solve this one problem but across all processes drivven by IIs. This means your 8 hour process is done in small batches of several thousand records every 30 seconds or so. You can then allow SQL to return a completed status via AJAX at undetermined intervals as needed rather than a clunky metarefresh which is also prone to failure (ie a user closes the browser, logs off, computer shutdown, etc.). By processing incrementally, returning completed datasets in SQL, the user can pause the process in ASP.NET then return.

    - Mitchell Stokely,  Web Architect