Use a regular expression to lock up the webserver

I am a rookie when it comes to regular expressions, but I learned one thing that I want to share with you. With a regular expression you can do very sophisticated matching of string patterns. Writing the expression itself is a science in its own right. The .NET Framework makes evaluating the expression quite simple.


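using System.Text.RegularExpressions;

string Pat = MyRegularExpression;

// Singleline lets '.' match newline characters too
RegexOptions options = RegexOptions.Singleline;

Regex r = new Regex(Pat, options);
Match m = r.Match(MyStringToSearch);

if (m.Success)
{
    // work with m.Value or m.Groups here
}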


The Match method of the Regex class does the actual matching. What the documentation does not tell you is that this statement can take virtually forever to execute. We had a beautiful expression which worked very well on a test string. Skimming through the contents of a 396 KB text file went differently. Starting the match drove the ASP.NET worker process (w3wp on Server 2003) to 100% processor utilization (51% on a hyperthreaded dual-processor machine), which locked up the server. We killed the process after half an hour. Nobody can wait that long.


Browsing the net I found this post on MSDN which explained what was going on. I finally found a need for supercomputing@home.
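
One way to keep a runaway match from taking the whole worker process down is to give the Regex a time limit. The sketch below is not what we ran at the time: it assumes a framework version that has the Regex constructor overload taking a match timeout (added in .NET Framework 4.5, so later than the setup described here), and the names GuardedRegex, MatchOutcome and TryMatch are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical result type for this sketch.
public enum MatchOutcome { Matched, NoMatch, TimedOut }

// Hypothetical helper, not part of the .NET Framework.
public static class GuardedRegex
{
    // Runs a single match but gives up after 'timeout' instead of
    // letting the worker process spin at full CPU indefinitely.
    public static MatchOutcome TryMatch(string pattern, string input,
                                        TimeSpan timeout, out string value)
    {
        // Stays empty unless a match is found.
        value = string.Empty;

        // The three-argument constructor sets a time limit for each match attempt.
        Regex r = new Regex(pattern, RegexOptions.Singleline, timeout);
        try
        {
            Match m = r.Match(input);
            if (!m.Success)
            {
                return MatchOutcome.NoMatch;
            }
            value = m.Value;
            return MatchOutcome.Matched;
        }
        catch (RegexMatchTimeoutException)
        {
            // The engine exceeded the time limit and abandoned the match.
            return MatchOutcome.TimedOut;
        }
    }
}
```

Since the expression that bit us is not shown here, a stand-in is needed to try this out. A pattern such as ^(a+)+$ run against a long string of a's followed by a "!" is a commonly cited example of heavy backtracking. Whether it actually hits the timeout depends on the runtime version, because newer regex engines optimize some of these cases away.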

This entry was posted in ASP.NET, Coding.
  • Ryan

    Using a regular expression to match a 400k file is almost certainly a case of using the wrong tool for the job.

    Regular expressions are used to ask the question “is this text in the format I think it is?” If you have a gigantic _file_ then you should pretty well know where it came from, and what it is. You probably don’t really want to ask whether the WHOLE file is in some specific format, but whether some string is present inside it.

    Just curious, what are you trying to do this for?

  • Sam

    I’ve found regular expressions to be great for parsing, and I’m not talking about 396K files either, but 800MB to 2GB files.

    It all depends on the expression, and the complexity of the data. If you know an explicit start/end it really helps. If you don’t, then it can run much slower. Also, one big expression called fewer times is faster than lots of little expressions called many more times, so once you’ve broken up a file into manageable chunks, using other regexes to break them up into groups can be comparatively slow. Try to do as much as possible in a single go (for a given buffered chunk of a file) and you’ll be good. If you can’t, then you’ll probably have to abandon regexes.

  • http://codebetter.com/blogs/peter.van.ooijen/ pvanooijen

    The file being parsed is an XML file. The first go was using XPath. In the process of developing the XPath expression, XPath ran amok. As an alternative, regular expressions popped up. These ran amok too.

    As stated, I’m no expert in regular expressions at all and I was surprised to find out they could take that long to match. Looking back I think XPath ran amok for the same reason. Still investigating that. No finger-pointing yet :)

  • Ryan

    Sed might help to trim down the file before searching through it in some other way. Look at http://www.student.northpark.edu/pemente/sed/sed1line.txt and search for “regexp.” (Yes, sed does run on Windows.) Of course, sed works line by line, so if your file doesn’t have line breaks in the usual places, it won’t work.