CodeBetter.Com

Greg Young [MVP]



Comments

Brendan Tompkins said:

Fascinating stuff and what a great way to start your tenure here at CodeBetter!  Glad to have you aboard.

BTW, I'm a sort of NO native (Dad's whole family lives there) ... I always thought of naming my blog "Lagniappe"  :)

-Brendan
# June 9, 2006 8:20 PM

Greg said:

Glad to be here, Brendan. Nice one on the lagniappe; I had considered it as well, but there goes that idea :)

My gf is out of town this weekend = hopefully extra time to write ..

Greg
# June 9, 2006 8:55 PM

Sam Gentile said:

Excellent start - you're making me proud -)
# June 10, 2006 9:46 AM

Greg Young [MVP] said:

I promise this is the last ground laying post before I move on to something of consequence but I wanted...
# June 10, 2006 6:52 PM

Greg Young [MVP] said:

This article discusses looping performance at the lowest level (native). It dispels a few myths that have been circulating the blogosphere and makes some general performance recommendations.
# June 11, 2006 7:10 PM

Mark Lubischer said:

Well written article, quite interesting to see the optimization done by the compiler.

As I was reading through it, I was wondering about how a locally scoped hoist would be handled by the JIT optimizer:

       int total = 0;
       int[] length = new int[10000];

       for (int i = 0, j = length.Length; i < j; i++)
       {
           total |= i;
       }
# June 12, 2006 4:12 PM

Greg said:

That's a good question. You are correct that I forgot to mention this; I will add a section to cover it.

This will have 2 distinct behaviors (one when inlined, one when not).

With inlining, it is smart enough to realize what you are doing (and to undo it):

           for (int i = 0, j = length.Length; i < j; i++) {
00000014  xor         edx,edx
00000016  mov         eax,dword ptr [ecx+4]
00000019  test        eax,eax
0000001b  jle         00000026
               total |= i;
0000001d  or          esi,edx
           for (int i = 0, j = length.Length; i < j; i++) {
0000001f  add         edx,1
00000022  cmp         edx,eax
00000024  jl          0000001D
           }

This is identical to the code produced by i < length.Length.

Without inlining, it will put the call into the preamble of the loop, producing functionally equivalent results to the other hoisted examples (although with slightly different ordering, and obviously the variable maintains a better scope):

          for (int i = 0, j = length.GetUpperBound(0); i < j; i++) {
00000013  xor         esi,esi
00000015  mov         ecx,eax
00000017  xor         edx,edx
00000019  cmp         dword ptr [ecx],ecx
0000001b  call        792666A8
00000020  test        eax,eax
00000022  jle         0000002D
               total |= i;
00000024  or          edi,esi
           for (int i = 0, j = length.GetUpperBound(0); i < j; i++) {
00000026  add         esi,1
00000029  cmp         esi,eax
0000002b  jl          00000024
           }

The key thing to notice is that the JIT still does not realize what we are doing with array bounds hoists ...

            for (int i = 0, j = length.GetUpperBound(0); i < j; i++) {
00000016  xor         esi,esi
00000018  mov         ecx,edi
0000001a  xor         edx,edx
0000001c  cmp         dword ptr [ecx],ecx
0000001e  call        792664C8
00000023  test        eax,eax
00000025  jle         00000039
00000027  mov         edx,dword ptr [edi+4]
                total |= length[i];
0000002a  cmp         esi,edx
0000002c  jae         0000003F
0000002e  or          ebx,dword ptr [edi+esi*4+8]
            for (int i = 0, j = length.GetUpperBound(0); i < j; i++) {
00000032  add         esi,1
00000035  cmp         esi,eax
00000037  jl          0000002A
            }


Good catch!

Cheers,

Greg

# June 12, 2006 5:55 PM
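
The three loop shapes discussed in this exchange can be put side by side. This is only a sketch: the class and method names (`LoopForms`, `OrDirect`, `OrLocalHoist`, `OrUpperBound`) are made up for illustration, and it shows that the forms compute the same result, not what any particular JIT does with them. One thing worth noting is that `GetUpperBound(0)` returns the last valid index (`Length - 1`), so the sketch uses `<=` for that form; with `<`, as in the listings above, the final element is skipped.

```csharp
using System;

static class LoopForms
{
    // Condition tests the array's Length directly on every pass.
    public static int OrDirect(int[] values)
    {
        int total = 0;
        for (int i = 0; i < values.Length; i++)
        {
            total |= values[i];
        }
        return total;
    }

    // Length is read once into a loop-scoped local (the "locally scoped hoist").
    public static int OrLocalHoist(int[] values)
    {
        int total = 0;
        for (int i = 0, j = values.Length; i < j; i++)
        {
            total |= values[i];
        }
        return total;
    }

    // Bound comes from a method call; GetUpperBound(0) is the last index,
    // so the comparison is <= rather than <.
    public static int OrUpperBound(int[] values)
    {
        int total = 0;
        for (int i = 0, last = values.GetUpperBound(0); i <= last; i++)
        {
            total |= values[i];
        }
        return total;
    }

    static void Main()
    {
        int[] values = new int[10000];
        for (int i = 0; i < values.Length; i++)
        {
            values[i] = i;
        }

        Console.WriteLine(OrDirect(values));
        Console.WriteLine(OrLocalHoist(values));
        Console.WriteLine(OrUpperBound(values));
    }
}
```

All three calls print the same number. To see what the JIT actually emits for each form, the disassembly has to be inspected on the runtime and platform in question.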

Greg Young [MVP] said:

Ok, I have received about a dozen emails in regard to my last post An in depth look at for loops basically...
# June 12, 2006 8:34 PM

Mark Lubischer said:

Thanks for the follow up!
# June 13, 2006 12:39 PM

Greg said:

I have some further information on the removal case that I will be posting .. I am waiting to hear back on a few things before I post. Basically, if you assign a static/instance to a local variable, removal will occur (but oftentimes your explicitness is also optimized out). I am trying to figure out whether this is a bug, and whether this should be considered a best practice in 2.0, as the same optimization may in fact incur a penalty with other JITs.

# June 17, 2006 1:03 AM
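
A minimal sketch of the "assign a static to a local" pattern mentioned above. The names (`SnapshotDemo`, `Shared`, `OrAllFromSnapshot`) are hypothetical. It only shows the language-level effect: the local keeps referring to the array it was assigned, even if the static field is later pointed at a different array. Whether a given JIT then drops the bounds checks is exactly the open question in the comment and is not demonstrated here.

```csharp
using System;

static class SnapshotDemo
{
    // A static array reference that other code could reassign at any time.
    public static int[] Shared = new int[] { 1, 2, 4, 8 };

    public static int OrAllFromSnapshot()
    {
        int[] local = Shared; // copy the reference once, before looping
        int total = 0;
        for (int i = 0; i < local.Length; i++)
        {
            // Stand-in for another thread swapping the field mid-loop.
            Shared = new int[0];

            // Still indexes the original array through the local.
            total |= local[i];
        }
        return total;
    }

    static void Main()
    {
        Console.WriteLine(OrAllFromSnapshot()); // prints 15
        Console.WriteLine(Shared.Length);       // prints 0
    }
}
```

Because an array's length cannot change once it is created, the length seen through `local` stays valid for the whole loop no matter what happens to `Shared`.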

derekdb said:

My understanding was that the Jit optimizer only hoisted bounds checks when the code directly tested against array.length.  I've actually seen code of the form
for (int i=0; i < foo.length; i++) { if (i < 1000) break; ... with the claim that this hoisted the bounds check out of the loop.  I can't verify this right now, so I leave that as an exercise for the reader..
# June 17, 2006 1:32 AM

Brendan Tompkins said:

Yeeha! 30 Lbs!  That's a whopper!  My biggest striper (we call em rockfish here in VA) was about 20.  How long did that take you to land?
# June 28, 2006 1:01 PM

Greg said:

Just under 10 minutes ... I had fairly heavy tackle as I was after alligator blues and was striper fishing during slack tide. I caught 3 other smaller stripers. One of them, about a 15 lber, took about 15 minutes to land: he ran back to his home, the line for a lobster pot, so I had to pull up the 15 lb striper plus the weight of that line ... my arms hurt after that.
# June 28, 2006 1:28 PM

Greg said:

I am curious on your source derek ... this is not the case.
# June 30, 2006 11:36 AM

johnwood said:

Two points:
1. I would consider this a bug in the compiler, simply because the fact it's ignoring that the base class has a string overload makes it counter-intuitive. There's nothing clever about it - it's a bug, they should fix it.
2. That would make a terrible interview question unless you were looking for someone with experience in testing compilers.
# July 1, 2006 9:48 PM

Greg said:

john:

It's compliant with the C# spec:

"First, the set of all accessible (§10.5) members named N declared in T
and the base types (§14.3.1) of T is constructed. Declarations that
include an override modifier are excluded from the set."


Since it is compliant with the specification, it cannot be called a compiler bug. I agree that it is non-intuitive and should be changed in the spec, but should we fault the C# team for following the spec? Their goal is to provide a compiler that is compliant with ECMA 334.
# July 2, 2006 12:21 AM

johnwood said:

You're right, perhaps faulting the compiler team isn't appropriate. Given it's written so explicitly in the spec there was obviously intention behind this behavior, so I imagine there is some explanation or justification in their minds. Would be interesting to know what it was. It's just the kind of issue that would have you sitting there scratching your head for a few hours trying to figure out what the hell is going on. The whole point of inheritance is to make the inherited behavior as transparent as possible, and this seems to go completely against that tenet of OOP IMO.
# July 2, 2006 9:53 AM

Peter Ritchie said:

For anyone interested, this originated from the following MSDN Forums post:
http://forums.microsoft.com/MSDN/ShowPost.aspx?PostID=515521&SiteID=1
# July 2, 2006 11:24 AM

BirgerH said:

I'm sure it's not a bug.

Anders Hejlsberg seems to be well aware that "the way we do overload resolution in C# is different from any other language I know of, for reasons of versioning." http://www.artima.com/intv/nonvirtualP.html

He doesn't explain a lot but I suppose it makes sense in some situations:

Imagine version 1 of your example only had Derived.Foo(object) and Base.Foo(string) was introduced in version 2. The C# code would always call the same method but the VB.NET code wouldn't.
# July 3, 2006 1:30 AM
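
The behavior under discussion can be reproduced with a small sketch. The class and method names follow the `Base.Foo(string)` / `Derived.Foo(object)` shape from the comment above; the exact code of the original post is not shown in this thread, so the bodies and the override are assumptions for illustration.

```csharp
using System;

class Base
{
    public virtual string Foo(string s) { return "Base.Foo(string)"; }
}

class Derived : Base
{
    // An override is excluded from the candidate set during member lookup.
    public override string Foo(string s) { return "Derived.Foo(string)"; }

    // Declared in Derived and applicable to a string argument, so it wins.
    public string Foo(object o) { return "Derived.Foo(object)"; }
}

static class OverloadDemo
{
    static void Main()
    {
        Derived d = new Derived();

        // Compile-time type is Derived: Foo(object) is chosen even though
        // a string overload exists on the base type.
        Console.WriteLine(d.Foo("text"));         // prints "Derived.Foo(object)"

        // Compile-time type is Base: only Foo(string) is visible, and
        // virtual dispatch reaches the override.
        Console.WriteLine(((Base)d).Foo("text")); // prints "Derived.Foo(string)"
    }
}
```

The second call is the usual workaround when the string overload is the one you want: make the compile-time type of the receiver the base type.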

johnwood said:

There are fundamental problems with versioning in OOP - like the fragile base class problem for example. People expect those type of versioning issues when using OOP I think. IMO it's rarely actually an issue. Making such a fundamental change to the language just to work around one particular issue with versioning in OOP is just going to confuse people. I think it was a bad decision, but one that's obviously too late to change.
# July 3, 2006 7:32 PM

Greg said:

Could you not offer the best of both worlds?

If the method is on the base it is not considered for overload resolution. If the method is overridden in the derived class it is.
# July 4, 2006 12:26 PM

johnwood said:

You know I think that would have been better. I can't think of any reason off the top of my head why that wouldn't work. Shame it's too late.
# July 4, 2006 12:57 PM

MarlinlovesDori said:

I've been on both sides of the fence in software development and currently work for a large real estate firm where I have learned a lot about negotiation.  As you have stated, it is all about the contract.  To the non-technical groups, it's all about what you have contracted for, and any vagaries will unfortunately be used against the other party.  This does not help the party who is taking on the risk of translating someone's ideas into workable code, and as a corollary only competent people will want clarity, in English.

Unfortunately I have seen marketing groups at my company parse contracts for ways to wiggle out of payment because no one captured "exactly what I meant".  It's ugly as all parties pull out the daggers.  To be honest, marketing deserved to be screwed because they kept the vagaries in the contract and didn't bother to be clear about the concepts.  The service provider deserved what they got because they should have insisted on clarity.

You are best served by developing a relationship with a customer who can be an advocate where you can present a hybrid model.  Certain portions of the project will have a finite time frame so you can deliver them at a fixed cost, and you should contract separately for those phases.

For the big idea stuff, you want a customer who is willing to pay by the hour to have you mentor them and develop the concept, then produce the idea.  Again there will have to be limits and quantifiable chunks.  That's when your methodology will distinguish you from the rest.

In short, avoid the customers who want a fixed price unless you are willing to double or triple your price because what they want doesn't exist.  Be honest - "I am doubling my estimate because it is unclear what you want."  Unless you are willing to do a package implementation, it's a crap shoot on how long it will take to develop ideas and implement.  Protect yourself by avoiding unclear terms.
# July 8, 2006 12:27 PM

Rusty said:

I will have to think about this in more detail as your concerns are all very valid.  I believe the whole concept of fixed price bid is _the_ sticking point.  Comparing software to house building, or bridge building, or physical construction of any kind is misleading.  In physical construction, the construction phase is 80% or more of the whole project.  The design phase is a very small, up front exercise that relies on static mathematical formulas and thousands of years of experience in architecture, with structures that have both succeeded and fallen.  The formulas for testing a design can be relied upon and reverified.  Planning, then, is the most crucial skill and variable that will affect profit for the project implementors.

For software, the construction phase is not writing code.  The construction phase is compilation and deployment.  Clearly, that's far less than 80%.  Furthermore, there are no formulas to test a design other than writing the code.  Software platforms change so frequently that one cannot use historical data to predict the outcome of a project.  Emerging techniques, patterns and frameworks are helping but not solving this problem.  Software is not predictable in the same way building construction is.

Even in construction, things go differently than planned.  My pop-in-law is taking over a $12 million bridge project paying $15,000 in penalties per day for being late.  The first thing he noticed are two insanely expensive cranes that aren't being used but remain on the project because a set of high voltage lines prevent mobile cranes access to the site.  He asked if anyone had considered paying the power company the $ to move them in order to save substantially more by enabling them to remove the larger cranes.  Wasn't in the plan!

How long will it take to build a house?  How long did it take to build a house of that size in similar conditions?  Pretty reliable, right?  How long will it take to build this software for my company?  Let me go find that same software somewhere else so I can get reliable, fixed numbers.  See the conundrum?

There are conditions when fixed price is required.  Then, you must plan and agree upon everything prior to the contract being signed.  However, consultants are not brought in for their ability to adhere to a contract.  They are consulted for their expertise and skills.  A company does not want a contract that matches its language to the product and how long it took to build.  Now really.  They want working software that increases efficiencies, opens new lines of revenue and adds to their bottom line in a positive way.  A fixed-price plan is not designed to predict the outcome of the project; it's designed to shackle the client to their understanding of the problem on the first day of the project.  It's designed to prevent new knowledge from affecting the product.  Its purpose is to mitigate the inherent risk in information technology by locking in payment regardless of value.  Most of the time, things go relatively well.  However, I think things would go just as well with smaller plans and less rigid relationships.  Clearly, there is a lot to be learned to make this a reality.
# July 8, 2006 12:40 PM

Greg Young [MVP] said:

I deliberately left some information out of my article on for loops as I thought it to be a JIT bug. Here is that information.
# July 8, 2006 3:36 PM

Greg said:

A few friends told me the house analogy was not a good one beforehand but I still used it :( It is not a clean-cut analogy, you are correct.

Rusty: you make some great selling points against fixed price estimates. I will have to remember these wordings the next time I am forced to deal with it.

The problem does still exist in dealing with the contract though: the hiring company can be shafted on a pure block of hourly time basis. This is one of the things that the fixed price contract does help eliminate, as the contract is contingent on working software (usually at least passing UA testing). I think we can all agree that a huge portion of time is ramp up. If the hiring company loses that consulting firm after say 90 days, the next firm will most likely need 30 days of ramp up to be as efficient (not to mention they will want to at the minimum go over everything previously done with a fine tooth comb).

Getting back to our other discussion .. There is still the problem that many managers etc. will outsource projects to absolve themselves of responsibility for the project. This is a problem in many industries but is especially prevalent in IT, where management often does not have the first clue of how to run a project. If the project comes out well, the manager takes full credit for it, as it was their brilliant insight and choice of a quality firm to handle the process that made it a success. If it fails, it is that the contracting company did not meet the requirements set; ship it to legal. As such we are stuck with a huge corporate mindset problem that I am completely unsure of how to attack ...
# July 9, 2006 4:13 AM

DotNetKicks.com said:

You've been kicked (a good thing) - Trackback from DotNetKicks.com
# July 9, 2006 9:57 AM

johnwood said:

This is fascinating stuff. I haven't gone through all of your previous posts but it seems what you're describing is quite an obvious optimization the JIT could perform. One quick thought, in the WillNotRemoveChecksStaticReference disassembly wouldn't it also make more sense to pull out [esi+4] just once, putting it into edi, then doing a mov eax, edi to copy the value into eax, and then testing eax for 0... therefore eliminating an extra memory look up per iteration? Or is an extra instruction more costly than looking stuff up in the L1?
# July 9, 2006 1:27 PM

bryanallott said:

"... I am reluctant to even attempt being agile on a fixed price bid ..."

maybe reworded: reluctant to attempt being agile on a fixed price bid where the requirement [solution] is not that clear ?

if you're responding to a RFQ, you're in a position with a fair amount of experience to know whether the project you're looking at is fairly cut-n-dried or if it's going to be more explorative and discovery driven...

if clear, there's nothing wrong with a plan-based approach if the expectations are clear [some clients are quite professional and explicit, even pedantic :), about what they want] and these are easy to approach with a fixed-cost because most of your planning [which itself can be an agile process] is more accurate [more information at hand]- and this should mirror your client's commitment to detail. but this does take some time: and there's no such thing as a free quote [if you want quality] :) - a different discussion?

disclaimer: if your planning process cannot estimate accurately: get better at planning or don't consider a competitively priced fixed-cost project.

if the solution is ambiguous, risky and reliant on discovery for knowledge of the solution... well, you'd be going out on all limbs to accept a fixed-cost without teeth in the contract- but that attitude implemented is what has given software a bad rap.

re the corporate mindset of fixed-cost projects: as solution/service providers, we have a responsibility [duty] to bat a proposal back if it's ambiguous, vague or unrealistic [not even S.M.A.R.T.] and educate the corporate client in the process, via co-operation. historically, i think we failed ourselves as a collective because we didn't have the insight or courage to say no to a lot of projects when we should have- or at least say: wait. let's work this proposal through, together, to get a better understanding becos this is simply not enough information to go on.

to perpetuate the house analogy [but with specific respect to the planning phase which is more akin to software]... u can't go to an architect/builder and say: i want a house with a door and 4 windows and i want it at $ by DATE.
their ethical response should be
not until you provide detail [but discovery through co-operation sets you back at $x/hour]
or no- come back when u got an exact idea and save some discovery costs. if they do agree to that kind of spec, chances are, *someone's* gonna get mulled :)

3) How do we change corporate culture to move away from fixed price contracts?
educate the corporate that they can get fixed-price but they must be willing to spend some $$ upfront to get that. provided you're also happy with that and confident in your ability to plan and deliver according to a really good plan- goferit.

and if a reasonable client can't see why, or you can't motivate why, spending some $$ upfront for an accurate cost is critical, then abandon the project. there are bigger issues lurking.

it doesn't have to be ALL agile since it is no silver bullet but just a part of the available toolset- but that's up to your team :)
# July 10, 2006 7:22 AM

Jeff Parker said:

Well if I would have to make a guess at why you are seeing this behavior and why it is by design it is because of the spec. For Loops have every Argument marked as optional. This might be the compilers way of handling it.

for (;;) {
}

Is valid code. So I would have to guess somewhere in the JIT compiler there is some logic we do not fully understand handling this and why it is by design. However I can't say for sure.
# July 10, 2006 10:13 AM

Greg said:

Jeff I am not sure that I am following the relation to your special case in the optimizer. Perhaps you can explain a bit more why you think there is a relation for this particular case.
# July 10, 2006 11:40 AM

Jeff Parker said:

Well I am not disagreeing with the whole hoisting thing and that it is faster. What I am saying is that there is so much complexity in a for loop over an array that there may be something in there by design that causes this to happen that you or I are not thinking of. I have been very curious about your posts and have done some of my own research and experiments, and well, I can't think of a reason for this behavior; however, I am not a compiler optimization expert either. The biggest thing I can think of for why you are seeing this behavior is that an array is a reference type, and this includes the length of the array.

Via the spec it defines a for loop as actually a while loop with everything auto generated.

for-initializer ;
while ( for-condition ) {
embedded-statement ;
LLoop:
for-iterator ;
}

Which is exactly what you are showing. However on the flip side of things the for loop allows any of the following constructs

for ( ; ; ) embedded-statement
for ( for-initializer ; ; ) embedded-statement
for ( ; for-condition ; ) embedded-statement
for ( ; ; for-iterator ) embedded-statement
for ( for-initializer ; for-condition ; ) embedded-statement
for ( ; for-condition ; for-iterator ) embedded-statement
for ( for-initializer ; ; for-iterator ) embedded-statement
for ( for-initializer ; for-condition ; for-iterator ) embedded-statement

Now add in the complexity of an array, which can be single-dimensional, multidimensional, rectangular, or jagged. I think the main key to the array oddities you are seeing, though, is in the spec, which states:

"On the other hand, the size of the array—as represented by the length
of each of its dimensions—is not part of an array’s type. This split is made clear in the language syntax, as
the length of each dimension is specified in the array creation expression rather than in the array type."

So since the length is not part of an array's type, what really is it. This is where I have kind of stopped research for a while. A question for another day. But I firmly believe the problem lies in the array not really in the for loop itself.
# July 10, 2006 4:06 PM
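
The for-to-while expansion quoted from the spec above can be written out by hand. A sketch with hypothetical names (`ForAsWhile`, `SumWithFor`, `SumWithWhile`): both methods run the initializer once, test the condition before each pass, and run the iterator after the body, so they return the same value for any `limit`.

```csharp
using System;

static class ForAsWhile
{
    public static int SumWithFor(int limit)
    {
        int total = 0;
        for (int i = 0; i < limit; i++)
        {
            total += i;
        }
        return total;
    }

    public static int SumWithWhile(int limit)
    {
        int total = 0;
        int i = 0;            // for-initializer, run once
        while (i < limit)     // for-condition, tested before every pass
        {
            total += i;       // embedded-statement
            i++;              // for-iterator, run after the body
        }
        return total;
    }

    static void Main()
    {
        Console.WriteLine(SumWithFor(10));
        Console.WriteLine(SumWithWhile(10));
    }
}
```

The hand-written form differs in two small ways: `i` stays in scope after the loop, and a `continue` inside the body would skip `i++` unless it jumps to a label placed before the iterator, which is what the `LLoop:` label in the spec's expansion is for.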

Jeff Parker said:

Oh yeah and one more thing to note which is in the spec,

Because context is required to determine the type of an array initializer, it is not possible to use an array initializer in an expression context without explicitly stating the type of the array.

So also, since the context of the array cannot be determined ahead of time, this also might be why you are seeing this odd behavior. Another thought, though, would be using the fixed keyword for the array. The fixed statement prevents the garbage collector from relocating a movable variable. The fixed statement is only permitted in an unsafe context. Fixed can also be used to create fixed size buffers.

Because of Garbage collection and shifting memory around might also be the reason for this additional check.
# July 10, 2006 4:26 PM

Greg said:

Jeff .. you have brought up many points, I will do my best to address them.

1) for vs while vs etc. This is all handled by the C# compiler .. these things all look the same when viewed as IL. There is no such thing as a "for" in IL. The JIT only operates on the IL level so I don't really see these items causing too much of an issue. These items can be fairly complex but they are complex at the compiler level (not so much at the JIT level). To see better what I am referring to, run ildasm on the examples from the original for article (or the examples listed here).

2) Arrays .. remember that there are specific instructions for different types of arrays in IL so it knows the difference between a SZ and MD case. http://www.codeproject.com/dotnet/arrays.asp is a great article on this. Since it is being given IL instructions to deal with a SZ array it knows that the array does not have multiple dimensions and that it is 0 based etc.

3) "So since the length is not part of an array's type, what really is it." For SZ arrays, which is what we are dealing with here, the size of the array is kept with the array instance; it is located at the object reference + 4. We can see this being accessed in the following instruction.

0000000b  mov         eax,dword ptr [esi+4] //access array length

The length of an array is immutable, you can only change the array to be a different array (which would have a different length). Since a reference to the array is being held in a register it does not have to deal with the volatility problem it might have to deal with if it were directly reading from memory (i.e. if another thread changes the original reference to point to another array, it will still be pointing at the array it started off with; it will not be updated to point to the new array).

4) "Because of Garbage collection and shifting memory around might also be the reason for this additional check."

I am going to write a post on this, but I will give you the abbreviated version here. I had a partially related concern (that in code with bounds checks removed, garbage collection could possibly remove the item). This is, however, covered in Jeffrey Richter's CLR via C# (chapter 20). Registers are considered to be GC roots (see pages around 462-464 for a detailed explanation of what happens). The garbage collector will in fact come into managed code and change register values in the case that memory shifts.

"Naturally, moving the objects in memory invalidates all variables and CPU registers that contain pointers to the objects. So the garbage collector must revisit all of the application's roots and modify them so that each root's value points to the object's new memory location." p464

As such in the case of a memory move the esi register in this case (which points to the array's object reference) would be updated to reflect the new memory location. Since the array's length is bound to the instance of the array the length would move as well. The code internal to the loop would continue working with the new memory address as the offset being dealt with is calculated every iteration through the loop based upon the value in esi and the loop counter (edx) in the following line.

00000012  or          ecx,dword ptr [esi+edx*4+8]

So if the garbage collector were to interrupt midway through looping it is quite possible that you could end up with the situation ..

esi = 100000 edx = 0 write to 100008
esi = 100000 edx = 1 write to 100012
esi = 100000 edx = 2 write to 100016
GC
esi = 110000 edx = 3 write to 110020
esi = 110000 edx = 4 write to 110024

notice that GC would have changed the ESI register

# July 10, 2006 10:11 PM
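
The address arithmetic behind `[esi+edx*4+8]` can be sketched as ordinary integer math. The names (`ElementAddress`, `Of`) are hypothetical, and the two constants are taken from the 32-bit listing in the comment above; other runtimes and platforms lay arrays out differently.

```csharp
using System;

static class ElementAddress
{
    const int DataOffset = 8;  // first element sits 8 bytes past the object reference
    const int ElementSize = 4; // sizeof(int)

    // Mirrors [esi + edx*4 + 8]: base reference, scaled index, data offset.
    public static int Of(int arrayBase, int index)
    {
        return arrayBase + index * ElementSize + DataOffset;
    }

    static void Main()
    {
        int arrayBase = 100000; // plays the role of esi
        for (int index = 0; index < 5; index++) // index plays the role of edx
        {
            if (index == 3)
            {
                // Stand-in for the GC relocating the array and patching esi.
                arrayBase = 110000;
                Console.WriteLine("GC");
            }
            Console.WriteLine("esi = {0} edx = {1} -> {2}",
                arrayBase, index, Of(arrayBase, index));
        }
    }
}
```

Since the address is recomputed from the base on every pass, changing the base between iterations is enough for the loop to carry on at the new location with the same index.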

JonGalloway.ToString() said:

No, I don't buy into The Construction Metaphor. A few times a year, someone will write about how software
# July 12, 2006 4:58 PM

Jeffrey Palermo said:

I hope you were kidding about adding it to interview questions.  If a candidate misses the answer, all you know is that they haven't seen this post or the MSDN thread addressing it.
# July 12, 2006 6:46 PM

Greg said:

that is right Jeffrey! though they would get bonus points for either :)
# July 13, 2006 12:30 AM

johnwood said:

It is quite funny, though, how some interviewers seem obsessed with questioning their victims on some wildly obscure fact they learnt within the past few days. The times I've done interviews with someone and they've started with something like "So exactly what permissions are disallowed in .Net when running in a sandbox?". Of course sometimes it's actually quite interesting and revealing to see how people go about answering a question they have no clue about :)
# July 13, 2006 12:40 AM

jlynch said:

Greg,

Welcome to the world of BizTalk development!

I've been developing integration solutions with BizTalk Server since before its initial release seven years ago and you are correct, it does not allow for a very "agile" development methodology, nor for (formal) test driven development.

BizTalk Server is basically a set of tools provided by Microsoft that generate XSD, XSLT and C# code as well as a runtime to process this code. Everything that BizTalk does "could" be written by hand and unit tested the same way you are used to. However, the time savings from using BizTalk is enormous and the code produced is excellent (you can set a registry entry to have BizTalk emit the C# code during a build).

Think of BizTalk unit testing as a black box. It takes a known input and produces one or more expected outputs. Some parts (schemas and maps) of a BizTalk solution can be individually tested in Visual Studio but generally the entire application must be tested as a black box on the developer's workstation or on a staging server. You can automate this application level testing using scripts and the BizTalk object model but I've found it more efficient to do this manually.

It's not a perfect solution but it certainly beats what was required before BizTalk. Like everything else (including Agile), it's a trade-off between "control" and "speed of development". In this case, I've chosen speed!

Feel free to ping me if you have any BizTalk questions.

Jeff
# July 13, 2006 9:00 AM

Tomas Restrepo said:

Greg,

I concur with Jeff that testing biztalk solutions can be pretty hard, and yes, unit testing as we know it is pretty hard. However, there are certainly a few things one can do to improve it a little bit. The first thing is to clearly break up your solution so that you can test it:

1- Schemas and pipelines: Normally many people don't think about unit testing those, but they are sometimes one of the most problematic aspects, particularly if you're dealing with complex flat files and stuff like that. I wrote a Pipeline Testing library that can really help to automate those aspects with the help of NUnit. I wrote it originally to test custom pipeline components, but discovered it works very nicely for the other stuff. See http://www.winterdom.com/weblog/2006/04/27/PipelineTestingLibraryPart1.aspx

2- BRE: The Business Rules Engine is accessible via a .NET API, so you can use that to create tests for your business rules that way.

3- Maps: Maps are rather inconvenient. There are ways to extract the XSLT that the compiler generates and you could use that for testing, but it's rather awkward to be sure.

4- Orchestrations: This is the big pain point, I guess, and not much you can do about it. You'll want to avoid having too much code in your Expression shapes and move that into .NET components you can test, so that helps some.

Anyway, maybe this will get you some extra ideas for that.
# July 13, 2006 10:59 PM

Greg said:

4- I think I have some but I will need to get into some documentation in order to determine the feasibility of my thought.

I had figured out how to do some decent testing for BRE/Maps. Schemas were pretty straight forward (although what are you really going to "unit test" on either?). I would imagine functional tests would be best for schemas/xslts.

Orchestrations are really the big pain point for me as they seem like a place where things can really break :) I have been researching the possibility of hosting an orchestration in my own code. I could then fairly easily mock out the in/out ports and allow for a decent unit test ... I am still researching this though. The big problem here would be in maintenance (it's a lot faster to just make a small schema change than to go through and then change the mocks as well).

Overall I find biztalk to be a very intriguing product. I do however find a need for a "biztalk light" ... while great for scalability, there is a pretty heavy sacrifice on per item speed for smaller systems.

Also they REALLY need to get a good UI guy or twenty on the team :) the interface takes me back to VS.NET 1.0 and all the lovely intricacies you had to put to memory (like re-adding references so it sees changes, trying a restart of the environment if you have trouble, etc)
# July 15, 2006 12:19 PM

Tomas Restrepo said:

I doubt you'll be able to self-host the orchestration engine; it is a fairly complicated component, but hey, if you succeed, then do let us know how :)

BizTalk itself is an extremely powerful product, but pretty complex. It's also pretty daunting at first and "getting it" right takes a while.

Re. BizTalk Light: Sure, it would be interesting, but since the story for BizTalk is scalability and reliability, you pay a price. BizTalk isn't all that heavy, btw, and can really do a lot of work given a proper configuration (and I'm not talking a 10 server farm). However, my guess is that MS will try to fill that small-integration level niche with WF and the upcoming WinFX adapter framework in BTS2006 R2 (which allows you to create integration adapters on top of WCF which can be used both standalone with WCF or as BizTalk adapters).


Re. the UI: I don't find the UI problematic at all. Sure, it ain't so pretty, but BTS2006 has a really nice management console, and the tools in VS are pretty good overall. BTW, you shouldn't have to re-add references to see changes, at least I've never had to. However, you really need to understand the deployment and versioning story behind BizTalk as well as the execution model to know how to work effectively and not get caught in long redeploy/restart cycles, which are productivity killers. Jon Flanders has some pretty good posts on this topic, btw, which you might want to research.
# July 16, 2006 2:38 PM

Greg Young [MVP] said:

Something as simple as a method call can often be complex in the managed world. This is the first part in a series looking at how method calls work; it covers a basic method call including the JIT process.
# July 20, 2006 1:58 AM

johnwood said:

Another interesting topic well explored.
Any thought as to what happens in a multithreaded application when it comes time to JIT a method? What if two threads hit that method... obviously one gets there first, but what does it do with the other call while it JITs? Does it somehow gain a lock on the method before it calls it, blocking the other call? How else would it stop the instructions changing from under its feet, and other race condition issues?
# July 20, 2006 2:29 AM

Greg said:

Ah good question, I will have to address that ... Of course I can only look at disassembled code in the production JIT (and even doing that to make a comment is well umm against the EULA:))

I can however point you to exactly how this works in the SSCLI. The method being called is in prestub.cpp (http://dotnet.di.unipi.it/Content/sscli/docs/doxygen/clr/clr/prestub_8cpp-source.html); it is MethodDesc::DoPreStub. It is quite a long and involved method, but there are some comments there discussing how thread safety is handled (in fact, in the SSCLI this is the method that the thunk calls). There is also a bit of discussion on this in SSCLI Essentials from O'Reilly.
# July 20, 2006 2:46 AM

Sam Gentile said:

Welcome new readers! There are a number of great posts that caught my attention today. In addition, I...
# July 20, 2006 2:23 PM

Stephen W. Thomas said:

I agree that testing BizTalk can be hard.

Not being from a coding background (just kind of stumbled into BizTalk), I have found the typical .net / coding practices don’t seem to fit very well with BizTalk.  BizTalk is a server product that has a heavy development component.  This is very evident in testing.

When I develop a BizTalk solution, I test schemas, maps, .net components, etc in a manual manner but I don’t consider this testing my BizTalk solution – since in my mind the configuration of the ports is as important as any BizTalk Artifacts.

In order to “unit test” my end to end BizTalk solution, BizUnit works very well.  Now this might not fit the definition of unit testing, but it comes close looking at it from a BizTalk Server point of view, with a unit being a process (message in – message out) scenario.  From a BizTalk Server point of view, this is the smallest unit of work you can test.

I have worked on about 10 different BizTalk projects, and everyone has a different opinion on this.  I was on one project in the past that told me I couldn’t run a single Orchestration since running an Orchestration was considered an Integration Test and not in my scope (or time bucket).

I just wanted to throw in my 2 cents.
# July 20, 2006 5:10 PM

Alois said:

Hi Greg,

you did very nicely explain how the JITer works. I was wondering about this table in your article
00de00b0   00913070      JIT ConsoleApplication29.Foo.Test()
009130d4   00913078     NONE ConsoleApplication29.Foo..ctor()

How can it be that the Test method has been JITed before the ctor has been run? To get your program compiled you must instantiate the class before you can call a function of it. I assume that the ctor must have been JITed before the Test function, or does the JITer know that the default ctor does nothing? Did I miss something here, or did you "patch" the method table with notepad? ;-).

Yours,
  Alois Kraus


# July 20, 2006 6:23 PM

Greg said:

No edit .. try it for yourself

notice the construction code

00000000  push        esi  
00000001  mov         ecx,913080h
00000006  call        FFB21FAC
0000000b  mov         esi,eax

The address in ECX look familiar? :)


I believe what is happening here is that the constructor is being inlined (and as such it is not being JIT'ed). Good catch, I hadn't noticed that, that is worth a post on its own!
# July 20, 2006 10:22 PM

Sam Gentile said:

Sipping the first cup of coffee, ah yes, there's a possibility I'll be awake soon...
Windows Vista ...
# July 21, 2006 10:20 AM

johnwood said:

I'm not sure using Int32s would really speed it up that much given all the bit twiddling you have to do to reverse it.

If I try this code:

public static unsafe string JohnsReverse(string s)
{
    string newcopyout = string.Copy(s);
    fixed (char* start = newcopyout)
    {
        char* en = (start + s.Length - 1);
        char* st = start;
        while (st < en)
        {
            char old = *en;
            *en = *st;
            *st = old;
            st++;
            en--;
        }
    }
    return newcopyout;
}

... it runs quite a bit faster than yours (1.7s compared to 2.3s on 10mil iterations).
# July 21, 2006 2:38 PM

Greg said:

hmmm ...

Test : JohnsReverse took 42553687187.3528 ns, average ns = 425536.871873528
Test : GregsInt32Reverse took 33369146730.09 ns, average ns = 333691.4673009
Press any key to continue . . .

in release with JIT optimizations ... 80k string size ... maybe I have a bigger ratio of processor / memory speed?

but you are right .. the shifts are expensive ... using rotl/rotr on the register would remove this but I can't for the life of me get the JIT to produce that code :)

# July 21, 2006 3:31 PM

Greg said:

ok tested on another machine and they come out about the same ..

80k string 10000 iterations

Test : JohnsReverse took 3043694595.65126 ns, average ns = 304369.459565126
Test : GregsInt32Reverse took 2963980647.19858 ns, average ns = 296398.064719858

800 byte string 1000000 iterations.
Test : JohnsReverse took 1610683234.92359 ns, average ns = 1610.68323492359
Test : GregsInt32Reverse took 1678762222.35589 ns, average ns = 1678.76222235589
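
For anyone wanting to produce numbers in this format, here is a minimal timing harness. It is only an assumption about how such a test could be written, not the harness actually used for the figures above; MicroBenchmark and Measure are made-up names.

```csharp
using System;
using System.Diagnostics;

// Hypothetical harness: times repeated calls and reports total and average nanoseconds.
public static class MicroBenchmark
{
    // Returns a two-element array: { total ns, average ns per call }.
    public static double[] Measure(Action action, int iterations)
    {
        if (iterations <= 0)
        {
            throw new ArgumentOutOfRangeException("iterations", "Need at least one iteration.");
        }

        action(); // warm-up call so the one-time JIT cost is not part of the timing

        Stopwatch watch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            action();
        }
        watch.Stop();

        // Stopwatch ticks are converted to nanoseconds via the timer frequency (ticks per second).
        double totalNs = watch.ElapsedTicks * (1e9 / Stopwatch.Frequency);
        return new double[] { totalNs, totalNs / iterations };
    }
}
```

Run it from a release build without a debugger attached, as the results above were, since a debug build disables most JIT optimizations.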

# July 21, 2006 3:52 PM

Greg said:

although you are right as well in that I could optimize a bit better too by removing the stop variable and just using begin < end
# July 21, 2006 3:57 PM

johnwood said:

Oh, also my tests were with a 20 byte string :) I think at a point like this the competition becomes a little hard to win definitively :))
# July 21, 2006 5:08 PM

Adam Machanic said:

Thanks for the additional commentary on this topic, Greg... However, I think we need to consider how often we actually reverse 80k strings in real programs... I think John's 20 byte string is just a bit more real-world... ;)
# July 21, 2006 6:59 PM

Greg said:

ROTL ROTL my kingdom for a ROTL
# July 21, 2006 9:38 PM

Greg said:

ok after reading up on some documentation for various Intel chips: ROTL has the same performance as SHL .. I can therefore assume that the following code would execute at the same speed as the ROTL

       public static unsafe string GregsInt32Reverse(string s) {
           UInt32 Low;
           UInt32 High;
           string ret = string.Copy(s);
           fixed (char* start = ret) {
               UInt32* begin = (UInt32*)start;
               UInt32* end = (UInt32*)(start + s.Length - 2);
               while (begin < end) {
                   Low = *begin;
                   Low = (Low) << 16;
                   *end = Low;
                   High = *end;
                   High = (High) << 16;
                   *begin = High;
                   begin++;
                   end--;
               }
           }
           return ret;
       }

This code is about 20% faster than the simple unsafe on the machines where it came up being identical previously. Of course the problem is that it's not actually reversing, as it's losing the overflow :) It wins for 20 byte, 80 byte, 800 byte, 8k, and 80k ... now if only I could get a rotl.

I am toying with adding an IL instruction to mono right now to offer a more definite proof that this optimization really exists (unless people would be willing to accept me copy/pasting the JIT generated assembly to MASM, altering it and running it there?)
# July 22, 2006 12:52 PM

Alois said:

Hi Greg,

you seem to be begging for your own JIT Plugin. I have actually tried to contact Microsoft Research to get some feedback if this idea is feasible. Imagine if you could create JIT plugins that create optimized assembly code for your graphics card processor (http://geekswithblogs.net/akraus1/archive/2006/06/23/82859.aspx).

By the way, your GregsUnsafeReverse does contain an error. Try a string with a-z and reverse it; you will get
zyxwvutsrqpomnlkjihgfedcba: the m and n characters are not switched.
You need to use while (begin <= stop) to get a correct result.
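
GregsUnsafeReverse itself isn't reproduced in this thread, so the following is only a guess at the shape of the bug, written with array indices instead of pointers. It assumes stop marks the last element of the left half; under that assumption a strict < comparison skips the middle pair of an even-length string, which matches the a-z output above. ReverseWithStop and its inclusive flag are made-up names.

```csharp
// Hypothetical reconstruction of the loop-bound issue; not the original code.
public static class ReverseBoundDemo
{
    // inclusive == false uses "begin < stop"; inclusive == true uses "begin <= stop".
    public static string ReverseWithStop(string s, bool inclusive)
    {
        char[] chars = s.ToCharArray();
        int begin = 0;
        int end = chars.Length - 1;
        int stop = chars.Length / 2 - 1; // assumed: index of the last element in the left half

        while (inclusive ? begin <= stop : begin < stop)
        {
            char tmp = chars[begin];
            chars[begin] = chars[end];
            chars[end] = tmp;
            begin++;
            end--;
        }
        return new string(chars);
    }
}
```

With the strict comparison the element at index stop is never swapped with its partner, so only the inclusive form handles every length.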

My own function does not beat yours but it is pretty close:

       public static unsafe string AloisAlgo(string s)
       {
           char* arr = stackalloc char[s.Length];

           int j= s.Length-1;
           fixed (char* start = s)
           {
               char* end = start + s.Length;
               char* current = start;
               while (current != end)
               {
                   arr[j] = *current;
                   j--;
                   current++;
               }
           }
           return new string(arr,0,s.Length);
       }

Yours,
 Alois Kraus

# July 22, 2006 7:48 PM

Greg said:

very nice! (I hadn't thought of stackalloc which should be a touch faster)


Per plugin JIT, did you know this was originally looking like it was going to be done? Way back in the days of 1.0 beta there were 3 JITs. There was optjit (if I remember this one properly), jit, and econojit (which implemented code pitching to amortize memory usage; this JIT can be found in SSCLI now) ... (there was also prejit ... what a name, Pre-JustInTime hmm :) ... so they renamed it to ngen)

These jits were plugin based with the CLR, you decided depending on your circumstances which you wanted to use. It had been discussed to make the interface public ... if you look back into the dotnet lists on develop.com you should find them (2001 I believe)

Personally I don't want to write my own JIT, but I would like a JIT that optimized for some of the things I mention. This is a prime case of where the CONCEPT of JIT rocks. Not all processors have a hardware based rotate instruction (x86 does), but on a power pc the preferred method is to use two 32 bit registers emulating a 64 bit register: shl (overflowing into the second register), then OR them together. The ability for me to write (val << 16) | (val >> 16) and have it pick that up and optimize the pattern for the CPU (or to let me use a rotl instruction in IL) is the main benefit of compiling at runtime with a known environment.
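
To make the pattern concrete: a 32-bit value read from a string holds two 16-bit chars, and rotating it by 16 swaps them, whereas a plain shift throws one away. The sketch below uses safe code and made-up helper names (Pack, Rotl16, Shl16); it assumes the little-endian layout of x86, where the first char lands in the low 16 bits. Whether a given JIT turns the shift/OR pattern into a single rotate instruction is not something this code can show.

```csharp
// Illustrative helpers only; the names are hypothetical.
public static class HalfSwap
{
    // Packs two chars the way a little-endian UInt32 read of a string sees them:
    // first char in the low 16 bits, second char in the high 16 bits.
    public static uint Pack(char first, char second)
    {
        return (uint)first | ((uint)second << 16);
    }

    // Rotate left by 16: the two 16-bit halves trade places, nothing is lost.
    public static uint Rotl16(uint value)
    {
        return (value << 16) | (value >> 16);
    }

    // Shift left by 16: the original high half falls off the top and is lost.
    public static uint Shl16(uint value)
    {
        return value << 16;
    }
}
```

For example, Rotl16(Pack('a', 'b')) equals Pack('b', 'a'), while Shl16(Pack('a', 'b')) equals Pack('\0', 'a'): the 'b' is gone, which is the lost overflow mentioned earlier.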
# July 22, 2006 8:21 PM

raydot said:

With all due respect (especially because I think this is an awfully important subject) I'm not sure I get this.  I think this type of logic makes perfect sense from the point of view of a coder, but I'm not sure it would make sense from the point of view of a client, namely: "Our shop uses agile methods so we can't make contract guarantees."

I know you're not only saying that, but I can sort of see the appeal of that argument, so that's where I'll go with this.

Having moved from code to project management, if there's one thing I'm sure of it's that each side in an IT project -- client/vendor --  has the obligation to educate the other.  The problem is that all too often the vendor takes the approach that it seems is being mentioned here, "We're gonna learn as we go so it's a moving target."  I just don't think so.  

Before we all sit down to write Client X's payroll system, we're going to sit down and understand Client X's goals in developing a payroll system, and if Client X doesn't know either then the developer shouldn't engage the contract and should be very explicit about that.  The vendor certainly shouldn't at the end of the target berate the client for setting a "moving target," and especially not when this was clearly the case to begin with.  It's only going to end in tears.  Yes some other salesman can come along and say, "We can get it done in the budget you want," let that shop get hung up in the lawsuits afterwards.

What I've also found is all too often the case is that software developers don't protect themselves from the lack of expert knowledge they might have of their client's business process.  I learned this the hard way when I worked with a client that wanted to develop a Web site to sell more of his product.  When the site was built and it didn't sell more product, the client came after us, saying it was my shop's fault his business didn't thrive, until I pointed out to the client that we didn't know a thing about the client's product.  "You're in business to make your gizmos, which means you're in business to sell your gizmos.  I don't see anything over our shop's door or in the contract you signed that says we know one thing about your gizmos -- we know how to build Web sites, and that's it!"  At which point the client had to admit that we'd built exactly what we set out to build.  But all too many developers will (unknowingly) pass themselves off as business experts, and most just aren't.  I now won't have anything to do with a project if the client doesn't seem to understand this.

...and that's what we as software developers have to understand.  I don't buy the "best of all possible worlds" mentality, things can always be made better.  Developers need to be more sympathetic to customers' concerns and more clear up front.  Don't expect the client to understand software development, and don't you go passing yourself off as understanding your client's business process -- even if you do.  If project concerns are spelled out explicitly, and constant feedback is provided, there's no need for problems in the long run for either side.  The AIGA has long understood this, and their boilerplate contract is quite good at protecting designers from client folly.
# July 23, 2006 1:13 PM

Jason Haley said:

# July 26, 2006 10:40 PM

Jason Haley said:

# July 26, 2006 11:21 PM

legobuff said:

Question on note 2... does "handle normal punctuation properly" also mean that I need to handle money also?  or will this be strictly words?

And thank you for the mental exercise.
# July 28, 2006 11:50 AM

Greg said:


*DELETED*
# July 28, 2006 1:55 PM

bushmango said:

How about you give us one of those fabled unit tests? You know, so we can learn some Test Driven Development =) That would allow us to make sure we understand and interpret the rules the same.
# July 28, 2006 3:09 PM

bushmango said:

Also, is this for English-only, ascii only, or do we have to support all of unicode?
# July 28, 2006 4:20 PM

Greg said:

OK. I apologize for my quick response a few hours ago which I am forced to change due to some things I didn't think of (namely abbreviations). Some people have brought up some really fun