Wednesday, April 14, 2010

C# - Extension methods

"Extension methods should be able to modify instance values"

I recently set out to create a library for bit manipulation in C#. Specifically, to add these capabilities as naturally as possible to the ulong data type (used a lot in chess programming ;-). What I envisioned was this;

public static void SetBit(this ref ulong data, byte bitIndex)
{
    data = data | ((ulong)1 << bitIndex);
}

This enables code to be used like this;

ulong BitBoard;
BitBoard.SetBit(22);

Directly changing the value of the BitBoard ulong variable. And of course the method would appear in the IntelliSense list on ulong variables (if the right namespace was being used). Unfortunately the this modifier (making it an extension method) prohibits the use of ref and out, breaking the above code.

There is no way to do this in C# that I could find. It is possible in .NET, because another .NET language named Visual *ouch* Basic allows ref combined with this in extension methods. So what options do I get in my favorite language;

BitBoard = BitBoard.SetBit(22);    // Extension method returning the new value, intellisensed
SetBit(ref BitBoard, 22);          // public static call
BitBoard.SetBit(ref BitBoard, 22); // Extension method using an extra ref parameter
                                   //  (ugly as ####)
BitBoard.SetBit(22);               // Wrap the ulong in a struct

All of them are of course vastly inferior from a beautiful, clean code point of view. The second option, using a ref parameter, will probably be fastest, although wrapping the ulong in a struct also goes a long way towards nice, clean coding. After some tests the struct wrapping turned out to be just as fast as regular calls with a ref parameter. The only disadvantage is that you need the value inside the struct for normal operations. So you would have to write BitBoard.Value * 5 for the normal pre-existing operations a ulong can participate in (Value being the public ulong that is the struct's only data). This would have been BitBoard * 5 with the extension method approach.
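The struct wrapping can be sketched like this (a minimal version; ClearBit and IsBitSet are extra illustrations, not part of the original description);

```
// Minimal sketch of wrapping a ulong in a struct so bit operations
// can be called as instance methods that modify the value in place.
public struct BitBoard
{
    public ulong Value; // the struct's only data

    public void SetBit(byte bitIndex)
    {
        Value |= (ulong)1 << bitIndex;
    }

    public void ClearBit(byte bitIndex)
    {
        Value &= ~((ulong)1 << bitIndex);
    }

    public bool IsBitSet(byte bitIndex)
    {
        return (Value & ((ulong)1 << bitIndex)) != 0;
    }
}
```

Usage then looks like BitBoard board; board.SetBit(22); which modifies the value directly, while pre-existing ulong operations need board.Value * 5.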

Before you ask, these are the alternatives available in .NET Framework 4 for bitwise programming;

- BitVector32 (a value type, but it only wraps a 32 bits uint)
- BigInteger (large, slow immutable reference type)
- BitArray (reference type)

All of them are missing bitwise routines I would like to use. And I want my type to be a value type for performance and minimal storage reasons.

Sunday, April 4, 2010

C# - Enumerator Indexed Arrays

Anders Hejlsberg unfortunately forgot to include a very neat feature from his previous masterpiece (Delphi) in C#. I sorely miss my Enumerator Indexed Arrays ;-(. I used to write code that would look like this in C#;

enum ChessPieces {Queen, Rook, Bishop, King, Pawn, Knight};
int[] pieceValue = new int[ChessPieces];
// A loop using this
foreach (piece in ChessPieces) { totalValue += pieceValue[piece]; }
// A lookup
var value = pieceValue[Queen];

This code is safe, clear and concise, but it does not work in C#. There are several ways to do something along these lines in C#; all of them use more code, casting and/or ugly constructs. There is only one realistic option: support for it in the language itself!
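For reference, here is one of those more verbose workarounds that does compile in C#, using Enum.GetValues and explicit casts everywhere (a sketch; the piece values are illustrative);

```
using System;

enum ChessPieces { Queen, Rook, Bishop, King, Pawn, Knight };

class Demo
{
    static void Main()
    {
        // The array must be sized by hand from the enum.
        int[] pieceValue = new int[Enum.GetValues(typeof(ChessPieces)).Length];
        pieceValue[(int)ChessPieces.Queen] = 9; // illustrative values
        pieceValue[(int)ChessPieces.Rook] = 5;

        // The loop needs a cast on every lookup.
        int totalValue = 0;
        foreach (ChessPieces piece in Enum.GetValues(typeof(ChessPieces)))
            totalValue += pieceValue[(int)piece];

        int value = pieceValue[(int)ChessPieces.Queen];
    }
}
```

The casts are exactly the kind of noise the built-in language feature would have avoided.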

Saturday, April 3, 2010

Genetic Optimization for Chess Rules

One of the many challenges of making a chess engine that uses rules for evaluation is how to attach a value to a rule. Say for example that you use these two rules;

- A bishop on a dark square gets a progressive bonus if many enemy pieces are also on dark squares
- A rook gets a bonus for every direction it can move in that is not blocked by a friendly piece

How large should these bonuses be? Even if you play chess very well, the exact value is not obvious. And the values should be chosen in proportion to the values for all the other rules.

One way to go is to use historical games: pretend the games were played using your rules and see which values would have produced the moves that were actually played. Or, in other words, find the values for your rules that would have played the same moves as the winners did.

Philosophical questions aside (will it ever become better than the average level of all winners?), such optimization is easier said than done. If you have 50 rules and limit the value for each rule to 10 possible values, there are 10 * 10 * 10 ... = 10^50 possibilities, far more than you can ever evaluate. Generating random solutions is a simple way to find a set of values. That's easy;

  1. - Generate a random value within applicable bounds for all rules (let's call this a solution)
  2. - Test, for all moves played by the winner in all historical games, how many times the move played would be the same move your solution valued highest.
  3. - If the solution is better than the best so far, remember it.
  4. - Return to step 1
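The random search can be sketched like this (Evaluate, the scoring of a solution against the database of played games, is a placeholder, as are the bounds);

```
// Sketch of pure random search over rule values.
// Evaluate() is assumed: it scores a solution against the game database.
Random rng = new Random();
int ruleCount = 50, minValue = 0, maxValue = 9;

int[] best = null;
int bestScore = int.MinValue;

for (int iteration = 0; iteration < 1000000; iteration++)
{
    // Step 1: random value within bounds for every rule
    int[] solution = new int[ruleCount];
    for (int i = 0; i < ruleCount; i++)
        solution[i] = rng.Next(minValue, maxValue + 1);

    // Step 2: test against the historical games
    int score = Evaluate(solution);

    // Step 3: remember the best solution so far
    if (score > bestScore)
    {
        bestScore = score;
        best = solution;
    }
} // Step 4: loop
```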

Due to the large number of possible solutions this will not get you a great combination of values. Genetic optimization will get you better values, faster. The genetic scheme works like this;

  1. - Generate a lot of solutions
  2. - Test all solutions on the moves in the database of played games
  3. - Save the top 5% scoring solutions, remember the best overall solution
  4. - Combine (breed is the technical term ;-) the solutions to form new solutions. 
  5. - Return to Step 2
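The steps above can be sketched as follows (the population size, the 5% survivor cut and the Evaluate scoring function are illustrative assumptions);

```
// Sketch of the genetic scheme. Evaluate() is assumed: it scores a
// solution against the database of played games, as in random search.
using System;
using System.Collections.Generic;
using System.Linq;

Random rng = new Random();
int ruleCount = 50, populationSize = 1000;

// Step 1: generate a lot of solutions
List<int[]> population = new List<int[]>();
for (int i = 0; i < populationSize; i++)
{
    int[] s = new int[ruleCount];
    for (int j = 0; j < ruleCount; j++) s[j] = rng.Next(0, 10);
    population.Add(s);
}

for (int generation = 0; generation < 100; generation++)
{
    // Step 2 + 3: test all solutions, keep the top 5%
    List<int[]> parents = population
        .OrderByDescending(s => Evaluate(s))
        .Take(populationSize / 20)
        .ToList();

    // Step 4: breed new solutions from two random parents,
    // with an occasional mutation of a single rule value.
    population = new List<int[]>();
    while (population.Count < populationSize)
    {
        int[] mother = parents[rng.Next(parents.Count)];
        int[] father = parents[rng.Next(parents.Count)];
        int[] child = new int[ruleCount];
        for (int j = 0; j < ruleCount; j++)
        {
            child[j] = rng.Next(2) == 0 ? mother[j] : father[j];
            if (rng.Next(100) == 0) child[j] = rng.Next(0, 10); // mutate
        }
        population.Add(child);
    }
} // Step 5: loop
```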

This is simplifying things, but it gives a good first picture. I will now discuss the specific chess-related pitfalls I came across in making this work. In no particular order;

- Getting your rules to reproduce the exact same move can be difficult, and it is also not required. Usually there are at least a few good moves. I switched to rewarding how high the move actually played ended up in my ranking of the moves, with a large penalty if the move played was at the very bottom of the list. These cases were also logged for inspection. Looking at very badly evaluated moves will show you which rules are bad and which moves are unfit to use for evaluation (next issue).

- You will get lots of moves classified as bad because they were forced. Normally putting a queen in a tight corner is not useful, but if it prevents check or is a sacrifice, it might be the best move anyway. This can be countered by refining the set of moves you test with. I created a set of moves from a historical database that were purely positional: if an alpha-beta search of depth x did not change the relative value of the board, the move was allowed into the test set. This way all forced moves are left out. Always searching to depth x with every evaluation is also an option, but will seriously dampen your ability to evaluate lots of solutions.

- I created offspring both by combining good values and by simply mutating the values of just a few rules. After several iterations the latter will fine-tune your values for the best score.

My own chess program improved markedly after tuning all the rules this way. It beat the #### out of older versions. Then why does it not reign supreme on the internet, you wonder? Because it is also a question of how good your rules are. And I am not that good a chess player........

64k Aliasing

If your performance differs between CPUs with and without hyperthreading, you might be encountering the 64K aliasing issue. It is not hyperthreading specific, but it gets tenfold worse on hyperthreading machines. I was finally tipped off by a VTune performance trace from a user (as I did not have a hyperthreading box at the time).

What's 64K aliasing? A quote from the VTune reference:

A 64K-aliasing conflict occurs when a virtual address memory references a cache line that is modulo 64K bytes apart from another cache line that already resides in the first level cache.  Only one cache line with a virtual address modulo 64K bytes can reside in the first level cache at the same time.

The worst case would be multiple threads all having their stack space starting at a 64K modulo. That's what I had (in my memory manager). Every time a thread is context switched into action, the data from another thread is pushed out of the cache ;-) Another comparable worst case would be to subdivide a large picture into 64K blocks and start working on each block in a different thread.

The fix in my case was to offset each thread's stack space by a differing amount of memory, along the lines of this pretty technical article;
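A minimal sketch of the idea, assuming each thread knows its own index (threadIndex, DoWork and the 2048-byte step are illustrative, and compiling with /unsafe is required);

```
// Consume a different amount of stack per thread, so the hot stack
// data of the threads no longer sits a multiple of 64K apart.
static void Worker(int threadIndex)
{
    unsafe
    {
        byte* padding = stackalloc byte[(threadIndex + 1) * 2048];
        padding[0] = 1; // touch it so it cannot be optimized away
    }
    DoWork(); // the actual workload, now running on an offset stack
}
```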

Igor Ostrovsky wrote an easier to read article on performance issues related to current CPU cache architectures, including the 64K issue. Without solutions, however.

Another good related read can be found on Dr. Dobb's (or search for 64k and Pixomatic if the link is down).