Sunday, March 19, 2006

XML, Strings and "Simple" Text Processing

Last Friday I was hacking up a simple system for maintaining a persistent local parameter store. I did not want to waste a lot of time on it, so I decided I would persist my parameters in an XML file and store the file every time a parameter was added or changed. After finishing the implementation, the first thing I wanted to do was persist an XML document, represented as a string (a log4net configuration document). No problem, I thought, it's just text - I had a key node with the parameter key and a value node where I persisted the XML document.

When I went to implement a simple parameter editor, I thought... after editing, maybe my XML document is non-conforming. I think I will reload it into an XML document after editing to test its validity. Much to my disappointment, when I did this everything broke. All my XML elements were XML encoded as &lt;node-name&gt; and when I reloaded the document I got lots of errors. What went wrong?

In fact, when I loaded the text into an XML document and later stored the document's inner XML (in .NET C#), the new text contained a node giving the encoding of the document (the XML declaration). Since this was being stored in an XML node in my parameters document, the XML framework thought it must be text and encoded it for me. This shows why XML sucks. You cannot just blindly add text to a document. If I had added XML that was not well-formed to a single node, it could have broken my containing document.
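The escaping described above can be reproduced with a small sketch. The element names (parameter, key, value) and the sample log4net text are made up for illustration, and this is only one plausible way the value node might have been filled, not necessarily what my code did:

```csharp
using System;
using System.Xml;

public static class EscapingDemo
{
    // Builds a hypothetical <parameter><key/><value/></parameter> document
    // and returns its markup.
    public static string StoreAsText(string key, string value)
    {
        XmlDocument doc = new XmlDocument();
        XmlElement parameter = doc.CreateElement("parameter");
        doc.AppendChild(parameter);

        XmlElement keyNode = doc.CreateElement("key");
        keyNode.InnerText = key;
        parameter.AppendChild(keyNode);

        XmlElement valueNode = doc.CreateElement("value");
        // InnerText treats the value as character data, so markup
        // characters such as '<' are escaped when the document is written.
        valueNode.InnerText = value;
        parameter.AppendChild(valueNode);

        return doc.OuterXml;
    }

    public static void Main()
    {
        // Sample nested document, including the XML declaration.
        string nested = "<?xml version=\"1.0\" encoding=\"utf-8\"?><log4net><root /></log4net>";

        // The nested markup comes out entity-encoded inside <value>.
        Console.WriteLine(StoreAsText("log4net.config", nested));
    }
}
```

Reading valueNode.InnerText back returns the original string, so the round trip through the DOM is safe. The trouble only shows up when the stored file is treated as raw text.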

To counter this I decided to base64 encode the XML text and push it into a CDATA section of my parameters document. Okay, the text of my parameters document would not be as easy to hack, but I could guarantee that any parameter could be put in the document without breaking the container. This is the source of today's rant. I find the C# API for such a task is crappy.

First, what I expected to do was to create a string object that was the base64 representation of my initial string object. But the Convert method for creating a base64 string takes a byte array (okay, that is logical), and to create a byte array from a string you have to get a text encoding object and call its GetBytes method. One would expect that string could expose a constructor that takes an encoding and a byte array, and a GetBytes method that could create a byte array. I spent half an hour looking through the crappy .NET documentation trying to find the objects collaborating in the string to byte array sequence before stumbling upon the correct incantation. Then I spent another half hour doing the same thing for converting a byte array back to a string.
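For the record, here is a minimal sketch of the incantation in both directions. It assumes UTF-8 as the encoding (any Encoding would do as long as both directions agree), and the class name, the value element and the sample text are hypothetical:

```csharp
using System;
using System.Text;
using System.Xml;

public static class Base64Parameter
{
    // string -> bytes (via an Encoding) -> base64 string
    public static string Encode(string text)
    {
        byte[] bytes = Encoding.UTF8.GetBytes(text);
        return Convert.ToBase64String(bytes);
    }

    // base64 string -> bytes -> string (via the same Encoding)
    public static string Decode(string base64)
    {
        byte[] bytes = Convert.FromBase64String(base64);
        return Encoding.UTF8.GetString(bytes);
    }

    public static void Main()
    {
        string nested = "<?xml version=\"1.0\" encoding=\"utf-8\"?><log4net><root /></log4net>";

        // Store the encoded text in a CDATA section of a hypothetical value node.
        XmlDocument doc = new XmlDocument();
        XmlElement valueNode = doc.CreateElement("value");
        doc.AppendChild(valueNode);
        valueNode.AppendChild(doc.CreateCDataSection(Encode(nested)));

        // Read it back and confirm nothing was mangled on the way.
        string restored = Decode(valueNode.InnerText);
        Console.WriteLine(restored == nested); // prints True

        // The validity check from the parameter editor: LoadXml throws
        // an XmlException if the restored text is not well-formed.
        XmlDocument check = new XmlDocument();
        check.LoadXml(restored);
    }
}
```

The base64 alphabet contains no markup characters, so the encoded text cannot break the container whatever the parameter holds.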

The basic problem here is that you should be able to say, "string, give me a byte array representation of yourself - here is my encoding scheme" - I think this is the Java strategy and I think it is the logical one. What .NET has is "encoding, here is a string, take a peek inside its current implementation and representation and give me a byte array representation of it". I think it breaks encapsulation and you really have to have a lot of experience working in domains where character encodings are important (i.e. where 99% of all North American programmers like myself don't work) before you even guess where the solution might be located.
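To make the difference concrete, here is a sketch of the shape I was expecting, written as thin wrappers over the real Encoding calls. ToBytes and ToText are hypothetical names, not part of .NET, and extension methods need C# 3.0 or later:

```csharp
using System;
using System.Text;

public static class StringEncodingExtensions
{
    // "String, give me a byte array representation of yourself."
    public static byte[] ToBytes(this string text, Encoding encoding)
    {
        return encoding.GetBytes(text);
    }

    // The reverse direction: bytes plus an encoding give back a string.
    public static string ToText(this byte[] bytes, Encoding encoding)
    {
        return encoding.GetString(bytes);
    }
}

public static class ExtensionDemo
{
    public static void Main()
    {
        // \u00e9 is a single character that takes two bytes in UTF-8.
        byte[] bytes = "h\u00e9llo".ToBytes(Encoding.UTF8);
        Console.WriteLine(bytes.Length); // prints 6

        string text = bytes.ToText(Encoding.UTF8);
        Console.WriteLine(text == "h\u00e9llo"); // prints True
    }
}
```

The work is still done by the encoding object. The wrappers only move the entry point onto the string, which is where I went looking first.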

The second problem is that all the .NET documentation (no... make that all Microsoft documentation) is broken up into half-screen factoids with circular references between every three pages. Examples are designed to make people quit trying. For example, I recently looked at the documentation for getting directory information and an example program was presented showing how to print out the various parts of a file path without showing what the various method calls should return... I suppose they expect you to create a project, copy and paste the example code and run the example if you are really interested in knowing what the method calls do, rather than just clearly explaining it in the description of the methods.

Thursday, March 16, 2006

Review Your Code

The other day one of my little boys was sick at home, so I had to spend the day with him. I decided to write a presentation document - using OpenOffice Impress - describing the technical details of an iteration release of one of our products. The idea was that we produce so much code, for so many different products, that first, we don't remember what we did or how we solved the problems a month later, and second, people who were not sitting in on the coding session don't have a good idea of how the code works. I thought an Impress presentation would allow us all to quickly review what we did on a release, and the developers who wrote the code could give insight into the implementation strategies.

We have not really had a review meeting (but this will happen), but I found the process of writing the code review presentation to be very informative for me. First, I am the worst when it comes to forgetting what I did two weeks ago (I am the burnt out coder after all). Second, when we isolate code sections for a simple presentation, we see inconsistencies and omissions that are not apparent when we are busy hacking code. In one section of the presentation I talk about the major interfaces that we implemented. I show a UML class diagram of the interface on the left with a description on the right. That's it... one interface per slide. Presented this way, you notice things about the interface design that you don't when it is all munged up with other classes and interfaces.

In the end, on one hand I could see that some interface members may have been asymmetrically or inconsistently implemented, but on the other hand we don't add things to interfaces lightly and everything is subject to quite a bit of debate. Also, we have a strict rule about not adding stuff we don't need. Currently the interfaces work (warts and all) and that is the most important thing. I am happy with the review and I am happy with the structure of the interfaces. I find the time spent writing the review was not wasted.

Friday, March 03, 2006

I am back

I have not been blogging very often, but I have been working hard. We have created a lot of code and I have a lot of cool projects on the go. Version 4 of our principal product is progressing in fits and starts. We are implementing the foundation using some of the most incomprehensible code that I have ever written.

We corrected a major problem with the PTI (ProjectTypeIndexer) today. This is an object that is responsible for finding other addressable objects at runtime, from "nearby" DLLs, validating the rules associated with them and allowing them to be integrated into the project as first class citizens. The bug had to do with how the change manager rebuilt this object so that once one user identified and integrated a type, it could be distributed to other users using a commit/synch workflow.

Rarely have we written such simple code that has such a contorted runtime trace.

With this bug fix we are on target to finish implementing basic change management for graphic objects. This project has put such an emphasis on change management (using local and distributed workflows) that I don't think we will ever look at implementing an undo/redo facility in the same way. We learned lots of stuff and when everything starts working it is super cool....