Monthly Archives: May 2012

Piglet 1.3.0 is out, now with unicode support!

Piglet has been updated today, with version 1.3.0. Here’s a list of the most exciting changes:

Unicode support

You can now use the full Unicode character range in your regular expressions and parser tokens. This means that the parser can correctly lex text in scripts such as Arabic, Chinese and Japanese. When using regular expressions for these, all the normal rules apply, but the characters will not be included in any of the shorthand notation. For instance, the traditional Japanese kanji numerals are not part of the \d construct.

Nothing in existing code needs to be altered to enable this support. The runtime of the lexer has been slightly altered and is very slightly slower, but it should not even be noticeable.

Choice of lexer runtime

The most costly thing by far in the parser and lexer construction algorithms is the lexer table compression. Though this has been alleviated somewhat by the unicode functionality which actually served to reduce the size of the lexing tables, it can still be quite expensive.

If a faster construction time but a slower lexer is desired, you now have other options. When constructing, set the LexerRuntime in the LexerSettings variable of your ParserConfigurator. Or if constructing just a lexer with no accompanying parser, set the LexerRuntime property.

The available values are:

  • Tabular. Tabular is the slowest to construct but the fastest to run. Lexers built this way will use an internal table to perform lookups. Use this method if you only construct your lexer once and reuse it continually, or if you are parsing very large texts. The cost per input character is O(1), regardless of input size or grammar size. Memory usage is constant, but might be larger than with the other methods for small grammars.
  • Nfa. Nfa means that the lexer will run as a nondeterministic finite automaton. This method is VERY fast to construct but slower to run. Also, the lexing performance is not linear and will vary based on the configuration of your grammar. Initially uses less memory than a tabular approach, but memory usage might increase as the lexing proceeds.
  • Dfa. Runs the lexing algorithm as a deterministic finite automaton. This method of construction is slower than Nfa, but faster than Tabular. It runs in finite memory and also costs O(1) per input character, but it has a slower run time and uses more memory than a tabular lexer. It provides a middle road between the two other options.

The tabular option is still the default option.
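
To illustrate what "use an internal table to perform lookups" means, here is a minimal sketch of a table-driven recognizer. It is not Piglet's implementation or API: the TinyTableLexer class, its two states and its two character classes are all made up for this example, and a real lexer table would be far larger and compressed.

```csharp
using System;

// Hypothetical illustration only: this is not Piglet's code or API.
public static class TinyTableLexer
{
    // Character classes, used as column indices into the table.
    private const int Digit = 0, Other = 1;

    // Rows are states, columns are character classes, -1 means "no transition".
    // State 0 is the start state, state 1 means "inside a number" (accepting).
    private static readonly int[,] Table =
    {
        { 1, -1 }, // state 0
        { 1, -1 }, // state 1
    };

    private static int ClassOf(char c)
    {
        return c >= '0' && c <= '9' ? Digit : Other;
    }

    // Returns the length of the run of ASCII digits beginning at 'start',
    // or 0 if the input does not begin with a digit there.
    public static int MatchNumber(string input, int start)
    {
        int state = 0, pos = start;
        while (pos < input.Length)
        {
            int next = Table[state, ClassOf(input[pos])]; // one lookup per character
            if (next < 0) break;
            state = next;
            pos++;
        }
        return state == 1 ? pos - start : 0;
    }

    public static void Main()
    {
        Console.WriteLine(MatchNumber("123abc", 0)); // 3
        Console.WriteLine(MatchNumber("abc", 0));    // 0
    }
}
```

Every step is a single array lookup, which is why the per-character cost does not depend on how many rules the grammar has. The price is that the whole table has to be built up front.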

Hope you find it useful, and please report any bugs or problems either directly to me or by filing an issue on GitHub.


The language feature abuse threshold

C# has an odd strategy to language features which can probably be best approximated with “That looks cool, let’s put it in”. This has resulted in a language which is about as full of syntactic sugar as the very best of them.

As an example, C# has had lambdas for a few years now. They’ve got their own syntax as well. Not that we actually needed lambdas in a strict sense, since we had delegates before that. Not that we needed those either, since we’ve got objects and interfaces to pass around. Which weren’t themselves needed. The only thing you really need is a few machine code instructions. Or, given a convoluted example, you need only one assembly language instruction.

Of course you don’t want to code in that, so you’d end up writing in some type of sugar coated language in order to be as productive as possible. But, when are you crossing the threshold of overusing language features just because you can?

My example here is, again, going to be the lambda functions in C#. In part because I am using those myself a lot, and the usage is increasing – maybe in part due to my experience with Haskell which really turned me on to using a functional style.

Local functions

Lambdas let you make local functions, something which isn’t possible using a normal member function. Which means you can create something like this.

public void DoStuff(string message)
{
    Func<string, bool> messageContains = s => (message??"").Contains(s);
    if (messageContains("this"))
    {
        //.. stuff
    }
    else if (messageContains("that"))
    {
        // .. other stuff
    }
}

This saves quite a few characters of typing: you get a useful null-safe comparison that is scoped to the local function and doesn’t pollute your class. Overuse, or clever?
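
For comparison, C# 7.0 and later also have a dedicated local function syntax that gives the same scoping without a delegate variable. The sketch below is one way the example could look with it. The class name, the Classify method and its return values are made up here so that the branches have a visible result.

```csharp
using System;

public static class LocalFunctionSketch
{
    // Returns a label instead of doing real work, so each branch is visible.
    public static string Classify(string message)
    {
        // Local function: only visible inside Classify, and it captures
        // 'message' in the same way the lambda version does.
        bool MessageContains(string s) => (message ?? "").Contains(s);

        if (MessageContains("this")) return "this";
        if (MessageContains("that")) return "that";
        return "neither";
    }

    public static void Main()
    {
        Console.WriteLine(Classify("look at that")); // that
        Console.WriteLine(Classify(null));           // neither
    }
}
```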

Self recursive lambdas

This Fibonacci function works just like the standard double recursive function, but from within a local scope.

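// Declare the variable first: the lambda body refers to fib, so fib must
// already be definitely assigned when the lambda is created.
Func<int, int> fib = null;
fib = f => f < 2 ? f : fib(f - 1) + fib(f - 2);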

Granted, this is contrived and probably bad style in all sorts of ways. Or is it appropriate somewhere?

Functions returning functions

My favourite, honestly very very useful, but would you yourself use this? Overuse?

// Needs: using System; using System.Linq; using System.Xml.Linq;
public bool HasSpecificChildren(XDocument doc)
{
    Func<string, Func<XContainer, bool>> hasDescendant = 
        name => e => e.Descendants(name).Any();
    Func<Func<XContainer, bool>, Func<XContainer, bool>, Func<XContainer, bool>> and =
        (a, b) => x => a(x) && b(x);

    return and(hasDescendant("child"), hasDescendant("otherChild"))(doc);
}

In case it’s not obvious, this code is equivalent to this

public bool HasSpecificChildren2(XDocument doc)
{
    return doc.Descendants("child").Any() && doc.Descendants("otherChild").Any();
}

Now, interestingly, though this example is a bit over the top, which of the two implementations is the most redundant? I’d say it is the second one. The lambda silliness only states the functionality once for each part and is actually as factored as you can get. Consider if you were to change the implementation from Descendants to Elements. Only one of the solutions has a single place to change.
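
To make the "one place to change" point concrete, here is a sketch of the lambda version after switching from Descendants to Elements. The class name, the method name and the sample XML are made up for illustration. Since Elements only looks at direct children, the sketch takes an XContainer so that it can be called on the root element.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class ElementsSketch
{
    public static bool HasSpecificDirectChildren(XContainer container)
    {
        // The only line that changed: Descendants(name) became Elements(name),
        // so only direct children are considered.
        Func<string, Func<XContainer, bool>> hasChild =
            name => e => e.Elements(name).Any();
        Func<Func<XContainer, bool>, Func<XContainer, bool>, Func<XContainer, bool>> and =
            (a, b) => x => a(x) && b(x);

        return and(hasChild("child"), hasChild("otherChild"))(container);
    }

    public static void Main()
    {
        var flat = XElement.Parse("<root><child/><otherChild/></root>");
        var nested = XElement.Parse("<root><child><otherChild/></child></root>");
        Console.WriteLine(HasSpecificDirectChildren(flat));   // True
        Console.WriteLine(HasSpecificDirectChildren(nested)); // False
    }
}
```

In the non-lambda version the same change would have to be made once per element name.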

Currying

A final piece of something that probably is a bit from the Haskell world, though interestingly enough the venerable Jon Skeet wrote about it.

Currying is the idea that each function only really needs one argument. This can be achieved in C# as well. Consider this.

public static Func<T1, Func<T2, Func<T3, TResult>>> Curry<T1, T2, T3, TResult>(Func<T1, T2, T3, TResult> uncurried)
{
    return a => b => c => uncurried(a, b, c);
}

static void Main(string[] args)
{
    Func<string, string, string, string> func = (a,b,c) => a + b + c;
    var uncurried = func("currying", "is", "awesome");

    Func<string, Func<string, Func<string, string>>> curry = Curry(func);
    var curried = curry("currying")("is")("awesome");
}

This is very rarely seen in C#, but is a mainstay of other languages and can prove very useful indeed for function composition. So, are we abusing the language enough? Or do we need to go even further?
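
As a rough sketch of why currying helps with composition: once a function is curried you can fix its first arguments and get back a one-argument function, which is the shape that composes easily. The Compose helper and the wrap, parenthesize and shout names are hypothetical and only exist for this example. Curry is the same function as above.

```csharp
using System;

public static class CurrySketch
{
    public static Func<T1, Func<T2, Func<T3, TResult>>> Curry<T1, T2, T3, TResult>(
        Func<T1, T2, T3, TResult> uncurried)
    {
        return a => b => c => uncurried(a, b, c);
    }

    // Hypothetical helper: Compose(f, g) returns x => g(f(x)).
    public static Func<A, C> Compose<A, B, C>(Func<A, B> f, Func<B, C> g)
    {
        return x => g(f(x));
    }

    public static void Main()
    {
        Func<string, string, string, string> wrap =
            (prefix, suffix, text) => prefix + text + suffix;

        // Fix the first two arguments; what is left takes a single argument.
        Func<string, string> parenthesize = Curry(wrap)("(")(")");
        Func<string, string> shout = s => s.ToUpper();

        Func<string, string> shoutInParens = Compose(shout, parenthesize);
        Console.WriteLine(shoutInParens("currying")); // (CURRYING)
    }
}
```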

All of the things above have their place in your development toolbox, honestly. But when are you overusing them?

Is it OK to make a local lambda statement in order to avoid passing an argument to a private function? I know I make local lambdas constantly for this very reason. Is it cool to prefer LINQ to avoid making for loops? Should you avoid the var keyword because someone else might get confused about your typing intentions, or just go along with the speed of development that it offers?

When are you over the line? And who is to determine what is acceptable?

The forgotten paradigm

Programming paradigms, ways of thinking about and structuring code, have been around since the dawn of time, ever since man started to think of the goto statement as a bad thing and structured programming made its way into the spotlight. The major paradigms are

  • Imperative
  • Functional
  • Declarative

Of the first two, examples are everywhere. Imperative languages include almost all the popular ones: C, C++, C#, Java, Python, Perl, Ruby, the list goes on and on. Functional languages are less common, though increasing in popularity since they offer good ways of handling concurrency and parallelism. Examples are Haskell, F#, Clojure and LISP.

But what of the third? Can you name a declarative language?

When you program in a declarative style, you tell the computer what you want instead of telling it what to do. There are very few declarative languages, though one has reached a ubiquitous status: SQL. Think about it. Apart from T-SQL, the bastard lovechild of imperative and declarative database access, SQL will never require you to tell it how to get its data; you tell it the rules that the data you want back must obey.
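
The difference between "what to do" and "what you want" can be sketched in C# itself, using a LINQ query expression as a stand-in for the SQL style. This is only an illustration of the two styles on an in-memory list; the class and method names are made up.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DeclarativeSketch
{
    // Imperative: spell out how to walk the list and build the result.
    public static List<int> EvenSquaresImperative(IEnumerable<int> numbers)
    {
        var result = new List<int>();
        foreach (var n in numbers)
        {
            if (n % 2 == 0)
            {
                result.Add(n * n);
            }
        }
        return result;
    }

    // Declarative: state which values you want, much like SELECT ... WHERE.
    public static List<int> EvenSquaresDeclarative(IEnumerable<int> numbers)
    {
        return (from n in numbers
                where n % 2 == 0
                select n * n).ToList();
    }

    public static void Main()
    {
        var input = new[] { 1, 2, 3, 4 };
        Console.WriteLine(string.Join(",", EvenSquaresImperative(input)));  // 4,16
        Console.WriteLine(string.Join(",", EvenSquaresDeclarative(input))); // 4,16
    }
}
```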

What if you could do the same sort of problem solving in your code?

Business rules

As an example, think about how to go about implementing business rules. If there is one thing that software development has taught me it’s that the people who dream up rules for calculating prices have no end to their imagination. Say that you have built a system for shipping. There’s a fixed cost onto which a variable cost per mile is added to give the total shipping cost.

Fair enough, given the information you make a very simple function to get the price. Then, as things are bound to do, a new rule enters. If you ship over a certain distance, you pay only 75% of the fixed cost. Fine, a simple if-statement will fix that.

Then the next change request. If you ship to a certain destination the variable price is lower. And if you ship more than 10 items you get a discount. If you book the shipping early it’s cheaper, if you book it late it’s more expensive.

Can you see where this is headed? Because of the permutation effects, every new case you add to this sort of computation will have to work together with every other case you have in your code, and still produce correct results.
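
One way to keep such rules from turning into a tangle of nested if-statements is to hold each rule as data: a condition plus an adjustment. The sketch below is only an illustration of that idea, not a rule engine. Every name and every number in it (the fixed cost, the per-mile cost, the distance limit and the 10% item discount) is invented for the example. Only the 75% figure and the "more than 10 items" condition come from the rules described above.

```csharp
using System;
using System.Collections.Generic;

public class Shipment
{
    public decimal Miles;
    public int Items;
}

public class Quote
{
    public decimal Fixed;
    public decimal PerMile;
    public decimal Multiplier = 1m;
}

public class Rule
{
    public Func<Shipment, bool> Applies;
    public Action<Quote> Adjust;
}

public static class ShippingRules
{
    // Invented numbers, for illustration only.
    private const decimal FixedCost = 100m;
    private const decimal CostPerMile = 2m;
    private const decimal LongDistanceMiles = 500m;

    // Each rule says when it applies and what it changes.
    private static readonly List<Rule> Rules = new List<Rule>
    {
        // Long distance: pay only 75% of the fixed cost.
        new Rule { Applies = s => s.Miles > LongDistanceMiles, Adjust = q => q.Fixed *= 0.75m },
        // More than 10 items: a discount on the total (10% assumed here).
        new Rule { Applies = s => s.Items > 10, Adjust = q => q.Multiplier *= 0.9m },
    };

    public static decimal Price(Shipment s)
    {
        var q = new Quote { Fixed = FixedCost, PerMile = CostPerMile };
        foreach (var rule in Rules)
        {
            if (rule.Applies(s)) rule.Adjust(q);
        }
        return (q.Fixed + q.PerMile * s.Miles) * q.Multiplier;
    }
}
```

With these made-up numbers, a 100 mile shipment of one item comes to 300, and a 1000 mile shipment of 20 items comes to (75 + 2000) × 0.9 = 1867.5. Note that this only makes each rule visible on its own line. It does not remove the interaction problem, since the order of the rules and how their adjustments combine still have to be decided by hand.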

Prolog

This is a case where logic programming should shine. And to my knowledge there exists only one logic-based language, Prolog.

I’ve been meaning to explore this further, but it appears that Prolog is about as out of style as you can get. There seems to have been some hype during the fifth generation computer project in the eighties but since that failed horribly it seems to have taken down Prolog with it.

Also, Prolog is a decidedly comp-sciency language and quite mathematical, whereas, as far as I can tell, the applications of the paradigm are in the realm of the very practical. This gap doesn’t exactly help.

Is there anyone using it? Or is there another way to express rule-based logic in a declarative style? It would be sort of cool to have a generic rule-based engine available, since these sorts of problems are so common.


Would you build a house using agile methods?

Well, would you? I don’t think most people would, and it would be really interesting to see how the agile houses would look. We’d work in iterations, so I’d probably start with the smallest deliverable object, a tarpaulin across some trees, and build from there. Need another bathroom? Just knock a hole in the wall, raise four walls and put the tub down. Feeling cramped? Tack on another floor. After each week the builders could sit down, talk about the mistakes of the last week and provide a new set of building standards based on them.

Does this sound ludicrous to you? Of course it does. But why?

Because houses aren’t software. And the environment they are built in is not agile, and it never will be. You submit your plan and get planning permission, which can’t be changed without considerable effort and cost. You can’t change the standards on how to build exterior walls halfway through a project, or decide that you’d like more rooms after the foundation is done. It won’t fly. Pictured is the bent pyramid of Dahshur, where the builders realized halfway up that the angle of the pyramid wouldn’t work. So they, predating the agile movement by some 4600 years, changed the design and rolled with the punches. I bet the pharaoh wasn’t too pleased about the end result though.

A hegemony of scrum

About two years ago I was at Öredev, which is the largest developer-centric conference in my region. Someone had the brilliant idea of putting a large note on the door to the men’s bathrooms proudly proclaiming them to be the site of the “waterfall classes”. Such is the state of the software industry here that the venerable old waterfall model is more ridiculed than respected as a real technique. And yet if you were building a house, I’d wager you would use the waterfall model to build it, and you’d use the agile methods to build anything made out of software.

But really, is all software software? Is, in fact, some software more like houses? And are we really doing ourselves a favour by applying an agile method to something that isn’t suited for agile development?

Consider writing a software application for a washing machine. Is it more like a fast moving web development scenario, or is it more like building a house? You’ll never be able to change the software after your product has shipped. If you have bugs in your software the costs could be astronomical (think water leaks, electrical fires and whatnot).

I’m not saying that there aren’t things in agile that our prospective washing-machine-software developer can take home with him, but I’d wager that even if he chooses an agile method for the daily tasks, it will be strengthened with the inclusion of a waterfall-like model on top. And that’s not a bad thing.

How much of a house is your software

I believe that this has to do with the cost of change after shipment, and with how far the rest of the organization has adopted agile. If the cost of change is very high, or if the cost of running your agile process within an organization that simply refuses to conform to an agile model is very high, then you’re probably better off with a different kind of software development methodology. Preferably one of your own design.

Scrum is a very strange beast, since it is one team’s evolved plan that worked for them, which has been codified and subsequently adopted by people across the software development world. Sometimes it works, often it doesn’t, or at least it doesn’t get fully adopted. If you in any way think that the software you’re building, or the organization you’re building it in, has some characteristics of a housing business, scrum as it stands will probably never work out the way the evangelists say it will.

Roll your own

I’ve a feeling that when the case for scrum isn’t clear cut, we often go about process change the wrong way. And by wrong way, I mean by bringing in agile consultants that are supposed to implement a codified scrum. They constantly tell us that we are supposed to modify things as we go along while at the same time telling us how it really should be done. Why not just evolve things on your own?

The only thing you really need is the current baseline, i.e. what you have going right now (even if that is nothing at all), and a whiteboard. Take one thing from scrum, the retrospective, and sit down. Rank all the crappy things in your working environment and address the worst one. Don’t even bother with anything else. It’ll take a while, but you’ll find the best way of doing things given your situation and your software.

What are your thoughts on this? Are you also experiencing a move towards just taking a boxed implementation of what has worked for someone else, instead of actually doing the pretty easy task of finding out what would be the best method for your organization?