The forgotten paradigm

Programming paradigms, ways of thinking about and structuring code has been around since the dawn of time – since man started to think of the goto statement as a bad thing and structured programming made its way into the spotlight. The major paradigms are

  • Imperative
  • Functional
  • Declarative

Of the first two, examples are everywhere. Imperative languages are everywhere. Examples are almost all popular languages: C, C++, C#, Java, Python, Perl, Ruby, the list goes on and on. Functional languages are less common, though increasing in popularity since they offer good ways of handling concurrency and parallellism. Examples are Haskell, F#, Clojure and LISP.

But what of the third? Can you name a declarative language?

When you program in a declarative style, you tell the computer what you want instead of telling it what to do. There are very few declarative languages, though one has reached a ubiquitous status, SQL. Think about it, apart from T-SQL, the bastard lovechild of imperative and declarative database access, SQL will never require you to tell it how to get its data, you tell it the rules that the data you want back will obey.

What if you could do the same sort of problem solving in your code?

Business rules

As an example, think about how to go about implementing business rules. If there is one thing that software development has taught me it’s that the people who dream up rules for calculating prices have no end to their imagination. Say that you have built a system for shipping. There’s a fixed cost onto which a variable cost per mile is added to give the total shipping cost.

Fair enough, given the information you make very simple function to get the price. Then, as things are bound to do, a new rule enters. If you ship over a certain distance, you pay only 75% of the fixed cost. Fine, a simple if-statement will fix that.

Then the next change request. If you ship to a certain destination the variable price is lower. And if you ship more than 10 items you get a discount. If you book the shipping early it’s cheaper, if you book it late it’s more expensive.

Can you see where this is headed? Because of the permutation effects, every new case you add to this sort of computations will have to work together with every other case you have in your code, and still produce correct results.

Prolog

This is a case where logic directed programming should shine. And there exists to my knowledge only one logic based language, Prolog.

I’ve been meaning to explore this further, but it appears that Prolog is about as out of style as you can get. There seems to have been some hype during the fifth generation computer project in the eighties but since that failed horribly it seems to have taken down Prolog with it.

Also, Prolog is a decidedly comp-sciency language and quite mathematical where as far as I can tell the applications of the paradigms are in the realm of the very practical. This gap doesn’t exactly help.

Is there anyone using it? Or is there another way to express rule logic based programming in a declarative style? It would be sort of cool to have a generic rule based engine available since these sorts of problems are so common.

Advertisements
Tagged ,

Would you build a house using agile methods?

Well would you? I don’t think most people would, and it would be really interesting to see how the agile houses would look. We’d work in iterations, so I’ll probably start with the smallest deliverable object, a tarpaulin across some trees and build from there. Need another bathroom? Just add a few more walls, knock a hole in the wall and raise four walls and put the tub down. Feeling cramped? Tack on another floor. After each week the builders could sit down and talk about how to alter the building standards based on the mistakes of last week and provide a new set of building standards.

Does this sound ludicrous to you? Of course it does. But why?

Because houses aren’t software. And the environment they work in is not agile, and it never will be. You submit your plan and get planning permission, which can’t be changed without considerable effort and cost. You can’t change the standards on how to build exterior walls halfway through a project, or decide that you’d like more rooms after the foundation is done. It won’t fly. Pictured is the bent pyramid of Dahshur, in which the builders realized that the angle of the pyramid wouldn’t work halfway up the top. So they, predating the agile movement by some 4600 years, changed the design and rolled with the punches. I bet the pharaoh wasn’t too pleased about the end result though.

A hegemony of scrum

About two years ago I was at Öredev, which is the largest developer-centric conference in my region. Someone had the brilliant idea of putting a large note on the door to the mens bathrooms proudly proclaiming them to be site of the “waterfall classes”. Such is the state of the software industry in here that the venerable old waterfall model is more ridiculed than respected as a real technique. And yet if you were building a house, I’d wager you would use the waterfall model to build it, and you’d use the agile methods to build anything made out of software.

But really, are all software software? Are, in fact, some software more like houses? And are we really doing ourselves a favour by applying an agile method to something that isn’t suited for agile development.

Consider writing a software application for a washing machine. Is it more like a fast moving web development scenario, or is it more like building a house. You’ll never be able to change the software after your product has shipped. If you have bugs in your software the costs could be astronomical (think water leaks, electrical fires whatnot).

I’m not saying that there aren’t things in agile that our prospective washing-machine-software developer can take home with him, but I’d wager that even if he chooses an agile method for the daily tasks, it will be strengthened with the inclusion of a waterfall-like model on top. And that’s not a bad thing.

How much of a house is your software

I believe that this has to do with the cost of change after shipment, and the agile adoption of the rest of organization. If the cost of change is very high, or the cost of running your agile process within an organization that simply refuses to conform to an agile model then you’re probably better off doing a different kind of software development methodology. Preferably one of your own design.

Scrum is a very strange beast, since it is one teams evolved plan that worked for them that has been codified and subsequently adopted by people across the software development world. Sometimes it works, often it doesn’t – or at least it doesn’t get fully adopted. If you’re in any way think that the software you’re building or the organization you’re building it in has some characteristics of a housing business scrum as it stands probably will never work out the way the evangelists say it will.

Roll your own

I’ve a feeling that when the case for scrum isn’t clear cut, we often go about process change the wrong way. And by wrong way, I mean by bringing in agile consultants that are supposed to implement a codified scrum. They constantly tell us that we are supposed to modify things as we go along while at the same time telling us how it really should be done. Why not just evolve things on your own?

The only thing you really need is the current baseline, i.e. what you have going right now – even if that is nothing at all, and a whiteboard. Take one thing from scrum, the retrospective and sit down. Rank all the crappy things in your working environment and address the worst one. Don’t even bother with anything else. It’ll take a while, but you’d find the best way of doing things given your situation and your software.

How are your thoughts on this, are you also experiencing a move towards just taking a boxed implementation of what has worked for someone else instead of actually doing the pretty easy task of finding out what would be the actual best method for your organization?

Implementing scrum and missing the mark

I have been working with different varieties of scrum for about 6 years now, and I’m not sure that we are focusing on the right things. What are we trying to solve, and who are we trying to preach this new agile gospel to?

A hierarchy of (developer) needs

Like the famous Maslow’s hierarchy of needs, I think that we are looking at much the same thing when it comes to making a working environment for a development team. I believe that this goes pretty much like this:

  1. What am I supposed to be doing?
  2. How am I supposed to do it?
  3. What’s everyone else doing?
  4. How can we get better at doing it?

First and foremost, you want to know what you’re supposed to be doing. As a developer you’ll need some sort of plan, or at least a clear idea of what is the end result.

Secondly, once you actually know what is to be done, you’ll need to know how you are supposed to do it. What are the requirements of the task?

Once you know those things, you can work reasonably efficient, which would give you enough time to help your co-workers out and begin to act as a team, which forms the need to know what they are doing.

Which leads to the final step of wanting to get better at the actual process of doing things.

How scrum is typically implemented

When trying to implement scrum in an organization it’s almost always implemented in the completely opposite direction.

First to be implemented is the daily scrum, because it is easy. Not because people really need it, but because it’s an easy thing to do and programmers need to stand up more in the morning. Secondly, you get to the retrospective. Most teams never get further than this due to organizational intertia or lack of interest.

In my opinion, this is doing it all wrong.

The product owner is the key

The product owner is the key to the entire idea of scrum. Without a proper product owner, you are never going to get things working properly. Look, daily standups and fuzzy retrospectives are nice and all but if the backlog isn’t prioritized you’re going to end up with headless chickens in the development team who does not know what they are going to do next week.

A product owner is not a project manager. And he is absolutely not a developer. And, no, he can’t double as the scrum master. Seriously. And he has about one thing to do, and that is to make sure that there are issues to do each iteration and that they are planned way ahead.

A strong product owner is the buffer between the scrum bubble and the rest of the organization, much like the scrum master is the buffer between the individual developer and the product owner. And yet people keep completely missing the point of the role.

I’m thinking that this is because it’s the most innovative concept that scrum introduces. Most people can imagine the scrum developer with the daily standups and once-per-iteration retrospectives. But the product owner is a new entity, an administrative one that usually doesn’t exist within an organization. So they usually skip it altogether, or completely misrepresents the role by thinking about it as pretty much a renamed project manager.

Barking up the wrong tree

So, why is it that we have Agile conferences filled with people whose primary job is to write code? Why is it that we see the daily standup as the prime vehicle of scrum, where in fact it is a luxury item that only comes into play once you actually know what you are supposed to be doing the next few weeks?

The most common complaint I hear is this:

We the developers try to do scrum, we have our iterations but we still have old-school project managers that doesn’t jive with the methodology. So we only do standups and retrospectives.

Then why bother? If the organization can’t embrace scrum by including one of it’s holy trinity, the scrum master and development team being the other two, why are you trying to fit square pegs in round holes?

If there is one thing I’d like to force into people’s mind it’s this:

You cannot do scrum without a prioritized backlog and a product owner. Everything else does not really matter if you do not have this.

How are your ideas regarding this symbiosis? Do you feel you can really do scrum half-ass and be happy with it? If the rest of the organization is that unwilling to the concept that you can only do it inside of your own little development team with your programming buddies, perhaps it’s time to evolve another system of your own that can actually fit within your organization instead of trying to do everything the opposite way while still maintaining an interface outwards that an old-school organization. I’d love to hear your thoughts.

Tagged , ,

Piglet 1.2.2 released

Well, honestly, 1.2.1 was released too but I forgot to add one of the features that I was going to add to it, so there was two releases today.

Either way, here’s whats new:

Support for escaping inside character classes

You can now use escaped characters inside the character classes. That is, regular expressions like [\d\s] will now result in the correct expected expression. This also enables you to escape the – and ^ sign inside the character class which previously was not possible.

Support for ignoring expressions in character classes

For those purposes where you want the lexer to silently drop things instead of generating tokens. Use configurator.Ignore(expression) to ignore it. For instance, this expression ignores a C# style comment: configurator.Ignore(@”//[^\n]*\n”);

As always, please do report any issues or questions, I’m happy to take a look at it!

Tagged ,

Piglet 1.2.0 released

A new version of Piglet – the fluent parser generator is now out. Here’s a bit of what has been fixed, and how you use those new features in your project.

Added error recovery and reporting to fluent parser configuration

Piglet can now recover from errors and continue parsing in a controlled manner when using the fluent configuration. This is done by the very special Error token. What you do is that when defining a rule, instead of the token you want to use, you use the Error token. In case of an error this token will match instead and all subsequent tokens will be skipped until the next legal token is matched.

When a rule containing the Error token has been found, you will find the Error variable filled in in the corresponding WhenFound clause.

Added token precedence to fluent configuration

You can now specify precedence groups for tokens using the LeftAssociative, RightAssociative and NonAssociative methods found in the fluent configurator object.

Pass one of these methods a group of tokens to declare them to be of equal precedence. Literals or expressions are legal input.

Completed XML documentation to include every method

No more empty intellisense! There should be somewhat helpful text everywhere now.

Fixed bug with escaped ?

\? was not handled correctly when used as an expression. It is now.

Fixed bug with numbered repetition

Numbered repetition in regular expressions now supports the standard {x,y} syntax in addition to the {x:y} syntax that I apparently implemented originally. The old syntax still works.

Also now, the {x,} syntax is supported, which Piglet didn’t support before.

Fixed bug with possible wrong order of defined expressions for fluent configuration

The terminal wasn’t immediately created when a fluent expression was declared. It now becomes created at the moment when the AndReturns function is called. Note that expressions now MUST call the AndReturns function. An incompletely defined expression is no longer legal tender in the piglet world.

Automated the NuGet package management

Thanks to @orjansjoholm the NuGet packages are now built with psake and automated. I could not have done this myself, and it works a treat!

Other news

Check out this thing: A DCPU16 emulator and assembler. Uses Piglet as its parser for the assembly!

Tagged ,

A review of my Haskell adventure

So, I decided to teach myself something new. Something exotic. I wanted to try a new paradigm and a language that is different from those which I am very familiar with while still being a usable language. I chose Haskell and ended up coding pretty much the entire easter holiday since my girlfriend was sick and we cancelled most of the usual family dinners. This is a look back at my quite intensive experience over the weekend. It’s less about the language itself, and more about the process of trying to learn it.

Why Haskell?

Because it is a totally different language, and because it is a pure language. That is, you really can’t work around the fundamental paradigm of this functional language like you can in other languages. You pretty much have to get into a different mindset and think differently. It also has a strong community and a certain buzz around it. This review is obviously from a very personal perspective and full of soft things like opinions. Everything here is also written from the perspective of a windows user. If your opinions differ, please do comment!

Getting started

Haskell is pretty welcoming in this regard. I downloaded the Haskell platform, pressed install and it pretty much worked out of the box. By working I mean the compiler works and the cabal package installer works. This is a very good point, since if this would have taken all day I probably would have given up. Time is precious to me, and mucking about with the pure fundamentals isn’t what I’m interested in.

However, Haskell is lacking one thing in particular and it’s a really strong IDE. There is one, Leksah which I tried, but the auto-completion and syntax highlighting didn’t work like I wanted it to work. And by that I mean it would work pretty much exactly like ReSharper. I can’t stress this enough. A good IDE is an exploratory tool for a total beginner, it will give you useful error messages, it will suggest things which are correct and help you out. Haskell doesn’t have this. From what I gather many use Emacs and those sorts of editors. I’m sure they are good if you know how to use them, but a new editor as well as a new language is a bit much. I initially searched for a Visual Studio language extension, which apparently doesn’t exist. If this would exist it would have shortened my learning time considerably, taking away much frustration and wasted time.

In the end I ended up using Sublime text for editing Haskell, due to good syntax highlighting and a nice colour scheme.

Resources

There’s a lot! Seems that the converts are really eager to spread the Haskell gospel to the rest of us. A few stick out in particular, Learn you a haskell for great good and Real world haskell. They approach the learning differently, the first walks you through language features and shows you how to use them and the second one tries to do the practical thing. Also, I’d like to point to this post on stack overflow which really gives a nice clear path to follow.

I decided to do as the SO post said and do the 99 problems thing and keeping these two as a reference. Turns out this was a good idea. I’m learning by doing and doing these exercises was both fun and educational and sufficiently challenging to keep it interesting. Not all of the exercises are really suitable for Haskell learning and could do with a rewrite. Especially the random number exercises which are silly for a beginner to do in Haskell. Once this was exhausted I continued doing Project Euler problems, doing I think 15 of them.

I have mixed feeling about Project Euler. On one hand it’s really fun in a nerdy way. On the other hand the exercises quickly evolve into not coding together a solution for problem but either optimizing a solution to run fast enough to be practical or to mathematically analyze a problem in order to make it simpler. Optimizing Haskell is hard for a beginner, since the normal techniques won’t work or isn’t available at all, and doing the math analysis won’t make you better at Haskell.

The fabled community is very helpful indeed. Ask a question on the IRC channel and you’ll get answers in seconds. Ask on stack overflow and you get exhaustive answers immediately. At least to the beginner issues, I have no idea if this continues up to advanced stuff as well.

Hoogle is good, but a bit difficult to use for a total rookie. It increases in value all the time to me.

Making a project

When I realized that I was no longer learning Haskell from the Project Euler exercises I decided to go and do something on my own. My usual goto is graphics programming and this time I decided to do 2D-metaballs. This was a good exercise and took me most of a day to produce a working, if somewhat slow implementation that renders using the provided OpenGL bindings. The hardest thing here was the IO thing, especially since I didn’t (and still don’t) understand monads.

To a novice, Haskell feels like two languages. The functional one which is pretty easy to grasp and the monadic weird imperative language. A week has passed now and I think I understand a little bit more of why the code looks like it does, but it remained a mystery through most of the development. The finished project is on GitHub for the world to see and chuckle at. It runs slower than it should but I really don’t know why. I think it computes things too many times, but I don’t know how to optimize it yet.

A second project

I decided I wanted to do a raytracer, that draws a picture into a window. This proved hard but not for the reasons which I expected. Windows is apparently a second-class citizen in the Haskell world when it comes to libraries which requires interop with C. Haskell comes with a pretty brilliant package managed called cabal. In theory you should be able to just do cabal install whatever-package. This works as long as the package is pure Haskell. I really loathe multiple click installs, and to get SDL running is a joke on windows! Here is obviously work to be done to make this friendly to anyone.

I can spend stupid amounts of time struggling with the language itself, but I get pretty pissed off at fiddling with the environment for more than a few minutes. This is also, by the way, the reason I won’t administrate servers or even fiddle with their settings to any real lengths.

So, I decided to just write a picture to disk instead. This also proved somewhat difficult, again due to the missing IDE. I found a promising library, but I can’t seem to find instructions on how to actually use them. cabal will install it, but it is missing examples. And this is my current status so far, since the easter weekend is over. I’m sure the API usage is somewhere in Hackage, but hell if I know how to use that thing properly to figure out what calls are available and what parameters they expect.

Final thoughts

Haskell is a cool language and while it takes time to grok all the stuff in it. I’m not really qualified to talk about what is good and what is not in the language itself, but the purity of the thing is a very nice challenge and it will expand the horizon of any imperative oriented programmer out there. I like the language and the things it teaches me. It is definitely worth taking a weekend to do a test project. I also really like the compactness of the code, and the way the type system works. I have a long standing aversion to weakly typed languages, and Haskell is the antithesis of that.

Will I use it professionally? I’d like to do things in it, but I doubt that I’ll have the opportunity. While Haskell has come a long way in user friendlyness it still has a way to go in the tooling department for a typical windows user. The threshold for getting your feet wet and doing simple programs is low, but it feels like it is a steep step to doing something that is an actual real world thing in the windows-.NET-centric environments I work in. Can I get it to run easily in an IIS server or can I make windows applications or windows services easily in it? If the Haskell community could make those easily available to me the chances would be much improved.

And I hope they do improve, cause the feeling of coding Haskell when you have flow is one of the coolest things in the world. I got a little sense of it, and the power at your fingertips in this language makes you feel like a wizard.

Tagged , ,

Adding many entries to an Observable collection in a performance friendly way

I was writing a WPF application which was polling a lot of web services in order to display stuff in a list. The items change infrequently but required constant polling in order to make sure that they stayed the same. The problem appeared when they suddenly changed a lot. You see, the built-in Observable collection does not like when you toy around with it a lot and will send tons of events. So, while my app mostly chugged along with a few new entries an hour, sometimes it got a thousand in one fell swoop – causing unresponsiveness.

I googled around a bit, and found this
article
, that while good did not help my situation. Since I cannot know when I’ll receive many entries, and I’d like to have a simple interface I have improved on this solution a bit.

Behold, the DeferEventObservableCollection:

public class DeferEventObservableCollection : ObservableCollection
{
	private readonly List<NotifyCollectionChangedEventArgs> deferredEvents = new List<NotifyCollectionChangedEventArgs>();
	private bool hasQueuedDispatcherUpdate = false;
	private readonly int threshold;
	private object syncRoot = new object();

	public DeferEventObservableCollection(int threshold = 10)
	{
		this.threshold = threshold;
	}
	
	protected override void OnCollectionChanged(NotifyCollectionChangedEventArgs e)
	{
		lock (syncRoot)
		{
			deferredEvents.Add(e);
			if (!hasQueuedDispatcherUpdate)
			{
				Dispatcher.CurrentDispatcher.BeginInvoke(new Action(() =>
				{
					lock (syncRoot)
					{
						if (deferredEvents.Count > treshold)
						{
							base.OnCollectionChanged(
								new NotifyCollectionChangedEventArgs(NotifyCollectionChangedAction.Reset));
						}
						else
						{
							foreach (var ev in deferredEvents)
							{
								base.OnCollectionChanged(ev);
							}
						}
						deferredEvents.Clear();
						hasQueuedDispatcherUpdate = false;
					}
				}));
			}
			hasQueuedDispatcherUpdate = true;
		}
	}
}

What I have made is a collection that will seamlessly handle the issue of suddenly getting a lot of changes in a more graceful way. Adding an item or removing it will catch the event that the observable collection normally fires and queue it up on the dispatcher queue. Since you normally get your items added in the dispatcher thread this event will then be deferred to later on in the same chunk of dispatcher work (or the next chunk). Events keep piling up in this queue and once it’s time to fire the events the collection will evaluate the size of the event list. If it is above some threshold value it will replace the events in the queue with a single Reset event. You’ll need to experiment with the threshold to find the optimum number, the 10 is a raw guess, and it will vary on how many listeners you have on your collection.

Being the concurrency-aware programmer that I am, I have added some extra measure of thread safety so that it cannot go haywire. When emptying the event queue, if another thread should attempt to add a new even to the queue, the lock will prevent this from blowing up in your face.

This simple drop-in replacement changed the performance on my application by several orders of magnitude, and I hope you can find it as useful as I did in your project!

Edit: I fixed the locking, it no longer locks on this which is bad form. Instead it locks on a syncroot as it should. Thanks @pheiberg for pointing it out!

Tagged , , , , ,

New versions of Piglet and Regexvisualizer are out

It’s been a busy few weeks now, but I finally managed to get some OSS work done again. Here’s an overview of the new stuff

Piglet 1.1.0

Piglet has been updated with a few new features.

  • DFAs generated by piglet can now be minimized, and are minimized by default. This is in essence an optimization that will reduce Piglets memory footprint and make it run faster. As alwyas, this is a tradeoff and you can if you really want to turn off the minimization. The result will be a larger memory footprint but a faster parser generation time. The correctness of the thing remains the same.
  • Ability to get the graph string for the lexer has been added to the public API. Really a feature coming from the regex visualizer, it has been improved with a better API for access from outside the piglet internals. This has also been enhanced with new features, outlined below in the updates to the regex visualizer.
  • Changed test framework to NUnit. This decision was made since there had been some issues with MsTest, mainly that it screwed up building on a machine which did not have Visual Studio installed. This was not an acceptable situation so we decided to change it.

All in all, this release should be fully backwards compatible with earlier versions of Piglet. If you are using it and finding any problems, please do let me know. Piglet is, as always, available on NuGet and on GitHub

Regex visualizer

The regular expression visualizer has been given a new feature, making it really useful for debugging. It will now highlight given an input string which are the active states in both the NFA and the DFA. And if your expression isn’t matched, it will show you what part of your expression is matched and the last state where the machine was in a legal state. All in all this is turning out to be a pretty useful tool. I hope to do some work on the rendering soon, to avoid the pretty annoying flickering – going to look into canviz to turn the rendering into client-side instead of torturing google with constant requests.

Tagged ,

DFA state minimization

If you follow the generic instruction for converting a NFA to a DFA presented in my earlier post, the DFA you generate will work, but it will not be optimal. It will have too many states opposed to the bare minimum needed to do it’s job properly. This tends to happen more frequently if you give it an input of an expression that in some respect have redundancy. A prime example of this is the expression a+b+|ab. The |ab part is of course, totally useless. But the subset construction algorithm doesn’t know that.

To compare, here is the NFA generated by thompsons construction:

Here is the non-minimized DFA that the subset construction algorithm will generate:

And here is the minimum DFA required to do the job properly:

Quite a difference! So, how do we do this? There are several ways of accomplishing this, some more complex than others. I’m going to give you a working algorithm which admittedly isn’t the fastest. The faster versions are more complex, and would ruin the idea of explaining the algorithm.

For programmers

I’m willing to bet you’re like me, a programmer and not a mathematician. You can describe this algorithm in math terms to no end, and I’m willing to bet that most people are going to miss out on what is really not the most complex algorithm in the world. If you google this you’re going to get the algorithm described as some function which partitions states so that this and that is true. Perhaps this is jealousy on my part since I feel left out of the cool crowd with the greek symbols and all, but I think there are better ways of explaining this stuff.

The algorithm

It’s a bit backwards, but what we’re going to do is to assume that all states can be merged together and then rule out those that can’t. This is going to be represented by a table, where empty squares are those that we cannot tell apart. So, for now, this table looks like this:

+-----+
|s1|  |
+--+--+--+
|s2|  |  |
+--+--+--+--+
|s3|  |  |  |
+--+--+--+--+--+
|s4|  |  |  |  |
+--+--+--+--+--+
   |s0|s1|s2|s3|
   +--+--+--+--+

This table is going to be filled later on. There’s no reason for a full table, we only want the distinct pairs of states.

What we are interested in is to sort the states into groups that may be merged into a single state without altering the functionality of the DFA. So, what we really need to do is to find out how to find these state groups normally referred to as nondistinguishable states. The first rule of this is:

A state that is an accepting state may never be in the same group as a state which is not an accepting state.

So, in our graphs we may never merge a double-circle state with a single circle state. Let’s apply this to our table. We end up with this:

+-----+
|s1|  |
+--+--+--+
|s2|X |X |
+--+--+--+--+
|s3|  |  |X |
+--+--+--+--+--+
|s4|X |X |  |X |
+--+--+--+--+--+
   |s0|s1|s2|s3|
   +--+--+--+--+

We’re not out of the woods yet, still more work to do. There is another rule at work here. It’s easiest to express this in pseudocode. You will apply the algorithm repeatedly until the table stops changing. If you complete a full iteration without any entry getting filled in, you are done.

do {
for (int i = 0; i < numStates; ++i) 
{
  for (int j = i + 1; j < numStates; ++j)
  {
    if (table[i,j].IsEmpty) 
    {
      foreach (var inputsymbol in allValidInputs)
      {
        if (one of the states has a transition but the other one doesn't)
        {
          table[i,j] = X;
        }
        else if (both states have a transition) 
        {
          int iTarget = state at other end of transition from state i;
          int jTarget = state at other end of transition from state j;
          if (!table[iTarget, jTarget].IsEmpty)
          {
            table[i,j] = X;
          }
         }
        else
        {
           // For clarity, if none of the states have transitions
           // on this symbol, don't change the table
        }
      }
    }
  } 
}
} while (table.WasChanged)

What this does is to check if there is any significant difference by checking where the transitions go, and filling in the table as it goes along. Applying this changes the table to look like this:

+-----+
|s1|X |
+--+--+--+
|s2|X |X |
+--+--+--+--+
|s3|X |  |X |
+--+--+--+--+--+
|s4|X |X |  |X |
+--+--+--+--+--+
   |s0|s1|s2|s3|
   +--+--+--+--+

In this example, iterating the algorithm won’t change the table which means we are done. The states s1 and s3 can be merged into one state and s2 and s4 can be merged into another. To illustrate, here is the two nondistinct state groups marked out:

A final caveat is that you might get long chains of states. Say that s0 and s1 will be merged along with s1 and s2, which really means you need to merge s0, s1 and s2 into another state. This is pretty trivial, and I will not elaborate further on this, an example of this can be found in the class DFA of my parser library Piglet, which is available on GitHub.

As a final note, it can be proven that this algorithm generates the smallest possible DFA that accepts the given grammar. All in a few lines of code. Try the algorithm out yourself at the regex visualizer – make a redundant expression and uncheck the Minimize checkbox.

Tagged , , ,

A regular expression visualizer

Sometimes writing software gives you byproducts, and this is one. When implementing Piglet i ran into the issue that I couldn’t use the built in System.Text.Regex classes that I would have liked to use to avoid implementing another regular expression engine. No matter, so I built one of my own. Now when implementing one of those bad boys I figured that the best way of debugging them was to draw the same graphs that you find in the books – for the simple reason that getting an overview of a directed graph by investigating a bunch of references is tricky to say the least!

So, wise from experience I figured that you can’t spend too much time writing a good debugging facility, and it turned out quite nicely. So nice in fact that I decided to share this with the rest of the world.

It’s a simple tool that allows you to input your regular expression and see the deterministic and nondeterministic machines that implements your expression. Useful for understanding regular expressions and finding problems in your expressions.

Deployed on Appharbor, which was a truly pleasant experience indeed, it can be found here. Open sourced, if you feel it can be improved, please do so! The code is on github, as usual. The app itself is a teeny-tiny MVC application, with some javascript written by someone who is notoriously bad at it (me). If you find any issues, please do tell me using the communications channel of your choice.

Hope you find it useful!

Tagged , ,