Piglet

Piglet 1.4 released

Another release of Piglet! It’s been some time since the last release, and here are the major new features. Get it on NuGet or fork away on Github.

Performance

Parser construction and and especially lexer construction speeds are vastly improved. The main feature here is improved lexing speed, especially when you are handling lange and overlapping character sets.

Useability

Prioritize constants in rule over expressions. When writing constants in rules they now receive higher priority than expressions previously written. This reduces the amount of expressions the user needs, and is the normal right thing to do anyway.

Unicode support is now improved. \w and \d now handles the same ranges as normal MS regular expressions.

Thread safety

Both the lexer and the parser are now thread safe. The lexer interface is slightly changed in order to accommodate this. Lexers now include the Tokenize method which is way better to read token streams than the previous for loop.

Piglet also has a new icon in nuGet!

Bug fixes

A few bug fixes has also been sneaked in.

For full details, check the github list of issues for release 1.4. Please report any and all problems or issues on GitHub, or why not ask a question on stack overflow? Make sure to tag it with piglet to make sure someone spots it.

Oct 25 2012

Leave a comment

Parsing, Piglet

Building a DSL with Piglet – Extending our DSL

In the last part we created a very basic DSL for use with turtle graphics. Today we’re going to extend the language with more capabilites and learn more about how to use Piglet and also incidentally a bit of compiler construction (nothing scary, I promise).

This is the DSL we ended up with last time:

pendown
move 50
rotate 90
move 50
rotate 90
move 50
rotate 90
move 50
penup
move -10
rotate 90
move 10
pendown
move 30
rotate 90
move 30
rotate 90
move 30
rotate 90
move 30

One obvious thing here is that it would be really nice if we could have variables. We could represent the constants with something else and perform calculations on them. An important thing is to consider the syntax, for a simple DSL we want something that is easy to parse. For this reason we’ll use a var keyword to make variable declarations and require all variables to start with a $. Let’s modify the existing program to include our new language features:

var $size = 50
var $innersize = 30
pendown
move $size
rotate 90
move $size
rotate 90
move $size
rotate 90
move $size
penup
move -10
rotate 90
move 10
pendown
move $innersize
rotate 90
move $innersize
rotate 90
move $innersize
rotate 90
move $innersize

Looking at the new DSL, we note that there is now a new legal statement to make – the var declaration. So we need to extend statement with another rule. The statement list is already pretty long, so we separate it out into another rule:

// Runtime stuff
var variables = new Dictionary<string, int>();

var variableIdentifier = configurator.Expression();
variableIdentifier.ThatMatches("$[A-Za-z0-9]+").AndReturns(f => f.Substring(1));

variableDeclaration.IsMadeUp.By("var")
    .Followed.By(variableIdentifier).As("Name")
    .Followed.By("=")
    .Followed.By<int>().As("InitialValue").WhenFound(f =>
    {
        variables.Add(f.Name, f.InitialValue);
        return null;
    });

Since we want something that matches not a single constant string, we need to make an expression. Expressions are made using the configuration.Expression() method. You give it a regex and you get back an object. You also need to make a function to return a value for the expression. What we want is the variable name without the $ at the start.

To keep track of what variables we have and what values they have, we add them to a dictionary of strings and integers which we then can reference.

We add this new rule to our choices for statementList by adding this to the end:

.Or.By(variableDeclaration)

Note that our rule has just a single component we do not need to add a WhenFound method. It will automatically return the value of the function returned by the single value. Not that we’re using the values returned yet, but we will.

Continuing on, we want to use the variables for the rest of the commands. In order to do that we need to make another rule. Let’s call it expression. For now, a valid expression is either a constant int value, or a variable name. We then use the new rule instead of plain ints for the move and rotate commands. Here’s the full listing that supports the language that we set out to support so far:

// Runtime stuff
var turtle = new Turtle(canvas1);
var variables = new Dictionary<string, int>();

// Parser configurator
var configurator = ParserFactory.Fluent();
var statementList = configurator.Rule();
var statement = configurator.Rule();
var variableDeclaration = configurator.Rule();
var expression = configurator.Rule();

var variableIdentifier = configurator.Expression();
variableIdentifier.ThatMatches("$[A-Za-z0-9]+").AndReturns(f => f.Substring(1));

variableDeclaration.IsMadeUp.By("var")
    .Followed.By(variableIdentifier).As("Name")
    .Followed.By("=")
    .Followed.By<int>().As("InitialValue").WhenFound(f =>
    {
        variables.Add(f.Name, f.InitialValue);
        return null;
    });

expression.IsMadeUp.By<int>()
    .Or.By(variableIdentifier).As("Variable").WhenFound(f => variables[f.Variable]);

statementList.IsMadeUp.ByListOf(statement);
statement.IsMadeUp.By("pendown").WhenFound(f =>
{
    turtle.PenDown = true;
    return null;
})
    .Or.By("penup").WhenFound(f =>
    {
        turtle.PenDown = false;
        return null;
    })
    .Or.By("move").Followed.By(expression).As("Distance").WhenFound(f =>
    {
        turtle.Move(f.Distance);
        return null;
    })
    .Or.By("rotate").Followed.By(expression).As("Angle").WhenFound(f =>
    {
        turtle.Rotate(f.Angle);
        return null;
    })
    .Or.By(variableDeclaration);

A few interesting possibilities here. It’s perfectly acceptable to replace the int for the variable declaration. In fact, it’s a pretty good idea. This enables us to write things like this:

var $foo = 42
var $bar = $foo

What we really want however is to be able to write this:

var $size = 50
var $spacing = 10
var $innersize = $size - $spacing * 2

We’re halfway there already. What is needed is to create some more rules for how an expression can be made. We start simple, we allow only a single addition. Separate what is today an expression into a new rule called term, and make a new rule for additionExpressions.

expression.IsMadeUp.By(additionExpression);

additionExpression.IsMadeUp.By(term).As("First").Followed.By("+").Followed.By(term).As("Second").WhenFound(f => f.First + f.Second)
    .Or.By(term);
term.IsMadeUp.By<int>()
    .Or.By(variableIdentifier).As("Variable").WhenFound(f => variables[f.Variable]);

So, an addition expression is either an add, or a single lone value. This works fine as long as you dont try something like var $a = $b + $c + 100. This is easily fixable by changing the first term into an additionExpression. We are wiring the rule to itself! And like magic we can have as many additions in a row as we’d like. It’s trivial to add a subtraction expression as well, duplicate the rule, and change the operator and the WhenFound function to subtract instead of add (renaming to addSub instead to reflect the new functionality).

addSub.IsMadeUp.By(additionExpression).As("First").Followed.By("+").Followed.By(term).As("Second").WhenFound(f => f.First + f.Second)
                    .Or.By(additionExpression).As("First").Followed.By("-").Followed.By(term).As("Second").WhenFound(f => f.First - f.Second)
                    .Or.By(term);

However, if we add a multiplication and division here things will start to get strange. Everything will be evaluated left to right, so when writing var $a = 2 + 3 * 10 $a will have the value 60 instead of the expected 32. In order to solve this we need to redefine term. A term will now be a multiplication or division expression or a single factor that is a constant. We also wire this rule to itself like the addition rule so we can do many multiplications in a row. The expression grammar now looks like this:

expression.IsMadeUp.By(addSub);

addSub.IsMadeUp.By(addSub).As("First").Followed.By("+").Followed.By(mulDiv).As("Second").WhenFound(f => f.First + f.Second)
    .Or.By(addSub).As("First").Followed.By("-").Followed.By(mulDiv).As("Second").WhenFound(f => f.First - f.Second)
    .Or.By(mulDiv);

mulDiv.IsMadeUp.By(mulDiv).As("First").Followed.By("*").Followed.By(factor).As("Second").WhenFound(f => f.First * f.Second)
    .Or.By(mulDiv).As("First").Followed.By("/").Followed.By(factor).As("Second").WhenFound(f => f.First / f.Second)
    .Or.By(factor);

factor.IsMadeUp.By<int>()
    .Or.By(variableIdentifier).As("Variable").WhenFound(f => variables[f.Variable]);

As the final thing for today, let’s add support for parenthesis. They go in the factor rule. So, a factor can be an entire expression wrapped in parenthesis. It’s almost a thing of magic. We wire the entire last rule back up to the start.

factor.IsMadeUp.By<int>()
    .Or.By(variableIdentifier).As("Variable").WhenFound(f => variables[f.Variable])
    .Or.By("(").Followed.By(expression).As("Expression").Followed.By(")").WhenFound(f => f.Expression);

This is where we’ll end for this part. We have a fully functional expression parser. You can use expressions for all the commands that previously only took integer values, and you can assign values to variables. A few things for the adventurous to try:

Add variable assignment. Make set $foo = 1+3*$bar a legal, working piece of code
Add support for the modulus operator
Add support for unary minus. var $foo = -$barM work. This is a bit tricky, and don’t be afraid if you get some scary exceptions from Piglet

Next time, we’ll make the DSL a bit more fun. We’ll add some basic flow control! Code on github is updated with the latest version

Tagged c++, dsl, parsing, piglet

Oct 24 2012

2 Comments

Parsing, Piglet

Using Piglet to create a DSL – Setting the stage for turtles

Piglet has quite a few uses, and a case where it shines in particular is in the area of domain specific languages. So, in order to compliment the admittedly scarce tutorials on how to actually use Piglet, I intend to write a short little series on how to make a domain specific language using Piglet. Let’s get straight to it.

Enter the Turtle

I’m going to go for a classic educational DSL here, Turtle graphics. Because everyone like turtles, right? Basically, it’s a way of drawing graphics that could be imagined as if you had a (very obedient) turtle with a paintbrush attached to the back of it’s shell. Masterfully trained, it responds to three commands:

Put the brush up or down
Move forward or backward a certain number of steps
Rotate a certain number of degrees

I have made a little WPF application that implements this behaviour. Eschewing all the WPF junk, here’s the turtle itself:

    public class Turtle
    {
        private readonly Canvas canvas;
        private double angle;
        private double x;
        private double y;

        public Turtle(Canvas canvas)
        {
            this.canvas = canvas;
            x = canvas.ActualWidth/2;
            y = canvas.ActualHeight/2;
        }

        public bool PenDown { get; set; }

        public void Move(double distance)
        {
            var oldX = x;
            var oldY = y;

            double rads = angle/360.0*Math.PI*2.0;
            x += Math.Sin(rads)*distance;
            y += Math.Cos(rads)*distance;

            if (PenDown)
            {
                canvas.Children.Add(new Line { X1 = oldX, Y1 = oldY, X2 = x, Y2 = y, Stroke = new SolidColorBrush(Color.FromRgb(0, 0, 0))});
            }
        }

        public void Rotate(float a)
        {
            angle += a;
        }
    }

Nothing fancy, unless you’ve forgotten all about basic trigonometry. It takes a canvas and when moving it leaves trails if the pen is set to down. Using the turtle is straightforward. This program draws a twenty-sided polygon:

var turtle = new Turtle(canvas);          
turtle.Rotate(90);

turtle.PenDown = true;

for (int i = 0; i < 20; ++i)
{
   turtle.Rotate(360/20);
   turtle.Move(20);
}

It’s fun drawing with the turtle, but cumbersome since you’d have to recompile every time you want to change the shape. What we would really want is to describe it in a structured way using text. Sort of like a mini-programming language. LOGO would be the classic for turtle programming, but I feel it would be more fun to make our own little language for this.

Turtleese

When making a DSL or full blown language, in any tool, it’s usually easiest to start small and work your way up. To get the ball rolling, or turtle walking, let’s make a language that contains only the basic commands understood by the turtle. A sample piece of this rudimentary turtleese looks like this:

pendown
move 50
rotate 90
move 50
rotate 90
move 50
rotate 90
move 50
penup
move -10
rotate 90
move 10
pendown
move 30
rotate 90
move 30
rotate 90
move 30
rotate 90
move 30

This program should make the turtle draw two concentric squares, 10 units apart. It’s really verbose, but it’s a start that we can improve on. If we look at the structure of the program, we can see that this is made up of list of statements. This is the key to formulate a grammar for parsing the language. Piglet gives us two options for configuring a parser, a fluent and a more technical interface. I’ll use the fluent interface for this series, maybe in the end provide an equivalent grammar in the technical interface. In the end, it’ll serve to deepen the knowledge of how grammars actually work, though you should have a pretty good idea when we actually get there.

Here’s the entire thing:

var turtle = new Turtle(canvas1);

var configurator = ParserFactory.Fluent();
var statementList = configurator.Rule();
var statement = configurator.Rule();

statementList.IsMadeUp.ByListOf(statement);
statement.IsMadeUp.By("pendown").WhenFound(f =>
    {
        turtle.PenDown = true;
        return null;
    })
    .Or.By("penup").WhenFound(f =>
    {
        turtle.PenDown = false;
        return null;
    })
    .Or.By("move").Followed.By<int>().As("Distance").WhenFound(f =>
    {
        turtle.Move(f.Distance);
        return null;
    })
    .Or.By("rotate").Followed.By<int>().As("Angle").WhenFound(f =>
    {
        turtle.Rotate(f.Angle);
        return null;
    });
var parser = configurator.CreateParser();
parser.Parse(code.Text);

Going through it line by line. First we make a configurator object. This is the object that we’ll use to make the parser. Then we make two rules, statementList and statement. A thing of huge importance here. The order of declaring these is vitally important. You see, if you’d make this in the wrong order it would just try to find a single statement. A parser constructed by Piglet must be able to parse everything you give it down to a single rule. For now, the single rule is the statement list. Usually it’s something like “program” for a programming language or translation unit or something to that effect.

Moving on, we declare that a statementList unsurprisingly is made up by a list of statement. This is a reference to another rule.

A statement is declared as one of the four commands that the turtle understands. Following each possibility there is a lambda that gets executed when the parser has recognized that a rule is to be applied. For now we make the turtle do the same thing as the command says. There is a need to return something from the parse function, we ignore that for now and return null. Later on we’ll revisit this and find out why you want to return things from rules.

The move and rotate are interesting, since they have a parameter which is an integer value. We’ll need to find out the value of this parameter. In Piglet, this requires you to give the parameter a name. This name then becomes accessible on the dynamic object that is the input to the WhenFound function. Integers are simple types, so Piglet has a built-in function for recognizing them.

Calling parser.Parse with the code causes to the turtle to do what you wanted it to. The program also gives some helpful hints for when you’ve confused our poor reptilian friend, you get this functionality for free when using Piglet.

The full source code for this tutorial is found on GitHub: TurtlePig. Feel free to fork and mess about with it. The same repo will be updated once this tutorial continues, so you might find a slightly different version. I’ll figure a way to keep all versions accessible.

Next time, we’ll take a look at making this language a lot more capable – step by step.

Tagged c++, dsl, parsing, piglet

May 17 2012

Leave a comment

Piglet

Piglet 1.3.0 is out, now with unicode support!

Piglet has been updated today, with version 1.3.0. Here’s a list of the most exiting changes:

Unicode support

You can now use the full character range available in your regular expressions and parser tokens. This means that the parser will correctly be able to lex things such as Arabic, Chinese and Japanese. When using regular expressions for these, all the normal rules apply but the characters will not be included in any of the shorthand notation. For instance, the traditional Japanese numeral Kanjis are not part of the \d construct.

Nothing in existing code needs to be altered to enable this support. The runtime of the lexer has been slightly altered and is very slightly slower, but it should not even be noticeable.

Choice of lexer runtime

The most costly thing by far in the parser and lexer construction algorithms is the lexer table compression. Though this has been alleviated somewhat by the unicode functionality which actually served to reduce the size of the lexing tables, it can still be quite expensive.

If a faster construction time but a slower lexer is desired, you now have other options. When constructing, set the LexerRuntime in the LexerSettings variable of your ParserConfigurator. Or if constructing just a lexer with no accompanying parser, set the LexerRuntime property.

The available values are:

Tabular. Tabular is the slowest to construct but the fastest to run. Lexers built this way will use an internal table to perform lookups. Use this method if you only construct your lexer once and reuse it continually or parsing very large texts. Time complexity is O(1) – regardless of input size or grammar size. Memory usage is constant, but might incur a larger memory use than other methods for small grammars.
Nfa. Nfa means that the lexer will run as a non-finite automata. This method of constructing is VERY fast but slower to run. Also, the lexing performance is not linear and will vary based on the configuration of your grammar. Initially uses less memory than a tabular approach, but might increase memory usage as the lexing proceeds.
Dfa. Runs the lexing algorithm as a deterministic finite automata. This method of construction is slower than NFA, but faster than Tabular. It runs in finite memory and has a complexity of O(1) but a slower run time and more memory usage than a tabular lexer. It provides a middle road between the two other options.

The tabular option is still the default option.

Hope you find it useful, and please report and bugs or problems either directly to me or file an issue on github.

Apr 24 2012

Leave a comment

Piglet

Piglet 1.2.2 released

Well, honestly, 1.2.1 was released too but I forgot to add one of the features that I was going to add to it, so there was two releases today.

Either way, here’s whats new:

Support for escaping inside character classes

You can now use escaped characters inside the character classes. That is, regular expressions like [\d\s] will now result in the correct expected expression. This also enables you to escape the – and ^ sign inside the character class which previously was not possible.

Support for ignoring expressions in character classes

For those purposes where you want the lexer to silently drop things instead of generating tokens. Use configurator.Ignore(expression) to ignore it. For instance, this expression ignores a C# style comment: configurator.Ignore(@”//[^\n]*\n”);

As always, please do report any issues or questions, I’m happy to take a look at it!

Tagged piglet, release

Apr 19 2012

Leave a comment

Piglet

Piglet 1.2.0 released

A new version of Piglet – the fluent parser generator is now out. Here’s a bit of what has been fixed, and how you use those new features in your project.

Added error recovery and reporting to fluent parser configuration

Piglet can now recover from errors and continue parsing in a controlled manner when using the fluent configuration. This is done by the very special Error token. What you do is that when defining a rule, instead of the token you want to use, you use the Error token. In case of an error this token will match instead and all subsequent tokens will be skipped until the next legal token is matched.

When a rule containing the Error token has been found, you will find the Error variable filled in in the corresponding WhenFound clause.

Added token precedence to fluent configuration

You can now specify precedence groups for tokens using the LeftAssociative, RightAssociative and NonAssociative methods found in the fluent configurator object.

Pass one of these methods a group of tokens to declare them to be of equal precedence. Literals or expressions are legal input.

Completed XML documentation to include every method

No more empty intellisense! There should be somewhat helpful text everywhere now.

Fixed bug with escaped ?

\? was not handled correctly when used as an expression. It is now.

Fixed bug with numbered repetition

Numbered repetition in regular expressions now supports the standard {x,y} syntax in addition to the {x:y} syntax that I apparently implemented originally. The old syntax still works.

Also now, the {x,} syntax is supported, which Piglet didn’t support before.

Fixed bug with possible wrong order of defined expressions for fluent configuration

The terminal wasn’t immediately created when a fluent expression was declared. It now becomes created at the moment when the AndReturns function is called. Note that expressions now MUST call the AndReturns function. An incompletely defined expression is no longer legal tender in the piglet world.

Automated the NuGet package management

Thanks to @orjansjoholm the NuGet packages are now built with psake and automated. I could not have done this myself, and it works a treat!

Other news

Check out this thing: A DCPU16 emulator and assembler. Uses Piglet as its parser for the assembly!

Tagged new release, piglet

Apr 01 2012

Leave a comment

Piglet, Tools

New versions of Piglet and Regexvisualizer are out

It’s been a busy few weeks now, but I finally managed to get some OSS work done again. Here’s an overview of the new stuff

Piglet 1.1.0

Piglet has been updated with a few new features.

DFAs generated by piglet can now be minimized, and are minimized by default. This is in essence an optimization that will reduce Piglets memory footprint and make it run faster. As alwyas, this is a tradeoff and you can if you really want to turn off the minimization. The result will be a larger memory footprint but a faster parser generation time. The correctness of the thing remains the same.
Ability to get the graph string for the lexer has been added to the public API. Really a feature coming from the regex visualizer, it has been improved with a better API for access from outside the piglet internals. This has also been enhanced with new features, outlined below in the updates to the regex visualizer.
Changed test framework to NUnit. This decision was made since there had been some issues with MsTest, mainly that it screwed up building on a machine which did not have Visual Studio installed. This was not an acceptable situation so we decided to change it.

All in all, this release should be fully backwards compatible with earlier versions of Piglet. If you are using it and finding any problems, please do let me know. Piglet is, as always, available on NuGet and on GitHub

Regex visualizer

The regular expression visualizer has been given a new feature, making it really useful for debugging. It will now highlight given an input string which are the active states in both the NFA and the DFA. And if your expression isn’t matched, it will show you what part of your expression is matched and the last state where the machine was in a legal state. All in all this is turning out to be a pretty useful tool. I hope to do some work on the rendering soon, to avoid the pretty annoying flickering – going to look into canviz to turn the rendering into client-side instead of torturing google with constant requests.

Tagged piglet, regular expressions

Mar 13 2012

4 Comments

Parsing, Piglet

Introducing Piglet

This is an introductory post for my latest creation, Piglet the friendly parsing and lexing tool. As so much software Piglet is written out of both a sense of satisfaction in solving a difficult problem, and seeing a void where my piece of software might be useful to someone. Piglet is open sourced using the MIT license. This post is intended to compliment the readme on github to form a better picture of why I have written this library and how it can be used.

Why another parser generator?!

Because the other parser generators are too big for easy tasks, and harder to integrate into a project. They are also highly technical and hard to understand for someone who isn’t genuinely interested in old-school comp.sci.

The purpose is for you to have a tool at hand for parsing tasks that are smaller than full blown language construction efforts. Piglet tries to bridge the gap between (ab)using regular expressions and going in guns blazing with a large parsing toolset. It is not going to replace ANTLR or any of those tools, though you certainly can parse larger grammars with Piglet. This is the tool to use if you want a low-treshold, easy to use tool for any context free data.

Code based configuration

Piglet is, in sharp contrast to most other generators configurable in code. You create your parser and use it as naturally as you configure any other object. It has about the same functionality as the yacc/bison family of tools and generates similar parsers. This is an important point because if your parser is configured using a separate input file or even a completely separate tool you are always going to have a distance between the running code and the parser. The parser generators also sometimes generate fairly incomprehensible code. If you have a small parsing task, are you really going to use a generator or are you going to roll your own parser?

There are some tradeoffs in this strategy. Obviously the parser generation is going to be done at runtimes. However, the parsers are reentrant to the construction should only need to be done once. It also enables you to be able to use lambdas as actions when rules are followed, which also means that Piglet can construct type-safe parsers!

Ease of use

Unfortunately parsing is a bit complex and it’s usually required to know a bit about context free grammars to be able to construct a parser. Piglet tries to get around this problem by introducing a fluent configuration interface.

It’s easier to show it straight up. This is a full blown JSON parser. The only thing missing here is really only details such as the ability to write 0x notation for hexadecimal and exponent notation for numbers. There rest is all there.

// Create a configuration object. This object generates all the configuration needed for the parser.
var config = ParserFactory.Fluent();

// Create rules. The first rule is the MAIN rule, the rule that everything must be able
// to condense down to. For JSON this is a single object, since a JSON string is always
// just one object.
var jsonObject = config.Rule();

// Used to represent an element of an object
// which is something like "elementname" : some_value
var jsonElement = config.Rule();

// This represents a value of a json element
var jsonValue = config.Rule();

// This represents an array of values
var jsonArray = config.Rule();

// Now we declare what a jsonObject is made up of
// Literals are found in quotes. Interesting parts that we are interested 
// in are named using the As clause, which makes them accessible in the 
// .WhenFound. The result of WhenFound is returned to the caller.
jsonObject.IsMadeUp.By("{")
		  .Followed.ByListOf<JsonElement>(jsonElement).As("ElementList").ThatIs.SeparatedBy(",").Optional
		  .Followed.By("}")
	.WhenFound( o => new JsonObject { Elements = o.ElementList } );

// Declares what an element is made up of. Note that the rule above uses the
// jsonElement before what it is made of is declared. This is a crucial part
// of parsing. This bit has two interesting named pieces "Name" and "Value".
// Since each bit has a value, this gets assigned to the o parameter to the WhenFound lambda.
jsonElement.IsMadeUp.By(config.QuotedString).As("Name")
		   .Followed.By(":")
		   .Followed.By(jsonValue).As("Value")
	.WhenFound( o => new JsonElement { Name = o.Name, Value = o.Value } );

// A jsonValue has several interpretations, separated by Or clauses.
// Predefined parsing is found for simple types. Note the recursive use
// of jsonObject, values can be full objects themselves! There is no need
// for .WhenFound clauses for single part rules, they
// will always return the value of the single part.
jsonValue.IsMadeUp.By(config.QuotedString)
	.Or.By<int>()
	.Or.By<double>()
	.Or.By(jsonObject)
	.Or.By(jsonArray)
	.Or.By<bool>()
	.Or.By("null").WhenFound(o => null); // Need to specify, since the default will return the string "null"

// This rule could have been merged into the jsonValue rule, as its 
// own .Or.By clause. It's separated for readability only.
jsonArray.IsMadeUp.By("[")
		 .Followed.ByListOf(jsonValue).As("Values").ThatIs.SeparatedBy(",").Optional
		 .Followed.By("]")
	   .WhenFound(o => o.Values);

// When the configuratin is completed - create the parser!
// If the parser creation is unsuccessful, this throws the
// friendliest exception possible to help you find the issue.
var parser = config.CreateParser();

// Here is how you use it!
var jObject = (JsonObject)parser.Parse(
	@"{ 
		 ""Property1"":""va\""lue"", 
		 ""IntegerProperty"" : 1234, 
		 ""array"":[1,2,3,4,5],
		 ""another_object"" : {
			""another_property"":13.37
		 },
		 ""empty_object"" : {
			
		 }
	}");

There is also a less verbose technical interface, which right now includes more functions than the fluent one does. In the interest of brevity in this post I’m not going to describe it in detail. The features that currently only exists in the tech interface are context dependent precedence, token precedence, panic mode error recovery and type specific parser generation (the fluent parsers will all generate an object as an end result). These should all be upcoming features in further versions for the fluent interface.

The fluent configuration interface is right now in a working state, but can certainly use more functions! Common tasks should be easy to accomplish while harder tasks may require you to fall back on the technical configuration. As an exciting preview of things to come, I’m currently working on an embedded domain specific language generator (made using Piglet) called Snout, which when finished should help the maintainability of the fluent interface.

Piglet is totally dependency free, and written in C# 4. The code is well commented and understandable. If you are looking for code that reasonably explains the parser generation algorithm, Piglet is a sensible choice.

In technical terms, Piglet generates LALR(1) parsers – mainly because it’s an easier way of expressing your grammar since you do not require left factoring which is a pretty difficult concept to explain.

More examples

I’ve written a Demo project that is available in the main github repository which contains a few demos. The unit tests for Piglet are also quite extensive and are suitable for learning. I’m hoping to find more time to write a detailed tutorial on how to go from a given format into a complete parser. Suggestions for tutorials and articles are most welcome.

Getting your hands on it

Piglet lives on GitHub if you want the source code and demos. It also lives on NuGet if you just want the binaries.

Contributing

You are most welcome, regardless of skill level. There’s lots of stuff to do! Or just fork it and go to town on your own. I would be flattered if you did.

And if you can’t program, Piglet could use a logo. I’m thinking a piglet would make an appropriate one 🙂

Tagged .NET, c++, open source, piglet

Binary sculpting

Making software, one bit at a time

Category Archives: Piglet

Piglet 1.4 released

Performance

Useability

Thread safety

Bug fixes

Building a DSL with Piglet – Extending our DSL

Using Piglet to create a DSL – Setting the stage for turtles

Enter the Turtle

Turtleese

Piglet 1.3.0 is out, now with unicode support!

Unicode support

Choice of lexer runtime

Piglet 1.2.2 released

Piglet 1.2.0 released

Added error recovery and reporting to fluent parser configuration

Added token precedence to fluent configuration

Completed XML documentation to include every method

Fixed bug with escaped ?

Fixed bug with numbered repetition

Fixed bug with possible wrong order of defined expressions for fluent configuration

Automated the NuGet package management

Other news

New versions of Piglet and Regexvisualizer are out

Piglet 1.1.0

Regex visualizer

Introducing Piglet

Why another parser generator?!

Code based configuration

Ease of use

More examples

Getting your hands on it

Contributing