ANTLR and C#

I’ve just spent the day getting ANTLR set up to work within a Visual Studio 2010 solution, which wasn’t without pain. While there is quite a lot of (mostly useful) documentation, finding exactly what you want can be difficult, and the examples don’t necessarily work, which is always frustrating! Hopefully this will help someone else along the way. It’s written from my perspective and I’ll make no attempt to generalise it to other setups, but much of it will naturally carry over.

To get started, the following sources proved helpful:

The details in this post simply extend what is available in those documents.

First, let’s get Visual Studio 2010 set up and ready to work. The simplest way to get the ANTLR VS extensions is to open the Extension Manager (Tools -> Extension Manager) and search the online gallery for the following packages:

  • ANTLR Language Support
  • StringTemplate 4 Language Support

Install both, and you’ll notice that the Tunnel Vision Labs Extensibility Framework extension is installed automatically as well. If you prefer, you can download the extensions directly (links taken from (2)):

However, if you download from those links, not only are the versions out of date, but if you then try to update them using the Extension Manager, it throws an error saying that the VS Extensibility Framework is already installed, and it doesn’t seem to be able to update it.

With these installed, you’ll get syntax highlighting, dropdowns listing the parser and lexer rules (where the class and method dropdowns usually sit, respectively), and a set of templates for creating grammars, which you can find easily by typing “ANTLR” into the search box when adding a new item.

To get started, create a new console application project; call it AntlrTest if you like (the namespaces used later in this post assume that name).

Now you’ll need the C# port of ANTLR. At the time of writing, the latest version could be downloaded from http://www.tunnelvisionlabs.com/downloads/antlr/antlr-dotnet-csharp3bootstrap-3.3.1.7705.7z, which contains just the files you need to generate C# code from ANTLR grammars. (To make sure you’ve got the latest version, go to http://www.antlr.org/wiki/display/ANTLR3/Antlr3CSharpReleases and look for the bootstrap version.) Once you’ve downloaded the archive, extract it somewhere suitable within your solution. I keep references in a lib folder at the solution level, so I extracted everything to .\lib\Antlr relative to the solution folder; that is why the paths in the project file below go up one level from $(ProjectDir).

To integrate the code generation into your build, you’ll need to edit your project file manually. Unload the project (right-click the project, select Unload Project) and open it for editing (right-click the unloaded project and choose Edit AntlrTest.csproj, for example). Note: PowerCommands for Visual Studio 2010 can save you a click here, as it adds an Edit Project File command. Now look for the following line:

<Import Project="$(MSBuildToolsPath)\Microsoft.CSharp.targets" />

and below it, paste the following code:

<PropertyGroup>
  <AntlrBuildTaskPath>$(ProjectDir)..\lib\Antlr</AntlrBuildTaskPath>
  <AntlrToolPath>$(ProjectDir)..\lib\Antlr\Antlr3.exe</AntlrToolPath>
</PropertyGroup>
<Import Project="$(ProjectDir)..\lib\Antlr\Antlr3.targets" />

Obviously if you have your ANTLR files in a different directory to me, you’ll need to change the above paths accordingly. You can now go ahead and reload the project (right click, Reload Project) and if everything’s going smoothly the project should reload, without any noticeable changes. You’ll also need to add a reference to the ANTLR runtime assembly, which is located at .\lib\Antlr\Antlr3.Runtime.dll.

At this point, your project is ready to generate code using ANTLR and compile it into your assembly; the well-written Antlr3.targets file takes care of all that. However, if you want to write C# code in VS against the generated classes, rather than inside ANTLR grammar files, and have IntelliSense working, you’ll need to put in a bit more effort. The trouble is that the bundled targets file generates the C# code into the intermediate obj directory during the build. We want those generated files in our solution, but we don’t want to point at them inside a configuration-specific folder such as obj\x86\Debug, because that causes all kinds of problems when we change configurations. So we need to generate the files to a known location regardless of build configuration and link to them there. We also need to stop the ANTLR target from including the files automatically, since they’ll already be in our project and we don’t want to compile them twice.

Luckily, this has all been done for you by G. Richard Bellamy, who has updated the Antlr3.targets file for this purpose. Go to http://youtrack.jetbrains.net/issue/RSRP-265402, download the Antlr3.targets attachment (at the top of the page) and drop it into your Antlr folder (.\lib\Antlr), replacing the existing targets file. Now, whenever you build the project, the C# code generated from the ANTLR grammar file will be put in a folder called Generated alongside the grammar file.

If you want to change the name of this folder, open the Antlr3.targets file and replace every instance of .\\Generated with your preferred folder. It appears more than once because the folder is specified as the output location for generated files and is mentioned again later so that the redundant .tokens files can be removed from it.

Great, so let’s see all this in action. There’s a sample grammar in (1), which I’ve borrowed for this example with a few modifications. Start by adding a new item to the project and choosing ANTLR Combined Parser. Call it SimpleCalc.g3 (g3 means grammar 3, as we’re dealing with ANTLR v3 grammars here) and click Add. A combined grammar specifies rules for both lexing and parsing an input, so you’ll notice that three files are created for you:

  • SimpleCalc.g3 – the ANTLR grammar file itself
  • SimpleCalc.g3.lexer.cs – a partial class for adding functionality to the lexer
  • SimpleCalc.g3.parser.cs – a partial class for adding functionality to the parser

The partial classes let you extend the generated classes without writing C# code in your grammar file, so you get ordinary C# editing, IntelliSense and your own using directives. One other thing to note: if you look at the properties of SimpleCalc.g3, you’ll see that the Build Action is set to Antlr3 and the Custom Tool to MSBuild:Compile. The extension sets these properties for you, but if you ever add grammar files manually you’ll need to set them yourself. The Build Action means the Antlr3 target runs on the file whenever the project is built (so the generated files are created), and the Custom Tool setting, together with the carefully crafted targets file, means that whenever you save the grammar, the C# code is regenerated and IntelliSense is updated, which is rather nifty.

Open up SimpleCalc.g3, and you’ll see some skeleton code that the extension creates for you. For now, we’ll just overwrite this completely with our own example:

grammar SimpleCalc;

options {
    language=CSharp3;
}

tokens {
	PLUS 	= '+' ;
	MINUS	= '-' ;
	MULT	= '*' ;
	DIV	= '/' ;
}

@lexer::namespace { AntlrTest }
@parser::namespace { AntlrTest }

/*
 * Parsing rules
 */

public expr	: term ( ( PLUS | MINUS )  term )* ;

term	: factor ( ( MULT | DIV ) factor )* ;

factor	: NUMBER ;

/*
 * Lexing rules
 */

NUMBER	: (DIGIT)+ ;

WHITESPACE : ( '\t' | ' ' | '\r' | '\n'| '\u000C' )+ 	{ $channel = Hidden; } ;

fragment DIGIT	: '0'..'9' ;

Copy and paste that into your grammar file. One gotcha: blogs and wikis sometimes use unusual Unicode whitespace characters (typically non-breaking spaces), and when you paste into Visual Studio, which is Unicode-aware, those characters are preserved without you noticing. The trouble is that ANTLR can’t handle them, so when you build your project you’ll get an error similar to error(150): grammar file AntlrTest.g has no rules. To avoid this, it might be a good idea to paste the example into something that doesn’t preserve Unicode, e.g. Notepad, and copy it again from there. This caught me out when I first copied the example from (1), but now I can’t reproduce it!
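If you’d rather check a pasted grammar than launder it through Notepad, a few lines of C# can point at the offending characters. This is a minimal sketch of my own, not part of ANTLR or the VS extension: GrammarWhitespaceCheck and FindSuspectWhitespace are hypothetical names, and it assumes that any character char.IsWhiteSpace accepts, other than space, tab, carriage return, line feed, form feed and vertical tab, is a suspect.

```csharp
using System.Collections.Generic;

// Hypothetical helper, not part of ANTLR: reports whitespace characters that are
// not plain ASCII whitespace (e.g. U+00A0, the no-break space), with 1-based
// line and column numbers so you can find them in the grammar file.
public static class GrammarWhitespaceCheck
{
    public static List<string> FindSuspectWhitespace(string grammarText)
    {
        var problems = new List<string>();
        int line = 1, column = 0;

        foreach (char c in grammarText)
        {
            if (c == '\n')
            {
                line++;
                column = 0;
                continue;
            }
            column++;

            bool asciiWhitespace = c == ' ' || c == '\t' || c == '\r' || c == '\f' || c == '\v';
            if (char.IsWhiteSpace(c) && !asciiWhitespace)
            {
                problems.Add(string.Format("line {0}, column {1}: U+{2:X4}", line, column, (int)c));
            }
        }
        return problems;
    }
}
```

For example, passing File.ReadAllText("SimpleCalc.g3") to FindSuspectWhitespace gives an empty list for a clean grammar, and one entry per suspect character otherwise.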

It’s also worth noting at this point that:

  1. the grammar name has to be the same as the filename it’s stored in: if you change one, you must change the other, and
  2. the ordering of elements within the grammar file can be quite important: when ANTLR is parsing a grammar it can be picky about where elements are located.
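The first point is easy to check mechanically. The sketch below is hypothetical (GrammarNameCheck is not an ANTLR API): it reads the grammar declaration with a regular expression and compares the name with the file name minus its extension. It assumes the declaration starts its own line, and it compares case-sensitively to be on the safe side.

```csharp
using System;
using System.IO;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of ANTLR: checks that "grammar Name;" matches
// the file name without its extension (SimpleCalc.g3 -> SimpleCalc).
public static class GrammarNameCheck
{
    // Optional "lexer"/"parser"/"tree" prefix, then "grammar Name;" at the start of a line.
    private static readonly Regex Declaration = new Regex(
        @"^\s*(?:(?:lexer|parser|tree)\s+)?grammar\s+(\w+)\s*;",
        RegexOptions.Multiline);

    public static bool NameMatchesFile(string grammarText, string filePath)
    {
        Match match = Declaration.Match(grammarText);
        if (!match.Success)
        {
            return false; // no grammar declaration found
        }

        string fileName = Path.GetFileNameWithoutExtension(filePath);
        // Case-sensitive comparison (an assumption, erring on the strict side).
        return string.Equals(match.Groups[1].Value, fileName, StringComparison.Ordinal);
    }
}
```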

Having copied over that grammar, let’s look at the differences between that and the example shown at (1), and why I’ve made them.

  1. language changed to CSharp3 – the CSharp2 target is outdated
  2. Added namespaces for the lexer and parser – these are required so that our hand-written partial classes combine with the generated partial classes (the extension adds them to its skeleton grammar automatically)
  3. Removed the @members { ... } section – this will be explained further down the page…
  4. …as will the reason for adding the public modifier to the expr parser rule
  5. Change of case in $channel = Hidden; – the constant Hidden simply has a different name in C# to align with conventions (its value is 99 so you can also use that instead)
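To picture what item 5 actually does: the lexer still produces the WHITESPACE tokens, but puts them on the hidden channel (99) instead of the default channel (0). As I understand it, CommonTokenStream keeps hidden-channel tokens in its buffer but only hands the parser tokens on its own channel, which defaults to 0. The toy model below imitates that filtering; DemoToken and ChannelDemo are made-up names, not ANTLR runtime types.

```csharp
using System.Collections.Generic;
using System.Linq;

// Made-up token type for illustration; the ANTLR runtime has its own token classes.
public sealed class DemoToken
{
    public DemoToken(string text, int channel)
    {
        Text = text;
        Channel = channel;
    }

    public string Text { get; private set; }
    public int Channel { get; private set; }
}

public static class ChannelDemo
{
    public const int DefaultChannel = 0; // channel the parser reads by default
    public const int HiddenChannel = 99; // value of Hidden, as noted in item 5

    // Keeps only the tokens a parser reading the default channel would see;
    // hidden tokens such as WHITESPACE stay in the input sequence, just skipped.
    public static List<string> VisibleToParser(IEnumerable<DemoToken> tokens)
    {
        return tokens
            .Where(t => t.Channel == DefaultChannel)
            .Select(t => t.Text)
            .ToList();
    }
}
```

For "1 + 4", the lexer output would be five tokens, but only "1", "+" and "4" reach the parser rules.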

At this point, build your project. Hopefully it will succeed, and you can now add the generated files to your project so we can see them. Add an existing item to your project and look in the Generated folder we talked about earlier, where you should see two files: SimpleCalcLexer.cs and SimpleCalcParser.cs. Add these as links (click the arrow next to the Add button and choose Add As Link). It’s important that you add them as links, otherwise you won’t pick up any changes when the files are regenerated. With these files in the solution, you can have a poke around and see what ANTLR is doing for you; the ANTLR website has more documentation on exactly what is generated, so for now I’ll move on.

Back to that @members { ... } section I kicked out. While you can write C# code within the grammar file, which is then included in the generated class, it’s much better to do it in a separate class file (which is why our helper partial files exist). Case in point: if you copy the example from (1) and try to build it, it fails because the generated file is missing a vital using System; for Console, a mistake you’d be unlikely to make when editing a proper class file (you can also fix this by adding @header { using System; } to your grammar file, next to the @lexer::namespace and @parser::namespace actions). So I took the code from that section, modified it and put it into the Program.cs file that was added when we created the project:

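using System;
using Antlr.Runtime;

namespace AntlrTest
{
    class Program
    {
        public static void Main(string[] args)
        {
            const string input = "1+4/2";

            // The lexer turns characters into tokens
            var lex = new SimpleCalcLexer(new ANTLRStringStream(input));
            // Buffers the tokens; the parser only sees tokens on the default channel
            var tokens = new CommonTokenStream(lex);
            var parser = new SimpleCalcParser(tokens);

            // Run the start rule; errors are reported rather than thrown (see below)
            parser.expr();
        }
    }
}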

You’ll notice I removed the try ... catch block used in the example from (1). That example seems to be out of date: RecognitionExceptions are now caught within the generated code, so they never propagate up to this level. The errors are supposed to be output by the BaseRecognizer, but it looks like a TextWriter property needs to be set and presumably isn’t; I didn’t have time to work out why. For now, it’s easy to get errors printed to the console using our partial classes (a handy example of why they’re useful) by adding the following to both the lexer and the parser:

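// Add to SimpleCalcLexer (SimpleCalc.g3.lexer.cs) and SimpleCalcParser (SimpleCalc.g3.parser.cs);
// each file needs using System; for Console
public override void EmitErrorMessage(string msg)
{
    Console.WriteLine(msg);
}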

Copy all that code into the relevant places and build your project. I’ve changed the input to the lexer to be an ANTLRStringStream so we can easily play with it (it’s a file reader in the example from (1)). If everything’s gone to plan the build will be successful and you should be able to run the program… and nothing will happen. That means success! Why? Because all we’re doing here is attempting to parse the input, nothing more. Because our input is valid, nothing will be printed to the console, and so the program will terminate with seemingly nothing happening. Try replacing the input with something nonsensical (e.g. GEORGE) and watch as the parser spews out errors. Beautiful.
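Since the program only checks whether the input matches the grammar, it may help to see what the three parser rules amount to. The class below is a hand-written recursive-descent recognizer with one method per rule; it is an illustration only, not what ANTLR generates, and SimpleCalcRecognizer is a made-up name. Like expr in the grammar, which doesn’t end with EOF, it doesn’t check for trailing input.

```csharp
using System;

// Illustration only: a hand-written recursive-descent recognizer with one method
// per SimpleCalc rule. ANTLR's generated SimpleCalcParser looks quite different.
//   expr   : term ( ( PLUS | MINUS ) term )* ;
//   term   : factor ( ( MULT | DIV ) factor )* ;
//   factor : NUMBER ;        NUMBER : ('0'..'9')+ ;
public sealed class SimpleCalcRecognizer
{
    private readonly string input;
    private int pos;

    private SimpleCalcRecognizer(string input)
    {
        this.input = input;
    }

    // True if the start of the input matches expr; as in the grammar (no EOF),
    // trailing input is not checked.
    public static bool IsValid(string input)
    {
        var recognizer = new SimpleCalcRecognizer(input);
        try
        {
            recognizer.Expr();
            return true;
        }
        catch (FormatException)
        {
            return false;
        }
    }

    private void Expr()
    {
        Term();
        while (Peek() == '+' || Peek() == '-')
        {
            pos++;
            Term();
        }
    }

    private void Term()
    {
        Factor();
        while (Peek() == '*' || Peek() == '/')
        {
            pos++;
            Factor();
        }
    }

    private void Factor()
    {
        SkipWhitespace();
        int start = pos;
        while (pos < input.Length && input[pos] >= '0' && input[pos] <= '9')
        {
            pos++;
        }
        if (pos == start)
        {
            throw new FormatException(string.Format("expected NUMBER at position {0}", start));
        }
    }

    // Next non-whitespace character, or '\0' at the end of the input.
    private char Peek()
    {
        SkipWhitespace();
        return pos < input.Length ? input[pos] : '\0';
    }

    // Skips the same characters as the WHITESPACE rule (standing in for the hidden channel).
    private void SkipWhitespace()
    {
        while (pos < input.Length && " \t\r\n\f".IndexOf(input[pos]) >= 0)
        {
            pos++;
        }
    }
}
```

With this sketch, "1+4/2" is accepted and "GEORGE" is rejected, mirroring the silent success and the error output described above.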

Now to explain why I added that public keyword to the expr parser rule earlier on. The CSharp3 target introduced modifiers on parser rules, and by default all generated parser rule methods are private. If we’d left the code in the @members section of the grammar file, we wouldn’t have needed to change the modifier, because that code would have been copied into the generated class and could call the private method. However, since we’re calling it from outside that class, in our separate Program class, we need to make it public. Arguably it would have been better to add a public method to the partial parser class that calls the private rule, but I left it as it is as a nice example of how and why the modifiers might be used.
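The alternative mentioned above, leaving the rule private and exposing it through the hand-written partial class, relies on both partial declarations compiling into a single class, so one half can call the other half’s private members. The sketch below shows just that C# mechanism in one file; DemoCalcParser and its rule body are made-up stand-ins, since in the real project the first half would be ANTLR’s generated SimpleCalcParser.cs and the second half SimpleCalc.g3.parser.cs.

```csharp
// Stand-in for the generated half. In the real project this part would be
// ANTLR's SimpleCalcParser.cs; the name and rule body here are made up.
public partial class DemoCalcParser
{
    public int RuleInvocations { get; private set; }

    // Generated rule methods are private unless the grammar says otherwise.
    private void expr()
    {
        RuleInvocations++;
    }
}

// Hand-written half, playing the role of SimpleCalc.g3.parser.cs. Both
// declarations compile into one class, so this public method may call expr().
public partial class DemoCalcParser
{
    public void Parse()
    {
        expr();
    }
}
```

In the real project, Program would then call the public wrapper instead of expr(), and the public modifier in the grammar could be dropped.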

Phew. Now you should have an idea of how to get started using ANTLR grammars within C#. The above example doesn’t output anything (you’ll notice the expr() method returns void), so it’s not very useful, but if you add output=AST; to the options section of the grammar file (as in the skeleton grammars generated by the extension), the parser rules return a result object containing an abstract syntax tree of the parsed expression that you can play with. I think I’ve written enough for one day though.

5 Comments

  1. When I insert the it says the element PropertyGroup has invalid child element AntlrBuildTaskPath, and when I try to reload, it fails…
    Thanks for the replies

  2. I’m really glad this post has helped other people out, and is still getting views! Thanks Mark for adding more details. Unfortunately I’ve no idea if what I described would work in VS2005, is there any reason you’re tied to that version Dao? It is pretty old now after all 😉 Also, I think WordPress deleted some fairly vital element names from your error message which makes diagnosing it hard.

  3. Does this work in VS 2005? I got an error when following the steps. The error says:

    “Unable to read the project file ‘antlrTest1.csproj’.

    F:\antlr\antlrTest1\Reference\Antlr3.targets(81,3): The element beneath element is unrecognized.”

    Thanks for any comment!

  4. That was very, very useful. For anyone reading this, they should note that the RSRP-265402 ReSharper IntelliSense ANTLR support thread mentioned above has been updated and Sam Harwell has created a new Antlr.Target file since this post, with substantial differences from the G. Richard Bellamy version. I am starting my development using the G. Richard Bellamy version and using both ANTLR and ReSharper. All seems okay so far. Note that the G. Richard Bellamy version of Antlr.target does not include the settings for AbstractGrammarFiles. I don’t know enough yet to make any sensible comments on which file to use, or whether you need to merge these files, but I’ll dig around to find out and post another comment if I find out more.
