Tuesday, October 9, 2012

Transpilation

CoffeeScript, TypeScript, and Google's Dart are languages which compile to JavaScript. How this works is not something I'd speculated on before, but Headspring's Patrick Lioi illuminated the process and made it seem less like magic in a talk I saw yesterday at the Austin .NET User's Group. Patrick's example was a language he wrote himself called Rook which compiles to C#. I had assumed this sort of thing outside of my grasp. As it turns out, this is something I could do myself with a whole lot of work and patience, although I can't imagine I have the tenacity Mr. Lioi exhibited for the plumbing required.

Your own invented syntax typed into notepad can serve as actual syntax, providing instructions to a compiler for how it should craft code in another language, in lieu of merely being a string of meaningless, seemingly random characters. There are four things a compiler must do:

  1. Lexing:
    The lexer arm of a compiler takes in a string and breaks it into a collection of the smallest possible string pieces in which every chuck has meaning. Regular expressions are going to do the heavy lifting here to find and make sense of things. An example line of Rook code is Print(evens.First()) and the lexer is going to have to be smart enough to separate this into Print(), evens, and .First(). One just has to slog through writing the rules. Naturally, the order one tries to make matches in code matters. One must try to find != before either ! or = or it will never be found. Things to remember about regular expressions, straight from one of Patrick's slides, are:
    • a* signifies zero or more matches for a
    • a+ signifies one or more matches for a
    • a|b|c allows for either one a, one b, or one c
    • abc signifies a followed by b followed by c
    • (a|b|c)*def is thus any number of the first three lowercase letters of the alphabet
                                        ...followed by d followed by e followed by f
     
  2. Parsing:
    The lexed content is then placed into a tree by a parser. Irony and ANTLR are examples of tools that do this. Naturally, you can write your own too and Patrick has written one called Parsley which he uses to parse Rook syntax. Everything gets separated into functions (operations are a type of function with a funny name) and keywords. Prioritizations will determine where in a tree a given function sits as a node and which might come before something else. For Rook, for example, the multiplication of two and three in 1 + 2 * 3 + 4 will be given priority (if no parenthesis are used to break up the numbers and suggest priority) because Parsley slates such as a rule. What lines of code comes first, second, and last obviously also biases where lexed items end up in the tree the parser makes.

     
  3. Type Checking:
    Here one assigns a type to each node of a language-to-be-compiled of the language-to-be-compiled-to. Patrick suggested looking into Hindley-Milner Type Inference (Google it!) for more on this. The compilation should fail in this step if the code handed in is just bad. The compiler safety stuff gets baked in here. Half the talk was on this stuff and some of the tricky challenges. Does the + operator concatenate two strings or add two integers together? You cannot tell from the operator itself. You have to look to its nodes.
     
  4. Transpilation:
    This is the act of walking an annotated tree and writing copy in the target programming language based upon what is found. In this step, a compiler would walk an annotated tree of Rook syntax (for example) and write C# (for example) elsewhere as it went. For this to work, we have to be able to ask every node "What is your type?" and get back an answer that makes sense in C#. That said, we should be there on the other side of the prior three steps.
     

At this event Zackary Geers mentioned that Austin's Microsoft Office should host the Austin .NET Users Group from this event going forward.

No comments:

Post a Comment