Surprisingly Difficult to Parse a Character

by Justin Chase 9. January 2010 05:52

I’ve been working on a custom variant of an OMeta parser for a couple of weeks now. It’s coming along pretty well. I think I’ve overcome most of the major hurdles, and now I’m going through what I currently have, cleaning it up and getting it to handle the edge cases I need.

Just now I was working on the grammar for parsing a character and realized how hard it really is. It sounds trivial; after all, it’s just two single quotes and a character, right? Wrong. Here’s my current grammar:

CharacterLiteralToken
    = '\'' '\\' 'u' Hex#4 '\''
    | '\'' '\\' 'U' Hex#8 '\''
    | '\'' '\\' 'x' Hex Hex? Hex? Hex? '\''
    | '\'' '\\' ('\'' | '\"' | '\\' | '0' | 'a' | 'b' | 'f' | 'n' | 'r' | 't' | 'v') '\''
    | '\'' '\u0000'..'\uffff' '\'';

It turns out that you have to account for a multitude of escape characters as well as escaped Unicode literals. I didn’t want to have to implement this, but as you can see, the last rule, which simply matches every character under the sun, needed it.

This will match:

  • '\u0000'
  • '\U00000000'
  • '\x0', '\x00', '\x000', '\x0000'
  • '\'', '\"', '\\', '\0', '\a', '\b', '\f', '\n', '\r', '\t', '\v'
  • 'a' …
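
For comparison, here is a rough sketch of the same rules as a Python regular expression. It is not part of the OMeta grammar or MetaSharp; the names CHAR_LITERAL and is_char_literal are made up for illustration, and it assumes that Hex means a single digit from 0-9, a-f or A-F, which the grammar above does not spell out.

```python
import re

# Hypothetical regex translation of the CharacterLiteralToken grammar.
# Assumption: Hex is one of 0-9, a-f, A-F.
CHAR_LITERAL = re.compile(
    r"""'(?:
        \\u[0-9A-Fa-f]{4}        # backslash, u, exactly 4 hex digits
      | \\U[0-9A-Fa-f]{8}        # backslash, U, exactly 8 hex digits
      | \\x[0-9A-Fa-f]{1,4}      # backslash, x, 1 to 4 hex digits
      | \\['"\\0abfnrtv]         # simple one-character escapes
      | [\u0000-\uffff]          # any single character up to U+FFFF
    )'""",
    re.VERBOSE,
)


def is_char_literal(text: str) -> bool:
    """Return True if the whole string is one character literal."""
    return CHAR_LITERAL.fullmatch(text) is not None


if __name__ == "__main__":
    samples = [r"'\u0000'", r"'\U00000000'", r"'\x0'", "'\\n'", "'a'", "'ab'"]
    for sample in samples:
        print(sample, is_char_literal(sample))
```

As in the grammar, the catch-all alternative comes last, so the escape forms are tried first. One thing the sketch makes easy to check is that the catch-all also accepts a bare quote or a bare backslash between the delimiters, so ''' is accepted by both the regex and the last grammar rule as written.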

Next I get to do the string parser… that should be even more interesting.

Tags:

DSL | MetaSharp | languages

Pattern Calculus: Computing with Functions and Structures

by Justin Chase 16. December 2009 07:13

I haven’t read this book yet but I am putting it to the top of my queue after reading the first couple parts of chapter 1:

http://books.google.com/books?id=Q_J4Lnmfjx4C&lpg=PP1&dq=pattern%20calculus&pg=PP1#v=onepage&q=&f=false

 

Here is the abstract:

This book develops a new programming style, based on pattern matching, from pure calculus to typed calculus to programming language. It can be viewed as a sober technical development whose worth will be assessed in time by the programming community. However, it actually makes a far grander claim, that the pattern-matching style subsumes the other main styles within it. This is possible because it is the first to fully resolve the tension between functions and data structures that has limited expressive power till now. This introduction lays out the general argument, and then surveys the contents of the book, at the level of the parts, chapters and results.

*emphasis mine

Based on my understanding of the initial claims, it describes the core concepts that OMeta is also based on. I’m not sure if Alessandro Warth has read this book or if this is parallel research, but they both seem to come to the same conclusions. It never ceases to amaze me how concepts that seem so new and fresh to me have almost always already been written about in a 200-page book.

Tags:

Software Development | languages

Justin Chase

I am a software developer from St. Paul, MN, and I work for Microsoft on the Expression team. This blog is about various technical topics I find myself encountering here and there. In addition to loving WPF, XAML and Expression Studio in general, I have a special interest in DSLs, programming languages and games.
