Surprisingly Difficult to Parse a Character

by Justin Chase 9. January 2010 05:52

I’ve been working on a custom variant of an OMeta parser for a couple weeks now. It’s coming along pretty well, I think I’ve overcome most of the major hurdles and I’m just trying to go through what I currently have, clean it up and get it to solve some of the edge cases that I need.

Just now I was working on the grammar for parsing a character and realized how hard it really is. It sounds trivial, afterall it’s just two single quotes and a character right? Wrong. Here’s my current grammar:

CharacterLiteralToken
    = '\'' '\\' 'u' Hex#4 '\''
    | '\'' '\\' 'U' Hex#8 '\''
    | '\'' '\\' 'x' Hex Hex? Hex? Hex? '\''
    | '\'' '\\' ('\'' | '\"' | '\\' | '0' | 'a' | 'b' | 'f' | 'n' | 'r' | 't' | 'v') '\''
    | '\'' '\u0000'..'\uffff' '\'';

It turns out that you have to be sure to account for a multitude of escape characters as well as escaped Unicode literals. I didn’t want to have to implement this, but you can see the last rule which just matches every character under the sun needed it.

This will match:

  • '\u0000'
  • '\U00000000'
  • '\x0', '\x00', '\x000', '\x0000'
  • '\'', '\"', '\\', '\0', '\a', '\b', '\f', '\n', '\r', '\t', '\v'
  • 'a' …

Next I get to do the string parser… that should be even more interesting.

Currently rated 4.0 by 1 people

  • Currently 4/5 Stars.
  • 1
  • 2
  • 3
  • 4
  • 5

Tags:

DSL | MetaSharp | languages

Comments

Add comment


 

  Country flag

biuquote
  • Comment
  • Preview
Loading



Powered by BlogEngine.NET 1.4.5.0
Theme by Mads Kristensen

Justin Chase

sweetest hat ever

I am a software developer from St. Paul MN and I work for Microsoft on the Expression team. This blog is about various technical topics I find myself encountering here and there. In addition to loving WPF and Xaml and Expression studio in general I have a special interest in DSLs, programming languages and games.

RecentComments

Comment RSS

Calendar

<<  March 2010  >>
MoTuWeThFrSaSu
22232425262728
1234567
891011121314
15161718192021
22232425262728
2930311234

View posts in large calendar