## Quick attempt at a validating roman numeral parser... Lots of gotchas.

Got asked about a roman numeral parser during an interview. I have to say that I don't mind when the process of obtaining employment plays into my strengths. The process was quite similar to a previous process where I wrote a spoken numerics converter. Not only that, there were many similar qualities to my int parsing routines. With that in mind I think I did fairly well. The goal at the time was to produce a routine to validate numbers up to roman numeral 30 or XXX. Didn't take long, but in the end, I had left out many different validation techniques. I really wanted to revisit the problem since I had the code correctly written in my mind. Check the algorithms out, they should handle just about anything you can throw at them at this point. If you find issues, please feel free to submit your problems, since I'd love to solidfy things a bit more. Apparently roman numeral parsing has great application in reading dates.

Roman Numeral Parsing: Code Only: Bidirectional roman numeral parsing. [EDIT: Added alternate parsing routines and performance fixes]

Integer to Spoken Numerics: Code-Only: int/long/double conversion to Spoken Numerics

Phone Number to Words: Trying my hand at the old Phone number to Words teaser project!

Integer Parsing: DWC.Algorithms.NumberUtilities

## # re: Quick attempt at a validating roman numeral parser... Lots of gotchas.

Then do a match on the input and return the index. Quick and easy :)

## # re: Quick attempt at a validating roman numeral parser... Lots of gotchas.

There is a partial test harness in the code, and I can upload the performance harness and boundary test harness to make sure you catch all of the invalid input.

Remember, on my blog you can't make a statement without backing it up with proof.

## # re: Quick attempt at a validating roman numeral parser... Lots of gotchas.

Sorry to seem so rude in the comment. Now that I know more of what you are speaking towards, yes, it would be possible to do an IndexOf over the array, and return the index. Just be forewarned and prepared when they ask you to rewrite the parser to accept any value from 1 to Int32.MaxValue and make sure you bring lots of dry erase markers for that huge array.

## # re: Quick attempt at a validating roman numeral parser... Lots of gotchas.

Looking at that post our integer to roman numeral routine could be more table driven if we chose. In fact, I'll add a table driver right along-side the existing routine and upload in a bit.

Now, the reason int2Rom is easy to write as a table driven approach is that there is no ambiguity in the handling of an integer. Aka, the interger is in the right format, no matter how you slice it.

With the reverse rom2Int the input string can be completely bogus and not represent a roman numeral at all. Even worse, it can be a malformed roman numeral. A table driven approach would work to a point, and then require that you bolt on validation for the basic things like "VV" which is invalid or "DD" which is invalid.

## # re: Quick attempt at a validating roman numeral parser... Lots of gotchas.

## # re: Quick attempt at a validating roman numeral parser... Lots of gotchas.

1. The standard replacement is to take a Match region and replace it with Groups out of that match.

2. Replacements don't allow mappings in the replacement string. An example would be a conditional capture that if captured, replace with 1, etc... You can map 1 thing (aka replace a Match with some fixed string).

3. You can only construct captures from material in the string. You can't create new material at any point...

Longer answer is... Yes/Maybe! It would be complex, but you can add material to the string before you do a replacement. By appending the digits 0 through 9 as the first ten characters, they can now be used to construct captures.

So what kinds of captures do we have? Well, We use conditionals that look for a pattern in the string, and then create a new capture group that is either empty or filled with a number... By concatenating all of these items in the replacement string, you can possibly retrieve a number back.

The maximum number achievable will be fixed to how you build expression. Remember that thousands can walk on forever, and it isn't possible to *count* them and then use that information to more properly add digits... Actually, I take that back, you kind of CAN do that too...

I might put some work into this. I imagine the place to use it would be in client-side script, however, we could just as easily convert our parsers to Javascript...

## # re: Quick attempt at a validating roman numeral parser... Lots of gotchas.

((?<two>II)|(?<one>I))(?(two)(?=.*(?<ones>2))|(?(one)(?=.*(?<ones>1))|(?=.*(?<ones>0))))012

Notice that we match 012 at the end... The input string needs to be of the form II012 or I012 for the above example to work... So what we assume is that we'll be appending 012 to the end of any roman numeral BEFORE doing the replacement.

We'll in essence take 9 sub-patterns to represent the ones, 9 sub-patterns for the tens, 9 sub-patterns for the hundreds... This is a first chance at this after only thinking a few minutes, so if you spot a short-cut that I'm missing please point it out.

Being a tool driven world we live in, I now know the intricacies of the expression, enough such that I can dynamically generate the full pattern with just a small amount of code... Yay!

## # re: Quick attempt at a validating roman numeral parser... Lots of gotchas.

I say this not to tease you (for I consider our discussion of my book over) but because I have found BNF useful in surprising areas including the parsing of international phone numbers.

## # re: Quick attempt at a validating roman numeral parser... Lots of gotchas.

romanNumeral = [ thousands ] [ hundreds ] [ fifty ] [ tens ] [ five ] [ ones ]

thousands = M [ thousands ]

hundreds = C [ hundreds ]

fifty = [ decrementByTen ] L

tens = ( decrementByOne X ) | straightTens

straightTens = X [ straightTens ]

five = [ decrementByOne ] V

ones = I [ ones ]

decrementByTen = X

decrementByOne = I

The grammar is clearly ambiguous, but this shouldn't scare us because the ambiguity is resolved in a known limited context.

For example, a "preprocessor" could resolve all incidences of a I, acting as a decrement operator by appearing in other than the first position. It could then replace the decrement operator and the next symbol by a bogus symbol representing IV (four) in the case of decrementing five by one.

Once ambiguity is resolved, you start to code one procedure for each nonterminal, passing the input RN and an index.

The ambiguities don't have to be resolved by preprocessing and transcribing. Instead, a routine (get next token) looks at I to see if it is followed by other-than-I, and returns a special symbol in place of the input symbol.

I will implement a code solution and post it this weekend.

The above grammar doesn't let you decrement the C or M symbols and needs further work.

## # re: Quick attempt at a validating roman numeral parser... Lots of gotchas.

Gute Arbeit hier! Gute Inhalte.

## # re: Quick attempt at a validating roman numeral parser... Lots of gotchas.

Sehr wertvolle Informationen! Empfehlen!

## # re: Quick attempt at a validating roman numeral parser... Lots of gotchas.

The Best Figure Flattening Garter Belts Online Available

## # re: Quick attempt at a validating roman numeral parser... Lots of gotchas.

Wow, I never really knew I needed to look at my net worth. It is really interesting the great things you can learn by searching the internet a little bit. Thanks so much for the valuable information.