Miscellaneous Debris

Avner Kashtan's Frustrations and Exultations
When is a dash not a dash?

An interesting and unusually frustrating pitfall that I've stumbled into a few times:

Often, when we're preparing to deploy a software product, we find ourselves writing a Word document with the steps for installation. In the case of my current system, installing the server requires several manual steps, among them running a command-line utility to set some Kerberos security parameters.

Naturally, being the friendly and considerate developers we are, we supplied the full command line that is to be run on the target machine, complete with all options, parameters and variables.

Imagine our surprise, then, to learn that the command line simply failed to run, claiming a syntax error in its input. Even when we went over to the system to see it for ourselves, we saw that a direct copy-paste from the Word document to the command line failed mysteriously. What was even stranger is that typing out the command line, letter by letter, made the command succeed.

The answer, as you might guess by the title of my post, is that Word tends to be a bit too smart for its own good. When I copied the command line to the document with an option parameter like this:

setspn.exe -a http/MyServerName.domain.com

it replaced the dash in "-a" automatically with this:

setspn.exe –a http/MyServerName.domain.com

Similar at first glance, but what we have on the first line is a Hyphen-Minus (U+002D) and on the second an En Dash (U+2013). Word's automatic formatting changes the hyphen-minus to an en dash, and the command-line parser chokes on it and dies. This is almost impossible to spot; command-line fonts don't display the difference to any discernible degree. We stumbled onto it only because we had already tried replacing every other element on the command line with no success.
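Since the two lines look identical on screen, one way to confirm the difference is to compare them character by character and print the Unicode names. The following is only a minimal Python sketch, not something we used at the time; the helper name `describe_differences` is made up for illustration, and the two strings are the commands shown above.

```python
import unicodedata

# The command as typed by hand, and as it came out of the Word document.
typed = "setspn.exe -a http/MyServerName.domain.com"
pasted = "setspn.exe \u2013a http/MyServerName.domain.com"  # \u2013 is the en dash


def describe_differences(a, b):
    """Return (index, name_in_a, name_in_b) for every position where a and b differ.

    Hypothetical helper: it compares only up to the length of the shorter string.
    """
    return [
        (i, unicodedata.name(x, "UNKNOWN"), unicodedata.name(y, "UNKNOWN"))
        for i, (x, y) in enumerate(zip(a, b))
        if x != y
    ]


for index, left, right in describe_differences(typed, pasted):
    print(f"position {index}: {left} vs {right}")
# prints: position 11: HYPHEN-MINUS vs EN DASH
```

Printing the character names sidesteps the font problem entirely: the console may draw both dashes the same way, but their names and code points are different.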

So remember: by default, Word allows itself some freedom in altering your documents. For letter-perfect preservation, consider disabling the AutoCorrect and AutoFormat As You Type replacements in Word, or just store syntax-sensitive data in plain text files.
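If the text has already been through Word, a small script can flag the suspicious characters and map them back before the command is run. This is a rough Python sketch under my own assumptions: the substitution table covers only a handful of common replacements (it is not a complete list of what Word does), and the names `WORD_SUBSTITUTIONS`, `find_suspects` and `to_plain_ascii` are invented for this example.

```python
# Assumed mapping from a few common Word replacements back to plain ASCII.
# This is illustrative only and does not cover everything Word may change.
WORD_SUBSTITUTIONS = {
    "\u2013": "-",    # EN DASH
    "\u2014": "--",   # EM DASH (assuming it came from a typed double hyphen)
    "\u2018": "'",    # LEFT SINGLE QUOTATION MARK
    "\u2019": "'",    # RIGHT SINGLE QUOTATION MARK
    "\u201c": '"',    # LEFT DOUBLE QUOTATION MARK
    "\u201d": '"',    # RIGHT DOUBLE QUOTATION MARK
    "\u2026": "...",  # HORIZONTAL ELLIPSIS
    "\u00a0": " ",    # NO-BREAK SPACE
}

_TABLE = str.maketrans(WORD_SUBSTITUTIONS)


def find_suspects(text):
    """Return (index, character) pairs for every non-ASCII character in text."""
    return [(i, ch) for i, ch in enumerate(text) if ord(ch) > 127]


def to_plain_ascii(text):
    """Replace the characters listed in WORD_SUBSTITUTIONS; leave everything else alone."""
    return text.translate(_TABLE)


pasted = "setspn.exe \u2013a http/MyServerName.domain.com"
print(find_suspects(pasted))
print(to_plain_ascii(pasted))
```

Reporting the suspects before replacing anything is deliberate: a non-ASCII character in a command line is sometimes legitimate, so it is safer to look at what was found than to rewrite it blindly.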

Published Tuesday, January 17, 2006 5:26 PM by AvnerK

Comments

# re: When is a dash not a dash?@ Tuesday, January 17, 2006 12:34 PM

Good point

I have been caught many times with their "smart" quotes in code but never with this

Mike

# re: When is a dash not a dash?@ Tuesday, January 17, 2006 4:34 PM

Word seems to frequently suffer from "A little knowledge is a dangerous thing".

When you need fidelity in text content, don't use MS Word.



AndrewSeven

# re: When is a dash not a dash?@ Wednesday, January 18, 2006 4:31 AM

It's a big problem for anybody working with XML where text might have originated in Word. There's a handful of "clever" things it does which are a nightmare when you put the text into XML.

Guy Murphy

# re: When is a dash not a dash?@ Wednesday, January 18, 2006 8:33 AM

Andrew: Fidelity! Exactly the word I was looking for. Thanks.

Mike/Guy: Indeed, XML is a stickler for syntax, and a case of smart-quotes or a character that doesn't suit its encoding can send you going byte-by-byte looking for the culprit.

Avner Kashtan
