George V. Reilly's Technical Blog

March 2009 - Posts

Augmenting Python's strftime
Big Ben

The Cozi Tech Blog needed some love, so I wrote a post on augmenting Python's strftime.

Posted: Mar 30 2009, 11:19 PM by george_v_reilly | with no comments
Filed under:
Ack - Better than Grep
Ack!

On a StackOverflow question about favorite Vim plugins, I learned about Ack, a replacement for grep that's smarter about searching source trees.

Ack is written in Perl. The built-in :vimgrep is rather slow. It seems to have some Vim-specific overhead, such as creating swap files and executing BufRead autocmds. Ack is noticeably faster, though somewhat slower than GNU grep.

Which would you rather type to search a tree, ignoring the .svn and .git subtrees?

$ ack -i -l foobar
$ grep --exclude='*.svn*' --exclude='*.git*' -i -l -r foobar .

The ack takes 6 seconds to search 4500 files, while the grep completes in 2. This does not count the time that I spent trying to figure out the correct syntax and argument quoting for --exclude. The help says both --regexp=PATTERN and --exclude=PATTERN, but the latter is a glob (file wildcard pattern).

On Windows, I wrapped ack with pl2bat.

Posted: Mar 26 2009, 07:19 PM by george_v_reilly | with no comments
Filed under:
Exuberant Ctags and JavaScript
Exuberant Ctags

Exuberant Ctags is an essential complement to Vim: it generates an index of symbol names (tags) for a set of source files. In Vim, just place the cursor on a function name and type C-] to go to its definition.

Ctags works well for most of the languages that I deal with, but falls down badly on modern JavaScript. Its built-in parser simply doesn't handle declarations like these:

Sizzle.selectors.filters.animated = function(elem) { // ...
ajaxSetup: function( settings ) {

I came across Unbad's workaround earlier tonight. His code didn't work for me, so I hacked on it until it did:

--langdef=js
--langmap=js:.js
--regex-js=/([A-Za-z0-9._$]+)[ \t]*[:=][ \t]*\{/\1/,object/
--regex-js=/([A-Za-z0-9._$()]+)[ \t]*[:=][ \t]*function[ \t]*\(/\1/,function/
--regex-js=/function[ \t]+([A-Za-z0-9._$]+)[ \t]*\(([^)])\)/\1/,function/
--regex-js=/([A-Za-z0-9._$]+)[ \t]*[:=][ \t]*\[/\1/,array/
--regex-js=/([^= ]+)[ \t]*=[ \t]*[^"]'[^']*/\1/,string/
--regex-js=/([^= ]+)[ \t]*=[ \t]*[^']"[^"]*/\1/,string/

Simply add the above to ~/.ctags or $HOME/ctags.cnf.

Flattening List Comprehensions in Python
List Comprehension

Python has list comprehensions, syntactic sugar for building lists from an expression.

>>> [2 * i for i in (2, 3, 5, 7, 11)]
[4, 6, 10, 14, 22]

This doesn't work so well when the comprehension expression is itself a list: you end up with a list of lists.

>>> def gen():
...     for l in [['a', 'b'], ['c'], ['d', 'e', 'f']]:
...         yield l
...
>>> [l for l in gen()]
[['a', 'b'], ['c'], ['d', 'e', 'f']]

This is ugly. Here's one way to build a flattened list, but it's less elegant than the comprehension.

>>> x = []
>>> for l in gen():
...     x.extend(l)
...
>>> x
['a', 'b', 'c', 'd', 'e', 'f']

It took me a while to find a readable list comprehension, with a little help from Google. Use sum() on the outer list and prime it with an empty list, []. Python will concatenate the inner lists, producing a flattened list.

>>> sum([l for l in gen()], [])
['a', 'b', 'c', 'd', 'e', 'f']

Alternatively, you can use itertools.chain().

>>> import itertools
>>> list(itertools.chain(*gen()))
['a', 'b', 'c', 'd', 'e', 'f']

That might be slightly more efficient, though I find the sum() to be a little more readable.

>>> import itertools
>>> list(itertools.chain(*gen()))
['a', 'b', 'c', 'd', 'e', 'f']

That might be slightly more efficient, though I find the sum() to be a little more readable.

Edit: I forgot about nested comprehensions

>>> [inner
...     for outer in gen()
...         for inner in outer]
['a', 'b', 'c', 'd', 'e', 'f']

Somewhat cryptic on one line however:

>>> [j for i in gen() for j in i]
['a', 'b', 'c', 'd', 'e', 'f']
reStructuredText syntax highlighting
reStructuredText syntax highlighting

Vim has had syntax highlighting since version 5.0 in 1998. It quickly became indispensable. It's hard to go back to looking at monochromatic source code after you've become accustomed to syntax highlighting.

The syntax highlighting is tied into Vim's support for colorschemes, which define colors for the fundamental syntax groups like Number, Comment, and String. The syntax highlighting for a particular language defines custom syntax groups for specific language features, such as cppExceptions or htmlEndTag,

The custom syntax groups are linked to the underlying fundamental syntax groups. Hence, if you change your colorscheme, your syntax highlighting is updated automatically.

The reStructuredText syntax highlighting in Vim 7.2 has some shortcomings, in my opinion. For example, *italic* shows up as italic in gVim, but in the same color as regular text, while **bold** shows up in a different color, but not bolded.

When you declare a syntax group, you can either link it to another gropu and pick up that's one color and fontstyle, or you can give it a concrete fontstyle and color. If you do that, then the syntax group won't change color when you change the colorscheme.

After much poking around, I found a way to set a syntax group's fontstyle and link it to another group's color: see hi def rstItalic and hi def rstBold below.

I also make use of certain Unicode characters in my reStructuredText source, such as endash and emdash, which are very hard to tell apart in a fixed-width font—even though an emdash (—) is twice as wide as an endash (–) in a proportional font. Worse, a non-breaking space is invisible and can't easily be distinguished from a normal space.

I provided custom highlighting for these Unicode characters and the various ‘curly’ “quotes”.

All of this goes into ~/.vim/syntax/rst.vim, which treats $VIMRUNTIME/syntax/rst.vim as a subroutine. I tried putting it into ~/.vim/after/syntax/rst.vim, which gets executed after $VIMRUNTIME/syntax/rst.vim completes, but then I can't provide non-overrideable definitions for rstEmphasis and rstStrongEmphasis.

function! s:SynFgColor(hlgrp)
    return synIDattr(synIDtrans(hlID(a:hlgrp)), 'fg')
endfun

function! s:SynBgColor(hlgrp)
    return synIDattr(synIDtrans(hlID(a:hlgrp)), 'bg')
endfun

syn match rstEnumeratedList /^\s*[0-9#]\{1,3}\.\s/
syn match rstBulletedList /^\s*[+*-]\s/
syn match rstNbsp /[\xA0]/
syn match rstEmDash /[\u2014]/
syn match rstUnicode /[\u2013\u2018\u2019\u201C\u201D]/ " – ‘ ’ “ ”

exec 'hi def rstBold    term=bold cterm=bold gui=bold guifg=' . s:SynFgColor('PreProc')
exec 'hi def rstItalic  term=italic cterm=italic gui=italic guifg=' . s:SynFgColor('Statement')
exec 'hi def rstNbsp    gui=underline guibg=' . s:SynBgColor('ErrorMsg')
exec 'hi def rstEmDash  gui=bold guifg=' . s:SynFgColor('Title') . ' guibg='. s:SynBgColor('Folded')
exec 'hi def rstUnicode guifg=' . s:SynFgColor('Number')

hi link rstEmphasis       rstItalic
hi link rstStrongEmphasis rstBold
hi link rstEnumeratedList Operator
hi link rstBulletedList   Operator

source $VIMRUNTIME/syntax/rst.vim

syn cluster rstCruft                contains=rstEmphasis,rstStrongEmphasis,
      \ rstInterpretedText,rstInlineLiteral,rstSubstitutionReference,
      \ rstInlineInternalTargets,rstFootnoteReference,rstHyperlinkReference,
      \ rstNbsp,rstEmDash,rstUnicode
Hash Table Attacks
Worst-case hash table collisions

At lunch today, I told Eric about Hash Attacks: for many hash functions, it's possible to construct a large set of keys that collide. This can be used to cause a Denial of Service as hashtable operations can be induced to take O(n) time instead of O(1).

Crosby and Wallach successfully demonstrated this against a number of applications.

Andrew has a good writeup of Hash Algorithm Attacks.

There are various mitigations suggested. The one that I used when I first became aware of this problem is to use a salt to the hash function.

In other words, change:

unsigned hash(const char* s)
{
    unsigned h = 0;
    while (*s)
        h = h * 101 + (unsigned char*) *s++;
    return h;
}

to:

unsigned hash(const char* s)
{
    unsigned h = SALT;
    while (*s)
        h = h * 101 + (unsigned char*) *s++;
    return h;
}

where SALT is chosen randomly when the hash table is created or when the process starts. This should be enough to vary the order in which keys are distributed to buckets from run to run.

More Posts