March 2009 - Posts
The Cozi Tech Blog needed some love,
so I wrote a post on augmenting Python's strftime.
On a StackOverflow question about favorite Vim plugins, I learned about Ack,
a replacement for grep that's smarter about searching source trees.
Ack is written in Perl.
The built-in :vimgrep is rather slow.
It seems to have some Vim-specific overhead,
such as creating swap files and executing BufRead autocmds.
Ack is noticeably faster,
though somewhat slower than GNU grep.
Which would you rather type to search a tree,
ignoring the .svn and .git subtrees?
$ ack -i -l foobar
$ grep --exclude='*.svn*' --exclude='*.git*' -i -l -r foobar .
The ack takes 6 seconds to search 4500 files, while the grep completes in 2.
This does not count the time that I spent trying to figure out
the correct syntax and argument quoting for --exclude.
The help says both --regexp=PATTERN and --exclude=PATTERN,
but the latter is a glob (file wildcard pattern).
On Windows, I wrapped ack with pl2bat.
Exuberant Ctags is an essential complement to Vim:
it generates an index of symbol names (tags) for a set of source files.
In Vim, just place the cursor on a function name
and type C-] to go to its definition.
Ctags works well for most of the languages that I deal with,
but falls down badly on modern JavaScript.
Its built-in parser simply doesn't handle declarations like these:
Sizzle.selectors.filters.animated = function(elem) { // ...
ajaxSetup: function( settings ) {
I came across Unbad's workaround earlier tonight.
His code didn't work for me, so I hacked on it until it did:
--langdef=js
--langmap=js:.js
--regex-js=/([A-Za-z0-9._$]+)[ \t]*[:=][ \t]*\{/\1/,object/
--regex-js=/([A-Za-z0-9._$()]+)[ \t]*[:=][ \t]*function[ \t]*\(/\1/,function/
--regex-js=/function[ \t]+([A-Za-z0-9._$]+)[ \t]*\(([^)])\)/\1/,function/
--regex-js=/([A-Za-z0-9._$]+)[ \t]*[:=][ \t]*\[/\1/,array/
--regex-js=/([^= ]+)[ \t]*=[ \t]*[^"]'[^']*/\1/,string/
--regex-js=/([^= ]+)[ \t]*=[ \t]*[^']"[^"]*/\1/,string/
Simply add the above to ~/.ctags or $HOME/ctags.cnf.
Python has list comprehensions,
syntactic sugar for building lists from an expression.
>>> [2 * i for i in (2, 3, 5, 7, 11)]
[4, 6, 10, 14, 22]
This doesn't work so well when the comprehension expression
is itself a list: you end up with a list of lists.
>>> def gen():
... for l in [['a', 'b'], ['c'], ['d', 'e', 'f']]:
... yield l
...
>>> [l for l in gen()]
[['a', 'b'], ['c'], ['d', 'e', 'f']]
This is ugly. Here's one way to build a flattened list,
but it's less elegant than the comprehension.
>>> x = []
>>> for l in gen():
... x.extend(l)
...
>>> x
['a', 'b', 'c', 'd', 'e', 'f']
It took me a while to find a readable list comprehension,
with a little help from Google.
Use sum() on the outer list and prime it with an empty list, [].
Python will concatenate the inner lists, producing a flattened list.
>>> sum([l for l in gen()], [])
['a', 'b', 'c', 'd', 'e', 'f']
Alternatively, you can use itertools.chain().
>>> import itertools
>>> list(itertools.chain(*gen()))
['a', 'b', 'c', 'd', 'e', 'f']
That might be slightly more efficient,
though I find the sum() to be a little more readable.
>>> import itertools
>>> list(itertools.chain(*gen()))
['a', 'b', 'c', 'd', 'e', 'f']
That might be slightly more efficient,
though I find the sum() to be a little more readable.
Edit: I forgot about nested comprehensions
>>> [inner
... for outer in gen()
... for inner in outer]
['a', 'b', 'c', 'd', 'e', 'f']
Somewhat cryptic on one line however:
>>> [j for i in gen() for j in i]
['a', 'b', 'c', 'd', 'e', 'f']
Vim has had syntax highlighting since version 5.0 in 1998.
It quickly became indispensable.
It's hard to go back to looking at monochromatic source code
after you've become accustomed to syntax highlighting.
The syntax highlighting is tied into Vim's support for colorschemes,
which define colors for the fundamental syntax groups
like Number, Comment, and String.
The syntax highlighting for a particular language
defines custom syntax groups for specific language features,
such as cppExceptions or htmlEndTag,
The custom syntax groups are linked to the underlying fundamental syntax groups.
Hence, if you change your colorscheme, your syntax highlighting is updated automatically.
The reStructuredText syntax highlighting in Vim 7.2
has some shortcomings, in my opinion.
For example, *italic* shows up as italic in gVim,
but in the same color as regular text,
while **bold** shows up in a different color, but not bolded.
When you declare a syntax group,
you can either link it to another gropu and pick up that's one color and fontstyle,
or you can give it a concrete fontstyle and color.
If you do that, then the syntax group
won't change color when you change the colorscheme.
After much poking around, I found a way to set a syntax group's
fontstyle and link it to another group's color:
see hi def rstItalic and hi def rstBold below.
I also make use of certain Unicode characters in my reStructuredText source,
such as endash and emdash,
which are very hard to tell apart in a fixed-width font—even though an emdash (—) is twice as wide as an endash (–) in a proportional font.
Worse, a non-breaking space is invisible and
can't easily be distinguished from a normal space.
I provided custom highlighting for these Unicode characters
and the various ‘curly’ “quotes”.
All of this goes into ~/.vim/syntax/rst.vim,
which treats $VIMRUNTIME/syntax/rst.vim as a subroutine.
I tried putting it into ~/.vim/after/syntax/rst.vim,
which gets executed after $VIMRUNTIME/syntax/rst.vim completes,
but then I can't provide non-overrideable definitions for
rstEmphasis and rstStrongEmphasis.
function! s:SynFgColor(hlgrp)
return synIDattr(synIDtrans(hlID(a:hlgrp)), 'fg')
endfun
function! s:SynBgColor(hlgrp)
return synIDattr(synIDtrans(hlID(a:hlgrp)), 'bg')
endfun
syn match rstEnumeratedList /^\s*[0-9#]\{1,3}\.\s/
syn match rstBulletedList /^\s*[+*-]\s/
syn match rstNbsp /[\xA0]/
syn match rstEmDash /[\u2014]/
syn match rstUnicode /[\u2013\u2018\u2019\u201C\u201D]/ " – ‘ ’ “ ”
exec 'hi def rstBold term=bold cterm=bold gui=bold guifg=' . s:SynFgColor('PreProc')
exec 'hi def rstItalic term=italic cterm=italic gui=italic guifg=' . s:SynFgColor('Statement')
exec 'hi def rstNbsp gui=underline guibg=' . s:SynBgColor('ErrorMsg')
exec 'hi def rstEmDash gui=bold guifg=' . s:SynFgColor('Title') . ' guibg='. s:SynBgColor('Folded')
exec 'hi def rstUnicode guifg=' . s:SynFgColor('Number')
hi link rstEmphasis rstItalic
hi link rstStrongEmphasis rstBold
hi link rstEnumeratedList Operator
hi link rstBulletedList Operator
source $VIMRUNTIME/syntax/rst.vim
syn cluster rstCruft contains=rstEmphasis,rstStrongEmphasis,
\ rstInterpretedText,rstInlineLiteral,rstSubstitutionReference,
\ rstInlineInternalTargets,rstFootnoteReference,rstHyperlinkReference,
\ rstNbsp,rstEmDash,rstUnicode
At lunch today, I told Eric about Hash Attacks:
for many hash functions,
it's possible to construct a large set of keys that collide.
This can be used to cause a Denial of Service
as hashtable operations can be induced
to take O(n) time instead of O(1).
Crosby and Wallach successfully demonstrated this
against a number of applications.
Andrew has a good writeup of Hash Algorithm Attacks.
There are various mitigations suggested.
The one that I used when I first became aware of this problem
is to use a salt to the hash function.
In other words, change:
unsigned hash(const char* s)
{
unsigned h = 0;
while (*s)
h = h * 101 + (unsigned char*) *s++;
return h;
}
to:
unsigned hash(const char* s)
{
unsigned h = SALT;
while (*s)
h = h * 101 + (unsigned char*) *s++;
return h;
}
where SALT is chosen randomly when the hash table is created
or when the process starts.
This should be enough to vary the order in which keys
are distributed to buckets from run to run.
More Posts