Google has posted a very interesting analysis of the code and authoring
techniques used in over one billion documents. Much of the data they
collected seems pretty obvious (e.g., the abundance of the "a" and "img"
elements), but it's still an interesting read.