More and more projects are managing their documentation as a bunch of Markdown files in a repository. Sites such as GitHub make that easy and convenient by providing a simple web interface for viewing and editing Markdown files, creating and integrating pull requests, viewing changesets, etc. This approach seems to provide all the advantages of a wiki, and more, while using a standard and easy set of tools.
Documents are not just Markdown, however: there should almost always be some associated metadata such as a title, tags, authors, etc. The title is often extracted from the file name or from the first heading of the document, but that’s suboptimal and unreliable. The additional data could be stored in a database, but that’s hard to manage, doesn’t version along with the documents, and creates too much potential for orphaned files or records. Another solution is companion files, but if you’re going to use the file system, why have two files when you could have only one?
I think the best solution to that problem is a multipart document format that allows for a structured metadata section, followed by a rich text body. This is similar to image formats such as JPEG, which allow for embedding EXIF metadata.
Markdown itself has been hugely successful by providing a rich text format that can not only be expressed as plain text, but can also be read and written by non-technical human beings. There are other examples of successful plain text human-readable formats, such as YAML and modern diff and patch formats. The multipart format that I need should also be plain text, and easy to author for non-technical users.
The second requirement for this multipart format is that it should be minimalist: it should only deal with assembling multiple documents into one, but should get out of the way as far as the actual document parts are concerned.
I’ve opted for a very simple and fun separator format to delimit the different parts of a document, taking a cue from Markdown to use something that immediately makes sense to an untrained user:
The separator is simply the emoticon for a pair of scissors cutting through a dotted line. The number of dash characters on the right of the scissors can be anything above 2. The separator has to be on its own line.
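To make the rule concrete, here is a minimal Python sketch of a splitter. It assumes the scissors are typed as `8<`, optionally preceded by dashes; that notation is an assumption for illustration, not a statement of the official syntax. The names `SEPARATOR` and `split_parts` are hypothetical as well.

```python
import re

# Assumption: scissors are written "8<", with optional dashes before them
# and three or more dashes after them (i.e. "anything above 2").
SEPARATOR = re.compile(r"-*8<-{3,}")


def split_parts(text):
    """Split a document into parts on lines that consist only of a separator."""
    parts = [[]]
    for line in text.splitlines():
        # The separator has to be on its own line; trailing whitespace is tolerated.
        if SEPARATOR.fullmatch(line.rstrip()):
            parts.append([])
        else:
            parts[-1].append(line)
    return ["\n".join(part) for part in parts]
```

A line with only two dashes after the scissors is not treated as a separator here and stays in the part it appears in.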
With this we’re halfway through. The second part of the puzzle is a way to specify the format of the parts.
First, there have to be good defaults. I’ve chosen a YAML header followed by a Markdown body, because that corresponds to the main scenario, and because those are the best and most successful structured and rich text formats that are also human-writable. That’s what you get if part formats are not otherwise explicitly specified.
The first and preferred way that you can explicitly specify part formats is through file extensions. File extensions are already widely used and understood: foo.md is a Markdown file, bar.json is a JSON file, baz.yaml is a YAML file, so it would only make sense that foo.yaml.md would be a multipart file with a YAML part followed by a Markdown part.
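As a rough illustration, the extension convention could be read with a few lines of Python. This is a sketch under assumptions: `part_formats` is a hypothetical helper, and falling back to the YAML plus Markdown defaults whenever fewer than two extensions are present is my reading of the defaults, not a documented rule.

```python
from pathlib import Path

# Defaults described above: a YAML header followed by a Markdown body.
DEFAULT_FORMATS = ["yaml", "md"]


def part_formats(filename):
    """Return the part formats implied by a file name's extensions."""
    # Path.suffixes yields every dotted suffix, e.g. [".yaml", ".md"].
    formats = [suffix.lstrip(".") for suffix in Path(filename).suffixes]
    # Assumption: with fewer than two extensions, use the defaults.
    return formats if len(formats) >= 2 else list(DEFAULT_FORMATS)
```

Note that a name with extra dots, such as `my.notes.yaml.md`, would yield three formats with this naive approach, so a real implementation would need a stricter rule.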
If you can’t use the file extension for some reason, you can instead embed the file format into the separator. You can specify the format of the part before the break, after the break, or both:
Here’s an example of a typical snippable document.yaml.md multipart document:
Title: A simple snippable document
Author: Bertrand Le Roy
Tags: snippable, yaml, markdown, multipart
A Snippable Document
This is what a snippable document looks like.
This document has two parts:
* a YAML header
* this Markdown document
It should not look too terrible to a regular Markdown parser
and can be parsed to extract the header.
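Extracting the header from such a document can be sketched in a few lines of Python. Everything here is illustrative: the `8<` separator spelling is an assumption, `read_header` is a hypothetical name, and the key/value loop is a stand-in for a real YAML parser that only handles flat `Key: value` lines.

```python
import re

# Assumption: scissors typed as "8<", followed by three or more dashes.
SEPARATOR = re.compile(r"-*8<-{3,}")


def read_header(text):
    """Collect flat 'Key: value' pairs from the part before the first separator."""
    header = {}
    for line in text.splitlines():
        if SEPARATOR.fullmatch(line.rstrip()):
            break  # the header ends at the first separator line
        key, colon, value = line.partition(":")
        if colon:
            header[key.strip()] = value.strip()
    return header


doc = """Title: A simple snippable document
Author: Bertrand Le Roy
Tags: snippable, yaml, markdown, multipart
-8<------------
A Snippable Document
This is what a snippable document looks like.
This document has two parts:
* a YAML header
* this Markdown document
"""

header = read_header(doc)
print(header["Title"])
```

For real documents, the header text would be handed to a proper YAML library rather than parsed by hand.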
If you find this useful, and want to use the format yourself, please do: it’s under the MIT license. If you create an implementation in another language, please let me know, and I’ll point to it.