Parsing OneNote XML with LINQ to XML

Recently I was involved in a small project (http://www.codeplex.com/onenote2pdf) to write a command-line utility that exports an entire OneNote notebook, section, or section group to a PDF file, with some customization of the PDF output such as a table of contents and bookmarks.

OneNote 2007 supports some level of interaction with external applications through its API, and one API call exports the entire structure of a notebook, including its sections, section groups, and pages at all levels, as an XML string.

[Image: sample of the XML exported by OneNote]

As you can see, the XML exported by OneNote contains the notebook hierarchy: a Notebook node can contain many Section and SectionGroup nodes, a SectionGroup node can contain many Section and nested SectionGroup nodes, and a Section node contains only Page nodes.
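To make the hierarchy concrete, here is a small hand-written sample in the same shape, with a helper that counts elements of a given kind. Treat it as a sketch rather than real OneNote output: the `one:Notebooks` root element, the names, and the IDs are assumptions for illustration, and `HierarchySample` is a hypothetical helper class.

```csharp
using System.Linq;
using System.Xml.Linq;

public static class HierarchySample
{
    // Hand-written sample, NOT actual OneNote output; names and IDs are invented.
    public const string Xml =
@"<one:Notebooks xmlns:one=""http://schemas.microsoft.com/office/onenote/2007/onenote"">
  <one:Notebook name=""Work"" ID=""nb1"">
    <one:Section name=""Inbox"" ID=""s1"">
      <one:Page name=""Todo"" ID=""p1"" />
    </one:Section>
    <one:SectionGroup name=""Projects"" ID=""g1"">
      <one:Section name=""Alpha"" ID=""s2"">
        <one:Page name=""Plan"" ID=""p2"" />
        <one:Page name=""Notes"" ID=""p3"" />
      </one:Section>
      <one:SectionGroup name=""Archive"" ID=""g2"" />
    </one:SectionGroup>
  </one:Notebook>
</one:Notebooks>";

    // Counts every element with the given local name, at any depth.
    public static int Count(string localName)
    {
        XNamespace oneNS = "http://schemas.microsoft.com/office/onenote/2007/onenote";
        return XDocument.Parse(Xml).Descendants(oneNS + localName).Count();
    }
}
```

With this sample, `HierarchySample.Count("Section")` returns 2 and `HierarchySample.Count("Page")` returns 3. `Descendants` matches the full element name, so asking for `Section` does not also pick up `SectionGroup`.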

My task was to build the hierarchy of notebook, section groups, sections, and pages from the exported XML. With the help of LINQ, parsing this XML string is a rather simple task. However, because the data is recursive, I had to implement some recursive functions to build the entire hierarchy.

First, get the desired notebook's information, such as its name and ID, from the input XML:

// Define XML namespace
XNamespace oneNS = "http://schemas.microsoft.com/office/onenote/2007/onenote";

XDocument outputXML = XDocument.Parse(OneNoteNoteBookXML);

notebook =
    (from nb in outputXML.Descendants(oneNS + "Notebook")
     select new Data.ONNotebook
     {
         Name = nb.Attribute("name").Value,
         ID = nb.Attribute("ID").Value,
         Sections = SelectSections(nb),
     }).First();

In the code snippet above, the SelectSections function is called to build the Section/SectionGroup hierarchy for the desired notebook.
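The `Data.ONNotebook`, `Data.ONSection`, and `Data.ONPage` types are not shown in this post. A minimal sketch of what they might look like, inferred only from the properties the snippets assign (the real classes may well differ), is:

```csharp
using System.Collections.Generic;

namespace Data
{
    // Assumed shape: a leaf node holding only a name and an ID.
    public class ONPage
    {
        public string Name { get; set; } = "";
        public string ID { get; set; } = "";
    }

    // Assumed shape: one type stands in for both Section and SectionGroup.
    // A group fills SubSections; a plain section fills Pages.
    public class ONSection
    {
        public string Name { get; set; } = "";
        public string ID { get; set; } = "";
        public List<ONSection> SubSections { get; set; } = new List<ONSection>();
        public List<ONPage> Pages { get; set; } = new List<ONPage>();
    }

    // Assumed shape: the root of the hierarchy.
    public class ONNotebook
    {
        public string Name { get; set; } = "";
        public string ID { get; set; } = "";
        public List<ONSection> Sections { get; set; } = new List<ONSection>();
    }
}
```

Using a single `ONSection` type for both sections and section groups keeps the recursion simple: the same function can be applied to any node, and whichever list does not apply simply comes back empty. The property initializers need C# 6 or later; on an older compiler they would move into constructors.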

private List<Data.ONSection> SelectSections(XElement xml)
{
    List<Data.ONSection> sections =
        (from section in xml.Elements()
         where (section.Name == oneNS + "SectionGroup") ||
               (section.Name == oneNS + "Section")
         // orderby section.Value
         select new Data.ONSection
         {
             Name = section.Attribute("name").Value,
             ID = section.Attribute("ID").Value,
             SubSections = SelectSections(section),
             Pages = SelectPages(section),
         }).ToList();
    return sections;
}

// Get page information out of XML
private List<Data.ONPage> SelectPages(XElement xml)
{
    List<Data.ONPage> pages =
        (from page in xml.Elements(oneNS + "Page")
         // orderby section.Value
         select new Data.ONPage
         {
             Name = page.Attribute("name").Value,
             ID = page.Attribute("ID").Value,
         }).ToList();
    return pages;
}
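One caveat with the snippets above: `Attribute("name").Value` throws a NullReferenceException if the attribute is absent. I have not checked whether OneNote ever omits these attributes, but if that is a concern, the explicit conversion from XAttribute to string returns null instead of throwing. A small sketch follows; `SafeAttributes` and `AttrOrDefault` are hypothetical helper names, not part of the project.

```csharp
using System.Xml.Linq;

public static class SafeAttributes
{
    // Returns the attribute's value, or the fallback when the attribute is missing.
    public static string AttrOrDefault(XElement element, string name, string fallback = "")
    {
        // Casting a null XAttribute to string yields null rather than throwing,
        // so the ?? operator can supply the fallback.
        return (string)element.Attribute(name) ?? fallback;
    }
}
```

In the queries, `Name = page.Attribute("name").Value` would then become `Name = SafeAttributes.AttrOrDefault(page, "name", "(untitled)")`, where the fallback text is whatever suits the output.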

So with the help of LINQ to XML, we can parse a very complex XML file with ease.

Happy Coding!

Published Friday, January 25, 2008 12:21 AM by hoanguyen

Comments

# re: Parsing OneNote XML with LINQ to XML

Thursday, February 26, 2009 10:48 AM by Benjamin Haag

Interesting and helpful; thanks!

I'm curious; did you use a third-party product to create the PDF file?