Create a doc gen template in Word 2007 and insert data using .NET
Since Microsoft introduced the OpenXML format, one key benefit for developers has been how easy it is to manipulate the content of Office documents without any additional components. Because OpenXML documents are XML files packaged in a ZIP archive, all you need are the System.IO.Packaging and System.Xml namespaces in .NET to work the magic.
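To make the "XML files packaged in ZIP" point concrete, here is a minimal sketch that opens a .docx with System.IO.Packaging alone and loads the main document part into an XmlDocument. The names PackageSketch and LoadMainDocumentXml are made up for illustration, the project is assumed to reference WindowsBase.dll, and the default path is simply the test file used in the code later in this post.

```vb
Imports System
Imports System.IO
Imports System.IO.Packaging   ' assumes a project reference to WindowsBase.dll
Imports System.Xml

Module PackageSketch

    ' Relationship type that points from the package root to the main document part.
    Private Const OfficeDocumentRel As String = _
        "http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument"

    ' Hypothetical helper: loads the main document part
    ' (usually /word/document.xml) of an already opened package.
    Public Function LoadMainDocumentXml(ByVal pkg As Package) As XmlDocument
        For Each rel As PackageRelationship In pkg.GetRelationshipsByType(OfficeDocumentRel)
            ' Relationship targets are relative to the package root.
            Dim partUri As Uri = PackUriHelper.ResolvePartUri( _
                New Uri("/", UriKind.Relative), rel.TargetUri)
            Dim xml As New XmlDocument()
            Using partStream As Stream = _
                pkg.GetPart(partUri).GetStream(FileMode.Open, FileAccess.Read)
                xml.Load(partStream)
            End Using
            Return xml
        Next
        Throw New InvalidDataException("No main document part found in the package.")
    End Function

    Sub Main(ByVal args As String())
        ' Pass your own .docx as the first argument, or fall back to the test file.
        Dim docxPath As String = If(args.Length > 0, args(0), "c:\test\table.docx")
        Using pkg As Package = Package.Open(docxPath, FileMode.Open, FileAccess.Read)
            Dim xml As XmlDocument = LoadMainDocumentXml(pkg)
            Console.WriteLine(xml.DocumentElement.Name)
        End Using
    End Sub

End Module
```

For a Word document this should print the name of the root element, typically w:document. From there you could walk the XML with System.Xml, although the rest of this post takes a more convenient route.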
One trick this enables is creating a skeleton document that you can reuse later to pump data into. This suits document generation scenarios where mail merge is not the right solution, for example when you want to integrate business logic, or when you want the application hosted as a server-side solution running on the Web.
When it comes to surfacing tabular data in a Word document, things get a little tricky. Your document could contain many tables, and there is currently no way in OpenXML to give a table an ID as you can in HTML. A workaround is to wrap the table in a tag from your own custom XML schema. I created one (below) with a Document element and a Section element.
<?xml version="1.0" encoding="utf-8"?>
<xs:schema id="SectionedDocument"
    targetNamespace="http://tempuri.org/SectionedDocument.xsd"
    elementFormDefault="qualified"
    xmlns="http://tempuri.org/SectionedDocument.xsd"
    xmlns:mstns="http://tempuri.org/SectionedDocument.xsd"
    xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="Document" type="DocumentType">
  </xs:element>
  <xs:complexType name="DocumentType" mixed="true">
    <xs:sequence>
      <xs:element name="Section" minOccurs="1" maxOccurs="unbounded">
      </xs:element>
    </xs:sequence>
  </xs:complexType>
</xs:schema>
The next thing you need to do is import this schema into Word. Open Word 2007 and look for the Developer tab.
Click on Schema; a new Templates and Add-ins window will appear.
Click on the ‘Add Schema…’ button.
When the file dialog opens, look for the .xsd file you just created and double-click to select it.
A Schema Settings window will then appear, asking you to give the schema an alias. It can be anything for this exercise; I’ll name mine ‘document’.
Then click OK and the Schema Settings window will close. You are now back at the Templates and Add-ins window, where you can see the new schema ‘document’ in the Available XML Schemas list.
You can now click OK at the bottom of the Templates and Add-ins window.
After that you will notice a new custom task pane called XML Structure has opened on the right-hand side of the Word window.
Look at the bottom of the pane and you will see a ‘Document’ item in the ‘Choose an element…’ list. That item is the Document XML element you defined in the .xsd earlier.
Before we bind the XML element to the document, let’s create a table first.
Now click on the ‘Document’ item at the lower right of the screen. You should see a screen like the one below.
In the Apply to entire document window, click Apply to Entire Document.
After that you should see a ‘Document’ tag surrounding the document. You might also notice that the list at the lower right no longer shows the ‘Document’ item but a ‘Section’ item instead.
Now highlight the table by dragging the mouse across it. You might want to add line breaks before and after the table so that the ‘Section’ tag can wrap around it (as per pic below).
After that, click on the ‘Section’ item in the task pane on the right-hand side.
This is what you will then see in the document: the ‘Document’ and ‘Section’ tags wrapping the table.
Now you can save and close the Word document.
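If you are curious what Word has just saved, the tags end up in word/document.xml as nested w:customXml elements that carry the schema namespace in w:uri and the tag name in w:element. The sketch below finds the table inside the Section tag using LINQ to XML (System.Xml.Linq). It is only an illustration: the sample markup is hand-written and trimmed to the bare minimum (a real document.xml contains far more), and the names CustomXmlLookupSketch, SampleXml and FindSectionTables are my own, not part of any library.

```vb
Imports System
Imports System.Collections.Generic
Imports System.Linq
Imports System.Xml.Linq

Module CustomXmlLookupSketch

    Private ReadOnly W As XNamespace = _
        XNamespace.Get("http://schemas.openxmlformats.org/wordprocessingml/2006/main")
    Private Const SchemaNs As String = "http://tempuri.org/SectionedDocument.xsd"

    ' Hand-written, heavily trimmed stand-in for word/document.xml.
    ' The first table is untagged; the second one sits inside a Section tag.
    Public Const SampleXml As String = _
        "<w:document xmlns:w='http://schemas.openxmlformats.org/wordprocessingml/2006/main'>" & _
        "<w:body>" & _
        "<w:customXml w:uri='http://tempuri.org/SectionedDocument.xsd' w:element='Document'>" & _
        "<w:tbl><w:tr/></w:tbl>" & _
        "<w:customXml w:uri='http://tempuri.org/SectionedDocument.xsd' w:element='Section'>" & _
        "<w:tbl><w:tr/><w:tr/></w:tbl>" & _
        "</w:customXml>" & _
        "</w:customXml>" & _
        "</w:body>" & _
        "</w:document>"

    ' Reads a w:-prefixed attribute, returning Nothing when it is absent.
    Private Function AttrValue(ByVal el As XElement, ByVal name As String) As String
        Dim attr = el.Attribute(W.GetName(name))
        Return If(attr Is Nothing, Nothing, attr.Value)
    End Function

    ' True when the element is a tag of our schema with the given name.
    Private Function IsTag(ByVal el As XElement, ByVal name As String) As Boolean
        Return AttrValue(el, "uri") = SchemaNs AndAlso AttrValue(el, "element") = name
    End Function

    ' Returns every w:tbl that is a direct child of a Section tag
    ' which is itself a direct child of a Document tag.
    Public Function FindSectionTables(ByVal root As XElement) As List(Of XElement)
        Dim result As New List(Of XElement)
        For Each docTag In root.Descendants(W.GetName("customXml"))
            If Not IsTag(docTag, "Document") Then Continue For
            For Each sectionTag In docTag.Elements(W.GetName("customXml"))
                If IsTag(sectionTag, "Section") Then
                    result.AddRange(sectionTag.Elements(W.GetName("tbl")))
                End If
            Next
        Next
        Return result
    End Function

    Sub Main()
        Dim root = XElement.Parse(SampleXml)
        Dim tables = FindSectionTables(root)
        Console.WriteLine("Tables inside a Section tag: " & tables.Count)
        Console.WriteLine("Rows in the first one: " & _
            tables(0).Elements(W.GetName("tr")).Count())
    End Sub

End Module
```

With the sample markup above, only the second table is returned, which is the whole point of the tag: it tells you which of the many tables in a document is the one to fill.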
Open Visual Studio 2008. As mentioned before, you can use any free ZIP and XML API from any platform and language to manipulate the document, but I found a better and easier way using LINQ to XML and OpenXML SDK v2.0. LINQ to XML comes out of the box with VS2008, but you have to download OpenXML SDK 2.0 here.
Install the SDK onto your PC. Then start VS2008 and create a Console Application project for this exercise.
Add a reference to the OpenXML SDK by right-clicking the project and choosing ‘Add Reference’.
When the Add Reference window pops up, look for DocumentFormat.OpenXml.dll in C:\Program Files\Open XML Format SDK\V2.0\lib.
After that, open Module1.vb and paste in the code below, replacing what is there.
The code is below, and I hope it is self-explanatory. You can download my project from my SkyDrive.
Imports System.Linq
Imports DocumentFormat.OpenXml.Wordprocessing
Imports DocumentFormat.OpenXml.Packaging
Imports DocumentFormat.OpenXml

Module Module1

    Sub Main()
        'Constants for the XML Schema namespace and tags
        Const WORD_PROCESSING_NS As String = _
            "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
        'Replace the following with what you have in your own XSD
        Const CUSTOM_XML_SCHEMA_NS As String = "http://tempuri.org/SectionedDocument.xsd"
        Const DOCUMENT_TAG As String = "Document"
        Const SECTION_TAG As String = "Section"

        'Open a document from the file location
        'For a SharePoint document, change this to open from a Stream
        Using doc As WordprocessingDocument = _
            WordprocessingDocument.Open("c:\test\table.docx", True)

            'Open up document.xml from the zip
            Dim mainPart = doc.MainDocumentPart

            'There could be multiple instances of the imported XML Schema
            For Each documentXsdBlock In mainPart.Document.Body.Elements(Of CustomXmlBlock)()
                Dim uri = documentXsdBlock.GetAttribute("uri", WORD_PROCESSING_NS)
                Dim element = documentXsdBlock.GetAttribute("element", WORD_PROCESSING_NS)

                'Every XML schema can have more than one element, so must double check
                '<w:customXml w:uri="http://tempuri.org/SectionedDocument.xsd" w:element="Document">
                If uri.Value = CUSTOM_XML_SCHEMA_NS _
                    AndAlso element.Value = DOCUMENT_TAG Then

                    For Each sectionXsdBlock In documentXsdBlock.Elements(Of CustomXmlBlock)()
                        uri = sectionXsdBlock.GetAttribute("uri", WORD_PROCESSING_NS)
                        element = sectionXsdBlock.GetAttribute("element", WORD_PROCESSING_NS)

                        'Every XML schema can have more than one element, so must double check
                        '<w:customXml w:uri="http://tempuri.org/SectionedDocument.xsd"
                        '             w:element="Section">
                        If uri.Value = CUSTOM_XML_SCHEMA_NS _
                            AndAlso element.Value = SECTION_TAG Then

                            Dim table = sectionXsdBlock.Elements(Of Table).First
                            Dim rows = table.Elements(Of TableRow)()

                            'Make a copy of the 2nd row (assuming the 1st row is the header)
                            Dim dupRow = rows(1).CloneNode(True)

                            'Insert data into the duplicated node
                            'This is the place to put in a For loop if you are inserting
                            'more than 1 row

                            'A table cell in Word OOXML is arranged like this
                            '<w:tc>        --> This is the TableCell object
                            '  <w:p>       --> This is the Paragraph object
                            '    <w:r>     --> This is the Run object
                            '      <w:t>   --> This is the Text object,
                            '                  inside this XML element is the content
                            For Each cell In dupRow.Elements(Of TableCell)()
                                Dim paragraphs = cell.Elements(Of Paragraph)()
                                Dim para As Paragraph

                                'Checking is needed because a TableCell might not have
                                'content at all, and likewise for Paragraph and Run
                                If paragraphs.Count < 1 Then
                                    para = New Paragraph
                                    cell.AppendChild(para)
                                Else
                                    para = paragraphs.First
                                End If

                                Dim runs = para.Elements(Of Run)()
                                Dim _run As Run
                                If runs.Count < 1 Then
                                    _run = New Run
                                    para.AppendChild(_run)
                                Else
                                    _run = runs.First
                                End If

                                Dim texts = _run.Elements(Of Text)()
                                Dim _text As Text
                                If texts.Count < 1 Then
                                    _text = New Text
                                    _run.AppendChild(_text)
                                Else
                                    'Take the first Text element, not the first child of the
                                    'Run (which could be run properties instead of text)
                                    _text = texts.First
                                End If

                                'You can use your business logic to insert something here
                                _text.Text = "Something"
                            Next
                            'End of insert data program logic

                            table.AppendChild(dupRow)

                            'You might want to delete the original 2nd row since you
                            'have already duplicated it
                            'Place this line of code with caution if you are inserting
                            'more than 1 row.
                            rows(1).Remove()
                        End If
                    Next
                End If
            Next

            mainPart.Document.Save()
        End Using
    End Sub

End Module
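The comments in the code above mention putting in a For loop when you insert more than one row. Here is one way that loop could look, as a rough sketch rather than the definitive approach: clone the template row once per record, fill the clone, append it, and remove the template only after the loop has finished. The names RowFillSketch, SetCellText, FillTable and MakeRow are hypothetical, the sample data is invented, and the table is built in memory so the sketch runs without a .docx (it still assumes the reference to DocumentFormat.OpenXml.dll added earlier). The WText alias is just a precaution against the name Text clashing with the System.Text namespace.

```vb
Imports System
Imports System.Collections.Generic
Imports System.Linq
Imports DocumentFormat.OpenXml.Wordprocessing
Imports WText = DocumentFormat.OpenXml.Wordprocessing.Text

Module RowFillSketch

    ' Writes a value into a cell, creating the Paragraph, Run and Text
    ' elements when they are missing. Only the first of each is touched.
    Public Sub SetCellText(ByVal cell As TableCell, ByVal value As String)
        Dim para = cell.Elements(Of Paragraph)().FirstOrDefault()
        If para Is Nothing Then para = cell.AppendChild(New Paragraph())
        Dim r = para.Elements(Of Run)().FirstOrDefault()
        If r Is Nothing Then r = para.AppendChild(New Run())
        Dim t = r.Elements(Of WText)().FirstOrDefault()
        If t Is Nothing Then t = r.AppendChild(New WText())
        t.Text = value
    End Sub

    ' Clones the 2nd row (the template) once per record, fills and appends
    ' each clone, then removes the template row after the loop.
    Public Sub FillTable(ByVal tbl As Table, ByVal records As IEnumerable(Of String()))
        ' Take a snapshot: Elements() is lazy and the table changes below.
        Dim rows = tbl.Elements(Of TableRow)().ToList()
        If rows.Count < 2 Then
            Throw New InvalidOperationException("Expected a header row and a template row.")
        End If
        Dim templateRow = rows(1)
        For Each record In records
            Dim newRow = DirectCast(templateRow.CloneNode(True), TableRow)
            Dim cells = newRow.Elements(Of TableCell)().ToList()
            For i = 0 To Math.Min(cells.Count, record.Length) - 1
                SetCellText(cells(i), record(i))
            Next
            tbl.AppendChild(newRow)
        Next
        templateRow.Remove()
    End Sub

    ' Builds a row with one cell per value (only used to create test data).
    Public Function MakeRow(ByVal ParamArray values As String()) As TableRow
        Dim row As New TableRow()
        For Each value In values
            SetCellText(row.AppendChild(New TableCell()), value)
        Next
        Return row
    End Function

    Sub Main()
        ' In-memory stand-in for the table found inside the Section tag.
        Dim tbl As New Table()
        tbl.AppendChild(MakeRow("Name", "Qty"))   ' header row
        tbl.AppendChild(MakeRow("", ""))          ' empty template row

        ' Invented sample data; replace with your own business logic.
        Dim records As New List(Of String())
        records.Add(New String() {"Apples", "3"})
        records.Add(New String() {"Pears", "5"})

        FillTable(tbl, records)

        For Each row In tbl.Elements(Of TableRow)()
            Console.WriteLine(String.Join(" | ", _
                row.Elements(Of TableCell)().Select(Function(c) c.InnerText).ToArray()))
        Next
    End Sub

End Module
```

Removing the template row once, after the loop, sidesteps the caution in the original comment: every clone is taken from an untouched template, and the empty row never shows up in the finished document.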