
Create a doc gen template in Word 2007 and insert data using .NET

Since Microsoft introduced the OpenXML format, one key innovation for developers is how easy it has become to manipulate the content of Office documents without any additional components. Because OpenXML documents are XML files packaged in a ZIP container, all you need are System.IO.Packaging and System.Xml in .NET to do magic.
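To make the "XML files packaged in ZIP" point concrete, here is a minimal sketch that opens a .docx with System.IO.Packaging and loads its main part with System.Xml. The module and function names are placeholders of mine, the part name /word/document.xml is where Word normally writes the main part but is an assumption here, and the project needs a reference to WindowsBase.dll.

```vbnet
Imports System
Imports System.IO
Imports System.IO.Packaging
Imports System.Xml

Module PackageSketch

    'Hypothetical helper: returns the root element name of the main document part
    Function ReadMainPartRootName(ByVal fileName As String) As String
        Using pkg As Package = Package.Open(fileName, FileMode.Open, FileAccess.Read)
            'Assumption: the main part sits at the location Word normally uses
            Dim partUri As New Uri("/word/document.xml", UriKind.Relative)
            Dim part As PackagePart = pkg.GetPart(partUri)

            'Load the part like any other XML file
            Dim xml As New XmlDocument()
            Using partStream As Stream = part.GetStream(FileMode.Open, FileAccess.Read)
                xml.Load(partStream)
            End Using

            Return xml.DocumentElement.Name
        End Using
    End Function

End Module
```

You could call it as PackageSketch.ReadMainPartRootName("c:\test\table.docx"). A more robust version would probably look up the main part through the package relationships rather than hard-coding its path.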

One piece of magic you can do is create a skeleton document that you can reuse later to pump data into. This suits a document generator where mail merge is not the right solution, because you might want to integrate business logic, and you might also want the application hosted as a server-side solution running from the Web.

When it comes to surfacing tabular data in a Word document, things get a little tricky. Your document could contain many tables, and at the moment OpenXML offers no way to give a table an ID as you can in HTML. One workaround is to wrap the table in a tag from your own custom XML schema. I created one (below) with a Document element and a Section element.

<?xml version="1.0" encoding="utf-8"?>
<xs:schema id="SectionedDocument"
  targetNamespace="http://tempuri.org/SectionedDocument.xsd"
  elementFormDefault="qualified"
  xmlns="http://tempuri.org/SectionedDocument.xsd"
  xmlns:mstns="http://tempuri.org/SectionedDocument.xsd"
  xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="Document" type="DocumentType">
  </xs:element>
  <xs:complexType name="DocumentType" mixed="true">
    <xs:sequence>
      <xs:element name="Section" minOccurs="1" maxOccurs="unbounded">
      </xs:element>
    </xs:sequence>
  </xs:complexType>
</xs:schema>

The next thing you need to do is import this schema into Word. Open Word 2007 and look for the Developer tab.

image

Click on Schema; the Templates and Add-ins window will appear.

image

Click on the ‘Add Schema…’ button

image

When the file dialog opens, find the .xsd file you just created and double-click it to select it.

image

A Schema Settings window will then appear, asking you to give the schema an alias. It can be anything for this exercise; I’ll name mine ‘document’.

image

Then click OK and the Schema Settings window will close. You are now back in the Templates and Add-ins window, where you can see the new schema ‘document’ in the Available XML Schemas list.

You can now click OK at the bottom of the Templates and Add-ins window.

After that you will notice that a new task pane called XML Structure has opened on the right-hand side of the Word window.

image

Look at the bottom and you will see a ‘Document’ item in the ‘Choose an element…’ list. That item is the Document XML element you defined in the .xsd earlier.

image

Before we bind the XML element to the document, let’s create a table first.

image

Now click on the ‘Document’ item at the lower right of the screen. You should see a prompt like the one below.

image

In the Apply to entire document window, click Apply to Entire Document.

image

After that you will see a ‘Document’ tag surrounding the document; you might also notice that the list at the lower right no longer shows the ‘Document’ item but a ‘Section’ item instead.

image

Now highlight the table by dragging the mouse over it. You might want to add line breaks before and after the table so that the ‘Section’ tag can wrap around it (as per the picture below).

image

After that, click on the ‘Section’ item in the task pane on the right-hand side.

image

This is what you will then see in the document: the ‘Document’ and ‘Section’ tags wrapping the table.

Now you can save and close the Word document.
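Behind the scenes, Word stores each tag as a w:customXml element carrying w:uri and w:element attributes, with the table nested inside the Section tag. As a rough illustration, the sketch below builds a heavily simplified stand-in for that markup with VB XML literals and finds the tagged table with LINQ to XML. The module and function names are my own, and a real document.xml contains far more markup than this sample.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Linq
Imports System.Xml.Linq
Imports <xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">

Module CustomXmlSketch

    'Simplified stand-in for the body of word/document.xml after tagging
    Function SampleBody() As XElement
        Return <w:body>
                   <w:customXml w:uri="http://tempuri.org/SectionedDocument.xsd" w:element="Document">
                       <w:customXml w:uri="http://tempuri.org/SectionedDocument.xsd" w:element="Section">
                           <w:tbl>
                               <w:tr><w:tc><w:p><w:r><w:t>Header</w:t></w:r></w:p></w:tc></w:tr>
                               <w:tr><w:tc><w:p><w:r><w:t>Row</w:t></w:r></w:p></w:tc></w:tr>
                           </w:tbl>
                       </w:customXml>
                   </w:customXml>
               </w:body>
    End Function

    'Returns every table that is a direct child of a Section tag from our schema
    Function FindSectionTables(ByVal body As XElement) As IEnumerable(Of XElement)
        Return From cx In body...<w:customXml> _
               Where cx.@w:uri = "http://tempuri.org/SectionedDocument.xsd" _
                     AndAlso cx.@w:element = "Section" _
               From tbl In cx.<w:tbl> _
               Select tbl
    End Function

End Module
```

Calling CustomXmlSketch.FindSectionTables(CustomXmlSketch.SampleBody()) on this sample yields the single tagged table, which is how a table can be located without an ID.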

Open up Visual Studio 2008. As mentioned before, you can use any free ZIP and XML API from any platform and language to manipulate the file, but I found a better and easier way using LINQ to XML and the OpenXML SDK v2.0. LINQ to XML comes out of the box with VS2008, but you have to download the OpenXML SDK 2.0 here.

Open XML Format SDK 2.0

Install the SDK onto your PC, then start VS2008 and create a Console Application project for this exercise.

project

Add a reference to the OpenXML SDK by right-clicking the project and choosing ‘Add reference’.

add ref

When the Add reference window pops up, look for DocumentFormat.OpenXml.dll in C:\Program Files\Open XML Format SDK\V2.0\lib.

browse

After that, open Module1.vb and paste in the code below to replace its contents.

The code is below and I hope it is self-explanatory. You can download my project from my SkyDrive.

Imports DocumentFormat.OpenXml.Wordprocessing
Imports DocumentFormat.OpenXml.Packaging
Imports DocumentFormat.OpenXml

Module Module1

    Sub Main()

        'Constants for the XML Schema namespace and tags
        Const WORD_PROCESSING_NS As String = _
       "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
        'replace the following with what you have in your own XSD
        Const CUSTOM_XML_SCHEMA_NS As String = "http://tempuri.org/SectionedDocument.xsd"
        Const DOCUMENT_TAG As String = "Document"
        Const SECTION_TAG As String = "Section"

        'Open a document from the file location
        'For a SharePoint document, open it from a Stream instead
        Using doc = WordprocessingDocument.Open("c:\test\table.docx", True)

            'open up document.xml from the zip
            Dim mainPart = doc.MainDocumentPart

            'There could be multiple instances of the imported XML Schema
            For Each documentXsdBlock In mainPart.Document.Body.Elements(Of CustomXmlBlock)()

                Dim uri = documentXsdBlock.GetAttribute("uri", WORD_PROCESSING_NS)
                Dim element = documentXsdBlock.GetAttribute("element", WORD_PROCESSING_NS)

                'Every XML schema can have more than one element, so must double check
                '<w:customXml w:uri="http://tempuri.org/SectionedDocument.xsd" w:element="Document">
                If uri.Value = CUSTOM_XML_SCHEMA_NS _
                   AndAlso element.Value = DOCUMENT_TAG Then

                    For Each sectionXsdBlock In documentXsdBlock.Elements(Of CustomXmlBlock)()
                        uri = sectionXsdBlock.GetAttribute("uri", WORD_PROCESSING_NS)
                        element = sectionXsdBlock.GetAttribute("element", WORD_PROCESSING_NS)

                        'Every XML schema can have more than one element, so must double check
                        '<w:customXml w:uri="http://tempuri.org/SectionedDocument.xsd" 
                        '    w:element="Section">
                        If uri.Value = CUSTOM_XML_SCHEMA_NS _
                             AndAlso element.Value = SECTION_TAG Then

                            Dim table = sectionXsdBlock.Elements(Of Table).First
                            Dim rows = table.Elements(Of TableRow)()

                            'Make a copy of the 2nd row (assumed that the 1st row is header)
                            Dim dupRow = rows(1).CloneNode(True)

                            'Insert data into the duplicated node
                            'This is the place to put in a For loop if you are inserting more than 1 row

                            'A table cell in Word OOXML is arranged like this
                            '<w:tc>       --> This is the TableCell object
                            '  <w:p>      --> This is the Paragraph object
                            '    <w:r>    --> This is the Run object
                            '      <w:t>  --> This is the Text object,
                            '                 inside this XML element is the content

                            For Each cell In dupRow.Elements(Of TableCell)()
                                Dim paragraphs = cell.Elements(Of Paragraph)()

                                Dim para As Paragraph

                                'Checking is needed because a TableCell might not have any content at all,
                                ' and the same goes for Paragraph and Run
                                If paragraphs.Count < 1 Then
                                    para = New Paragraph
                                    cell.AppendChild(para)
                                Else
                                    para = paragraphs.First
                                End If

                                Dim runs = para.Elements(Of Run)()
                                Dim _run As Run
                                If runs.Count < 1 Then
                                    _run = New Run
                                    para.AppendChild(_run)
                                Else
                                    _run = runs.First
                                End If

                                Dim texts = _run.Elements(Of Text)()
                                Dim _text As Text

                                If texts.Count < 1 Then
                                    _text = New Text
                                    _run.AppendChild(_text)
                                Else
                                    'Take the first Text element, not the first child of the Run
                                    _text = texts.First
                                End If

                                'You can use your business logic to insert something here
                                _text.Text = "Something"
                            Next ' End of Insert data program logic

                            table.AppendChild(dupRow)

                            'You might want to delete the original template row (the 2nd row) since you have already duplicated it
                            'Use this line of code with caution if you are inserting more than 1 row.
                            rows(1).Remove()
                        End If
                    Next
                End If
            Next

            doc.MainDocumentPart.Document.Save()

        End Using

    End Sub

End Module
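The comments above mention a For loop for inserting more than one row. One way this might look is sketched below: the template row is cloned once per record and removed only after the loop. The module, the FillTable and SetCellText helpers, and the idea of passing each record as a String array are assumptions of mine for illustration, not part of the SDK.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Linq
Imports DocumentFormat.OpenXml.Wordprocessing

Module RowFillSketch

    'Hypothetical helper: appends one copy of the template row per record,
    'then removes the template row itself
    Sub FillTable(ByVal table As Table, ByVal records As IEnumerable(Of String()))
        'Assumption: row 0 is the header and row 1 is the row to duplicate
        Dim templateRow As TableRow = table.Elements(Of TableRow)().ElementAt(1)

        For Each record As String() In records
            Dim newRow As TableRow = CType(templateRow.CloneNode(True), TableRow)
            Dim cells As List(Of TableCell) = newRow.Elements(Of TableCell)().ToList()

            'Fill only as many cells as both the row and the record provide
            For i As Integer = 0 To Math.Min(cells.Count, record.Length) - 1
                SetCellText(cells(i), record(i))
            Next

            table.AppendChild(newRow)
        Next

        'Safe to remove now that every copy has been made
        templateRow.Remove()
    End Sub

    'Writes a value into the first Text of the first Run of the first Paragraph,
    'creating any of them that are missing
    Sub SetCellText(ByVal cell As TableCell, ByVal value As String)
        Dim para As Paragraph = cell.Elements(Of Paragraph)().FirstOrDefault()
        If para Is Nothing Then para = cell.AppendChild(New Paragraph())

        Dim firstRun As Run = para.Elements(Of Run)().FirstOrDefault()
        If firstRun Is Nothing Then firstRun = para.AppendChild(New Run())

        Dim firstText As Text = firstRun.Elements(Of Text)().FirstOrDefault()
        If firstText Is Nothing Then firstText = firstRun.AppendChild(New Text())

        firstText.Text = value
    End Sub

End Module
```

Inside the Section check in Module1, the cell loop and the append and remove calls could then be replaced by a single call such as RowFillSketch.FillTable(table, records), where records comes from your own business logic.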