Create a doc gen template in Word 2007 and insert data using .NET

Since Microsoft introduced the OpenXML format, one key benefit for developers is how easy it has become to manipulate the content of Office documents without any additional components. Because OpenXML documents are XML files packaged in a ZIP archive, all you need are System.IO.Packaging and System.Xml in .NET to do the magic.
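To make the "XML files packaged in ZIP" point concrete, here is a minimal sketch that lists the parts inside a .docx using nothing but System.IO.Packaging. The file path is a placeholder of mine, and the project is assumed to reference WindowsBase.dll, which is where System.IO.Packaging lives.

```vbnet
Imports System
Imports System.IO
Imports System.IO.Packaging   ' requires a reference to WindowsBase.dll

Module ListParts

    Sub Main()
        ' Placeholder path: point this at any .docx on your machine
        Const FILE_NAME As String = "c:\test\table.docx"

        ' A .docx is a ZIP package; every part inside it is a file such as
        ' /word/document.xml, each with its own content type
        Using pkg As Package = Package.Open(FILE_NAME, FileMode.Open, FileAccess.Read)
            For Each part As PackagePart In pkg.GetParts()
                Console.WriteLine("{0}  ({1})", part.Uri, part.ContentType)
            Next
        End Using
    End Sub

End Module
```

Among the parts listed you should find /word/document.xml, which holds the body of the document and is the part the rest of this post works on.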

One piece of magic you can do is create a skeleton document that you can reuse later to pump data into. This suits a document generator where mail merge is not the right solution, for example because you want to integrate business logic, or because you want the application hosted as a server-side solution running from the Web.

When it comes to surfacing tabular data in a Word document, things get a little tricky. Your document could contain many tables, and at the moment OpenXML has no way to give a table an ID as you can in HTML. A workable alternative is to wrap the table in a tag from your own custom XML Schema. I created one (below) with a Document and a Section element.

<?xml version="1.0" encoding="utf-8"?>
<xs:schema id="SectionedDocument"
  targetNamespace="http://tempuri.org/SectionedDocument.xsd"
  elementFormDefault="qualified"
  xmlns="http://tempuri.org/SectionedDocument.xsd"
  xmlns:mstns="http://tempuri.org/SectionedDocument.xsd"
  xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:element name="Document" type="DocumentType">
</xs:element>
<xs:complexType name="DocumentType" mixed="true">
<xs:sequence>
<xs:element name="Section" minOccurs="1" maxOccurs="unbounded">
</xs:element>
</xs:sequence>
</xs:complexType>
</xs:schema>

The next thing you need to do is import this schema into Word. Open Word 2007 and look for the Developer tab.

Click on Schema; a Templates and Add-ins window will appear.

Click on the ‘Add Schema…’ button.

When the file dialog opens, find the .xsd file you just created and double-click to select it.

A Schema Settings window will then appear, asking you to give the schema an alias. It can be anything for this exercise; I’ll name mine ‘document’.

Then click OK and the Schema Settings window will close. You are now back at the Templates and Add-ins window, where you can see the new schema ‘document’ in the Available XML Schemas list.

You can now click OK at the bottom of the Templates and Add-ins window.

After that you will notice that a new task pane called XML Structure has opened on the right-hand side of the Word window.

Look at the bottom and you will see a ‘Document’ item in the ‘Choose an element…’ list. That Document item is the Document XML element you defined in the .xsd earlier.

Before we bind the XML element to the document, let’s create a table first.

Now click on the ‘Document’ item at the lower right of the screen. An Apply to entire document window should appear.

In the Apply to entire document window, click Apply to Entire Document.

After that you will see a ‘Document’ tag surrounding the document. You might also notice that the list at the lower right no longer shows the ‘Document’ item but a ‘Section’ item instead.

Now highlight the table by dragging the mouse around it. You might want to add line breaks before and after the table so that the ‘Section’ tag can wrap around it.

After that, click on the ‘Section’ item in the task pane on the right-hand side.

You will then see the ‘Document’ and ‘Section’ tags wrapping the table in the document.

Now you can save and close the Word document.
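If you want to confirm that Word really stored the tags, a quick check like the one below can help. It is only a sketch: it assumes the main part sits at the usual /word/document.xml location, that the file is saved at c:\test\table.docx, and that the console project references WindowsBase.dll. It prints the schema URI and element name of every w:customXml element it finds.

```vbnet
Imports System
Imports System.IO
Imports System.IO.Packaging   ' requires a reference to WindowsBase.dll
Imports System.Xml
Imports System.Xml.Linq

Module ListCustomXmlTags

    Sub Main()
        Dim w As XNamespace = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

        Using pkg As Package = Package.Open("c:\test\table.docx", FileMode.Open, FileAccess.Read)
            ' Assumption: the main document part is at the usual location
            Dim partUri As New Uri("/word/document.xml", UriKind.Relative)
            Dim part As PackagePart = pkg.GetPart(partUri)

            ' Load document.xml into a LINQ to XML tree
            Dim xdoc As XDocument
            Using reader As XmlReader = XmlReader.Create(part.GetStream(FileMode.Open, FileAccess.Read))
                xdoc = XDocument.Load(reader)
            End Using

            ' Each schema tag applied in Word is stored as a w:customXml element
            For Each tag As XElement In xdoc.Descendants(w + "customXml")
                Console.WriteLine("{0} -> {1}", _
                                  CType(tag.Attribute(w + "uri"), String), _
                                  CType(tag.Attribute(w + "element"), String))
            Next
        End Using
    End Sub

End Module
```

For the template built above, I would expect one line for the Document tag and one for the Section tag, both carrying the namespace from the .xsd.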

Open up Visual Studio 2008. As mentioned before, you can use any free ZIP and XML API on any platform and in any language to manipulate the document, but I found a better and easier way using LINQ to XML and the OpenXML SDK v2.0. LINQ to XML comes out of the box with VS2008, but you have to download the OpenXML SDK 2.0 here.

Open XML Format SDK 2.0

Install the SDK onto your PC, then start VS2008 and create a Console Application project for this exercise.

Add a reference to the OpenXML SDK by right-clicking the project and choosing ‘Add Reference’.

Look for DocumentFormat.OpenXml.dll in C:\Program Files\Open XML Format SDK\V2.0\lib when the Add Reference window pops up.

After that, open Module1.vb and paste the code below to replace its contents.

Below is the code; I hope it is self-explanatory. You can download my project from my SkyDrive.

Imports System.Linq
Imports DocumentFormat.OpenXml.Wordprocessing
Imports DocumentFormat.OpenXml.Packaging
Imports DocumentFormat.OpenXml

Module Module1

    Sub Main()

        'Constants for the XML Schema namespace and tags
        Const WORD_PROCESSING_NS As String = _
       "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
        'replace the following with what you have in your own XSD
        Const CUSTOM_XML_SCHEMA_NS As String = "http://tempuri.org/SectionedDocument.xsd"
        Const DOCUMENT_TAG As String = "Document"
        Const SECTION_TAG As String = "Section"

        'Open a document from the file location
        'For a SharePoint document, change this to open from a Stream instead
        Using doc = WordprocessingDocument.Open("c:\test\table.docx", True)

            'open up document.xml from the zip
            Dim mainPart = doc.MainDocumentPart

            'There could be multi instances of XML Schema imported
            For Each documentXsdBlock In mainPart.Document.Body.Elements(Of CustomXmlBlock)()

                Dim uri = documentXsdBlock.GetAttribute("uri", WORD_PROCESSING_NS)
                Dim element = documentXsdBlock.GetAttribute("element", WORD_PROCESSING_NS)

                'A schema can define more than one element, so check both the URI and the element name
                '<w:customXml w:uri="http://tempuri.org/SectionedDocument.xsd" w:element="Document">
                If uri.Value = CUSTOM_XML_SCHEMA_NS _
                   AndAlso element.Value = DOCUMENT_TAG Then

                    For Each sectionXsdBlock In documentXsdBlock.Elements(Of CustomXmlBlock)()
                        uri = sectionXsdBlock.GetAttribute("uri", WORD_PROCESSING_NS)
                        element = sectionXsdBlock.GetAttribute("element", WORD_PROCESSING_NS)

                        'A schema can define more than one element, so check both the URI and the element name
                        '<w:customXml w:uri="http://tempuri.org/SectionedDocument.xsd" 
                        '    w:element="Section">
                        If uri.Value = CUSTOM_XML_SCHEMA_NS _
                             AndAlso element.Value = SECTION_TAG Then

                            Dim table = sectionXsdBlock.Elements(Of Table).First
                            Dim rows = table.Elements(Of TableRow)()

                            'Make a copy of the 2nd row (assuming the 1st row is the header)
                            Dim dupRow = rows(1).CloneNode(True)

                            'Insert data into the duplicated row
                            'This is the place to put a For loop if you are inserting more than 1 row

                            'A table cell in Word OOXML is arranged like this
                            '<w:tc>       --> This is the TableCell object
                            '  <w:p>      --> This is the Paragraph object
                            '    <w:r>    --> This is the Run object
                            '      <w:t>  --> This is the Text object;
                            '                 the content sits inside this XML element

                            For Each cell In dupRow.Elements(Of TableCell)()
                                Dim paragraphs = cell.Elements(Of Paragraph)()

                                Dim para As Paragraph

                                'Checking is needed because a TableCell might not have any content at all,
                                ' and the same goes for Paragraph and Run
                                If paragraphs.Count < 1 Then
                                    para = New Paragraph
                                    cell.AppendChild(para)
                                Else
                                    para = paragraphs.First
                                End If

                                Dim runs = para.Elements(Of Run)()
                                Dim _run As Run
                                If runs.Count < 1 Then
                                    _run = New Run
                                    para.AppendChild(_run)
                                Else
                                    _run = runs.First
                                End If

                                Dim texts = _run.Elements(Of Text)()
                                Dim _text As Text

                                If texts.Count < 1 Then
                                    _text = New Text
                                    _run.AppendChild(_text)
                                Else
                                    'Take the first existing Text element of the run
                                    _text = texts.First
                                End If

                                'You can use your business logic to insert something here
                                _text.Text = "Something"
                            Next ' End of Insert data program logic

                            table.AppendChild(dupRow)

                            'You might want to delete the original 2nd row since you have already duplicated it
                            'Place this line of code with caution if you are inserting more than 1 row.
                            rows(1).Remove()
                        End If
                    Next
                End If
            Next

            doc.MainDocumentPart.Document.Save()

        End Using

    End Sub

End Module
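The comments in the code mention putting a For loop in place when more than one row is needed. Below is one way that could look, as a sketch rather than the definitive approach: FillTable and SetCellText are helper names I made up, and the data is assumed to arrive as a list of string arrays, one array per row. The second row of the table is treated as the template row, the same as in the code above.

```vbnet
Imports System.Collections.Generic
Imports System.Linq
Imports DocumentFormat.OpenXml
Imports DocumentFormat.OpenXml.Wordprocessing

Module TableFiller

    ' Appends one copy of the template row (the 2nd row) for every record,
    ' then removes the template row itself.
    Sub FillTable(ByVal tbl As Table, ByVal records As IEnumerable(Of String()))

        Dim templateRow As TableRow = tbl.Elements(Of TableRow)().ElementAt(1)

        For Each record As String() In records
            Dim newRow As TableRow = CType(templateRow.CloneNode(True), TableRow)
            Dim cells As List(Of TableCell) = newRow.Elements(Of TableCell)().ToList()

            ' Fill cells left to right; stop when the record runs out of values
            For i As Integer = 0 To cells.Count - 1
                If i >= record.Length Then Exit For
                SetCellText(cells(i), record(i))
            Next

            tbl.AppendChild(newRow)
        Next

        ' The template row is removed only after all copies have been made
        templateRow.Remove()
    End Sub

    ' Replaces the runs of the cell's first paragraph with a single new run
    Private Sub SetCellText(ByVal cell As TableCell, ByVal value As String)
        Dim para As Paragraph = cell.Elements(Of Paragraph)().FirstOrDefault()
        If para Is Nothing Then
            para = New Paragraph()
            cell.AppendChild(para)
        End If

        para.RemoveAllChildren(Of Run)()
        para.AppendChild(New Run(New Text(value)))
    End Sub

End Module
```

Inside the Section check in the main code, the clone, append and remove steps could then be replaced with a single FillTable(table, records) call. Note that this sketch replaces the existing runs, so any run-level formatting in the template cells is lost; reusing the first run, as the original code does, keeps it.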

Open Office documents in Firefox

Microsoft released an open source project to create a Firefox plugin that opens OpenXML documents created by MS Office 2007 (or other Office versions with the compatibility pack). This helps people who don’t have Office or who are using a *nix-based OS such as Linux. I installed the plugin on my PC; apparently it translates the .docx file into an .xhtml file that Firefox can open.

Here is the high level architecture I got from the website

Logical Architecture

The following features are planned for the first milestone:

  1. Core transformation framework
  2. Browser plug-ins for Firefox 3.0.x on Windows and Linux
  3. Word Document features including translation of
    1. font types
    2. images
    3. text styles
    4. diagrams
    5. tables
    6. hyperlinks
  4. Test Results

WordprocessingML document generator with XML Schema tagging

Word documents can be tagged with elements from an XML schema, and values can then be injected into those elements programmatically. With the OpenXML SDK and LINQ to XML, all of this is even easier! Below are the steps.

Create a schema in Visual Studio

The following details must be unique:

1. targetNamespace

2. xmlns

3. xmlns:mstns

4. name of element and complexType

Open up Word 2007 and click on Schema.

Click on Add Schema.

Select the schema you just created in the file dialog.

Enter an alias; try to follow the URI where possible.

Click OK on the next screen.

The XML Structure task pane will become visible.

Add the Document element to the Word file by clicking on it; you will see the tags appear in the Word file.

Add a table between the Document tags and key in the field names as well.

Place the cursor in the cell beside the Supplier cell, then click on Supplier in the XML Structure task pane.

You can now see the Supplier tags.

Do the same for the rest of the elements.

Put a space between each pair of tags as a placeholder, so that Word creates a <w:t> element behind the scenes.

You can now save and close the Word document.

Open up Visual Studio

Now we want to write a program to inject values into the 4 elements defined in the document above, which are:

1. Supplier

2. Phone

3. Fax

4. AttentionTo

Create a console application and name it DocGenWithXmlSchema.

Add references to the libraries I want to use, including DocumentFormat.OpenXml (the OpenXML SDK).

In the Main method of Program.cs, define the XML namespace.

If you didn’t add the using directive, you can use the Resolve context menu to do it. (C# only)

Create a variable to hold the filename.

Add a using block to open the Word document.

Get the MainDocumentPart and read it into an XDocument.

Then pick out all the “w:customXml” XElements, which are the XML Schema tags we put in the Word document, and use foreach to loop through them.

Insert a switch-case block within the foreach block.

Since I know which of the 4 tags I want to replace values in, I use case to match the tag name and replace the value.

After the foreach block, save the document.
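The original code for these steps was written in C# and shown only as screenshots. As a rough stand-in, here is a VB.NET sketch of the same flow, matching the code earlier on this page. The file name and the values written into the tags are placeholders I made up, and the Select Case block plays the role of the switch-case described above.

```vbnet
Imports System
Imports System.IO
Imports System.Linq
Imports System.Xml
Imports System.Xml.Linq
Imports DocumentFormat.OpenXml.Packaging

Module Program

    Sub Main()
        ' The WordprocessingML namespace (the "w:" prefix)
        Dim w As XNamespace = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

        ' Placeholder file name: use the document you tagged in Word
        Const FILE_NAME As String = "c:\test\supplier.docx"

        Using doc As WordprocessingDocument = WordprocessingDocument.Open(FILE_NAME, True)
            Dim mainPart As MainDocumentPart = doc.MainDocumentPart

            ' Read document.xml into an XDocument
            Dim xdoc As XDocument
            Using reader As XmlReader = XmlReader.Create(mainPart.GetStream(FileMode.Open, FileAccess.Read))
                xdoc = XDocument.Load(reader)
            End Using

            ' Every schema tag applied in Word is a w:customXml element
            For Each tag As XElement In xdoc.Descendants(w + "customXml")
                Dim newValue As String = Nothing

                ' Placeholder values: this is where business logic would go
                Select Case CType(tag.Attribute(w + "element"), String)
                    Case "Supplier"
                        newValue = "Contoso Ltd"
                    Case "Phone"
                        newValue = "555-0100"
                    Case "Fax"
                        newValue = "555-0101"
                    Case "AttentionTo"
                        newValue = "Jane Doe"
                End Select

                ' Skip tags we do not fill, such as the outer Document tag
                If newValue Is Nothing Then Continue For

                ' The placeholder space typed in Word lives in the first w:t element
                Dim textNode As XElement = tag.Descendants(w + "t").FirstOrDefault()
                If textNode IsNot Nothing Then textNode.Value = newValue
            Next

            ' Write the modified XML back into the main document part
            Using writer As XmlWriter = XmlWriter.Create(mainPart.GetStream(FileMode.Create, FileAccess.Write))
                xdoc.Save(writer)
            End Using
        End Using
    End Sub

End Module
```

A tag with no w:t element inside is left untouched, which is why the placeholder space between each pair of tags matters.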

Novell OpenOffice developer speaks on why we need client side apps

Some may look at this interview as a Sun-bashing post, but I think it is more about why client-side apps like OpenOffice (or MS Office) are still relevant despite more companies coming out with online apps like Google Apps.

Also OpenOffice.org isn’t even finished right now and rewriting all of this in HTML and Javascript would be quite difficult, the web is not a beautiful, clean development environment. It’s actually very difficult to produce something which looks like you want it to look like. And that’s by design – it’s not a fixed layout, which is good for the web but when you try to layout documents you need more precision.

Meeks also mentioned why OpenOffice does not need to follow the MS Office 2007 Ribbon interface, and what the problem with the current OpenOffice UI is.

The current one is using a very inflexible widget toolkit called VCL and that is really something out of the Mid-Nineties – it’s a disaster. It hasn’t been improved substantially since then. So we are doing a whole lot of work to improve the widget toolkit inside OpenOffice.org, to introduce layout and that’s being funded by Novell and driven by us.

 

Then they chat about an alternative OpenOffice version named Go-OO

There also is the gstreamer audio/video-support which is not yet upstream, lots of that nasty Microsoft Works file format support, Mono-integration, better Chinese font rendering and so on. You can go to go-oo.org/discover and check the differences out for yourself.

Not to be left out is OOXML support in Go-OO.