Automated Document Generation in Microsoft Word

At AIS, our customers often ask us to put together a quick prototype very early in the envisioning phase of a project. The main objective is to determine whether the proposed set of technologies will address the key requirements. Having a “working” piece of software this early (despite all the scaffolding needed to make the prototype work) helps the stakeholders decide whether to go with a certain technology set. This is especially true if a number of competing solutions are being considered.

In the following video, we talk about one such prototype that we put together quickly for one of our customers. The requirements are typical of a large-scale document (correspondence) generation system: high-volume generation of documents, the ability to author dozens of templates, the ability to generate documents by binding the templates to data from business systems, support for multiple document formats, and the ability to create workflows that support the business processes.

Here we describe a solution for automated document generation using the Microsoft Office system. By combining out-of-the-box functionality such as Content Controls and the Open XML SDK with a little customization for your business rules, you can automate template creation, document generation and document conversion, and (using SharePoint) allow for Web-based document management.


Content Controls

Starting with Office 2007, Microsoft introduced the notion of Content Controls (a big improvement over the Bookmark Control of Office 2003). Content Controls allow a document template to be created using pre-defined pieces of content. These include text blocks, drop-down menus, combo boxes, calendar controls, etc.

Content Controls can be bound to XML elements, effectively providing the ability to define the document template using a “semantic markup.” The screenshot pasted below further illustrates this concept. The pane on the right shows an XML tree structure (this could be the schema returned by the underlying Policy service). Elements of the XML tree can be mapped to the content controls contained within the document.
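
To make the mapping concrete, here is a minimal Python sketch that lists the bound content controls in a template. It relies on the fact that a .docx file is a ZIP package whose main part is word/document.xml, and that a bound content control carries a dataBinding element with an XPath into the mapped XML. The function name list_bindings is our own, and the sketch only looks at the main document part (not headers or footers), so treat it as an illustration rather than a complete tool.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by the main document part.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def list_bindings(docx_bytes):
    """Return (tag, xpath) pairs for every data-bound content control."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as package:
        root = ET.fromstring(package.read("word/document.xml"))

    bindings = []
    # Each content control is a w:sdt element; its properties live in w:sdtPr.
    for props in root.iter(f"{{{W_NS}}}sdtPr"):
        binding = props.find(f"{{{W_NS}}}dataBinding")
        if binding is None:
            continue  # unbound control, nothing is mapped to it
        tag = props.find(f"{{{W_NS}}}tag")
        name = tag.get(f"{{{W_NS}}}val") if tag is not None else None
        bindings.append((name, binding.get(f"{{{W_NS}}}xpath")))
    return bindings
```

Running this against a template gives a quick inventory of which XML elements the template expects, which is useful when checking a template against the data feed.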

Once the mapping is in place, new documents are then programmatically generated as copies of the template document. When the new document is opened, the placeholder content controls in the document are populated with business data from the underlying data feed.
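
The generation step can be sketched in a few lines of Python. This is not the prototype's code (which used the Office SDK); it is an illustration of the same idea using only the standard library. It assumes the template keeps its mapped data in a part named customXml/item1.xml, which is common but not guaranteed, and that the data is a flat set of elements. The names build_data_xml and generate_document are hypothetical.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Assumed location of the custom XML part inside the template package.
DATA_PART = "customXml/item1.xml"


def build_data_xml(root_name, values):
    """Serialize a flat dict of business data as a small XML document."""
    root = ET.Element(root_name)
    for name, value in values.items():
        ET.SubElement(root, name).text = str(value)
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)


def generate_document(template_bytes, data_xml, data_part=DATA_PART):
    """Copy the template package, swapping in a new custom XML part."""
    output = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(template_bytes)) as template, \
            zipfile.ZipFile(output, "w", zipfile.ZIP_DEFLATED) as generated:
        if data_part not in template.namelist():
            raise KeyError(f"template has no part named {data_part}")
        for item in template.infolist():
            # Every part is copied unchanged except the data part.
            payload = data_xml if item.filename == data_part else template.read(item)
            generated.writestr(item.filename, payload)
    return output.getvalue()
```

Only the data part changes; the document body is left untouched, so the bound content controls pick up the new values when the generated copy is opened in Word.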

Document Generation 

Our recommended approach is to provide a clean separation between template creation and document generation using a well-defined REST-based API. This API would support both interactive and batch-oriented document generation. Since the API will be self-describing using the REST/Hypermedia pattern, we expect it to be callable from any existing or new service application.

So what do we mean by a “self-describing” API? In addition to providing methods to generate documents by passing in the required data, the API will also provide a machine-readable description of all the parameters needed by a given template. The benefit of this approach is that service clients don’t need hard-coded template details. Additionally, the templates can be changed without impacting the clients.
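
As an illustration, a template description and a client-side check against it might look like the Python sketch below. The description format, the field names (templateId, parameters, links) and the policy-renewal template are all hypothetical; the point is only that the client reads the parameter list at run time instead of hard-coding it.

```python
# Hypothetical description a server might return for one template.
TEMPLATE_DESCRIPTION = {
    "templateId": "policy-renewal",
    "parameters": [
        {"name": "policyNumber", "type": "string", "required": True},
        {"name": "renewalDate", "type": "date", "required": True},
        {"name": "agentName", "type": "string", "required": False},
    ],
    # Hypermedia link telling the client where to request generation.
    "links": {"generate": "/templates/policy-renewal/documents"},
}


def check_parameters(description, supplied):
    """Return a list of problems; an empty list means the call can proceed."""
    declared = {p["name"]: p for p in description["parameters"]}
    problems = [
        f"missing required parameter: {name}"
        for name, spec in declared.items()
        if spec["required"] and name not in supplied
    ]
    problems += [
        f"unknown parameter: {name}" for name in supplied if name not in declared
    ]
    return problems
```

If a template author later adds or removes a field, the description changes with it and the same client code keeps working.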

Under the covers, this API will be dependent on two Office technologies:

1) Open XML SDK: This SDK allows the XML data mapped to the content controls to be replaced with data supplied by the underlying data feeds.

2) Word Automation Services: This service is part of SharePoint and offers a highly scalable, server-side approach for converting Microsoft Word documents into other formats such as PDF.

The REST API will abstract the aforementioned implementation details from the service client. For instance, the service client will simply call a document generation method that takes a template ID and the necessary parameters (self-described, as indicated earlier). Upon successful completion of the method, the URL of the generated document will be returned to the client. The URL could point to a generated document resource located within a database, a SharePoint document library or other storage, as needed. The service client application can then render the generated document as needed. While this example describes the interactive scenario, the batch scenario would work similarly.
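
From the client's side, the interactive call could look roughly like the following Python sketch. The URL layout (/templates/{id}/documents), the JSON request shape and the documentUrl response field are assumptions made for illustration, not a published contract. The request is built but not sent, so the sketch can be read and run without a server.

```python
import json
import urllib.request


def build_generate_request(base_url, template_id, parameters):
    """Build (but do not send) a POST request asking for a new document."""
    url = f"{base_url.rstrip('/')}/templates/{template_id}/documents"
    body = json.dumps({"parameters": parameters}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json", "Accept": "application/json"},
        method="POST",
    )


def document_url_from_response(response_body):
    """Extract the generated document's URL from a JSON response body."""
    payload = json.loads(response_body)
    return payload["documentUrl"]
```

A client would pass the request to urllib.request.urlopen, read the response body, and hand it to document_url_from_response to get the link it then renders or stores.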

Web-Based Editing of Generated Documents

Generated documents can be edited within the browser using Microsoft Office Web Apps, an online companion to Microsoft Word. While lightweight editing is possible, there are a number of scenarios not supported by Microsoft Office Web Apps:

  • Editing documents and using track changes to mark revisions.
  • Editing Word objects, such as content controls.
  • Using macros in Word, Excel and PowerPoint documents.

I would like to thank Sandeep Nahta from AIS for his help with building this prototype.

About Vishwas Lele

Vishwas Lele serves as Chief Technology Officer at Applied Information Sciences, Inc. Mr. Lele is responsible for assisting organizations in envisioning, designing, and implementing enterprise solutions. Mr. Lele brings close to 24 years of experience and thought leadership to his position, and has been at AIS for 18 years. A noted industry speaker and author, Mr. Lele serves as Microsoft Regional Director for the Washington, D.C. area and is a member of the Windows Azure Insiders group. Additionally, Mr. Lele received an MVP (Most Valuable Professional) award for Solution Architecture.