Inserting HTML into Word documents

Document generation is one of the most useful things any system can do for its users, as users tend to think in terms of Microsoft Word and Excel documents, and in my opinion the best library for producing them via Java is docx4j.

This library (I am using version 2.8) allows you to build everything from scratch, but it's far easier to start with a template document, normally supplied by a client, and just substitute the values you want.

So let's do that:

First create your empty “WordprocessingMLPackage”, which is the holding object for your Word document, and create a normal Java File object for your template file.

Then load that file into your WordprocessingMLPackage:

WordprocessingMLPackage wordMLPackage;
File templatefile = new File("C:\\mytemplatefile.docx");
wordMLPackage = WordprocessingMLPackage.load(templatefile);

OK, so now we have a Word document, but it's just a generic document, not personalised to our client's needs. To personalise it we first need to put placeholders in the template for our data to go in. These are just text in the Word document in the format ${xxxxxxx}, for example ${Replace_Tex1}.


Once these are in you can use them to substitute your text.

First build and populate a HashMap of the values you want substituting:

HashMap<String, String> mappings = new HashMap<String, String>();
mappings.put("Replace_Tex1", "This is some custom data");

As you can see, the first part needs to be the placeholder name and the second part the value you want inserted.

Once that is ready you can do the substitution, but first you need to convert your existing WordprocessingMLPackage to XML so you have something easier to work with:

String xml = XmlUtils.marshaltoString(wordMLPackage.getMainDocumentPart().getJaxbElement(), true);

With the resultant XML and the HashMap we can use the XmlUtils.unmarshallFromTemplate function to do the swap, then stuff the result back into the WordprocessingMLPackage:

wordMLPackage.getMainDocumentPart().setJaxbElement((org.docx4j.wml.Document) XmlUtils.unmarshallFromTemplate(xml, mappings));
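If the substitution step feels like a black box, the idea can be sketched in plain Java. This is only an illustration of ${name} replacement over a string, not docx4j's actual implementation; the class name PlaceholderSketch and the choice to leave unmapped placeholders untouched are assumptions made for the example.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PlaceholderSketch {
    // Matches ${name} and captures the name between the braces.
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{([^}]+)\\}");

    // Replaces every ${key} that has a mapping; unmapped keys are left as they are
    // (a choice made for this sketch, not a statement about docx4j).
    static String substitute(String template, Map<String, String> mappings) {
        Matcher m = PLACEHOLDER.matcher(template);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = mappings.get(m.group(1));
            String replacement = (value != null) ? value : m.group(0);
            // quoteReplacement stops '$' and '\' in the value being read as group references
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> mappings = new HashMap<String, String>();
        mappings.put("Replace_Tex1", "This is some custom data");
        String xml = "<w:t>${Replace_Tex1}</w:t><w:t>${Unknown}</w:t>";
        // prints <w:t>This is some custom data</w:t><w:t>${Unknown}</w:t>
        System.out.println(substitute(xml, mappings));
    }
}
```

It also shows why the map key is the bare name: the ${ and } live only in the template.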

And that is us done. We can just use “SaveToZipFile” (docx files are basically renamed zip files with XML inside them) to save the file to the file system:

SaveToZipFile saver = new SaveToZipFile(wordMLPackage);
saver.save("C:\\finaldocument.docx");
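If you want to convince yourself that a docx really is just a zip, a few lines of plain Java will list what is inside one. This is a sketch using only java.util.zip, nothing from docx4j; the class name DocxZipSketch and the in-memory sample archive are made up so the example runs without a real file.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

class DocxZipSketch {
    // Returns the names of all entries in a zip stream (a .docx file qualifies).
    static List<String> listEntries(InputStream in) throws IOException {
        List<String> names = new ArrayList<String>();
        try (ZipInputStream zip = new ZipInputStream(in)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                names.add(entry.getName());
            }
        }
        return names;
    }

    // Builds a tiny stand-in archive in memory; a real docx has more parts than this.
    static byte[] buildSampleArchive() throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            zip.putNextEntry(new ZipEntry("word/document.xml"));
            zip.write("<w:document/>".getBytes("UTF-8"));
            zip.closeEntry();
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // prints [word/document.xml]
        System.out.println(listEntries(new ByteArrayInputStream(buildSampleArchive())));
    }
}
```

Pass listEntries a FileInputStream for your saved file instead of the sample archive to see the real parts; the main body of a Word document lives in word/document.xml.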

But where, you ask, is the HTML insertion you promised? You have just shown us plain text; where are the formatted HTML inserts?

For that, dear reader, we need to add a bunch more code.

There is no built-in function to do that for HTML, so we are going to have to:

  1. Get a list of all the objects (such as paragraphs, lines of text etc.) in the WordprocessingMLPackage.
  2. Search through them to find the location of the text we want to replace (the placeholder).
  3. Remove the placeholder text.
  4. Add a wrapper object to our HTML, convert it to DOCX XML and insert it at the correct place.

For No. 1 we will pinch from the fabulous Jos Dirksen's “Create complex Word documents programatically with docx4j” to get our list of objects.

private static List<Object> getAllElementFromObject(Object obj, Class<?> toSearch) {
    List<Object> result = new ArrayList<Object>();
    // unwrap JAXB wrappers so we compare against the real object
    if (obj instanceof JAXBElement) obj = ((JAXBElement<?>) obj).getValue();

    if (obj.getClass().equals(toSearch))
        result.add(obj);
    else if (obj instanceof ContentAccessor) {
        // not a match, so recurse into its children
        List<?> children = ((ContentAccessor) obj).getContent();
        for (Object child : children) {
            result.addAll(getAllElementFromObject(child, toSearch));
        }
    }
    return result;
}
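To see how that recursive search behaves, here is a small stand-alone sketch that runs the same logic over a toy tree. The Container interface and the Body and Paragraph classes are hypothetical stand-ins for docx4j's ContentAccessor and document objects, not real docx4j types. One thing it makes visible: once an object matches the class you are looking for, the search does not descend into it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class TreeSearchSketch {
    // Hypothetical stand-in for ContentAccessor: anything that has children.
    interface Container {
        List<Object> getContent();
    }

    static class Body implements Container {
        private final List<Object> content;
        Body(Object... children) { content = new ArrayList<Object>(Arrays.asList(children)); }
        public List<Object> getContent() { return content; }
    }

    static class Paragraph extends Body {
        Paragraph(Object... children) { super(children); }
    }

    // Same shape as the search above: exact class match, otherwise recurse into children.
    static List<Object> collect(Object obj, Class<?> toSearch) {
        List<Object> result = new ArrayList<Object>();
        if (obj.getClass().equals(toSearch)) {
            result.add(obj);
        } else if (obj instanceof Container) {
            for (Object child : ((Container) obj).getContent()) {
                result.addAll(collect(child, toSearch));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Body root = new Body(new Paragraph("a", "b"), new Body(new Paragraph("c")));
        System.out.println(collect(root, String.class));           // prints [a, b, c]
        System.out.println(collect(root, Paragraph.class).size()); // prints 2
    }
}
```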

That done, we will use that function to go through each object and search it for the placeholder text. If it is found, we return the integer that tells us its location in the document (and while it's at it, the function removes the placeholder).

private int findPlaceHolder(String placeholder, WordprocessingMLPackage template) {
    int index = 0;
    List<Object> paragraphs = getAllElementFromObject(template.getMainDocumentPart(), P.class);

    // find the paragraph containing a text element equal to the placeholder
    P toReplace = null;
    for (Object p : paragraphs) {
        List<Object> texts = getAllElementFromObject(p, Text.class);
        for (Object t : texts) {
            Text content = (Text) t;
            if (content.getValue().equals(placeholder)) {
                toReplace = (P) p;
                index = template.getMainDocumentPart().getContent().indexOf(toReplace);
                break;
            }
        }
    }
    if ( toReplace != null ) {
        //remove placeholder
        template.getMainDocumentPart().getContent().remove(toReplace);
    }
    return index;
}
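The reason the returned index is still useful after the placeholder has been removed is plain java.util.List behaviour: removing the item at a position shifts everything after it down by one, so inserting at that same position puts the new content exactly where the placeholder was. Here is a minimal sketch with strings standing in for document objects. The class and method names are made up, and unlike the function above it treats "not found" as -1 rather than 0.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ReplaceAtIndexSketch {
    // Removes the placeholder item and splices the replacement items in at the same position.
    static List<String> replaceAt(List<String> content, String placeholder, List<String> replacement) {
        List<String> result = new ArrayList<String>(content);
        int index = result.indexOf(placeholder); // -1 when the placeholder is absent
        if (index >= 0) {
            result.remove(index);
            result.addAll(index, replacement);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> body = Arrays.asList("Heading", "Replace_Tex1", "Footer");
        List<String> imported = Arrays.asList("Imported paragraph 1", "Imported paragraph 2");
        // prints [Heading, Imported paragraph 1, Imported paragraph 2, Footer]
        System.out.println(replaceAt(body, "Replace_Tex1", imported));
    }
}
```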

This finally gives us a way to insert the HTML at the correct place, so we do our mapping again, but this time with HTML rather than plain text:

mappings.put("Replace_Tex1", "<b>This is some custom html data</b>");

Then loop through the HashMap, finding the correct location, importing the HTML and inserting it into the WordprocessingMLPackage using the addAll function:

Iterator<Map.Entry<String, String>> iterator = mappings.entrySet().iterator();
while (iterator.hasNext()) {
    Map.Entry<String, String> mapEntry = iterator.next();
    // wrap the fragment in a root element so the importer accepts it
    String xhtml = "<div>" + mapEntry.getValue() + "</div>";
    int locationOfItem = findPlaceHolder(mapEntry.getKey(), wordMLPackage);
    if (locationOfItem > 0) {
        wordMLPackage.getMainDocumentPart().getContent().addAll(locationOfItem, XHTMLImporter.convert(xhtml, null, wordMLPackage));
    }
}

That's it, you can now insert any HTML you want anywhere you want into an existing document.

You can see that I had to wrap the HTML in a “DIV” (any root element will do, but DIV is easiest). You have to do this or the XHTMLImporter will fail its validation. Also, the smallest item the HTML will import as is a paragraph, so you cannot use this method to change text and formatting inside an existing line; you will always get a paragraph break before and after (not really a limitation given the use cases).
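A quick way to see why the wrapper matters is to check fragments for XML well-formedness before handing them over. The sketch below uses the JDK's own XML parser, not the XHTMLImporter itself, so treat it as an approximation of the importer's checks rather than the real thing; the class and method names are mine.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class WellFormedSketch {
    // Returns true when the string parses as a single well-formed XML document.
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // the parser rejected the markup
        } catch (IOException | ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String fragment = "Some text <b>and some bold</b>";
        System.out.println(isWellFormed(fragment));                          // prints false
        System.out.println(isWellFormed("<div>" + fragment + "</div>"));     // prints true
        System.out.println(isWellFormed("<div>line one<br>line two</div>")); // prints false
    }
}
```

The last case is worth noting: wrapping in a DIV does not rescue tags that are left unclosed, so a br needs writing as a self-closed element. The JDK parser may also print a "[Fatal Error]" line to stderr for the rejected fragments.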

As always yell if anything seems off.
