Inserting HTML into Word documents

Document generation is one of the most useful things a system can do for its users, because users tend to think in terms of Microsoft Word and Excel documents. In my opinion the best library for producing them from Java is docx4j: http://www.docx4java.org/

This library (I am using version 2.8) lets you build everything from scratch, but it's far easier to start with a template document, normally supplied by the client, and just substitute the values you want.

So let's do that:

First declare your “WordprocessingMLPackage”, which is the holding object for your Word document, and create a normal Java File object pointing at your template file.

Then load that file into your WordprocessingMLPackage:

WordprocessingMLPackage wordMLPackage;
File templatefile = new File("C:\\mytemplatefile.docx");
wordMLPackage = WordprocessingMLPackage.load(templatefile);

OK, so now we have a Word document, but it's just a generic document, not personalised to our client's needs. To personalise it we first need to put placeholders in the template for our data to go in. These are just text in the Word document in the format ${xxxxxxx}, as you can see below:

docx4j1.png

Once these are in, you can use them to substitute your text.
First build and populate a HashMap of the values you want substituted:

HashMap<String, String> mappings = new HashMap<String, String>();
mappings.put("Replace_Tex1", "This is some custom data");

As you can see, the first argument is the placeholder name and the second is the value you want inserted.
Once that is ready you can do the substitution, but first you need to convert your existing WordprocessingMLPackage to XML so you have something easier to work with:

String xml = XmlUtils.marshaltoString(wordMLPackage.getMainDocumentPart().getJaxbElement(), true);

With the resulting XML and the HashMap we can use the XmlUtils.unmarshallFromTemplate function to do the swap, then put the result back into the WordprocessingMLPackage:

wordMLPackage.getMainDocumentPart().setJaxbElement((org.docx4j.wml.Document) XmlUtils.unmarshallFromTemplate(xml, mappings));
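To make the substitution step a bit more concrete, here is a minimal JDK-only sketch of what swapping ${...} placeholders in an XML string amounts to. It is an illustration of the idea, not docx4j's implementation; the class name PlaceholderSketch and the method fill are made up for this example.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PlaceholderSketch {
    // Matches ${name} and captures the name between the braces.
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{([^}]+)\\}");

    // Hypothetical helper: replaces every known ${name} with its mapped value.
    static String fill(String xml, Map<String, String> mappings) {
        Matcher m = PLACEHOLDER.matcher(xml);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = mappings.get(m.group(1));
            // Unknown placeholders are left exactly as they were.
            String replacement = (value != null) ? value : m.group(0);
            // quoteReplacement stops '$' and '\' in the value being read as group references.
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> mappings = new HashMap<String, String>();
        mappings.put("Replace_Tex1", "This is some custom data");
        System.out.println(fill("<w:t>${Replace_Tex1}</w:t>", mappings));
    }
}
```

One thing worth knowing when a placeholder refuses to be replaced: Word can split what looks like a single piece of text across several runs in the XML, so the ${...} string is no longer contiguous. Retyping the placeholder in one go in the template usually sorts that out.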

And that is us done. We can just use “SaveToZipFile” (docx files are basically renamed zip files with XML inside them) to save the file to the file system:

SaveToZipFile saver = new SaveToZipFile(wordMLPackage);
saver.save("C:\\finaldocument.docx");
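Since a docx really is just a zip archive, you can peek inside one with nothing but the JDK. The sketch below lists the entries of whatever file you pass on the command line; the class name DocxPeek and the method listEntries are invented for this example and are not part of docx4j.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

class DocxPeek {
    // Hypothetical helper: returns the entry names of a zip stream in archive order.
    static List<String> listEntries(InputStream in) throws IOException {
        List<String> names = new ArrayList<String>();
        try (ZipInputStream zip = new ZipInputStream(in)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                names.add(entry.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: java DocxPeek <file.docx>");
            return;
        }
        try (InputStream in = new FileInputStream(args[0])) {
            for (String name : listEntries(in)) {
                System.out.println(name);
            }
        }
    }
}
```

Run against a saved document you should see entries such as [Content_Types].xml and word/document.xml, the latter being the main document part that the code above has been manipulating.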

But where, you ask, is the HTML insertion you promised? You have just shown us plain text; where are the formatted HTML inserts?
For that, dear reader, we need to add a bunch more code.
There is no built-in function to do that for HTML, so we are going to have to:
1. Get a list of all the objects (such as paragraphs, lines of text etc etc) in the WordprocessingMLPackage.
2. Search through them to find the location of the text we want to replace (the placeholder).
3. Remove the placeholder text.
4. Add a wrapper element to our HTML, convert it to DOCX XML and insert it at the correct place.
For No. 1 we will pinch from the fabulous Jos Dirksen’s [Create complex Word documents programatically with docx4j](http://www.smartjava.org/content/create-complex-word-docx-documents-programatically-docx4j) to get our list of objects:

private static List<Object> getAllElementFromObject(Object obj, Class<?> toSearch) {
    List<Object> result = new ArrayList<Object>();
    if (obj instanceof JAXBElement) obj = ((JAXBElement<?>) obj).getValue();
    if (obj.getClass().equals(toSearch))
        result.add(obj);
    else if (obj instanceof ContentAccessor) {
        List<?> children = ((ContentAccessor) obj).getContent();
        for (Object child : children) {
            result.addAll(getAllElementFromObject(child, toSearch));
        }
    }
    return result;
}

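That done, we will use that function to go through and search each text object for the placeholder. If it's found, we return the integer that tells us its location in the document (and while it's at it, the function removes the paragraph holding the placeholder). Note that the comparison uses equals, so the text object has to contain exactly the placeholder string and nothing else, and that the function returns 0 when nothing is found.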

private int findPlaceHolder(String placeholder, WordprocessingMLPackage template) {
    int index = 0;
    List<Object> paragraphs = getAllElementFromObject(template.getMainDocumentPart(), P.class);
    P toReplace = null;
    for (Object p : paragraphs) {
        List<Object> texts = getAllElementFromObject(p, Text.class);
        for (Object t : texts) {
            Text content = (Text) t;
            if (content.getValue().equals(placeholder)) {
                toReplace = (P) p;
                index = template.getMainDocumentPart().getContent().indexOf(toReplace);
                break;
            }
        }
    }
    if ( toReplace != null ) {
        //remove placeholder
        ((ContentAccessor)toReplace.getParent()).getContent().remove(toReplace);    
    }
    return index;
}

This finally gives us a way to insert the HTML at the correct place, so we do our mapping again, but this time with HTML rather than plain text:

mappings.put("Replace_Tex1", "<b>This is some custom html data</b>");

Then loop through the HashMap, finding the correct location, importing the HTML and inserting it into the WordprocessingMLPackage using the addAll function.

for (Map.Entry<String, String> mapEntry : mappings.entrySet()) {
    // wrap the fragment in a single root element so the importer accepts it
    String xhtml = "<div>" + mapEntry.getValue() + "</div>";
    int locationOfItem = findPlaceHolder(mapEntry.getKey(), wordMLPackage);
    // findPlaceHolder returns 0 when nothing matched, so index 0 is skipped
    if (locationOfItem > 0) {
        wordMLPackage.getMainDocumentPart().getContent().addAll(locationOfItem, XHTMLImporter.convert(xhtml, null, wordMLPackage));
    }
}

That's it, you can now insert any HTML you want anywhere you want into an existing document.
You can see that I had to wrap the HTML in a “DIV” (any root element will do, but DIV is easiest). You have to do this or the XHTMLImporter will fail its validation. Also, the smallest item the HTML will import as is a paragraph, so you cannot use this method to change text and formatting inside an existing line; you will always get a paragraph break before and after (not really a limitation given the use cases).
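If you want to see why the wrapper matters, the JDK's own XML parser makes the point nicely. This is only a sketch of the well-formedness rule (one root element, every tag closed), not the check that XHTMLImporter itself performs; WellFormedSketch and isWellFormed are names I have made up for the example.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class WellFormedSketch {
    // Hypothetical helper: true if the string parses as a well-formed XML document.
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors without printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            // Raised for malformed input such as two root elements or an unclosed tag.
            return false;
        } catch (IOException | ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String fragment = "<b>bold</b><i>italic</i>";
        System.out.println(isWellFormed(fragment));                     // two roots
        System.out.println(isWellFormed("<div>" + fragment + "</div>")); // one root
        System.out.println(isWellFormed("<div>line<br></div>"));        // unclosed br
    }
}
```

The last case is the other thing to watch for: HTML habits such as a bare br tag are not well-formed XML, so the HTML you feed in needs to be written as XHTML (br written with a closing slash, all tags closed).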
As always yell if anything seems off.

Commuting tip for Developers

Something that I’m getting used to again with the freelancing is venturing back onto public transport, and although I don’t mind it, it's not exactly billable. There is only so much email and such you can sort, and as they are not long train trips (30 mins between connections) it's not worth getting out your laptop even if there was space, so what to do??

Well, write Javadocs and do code reviews actually.

I’m not kidding. All that documentation and stuff you are supposed to do but never do, because it bores the hell out of you and you never have time for it, is actually less boring when the alternative is staring at the sweating armpit in front of you. Also, you would be amazed how much clients love it when your code has proper Javadocs for all its functions (not to mention that they are a major deliverable in a big project Matt White and I are doing).

The next question is how?

On Android at least that is easy, thanks to a little program called AIDE, which is basically sold as an Android dev IDE but has excellent Git and Dropbox integration. That means you can just do a simple git clone of your app, then potter through it when you are travelling, updating the documentation to be just as you would want to find it***

In addition I have found and fixed a couple of scaling issues in my code thanks to being able to view it at leisure, so worth it just for that.

 

 

*It shows a commitment to a client's needs and there is no substitute for some face-to-face time to keep deliverables in focus

**In fact it's not billable at all, as I have learnt an important lesson from one of my clients' dealings with their own clients: that NOT charging for travel is excellent for achieving long-term client relationships

***Yes I know you are supposed to document as you are coding, but if you tell me you document fully in the middle of a coding frenzy, then I will tell you you are a lying bugger. 🙂

GIT Xpages Issue

A little devil of an issue plagued me the other day, which I thought I would share.

Scenario:

  1. You clone a git repository of an XPages app (I always use the git command shell).
  2. Then as normal in the package manager you open the local git clone as a “project”
  3. You right click on the project and associate it with a new nsf.
  4. This nsf you locate on your local machine.

Error:
You get tons of XPage and custom control errors stating that you can't use this custom control, even though the controls are obviously there and fine. Editing and saving each custom control fixes the problem, but that means you are committing tons more than you need to, and other people using the same repository don’t have the issue.

Reason:
The security on the NSF/Git clone you are associating with has “enforce Local ACL” set, and Default does not have Manager rights or above etc etc.
Basically you don’t have rights to the db you have just created; for some reason this does not show up until you close the database and re-open it.

Solution:
Associate it with a new db on a server, or get another contributor to change the ACL on the GitHub version. You can change the security once the local db has been built, but that has a tendency to touch all the custom controls, which means your git commit is huge and pointless.

Best USB Cable

The years of dealing with hundreds of different cables are past, and now nearly everything is USB Type-A to Micro-B, which runs at USB2 speed and also works on USB3 devices. This is mainly due to the common EPS agreement, and ignoring bloody Apple, it means you can use the same cable for everything from charging batteries to data transfer.

But not all cables are created equal, as we have all discovered when we open something shiny and beautiful and find at the bottom of the packaging a stunted, inflexible thing that looks as if it came out of a Christmas cracker. Now there are lots of posh cables out there costing a sodding fortune: flat ones, retractable ones, gold plugs etc etc. But really all I want from a good cable is:

  • Small heads so it fits all devices (even if you have a case on the device)
  • Able to take a 10 W (5 V x 2 A) load safely
  • Flexible, with good metal shielding inside against interference
  • USB2 speed and a good low resistance (some of the tablets can be really picky about this)
  • Ideally a repeatable order so they are all the same size (it's nice and neat if they are all the same)

Turns out that reliable cables with these specs have been kicking around forever and you can still get them for about £2: they are the old Nokia ones from back when Nokia was king of the hill. Search for the CA-101, a 120cm (47inch) cable with its own attached tidy clip, and the CA-101D, a 19cm (7.5inch) version with no clip. These cables have served me well and consistently on a variety of devices when other more expensive ones have let me down. Recommended.

The reason I do not suggest the newer Nokia CA-179 is that I have heard that it won't charge at the higher currents, i.e. more than 500mA.

Oh, all the ones I have ordered have been a bit stiff when they arrived, but have soon softened up with a bit of use.

 

 

 

 

Content Disposition Typo

While doing a quick patch for XLS generation (caused by Microsoft’s security change for Excel 2007+), in which I swapped a client's code from generated HTML to using the POI lib, I ran into this fun little browser discrepancy:

If I want to name the file I’m generating and returning to the browser for download, I just set the “Content-Disposition” header thus:

var fileName = workbookName +".xls";
pageResponse.setHeader("Content-Disposition","inline; filename=" + fileName);

Works just fine on Chrome/Safari and even the dreaded IE, but is completely ignored by Firefox.

<scratch scratch> …. <shrug> if in doubt use more quote marks

var fileName = workbookName +".xls";
pageResponse.setHeader("Content-Disposition","Attachment; filename=\"" + fileName + "\"");

That’s better, works everywhere now 🙂
(Yes, I am aware that the quoted version was the one I should have used in the first place as per [w3 Protocols](http://www.w3.org/Protocols/rfc2616/rfc2616-sec19.html), but if the number of examples that leave the quotes out is any judge then I was not alone, and browser consistency would still be a nice thing.)
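If you build this header in more than one place it is worth wrapping the quoting up once. Below is a minimal Java sketch (Java rather than the server-side JavaScript above, since the POI work is Java anyway); ContentDispositionSketch and attachment are names made up for the example. It assumes a plain ASCII file name and simply backslash-escapes any quotes or backslashes so they cannot end the quoted string early.

```java
class ContentDispositionSketch {
    // Hypothetical helper: builds the header value with the file name as a quoted string.
    static String attachment(String fileName) {
        // Escape backslashes first, then double quotes, so the escapes are not escaped twice.
        String escaped = fileName.replace("\\", "\\\\").replace("\"", "\\\"");
        return "attachment; filename=\"" + escaped + "\"";
    }

    public static void main(String[] args) {
        System.out.println(attachment("monthly report.xls"));
    }
}
```

The quotes earn their keep as soon as a name contains a space or similar separator character, because an unquoted value has to be a single token. Names with non-ASCII characters need more than this sketch covers.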