docx4j find and replace

luckyi · Oct 30, 2013

I have a docx document with some placeholders. I need to replace them with other content and save the result as a new docx document. I started with docx4j and found this method:

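import java.util.ArrayList;
import java.util.List;
import javax.xml.bind.JAXBElement;
import org.docx4j.openpackaging.packages.WordprocessingMLPackage;
import org.docx4j.wml.ContentAccessor;
import org.docx4j.wml.P;
import org.docx4j.wml.Text;

// Recursively collects every element of type toSearch below obj.
public static List<Object> getAllElementFromObject(Object obj, Class<?> toSearch) {
    List<Object> result = new ArrayList<Object>();
    // Unwrap JAXB wrappers so the class check sees the real element.
    if (obj instanceof JAXBElement) obj = ((JAXBElement<?>) obj).getValue();

    if (obj.getClass().equals(toSearch))
        result.add(obj);
    else if (obj instanceof ContentAccessor) {
        List<?> children = ((ContentAccessor) obj).getContent();
        for (Object child : children) {
            result.addAll(getAllElementFromObject(child, toSearch));
        }
    }
    return result;
}

// Replaces toFind inside each Text element separately; a match that spans
// two Text elements is never seen.
public static void findAndReplace(WordprocessingMLPackage doc, String toFind, String replacer) {
    List<Object> paragraphs = getAllElementFromObject(doc.getMainDocumentPart(), P.class);
    for (Object par : paragraphs) {
        P p = (P) par;
        List<Object> texts = getAllElementFromObject(p, Text.class);
        for (Object text : texts) {
            Text t = (Text) text;
            if (t.getValue().contains(toFind)) {
                t.setValue(t.getValue().replace(toFind, replacer));
            }
        }
    }
}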

But that rarely works, because the placeholders are usually split across multiple text runs.
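
To illustrate the failure, here is a minimal sketch that models the text runs of one paragraph as plain strings. This is an assumption made for illustration only: SplitRunDemo, replacePerRun and replaceJoined are hypothetical names, not docx4j API, and the way the placeholder is split is made up. It compares the per-run check from findAndReplace with a replacement done on the concatenated text.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical demo class: runs are modelled as plain strings, not docx4j objects.
class SplitRunDemo {

    // Mirrors the loop in findAndReplace: each run is checked on its own,
    // so a placeholder that spans several runs is never matched.
    static List<String> replacePerRun(List<String> runs, String toFind, String replacer) {
        List<String> out = new ArrayList<>();
        for (String run : runs) {
            out.add(run.replace(toFind, replacer));
        }
        return out;
    }

    // Concatenates all runs, replaces once, stores the result in the first
    // run and blanks the others so the number of runs stays the same.
    static List<String> replaceJoined(List<String> runs, String toFind, String replacer) {
        if (runs.isEmpty()) {
            return new ArrayList<>();
        }
        String joined = String.join("", runs).replace(toFind, replacer);
        List<String> out = new ArrayList<>(Collections.nCopies(runs.size(), ""));
        out.set(0, joined);
        return out;
    }

    public static void main(String[] args) {
        // Assumed split of "Dear ${name}, hello" across three runs.
        List<String> runs = Arrays.asList("Dear ${na", "me", "}, hello");
        System.out.println(replacePerRun(runs, "${name}", "Ann"));
        System.out.println(replaceJoined(runs, "${name}", "Ann"));
    }
}
```

The per-run version returns the runs unchanged, while the joined version finds the placeholder. Note that collapsing everything into the first run would discard any formatting differences between the runs in a real document.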

I tried UnmarshallFromTemplate, but it rarely works either.

How can this problem be solved?

Answer

wal · Jun 21, 2015

You can use VariableReplace to achieve this, which may not have existed at the time of the other answers. It does not do a find/replace per se, but works on placeholders, e.g. ${myField}:

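import java.util.HashMap;
import org.docx4j.model.datastorage.migration.VariablePrepare;

HashMap<String, String> mappings = new HashMap<String, String>();
VariablePrepare.prepare(wordMLPackage); // see notes
mappings.put("myField", "foo"); // key is the bare name, without ${ }
wordMLPackage.getMainDocumentPart().variableReplace(mappings);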

Note that you do not pass ${myField} as the field name; pass the bare field name myField instead. This is rather inflexible: as it currently stands, your placeholders must be of the format ${xyz}, whereas if you could pass in anything, you could use it for any find/replace. The same capability is available to C# users in docx4j.NET.

See here for more info on VariableReplace or here for VariablePrepare
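
To make the key convention concrete, here is a minimal sketch of ${key} substitution on a plain string using only the Java standard library. It is not docx4j's implementation: PlaceholderSketch and substitute are hypothetical names, and leaving unmapped placeholders untouched is a choice made for this sketch, not a statement about how variableReplace behaves.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch class; works on plain strings, not on a docx package.
class PlaceholderSketch {

    // Matches ${key}; group 1 captures the bare key without the ${ } wrapper.
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{([^}]+)\\}");

    static String substitute(String text, Map<String, String> mappings) {
        Matcher m = PLACEHOLDER.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = mappings.get(m.group(1));
            // Assumption of this sketch: unmapped placeholders are kept as-is.
            String replacement = (value != null) ? value : m.group(0);
            // quoteReplacement stops '$' and '\' in the value being read as group references.
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> mappings = new HashMap<>();
        mappings.put("myField", "foo"); // bare key, as in the answer above
        System.out.println(substitute("Value: ${myField}, other: ${unknown}", mappings));
    }
}
```

The map is keyed by myField rather than ${myField} because the pattern strips the wrapper before the lookup, which is the same convention the mappings in the answer follow.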