Reading doc and docx files using C# without having MS Office installed on server

user1999722 · Jan 22, 2013 · Viewed 19.1k times

I'm working on a project (ASP.NET, C#, VB 2010, .NET 4) and I need to read both DOC and DOCX files that I've previously uploaded (the upload part is done). The tricky part is that I don't have MS Office installed on the server and I can't use it.

Is there any public library that I can include in my project without having to install anything? Both documents are very simple:

NUMBER TAB STRING  
NUMBER TAB STRING  
NUMBER TAB STRING  
...  

I need to extract the number and the string from each row (paragraph).

Can someone help with this? To repeat, I'm limited in that I can't install anything on the server.

Answer

Pavel Kudinov · Jan 22, 2013

You can now use NPOI, an open source .NET port of Apache POI, which also supports DOCX, XLS and XLSX. DocX is another open source library for creating Word documents.

For DOCX I'd suggest the Open XML SDK (the Open XML API). Microsoft developed it for creating and reading Office files by working with the XML parts inside them. Note, though, that the latest version, 2.5, was released in 2013, which is 5 years ago.
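
As a rough illustration of the DOCX route, here is a minimal sketch that reads "NUMBER TAB STRING" paragraphs with the Open XML SDK (the `DocumentFormat.OpenXml` assembly, referenced from the project rather than installed on the server). The class `DocxRowReader` and its methods are hypothetical names made up for this example, and it assumes NUMBER is a plain integer. As far as I know, `Paragraph.InnerText` does not include tabs, because Word stores a tab as an empty `<w:tab/>` element, so the sketch rebuilds the text by hand.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

// Hypothetical helper class for this example; not part of any library.
public static class DocxRowReader
{
    // Splits "NUMBER<TAB>STRING" into its two parts.
    // Assumption: NUMBER is an integer. Returns false for any other shape.
    public static bool TryParseRow(string line, out int number, out string text)
    {
        number = 0;
        text = null;
        if (line == null) return false;

        int tab = line.IndexOf('\t');
        if (tab < 0) return false;

        string numberPart = line.Substring(0, tab).Trim();
        if (!int.TryParse(numberPart, NumberStyles.Integer,
                          CultureInfo.InvariantCulture, out number))
        {
            return false;
        }

        text = line.Substring(tab + 1).Trim();
        return true;
    }

    // Rebuilds a paragraph's text, turning run-level <w:tab/> elements
    // (TabChar) back into '\t' characters.
    public static string GetParagraphText(Paragraph paragraph)
    {
        var sb = new StringBuilder();
        foreach (OpenXmlElement element in paragraph.Descendants())
        {
            if (element is Text) sb.Append(((Text)element).Text);
            else if (element is TabChar) sb.Append('\t');
        }
        return sb.ToString();
    }

    // Reads every top-level paragraph of the document body and keeps the
    // ones that parse as a row; other paragraphs are skipped.
    public static List<KeyValuePair<int, string>> ReadRows(string path)
    {
        var rows = new List<KeyValuePair<int, string>>();

        // Second argument false = open read-only.
        using (WordprocessingDocument doc = WordprocessingDocument.Open(path, false))
        {
            Body body = doc.MainDocumentPart.Document.Body;
            foreach (Paragraph paragraph in body.Elements<Paragraph>())
            {
                int number;
                string text;
                if (TryParseRow(GetParagraphText(paragraph), out number, out text))
                {
                    rows.Add(new KeyValuePair<int, string>(number, text));
                }
            }
        }
        return rows;
    }
}
```

Calling `DocxRowReader.ReadRows(path)` with the path of an uploaded file would return one number/string pair per matching paragraph. This covers DOCX only; the binary DOC format is a different container and needs a separate reader.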