3+ years ago we were asked to develop an EDI solution for a client as a matter of urgency.
They wanted full IP ownership and control of the solution, and didn't want to use free open-source solutions, pay large sums of money for the likes of BizTalk, or pay recurring fees to a VAN.
We did some research at the time but didn't find much information on EDI formats, parsing, etc., so our two-man development team jumped straight in and developed a solution in C#/ASP.NET. Because the volume of EDI message transactions was low (100 or so a day), we adopted a RegEx-based process for parsing, validation and inserting into the database. This was done by a separate C# app, scheduled to run every few minutes, that connects to the client's various providers over FTP, AS2 and EBMX, downloads inbound data and uploads any outbound EDI messages.
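For readers unfamiliar with the approach, here is a minimal sketch of template-driven RegEx parsing, written in Python for brevity (.NET's `System.Text.RegularExpressions.Regex` supports the same idea with `(?<name>...)` named groups). The segment templates, `TEMPLATES` and `parse_segment` are hypothetical, not the poster's actual code, and `*` is assumed as the element separator.

```python
import re

# Hypothetical templates: one compiled pattern per segment ID.
# Named groups double as field/column names; the patterns also validate.
TEMPLATES = {
    "BEG": re.compile(
        r"^BEG\*(?P<purpose>\d{2})\*(?P<po_type>[A-Z]{2})\*(?P<po_number>[A-Z0-9-]{1,22})"
        r"\*(?P<release>[A-Z0-9]*)\*(?P<po_date>\d{8})$"
    ),
    "PO1": re.compile(
        r"^PO1\*(?P<line>\d{1,6})\*(?P<qty>\d+(?:\.\d+)?)\*(?P<uom>[A-Z]{2})"
        r"\*(?P<price>\d+(?:\.\d+)?)$"
    ),
}


def parse_segment(segment: str) -> dict:
    """Validate one segment against its template and return its named fields."""
    seg_id = segment.split("*", 1)[0]
    template = TEMPLATES.get(seg_id)
    if template is None:
        raise ValueError(f"no template for segment {seg_id!r}")
    match = template.match(segment)
    if match is None:
        raise ValueError(f"segment failed validation: {segment!r}")
    return match.groupdict()


print(parse_segment("BEG*00*SA*PO-1001**20240115"))
# {'purpose': '00', 'po_type': 'SA', 'po_number': 'PO-1001', 'release': '', 'po_date': '20240115'}
```

Each pattern both validates a segment and names its fields, which is why adding a new provider's template can be quick; the trade-off is that every segment goes through a full regex match.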
We then developed a web front-end that gave the client's staff full access to the data, with various revenue reports and the ability to manage the data, and that also allowed some of the client's agents to log in, interact with the data and initiate invoice transactions.
The client now wants more EDI work done for another avenue of their business; however, this time the number of EDI message transactions would leap into the thousands. Our development team's concern is the use of RegEx: I read recently that using RegEx for EDI parsing has huge overheads and should be avoided.
The only reason we adopted it in the first place was inexperience; we didn't know what the best approach was. That said, RegEx has made managing EDI message templates a breeze, including validation within the templates. The client has added several more providers to their books, and we were able to add the new message templates (with custom alterations) in minutes.
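A common alternative that avoids RegEx for tokenizing is to read the delimiters from the envelope and split on them. This sketch assumes X12 (the question doesn't say X12 or EDIFACT), where the ISA header is fixed-width at 106 characters: the element separator is at index 3, the component separator at index 104 and the segment terminator at index 105. `read_delimiters` and `tokenize` are illustrative names; the sketch doesn't handle EDIFACT release characters, and per-field regex validation could still run on the split elements afterwards.

```python
from dataclasses import dataclass


@dataclass
class Delimiters:
    element: str
    component: str
    segment: str


def read_delimiters(interchange: str) -> Delimiters:
    """Read X12 delimiters from the fixed-width, 106-character ISA header."""
    if not interchange.startswith("ISA") or len(interchange) < 106:
        raise ValueError("interchange does not start with a complete ISA segment")
    return Delimiters(
        element=interchange[3],      # character right after "ISA"
        component=interchange[104],  # ISA16
        segment=interchange[105],    # terminator that ends the ISA segment
    )


def tokenize(interchange: str) -> list[list[str]]:
    """Split an interchange into segments, then elements, using plain str.split."""
    d = read_delimiters(interchange)
    segments = []
    for raw in interchange.split(d.segment):
        raw = raw.strip("\r\n")  # tolerate line breaks after segment terminators
        if raw:
            segments.append(raw.split(d.element))
    return segments
```

Splitting on known single-character delimiters is a linear scan over the file, which is one reason this approach tends to be recommended for larger volumes.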
After much more research recently, we found that most solutions parse EDI files into XML. Is there a reason for this? Is it just to adopt a more common format and/or avoid database access? Is it quicker to parse XML than flat-file EDI messages?
We want the data elements from the EDI file to end up in the database. Would we then just parse the XML file instead? Isn't that just another processing step that could be avoided?
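To make the XML question concrete: converting tokenized segments into XML is a mechanical step, and the result can then be stored, transformed or loaded into any database without the parser knowing which. The flat layout below (one element per segment, children named by position such as `BEG03`) is an assumption for illustration; a real converter would usually also nest X12 loops, which this sketch does not.

```python
import xml.etree.ElementTree as ET


def segments_to_xml(segments: list[list[str]]) -> ET.Element:
    """Convert tokenized segments into a flat XML tree.

    Each segment becomes an element named by its segment ID; each non-empty
    data element becomes a child named by segment ID plus two-digit position.
    """
    root = ET.Element("Interchange")
    for fields in segments:
        seg_id, elements = fields[0], fields[1:]
        seg_el = ET.SubElement(root, seg_id)
        for position, value in enumerate(elements, start=1):
            if value:  # skip empty optional elements
                child = ET.SubElement(seg_el, f"{seg_id}{position:02d}")
                child.text = value
    return root


root = segments_to_xml([["BEG", "00", "SA", "PO-1001", "", "20240115"]])
print(ET.tostring(root, encoding="unicode"))
# <Interchange><BEG><BEG01>00</BEG01><BEG02>SA</BEG02><BEG03>PO-1001</BEG03><BEG05>20240115</BEG05></BEG></Interchange>
```

It is indeed an extra step; what it buys is a target-neutral intermediate that can be annotated, transformed or loaded elsewhere.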
I apologise for the generic nature of my question but I am having a hard time locating the answers.
Many thanks for your time.
NOTE: Our development team only uses Microsoft products, so please take this into account when giving feedback.
About 3 years ago I also created an X12 parser that parses X12 EDI into XML. It is currently available as open source at http://x12parser.codeplex.com. The reason I did it this way was that I wanted the parsing part not to care about the target, whether it was a database or perhaps flat files. That turned out to be valuable, since some users used Oracle instead of SQL Server, and a lot of users flattened the output into flat files to load into their database or send to some downstream process. I think this has made the parser itself very flexible for many environments.
The other reason I liked XML is that I was able to add annotations that are valuable for anyone who doesn't have all the EDI codes memorized (basically everyone), and I was able to transform it to HTML with those annotations (see the site for an example). I also built in the ability to unbundle your objects into individual messages so that your post-processing can consume them one object at a time. A lot of users have helped me optimize it to handle huge files, so it has become pretty stable. I'm doing some maintenance on it now so that it will support all 4010 transactions.
The part about parsing into the database I leave up to the user, because everyone seems to be very particular about how they design data tables. (For example, I couldn't agree with a co-worker on whether to use ints or GUIDs for table identities: those who lean toward a DBA mentality prefer ints, those who use a lot of ORMs prefer GUIDs.)
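The unbundling idea can be sketched, for X12, as grouping the segments between each `ST` and its matching `SE`. `unbundle` is a hypothetical helper, not the x12parser API. It checks that SE01 equals the number of segments in the set (counting ST and SE) and that SE02 repeats ST02, and it skips the ISA/GS/GE/IEA envelope segments.

```python
def unbundle(segments: list[list[str]]) -> list[list[list[str]]]:
    """Group tokenized X12 segments into transaction sets (ST ... SE).

    Envelope segments outside a transaction set (ISA, GS, GE, IEA) are skipped.
    """
    transactions = []
    current = None  # segments of the transaction set currently being collected
    for fields in segments:
        seg_id = fields[0]
        if seg_id == "ST":
            if current is not None:
                raise ValueError("ST found before the previous set was closed by SE")
            current = [fields]
        elif current is not None:
            current.append(fields)
            if seg_id == "SE":
                # SE01 = number of segments in the set, including ST and SE
                if int(fields[1]) != len(current):
                    raise ValueError(f"SE01 says {fields[1]} segments, found {len(current)}")
                # SE02 must repeat the control number from ST02
                if fields[2] != current[0][2]:
                    raise ValueError("SE02 does not match ST02")
                transactions.append(current)
                current = None
    if current is not None:
        raise ValueError("transaction set not closed by SE")
    return transactions
```

Handing downstream code one transaction set at a time keeps memory use bounded by the largest message rather than the whole file.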
Shortly after I posted this, I added database support, so you can skip the XML and have it go directly to a SQL Server database. You can decide how many segment types will be parsed out into individual tables, so that you don't bloat your database with 300 tables of which you will probably only use 10 or 20. There is a discussion, "SQL Server as Staging Environment", about the pros and cons of using XML or SQL Server as the intermediary to your final system.
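A rough sketch of the "only some segment types get their own table" idea: whitelisted segment IDs go to dedicated tables with positional columns, and everything else lands in one generic table with a row per element. Python's built-in `sqlite3` stands in for SQL Server so the example runs anywhere; the table layout, `DEDICATED` and `MAX_ELEMENTS` are assumptions, not the schema x12parser generates.

```python
import sqlite3

# Hypothetical configuration: only these segment IDs get a dedicated table.
# Everything else goes into one generic table, one row per data element.
DEDICATED = {"BEG", "PO1"}
MAX_ELEMENTS = 10  # hypothetical number of positional columns per dedicated table


def create_schema(conn: sqlite3.Connection) -> None:
    conn.execute(
        "CREATE TABLE IF NOT EXISTS seg_generic "
        "(txn INTEGER, seq INTEGER, seg_id TEXT, pos INTEGER, value TEXT)"
    )
    cols = ", ".join(f"e{i:02d} TEXT" for i in range(1, MAX_ELEMENTS + 1))
    for seg_id in DEDICATED:
        # seg_id comes from the fixed whitelist above, never from the input file,
        # so formatting it into the SQL text is safe here.
        conn.execute(f"CREATE TABLE IF NOT EXISTS seg_{seg_id} (txn INTEGER, seq INTEGER, {cols})")


def load(conn: sqlite3.Connection, transactions: list[list[list[str]]]) -> None:
    create_schema(conn)
    placeholders = ", ".join("?" * (MAX_ELEMENTS + 2))
    for txn_no, txn in enumerate(transactions, start=1):
        for seq, fields in enumerate(txn, start=1):
            seg_id, elements = fields[0], fields[1:]
            if seg_id in DEDICATED:
                if len(elements) > MAX_ELEMENTS:
                    raise ValueError(f"{seg_id} has more than {MAX_ELEMENTS} elements")
                padded = elements + [None] * (MAX_ELEMENTS - len(elements))
                conn.execute(f"INSERT INTO seg_{seg_id} VALUES ({placeholders})", [txn_no, seq, *padded])
            else:
                conn.executemany(
                    "INSERT INTO seg_generic VALUES (?, ?, ?, ?, ?)",
                    [(txn_no, seq, seg_id, pos, value) for pos, value in enumerate(elements, start=1)],
                )
    conn.commit()
```

Keeping rarely used segments in a single generic table avoids creating hundreds of tables while still preserving every element.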