How to Convert an EDIFACT Document to XML
In this article, I am going to show an approach to parsing an EDIFACT document into XML at no cost. Many expensive tools on the market can convert EDIFACT and other EDI formats. I have evaluated software such as Stylus Studio and ETASoft; they meet my requirements, but they are not free at all!
I use Smooks, a Java framework/engine for processing XML and non-XML data (CSV, EDI, Java, JSON etc.), together with Talend Open Studio.
I wanted something free that could parse a custom EDI format. My objective was to parse an IFLIRR EDIFACT file from Amadeus, which is part of the Altea Inventory System, an extended version of Flight Reservation (FLIRES).
What Components do you need?
So let’s start with the tools you will need:
1. Smooks is an extensible open source Java framework for building applications that process XML and non-XML data such as CSV and EDI. You can do a lot with Smooks, but for the purpose of this tutorial we will stick to EDI-to-XML conversion.
2. To develop with Smooks, you need a good Java development environment. Many open source Java IDEs are available, but here we will use Talend Open Studio (TOS), an open source ETL tool for business integration that supports moving and transforming data across information systems. TOS is positioned as an alternative to commercial ETL tools such as Informatica PowerCenter and Microsoft SSIS.
3. To integrate Smooks with Talend Open Studio, SOPERA has introduced the Sopera Data Integration Smooks components, which are also free.
Download Talend Open Studio, tSmooks and tSmooksInput. That’s all we need to convert EDIFACT to XML.
To install TOS, double-click the executable. Once installation is complete, you will need to register by entering your email address, country, etc.
1. In the Proxy Parameters area, select the checkbox if you connect through a proxy and fill in the proxy fields. Click Validate to access Talend Open Studio.
2. You will need to create a local repository to store the mappings and jobs you develop in TOS. The first time you open TOS, you need to set up a new project or import an existing one. Create a new project and exit TOS; we are now going to configure TOS for the tSmooks components.
3. Extract the tSmooks and tSmooksInput packages you downloaded earlier and follow the instructions in their readme files to configure TOS.
4. Open TOS and create a new project or open the one you created before. Choose Java as the generation language.
5. To start with, create a new job.
6. To check that TOS has loaded the tSmooks components, open the Palette window and locate tSmooks and tSmooksInput. If you can see both, the libraries have been loaded successfully in Talend.
Now you have all the tools required to convert EDIFACT into XML. With TOS, you can even parse the resulting XML and save the data to a database.
The trickiest part is configuring Smooks. For this, you need to write a mapping configuration file, known as a Smooks mapping.
The mapping configuration maps your EDIFACT segments, fields and components to XML elements. Writing one is straightforward once you understand the various elements of an EDIFACT document.
EDIFACT has a hierarchical structure. The top level is the interchange, which contains one or more messages. Each message is made up of segments, each segment consists of data elements (fields), and a composite data element is in turn made up of components.
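To make the hierarchy concrete, here is a minimal Java sketch. It is not how Smooks works internally; it simply splits a message into segments, data elements and components, assuming the default delimiters (' ends a segment, + separates data elements, : separates components, ? is the release character). The class EdifactTokenizer and its methods are hypothetical names, and the short sample message in main is made up from segments quoted in this article.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: splits EDIFACT text into segment -> data element -> component lists. */
final class EdifactTokenizer {
    // Assumed default delimiters; a UNA segment may announce different ones.
    static final char SEGMENT = '\'';
    static final char ELEMENT = '+';
    static final char COMPONENT = ':';
    static final char RELEASE = '?';

    /** Splits on an unescaped separator; release sequences are kept for the lower levels. */
    static List<String> splitRaw(String text, char separator) {
        List<String> parts = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == RELEASE && i + 1 < text.length()) {
                current.append(c).append(text.charAt(++i)); // keep "?x" intact for now
            } else if (c == separator) {
                parts.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        parts.add(current.toString());
        return parts;
    }

    /** Removes release characters, e.g. "5?+3" becomes "5+3". */
    static String unescape(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == RELEASE && i + 1 < s.length()) {
                out.append(s.charAt(++i));
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    /** Returns the segments, each a list of data elements, each a list of components. */
    static List<List<List<String>>> tokenize(String message) {
        List<List<List<String>>> segments = new ArrayList<>();
        for (String rawSegment : splitRaw(message, SEGMENT)) {
            // Drop line breaks added after a terminator for readability.
            String segment = rawSegment.replaceFirst("^[\\r\\n]+", "");
            if (segment.isEmpty()) {
                continue; // e.g. the empty text after the final terminator
            }
            List<List<String>> elements = new ArrayList<>();
            for (String rawElement : splitRaw(segment, ELEMENT)) {
                List<String> components = new ArrayList<>();
                for (String rawComponent : splitRaw(rawElement, COMPONENT)) {
                    components.add(unescape(rawComponent));
                }
                elements.add(components);
            }
            segments.add(elements);
        }
        return segments;
    }

    public static void main(String[] args) {
        String msg = "UNH+1+PAORES:93:1:IA'IFT+3+NO MORE FLIGHTS'UNT+3+1'";
        for (List<List<String>> segment : tokenize(msg)) {
            System.out.println(segment);
        }
    }
}
```

Running main prints one line per segment, the first being [[UNH], [1], [PAORES, 93, 1, IA]]: the segment tag, the message reference and the composite message identifier split into its components.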
An example of an EDIFACT message answering a product availability request is shown below:
IFT+3+NO MORE FLIGHTS'
The UNA segment (service string advice) defines the segment terminator, data element separator, component data element separator and release character as follows:
' is a segment terminator
+ is a data element separator
: is a component data element separator
? is a release character
Note: The line breaks after each segment in this example have been added for readability. There are typically no line breaks in EDI data.
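The UNA segment has a fixed layout: the letters UNA followed by exactly six characters, namely the component separator, data element separator, decimal mark, release character, a reserved character and the segment terminator. When UNA is absent, the defaults from UNA:+.? ' apply. Here is a minimal Java sketch for reading these characters; UnaDelimiters is a hypothetical class, not part of Smooks.

```java
/** Hypothetical holder for the service characters announced by a UNA segment. */
final class UnaDelimiters {
    final char component; // component data element separator, default ':'
    final char element;   // data element separator, default '+'
    final char decimal;   // decimal notation, default '.'
    final char release;   // release character, default '?'
    final char segment;   // segment terminator, default '\''

    UnaDelimiters(char component, char element, char decimal, char release, char segment) {
        this.component = component;
        this.element = element;
        this.decimal = decimal;
        this.release = release;
        this.segment = segment;
    }

    /** Defaults used when an interchange has no UNA segment. */
    static final UnaDelimiters DEFAULT = new UnaDelimiters(':', '+', '.', '?', '\'');

    /** Reads the six service characters that follow "UNA" at the start of an interchange. */
    static UnaDelimiters parse(String interchange) {
        if (!interchange.startsWith("UNA")) {
            return DEFAULT;
        }
        if (interchange.length() < 9) {
            throw new IllegalArgumentException("UNA must be followed by six service characters");
        }
        String s = interchange.substring(3, 9);
        // s.charAt(4) is the reserved character and is ignored here.
        return new UnaDelimiters(s.charAt(0), s.charAt(1), s.charAt(2), s.charAt(3), s.charAt(5));
    }

    public static void main(String[] args) {
        UnaDelimiters d = parse("UNA:+.? 'UNB+UNOA:2'");
        System.out.println("segment=" + d.segment + " element=" + d.element
                + " component=" + d.component + " release=" + d.release);
    }
}
```

Because the positions are fixed, no splitting is needed: the sketch simply reads characters 4 to 9 of the interchange.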
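UNH+1+PAORES:93:1:IA' - This is the message header segment, which is required at the start of every message. 1 is the message reference number, and PAORES:93:1:IA identifies the message type (PAORES), its version (93), release (1) and controlling agency (IA).
UNT+13+1' - This is the message trailer segment. It indicates that the message contains 13 segments, counting UNH and UNT themselves, and repeats the message reference number 1 from UNH.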
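Because UNT repeats the message reference from UNH and counts every segment of the message including UNH and UNT, a receiver can sanity-check a message before mapping it. A minimal Java sketch, assuming the default delimiters, one message per string and no escaped ' inside the data (the naive split ignores release characters); EnvelopeCheck is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical check that a message's UNT trailer matches its UNH header. */
final class EnvelopeCheck {

    /** Splits on the default segment terminator and drops readability line breaks. */
    static List<String> segments(String message) {
        List<String> result = new ArrayList<>();
        for (String raw : message.split("'")) {
            String seg = raw.replaceFirst("^[\\r\\n]+", "");
            if (!seg.isEmpty()) {
                result.add(seg);
            }
        }
        return result;
    }

    /** True when UNT's segment count and message reference agree with the message. */
    static boolean isConsistent(String message) {
        List<String> segs = segments(message);
        if (segs.isEmpty()) {
            return false;
        }
        String[] unh = segs.get(0).split("\\+");
        String[] unt = segs.get(segs.size() - 1).split("\\+");
        if (!unh[0].equals("UNH") || !unt[0].equals("UNT") || unh.length < 2 || unt.length < 3) {
            return false;
        }
        int declaredCount = Integer.parseInt(unt[1]);   // UNT data element 1: segment count
        boolean countOk = declaredCount == segs.size(); // the count includes UNH and UNT
        boolean refOk = unt[2].equals(unh[1]);          // UNT element 2 repeats UNH element 1
        return countOk && refOk;
    }

    public static void main(String[] args) {
        String msg = "UNH+1+PAORES:93:1:IA'\nIFT+3+NO MORE FLIGHTS'\nUNT+3+1'";
        System.out.println(isConsistent(msg)); // true
    }
}
```

The shortened sample message in main is made up for illustration; it has three segments, so its trailer reads UNT+3+1.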
Smooks has a set of XML elements that map to the segments, segment groups, fields and components of an EDIFACT message; the ones used in this article are medi:segment, medi:field and medi:component.
Building the mapping document
1. Construct the XML header and the description element. You can keep them as shown here:
<?xml version="1.0" encoding="UTF-8"?>
<medi:description name="MYSMOOKSXML" version="1.0" />
2. Tell Smooks which delimiters to use. Here the segment terminator is ', the field separator is + and the component separator is :. Some EDIFACT files also use sub-components, so a sub-component delimiter can be declared as well:
<medi:delimiters segment="'" field="+" component=":" sub-component="*" />
3. Once you have defined your delimiters, specify your segments, fields and components. Here is the UNB segment; the other segments follow the same pattern:
<medi:segment xmltag="UNB" segcode="UNB" truncatable="true" minOccurs="0">
    <medi:field xmltag="identifier" truncatable="true">
        <medi:component xmltag="syntaxidentifier" />
        <medi:component xmltag="version" />
        <medi:component xmltag="serviceCode" />
        <medi:component xmltag="characterEncoding" />
    </medi:field>
</medi:segment>
Let us look at an example.
Suppose you want to convert the EDI data below to XML:
HDR*1*0*59.97*64.92*4.95*Wed Nov 15 13:45:28 EST 2006
ORD*1*1*364*The 40-Year-Old Virgin*29.98
Basically, you want something that looks like:
<Order>
    ...
    <date>Wed Nov 15 13:45:28 EST 2006</date>
    ...
    <lastname>Free Edifact Tool</lastname>
    ...
    <title>This is a free tutorial to convert Edi</title>
    ...
</Order>
The Smooks mapping you will build is as follows:
<?xml version="1.0" encoding="UTF-8"?>
<!-- Schema version 1.2 is assumed; match it to your Smooks release. -->
<medi:edimap xmlns:medi="http://www.milyn.org/schema/edi-message-mapping-1.2.xsd">
    <medi:description name="DVD Order" version="1.0" />
    <!-- &#10; is a line feed: each segment sits on its own line. -->
    <medi:delimiters segment="&#10;" field="*" component="^" sub-component="~" />
    <medi:segments xmltag="Order">
        <medi:segment segcode="HDR" xmltag="header">
            <medi:field xmltag="order-id" />
            <medi:field xmltag="status-code" />
            <medi:field xmltag="net-amount" />
            <medi:field xmltag="total-amount" />
            <medi:field xmltag="tax" />
            <medi:field xmltag="date" />
        </medi:segment>
        <medi:segment segcode="CUS" xmltag="customer-details">
            <medi:field xmltag="username" />
            <medi:field xmltag="name">
                <medi:component xmltag="firstname" />
                <medi:component xmltag="lastname" />
            </medi:field>
            <medi:field xmltag="state" />
        </medi:segment>
        <medi:segment segcode="ORD" xmltag="order-item" maxOccurs="-1">
            <medi:field xmltag="position" />
            <medi:field xmltag="quantity" />
            <medi:field xmltag="product-id" />
            <medi:field xmltag="title" />
            <medi:field xmltag="price" />
        </medi:segment>
    </medi:segments>
</medi:edimap>
The following illustration attempts to visually describe the mapping that takes place:
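xmltag is the name of the XML element that a segment, field or component is mapped to in your output XML file. truncatable indicates that trailing fields or components may be left out of the segment or field in the input. minOccurs and maxOccurs specify whether a segment is mandatory and how many times it may occur: minOccurs="0" makes the UNB segment optional, and maxOccurs="-1" lets the ORD segment repeat any number of times.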
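To see what the mapping expresses declaratively, here is a hand-rolled Java sketch that applies the same xmltag names to the two sample lines using the JDK's StAX XMLStreamWriter. It is only an illustration, not Smooks: the class name DvdOrderToXml is hypothetical, it covers just the HDR and ORD segments present in the sample input, and its occurrence rule (one HDR, any number of ORD) is the sketch's own simplification of minOccurs/maxOccurs.

```java
import java.io.StringWriter;
import java.util.List;
import java.util.Map;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

/** Hypothetical illustration of the "DVD Order" mapping; not how Smooks is implemented. */
final class DvdOrderToXml {

    // segcode -> [segment xmltag, field xmltags in position order], copied from the mapping.
    static final Map<String, List<String>> MAPPING = Map.of(
            "HDR", List.of("header", "order-id", "status-code", "net-amount",
                    "total-amount", "tax", "date"),
            "ORD", List.of("order-item", "position", "quantity", "product-id",
                    "title", "price"));

    static String convert(String edi) {
        try {
            StringWriter out = new StringWriter();
            XMLStreamWriter xml = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
            xml.writeStartElement("Order"); // root, like xmltag="Order" on medi:segments
            int headerCount = 0;
            for (String segment : edi.split("\\r?\\n")) {   // segment delimiter: line break
                if (segment.isBlank()) {
                    continue;
                }
                String[] fields = segment.split("\\*", -1); // field delimiter: '*'
                List<String> tags = MAPPING.get(fields[0]);
                if (tags == null) {
                    throw new IllegalArgumentException("No mapping for segment " + fields[0]);
                }
                // Sketch's own rule: at most one HDR; ORD is unbounded (like maxOccurs="-1").
                if (fields[0].equals("HDR") && ++headerCount > 1) {
                    throw new IllegalArgumentException("HDR may occur only once");
                }
                xml.writeStartElement(tags.get(0));
                // Field i becomes the element named by the i-th field xmltag; extra fields are ignored.
                for (int i = 1; i < fields.length && i < tags.size(); i++) {
                    xml.writeStartElement(tags.get(i));
                    xml.writeCharacters(fields[i]); // escapes <, & etc. as needed
                    xml.writeEndElement();
                }
                xml.writeEndElement();
            }
            xml.writeEndElement();
            xml.flush();
            xml.close();
            return out.toString();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not write XML", e);
        }
    }

    public static void main(String[] args) {
        String edi = "HDR*1*0*59.97*64.92*4.95*Wed Nov 15 13:45:28 EST 2006\n"
                + "ORD*1*1*364*The 40-Year-Old Virgin*29.98\n";
        System.out.println(convert(edi));
    }
}
```

Running main prints a single line of XML in which header and order-item hold one child element per field, named after the corresponding xmltag. With Smooks, this positional logic lives in the mapping file instead of in code.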
Once your Smooks mapping is ready, you are almost done. You now have to tell Talend to run Smooks on your input EDIFACT file using your Smooks configuration file.
Open your job in Talend, drag a tSmooks component onto it and configure it as shown below.
Create another XML file, name it smooks-config.xml and put the following in it:
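<?xml version="1.0"?>
<!-- Namespace versions assume Smooks 1.1; adjust them to your Smooks release. -->
<smooks-resource-list xmlns="http://www.milyn.org/xsd/smooks-1.1.xsd"
                      xmlns:edi="http://www.milyn.org/xsd/smooks/edi-1.1.xsd">
    <!-- Configure the EDI Reader to process the message stream into a stream of SAX events. -->
    <edi:reader mappingModel="!!!smooks_mapping!!!" />
</smooks-resource-list>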
This tells Smooks which mapping model to use; !!!smooks_mapping!!! is a placeholder for the mapping file you specify in the component properties.
In the tSmooks component properties, enter the following:
|Input File|Path of your EDI file|
|Configuration File|Path of the smooks-config.xml file you created above|
|Output File|Path of the XML output file|
|File Name|Path of your Smooks mapping file|
|Will Use in Config|Checked|