How to convert an EDIFACT Document to XML.

In this article, I show a free way to convert an EDIFACT document to XML. Many commercial tools can convert EDIFACT and other EDI formats. I evaluated software such as Stylus Studio and ETASoft; they meet my requirements, but none of them is free.

I use Smooks, a Java framework/engine for processing XML and non-XML data (CSV, EDI, Java, JSON etc.), together with Talend Open Studio.

 


 

I wanted something free that can parse a custom EDI format. My objective was to parse an IFLIRR EDIFACT file from Amadeus. IFLIRR is part of the Altea Inventory System and is an extended version of Flight Reservation, FLIRES.

 

What Components do you need?

So let’s start with the tools you will need

 

1.  Smooks

Smooks is an extensible open source Java framework for building applications that process XML and non-XML data such as CSV and EDI. You can do a lot with Smooks, but for the purpose of this tutorial we will stick to EDI to XML conversion.

 

2. To develop with Smooks, you need a good Java IDE. Many open source Java IDEs are available, but here we will use Talend Open Studio. Talend Open Studio is an open source ETL tool for business integration. It supports moving and transforming data across information systems. Talend Open Studio, TOS for short, is comparable in power to Informatica PowerCenter or Microsoft SSIS.

 

3. To integrate Smooks with Talend Open Studio, SOPERA has introduced the Sopera Data Integration Smooks Components, which are also free.

Download Talend Open Studio, tSmooks and tSmooksInput. That is all we need to convert EDIFACT to XML.

 

Installation

To install TOS, double-click the executable. Once installation is complete, you will need to register by entering your email address, country etc.

 

1. In the Proxy Parameters area, select the checkbox if relevant and fill in the proxy fields. Click Validate to access Talend Open Studio.

 

2. You will need to create a local repository to store the mappings and jobs you develop in TOS. The first time you open TOS, you need to set up a new project or import an existing one. Create a new project and exit TOS, because we are going to configure TOS for the tSmooks components.

 

3. Extract the tSmooks and tSmooksInput archives you downloaded earlier and follow the instructions in their readme files to configure TOS.

 

4. Open TOS and create a new project or open the one you created before. Choose Java as the generation language.

 

5. To start with, create a new Job.

 

6. To check that TOS has loaded the tSmooks components, open the Palette window and locate tSmooks and tSmooksInput. If you can see both, the libraries have been loaded successfully in Talend.

 

Configuration

Now you have all the tools required to convert EDIFACT into XML. With TOS, you can even parse the resulting XML and save the data to a database.

 

The trickiest part is configuring Smooks. For this, you need to write a mapping configuration file, known as a Smooks mapping.

 

The mapping configuration maps your EDIFACT segments, fields and components to XML elements. Writing the configuration file is easy once you understand the various elements of an EDIFACT document.

 

EDIFACT has a hierarchical structure. The top level is the interchange, which contains one or more messages. Each message consists of segments, each segment consists of data elements (fields), and a composite data element consists of components.

An example of an EDIFACT message answering a product availability request is shown below:

UNA:+.? '

 

UNB+IATB:1+6XPPC+LHPPC+940101:0950+1'

UNH+1+PAORES:93:1:IA'

MSG+1:45'

IFT+3+XYZCOMPANY AVAILABILITY'

ERC+A7V:1:AMD'

IFT+3+NO MORE FLIGHTS'

ODI'

TVL+240493:1000::1220+FRA+JFK+DL+400+C'

PDI++C:3+Y::3+F::1'

APD+74C:0:::6++++++6X'

TVL+240493:1740::2030+JFK+MIA+DL+081+C'

PDI++C:4'

APD+EM2:0:1630::6+++++++DA'

UNT+13+1'

UNZ+1+1'

 

The UNA-segment describes the segment terminator, data element separator, component data element separator and release character as follows:

  • ' is a segment terminator

  • + is a data element separator

  • : is a component data element separator

  • ? is a release character

Note: The line breaks after each segment in this example have been added for readability. There are typically no line breaks in EDI data.
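The delimiters do not have to be hard-coded, since the six characters after the UNA tag always appear in a fixed order: component separator, data element separator, decimal mark, release character, a reserved character and the segment terminator. The following is a minimal Python sketch of reading them; it is only an illustration and not part of Smooks or Talend, and the function name `delimiters_from_una` is made up for this example.

```python
def delimiters_from_una(una_segment):
    """Read the six service characters that follow the 'UNA' tag."""
    if not una_segment.startswith("UNA") or len(una_segment) < 9:
        raise ValueError("expected 'UNA' followed by six service characters")
    # Fixed order defined by the EDIFACT syntax rules.
    component, field, decimal, release, _reserved, segment = una_segment[3:9]
    return {
        "segment": segment,      # segment terminator
        "field": field,          # data element separator
        "component": component,  # component data element separator
        "release": release,      # release (escape) character
        "decimal": decimal,      # decimal mark
    }


delims = delimiters_from_una("UNA:+.? '")
print(delims["segment"], delims["field"], delims["component"], delims["release"])
```

The `segment`, `field` and `component` values are the ones you later copy into the delimiters element of the Smooks mapping.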

UNH+1+PAORES:93:1:IA' - This is the header segment, which is required at the start of every message.

UNT+13+1' - This is the trailer segment. It indicates that the message contains 13 segments, counted from UNH to UNT inclusive.
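The segment count declared in UNT is easy to verify by hand or with a few lines of code. The sketch below is a rough Python illustration, not something Smooks requires: it splits the sample message on the segment terminator and compares the number of segments from UNH to UNT with the declared value. It assumes the terminator never appears escaped inside the data, which holds for this sample; `check_segment_count` is a name invented for the example.

```python
MESSAGE = (
    "UNA:+.? '"
    "UNB+IATB:1+6XPPC+LHPPC+940101:0950+1'"
    "UNH+1+PAORES:93:1:IA'"
    "MSG+1:45'"
    "IFT+3+XYZCOMPANY AVAILABILITY'"
    "ERC+A7V:1:AMD'"
    "IFT+3+NO MORE FLIGHTS'"
    "ODI'"
    "TVL+240493:1000::1220+FRA+JFK+DL+400+C'"
    "PDI++C:3+Y::3+F::1'"
    "APD+74C:0:::6++++++6X'"
    "TVL+240493:1740::2030+JFK+MIA+DL+081+C'"
    "PDI++C:4'"
    "APD+EM2:0:1630::6+++++++DA'"
    "UNT+13+1'"
    "UNZ+1+1'"
)


def check_segment_count(message, terminator="'", field_sep="+"):
    """Return (counted, declared) segment counts for the first message."""
    segments = [s for s in message.split(terminator) if s.strip()]
    tags = [s[:3] for s in segments]
    start = tags.index("UNH")
    end = tags.index("UNT")
    counted = end - start + 1                      # UNH and UNT both count
    declared = int(segments[end].split(field_sep)[1])
    return counted, declared


counted, declared = check_segment_count(MESSAGE)
print(counted, declared)  # 13 13
```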

Smooks has a set of XML elements that map to the segments, segment groups, fields and components of EDIFACT. They are:

medi:segments

medi:segment

medi:segmentGroup

medi:field

and medi:component.
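To make the segment, field and component terminology concrete, here is a small Python sketch that breaks one segment from the sample message into fields and components, honouring the release character. It only illustrates the structure that the mapping elements describe; it is not how Smooks is implemented, and `split_escaped`, `unescape` and `parse_segment` are hypothetical helper names.

```python
def split_escaped(text, delimiter, release="?"):
    """Split on delimiter, ignoring delimiters escaped by the release character."""
    parts, current, i = [], [], 0
    while i < len(text):
        ch = text[i]
        if ch == release and i + 1 < len(text):
            # Keep the escape pair intact so deeper splits still see it.
            current.append(text[i:i + 2])
            i += 2
        elif ch == delimiter:
            parts.append("".join(current))
            current = []
            i += 1
        else:
            current.append(ch)
            i += 1
    parts.append("".join(current))
    return parts


def unescape(text, release="?"):
    """Drop release characters, keeping the character each one protects."""
    out, i = [], 0
    while i < len(text):
        if text[i] == release and i + 1 < len(text):
            out.append(text[i + 1])
            i += 2
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


def parse_segment(segment, field_sep="+", component_sep=":"):
    """Return a list of fields, each field being a list of components."""
    fields = split_escaped(segment, field_sep)
    return [[unescape(c) for c in split_escaped(f, component_sep)] for f in fields]


tvl = parse_segment("TVL+240493:1000::1220+FRA+JFK+DL+400+C")
print(tvl[0])  # ['TVL']
print(tvl[1])  # ['240493', '1000', '', '1220']
print(tvl[2])  # ['FRA']
```

The first entry is the segment code, each following entry corresponds to one field, and a field with more than one entry corresponds to a field with components.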

Building the mapping document

1. Construct the XML header and the description. Keep them as they are here:

<?xml version="1.0" encoding="UTF-8"?>

<medi:edimap xmlns:medi="http://www.milyn.org/schema/edi-message-mapping-1.1.xsd">

<medi:description name="MYSMOOKSXML" version="1.0" />

2. Tell Smooks which delimiters to use. Here a segment is terminated by ', fields are separated by + and components by :. Occasionally you will see EDIFACT files with sub-components as well.

<medi:delimiters segment="'" field="+" component=":" sub-component="*" />

3. Once you have defined your delimiters, you have to specify the medi:segments element and, inside it, each segment with its fields and components. Once the first segment is done, the others follow the same pattern.

<medi:segments xmltag="MYSMOOKSXML">

<medi:segment xmltag="UNB" segcode="UNB" truncatable="true" minOccurs="0">

<medi:field xmltag="identifier" truncatable="true">

<medi:component xmltag="syntaxidentifier" />

<medi:component xmltag="version" />

<medi:component xmltag="serviceCode" />

<medi:component xmltag="characterEncoding" />

</medi:field>

Let us see an example.

Suppose you want to convert the EDI message below to XML:

HDR*1*0*59.97*64.92*4.95*Wed Nov 15 13:45:28 EST 2006
CUS*user1*Harry^Fletcher*SD
ORD*1*1*364*The 40-Year-Old Virgin*29.98
ORD*2*1*299*Pulp Fiction*29.99

Basically, you want output that looks like this:
<Order>
<header>
<order-id>1</order-id>
<status-code>0</status-code>
<net-amount>59.97</net-amount>
<total-amount>64.92</total-amount>
<tax>4.95</tax>
<date>Wed Nov 15 13:45:28 EST 2006</date>
</header>
<customer-details>
<username>user1</username>
<name>
<firstname>Harry</firstname>
<lastname>Fletcher</lastname>
</name>
<state>SD</state>
</customer-details>
<order-item>
<position>1</position>
<quantity>1</quantity>
<product-id>364</product-id>
<title>The 40-Year-Old Virgin</title>
<price>29.98</price>
</order-item>
<order-item>
<position>2</position>
<quantity>1</quantity>
<product-id>299</product-id>
<title>Pulp Fiction</title>
<price>29.99</price>
</order-item>
</Order>
 
The Smooks mapping you will build is as follows:
<?xml version="1.0" encoding="UTF-8"?>
<medi:edimap xmlns:medi="http://www.milyn.org/schema/edi-message-mapping-1.0.xsd">
 
<medi:description name="DVD Order" version="1.0" />
 
<medi:delimiters segment="&#10;" field="*" component="^" sub-component="~" />
 
<medi:segments xmltag="Order">
 
<medi:segment segcode="HDR" xmltag="header">
<medi:field xmltag="order-id" />
<medi:field xmltag="status-code" />
<medi:field xmltag="net-amount" />
<medi:field xmltag="total-amount" />
<medi:field xmltag="tax" />
<medi:field xmltag="date" />
</medi:segment>
 
<medi:segment segcode="CUS" xmltag="customer-details">
<medi:field xmltag="username" />
<medi:field xmltag="name">
<medi:component xmltag="firstname" />
<medi:component xmltag="lastname" />
</medi:field>
<medi:field xmltag="state" />
</medi:segment>
 
<medi:segment segcode="ORD" xmltag="order-item" maxOccurs="-1">
<medi:field xmltag="position" />
<medi:field xmltag="quantity" />
<medi:field xmltag="product-id" />
<medi:field xmltag="title" />
<medi:field xmltag="price" />
</medi:segment>
 
</medi:segments>
 
</medi:edimap>
The following illustration describes the mapping visually: [figure: Edi-mapping]

xmltag names the XML element that a segment, field or component is mapped to in the output XML file. truncatable indicates whether trailing fields or components may be missing from the input without causing a parsing error. minOccurs and maxOccurs indicate whether a segment is mandatory and how many times it may occur (maxOccurs="-1" means unbounded).
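To see what the mapping expresses, the same segment-to-element correspondence can be written out by hand. The Python sketch below is an illustration only and does not use Smooks: the `MAPPING` dictionary is my own shorthand for the mapping file above, and `edi_to_xml` is a hypothetical helper that walks the sample order line by line.

```python
import xml.etree.ElementTree as ET

SAMPLE = """HDR*1*0*59.97*64.92*4.95*Wed Nov 15 13:45:28 EST 2006
CUS*user1*Harry^Fletcher*SD
ORD*1*1*364*The 40-Year-Old Virgin*29.98
ORD*2*1*299*Pulp Fiction*29.99"""

# segcode -> (xmltag, fields); a tuple field carries its component tags.
MAPPING = {
    "HDR": ("header", ["order-id", "status-code", "net-amount",
                       "total-amount", "tax", "date"]),
    "CUS": ("customer-details", ["username",
                                 ("name", ["firstname", "lastname"]),
                                 "state"]),
    "ORD": ("order-item", ["position", "quantity", "product-id",
                           "title", "price"]),
}


def edi_to_xml(edi_text, root_tag="Order", field_sep="*", component_sep="^"):
    """Build an XML tree from newline-terminated segments using MAPPING."""
    root = ET.Element(root_tag)
    for line in edi_text.splitlines():
        if not line.strip():
            continue
        code, *values = line.split(field_sep)
        segment_tag, fields = MAPPING[code]
        segment_el = ET.SubElement(root, segment_tag)
        for field, value in zip(fields, values):
            if isinstance(field, tuple):
                field_tag, component_tags = field
                field_el = ET.SubElement(segment_el, field_tag)
                for tag, part in zip(component_tags, value.split(component_sep)):
                    ET.SubElement(field_el, tag).text = part
            else:
                ET.SubElement(segment_el, field).text = value
    return root


root = edi_to_xml(SAMPLE)
print(ET.tostring(root, encoding="unicode"))
```

Smooks does the same job declaratively, with validation of occurrences and truncation on top, so this kind of hand-written code is only useful for understanding the mapping.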

Once your Smooks XML mapping is ready, you are almost done. You have to tell Talend to run Smooks with your input EDIFACT file and your Smooks config file.

Open Talend, drag a tSmooks component onto your job and configure it with the properties listed further below.

Create another XML file, name it smooks-config.xml and put the following in it:

<?xml version="1.0"?>

<smooks-resource-list

xmlns="http://www.milyn.org/xsd/smooks-1.1.xsd"

xmlns:edi="http://www.milyn.org/xsd/smooks/edi-1.1.xsd">

<!--

Configure the EDI Reader to process the message stream into a stream of SAX events.

-->

<edi:reader mappingModel="!!!smooks_mapping!!!" />

</smooks-resource-list>

You are basically telling Smooks which mapping model to use. Here it is !!!smooks_mapping!!!, a parameter name that is associated with the actual mapping file in the component properties.

In the tSmooks properties, enter the following:

Input File: path of your EDI file
Configuration File: path of the smooks-config.xml file you created above
Output File: path of your output XML file
Parameter Name: !!!smooks_mapping!!!
File Name: path of your Smooks mapping file
Will Use in Config: checked
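My understanding is that the Parameter Name and File Name pair works like a simple text substitution on the config file before Smooks reads it. The Python sketch below shows that idea only; it is an assumption about the behaviour, not the component's actual code, and both the `resolve_parameters` helper and the mapping path are made up for the example.

```python
CONFIG = '<edi:reader mappingModel="!!!smooks_mapping!!!" />'


def resolve_parameters(config_text, parameters):
    """Replace each parameter name in the config text with its file name."""
    for name, file_name in parameters.items():
        config_text = config_text.replace(name, file_name)
    return config_text


# Hypothetical location of the mapping file.
resolved = resolve_parameters(CONFIG, {"!!!smooks_mapping!!!": "/path/to/mapping.xml"})
print(resolved)
```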

Run the job and you should get an XML representation of the EDIFACT file.

To conclude, with the Smooks tools you can parse practically any EDI document to XML. This is the simplest method I have found to parse EDI with Talend. You can combine it with other Talend components to build a more complex solution.

Download the Smooks components and get started now without any cost.

 

Contact me on Fiverr  https://www.fiverr.com/s2/6cd6c52d7a for assistance

Comments

  1. Hi Thanks for this nice article. Is it possible to convert EDI 834 file to XML using the Smooks components in TOS?


  2. Yes. You will need to build up the Smooks mapping file defining the components, segments, etc. in the EDI file. For that you will need to know the EDI specification. Once done, Smooks will take care of the rest.

  3. Thanks Vishal for the reply. Yes, I'm able to process 834 file this way with tSmooks component. The problem I'm currently facing is having multiple occurrences of one or more segment(s) in unordered fashion in the 834 file. The schema (http://www.milyn.org/schema/edi-message-mapping-1.2.xsd) is not allowing it. Is there any workaround for that?

    I would appreciate any insight. Thank you.

  4. Hey, thanks for the article. But, I was looking for some information on Informatica Read Json. Do you have something on that?
