Sunday, January 27, 2008

Processing Inter-dependent files using a Non-Uniform Sequential Convoy

The Problem

A recurring topic that I have recently come across on some of the BizTalk forums (here and here) is the ability to process interdependent files. More specifically, we want the ability to process a particular file only after receiving a "trigger" or "signal" file.

The use of trigger and signal files is very common in SAP and mainframe systems. A common pattern in the SAP world is to write "work" files to a "work" folder and a "signal" file to a "sig" folder. For SAP outbound files, the SAP system will write data files to the work folder. Once a file has been completely written, a signal file will be written to the sig folder. This sig file gives the middleware, or downstream system, an indication that the work file is safe for processing. This is important because some files that get written from these types of systems are very large or may be written to over a period of time via batch jobs.

BizTalk supports messaging patterns that provide the ability to wait for messages to arrive before completing a business process (or orchestration). These mechanisms tend to rely on correlation and convoys. Stephen W Thomas has written a whitepaper that dives into these topics further.

The messaging pattern that I have chosen to help solve this challenge is the Non-Uniform Sequential Convoy. This pattern's mandate is to process 2 or more messages, of different types, in a known order.

The challenge with this pattern, for our scenario, is that our work file is written prior to the sig file, but in terms of BizTalk processing messages we want BizTalk to only process the work file once the sig file has been written.

In order to solve this problem we are going to leverage a .Net Helper class to do some of the lifting.

Solution
I have written a proof of concept (POC) app to demonstrate the pattern. The solution is pretty light and easy to implement. It contains 2 message schemas (1 for the signal file, 1 for the work file), a property schema, an orchestration and a .Net Helper class. Note that I have left what you do with both files, once you get them into the same instance of the orchestration, out of scope. At that point your requirements will determine what you need to do with both files.

The first artifact that I will dive into is the Signal file. The file itself does not need to contain a lot of data. For my sample I have two elements, a timestamp and a WorkFileName. In order to use my convoy, I need to create a correlation type and a correlation set. A correlation type requires a promoted property (or BizTalk System property) in order to "link" multiple messages to one running instance of an orchestration. For more information on Correlation, please see the following document.



The second artifact represents data that could be generated by an upstream system such as SAP or a Mainframe. This document is entirely fictitious so don't look too much into it. Also note that I have a promoted property called FileName that is used in my Correlation Type.

So for this example I am using FileName but you can correlate based upon any data, as long as your signal file and work file promote the same data values.




Below is a snapshot of what our orchestration looks like.

A few things to note: Non-Uniform Sequential Convoys do require that both Receive shapes connect to the same logical receive port. This logical receive port also needs to be marked for Ordered Delivery.




For the initial receive shape we need to set a few properties. Since this is the first Receive shape we need to set the Activate property to True in order to instantiate the orchestration.

The other property that we need to populate is the Initializing Correlation Sets. In order for BizTalk to "wait" for the work file to be picked up by the same instance of the orchestration that consumed the signal file we need to initialize a correlation set. (Keep reading for more info on how to create the Correlation Set).


Prior to creating a Correlation Set, you need to create a Correlation Type. You can do this from the Orchestration View tab.

We want to create the Correlation Type based upon the element/attribute that we promoted in each of the two schemas.
We then need to create a Correlation Set, which is instantiated by the initial Receive Shape, that is based upon the Correlation Type that was just created. Correlation Sets are also configured in the Orchestration View tab.



In the Rename Work File Expression shape we are going to call a .Net Helper class that renames the work file in the source folder. This is a critical step in the process. Since the work file is completely written before the sig file is, we cannot use the original file extension in the Receive Location file mask. Otherwise, BizTalk would consume the work file before we want it to be consumed. By appending a temporary extension, like .BIZ, to the end of the work file name, we can be assured that BizTalk will pick the file up only when we want it to. So in the Receive Location we will use a *.BIZ file mask instead of *.XML.

Note that I have hard coded the path of the source location. Since this is a POC, this is OK, but it would not be a suitable solution for a production environment.
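As a rough illustration of how the hard-coded path could be avoided, here is a minimal sketch of a helper that takes the work folder as a parameter. This is not the helper used in the POC: the class name WorkFileReleaser, the method name ReleaseWorkFile and the idea of passing the folder in from configuration are all assumptions made for the example. Only File.Exists, File.Move and the ".BIZ" extension come from the approach described above.

```csharp
using System;
using System.IO;

// Hypothetical variation of the helper: the work folder is passed in
// (for example from a configuration setting) instead of being hard coded.
public static class WorkFileReleaser
{
    // Temporary extension matched by the work-file Receive Location mask (*.BIZ).
    public const string ReleaseExtension = ".BIZ";

    // Renames <workFolder>\<workFileName> to <workFileName>.BIZ.
    // Returns true if the file was found and renamed, false if it was not there.
    public static bool ReleaseWorkFile(string workFolder, string workFileName)
    {
        // Path.GetFileName drops any directory part, so the name taken from
        // the signal file cannot point outside the work folder.
        string source = Path.Combine(workFolder, Path.GetFileName(workFileName));

        if (!File.Exists(source))
        {
            return false;
        }

        File.Move(source, source + ReleaseExtension);
        return true;
    }
}
```

The Expression shape would then pass in the folder (however you choose to store it) and the WorkFileName value taken from the signal message. Returning a bool gives the orchestration a chance to react if the work file is missing, rather than waiting on the second Receive shape indefinitely.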


In order for BizTalk to "wait" for the work file to be picked up, we need to set the Following Correlation Sets property on the second Receive shape to the same Correlation Set that we initialized in the first Receive shape. Since the rename operation occurs in the preceding step, the "wait" time will be extremely small. The main point here is that we want to control when BizTalk picks up the work file. It is essentially the rename operation and the second Receive shape/location that control this.
Now that you have consumed both the signal file and work file, in order, you can finish up any processing that is required by your business requirements. Since this is just a POC, I output the work file to a folder.


Testing
In order to simulate how an upstream system would write the files, I drop a file into the work folder with the original extension. I then drop a sig file into the sig folder. The sig file will get picked up and BizTalk, via the .Net Helper, will rename the work file. The receive location, for work files, will then pick up the work file and the orchestration will finish processing both files.
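If you would rather script this test than drop the files by hand, the sequence can be sketched roughly as below. This is only an illustration under several assumptions: the class and method names are made up, the root element name "Signal", the element name "Timestamp" and the ".sig" extension are placeholders, and a real signal file would have to match your deployed schema (including its target namespace).

```csharp
using System;
using System.IO;
using System.Xml.Linq;

// Hypothetical test harness that mimics the upstream system:
// the work file is written first, the signal file second.
public static class UpstreamSimulator
{
    public static void WriteWorkThenSignal(
        string workFolder, string sigFolder, string workFileName, string workFileContent)
    {
        // 1. Write the work file with its original extension. BizTalk ignores it
        //    because the work-file Receive Location only matches *.BIZ.
        File.WriteAllText(Path.Combine(workFolder, workFileName), workFileContent);

        // 2. Write the signal file that names the work file.
        //    Element names other than WorkFileName are assumptions.
        var signal = new XElement("Signal",
            new XElement("Timestamp", DateTime.UtcNow.ToString("o")),
            new XElement("WorkFileName", workFileName));

        string sigName = Path.GetFileNameWithoutExtension(workFileName) + ".sig";
        signal.Save(Path.Combine(sigFolder, sigName));
    }
}
```

Writing the files in this order reproduces the scenario from the post: the work file sits untouched in the work folder until the signal file arrives and triggers the rename.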


Conclusion
Through the use of a Messaging Pattern and with the help of a .Net Helper class we can process a set of files in a known order.

24 comments:

Saurabh Jain said...

Here both files are XML and can have a property which can be promoted. What if my work file is a PDF or a TIFF file? How do I correlate my messages then?

Kent Weare said...

Hi Saurabh,

You are then going to have to do some custom pipeline work to create your own promoted property. Here are a couple links that should give you some additional info:

http://geekswithblogs.net/sthomas/archive/2004/08/27/10301.aspx

http://www.sabratech.co.uk/blogs/yossidahan/2006/07/promoting-properties-through-custom.html

Regards,
Kent

kallu said...

Can anyone provide what we need to do in the Helper class?

kallu said...

This article is very interesting. I also had the same problem. I understood the concept of the article, but I did not understand the following description:

"In order for BizTalk to "wait" for the work file to be picked up, we need to set the Following Correlation Sets property with the same Correlation Set that we initialized in the first Receive shape. Since the rename operation occurs the step before, the "wait" time will be extremely small. The main point here is that we want to control when BizTalk picks up the work file. It is essentially the rename operation and the second receive shape/location that controls this."

If the work file arrives before the signal file, will BizTalk wait for the signal file or will it pick up the work file? Please explain how I can make it wait until the signal file comes in.
Thanks in advance.

Kent Weare said...

Hi Kallu,

Here are the contents of the helper class:

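using System;
using System.IO;

namespace FileReceiveTrigger.FileReceiveTriggerHelper
{
    [Serializable]
    public class TriggerUtility
    {
        // Appends ".BIZ" to the work file so that it matches the
        // work-file receive location's file mask (*.BIZ).
        public static void RenameWorkFile(string path)
        {
            // Do nothing if the work file is not there.
            if (File.Exists(path))
            {
                File.Move(path, path + ".BIZ");
            }
        }
    }
}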


All that I am doing is accepting a path (which includes the filename) from the orchestration via the Expression shape. I then check that a file exists at that path and, if so, append ".BIZ" to the file name.

Kent Weare said...


"if workfile comes instead of signal file will biztalk wait for signal file or will it pickup the WorkFile?, Please explain how i can accomplish this task, to wait till signal file comes in.
Thanks in advance."


In this scenario, it is impossible for the work file to be picked up before the sig file. You will need two receive locations:
1) The receive location for the "signal" file
2) The receive location for the "work" file. The trick here is that your file mask will include ".BIZ". You have to receive a signal file in order for the code to be called that renames the work file to include .BIZ at the end of the filename. This filename will now meet the criteria of your work file receive location.

Until the work file gets renamed, it will just sit in the work folder, since its filename will not match the receive location's file mask.

John said...

Hi Kent. I have the same problem but my difference is that my files are placed in an ftp server. Simple renaming seems impossible... Any other ideas??
Thanks in advance.

Kent Weare said...

Hi John,

You can use the FTPWebRequest class to connect to the FTP server using .Net code. Here are a couple links that should help you out.

Kent

http://kentweare.blogspot.com/2008/04/ftpwebrequest-class-creating-remote.html

http://social.microsoft.com/Forums/en-US/netfxnetcom/thread/8c541130-b571-4b1a-9117-ac610f3e8b34
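For reference, a minimal sketch of what a remote rename with FtpWebRequest could look like. The class and method names here are made up for illustration, and the server address and credentials are whatever applies in your environment. The only parts taken from the .NET Framework itself are FtpWebRequest, WebRequestMethods.Ftp.Rename and the RenameTo property.

```csharp
using System;
using System.Net;

// Hypothetical FTP counterpart of the local rename helper.
public static class FtpWorkFileReleaser
{
    // Builds (but does not send) a request that renames the remote work file.
    public static FtpWebRequest BuildRenameRequest(
        Uri workFileUri, string newName, NetworkCredential credentials)
    {
        var request = (FtpWebRequest)WebRequest.Create(workFileUri);
        request.Method = WebRequestMethods.Ftp.Rename; // FTP rename command
        request.RenameTo = newName;                    // e.g. "order.xml.BIZ"
        request.Credentials = credentials;
        return request;
    }

    // Sends the rename request; throws a WebException if the server rejects it.
    public static void Rename(Uri workFileUri, string newName, NetworkCredential credentials)
    {
        FtpWebRequest request = BuildRenameRequest(workFileUri, newName, credentials);
        using (var response = (FtpWebResponse)request.GetResponse())
        {
            // Nothing to read; getting the response completes the rename.
        }
    }
}
```

The work-file receive location would then use the FTP adapter with a *.BIZ file mask, in the same way as the FILE adapter in the post.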

Grant Daly said...

Hi Kent

I'm new to BizTalk, and we are doing our first BizTalk project, so please excuse any daft questions. We have a scenario that is either similar or exactly the same as in the above blog. That is, receiving a trigger file that means it's okay to process a work file. These arrive in different folders. We have no control over the order files arrive. The question I have is around ordered delivery on the logical receive port. If in the above we are renaming the work file during our orchestration, at which point we've presumably already picked up the trigger message from the MessageBox, how could the related renamed work file ever be in the MessageBox out of order? I only ask because the documentation suggests there are performance issues in setting ordered delivery.

Regards

Grant Daly

Kent Weare said...

There definitely is a cost associated with Ordered Delivery. However, in scenarios where messages need to be processed in order, that requirement usually "trumps" performance requirements.

In my scenario the work file is always written first. The problem is that we don't know how long that file will take to be written. It could be milliseconds, seconds or even minutes for a large file. That is why we wait for the Sig file to be written, because that is my cue that the work file is ready to be processed.

In this scenario, I want the Sig and Work file to be processed in the same orchestration instance, so that is why I need to use the correlation sets and set ordered delivery.

In my scenario the Sig file always gets written after the work file. If your scenario is different in that there is no set order in which the files are written, but you need to wait for both files to be written, then you will want to look at a different messaging pattern. In this case, you would want to look at a Parallel Receive pattern. Check out this link for more info: http://technet.microsoft.com/en-us/library/ms942189.aspx

Grant Daly said...

Thanks for the feedback, much appreciated. Our scenario is the same as yours, but to re-phrase the question a bit...

If BizTalk can't see/pick up the work file until we've renamed it in our orchestration, and at that point we've already picked up the related sig file in the orchestration, how can the work file ever be out of order? So, can we do away with setting ordered delivery because the renaming mechanism ensures correct order anyway?

I hasten to add, I haven't done any trials of this yet, as I wanted to understand the theory, so perhaps that's the next step.

Kent Weare said...

Hi Grant,

The ordered delivery flag is a constraint of the messaging pattern, not something that I have added. I recognize your concern, since it is the .Net helper renaming the work file that essentially enforces the order of messages.

Please remember that this non-uniform sequential convoy pattern can be used in scenarios where multiple files are being received. The example only included receiving 1 work file, so your concerns are valid. Something I wanted to demonstrate was processing the SIG and WORK file in the same orchestration instance. It sounds like you don't have that requirement, so you could just use a variation of this design.

You could have one orchestration that is looking for SIG files. When it receives a SIG file it would then rename the work file on your source server. At this point the "SIG file" orchestration could end. You then have another orchestration that would be a "Work" file orchestration. (Note this could be a send port subscription too.) This way you can avoid the Ordered Delivery requirement on the Receive Port.

Hope this helps

Grant Daly said...

Thanks Kent

That's very useful. I think I'll keep the ordered delivery flag on, and maybe experiment with it off to see what happens out of curiosity.

reese said...

hi kent, your blog is very helpful.
I do have a question. I'm receiving this kind of error: "The published message could not be routed because no subscribers were found. This error occurs if the subscribing orchestration or send port has not been enlisted, or if some of the message properties necessary for subscription evaluation have not been promoted". I'm getting a text file and would use the path as a parameter in a class that I also call in BizTalk. Would you have any idea about it? Thanks

reese said...

Your blog about shareTalk really helped me a lot. Thanks again for that :)

Kent Weare said...

Hi Reese...what this is saying is that BizTalk has received a message but is unsure what to do with it. If the message that you are receiving has a schema, ensure that it is deployed. If the schema is used in a pipeline (such as a flat file pipeline) then ensure that you have selected your pipeline in the receive location. Also, if you are using an orchestration, ensure that it is enlisted and started before you enable your receive location.

Also, check out this link for more ways to troubleshoot your issue: http://winterdom.com/2006/03/diagnosingroutingfailuresinbiztalkserver2006

reese said...

hi kent! thanks. Actually I don't have a schema for that. It's like BizTalk will get any kind of file. Would that be possible? To receive any kind of flat file without a schema?
So what I did, just to continue with my testing, was create a schema. And now BizTalk gets the file and processes it. But I receive this error:

the service instance will remain suspended until administratively resumed or terminated.
If resumed the instance will continue from its last persisted state and may re-throw the same unexpected exception.
InstanceId: 43a26dfa-e95e-4d92-9361-c67e52241e61
Shape name: Expression_1
ShapeId: bd12fea0-4ccd-44a5-a287-7a9605d0b636
Exception thrown from: segment 1, progress 10
Inner exception: Could not load file or assembly 'Microsoft.Practices.EnterpriseLibrary.ExceptionHandling, Version=4.0.0.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35' or one of its dependencies. The system cannot find the file specified.

Exception type: FileNotFoundException

Now I'm trying to install the Enterprise Library to see what will happen. Actually what I'm trying to do is to get the path of the file and use that as a parameter in the class I am calling.

Any help? Thanks so much!

Gopakumar said...

Hi Kent,
Thanks for the post. I have a similar situation (as explained below).

My application will receive a "data.xml" file. Once that is received, I need to send a "%datetime%.flag" file to a send location. The send port needs to be dynamic since the file name contains a variable. After the ".flag" file is sent, I need to process the input ".xml" file and then send it to the same location as "%datetime%.xml". Then I need to rename the "%datetime%.flag" file to "%datetime%.done". Please let me know how to achieve this.

Thanks in advance,
GK

The Integrator said...

Hey Kent,

Just a quick one…

I have a rather strange situation where I need to link three files via two diff keys….

File1
CustId
CustName
.
.
.

File2
CustId
ApplId
.
.
.

File3
ApplId
ApplicationDate
.
.
.

All three messages need to be aggregated into one inbound message.

Now all three files arrive at the same time. I need to link these three files with custId and ApplId.

Any thoughts?

Dipesh

Kent Weare said...

Dipesh,

You will need to create correlation sets so that you can consume the files and have them associated with the current orchestration. Even though you have the files being written at the same time, if you can figure out a way to consume the 2nd file first (it carries both keys) then you can create correlation sets for the ApplId and CustId. Perhaps a script could rename or append an extension that would allow BizTalk to consume this file first.

Bhargava said...

Hi,

In case i have to process two signal files and its related files.