Resuming your ETL process in OWB

If your ETL process fails, for whatever reason, you have two options: to restart the process from the beginning, or to resume from the last successfully completed task.

<h3>Restarting from beginning</h3>


<p>It is reasonable easy to restart the process in <span class="caps">OWB</span>, you would have to wait until your top level process flow has completed and then if your target tables have loading type of <span class="caps">TRUNCATE</span>/INSERT, or you use the WB_TRUNCATE public transformation to achieve the same effect, then you can restart.  This restart is however from the beginning of the process flow and so may involve you re-running hours of <span class="caps">ETL</span> that has already ran correctly. If the duration of your <span class="caps">ETL</span> process is under an hour this could be acceptable, however if your <span class="caps">ETL</span> process is more like 6 hours, and only has an 8 hour window this is more of a problem.</p>


<h3>Resuming the process</h3>


<p><span class="caps">OWB</span> has no built in functionality to resume process flows.  One approach we have used is to use a transformation to test whether a mapping has already executed for a run before executing it. This is done in a process flow by using a transformation to test whether the mapping should be executed. Hence each mapping needs the following activities:</p>


<ul>
<li>A Transformation to test the mapping</li>
	<li>The mapping itself</li>
	<li>An OR activity to collect the results</li>
</ul>


<p>Each mapping looks like the one below.</p>

<p><img src="/blog/content/images/2016/05/0337pf-activities1.png" title="PF_Activities" alt="PF_Activities" /></p>


<p>The result of the transformation should be tested in the transition and the required execution path chosen.  In the example above, the transformation returns 0 (zero) if the mapping is to be run, or any positive integer if it is not.  Any other value is treated as an error and will cause the process flow to fail.  This can be implemented by using a complex condition in the transition.</p>


<p><img src="/blog/content/images/2016/05/0338transition-properties1.png" title="Transition Properties" alt="Transition Properties" /></p>


<p>The OR activity is then required to allow execution to continue, whichever execution path reaches it first will cause execution to continue, as the transformation uses mutually exclusive conditions this will mean in reality that only one execution path can reach it.</p>


<h3>Controlling the execution with an ID</h3>


<p>In the example above the transformation can only determine if the the mapping has already run by supplementing the logging that is provided by <span class="caps">OWB</span>.  The transformation must be aware of the logical execution that is being run.  In a real world example the logical run would relate to one data load into the Data Warehouse.  The execution ID identifies a logical run and is set at the beginning of a run and can be passed between each process flow, sub process flow and/or mapping as a parameter.</p>


<p>In the example above the each mapping has been configured to log to a table its start time, the execution ID and, if the mapping completes successfully, the end time.  This can be achieved by pre and post mapping process. An example of this is shown below.</p>


<p><img src="/blog/content/images/2016/05/0336pre-post-mapping.png" title="Pre Post Mapping" alt="Pre Post Mapping" /></p>


<p>Using this technique transformation activity can always establish whether a mapping has been executed successfully for this particular run, this is helped by setting the post mapping process to only run &#8216;ON <span class="caps">SUCCESS</span>&#8217; which can be configured from within a mapping, hence you could ensure the end time would only be populated if the mapping completed successfully.</p>


<h3>Number of errors</h3>


<p>If you are executing your mapping in set-based mode the mapping activity parameter <span class="caps">MAX</span>_NO_OF_ERRORS is set to 0 (zero). If you leave it with the default of 50, it seems that if the mapping errors then this only counts as one error, so the post mapping process executes as one is less than 50, and for that reason it thinks the <span class="caps">ON SUCCESS</span> condition has been met.  Set the parameter as follows:</p>


<p><img src="/blog/content/images/2016/05/0335max-no-of-errors.png" title="Max no of Errors" alt="Max no of Errors" /></p>


<h3>Forcing mappings to execute</h3>


<p>There may be certain instances where you want a mapping to execute regardless, or you want to re-execute the mappings during the course of a run.  In the first instance mappings can you added to process flows without the transformation and OR activity and then they will just execute as per normal.  In the second instance the you can add another parameter to the transformation to force the following mapping to execute, regardless if it already has or not.</p>


<p><img src="/blog/content/images/2016/05/0333force-parameter.png" title="Force Parameter" alt="Force Parameter" /></p>


<h3>Automation using <span class="caps">OMB</span> Plus</h3>


<p>Finally it is worth looking at the overhead this adds to the development process, you now have to add three activities to a process flow and configure transitions, whereas before you simply had to add a single mapping.  One way round this we have used is to use <span class="caps">OMB</span> Plus scripts to either generate the process flows for a list of mappings, or to add all three activities to a process flow.</p>