Does SSIS re-use dirty buffers?
some of several sources of data i'm importing ssis 2008 have text in wrong fields, including country names in postcode field, data source had no county field hold them.  i've written 'transformation' script component uses regular expressions recognise , correct these, , write county name new output field called country.  country appear in country field of result should not there - e.g. usa against valid london address.  on short test runs, problem occurred real data (200k rows).
 
wasted hours debugging code before realising there nothing wrong regular expressions: seemed happening output field in old buffer still set when buffer re-used - hence no problem until there enough data use lots of buffers! if record (about) 10,000 rows earlier had 'usa', , current row did not, old usa still there!
 
i'd assumed new output fields empty, wrote country if these country name store. fix set row.country=nothing rows there no country name.
 
analysis correct: data flow buffers re-used without being cleared? if so, know if fixed in ssis 2008 r2, or in '2010? if not, best way raise bug microsoft?
 
there 2 things microsoft fix problem: either clear output fields before handing them script component, or issuing warning if there code paths not initialise them, (as other variables). until then, users need careful write output fields!
 wasted hours debugging code before realising there nothing wrong regular expressions: seemed happening output field in old buffer still set when buffer re-used - hence no problem until there enough data use lots of buffers! if record (about) 10,000 rows earlier had 'usa', , current row did not, old usa still there!
i'd assumed new output fields empty, wrote country if these country name store. fix set row.country=nothing rows there no country name.
analysis correct: data flow buffers re-used without being cleared? if so, know if fixed in ssis 2008 r2, or in '2010? if not, best way raise bug microsoft?
there 2 things microsoft fix problem: either clear output fields before handing them script component, or issuing warning if there code paths not initialise them, (as other variables). until then, users need careful write output fields!
writeup there, @ ssis - script transform fills in new column values previous row's values if unspecified.
i added workaround.
will people seeing please go , vote bug fixed!
i added workaround.
will people seeing please go , vote bug fixed!
                                                                          SQL Server                                                     >                                                                 SQL Server Integration Services                                                                           
 
 
Comments
Post a Comment