Welcome to EMC Consulting Blogs Sign in | Join | Help

SSIS Junkie

SSIS: MVP Summit 2008

Next Monday (14th April) brings with it the start of the Microsoft MVP Summit 2008. This is an opportunity for all of the (give or take a few) 4000 MVPs from around the world to travel to Seattle and engage with Microsoft product teams with a view to helping them improve their products. Someone somewhere has deigned that I have the honour of being an MVP bestowed upon me for 2008 and therefore I too shall be boarding a Seattle-bound plane in a couple of days time where I will be spending a lot of time with various folks from the SQL Server Integration Services product team.

As I see it one of the assumed responsibilities of an MVP is to represent the views of the community as well as provide one's own opinions and with that in mind I'd like to invite readers of this blog to leave a comment here answering the questions "What should the SSIS product team be concentrating on for future versions?".

I realise that I'm opening this blog up to swathes of abuse and derision so I would ask that if you have any opinions to proffer then they be constructive, well explained and concentrate on alleviating a particular problem that you have with the product today. For example, my big wish for future versions of SSIS is:

"Other than modularising functionality over multiple packages SSIS doesn't have a very good story around reuse. I would like to see this improved in future versions by giving us the ability to instantiate pre-configured tasks/components in multiple packages."

That's a lot better than saying:

"Please allow more reuse in SSIS"

You get the idea I'm sure. Be realistic as well, "Make it run on LINUX" isn't ever going to happen no matter how much you might want it.

 

So, let me know what irks you about SSIS today. Logging? Reuse? Design-time experience? Deployment? Multi-developer support? Platform architecture? Metadata support? Better support for beginners? Community engagement? Those are all worthy areas for improvements along with many others - let me know what you think and hopefully there will be a common consensus.

I am obligated to point out that all summit attendees are bound by a non-disclosure agreement so I'm afraid I won't be able to report back on what gets discussed next week. Sorry.

 

-Jamie

Published Thursday, April 10, 2008 12:13 PM by jamie.thomson

Comments

 

Mike Sexton said:

Here are just a few...

1. Scripting design tasks are displayed modally with respect to the rest of visual studio.  As SSIS is case sensitive, we could avoid those annoying typos by getting the ability to move to the variables and copying the exact spelling of the variable we need to pass, check on default values and otherwise move along efficiently when we just need to go back and check something while writing code.

2. Create a control flow item that replicates the merge function in SQL 2008 and Oracle 9i, but can be used on other RDMS systems as well.

3. Allow changes to SSIS logging programmatically (e.g. change a configuration value that will allow a switch between basic and verbose logging).

4. When you copy and paste tasks between packages or containers, you end up with a mess.  Is there any way to keep it neat?

5. Allow copying of variables without directly editing the XML.

Hope that helps.

Mike

April 10, 2008 3:29 PM
 

Old School Database Guy : Changes for SSIS said:

April 10, 2008 3:35 PM
 

stittdba said:

I would like to see expressions be available for use in file connection managers, specifically when pointing towards file system paths of other (.dtsx) files. I believe that would make better use of making the config file a more centralized point of runtime configuration. I understand that you can add that parameter in the DTExec command, but most are using config files. I would find this most useful when trying to move from Dev to Test to UAT to Production and using UNC paths in the connection manager.

April 10, 2008 3:37 PM
 

jamie.thomson said:

stiitdba,

Expressions are available for connectoin managers, specifically for the ConnectionString property which is what you need here.

-Jamie

April 10, 2008 3:43 PM
 

Peter said:

Better Excel (2003 primarily) and CSV support.  I pull in data from those formats regularly and it's a beating in SSIS.  First you deal with the wonderful "Excel thinks this is a float so anything that's not a float is NULL" behavior.  I've tried all sorts of different hacks to make Excel treat things as text, to scan more rows, etc - nothing works and I end up dropping postal codes and other data because Excel insists that they are a certain type.

For my next trick I export that out to a text file - CSV or Tab delimited.  That works until it hits a row that doesn't contain all of the columns because Excel exports it that way!  (How do you emphasize that better?)

So I go back into Excel, add a dummy column fully populated all the way through the file.  Export to a text doc again.

Now I can pull it in using SSIS as long as I define the columns for the text file.  I also have to define each file where Excel had the ability to import multiple sheets as long as I didn't mind dropping data.  :)

So back to my original point - just make Excel work.  It would save quite a bit of time messing with all of the imports/exports that I have to go through now.

April 10, 2008 5:16 PM
 

Peter said:

One more while I'm at it.

Delimited Text files - give us the ability to have EOL characters override the number of columns expected.  That way if a row doesn't contain all of the columns (because of e.g. an export from Microsoft Excel per my earlier comment), we can proceed without getting columns containing lots of data due to flowing over the EOL character.  Once again, I deal with this quite a bit in my day to day tasks and it's painful and time consuming.

April 10, 2008 5:21 PM
 

Merrill Aldrich said:

Nice! I live in Seattle, so welcome! I really like SSIS in general but here are some suggestions:

Peeve 1: I'd like better support for local paths, for packages deployed in the file system. We made the decision to deploy our packages like little programs and schedule them with Windows Scheduler instead of inside SQL Server / SQL Server Agent. One unanticipated hassle has been, for example, that I would like to pop open Package Configurations and say the default config for my package is in "./mypkg.dtsConfig"; however that is apparently unsupported and definitely doesn't work in BIDS. Right now this is a WAD, but I see no reason why it couldn't be improved.

Peeve 2: I want to be able to specify a package config file in the Execute Package task, instead of using the default (am I missing something?).

Peeve 3: the whole Save Sensitive with User Key / Don't Save Sensitive setting vs. connection strings and passwords in config files is INCREDIBLY confusing for new users and could stand some clarity in the way it's designed and presented. Me, a little slow perhaps, I took a whole week just to figure out some combination that would work for my team. And it's definitely not the default.

Peeve 4: it would be very handy to be able to control the order of load for parent/child rows within one data flow, so you don't have to go through all sorts of gymnastics to prevent false foreign key violations (you blogged on this one a while back) when legit child rows are loaded before the parent rows.

Peeve 5: everything Peter said about Excel (see http://thedailywtf.com/Articles/Jan48.aspx featured comment containing my rant about Excel and actual Data). Actually, if the Excel team at MS could just pull their heads out of the sand about this issue and make Excel itself less hostile to the data people attempt to put in spreadsheets, that would be super, and the SSIS problem would be simpler to solve. People have been trying to use Excel for data for a long time, and it's been a bad fit for that purpose for a long time, yet the Excel team has done nothing to really address the problem. How about, for example, strongly typed cells? But I digress...

Peeve 6: I can't find a way to add an existing package file to an existing BIDS project without having VS duplicate the file (maybe I am just not doing it right). That's quite an annoyance.

Peeve 7: The XML in files created by the config file tool is not formatted; it's all on one line. My attempts to have VS autoformat the file ruined the files in every case, so I've resorted to a third party tool (Jedit + a plugin) to reformat the XML with line breaks and indentation, then I re-open them in VS to edit. People do have to edit these files after all, and one looong line of text is not too nice. It would also be helpful if the config files were automatically added to the project's "misc" folder upon creation.

Peeve 8: SQL Server is very, very often deployed without case sensitivity, but SSIS is .NET based and case sensitive by default. This causes a LOT of bugs as people don't necessary find all the places where that has an impact. (see the documented issue with Lookup for one example, there are others). It would be keen if this were smarter, e.g. let's have the program examine the data source and see which behavior is appropriate, or let's have a setting to make a package or a whole project case-insensitive (again, unless I missed that setting and it exists :-)

April 10, 2008 6:31 PM
 

Kevin Bucher said:

I would like to see more flexibililty in processing Analysis Services cubes.  Specifically, I have a package that is going to process cubes nightly; my user ID is does not have access to do so in production (and never will).  I want to run a package dynamically with a configuration file to set the Analysis Services connection string dynamically.

The Analysis processing Task does not take connection string (with User ID and Password) to connect for processing.  When I pass in the User ID and Password, it is essentially ignored, and run using the credentials of the account logged in to the machine.

April 10, 2008 8:46 PM
 

Darren said:

Peeve 6 - The default or most obvious way is very annoying I agree. However try this - In Visual Studio, with your SSIS project open, follow the menus - File, Open, File, then select the package. The package will open, but it will not be part of the project. Now from the File menu goto the "Move <FileName> into >" option and select your project.

April 11, 2008 2:33 PM
 

Dmitry Lyalin said:

1. A better managed code object model to develop against

2. More code-behind like functionality for advanced users

3. Equal support for C#/VB.NET for all coding

Please let them know as no one listens to me :).

April 11, 2008 2:41 PM
 

Philip Coupar said:

The ability to have configuration applied from SQL agent that was invoked prior to to any other configuration (i.e. so SQL agent could be used to run multiple versions of the same package which pointed to different SQL Configuration tables)

Performance tuning, having access to more detail on what is actually hapening in phases like validation and pre-execute would help in diagnosing perfomance issues

Task precedence to force the blocking of certain tasks before another has finished.  I have one scenario where I am pairing up transactional data to start and end events with a complex data flow doing various lookups and data manipulation so would not want to repaet that logic, and writing out to a raw file increases the overhead by having to manage files and reduces the portability as increased the number of things likely to error between environments.

I think the cache connection manager in 2008 is likely to solve many other performance issues, but have not had the chance to give it a bit of stress yet!

April 11, 2008 4:12 PM
 

Ivan Peev said:

Phillip,

You may check CozyRoc's Data Flow Source/Destination components, which will help you avoid writing to raw files entirely. I believe you will also gain performance boost as well. Let me know if you need help if you decide to use them: info at cozyroc.com

April 14, 2008 4:41 AM
 

Log Buffer #93: a Carnival of the Vanities for DBAs said:

April 18, 2008 7:29 PM
 

Mike said:

I realise I'm a few weeks too late, but top of my wishlist is:

Add an option (probably on the package object) to perform implicit datatype conversions - a la T-SQL or DTS.  These would kick in in place of the current validation errors etc when datatypes dont match exactly.

This would speed up SSIS development a lot - in particular maintenance overhead when source data changes.  Presumably there would some performance impact, but for most scenarios this doesnt matter.

April 26, 2008 8:15 AM
 

Hrishy said:

Hi

I am just a beginner and of all the ETL tools out there SSIS documentation is much to be desired.

I would prefer a cookbook and training videos from microsoft.

regards

Hrishy

April 29, 2008 8:33 PM
 

Sam Martinez said:

Darn... wish I hadn't missed this bus. There is one thing I'm struggling with, however.

It sure would be nice if there were an "Impersonation Context" option in all connection managers (or at least OLE DB and ADO.NET). When using Windows Integration and running a package from a website, the web site Impersonation context does not carry forward to package spawned threads, so all my connection managers fail.

I've gone the route of using Script Components for everything and writing Impersonation before each attempt to access a database, which works fine, but it was a lot more work, it renders many "canned" components unusable, and it doesn't allow an option for Logging, which requires that you use a "canned" Connection Manager.

June 11, 2008 3:55 PM
 

Brandie Tarvin said:

1) A better cut-n-paste utility that allows you to paste tasks or objects to a specific area of the Control Flow or Data Flow rather than throwing the pasted object randomly in the viewing window where you have to search for it.

2) The ability to disable objects in Data Flow like is currently available in Control Flow.  Right now, if you have several items in the Data Flow and you're only having a problem with one, if you want to execute that DF task, you have to execute all items or delete the ones you don't want to test (and then recreate them when you're done testing).  It would be nice to be able to take a subset and disable them so only the ones you want to run will run.

There are more, but those are the two off the top of my head.

June 18, 2008 6:44 PM
 

Brandie Tarvin said:

I just thought of another one.  I'd like to see the UNION ALL transformation task hold on to established Sorting Keys from the sources.  As it stands in 2005, once you do a UNION, all sorts are lost and you have to use a SORT transformation to pull them back.  Also, you can't use a MERGE JOIN after a UNION or vice versa very effectively.  

Sometimes I want to UNION to result sets together after checking for NULLS then MERGE with another source. You would think SSIS could handle this without having to jump through hoops.

June 25, 2008 4:19 PM
 

Steven Melzer said:

I am using SSIS for the first time on a project and it is extremely frustrating.  I assume I am too late to make suggestions for MVP, but...

1) I had a terrible time with the sensitive data and Oracle on 64-bit.  If you save as "Do not save sensitive data" then the connections will just fail with NULL password error.  If you save as "User Key" locally and export to another server (i.e.  production) with any other type of protection you get the same error.  I finally had to use "Password" protection locally just so I could successfully migrate to production.  The problem now is every package I open (about 20 in my app) on my local machine asks for a password.

2) The "Refresh" button.  Whenever I press the refresh button on a data source I end up screwing up downstream tasks because it creates new data objects.  Why does it do this?  Why can't I change my data source without having to rebuild my object structure (although SSIS does do a good job of fixing it when you double-click on the broken tasks)?  It also loses all sort order when you do the refresh button.

3) As someone mentioned above I wish SSIS could do some rudimentary data conversions automatically.  I am going coming from an Oracle operational database and I have to do a lot of unicode-to-ascii datatype conversions.

4) Deployment is extremely complicated (at least to me).  I still am not sure the correct way to migrate my code.  All the options seem very manual.  Ultimately I want to migrate my packages to a QA SQLServer database for QA testing and then to a production SQLServer database for production.  A developer has to do this because there does not appear to be a very automated way to handle this.

5) The whole impersonation mode thing is a pain.  Myself and one of the systems guys here wasted 2 days figuring it out and making it work.

6) It is odd how some expressions have to be typed directly into the little text field in the properties window, while others offer a dialog box, while others offer an expression builder.  The former case is very frustrating as I have some long expressions.

7) A way to export the UI screens would be handy.  All I can do is print them into PDF or XPS and then use another program to convert them to a graphic and then I can include them in documentation or powerpoint training slides.

8) The Slowly Changing Dimension task's changing attribute destination calls its output fields "param_n" where everywhere else they use the input column name.  

9) As mentioned earlier it would be very nice if copy and paste would maintain the visual structure of the data flow UI, not just the tasks and relationships.

These are the things that come to me after using the product for 2 months now.  I am sure there are correct ways to do everything I mentioned above, but if there are they are not intuitive.

Thanks,

Steve

June 25, 2008 8:07 PM
 

jamie.thomson said:

Steven,

Its interesting that most of your grievances are concerned with the design-time experience rather than what the product does at executon time. That mirrors most people's experiences.

Some info on your points:

1) This is by design. Its a concious effort to make SSIS more secure by making it harder for a nefarious intruder to compromise your packages. SSIS will never store passwords in your package except in encrypted form because the package is basically a text file thus human readable. The generally accepted way of dealing with this is to use configurations to store your connection strings (which you should be doing anyway).

2) Why does it do it? Because the dataflow relies on metadata (i.e. number of columns, column ordinals, column names, column types) in order to function. If you change that metadata then it can't do its job. Its theoretically possible for SSIS to "fix itself" but a design decision was made that it would not do that (largely based on experiences of DTS). SSIS wants YOU to be in charge of making changes rather than it guessing what they should be.

3) Again, SSIS will never do implicit data conversions. Again, a concious design decision.

4) Agreed. The deployment story isn't good.

6) You haven't installed SP1. I highly recommend that you do (actually, install SP2 not SP1)

9) Completely agree.

You raise some good points. You should raise them on http://connect.microsoft.com/sqlserver/feedback.

-Jamie

June 26, 2008 12:28 PM
 

Garrett said:

Steven,

I am having the same problem as you are in #2.  I want to be in charge of what SSIS does but make simple jobs simple.  Adding two fields is requiring me to change over 40 objects in my DTSX package!!!! Only because it won't keep the settings I already have set up.  It is breaking items not related to the change because it switches their Sort Order and any fields I changed the types of are breaking.  It's nuts!  

You can't even edit the code directly because of the indexing.  

It think it will be easier for me to make a new job that imports from the same files  two fields.  I could see this as being beneficial in a way.  Schema versioning. Changes to the Schema are added on as additional packages.  And being new to SSIS I don't have any idea how I can accomplish this :-)  though I understand it theoretically.

June 30, 2008 8:46 PM
New Comments to this post are disabled

This Blog

Syndication

Powered by Community Server (Personal Edition), by Telligent Systems