[SSIS] : STDEV In Derived Columns
Jan 26, 2007Hello,
Does anyone have already tried to calculate a standard deviation (STDEV) in a derived column ?
Any help is welcome ;-)
Cheers,
Bertrand
Hello,
Does anyone have already tried to calculate a standard deviation (STDEV) in a derived column ?
Any help is welcome ;-)
Cheers,
Bertrand
Here's another one of my bitchfest about stuff which annoy the *** out of me in SSIS (and no such problems in DTS):
Do you ever wonder how easy it was to set up text file to db transform in DTS - I had no problems at all. In SSIS - 1 spent half a day trying to figure out how to get proper column data types for text file - OF Course MS was brilliant enough to add "Suggest Types" feature to text file connection manager - BUT guess what - it sample ONLY 1000 rows - so I tried to change that number to 50000 and clicked ok - BUT ms changed it to 1000 without me noticing it - SO NO WONDER later on some of datatypes did not match. And boy what a fun it is to change the source columns after you have created a few transforms.
This s**hit just breaks... So a word about Derived Columns - pretty useful feature heh? ITs not f***ing useful if it DELETES SOME of the Code itself after there have been changes in dataflow. I cant say how pissed off im about that SSIS went ahead and deleted columns from flow & messed up derived columns just because the lineageIDs dont match.
Meta-data - it would be useful if you could change it and refresh it - im just sick and tired of it that it shows warnings and errors when there's nothing wrong - so after a change i need to doubleclick all my transforms so that those red & yellow boxes would disappear.
Oh and y I passionately dislike Derived columns - so you create new fields based on some data - you do some stuff - combine multiple columns to one, but you have no way saying remove the columns from the pipeline. Y you need it - well if you have 50K + rows with 30+ columns then its EXTRA useless memory overhead for your package.
Hopefully one day I will understand how SSIS works (not an ez task I say) - I might be able to spend more time on development and less time on my bitchfest - UNTIL then --> Another Day - Another Hassle with SSIS
how to declare multiple derived columns in SSIS Derived Column Task in one attempt.as i have around 150 columns coming from Flat file. I had created the required Expression in Excel and now i want add those in derived column task but its allowing only 1 expression at a time.
View 4 Replies View RelatedHi Folks,
Is there any way to add a derived column into the where clause.
Example:
Select Name, Date, Procedure#,
(Case When Procedure# in ('1','2','3') then 'Y' else 'N') AS Class
From Procs
Where Class = 'Y'
Thanks,
Ray
Hi,
Here's my current query, which throws an error that "AgeCalc" is an invalid column in the WHERE clause:
---------------------------------
SELECT
.
.
.,
AgeCalc =
CASE
WHEN dateadd(year, datediff (year, B.DOB, B.DateIn), B.DOB) > B.DateIn
THEN datediff (year, B.DOB, B.DateIn) - 1
ELSE datediff (year, B.DOB, B.DateIn)
END
FROM
ResidentData B
WHERE
(AgeCalc >= 18)
---------------------------------
How do I do conditionals on the "AgeCalc" derived column?
Thanks.
I'm trying to write a query that concatenates multiple records into onederived column. Let's say I have an author (Joe Writer) who has writtenthree books (Book 1, Book2 and Book 3). The author is in tblAuthors, hisbooks are in the tblBooks and they are joined by the AuthorID field(number). If I use a simple select query to give me the author name and thetitle, I will get three records, one for each book written.What I want is to have all three books combined into one derived column. Soif I do the select statement, I will get one column with the author name,and the second column will put together all three names of the bookseparated by a column. So it will look like:Author TitleJoe Writer Book 1, Book 2, Book 3,Rather than having it appear as 3 records:Joe Writer Book 1Joe Writer Book 2Joe Writer Book 3Could someone help me with the SQL involved in this?Thanks for the help.Cheers,Mike
View 4 Replies View RelatedHi
I'm updating an old Access application to SQL Server and am currently trying to decipher one of the reports on the old application. It appears to be evaluating a derived column from one query (qryStudentSuspGroup.Suspension) in the Select statement of another. I have tried to put the query that creates the derived column in as a nested query into the other query but can't get it to work. This is all a bit beyond my rudimentary SQL skills! Any help would be greatly appreciated!
The original Access SQL appears below:
SELECT [Enter the academic year (4 digits)] AS [input], ResearchStudent.Department, ResearchStudent.DateAwarded,
ResearchStudent.StudentNumber, Person.Forenames AS fore, Person.Surname AS Sur, ResearchStudent.Mode,
ResearchStudent.RegistrationDate, StudentExamination.Decision,
IIf(([Suspension]) Is Null Or [Suspension]=0,([DateAwarded]-[RegistrationDate])/365,(([DateAwarded]-[RegistrationDate])-([Suspension]))/365) AS CompDate,
ResearchStudent.EnrollmentCategory, qryStudentSuspGroup.Suspension
FROM ((ResearchStudent LEFT JOIN Person ON ResearchStudent.ResearchStudentID = Person.PersonID)
LEFT JOIN qryStudentSuspGroup ON ResearchStudent.ResearchStudentID = qryStudentSuspGroup.ResearchStudentID)
LEFT JOIN StudentExamination ON ResearchStudent.ResearchStudentID = StudentExamination.ResearchStudentID
WHERE (((Year([DateAwarded]))>=[Enter the academic year (4 digits)]
And (Year([DateAwarded]))<=([Enter the academic year (4 digits)]+1))
AND ((IIf(Year([DateAwarded])=[Enter the academic year (4 digits)],Month([DateAwarded])>8,Month([DateAwarded])<9))<>False))
ORDER BY ResearchStudent.Department, ResearchStudent.Mode, ([DateAwarded]-[RegistrationDate])/365
STDEV() gives incorrect values with reasonable input.
I have a table filled with GPS readings. I've got a column LATITUDE (FLOAT) with about 20,000 records between 35.6369018 and 35.639890. (Same value to the first 5 digits of precision---what can i say, it's a good gps.)
Here's what happens when I ask SQL Server ("9.00.1399.06 (IntelX86)") to compute the standard deviation of the latitude:
// Transact-SQL StdDev function:
SELECT STDEV(LATITUDE) FROM GPSHISTORY
WHERE STATTIME BETWEEN '2007-10-23 11:21:00.859' AND '2007-10-23 17:00:00.062' AND GPSDEVICEID = 0x004A08BC04050000;
0
// Zero. ZERO??!?!!
//Let's re-implement Std Dev from the definition using other aggregate functions:
DECLARE @AVERAGE FLOAT;
SELECT @AVERAGE = AVG(LATITUDE) FROM GPSHISTORY
WHERE GPSDATE BETWEEN '2007-10-23 11:21:00.859' AND '2007-10-23 17:00:00.062' AND GPSDEVICEID = 0x004A08BC04050000;
SELECT SQRT(SUM(SQUARE((LATITUDE - @AVERAGE)))/COUNT(LATITUDE)) FROM GPSHISTORY
WHERE GPSDATE BETWEEN '2007-10-23 11:21:00.859' AND '2007-10-23 17:00:00.062' AND GPSDEVICEID = 0x004A08BC04050000;
6.03401924005392E-06
// That's better. Maybe STDEV is using fixed point arithmetic?!?
SELECT STDEV(10 * LATITUDE)/10 FROM GPSHISTORY
WHERE GPSDATE BETWEEN '2007-10-23 11:21:00.859' AND '2007-10-23 17:00:00.062' AND GPSDEVICEID = 0x004A08BC04050000;
4.77267753808509E-06
SELECT STDEV(100 * LATITUDE)/100 FROM GPSHISTORY
WHERE GPSDATE BETWEEN '2007-10-23 11:21:00.859' AND '2007-10-23 17:00:00.062' AND GPSDEVICEID = 0x004A08BC04050000;
1.66904329068838E-05
SELECT STDEV(1000 * LATITUDE)/1000 FROM GPSHISTORY
WHERE GPSDATE BETWEEN '2007-10-23 11:21:00.859' AND '2007-10-23 17:00:00.062' AND GPSDEVICEID = 0x004A08BC04050000;
8.11904280806654E-06
// The standard deviation should, of course, be linear, e.g.
DECLARE @AVERAGE FLOAT;
SELECT @AVERAGE = AVG(LATITUDE) FROM GPSHISTORY
WHERE GPSDATE BETWEEN '2007-10-23 11:21:00.859' AND '2007-10-23 17:00:00.062' AND GPSDEVICEID = 0x004A08BC04050000;
SELECT SQRT(SUM(SQUARE(100*(LATITUDE - @AVERAGE)))/COUNT(LATITUDE))/100 FROM GPSHISTORY
WHERE GPSDATE BETWEEN '2007-10-23 11:21:00.859' AND '2007-10-23 17:00:00.062' AND GPSDEVICEID = 0x004A08BC04050000;
6.03401924005389E-06
// Std Dev is a numerically stable computation, although it does require traversing the dataset twice.
//
// This calculation is not being done correctly.
//
// Incidently, SQRT(VAR(Latitude....)) gives 4.80354E-4, which is also way off.
I will redefine STDEV to use a stored procedure similar to the above, but the algorithm used to compute VAR, STDEV etc should be reviewed and fixed.
-Rob Calhoun
All,
I have seen it happen frequently that I type in a perfectly valid SSIS expression (this is easy for me since I am an old hand at C++/C/C#) in a row in a Derived Column transformation, and it turns red. Or sometimes, I will have an invalid expression that I correct, but it stays red. Finally I have also seen it happen that I make some change in the data flow pipeline and suddenly a Derived Column transform develops an error. I then go into the Derived Column transform and find that the expression has turned red. So, I literally have to go into the expression in these cases, and make a trivial change to them to get the red error to go away. Alternatively, I can cut the derived column expression text, and then paste it back in and it works (this is most telling.)
So, it seems to me the Derived Column is somehow holding onto some meta data about the Derived Column that is getting out of date (rather than re-evaluating the correctness of the Expression.) One thing I usually can do to repro this at times is to remove a column (that the Derived Column depends upon) from the pipeline and then re-add it. When I go into the Derived Column it will be red, and then like I said I have to tweak the expression to force SSIS to re-evaluate the expression.
Anyway, I'm curious if anyone else has seen this?
Thanks.
I'm sending a lot of columns through my derived column transform, checking for empty strings from a flat file - I was wondering is there a way that I could "script out" all the transforms instead of enduring this click hell that I'm stuck in inside the derived column transformation editor. I've got probably 100+ columns to configure with the following sort of transform....
Replace Col1
TRIM(Col1) == "" ? NULL(DT_WSTR,2) : Col1
Basically - if the string is empty, then throw Null in the data stream. I'm about a third the way through but it would really be nice if there was a quicker way. Even with the most efficient copying & pasting & keyboard shortcuts, it's still painful.
Chaps,
apologies for the drip-drip approach......
Is it possible to do pattern matching against a string using FINDSTRING or similar in a derived column expression without recourse to including 3rd party regexp style plugins?
I want to seach for a reference in a string which will have the format ANNNNNAAA ie. an alpha with a certain value followed by 5 digits followed by three alphas with specific values.
eg Z00001YYY or X00022HHH
thanks for your assistance,
regards,
Chris
I have a lot of different data flows that need "Derived Column". There are maybe only 5 different such "Derived Column" but they appear many times. Is there a way to eliminate all that double work? It should be something that does not take me more time to do than just duplicating all the "Derived Columns".
View 2 Replies View RelatedHi,
In my working existing package I am adding a derived column as date and datatype as DT_DBTIMESTAMP and my destination data type is datetime. when I try to union these records through an "Union ALL" component its not allowing me to map this column and throws error as "outputcolumnlineageid " and defines the error as metadata for source and destination are not matching.
I checked the metadata and they are DT_DBTIMESTAMP, Am I missing something? please advice.
Trying to setup a derived column to use an expression stored in a package variable. But it seems that variables are always evaluated as text...I need it to evaluate as a Columname sometimes.
Example:
On an ETL of Products I want a new derived column that uses two other columns.
I can hardcode an expression of ProductID + " - " + ProductName and that results in dynamic output. But now I want to use that as an expression stored in a variable so I can change it when needed.
So I make that expression = @[User::Variable]
and stuff into @variable ( a string param) : ProductID + " - " + ProductName
My output is the literal "ProductID + " - " + ProductName", and not the actual ID's and Names
I've tried with/without brackets, quotes and braces but no change.
Any way I can get the pieces in that variable expression to evaluate as column names?
I have 3 different companies that share the same ticket_types(CRMS System). I need to display the Ticket Types and the 3 company's Ticket Count:
Ticket Type | Company A Count | Company B Count | Company C Count
I can get the information individually for each company, but if a company doesn't have a ticket in one of the ticket_types, then it isn't displayed in a row. So, I tried to write the following, which isn't pulling back any data.
DECLARE @startdate date = '20150306'
DECLARE @enddate date = '20151031'
DECLARE @AcctGrp varchar(20) = '111'
;WITH TType
AS
(
SELECT ctp.description as TicketType
[Code] .....
If I run each SELECT individually from above (excluding the last SELECT), it works and I get the following:
TicketType
AR Request Credit
Availability/Rush
Cancel Order
Credit Card Payment
Expedite Order
Freight Quote
[Code] ...
How to get the query results? Am I even close to getting it right?
Two questions regarding Derived Columns in SSIS
1. In a if else expression if condition is false how do you keep the value of the source column eg: Name == "" ? "Unknown" : Name
Above will change all the non blank values to Name and not the actual value
in the Name Column eg. John
2. I have a column (unicode string)that stores date and time (The source is flat file) Is it possible to write expression to select the 1st day of month based on that date, and use this derived column as input to a table with a datetime field.
Hi,
I apologize if this question has already asked (I already looked in this forum but do not find the answer), I need to have this done as soon as possible. I've following data:
column1 column2 begin_date active
------------- ------------ ---------------- -----------
1253 1057 1/1/2006 0
1253 1057 1/1/2007 0
1253 1057 4/1/2007 0
1253 1057 7/1/2007 1
I need to have the final result as following: (the new end_date column is the value of the begin_date in the next row -1)
column1 column2 begin_date active end_date
------------- ------------ ---------------- -------- ------------------
1253 1057 1/1/2006 0 12/31/2006
1253 1057 1/1/2007 0 3/31/2007
1253 1057 4/1/2007 0 6/30/2007
1253 1057 7/1/2007 1 null
Any ideas?
Thank you!
I need to check if the date is Null then use today's date and if not do something else.
If RowModifiedOn IS NULL Then
GETUTCDATE()
ELSE
DATEADD("Hh",@[User::TimeZone],RowModifiedOn)
End If
What do I do wrong here? Can I do something like it?
RowModifiedOn == NULL ? GETUTCDATE() : DATEADD("Hh",@[User::TimeZone],RowModifiedOn)
Thanks.
Hello All,
Can someone help me out in providing the STEPS to solve this problem. My scneario is, I've a table which has got 2 fields and 5 default row values have been filled in. Now, using the above, duirng package runtime, it need to dynamically create additional field and has to store values like for.e.g (0001 America). I'm getting the following error while executing the ssis package.
1. [DTS.Pipeline] Warning: Component "Derived Column" (1170) has been removed from the Data Flow task because its output is not used and its inputs have no side effects. If the component is required, then the HasSideEffects property on at least one of its inputs should be set to true, or its output should be connected to something.
2. [DTS.Pipeline] Warning: Source "OLE DB Source Output" (87) will not be read because none of its data ever becomes visible outside the Data Flow Task.
Please suggest with your valuable solution at the earliest.
Thanks
Vaiydeyanathan.V.S
How would one add this as a derived column
CASE
WHEN Data IS NULL
THEN NULL
WHEN SUBSTRING(REPLICATE('0', 9 - LEN(Data)) + CAST(CYCLE_YYYYMM AS VARCHAR(9)), 4, 6) IS NULL
THEN 0
ELSE RTRIM(SUBSTRING(REPLICATE('0', 9 - LEN(Data)) + CAST(Data AS VARCHAR(9)), 4, 6))
END
From what I have seen the first option is not really required
Hi,
I saw some thing called custom properties for the "Derived transformation" in the msdn site. I tried to use them in a simple package, but I am getting an error as "can't write to derivedoutputcolumnname.friendlyexpression". Friendly expression is one of the custom properties available for the derived transformation output columns.
The steps I followed to get to this error are as follows:
1) Get data from a table using OLEDB Source. Suppose I am getting firstName, LastName etc.
2) Derived column input is values from the above OLEDB Source.
3) I have added a new column called "Concatenated name" which is concatenated value of first and last names.
4) Then in the properties editor of this data flow task in expressions option I clicked on ellipse available. I got an editor for property expression, which contained two columns called "Property" and "Expression". Property column contains dropdown with friendly expressions propety for the derived columns and expression column is a text box, where in we can enter expression to be evaluated for the corresponding friendly expression property.
5) Now when I click on OK and try to debug it gives an error as "Can't write to concatenatedname.friendlyexpresiion".
If anybody has already faced this problem and solved it please let me know, because I am struck here a long time.
Thanks&Regards,
Sreekanth Ammisetty
How to use derived columns in SSIS
string to datetime
Input value : (string)
14/03/2014
NULL
15/04/2015
Note : Having null or blank value
Hi,
I am trying to upload a csv file into a destination SQL Server table using Data FLow objects in MS SQL Server 2005 SSIS. My destination table X has a date column while my source data file (for which i have a flat file connection) does not have a date column. I created a Derived column for date using the system date function in my Source data object and ran the package but it returned errors. I changed the data type of the derived column but I still get errors.
is there any other way i can get the date on the fly i.e generate a date for the incoming source file and map that (insert into) to the destination table ?
Thanks.
I am trying to put the following as an expression in the SSIS Derived Column Transformation Editor.
DATEADD(DAY, DATEDIFF(DAY, 0, GETDATE()), 0)
It is not allowing it. This works fine in a regular SQL statement.
Does anyone know how I can get this to work?
I am trying to figure out my expression here with derived column
source
Unavailable/Planned
output
Planned(which is after "/".
I achieved this in sql but not finding solution in SSIS Derived column transformation.
i have too many DTS packages to migrate to SSIS, and while examining a DTS package in BIDS (converted with the migration utility) i tried to edit the resulting migrated package, which opened the DTS interface with the two connection icons joined by the big fat arrow with a gear on it...not exactly what i had in mind, iow, it looks like SSIS on the outside, but its still DTS on the inside.
So I stripped out a series of components from a more complex package hoping that simplifying it would reveal the contents of old DTS Transformations tab at least partially set up in a Derived Column transformation.
Can i get there from here, or must i recreate every stinking definition in a derived column manually from the ground up?
thanks very much for your help
Derived column that transforms a CASE statement?
CASE
WHEN [MAIN_GAME] IS NULL AND combo.[CAT_PRODUCT] ='25' THEN 'Standalone'
WHEN [MAIN_GAME] IS NULL AND combo.[CAT_PRODUCT] <> '25' THEN 'No main game'
ELSE LOWER(prod.[TX_PRODUCT_NAME])
END AS [TX_MAIN_GAME]
CAT_MAIN_GAME = nvarchar(6)
FK_GAME = nvarchar(2)
I have flat file source from which data is imported to a Sql table.The target column is int and input column is string .The column has some numeric values and some blank values.when I tried to convert into int values it fails.
View 7 Replies View RelatedHi ,
I am Using Derived column between Source and Destination Control. the Source input column PriceTime is String Data type. but in the Destination is should be a DATE TIME column. How to Convert this string to DateTime in the Derivied Column Control.
I already tried to in the Derived column control
PRICEDATETIME <add as new column> ((DT_DBTIMESTAMP)priceDateTime) database timestamp [DT_DBTIMESTAMP]
But still throwing Error showing type case probelm
Pls help me on this
Thanks & Regards
Jeyakumar.M
Is there a way to pull data from XML columns with SSIS? I mean parsing the XML and pulling only data from XML column without putting it into an XML file first. I have been using xpath queries (XML value method) to do this. Is there any other way?
Thanks
Hi all,
I am trying to aggrgate Values on three columns Customer , OrderDate and Product thorough SSIS. It gives me following masseges and works very very slow.
Before Aggregation I sort the Data by Customer , OrderDate & Product.
The Aggregate transformation has encountered 4085 key combinations. It has to re-hash data because the number of key combinations is more than expected. The component can be configured to avoid data re-hash by adjusting the Keys, KeyScale, and AutoExtendFactor properties.
The Aggregate transformation has encountered 25797 key combinations. It has to re-hash data because the number of key combinations is more than expected. The component can be configured to avoid data re-hash by adjusting the Keys, KeyScale, and AutoExtendFactor properties.
The Aggregate transformation has encountered 253973key combinations. It has to re-hash data because the number of key combinations is more than expected. The component can be configured to avoid data re-hash by adjusting the Keys, KeyScale, and AutoExtendFactor properties.
The Aggregate transformation has encountered 2000037key combinations. It has to re-hash data because the number of key combinations is more than expected. The component can be configured to avoid data re-hash by adjusting the Keys, KeyScale, and AutoExtendFactor properties.
Please help me...
Thank you
Balwant.
I have 12 colums named in source
ERR1 - ERR12. I do have a reference table for ERR1 - ERR12 and haing values 1-12
Now the nature of ERR columns is that each of them has either 1 or more than 1 TICK mark in a small BOX.
In the destinaltion table I have only 1 column to get the Values from 1-12.
How do i match and get the values.
I m extracting data from Excel-sheet and inserting into SQL Server 2005 Database using SSIS(SQL Server Integration services).
Having 2 columns in excel-sheet and want to insert those into 1 column of SQL Table.
Can anyone help me about, how can I insert data from 2 columns into 1 column in SSIS. Is Export/ Import column or Copy Column or Merge will work..
Plz reply soon..
Thanks...