Data Cleansing
Jul 30, 2002Does any one knows any software used for Data Cleansing? Or is there any tools with SQL Server 2000 for Data Cleansing? I look forward to hearing from anyone.
Regards,
Does any one knows any software used for Data Cleansing? Or is there any tools with SQL Server 2000 for Data Cleansing? I look forward to hearing from anyone.
Regards,
I have sql server 2005 developer edition and vs2005 team system.
I want's to use data cleansing feature.i.e.
Integration Services Advanced Transforms Includes data mining, text mining, and data cleansing
Can please some body points me to any good webcast about using this feature
I have a large poorly designed table (inherited) With a Name field that contains comma delimited text containing address information. I need to do several things with it but unfortunately there doesn't appear to be any true consistency in it. When it displays in its own text box it works by placing each section on a new Line and looks ok.But I need to pull it apart and place things like unit number, Building Name in its own column etc. In the data it could be in either the 2nd,3rd, 4th, dependent on what came 1st. the data looks some thing like the following
unitNumber/StreetNumber Space StreetName (Building Name), Subub,City,Country
Some addresses won't have unit number or Suburb or country so when splitting you could have Suburbs and Citys in multiple columns even if you try and stagger the split process.Has any body go a good tool or reference site for dealing for this sort of problem. I have a table that I have made up that has some of the street names that could be used for comparing against existing records but it is by no means fool proof due to spelling inconsistencies . I also have another list of Common building names that could be used to compare, remove and place in the new building column.
I am just starting out with SSIS and trying to get the feel of data cleansing using it. But on my very first project for data cleansing I've got into this weird error.
My data flow is very simple, it has a OLE DB source, Fuzzy Lookup and OLE DB destination. I've built three tables for this purpose, one is source, one is reference (it will be used to match for the real entries in fuzzy lookup) and the last is the destination table.
In all the three tables I've a field of City which I'd like to Fuzzy lookup in the reference table and if it crosses certain confidence level, I'd like to insert to the destination table. City in all the tables has the same datatype, defined in the same way, it is varchar(50).
But when in the fuzzy lookup I try to map the Source tables City field to reference tables City field, it gives me this error:
The following columns cannot be mapped:
[City, CityRef]
One or more columns do not have supported data types, or their data types do not match.
Although as I have mentioned before, both have same data types and are defined in the same manner (i.e. I've just selected the datatypes for those columns and all the other settings are left to default). I just cannot understand why this is happening, plz help me with this. FYI I've also tried to give the City Column different datatypes in all the tables like varchar(max), text, Only to be greeted with the same error message.
Waiting for your reply!!
Regards,
Sajid.
Hi all
I have two data sources - an excel spreadsheet and one access database of Customer details, (quite small in size) and I have to use SQL Server 2005 to Integrate the data and cleanse the data so that I could (in theory) import it into a CRM system (though I won't actually be importing it) as this is for coursework only.
Is there a simple steps to do this is SQL server and guides that I can follow that are relevent? As there are lots of tutorials and I haven't a clue where to start.
many thanks
little_tortilla
I have a character field that should contain either a number or space in one of my transitions but it is coming over with junk in it occasionally. What is the best way to ensure that this data is only numeric or a space? Whenever I hit a < or ] one of those weird symbols I want to replace it with a space in my target field. It is SQL Server to SQL Server.
View 4 Replies View RelatedHi,
We do not have any Address Cleansing tools and the requirement is we have to cleanse the data, finding the best possible record which has all info and update other records accordingly.
I am Not sure we can do this Fuzzy Grouping Transformation.
Example:
I have Source table with following info.
Customer_id
Location_Address
Location_City
Location_State
Location_Zip
Location_County
TT101
252 HARVARD RD
ATLANTA
GA
30340
FULTON
TT101
30340
TT101
252 HARVARD RD
ATLANTA
TT102
125TEST
CUMMING
TT102
125 TEST DR
CUMMING
GA
30040
FORSYTH
TT102
GA
30040
Please let me know the solution
Thanks in advance.
Hi *,
does anyone know if MS supports some kind of breaking strategy within Fuzzy Lookup/Grouping?
Besides that, I'd like to perform a address cleansing operation on a CRM database. I don't have a reference table (Street, Zip, LastLine, etc.) for that. Where can I get an appropriate database? Anyone has some experience with this issue?
Thanks a lot.
S.
Hi
How could I find a vendor of address data cleansing tool integrated into SSIS pipeline?
I need to remove duplicates,
correct zip 5+4 code and
add Lon/Latitude information for a zip.
Many thanks,
OB
In SSIS I use the DQS Cleansing transformation component. I've got a knowledge base (KB) in place and this KB holds various domains and my data source has more input columns than would like to use for a particular clean up operation. I want to use some of the input columns to map against some domains in the KB. It is my understanding that it should be possible to select only the required input columns, but all i can do is select all input columns.
View 3 Replies View RelatedHi, all here,
Thank you very much for your kind attention.
I am wondering if it is possible to use SSIS to sample data set to training set and test set directly to my data mining models without saving them somewhere as occupying too much space? Really need guidance for that.
Thank you very much in advance for any help.
With best regards,
Yours sincerely,
After testing out the application i write on the local pc. I deploy it to the webserver to test it out. I get this error.
System.Data.SqlClient.SqlException: The conversion of a char data type to a
datetime data type resulted in an out-of-range datetime value.
Notes: all pages that have this error either has a repeater or datagrid which load data when page loading.
At first I thought the problem is with the date, but then I can see
that some other pages that has datagrid ( that has a date field) work
just fine.
anyone having this problem before?? hopefully you guys can help.
Thanks,
I have used both data readers and data adapters(with datasets) in the projects that I have worked on. I am trying to get some clarification on when I should be using which one. I think I am doing this correctly but I want to be sure I am developing good habits.
As the name might suggest, it seems like a datareader is for only reading data. I have read that the data adapter and dataset are for a disconnected architecture. Or, that they can be used for this type of set up. I have been using the data adapter and datasets when writing to a database and the datareader when reading from a database.
Is this how these should be used? Is the data reader the best choice for reading data? Am I doing this the optimal way from a performance stand point?
......................................................thanks in advance
We already integrated different client data to MDS with MS Excel plugin, now we want to push back updated or new added record to source database. is it possible do using MDS? Do we have any background sync process to which automatically sync data to and from subscriber and MDS?
View 4 Replies View RelatedWhen I enter over 4000 chars in any ntext field in my SQL Server 2005 database (directly in the database and through the application) I get an error saying that the data could not be updated because string or binary data would be truncated.Has anyone ever seen this? I cannot figure out what is causing it, ntext should be able to hold a lot more data that this...
View 7 Replies View RelatedI have a requirement to implement CDC for 50+ tables to implement incremental data changes warehouse/reporting rather than exporting the whole table data. The largest table is having more than half a billion records.
The warehouse use a daily copy of OLTP db (daily DB refresh). How can I accomplish this. Is there a downside in implementing CDC just for the sake of taking incremental changes on the tables?
Is there any performance impact if we enable CDC on OLTP db?
Can we make use of the CDC tables on the environment we do daily db refresh so that the queries don't hit OLTP database?
What is the best way to implement CDC to take incremental changes for reporting.
Hi,This is driving me nuts, I have a table that stores notes regarding anoperation in an IMAGE data type field in MS SQL Server 2000.I can read and write no problem using Access using the StrConv function andI can Update the field correctly in T-SQL using:DECLARE @ptrval varbinary(16)SELECT @ptrval = TEXTPTR(BITS_data)FROM mytable_BINARY WHERE ID = 'RB215'WRITETEXT OPERATION_BINARY.BITS @ptrval 'My notes for this operation'However, I just can not seem to be able to convert back to text theinformation once it is stored using T-SQL.My selects keep returning bin data.How to do this! Thanks for your help.SD
View 1 Replies View RelatedI'm using Script Component to load data into Oracle DB due to the poor performance issue. Now, I found it will missing some data during the transmission. Please see the screenshot below:Â
SQL Server:
Oracle:
DDL:
create table Person
(
BusinessEntityID Integer,
FirstName nvarchar2(50),
MiddleName nvarchar2(50),
LastName nvarchar2(50)
);
Result:
I follow up this article:Â [URL] ....
VB Script:Â
Imports System
Imports System.Data
Imports System.Math
Imports Microsoft.SqlServer.Dts.Pipeline.Wrapper
Imports Microsoft.SqlServer.Dts.Runtime.Wrapper
[Code] ..........
Hi all, i got this error:
[DTS.Pipeline] Error: "component "Excel Source" (1)" failed validation and returned validation status "VS_NEEDSNEWMETADATA".
and also this:
[Excel Source [1]] Warning: The external metadata column collection is out of synchronization with the data source columns. The column "Fiscal Week" needs to be updated in the external metadata column collection. The column "Fiscal Year" needs to be updated in the external metadata column collection. The column "1st level" needs to be added to the external metadata column collection. The column "2nd level" needs to be added to the external metadata column collection. The column "3rd level" needs to be added to the external metadata column collection. The "external metadata column "1st Level" (16745)" needs to be removed from the external metadata column collection. The "external metadata column "3rd Level" (16609)" needs to be removed from the external metadata column collection. The "external metadata column "2nd Level" (16272)" needs to be removed from the external metadata column collection.
I tried going data flow->excel connection->advanced editor for excel source-> input and output properties and tried to refresh the columns affected.
It seems that somehow the 3 columns are not read in from the source file?
ans alslo fiscal year, fiscal week is not set up up properly in my data destination?
anyone faced such errors before?
Thanks
When I execute the below stored procedure I get the error that "Arithmetic overflow error converting expression to data type int".
USE [FileSharing]
GO
/****** Object: StoredProcedure [dbo].[xlaAFSsp_reports] Script Date: 24.07.2015 17:04:10 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
[Code] .....
Msg 8115, Level 16, State 2, Procedure xlaAFSsp_reports, Line 25
Arithmetic overflow error converting expression to data type int.
The statement has been terminated.
(1 row(s) affected)
is there a step by step paper to get there? here is what i need to consider. I Iwill have many customers that will need their own set of records and access pages "branded for their company" each customer will have many clients. I am hosting this application on a windows 2003 server with SQL 2005 server enterprise.
I am using windows authentication, I have created a username in windows, then i added the windows user in SQL management studio in security, granted "DB Read" and "DB write" and again under the database security tab. still from the web authentication fails. i must be nissing a step or two?
I expect to set up a username for each database as i setup new customers.
RE: XML Data source .. Expression? Variable? Connection? Error: unable to read the XML data.
I want my XML Data source to be an expression as i will be looping through a directory of xml files.
I don't see the expression property or the connection property??
I tried setting the XMLData property to @[User::filename], but that results in:
Information: 0x40043006 at Load XML Files, DTS.Pipeline: Prepare for Execute phase is beginning.
Error: 0xC02090D0 at Load XML Files, XML Source [108]: The component "XML Source" (108) was unable to read the XML data.
Error: 0xC0047019 at Load XML Files, DTS.Pipeline: component "XML Source" (108) failed the prepare phase and returned error code 0xC02090D0.
Information: 0x4004300B at Load XML Files, DTS.Pipeline: "component "OLE DB Destination" (341)" wrote 0 rows.
Task failed: Load XML Files
Information: 0xC002F30E at Bad, File System Task: File or directory "d:jcpxmlLoadjcp2.xml.bad" was deleted.
Warning: 0x80019002 at Package: The Execution method succeeded, but the number of errors raised (2) reached the maximum allowed (1); resulting in failure. This occurs when the number of errors reaches the number specified in MaximumErrorCount. Change the MaximumErrorCount or fix the errors.
SSIS package "Package.dtsx" finished: Failure.
The program '[3312] Package.dtsx: DTS' has exited with code 0 (0x0).
Thanks for any help or information.
I setup this package to import data from a Sharepoint list to a SQL Server data table. The primary key of my SQL table is mapped to the Title column of my Sharepoint list. There is a possibility that duplicate values will be entered in the Title field of the Sharepoint list. So when importing data into my table via SSIS, my package always error-out when there it comes across duplicate values. how you others have managed data integrity when importing from a Sharepoint list with the Title column being mapped to the primary key of a table.
View 4 Replies View RelatedHello,
I am wondering what conversion rules apply, when a string, which contains a number, is saved to a SQL Server 2005 into a column of type decimal.
This is the code I€™m using (C++):
CString cValue = "0.75"
_variant_t vtFieldValue;
vtFieldValue = _variant_t(cValue)
pRecordSet->Fields->Item["MyColumn"]->Value = vtFieldValue;
"pRecordSet" is an ADO recordset. The database column "MyColumn" is of type "decimal(19,10)".
The most important question for me is, if the regional settings of the database server or the regional settings of the client PC are considered during the conversion from the string to the decimal value. For example in standard French regional settings the "." would not be recognized as decimal separator.
I am also wondering if the language of the database instance, in which this data is saved, is considered during this conversion or any other settings of this database instance.
So my general question is: Does anybody know exactly what rules apply during the above mentioned conversion?
Thank you for your help.
Regards,
Volker
I've question about how to handle structural datamodel changes in a datasource of PowerPivot. Suppose I'm developing a starmodel in SQL Server and sometimes a datatype changes or a name of a field changes in a table. It seems to me that PowerPivot handle this not gracefully as Analysis MD does (mostly). I received an error because of a wrong fieldname or even no error when a dattype changes in PowerPivot. Is this common or do I something wrong here. Does this mean that every time the datamodel changes the PowerPivot should be recreated? Or am I missing the clue here?
View 6 Replies View RelatedHi everyone,
I have to extract, dayly a list of contacts on a exchange server in a table on our EDW on sql server 2005. Is it possible to get the information directly from a dataflow or i will have to developpe a script task ?
Need help desperatly !!!
I am getting ErrorCode 8 while loading the data from stage to model. I have checked my error view it states that "Member Code is Inactive".
Initially I have loaded same set of data in Model from MDS Stage table but then deleted with ImportType = 5 which removed all the data from the MDM model.
Now i want to load it back but its giving the Error Code 8 .. Before loading the same data i have changed the stage table Importtype to 2 and Importstatusid to 0.
Hello,
I have noticed that for one of my data-flows, the process is really long during the phase "the final commit data insertion has started".
To be accurate, the process is fast until it reaches this phase. It happens often when I load millions of lines.
The extraction is done from a database SQL Server 2005 to a database SQL Server 2005, on the same server (with the SQL Server native provider).
I used a SQL Server destination but I have tried with an OLE DB destination and it is the same situation.
Why the process could be so long during this phase?
There is a way to optimised my package to avoid that?
Any idea is welcome.
Thanks.
Guillaume
HiI'm having problems following the tutorial on creating a data access layer - http://www.asp.net/learn/dataaccess/tutorial01cs.aspx?tabid=63 - when I try to compile in Visual Studio 2005 I get namespace could not be found. I followed exactly the tutorial - I created a dataset and added this code in my aspx page. <asp:GridView ID="GridView1" runat="server" CssClass="DataWebControlStyle"> <HeaderStyle CssClass="HeaderStyle" /> <AlternatingRowStyle CssClass="AlternatingRowStyle" />In my C# file I added these lines... using NorthwindTableAdapters; <<<<<this is the problem - where does this come from? protected void Page_Load(object sender, EventArgs e) { ProductsTableAdapter productsAdapter = new ProductsTableAdapter(); GridView1.DataSource = productsAdapter.GetProducts(); GridView1.DataBind(); }Thanks in advance
View 6 Replies View RelatedMy vendor requires data to be sent in Excel format. Some of my tables have rows over 65,536 so I need to use Excel 2007 (Max of 1,048,576). Right now my data sits in SQL 2000. I am using MS SQL Enterprise Manager 8.0 to prepare the data. Is there some kind of add on or selection I am missing to use DTS to export from SQL to Excel 2007?Thanks in advance.
View 3 Replies View RelatedI think I am definitely thrashing and am not getting anywhere on something I think should be pretty simple to accomplish: I need to pull the total amounts for compartments with different products which are under the same manifest and the same document number conditionally based on if the document types are "Starting" or "Ending" but the values come from the "Adjust" records.
So here is the DDL, sample data, and the ideal return rows
CREATE TABLE #InvLogData
(
Id BIGINT, --is actually an identity column
Manifest_Id BIGINT,
Doc_Num BIGINT,
Doc_Type CHAR(1), -- S = Starting, E = Ending, A = Adjust
Compart_Id TINYINT,
[Code] ....
I have tried a combination of the below statements but I keep coming back to not being able to actually grab the correct rows.
SELECT DISTINCT(column X)
FROM #InvLogData
GROUP BY X
HAVING COUNT(DISTINCT X) > 1
One further minor problem: I need to make this a set-based solution. This table grows by a couple hundred thousand rows a week, a co-worker suggested using a <shudder/> cursor to do the work but it would never be performant.
Was wondering if there was a best practice minimum permissions for creating a SQL login to use when setting up a new shared Data source for SSRS report manager?
Something along the lines of them being a data read for the DB and permissions to update tempdb?
Would have thought it not advisable to have the login be able to update the main db...
I need to create a function that replaces the data in a column with an 'X' based on the LEN of the data in the column. I created one that does a replacement, but it fills the column based on the max data length, and not the current length of the string or integer. An example of what I'm trying to accomplish.
Original data in a varchar(30) column:
thisisavalue
thisisanothervalue
thisisanothervalueagain
shortval
replaced with
xxxxxxxxxx
xxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxx
xxxxxxx
My current function is replacing the data like this:
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx