I have sql server 2005 developer edition and vs2005 team system.
I want's to use data cleansing feature.i.e.
Integration Services Advanced Transforms Includes data mining, text mining, and data cleansing
Can please some body points me to any good webcast about using this feature
Does any one knows any software used for Data Cleansing? Or is there any tools with SQL Server 2000 for Data Cleansing? I look forward to hearing from anyone.
I have a large poorly designed table (inherited) With a Name field that contains comma delimited text containing address information. I need to do several things with it but unfortunately there doesn't appear to be any true consistency in it. When it displays in its own text box it works by placing each section on a new Line and looks ok.But I need to pull it apart and place things like unit number, Building Name in its own column etc. In the data it could be in either the 2nd,3rd, 4th, dependent on what came 1st. the data looks some thing like the following
unitNumber/StreetNumber Space StreetName (Building Name), Subub,City,Country
Some addresses won't have unit number or Suburb or country so when splitting you could have Suburbs and Citys in multiple columns even if you try and stagger the split process.Has any body go a good tool or reference site for dealing for this sort of problem. I have a table that I have made up that has some of the street names that could be used for comparing against existing records but it is by no means fool proof due to spelling inconsistencies . I also have another list of Common building names that could be used to compare, remove and place in the new building column.
I am just starting out with SSIS and trying to get the feel of data cleansing using it. But on my very first project for data cleansing I've got into this weird error.
My data flow is very simple, it has a OLE DB source, Fuzzy Lookup and OLE DB destination. I've built three tables for this purpose, one is source, one is reference (it will be used to match for the real entries in fuzzy lookup) and the last is the destination table.
In all the three tables I've a field of City which I'd like to Fuzzy lookup in the reference table and if it crosses certain confidence level, I'd like to insert to the destination table. City in all the tables has the same datatype, defined in the same way, it is varchar(50).
But when in the fuzzy lookup I try to map the Source tables City field to reference tables City field, it gives me this error:
The following columns cannot be mapped:
[City, CityRef]
One or more columns do not have supported data types, or their data types do not match.
Although as I have mentioned before, both have same data types and are defined in the same manner (i.e. I've just selected the datatypes for those columns and all the other settings are left to default). I just cannot understand why this is happening, plz help me with this. FYI I've also tried to give the City Column different datatypes in all the tables like varchar(max), text, Only to be greeted with the same error message.
I have two data sources - an excel spreadsheet and one access database of Customer details, (quite small in size) and I have to use SQL Server 2005 to Integrate the data and cleanse the data so that I could (in theory) import it into a CRM system (though I won't actually be importing it) as this is for coursework only.
Is there a simple steps to do this is SQL server and guides that I can follow that are relevent? As there are lots of tutorials and I haven't a clue where to start.
I have a character field that should contain either a number or space in one of my transitions but it is coming over with junk in it occasionally. What is the best way to ensure that this data is only numeric or a space? Whenever I hit a < or ] one of those weird symbols I want to replace it with a space in my target field. It is SQL Server to SQL Server.
we are trying to build warehouse in our company its very small database but still. we have come to a point where we are able to pull reports from the database that we have created as OLTP which is truncated and reloading everyday . Now we have to stop doing that and just update the database and check for the changes in the existing data. Can anyone suggest ETL book/webcast/weblinks that could help me.
We see a lot of posts here related to data warehousing - it is clearly a hot topic with forum readers.
You may be interested to see that the Joy Mundy of the Kimball group is giving the following webcast :
Using SQL Server Integration Services to Populate a Kimball Method Data Warehouse featuring Joy Mundy July 11: 11:00 AM€“12:00 Noon PT
Anyone using SSIS in data warehousing will find this to be a most useful webcast - it was a hit at TechEd. I'm sure it will answer many questions and give great insights for forum readers.
Yes/No - i think bitE-Mail/MSNURLImages - should i use links to the image or can i store the image directly(like in Access)?Password - is there encryption? like in MySQL? Whats the Data Type and the length i should use. i don't need entering unicode chars so VARCHARs should be good for most?
I am trying to create a process that will take data source that has been output from a proprietary ISAM database to import into a SQL database. This particular ISAM system cannot be accessed via OleDB or ODBC. The thing is I want the process to be able to create the required tables based on data structure information that has been somehow encoded into the data source.
Currently the solution I am going with is to spit out a CSV file that has a header with table and data format information followed by rows of actual data that gets parsed by a SQL script however I am sure that Microsoft must have some kind of preferred solution to this kind of problem but I have not been able to find it. I have looked at the the tools that are available when creating a SQL Server DTS package as well as what seems to be available using the new Integration Services but nothing seems to be any better than the solution I just mentioned.
Anyone have any ideas, I am willing to bet there is a much better way of doing this.
I have a table that among other holds volume data. I need to calculate something called Intelligent Volume based on set of rules. After all rules were followed and I still find data that does not belong to any of the rules this data is bad and needs to be reported on (can not be discarded) I wanted to do this is sql task (insert intel. volume) and I am using Cursor to loop through all the data and all rules. How do I redirect this data to a file so we can report on those records?
We do not have any Address Cleansing tools and the requirement is we have to cleanse the data, finding the best possible record which has all info and update other records accordingly.
I am Not sure we can do this Fuzzy Grouping Transformation.
does anyone know if MS supports some kind of breaking strategy within Fuzzy Lookup/Grouping?
Besides that, I'd like to perform a address cleansing operation on a CRM database. I don't have a reference table (Street, Zip, LastLine, etc.) for that. Where can I get an appropriate database? Anyone has some experience with this issue?
Hi All: Not sure what's causing this problem... I have a win2k webserver running FPSE and .NET 2.0 on a private subnet (192.168.x.x). The webserver connects to a separate SQL 2005 server running win2003 on the same subnet. I have an alias on the webserver pointing to the dB server, and in web.config I'd have a connection string like: <connectionStrings> <add name="webConnectionString" connectionString="Data Source=sql2005serveralias;Initial Catalog=webdB;Persist Security Info=True;User ID=webUser;Password=webPWD;Network Library=dbmssocn"providerName="System.Data.SqlClient" /> </connectionStrings> That connection string, as is, works...pages on the website connect to the server fine. If, however, I open the website in VWD2005 from a remote location (using FPSE over HTTP), and try to create a new page with a sql data connector, using the existing connection string in the web.config, I can't connect to the database. I get a "TCP Provider, error: 0 ...database is not accessible" type of error. Is this because I'm outside the subnet, and my local copy of VWD is trying to connect to the server on "192.168.x.x" and of course can't? If so, is the only way to work off the server is to VPN into the firewall and thereby get a local IP address assigned to my computer?
In SSIS I use the DQS Cleansing transformation component. I've got a knowledge base (KB) in place and this KB holds various domains and my data source has more input columns than would like to use for a particular clean up operation. I want to use some of the input columns to map against some domains in the KB. It is my understanding that it should be possible to select only the required input columns, but all i can do is select all input columns.
I have an application that automatically reads a lot of data from a third-party application into my database, via XML. For example, I might read a couple thousand rows-worth of XML data, one row at a time in a foreach loop. To reduce the load on their server and database, I thought about putting a 2 second delay in between each of my automatic requests. Would this really help much, or is there enough overhead (setting up/tearing down connections, etc) with each request that it wouldn't reduce server load much anyway? Is 2 seconds enough? Too little or too much?
I have a problem at pulling data on PDA using Sql Mobile 5 from Sql Server 2000.
It work s for most tables, except for two . Every table of those two has about 1000 records and every record about 150 bytes, so all data is about 150 kbytes.
When I connect to the Sql Server 2000 through LAN, no problem. But when I connect through GPRS - VPN, the error occurs. But, the PDA- phone connection is via Bluetooth and at ping no TimeOut appears, the replies are about 600-700 ms. When connecting with a desktop computer through GPRS no such problem occurs.
So , I don't know what could be the cause: the connection or the Sql Mobile which doesn't retry enough in case of error to transfer all data .
A specialist from the GPRS provider told me to use 3G phone , but , because the GPRS phones have already been bought, this would be a too big investment. So, I don't think this is a good idea in this moment .
I found similar problem on the net and a solution was to use Pooling = False. But , in Sql Mobile there is no pulling parameter available when making the connection string .
hello i am just starting to learn sql and know the basics, but now im looking for a good book to learn some more. A book that covers stored procedure would be very useful. If possible a book with q and a would be very good because i feel this tests if u understand what was just explaned. but if there is a good book without this it is ok. All sugestions welcome
hii am using vs2005 for development of web application for reporting with sqlexpress05 as back end .later when project is ready for deployment i have to deploy the project on remote hosting server where i have limited access and sqlserver2000 database to use.i want to ask is there are any limitation or problem of sqlexpress while deploying it on remote sqlservre 2000.and should i have to to continue with sqlexpress as back end.is there any problems for using dynamic database connections(by using smart tags) other than programaticaly connecting database to asp.net ie by writing code.i am new in developmentplease guide me, please guide
hello all..i have make a searching, but is not good. my code like that:Public Class getall Public Function getitem(ByVal id As String) As DataSet Dim con As SqlConnection = New SqlConnection("Data Source=BOYsqlexpress;Initial Catalog=GAMES;User ID=ha;Password=a") Dim ds As New DataSet() Dim adapter As New SqlDataAdapter("select * from [item] where name like '%" & id & "%'", con) Try con.Open() adapter.Fill(ds, "user") Return ds Catch ex As Exception Console.Write(ex.Message) Finally con.Close() con = Nothing End Try ' Next Return ds End Functionand class my item in database is containning dragon ball 3, counter strikeif i insert dragon, it can display dragon ball 3.but if i insert dragon 3, it not display dragon ball 3.it should display dragon ball 3 .how should i change my code?thx...
So, as you can see, at first it appears that I'm after a LEFT JOIN - meaning that the grandparents don't need to have child records to be returned, but, then it turns out that I need INNER JOINS - to limit grandparents when I choose children.
I'm wondering if there's a better SQL Editor than MS Query Analyzer on the market? I like a lot of the functionality provided by QA but want extra stuff like you get in VB6: intelli-sense (sytanx prompting), auto-complete (CTRL+Space provides list of sp's and tables, etc.) plus any other time saving features.
I've tried a few products but nothing quite hits the mark. Is there a program you use and recommend I trial?
Folks, i've got a table with a column; ACCOUNT VARCHAR(30). All the values numeric though. (leave abt the datatype yet). The column is clustered indexed.
SELECT * FROM MYTABLE WHERE LEFT(ACCOUNT,3)='123' execution plan shows CLUSTERED INDEX SCAN.
SELECT * FROM MYTABLE WHERE ACCOUNT LIKE '123%' execution plan shows CLUSTERED INDEX SEEK.
How, why. Why doesn't the optimizer works good for the first query?
Hi, ive got some work to do on SQL queries, the scenario is below and at the bottom is my attempt at answering in the questions: Could somebody simply tell me if the answer at the bottow are correct, if not what I have done wrong.
A local company that produces machine parts has decided to develop an in-house database system. They have identified the following tables: -
tblOrders OrderNo, CustomerNo, Date, OrderTotal
tblCustomers CustomerNo, Name, Street, Town, County, Postcode
tblParts PartNo, Description, UnitCost
tblItems OrderNo, PartNo, Quantity, ItemTotal
Create SQL queries to produce the following: -
a) Details of all orders over £1000 sorted by customer number.
b) A list of all part descriptions and their quantities appearing on order 39
c) Delete all orders placed by customers in Wrexham.
d) Archive all orders placed by customer Clarke into a new table called tblArchive.
e) Increase the price of all parts whose description includes the word “washer” by 4%.
These are my answer, which im not too sure if they are correct. If any1 could tell me if there correct or not that would be great, thanks.
a) SELECT * FROM tblOrders ORDERBY CustomerNO WHERE OrderTotal > 1000
b) SELECT tblParts.PartNo, tblParts.Description, tblItems.Quantity FROM tblItems INNER JOIN tblParts ON tblItems.PartNo = tblParts.PartNo; WHERE OrderNo = 39
c) DELETE tblOrders.* FROM tblOrders INNER JOIN tblCustomers ON tblOrders.CustomerID = tblCustomers.CustomerID WHERE Town = “Wrexham”
d) INSERT INTO tblArchive SELECT * FROM tblOrders INNER JOIN tblCustomers ON tblOrders.CustomerID = tblCustomers.CustomerID WHERE Name = “Clarke”
e) UPDATE tblParts SET UnitCost = [UnitCost]*1.04 WHERE Description LIKE “*washer” or Description LIKE “washer*” or Description LIKE “*washer*”
I'm about 6 weeks into SQL and SQL Server (7) - I was wondering whether you could share your opinions about which language to use as a programming tool for developing apps for & with SQL Server. I'm choosing between C++ (Visual) or JAVA.
I already know C and the DB-Libe contains a lot of it but I'm kinda trying to expand some horizons. I'm ok with either C++/VC++ or JAVA but I only have time to learn (or be good at) one.
Any suggestions? (I'd like to hear what you think even if you say neither C++ or JAVA - maybe VB? What's easy and marketable is what matters most.)