SQL 2012 :: String Splitting Using Like And Data Cleansing
Mar 28, 2014
I have a large poorly designed table (inherited) With a Name field that contains comma delimited text containing address information. I need to do several things with it but unfortunately there doesn't appear to be any true consistency in it. When it displays in its own text box it works by placing each section on a new Line and looks ok.But I need to pull it apart and place things like unit number, Building Name in its own column etc. In the data it could be in either the 2nd,3rd, 4th, dependent on what came 1st. the data looks some thing like the following
unitNumber/StreetNumber Space StreetName (Building Name), Subub,City,Country
Some addresses won't have unit number or Suburb or country so when splitting you could have Suburbs and Citys in multiple columns even if you try and stagger the split process.Has any body go a good tool or reference site for dealing for this sort of problem. I have a table that I have made up that has some of the street names that could be used for comparing against existing records but it is by no means fool proof due to spelling inconsistencies . I also have another list of Common building names that could be used to compare, remove and place in the new building column.
Hi to everyone,My problem is, that I'm not so quite sure, which way should I go.The user is inputing by second part application a long string (let'ssay 128 characters), which are separated by semiclon.Example:A20;BU;AC40;MA50;E;E;IC;GREENNow: each from this position, is already defined in any other table, asa separate record. These are the keys lets say. It means, a have someproperities for A20, BU, aso.Because this long inputed string, is a property of device (whih alsohas a lot of different properities) I could do two different ways ofstoring data:1. By writing, in SP, just encapsulate each of the position separatedby semicolon, and write into a different table with index of device,and the position in long stirng nearly in this way:Major device data tableID AnyData1 AnyData2 ... AnyData3123 MZD12 XX77 .... any comment text124 MZD13 XY55 ... any other commentString data Tablefk_deviceId position value123 1 A20123 2 BU123 3 AC40.....123 8 GREENThe device table, contains also a pointer (position), which mightchange, to "hglight" specified position.Then, I can very easly find all necessary data. The problem is, I needto move the device record data (from other table) very often into otherhistory table (by each update). That will mean, that I also need tomove all these records from 1 -8 for example to a separate historytable, holding the index for a history device dataset. This is a littleinconvinience in this, and in my opinion, it will use to much storagedata, and by programming, I need always to shift this properities intohistory table, whith indexes to a history table of other properities.2. Table will be build nearly in this way:Major device data tableID AnyData1 AnyData2 ... AnyData3 stringProperty pointer123 MZD12 XX77 .... any comment text A20;BU;AC40;MA50;E;E;IC;GREEN 3124 MZD13 XY55 ... any other comment A20;BU;AC40;MA50;E;E;IC;GREEN 2By writng into device table, there will be just a additional field forthis string, and I will have a function, which according to specifiedpointer, will get me the string part on the fly, while I need it.This will not require the other table, and will reduce the amout ofdata, not a lot ... but always.This solution, has a inconvinance, that it will be not so fast doing asearch over the part of this strings, while there will be no real indexon this.If I woould like to search all devices, by which the curent pointervalue is equal GREEN, then I need to use function for getting thevalue, and this one will be not indexed, means, by a lot amount ofdata, might be slow.I would like to know Your opinion about booth solutions.Also, if you might point me the other problems with any of thissolution, I might not have noticed.With Best RegardsMatik
What I need is split the data into two columns if data in column Main starts with 'PR-' then output result to column P and if it starts with 'CC-' then to column C (the output needs to be in one table).
Currently I have a column with multiple postcodes in one value which are split with the “/” character along with the corresponding location data. What I need to do is split these postcode values into separate rows while keeping their corresponding location data.
I'm a non-programmer and an SQL newbie. I'm trying to create a printer usage report using LogParser and SQL database. I managed to export data from the print server's event log into a table in an SQL2005 database.
There are 3 main columns in the table (PrintJob) - Server (the print server name), TimeWritten (timestamp of each print job), String (eventlog message containing all the info I need). My problem is I need to split the String column which is a varchar(255) delimited by | (pipe). Example:
2|Microsoft Word - ราย?ารรับ.doc|Sukanlaya|HMb1_SD_LJ2420|IP_192.10.1.53|82720|1
The first value is the job number, which I don't need. The second value is the printed document name. The third value is the owner of the printed document. The fourth value is the printer name. The fifth value is the printer port, which I don't need. The sixth value is the size in bytes of the printed document, which I don't need. The seventh value is the number of page(s) printed.
How I can copy data in this table (PrintJob) into another table (PrinterUsage) and split the String column into 4 columns (Document, Owner, Printer, Pages) along with the Server and TimeWritten columns in the destination table?
In Excel, I would use combination of FIND(text_to_be_found, within_text, start_num) and MID(text, start_num, num_char). But CHARINDEX() in T-SQL only starts from the beginning of the string, right? I've been looking at some of the user-defind-function's and I can't find anything like Excel's FIND().
Or if anyone can think of a better "native" way to do this in T-SQL, I've be very grateful for the help or suggestion.
Does any one knows any software used for Data Cleansing? Or is there any tools with SQL Server 2000 for Data Cleansing? I look forward to hearing from anyone.
I have sql server 2005 developer edition and vs2005 team system. I want's to use data cleansing feature.i.e. Integration Services Advanced Transforms Includes data mining, text mining, and data cleansing
Can please some body points me to any good webcast about using this feature
I am just starting out with SSIS and trying to get the feel of data cleansing using it. But on my very first project for data cleansing I've got into this weird error.
My data flow is very simple, it has a OLE DB source, Fuzzy Lookup and OLE DB destination. I've built three tables for this purpose, one is source, one is reference (it will be used to match for the real entries in fuzzy lookup) and the last is the destination table.
In all the three tables I've a field of City which I'd like to Fuzzy lookup in the reference table and if it crosses certain confidence level, I'd like to insert to the destination table. City in all the tables has the same datatype, defined in the same way, it is varchar(50).
But when in the fuzzy lookup I try to map the Source tables City field to reference tables City field, it gives me this error:
The following columns cannot be mapped:
[City, CityRef]
One or more columns do not have supported data types, or their data types do not match.
Although as I have mentioned before, both have same data types and are defined in the same manner (i.e. I've just selected the datatypes for those columns and all the other settings are left to default). I just cannot understand why this is happening, plz help me with this. FYI I've also tried to give the City Column different datatypes in all the tables like varchar(max), text, Only to be greeted with the same error message.
I have two data sources - an excel spreadsheet and one access database of Customer details, (quite small in size) and I have to use SQL Server 2005 to Integrate the data and cleanse the data so that I could (in theory) import it into a CRM system (though I won't actually be importing it) as this is for coursework only.
Is there a simple steps to do this is SQL server and guides that I can follow that are relevent? As there are lots of tutorials and I haven't a clue where to start.
I have a character field that should contain either a number or space in one of my transitions but it is coming over with junk in it occasionally. What is the best way to ensure that this data is only numeric or a space? Whenever I hit a < or ] one of those weird symbols I want to replace it with a space in my target field. It is SQL Server to SQL Server.
Hello,I have been placed in charge of migrating an old access based databaseover to sql server 7.0. So far, I have imported all the tables intosql server, but now I have come across the issue of needing to split astring variable. For instance, in the old database, the variable forname was such that it included both first and last names, whereas inthe new database there are seperate entities for first and last name.I know that there is a way to write a script that will separate out thetwo strings by using the "space" in between the name, but I'munfamiliar how to do this. Any suggestions? Thanks!Rick
I'm trying to split a hyphen-delimited string into three columns in a view. I've been using substring and len to split up the string, but it is getting very complicated (and isn't working in all cases). I've used a SPLIT function in vbscript - does t-sql have anything similar? I've attached a spreadsheet that shows what I am looking for. Maybe someone can guide me in the right direction?
I need to creat distinct terms of the example parsing the term on the '|' character. I will be using mysql.
example: 1885-1974.|Johnson family|Frontier and pioneer life - Alberta - Black Hill district|Cadogan region (Alta.) - Biography|Black Hill district (Alta.) - Biography
I have a column in a table which looks like below.
Column ------- AA123 D123 AXC1 QF23
I need to split this value into two part, Alphabets and numeric. How to do this using SQL query.My column value will not have mixed characters like A1D3,G32S,12F.It will always follow the ablve pattern mentioned above.
I'm creating a web-based NT RAS report site and am looking for the most efficient way to import the data from NT Event log into SQL2k. I'm using the 'dumpel' utility from rsc kit and all is fine except the 10th column - the message detail:
"The user DOMAINuserid connected on port Mdm15 on 08/23/2002 at 07:25am and disconnected on 08/23/2002 at 07:27am. The user was active for 2 minutes 23 seconds. 78809 bytes were sent and 50675 bytes were received. The port speed was 49300."
I need to parse this one long text string into 6 distinct columns: userID, port, duration, bytes_xmt, bytes_rcv and portspeed. After a quick review of the rowsets, the strings seem to hold a consistent output ... no real variances I can see.
I've dablled with views but am facing a small performance issue that could get bigger: The sql server not only has to run the text file import package, but also the view to format the text dump into a workable dataset, then my report code bangs over 30 queries against the final dataset. It already takes our SQL2k server over 3 minutes to parse about 20,000 rows and the server's a beast (dual 1.8 p4 cpu, 3gb ram, raid, etc).
What I think would work best is to abandon the view (performance will only get worse as the row count increases) and instead INSERT the rows into one table.
Any ideas anyone? any good scripts out there that can help me to parse the long text string quicker that using substring and replace functions?
So we have a field called forenames, and it needs to be split into fields forename_1, forename_2, forename_3, forename_4 (don't ask).
Ok, I've come up with this so far, which works, but is pretty nacky in my opinion. Has any one got a better way of achieving this?
SELECT forenames , Replace(forenames, ' ', '.') , Reverse(ParseName(Replace(Reverse(forenames), ' ', '.'), 1)) As [f1] , Reverse(ParseName(Replace(Reverse(forenames), ' ', '.'), 2)) As [f2] , Reverse(ParseName(Replace(Reverse(forenames), ' ', '.'), 3)) As [f3] , Reverse(ParseName(Replace(Reverse(forenames), ' ', '.'), 4)) As [f4] FROM ( SELECT 'John' As [forenames] UNION SELECT 'John Paul' UNION SELECT 'John Paul George' UNION SELECT 'John Paul George Ringo' ) As [x]
Results
forenames (no column name) f1 f2 f3 f4 ---------------------- ---------------------- ---- ---- ------ ----- John John John NULLNULL NULL John Paul John.Paul John PaulNULL NULL John Paul George John.Paul.George John PaulGeorgeNULL John Paul George Ringo John.Paul.George.Ringo John PaulGeorgeRingo
I have a column named "LIST" in a table with strings like the following:151231-1002-02-1001151231-1001-02-1001151231-1002-02-1002151231-1003-02-1001etc....What I'd like to do is include an ORDER BY statement that splits thestring, so that the order would be by the second set of four numbers(i.e. between the first and second - marks), followed by the third setof two numbers, and then by the last set of four numbers.How would I do something like this?--Sugapablo - Join Bytes!http://www.sugapablo.com | ICQ: 902845
I need a help in SQL Server 2000. I am having a string variable in the format like -- (1,23,445,5,12) I need to take single value at a time (like 1 for 1st, 23 for 2nd and so on) from the variable and update the database accordingly. This is like a FOR loop. Can anyone help me out in splitting the variable using the comma separator...
Hi All!!! I was tasked to come up with a search function and the content of the database given to me is in Chinese Characters. This would be my first time dealing with Chinese characters in the database and I need help with the following problem: The company wants to conduct the search in such a way that, instead of having the system read the entire sentence/phrase which the user keyed in as a SINGLE string, they want the Chinese Characters to be accessed individually, so that as long as any information in the database contains any one of the characters which the user have entered, they will be retrieved and returned. So how do I go about doing this? Does it have anything to do with Unicode? By the way, everything abt the search tool is working fine, I am just left with this dilemma of having the system recognise the entire sentence as ONE STRING, instead of conducting a search word by word or character by character. Anyway, the following is the SQL statement of my SQL Data Source which is bound to a Gridview displaying the returned results after a search is done...1 SELECT Name, Trans, Address1, Address1T, Address2, Address2T, City, CityT, CRPLID 2 FROM CRPL 3 WHERE (Trans LIKE '%' + @Trans + '%') OR 4 (Name LIKE '%' + @Name + '%') OR 5 (Address1 LIKE '%' + @Address1 + '%') OR 6 (Address1T LIKE '%' + @Address1T + '%') OR 7 (Address2 LIKE '%' + @Address2 + '%') OR 8 (Address2T LIKE '%' + @Address2T + '%') OR 9 (City LIKE '%' + @City + '%') OR 10 (CityT LIKE '%' + @CityT + '%')
SUBSTRING(@s, start, CASE WHEN stop > 0 THEN stop-start ELSE 512 END) AS s
FROM Pieces
)
This works very well, other than instances of the delimter are, themselves, considered to be results. For example:
SELECT * FROM vs_SplitTags(' ', 'foo bar') AS result returns: pn s 1 foo 2 bar
which is exactly the result I would want.
However, SELECT * FROM vs_SplitTags(' ', ' foo bar ') AS result -- There are spaces before 'foo' and after 'bar' returns pn s 1 2 foo 3 bar 4
And SELECT * FROM vs_SplitTags(' ', 'foo bar') AS result -- There are two spaces between 'foo' and 'bar' returns pn s 1 foo 2 3 4 bar
I want the function to ignore whitespace altogether, be it a single space or multiple spaces. Other than to delimit the boundries between words, of course.
In other words, all three examples above should produce the same result: pn s 1 foo 2 bar
How can I do this? Any thoughts much appreciated...
I'm quite new to SQL. I'm able to extract the info that I need, but only into a result of one row, like:
Order header | Order details
ID | Customer name | Customer address | Product number | Product name | Quantity | Price | Product number | Product name | Quantity | Price 2 Andy Andy's way 2 24 Glue 3 35 39 Oyster 2 9
So we have new servers that are going to be installed with SQL 2012 and I'm debating the wisdom of splitting tempdb with multiple files.
I know it's a myth that performance automatically improves if you split it into a number of files based on processors, but I'm debating the wisdom of putting a file on each of my data / log file drives.
For instance, I have a server with a C: drive (OS), D: drive (Data for system DBs and install of programs - 458 GB), an F: drive for user DB data files (767 GB), and a J: drive for log files (255 GB).
Obviously no files are going on C:. I'm debating on whether or not we should even leave system DBs on the D: drive given in our current 2k8 servers, we end up with Memory.dmp files over flowing the D: drives as well as .cabs and other install / update files that tend to collect on that drive over the years.
But if we leave the system DBs on D:, I'm wondering if adding a second tempdb file to F: and a third to J: will improve query performance or not.
Looking for returning multiple entries from a time span. I have a date, start-time, end-time and duration. I need the start-times separated in a list. It's fine if temp tables are needed - I have that clearance.
I have a very interesting problem in T-SQL coding for which I can't figure out the solution. Actually there is a Line_1_Address column in our data warehouse address table which is being populated from various sources. Some sources have already concatenated house number + street address fields in the Line_1_Address column whereas one source has separated columns for both data fields.
Now I'm trying to extract data from this data warehouse table and I need to split the house number from street address and load it into separate columns in my destination table. In case there is no data for house number then I should load it as NULL.
The issue is that data in this Line_1_Address column is very inconsistent so I don't know which functions to use. Here is some sample data for your consideration:
Line_1_Address 101 E Commerce ST 120 E Commerce ST 2 Po Box 301 W. Bel Air Ave West Main Street, PO Box 1388
I have a description field in a table which also stores unit of measure in the same column but with some space between them, I need to split these into two different columns.
I have system id information in table system_ids and productids and systemidinsformation has lot of data but I am looking two strings in tire data to pull into two separate columns. details below
Database versions :ms sql 2008/2012 tablename:system_id's column:system id information
sample data from system_id_information column
######################################## <obj xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="urn:vim25" versionId="5.5" xsi:type="ArrayOfHostSystemIdentificationInfo"><HostSystemIdentificationInfo xsi:type="HostSystemIdentificationInfo"><identifierValue> unknown</identifierValue><identifierType><label>Asset Tag</label><summary>Asset tag of the system</summary><key>AssetTag</key></identifierType>
[Code] .....
I am looking output of two columns, which are bolded
product_id snumber 654081-B21 MXQ43905SW
for serial number this is common
before string :HostSystemIdentificationInfo"><identifierValue>
and after string </identifierValue><identifierType><label>Service tag
and snumber is always between the before and after string and number of characters of snumber varies and entire data for a row also varies