I am required to send an XML file of our clients to head office in Belgium for comparison against a database of known undesirables. The data is in a legacy system with a custom database, so I have created an SSIS package that extracts the tables I need into SQL Server. I have also developed a program that reads from a text source, creates the XML and then sends it by secure FTP to Hong Kong, who will handle it from there.
My problem lies in extracting enough data to avoid too many false positives. The scan will check name, identity (passport number, etc.), town/city and country. We don't hold an identity number, and the town/city and country are buried in free-format fields. A quick analysis of the 419,000 records also shows that the spelling is very unreliable. In most cases the country has not been entered because the clients are local, and even when they are overseas, sometimes only the city has been entered. That is often misspelt too, e.g. Kuala Lumpar or Melboure.
The addresses are held in three equal-length fields called Address_1, Address_2 and Address_3. There's no guarantee that the town/city or country will be in any particular one of these fields. In some cases the street number and name are in Address_3 because the first two hold a company name and a C/O line.
I'm not going to fret over the records where the address information is nonsense or missing, but I would like to extract valid country and town/city names where they are present, and this is where I get stuck. I come from a COBOL programming background, and although I'm loving getting used to the power of SQL, I'm still a bit stumped by a problem like this, probably because I keep thinking of the solution in procedural terms.
I suspect the solution will be to create two separate reference tables, one of towns/cities and one of countries. I would then search the three fields for those names and, if found, write them into the appropriate part of the output text file as town/city and/or country. I also thought about splitting the fields into separate words, but that doesn't help where the name consists of two words, such as NEW ZEALAND.
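A minimal sketch of the reference-table idea, written in Python only to show the matching logic. The lists `CITIES` and `COUNTRIES`, the function `find_places` and the 0.85 cutoff are all hypothetical. It compares every run of one to three consecutive words in each address field with the reference names, so two-word names such as NEW ZEALAND are matched as a single phrase, and it swaps in Python's `difflib.get_close_matches` to tolerate misspellings such as Kuala Lumpar or Melboure. In SQL Server the exact-match part would roughly correspond to joining the address table to a reference table with `LIKE '%' + name + '%'`; for fuzzy matching only the phonetic SOUNDEX()/DIFFERENCE() functions are built in.

```python
import difflib
import re

# Hypothetical reference lists; in practice these would be loaded from the
# town/city and country reference tables.
CITIES = ["KUALA LUMPUR", "MELBOURNE", "HONG KONG"]
COUNTRIES = ["NEW ZEALAND", "AUSTRALIA", "MALAYSIA"]


def find_places(address_fields, names, cutoff=0.85):
    """Return reference names that (approximately) appear in any address field.

    Every run of 1 to 3 consecutive words is compared with the reference names,
    so two-word names such as NEW ZEALAND are matched as a single phrase.
    """
    found = []
    for field in address_fields:
        # Upper-case and turn punctuation into spaces: "New-Zealand," -> "NEW ZEALAND".
        words = re.sub(r"[^A-Z0-9]+", " ", (field or "").upper()).split()
        for size in (3, 2, 1):
            for i in range(len(words) - size + 1):
                phrase = " ".join(words[i:i + size])
                # get_close_matches tolerates small misspellings such as MELBOURE.
                match = difflib.get_close_matches(phrase, names, n=1, cutoff=cutoff)
                if match and match[0] not in found:
                    found.append(match[0])
    return found
```

Raising the cutoff trades missed matches for fewer false positives, which matters here because false positives are the main concern.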
I would love to hear from anyone who has dealt with a similar problem and has a neat solution to this using SQL.
I need to extract a specific part of a string from a column in a SQL Server table. The details, along with a sample table and sample strings, are below.
My table [dbo].[StringExtract] has two columns: Id and MyString.
I need to extract the Id and part of the MyString column. A sample row looks like the following:
Id  MyString
1   ABC|^~&|BNAME|CLIENT1||CLIENT1|20110609233558||BIC^A27|5014589635|K|8.1|
    ABC1|^~&|BNAME1|CLIENT1||CLIENT1|20110609233558||CTP^A27|5014589635|I|7.1|
    DEF||5148956598||||Apprised|Bfunction1||15|LMP|^^^201106101330|
    alloys3^ally^crimson^L||||alloys3^ally^crimson^L||||alloys3^ally^crimson^L|||||Apprised|
[Code] ....
The part I want to extract is in the "ZZZ" line, between the 5th and 6th pipe (|) characters. The desired output looks like this:
Id  DesiredString
1   Extracts^This^String1
2   Extracts^This^String2
3   Extracts^This^String3
Is there a way to extract this using either T-SQL or SSIS?
IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[StringExtract]') AND type in (N'U'))
DROP TABLE [dbo].[StringExtract]
GO
IF NOT EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[StringExtract]') AND type in (N'U'))
BEGIN
CREATE TABLE [dbo].[StringExtract](
    [Id] [int] NULL,
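A sketch of the extraction logic in Python. Because the ZZZ line itself is not shown in the sample, it assumes that segments are separated by line breaks and that the pipe straight after "ZZZ" counts as the first one; the function name `extract_zzz_field` and the test message are hypothetical. In T-SQL the same steps would be to locate the ZZZ line with CHARINDEX, then walk forward pipe by pipe using CHARINDEX's start-position argument before taking the SUBSTRING between the 5th and 6th pipes.

```python
def extract_zzz_field(message, segment="ZZZ", field_index=5):
    """Return the value between the 5th and 6th '|' on the first line starting with `segment`."""
    # splitlines() handles \r\n, \r and \n line endings.
    for line in message.splitlines():
        if line.startswith(segment + "|"):
            parts = line.split("|")
            # parts[0] is the segment name itself, so the text between the
            # 5th and 6th pipes is parts[5]; six pipes produce seven parts.
            if len(parts) > field_index + 1:
                return parts[field_index]
            return None
    return None
```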
We have 4 regions, but currently only 3 servers are in the field, so only 3 regional IDs are being used to store the actual PBX data. The central server (RegionalID = 0) holds the data for itself and for the 4th region until the new server is deployed. That server now has to be deployed, so the data for this region has to be migrated. I am trying to extract all the data for this 4th region (RegionalID = 1) from all the relevant tables in the central server database. To do this, I first have to check that the CallerID is valid; if it is not, I then check that RegionalDialup = '0800003554', which is the dialup number for this 4th region (RegionalID = 1).
I have a table named lnkPBXUser which contains the following:
I have a table named tblDialupLog, which has 20 columns; I have selected only the columns I am interested in (below):
PBXID  DialupDT              DongleAccessNum  CLI         RegionalID  RegionalDialup
83     8/8/2006 8:58:11 AM   T2 UQ 28924      0132493700  0           0800003554
543    8/8/2006 8:55:44 AM   T0 UA 33902      0123623500  0           0800003554
1219   8/8/2006 8:59:03 AM   T3 ZD 02031      0152958095  0           0800003554
1012   8/8/2006 9:02:54 AM   T0 UA 41261      0173011050  0           0800003554
1331   8/8/2006 8:59:57 AM   T0 UA 01938      0124604627  0           0800003554
1979   8/8/2006 9:02:52 AM   T0 UA 09836      0163751210  0           0800003554
1903   8/8/2006 8:58:41 AM   T0 UA 26009      0147175356  0           0800003554
1522   8/8/2006 8:58:54 AM   T3 MB 94595      0573912871  0           0800004249
319    8/8/2006 8:51:28 AM   T2 ZD 32892      0543375100  0           0800004249
3270   8/8/2006 9:04:26 AM   T2 MB 87331                  0           0800004249
I have a table named tblCodes. It contains all regions, but I only need to select the codes for RegionalID 1:
CodeID  RegionalID  ExtName        SubsNDCD+LocCD+UpdateStatus  RegionDesc
7973    1           PRETORIA       0123620                      NORTH EASTERN REGION
7974    1           HARTEBEESHOEK  012 30120                    NORTH EASTERN REGION
7975    1           HARTEBEESHOEK  01230130                     NORTH EASTERN REGION
7976    1           PRETORIA       01730140                     NORTH EASTERN REGION
7977    1           PRETORIA       01230150                     NORTH EASTERN REGION
I have a table named tblDongleArea, which contains the following (only the dongle area codes for the fourth region, RegionalID = 1, are shown below):
OK, I am dealing with the lnkPBXUser table at the moment.
I need to be able to join lnkPBXUser and tblDialupLog, then compare tblDialupLog.CLI to tblCodes.SubsNDCD + tblCodes.LocCD. When these two columns are concatenated, the result is only a substring of tblDialupLog.CLI, so the comparison has to be a substring match. (This is to make sure that the CLI exists in tblCodes.)
If it does exist, then it is part of the fourth region and should be returned in the result set.
If it does not exist, I then need to check that tblDongleArea.DongleAreaCode is a substring of tblDialupLog.DongleAccessNum.
If it is a valid DongleAreaCode for that region, then it is part of the fourth region and should be returned in the result set.
If it does not exist either, I then need to check that tblDialupLog.RegionalDialup = '0800003554'.
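The three checks above can be sketched as a single function. This is a Python illustration of the decision order only: `in_region_1` is a hypothetical name, and the area code "012362" and dongle area code "ZD" used in the tests are made-up sample values, because the full tblCodes and tblDongleArea contents are not shown. In T-SQL, each "is a substring of" check would typically become an EXISTS subquery using CHARINDEX(code, column) > 0, assuming SubsNDCD and LocCD are character columns so that + concatenates rather than adds.

```python
def in_region_1(cli, dongle_access_num, regional_dialup,
                area_codes, dongle_area_codes, region_dialup="0800003554"):
    """Apply the three checks in order; True means the dial-up row belongs to region 1."""
    # 1. Some SubsNDCD + LocCD code for region 1 appears inside the CLI.
    if cli and any(code in cli for code in area_codes):
        return True
    # 2. Otherwise, some DongleAreaCode for region 1 appears inside DongleAccessNum.
    if dongle_access_num and any(code in dongle_access_num for code in dongle_area_codes):
        return True
    # 3. Otherwise, fall back to the region's dial-up number.
    return regional_dialup == region_dialup
```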
So from the above tables an expected result would be:
RegionalID  pbxID  userID
0           1012   17
0           543    2
Please assist; it would be greatly appreciated.
Regards,
SQLJunior
-- GETALLWORDS() User-Defined Function
-- Inserts the words from a string into the table.
-- GETALLWORDS(@cString[, @cDelimiters])
-- Parameters
-- @cString nvarchar(4000) - Specifies the string whose words will be inserted into the table @GETALLWORDS.
-- @cDelimiters nvarchar(256) - Optional. Specifies one or more characters used to separate words in @cString.
-- The default delimiters are space, tab, carriage return, and line feed. Note that GETALLWORDS() uses each of the characters in @cDelimiters as an individual delimiter, not the entire string as a single delimiter.
-- Return Value
-- table (WORDNUM, WORD, STARTOFWORD, LENGTHOFWORD); positions are 1-based.
-- Remarks
-- By default GETALLWORDS() treats spaces, tabs, carriage returns and line feeds as word delimiters. If you specify other delimiter characters, this function ignores those defaults and uses only the characters you specify.
-- Example
-- declare @cString nvarchar(4000)
-- set @cString = 'The default delimiters are space, tab, carriage return, and line feed. If you specify another character as delimiter, this function ignores spaces and tabs and uses only the specified character.'
-- select * from dbo.GETALLWORDS(@cString, default)
-- select * from dbo.GETALLWORDS(@cString, ' ,.')
-- See Also
-- GETWORDNUM(), GETWORDCOUNT() User-Defined Functions
CREATE FUNCTION dbo.GETALLWORDS (@cString nvarchar(4000), @cDelimiters nvarchar(256) = NULL)
RETURNS @GETALLWORDS TABLE (WORDNUM smallint, WORD nvarchar(4000), STARTOFWORD smallint, LENGTHOFWORD smallint)
AS
BEGIN
    -- If no delimiter string is specified, use space, tab, carriage return and line feed.
    SET @cDelimiters = ISNULL(@cDelimiters, SPACE(1) + CHAR(9) + CHAR(13) + CHAR(10))
    DECLARE @k smallint, @wordcount smallint, @nEndString smallint, @BegOfWord smallint, @flag bit

    -- @nEndString is one past the last character. DATALENGTH counts trailing spaces
    -- (unlike LEN) but returns bytes, so divide by 2 for Unicode.
    SELECT @k = 1, @wordcount = 0,
           @nEndString = 1 + DATALENGTH(@cString) / (CASE SQL_VARIANT_PROPERTY(@cString, 'BaseType') WHEN 'nvarchar' THEN 2 ELSE 1 END)

    -- Skip leading delimiter characters, if any.
    WHILE CHARINDEX(SUBSTRING(@cString, @k, 1), @cDelimiters) > 0 AND @nEndString > @k
        SET @k = @k + 1

    IF @k < @nEndString
    BEGIN
        -- We are now inside the first word.
        SELECT @wordcount = 1, @BegOfWord = @k, @flag = 1
        -- Count transitions from "not in word" to "in word": if the current character
        -- is a delimiter but the next one is not, we have entered a new word.
        WHILE @k < @nEndString
        BEGIN
            IF @k + 1 < @nEndString AND CHARINDEX(SUBSTRING(@cString, @k, 1), @cDelimiters) > 0
            BEGIN
                IF @flag = 1 AND CHARINDEX(SUBSTRING(@cString, @k - 1, 1), @cDelimiters) = 0
                BEGIN
                    -- Close off the previous word.
                    SELECT @flag = 0
                    INSERT INTO @GETALLWORDS (WORDNUM, WORD, STARTOFWORD, LENGTHOFWORD)
                    VALUES (@wordcount, SUBSTRING(@cString, @BegOfWord, @k - @BegOfWord), @BegOfWord, @k - @BegOfWord)
                END
                IF CHARINDEX(SUBSTRING(@cString, @k + 1, 1), @cDelimiters) = 0
                    -- Skip over the first character of the new word; we know it is not a delimiter.
                    SELECT @wordcount = @wordcount + 1, @k = @k + 1, @BegOfWord = @k, @flag = 1
            END
            SET @k = @k + 1
        END

        -- Insert the last word (stepping back over a trailing delimiter, if any).
        IF CHARINDEX(SUBSTRING(@cString, @k - 1, 1), @cDelimiters) > 0
            SET @k = @k - 1
        IF @flag = 1
            INSERT INTO @GETALLWORDS (WORDNUM, WORD, STARTOFWORD, LENGTHOFWORD)
            VALUES (@wordcount, SUBSTRING(@cString, @BegOfWord, @k - @BegOfWord), @BegOfWord, @k - @BegOfWord)
    END

    -- A multi-statement table-valued function must end with RETURN.
    RETURN
END
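For comparison, a short Python sketch that is intended to produce the same rows as GETALLWORDS() for the same input: each character of the delimiter string is a separate delimiter, and positions are 1-based. It illustrates the semantics only and is not the author's code; the name `get_all_words` is hypothetical.

```python
def get_all_words(text, delimiters=" \t\r\n"):
    """Return (WORDNUM, WORD, STARTOFWORD, LENGTHOFWORD) tuples with 1-based positions.

    Each character in `delimiters` is a separate delimiter, as in GETALLWORDS().
    """
    rows = []
    i, n = 0, len(text)
    while i < n:
        if text[i] in delimiters:
            i += 1  # skip delimiter characters between words
            continue
        start = i
        while i < n and text[i] not in delimiters:
            i += 1  # advance to the end of the current word
        rows.append((len(rows) + 1, text[start:i], start + 1, i - start))
    return rows
```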
I am pleased to offer, free of charge, the following Transact-SQL string functions:
AT(): Returns the beginning numeric position of the nth occurrence of a character expression within another character expression, counting from the leftmost character.
RAT(): Returns the numeric position of the last (rightmost) occurrence of a character string within another character string.
OCCURS(): Returns the number of times a character expression occurs within another character expression (including overlaps).
OCCURS2(): Returns the number of times a character expression occurs within another character expression (excluding overlaps).
PADL(): Returns a string from an expression, padded with spaces or characters to a specified length on the left side.
PADR(): Returns a string from an expression, padded with spaces or characters to a specified length on the right side.
PADC(): Returns a string from an expression, padded with spaces or characters to a specified length on both sides.
CHRTRAN(): Replaces each character in a character expression that matches a character in a second character expression with the corresponding character in a third character expression.
STRTRAN(): Searches a character expression for occurrences of a second character expression, and then replaces each occurrence with a third character expression. Unlike the built-in REPLACE function, STRTRAN has three additional parameters.
STRFILTER(): Removes all characters from a string except those specified.
GETWORDCOUNT(): Counts the words in a string.
GETWORDNUM(): Returns a specified word from a string.
GETALLWORDS(): Inserts the words from a string into a table.
PROPER(): Returns from a character expression a string capitalized as appropriate for proper names.
RCHARINDEX(): Similar to the Transact-SQL function CHARINDEX, but searches from the right.
ARABTOROMAN(): Returns the Roman numeral equivalent of a specified numeric expression (from 1 to 3999).
ROMANTOARAB(): Returns the numeric equivalent of a specified Roman numeral expression (from I to MMMCMXCIX).
AT, PADL, PADR, CHRTRAN and PROPER are similar to the Oracle PL/SQL functions INSTR, LPAD, RPAD, TRANSLATE and INITCAP, respectively.
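The difference between counting "including overlaps" and "excluding overlaps", and the nth-occurrence idea behind AT(), can be shown with a small Python sketch. These are illustrations with hypothetical names, not the author's implementations; the `at` sketch restarts each search one character after the previous match, so it also counts overlapping matches.

```python
def occurs(search, text):
    """Count occurrences including overlaps (the OCCURS() behaviour described above)."""
    if not search:
        return 0
    count, pos = 0, text.find(search)
    while pos != -1:
        count += 1
        pos = text.find(search, pos + 1)  # restart one character later, so overlaps count
    return count


def occurs2(search, text):
    """Count occurrences excluding overlaps (the OCCURS2() behaviour described above)."""
    return text.count(search) if search else 0  # str.count never counts overlaps


def at(search, text, occurrence=1):
    """1-based position of the nth occurrence of search in text, or 0 if there is none."""
    pos = -1
    for _ in range(occurrence):
        pos = text.find(search, pos + 1)
        if pos == -1:
            return 0
    return pos + 1
```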
More than 5000 people have already downloaded my functions. I hope you will find them useful as well.
For more information about these Transact-SQL string UDFs, please visit http://www.universalthread.com/wconnect/wc.dll?LevelExtreme~2,54,33,27115
Please download the file from http://www.universalthread.com/wconnect/wc.dll?LevelExtreme~2,2,27115
Hi, I'd be interested in people's thoughts about the following. A user on my site will be searching for a venue name, and the official name could include a sponsor that the user might not search for. I am using the AutoCompleteDropdown from the AJAX Control Toolkit, so the user will start typing a few characters and the results will be returned. I can generate the results in SQL with a simple LIKE '%' + @searchTerm + '%'; however, this fills me with fear of table scans. At the moment we'd be querying a table of 5K records, but our application is very new.
I'm thinking one option is to split the words into another table, with a one-to-many relationship holding each word of the venue name. The benefit is that you could then use LIKE @term + '%', but then I have the cost of the join (and the added complexity, which is not a major issue).
Any thoughts/tips?
Thanks!
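The word-table idea can be modelled in a few lines of Python: one (word, venue_id) row per word, kept sorted, with a binary search standing in for the index seek that LIKE @term + '%' allows on an indexed word column. This is a sketch under those assumptions; the function names and the sample venues are hypothetical.

```python
import bisect


def build_word_index(venues):
    """venues: {venue_id: name}. Returns a sorted list of (WORD, venue_id) pairs,
    one row per word, like the one-to-many word table described above."""
    return sorted((word, vid) for vid, name in venues.items() for word in name.upper().split())


def prefix_search(index, term):
    """Venue ids having any word that starts with term (the LIKE @term + '%' case)."""
    term = term.upper()
    # Binary search to the first entry >= term, like an index seek on the word column.
    i = bisect.bisect_left(index, (term,))
    ids = []
    while i < len(index) and index[i][0].startswith(term):
        vid = index[i][1]
        if vid not in ids:
            ids.append(vid)
        i += 1
    return ids
```

With only 5K rows it may be worth measuring the plain LIKE '%term%' query first, since the word table mainly pays off as the table grows.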
Is there a way in SQL Server 2000 to extract data from a table such that the result is a text file of "INSERT INTO ..." statements? For example, if the table has 5 rows, the result would be 5 lines like:
insert into Table ([field1], [field2], ...) VALUES (a, b, c)
insert into Table ([field1], [field2], ...) VALUES (d, e, f)
insert into Table ([field1], [field2], ...) VALUES (g, h, i)
insert into Table ([field1], [field2], ...) VALUES (j, k, l)
insert into Table ([field1], [field2], ...) VALUES (m, n, o)
Thanks in advance.
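One way to produce such a file outside the database is to read the rows with any client and format them. A minimal Python sketch of the formatting step, with hypothetical names `sql_literal` and `insert_statements`; it only handles NULL, numbers, booleans and strings (embedded quotes doubled), so dates and binary data would need their own rules.

```python
def sql_literal(value):
    """Render a Python value as a T-SQL literal (simplified)."""
    if value is None:
        return "NULL"
    if isinstance(value, bool):  # check bool before int: bool is a subclass of int
        return "1" if value else "0"
    if isinstance(value, (int, float)):
        return repr(value)
    # Strings: wrap in single quotes and double any embedded quotes.
    return "'" + str(value).replace("'", "''") + "'"


def insert_statements(table, columns, rows):
    """Return one INSERT INTO ... VALUES (...) statement per row."""
    # Bracket-quote column names, escaping any ']' inside a name as ']]'.
    col_list = ", ".join("[" + c.replace("]", "]]") + "]" for c in columns)
    return [
        "INSERT INTO " + table + " (" + col_list + ") VALUES ("
        + ", ".join(sql_literal(v) for v in row) + ")"
        for row in rows
    ]
```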
I would like to offer you the following Transact-SQL string functions:
GETWORDCOUNT(): Counts the words in a string.
GETWORDNUM(): Returns a specified word from a string.
AT(): Returns the beginning numeric position of the first occurrence of a character expression within another character expression, counting from the leftmost character.
RAT(): Returns the numeric position of the last (rightmost) occurrence of a character string within another character string.
OCCURS(): Returns the number of times a character expression occurs within another character expression.
PADL(): Returns a string from an expression, padded with spaces or characters to a specified length on the left side.
PADR(): Returns a string from an expression, padded with spaces or characters to a specified length on the right side.
PADC(): Returns a string from an expression, padded with spaces or characters to a specified length on both sides.
PROPER(): Returns from a character expression a string capitalized as appropriate for proper names.
RCHARINDEX(): Similar to the built-in Transact-SQL function CHARINDEX, but searches from the right.
ARABTOROMAN(): Returns the Roman numeral equivalent of a specified numeric expression.
ROMANTOARAB(): Returns the numeric equivalent of a specified Roman numeral expression.
...
I have a date coming into a page as a string in the format "May 4 2005 12:00AM". I need to query one of my tables using this date in combination with other non-date values. How can I convert this string into a valid SQL Server datetime before I query the database tables? Please help.
I have a table that stores date of birth as a varchar in yyyymmdd format (e.g. 19861231). A view reads this data, and I want the view to present the date as mmddyyyy. How can we achieve this?
Let's say I have a varchar column and need to extract an integer value from the middle of it. The string looks like this: 'this part is always the same' + integer of varying length + 'this part is different but always the same length'. Is there a way to trim the fixed-length parts from the beginning and end?
I have a long text in the 'Quote' column, as below, and I have to extract Trip Duration, Destination and Base Rate from it. 'Base Rate' is repeated throughout the text if there is more than one traveler, and I only need the first instance.
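In SQL Server, CONVERT(datetime, @s, 100) should accept this string, since style 100 is the "mon dd yyyy hh:miAM" format; passing the value as a typed parameter is safer than building SQL text. For checking the format on the application side, a minimal Python sketch (the name `parse_style_100` is hypothetical):

```python
from datetime import datetime


def parse_style_100(text):
    """Parse strings like 'May 4 2005 12:00AM' (month name, day, year, 12-hour time)."""
    # Collapse repeated spaces first: the format is often padded as 'May  4 2005'.
    normalised = " ".join(text.split())
    # %b expects an English month abbreviation (default C locale).
    return datetime.strptime(normalised, "%b %d %Y %I:%M%p")
```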
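Because yyyymmdd and mmddyyyy use the same eight digits, the view can simply rearrange them: in T-SQL, RIGHT(dob, 4) + LEFT(dob, 4) turns '19861231' into '12311986', assuming every value is exactly eight characters. A Python sketch of the same conversion that also rejects strings that are not real dates (the function name is hypothetical):

```python
from datetime import datetime


def yyyymmdd_to_mmddyyyy(value):
    """Convert '19861231' to '12311986', rejecting strings that are not real dates."""
    parsed = datetime.strptime(value, "%Y%m%d")  # raises ValueError for e.g. '19861332'
    return parsed.strftime("%m%d%Y")
```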
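Since the prefix is constant and the suffix has a fixed length, the integer is whatever lies between them. A Python sketch under those assumptions (the name `middle_integer` and the sample strings are hypothetical):

```python
def middle_integer(text, prefix, suffix_length):
    """Strip a known prefix and a fixed-length suffix, then parse what is left as an int."""
    if not text.startswith(prefix) or len(text) < len(prefix) + suffix_length:
        raise ValueError("unexpected format: " + text)
    middle = text[len(prefix):len(text) - suffix_length]
    return int(middle)  # raises ValueError if the middle part is not a whole number
```

The T-SQL equivalent would be SUBSTRING(col, LEN(prefix) + 1, LEN(col) - LEN(prefix) - suffix_length); note that LEN ignores trailing spaces, so DATALENGTH is safer if the fixed suffix can end in spaces.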
Begin Quote Calculation<br /> <br />....<br /> Agent Id: 001<br /> Trip Duration: 5days<br /> Relationship Type: Individual<br />....nDestination: AreaTwo<br /> <br ...../>Resolved Trip Type To: 1 with Trip Subtype: 0<br /> Resolved Relationship: Individual....... /> *Base Rates*<br /> Base Rate: 6.070000<br />.....Resolved Trip Type To: 2 with Trip Subtype: 0<br /> Resolved Relationship: Individual....... /> *Base Rates*<br /> Base Rate: 9.070000<br />.....
Desired result:
Trip Duration: 5 days
Destination: AreaTwo
Base Rate: 6.070000
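A Python sketch of the extraction, assuming each value is terminated by a `<br` tag as in the sample; the function name is hypothetical. Because `re.search` stops at the first match, only the first Base Rate is returned, which is the behaviour asked for.

```python
import re


def extract_quote_fields(text, names=("Trip Duration", "Destination", "Base Rate")):
    """Return {name: value} for the first 'name: value<br' occurrence of each label."""
    result = {}
    for name in names:
        # re.search stops at the first match, so only the first Base Rate is returned.
        match = re.search(re.escape(name) + r":\s*(.*?)\s*<br", text)
        result[name] = match.group(1) if match else None
    return result
```

Values come back exactly as stored (e.g. 5days rather than 5 days).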
Is there a function that will extract part of a string when the data you want does not occur in a specific position?
Field "REF" is varchar(80) and contains an email subject line and the email recipient's contact name.
Example data:
Rec_ID  REF
1       Here is the information you requested (oc:JohmSmith)
2       Thanks for attending our seminar (oc:Peggy SueJohnson)
3       Re: Our meeting yesterday (oc:Donald A. Duck)
What I need to extract is the contact name in parentheses after the "oc:". The name is always in parentheses and occurs immediately after "oc:", with no spaces after the "oc:".
Thanks.
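A Python sketch of the position-independent approach: find the marker, then take everything up to the next closing parenthesis. The function name is hypothetical; in T-SQL the same two searches map onto CHARINDEX (using its start-position argument for the second one) feeding SUBSTRING.

```python
def contact_name(ref, marker="(oc:"):
    """Return the text between '(oc:' and the next ')', or None if there is no marker."""
    start = ref.find(marker)
    if start == -1:
        return None
    start += len(marker)
    end = ref.find(")", start)
    # If the closing parenthesis is missing (e.g. REF was cut off at 80 chars), take the rest.
    return ref[start:end] if end != -1 else ref[start:]
```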
1. I have a table with a column for region names. Region names come in basically two formats: "NAME-BU*RM" or "NAME*RM". I want to extract just the "NAME" part from this string. The length of "NAME" varies, and I want all of its characters. Can anyone advise what the query/SQL statement would look like?
2. I wrote some VB code to generate an .xls file. Users can run it fine, but if they already have another file with the same name open, Excel crashes. So I want to add code that checks whether "File.xls" is open on the user's machine. If it is, show the message "File "File.xls" is already open. Generating File_1.xls", then run the code but create the file as "File_1.xls". If the file is not open (or does not exist), run the code and create the file as "File.xls".
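For question 1 above, the name is everything before the first "-" or "*", whichever comes first. A Python sketch, assuming NAME itself never contains either character (the function name is hypothetical):

```python
import re


def region_name(value):
    """Return the leading NAME from 'NAME-BU*RM' or 'NAME*RM' values."""
    # Split once at the first '-' or '*'; whatever comes before it is the name.
    return re.split(r"[-*]", value, maxsplit=1)[0]
```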
Is there any way to extract part of a string in a stored procedure using a parameter as the starting point? For example, my string might read: x234y01zx567y07zx541y04z
My parameter is an nvarchar and its value is "x567y". What I want to extract is the two characters after the parameter, in this case "07". Can anyone shed some light on this problem?
Thanks,
lq
I want to extract the two strings from "xxxxx - yyyyyy" separately, as xxxxx and yyyyyy. The source always has two strings joined by a - symbol. How can I extract these two strings?
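The logic is "find the parameter, then take the next two characters". A Python sketch (hypothetical name `chars_after`):

```python
def chars_after(text, marker, count=2):
    """Return the `count` characters that follow the first occurrence of marker, or None."""
    pos = text.find(marker)
    if pos == -1:
        return None
    start = pos + len(marker)
    return text[start:start + count]
```

In a stored procedure the same steps would be SUBSTRING(@s, CHARINDEX(@p, @s) + LEN(@p), 2); guard the case where CHARINDEX returns 0, otherwise unrelated characters are returned.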
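A Python sketch that splits at the first "-" and trims the surrounding spaces, assuming the first string never contains a dash itself (the function name is hypothetical):

```python
def split_pair(text, separator="-"):
    """Split 'xxxxx - yyyyyy' at the first separator and trim spaces around each part."""
    left, found, right = text.partition(separator)
    if not found:
        raise ValueError("no separator in: " + text)
    return left.strip(), right.strip()
```

In T-SQL the two parts would be LTRIM(RTRIM(LEFT(col, CHARINDEX('-', col) - 1))) and LTRIM(RTRIM(SUBSTRING(col, CHARINDEX('-', col) + 1, LEN(col)))); rows without a dash make LEFT fail, so they need filtering first.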
I would like to get the next upcoming event based on the current date and time, in the format below. Suppose today is 2014-12-17 and the current time is 07:30 AM. The expected result should be:
NextEvent
------------
Upcoming Event : Event 4 - This Morning 10:45 AM
Specifically, I am looking for the following format.
If the event is today, the result should use these labels:
This Morning (12:00 AM to 12:00 PM)
This Afternoon (12:00 PM to 4:00 PM)
This Evening (4:00 PM to 7:00 PM)
Tonight (7:00 PM to 11:59 PM)
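A Python sketch of the logic: pick the earliest event at or after now, then label it with the time-of-day bucket if it is today. Two assumptions are not in the question: the start of each range is treated as inclusive (so 12:00 PM is "This Afternoon"), and events on later days are shown with their date. The function names are hypothetical.

```python
from datetime import datetime


def day_part(moment):
    """Map a time of day onto the buckets listed above (start of each range inclusive)."""
    if moment.hour < 12:
        return "This Morning"
    if moment.hour < 16:
        return "This Afternoon"
    if moment.hour < 19:
        return "This Evening"
    return "Tonight"


def next_event_label(events, now):
    """events: list of (name, start datetime). Returns the label for the next event at or after now."""
    upcoming = [(start, name) for name, start in events if start >= now]
    if not upcoming:
        return None
    start, name = min(upcoming)  # earliest start time wins
    if start.date() == now.date():
        when = day_part(start)
    else:
        when = start.strftime("%Y-%m-%d")  # assumption: format for events on later days
    hour12 = start.hour % 12 or 12
    clock = f"{hour12}:{start.minute:02d} {'AM' if start.hour < 12 else 'PM'}"
    return f"Upcoming Event : {name} - {when} {clock}"
```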
Here is a sample order # we used for one of our shipments: BL-53151-24954-1-0001-33934
I need to extract the "24954" portion of that order # within an INNER JOIN, but I'm not sure how.
My problem is we have 2 order tables: OrderTable1 contains a field with the full order #. OrderTable2 contains a field with only the "24954" portion. I need to JOIN on these 2 fields somehow.
SELECT ot1.Full_Order_No
     , ot2.Order_No
FROM OrderTable1 ot1
INNER JOIN OrderTable2 ot2
    ON ot2.Order_No = [do something here to truncate ot1.Full_Order_No]
How can I do this?
A few notes:
- The first part of the order number, "BL-53151-", will ALWAYS be the same. It's our client # and will never change for the purpose of this query.
- The portion I need (24954) can have more or fewer than the current 5 digits.
- There will always be 6 portions to the order number, separated by 5 dashes.
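Given the notes above, the wanted value is always the third dash-separated portion. A Python sketch of that rule (hypothetical name `order_no_part`):

```python
def order_no_part(full_order_no):
    """Return the third dash-separated portion, e.g. '24954' from 'BL-53151-24954-1-0001-33934'."""
    parts = full_order_no.split("-")
    if len(parts) != 6:  # the notes say there are always six portions
        raise ValueError("unexpected order number: " + full_order_no)
    return parts[2]
```

Because the "BL-53151-" prefix is always 9 characters, a T-SQL join condition along the lines of ot2.Order_No = SUBSTRING(ot1.Full_Order_No, 10, CHARINDEX('-', ot1.Full_Order_No, 10) - 10) should return the same portion.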
Yeah, it's pretty simple. Maybe it'll help someone out.
-- USAGE: SELECT dbo.fn_extract_chars(string_to_search, 'letters')   -- or 'numbers'
CREATE FUNCTION dbo.fn_extract_chars (@x varchar(128), @y char(7))
RETURNS varchar(128)
AS
BEGIN
    DECLARE @chars varchar(128)
    DECLARE @pos int
    DECLARE @action varchar(32)
    SET @pos = 1      -- SUBSTRING positions are 1-based
    SET @chars = ''

    -- Choose the single-character pattern to keep. Any other @y leaves @action NULL,
    -- PATINDEX then returns NULL, and the function returns an empty string.
    IF @y = 'numbers' SET @action = '[0-9]'
    ELSE IF @y = 'letters' SET @action = '[a-zA-Z]'

    -- Walk the string one character at a time, appending characters that match the pattern.
    WHILE @pos <= DATALENGTH(@x)
    BEGIN
        IF PATINDEX(@action, SUBSTRING(@x, @pos, 1)) > 0
            SET @chars = @chars + SUBSTRING(@x, @pos, 1)
        SET @pos = @pos + 1
    END
    RETURN(@chars)
END
I am trying to write a query in SQL Query Analyzer that will extract a date that appears after the first comma in a string. An example of the data is below:
I have a column containing the following strings, which I need to compare:
ek/df/cv/
ek/df/cv/f
The fields before the third / are not fixed, but after the third / there is either nothing or one letter. I need a function that extracts everything before the third / so I can check whether the values are equal.
I haven't been able to do it with a combination of SUBSTRING(), CHARINDEX() and LEN().
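The key step is finding the nth "/" by restarting each search just after the previous match. A Python sketch (hypothetical name `before_nth`):

```python
def before_nth(text, char="/", n=3):
    """Return everything before the nth occurrence of char, or None if there are fewer than n."""
    pos = -1
    for _ in range(n):
        pos = text.find(char, pos + 1)  # restart just after the previous match
        if pos == -1:
            return None
    return text[:pos]
```

In T-SQL the same idea uses CHARINDEX's optional start position, nested three times: LEFT(col, CHARINDEX('/', col, CHARINDEX('/', col, CHARINDEX('/', col) + 1) + 1) - 1) returns 'ek/df/cv' for both sample values. Rows with fewer than three slashes make LEFT fail, so they need filtering first.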