SQL Server 2012 :: Compare Characters Between 2 Strings And Eliminate Similarities
Oct 13, 2015
I am trying to write a function that compares the characters of two strings, eliminates the matches, and returns the number of differences between them.
Bear in mind that I need the larger of the two difference counts to be returned, and also that if a character is repeated in one of the two words it is eliminated only once, because it exists only once in the other string.
I will give an example below to make this clearer.
--Start
declare @string1 as varchar(50)='imos'
declare @string2 as varchar(50)='nasos';
WITH n (n) AS (
SELECT 1 FROM (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) n (n)
[Code] ....
The differences in the first string relative to the second are 2 (i, m), while the differences in the second string relative to the first are 3 (n, a, s).
So the function should return 3 in the previous example.
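For what it's worth, here is a minimal sketch of one way to finish it with the numbers CTE: each string is split into (character, occurrence) pairs so a repeated character is matched at most once, EXCEPT removes the pairs the strings have in common, and the larger leftover count is returned.

DECLARE @string1 varchar(50) = 'imos', @string2 varchar(50) = 'nasos';

WITH n AS (
    SELECT TOP (50) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS i
    FROM sys.all_objects
),
s1 AS (
    SELECT SUBSTRING(@string1, i, 1) AS ch,
           ROW_NUMBER() OVER (PARTITION BY SUBSTRING(@string1, i, 1) ORDER BY i) AS occ
    FROM n
    WHERE i <= LEN(@string1)
),
s2 AS (
    SELECT SUBSTRING(@string2, i, 1) AS ch,
           ROW_NUMBER() OVER (PARTITION BY SUBSTRING(@string2, i, 1) ORDER BY i) AS occ
    FROM n
    WHERE i <= LEN(@string2)
)
SELECT CASE WHEN d1.cnt > d2.cnt THEN d1.cnt ELSE d2.cnt END AS Differences   -- 3 for this example
FROM (SELECT COUNT(*) AS cnt FROM (SELECT ch, occ FROM s1 EXCEPT SELECT ch, occ FROM s2) AS x) AS d1
CROSS JOIN (SELECT COUNT(*) AS cnt FROM (SELECT ch, occ FROM s2 EXCEPT SELECT ch, occ FROM s1) AS x) AS d2;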
I have been asked to compare the address fields (three nvarchar(100) columns) of a customer database (around 10,000 records) and find any duplicates. If it were a character-by-character match, I could simply have GROUPed and got the result.
But I am expected to produce a list of similar addresses which the people who entered them may have typed with slightly different spelling, more or fewer characters, or a "." here and there.
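A minimal first-pass sketch, assuming hypothetical column names Addr1/Addr2/Addr3 on a dbo.Customers table: strip punctuation and spaces, upper-case the result, and group on the normalized value. This only catches punctuation and spacing differences; for real misspellings something like SOUNDEX/DIFFERENCE or SSIS Fuzzy Grouping is needed.

SELECT UPPER(REPLACE(REPLACE(REPLACE(CONCAT(Addr1, Addr2, Addr3), '.', ''), ',', ''), ' ', '')) AS NormalizedAddr,
       COUNT(*) AS DuplicateCount
FROM dbo.Customers
GROUP BY UPPER(REPLACE(REPLACE(REPLACE(CONCAT(Addr1, Addr2, Addr3), '.', ''), ',', ''), ' ', ''))
HAVING COUNT(*) > 1;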
My requirement is that if the string in the column contains any of the characters from 'ACDIPFJZ', those characters have to be retained and the rest of the characters have to be removed.
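A sketch of a loop-based scalar function (the function name is hypothetical): PATINDEX keeps finding the first character that is not in the list and STUFF removes it.

CREATE FUNCTION dbo.KeepOnlyListed (@s varchar(200))
RETURNS varchar(200)
AS
BEGIN
    -- remove the first character that is NOT one of A,C,D,I,P,F,J,Z until none are left
    WHILE PATINDEX('%[^ACDIPFJZ]%', @s) > 0
        SET @s = STUFF(@s, PATINDEX('%[^ACDIPFJZ]%', @s), 1, '');
    RETURN @s;
END;

-- SELECT dbo.KeepOnlyListed('X1A-2C.Z');  -- ACZ

Note that under a case-insensitive collation the lowercase letters a, c, d, i, p, f, j, z are also kept; add a binary COLLATE clause inside PATINDEX if that matters.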
I am trying to build a simple search engine using SQL Server 2000 to scan information about approximately 20,000 products.
Here's what I am doing:
I created a table called keywords that contains a reference for each product.
keyword -> varchar(100)
items -> text
keyword data example:

[keyword]  [items]
car        1, 3, 5, 7
blue       3, 5
compact    1, 7
I am not using clustered index.
To search, I basically run an "AND" or "OR" query to select the keywords I want to target. I then need to run another select that compares the data in the items field depending on the condition selected. If the "AND" clause is used, I need to compare all the items that contain the same reference, for example:
looking for car compact using "AND" clause result = 1
looking for car compact using "OR" clause result = 1,3,5,7
There is no table that holds the references. The items are stored in a text field in the keyword table. I can compare the data in a script such as ASP, splitting the items by comma or space, but that can be too slow and use up a lot of RAM. Another solution would be to use a table to hold the references, but that would affect performance dramatically because of the large number of records created and the storage space used. For example, if I have 60,000 keywords and each keyword has an average of 200 references, I would have to generate 12 million records.
I want to know if there is a function or routine in SQL Server to compare matching references on the fly, in the server, between two or more fields, and how I should do it.
In addition, in this scenario, how should a clustered index help?
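For comparison, here is a sketch of the normalized design the post dismisses (table and column names are hypothetical): one row per keyword/item pair with a clustered primary key on (keyword, item_id), which turns both searches into plain set operations and lets keyword lookups be index seeks even with millions of rows.

CREATE TABLE dbo.KeywordItems (
    keyword varchar(100) NOT NULL,
    item_id int NOT NULL,
    CONSTRAINT PK_KeywordItems PRIMARY KEY CLUSTERED (keyword, item_id)
);

-- "OR" search: items tagged with at least one of the keywords
SELECT DISTINCT item_id
FROM dbo.KeywordItems
WHERE keyword IN ('car', 'compact');

-- "AND" search: items tagged with every keyword in the list
SELECT item_id
FROM dbo.KeywordItems
WHERE keyword IN ('car', 'compact')
GROUP BY item_id
HAVING COUNT(DISTINCT keyword) = 2;   -- 2 = number of keywords searched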
I have a table in an Access db that stores information about speech files. One of the fields in this table is called "Text", and it contains the phrase spoken in that particular speech file.
These phrases often have characters such as the "#" sign at the end to indicate what tone of voice is used.
I am trying to create a Search where users can enter the phrase they are looking for, and will be returned the file (or combination of files) that contain this phrase.
My problem is that when I try to search for a string of text that includes the "#", I get 0 results every time.
An example of what I am doing is this:
SELECT Speechfiles.Name FROM Speechfiles WHERE Speechfiles.Text LIKE 'aero#'
It works fine for 'aero' or '*aero*' but whenever I try to add a character that is not a letter, it won't work.
If anyone has any ideas, I would REALLY appreciate it!!! I am completely at a loss.
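One thing worth checking: if the query runs through the Access database engine, '#' is itself a wildcard (it matches any single digit), which would explain the zero results; wrapping it in square brackets makes it a literal. A sketch:

SELECT Speechfiles.Name
FROM Speechfiles
WHERE Speechfiles.Text LIKE '*aero[#]*'

The same bracket escape works for '%' and '_' in SQL Server's LIKE.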
Is there a limit on the size of the strings on both sides of the '=' sign for string comparison? If I have two varchar(max) strings, will the comparison be done beyond 8,000 characters?
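A quick test sketch: two varchar(max) values that are identical for the first 9,000 characters and differ only at position 9,001. If the comparison stopped at 8,000 characters this would come back 'equal'; the full values are compared, so it returns 'different'.

DECLARE @a varchar(max) = REPLICATE(CAST('x' AS varchar(max)), 9000) + 'A',
        @b varchar(max) = REPLICATE(CAST('x' AS varchar(max)), 9000) + 'B';
SELECT CASE WHEN @a = @b THEN 'equal' ELSE 'different' END AS ComparisonResult;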
SELECT ROUND ('6.465',2) --- result 6.46, while SELECT ROUND (6.465,2) --- result 6.47.
It's because you're relying on an implicit conversion from a string to a decimal data type, which SQL Server will do to 2 decimal places by default...
Alright:
SELECT ROUND (CONVERT(DECIMAL(3,2),'6.465'),2) --- result 6.47

Now please explain this:

SELECT ROUND('0.285',2) -- 0.28
SELECT ROUND(0.285,2) -- 0.29
SELECT ROUND (CONVERT(DECIMAL(3,2),'0.285'),2) --- result 0.29

The string value does not seem to be converted to a decimal with 2 decimal places.
MS is on the safe side in mentioning that the last digit is always an estimate, but because the result of the estimate is always the same, I would like to know:
* how exactly is a string value implicitly converted?
* how exactly does the estimation work that, in case of doubt, rounds a value up or down?
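A sketch of what appears to be happening (an observation, not a quote of the documented rule): the string handed to ROUND seems to be converted to float, and neither 6.465 nor 0.285 has an exact float representation, so the stored values sit just below the midpoint and round down, while the decimal literals round up.

SELECT ROUND(CAST('6.465' AS float), 2)        AS StringAsFloat,    -- 6.46
       ROUND(CAST(6.465 AS decimal(4,3)), 2)   AS DecimalLiteral,   -- 6.470
       ROUND(CAST('0.285' AS float), 2)        AS StringAsFloat2,   -- 0.28
       ROUND(CAST(0.285 AS decimal(4,3)), 2)   AS DecimalLiteral2;  -- 0.290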
I'm rewriting a huge FOR XML EXPLICIT procedure to use FOR XML PATH, and I need to compare the previous output to the refactored one so that I don't mess up the XML structure.
The thing is, I'm not sure that SQL Server will always generate exactly the same XML string, so I'd rather not compare with:
WHERE CAST(@xml_old AS NVARCHAR(MAX)) = CAST(@xml_new AS NVARCHAR(MAX))
nor do I want to manually validate every node, since the generated XML structure is quite complex.
We can use comparison operators with strings as well. Hence, I tried the following query on a SQL Server 2012 instance with the sample AdventureWorks2012 database (the collation of the database and of the column is the default, SQL_Latin1_General_CP1_CI_AS):
USE AdventureWorks2012 ;
GO
--Returns 5 records
SELECT pp.Name
FROM Production.Product AS pp
WHERE pp.Name >= N'Short'
AND pp.Name <= N'Sport' ;
GO
The query returns only 5 records, despite the fact that the search is an inclusive search and the Production.Product table contains records that begin with "Sport".
Now, replacing "Sport" with "Sporu" (just moving one character up in the alphabet, to verify whether characters after the word have any impact on the search) gives me 8 records.
USE AdventureWorks2012 ;
GO
--Returns 8 records
SELECT pp.Name
FROM Production.Product AS pp
WHERE pp.Name >= N'Short'
AND pp.Name <= N'Sporu' ;
GO
What's going on inside of SQL Server that allows it to fetch "Short-Sleeve Classic Jersey" for the starting word "Short" but prevents it from fetching "Sport-100 Helmet" for the ending word "Sport" despite the search being an inclusive search?
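A quick way to see it is to run the boundary comparisons on their own under the same collation: a name that begins with 'Sport' and keeps going ('Sport-100 Helmet') sorts after the shorter string 'Sport', so it fails pp.Name <= N'Sport', but it still sorts before 'Sporu' because 't' < 'u' in the fifth character.

SELECT CASE WHEN N'Short-Sleeve Classic Jersey' >= N'Short' THEN 1 ELSE 0 END AS PassesLowerBound, -- 1
       CASE WHEN N'Sport-100 Helmet' <= N'Sport' THEN 1 ELSE 0 END AS PassesSportBound,            -- 0
       CASE WHEN N'Sport-100 Helmet' <= N'Sporu' THEN 1 ELSE 0 END AS PassesSporuBound;            -- 1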
Table A
Id    Name
101   Dante
102   Henry
103   Harold
104   Arnold

Table B
Number  Name
102     Dante
107     Gilbert
109     Harold
110     Arnold
106     Susan
112     Marian

I want the result in Table C as below: if a value exists in Table A but not in Table B, the record should go into Table C with the table name in a new column, and vice versa.

Table C
Col1     Col2
Henry    Table A
Gilbert  Table B
Susan    Table B
Marian   Table B
I am using the logic below to get the values from the tables:

select t1.columnA , t2.* from table1 t1 join table2 t2 on t2.columnB = t1.columnA

I am using the script below to compare the two tables and get the values. How do I get the counts of 'Table A', 'Table B' and 'Table A & Table B' from that script?

Ex:
'Table A' -- 150
'Table B' -- 300
'Table A & Table B' -- 150

SELECT Col1 = ISNULL(a.name, b.name),
       Col2 = CASE WHEN ISNULL(a.name,'') = '' THEN 'Table B'
                   WHEN ISNULL(b.name,'') = '' THEN 'Table A'
                   ELSE 'Table A & Table B'
              END
FROM #tableA a
FULL JOIN #tableB b ON a.name = b.name;
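A sketch: wrap the FULL JOIN in a derived table and group on the label column to get the three counts.

SELECT Col2, COUNT(*) AS Cnt
FROM (
    SELECT Col1 = ISNULL(a.name, b.name),
           Col2 = CASE WHEN ISNULL(a.name,'') = '' THEN 'Table B'
                       WHEN ISNULL(b.name,'') = '' THEN 'Table A'
                       ELSE 'Table A & Table B'
                  END
    FROM #tableA a
    FULL JOIN #tableB b ON a.name = b.name
) AS d
GROUP BY Col2;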
SET NOCOUNT ON;
DECLARE @items TABLE (ITEM_ID INT, ITEM_NAME VARCHAR(10))
INSERT INTO @items (ITEM_ID, ITEM_NAME) SELECT 10,'ITEM 1'
INSERT INTO @items (ITEM_ID, ITEM_NAME) SELECT 11,'ITEM 2'
INSERT INTO @items (ITEM_ID, ITEM_NAME) SELECT 12,'ITEM 3'
INSERT INTO @items (ITEM_ID, ITEM_NAME) SELECT 13,'ITEM 4'
INSERT INTO @items (ITEM_ID, ITEM_NAME) SELECT 14,'ITEM 5'
INSERT INTO @items (ITEM_ID, ITEM_NAME) SELECT 15,'ITEM 6'
INSERT INTO @items (ITEM_ID, ITEM_NAME) SELECT 16,'ITEM 7'
INSERT INTO @items (ITEM_ID, ITEM_NAME) SELECT 17,'ITEM 8'
SELECT * FROM @items
-- table with categories
SET NOCOUNT ON;
DECLARE @categories TABLE (CAT_ID INT, CAT_NAME VARCHAR(10))
INSERT INTO @categories (CAT_ID, CAT_NAME) SELECT 100,'WHITE'
INSERT INTO @categories (CAT_ID, CAT_NAME) SELECT 101,'BLACK'
INSERT INTO @categories (CAT_ID, CAT_NAME) SELECT 102,'BLUE'
INSERT INTO @categories (CAT_ID, CAT_NAME) SELECT 103,'GREEN'
INSERT INTO @categories (CAT_ID, CAT_NAME) SELECT 104,'YELLOW'
INSERT INTO @categories (CAT_ID, CAT_NAME) SELECT 105,'CIRCLE'
INSERT INTO @categories (CAT_ID, CAT_NAME) SELECT 106,'SQUARE'
INSERT INTO @categories (CAT_ID, CAT_NAME) SELECT 107,'TRIANGLE'
SELECT * FROM @categories
--table where categories are assigned to master categories
SET NOCOUNT ON;
DECLARE @master_categories TABLE (MASTERCAT_ID INT, CAT_ID INT)
INSERT INTO @master_categories (MASTERCAT_ID, CAT_ID) SELECT 1,100
INSERT INTO @master_categories (MASTERCAT_ID, CAT_ID) SELECT 1,101
INSERT INTO @master_categories (MASTERCAT_ID, CAT_ID) SELECT 1,102
INSERT INTO @master_categories (MASTERCAT_ID, CAT_ID) SELECT 1,103
INSERT INTO @master_categories (MASTERCAT_ID, CAT_ID) SELECT 1,104
INSERT INTO @master_categories (MASTERCAT_ID, CAT_ID) SELECT 2,105
INSERT INTO @master_categories (MASTERCAT_ID, CAT_ID) SELECT 2,106
INSERT INTO @master_categories (MASTERCAT_ID, CAT_ID) SELECT 2,107
SELECT * FROM @master_categories
-- items-categories assignment table
SET NOCOUNT ON;
DECLARE @item_categories TABLE (CAT_ID INT, ITEM_ID INT)
INSERT INTO @item_categories (CAT_ID, ITEM_ID) SELECT 100,10
INSERT INTO @item_categories (CAT_ID, ITEM_ID) SELECT 105,10
INSERT INTO @item_categories (CAT_ID, ITEM_ID) SELECT 100,11
INSERT INTO @item_categories (CAT_ID, ITEM_ID) SELECT 105,11
[code]....
So now I need to query the table @t4 to determine the items that are assigned to category 'WHITE' in master category 1 and to 'CIRCLE' in master category 2. The important thing is to return items that are assigned solely to 'WHITE' in master cat 1 and solely to 'CIRCLE' in master cat 2. In the above example only ITEM 1 (id = 10) would be returned:
1. ITEM 2 (id = 11) is not returned because it additionally has an assignment to category 'SQUARE' in master cat 2.
2. ITEM 3 (id = 12) is not returned because it additionally has an assignment to category 'BLACK' in master cat 1.
3. ITEM 4 (id = 13) is not returned because it has no assignment to category 'CIRCLE' in master cat 2, only to 'WHITE' in master cat 1.
4. ITEM 5 (id = 14) is not returned because it has no assignment to category 'WHITE' in master cat 1, only to 'CIRCLE' in master cat 2.
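A sketch against the base tables above (assuming all the table variables are declared in one batch; the @t4 table from the omitted code is not shown here): group per item and require that every assignment in master cat 1 is 'WHITE', every assignment in master cat 2 is 'CIRCLE', and that both assignments actually exist.

SELECT i.ITEM_ID, i.ITEM_NAME
FROM @items AS i
JOIN @item_categories AS ic ON ic.ITEM_ID = i.ITEM_ID
JOIN @categories AS c ON c.CAT_ID = ic.CAT_ID
JOIN @master_categories AS mc ON mc.CAT_ID = c.CAT_ID
GROUP BY i.ITEM_ID, i.ITEM_NAME
HAVING COUNT(CASE WHEN mc.MASTERCAT_ID = 1 THEN 1 END)
         = COUNT(CASE WHEN mc.MASTERCAT_ID = 1 AND c.CAT_NAME = 'WHITE' THEN 1 END)
   AND COUNT(CASE WHEN mc.MASTERCAT_ID = 1 AND c.CAT_NAME = 'WHITE' THEN 1 END) = 1
   AND COUNT(CASE WHEN mc.MASTERCAT_ID = 2 THEN 1 END)
         = COUNT(CASE WHEN mc.MASTERCAT_ID = 2 AND c.CAT_NAME = 'CIRCLE' THEN 1 END)
   AND COUNT(CASE WHEN mc.MASTERCAT_ID = 2 AND c.CAT_NAME = 'CIRCLE' THEN 1 END) = 1;
-- returns only ITEM 1 (id = 10) for the sample data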
I need to extract a string that is between certain characters in certain positions.
Here is the DDL:
DROP TABLE [dbo].[StoreNumberTest]
CREATE TABLE [dbo].[StoreNumberTest](
    [StoreNumber] [varchar](50) NULL,
    [StoreNumberParsed] [varchar](50) NULL)
INSERT INTO [dbo].[StoreNumberTest]
[Code] ....
What I need to accomplish is to extract the string that is between the third and fifth '-' (dash) and insert it into the StoreNumberParsed while eliminating the fourth dash.
Sample output would be:
KY117 CA132 OH174 MD163 FL191
I know that parsing, CHARINDEX and PATINDEX might all come into play, but I am not sure how to construct the statement.
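A sketch with a hypothetical value (assuming the dashes are the only delimiters): nested CHARINDEX calls find the third and fifth dash, SUBSTRING takes what lies between them, and REPLACE drops the remaining fourth dash.

DECLARE @s varchar(50) = 'A1-B2-C3-KY-117-D4';   -- hypothetical StoreNumber value
DECLARE @d3 int = CHARINDEX('-', @s, CHARINDEX('-', @s, CHARINDEX('-', @s) + 1) + 1);   -- 3rd dash
DECLARE @d5 int = CHARINDEX('-', @s, CHARINDEX('-', @s, @d3 + 1) + 1);                  -- 5th dash
SELECT REPLACE(SUBSTRING(@s, @d3 + 1, @d5 - @d3 - 1), '-', '') AS StoreNumberParsed;     -- KY117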
Is there an efficient way to compare two different columns of two different rows in a data set as shown below?
For example, I would like to take the DATEDIFF between Date2 of RowID 1 and Date1 of RowID 2 for IDNo 123. After this comparison, if the difference between the two dates is <= 14 I want to update IsDateDiffLess14 of RowID 1 with 1, otherwise with 0. In the example below it is 0 because the DATEDIFF of the two dates is >= 14. So I want to compare Date2 and Date1 in this sequence for the same IDNo. For RowID 6 there is only 1 row and no other row to compare with; in this case IsDateDiffLess14 should be updated with 0.
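A sketch using LEAD (available in SQL Server 2012); the table name is hypothetical and the columns follow the description. LEAD pulls Date1 from the next row of the same IDNo, and the flag falls back to 0 when there is no next row.

;WITH cte AS (
    SELECT RowID, IDNo, Date1, Date2, IsDateDiffLess14,
           LEAD(Date1) OVER (PARTITION BY IDNo ORDER BY RowID) AS NextDate1
    FROM dbo.MyTable
)
UPDATE cte
SET IsDateDiffLess14 = CASE WHEN NextDate1 IS NOT NULL
                             AND DATEDIFF(DAY, Date2, NextDate1) <= 14 THEN 1
                            ELSE 0
                       END;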
I run the script below once a day to keep track of row counts over time. I would like to compare the results from today and yesterday to see if anyone deleted more than 20% of the data from any given table. How would I do this? I really don't need the data for more than a day, just to compare the results.
Mon - Run script to collect row counts.
Tue - Run script to collect current row counts into a temp table, compare all row counts in both tables, purge Monday's records and insert the current ones.
Wed - Run script to collect current row counts into a temp table, compare all row counts in both tables.
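A sketch assuming the counts land in a snapshot table (dbo.RowCountSnapshot with TableName, RowCnt and CaptureDate are assumed names): self-join today's capture to yesterday's and flag tables that lost more than 20% of their rows.

SELECT t.TableName,
       y.RowCnt AS YesterdayCnt,
       t.RowCnt AS TodayCnt
FROM dbo.RowCountSnapshot AS t
JOIN dbo.RowCountSnapshot AS y
  ON y.TableName = t.TableName
 AND y.CaptureDate = CAST(DATEADD(DAY, -1, GETDATE()) AS date)
WHERE t.CaptureDate = CAST(GETDATE() AS date)
  AND t.RowCnt < y.RowCnt * 0.8;   -- more than 20% of yesterday's rows are gone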
What I need to find is any record where the Discontinue_Date is greater than the Effective_Date on the next row for a given Customer_ID and Part_ID. This is a customer pricing table, so the Discontinue_Date of row 53, for example, should never be greater than the Effective_Date of row 54130; these are the records I'm looking to find. So I'm looking for a SELECT query that returns any records where this is true. Obviously the last Discontinue_Date row for a Customer_ID will not have a next row, so I wouldn't want to return that.
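A sketch using LEAD; dbo.CustomerPricing is a hypothetical table name and the columns follow the description. Rows with no next row drop out naturally because LEAD returns NULL for them.

;WITH cte AS (
    SELECT Customer_ID, Part_ID, Effective_Date, Discontinue_Date,
           LEAD(Effective_Date) OVER (PARTITION BY Customer_ID, Part_ID
                                      ORDER BY Effective_Date) AS NextEffective_Date
    FROM dbo.CustomerPricing
)
SELECT *
FROM cte
WHERE Discontinue_Date > NextEffective_Date;   -- NULL NextEffective_Date never qualifies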
I am in the process of developing T-SQL code to identify changes in data.
I have read about BINARY_CHECKSUM and HASHBYTES. Some people say HASHBYTES is better than BINARY_CHECKSUM because the chance of a collision is lower.
But, considering the following, collisions can occur with HASHBYTES too. My question is: what is the best way to compare data to identify changes (I can't configure CDC)?
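A sketch of a HASHBYTES-based comparison (the table, key and column names are assumptions): hash a delimited concatenation of the compared columns on both sides and join on the key. The delimiter matters ('ab'+'c' must not collide with 'a'+'bc'), and before SQL Server 2016 the input to HASHBYTES is limited to 8,000 bytes.

SELECT s.KeyCol
FROM dbo.SourceTable AS s
JOIN dbo.TargetTable AS t ON t.KeyCol = s.KeyCol
WHERE HASHBYTES('SHA2_256', CONCAT(s.Col1, '|', s.Col2, '|', s.Col3))
   <> HASHBYTES('SHA2_256', CONCAT(t.Col1, '|', t.Col2, '|', t.Col3));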
I want to compare the filepath column in a table with the physical files on the drive, and get the details of files that are in the table but not on disk, and vice versa.
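One sketch uses the undocumented xp_dirtree procedure to capture the files on disk into a temp table and then runs EXCEPT in both directions; the folder, table and column names here are assumptions.

CREATE TABLE #disk (name nvarchar(512), depth int, isfile bit);
INSERT INTO #disk (name, depth, isfile)
EXEC master.sys.xp_dirtree 'D:\FileStore', 1, 1;   -- depth 1, include files

-- in the table but not on disk
SELECT FilePath FROM dbo.MyFiles
EXCEPT
SELECT 'D:\FileStore\' + name FROM #disk WHERE isfile = 1;

-- on disk but not in the table
SELECT 'D:\FileStore\' + name FROM #disk WHERE isfile = 1
EXCEPT
SELECT FilePath FROM dbo.MyFiles;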
I am trying to count the characters in a string before a space. Here is an example of what I am trying to accomplish.
For the string '2073 9187463 2700' the counts are 4, 7 and 4: the character count is 4 before the first space, 7 before the next space, and the last count covers the end of the string. If there were more characters in the string, for example '2073 9187463 2700 7023 6044567', it would return the number of characters before each space and at the very end of the string.
SQL Server 2012 SP2 Enterprise Edition (11.0.5058.0) on Windows Server 2008 R2
At some point a few months ago we encountered an issue where we hit some size limit on the amount of text we could enter into a Transact-SQL step of an Agent job. Attempting to create a job like this with sp_add_job will produce the error
Msg 50000, Level 16, State 10, Procedure sp_add_jobstep_internal, Line 255 String or binary data would be truncated.
Adding the job step via SSMS yields
Alter failed for JobStep 'xxx'. (Microsoft.SqlServer.Smo)
Additional information: An exception occurred while executing a Transact-SQL statement or batch. (Microsoft.SqlServer.ConnectionInfo)
String or binary data would be truncated. The statement has been terminated. (Microsoft SQL Server, Error: 8152)
I've checked sp_add_jobstep_internal, sp_add_jobstep and the sysjobsteps table and all references to the command field are nvarchar(max). We can run the same job creation code without error on a SQL Server 2008 R2 Enterprise Edition machine and two SQL Server 2012 SP2 Developer Edition boxes. All our 2012 servers were fresh installs, not upgrades.
I am looking for the fastest way to strip non-numeric characters from a string.
I have a user database that has a column (USER_TELNO) in which the user can drop a telephone number (for example '+31 (0)12-123 456'). An extra computed column (FORMATTED_TELNO) should contain the formatted telephone number (31012123456 in the example).
Note: the column FORMATTED_TELNO must be indexed, so the UDF in the computed column needs WITH SCHEMABINDING... I think this implies that a CLR call won't work.
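A sketch of a schema-bound scalar UDF that keeps only the digits (the function and table names are hypothetical); since it is deterministic, precise and schema-bound, a computed column based on it should be indexable.

CREATE FUNCTION dbo.fn_DigitsOnly (@s varchar(100))
RETURNS varchar(100)
WITH SCHEMABINDING
AS
BEGIN
    -- remove the first non-digit character until none are left
    WHILE PATINDEX('%[^0-9]%', @s) > 0
        SET @s = STUFF(@s, PATINDEX('%[^0-9]%', @s), 1, '');
    RETURN @s;
END;
GO
-- hypothetical usage on the user table
ALTER TABLE dbo.Users ADD FORMATTED_TELNO AS dbo.fn_DigitsOnly(USER_TELNO);
CREATE INDEX IX_Users_FormattedTelno ON dbo.Users (FORMATTED_TELNO);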
I have a varchar field which contains some Greek characters (α, β, γ, etc...) among the regular Latin characters. I need to replace these characters with a word (alpha, beta, gamma etc...). When I try to do this, I find that it is also replacing some of the Latin characters.
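One thing worth trying, sketched below with assumed table and column names (and assuming the column can actually hold Greek letters, i.e. it is nvarchar or uses a Greek code page): use N'' literals and force a binary collation inside REPLACE so the match is exact and Latin look-alikes are not affected.

SELECT REPLACE(SomeCol COLLATE Latin1_General_BIN, N'α', N'alpha') AS Replaced
FROM dbo.SomeTable;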
I'm trying to replace special characters in SQL Server, and all the solutions I found for this RDBMS use loops. The source of my data is in Oracle, where they use REGULAR EXPRESSIONS to solve it. Which is the better option for replacing special characters: using loops in SQL Server, or regular expressions in Oracle?
Say you have a table that has records with numbers sort of like lottery winning numbers, say:
TableWinners
num1  num2  num3  num4  num5  num6
33    52    47    23    17    28
... more records with a similar structure.
Then you have another table with chosen numbers, same structure as above, TableGuesses.
How could you do the following comparisons between TableGuesses and TableWinners:
1. Compare a single record in TableGuesses to a single record in TableWinners to get a count of the number of numbers that match (kind of a typical lottery type of thing).
2. Compare a single record in TableGuesses to ALL records in TableWinners to see which record in TableWinners is the closest match to the selected record in TableGuesses.
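A sketch (the GuessID/WinnerID key columns are assumptions): unpivot the six number columns of each row with CROSS APPLY (VALUES ...) and count the overlap; ordering by that count answers the "closest match" question as well.

DECLARE @GuessID int = 1;   -- hypothetical key of the guess row being checked

SELECT w.WinnerID,
       COUNT(*) AS MatchingNumbers
FROM TableGuesses AS g
CROSS APPLY (VALUES (g.num1),(g.num2),(g.num3),(g.num4),(g.num5),(g.num6)) AS gn(num)
CROSS JOIN TableWinners AS w
CROSS APPLY (VALUES (w.num1),(w.num2),(w.num3),(w.num4),(w.num5),(w.num6)) AS wn(num)
WHERE g.GuessID = @GuessID
  AND wn.num = gn.num
GROUP BY w.WinnerID
ORDER BY MatchingNumbers DESC;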
I have a problem where I want to write a function that removes recurring characters from a string and replaces them with a single occurrence of the same character.
For instance, I have the string '12333345566689' and the result should be '12345689'. In Oracle I could do this with "regexp_replace('12333345566689', '(.)\1+', '\1')", but in T-SQL the only solution I could think of is something like this:
DECLARE @code NVARCHAR(255)
SET @code = '12333345566689';
SET @code = REPLACE(REPLACE(REPLACE(@Code, '1', '~1'), '1~', ''), '~1', '1');
and repeat this for 2 through 9. But I'm sure there is a more elegant solution for this in SQL Server 2012.
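One set-based sketch: walk the string with a numbers CTE, keep each character only when it differs from the one before it, and reassemble the survivors with FOR XML PATH. It works for any repeated character, not just digits.

DECLARE @code nvarchar(255) = N'12333345566689';
DECLARE @result nvarchar(255);

WITH n AS (
    SELECT TOP (LEN(@code)) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS i
    FROM sys.all_objects
)
SELECT @result = (
    SELECT SUBSTRING(@code, i, 1)
    FROM n
    WHERE i = 1 OR SUBSTRING(@code, i, 1) <> SUBSTRING(@code, i - 1, 1)
    ORDER BY i
    FOR XML PATH(''), TYPE).value('.', 'nvarchar(255)');

SELECT @result;   -- 12345689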
I have an Address column that I need to substring. I want to remove the part of the string after either, or both, of the following characters: ',' or '*'.
Example Record 1: Elland **REQUIRES BOOKING IN***
Example Record 2: Theale, Nr Reading, Berkshire
Example Record 3: Stockport
How do I achieve this in a CASE Statement?
The following two CASE statements return the correct results, but I somehow need to combine them into a single statement.
,LEFT(Address ,CASE WHEN CHARINDEX(',',Address) =0 THEN LEN(Address ) ELSE CHARINDEX(',' ,Address ) -1 END) AS 'Town Test'
,LEFT(Address ,CASE WHEN CHARINDEX('*',Address ) =0 THEN LEN(Address) ELSE CHARINDEX('*' ,Address ) -1 END) AS 'Town Test2'
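A sketch that folds both delimiters into one CASE: take whichever of ',' and '*' occurs first, or the whole string when neither is present (the table name is an assumption; RTRIM just tidies the trailing space left by records like example 1).

SELECT RTRIM(LEFT(Address,
            CASE
                WHEN CHARINDEX(',', Address) = 0 AND CHARINDEX('*', Address) = 0 THEN LEN(Address)
                WHEN CHARINDEX(',', Address) = 0 THEN CHARINDEX('*', Address) - 1
                WHEN CHARINDEX('*', Address) = 0 THEN CHARINDEX(',', Address) - 1
                WHEN CHARINDEX(',', Address) < CHARINDEX('*', Address) THEN CHARINDEX(',', Address) - 1
                ELSE CHARINDEX('*', Address) - 1
            END)) AS Town
FROM dbo.Addresses;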
I was wondering how Fuzzy Grouping deals with and handles first name similarities. Is there a way to configure it so that Anthony = Tony, Bill = William, etc.? I created a simple package with several rows containing similar first names and ran the fuzzy grouping on the first name column. I received only one possible duplicate of Will = William which was at 56%. I lowered the threshold down to 1% and still only one match.
Now I understand and appreciate the reasons for this but was wondering if this type of situation was considered and a way of dealing with it is available.
I've been trying to make the following query more performant by breaking it up into smaller pieces.
SELECT MT.A3+MT.A4 AS A34, MT.A3 -- ,M.*
FROM Master_TAB M
JOIN (SELECT M.A1, t3.A3, t3.A4, M.A6, M.A2,
      ROW_NUMBER() OVER (PARTITION BY A1,A6,A3,A4 ORDER BY A5 DESC) AS rownum
[Code] ....
I know that the spill is caused by the sort, but I can't remove the sort (it can't be done in the front end). My master table has 1.7 million rows and almost 200 columns (bad design, I know, but it can't be changed as there's too much that would be affected), and every row is a little over 1 KB.
Here's my attempt...
-- MASTER_TAB has 1.7 million rows and 50 columns
CREATE TABLE [dbo].[tmp_ABC](
    [A1] [varchar](13) NOT NULL,
    [A2] [varchar](5) NOT NULL,
    [A3] [varchar](4) NOT NULL,
    [A4] [varchar](4) NOT NULL,
    [A5] [int] NULL
) ON [PRIMARY]
[Code] ...
This is the query that is causing the spill (in reality I'm supposed to bring back all 200 columns from the master table, but for debugging purposes I limited the columns):
Select c.A3+c.A4 as A34, c.A3, c.A1 -- M.*
from tmp_DEF c
join MASTER_TAB M on M.A1 = c.A1 and M.A2 = c.A2
order by c.A3, C.A4
if I just run the following I get no spill:
Select c.A3+c.A4 as A34, c.A3, c.A1
from tmp_DEF c
order by c.A3, C.A4
as soon as I add the Master table as a Join I get the Spill...
I have read many articles and tried many of the suggested things (creating indexes: clustered, non-clustered) without success. Maybe I'm totally out in left field and should improve the performance by going another route?
I have one database with several tables in it (table1, table2, table3). Each table has two columns (column 1 is a number, e.g. 201220, and column 2 is a number, e.g. 0.50). All the tables will have rows with the same data in column 1, but column 2 will have different numbers (different prices). My goal is to run a query that compares both columns across all three tables, takes the lowest of the three based on column 2, and outputs that row. This would output all rows (around 175k). The point is to create a least-cost spreadsheet (CSV) file by evaluating all three tables.
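A sketch (table and column names are assumptions): UNION ALL the three tables and take the minimum price per item number; the result can then be exported to CSV.

SELECT col1 AS ItemNumber,
       MIN(col2) AS LowestPrice
FROM (
    SELECT col1, col2 FROM dbo.table1
    UNION ALL
    SELECT col1, col2 FROM dbo.table2
    UNION ALL
    SELECT col1, col2 FROM dbo.table3
) AS all_prices
GROUP BY col1
ORDER BY col1;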
A customer messed up while moving their databases. After working for a week they found that data is missing in the database. I have two backups: one from the old server and one from the new server today; they have been working in the new one for a week.
I need to compare these two databases and then update the new database with all the data that is in the old one but not in the new one; join the data in the two databases, so to say. Both databases are from the same application, so they use the same users, schema and so on.
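A per-table sketch (database and table names are assumptions; identity columns, triggers and foreign keys would need extra care): EXCEPT finds the rows that exist in the old database but not in the new one, and the INSERT copies them across.

INSERT INTO NewDB.dbo.SomeTable
SELECT * FROM OldDB.dbo.SomeTable
EXCEPT
SELECT * FROM NewDB.dbo.SomeTable;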