Performance Help: Binary Comparisons Vs Full Normalization?
Feb 18, 2008
I have a pretty intensive query that I need performance help on. There are ~1 million de-normalized 'adjustment rows' that I am checking about 20 different conditions on, but each of these conditions has multiple possible entries.
For example, one condition is 'which counties apply' to each row. Now I could cross-join a table listing every county that applies to every row, which would mean 1 million rows X 3,000 potential counties. And for every one of these 20 conditions, I'd need to join tables for each of these lookups.
Instead, I was told to do a binary comparison of some sort, but I'm not exactly sure how to do it. That way I don't need to do any joins; I just have a large binary string, with bits representing each county.
Since I know the exact county searched for in each query, I can see whether each row applies (along with each of the other conditions I must check against the other binary strings).
I accomplished this using:
AND Substring(County, @CountyIndex, 1) = '1'
I have a character string for county, which is painfully slow when running all of these checks.
My hope is that if the county in the lookup is 872, I can just scan the table, looking at bit #872 of the county field in each record, rather than joining huge tables for every one of these fixed fields I need to test.
My guess is the fastest way is some sort of binary string comparison, but I can't find any good resources on the subject. PLEASE HELP!
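Something like this is what I'm picturing - just a sketch, with made-up names (dbo.Adjustments, AdjustmentID, and a CountyFlags column assumed to be VARBINARY(375), i.e. 3,000 county bits packed 8 per byte):

DECLARE @CountyID int
SET @CountyID = 872

-- which byte of the VARBINARY holds this county, and which bit inside that byte
DECLARE @ByteIndex int, @BitMask int
SET @ByteIndex = ((@CountyID - 1) / 8) + 1
SET @BitMask  = POWER(2, 7 - ((@CountyID - 1) % 8))

SELECT AdjustmentID
FROM dbo.Adjustments
WHERE SUBSTRING(CountyFlags, @ByteIndex, 1) & @BitMask <> 0   -- int & binary is allowed

The same byte/mask trick would apply to each of the other fixed-condition flag columns.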
This question is from a developer that I work with:
In SQL Server 7.0:
Do you know of a query or sp which will return the list of objects in a DB, sorted in descending order by last changed date?
I need to generate a list of all the stored procedures created or modified since a specified date. I can get the created ones, but I can't see how to get the modified ones.
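For reference, the created ones are being pulled from sysobjects along these lines (a sketch; the cutoff date is just an example, and 'P' limits it to stored procedures):

SELECT name, crdate
FROM sysobjects
WHERE type = 'P'
  AND crdate >= '19990101'   -- example cutoff date
ORDER BY crdate DESC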
I am about to heavily index a table and have to include at least 3 to 4 columns in the full-text index for this table. The table is updated very frequently, and the columns involved in the full-text indexing also undergo frequent updates. As of now, I can't avoid using full-text indexing, as these columns are very lengthy and basically contain text. The users of the database will give some keywords as the search criteria to find the information they are looking for. How frequently should I update my full-text catalog? This is a scenario where the full-text search operates on various tables, and each of these tables might contain around 300,000 to 800,000 rows. I would appreciate an intelligent suggestion, as I need it as soon as possible.
Hello everybody, I've got a little problem which I've been trying to solve for 1-2 years, and I hoped it would go away with SQL 2005 - but that wasn't the case :(.
Situation: I've just bought a new server containing: SQL 2005 64-bit environment, 4 GB RAM, 2x AMD Opteron 2 GHz processors (dual core), 2x RAID controllers (RAID 1) containing 1.1 System, 1.2 Data, 2.1 Transaction Logs.
I've created a full-text table containing all the search terms I need to search. Table layout: RecID - int - primary key; SrcID - varchar(30); ArticleID - int - referring to an original table; SearchField - varchar(150) - containing the search terms; timestamp - timestamp field.
Full-text index: RecID as key, SearchField as the indexed field. Word breaker: Neutral (the field contains several languages); accent sensitivity off.
Now I've got different tables imported in here, resulting in a table size of ~13 million rows.
There is no problem with the performance on this catalog if I search a term which isn't contained in more than 200-300 records - but if I search for a term which could occur 200'000 times or more, it gets extremely slow.
On the slow query the first records come in after no time, but up to 60 seconds pass before the query finishes. The problem is that I have to sort by a ranking value which is stored externally - so I need all the results in order to sort them...
Current (debugging) query:
SELECT ArticleID
FROM fullTextTable AS ft
INNER JOIN CONTAINSTABLE(FullTextCatalog, SearchField, '"term*"') AS ftRes ON ftRes.[KEY] = ft.idEntry
Now if I check in Performance Monitor: as soon as I run the query, the 'Avg. Disk Read Queue Length' counter on disk D (SQL data files) jumps to the top until the query has finished. Almost no read/write activity on C:, where the full-text catalog is stored...
If I rerun the query after it has finished once successfully, it completes in under 1-2 seconds - it would be nice to get that result the first time :).
I have a table with 3M rows that contains a varchar(2000) field with various keywords. Here is the table structure: PK column ImageID, plus FullTextColumn. There is an association table with ImageID and ContractID. Now, I want to do a query where ContractID = x and the FullTextColumn CONTAINS some word. Because the association table is what maps Images to Contracts, I can't use the trick of putting the contract code in the FullTextColumn. I'm finding that the FTS service first performs a search on the keyword (which can take a long time if 100K rows are returned), then joins to the association table for the particular contract. Is there any way to make this faster by telling the FTS service to only search the subset of rows for that contract? Sorry if this sounds convoluted. Appreciate any help you can suggest. Thanks!
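For clarity, this is roughly the shape of the query being described, as a sketch (dbo.Images and dbo.ImageContracts are placeholder names for the image table and the association table above):

DECLARE @ContractID int
SET @ContractID = 42   -- example value

SELECT i.ImageID
FROM dbo.Images AS i
INNER JOIN CONTAINSTABLE(dbo.Images, FullTextColumn, 'someword') AS ft
        ON ft.[KEY] = i.ImageID
INNER JOIN dbo.ImageContracts AS ic
        ON ic.ImageID = i.ImageID
WHERE ic.ContractID = @ContractID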
I have implemented a login audit on a particular system which catches the users' login details, including their application logon name and NT username.
What I want to do is report on users who have logged on to the software using someone else's workstation (i.e. logged on to more than one workstation).
--Insert test data. Please note that loginName and ntUsername are rarely the same
--(the table variable declaration below is assumed; it was omitted from the original post)
DECLARE @logins TABLE (loginName varchar(20), ntUsername varchar(20), loginDate datetime)

INSERT INTO @logins (loginName, ntUsername, loginDate)
SELECT 'Amy', 'Amy', '20070101' UNION
SELECT 'Amy', 'Amy', '20070102' UNION
SELECT 'Amy', 'Amy', '20070103' UNION
SELECT 'Bob', 'Bob', '20070101' UNION
SELECT 'Bob', 'Bob', '20070102' UNION
SELECT 'Bob', 'Amy', '20070103' UNION --Bob has logged on using 2 different NT accounts
SELECT 'Cal', 'Cal', '20070102' UNION
SELECT 'Cal', 'Amy', '20070102' UNION --So has Cal
SELECT 'Dom', 'Dom', '20070102' UNION
SELECT 'Dom', 'Dom', '20070102'
Any ideas? I just can't think of the logic needed to get what I want.
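A sketch of the kind of logic being asked for, run against the @logins test data above: report any loginName that appears with more than one distinct ntUsername.

SELECT loginName
FROM @logins
GROUP BY loginName
HAVING COUNT(DISTINCT ntUsername) > 1   -- Bob and Cal in the test data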
I have the following query below that I am trying to get working. What I want it to do is check for users who have sat a module and failed it, then check that they have not passed the module a second time, and report only those who have failed with no passes. Query below.
SELECT DISTINCT dbo.PPS_SCOS.NAME, PPS_PRINCIPALS.NAME, pps_transcripts.date_created, score, max_score, status
FROM (dbo.PPS_SCOS JOIN dbo.PPS_TRANSCRIPTS ON dbo.PPS_SCOS.SCO_ID = dbo.PPS_TRANSCRIPTS.SCO_ID)
JOIN dbo.PPS_PRINCIPALS ON dbo.PPS_TRANSCRIPTS.PRINCIPAL_ID = dbo.PPS_PRINCIPALS.PRINCIPAL_ID
WHERE dbo.PPS_SCOS.NAME LIKE 'MTB-S001%'
AND PPS_PRINCIPALS.LOGIN LIKE '%test%'
AND dbo.PPS_TRANSCRIPTS.STATUS LIKE 'F'
AND PPS_TRANSCRIPTS.TICKET NOT LIKE 'l-%'
AND dbo.PPS_PRINCIPALS.NAME NOT IN (
    SELECT DISTINCT dbo.PPS_SCOS.NAME
    FROM (dbo.PPS_SCOS JOIN dbo.PPS_TRANSCRIPTS ON dbo.PPS_SCOS.SCO_ID = dbo.PPS_TRANSCRIPTS.SCO_ID)
    JOIN dbo.PPS_PRINCIPALS ON dbo.PPS_TRANSCRIPTS.PRINCIPAL_ID = dbo.PPS_PRINCIPALS.PRINCIPAL_ID
    WHERE dbo.PPS_TRANSCRIPTS.STATUS LIKE 'P'
    AND dbo.PPS_SCOS.NAME LIKE 'MTB-S001%'
    AND PPS_PRINCIPALS.LOGIN LIKE '%test%'
    AND PPS_TRANSCRIPTS.TICKET NOT LIKE 'l-%'
)
ORDER BY pps_PRINCIPALS.NAME
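Note that, as written, the NOT IN subquery returns course names (dbo.PPS_SCOS.NAME) but compares them to learner names (dbo.PPS_PRINCIPALS.NAME). One alternative shape worth trying - a sketch only, reusing the same tables and filters - correlates on the principal and module with NOT EXISTS instead:

SELECT DISTINCT s.NAME, p.NAME, t.date_created, score, max_score, status
FROM dbo.PPS_SCOS AS s
JOIN dbo.PPS_TRANSCRIPTS AS t ON s.SCO_ID = t.SCO_ID
JOIN dbo.PPS_PRINCIPALS AS p ON t.PRINCIPAL_ID = p.PRINCIPAL_ID
WHERE s.NAME LIKE 'MTB-S001%'
  AND p.LOGIN LIKE '%test%'
  AND t.STATUS LIKE 'F'
  AND t.TICKET NOT LIKE 'l-%'
  AND NOT EXISTS (
        SELECT 1
        FROM dbo.PPS_TRANSCRIPTS AS t2
        WHERE t2.PRINCIPAL_ID = t.PRINCIPAL_ID   -- same learner
          AND t2.SCO_ID = t.SCO_ID               -- same module
          AND t2.STATUS LIKE 'P'                 -- no pass recorded
          AND t2.TICKET NOT LIKE 'l-%'
      )
ORDER BY p.NAME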
I have a table with 10 rows with a varbinary column
I wish to concatenate all the values of the binary column into a single binary value and then write that to another table within the database. This application splits a binary file (Word or PDF document) into multiple segments (this is Column2, as below).
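A sketch of one way to glue the segments back together (everything except Column2 is a placeholder name, and it assumes an ordering column exists to keep the segments in sequence):

DECLARE @doc varbinary(max)
SET @doc = 0x

DECLARE @piece varbinary(max)
DECLARE seg CURSOR LOCAL FAST_FORWARD FOR
    SELECT Column2                 -- the segment column mentioned above
    FROM dbo.DocumentSegments      -- placeholder source table
    ORDER BY SegmentID             -- assumed ordering column

OPEN seg
FETCH NEXT FROM seg INTO @piece
WHILE @@FETCH_STATUS = 0
BEGIN
    SET @doc = @doc + @piece       -- varbinary concatenation
    FETCH NEXT FROM seg INTO @piece
END
CLOSE seg
DEALLOCATE seg

-- write the reassembled document to another table (placeholder name)
INSERT INTO dbo.WholeDocuments (DocumentBody) VALUES (@doc)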
If I use this to take the time from the right side of the current date - right(getdate(),7) - I'll get something like 7:30AM.
I also have times stored in a column of a table, but as a string, not a datetime. It seems to compare okay, but when the time is, say, 1:30PM and I'm checking whether it's greater than or equal to (>=) 7:30AM, it doesn't return.
I think it's ignoring the AM/PM meridian values and just comparing the numbers.
Is there a conversion I could use to do this? I've tried a military time conversion I found, but it converts to hrs, min, milliseconds: convert(char(8),(convert(datetime,current_timestamp,113)),114)
If anyone knows a good way to do this, I would appreciate it.
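One hedged idea, sketched with placeholder names (dbo.Schedule, TimeColumn): convert both strings to datetime before comparing, so the AM/PM part is honored rather than compared character by character.

DECLARE @cutoff varchar(10)
SET @cutoff = '7:30AM'

-- CONVERT(datetime, '1:30PM') gives 1900-01-01 13:30:00, so >= works as a time comparison
SELECT *
FROM dbo.Schedule
WHERE CONVERT(datetime, TimeColumn) >= CONVERT(datetime, @cutoff)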
I've been reading a bit about full-text searches, phonetic values and match-queries and just don't know where to begin.
What I'm eventually going to do is make procedures for matching names, finding records that are close matches and presenting them in a subform below the actual member that you look up.
E.g. if an employee looks up Sergej, he or she will also see Sergey, Sergei etc. below the membersheet.
BOL isn't very practical in its examples, and it's about 7 years since I took my SQL Server 7.0 MS courses; plus I've primarily worked as an administrator up until last fall, not a developer. So where do I begin?
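One possible starting point, just as a sketch, is the built-in SOUNDEX/DIFFERENCE functions rather than full-text (dbo.Members, MemberID and FirstName are placeholder names); spellings like Sergej, Sergey and Sergei generally score as close phonetic matches:

DECLARE @lookup varchar(50)
SET @lookup = 'Sergej'

SELECT MemberID, FirstName
FROM dbo.Members
WHERE DIFFERENCE(FirstName, @lookup) >= 3      -- DIFFERENCE ranges 0-4; 4 is the closest match
ORDER BY DIFFERENCE(FirstName, @lookup) DESC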
Hi. We are using the SQL Server 2005 Full-Text Service. The data is not huge, but it is the kind of data where each record is small and there are a large number of records. There are 35 million records now, with 11 GB of data and about 1.6 GB of FT catalog on the table. This is expected to grow to at least 10 times this size. The issue is that FTS takes a long time to return results when the number of hits (rows) returned from FTS is large; for some searches, it takes a very long time. With the same data and catalog, full-text queries for less common words return in a timely fashion. The nature of the problem doesn't allow us to keep only the top results; we need all the results. So it's not about the size of the data but the number of results returned from FT (as the catalog is inverted). The machine is a dual processor with 4 GB RAM.
I am considering splitting the table, and hence the catalog, and using multiple servers to do full-text searches against smaller catalogs. Is there any other way this issue can be solved?
If splitting is the only way, can you give me an idea as to what is a statistical/standard limit on the number of search results/catalog size at which FTS still gives good results?
What is the best method for ignoring the time in datetime comparisons? Say I want all records on 07/08/1996 regardless of their time, or all records between 01/01/1999 and 04/01/1999 even if one of the records on 04/01/1999 had a time of 16:32:22.
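One common pattern, as a sketch (dbo.Orders and OrderDate are placeholder names): compare against whole-day boundaries with >= the start date and < the day after the end date, so the stored time never matters.

-- all records on 07/08/1996 (read here as July 8), any time of day
SELECT * FROM dbo.Orders WHERE OrderDate >= '19960708' AND OrderDate < '19960709'

-- all records from 01/01/1999 through 04/01/1999 inclusive, even one stamped 16:32:22
SELECT * FROM dbo.Orders WHERE OrderDate >= '19990101' AND OrderDate < '19990402'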
This looks like a bug - hopefully somebody can explain what is actually happening. Using SQL Server 2000 SP4. Here's a repro script with comments:

/* repro table */
CREATE TABLE dbo.T (
    ID int NOT NULL,
    Time datetime NOT NULL,
    CONSTRAINT PK_T PRIMARY KEY (ID, Time)
)
GO
/* the problem does not happen without this index */
CREATE NONCLUSTERED INDEX IX_T ON dbo.T (Time)
GO
/*
sample row - note that
CAST('2006-04-08 13:14:58.870' AS smalldatetime) = '2006-04-08 13:15:00'
*/
INSERT INTO dbo.T (ID, Time)
VALUES (1, '2006-04-08 13:14:58.870')
GO
/*
This does not return any rows - why?
The comparison should evaluate to TRUE.
*/
SELECT *
FROM dbo.T
WHERE CAST(Time as smalldatetime) >= '2006-04-08 13:15:00'
GO
/* This does return the row. */
SELECT *
FROM dbo.T
WHERE CAST(DATEADD(millisecond, 0, Time) as smalldatetime) >= '2006-04-08 13:15:00'
GO
DROP TABLE dbo.T
GO

The difference between the two SELECT statements is that the first one uses a non-clustered index seek, whereas the second one uses a scan of the same index.
I am trying to compare the address fields between 2 tables in SQL Server 2008. However, when I run the comparison below it throws the error below:
Query: SELECT a.ADDRESS, b.PropertyAdd FROM TABLE1 a INNER JOIN TABLE2 b ON a.COLLOAN = b.COLLOAN1 WHERE a.ADDRESS <> b.PropertyAdd
Error : Cannot resolve the collation conflict between "SQL_Latin1_General_Pref_CP437_CI_AS" and "SQL_Latin1_General_CP850_BIN" in the equal to operation.
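One common workaround, sketched below (TABLE1 stands in for the first table's name, and DATABASE_DEFAULT could be swapped for a specific collation), is to force both sides of the comparison to the same collation with COLLATE:

SELECT a.ADDRESS, b.PropertyAdd
FROM TABLE1 a
INNER JOIN TABLE2 b ON a.COLLOAN = b.COLLOAN1
-- coerce both address columns to the database default collation before comparing
WHERE a.ADDRESS COLLATE DATABASE_DEFAULT <> b.PropertyAdd COLLATE DATABASE_DEFAULT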
I'm using a bit-wise comparison to effectively store multiple values in one column. However, once the number of values increases, it starts to become too big for an int data type. You also cannot perform a bitwise & on two binary datatypes. Is there a better way to store these flags than int or binary?
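One option, sketched below with placeholder names (dbo.Items, ItemID, and Flags assumed to be a bigint column), is to widen the flag column to bigint, which keeps bitwise & usable for up to 63 flag bits; bigint & bigint is allowed even though binary & binary is not:

DECLARE @mask bigint
SET @mask = POWER(CAST(2 AS bigint), 42)   -- flag bit 42, for example

SELECT ItemID
FROM dbo.Items
WHERE Flags & @mask <> 0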
Is it possible to normalize a database using SQL statements? I have a huge number of duplicated records on certain fields and need to do some normalization on them. For example, the raw data below
I'm working on a normalization exercise for one of my classes, and I've been sick and feel lost now. Could one of you please look at my database statements and tell me if/what is wrong with them?
Hi everyone. Well, I have my tables ready to build the database on the SQL server. My problem is normalization of the tables being used in the database. Is there any best possible way / [shortcut... very weird to ask this :)] using SQL Server?
I have a web app which is used to do normal inserts/updates of employee info. Connected to each employee that is entered is some data that is imported from an outside source for each employee. The question I have is: currently my database is very normalized, and importing data from this outside source will be quite a pain because of this. Is it bad practice to denormalize a specific table if no user will ever insert/update it besides DTS?
What my question is: are there any tools available for normalization, produced by database vendors or any third party? If yes, can you kindly point me to clear documentation?
Does this show "poor" design? It has been suggested to me to do a "Logical Model" of my data base and that will make it easier to "normalize" the tables. I tried this and come up with the following but I don't know if I am stretching it too thin. One rule of the 2NF is to ensure all tables have a primary key, and as you can see, my tbProjectTeam has a primary key, but that is made up of the entire row. Same goes for the tbDepartmentActivities.
tbEstimatedProjects
Reference (PK) | Name         | City          | Postal | ...
-------------------------------------------------------------
1              | Some Project | Niagara Falls | N8E7J5 | ...
Hi, I am using MS SQL Server to store my database. My problem is that I have around 150+ database files in DBF format. Each database file consists of fields ranging from 2 to 33 in number. Also, there are some fields which have just one entry and the rest are NULL. This database will be accessed by a printing software. Please advise as to how I should proceed to normalize this database. Regards, Shwetabh
I've come up with this issue in several apps now. There are things that, from one perspective, are all handled the same, so it would be desirable that they all be handled in the same table with some field as a type specification. From the other perspective of foreign key relationships, however, they are different things and can't be stored in the same table.
For example, I have a scheme for indicating mappings between dimension records at one time period and new dimension records at another time period. I could use one set of tables for all mappings, since they all work exactly the same way, but then I can't set up DRI between the mapping tables and the dimension tables. If I just make separate mapping tables for each dimension table, then I'm creating 4 new tables per dimension table, all identical with respect to what fields they contain, what kinds of unique constraints they have, and what relationships they have to each other, with the sole distinction that they each map to the integer-type key of a different dimension table. I would not look forward to doing maintenance on this schema!
Is there any strategy for having the cake and eating it, too?
create table cd_fiq_a ( fiq_id int not null primary key, fiq_name varchar(50) not null)
create table cd_sal_b ( sal_id int not null primary key, sal_name varchar(50) not null)
create table cd_rak_c ( rak_id int not null primary key, rak_name varchar(50) not null)
insert into cd_fiq_a values (1, 'Fiq1')
insert into cd_fiq_a values (2, 'Fiq2')
insert into cd_sal_b values (1, 'Sal1')
insert into cd_sal_b values (2, 'Sal2')
insert into cd_sal_b values (3, 'Sal3')
insert into cd_sal_b values (4, 'Sal4')
insert into cd_sal_b values (5, 'Sal5')
insert into cd_rak_c values (1, 'Rak1')
insert into cd_rak_c values (2, 'Rak2')
insert into cd_rak_c values (3, 'Rak3')
insert into cd_rak_c values (4, 'Rak4')
Now there is a relationship between cd_fiq_a, cd_sal_b and cd_rak_c. For a given Fiq there can be one or more records of Sal. For a given Fiq and a given Sal there can be one or more records of Rak. I am thinking that I can do it in one table or two tables:
One Table Solution ----------------------------
create table relation_d ( relation_id int not null primary key, fiq_id int not null foreign key REFERENCES cd_fiq_a (fiq_id), sal_id int not null foreign key REFERENCES cd_sal_b (sal_id), rak_id int not null foreign key REFERENCES cd_rak_c (rak_id), sort_order int not null )
Two Table Solution ---------------------------
create table relation_header_d ( relation_header_id int not null primary key, fiq_id int not null foreign key REFERENCES cd_fiq_a (fiq_id), sal_id int not null foreign key REFERENCES cd_sal_b (sal_id) )
create table relation_detail_e ( relation_detail_id int not null primary key, relation_header_id int not null foreign key REFERENCES relation_header_d (relation_header_id), rak_id int not null foreign key REFERENCES cd_rak_c (rak_id), sort_order int not null )
Which solution is more normalized and will result in better-performing SQL? Or is there any other solution which is better?
My question concerns the amount of normalization I require for my specific needs. I realize this is a difficult question without knowing a lot more about my database. I am hoping that with some information I could get advice from those more experienced than I am.
- My database consists of 9 tables, with the maximum number of columns being 11, which is the contacts table.
- the largest data type is nvarchar(125).
- number of rows in the largest table will eventually grow to hundreds of thousands.
- users access the database online
This is large to me, but I expect that to some of you it is not.
My application would be easier to setup if the contacts table were to include address info.
So my question is: for a database of this size, could I create a contacts table similar to the customer table in the Microsoft Northwind sample database with the address included, or should I model something more like the contact table in the Microsoft AdventureWorks db, with the address and state/province split into separate tables?
Any help you could provide with this sketchy info would be greatly appreciated.
Hi everybody. Recently I came across this article and I have tried to answer all the following questions, but I am not sure whether my answers are correct or not, so you people can comment on the following questions.
Hi all. Please guide me in the following situation. I am new to programming. I have a master table tblCompany with fields: Company Name, Address, Phone Number. The second table is tblUsers with Company Name, User Name, Password. The third table is tblDealing with fields Company Name, Dealer Name, Dealer Address. According to the normalization rules I should put a column named Company_Id in tblCompany (the master table) and use it in the other two tables instead of the CompanyName column to reduce data redundancy. But my question is: accessing data from master-detail tables with join queries will take more processing time (looking up the company name from the company ID). On the other hand, memory-wise it is about the same to store the company ID (like 0012786) as the company name (like SomeCompany Ltd). So should I go for normalization, or simply store the company name in each table? Thanks.
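For reference, a sketch of the normalized layout being described (the table and column names follow the post; the column sizes and the IDENTITY surrogate key are assumptions):

CREATE TABLE tblCompany (
    Company_Id   int IDENTITY(1,1) PRIMARY KEY,
    CompanyName  varchar(100) NOT NULL,
    Address      varchar(200) NULL,
    PhoneNumber  varchar(30)  NULL
)

CREATE TABLE tblUsers (
    Company_Id int NOT NULL REFERENCES tblCompany (Company_Id),   -- replaces Company Name
    UserName   varchar(50) NOT NULL,
    Password   varchar(50) NOT NULL
)

CREATE TABLE tblDealing (
    Company_Id    int NOT NULL REFERENCES tblCompany (Company_Id),
    DealerName    varchar(100) NOT NULL,
    DealerAddress varchar(200) NULL
)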