SQL Server 2012 :: Best Way To Handle Like Percentage On Column Too Large For Index
Sep 18, 2015
We have a table of 100M rows, and up until now we were fine with a nonclustered index on a varchar(4000) column because we never went above the 900-byte limit (yes, it is a bad design). We now need to support international character sets, so the column was changed to nvarchar(4000), and we have data past the 900-byte limit.
The data is long and seems useless, but the business needs it, and they need to be able to search "where bigcolumn like 'test%'". With an index, even with a huge amount of data, it was 'fast'. Now, of course, without an index it is unusable. The wildcard is always at the end of the search. I made a full-text index on the column, and basic queries such as select * from ourtable where contains(bigcolumn, 'AReallyLongStringofTextHere') work fine unless there is a space in the data. We lose thousands of returned rows because of spaces in the data.
I have tried select * from ourtable where contains(bigcolumn, '"AReallyLongStringofTextHere that includes spaces"'), but not all of the data is returned. I get 112 rows with the contains statement. The table-scanning statement select * from ourtable where bigcolumn like 'AReallyLongStringofTextHere that includes spaces%' returns 1939 rows. I understand that the full-text index is breaking the long string up since it contains spaces. Is there a way to retain the entire string as one index entry, or is there a way to fix my query to return all of the rows?
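Since the wildcard is always at the end, one workaround (a hedged sketch, not from the original post) is to index a persisted prefix of the column that fits under the 900-byte key limit and keep a residual LIKE against the full column; the 450-character prefix length and the index name are assumptions:

ALTER TABLE ourtable
    ADD bigcolumn_prefix AS CAST(LEFT(bigcolumn, 450) AS nvarchar(450)) PERSISTED;

CREATE NONCLUSTERED INDEX IX_ourtable_bigcolumn_prefix
    ON ourtable (bigcolumn_prefix);

-- The trailing-wildcard LIKE on the prefix column is seekable; the second LIKE on
-- the full column keeps the result exact. For search strings longer than 450
-- characters the seek predicate would need LEFT(@search, 450) + '%' instead.
SELECT *
FROM ourtable
WHERE bigcolumn_prefix LIKE N'AReallyLongStringofTextHere that includes spaces%'
  AND bigcolumn LIKE N'AReallyLongStringofTextHere that includes spaces%';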
I need to create a clustered index (CI) on a very large SQL Server 2012 database table. This table has approximately 10 billion rows and is 500 GB in size. The job ran for about 20 hours and then failed with the error: "Out of disk space in tempdb". My tempdb size is 1.8 TB, but that is still not enough.
Here is my script:
CREATE CLUSTERED INDEX CI_IndexName
ON TableName (Column1, Column2)
WITH (MAXDOP = 4, ONLINE = ON, SORT_IN_TEMPDB = ON, DATA_COMPRESSION = PAGE)
ON sh_WeekDT(Day_DT)
GO
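One option worth trying (a hedged sketch, same index definition as above) is SORT_IN_TEMPDB = OFF, so the intermediate sort runs in the destination filegroup instead of tempdb; building offline as well removes the extra space an online build needs, if the downtime is acceptable:

-- Hedged variant: sort in the destination filegroup rather than tempdb, offline build.
CREATE CLUSTERED INDEX CI_IndexName
ON TableName (Column1, Column2)
WITH (MAXDOP = 4, ONLINE = OFF, SORT_IN_TEMPDB = OFF, DATA_COMPRESSION = PAGE)
ON sh_WeekDT(Day_DT);
GO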
I created a nonclustered index on a table, but my report was still taking too long, so I also created a nonclustered columnstore index on the same table. My question: if a table has both a rowstore and a columnstore index on the same columns, which index will the query optimizer use?
I have a table which gets updated with the usage figures every week. Is there any T-SQL which returns the week-over-week increase in usage, as a percentage, for all of the columns?
I have a table of people and their ID, the starting month (a fixed number of months, say 10 for this), the ending month, and the percent of work time (0-1 being 0-100%). If they have a % work of 0, I do not want to see anything. But if the % changes, say from .5 to .75, I would need the first and last month they were at .5, and the first and last month they were at .75 (see the sketch after the sample data below).
The Table:
/****** Object: Table [dbo].[TestProject] Script Date: 02.07.2014 10:15:08 ******/
IF OBJECT_ID('TempDB..#TestProject2','U') IS NOT NULL
    DROP TABLE [dbo].[#TestProject2]
GO
CREATE TABLE [dbo].[#TestProject2](
    ID INT IDENTITY(1,1) PRIMARY KEY CLUSTERED,
[Code] ....
The data:
--===== All Inserts into the IDENTITY column
SET IDENTITY_INSERT #TestProject2 ON

INSERT INTO #TestProject2 ("ID","PersonID","PercentLoad","MonthID")
SELECT 1,123456,0,1 UNION ALL
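A hedged sketch of a gaps-and-islands query against the sample table: it groups consecutive months in which a person keeps the same load, skips 0% rows, and returns the first and last month of each stretch (column names are taken from the posted table):

WITH Numbered AS (
    SELECT PersonID,
           PercentLoad,
           MonthID,
           -- rows in one unbroken run of the same load share the same grp value
           MonthID - ROW_NUMBER() OVER (PARTITION BY PersonID, PercentLoad
                                        ORDER BY MonthID) AS grp
    FROM #TestProject2
    WHERE PercentLoad > 0
)
SELECT PersonID,
       PercentLoad,
       MIN(MonthID) AS FirstMonth,
       MAX(MonthID) AS LastMonth
FROM Numbered
GROUP BY PersonID, PercentLoad, grp
ORDER BY PersonID, FirstMonth;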
Obviously Excel is the tool of choice for most people, but with its limited ability to leverage RAM (32-bit) and its limit of just over 1 million rows, what other choices do we have for viewing data?
My boss's boss created several OLAP universes, and they seem to be a lot faster than a regular relational database. This still doesn't get around the fact that the data can't be worked with unless you have a strong front end that can handle processing all those rows.
I'm looking at this article http://www.dotnetjunkies.com/Article/EA868776-D71E-448A-BC23-B64B871F967F.dcik and it seems like they are selecting the entire customers table into the temp table, correct?
We have a large table with many columns and many indexes. One poorly performing query is having to do a key lookup when the where clause includes a particular column with no covering index.
Are you generally better off adding a new index or adding the column to an existing index (as an included column)? Column: LAST_STATE_RESPONSE_CODE
The Query Processor estimates that implementing the following index could improve the query cost by 88.9332%.
*/
/*
USE [database name]
GO
CREATE NONCLUSTERED INDEX [<Name of Missing Index, sysname,>]
ON [dbo].[SERVICE_REQUEST] ([BUSINESS_PROCESS_STATUS],[LAST_STATE_RESPONSE_CODE],[CONCRETE_TYPE])
INCLUDE ([LIENHOLDER_PERFORMING_LIEN_FILING_ID],[MAKE],[YEAR],[MANUFACTURER_ID],[CLIENT_ID])
GO
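If an existing index already leads on BUSINESS_PROCESS_STATUS, a hedged alternative to creating another index is to rebuild that index with the extra key and included columns via DROP_EXISTING; the index name below is an assumption:

-- Hedged sketch: IX_SERVICE_REQUEST_STATUS is an assumed existing index name.
CREATE NONCLUSTERED INDEX IX_SERVICE_REQUEST_STATUS
ON [dbo].[SERVICE_REQUEST] ([BUSINESS_PROCESS_STATUS],[LAST_STATE_RESPONSE_CODE],[CONCRETE_TYPE])
INCLUDE ([LIENHOLDER_PERFORMING_LIEN_FILING_ID],[MAKE],[YEAR],[MANUFACTURER_ID],[CLIENT_ID])
WITH (DROP_EXISTING = ON, ONLINE = ON);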
I'm trying to get a calculation based on count(*) to format as a decimal value or percentage.
I keep getting 0s for the solution_rejected_percent column. How can I format this like 0.50 (for 50%)?
select mi.id,
       count(*) as cnt,
       count(*) + 1 as cntplusone,
       cast(count(*) / (count(*) + 1) as numeric(10,2)) as solution_rejected_percent
from metric_instance mi
INNER JOIN incident i on i.number = mi.id
WHERE mi.definition = 'Solution Rejected'
  AND i.state = 'Closed'
group by mi.id
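The zeros come from integer division: COUNT(*) returns an int, so count(*) / (count(*) + 1) truncates to 0 before the CAST is applied. A hedged fix is to convert one operand to decimal first:

select mi.id,
       count(*) as cnt,
       count(*) + 1 as cntplusone,
       -- multiplying by 1.0 forces decimal division before the cast
       cast(count(*) * 1.0 / (count(*) + 1) as numeric(10,2)) as solution_rejected_percent
from metric_instance mi
INNER JOIN incident i on i.number = mi.id
WHERE mi.definition = 'Solution Rejected'
  AND i.state = 'Closed'
group by mi.id;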
When we use partition switching to load data into a table, can we rebuild the indexes for a specific partition only, so that we don't need to rebuild them for the complete table? Is this possible?
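Yes, a partitioned index can be rebuilt one partition at a time; a hedged sketch with hypothetical table, index, and partition number:

-- Hedged sketch: rebuild only the partition that was just switched in.
ALTER INDEX IX_FactTable_Key
ON dbo.FactTable
REBUILD PARTITION = 5;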
Hi, I have a table called ActCodes which has 2 columns, Name and Description. I want to insert data into it from a table called PlanDBF, and this is my insert statement:

INSERT INTO Statements..AscActCodes
(
    Name,
    Description
)
SELECT
    ACT_CODE1, ACT_DESC1,
    ACT_CODE2, ACT_DESC2,
    ACT_CODE3, ACT_DESC3,
    ACT_CODE4, ACT_DESC4,
    ACT_CODE5, ACT_DESC5,
    ACT_CODE6, ACT_DESC6,
    ACT_CODE7, ACT_DESC7,
    ACT_CODE8, ACT_DESC8,
    ACT_CODE9, ACT_DESC9,
    ACT_CODE10, ACT_DESC10,
    ACT_CODE11, ACT_DESC11,
    ACT_CODE12, ACT_DESC12,
    ACT_CODE13, ACT_DESC13,
    ACT_CODE14, ACT_DESC14,
    ACT_CODE15, ACT_DESC15,
    ACT_CODE16, ACT_DESC16,
    ACT_CODE17, ACT_DESC17,
    ACT_CODE18, ACT_DESC18,
    ACT_CODE19, ACT_DESC19,
    ACT_CODE20, ACT_DESC20
FROM PlanDBF
each actcode and its description is gonna be different and so i am not sure how to do this 1 - many column insert.
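A hedged sketch of one way to do the 1-to-many column insert: CROSS APPLY with a VALUES constructor unpivots the 20 code/description column pairs into rows, so each pair becomes one Name/Description row:

INSERT INTO Statements..AscActCodes (Name, Description)
SELECT v.Code, v.Description
FROM PlanDBF p
CROSS APPLY (VALUES
    (p.ACT_CODE1,  p.ACT_DESC1),
    (p.ACT_CODE2,  p.ACT_DESC2),
    (p.ACT_CODE3,  p.ACT_DESC3),
    -- ... repeat for the remaining pairs ...
    (p.ACT_CODE20, p.ACT_DESC20)
) AS v (Code, Description)
WHERE v.Code IS NOT NULL;  -- skip empty pairs, if any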
We are designing a Staging layer to handle incremental load. I want to start with a simple scenario to design the staging.
In the source database there are two tables, e.g. tbl_Department and tbl_Employee. Both of these tables load a single table in the destination database, e.g. tbl_EmployeRecord.
The query which loads tbl_EmployeRecord is: SELECT EMPID, EMPNAME, DEPTNAME FROM tbl_Department D INNER JOIN tbl_Employee E ON D.DEPARTMENTID = E.DEPARTMENTID.
Now, we need to identify incremental load in tbl_Department, tbl_Employee and store it in staging and load only the incremental load to the destination.
The columns of the tables are,
tbl_Department : DEPARTMENTID,DEPTNAME
tbl_Employee : EMPID,EMPNAME,DEPARTMENTID
tbl_EmployeRecord : EMPID,EMPNAME,DEPTNAME
How do we design the staging for this to handle insert, update, and delete?
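A hedged sketch of the load step, assuming the staging layer lands the joined rows in a table such as stg_EmployeRecord (an assumed name): a MERGE keyed on EMPID applies inserts, updates, and deletes against the destination in one pass:

MERGE tbl_EmployeRecord AS tgt
USING stg_EmployeRecord AS src
    ON tgt.EMPID = src.EMPID
WHEN MATCHED AND (tgt.EMPNAME <> src.EMPNAME OR tgt.DEPTNAME <> src.DEPTNAME) THEN
    UPDATE SET tgt.EMPNAME = src.EMPNAME,
               tgt.DEPTNAME = src.DEPTNAME
WHEN NOT MATCHED BY TARGET THEN
    INSERT (EMPID, EMPNAME, DEPTNAME)
    VALUES (src.EMPID, src.EMPNAME, src.DEPTNAME)
WHEN NOT MATCHED BY SOURCE THEN
    DELETE;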
We have a typical issue with a columnstore index: we have a procedure which does 2 activities, Switch and Reverse Switch.
Switch: 1. Fetch the partitions that need to be switched 2. Switch the data from the Main table to the Switch table 3. Disable the columnstore index on the Switch table
SSIS package: 4. Load data into the Switch table (insert / update)
Reverse Switch: 5. Re-enable the columnstore index on the Switch table 6. Switch the data back from the Switch table to the Main table
Issue: Sometimes the columnstore index does not get disabled, and the package fails, complaining that the columnstore index should be disabled before loading data.
If we re-run the procedure, the columnstore index does get disabled.
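For reference, a hedged sketch of the disable and re-enable steps (index and table names are assumptions); on SQL Server 2012 a table with an active nonclustered columnstore index is read-only, so the index has to be disabled or dropped before the load and rebuilt afterwards:

ALTER INDEX NCCI_SwitchTable ON dbo.SwitchTable DISABLE;

-- ... SSIS load runs here ...

ALTER INDEX NCCI_SwitchTable ON dbo.SwitchTable REBUILD;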
We are planning to upgrade. We are using SQL Server 2008 R2 now. Which is the better option: migrating to SQL 2012 or migrating to 2014? I am thinking 2014 has memory-optimized tables and updatable columnstore indexes, so it seems the better option.
Hi! I recently was confronted with a problem where a piece of text that was included in many NTEXT column values in a table needed to be replaced with another piece of text. You can't issue normal REPLACE statements against NTEXT columns, so this seemed to be a bit of a challenge: issuing REPLACE() against a TEXT or NTEXT column in SQL Server yields this error:
Server: Msg 8116, Level 16, State 1, Line 1
Argument data type ntext is invalid for argument 1 of replace function.
For Example: I want to replace string <![CDATA[sp_YOTAssetAdditionalOffences 0, ArgParamHearingsId, ArgParamLanguage]]> with <![CDATA[sp_YOTAssetAdditionalOffences 0, ArgParamHearingsId, ArgParamLanguage, ArgParamReferralId]]> in NTEXT column values in a table.
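A hedged sketch of the usual workaround: cast the NTEXT value to nvarchar(max) so REPLACE works, then cast back when storing it (the table and column names are placeholders):

UPDATE dbo.SomeTable
SET NTextCol = CAST(
        REPLACE(
            CAST(NTextCol AS nvarchar(max)),
            N'<![CDATA[sp_YOTAssetAdditionalOffences 0, ArgParamHearingsId, ArgParamLanguage]]>',
            N'<![CDATA[sp_YOTAssetAdditionalOffences 0, ArgParamHearingsId, ArgParamLanguage, ArgParamReferralId]]>'
        ) AS ntext)
WHERE CAST(NTextCol AS nvarchar(max)) LIKE N'%ArgParamLanguage]]>%';  -- touch only rows still holding the old text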
I am creating a new SSIS package where there is a flat file (CSV) I am importing. Well, there will be an end user using a web UI to initiate the import of the file. However, this flat file is generated by another company and then uploaded to our network.
This flat file has the potential to have duplicate rows that would have already been imported at a previous date. With the constraints on the table, we have guaranteed that duplicates will not be added, but the SSIS package fails immediately upon attempting to insert a duplicate row, and the bulk insert is rolled back.
What I need to be able to do is load all of the rows that are not duplicates into the table. I am guessing that with my current approach, this is not possible. I am using a Data Flow task that converts the data in the flat file and then performs a bulk copy to load the data into the table.
How can I either ignore duplicate rows, or otherwise gracefully handle this data import?
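One hedged approach is to land the file in a staging table first (names below are placeholders) and then insert only the rows that are not already present, so duplicates are skipped instead of failing the load:

INSERT INTO dbo.TargetTable (BusinessKey, Col1, Col2)
SELECT s.BusinessKey, s.Col1, s.Col2
FROM dbo.StagingTable AS s
WHERE NOT EXISTS (
    SELECT 1
    FROM dbo.TargetTable AS t
    WHERE t.BusinessKey = s.BusinessKey   -- same key the unique constraint enforces
);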
We have a large OLAP database, about 2.5 TB spread out over 3 data files on three different drives, and recently someone ran a query that created a table that continued to grow until the data files filled the available disk space (about 3 TB total - 1 TB per drive).
Tonight I plan on running a full backup (it's in Simple mode) and running a ShrinkFile on all three files sequentially with TRUNCATEONLY just so it will remove the space after the last extent. Any way to tell ahead of time how much space this will recover?
Granted running a DB Shrink is one of those things you just don't do, but this is a one-time shot and unavoidable to get the file size back under control.
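As a rough, hedged estimate: TRUNCATEONLY only releases space that sits after the last allocated extent, so the free space per file is an upper bound on what it can return:

SELECT name,
       size / 128 AS size_mb,                                        -- size is in 8 KB pages
       FILEPROPERTY(name, 'SpaceUsed') / 128 AS used_mb,
       (size - FILEPROPERTY(name, 'SpaceUsed')) / 128 AS free_mb
FROM sys.database_files
WHERE type_desc = 'ROWS';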
I am attempting to do a rather simple purge task on a very large table. This task will need to take place daily and delete records older than 6 months out of the database. On first pass this will delete well over 130 million rows. I thought the best way to handle this is create a proc and call the proc from a SQL Agent Job that runs nightly. Here is an example of the script:
CREATE PROCEDURE usp_Purge_WCFLogger
AS
SET NOCOUNT ON

EXEC sp_rename 'dbo.logs', 'logs_work'
GO

SELECT *
INTO dbo.Logs_Backup
FROM dbo.Logs_Work
WHERE TIMESTAMP < DATEADD(month, -6, GETDATE())
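A hedged alternative to renaming and copying the table is to delete in fixed-size batches so each batch commits separately and the transaction log stays manageable (the batch size is an assumption; table and column names follow the post):

DECLARE @BatchSize int = 50000;
DECLARE @Cutoff datetime = DATEADD(month, -6, GETDATE());

WHILE 1 = 1
BEGIN
    DELETE TOP (@BatchSize)
    FROM dbo.logs
    WHERE [TIMESTAMP] < @Cutoff;

    IF @@ROWCOUNT < @BatchSize
        BREAK;

    WAITFOR DELAY '00:00:01';   -- brief pause between batches to reduce blocking
END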
I am fetching a large amount of data from Teradata into SQL Server using a linked server. I am getting the error below:
A transport-level error has occurred when receiving results from the server. (provider: TCP Provider, error: 0 - The semaphore timeout period has expired.)
I have a table with about 466 million rows. In this table there is an int column called WeeksToRetain as well as an EventDate column containing the date the row was inserted. I am trying to delete all the rows that should be deleted according to WeeksToRetain. For example, if the EventDate is 5/07/15 with a 1 in the WeeksToRetain column, the row should be removed by 5/14/15. I am not sure what days SQL considers the beginning and end of the week. However, the core issue I am having is the sheer mass of deletions I must do and the log growth.
So I am trying to do the delete in batches. More specifically, I want to load a temporary table with a million rows, then use the temporary table to load a sub temporary table with 100,000 rows and join this temporary table to the table I want to delete from, looping through 10 times to get 1 million. The Logging.EventLog table, which is the table I'm trying to purge, has a clustered index on EventDate (ASC). I would like to run this in a scheduled job with enough time between executions for log backups to run.
DECLARE @i int
DECLARE @RowCount int
DECLARE @NextBatchDate datetime

CREATE TABLE #BatchProcess
(
    EventDate datetime,
    ApplicationID int,
New monthly data is loaded, checked, and finally approved after 6 or 7 iterations. Because of these iterations, the monthly data set is added, then deleted, then added, then deleted a few times. Because the table is big, this process takes time. Any thoughts on how to make the delete/insert process faster? Keep in mind I cannot do much, because it is a production table and is being accessed by other users for other analysis.
Delete is done based on trx_date which is a year/month combo, like 201508.
The table has monthly sales by customer aggregated.
The table structure is:
CREATE TABLE [dbo].[Sales](
    [batch_key] [int] NOT NULL,
    [Company_key] [int] NOT NULL,
    [customer_key] [char](22) NOT NULL,
    [Trx_Date] [int] NOT NULL,
    [account] [nvarchar](35) NOT NULL,
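One hedged option, since the delete key is the year/month value in Trx_Date, is to partition dbo.Sales on Trx_Date so each monthly reload becomes a metadata-only partition switch instead of a big delete plus insert. The partition function name, staging table names, and constraints below are assumptions:

-- Hedged sketch: pf_SalesMonth is an assumed partition function on Trx_Date.
DECLARE @p int = $PARTITION.pf_SalesMonth(201508);   -- partition holding month 201508

-- 1. Switch the current month's rows out to an empty, identically structured table.
ALTER TABLE dbo.Sales SWITCH PARTITION @p TO dbo.Sales_Switch;
TRUNCATE TABLE dbo.Sales_Switch;

-- 2. Load the corrected month into dbo.Sales_Staging (same structure and filegroup,
--    plus a CHECK constraint Trx_Date = 201508), then switch it in.
ALTER TABLE dbo.Sales_Staging SWITCH TO dbo.Sales PARTITION @p;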
I am trying to build a query where I extract the sum of the scores for each MCC code and get its percentage of the sum of all the scores over the last 90 days.
select MCC, sum(score) as total
from scores
    (select Datediff(day, creationdate, getdate()) as Q
     from scores
     where Datediff(day, creationdate, getdate()) < 90)
group by MCC
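A hedged rewrite of that query: total per MCC over the last 90 days, with its share of the overall total computed through a windowed SUM over the grouped result:

SELECT MCC,
       SUM(score) AS total,
       100.0 * SUM(score) / SUM(SUM(score)) OVER () AS pct_of_all   -- percentage of all scores
FROM scores
WHERE creationdate >= DATEADD(day, -90, GETDATE())
GROUP BY MCC;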
In a Library Management database we have these tables
1) Document (DocNo, Doc_type, permalink, inDate)
2) Title (id, DocNo, Main_Title, Other_Title)
3) Author (id, Author_Name, Author_Family, Type -- like: main author, translator, ...)
4) Publisher (id, DocNo, Name, Publisedate, address)
5) Subject (id, DocNo, Subject)
6) Description (id, DocNo, ISBN, description) -- one document may have several ISBNs, etc.
In document table I have 500,000 records.
I want to search for a word in these tables. For example, I want to search for 'Computer'; this word may be in the subject, title, description, etc. How can I do this with the best performance?
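A hedged sketch, assuming full-text indexes are created on the searchable columns (catalog setup and the required unique key indexes are omitted): gather the matching document numbers from each table, then fetch the documents:

SELECT d.*
FROM Document AS d
WHERE d.DocNo IN (
    SELECT t.DocNo  FROM Title       AS t  WHERE CONTAINS((t.Main_Title, t.Other_Title), N'Computer')
    UNION
    SELECT s.DocNo  FROM Subject     AS s  WHERE CONTAINS(s.Subject, N'Computer')
    UNION
    SELECT de.DocNo FROM Description AS de WHERE CONTAINS(de.description, N'Computer')
);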
I have a source table in the staging database stg.fact and it needs to be merged into the warehouse table whs.Fact.
stg.fact is not a delta feed; it is basically an intra-day refresh.
Both tables have a last-updated date, so it's easy to see which rows have changed.
It will be new (insert) or changed (update) data that I am interested in, there are no deletions.
As there could be millions of rows that are inserts or updates, this needs to be efficient.
I expect whs.Fact to go to >150 million rows.
When I have done this before, I started with a T-SQL MERGE statement, and that was not performant once I got to this size.
My original option was to do this in SSIS with a lookup task that marks the inserts and updates and deals with them separately. However, when I set up the lookup transformation, the reference data set needed a package variable in the SQL command, and this does not seem possible with the lookup in 2012! I am currently looking at the Merge Join transformation, and at any clever basic T-SQL that could work, as this will need to be fast; that's where I think T-SQL may be the better route.
Both tables will have >100,000,000 rows. Both tables have the last-updated date. The tables are in different databases but on the same SQL instance. Each table holds 5 integer columns, one varchar, and one datetime.
Last time I used MERGE it was a wider table with lots of columns, so I don't know if this would be an option.
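A hedged sketch of the set-based alternative to MERGE, split into an update of changed rows and an insert of new ones, filtered on the last-updated date; the key and column names are placeholders:

-- Update rows whose staging copy is newer than the warehouse copy.
UPDATE w
SET    w.Col1 = s.Col1,
       w.Col2 = s.Col2,
       w.LastUpdated = s.LastUpdated
FROM whs.Fact AS w
JOIN stg.fact AS s
  ON s.FactKey = w.FactKey
WHERE s.LastUpdated > w.LastUpdated;

-- Insert rows that do not exist in the warehouse yet.
INSERT INTO whs.Fact (FactKey, Col1, Col2, LastUpdated)
SELECT s.FactKey, s.Col1, s.Col2, s.LastUpdated
FROM stg.fact AS s
WHERE NOT EXISTS (SELECT 1 FROM whs.Fact AS w WHERE w.FactKey = s.FactKey);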
I want to add a percent column to the RIGHT of the total, and also on the bottom row. I can't find any clear examples of how to do this. If I add a new column, it adds additional headers beneath my top row, or my columns appear to the LEFT of the data, not the right. Can someone please post some simple instructions that will make my simple matrix look like this:
Name Jan Feb Total %
John 5 6 11 60%
Mary 3 4 7 40%
Total 8 10 18 100%
% 40% 60% 100% 100%
I am so stuck on this I could pull my hair out.
Thanks!
Michael
p.s. I really hope the next version of SSRS has a simple "Sub-total %" option that you can enable just like the sub-total column.
I need to get this data into an SQL table in the following form so I can use it to further manipulate the data and update several other tables. I am thinking that UNPIVOT or CROSS APPLY might be the way to go, but am not sure how to code it.
Our system runs on a SQL Server 2012 database; it has a table (table_a) which has over 10M records. Our system has to receive a data file from the previous system daily, containing approximately 3M updated or new records for table_a. My job is to update table_a with the new data.
The initial solution is:
1 Create a table (table_b) whose structure is the same as table_a
2 Use BCP to import updated records into table_b
3 Remove outdated data from table_a: delete from table_a inner join table_b on table_a.key_fields = table_b.key_fields
4 Append updated or new data into table_a: insert into table_a select * from table_b
In testing, this solution turned out to be very inefficient; step 3 alone costs several hours. How can I improve it?
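A hedged sketch of a faster version of steps 3 and 4, assuming the key fields can be indexed on both tables (column names are placeholders): the joined DELETE needs the FROM ... JOIN form, and the join benefits greatly from indexes on the keys:

CREATE INDEX IX_table_b_key ON dbo.table_b (key_field);
CREATE INDEX IX_table_a_key ON dbo.table_a (key_field);

-- Step 3: remove the rows that are about to be replaced.
DELETE a
FROM dbo.table_a AS a
INNER JOIN dbo.table_b AS b
    ON b.key_field = a.key_field;

-- Step 4: append the updated and new rows.
INSERT INTO dbo.table_a
SELECT *
FROM dbo.table_b;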