I have a table with raw scientific test results in a single field, some of which are over 25 MB. I need to parse the field to find and aggregate selected values from it.
The table structure is:
CREATE TABLE [dbo].[Gxxx_Data](
[id] [uniqueidentifier] NOT NULL,
[Status] [nvarchar](50) NULL,
[GxxxItem_ID] [int] NULL,
[Stats_Data] [varbinary](max) NULL,
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]
[code]...
From this I need to parse and summarize the (assembler) opcodes (MOV, CMPi, SHR, etc.). I need to parse the large [Stats_Data] field to locate the target data. The internal result strings are delimited with CHAR(10); conservative counts are 64k to over 100k lines in each record. Is there a way to parse the individual lines into another (temp) table that could then be queried/regexed?
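A minimal sketch of one way this could look, assuming the binary column holds plain ASCII text and the instance is SQL Server 2016 or later so STRING_SPLIT is available (on older versions a numbers-table splitter would be needed); table and column names come from the post, the temp table name and the opcode aggregation are made up for illustration:

-- Split each Stats_Data blob into its CHAR(10)-delimited lines.
CREATE TABLE #StatsLines
(
    id   uniqueidentifier NOT NULL,
    Line varchar(max)     NULL
);

INSERT INTO #StatsLines (id, Line)
SELECT d.id, s.value
FROM dbo.Gxxx_Data AS d
CROSS APPLY STRING_SPLIT(CONVERT(varchar(max), d.Stats_Data), CHAR(10)) AS s;

-- The lines can then be filtered/aggregated, e.g. counting the first token of each line as the opcode.
SELECT LEFT(LTRIM(Line), CHARINDEX(' ', LTRIM(Line) + ' ') - 1) AS Opcode,
       COUNT(*) AS Cnt
FROM #StatsLines
GROUP BY LEFT(LTRIM(Line), CHARINDEX(' ', LTRIM(Line) + ' ') - 1);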
We serialize a custom object into a byte array (byte[1000]) and store it in a SQL Server 2005 table column as varbinary(1000). There are a lot of rows retrieved with each SqlDataReader from C# code: up to 3,456,000 rows at a time (that is, roughly 3.5 million rows).
I am trying to figure out what will be more efficient in terms of CPU usage and processing time. We have come up with quite a few approaches to solve this problem.
In order to try a few of them, I have to know how I can extract certain "pieces" of data from a varbinary value using T-SQL.
For example, out of those 1000 bytes, at any given moment we need only the first 250 bytes and the third 250 bytes. Total: 1000 -> [250-select][250-no-need][250-select][250-no-need]
One approach would be to get everything and parse it in C#: get the 1st and the 3rd chunks of data and discard the unneeded 2nd and 4th. This is WAY TOO BAD.
Another approach would be to delegate the "filtering" job to SQL Server so that SqlDataReader gets only what it needs.
I am going to try a SQL-CLR stored procedure, but when I compared performance of T-SQL vs. SQL-CLR stored procs a few weeks ago, I saw that the same job is done by T-SQL a bit faster AND (more importantly for us) with less CPU consumption than SQL-CLR.
So, my question is: how do I select certain "pieces" of varbinary column data using T-SQL?
In other words, instead of SELECT MyVarbinary1000 FROM MyTable how do I do this: SELECT <first 250 from MyVarbinary1000>, <third 250 from MyVarbinary1000> FROM MyTable ?
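For reference, SUBSTRING accepts binary types, so a sketch of that last query (using the column and table names from the post) could look like this:

-- Bytes 1-250 and bytes 501-750 of each row; SUBSTRING on varbinary returns varbinary.
SELECT SUBSTRING(MyVarbinary1000, 1,   250) AS Chunk1,
       SUBSTRING(MyVarbinary1000, 501, 250) AS Chunk3
FROM   MyTable;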
Hello, I have decided to use Linq for my current ASP.NET project and so far it has been good, but now I am implementing a system that will allow users to upload binary content such as pictures and videos. For ease of management and security, I have decided to store this content directly in the database. The performance hit is a minor concern because very few user-uploaded images/videos will be seen on any given page (usually just one). From the limited tutorials I have seen on the internet, Linq supports the SQL Server varbinary column through its System.Linq.Binary class. This class does not appear to support STREAMS and instead opts to load all of the contents into memory. This content can then be converted to an array of bytes, which can then be output to the browser via the response stream. This is not good. What if I am sending a video that is very large? Varbinary supports up to 2 GB. I can't have a 2 GB video sitting in memory. It makes a lot more sense to stream it via a small buffer. Obviously, I am going to limit the size of the content that users can upload, but the core problem remains. If I limit content size to 2 MB and I have 2 GB of memory on the server, then I can only serve 1000 users concurrently. In reality, that number would be much less because of other processes running on the server. Is there no way to stream data from a varbinary column with Linq using a small buffer of bytes? Do I need to implement some custom logic on my Linq classes? Since these classes are automatically generated, how would I do such a thing? Thanks.
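As an aside, one workaround (bypassing LINQ's Binary type entirely and hand-rolling that part of the data access) is to read the column in fixed-size chunks with SUBSTRING and write each chunk to the response stream. A rough sketch of the per-chunk read, with made-up table and column names:

-- Hypothetical table dbo.Uploads(UploadID int, Content varbinary(max)).
-- @Offset is 1-based; the caller loops, increasing @Offset by @ChunkSize,
-- until it has read DATALENGTH(Content) bytes in total.
DECLARE @UploadID  int = 42,
        @Offset    int = 1,
        @ChunkSize int = 65536;

SELECT SUBSTRING(Content, @Offset, @ChunkSize) AS Chunk,
       DATALENGTH(Content)                     AS TotalBytes
FROM   dbo.Uploads
WHERE  UploadID = @UploadID;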
What are some good strategic approaches to using freeform text fields for data that needs to be queried? We have a product whose tables we can't change, and I need to count on a "description" field for storing a value. Two, actually. I'm thinking of adopting this convention: InvoiceNumber@VendorAcronym. There'd be a lot of vendors.

Additional issue: sometimes these values would be referred to in the description field, and I'd need to distinguish them as referrals rather than as original recorded instances of the values. For that, I imagined either InvoiceNumber@@VendorAcronym, or InvoiceNumber&VendorAcronym, InvoiceNumber//VendorAcronym, etc. -- something like that.

I'm just wondering if there's a best practice for doing anything this stupid (hey, I'm stuck with this as our only option just now; hopefully it's only temporary). How to parse out whatever I end up implementing -- well, it needs to be tractable. Thoughts?

--Scott
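If the @-delimited convention is adopted, here is a minimal sketch of how it might be parsed back out, assuming the separator never appears inside either value (the sample value is invented):

-- Split 'InvoiceNumber@VendorAcronym' back into its two parts.
DECLARE @Description varchar(200) = '12345@ACME';

SELECT LEFT(@Description, CHARINDEX('@', @Description) - 1)            AS InvoiceNumber,
       SUBSTRING(@Description, CHARINDEX('@', @Description) + 1, 8000) AS VendorAcronym
WHERE  CHARINDEX('@', @Description) > 0;   -- skip values that don't follow the convention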
I have just started to look at SQL and have a theory question that I could apply to a test I want to run. I have some legacy data from a previous project, and the database was not designed properly (in my opinion). They have ONE field to capture City and State information. All the data is formatted as "City, State".
Does SQL have commands that can look at data in a field, strip out info before and info after a comma and then write that to other fields?
So, I would like to normalize this: take the data in a field called CityState, parse it, trim it, and then populate two new fields, 1) City and 2) State.
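A sketch using CHARINDEX, LEFT/SUBSTRING and LTRIM/RTRIM, against the column names in the post and a hypothetical table name, assuming every row really does contain exactly one comma:

-- Populate City and State from 'City, State' stored in CityState.
UPDATE dbo.Locations   -- hypothetical table name
SET City  = RTRIM(LEFT(CityState, CHARINDEX(',', CityState) - 1)),
    State = LTRIM(SUBSTRING(CityState, CHARINDEX(',', CityState) + 1, 8000))
WHERE CHARINDEX(',', CityState) > 0;   -- only rows that actually have a comma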
I have a very large database running in a production environment. The backup files are getting to be very difficult to manage. They are at 70 GB as of now and growing at a rate of 10 GB per month. This is due to document images being stored in the database in one table. I have read a little about a new feature in SQL Server 2012 called FileTable. If we migrate to SQL Server 2012 and convert to the FileTable structure, will this decrease the size of the backups? Would backup of the document images then be taken care of by our nightly file system backups?
I'm trying to transfer a table from a SQL Server 7 database to another SQL Server 7 database on another server. This table has a text field with lots of data (~0.5-1 GB). I'm using the export wizard, and the transfer appears to complete successfully, but when I view it, the text field data has been truncated.
I've got a number of stored procedures that I use for reporting.
All are of a similar starting format
For easier maintenance and to take away the need to change all of them if the methodology changes I want to split out shared code.
What I want to do is take the part that populates the @ID1 table out into a separate stored proc, which will be called from the report procs. The values from the shared proc will then be passed back to the reporting proc.
I thought about using a function, but I don't think it will be flexible enough, as in certain cases I want to pass 2 or more IDs back into the final output.
I also don't want to make the code too complex so that it is relatively easy to read
CREATE PROC dbo.ReportM1
    @ID INT
AS
DECLARE @ID1 TABLE (ID INT PRIMARY KEY, UNIQUE(ID))

IF @ID = 0
    INSERT INTO @ID1
[Code] ....
The first question I have is: can I do this with a table variable when going between procs, or do I need to build a real table if I want to maintain the logic in one place?
It may be worth bearing in mind that the end user who will be executing the proc will only have read + execute stored proc permissions, so dropping, updating, or creating real tables is not an option. #Temp tables are possible, but since I am using table variables throughout I would prefer to stick with them.
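One pattern that fits the read + execute constraint is to have the shared proc SELECT the IDs as a result set and let each report proc capture them with INSERT ... EXEC into its own table variable. A rough sketch, in which the shared proc name and the source of the IDs are made up; only the @ID1 population logic is meant to be shared:

-- Shared proc: returns the set of IDs as a result set.
CREATE PROC dbo.GetReportIDs
    @ID INT
AS
SET NOCOUNT ON;

IF @ID = 0
    SELECT ID FROM dbo.SomeSource;   -- whatever currently populates @ID1
ELSE
    SELECT @ID AS ID;
GO

-- Report proc: captures the shared proc's output into its table variable.
CREATE PROC dbo.ReportM1
    @ID INT
AS
DECLARE @ID1 TABLE (ID INT PRIMARY KEY);

INSERT INTO @ID1 (ID)
EXEC dbo.GetReportIDs @ID = @ID;

-- ... the rest of the report logic joins to @ID1 as before ...
GO

INSERT ... EXEC into a table variable is supported, so the callers can keep using table variables and no real tables need to be created or dropped.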
I have an application that stores XML data in an unusual manner: basically a SQL key column and an XML string. The XML string is not really standard XML, but it is what it is, and I'm stuck with it. It is in this format:
<row key="Value.01" xml:space="preserve"><c1>FirstName</c1><c2>LastName</c2><c3>10 Street Address, City ST 012345-1234</c3><c4>5</c4><c5>50</c5><c6>500</c6></row>
I am able to pull values out via:

SELECT p.value('(./c1)[1]', 'VARCHAR(8000)') AS c1,
       p.value('(./c2)[1]', 'VARCHAR(8000)') AS c2
FROM dbo.UserXMLTable
CROSS APPLY XMLRECORD.nodes('/row') t(p)
WHERE p.value('(./c1)[1]', 'VARCHAR(8000)') LIKE 'First%'
However, I've been struggling with selecting rows with a LIKE clause. Something like:
SELECT * FROM dbo.F_UserXMLTable where XMLRECORD.value('(./c1)[1]', 'VARCHAR(8000)') like 'First%'
I have tried a number of permutations of the XML syntax but so far have been stumped.
Please note that "<row key="Value.01" xml:space="preserve">" has a space (<SP>) in the name 'row key'.
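For the record, a sketch of the LIKE filter without the CROSS APPLY: the likely issue is that c1 is a child of row, not of the document root, so the path './c1' evaluated against the whole XML column finds nothing; the full path from the root would be needed (names are from the post, the diagnosis is an assumption):

-- Filter rows whose first c1 value starts with 'First'.
SELECT *
FROM   dbo.F_UserXMLTable
WHERE  XMLRECORD.value('(/row/c1)[1]', 'VARCHAR(8000)') LIKE 'First%';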
I have an idea to use the LIKE operator (with wildcards) to match large (>10 KB) text fields of the 'text' and 'ntext' types. Are there known problems here?
I want to log errors to a table. If the error is with a URL, I want to store the URL. These URLs can be very large, hundreds of characters, but I only need to store it if it causes the error, which should be very infrequent. Which is the better design:
1. Create a large varchar field in the log table to hold the URL, or null if the error wasn't with the URL.
2. Create a foreign key field in the log table to a second URL table, which has a unique ID and a large varchar, and only create a record in this table if the error is with the URL.
One concern I have with design 2 is that there could be many other fields that are infrequently populated. Do I create a separate table for every one?
I'm trying to parse out a line of data that is separated by the text "atc1.", "atc2." etc.
For example,
[atc1.123/atc2.456/atc3.789/atc4.xyz/]
If I only want the data after "atc2.", then I could search the string for "atc2." and collect all the characters after it. But how can I trim off everything from "atc3." onward, so that I'm only collecting "456" from the example above?
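A sketch of one way to carve out just the piece between "atc2." and the next "/" delimiter, using CHARINDEX with a start position; the sample string is taken from the post:

DECLARE @s varchar(200) = 'atc1.123/atc2.456/atc3.789/atc4.xyz/';

DECLARE @start int = CHARINDEX('atc2.', @s) + LEN('atc2.');   -- first char after 'atc2.'
DECLARE @end   int = CHARINDEX('/', @s, @start);              -- first delimiter after that

SELECT SUBSTRING(@s, @start, @end - @start) AS Atc2Value;     -- returns '456'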
For large databases, is it a good idea to create indexes on fields that are used in WHERE clauses? Does that improve performance and reduce overhead?
I have designed a CV database with the complete CV stored in a TEXT field. There is a keyword search which also queries the TEXT field. The query conditions are defined in T-SQL submitted through an ASP page. There are about 20,000 records now. While running the keyword search against the database I am receiving time-out errors. Is there any solution other than Index Server to rectify this situation? How can I speed up the query execution time? Please advise.
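One avenue worth noting is SQL Server's built-in full-text search (a separate feature from Index Server), which avoids scanning the TEXT column with LIKE. A rough sketch, with made-up catalog, table, index and column names, assuming SQL Server 2005 or later and a unique index on the table to key the full-text index on:

-- One-time setup (names are hypothetical).
CREATE FULLTEXT CATALOG CvCatalog;

CREATE FULLTEXT INDEX ON dbo.CV (CvText)
    KEY INDEX PK_CV          -- an existing unique index on the table
    ON CvCatalog;

-- Keyword query: CONTAINS uses the full-text index instead of scanning the column.
SELECT CvID
FROM   dbo.CV
WHERE  CONTAINS(CvText, '"project manager"');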
The SQL CAT team's recommendation is to avoid partitioning dimension tables: URL..... I have inherited a dimension table that has almost 3 billion rows and is 1 TB, and I have been asked to look at partitioning it, putting maintenance in place, etc. I'm not a DW expert, so I was wondering: what are the reasons not to partition dimensions?
I'm preparing a checklist for myself before getting ready to migrate from 2005 to 2012. Our largest database is a nice one at over 250GB. I'm thinking my best bet to minimize any downtime would be to Restore the DB (NORECOVERY) on the new server and keep rolling it forward with the transactional logs. Eventually I'll need to bring the old DB offline and do one last backup and apply that one to the new server but that should be a small time frame given the whole process could take several hours.
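A sketch of that restore sequence on the new server; the database name and file paths are placeholders:

-- Initial full restore, left in a restoring state.
RESTORE DATABASE BigDB
FROM DISK = N'\\backupshare\BigDB_full.bak'
WITH NORECOVERY, STATS = 5;

-- Keep applying log backups as they are taken on the old server.
RESTORE LOG BigDB
FROM DISK = N'\\backupshare\BigDB_log_001.trn'
WITH NORECOVERY;

-- Cutover: take the final tail-log backup on the old server, apply it, then recover.
RESTORE LOG BigDB
FROM DISK = N'\\backupshare\BigDB_log_tail.trn'
WITH RECOVERY;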
We have a large OLAP database, about 2.5 TB spread out over 3 data files on three different drives, and recently someone ran a query that created a table that continued to grow until the data files filled the available disk space (about 3 TB total - 1 TB per drive).
Tonight I plan on running a full backup (it's in Simple mode) and running a ShrinkFile on all three files sequentially with TRUNCATEONLY just so it will remove the space after the last extent. Any way to tell ahead of time how much space this will recover?
Granted running a DB Shrink is one of those things you just don't do, but this is a one-time shot and unavoidable to get the file size back under control.
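As a rough gauge beforehand, the free space currently inside each file gives an upper bound on what TRUNCATEONLY could release, since it only trims space past the last used extent; a quick check from the database in question:

-- Free space per file in the current database (size is reported in 8 KB pages).
SELECT name,
       size / 128                                                      AS SizeMB,
       size / 128 - CAST(FILEPROPERTY(name, 'SpaceUsed') AS int) / 128 AS FreeMB
FROM   sys.database_files;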
I am attempting to do a rather simple purge task on a very large table. This task will need to take place daily and delete records older than 6 months from the database. On the first pass this will delete well over 130 million rows. I thought the best way to handle this is to create a proc and call the proc from a SQL Agent job that runs nightly. Here is an example of the script:
CREATE PROCEDURE usp_Purge_WCFLogger
AS
SET NOCOUNT ON
EXEC sp_rename 'dbo.logs', 'logs_work'
GO

SELECT *
INTO dbo.Logs_Backup
FROM dbo.Logs_Work
WHERE TIMESTAMP < DATEADD(month, -6, GETDATE())
I need to create a clustered index (CI) on a very large SQL Server 2012 database table. This table has approximately 10 billion rows and is 500 GB in size. The job ran for about 20 hours and then failed with the error: "Out of disk space in tempdb". My tempdb size is 1.8 TB, yet it's still not enough.
Here is my script:
CREATE CLUSTERED INDEX CI_IndexName
ON TableName (Column1, Column2)
WITH (MAXDOP = 4, ONLINE = ON, SORT_IN_TEMPDB = ON, DATA_COMPRESSION = PAGE)
ON sh_WeekDT(Day_DT)
GO
I have a very large table that I need to partition. Ideally the table will write to three filegroups. I have defined the Partition function and scheme as follows.
CREATE PARTITION FUNCTION vm_Visits_PM (datetime)
AS RANGE RIGHT FOR VALUES ('2012-07-01', '2013-06-01')

CREATE PARTITION SCHEME vm_Visits_PS
AS PARTITION vm_Visits_PM
TO (vm_Visits_Data_Archive2, vm_Visits_Data_Archive, vm_Visits_Data)
This should create three partitions of the vm_Visits table. I am having a few issues; the first has to do with adding a new clustered index primary key to the existing table. The main issue here is that the Closed column is nullable (it is a datetime, by the way). So running the following makes SQL Server upset:
ALTER TABLE dbo.vm_Visits
ADD CONSTRAINT [PK_vm_Visits] PRIMARY KEY CLUSTERED (VisitID ASC, Closed)
ON [vm_Visits_PS](Closed)
I need to define a primary key on the VisitID column, but I need to include the Closed column in order to partition on it. I'm also not sure how I would move data between partitions on a monthly basis. Would I simply update the partition function, or do I have to use some sort of merge, split, or switch operation?
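For the monthly movement, the usual pattern is SPLIT to open a new boundary at the "live" end and SWITCH to move a whole old partition out to an archive table. A sketch against the names in the post; the archive table (which would have to be identically structured and partitioned) and the boundary dates are assumptions:

-- Open a new monthly boundary at the current end of the function.
ALTER PARTITION SCHEME vm_Visits_PS NEXT USED [vm_Visits_Data];
ALTER PARTITION FUNCTION vm_Visits_PM() SPLIT RANGE ('2013-07-01');

-- Move the oldest partition into an identically partitioned archive table (metadata-only).
ALTER TABLE dbo.vm_Visits
SWITCH PARTITION 1 TO dbo.vm_Visits_Archive PARTITION 1;

-- Optionally remove the now-empty oldest boundary.
ALTER PARTITION FUNCTION vm_Visits_PM() MERGE RANGE ('2012-07-01');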
I need to create a script that will import large XML files (500 - 7 GB) on a daily basis and store the data in a relational DB structure.
What is the best and fastest way of importing such files? I have played around with smaller files and found the following (see the list below).
1. SSIS XML Data Source: it doesn't seem to like the complex element types and throws out the file.
2. Bulk file import, storing the file in an XML variable and using XQuery to parse it: this works, but it can't take a file more than 2 GB in size, so I can't use this method.
3. C# + XML serialization: this also works, but seems terribly slow. I open the DB connection once, so it doesn't open and close for each DB call, but it still seems to take a long time.
How can I import large XML files quickly into a relational table structure?
I am fetching a large amount of data from Teradata into SQL Server using a linked server. I am getting the error below:
A transport-level error has occurred when receiving results from the server. (provider: TCP Provider, error: 0 - The semaphore timeout period has expired.)
I have a table with about 466 million rows. In this table there is an int column called WeeksToRetain as well as an EventDate column containing the date the row was inserted. I am trying to delete all the rows that should be deleted according to WeeksToRetain. For example, if the EventDate is 5/07/15 with a 1 in the WeeksToRetain column, the row should be removed by 5/14/15. I am not sure what days SQL considers the beginning and end of the week. However, the core issue I am having is the sheer mass of deletions I must do and the resulting log growth.
So I am trying to do the delete in batches. More specifically, I want to load a temporary table with a million rows, then use that temporary table to load a sub temporary table with 100,000 rows, join the sub table to the table I want to delete from, and loop through 10 times to cover the million. The Logging.EvenLog table, which is the table I'm trying to purge, has a clustered index on EventDate (ASC). I would like to run this in a scheduled job with enough time between executions for log backups to run.
DECLARE @i int
DECLARE @RowCount int
DECLARE @NextBatchDate datetime

CREATE TABLE #BatchProcess
(
    EventDate datetime,
    ApplicationID int,
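For comparison, a more minimal batching sketch that skips the staging tables and simply deletes in capped chunks, with a pause so log backups can keep up in a scheduled-job setting; the table name is spelled as in the post and the batch size and delay are arbitrary:

-- Delete expired rows in 100,000-row chunks so each transaction (and log use) stays small.
DECLARE @BatchSize int = 100000;

WHILE 1 = 1
BEGIN
    DELETE TOP (@BatchSize)
    FROM Logging.EvenLog
    WHERE DATEADD(WEEK, WeeksToRetain, EventDate) < GETDATE();

    IF @@ROWCOUNT < @BatchSize
        BREAK;

    WAITFOR DELAY '00:00:05';   -- breathing room between batches
END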
Using merge replication + web synchronization, I have a situation where, when there is a large amount of data changes to upload to the publisher, the merge agent creates a large request and sends it over. The publisher gets it and is able to work on it. After a few minutes it has finished, but (I assume) the connection has been dropped. On the subscriber's side, it appears that the merge agent is hung. The output looks something like this:
Upload request size is XXX bytes. Uploaded a total of 100 chunks. Uploaded a total of 200 chunks. Uploaded a total of 211 chunks.
The request message was sent to [URL] ....
Normally, when the publisher finishes working, the merge agent continues processing. But when it takes more than a few minutes (it seems to break at about 2 or 3 minutes), the merge agent will hang for as long as the InternetTimeout setting allows (currently 20 minutes) before finally failing and retrying.
But that's not right. The publisher was done and couldn't communicate back to the merge agent (presumably because the connection was dropped). As a result, the merge agent will try to re-enumerate changes, on top of giving the appearance that it's hung.
I've already fiddled with settings such as MaxUploadChanges, UploadGenerationsPerBatch, UploadReadChangesPerBatch, and UploadWriteChangesPerBatch. However, none of those settings actually ensures that the request message isn't too large. They have worked in breaking up the changes into separate batches (e.g. processing a single table rather than all tables), which results in more frequent updates and thus avoids the problem.
However, when a single table has several changes, it is still lumped into one large request, which then takes more than 2-3 minutes to process on the publisher's side, and thus I still end up with the same symptom of the merge agent hanging.
Is there anything else I could try to get the merge agent to keep its connection alive even while processing a large request?
New monthly data is being loaded, checked, and finally approved after 6 or 7 iterations. Because of this iteration, the monthly data set is added, then deleted, then added, then deleted a few times. Because the table is big, this process takes time. Any thoughts on how to make the delete/insert process faster? Keep in mind I cannot do much, because it is a production table and is being accessed by other users for other analysis.
The delete is done based on Trx_Date, which is a year/month combo, like 201508.
The table has monthly sales aggregated by customer.
The table structure is:
CREATE TABLE [dbo].[Sales](
    [batch_key] [int] NOT NULL,
    [Company_key] [int] NOT NULL,
    [customer_key] [char](22) NOT NULL,
    [Trx_Date] [int] NOT NULL,
    [account] [nvarchar](35) NOT NULL,
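One small thing that sometimes helps on a live table is deleting the month in capped batches rather than in one big transaction, so concurrent readers are blocked for shorter stretches. A sketch using the post's Trx_Date convention; the month value and batch size are placeholders:

-- Remove one month's load (e.g. 201508) in small chunks to keep locks short-lived.
WHILE 1 = 1
BEGIN
    DELETE TOP (50000)
    FROM dbo.Sales
    WHERE Trx_Date = 201508;

    IF @@ROWCOUNT = 0
        BREAK;
END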
In a library management database we have these tables:
1) Document (DocNo, Doc_type, permalink, inDate)
2) Title (id, DocNo, Main_Title, Other_Title)
3) Author (id, Author_Name, Author_Family, Type -- like: main author, translator, ...)
4) Publisher (id, DocNo, Name, Publisedate, address)
5) Subject (id, DocNo, Subject)
6) Description (id, DocNo, ISBN, description) -- one document may have some ISBNs, etc.
In the Document table I have 500,000 records.
I want to search for a word in these tables. For example, I want to search for 'Computer'; this word may be in the subject, the title, the description, etc. How can I do this with the best performance?
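Without full-text indexing, one straightforward shape for the query is a UNION of per-table searches that all resolve back to DocNo (a LIKE with a leading wildcard will scan, so full-text indexes on these columns would be the next step if this proves too slow). Table and column names are taken from the list above; the dbo schema is an assumption:

-- Documents whose title, subject or description mention the search term.
DECLARE @term nvarchar(100) = N'Computer';

SELECT DocNo FROM dbo.Title       WHERE Main_Title  LIKE N'%' + @term + N'%'
                                     OR Other_Title LIKE N'%' + @term + N'%'
UNION
SELECT DocNo FROM dbo.Subject     WHERE Subject     LIKE N'%' + @term + N'%'
UNION
SELECT DocNo FROM dbo.Description WHERE description LIKE N'%' + @term + N'%';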
I have a source table in the staging database, stg.fact, and it needs to be merged into the warehouse table whs.Fact.
stg.fact is not a delta feed; it is basically an intra-day refresh.
Both tables have a last updated date, so it's easy to see which rows have changed.
It will be new (insert) or changed (update) data that I am interested in; there are no deletions.
As there could be millions of rows that are inserts or updates, this needs to be efficient.
I expect whs.Fact to go to >150 million rows.
When I have done this before, I started with a T-SQL MERGE statement, and that was not performant once I got to this size.
My original option was to do this in SSIS with a lookup task that marks the inserts and updates and deals with them separately. However, when I set up the lookup transformation, the reference data set needed a package variable in the SQL command, and this does not seem possible with the lookup in 2012! I am currently looking at the Merge Join transformation, and at any clever basic T-SQL that could work, as this will need to be fast; that's where I think T-SQL may be the better route.
Both tables will have >100,000,000 rows. Both tables have the last updated date. The tables are in different databases but on the same SQL instance. Each table holds 5 integer columns, one varchar, and one datetime.
The last time I used MERGE it was on a wider table with lots of columns, so I don't know if this would be an option.
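One plain T-SQL shape worth benchmarking instead of MERGE is a set-based UPDATE of changed rows followed by an INSERT of new ones, driven off the last-updated date. A sketch under stated assumptions: the business key (FactKey), the column list and the LastUpdated name are invented, and since the tables live in different databases, three-part names would be needed in practice:

-- Update rows that already exist in the warehouse but changed in staging.
UPDATE w
SET    w.Col1        = s.Col1,          -- ... remaining columns ...
       w.LastUpdated = s.LastUpdated
FROM   whs.Fact AS w
JOIN   stg.fact AS s
       ON s.FactKey = w.FactKey         -- assumed business key
WHERE  s.LastUpdated > w.LastUpdated;

-- Insert rows that are new to the warehouse.
INSERT INTO whs.Fact (FactKey, Col1, LastUpdated)   -- column list assumed
SELECT s.FactKey, s.Col1, s.LastUpdated
FROM   stg.fact AS s
WHERE  NOT EXISTS (SELECT 1 FROM whs.Fact AS w WHERE w.FactKey = s.FactKey);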