Hopefully this makes sense; I'm not sure what to even begin researching...
I'm trying to optimize all facets of this process, as it will take over the resources on my server if not done efficiently.
I have CSV files containing INTs that I need to upsert (match to an existing, earlier-imported array, or create a new record set) millions of times a day. To be clear, this data is a small subset of the actual import; the array's contents are not the main data of the process, and the array as a whole is meant to relate to higher-level tables.
The contents of the CSV array are 99.9+% repeating, meaning an array will very often share the exact same contents as a previously imported one. A rough guess: about 20k combinations exist, fewer than 1k new ones appear per month, and each array ranges from 6 cols x 15 rows to 6 cols x 50 rows.
So the current plan is to compute an MD5 hash during the (not SQL related) export process to identify the contents of this CSV file, and export only the MD5 (32 hex digits) as a lookup that identifies the contents. If the SQL import process finds a new (unknown) MD5, it will request the actual contents; otherwise it will simply use the MD5 as a key/id/code for the array contents that are already stored.
There's probably terminology for this type of thing that I'm not familiar with; I've never heard of anything like it. I realize collision is a threat, but I'm unsure how worried I should be with this type of data (similar sizes/contents, but a relatively small number of possibilities). I think even up to a 0.1% collision rate would be acceptable, which is probably far more headroom than I need.
Does this sound like a bad idea to anyone? Are there certain hash functions I should use for this type of thing? Anyone have suggestions of where to look next?
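Roughly what I'm picturing, as a hedged sketch (all table/column names here are hypothetical):

-- Hash-keyed lookup table: one row per distinct array contents.
CREATE TABLE dbo.ArrayHash (
    ArrayId int IDENTITY(1,1) PRIMARY KEY,
    Md5Hex  char(32) NOT NULL UNIQUE  -- MD5 of the CSV array contents
);

-- Import step: check the hash first; only ask the exporter for the
-- full contents when the hash is unknown.
DECLARE @md5 char(32);
SET @md5 = 'd41d8cd98f00b204e9800998ecf8427e';  -- value computed by the exporter

IF NOT EXISTS (SELECT 1 FROM dbo.ArrayHash WHERE Md5Hex = @md5)
BEGIN
    INSERT INTO dbo.ArrayHash (Md5Hex) VALUES (@md5);
    -- ...then request the actual array rows and store them keyed by the new ArrayId
END

SELECT ArrayId FROM dbo.ArrayHash WHERE Md5Hex = @md5;  -- key used by the higher-level tables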
I've got this little problem. I need to insert data from one table into another. The scenario looks as follows: I've got a 'Company' table (no duplicated records there) and a 'Contacts' table (one-to-many relation: for one company there can be more than one contact). The statement I'm using retrieves the data, but it shows me everything, including all contacts, and therefore I get duplicated values, e.g. the company name. Is there any way of changing the query so it works?
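For illustration, a hedged sketch of the kind of thing I'm after, using ROW_NUMBER (SQL 2005; all names hypothetical):

-- Keep only the first contact per company so the company name
-- appears exactly once in the inserted rows.
INSERT INTO dbo.Target (CompanyName, ContactName)
SELECT x.CompanyName, x.ContactName
FROM (
    SELECT co.CompanyName,
           ct.ContactName,
           ROW_NUMBER() OVER (PARTITION BY co.CompanyId
                              ORDER BY ct.ContactId) AS rn
    FROM dbo.Company  AS co
    JOIN dbo.Contacts AS ct ON ct.CompanyId = co.CompanyId
) AS x
WHERE x.rn = 1;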
A user can log in at the front end (web app), and the username and password are stored in HASHED FORMAT. I want to get that password back as original text from the database (SQL Server). Please help me out: is there any way to get the original text?
Pulled all my hair out now, can someone please offer some help before I go totally mad...
I have no page header; this is all in the body. I have a list with account number and job number in it at the top, called list1. Inside list1 I have list2, with the job details, basically a list of people who attended the job. The fields in list2 are textboxes; they're not in a table.
EG:

Acc No: 12345   Job No: 54321

Name
Fred Bloggs
Joe Bloggs
Etc
When my list2 of names goes over 1 page, I want to repeat the account number and job number on each subsequent page. In my real report I actually have 10 fields I want to repeat.
I then want to be able to export this report to a PDF.
I can't put the header in the page header and RepeatWith doesn't work when exporting to PDF, or at all depending on what mood it's in.
I have run into a strange issue that I believe is a SQL Reporting Services issue.
I have a report laid out in landscape setting that has 4 columns of text. Two of the columns are sub-reports (due to the complexity and size, we did not flatten out the data in the stored procedure) and two of the columns are regular fields.
The 2 columns of regular fields are smaller, and normally only grow to about 1/2 the height of the page. One of the two sub-reports contains large amounts of text, and at times grows larger than the height of the page.
When the sub-report grows larger than the current page, it correctly starts up on the next page. But the 2 fields of data from the main dataset (not the sub-report columns) repeat themselves on the next page as well.
What is even more strange is that the 2 fields of data from the main dataset only repeat enough data to grow vertically as far as the sub-report needs to grow. So if there is more data in either of these 2 fields than is needed to match the sub-report's growth on the 2nd page, the data in both of these fields gets cut off.
I have tried placing the information in a group header, turning "Repeat on new page" both True and False, taking away the table header and footer, forcing a page break after each group, and using the "Hide Duplicates" property on the field within the details section, and nothing has fixed the issue.
If anyone has run into this and found a work around, let me know.
Can someone tell me why my stored procedure is repeating the Name in the same column? Here's my stored procedure and output:

select distinct
    libraryrequest.loanrequestID,
    titles.title,
    requestors.fname + ' ' + requestors.lname as [Name],
    Cast(DATEPART(m, libraryrequest.requestDate) as Varchar(5)) + '/' +
    Cast(DATEPART(d, libraryrequest.requestDate) as Varchar(5)) + '/' +
    Cast(DATEPART(yy, libraryrequest.RequestDate) as Varchar(5)) as RequestDate,
    Cast(DATEPART(m, libraryrequest.shipdate) as Varchar(5)) + '/' +
    Cast(DATEPART(d, libraryrequest.shipdate) as Varchar(5)) + '/' +
    Cast(DATEPART(yy, libraryrequest.shipdate) as Varchar(5)) as ShipDate
from LibraryRequest
join requestors on requestors.requestorid = libraryrequest.requestorid
join titles on titles.titleid = requestors.titleid
where shipdate is not null

Output:

ID  Title             Name                       Request Date  Ship Date
29  Heads, You Win    Brenda Smith               1/18/2008     1/18/2008
35  Still More Games  Brenda Smith               1/22/2008     1/22/2008
51  The Key to..      Brenda Smith Brenda Smith  1/29/2008     1/29/2008
52  PASSION...        Brenda Smith Brenda Smith  1/29/2008     1/29/2008
53  LEADERSHIP        Brenda Smith Brenda Smith  1/29/2008     1/29/2008

Going crazy ugh...
I am querying several tables and piping the output to an Excel spreadsheet. Several (not all) columns contain repeating data that I'd prefer not to include on the output. I only want the first row in the set to have that data. Is there a way in the query to do this under SQL 2005?
As an example, my query results are as follows (sorry if it does not show correctly):

OWNER  BARN  ROUTE DESC       VEH      DIST      CASE
BAR    BAR   TRACKING #70328  VEH 328  32869.94  1393
BAR    BAR   TRACKING #70328  VEH 328  32869.94  1393
BAR    BAR   TRACKING #70328  VEH 328  32869.94  1393
DAX    DAX   TRACKING #9398   VEH 398  39834.94  2471
DAX    DAX   TRACKING #9398   VEH 398  39834.94  2471
DAX    DAX   TRACKING #9398   VEH 398  39834.94  2471
TAX    TAX   TRACKING #2407            40754.39  1002
TAX    TAX   TRACKING #2407            40754.39  1002
TAX    TAX   TRACKING #2407            40754.39  1002

I only want the output to be:

OWNER  BARN  ROUTE DESC       VEH      DIST      CASE
BAR    BAR   TRACKING #70328  VEH 328  32869.94  1393
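Something like this is what I'm after, if it's even possible; a hedged sketch using SQL 2005's ROW_NUMBER (table/column names hypothetical):

-- Blank out the repeating columns on all but the first row of each group.
SELECT CASE WHEN t.rn = 1 THEN t.OWNER      ELSE '' END AS OWNER,
       CASE WHEN t.rn = 1 THEN t.BARN       ELSE '' END AS BARN,
       CASE WHEN t.rn = 1 THEN t.ROUTE_DESC ELSE '' END AS ROUTE_DESC,
       t.VEH, t.DIST, t.CASE_NO
FROM (
    SELECT OWNER, BARN, ROUTE_DESC, VEH, DIST, CASE_NO,
           ROW_NUMBER() OVER (PARTITION BY OWNER, BARN, ROUTE_DESC
                              ORDER BY CASE_NO) AS rn
    FROM dbo.Routes
) AS t
ORDER BY t.OWNER, t.rn;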
I'm using RS2000 SP2 and am getting an issue when exporting to PDF. If I have a table that spans more than one page and I set RepeatHeaderOnNewPage to True, then on occasion the table header will be displayed on top of the first few rows of data. It does not happen on all the pages or all the time, and I cannot find any information on this issue. Has anyone come across this before and solved it?
Hi All, in SQL 2000 I need to export data in such a way that the data is encrypted. When anyone tries to import it into any database, it should authenticate the user and only then get decrypted. Is that possible?
Hi, I have a .NET application and I added code that encrypts data saved in the database. However, there is already data in the fields that was entered before this change. I now need to check whether the values in those fields are encrypted and, if not, encrypt them. How can I perform such a check and update the relevant data? I use TripleDES in .NET to encrypt/decrypt the data. Thanks
Does SQL Server 2000 provide any data encryption/decryption functionality, so that certain fields (e.g. SSN, Age and Salary) can be encrypted before being written into the table and decrypted when loaded back out of it?
We would like to secure data. Only a few people are authorized to read this information, but today it is readable with a simple query in, for example, Query Analyzer.
I'd like to encrypt the data in one field of a table with a reversible function.
Is there a function able to do this kind of work in SQL Server 7 or 2000?
I want to encrypt certain data like passwords, SSNs, credit card info etc. before saving it in the database. Also, this encrypted data should be queryable using standard SQL statements like:
select * from users where userid=454 and password = 'encrypted data'
The mechanism to encrypt data could be in a .NET application. The code that does encryption/decryption should also be protected so that it doesn't work if it falls into the wrong hands.
Can anyone suggest the best way to accomplish the above?
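For the password-equality part specifically, a hedged sketch of one option (needs SQL Server 2005's HASHBYTES; names hypothetical): store a one-way hash instead of reversible ciphertext, since randomized encryption would break the equality match above.

-- password_hash is a varbinary column holding HASHBYTES of the password;
-- the application hashes the entered password the same way before comparing.
SELECT *
FROM   users
WHERE  userid = 454
  AND  password_hash = HASHBYTES('SHA1', N'cleartext-from-the-app');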
We have migrated a CRM Database from SQLServer 2000 to SQLServer 2005.
The database contains very sensitive data about customers in text format (datatype varchar(20)). How can I encrypt it without any change to the table design?
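For example, would SQL Server 2005's built-in functions work here? A hedged sketch (names hypothetical); note that ENCRYPTBYPASSPHRASE/DECRYPTBYPASSPHRASE deal in varbinary, so the varchar(20) column type may still need to change:

-- Round trip with the SQL 2005 passphrase functions.
SELECT ENCRYPTBYPASSPHRASE('passphrase kept outside the DB', SensitiveText) AS CipherText,
       CONVERT(varchar(20),
               DECRYPTBYPASSPHRASE('passphrase kept outside the DB',
                   ENCRYPTBYPASSPHRASE('passphrase kept outside the DB', SensitiveText))) AS RoundTrip
FROM   dbo.Customer;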
Hi there,
1. I have a database and I want to encrypt my passwords before storing my records. Later on I will need to authenticate my users, so I will have to encrypt the string a user provides to compare it with the encrypted password in the database. My code is below; I don't know how to do it, please help.
2. One thing more: I am storing the IP addresses of my users as a varchar. Is there a better method? If yes, please help me.

try
{
    SqlConnection myConnection = new SqlConnection();
    myConnection.ConnectionString =
        ConfigurationManager.ConnectionStrings["projectConnectionString"].ConnectionString;
    SqlDataAdapter myAdapter = new SqlDataAdapter("SELECT * FROM User_Info", myConnection);
    SqlCommandBuilder builder = new SqlCommandBuilder(myAdapter);
    DataSet myDataset = new DataSet();
    myAdapter.Fill(myDataset, "User_Info");

    // Adding a new row in the User_Info table
    DataRow myRow = myDataset.Tables["User_Info"].NewRow();
    myRow["user_name"] = this.user_name.Text;
    myRow["password"] = this.password.Text; // should be encrypted; not known till now how to do it
    myRow["name"] = this.name.Text;
    myRow["ip_address"] = this.ip_address.Text;
    myDataset.Tables["User_Info"].Rows.Add(myRow);

    myAdapter.Update(myDataset, "User_Info");
    myConnection.Close();
    myConnection.Dispose();
}
catch (Exception ex)
{
    this.error.Text = "Error occurred in Creating User : " + ex.Message;
}
I have a new SQL 2005 (SP2) Reporting Services server to which I've just upgraded and deployed some SSRS 2000 reports.
I have a subreport that contains a matrix with two groups. The report data seems to be inexplicably repeating the data for the first row in the group for all rows in the group. Example:
ID1 ID2 DisplayData
1 1 A
1 2 B
1 3 C
2 1 A
2 2 B
2 3 C
Parent group is on ID1, child group is on ID2, report would show:
1 1 A
2 A
3 A
2 1 A
2 A
3 A
Is this a matrix bug in 2005 SP2, or do I need to do something differently? I can no longer pull a comparison version from an SSRS 2000 server to verify, but I believe it was working as expected before...
We did some "at scale" fuzzy lookup tests today and were rather disappointed with the performance. I'm wanting to know your experience so I can set my performance expectations appropriately.
We were doing a fuzzy lookup against a lookup table with 25 million rows. Each row has 11 columns used in the fuzzy lookup, each between 10-100 chars. We set CopyReferenceTable=0 and MatchIndexOptions=GenerateAndPersistNewIndex and WarmCaches=true. It took about 60 minutes to build that index table, during which, dtexec got up to 4.5GB memory usage. (Is there a way to tell what % of the index table got cached in memory? Memory kept rising as each "Finished building X% of fuzzy index" progress event scrolled by all the way up to 100% progress when it peaked at 4.5GB.) The MaxMemoryUsage setting we left blank so it would use as much as possible on this 64-bit box with 16GB of memory (but only about 4GB was available for SSIS).
After it got done building the index table, it started flowing data through the pipeline. We saw the first buffer of ~9,000 rows get passed from the source to the fuzzy lookup transform. Six hours later it had not finished doing the fuzzy lookup on that first buffer!!! Running Profiler showed us it was firing off lots of singleton SQL queries doing lookups, as expected. So it was making progress, just very, very slowly.
We had set MinSimilarity=0.45 and Exhaustive=False. Those seemed to be reasonable settings for smaller datasets.
Does that performance seem in line with expectations? Any thoughts on improving performance?
I'm working with an existing package that uses the fuzzy lookup transform. The package is currently working; however, I need to add some columns to the lookup columns from the reference table that is being used.
It seems that I am hitting a memory threshold of some sort, as when I add 3 or 4 columns, the package works, but when I add 5 columns, the fuzzy lookup transform fails pre-execute:
Pre-Execute
Taking a snapshot of the reference table
Taking a snapshot of the reference table
Building Fuzzy Match Index
component "Fuzzy Lookup Existing Member" (8351) failed the pre-execute phase and returned error code 0x8007007A.
These errors occur regardless of what columns I am attempting to add to the lookup list.
I have tried setting the MaxMemoryUsage custom property of the transform to 0, and to explicit values that should be much more than enough to hold the fuzzy match index (the reference table is only about 3000 rows, and the entire table is stored in less than 2MB of disk space).
Hi, this may be a basic question, but I can't get my mind straight and need some advice.
I've read there's limitations on using the Lookup Task.
What I want to do is simply check whether the data exists in a table; if it does, I want to update some columns (or delete the row and insert a new one), and if it doesn't exist, I want to insert a row.
That's all; easy as hell in SQL and C#/.NET.
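For reference, the SQL version I mean (all names hypothetical; @key/@col1/@col2 would be procedure parameters):

-- Update the row if the key exists, otherwise insert it.
IF EXISTS (SELECT 1 FROM dbo.Target WHERE BusinessKey = @key)
BEGIN
    UPDATE dbo.Target
    SET    Col1 = @col1, Col2 = @col2
    WHERE  BusinessKey = @key
END
ELSE
BEGIN
    INSERT INTO dbo.Target (BusinessKey, Col1, Col2)
    VALUES (@key, @col1, @col2)
END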
The thing is, I don't know what's the best way of doing it in SSIS.
My limited SSIS knowledge tells me a Script Component should be able to do the work. That is, send it a variable holding the values I want to verify in the database and, depending on whether they exist, do what's necessary.
So my question is: can I connect to the DB via the Script Component and run my custom selects there?
If so, how can I do it?
If not, what's the best way of doing this simple task?
Say I want to look up a value in another dataset, but there is a grouping that requires knowing the values at each level in order to get to the correct detail record. Can you still use the Lookup function with more than one field to compare against? For example:
Department
  \___ SalesPerson
         \___ Measure
I want to be able to add a new row at the Measure level but look up each field from another dataset. To do that I need both the Department AND SalesPerson values for the lookup, but I don't think the Lookup function will let us do that, will it?
I have a lookup transformation that retrieves a key for a certain column of values, in this case, a name. So, I go in to the lookup table with a name and come out with its key. I had it working and then I added new entries to the lookup table for a bunch of new names. Now, for some reason, I am not getting the matches for the new names. But I am still getting the matches for the names that existed before I added the new ones.
I'm wondering if the lookup transformation is using the old set of data and somehow not picking up the new names. Do I have to trigger something in the lookup transformation to let it know that the lookup table data has changed?
Here is the deal. I have a flat file with 2 fields:

ProdNum  FeatureCodes
1        A01, B22, F09
2        C13,C24,E05,G02,G09,J07,J09,M03,M17
3        J07,M01,M17,N02,N11,N13,X15

Then I have an Excel file like:

Code  Description
A01   Handicap features
B22   Smoke-Free
F09   Tinted Windows
C13   Picnic Area
C24   Extra Storage
J07   Tile Flooring
M01   Central Airconditioning

The result in the database needs to be like:

ProdNum  Features
1        Handicap features, Smoke-Free, Tinted Windows
2        Picnic Area, Extra Storage, .....
3        Tile Flooring, Central Airconditioning, .....
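Assuming both files are staged into tables first, a hedged T-SQL (2005) sketch of the split-and-reassemble step (all table/column names hypothetical):

-- Split the comma lists with the XML trick, join to the code
-- descriptions, then reassemble one string per product with FOR XML PATH.
WITH Split AS (
    SELECT p.ProdNum,
           LTRIM(x.c.value('.', 'varchar(10)')) AS Code
    FROM (SELECT ProdNum,
                 CAST('<v>' + REPLACE(FeatureCodes, ',', '</v><v>') + '</v>' AS xml) AS Codes
          FROM dbo.StagedProducts) AS p
    CROSS APPLY p.Codes.nodes('/v') AS x(c)
)
SELECT s.ProdNum,
       STUFF((SELECT ', ' + c.Description
              FROM Split AS s2
              JOIN dbo.StagedCodes AS c ON c.Code = s2.Code
              WHERE s2.ProdNum = s.ProdNum
              FOR XML PATH('')), 1, 2, '') AS Features
FROM Split AS s
GROUP BY s.ProdNum;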
(Just getting started with SSIS) I have a data flow task with source S1 and destination D. The mappings from S1 to D are straightforward, but additionally, I need to look up a value from source S2 one time and map that value to a column in D for every row of S1. Note that this lookup is based on a constant value independent of S1, so no column mappings are involved (the lookup is essentially "standalone").
I would expect I could do the lookup, assign the value to a data flow level variable, then reference that variable in a derived column transform or some such thing, but I'm having trouble figuring out how to do the lookup in the data flow task and assign it to a variable. Am I on the right track here, or...?
Hi, I'm transferring lots of rows (each row is a problem) from an OLTP database to a data warehouse. These problems (rows) are transferred to the fact table. A problem has attributes like status, logdate, solved_date (default NULL), and a sequence number. I also check every day for new problems in the source. Now the thing is: I can write an SQL query to incrementally add new rows by using the sequence number as a parameter. This keeps querying time down (because the problem table has about 2,000,000 rows).

The actual question: I also need to write a query to update the tables in the data warehouse, for example when the status of a problem changes. When a status changes from "unsolved" to "solved", the problem gets a date in the solved_date attribute. Now, how can I write a query that goes through those 2,000,000 rows fast? I can't use the sequence number here because it isn't known when a problem will get solved. Since I run a query every day, maybe I should look only at problems whose solved date was added on that day? I'm just a novice in SQL, so maybe you know ways to do fast lookups across many rows? Or are there Data Flow tasks I can use? I was thinking about the Slowly Changing Dimension transform, but that's for dimensions, right? :) So maybe there are other tasks that would work? I just need to be put back on track because I can't find a good way to do this.
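A hedged sketch of the kind of daily update I mean, assuming the source rows are reachable from one query (e.g. via a staging table; names hypothetical):

-- Only touch problems whose solved_date was set since the last load,
-- instead of scanning all 2,000,000 fact rows.
DECLARE @LastLoadDate datetime;
SET @LastLoadDate = DATEADD(day, -1, GETDATE());  -- or read from an audit table

UPDATE f
SET    f.status      = s.status,
       f.solved_date = s.solved_date
FROM   dw.FactProblem AS f
JOIN   stage.Problem  AS s ON s.sequence_number = f.sequence_number
WHERE  s.solved_date >= @LastLoadDate;  -- an index on solved_date keeps this cheap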
Actually this is in regard to an SCD Type 2 dimension. The scenario: I am moving a fact table from an old source, and the fact contains a DimensionA description value that I want to replace with the appropriate id from the dimension table. That dimension table is SCD Type 2 based on StartDate and EndDate, and the fact table doesn't contain a direct date value, only a TimeId. So to update the value in the fact table I have to join the time dimension and the other dimension table to replace the fact's description with the proper id.
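A hedged sketch of the join I'm describing (names hypothetical): the date from the time dimension has to fall inside the SCD2 row's StartDate/EndDate validity window.

-- Resolve the SCD Type 2 surrogate key for each fact row.
UPDATE f
SET    f.DimensionAId = d.DimensionAId
FROM   dbo.FactTable  AS f
JOIN   dbo.DimTime    AS t ON t.TimeId = f.TimeId
JOIN   dbo.DimensionA AS d ON d.Description = f.DimensionADesc
                          AND t.CalendarDate >= d.StartDate
                          AND (t.CalendarDate < d.EndDate OR d.EndDate IS NULL)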
I am doing a lookup that requires mapping 2 columns in the column mapping section. When I do this, I get the error "Row yielded no match during lookup". The SQL that I captured in SQL Profiler does find the record when I run it in Management Studio. I have already tried trimming everything, to no avail.
Why is this happening?
I tried enabling memory restrictions, but then my package hangs and I get a SQLDUMPER_ERRORLOG.log file with the following logged: