Question On Large Volume Of Training Dataset
May 10, 2007
Hi, all experts here,
Thanks a lot for your kind attention.
I have a question about training on a large volume of data. In this case, training takes a long while to complete; is there anything we can do to speed it up? I know we obviously can't split the training dataset into several smaller datasets, so what else can we do to improve this?
I hope my question is clear.
Thank you very much in advance for your advice and help. I am looking forward to hearing from you shortly.
With best regards,
Yours sincerely,
Sep 21, 2005
Hi, We need to use a free database for a project because of a tight budget. Is MSDE OK for handling a large volume of data and 70-80 users? My understanding is that MSDE is optimized for 5 concurrent users. Is MySQL better than MSDE? Thanks, Ben
Oct 1, 2015
I have a small number of rows in a dataset, Table 1. There is a CLOB on a large dataset, Table 2. They join on a PK. I would like to retrieve this CLOB and add it to the data flow for Table1. In short I want to emulate the following:
Table 1: Small table without CLOB, 10 rows.
Table 2: Large table with CLOB, 10,000,000 rows
select CLOB
from table2
where pk in (select pk from table1)
I want this to return the CLOBs for the small number of rows in Table 1. The PK is indexed obviously so it should be a fast look up.
Table 1 and Table 2 live on different Oracle databases. How do I perform this operation efficiently in SSIS? It seems the Lookup and Merge Join won't do this.
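One way to think about the operation described above, as a rough sketch rather than an SSIS package: read the handful of keys from the small table first, then pass them to the other database as bind parameters in an `IN` list, so that only the matching CLOB rows are fetched. The two in-memory SQLite connections below are stand-ins for the two Oracle databases, and the column name `clob_col` is hypothetical.

```python
import sqlite3

# Two separate connections stand in for the two Oracle databases (assumption).
small_db = sqlite3.connect(":memory:")
large_db = sqlite3.connect(":memory:")

small_db.execute("CREATE TABLE table1 (pk INTEGER PRIMARY KEY)")
small_db.executemany("INSERT INTO table1 (pk) VALUES (?)", [(3,), (7,)])

large_db.execute("CREATE TABLE table2 (pk INTEGER PRIMARY KEY, clob_col TEXT)")
large_db.executemany(
    "INSERT INTO table2 (pk, clob_col) VALUES (?, ?)",
    [(i, f"document {i}") for i in range(1, 11)],
)


def fetch_clobs(small_conn, large_conn):
    """Return {pk: clob} for only the keys present in the small table."""
    keys = [row[0] for row in small_conn.execute("SELECT pk FROM table1")]
    if not keys:
        return {}
    # One placeholder per key, so the values are bound rather than concatenated.
    placeholders = ", ".join("?" for _ in keys)
    query = f"SELECT pk, clob_col FROM table2 WHERE pk IN ({placeholders})"
    return dict(large_conn.execute(query, keys))


clobs = fetch_clobs(small_db, large_db)
```

With only about 10 keys, the large table is probed through its primary key index instead of being scanned or sorted for a join.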
Mar 2, 2008
Hi All,
Is there a way the fuzzy lookup or grouping can be trained so that similarities and confidence values rely on previously matched strong links?
For example: I can link 80% of my two datasets using one strong identifier (say phone #) which I trust. My goal, then, is to use the probability of matching on the rest of my linking fields (say Name, Address, Gender, DOB) in a "matched by phone number" pair to train a fuzzy lookup task to be run on the unlinked 20% of the datasets.
This "training set" would in theory influence the similarity and confidence values of the fuzzy output since each linking column would carry a different weight or contribution towards a confident match.
Does anyone out there know how to do this in practice in SSIS?
Jun 15, 2007
Can I ask how to split the dataset into training and validation sets when running a decision tree model?
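For illustration only, here is a minimal sketch of a random training/validation split outside of any particular mining tool. The function name `split_dataset`, the 30% validation share and the fixed seed are assumptions made for the example, not settings taken from the question.

```python
import random


def split_dataset(rows, validation_fraction=0.3, seed=42):
    """Shuffle a copy of the rows and split it into (training, validation)."""
    shuffled = list(rows)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    n_validation = round(len(shuffled) * validation_fraction)
    return shuffled[n_validation:], shuffled[:n_validation]


training, validation = split_dataset(range(100))
```

Every row lands in exactly one of the two sets, so the validation rows are never seen during training.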
Dec 16, 2007
Hi,
I have to transform about 60 million rows of data, and it runs so slowly that it never finishes in my testing. Should I process it chunk by chunk, or are there other techniques I can use? (I am using a Data Flow task.) Thanks for any advice.
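To make the "chunk by chunk" idea concrete, here is a small sketch of batching an arbitrary row source so that only one chunk is held in memory at a time. It is not SSIS code; `in_chunks` and `transform` are hypothetical names, and the transformation itself is a placeholder.

```python
from itertools import islice


def in_chunks(rows, chunk_size):
    """Yield successive lists of at most chunk_size rows from any iterable."""
    iterator = iter(rows)
    while True:
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            return
        yield chunk


def transform(row):
    # Placeholder transformation (assumption): double the value.
    return row * 2


processed = 0
for chunk in in_chunks(range(10), chunk_size=4):
    results = [transform(row) for row in chunk]
    # In a real job each chunk would be written out and committed here.
    processed += len(results)
```

Committing per chunk also means a failed run can resume from the last completed chunk instead of starting over.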
Jun 19, 2007
Hi, all experts here,
I am wondering whether there is any way to select only a portion of a dataset to train the mining model. In this case, I mean that we don't need to split the dataset in advance; what I want is to be able to select a random portion of a selected dataset to train a mining model. Any advice?
I am looking forward to hearing from you, and thanks a lot in advance for your advice and help.
With best regards,
Yours sincerely,
Apr 17, 2015
LOCALID - POSTCODE - GPCODE
PTO1395164 - DN34 1AB - G9999981
PTO1395164 - DN34 1AB - G9999981
PTO1395164 - DN34 1AB - G8909058
PTO1395164 - DN34 1AB - G8909058
PTO1395164 - DN34 1AB - G8909058
PTO1395164 - DN34 1AB - G8909058
PTO1395164 - DN34 1AB - G8909058
PTO1395164 - DN34 1AB - G8909058
PTO1395164 - DN34 1AB - G8909058
PTO1395164 - DN34 1AB - G8909058
PTO1395164 - TZ14 2AX - G8909058
PTO1395164 - TZ14 2AX - G8909058
The sample data above shows one customer with multiple episodes (different attend dates, not important here). During the course of these attendances they moved home and moved GP practice.
Is there a simple way in Access to show a summary of this, e.g. PTO1395164 = 2 postcodes, 2 GPs?
The ultimate aim would be to identify where a customer has changed postcode or GP within a selected timeframe and disregard the rest.
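The summary asked for above amounts to counting distinct postcodes and GP codes per customer. The sketch below shows that with SQLite driven from Python, using the sample rows; the table name `episodes` is hypothetical. As far as I know, Access SQL does not accept `COUNT(DISTINCT ...)` directly, so there the distinct values would have to be selected in a subquery first and then counted.

```python
import sqlite3

# The twelve sample rows from the question.
rows = (
    [("PTO1395164", "DN34 1AB", "G9999981")] * 2
    + [("PTO1395164", "DN34 1AB", "G8909058")] * 8
    + [("PTO1395164", "TZ14 2AX", "G8909058")] * 2
)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE episodes (localid TEXT, postcode TEXT, gpcode TEXT)")
conn.executemany("INSERT INTO episodes VALUES (?, ?, ?)", rows)

# Keep only customers who had more than one postcode or more than one GP.
summary = conn.execute(
    """
    SELECT localid,
           COUNT(DISTINCT postcode) AS postcodes,
           COUNT(DISTINCT gpcode)   AS gps
    FROM episodes
    GROUP BY localid
    HAVING COUNT(DISTINCT postcode) > 1 OR COUNT(DISTINCT gpcode) > 1
    """
).fetchall()
```

Restricting to a timeframe would mean adding a `WHERE` clause on the attend date, which is not part of the sample data shown.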
Feb 27, 2006
Hi,
I'm running an application on a server which grabs data from a database table on another server using SqlConnection, SqlDataAdapter and DataSet.
The application then updates every row in that DataSet's DataTable and the updates are saved back using DataAdapter. The code is pretty much straightforward code that you would find on MSDN documentation for using DataSets. The table contains a little over a million rows.
When I run the application, I get an error saying the Server Application is not available. Upon looking into the application event log, I get this message.
aspnet_wp.exe was recycled because memory consumption exceeded the 306 MB (60 percent of available RAM)
How do I get round this? I thought DataSets were supposed to handle large datatables comfortably without having memory issues.
-Thanks
Jun 2, 2015
I have a well-structured but also very large binary data set that is generated by a C++ application every five minutes. The data needs to be accessed by SQL applications. Since data is generated every five minutes, performance is key, both for write and read. The data set is about 500 MB. If data is written to the file system, the write performance doesn't involve SQL Server. For reading it, I have a CLR routine to read the portions of the data that I need based on offset and length. That works and is very fast. The problem is that data is stored in the file system, so it is not self-contained within the database.
A second option that I haven't explored yet is to write the data into a table as VARBINARY(MAX). I would read the data using SUBSTRING with the appropriate offset and length. I would like to know what write/read performance to expect from SQL Server for binary data of this size, and whether there is a third option I haven't thought of. I'm using SQL Server 2014.
Jun 7, 2006
I have a dataset of between 40K and 50K records that has to go through a pre-defined process. SSIS works just fine with the smaller sets, even up to 20K, but this job keeps blowing up with an error along the lines of "cannot write to recordset destination". Does this make sense to anyone? The server has 2 processors and 2 GB of RAM. Physical memory usage spikes to about 1.6 GB during the run, but the processor never really gets above 30% usage. Does this product just not scale yet?
May 26, 2015
I have a report with multiple datasets, the first of which pulls in data based on user entered parameters (sales date range and property use codes). Dataset1 pulls property id's and other sales data from a table (2014_COST) based on the user's parameters. I have set up another table (AUDITS) that I would like to use in dataset6. This table has 3 columns (Property ID's, Sales Price and Sales Date). I would like for dataset6 to pull the Property ID's that are NOT contained in the results from dataset1. In other words, I'd like the results of dataset6 to show me the property id's that are contained in the AUDITS table but which are not being pulled into dataset1. Both tables are in the same database.
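The requirement above is an anti-join: rows in AUDITS with no matching row in the parameter-filtered 2014_COST result. Here is a minimal sketch using SQLite from Python. The column names (`property_id`, `sales_date`, `use_code`) and parameter names are assumptions, since the real schema is not shown; in a report, the same `NOT EXISTS` pattern would go into the dataset6 query and reuse the report's parameters.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE "2014_COST" (property_id INTEGER, sales_date TEXT, use_code TEXT)'
)
conn.execute(
    "CREATE TABLE AUDITS (property_id INTEGER, sales_price REAL, sales_date TEXT)"
)
conn.executemany(
    'INSERT INTO "2014_COST" VALUES (?, ?, ?)',
    [(1, "2014-02-01", "R1"), (2, "2014-03-15", "R1"), (3, "2014-09-30", "C2")],
)
conn.executemany(
    "INSERT INTO AUDITS VALUES (?, ?, ?)",
    [(1, 100000.0, "2014-02-01"), (3, 250000.0, "2014-09-30"), (4, 80000.0, "2014-05-05")],
)

# Stand-ins for the user-entered report parameters.
params = {"start": "2014-01-01", "end": "2014-06-30", "use_code": "R1"}

# The inner query repeats dataset1's filters; NOT EXISTS keeps the audits it misses.
missing = conn.execute(
    """
    SELECT a.property_id
    FROM AUDITS AS a
    WHERE NOT EXISTS (
        SELECT 1
        FROM "2014_COST" AS c
        WHERE c.property_id = a.property_id
          AND c.sales_date BETWEEN :start AND :end
          AND c.use_code = :use_code
    )
    ORDER BY a.property_id
    """,
    params,
).fetchall()
```

Property 3 is reported even though it exists in 2014_COST, because it falls outside the chosen date range and use code and so is not in dataset1's results.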
Jan 24, 2000
We have a 4-processor 350 MHz NT 4.0 SQL Server machine. Currently we have an application
that is inserting rows one at a time; each row insert is a separate transaction.
Currently we are averaging 2,500 rows a second, with each row 56 bytes wide.
The data and the log are on one string of RAID disks. We plan to get another controller
and RAID string to separate the data and the log onto separate controllers.
The developer is modifying the application to insert the data in blocks. What is the
impact on the transaction log? He seems to think that by inserting page blocks of
rows there would be less data going into the transaction log. Why would this be so?
Does anyone have any information on practical limits for inserts and log truncation
with similar machine configurations? He would like to try to get around 150,000 rows a second.
Has anyone accomplished inserts at this rate? With what type of machine configuration?
Feb 28, 2008
I have a summary table with a 9-field composite primary key. Every 10 minutes, my system generates 2 files of 500,000 to 750,000 rows to be summarized into this table. I first bulk insert those into a temp table, then run an inner-join update query to do the updates, followed by a left outer join to do the inserts. As the day goes on and the summary table grows to millions of rows, this process becomes too slow. Any ideas about causes/solutions?
RLiss
Sep 11, 2006
Hi Guys
I have not been able to solve this problem for quite a while now.
I am using SQL Server 2005.
I have a table which contains these columns: start date, end date and volume.
If the month in the start date is the same as that of the end date, the volume remains the same. If the months in the two dates are different, I have to distribute the volume so that part of it goes to the first month and the rest to the other month; that is, I have to calculate (or prorate) the volume according to the number of days in each month.
I have to perform a query on this table so that I can group the volumes by month and year.
I hope I have made this clear.
Thanks
Mita
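The proration rule described above can be sketched outside the database as follows. This is only an illustration of the arithmetic, assuming that both the start date and the end date count as covered days; the function name `prorate_by_month` is hypothetical.

```python
from collections import Counter
from datetime import date, timedelta


def prorate_by_month(start, end, volume):
    """Split volume across (year, month) buckets in proportion to days covered.

    Assumes both the start and end dates are inclusive.
    """
    total_days = (end - start).days + 1
    days_per_month = Counter()
    for offset in range(total_days):
        day = start + timedelta(days=offset)
        days_per_month[(day.year, day.month)] += 1
    return {
        month: volume * days / total_days
        for month, days in days_per_month.items()
    }


# 30 Jan to 2 Feb covers two days in each month, so the volume splits evenly.
shares = prorate_by_month(date(2006, 1, 30), date(2006, 2, 2), 40.0)
```

Summing these per-row shares by `(year, month)` across all rows then gives the grouped monthly totals the question asks for.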
May 21, 2007
I found out the data I need for my SQL Report is already defined in a dynamic dataset on another web service. Is there a way to use web services to call another web service to get the dataset I need to generate a report? Examples would help if you have any, thanks for looking
Oct 12, 2007
Is there any way to display this information in the report?
Thanks
Jul 17, 2007
I just noticed that, although my server has 2 physical volumes, my log files and DB are on the same one. How do I move the log files to the other volume? It's SQL Server 2000 running on Windows 2000 Server. As a side note: why does the database's Properties display in EM allow definition of multiple log files? Thank you!
Jul 20, 2005
Hello! Does anybody know whether MSSQL 2000 and EMC MirrorView are _certified_ for joint work? (MirrorView is an FC-based remote mirroring solution.) I mean, is it supported from the MS point of view to put MSSQL data files on EMC MirrorView volumes? For example, Oracle Corp. has the "Oracle Compatible Remote Mirroring Technologies" certification. But what about MS?
Apr 24, 2007
We have an application that was built and tested using SQL Server Express. One of our clients is deploying it using SQL Server Standard and plans to put the data files and log files on separate disk volumes.
In allocating the available disks to the volumes, they are looking for a recommendation on how big the log file volume should be relative to the data file volume. Over time there will be several years' worth of data in the data files. I assume the log files need to be at least big enough to log all the changes between backups. Are there any general rules of thumb? Or whitepapers that discuss the trade-offs?
Thanks in advance...
Jan 29, 2007
Hi All
I would like to know whether SQL Server can be installed on a raw volume or not. Is there a best practice guide for this?
Regards,
Vijay
Jun 16, 2008
I have a question on how to sum data by a certain date range. Here is the data I'm looking at. I have volume measured usually (but not always) every day. I want to sum the volume from the 2nd of the month to the first of the next month. I want to do this for every month. I have the columns of my data listed below. Can anyone help me with this? I've been trying to read up on it, but I'm not finding anything.
Entity Date Measured Volume
1 4/01/2008 5
1 4/02/2008 4
1 4/03/2008 6
1 4/04/2008 5
1 4/08/2008 7
1 4/12/2008 8
1 4/13/2008 5
1 4/14/2008 7
1 4/25/2008 8
1 4/30/2008 9
1 5/01/2008 6
1 5/02/2008 8
Thanks in advance for any help!
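One common trick for a "2nd of the month to the 1st of the next month" window is to shift every date back by one day and then group by the shifted year and month. Below is a sketch of that idea using SQLite from Python with the sample data; the table name `measurements` is hypothetical, and in SQL Server the shift would be written with that product's own date functions rather than SQLite's `date()`.

```python
import sqlite3
from datetime import datetime

# Sample rows from the question: (entity, date measured, volume).
readings = [
    (1, "4/01/2008", 5), (1, "4/02/2008", 4), (1, "4/03/2008", 6),
    (1, "4/04/2008", 5), (1, "4/08/2008", 7), (1, "4/12/2008", 8),
    (1, "4/13/2008", 5), (1, "4/14/2008", 7), (1, "4/25/2008", 8),
    (1, "4/30/2008", 9), (1, "5/01/2008", 6), (1, "5/02/2008", 8),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE measurements (entity INTEGER, measured TEXT, volume INTEGER)"
)
# Store dates in ISO format so SQLite's date functions can work with them.
conn.executemany(
    "INSERT INTO measurements VALUES (?, ?, ?)",
    [(e, datetime.strptime(d, "%m/%d/%Y").date().isoformat(), v)
     for e, d, v in readings],
)

# Shifting back one day maps the 2nd..1st window onto a calendar month.
totals = conn.execute(
    """
    SELECT entity,
           strftime('%Y-%m', date(measured, '-1 day')) AS period,
           SUM(volume) AS total_volume
    FROM measurements
    GROUP BY entity, period
    ORDER BY entity, period
    """
).fetchall()
```

Here the period labelled `2008-04` covers 2 April through 1 May, so the 1 April reading falls into the previous period and the 2 May reading into the next one.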
Apr 18, 2007
Hi Guys,
I am facing problems with concurrent access in SQL Server 2000. The scenario is that the DB contains one huge de-normalized table containing 40 million records.
The application frequently queries this table to populate other derived tables, and the SQL queries take a long time to return results.
So if one query is executing, another user's query goes into a wait state. Please suggest how I can improve this.
Or do I need to upgrade to 2005?
Regards,
Prashant
Mar 19, 2008
Hello, I am a master's student preparing a seminar on high-volume DB performance problems. For example, if I have a table of 1,000,000 records and that number grows exponentially over time, what problems might I face with insertion, deletion and search in such a table? And what are the problems in processing such a DB in general?
May 31, 2007
Hi Good morning to all,
My day started with loading a huge volume of data, and my data flow task failed to do so.
My data flow has a flat file connected to an OLE DB target. This is a one-to-one mapping. My source file contains 50 lakh (5 million) records and is 500 MB in size.
I'm processing the data with all the default buffer settings. I have 4 CPUs in my server.
The system process DTSDebug.exe is utilizing more than 2 GB page size. Average CPU usage is 70%, with one of the CPUs hitting 100% utilization.
I'm very new to SSIS, so please give me some information on how to set my buffers. Is there a PDF on performance and tuning in SSIS?
Is there a bulk load transformation in SSIS for loading into DB2 UDB?
If so, how do I get it installed?
Thanks in advance,
Suresh N
Nov 15, 2007
I am in the process of choosing between either SQL Workgroup or Standard Edition. I see the differences in features on the comparison table, but do not see any references to the differing capabilities in handling transactions.
Are there any differences between Workgroup and Standard in terms of transaction/data handling capabilities? That is, can Standard handle X times more TPM than Workgroup?
If not, am I correct to assume that this is totally determined by hardware configuration (number of CPUs, processor speed, HD speed, RAM)?
If the data volume and transaction handling is solely determined by hardware configuration, and I know the number of transactions and amount of R/W per second, where would be a good reference to find out what kind of hardware configuration I need? (Ideally, once I know the hardware configuration, I guess I would be able to determine whether I need Workgroup or Standard.)
Thanks in advance,
benbry
Aug 22, 2007
We are creating an enterprise application for fuel, and I am fighting with my DBA about the proper way to store volume and currency in the database. We have 2 main arguments. The first argument is whether we should store costs in the database in dollars and convert in the presentation layer, or store the amount and currency in the database. We sell product from the US in dollars, but depending on the customer we may invoice in euros. Our second argument is the same, except with volume and UOM. We often purchase product by BBL but sell/transfer by gallon or ton.
Please tell us the best practice for our dilemma.
Apr 5, 2007
Not really a question. Just looking for people with experience with SB in a high-transaction environment that passes a lot of messages. What kind of challenges have you run into when processing the messages? I am currently writing an SB application for a large financial institution, and want to get some ideas of the challenges that I might face when volume gets really high (a couple of million transactions per day).
Thanks,
Tim
View 4 Replies
View Related
May 7, 2008
Hi,
I have a stored procedure, attached below. It returns 2 rows in SQL Server Management Studio when I execute MyStorProc 0,28. But in my program, which uses ADOHelper, it returns a dataset with tables.count=0.
If I comment out the line If @Status = 0, then it returns the rows. Apparently execution does not enter the
if @Status=0 branch even if I pass @status=0. What am I doing wrong?
Any help is appreciated.
ALTER PROCEDURE [dbo].[MyStorProc]
(
@Status smallint,
@RowCount int = NULL,
@FacilityId numeric(10,0) = NULL,
@QueueID numeric (10,0)= NULL,
@VendorId numeric(10, 0) = NULL
)
AS
SET NOCOUNT ON
SET CONCAT_NULL_YIELDS_NULL OFF
If @Status = 0
BEGIN
SELECT ......
END
If @Status = 1
BEGIN
SELECT......
END
Apr 18, 2006
Hi,
I have been asked to design a solution for a client of mine who basically requires the daily analysis and reconciliation of the differences between 2 extremely large text files.
The files are not in an identical format but are both in some form of delimited format (one is CSV, the other is a little more complex). For the sake of this question, let's assume that I can effectively import each file into an MS SQL table.
Each file will have in excess of 100,000 rows each day (new data for each day).
Whilst I know that MS SQL does easily have the capacity to store the data, is there a recommended way to tackle the potential problems (I imagine that performance is important... they will be running the report every day)
Or is building the solution as simple as importing the data into 2 tables, and then querying the differences and outputting as a report using Crystal?
Any suggestions appreciated.
Thanks
Rael
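To give a feel for the reconciliation step itself, here is a small sketch that loads two differently delimited files into dictionaries keyed on a record ID and reports the differences. The file layouts, the `id`/`amount` fields and the `load` helper are all assumptions made for the example; the real files would need their own parsing. At around 100,000 rows per file, this kind of in-memory comparison is usually cheap.

```python
import csv
import io

# In-memory stand-ins for the two daily files (assumed layouts).
file_a = io.StringIO("id,amount\n1,10.00\n2,20.00\n3,30.00\n")
file_b = io.StringIO("1|10.00\n2|25.00\n4|40.00\n")


def load(handle, delimiter, has_header):
    """Read a delimited file into a {record id: amount} dictionary."""
    reader = csv.reader(handle, delimiter=delimiter)
    if has_header:
        next(reader)  # skip the header row
    return {row[0]: row[1] for row in reader}


a = load(file_a, ",", True)
b = load(file_b, "|", False)

only_in_a = sorted(a.keys() - b.keys())
only_in_b = sorted(b.keys() - a.keys())
changed = sorted(key for key in a.keys() & b.keys() if a[key] != b[key])
```

The same three result sets (missing on one side, missing on the other, present in both but different) are what a pair of SQL tables compared with outer joins would produce for the report.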
May 6, 2008
Hi,
Is there any SQL Server function or stored procedure available to get the total size and free space of drives, volumes and mount points?
I don't want to use the CLR approach for this.
Can you please provide any pointers related to this?
Thanks in advance,
-Kiran
Dec 10, 2007
Hello everybody,
I am doing research on handling high-volume databases (perhaps databases of terabyte size). Are there any optimizations or specialized techniques for queries that deal with such databases?