Train And Test Data Sets
Feb 8, 2007
I've seen that it is sometimes better to split a table into a test dataset and a training dataset, and I'd appreciate it if anyone could explain why this is so.
thanks
Santiago Aceñolaza
Argentina
Hi, all experts here,
I am wondering whether there is any way to select only a portion of a data set to train a mining model. That is, we don't want to split the dataset in advance; I want to be able to select a random portion of a chosen dataset to train the model. Any advice?
I am looking forward to hearing from you, and thanks a lot in advance for your advice and help.
With best regards,
Yours sincerely,
Hi, in SQL Server Management Studio we just click 'Process' to train the model. How can I conditionally select a subset of the Data Source (View) to train on?
Thanks in advance.
Ricky.
Hi, all here,
Thank you very much for your kind attention.
I am wondering whether it is possible to use SSIS to sample a data set into a training set and a test set and feed them directly to my data mining models, without saving them somewhere, since that would take up too much space. I really need guidance on this.
Thank you very much in advance for any help.
With best regards,
Yours sincerely,
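One way to avoid storing the two sets is to decide each row's bucket on the fly from a hash of its key, so the same row always lands in the same set. The sketch below shows the idea in Python rather than in SSIS; the `customer_id` key, the row layout, and the 30% test share are assumptions for illustration only. Within SSIS itself, the Percentage Sampling transformation serves a similar purpose.

```python
import hashlib
from typing import Dict, Iterable, Iterator, Tuple


def assign_bucket(key: str, test_percent: int = 30) -> str:
    """Deterministically map a row key to 'train' or 'test'."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    # Take the first 8 hex digits as an integer and reduce it to a slot 0..99.
    slot = int(digest[:8], 16) % 100
    return "test" if slot < test_percent else "train"


def stream_split(
    rows: Iterable[Dict], key_field: str, test_percent: int = 30
) -> Iterator[Tuple[str, Dict]]:
    """Yield (bucket, row) pairs one at a time; no copy of the data is kept."""
    for row in rows:
        yield assign_bucket(str(row[key_field]), test_percent), row


# Hypothetical rows; in practice they would be read from the source query.
rows = ({"customer_id": i, "age": 20 + i % 50} for i in range(1000))
counts = {"train": 0, "test": 0}
for bucket, _row in stream_split(rows, "customer_id"):
    counts[bucket] += 1
print(counts)  # roughly a 70/30 split; exact counts depend on the hash
```

Because the assignment depends only on the key, rerunning the load reproduces the same split without persisting either set.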
I'm currently working on a BI architecture for a customer and am considering proposing the Power BI data catalog as a data distribution layer. The customer will use Power BI but also has other BI tools.
Are data sets in the data catalog available to clients other than Power Query? E.g. are there OData feed endpoints available? If not, what would be the best way to give other tools access to the data?
Could I ask how to split the data into training and validation sets when doing data mining?
Thanks
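A minimal sketch of a random split, written in Python for illustration and not tied to any particular mining tool. The function name, the 30% validation share, and the fixed seed are assumptions; the point is only that rows are shuffled once and then cut into two non-overlapping parts.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def train_validation_split(
    rows: Sequence[T], validation_fraction: float = 0.3, seed: int = 0
) -> Tuple[List[T], List[T]]:
    """Shuffle a copy of the rows and cut it into (training, validation)."""
    if not 0.0 < validation_fraction < 1.0:
        raise ValueError("validation_fraction must be between 0 and 1")
    shuffled = list(rows)
    # A seeded generator makes the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    n_validation = round(len(shuffled) * validation_fraction)
    return shuffled[n_validation:], shuffled[:n_validation]


training, validation = train_validation_split(list(range(10)), 0.3, seed=1)
print(len(training), len(validation))
```

The model is then trained only on `training`, and its predictions are checked against `validation`, which it has never seen.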
Hi, I was wondering how to join three data sets from different data flows into one txt file.
Let me explain a little more:
I have three data flows. Each of them connects to SQL Server and brings data into SSIS through a SQL command.
The SQL commands differ, so each data set has different columns (they don't share a format). The number of columns also differs between them.
What I need is to join the three data sets into one txt file. How can I do this? Is it possible to join data sets with different formats into one txt file?
Is this the best way to join different data? Is it better to use as many OLE DB Sources as needed instead of separate data flows?
Thanks for your help!
hi,
I recently installed SQL 2005. When I try to train my model, I get the error "Key not valid for use in specified state." Can anyone help me figure it out?
Thanks,
Karthik
Hi,
I'm currently trying to retrieve results from a large dataset. There are over 45,000 records, and I need to use them all to perform counts etc. I have set up views, but my page is still being returned slowly. Is there anything I can do to speed this up?
Thanks
Gemma
I am trying to query one table and get two different time periods of data. I am summing monthly totals to provide a running year total, but I also need last month's total in a separate column. This is what I have so far, but the subquery forces me to group by it, which produces duplicate grouping.
DECLARE @LASTPD AS INT
SET @LASTPD = (SELECT MAX(LASTPERIOD) FROM TABLE)
SELECT NAME,
POST_PD AS [MONTH],SUM(CHARGE_AMOUNT) AS MONTHLY_$,
LASTMONTH.LAST_MONTH,(SELECT SUM(CHARGE_AMOUNT) AS LAST_MONTH
FROM TABLE INNER JOIN TABLE2
ON TABLE2.NAME = TABLE.NAME
WHERE POST_PD = @LASTPD
AND TABLE2.NUM= 539
GROUP BY NAME) AS LASTMONTH
INTO #TEMP_SA
FROM TABLE
INNER JOIN TABLE2
ON TABLE2.NAME = TABLE.NAME,(SELECT SUM(CHARGE_AMOUNT) AS LAST_MONTH
FROM TABLE
WHERE TABLE2.NUM = 539
GROUP BY NAME, POST_PD
ORDER BY NAME, POST_PD
SELECT NAME,
LAST_MONTH,
CAST(SUM(MONTHLY_$)AS DECIMAL(20,2)) AS YEARLY_$
FROM #TEMP_SA
GROUP BY NAME
ORDER BY NAME
Hi All,
I would like to match two sets of data. I have set up a view that contains a group of customers and their details. I want to view this data but also find these customers in another table by matching on surname and date of birth, then retrieve the information stored on these customers in the second table.
Does anyone have any suggestions on how I would go about doing this?
Thanks in advance
Humate
quote:Originally posted by Michael Valentine Jones
It takes real skill to produce something good out of a giant mess.
I have the following situation. One set of data has 274 rows (set2) and another has 264 (set1). Both data sets are similar in structure as well as values, since both were extracted from the same parent table. I hope this info can substitute for DDL. I need to find the "gap" rows between these two sets. I attempted to run a query like
select count(*)
from set2
where not exists
(select *
from set1)
but it did not yield what I desired. What else should I try? TIA.
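The query above returns 0 whenever set1 holds any row at all, because the subquery never refers to the outer row. A sketch of the correlated form follows, run through Python's built-in `sqlite3` so it is self-contained. The `id` and `val` columns and the tiny sample rows are assumptions; the real comparison would use whichever columns identify a row in the parent table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE set1 (id INTEGER PRIMARY KEY, val TEXT)")
conn.execute("CREATE TABLE set2 (id INTEGER PRIMARY KEY, val TEXT)")
# Hypothetical data: set2 holds two rows that set1 lacks.
conn.executemany("INSERT INTO set1 VALUES (?, ?)", [(i, f"v{i}") for i in range(1, 5)])
conn.executemany("INSERT INTO set2 VALUES (?, ?)", [(i, f"v{i}") for i in range(1, 7)])

# Uncorrelated subquery: true or false for the whole table at once.
uncorrelated = conn.execute(
    "SELECT COUNT(*) FROM set2 WHERE NOT EXISTS (SELECT * FROM set1)"
).fetchone()[0]

# Correlated subquery: evaluated per outer row, so it finds the gap rows.
gap_rows = conn.execute(
    """
    SELECT s2.id, s2.val
    FROM set2 AS s2
    WHERE NOT EXISTS (SELECT 1 FROM set1 AS s1 WHERE s1.id = s2.id)
    ORDER BY s2.id
    """
).fetchall()

print(uncorrelated)  # 0
print(gap_rows)      # [(5, 'v5'), (6, 'v6')]
```

The same `WHERE s1.<key> = s2.<key>` condition inside the subquery carries over to T-SQL unchanged.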
Hi,
I have a matrix whose column group is filled from Dataset1.
Now I want to add another column group, but using Dataset2.
Can I use two different datasets in one matrix,
one for each column group?
Please help me in this regard.
Thanks
Dear all,
Why do I always fail to manually train a model in Management Studio?
ZhaoHui Tang recommended that I untrain the model first, so I untrained it like this.
DELETE FROM Decision_Tree.CONTENT
Then I train it like this.
INSERT INTO Decision_Tree
(Age, Bike_buyer, Customer_Id, Gender)
OPENQUERY ([AdventureWorksDW], 'SELECT Age, Bike_buyer, Customer_Id, Gender FROM Training_table');
The error message is
Error (Data mining): The mining structure , Decision_Tree is already trained and does not
support incremental updates. Before using the INSERT INTO statement, use DELETE FROM <object>.
Why is that so? I have already untrained the model. The model was created and processed
outside Management Studio, i.e., in Visual Studio.
Thank you,
Bernaridho
I have two tables: one with sales and another with payments against those sales. A payment may not match the exact amount of a sale, and I have to use the FIFO method to apply payments. The payment month must be >= the sales month.
How can I write a query to do this? Examples are below.
Table 1
Sales   Sale Date   Sale Amt
1       Jun-14      1200
2       Oct-14      2400
3       Dec-14      600
4       Feb-15      12000
Table 2
Pay Month   Pay Year   Pay Amount
5           2014       300
6           2014       1000
10          2014       500
11          2014       2000
12          2014       300
1           2015       900
create table tbl1
(
saleNo int
,saleDate date
,saleAmt float
)
insert into tbl1 (saleNo, saleDate, saleAmt)
[Code] ....
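The FIFO rule described above can be sketched procedurally before it is turned into a query. The Python below is only an illustration under stated assumptions: periods are (year, month) tuples, a payment is applied to the oldest sale that still has a balance and whose month is not later than the payment month, and any part of a payment that finds no eligible sale is counted as unapplied and not carried forward. The function name and data layout are hypothetical.

```python
from typing import Dict, List, Tuple

Period = Tuple[int, int]              # (year, month)
Sale = Tuple[int, Period, float]      # (sale_no, sale_period, amount)
Payment = Tuple[Period, float]        # (pay_period, amount)


def apply_payments_fifo(sales: List[Sale], payments: List[Payment]):
    """Apply each payment to the oldest eligible open sale first."""
    balances: Dict[int, float] = {no: amt for no, _, amt in sales}
    ordered_sales = sorted(sales, key=lambda s: (s[1], s[0]))
    allocations = []  # (sale_no, pay_period, amount_applied)
    unapplied = 0.0
    for pay_period, pay_amt in sorted(payments, key=lambda p: p[0]):
        remaining = pay_amt
        for sale_no, sale_period, _ in ordered_sales:
            if remaining <= 0:
                break
            # Skip sales dated after the payment and sales already settled.
            if sale_period > pay_period or balances[sale_no] <= 0:
                continue
            applied = min(remaining, balances[sale_no])
            balances[sale_no] -= applied
            remaining -= applied
            allocations.append((sale_no, pay_period, applied))
        unapplied += remaining  # assumption: leftovers are not carried forward
    return allocations, balances, unapplied


# Figures taken from the two tables in the post.
sales = [(1, (2014, 6), 1200), (2, (2014, 10), 2400),
         (3, (2014, 12), 600), (4, (2015, 2), 12000)]
payments = [((2014, 5), 300), ((2014, 6), 1000), ((2014, 10), 500),
            ((2014, 11), 2000), ((2014, 12), 300), ((2015, 1), 900)]
allocations, balances, unapplied = apply_payments_fifo(sales, payments)
print(balances, unapplied)
```

With these figures, sales 1 to 3 end up fully paid, sale 4 keeps its full 12000 balance, and 800 stays unapplied (300 from May 2014, which predates every sale, and 500 left over from January 2015). Whether leftovers should instead roll forward to later sales is a business decision the post does not settle.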
Is there a way to put more than one data set in a list?
I have a report that has three data sets with three tables. Now I want to show the report by region, one region per page, so you can view the same content for each region separately instead of all together. Is there a way to do this without going back into my code and finding a way to link everything together into one data set?
Hi,
I'm using a matrix report in which I want to use two datasets. How can I make the dataset dynamic for a single report?
Regards
Hi,
I'm trying to create a report.
The final report looks like this.
Total Loans/Lines (#): 13,283
Total Commitments ($ MM): $1,703
Total Outstandings ($ MM): $1,175

              A     B     C     D       F
Bankruptcy    0     $0    $0    0.00%   0.00%
Charge Off    0     $0    $0    0.00%   0.00%
The source table looks like this:
Bankruptcy                  0
Charge Off                  0
CLTV                        131
DSR                         102
Exc Total                   265
FICO                        7
Foreclosure/Repossession
Grand Total                 13283
Loan Amount                 32
Column D = A Bankruptcy (0) / Total Loans/Lines # (13283)
But the report does not let me use a report expression for this, because the two values are not in the same scope.
Can anyone tell me how to do this calculation? I tried to use a report expression, but it does not seem to work.
Thanks
Hi,
I have designed a contact manager with a Data Grid control bound to a Data Set.
When the application closes, data from the Data Set is written to an XML file, and when the application opens, data from the XML file is loaded into the Data Set and shown in the Data Grid control.
The number of contacts in my application can exceed 1,000.
So, is a Data Set capable of handling a lot of data efficiently in memory?
Please advise.
Hello,
I am using existing code, which I am trying to convert from using MS Access to SQL Server 2005...
The data set works fine with MS Access database, however when executing with SQL Server 2005 as data source, it generates the following error:
"..The data types ntext and nvarchar are incompatible in the equal to operator..."
in this line:
count = adapter.Update(dataset);
Not sure what should I look for since data sets are new to me.. Where should I check to fix this problem? I have noticed that the table has two columns with nvarchar...
I have two queries that generate two different datasets. One is a count of members, and the other is a count of admits. I need to generate a calculated field from the two data sets called admits per 1000, which is essentially count of admits / count of members * 12000. I was able to calculate admits per 1000 easily in Excel, but I need some insight on how to do it in SQL.
Below are my queries from the two datasets.
MemberMonths dataset:
Select
factMembership.BusinessUnitCode,
EffectiveCCYYMM,
ISNULL(count(Distinct MemberId),0) As MemberCount
From factMembership
[Code] ....
Admits dataset:
SELECT
Factadmissions.BusinessUnitCode,
factAdmissions.AdmitCCYYMM,
ISNULL(Count(AdmitNum),0)As [Count of Admits]
FROM factAdmissions
[Code] ...
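The calculation itself amounts to matching the two result sets on business unit and month and then applying the formula from the post. A Python sketch of that logic follows; it assumes each dataset has already been reduced to a dictionary keyed by (BusinessUnitCode, CCYYMM), and the function name and sample figures are hypothetical.

```python
from typing import Dict, Tuple

Key = Tuple[str, str]  # (BusinessUnitCode, CCYYMM)


def admits_per_1000(member_counts: Dict[Key, int],
                    admit_counts: Dict[Key, int]) -> Dict[Key, float]:
    """Compute admits / members * 12000 for every key with members."""
    result = {}
    for key, members in member_counts.items():
        if members == 0:
            continue  # avoid division by zero; such keys are simply skipped
        admits = admit_counts.get(key, 0)  # no admits recorded counts as 0
        # Algebraically the same as admits / members * 12000.
        result[key] = admits * 12000 / members
    return result


# Hypothetical figures for illustration only.
member_counts = {("BU1", "201401"): 2400, ("BU2", "201401"): 500}
admit_counts = {("BU1", "201401"): 12}
rates = admits_per_1000(member_counts, admit_counts)
print(rates)
```

In SQL the same shape would typically be a join of the two aggregated queries on business unit and month, with the division guarded against a zero member count.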
I have a situation where I have a transactional fact table which consists of date, row type, order number and value. A simple example is below.
Date, RowType, OrderNo, Value
01-May, New, A1, 100
01-Jun, Change, A1, -10
01-Jul, Invoiced, A1, -90
What I need is to select, for a given day, the total value of open orders. I have tried to do this in the database, but the result becomes fixed and quite cumbersome (this is a simplified example; in reality I have line information and line component information). I am not hugely skilled with MDX and SSAS, but I know there are some semi-additive functions. I want somebody to be able to pick a day and get the total value of only the open orders.
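The underlying logic is a running total per order up to the chosen day: an order is open as long as that total is not zero. The Python sketch below illustrates this with the sample rows; it is not MDX, the year 2024 is an assumption (the post gives only day and month), and the function name is hypothetical.

```python
from datetime import date
from typing import Dict, List, Tuple

Row = Tuple[date, str, str, float]  # (date, row_type, order_no, value)


def open_order_values(rows: List[Row], as_of: date) -> Dict[str, float]:
    """Sum values per order up to as_of and keep orders with a balance."""
    totals: Dict[str, float] = {}
    for row_date, _row_type, order_no, value in rows:
        if row_date <= as_of:
            totals[order_no] = totals.get(order_no, 0) + value
    return {order: total for order, total in totals.items() if total != 0}


# Sample rows from the post, with an assumed year.
rows = [
    (date(2024, 5, 1), "New", "A1", 100),
    (date(2024, 6, 1), "Change", "A1", -10),
    (date(2024, 7, 1), "Invoiced", "A1", -90),
]
mid_june = open_order_values(rows, date(2024, 6, 15))
print(mid_june, sum(mid_june.values()))  # {'A1': 90} 90
```

Since fully invoiced orders net to zero, the grand total of open orders on a day equals the sum of all values dated on or before that day, which is what a running-total measure over the date dimension would compute.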
I created a Data Collection in the wrong DB. How can I change the DB or return to the default (as it came with a clean version of SS)?
Hello and thanks in advance.
I was wondering if anyone has ever written a chart with multiple datasets.
I need to show sales dollars inflow by order date on one line and sales dollars delivered by delivery date on the other. So all sections of the chart (Values, Category Groups, and Series Groups) will come from two different datasets.
I have tried, but it will not allow aggregates in the series groups.
Any ideas would be greatly appreciated.
Thanks, Leo
I need to copy data from TableA to TableB (>5 million rows). The two are in the same database.
What is the best way of doing this?
I was thinking about using a simple INSERT INTO ... SELECT statement. Is there a faster way to do it with SSIS?
Thanks
Hi,
Our company is looking for good training for SQL Server 2005. The majority of attendees will be .NET developers, but some will be technicians who need training in backup, replication, maintenance, etc. All are pretty familiar with SQL Server and have experience with SQL 2000, so it should not be for beginners; we want intermediate and advanced topics.
Whom can you suggest? Do you have experience with them?
Thank you.
Victor
Can I make a calculated field by using two fields from different data sets? (I'm talking about SSRS data sets.)
I tried to do that, but I got an error message:
"Report item expressions can only refer to other report items within the same grouping scope or a containing grouping scope."
Can someone please help me out?
We are setting up a test lab environment with 100 machines. We want one master testing db that gets replicated to each machine to run scripted application tests nightly.
My goal is to minimize the amount of work needed to move this db to each of the 100 test machines. I am wondering whether we even need SQL Server local to each machine, or whether we should invest in a monster db server holding 100 restored copies of the db, with each test machine pointing to its own db on that server, or whether I should use db mirroring or something similar to get the master test db onto each of those machines instead.
Hello, my name is Narsiste.
I have a configuration problem with SQL Server 2005 Express. I do not know how to configure it so that databases hosted by SQL Server 2005 Express on machine A can be reached from machine B (both are on the same LAN).
I read in "Surface Area Configuration for Services and Connections - localhost" that "By default, the Express, Evaluation, and Developer editions of SQL Server 2005 allow only local connections." Since this is only the default, I assume a setting has to be changed so that TCP/IP access works with this edition of SQL Server.
Here is the message I get:
"An error has occurred while establishing a connection to the server. When connecting to SQL Server 2005, this failure may be caused by the fact that under the default settings SQL Server does not allow remote connections. (provider: SQL Network Interfaces, error: 26 - Error Locating Server/Instance Specified) (Microsoft SQL Server, Error: -1)".
If anybody has encountered this problem before, please let me know what solution you found. Thank you.
Hi there everyone. I have a stored procedure called "PagingTable" that I use for performing searches and specifying how many results to show per 'page' and which page I want to see. This allows me to do my paging on the server-side (the database tier) and only the results that actually get shown on the webpage fly across from my database server to my web server. The code might look something like this:
strSQL = "EXECUTE PagingTable " & _
"@ItemsPerPage = 10, " & _
"@CurrentPage = " & CStr(intCurrentPage) & ", " & _
"@TableName = 'Products', " & _
"@UniqueColumn = 'ItemNumber', " & _
"@Columns = 'ItemNumber, Description, ListPrice, QtyOnHand', " & _
"@WhereClause = '" & strSQLWhere & "'"
The problem is that the stored procedure actually returns two result sets. The first result set contains the total number of results found, the number of pages and the current page. The second result set contains the data to be shown (the columns specified). In 'classic' ASP I did it like this.
'Open the recordset
rsItems.Open strSQL, conn, 0, 1
'Get the values required for drawing the paging table
intCurrentPage = rsItems.Fields("CurrentPage").Value
intTotalPages = rsItems.Fields("TotalPages").Value
intTotalRows = rsItems.Fields("TotalRows").Value
'Advance to the next recordset
Set rsItems = rsItems.NextRecordset
I am now trying to do this in ASP.NET 2.0 using the datasource control and the repeater control. Any idea how I can accomplish two things:
A) Bind the repeater control to the second result set
B) Build a "pager" of some sort using the values from the first result set
A DB2 stored procedure returns two data sets when executed from SSMS using a linked server. Is there a simple way to save the two data sets in two different tables?
Hi all,
I am using SSAS 2005. The mining model works fine, but it crashes when I run 'Mining Model Predictions' against large data sets.
I ran it against 5,000,000 records and it went fine.
But exactly the same model failed for 5,100,000 records and beyond.
The message is 'Query Execution Failed', and then Visual Studio crashes.
Please let me know if anybody has had the same experience or knows the solution.
Thanks,
Vikas
Now that we have a good programming model in SSIS, the question is whether to write automated unit tests for your packages. Would that generally be a good idea?
Also, if the answer is yes, where can I find more information on how to accomplish that?