Is there a way to explicitly assign 'weights' or 'importance' factors to attributes and have them considered by the association rules and decision tree algorithms during training? I would like to do so without preprocessing the data (in any case, I can't think of a way to assign weights via preprocessing to Boolean attributes like 'smoker').
I had to change the key columns of a dimension attribute to fix an error. I did this in BIDS. The change was from a single key column to a composite key column. Now I am getting this error when I process the cube:
Measure group attribute key column x does not match source attribute ..
I looked at the cube's XMLA definition under measure groups and it still shows a single key column with inherited binding. However, BIDS does not give me any option to correct this. I have had to do this once before, and the only option seems to be removing the dimension from the cube and adding it back in. But that is very error prone, since I lose any specific settings at the cube dimension level, not to mention that aggregations no longer include the dimension, etc.
Not seeing an alternative, I went through each measure group (I have 7), changed the key columns manually in the XMLA, and saved the cube. This worked, but I don't understand why BIDS doesn't do it automatically.
Is this a flaw in BIDS, or am I missing something?
Hi, I am using SQL Server 7.0 with SP2, and I just started as a SQL DBA. I have a question: what is the importance of SIDs? When we map SQL logins to database user IDs, how much importance do we have to give to SIDs? Please suggest a good article or some pointers.
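For context, the SID is the binary identifier that links a database user to its server login; when the two fall out of sync (for example, after restoring a database onto a different server) you end up with orphaned users. A hedged T-SQL sketch of the usual checks on SQL Server 7.0, with placeholder user and login names:
Code:
-- List database users whose SID no longer matches any server login (orphaned users)
EXEC sp_change_users_login 'Report'

-- Re-link one orphaned database user to an existing login (names are placeholders)
EXEC sp_change_users_login 'Update_One', 'SomeUser', 'SomeLogin'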
Hi all, there is a file called sqlctr.h in the C:\Program Files\Microsoft SQL Server\MSSQL\Binn directory which contains a lot of counter parameters. Could anyone tell me what its importance is, and whether we can change any of the parameters to gain performance? Thanks in advance.
// This file is generated by the description file processor.
// Please do not edit.
I am thinking of an easy way to explain importance to marketers without going into the math. This is what I came up with so far. Does this sound correct to you guys?
Reasoning:
IMPORTANCE = log(Improvement)
Improvement = P(X and Y) / (P(X) * P(Y))
Improvement = (probability the two products are sold together) / (random chance the two products are sold together)
If the probability the two products are sold together equals the random chance they are sold together, then Improvement = 1, and log(1) = 0.
IMPORTANCE SCORE
-2 to -1: 10 to 100 times less likely than random chance
-1 to 0: 0 to 10 times less likely than random chance
0 to 1: 0 to 10 times more likely than random chance
1 to 2: 10 to 100 times more likely than random chance
2 to 3: 100 to 1,000 times more likely than random chance
3 to 4: 1,000 to 10,000 times more likely than random chance
4 to 5: 10,000 to 100,000 times more likely than random chance
5 to 6: 100,000 to 1,000,000 times more likely than random chance
6 to 7: 1,000,000 to 10,000,000 times more likely than random chance
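A worked example of the reasoning above, with made-up numbers and assuming the log is base 10 (which is what the score bands imply):
Code:
-- Made-up basket counts: of 1,000 baskets, 400 contain X, 500 contain Y, 300 contain both
DECLARE @pX float, @pY float, @pXY float
SET @pX  = 400.0 / 1000    -- P(X) = 0.4
SET @pY  = 500.0 / 1000    -- P(Y) = 0.5
SET @pXY = 300.0 / 1000    -- P(X and Y) = 0.3

SELECT @pXY / (@pX * @pY)        AS Improvement,  -- 1.5: sold together 1.5 times more often than random chance
       LOG10(@pXY / (@pX * @pY)) AS Importance    -- about 0.18, which lands in the "0 to 1" band above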
I understand Mr. MacLennan's explanation provided at http://forums.microsoft.com/MSDN/ShowPost.aspx?PostID=282651&SiteID=1 and appreciate the time he took to explain how importance works. However, like the user with username "sang", I also ran the data in BI 2005 and got the same results listed by the aforementioned user. I did this using the following data:
donut muffin
y y
y y
y y
y y
y y
y y
y y
y y
y y
y y
y y
y y
y y
y y
y y
n y
n y
n y
n y
n y
etc.
The rule muffin -> donut has an importance of -0.105302438, which is not the same as Mr. MacLennan's result. I tried switching the roles of A and B in A -> B and using different bases for the logarithm. I don't get the result of -0.105302438 with any of these. I also tried to calculate importance with a small data set of my own and can't reproduce the results using Mr. MacLennan's explanation with that data set either. Any thoughts on the discrepancy?
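For anyone who wants to test candidate formulas against this data, here is a small query that pulls out the raw counts (the table and column names are assumed); it does not settle which formula SSAS uses, it just makes the inputs easy to check by hand:
Code:
-- DonutMuffin(donut, muffin) is an assumed table holding the y/n rows above
SELECT
    COUNT(*)                                                      AS total_rows,
    SUM(CASE WHEN muffin = 'y' THEN 1 ELSE 0 END)                 AS muffin_rows,
    SUM(CASE WHEN donut  = 'y' THEN 1 ELSE 0 END)                 AS donut_rows,
    SUM(CASE WHEN muffin = 'y' AND donut = 'y' THEN 1 ELSE 0 END) AS muffin_and_donut_rows
FROM DonutMuffin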
Hi, I have an exercise using association data mining. My database has 350 records; I used 90 of them for the mining, and it produced some rules, which I picked by taking the top values of MSOLAP_NODE_SCORE. But when I use a SELECT statement to check a rule, only 1 record matches it and 5 records do not. For example, for the rule A=a, B=b -> C=c:
select * from <my_table> where A='a' and B='b' and C='c';   ==> 1 record returned
select * from <my_table> where A='a' and B='b' and C<>'c';  ==> 5 records returned
C has 3 values (c1, c2, c); with the second statement, the result contains 2 rows with c1 and 3 rows with c2.
I don't understand how they work. I want to choose the best rules to represent my database. How should I use importance and probability to pick the best rules? With a database of 90 records and a database of 350 records, which values should I use for MINIMUM_PROBABILITY, MINIMUM_SUPPORT, MINIMUM_IMPORTANCE...? And when I choose rules, should I go by importance or by probability?
Can anyone tell me how Business Intelligence Studio calculates the importance of a rule? I can't find the formula. I know some formulas, but the result in SQL Server is completely different.
Those of you who have installed SQL Server 2005 may have noticed that the installation creates several new Windows groups on the server. Do not underestimate the importance of these groups.
I am trying to search my data and sort the results by importance.
I'm using a MS Access database and my data (table1) looks like this:
Code:
ID  NAME           TEXT
1   Apples         Good red apples
2   Bananas        Fine yellow bananas
3   Yellow apples  Great yellow apples
I want to search the data and get a result where the column "NAME" is more important than "TEXT". My SQL looks like this:
Code:
SELECT id, name, text, 1 AS searchorder FROM table1 WHERE name LIKE '*yellow*'
UNION
SELECT id, name, text, 2 AS searchorder FROM table1 WHERE text LIKE '*yellow*'
ORDER BY searchorder
The output is this:
Code:
ID  NAME           TEXT                 SEARCHORDER
3   Yellow apples  Great yellow apples  1
2   Bananas        Fine yellow bananas  2
3   Yellow apples  Great yellow apples  2
So far so good: the ordering by importance works, but I do not get unique rows because of the searchorder column.
Can I fix my SQL so that I get unique rows and the last "Yellow apples" line does not appear, or am I lost in space?
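A possible fix (an untested sketch against the sample data above): keep the searchorder column for sorting, but exclude from the second branch any row that already matched on NAME, so the duplicate "Yellow apples" row never enters the union:
Code:
SELECT id, name, text, 1 AS searchorder FROM table1 WHERE name LIKE '*yellow*'
UNION
SELECT id, name, text, 2 AS searchorder FROM table1 WHERE text LIKE '*yellow*' AND name NOT LIKE '*yellow*'
ORDER BY searchorder
The searchorder column still appears in the output, but each row now shows up only once, under its best (lowest) searchorder.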
During repeated testing of a package that deletes from and inserts into several tables, over the course of several days, my package, which originally took 45 minutes to load 1,700 XML files, began to take over 6 hours. It turned out to be an I/O bottleneck: the Avg. Disk Queue Length was around 200 and I was incurring many PAGEIOLATCH_EX waits. My dev machine uses a single local disk, no RAID, so I had no options there, but I ran the maintenance wizard to rebuild indexes and statistics and defragmented the hard drive, and regained my original 45-minute time. I guess I'll have to put a maintenance plan together to do this nightly.
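A minimal sketch of such a nightly step, assuming SQL Server 2005 and a placeholder table name (on SQL Server 2000 the rough equivalent would be DBCC DBREINDEX):
Code:
-- Rebuild all indexes on the heavily churned staging table; the rebuild also refreshes index statistics
ALTER INDEX ALL ON dbo.XmlStagingTable REBUILD;

-- Refresh column statistics as well
UPDATE STATISTICS dbo.XmlStagingTable WITH FULLSCAN;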
Currently we have tables (in SQL 6.5), many of which do not have primary keys. While I was trying to re-index (reorganize) them, many produced the error: "fillfactor 204 is not a valid percentage; fillfactor must be between 1 and 100." (Many of the tables' fill factors exceed 100.) How can I fix them so I can upgrade to SQL 7? Thank you for your help.
I am really confused about this whole fill factor thing. The way I understand it, if you have a table whose data remains pretty much static, you should use a higher fill factor. Suppose you had a database with at most 150 transactions a day that change the data: should the fill factor be left at the default (0) or increased? How do you determine how much to increase it? Is there a rule of thumb that says if you have x number of changes against a table, you should have a fill factor between y and z percent?
Hi all, While creating indexes for a table, I specified a fill factor of 70%. I then inserted a few hundred rows into the table. Is it possible to check to what percent the pages are full after the rows have been inserted?
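One way to check, with placeholder table and index names: DBCC SHOWCONTIG reports "Avg. Page Density (full)", which is roughly how full the leaf pages actually are; on SQL Server 2005 the same information is available from sys.dm_db_index_physical_stats.
Code:
-- SQL Server 7.0/2000: look at "Avg. Page Density (full)" in the output
DBCC SHOWCONTIG ('MyTable', 'IX_MyTable_MyColumn')

-- SQL Server 2005 alternative: page fullness per index level
SELECT index_id, index_level, avg_page_space_used_in_percent
FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID('dbo.MyTable'), NULL, NULL, 'DETAILED')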
You have a db with 50,000 records and you want to add 100,000 more. What should the right fill factor be? Is there a way to "calculate" a fill factor if you don't want to use the default? Any help is appreciated. Thank you.
I have an online web table into which about 1,500 records are inserted per day. Each night, a DTS package runs to pull all the data into another database. How should I set the fill factor on a one-column index to get the best performance? The current fill factor is 80%.
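A minimal sketch, with placeholder names, of rebuilding an existing single-column index with an explicit fill factor (the right value depends on how the daily inserts fall within that column's order):
Code:
-- Recreate the existing index in place with the chosen fill factor
CREATE INDEX IX_WebTable_EntryDate
ON dbo.WebTable (EntryDate)
WITH FILLFACTOR = 80, DROP_EXISTING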
Hi experts, I would like to ask about FILL FACTOR. I have observed that our system's loading is a bit slow, and some of the modules take 1 to 2 minutes to load. Maintenance activity is executed regularly based on the scheduled sets. I then checked the tables' indexes/keys, and it turns out that the FILL FACTOR is set to ZERO (0). I would like to know whether a FILL FACTOR of zero could be a factor in the system slowing down.
You have 50,000 records in a database file and you know you want to add another 100,000 records in the next several weeks. What fill factor would you use to maximize performance?
A. 0 (default setting)
B. 30
C. 70
D. 100
Which one is correct? And how do you calculate the fill factor?
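One common way to reason about it (not an official formula): leave enough free space on each page to absorb the expected growth, so start the pages roughly as full as the share of rows you already have. Here that is 50,000 / (50,000 + 100,000) ≈ 33%, which points to answer B (30).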
I have some non-clustered, non-unique indexes on a medium-sized table (25,000 rows). The fill factor is showing 248% on these indexes. I have tried setting the fill factor to various values of 100% or less. The index rebuild seems to work; however, the estimated min/avg/max sizes of the index all show approximately 160 KB, whilst the actual size is in excess of 50 MB!
We run a weekly rebuild of all indexes overnight, without any fill parameters, and following this the fill factor on these indexes goes back to 248%!
I have also dropped and re-created the index with a fill of 100%, and it still reverted to 248% following the weekly rebuild.
I have also looked at the server configuration, and the fill factor there shows a running value of 171%, although the current config is set to 0.
The server is stopped every evening, so there is no way the config should have a value of 171, especially since the maximum allowed value is 100.
Any advice/assistance would be gratefully received.
I know what fill factor is ... and I know that I should set it high when I have static data tables (where the data rarely changes) and low when I expect page splits ...
But does anyone know what effect this setting has on performance? I don't quite get what Books Online says about it.
Can you please tell me what fill factor is and what its role is in defining indexes? By default it is 0, and it can be set up to 100, but what difference does it make if I change the percentage? Where exactly does it have an impact? If you know of any links, please forward them to me.
I am trying to set up relationships (primary key and foreign key) between several tables. I would also like a way to set 'Fill Factor = 90%' in the script. Here is the code that I have so far:
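As a rough T-SQL sketch (table and column names are placeholders, for illustration rather than the actual script), a primary key and a foreign key with an explicit fill factor look something like this:
Code:
-- Primary key whose underlying index is built 90% full
ALTER TABLE dbo.Orders
ADD CONSTRAINT PK_Orders PRIMARY KEY CLUSTERED (OrderID)
WITH FILLFACTOR = 90

-- Foreign key referencing it (a fill factor only applies to indexes, so none is specified here)
ALTER TABLE dbo.OrderDetails
ADD CONSTRAINT FK_OrderDetails_Orders FOREIGN KEY (OrderID)
REFERENCES dbo.Orders (OrderID)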
Is there any way to find the current fill factor for each index? The only indication I have found is looking at DBCC SHOWCONTIG > Scan Density [Best Count:Actual Count].......: 100.00% [0:0]; if this value does not reach 100%, does that mean there may be an issue with the fill factor?
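A hedged sketch, assuming SQL Server 2000: the fill factor an index was created with is stored in sysindexes.OrigFillFactor (on 2005, sys.indexes has a fill_factor column), while DBCC SHOWCONTIG's "Avg. Page Density (full)" shows how full the pages actually are right now; the two drift apart as data changes.
Code:
-- Stored (original) fill factor per index; 0 means the server default was used
SELECT OBJECT_NAME(id) AS table_name, name AS index_name, OrigFillFactor
FROM sysindexes
WHERE indid BETWEEN 1 AND 250          -- skip heaps; auto-created statistics rows may still appear
  AND OBJECT_NAME(id) NOT LIKE 'sys%'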
Hi. There are a lot of articles about the fill factor. I changed the fill factor and that did not work as intended. How do I get back to the default fill factor? I am using the undocumented sp_MSforeachtable procedure, and when the indexes are rebuilt, the fill factor that shows up in OrigFillFactor is the one I am trying to move away from. Your help will be appreciated. Vince
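For reference, a minimal sketch of the kind of call involved (sp_MSforeachtable is undocumented, so treat the details as an assumption to verify): passing an explicit fill factor to DBCC DBREINDEX overrides whatever OrigFillFactor currently holds, whereas passing 0 appears to reuse the stored value, which would explain why the old setting keeps coming back.
Code:
-- Rebuild every table's indexes with an explicit fill factor instead of the stored OrigFillFactor
EXEC sp_MSforeachtable 'DBCC DBREINDEX (''?'', '''', 90)'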
Turn away, pure key zealots. I have a clustered index that starts with an INT IDENTITY(1,1) column, so data can only be added to the end of the cluster. What I'm confused about is the relationship between this and the fill factor. In a normal fill factor scenario you'd be worried about inserts causing page splits, but if you can only append to this cluster, does that mean I should set the fill factor to 100% even if I'm expecting a large number of inserts? Basically, I don't understand what happens when you run out of space on a page in a B-tree if the key is an ever-increasing number.