I used Microsoft Clustering to group my data. Even though I already cleaned the data and it has no null values, I get one cluster with missing values in every attribute. (I set CLUSTER_COUNT=3 and I'm using the Scalable K-Means algorithm.)
Does "missing" mean that the algorithm could not place that particular tuple in another group, so it considers it missing?
I have a table that keeps track of click statistics for each of my dealers. I am creating graphs based on the number of clicks they received in a month, but if a dealer received none in a certain month, that month is left out. I know I have to do some kind of outer join, but I'm having trouble figuring out exactly how. Here is what I have:
select d.name, right(convert(varchar(25),s.stamp,105),7), isnull(count(1),0)
from tblstats s(nolock)
join tblDealer d(nolock) on s.dealerid=d.id
where d.id=31
group by right(convert(varchar(25),s.stamp,105),7),d.name
order by 2 desc,3,1
This dealer had no clicks in April, so this is what shows up:
joe blow  10-2004  567
joe blow  09-2004  269
joe blow  08-2004  66
joe blow  07-2004  30
joe blow  06-2004  8
joe blow  05-2004  5
joe blow  03-2004  9
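One way to keep the empty months, sketched in Python against an in-memory SQLite database rather than SQL Server: build the list of months first, then LEFT JOIN the clicks onto it and count a column from the joined table instead of COUNT(1), so months without rows come back as 0. The sample rows, the months table, and the reporting window are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tblDealer (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE tblStats  (dealerid INTEGER, stamp TEXT);
    INSERT INTO tblDealer VALUES (31, 'joe blow');
    -- Hypothetical clicks: two in March, none in April, one in May.
    INSERT INTO tblStats VALUES
        (31, '2004-03-02'), (31, '2004-03-15'), (31, '2004-05-20');
""")

# Assumed reporting window; a permanent calendar table would also work.
months = ["2004-03", "2004-04", "2004-05"]
conn.execute("CREATE TABLE months (ym TEXT PRIMARY KEY)")
conn.executemany("INSERT INTO months VALUES (?)", [(m,) for m in months])

rows = conn.execute("""
    SELECT d.name, m.ym, COUNT(s.dealerid) AS clicks
    FROM tblDealer AS d
    CROSS JOIN months AS m
    LEFT JOIN tblStats AS s
           ON s.dealerid = d.id
          AND strftime('%Y-%m', s.stamp) = m.ym
    WHERE d.id = 31
    GROUP BY d.name, m.ym
    ORDER BY m.ym DESC
""").fetchall()

for row in rows:
    print(row)
```

COUNT(1) counts every row the join produces, including the padded ones, so it would report 1 for an empty month. Counting s.dealerid skips the NULLs that the outer join fills in.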
I have two columns holding start and stop numbers (each ordered ascending). I would like a query that tells me the missing ranges.
For example, after the first row, the second row is 2617 and 3775. I would like to know the missing values, i.e. 2297 for start and 2616 for stop, and so on down the series. Thanks in advance for any help provided!
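The gap logic can be sketched outside the database as a small Python function: walk consecutive rows and report a gap whenever the next start is more than one past the previous stop. The first row's start (1000) and the third row are hypothetical, since the post only gives the second row and the expected gap. The sketch assumes the ranges do not overlap.

```python
def missing_ranges(ranges):
    """Return the (start, stop) gaps between consecutive (start, stop) rows."""
    ordered = sorted(ranges)
    gaps = []
    # Pair each row with the one that follows it.
    for (_, prev_stop), (next_start, _) in zip(ordered, ordered[1:]):
        if next_start > prev_stop + 1:
            gaps.append((prev_stop + 1, next_start - 1))
    return gaps


# Hypothetical data: only (2617, 3775) and the gap 2297-2616 come from the post.
rows = [(1000, 2296), (2617, 3775), (3776, 4000)]
print(missing_ranges(rows))
```

In SQL the same comparison is usually written by joining each row to the next one (or with a window function where available) and keeping the pairs where next start > previous stop + 1.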
I've got a field that might have spurious values in it (say, an admin adds a new row but doesn't have an entry for this field). I'm trying to swap in the string no_image_EN.jpg if the value in the db does NOT end in .jpg. That way, any value returned is either a valid filename or no_image.
I'm having trouble with the CASE statement, particularly testing just the last few characters of the string:
select product_code,
CASE can_image_en
When (can_image_en LIKE '%.jpg') then can_image_en
Else 'no_image_EN.jpg'
End as can_image_en,
None of these do the trick either (some are bad syntax, obviously):
When (can_image_en LIKE '%.jpg') then can_image_en
When LIKE '.jpg' then can_image_en
When '%.jpg' then can_image_en
When right(can_image_en,4) = '%.jpg' then can_image_en
This is the one that has correct syntax, though it seems to return false in ALL cases:
CASE can_image_en
When '%.jpg%' then can_image_en
Else 'no_image_EN.jpg'
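A minimal sketch of the searched form of CASE, run here through Python's sqlite3 module rather than SQL Server. The simple form (CASE column WHEN 'value') compares the column for equality with the literal, so '%.jpg%' is never treated as a pattern. The searched form (CASE WHEN condition) accepts a LIKE test. The products table name and the sample rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (product_code TEXT, can_image_en TEXT);
    -- Hypothetical rows: one valid file, one empty, one NULL, one non-jpg.
    INSERT INTO products VALUES
        ('A1', 'widget.jpg'),
        ('A2', ''),
        ('A3', NULL),
        ('A4', 'notes.txt');
""")

rows = conn.execute("""
    SELECT product_code,
           CASE WHEN can_image_en LIKE '%.jpg' THEN can_image_en
                ELSE 'no_image_EN.jpg'
           END AS can_image_en
    FROM products
    ORDER BY product_code
""").fetchall()

for row in rows:
    print(row)
```

A NULL value fails the LIKE test as well, so it falls through to the ELSE branch along with the empty and non-jpg values.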
We are facing the following issue: several machines/users very frequently execute a command similar to:
INSERT INTO TableName (FieldOne,FieldTwo) VALUES ('ValueOne','ValueTwo'); SELECT SCOPE_IDENTITY() AS Table_ID;
where TableName has a primary key defined as identity(1,1), and Table_ID is used as a reference in other tables.
These queries are executed by different database users and from several different apps. The problem is that we are detecting lost blocks of Table_IDs: the other tables show the inserted ID as a reference, but the TableName table lacks the record with this ID. In other words, the INSERT seems to work, SCOPE_IDENTITY returns an inserted ID, and the other tables are populated using this number. However, when we query the TableName table, the record does not exist. We are profiling the server and we're sure that there are no DELETE statements against the TableName table. This seems to happen when there are either deadlocks or blocked processes. Whenever the deadlocks and locks are resolved, everything works as expected. Why does SCOPE_IDENTITY return the inserted ID if the INSERT action failed?
KEY  ID  GROUP
1    1   a
2    1   b
3    2   a
4    2   b
5    3   a
6    3   b
7    4   a
8    5   a
This is my simple table. I need a query that will identify the IDs that are missing the group "b", but I don't want IDs 1, 2, 3 to come up because they are part of both a and b. I just need to see anything missing only "b", not anything that is part of a and b.
The query should return the rows that are missing group b:
KEY  ID
7    4
8    5
I tried searching for NULL, but since the records don't exist, it can't find a NULL. I am writing a query to identify the IDs without B while excluding IDs that are part of both A and B.
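Because the "b" rows are absent rather than NULL, one common pattern is NOT EXISTS: keep each "a" row for which no "b" row shares its ID. A sketch using Python's sqlite3 module (not SQL Server) follows. The table name memberships and the column names key_id, id, grp are stand-ins chosen to avoid the reserved words KEY and GROUP.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE memberships (key_id INTEGER PRIMARY KEY, id INTEGER, grp TEXT)")
# Sample rows copied from the table in the post.
conn.executemany(
    "INSERT INTO memberships VALUES (?, ?, ?)",
    [(1, 1, "a"), (2, 1, "b"), (3, 2, "a"), (4, 2, "b"),
     (5, 3, "a"), (6, 3, "b"), (7, 4, "a"), (8, 5, "a")],
)

rows = conn.execute("""
    SELECT a.key_id, a.id
    FROM memberships AS a
    WHERE a.grp = 'a'
      AND NOT EXISTS (SELECT 1
                      FROM memberships AS b
                      WHERE b.id = a.id
                        AND b.grp = 'b')
    ORDER BY a.key_id
""").fetchall()

print(rows)
```

A LEFT JOIN from the "a" rows to the "b" rows with a WHERE b.id IS NULL filter would express the same idea.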
I can't figure out how to get my line chart to break when there isn't a value. For example, I have a trend line over 4 time periods. The 3rd time period is missing a value. Instead of the line ending at the 2nd period and picking up again at the 4th time period, it's connecting the line 2nd to the 4th period. I'd like it to break and for there to be no line appearing in the 3rd period. I bet that's as clear as mud, but let me know if you have any questions.
Write the query that produces the below results. I'm not able to join the two sets in a way that displays NULLs if no purchase was made on a given day for a particular product. I need NULLs or 0s so that it shows up correctly on my SSRS report.
;with testdata as( SELECT 1 AS Id,'1/6/2014' AS Date, 21 As Amount UNION ALL SELECT 1 ,'1/8/2014', 25 UNION ALL SELECT 1 ,'1/9/2014', 30 UNION ALL SELECT 1 ,'1/10/2014', 60 UNION ALL SELECT 1 ,'1/5/2015', 3800 UNION ALL SELECT 1 ,'1/6/2015', 7120 UNION ALL
I have several databases set up for transactional replication to another instance of SQL Server 2005 for fail over purposes. Today, I restored one of those replicated databases to my development machine and discovered two surprising problems:
1) The Default Values settings in the replicated tables are missing. They are there in the publishing tables, just as they were before I set up replication. However, they are not in the subscribing tables. Now, this is not such a big issue, since I tend to send all default values in insert queries as necessary.
2) The second problem is more of an issue, since I use auto-numbered Identity columns in my tables (yes, I know that's just plain lazy...). Anyway, in the replicated tables, "Is Identity" is indeed set to yes, but despite the fact that there are thousands of records with incrementally unique IDs, SQL Server is trying to insert a record starting with 1. This, of course, throws a PK constraint error.
Obviously, if I am to use them for failover purposes, these replicated databases need to be identical in every way.
So, what did I do to cause this situation, and how do I fix it?
I really wonder what a good way to deal with missing values is. Should we leave the missing values in the training data set, or replace them with other values?
What really concerns me is that if we simply replace those missing values with other values, how will that affect the correctness of the trained models?
I look forward to hearing from you on this issue, and it would be great to learn of any best practices for dealing with it.
If a button is never clicked in a given month, it never gets logged to the table. In this example, the 5th button was not clicked during the month of December, so it does not appear in the results. I want to modify my query so it displays the name of the button and a zero (in this case "5th Button 0") in the results for any buttons that were not clicked. For some reason I am drawing a blank on how to do this. Thanks in advance.
I'm trying to swap out old partitions and getting "An error occurred while processing 'AltFile' metadata for database id 12 file id 605". File id 605 is missing from sys.sysfiles. I've tried adding new filegroups, since it seemed to be assigning them in that range, to allow the command to find a match. Once one is created and I issue the ALTER command, the file id of the target file changes to something else in the missing range.
The file id values seem to be managed solely by SQL Server, so I'm not sure what to try. There are hundreds of files with millions of rows, and the method has been used problem-free for years. I do occasionally get "unable to remove file because it's not empty", which may be related. I wind up having to shrink those files and leave them in existence.
The target file group has an existing file id value when you join sys.sysfiles using the filegroup name.
I partition data in 2 tables on one filegroup per day. I swap out partition 1 each day, which makes the new earliest day's partition the new partition 1. Different databases have different day ranges depending on requirements.
Q1. Model Prediction -- Suppose we already have a trained Microsoft Linear Regression Mining Model, say, target y regressed on two variables:
x1 and x2, where y, x1, x2 are of datatype Float. We try to perform Model Prediction with an Input Table in which some records consist of NULL x2 values. How are the resulting predicted y values calculated?
My guess:
The resulting linear regression formula is of the form:
y = constant + coeff1 * (x1 - avg_x1) + coeff2 * (x2 - avg_x2)
where avg_x1 is the average of x1 in the training set, and avg_x2 is the average of x2 in the training set (Correct?).
I guess that when a variable is NULL in the Input Table, Microsoft Linear Regression just treats it as the average of that variable in the training set.
So for x2 being NULL, the whole term coeff2 * (x2 - avg_x2) just disappears, as it is zero if we substitute x2 with its average value.
Is this correct?
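The guess in Q1 can be written down as a short Python sketch. It assumes the centred form y = constant + coeff1 * (x1 - avg_x1) + coeff2 * (x2 - avg_x2) implied by the terms in the post, and it substitutes the training mean for a NULL (None) input. This only illustrates the hypothesis being asked about; it is not confirmed behaviour of the Microsoft Linear Regression algorithm, and all parameter values are made up.

```python
def predict(x1, x2, constant, coeff1, coeff2, avg_x1, avg_x2):
    """Evaluate the centred linear formula, replacing None inputs by the training mean."""
    x1 = avg_x1 if x1 is None else x1
    x2 = avg_x2 if x2 is None else x2
    return constant + coeff1 * (x1 - avg_x1) + coeff2 * (x2 - avg_x2)


# Hypothetical model parameters, for illustration only.
params = dict(constant=10.0, coeff1=2.0, coeff2=0.5, avg_x1=3.0, avg_x2=8.0)

print(predict(4.0, None, **params))   # x2 missing: the coeff2 term contributes 0
print(predict(4.0, 10.0, **params))   # x2 present: the coeff2 term contributes 1.0
```

Under this assumption, a record with NULL x2 gets the same prediction as a record whose x2 equals the training average.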
Q2. Model Training -- Using the above example, where y is regressed on x1 and x2, suppose we have a training set that consists of, say, 100 records in which
y: no NULL value
x1: no NULL value
x2: 70 records out of 100 records are NULL
Can someone help explain the mathematical procedure or algorithm that produces coeff1 and coeff2?
In particular, how is the information in the "partial records" used in the regression to contribute to coeff1 and the constant, etc ?
Is anyone running SQL Server 2000 clusters on an IBM SAN (FAStT or ESS800)? Could you shed some light on this and provide links, if any, to IBM and/or Microsoft sites on the issue?
I am supporting a 2-node SQL Server 2005 cluster (Windows 2003 machines), active/passive.
Right now the SQL services are running under the Local System account; as per our standard, they should run under a SQL service account. Please let me know how to change this without impacting the application.
Can I change the SQL services to the service account directly on the passive node?
As I understand it, if I have a 4 CPU box and I buy 4 processor licenses for SQL Server 2005 Standard, I can run 16 instances of SQL Server on that box.
Now given a cluster set up for active/passive I understand that if I license the same way I can have 16 instances on one of the nodes of the cluster. In the event of a failure, the instances can fail over to the "passive" node and the licenses move with them.
So here's the question. Given my two-node cluster, where each node has 4 CPUs, if I have some of the resources on node 1 and others on node 2, so that instances are running on both node 1 and node 2, am I on the hook for 8 processor licenses?
Does anyone know of any issues with using Active/Passive or Active/Active clusters and log shipping? Will log shipping still work if a failover occurs?
I have a MS Clustering model based on knowledge-area and I want to rename each cluster to the name of the knowledge area. What would be the DMX to rename the cluster?
I have been trying to do something like -
UPDATE [Knowledge Base].Content SET Node_Caption = [Knowledge Base].[Field]
(Field is the column in the mining structure [Knowledge Base] with the values to be assigned to the Node_Caption)
But this does not work. The DMX query editor on parsing the query says that it reached the end of input.
I am about to move 8 SQL 2000 cluster instances residing on 2 separate MCS clusters (4 instances each, with 2 nodes each, active/passive) to one single MCS cluster (2 nodes, active/passive). The SQL Cluster VMs will have the same DNS names and IPs; however, the MCS VM and nodes will have different names.
The planned method for moving the DBs is just to stop all SQL services copy the system and user MDF and LDF files and then restart SQL. From everything I have read this should be fine. However here are my concerns on a cluster platform:
Will the change in node names on the target cluster be a problem when moving the master DBs over from source SQL VMs? Are the cluster nodes listed somewhere in the Master DB?
A couple of the SQL instance VMs are involved in transactional replication. If all of the SQL files are copied over and the target VM has the same DNS name and IP, will there be a problem with the transactional replication when SQL is restarted?
I am working on a project and we are starting to spec out our back-end database systems. I wish to have multiple SQL Servers in a cluster PLUS failover, running over fiber against SAN storage arrays to partition the databases.
Has anyone seen any articles on this or know of any sources for support etc. ?
Our volume is sporadic, but during enrollment phases (2-week-long periods) we can have as many as 100,000 visitors registering (small datasets, large volume of logic). Thanks,
To all the SQL H/A experts: we were wondering if we could have an architecture with 3 physical nodes and 2 active/passive clusters set up on a SAN, as seen in the image below?
http://www.geocities.com/juanlieu/CP_Arch.JPG
In case you cannot see the diagram, it would look something like this:
active/passive Cluster A ---> physical server A (Win2003/SQL2005) ---> HP EVA SAN
                         ---> physical server B (Win2003/SQL2005) ---> HP EVA SAN
active/passive Cluster B ---> physical server B (Win2003/SQL2005) ---> HP EVA SAN
                         ---> physical server C (Win2003/SQL2005) ---> HP EVA SAN
In this setup, I understand that Server B cannot be called upon as the active server at the SAME time by both clusters. Question: what would happen if it is? Would Server B reject the last cluster that calls it? Appreciated in advance.
In a response posted Nov 21 (Clustering Dimension), Jamie wrote...
"The only option of using a table-based model as a dimension is to write out the cluster labels and simply make the cluster label as a dimension attribute. You could even append the cluster label to the source data (e.g. the customer table) and not have a seperate dimension, simply a browseable attribute on the dimension of interest"
Jamie, can you provide more information on how to do this? We'd like to have a series of clusters in an existing household dimension. That is, we need multiple occurrences of cluster model results over time, browsable in the source cube. I've looked at the data source, dimension, and cube created by the data mining model, but I don't see where the case ID (Household Key) and the cluster name could be extracted to update the existing dimension. We're using the cube as the data mining source.
This would also help to fix a recurring problem we have with keeping the linked cube and the source cube metadata in sync. If I make a change to the source cube, say by adding a new measure, the metadata for the linked cube gets out of sync. I've been deleting the data mining dimension, cube, and DSV and then adding them back in using the data mining menu in the model.
Hi, I'm inserting datetime values into SQL Server 2000 from C#.
SQL Server table details
Table name: date_test
Column name  Datatype
No           int
date_t       DateTime
C# code:
SqlConnection connectionToDatabase = new SqlConnection(@"Data Source=.\SQLEXPRESS;Initial Catalog=testdb;Integrated Security=SSPI");
connectionToDatabase.Open();
DataTable dt1 = new DataTable();
dt1.Columns.Add("no", typeof(System.Int16));
dt1.Columns.Add("date_t", typeof(System.DateTime));
DataRow dr = dt1.NewRow();
dr["no"] = 1;
dr["date_t"] = DateTime.Now;
dt1.Rows.Add(dr);
for (int i = 0; i < dt1.Rows.Count; i++)
{
    string str = dt1.Rows[i]["no"].ToString();
    DateTime dt = (DateTime)dt1.Rows[i]["date_t"];
    string insertQuery = "insert into date_test values(" + str + ",'" + dt + "')";
    SqlCommand cmd = new SqlCommand(insertQuery, connectionToDatabase);
    cmd.ExecuteNonQuery();
    MessageBox.Show("saved");
}
When I run the above code, data is inserted into the table. The value in the date_t column is 2007-07-09 22:10:11 000. The milliseconds value is always 000. I need the millisecond values in the date_t column as well. Is any conversion needed for the millisecond values?
We have 2 SQL 2012 servers. Our application has 2 databases. We are creating an AlwaysOn cluster. Is it good to create 2 AlwaysOn clusters to have 1 database primary on one of the servers and the other database primary on the other server?
I have been asked if it is possible to have one database running on one server and the other database on the other server. Is this possible without creating 2 separate AlwaysOn clusters?
I have been reading through many postings here, through the MS SQL Server Unleashed book by SAMS, the MS SQL Tech article "Failover clustering for Microsoft SQL Server 2005 and SQL Server 2005 Analysis Services" for installing a brand new SQL 2005 2 node cluster.
So far I have not found the definitive answer that I am looking for, which is: what rights does the SQL service account need to work properly? One article states that it needs both Domain Admin permissions and local admin permissions (this is a domain account, by the way), and another article states that it only needs Domain Users group permissions and the least privileges possible.
Can anyone please tell me what is correct for installation and running the server? The more I read about this the more confused I get.
What kind of criterion is used by MS clustering algorithm to determine the number of clusters when 0 is specified in the algorithm parameters?
The problem is that I find the automatically determined number of clusters somewhat strange, especially when the expectation maximization algorithm is used. I tried to "manually" calculate the optimal number of clusters in my models using the Bayesian information criterion and the Akaike information criterion, and got more understandable results.
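For reference, the manual comparison described above can be sketched in Python using the standard definitions AIC = 2k - 2 ln(L) and BIC = k ln(n) - 2 ln(L), where k is the number of free parameters, n the number of cases, and ln(L) the model's log-likelihood. The log-likelihoods, parameter counts, and case count below are made-up numbers, and nothing here claims to be the criterion the Microsoft Clustering algorithm applies internally.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike information criterion: lower is better."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood, n_params, n_cases):
    """Bayesian information criterion: lower is better."""
    return n_params * math.log(n_cases) - 2 * log_likelihood


# Hypothetical fits: cluster count -> (log-likelihood, free parameters).
fits = {2: (-1250.0, 9), 3: (-1180.0, 14), 4: (-1172.0, 19)}
n_cases = 500

best_by_aic = min(fits, key=lambda k: aic(*fits[k]))
best_by_bic = min(fits, key=lambda k: bic(*fits[k], n_cases))
print(best_by_aic, best_by_bic)
```

With these numbers the two criteria disagree, because BIC penalises extra parameters more heavily once n exceeds about 8 cases.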