Data Warehousing :: Generating STATISTICS (Automation)
Jul 21, 2015
Is it possible to write a stored procedure (for automation) that generates the STATISTICS for any database, and then use the output to create the stats on that database?
I ran the tuning advisor and it suggested indexes along with a lot of STATISTICS on the dev environment. This dev environment is replicated in several other environments, with the data size varying between them. I would like to know whether I can create a stored procedure that generates the STATISTICS information pertaining to a specific database environment for the query being tuned.
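In principle, yes. As a minimal sketch (the procedure name and naming convention are my own, and it only covers single-column statistics): walk the catalog views and emit a CREATE STATISTICS statement for every column that has no statistics object yet, then review and run the output in each environment.

CREATE PROCEDURE dbo.usp_ScriptMissingStatistics
AS
BEGIN
    SET NOCOUNT ON;
    -- Emit one CREATE STATISTICS statement per column that has no
    -- statistics object yet. Review the output before executing it.
    -- (A production version would also filter out data types that
    -- cannot carry statistics, e.g. text/image.)
    SELECT 'CREATE STATISTICS '
         + QUOTENAME('st_' + t.name + '_' + c.name)
         + ' ON ' + QUOTENAME(s.name) + '.' + QUOTENAME(t.name)
         + ' (' + QUOTENAME(c.name) + ');' AS CreateStatsStatement
    FROM sys.tables t
    JOIN sys.schemas s ON s.schema_id = t.schema_id
    JOIN sys.columns c ON c.object_id = t.object_id
    WHERE NOT EXISTS
          (SELECT 1
           FROM sys.stats_columns sc
           WHERE sc.object_id = c.object_id
             AND sc.column_id = c.column_id);
END;

Running the procedure in each environment produces statements that reflect that environment's own schema, so the same procedure can be deployed everywhere and the generated script will differ per database.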
Sorry about the huge post, but I think this is the amount of information necessary for someone to help me with a good answer. I'm writing a statistical analysis program in ASP.NET and MSSQL 7 that analyzes data I've collected from my business's webpage and the hits it gets from the various pay-per-click (PPC) engines. I've run into problems writing a SQL call to generate certain statistics.

Whenever someone enters our site from one of the PPC search engines, I write out a row to the Hits table. That table has the following columns:

HitID - the unique ID assigned to each hit that comes into the site
Keyword - the keyword the user searched on when he or she came to the site
SearchEngine - the PPC engine the user came from
Source - this is pretty much always 'PPC'; if we were to do other things, like a newsletter, then this would be different
TimeArrived - the date and time the user arrived at the website. I have no idea why I didn't call it "DateArrived," since I use "date" and not "time" pretty much everywhere else...

(I don't think the rest are important, but they might be, so I'll include them for completeness's sake)

ReferringURL - the URL the user came from
ReferringWebsite - the string between the 'http://' and the first '/' in the URL. I know it's redundant information, but when I designed this part, I didn't know how to parse it out afterwards, so I just figured I'd duplicate it
PageVisited - the page the user first arrived at

When a person comes to the site, I also write out a session cookie containing the user's HitID. If the person fills out an enrollment form (a process we refer to as "responding"), I attach that session ID to the form. The response form (and thus the Responses table) is long; these are the important fields:

id - a unique ID for each response
date - the date and time of the response
status - a varchar field containing a status code. I would have made it a number, but I wanted it to be readable when looking at the raw database
hitid - the HitID of the user, taken from the session cookie. If there is no session cookie (for whatever reason), the HitID is written out as 0. While it wouldn't occur often, I can't guarantee that there will never be more than one response record attached to a single HitID

Later, some of the responses turn into "confirmations", which means they've actually ordered from us, not just filled out the form. This usually happens about three or four days after the initial response. When this happens, the status of the response is changed to a phrase containing the word "confirm" (there are a few of them, but they all contain that word).

So now that we've collected all this marketing intel, I need to analyze it. I've written a parser that takes reports from the various pay-per-click companies and puts them into a table called PPC. Information in this table is written out as one record per search engine per keyword per day.
The schema is as follows:

id - a unique ID for the record in the table
date - the date to which the information in the record applies
searchengine - the PPC engine to which the information applies
keyword - the keyword to which the information applies
clicks - the number of clicks on the applicable keyword on the applicable search engine on the applicable day
impressions - same as clicks, but for impressions
cpc - the cost per click on the applicable keyword ...
avgpos - (I don't always have a value for this field) the average position the keyword was shown in for the applicable keyword ...

With this data in, the last step is actually analyzing the three tables for useful statistics on the various keywords, search engines, and time frames. That's the step I've been trying to complete. So what I need is a SQL call that I can run that generates a table with the following information:

SearchEngine
Keyword
Cost / Click - When calculating the CPC, I can't just take an average of all the records. I need to calculate the total amount spent per day (clicks * cpc), add that up for every day, and then divide that by the number of total clicks. Just doing an average doesn't take into account the fact that some days we'll get more clicks than others.
Total Spent - # Clicks * CPC
#Responses - counting the number of records in the Responses table
#Confirms - counting the number of records in the Responses table with "confirm" in their status
Total Spent / #Responses
Total Spent / #Confirms

Oh yeah, and I want to be able to order by any four of the fields in any order, narrow my selection to only those keywords that either are or contain a user-specified string, further narrow my selection to only those records that fit other user-specified criteria for any of the columns in the table I'm generating, and select only the top x records (where x is a user-specified number). I already have user controls that output the SQL for all of these things, but I need to have places in which I may put that SQL in my call.

After many trials and tribulations, I've come up with the following SQL call. Right now, its output for nearly every row is incorrect, I think in large part because the method I'm using to generate the number of clicks yields incorrect values. If you'd like to help me and you think that modifying the following call is easier than writing a whole new one, be my guest; if you'd prefer to write a new one, I'm game for that, too. I'm just concerned with getting it working right now, and any help you can give me is greatly appreciated. Anyway, here's the call:

/* sp_dboption @dbname='NDP', @optname='Select Into', @optvalue=true; */
/* Running the above might be necessary to get the "SELECT INTO"s to work */

DROP TABLE ResponsesPPC
DROP TABLE ConfirmPPC
DROP TABLE TempPPC

SELECT Responses.[ID] AS [ID], Responses.Status, PPC.SearchEngine, PPC.Keyword
INTO ResponsesPPC
FROM Responses, PPC
WHERE Responses.HitID IN
      (SELECT Hits.HitID
       FROM Hits
       WHERE Hits.SearchEngine = PPC.SearchEngine
         AND Hits.Keyword = PPC.Keyword)

SELECT ID, Status, SearchEngine, Keyword
INTO ConfirmPPC
FROM ResponsesPPC
WHERE Status LIKE "%confirm%"
ORDER BY SearchEngine, Keyword

SELECT PPC.SearchEngine, PPC.Keyword,
       SUM(PPC.Clicks), /* I noticed that this column gives me incorrect values
                           (I don't need it in my final report, but it's useful
                           for debugging). For some keywords, it gives me huge
                           numbers (e.g. 265 clicks on one word that got
                           ~10 clicks/day over five days), and for others, it
                           doesn't give me enough. I think this is a major part
                           of what's throwing off the rest of the statistics */
       CASE SUM(PPC.Clicks) WHEN 0 THEN 0 ELSE
            SUM(PPC.clicks * PPC.cpc) / SUM(PPC.Clicks) END AS CPC,
       SUM(PPC.clicks * PPC.cpc) AS TotalCost,
       COUNT(ResponsesPPC.ID) AS NumResponses,
       COUNT(ConfirmPPC.ID) AS Confirms,
       (CASE COUNT(ResponsesPPC.ID) WHEN 0 THEN 0 ELSE
             SUM(PPC.clicks * PPC.cpc) / COUNT(ResponsesPPC.ID) END) AS CostPerResponse,
       (CASE COUNT(ConfirmPPC.ID) WHEN 0 THEN 0 ELSE
             SUM(PPC.clicks * PPC.cpc) / COUNT(ConfirmPPC.ID) END) AS CostPerConfirm
FROM (PPC LEFT JOIN ResponsesPPC
        ON PPC.SearchEngine = ResponsesPPC.SearchEngine
       AND PPC.Keyword = ResponsesPPC.Keyword)
LEFT JOIN ConfirmPPC
        ON PPC.SearchEngine = ConfirmPPC.SearchEngine
       AND PPC.Keyword = ConfirmPPC.Keyword
GROUP BY PPC.SearchEngine, PPC.Keyword
ORDER BY PPC.Keyword DESC

/*
DROP TABLE ResponsesPPC
DROP TABLE ConfirmPPC
DROP TABLE TempPPC
*/
/* I don't drop them right now so I can look at them, but normally, one would drop those tables. */

Thanks a lot for your help,
-Starwiz
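One likely cause of the inflated click totals is that the LEFT JOINs to ResponsesPPC and ConfirmPPC repeat every PPC row once per matching response before SUM runs. A sketch of one way around that, using the table and column names from the post (untested, so treat it as a starting point rather than a drop-in fix): aggregate PPC on its own and fetch the response counts with correlated subqueries so nothing can fan out.

SELECT p.SearchEngine,
       p.Keyword,
       SUM(p.clicks) AS Clicks,
       -- weighted CPC: total spend over total clicks, not an average of averages
       CASE WHEN SUM(p.clicks) = 0 THEN 0
            ELSE SUM(p.clicks * p.cpc) / SUM(p.clicks) END AS CPC,
       SUM(p.clicks * p.cpc) AS TotalCost,
       -- responses for this engine/keyword, counted without joining into the aggregate
       (SELECT COUNT(*)
        FROM Responses r
        JOIN Hits h ON h.HitID = r.HitID
        WHERE h.SearchEngine = p.SearchEngine
          AND h.Keyword = p.Keyword) AS NumResponses,
       (SELECT COUNT(*)
        FROM Responses r
        JOIN Hits h ON h.HitID = r.HitID
        WHERE h.SearchEngine = p.SearchEngine
          AND h.Keyword = p.Keyword
          AND r.Status LIKE '%confirm%') AS Confirms
FROM PPC p
GROUP BY p.SearchEngine, p.Keyword

The cost-per-response and cost-per-confirm columns can then be derived from TotalCost and the two counts with the same zero-guarding CASE pattern the original call already uses.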
I'm creating my first Report Model and I've managed to get through it, but if I select the "Update model statistics before generating" option in the Report Model Wizard, I get this error:
"Specified method is not supported"
(It would be a little less frustrating if it actually HAD specified the method.)
Hi, I need to implement/set up a data warehouse/data mart in one of the departments in my company using SQL Server 2005. Does anybody know the steps I need to follow?
I would also appreciate any links that would help me with the implementation/development.
I have the basic idea, but I may face some difficulties when I start, such as: does SQL Server Reporting Services allow the end user to customize a report based on their needs? If any of you have experience in this field, please reply.
I am working on creating a data warehouse. I have made a database which will be the data warehouse and will consist of dimension and fact tables. I know that besides dimension and fact tables, a data warehouse should also include metadata. My question is: what should the structure of that metadata be, and what information should it contain?
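There is no single standard structure, but as a minimal sketch (all names hypothetical; real metadata repositories vary widely), a metadata table typically records, for each warehouse column, where the data comes from, how it is transformed, and when it was last loaded:

CREATE TABLE dbo.WarehouseMetadata
(
    MetadataID         int IDENTITY(1,1) PRIMARY KEY,
    TargetTable        sysname       NOT NULL,  -- fact or dimension table described
    TargetColumn       sysname       NULL,
    SourceSystem       varchar(100)  NULL,      -- originating system
    SourceObject       varchar(200)  NULL,      -- source table or file
    TransformationRule varchar(1000) NULL,      -- business rule applied in the ETL
    RefreshFrequency   varchar(50)   NULL,      -- e.g. 'daily', 'hourly'
    LastLoadDate       datetime      NULL
);

Lineage rows like these cover the "technical metadata" side; business definitions of each measure and dimension attribute are usually kept alongside them in a similar table.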
We are starting to design a data warehouse for my company. I have done some reading on the concepts and steps involved, but what I am seriously lacking is examples. I'd like to read through some real examples of data warehouses that worked, including the full design diagrams. Can anyone direct me to some good sites for this?
How do you run a stored procedure on PDW via SSIS? I've tried the Execute SQL Task and the Execute T-SQL Statement Task, but in both cases the task runs and completes almost immediately. The task shows success and no errors, but nothing happens in PDW; the PDW admin console does not even register the query. The procedures run fine manually from a SQL Server Object Explorer connection.
I have a large fact table spread across tens of partitions (approx. 1 TB each). I found that the business does not need many of the columns in the table, so as an optimization I decided to get rid of these unneeded columns. What is the efficient way to achieve this? Can I simply drop these columns from the table, or should I use a new table with the reduced structure?
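Both routes can be sketched as below (table and column names hypothetical). Dropping a column is a metadata-only change, so it is fast, but the space is only reclaimed once the clustered index is rebuilt; the alternative is to copy the surviving columns into a new table and swap the names.

-- Option 1: drop in place, then rebuild to reclaim the space.
ALTER TABLE dbo.FactSales DROP COLUMN UnusedCol1, UnusedCol2;
ALTER INDEX ALL ON dbo.FactSales REBUILD;  -- can also be done one partition at a time

-- Option 2: rebuild the table with only the needed columns and rename.
SELECT DateKey, StoreKey, Amount      -- surviving columns only
INTO dbo.FactSales_slim
FROM dbo.FactSales;
EXEC sp_rename 'dbo.FactSales', 'FactSales_old';
EXEC sp_rename 'dbo.FactSales_slim', 'FactSales';
-- Note: SELECT INTO does not carry over the partition scheme, indexes,
-- or constraints; those must be recreated on the new table.

At this scale, option 2 is usually only worth the data movement if the dropped columns account for a large share of the row size; otherwise the in-place drop plus a partition-by-partition rebuild is the gentler path.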
I have a fact table with an ID column as the primary key, on which the clustered index is created. I also have four dimension FKs of data type INTEGER and, finally, one aggregation measure in the fact table.
Now, my situation is: how can I improve the speed of querying the fact table by creating any of the indexes below?
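As a sketch of the usual candidates for a fact table of this shape (all names hypothetical): either one nonclustered index per dimension key, covering the measure, or a single columnstore index if the workload is mostly large aggregations (SQL Server 2012 or later).

-- One covering index per dimension key that queries actually filter on:
CREATE NONCLUSTERED INDEX IX_Fact_Dim1
    ON dbo.FactTable (Dim1Key) INCLUDE (Measure1);
-- ...repeated for Dim2Key..Dim4Key as the query patterns require.

-- Or, for scan-heavy aggregation workloads:
CREATE NONCLUSTERED COLUMNSTORE INDEX IX_Fact_Columnstore
    ON dbo.FactTable (Dim1Key, Dim2Key, Dim3Key, Dim4Key, Measure1);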
I have a table that is growing quite large each day; by now I average 300 million records over 2.5 years. Before we received our new interface, the data we received was aggregated and thus not that big. The problem is that the table is so huge that I cannot use the Slowly Changing Dimension component. I was thinking about making a temp table where I load the incremental data before loading it into the final data mart table. Based on this temporary table, I use a script to compare the temp data with the data already in the data mart. However, this requires a compare of every record (300 million records).
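If the compare can be expressed on a business key, one set-based pattern worth testing (names hypothetical; requires SQL Server 2008 or later) is to land the increment in a staging table and let a single MERGE touch only the changed rows, instead of comparing the 300 million rows one by one in a script:

MERGE dbo.FactMart AS tgt
USING dbo.StageIncrement AS src
    ON tgt.BusinessKey = src.BusinessKey
WHEN MATCHED AND tgt.Amount <> src.Amount THEN
    UPDATE SET tgt.Amount = src.Amount       -- only rows that actually changed
WHEN NOT MATCHED BY TARGET THEN
    INSERT (BusinessKey, Amount)
    VALUES (src.BusinessKey, src.Amount);

With an index on BusinessKey on both sides, the engine joins staging to the mart once, rather than evaluating each record individually.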
Question: Is it feasible to use a star-schema dimensional model for an OLTP system that incurs few (750 per day) sales order transactions?
Background: My customer wants to replace an existing OLTP system database because it runs on Oracle and their in-house expertise is in SQL Server. The original database developers that designed the Oracle DB have apparently retired. The Oracle database has been over-normalized, to say the least. The number of sales orders being entered daily is small: about 500-750 per day. These entries are done at the five clerks' convenience, from a paper form, and are very unlikely to ever be entered in quick succession. Nothing else gets regularly entered into this database except for the occasional change to a customer, but new customers are very few and far between.
I've designed a star schema for the replacement database, with the Sales Order Header and Sales Order Detail tables combined into a single 'fact' table, and I've introduced some duplication into dimension tables (like customer) in order to eliminate some of the joins (and confusion) that were built into the original database.
I've never tried this before. Is there any reason this would not or should not work?
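For concreteness, a sketch of what that combined header/detail grain might look like (columns hypothetical): one row per order line, with the header attributes repeated on each line.

CREATE TABLE dbo.FactSalesOrder
(
    SalesOrderLineID int IDENTITY(1,1) PRIMARY KEY,
    OrderNumber      varchar(20) NOT NULL,  -- header attribute, repeated per line
    OrderDateKey     int         NOT NULL,  -- FK to the date dimension
    CustomerKey      int         NOT NULL,  -- FK to the denormalized customer dimension
    ProductKey       int         NOT NULL,
    Quantity         int         NOT NULL,
    LineAmount       money       NOT NULL
);

At 750 orders a day entered by hand, write contention is a non-issue, so the usual OLTP argument against this kind of denormalization carries little weight here.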
ETL packages sometimes fail with a package execution error. Even when the ETL package is executed again from the start, we get the same error, but after restarting the SQL service on the BI server it works fine. Is this an issue on the developer/code side or on the server side?
I'm having issues with bulk updates in SQL Server. I'm using SAP BODS as the ETL tool and have some 20,000 updates. The target table has approx. 0.5 million records and a clustered index on the id column. I have selected the upsert option in BODS. The same setup is also done for Sybase IQ; IQ has a bulk update option which gives very good performance.
In IQ the same update load finishes in about 9 minutes, where SQL Server takes more than 2 hours for the same work, which doesn't seem right. When I look into it, the update is causing the whole package to go slow. Sybase generates a query along the lines of: where the ID is present, update, else insert. Is there any way to make bulk updates work faster in the SQL Server environment?
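One pattern that often helps in this situation (a sketch with hypothetical names, not BODS-specific): instead of letting the tool fire 20,000 singleton upserts, land the changes in a heap staging table and apply them with two set-based statements, so the clustered index on id is traversed inside a single plan rather than once per statement.

-- Apply changes in one pass.
UPDATE t
SET    t.Col1 = s.Col1,
       t.Col2 = s.Col2
FROM   dbo.TargetTable t
JOIN   dbo.StagingUpserts s ON s.id = t.id;

-- Insert the rows that did not match anything.
INSERT dbo.TargetTable (id, Col1, Col2)
SELECT s.id, s.Col1, s.Col2
FROM   dbo.StagingUpserts s
WHERE  NOT EXISTS (SELECT 1 FROM dbo.TargetTable t WHERE t.id = s.id);

In BODS this usually means bulk-loading into the staging table and running the statements from a post-load script, which is effectively what the IQ bulk update option is doing for you.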
Ideally I want to show only quarter-level aggregates, with [care session quarter] values such as Q1-15, Q2-15, Q3-14, Q3-15, and Q4-14.
I am using the [care session quarter] column in the GROUP BY clause to achieve this, but with no success. If I use the date column in the SELECT and GROUP BY clauses, the numbers come out correctly, but the query then groups by every date, which is not what I want.
The [Date Dimension] table has the column [care session quarter], which stores the quarter of the year for each date, i.e. there is a row for every day with all of the date columns populated.
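A sketch of the shape that usually works here (the fact table and measure names are hypothetical; the dimension names follow the post): group on the quarter column only, and keep the date column out of the SELECT list entirely, since any non-aggregated column in SELECT must also appear in GROUP BY and will split the groups back down to days.

SELECT d.[care session quarter],
       SUM(f.SessionCount) AS Sessions
FROM   dbo.FactCareSession f
JOIN   dbo.[Date Dimension] d ON d.DateKey = f.DateKey
GROUP BY d.[care session quarter];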
I am putting together an invoice for my company. I have a text box describing each section of the invoice, followed by a table listing out the charges. I am using multiple tables based on the type of charge the client is receiving.
I would like to hide each section if no items of that type were purchased. I can do this for the table using the expression "=CountRows() < 1", but I do not know how to refer to that table (call it Tablix1 for the sake of discussion) from the text box. I've tried using a ReportItems function as my basis, without success.
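One approach that may work: a text box outside a data region cannot see the tablix's own scope, but CountRows accepts a dataset name as its scope argument. Assuming the dataset feeding Tablix1 is named ChargesA (a hypothetical name), set the text box's Hidden property to the same row-count test scoped to that dataset:

=CountRows("ChargesA") < 1

Both the text box and Tablix1 then hide and show together, because they test the same dataset.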
My package connects to an external data provider using an OLE DB driver. The package runs fine in debug mode. When I try to run the same package from SQL Server Agent, it fails to acquire the connection. The OLE DB provider does not contain much information (connection string, initial catalog, blank user name and password). The same package executes successfully if I run it using dtexec in a BAT file, but if I use dtexec in a SQL Server Agent job step as an operating-system command, the job fails reporting "cannot acquire the connection".
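A common cause is that the Agent job step runs under the SQL Server Agent service account, which may lack rights to the provider or the data source, while the interactive runs use your own Windows account. A sketch of running the step under a specific account instead (the account, proxy, and credential names, and the password placeholder, are all hypothetical):

USE msdb;
-- Store the Windows account that can reach the provider.
CREATE CREDENTIAL EtlCredential
    WITH IDENTITY = 'DOMAIN\etl_user', SECRET = 'password-here';
-- Expose it as an Agent proxy.
EXEC dbo.sp_add_proxy
    @proxy_name = 'EtlProxy', @credential_name = 'EtlCredential';
-- Allow the proxy to run the relevant step type.
EXEC dbo.sp_grant_proxy_to_subsystem
    @proxy_name = 'EtlProxy', @subsystem_id = 11;  -- 11 = SSIS, 3 = CmdExec

The job step's "Run as" option is then pointed at EtlProxy, so the dtexec invocation runs under the same identity that already works from the BAT file.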
I am trying to create a sample table in Azure SQL Data Warehouse, but it gives me a syntax error: Incorrect syntax near the keyword 'CLUSTERED'.
CREATE TABLE [dbo].[FactInternetSales]
(
    [ProductKey]   int NOT NULL,
    [OrderDateKey] int NOT NULL,
    [CustomerKey]  int NOT NULL,
    [PromotionKey] int NOT NULL
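For comparison, the shape Azure SQL Data Warehouse expects puts the index and distribution choices in a WITH clause after the column list. A sketch with the column list abbreviated to the keys above (the HASH distribution column is just an example):

CREATE TABLE [dbo].[FactInternetSales]
(
    [ProductKey]   int NOT NULL,
    [OrderDateKey] int NOT NULL,
    [CustomerKey]  int NOT NULL,
    [PromotionKey] int NOT NULL
)
WITH
(
    CLUSTERED COLUMNSTORE INDEX,
    DISTRIBUTION = HASH([ProductKey])
);
-- If this syntax still raises "Incorrect syntax near 'CLUSTERED'",
-- check that the connection really points at a SQL Data Warehouse
-- database: ordinary SQL Server and Azure SQL Database reject the
-- DISTRIBUTION clause.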