Looking For Webserver Clickstream Log Processing Pointers...
Jul 12, 2007
We currently have a standard star schema warehouse that contains clickstream data from our web server farm. We use a home-grown ETL process, a combination of Java code and shell scripts, to process these logs on a daily basis. The clickstream data contains both our dimensional data and our measurements. We are currently processing 22GB of compressed data daily, with a 50% year-over-year growth rate.
My question is: does anyone have experience with, or pointers on, using SSIS to process a stream of data that contains both the dimension and fact data? Our current architecture pulls out the dimensional attributes, processes them separately, and then substitutes the dimensional keys back into the fact stream. I have to believe there is a more efficient way to do this via SSIS.
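For what it's worth, the key-substitution step can be collapsed into one set-based pass; a minimal T-SQL sketch, with all table and column names invented for illustration:

insert into FactPageView (DateKey, PageKey, Hits)
select d.DateKey, p.PageKey, s.Hits
from StagePageView s                              -- raw clickstream rows, already staged
join DimDate d on d.CalendarDate = s.ViewDate     -- resolve the date surrogate key
join DimPage p on p.PageUrl = s.PageUrl           -- resolve the page surrogate key

In SSIS the analogous pattern, as I understand it, is one Lookup transformation per dimension in the data flow, feeding the fact table destination.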
I am writing a graduation thesis about a clickstream data warehouse with SQL Server. Can anyone give me some samples of it? I just want to learn it, thank you! :)
' Normalise Form to an integer, then adjust the step: 0 = back (minimum 1), 1 = stay, 2 = forward (maximum 5)
RSFormUpd("Form") = Int(RSFormUpd("Form"))
If UpdForm = 0 And RSFormUpd("Form") > 1 Then RSFormUpd("Form") = RSFormUpd("Form") - 1
If UpdForm = 2 And RSFormUpd("Form") < 5 Then RSFormUpd("Form") = RSFormUpd("Form") + 1
I'm using Microsoft SQL Server 2005 along with Microsoft Visual Studio 2005. I have 2 questions:
1) In the database server there is an "image" datatype. I need to know how to use it, because I need to display images on my webform.
2) I read somewhere that pointers can be used to point to a file path. So, is it possible for me to store images/audio in files and use the database to point to the file paths? If it is possible, how can it be done?
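For the second question: yes, storing only the path is a common approach. A minimal sketch, with hypothetical table and column names (the web app then reads the file from disk using the stored path):

CREATE TABLE Media (
    MediaID int IDENTITY(1,1) PRIMARY KEY,
    FilePath nvarchar(260) NOT NULL -- the file itself stays on disk or a share
)
INSERT INTO Media (FilePath) VALUES (N'\\fileserver\images\logo.png')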
Hi

I'm currently having to design a database for a recruitment agency that's just started up, and have one area where I'm a little unsure where to go.

Basically I've implemented the 'standard' Customer and Contacts tables linked on CustomerID, and also have CallRecords (for phone calls etc. made to contacts) linked on ContactID.

My difficulty is that they want to be able to store names/details of people looking for work (candidates), BUT these people may also be a contact (i.e. the agency could be dealing with a contact at a company who is also looking for a new job themselves). They would also like to (naturally) have these candidates' details held against the 'current employer' customer details, so there may be situations where a candidate is JUST a candidate (i.e. not currently working and therefore not associated to a company), OR they may be a candidate AND a contact, and you may have contacts who are JUST contacts (i.e. not actively looking for work at the moment).

I'm basically just trying to figure out the options I have for storing the contact details and candidate details.

FYI I need to store the same details for Contacts and Candidates (i.e. name, job title, contact numbers etc.) but Candidates require extra information to be stored about them (work experience, qualifications etc.).

Any help/pointers would REALLY be appreciated!!

Thanks in advance
Martin
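One way to sketch this is a single Person table for the shared details, with an optional Candidate extension table. A minimal illustration only; the names and columns here are assumptions, not your schema:

CREATE TABLE Person (
    PersonID int IDENTITY(1,1) PRIMARY KEY,
    CustomerID int NULL,           -- current employer; NULL when not associated to a company
    Name nvarchar(100) NOT NULL,
    JobTitle nvarchar(100) NULL,
    Phone nvarchar(30) NULL
)
CREATE TABLE Candidate (
    PersonID int PRIMARY KEY REFERENCES Person(PersonID), -- 1:0..1 extension of Person
    WorkExperience nvarchar(4000) NULL,
    Qualifications nvarchar(4000) NULL
)

A row in Person with no matching Candidate row is a plain contact; adding a Candidate row makes the same person a candidate as well, without duplicating the shared details.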
Hi! Today I decided to finally install MS SQL 2005 Express on my VPS. For a long time I tried to work with MS Access, which definitely brings a lot of complications. I already installed SQL05Exp on my local machine and everything works great here. Because I'm quite new to these things, I simply installed SQL05Exp on the server the same way as locally: I just clicked through the installation with the predefined settings and everything installed without trouble.

When I uploaded my first application and tried to run it, I got the following error message:

Failed to generate a user instance of SQL Server due to failure in retrieving the user's local application data path. Please make sure the user has a local user profile on the computer. The connection will be closed.

I then read somewhere that I could change the web.config connection string to "User Instance=False", which leads to the following:

CREATE DATABASE permission denied in database 'master'. An attempt to attach an auto-named database for file C:\Inetpub\vhosts\gsp-peru.com\subdomains\muestra\httpdocs\App_Data\Database.mdf failed. A database with the same name exists, or specified file cannot be opened, or it is located on UNC share.

I read a lot of documents about how to install, etc., but I couldn't really make use of them, surely for a lack of basics. If I understand the problem correctly, the user running the app on the server and trying to attach the mdf database to the SQL Server does not have the rights to do that. However, I don't know what to do about it. What I would really love is a simple solution that allows me to connect to the SQL Server from whatever domain on the server, just by copying the mdf. It would also be nice to use some kind of password in the connection string to ensure that only my apps can connect to my databases. I would be very thankful if someone could give me a good basic explanation of what to do and maybe also a reference to some nice and understandable information on this.

Best regards, Markus
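A minimal sketch of the usual fix, assuming you can connect to the instance as an administrator: attach the .mdf once as a named database and create a SQL login for the app, then connect by database name instead of file path (the database name, login name, and password below are placeholders):

CREATE DATABASE MuestraDB
    ON (FILENAME = N'C:\Inetpub\vhosts\gsp-peru.com\subdomains\muestra\httpdocs\App_Data\Database.mdf')
    FOR ATTACH
GO
-- a SQL login for the app (requires Mixed Mode authentication on the instance)
CREATE LOGIN MuestraApp WITH PASSWORD = 'strong-password-here'
GO
USE MuestraDB
CREATE USER MuestraApp FOR LOGIN MuestraApp
GO
-- the connection string then becomes something like:
--   Server=.\SQLEXPRESS;Database=MuestraDB;User Id=MuestraApp;Password=strong-password-here;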
I have a .NET website being served by my provider from server A. It stores data in SQL Server on server B. Two separate machines.

I have access to server A, but limited access to server B. Is there a script I can put on server A that will back up the database on SQL Server B and save it to a drive on server A (or, even better, my local drive)?
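If your login on server B has BACKUP DATABASE permission, a sketch of the idea (the database name and share path are placeholders; the SQL Server service account on B must be able to write to that share):

BACKUP DATABASE MyDatabase
TO DISK = N'\\ServerA\Backups\MyDatabase.bak'
WITH INIT  -- overwrite any previous backup in the file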
Is there a way to convert an image pointer to a page ID that could be used in DBCC PAGE? i.e.

select TEXTPTR(document) FROM testdocs where id = 1

returns 0xFEFF3601000000000800000003000000

select convert(int, TEXTPTR(document)) FROM testdocs where id = 1

returns 50331648

dbcc page (9, 3, 8, 1)

dumps the first page of the image. I am trying to map 0xFEFF3601000000000800000003000000 -> page number 8.

thanks
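A guess based on the example values above: the tail of the text pointer looks like a little-endian page number followed by a little-endian file number (0x08000000 -> page 8, 0x03000000 -> file 3, matching dbcc page (9, 3, 8, 1)). A hedged sketch that decodes those bytes; verify against more rows before relying on this:

DECLARE @ptr binary(16)
SET @ptr = 0xFEFF3601000000000800000003000000
SELECT CAST(SUBSTRING(@ptr,  9, 1) AS int)
     + CAST(SUBSTRING(@ptr, 10, 1) AS int) * 256
     + CAST(SUBSTRING(@ptr, 11, 1) AS int) * 65536
     + CAST(SUBSTRING(@ptr, 12, 1) AS int) * 16777216 AS PageNumber, -- 8 here
       CAST(SUBSTRING(@ptr, 13, 1) AS int)
     + CAST(SUBSTRING(@ptr, 14, 1) AS int) * 256 AS FileNumber       -- 3 here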
We are just starting to do some testing on SQL Server EE with dimensional models. We have had one or two problems, which we have been able to solve using the new performance dashboards etc.

However, as is inevitable, we are seeing strange behaviour of a query: in a star join it seems to be doing an eager spool, trying to spool the entire fact table to tempdb. Hmmm.

Rather than ask one question at a time: we have DBAs who went to classes etc. at MSFT, and the client is some level of MSFT partner.
Could anyone point me to the best documentation for understanding the optimiser and how to influence it to get it to do the right thing in optimising plans for star joins?
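To make the shape concrete, a hedged illustration of the kind of star join in question (table names invented); inspecting the actual plan this produces is the first step before reaching for hints:

SELECT d1.CalendarMonth, d2.ProductCategory, SUM(f.SalesAmount) AS SalesAmount
FROM FactSales f
JOIN DimDate d1    ON f.DateKey = d1.DateKey
JOIN DimProduct d2 ON f.ProductKey = d2.ProductKey
WHERE d1.CalendarYear = 2007
GROUP BY d1.CalendarMonth, d2.ProductCategory
OPTION (RECOMPILE)  -- a fresh compile per run, so plan changes from statistics updates show up immediately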
I have just installed my web application on my ISP's webserver with a connection to a SQL Server on the same server. My problem is that my app can't find the database from the webserver. When I make a connection to the same db from my application running on my machine locally there is no problem. However, when I try to run the application on the webserver I get an error like this: SQL Server not found or access denied. When the remote db connection works from my machine, why doesn't it work from the webserver where the db is located? I use the same connection string:
Currently, we have our SQL database behind our firewall. Our App is required to log each page hit to a DB table with some additional data that we would not get from the regular IIS logfiles. So every page hit means a trip through the firewall...
Since this type of activity is limited purely to DB inserts, I'm curious what the thoughts are on using a copy of MSDE on the webserver itself to store this data.

Which is the lesser of 2 evils?
-- Take a trip through the firewall for each page to log the hits in a secure SQL DB
-- Log the Hit in a less secure / limited version of a SQL DB
I can't find the answer to the following question.
Why is Visual Studio quicker than the Report Server?

I've got a report with more than 20 parameters, and several are multi-select.

When I open the report in Visual Studio it appears within one or two seconds, whereas when I open it on a report server it takes 8 to 15 seconds to open or to refresh. After that I can enter the report parameters, and then the report runs in approximately the same time.

I know that the refresh happens by default, so I can't change that, but the users have problems with waiting 15 seconds between entering the parameter values.
The development machine is a 2.8 GHz P4; the report server is a 4-proc, 16 GB database server. The development machine is only used for editing the RDL in Visual Studio. The databases are kept on the database server. The network speed is 100 Mbps.
Is the only alternative to build my own ASPX page and put the ReportViewer component inside that ASPX?
Hi, I've installed SQL Server 2005 Express Edition, SQL Server Management Studio Express using Windows Authentication, and IIS 5. The problem occurs when I try to access the default address of my local computer to test my installation. I've tried both http://localhost and http://127.0.0.1. The system requires a password even though I don't use a password for my Windows Authentication. Does IIS always demand a password regardless of my settings for Windows Authentication?
I have an extremely annoying problem when debugging stored procedures in SQL Server 2014 with SSDT or SSMS. When calling an SP through EXECUTE in Debug mode, 9 out of 10 SPs are traced with the yellow arrow pointer pointing at the wrong line for the line currently reached.

The offset is between 6 and 15 lines downward. Tracing itself and the update of the "Locals" view work as expected. All SPs contain comments, including before the Create Procedure statement. The SP shown when tracing has exactly the same content as the stored SQL in the SSDT project under work, including Create Procedure and all comments.

The picture here shows the first line selected after the debugger has traced into the SP. The first line really executed with "Next" will be SET NOCOUNT ON.

If this does not turn out to be my fault, and some of you would confirm that, I would like to post this to SQL Connect.
Currently I am running MSDE on a webserver. The MSDE was installed by a software package for web surveys. I am wondering whether SQL Express will interfere with this installation? Or would I be better off installing MySQL instead?

The problem I am trying to solve is simply giving a web developer some database space so that he can connect his web app to store some data. I'm kind of a n00b at this, so I thought SQL Express would be the best route. If I'm headed in the wrong direction, can someone point me the right way?
I got this weird problem and I was wondering whether anyone has an idea of how to resolve this.
I have a working report on the RS server, ran the report and tried to print it straight from the page by clicking on the Print button.
When it printed, it gave me additional pages as though the margins were incorrect. There is additional space on the left of the report, which pushes it out towards the right (very slight but noticeable), and that ended up printing "blank pages" (basically only the page header). Moreover, it didn't print any of the page footers (which have the page numbers).

The weird part is that I tried exporting it to PDF format, and when I take a look at the PDF version, it looks fine with no formatting errors. Even printing the PDF works great.

I was wondering whether there is some bug with the RS Server, or do I need to do something with the config file? Please advise. Thanks!
I have a DTS package that imports data from an Oracle database into SQL Server. Doesn't the processing mostly occur on the SQL Server, not on the Oracle database from which the data is being imported? The Oracle database is vendor provided, and they are saying our SQL Server DTS package is killing their server. Any insight is appreciated. Thanks
I've got a process that creates records in my database based on XML input that I've been given. What I am doing is giving this XML to a stored procedure to handle a specific task, then modifying the XML and sending it to the next stored procedure.
For instance, the XML could hold header records with detail records, I would first send the XML to a stored procedure that creates the header records, then updates the XML so the XML now knows the identity values of the header records I have just created, and then send the XML to the next stored procedure to create the details for those headers.
All works great and fine, but I have a problem with writing the identity values back to the XML. It seems I can only change one item in the XML at a time and thus need to loop this. For many records this really takes a long time.
Here is some sample code of what I'm doing (please excuse any typos, this is a simplified version of the code) :
declare @lvSeq numeric(15)
declare @lvRowNo int
declare @lvNumRows int
insert into myHeaderTable ( recid, recdesc )
select ref.value('@recid', 'nvarchar(25)') recid,
       ref.value('@recdesc', 'nvarchar(250)') recdesc
from @pXML.nodes('//headers/header') R(ref)
select @lvRowNo = 1, @lvNumRows = @pXML.value('count(//headers/header)', 'int')
while (@lvRowNo <= @lvNumRows)
begin
    select @lvSeq = recseq
    from myHeaderTable
    where recid = @pXML.value('(//headers/header[position()=sql:variable("@lvRowNo")]/@recid)[1]', 'nvarchar(25)')

    set @pXML.modify('replace value of (//headers/header[position()=sql:variable("@lvRowNo")]/@recseq)[1] with sql:variable("@lvSeq")')

    select @lvRowNo = @lvRowNo + 1
end
Obviously I am looking for a better way to update the XML with the sequences. The insert takes a second, the loop takes minutes with large XML sets. I guess MSSQL is searching the whole XML to find the item to update.
It would be nice if I didn't have to loop through the XML. One solution I was thinking of is to store the XML in a temporary table with a single record per header item. Then I could do the modify in one go and recreate the XML by simply selecting the contents of the temporary table. I have no idea if this is possible.
So something like this:
select ref.value('@recid', 'nvarchar(25)') recid,
       ref.query('.') XMLData -- ref.value('.', 'XML') is what gives the error; query('.') returns the node as xml
into #TMP_XML
from @pXML.nodes('//headers/header') R(ref)
insert into myHeaderTable ( recid, recdesc )
select t.recid,
       ref.value('@recdesc', 'nvarchar(250)') recdesc
from #TMP_XML t CROSS APPLY t.XMLData.nodes('/header') R(ref)
update #TMP_XML
set XMLData.modify('replace ....')
from myheadertable
where #TMP_XML.recid = myheadertable.recid
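A set-based alternative that avoids modify() altogether, heavily hedged (the column names are taken from the code above, everything else is assumed): shred the header nodes once, join to myHeaderTable on recid, and rebuild the XML with FOR XML PATH so the new @recseq values are emitted directly:

set @pXML = (
    select ref.value('@recid', 'nvarchar(25)')    as '@recid',
           ref.value('@recdesc', 'nvarchar(250)') as '@recdesc',
           h.recseq                               as '@recseq'
    from @pXML.nodes('//headers/header') R(ref)
    join myHeaderTable h on h.recid = ref.value('@recid', 'nvarchar(25)')
    for xml path('header'), root('headers'), type
)

This rebuilds the whole document in one statement instead of touching one attribute per iteration, at the cost of dropping any content the header nodes might carry beyond these attributes.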
Hello friends, I need a suggestion. I am currently working on a reporting website that generates reports, and I need to store all the reports in the database.

I usually go with row-wise processing, as it can be easily controlled, but the problem is that there will be a lot of reports (an estimated 30,000 rows per month), and I am not sure whether SQL Server can hold more than 2 billion rows per table.
I will just explain the whole scenario and the tricky problem I am facing.

We have XML files coming in at regular intervals from another source into SQL Server 2000, with roughly 10,000 to 70,000 records arriving daily. We have a job scheduled to run at regular intervals, and we process the records by some filter criteria. The flow is: staging table into secondary table, and then finally into the primary table. We designed the DTS package accordingly, i.e. take the records from the staging table, put them into the secondary table, and then into the primary table (about 8 tasks are involved in it). Suppose an XML file arrives at 8:30 am and our DTS package runs at 9:00 am, then 11:00 am, then 1:00 pm, and so on. What I have been observing for many days is that after the job runs successfully at 9:00 am, some good data is still pending in the secondary table, not processed into the primary table. But when the job runs again at 11:00 am, it processes those pending good records into the primary table. Sometimes when I run this job manually from the DTS designer, the good data pending in the secondary table gets processed!!!

My question is: why does this job not process all the good records in a single shot????
Hello everyone. I need help regarding the following. Given the following table:

CREATE TABLE T1 (C1 nvarchar(10), C2 money)
INSERT INTO T1 VALUES ('A', 1)
INSERT INTO T1 VALUES ('B', 2)
INSERT INTO T1 VALUES ('C', 3)

Let's say that I have this table on a local server and I want to upload it to a remote server, and on the remote server load it into a database that contains the same table. The uploading part can be done by another application on the remote server, but what I need is a way to transfer the data in the fastest possible way. What steps do I need to follow?

TIA,
Rey Guerrero
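One fast-path sketch, with the database name, paths, servers, and credentials all placeholders: bcp in native mode, here wrapped in xp_cmdshell so it can be run from a query window (xp_cmdshell must be enabled; the same bcp commands also work from a plain command prompt):

-- export the table in native format from the local server
EXEC master..xp_cmdshell 'bcp MyDb.dbo.T1 out C:\xfer\T1.dat -n -S localserver -T'
-- copy T1.dat to the remote machine, then load it into the remote database
EXEC master..xp_cmdshell 'bcp MyDb.dbo.T1 in C:\xfer\T1.dat -n -S remoteserver -U myuser -P mypassword'

Native format (-n) skips character conversion, which is usually the fastest option for SQL-to-SQL transfers.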
Which component should be used to process a dataset as a whole, and not on a per-row basis? I need to process a dataset conditionally (the condition is based on the dataset itself), e.g. if a special row is present in the dataset, then the dataset should be processed in a special way. Should I maybe use one Script Transformation to determine whether the dataset satisfies the condition (and store that result in a variable), and then, based on that condition (variable value), perform the processing or not (using a Conditional Split and a Script Transformation)?
I am having trouble trying to construct the following process in SSIS/SQL 2005:
1. Grab a set of unprocessed rows (ProcessDT = null) in an 'Action' table
2. For each of these rows, execute multiple stored procedures based on the action type:
   If actiontype = 1:
       exec spAct1a @param1, @param2
       exec spAct1b @param1, @param2, @param3, @param4
   If actiontype = 2:
       exec spAct2a @param1, @param2, @param3
       exec spAct2b @param1, @param2, @param3
   etc.
3. Update ProcessDT so it's not processed again
4. Repeat until all rows are processed
Note - all sp params are contained in additional columns in the Action table. Basically the Action table is a store for post-event processing of sorts but is order dependent, hence the row by row processing. And some of my servers are 2000 so Service Broker is not an option (yet).
I first attempted to do this totally within the control flow, using an ADO recordset/Foreach Loop container, but I could not figure out how to run conditional process paths based on the ActionTypeID. I then tried to do this within the data flow using an OLE DB data source, a Conditional Split, and an OLE DB Command transformation, which almost got me there; the problem is that for each row I need to execute multiple SPs, and it appears the OLE DB Command only gives me one SP.
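In case it helps to see the row-by-row dispatch spelled out, a hedged T-SQL sketch that could live in a single Execute SQL Task (ActionID and the Param columns are assumed names; the sp names are from the description above):

DECLARE @ActionID int, @ActionTypeID int, @Param1 int, @Param2 int

DECLARE ActionCursor CURSOR LOCAL FAST_FORWARD FOR
    SELECT ActionID, ActionTypeID, Param1, Param2
    FROM dbo.[Action]
    WHERE ProcessDT IS NULL
    ORDER BY ActionID            -- preserves the required ordering

OPEN ActionCursor
FETCH NEXT FROM ActionCursor INTO @ActionID, @ActionTypeID, @Param1, @Param2
WHILE @@FETCH_STATUS = 0
BEGIN
    IF @ActionTypeID = 1
    BEGIN
        EXEC spAct1a @Param1, @Param2
        EXEC spAct1b @Param1, @Param2    -- extra params omitted for brevity
    END
    ELSE IF @ActionTypeID = 2
    BEGIN
        EXEC spAct2a @Param1, @Param2
        EXEC spAct2b @Param1, @Param2
    END

    -- mark the row so it is not processed again
    UPDATE dbo.[Action] SET ProcessDT = GETDATE() WHERE ActionID = @ActionID

    FETCH NEXT FROM ActionCursor INTO @ActionID, @ActionTypeID, @Param1, @Param2
END
CLOSE ActionCursor
DEALLOCATE ActionCursor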
I'm populating a new table based on information in an existing table. The new table is a list of all "items" and contains a primary key. The old table is a database of receipts where items can appear many times in any order.
I have put together the off-the-shelf components to do this, using a lookup transformation to see if the item is already in the new table. Problem is, because there's so much repetition in the old table I need to process the old table one row at a time. Batch processing is generating errors because the lookup doesn't detect duplicates within the buffer.
I tried setting the "DefaultBufferMaxRows" property of the task to 1, but that doesn't seem to have any effect.
To get the data from the old table, I'm using an OLE DB source. To get the data into the new table, I'm using the OLE DB Command transformation with parameters to execute an INSERT statement.
This is a job I have to do exactly once, so I don't care if I have to run it overnight. I'd rather have a simple, easy to understand but inefficient script so I understand what it's doing completely.
Any help on forcing SSIS to process one row at a time?
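If it genuinely only has to run once, one option is to skip the data flow entirely and let the engine deduplicate; a hedged sketch with invented table and column names:

INSERT INTO dbo.Items (ItemName)
SELECT DISTINCT r.ItemName
FROM dbo.Receipts AS r
WHERE NOT EXISTS (SELECT 1 FROM dbo.Items AS i
                  WHERE i.ItemName = r.ItemName)   -- skip items already loaded

Because DISTINCT collapses the repetition before the insert, the buffer-level duplicate problem never arises.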
I am verifying my reports' processing times. I get the information from the Reporting Services DB, the [ExecutionLogs] table. I have the following information:
[TimeEnd] - time that report generation ends.
[TimeStart] - time that report generation starts.
[TimeDataRetrieval] - amount of time spent running the data sources.
[TimeProcessing] - time spent processing the report.
[TimeRendering] - time spent generating the output format.
If this information is correct, the following statement should be true: [TimeEnd] - [TimeStart] = [TimeDataRetrieval] + [TimeProcessing] + [TimeRendering].
When using the AS processing task with a connection to "an Analysis Services project in this solution", only some processing options are available for processing dimensions. For instance, it is not possible to select "Process Update". Once I change the connection manager to point to the deployed cube database, I can choose from all the options. Is this by design?
I need help from SQL experts with the following problem:
DOCTYPE         DATE    QTY   PRD   LOT
Purchase        1 jan   20+   AA    2007FW
Purchase        4 jan   50+   AA    2007SS
Purchase        9 jan   10+   AA    2007FW
Sale            3 jan   10-   AA
Sale            4 jan   20-   AA
Returned Good   4 feb   10    AA
As you can see I don't have the LOT code in sales records, so I must update these records with the following logic:
I have to assign LOT codes to sales in FIFO order. From the table above: on the 3rd of January I find the first sale of 10, take the first LOT (in ascending date order), check the quantity on hand (20), consume 10 from the LOT (10 remaining), and update the sale record with the 2007FW LOT code.

Then I find the next sale of 20. As before, I take the first LOT with quantity on hand to consume; again it's the first record, with only 10 remaining, so I set that LOT's quantity on hand to zero, but I still have 10 more to allocate to a LOT code. So I find the next available LOT of 50 and consume 10 from it, with 40 remaining.

It's important to remember that I can sell part of a lot, and I have to track the remaining goods per LOT.
Is it possible in SQL to do this in batch mode? How? If I have to split a sale record, consuming 2 or more LOTs, how can I do that? Can you show me SQL or give me good hints?
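One batch-mode idea, heavily hedged: treat FIFO matching as interval overlap on running totals. Assume two helper tables (or views), PurchRuns and SaleRuns, computed per product in date order, each holding the row's quantity Qty plus RunBefore, the running total of all earlier quantities; Lot and SaleID are carried along. Each purchase then covers the interval [RunBefore, RunBefore + Qty), each sale consumes such an interval, and overlapping intervals give the allocation:

SELECT s.SaleID,
       p.Lot,
       -- overlap of [s.RunBefore, s.RunBefore + s.Qty) with [p.RunBefore, p.RunBefore + p.Qty)
       CASE WHEN s.RunBefore + s.Qty < p.RunBefore + p.Qty
            THEN s.RunBefore + s.Qty ELSE p.RunBefore + p.Qty END
     - CASE WHEN s.RunBefore > p.RunBefore
            THEN s.RunBefore ELSE p.RunBefore END AS QtyFromLot
FROM SaleRuns s
JOIN PurchRuns p
  ON s.RunBefore < p.RunBefore + p.Qty
 AND p.RunBefore < s.RunBefore + s.Qty

With the sample data the purchases give intervals [0,20) for 2007FW and [20,70) for 2007SS; the first sale covers [0,10) (all from 2007FW) and the second covers [10,30) (10 from 2007FW, 10 from 2007SS), matching the manual walkthrough. A sale split across lots simply produces two result rows, which can then be used to update or split the sale records.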
I have 4 cubes in a single SSAS database, and these cubes should be processed on the following schedule:
Cube 1 - Every Day
Cube 2 - Every Week
Cube 3 - Every Month
Cube 4 - Every Day
The issue I face is that these cubes share dimensions, so I can't do a FULL process of these SHARED dimensions, as it will affect the other cubes.
I can expect additions and deletions to my dimension data, but the structure remains the same. It would be great if someone could suggest how to go about processing the dimensions. I am confused by the number of options (Process Incremental, Process Update, etc.) available for processing the dimensions.
I will be creating an SSIS package to automate the processing. One more question: say Cube 2 fails during a day and Cube 1 has successfully processed earlier the same day, how do I revert to the old state of Cube 2? Does this mean that I need to do a backup of the SSAS database before processing each cube?
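On the dimension options raised above: as I understand it, Process Update is the usual choice when members are added or deleted but the structure is unchanged, because unlike a full process it does not unprocess the cubes that share the dimension. A minimal XMLA sketch (both IDs are placeholders) that an SSIS Analysis Services Execute DDL Task could send:

<Process xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
  <Object>
    <DatabaseID>MyOlapDatabase</DatabaseID>
    <DimensionID>MySharedDimension</DimensionID>
  </Object>
  <Type>ProcessUpdate</Type>
</Process>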