Convert Tick-by-tick Stock Price Time Series Data Grouped As High, Low, Open And Close Prices
Dec 3, 2006
I am working on a stock price analysis project. Stock prices are stored in SQL Server as tick-by-tick data (that is, at high frequency) and need to be grouped by minute in order to display them as high, low, open and close prices in a candlestick or similar chart.
The fields in the table are:
Time (for every stock price tick)
Price (stock price)
Symbol (of the stock)
Assume you have a table called Tick with 2 columns ( tickId bigint IDENTITY(1,1) , price int -- usually money data type, making it int for simplicity )
I am tasked with creating bars that are 10 units long.
Now the catch is that I'm not looking only for the tickId where price >= t1(price) + 10, where t1(price) is the price for the first row (tickId = 1); it could also be where price <= t1(price) - 10.
Code:
DECLARE @tickDiff int
SET @tickDiff = 10
DECLARE @r1TickId bigint
[Code].....
This seems to work, but it is taking multiple minutes to run for about 50k rows of data (which I created off the 24-million-row table I have, looking at just today's data). So it takes ~5 minutes to create the first bar, which is not acceptable.
If my logic above seems acceptable, are there any indexes you could recommend? Database Engine Tuning Advisor didn't find any.
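For what it's worth, here is a set-based sketch of the first-bar lookup against the Tick table described above; the index is only needed if tickId is not already the clustered primary key.

-- Covering index so the range scan below only touches (tickId, price) pairs.
CREATE INDEX IX_Tick_tickId_price ON Tick (tickId) INCLUDE (price)

DECLARE @tickDiff int
SET @tickDiff = 10

DECLARE @openPrice int
SELECT @openPrice = price FROM Tick WHERE tickId = 1   -- opening price of the bar

-- First tick that moves @tickDiff or more away from the opening price, in either direction.
SELECT TOP (1) tickId, price
FROM Tick
WHERE tickId > 1
  AND (price >= @openPrice + @tickDiff OR price <= @openPrice - @tickDiff)
ORDER BY tickId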
I'd like some ideas on how to help out my VB staff when it comes to the tick ( ' ) mark being used in front-end applications. When users enter data into the front end and include tick marks (e.g. Sam's Auto Store), this causes trouble in how the app uses that data in a WHERE clause and other places. Does anyone have any slick ideas or experience in handling this? We have experimented with search & replace, double tick marks ( '' ), etc...
Been working on an app that stores customer data. All has been well until an address contained the tick ( ' ) character, i.e. the address is John O'Grotes in Scotland. Now, all ticks are currently stripped from any input for obvious reasons, but if I wanted to add it, how do I do it, and do it safely?
I am working on a project to import information from one database table to another. This info includes names, addresses, etc. I am running into a challenge whenever one of the names, such as O'Brien, causes problems in the SQL statement. Is there a way for me to tell it to ignore the ' in O'Brien?
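For the three apostrophe questions above, here is a minimal sketch of the two standard fixes: pass the value as a parameter (preferred), or double the quote when the SQL string really must be built by concatenation. The Customer(Name) table is a hypothetical stand-in.

DECLARE @name nvarchar(100)
SET @name = N'Sam''s Auto Store'    -- a literal apostrophe is written as two single quotes

-- Preferred: parameterize, so the value never has to be escaped at all.
EXEC sp_executesql
     N'SELECT * FROM Customer WHERE Name = @n',
     N'@n nvarchar(100)',
     @n = @name

-- Fallback for hand-built dynamic SQL: escape by doubling each apostrophe.
DECLARE @sql nvarchar(400)
SET @sql = N'SELECT * FROM Customer WHERE Name = N''' + REPLACE(@name, '''', '''''') + ''''
EXEC (@sql)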
I'm trying to do the equivalent of the Excel chart "Number of categories between tick-mark labels" on the X-axis of an SSRS chart. I can't see any way to do it. I can get it to display differently by doing Label = IIF(somethingistrue, onevalue, anothervalue), but can't see any way to simply not show the label at all.
Is it possible to remove the tick marks that I have circled in red? The chart has data week by week over a year, and the T1, T2 etc. are equivalent to Quarter 1, Quarter 2 etc. I've got a 3-level grouping, with the top level being Year, then Quarter, then Week. The label for the "Week" grouping is set to blank, and that's why you don't see it here. Removing the tick marks would make the chart legend more readable in my opinion.
I would like to use analysis services to analyze stock prices.
I want to find conditional probabilities: P (YpriceChg >= 10% s.t. Ydate between A and B| X Price Chg >= 20%)?
... Like, given a price change of X percent or greater, predict the probability of a price change of Y percent or greater, within a specified time window (like 2 days, 3 months etc.).
I also want to add a support filter, like:
N > 30 cases (i.e., there have been at least 10 instances of a 10% or greater price change, for the chosen time window)
I have a database of prices, monthly, daily, etc. I also have a number of columns that compute statistics such as pChg1M, pChg-1M, vChg1d, i.e. price change 1 month forward, price change 1 month backward, volume change 1 day forward. Ideally, I would like to minimize the column flags necessary for the experiment. Can you offer some hints, as far as setting up appropriate columns/flags and choosing an algorithm (maybe decision trees, association rules, or NB)?
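On the column flags: if the statistics columns such as pChg1M can be derived on the fly, the experiment needs fewer persisted flags. A minimal sketch, assuming a hypothetical MonthlyPrice(Symbol, MonthStart, ClosePrice) table and SQL Server 2012 or later for LAG/LEAD:

SELECT  Symbol,
        MonthStart,
        ClosePrice,
        -- 1-month backward percentage change (pChg-1M style)
        100.0 * (ClosePrice - LAG(ClosePrice) OVER (PARTITION BY Symbol ORDER BY MonthStart))
              / NULLIF(LAG(ClosePrice) OVER (PARTITION BY Symbol ORDER BY MonthStart), 0) AS pChgBack1M,
        -- 1-month forward percentage change (pChg1M style)
        100.0 * (LEAD(ClosePrice) OVER (PARTITION BY Symbol ORDER BY MonthStart) - ClosePrice)
              / NULLIF(ClosePrice, 0) AS pChgFwd1M
FROM    MonthlyPrice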
I encountered a very strange problem again. Why are the dates displayed on the time series chart so strange? The key time column I chose for my time series algorithm is cal_month (e.g. 199001...), but the date displayed on the time series chart looks like 05/06/2448 (it should look like 199001, shouldn't it?). What is that data, where exactly did it come from, and what is the exact cause of this?
I hope this is clear enough for you to help.
I am really confused about this. Thanks a lot for your kind advice and help; I look forward to hearing from you shortly.
I have an MS Time Series model using a database of over a thousand products, each of which has hundreds of cases. It amazingly takes only a few minutes to finish processing the model, but when I click Mining Model Viewer to view the models, it takes many hours to show up. Once the window is open, I can choose the model for different products almost instantly. Is this normal?
Just a quick question about connection management. My application will never need more than 1 or 2 connections open at any given time. Also, I do not expect many users to be connected at any given time. For efficiency, I would like to keep connections alive throughout the lifetime of the objects requiring them, rather than opening a new connection, executing code and then closing it again. What is the most efficient way of doing this? Should I perform the open/close each time, or just one open when I create the object and a close when I dispose of it?
I have an approach like this: 1) Read something from the DB and check the value; if true, stop; if false, go on. 2) Read the second value (another SQL statement) and check it, etc. Now I could open the connection at 1) and, if I have to go to 2), leave the connection open and use the same connection at 2). Is it OK to do that? The other scenario would be opening a connection at 1), immediately closing it after I read the value, and opening a new connection at 2). Thanks for the input!
Why is the SQL Server 2000 Enterprise Edition price so much higher than Standard Edition? I saw the SQL Server 2000 Editions Comparison and I didn't find any good reasons.
Is it possible to group many time series into clusters by using the clustering algorithm of SQL Server 2005? The same question applies to the "association rules" technique. Any examples?
I'm wondering what is the best way to store time-series data in an SQL database? I've done a bit of investigating on rrdtool (the round robin database tool) that is used in a lot of *nix-based solutions for monitoring network equipment. I have a need to collect performance data from servers and routers and then produce some nice graphs from that data. I'm just not sure how I should store that data without the database growing to some huge size.
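One common pattern, sketched below under assumed table and column names: keep raw samples in one table and periodically roll anything older than a retention window into an hourly-average table, then delete the raw rows. That is essentially the consolidation idea rrdtool uses to keep the store from growing without bound.

-- A minimal sketch; table and column names are hypothetical.
CREATE TABLE PerfSample (
    DeviceId   int         NOT NULL,
    SampleTime datetime    NOT NULL,
    Metric     varchar(50) NOT NULL,
    Value      float       NOT NULL,
    CONSTRAINT PK_PerfSample PRIMARY KEY (DeviceId, Metric, SampleTime)
)

CREATE TABLE PerfSampleHourly (
    DeviceId   int         NOT NULL,
    HourStart  datetime    NOT NULL,
    Metric     varchar(50) NOT NULL,
    AvgValue   float       NOT NULL,
    CONSTRAINT PK_PerfSampleHourly PRIMARY KEY (DeviceId, Metric, HourStart)
)

-- Roll samples older than 7 days into hourly averages, then delete the raw rows,
-- keeping the database size roughly constant (rrdtool-style consolidation).
INSERT INTO PerfSampleHourly (DeviceId, HourStart, Metric, AvgValue)
SELECT DeviceId,
       DATEADD(hour, DATEDIFF(hour, 0, SampleTime), 0),   -- truncate to the hour
       Metric,
       AVG(Value)
FROM   PerfSample
WHERE  SampleTime < DATEADD(day, -7, GETDATE())
GROUP BY DeviceId, Metric, DATEADD(hour, DATEDIFF(hour, 0, SampleTime), 0)

DELETE FROM PerfSample
WHERE  SampleTime < DATEADD(day, -7, GETDATE())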
I've got a time series of the number of new customer subscriptions, which is the target attribute to predict. The number of subscriptions depends on various marketing activities, such as mailings, which are known within the time series for the past.
If I train an ART (MS Autoregression Tree), it learns the trend pattern as well as the correlations between the marketing activities and the target (cross-correlations), right? What I would like to do is, providing the model with some marketing activity planning for the future and let the model predict the number of new subscriptions based on a) the past trend pattern and b) the future activities.
Unfortunately, a time series algorithm does not provide any kind of scoring for input data.
What would be the best approach to solve this problem? How about linear regression? But how to train a regression model with trend patterns?
I have a company that sells fruit and vegetables to the catering industry. I take orders in the evening for the next day, and buy my fruit and vegetables from the wholesale market to deliver to my customers the next day. I have Sage Simply Accounting. I have to enter invoices the day before I print them, so I can get a list of items I should buy the next day (and also sort them into different routes). My question is, is there a way to update all the prices on the active invoices (i.e. not printed or posted) for the previous day, after I enter new prices for the fruit and vegetables? I need this because at the moment I have to go into individual invoices and enter the products and quantities again.
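Whether this is practical depends on having direct SQL access to the Sage data. Purely as an illustration of the set-based idea, with hypothetical table and column names, the update would look something like:

UPDATE il
SET    il.UnitPrice = p.CurrentPrice
FROM   InvoiceLine AS il
       INNER JOIN Invoice AS i ON i.InvoiceId = il.InvoiceId
       INNER JOIN Product AS p ON p.ProductId = il.ProductId
WHERE  i.Printed = 0        -- only invoices not yet printed
  AND  i.Posted  = 0        -- and not yet posted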
Again I encountered a very strange problem: the predicted attribute values are displayed in percentage format. The data type of the attribute is actually double, so why is that?
That's really frustrating.
Thanks a lot in advance for your kind advice, and I look forward to hearing from you shortly.
I am currently investigating a high average write time (ms) issue (145 ms) which seems to be occurring only on the tempdb data files. I have followed the recommended setup of tempdb, in that:
1. Data files = number of physical cores.
2. Data files and log files are on separate partitions, away from the other databases.
3. Tempdb is pre-sized, and no incremental file growths appear to be happening with any frequency.
We have SharePoint 2012 set up on other SQL Servers with tempdb configured following the same guidelines, and with far more SharePoint activity on similarly specified hardware, which is why it's confusing. File I/O auditing on the partitions themselves shows that I/O is very fast on the partitions holding the tempdb data files, which leads me to believe that SharePoint may be the culprit, perhaps due to excessive use of tempdb with operations taking a long time to resolve.
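A quick way to confirm where the latency is coming from is to read the cumulative I/O stall counters per tempdb file. A minimal sketch using the sys.dm_io_virtual_file_stats DMV (counters accumulate since the last restart):

SELECT  mf.name,
        mf.physical_name,
        fs.num_of_writes,
        fs.io_stall_write_ms,
        CASE WHEN fs.num_of_writes = 0 THEN 0
             ELSE 1.0 * fs.io_stall_write_ms / fs.num_of_writes
        END AS avg_write_ms          -- average write latency per file
FROM    sys.dm_io_virtual_file_stats(DB_ID('tempdb'), NULL) AS fs
JOIN    sys.master_files AS mf
        ON mf.database_id = fs.database_id
       AND mf.file_id = fs.file_id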
I have a very simple time series model whose processing works fine without any problem. However, when I run the following query
SELECT
[TimeSeries].[PriceChange],
[TimeSeries].[Symbol],
PredictTimeSeries(PriceChange, -3, 2)
From
[TimeSeries]
WHERE
[TimeSeries].[Symbol] = 'x'
I get the following error:
TITLE: Microsoft SQL Server 2005 Analysis Services
Error (Data mining): A time series prediction was requested with a start time further in the past than the internal models of the mining model, TimeSeries, specified in the HISTORIC_MODEL_GAP and HISTORIC_MODEL_COUNT parameters can process.
The following is an excerpt of the mining model script related to the two parameters:
<AlgorithmParameters>
<AlgorithmParameter>
<Name>MISSING_VALUE_SUBSTITUTION</Name>
<Value xsi:type="xsdtring">Previous</Value>
</AlgorithmParameter>
<AlgorithmParameter>
<Name>HISTORIC_MODEL_GAP</Name>
<Value xsi:type="xsd:int">1</Value>
</AlgorithmParameter>
<AlgorithmParameter>
<Name>HISTORIC_MODEL_COUNT</Name>
<Value xsi:type="xsd:int">10</Value>
</AlgorithmParameter>
</AlgorithmParameters>
These settings of HISTORIC_MODEL_GAP (1) and HISTORIC_MODEL_COUNT (10) should accommodate PredictTimeSeries(PriceChange, -3, 2). Could anyone shed some light on this?
Hi All, I have a case table, and a case_status table with two fields, date and status, as below:

date        status
04/01/2006  open
04/05/2006  closed
04/10/2006  open
04/15/2006  closed

Whenever I open or close the case, one record is inserted into the case_status table. Now I need to calculate the total number of days the case has been open, in a stored procedure. Can anyone help me please? Aung
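A minimal sketch of the stored-procedure logic, assuming a case_id column to distinguish cases and that every 'open' row is eventually followed by a matching 'closed' row; it pairs each 'open' row with the next 'closed' row and sums the day counts:

SELECT  o.case_id,
        SUM(DATEDIFF(day, o.[date], c.close_date)) AS total_open_days
FROM    case_status AS o
CROSS APPLY (SELECT MIN(c2.[date]) AS close_date     -- next close on or after this open
             FROM   case_status AS c2
             WHERE  c2.case_id = o.case_id
               AND  c2.[status] = 'closed'
               AND  c2.[date] >= o.[date]) AS c
WHERE   o.[status] = 'open'
GROUP BY o.case_id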
Just to verify that this was an issue, I downloaded web developer 2008 and I do not experience this same problem.
BUT when I go to add a dataset in VS2005 for an ASP website, all my db files come up in the dialogue box, but for every one that I click (every db file) I get "This file is in use. Please enter a new name or close the file that's open in another program."
But, like I said, I downloaded 2008 and it does not occur. Plus I KNOW that the db's are not being used. Can someone give me a remedy to this?
I was working with a Microsoft Time Series (MTS) model on some data when, in the Mining Model Viewer, Decision Tree tab, I realized that the key time variable that I defined was acting like a split variable.
So I ask you, is this possible? Because, to my mind, this should not happen...
Afterwards, I reviewed the Data Mining Tutorial by Seth Paul, Jamie MacLennan, Zhaohui Tang and Scott Oveson, and I found, in the Forecasting part, that the key time variable (Time Index) was acting like a split variable too, for example in M200 Pacific:Quantity and R250 Europe:Quantity.
So, people, is it possible that a key time variable acts like a split variable in an MTS model?
I am confused about key time column selection. E.g., I want to predict monthly sales amount; which column in the date dimension should I choose as the key time column? Is it the calendar_date column (the key of the date dimension) or calendar_month?
Thanks a lot for your kind advice and help; I look forward to hearing from you shortly.
I have a SQL2005 db for tracking the prices of products at multiple retailers. The basic structure is, 'products' table lists individual products, 'retailer_products' table lists current prices of the products at multiple retailers, and 'price_history' table records when the price of a product changes at any retailer. The prices are checked from each retailer daily, but a row is added to the 'price_history' only when the price at the retailer changes.
I have the following query to retrieve the price history of a given product at multiple retailers:
SELECT price_history.datetimeofchange, retailer.name, price_history.price
FROM product, retailer, retailer_product, price_history
WHERE product.id = 'b486ed47-4de4-417d-b77b-89819bc728cd'
  AND retailer_product.retailerid = retailer.id
  AND retailer_product.associatedproductid = product.id
  AND price_history.retailer_productid = retailer_product.id
This gives the following results:
2008-03-08  Example Retailer  22.3
2008-03-28  Example Retailer  11.8
2008-03-30  Example Retailer  22.1
2008-04-01  Example Retailer  11.43
2008-04-03  Example Retailer  11.4
The question(s) I have are how can I:
1 - Get the price of a product at a given retailer at a given date/time. For example, get the price of the product at Retailer 2 on 03/28/2008. The table only contains data for Retailer 1 for this date; the behaviour I want is, when there is no data available, for the query to find the last date at which there was data from that retailer and use the price from that point. So for this example the query should return 2.3 as the price, given that was the last recorded price change from that retailer (03/08/2008).
2 - Get the average price of a product across retailers at a given date/time. In this case we would need to perform (1) for all retailers, then average the results.
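Here is a hedged sketch for (1), using the table names from the query above: take the most recent price_history row at or before the requested date for that retailer. The @retailerid type and value are assumptions; (2) follows by running the same lookup per retailer and averaging the results.

DECLARE @productid uniqueidentifier, @retailerid int, @asof datetime
SET @productid  = 'b486ed47-4de4-417d-b77b-89819bc728cd'
SET @retailerid = 2            -- assumed retailer key and type
SET @asof       = '2008-03-28'

-- Latest recorded price change at or before the requested point in time.
SELECT TOP (1) ph.price, ph.datetimeofchange
FROM   price_history AS ph
       INNER JOIN retailer_product AS rp ON rp.id = ph.retailer_productid
WHERE  rp.associatedproductid = @productid
  AND  rp.retailerid = @retailerid
  AND  ph.datetimeofchange <= @asof
ORDER BY ph.datetimeofchange DESC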
I'm using the Time Series algorithm to forecast sales across regions for various products. Assume the model is built with the last 3 years' data, with the periodicity being monthly.
Is it possible that sometimes I can make predictions based on just 1 or 2 years' data, for certain products alone or certain regions alone? Can this be done without having to retrain the already built model?
Also, is it possible that, using the model, I can predict week-wise / month-wise / quarterly sales as well?
Hi All, I have a table Test1:

ID       date       Value
AAUGVAL  2/27/1987  60.848
AAUGVAL  3/2/1987   64.288
AAUGVAL  3/3/1987   63.77
AAUGVAL  3/4/1987   62.495
AAUGVAL  3/5/1987   62.65
AAUGVAL  3/6/1987   62.548
AAUGVAL  3/9/1987   62.292
AAUGVAL  3/10/1987  63.045
AAUGVAL  3/11/1987  63.021
....

I am trying to see the value % changes day by day and here is the code I wrote:

select starttime = cast(v.date as char(8)),
       endtime = cast(a.date as char(8)),
       startval = v.Value,
       endval = a.Value,
       change = substring('- +', sign((a.Value-v.Value)+2,1)+ cast(abs(a.Value-v.Value) as varchar)
from (select date, Value,
             ranking = (select count(distinct date) from Test1 T where T.Value <= S.Value)
      from Test1 S) v
left outer join
     (select date, Value,
             ranking = (select count(distinct date) from Test1 T where T.Value <= S.Value)
      from Test1 S) a
  on (a.ranking = v.ranking + 1)
I got the following error message:

Server: Msg 174, Level 15, State 1, Line 4
The sign function requires 1 arguments.
Server: Msg 170, Level 15, State 1, Line 7
Line 7: Incorrect syntax near 'v'.
Server: Msg 170, Level 15, State 1, Line 9
Line 9: Incorrect syntax near 'a'.
Could someone please help with this? Thank you in advance! shiparsons
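The error messages point at the change expression: a misplaced parenthesis makes sign() receive two arguments, and the substring() call is never closed. A corrected sketch is below (it keeps the original ranking logic; note that ranking by date rather than Value may be closer to what day-by-day changes require):

select starttime = cast(v.date as char(8)),
       endtime   = cast(a.date as char(8)),
       startval  = v.Value,
       endval    = a.Value,
       -- close sign() before adding 2, and close substring() before concatenating
       change    = substring('- +', sign(a.Value - v.Value) + 2, 1)
                   + cast(abs(a.Value - v.Value) as varchar)
from (select date, Value,
             ranking = (select count(distinct date) from Test1 T where T.Value <= S.Value)
      from Test1 S) v
left outer join
     (select date, Value,
             ranking = (select count(distinct date) from Test1 T where T.Value <= S.Value)
      from Test1 S) a
  on a.ranking = v.ranking + 1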
I am trying to write a stored proc that calculates a moving average over three periods. In the following example, I need to stratify the data by PersonID and RecordID in the #Temp table, but I am not sure how to do it. Right now I am restricting the data I use to build my time series by PersonID and I get the results I want *by PersonID*. If I can figure out how to stratify by PersonID so I don't have to use this restriction, I'm sure I can extend it to the RecordID.

Create Table #Temp
(
    tmpID int identity,
    DetailID int,
    RecordID int,
    AdminDate Datetime,
    AdminTime datetime,
    Status tinyint,          -- decimal(9,2)
    Location varchar(100),
    PersonID char(9),
    PatientName varchar(100),
    DOB Datetime,
    Drug varchar(100),
    Sort varchar(10)
)

-- populate with data by personID
insert into #Temp (DetailID, RecordID, AdminDate, AdminTime, Status, Location, PersonID, PatientName, DOB, Drug, Sort)
Select MD.PatMedOrderDetailID, MD.PatMedOrderID, M.Date as AdminDate,
       Case M.Time When 'A' then '8:00:00 AM' When 'N' then '12:00:00 AM' When 'P' then '4:00:00 AM'
                   When 'H' then '8:00:00 PM' else M.Time End as Admintime,
       100*M.Status, P.Location, P.PersonID, P.Name as PatientName, P.DOB,
       D.GenericName + ' (' + D.TradeName + ') ' + D.Strength,
       Left(P.Location, 3)
From PatMedOrderDetail MD
     Inner Join PatMedOrder MO on MD.PatMedOrderID = MO.PatMedOrderID
     Inner Join PatMedPass M on MD.PatMedOrderDetailID = M.PatMedOrderDetailID
     Inner Join Patient P on M.PersonID = P.PersonID
     Inner Join Drugs D on MO.DrugID = D.DrugID
Where P.PersonID = '000126230'
Order by P.PersonID, MD.PatMedOrderID, M.Date, M.Time

Select * from #Temp -- to view entire set

-- returns relevant rows
Select Derived.RefusalRate, T.*
from #Temp T
     inner join (select t1.tmpID, avg(t2.Status) as RefusalRate
                 from #Temp t1 cross join #Temp t2
                 WHERE t1.tmpID >= 3 AND t1.tmpID BETWEEN t2.tmpID AND t2.tmpID + 2
                 group by t1.tmpID
                 having avg(t2.Status) < 100) as Derived on T.tmpID = Derived.tmpID

Drop Table #Temp
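As a side note, if this can run on SQL Server 2012 or later, the self-join can be replaced with a windowed average that handles the stratification directly; a minimal sketch against the same #Temp table (the PARTITION BY clause is what stratifies by PersonID and RecordID):

SELECT  PersonID,
        RecordID,
        AdminDate,
        AdminTime,
        Status,
        AVG(CAST(Status AS decimal(9,2)))
            OVER (PARTITION BY PersonID, RecordID
                  ORDER BY AdminDate, AdminTime
                  ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS RefusalRate
FROM    #Temp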
I am new to SQL Server and learning lots very quickly! I am experienced at building databases in Access and using VBA in Access and Excel.
I have a time series of 1440 records that may have some gaps in it. I need to check the time series for gaps and then fill these or reject the time series.
The criteria for accepting and rejecting is a user-defined number of time steps from 1 to 10. For example, if the user sets the maximum gap as 5 time steps and a gap has 5 or fewer, then I simply want to linearly interpolate between the two time steps bounding the gap. If the gap is 6 time steps then I will reject the time series.
I have searched the BOL and MSDN for SQL Server and think there must be a solution using PredictTimeSeries in DMX, but I'm not quite sure if I can do this. I may be better off simply passing through the time series as a recordset and processing it as I would have done in Access... (I am reluctant to do this as I have of the order of 100 * 5 * 365 time series, growing by 100 each day, and fear it will take quite some time...)
Can anyone help me by pointing me in the right direction please?
Unless there is a way of using PredictTimeSeries on its own, I think the solution is:
1. Identify whether a record is a valid one or part of a gap (i.e. missing values).
2. Identify the longest gap and reject or process the data based on this value.
3. Identify whether a record precedes or succeeds a gap.
4. For each gap, fill it using linear interpolation.
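If the set-based route is taken instead of PredictTimeSeries, a minimal sketch is below. It assumes a hypothetical TimeSeries(SeriesId, StepNo, Value) table in which StepNo should run from 1 to 1440 and missing StepNo values are the gaps; the first query finds the longest gap per series (for the accept/reject decision) and the second linearly interpolates one missing step from its bounding rows.

DECLARE @maxGap int
SET @maxGap = 5   -- user-defined acceptance threshold

-- Longest gap per series: reject the series when this exceeds @maxGap.
SELECT  t.SeriesId,
        MAX(nx.NextStep - t.StepNo - 1) AS LongestGap
FROM    TimeSeries AS t
CROSS APPLY (SELECT MIN(n.StepNo) AS NextStep
             FROM   TimeSeries AS n
             WHERE  n.SeriesId = t.SeriesId
               AND  n.StepNo > t.StepNo) AS nx
GROUP BY t.SeriesId

-- Linear interpolation for a missing step @s in series @id (bounding rows assumed to exist).
DECLARE @id int, @s int
SET @id = 1
SET @s = 100

SELECT  lo.Value + (hi.Value - lo.Value)
            * (@s - lo.StepNo) / CAST(hi.StepNo - lo.StepNo AS float) AS InterpolatedValue
FROM    (SELECT TOP (1) StepNo, Value FROM TimeSeries
         WHERE SeriesId = @id AND StepNo < @s ORDER BY StepNo DESC) AS lo
CROSS JOIN
        (SELECT TOP (1) StepNo, Value FROM TimeSeries
         WHERE SeriesId = @id AND StepNo > @s ORDER BY StepNo ASC) AS hi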