Quote from jprad:
Yes, I have. BDB is great for this sort of stuff.
I'm not sure why you think it's "outside" the discussion, though. Remember your comment about how a long bar is bad? Well, that's a prime example of why frequency analysis is a good idea, because not all long bars are created equal.
It's just as easy, if not easier, to do this kind of analysis inside TickZOOM itself. What I do is write out comma-separated values for statistics and load them into a spreadsheet.
For example, you write a 10-line strategy that does nothing but write out the range of every hourly bar to a file, plus whether it closed higher or lower than the previous bar. Then you load the file into a spreadsheet to analyze it. That's how I do it. Very simple.
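TickZOOM strategies are written in C#, and this thread doesn't show its API, so the following is only a language-neutral sketch in Python of the kind of statistics dump described above. It assumes hourly bars have already been built; the `Bar` class and `write_bar_stats` function are hypothetical names, not TickZOOM's, and the bar values are made up.

```python
import csv
import io
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Bar:
    """Hypothetical hourly bar; not TickZOOM's actual bar type."""
    start: datetime
    open: float
    high: float
    low: float
    close: float


def write_bar_stats(bars, out):
    """Write one CSV row per bar: start time, range, and direction versus the previous close."""
    writer = csv.writer(out)
    writer.writerow(["start", "range", "direction"])
    prev_close = None
    for bar in bars:
        bar_range = bar.high - bar.low
        if prev_close is None:
            direction = ""  # first bar: nothing to compare against
        elif bar.close > prev_close:
            direction = "higher"
        elif bar.close < prev_close:
            direction = "lower"
        else:
            direction = "unchanged"
        writer.writerow([bar.start.isoformat(), f"{bar_range:.2f}", direction])
        prev_close = bar.close


# Made-up sample bars, written to an in-memory buffer for demonstration.
bars = [
    Bar(datetime(2009, 3, 2, 9), 70.00, 70.50, 69.80, 70.20),
    Bar(datetime(2009, 3, 2, 10), 70.20, 71.10, 70.10, 71.00),
    Bar(datetime(2009, 3, 2, 11), 71.00, 71.20, 70.40, 70.60),
]
buf = io.StringIO()
write_bar_stats(bars, buf)
print(buf.getvalue())
```

In practice you'd pass a real file, e.g. `open("hourly.csv", "w", newline="")`, instead of the `StringIO`, and then open that file in the spreadsheet.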
Also, in TickZOOM it's easy to compute the average, median, standard deviation, and most other statistics. PLUS, when you get an idea, you can easily GRAPH the statistic you want to see, and BDB can't do that.
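As a rough illustration of those statistics (in Python, not TickZOOM's own code), the standard `statistics` module covers the average, median, and standard deviation. The `summarize` helper is a hypothetical name, and the range values are made-up samples.

```python
import statistics


def summarize(ranges):
    """Basic descriptive statistics for a list of bar ranges."""
    return {
        "average": statistics.mean(ranges),
        "median": statistics.median(ranges),
        "stdev": statistics.stdev(ranges),  # sample standard deviation (n - 1)
    }


hourly_ranges = [0.70, 1.00, 0.80, 2.40, 0.90]  # made-up sample values
stats = summarize(hourly_ranges)
print(stats)
```

Note that the single 2.40 bar pulls the average well above the median, which is one reason to look at more than one statistic when judging whether a long bar is unusual.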
I personally see zero purpose in copying all the ticks into a database like BDB. Maybe that's a matter of taste, but I believe in doing things the easy way.
The only case I could remotely see is copying bar data to a database for that kind of analysis, but ticks don't give you bars directly. And you'll be hard pressed to find anything faster than TickZOOM at converting ticks to bars for your analysis.
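Converting ticks to bars is, at its core, a grouping step. Here's a minimal Python sketch of fixed-length time bars, assuming ticks arrive in time order and the bar period divides evenly into a day. The `Tick`/`Bar` types and `ticks_to_bars` are hypothetical and say nothing about how TickZOOM does it internally; this sketch makes no attempt at speed.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Tick:
    time: datetime
    price: float
    size: int


@dataclass
class Bar:
    start: datetime
    open: float
    high: float
    low: float
    close: float
    volume: int


def ticks_to_bars(ticks, period=timedelta(hours=1)):
    """Aggregate time-ordered ticks into OHLCV bars of a fixed length."""
    bars = []
    current = None
    for tick in ticks:
        # Floor the tick time to the start of its period, measured from midnight.
        midnight = tick.time.replace(hour=0, minute=0, second=0, microsecond=0)
        start = midnight + ((tick.time - midnight) // period) * period
        if current is None or start != current.start:
            # First tick of a new period opens a new bar.
            current = Bar(start, tick.price, tick.price, tick.price, tick.price, 0)
            bars.append(current)
        current.high = max(current.high, tick.price)
        current.low = min(current.low, tick.price)
        current.close = tick.price
        current.volume += tick.size
    return bars


ticks = [
    Tick(datetime(2009, 3, 2, 9, 30), 70.00, 100),
    Tick(datetime(2009, 3, 2, 9, 45), 70.40, 200),
    Tick(datetime(2009, 3, 2, 9, 59), 69.90, 100),
    Tick(datetime(2009, 3, 2, 10, 5), 70.10, 300),
]
bars = ticks_to_bars(ticks)
for bar in bars:
    print(bar)
```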
So now that you have explained, I feel more certain it's outside this discussion, because this is about TickZOOM, not data mining a BDB.
You don't need TickZOOM to do that, do you? You can get ticks now and load them into BDB. That's why it's outside this discussion.
Whoa! I completely disagree.
You can't have any confidence that a given trading strategy is robust if you give it a data set with known bad and/or missing data. Moreover, it's the responsibility of the trading system that executes the strategy to defend against bad data.
You're missing the point. The TickZOOM engine cleans the data in real time both during back testing and real time trading.
It would be very naive to clean the data on disk because then you don't know for certain how TickZOOM will behave with real data in production.
Everyone, even if you don't have a background in software development, you must get the following concept:
The environment/platform your custom trading rules run in for historical testing must duplicate the live environment they trade in as exactly as possible. Even the slightest difference can have dangerous results, because it means that when you run live, you haven't tested for some situation, whether that be bad ticks or anything else.
TickZOOM focuses on absolute reliability that your custom trading rules work the same way from testing to production.
At the instant a tick is received there is no way to know if it's good or bad.
What? Well, then you are doomed to fail at auto trading live, aren't you? What do you mean "no way"? Systems scrub data in real time all over the world. TickZOOM does it too, automatically.
The best you can hope for is to statistically determine whether the tick is "reasonable", and that test points back to price and volume frequency analysis of the historical dataset, which needs to contain clean data in order to give reliable statistics.
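One way to read "statistically reasonable" is to measure how large tick-to-tick price changes normally are in history and treat anything far beyond that as suspect. The Python sketch below assumes a simple nearest-rank quantile and a fixed safety multiplier; `jump_threshold`, `is_reasonable`, `quantile=0.999`, and `multiplier=3.0` are illustrative assumptions, not what TickZOOM uses.

```python
def jump_threshold(prices, quantile=0.999, multiplier=3.0):
    """Largest 'reasonable' tick-to-tick price change, derived from historical prices."""
    changes = sorted(abs(b - a) for a, b in zip(prices, prices[1:]))
    if not changes:
        raise ValueError("need at least two prices")
    # Nearest-rank style quantile of the absolute changes.
    index = min(len(changes) - 1, int(quantile * len(changes)))
    return changes[index] * multiplier


def is_reasonable(last_price, new_price, threshold):
    """True if the move from the last accepted price is within the threshold."""
    return abs(new_price - last_price) <= threshold


history = [100.0, 100.1, 100.0, 100.2, 100.1]  # made-up sample prices
threshold = jump_threshold(history)
print(threshold)
print(is_reasonable(100.1, 100.3, threshold))
print(is_reasonable(100.1, 110.0, threshold))
```

Using a high quantile rather than the maximum change makes the threshold somewhat less sensitive to a small number of bad ticks in the history, though enough of them will still inflate it.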
Okay, you're very smart. Answer this:
If you supposedly "clean" all your data in the historical dataset and THEN you realize you've found a better way of cleaning, or discover an error or problem in the cleaning algorithm, how will you undo or redo it?
The answer is simple. Never clean your historical data set. Keep it dirty and always run your tick filter against it any time you want to do statistical analysis, etc.
Then, as the tick filtering algorithm evolves and improves, you can re-run it and make sure it still works.
NOTE: The tick filtering algorithm in TickZOOM is very good, but it's open source, so I'm certain it will improve.
Over the months, I found a bad tick that slipped through and adapted the tick filter to handle it. Thankfully, I hadn't "cleaned" my historical dataset, so the updated filter was easy to run and test against it.
The tick filter logs every tick it filters and why, so you can check the tick filter log to make sure what it filtered was reasonable.
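To make the "keep it dirty, filter on the fly" idea concrete, here's a Python sketch of a filter that reads raw ticks, yields only the ones it accepts, and logs every rejected tick with a reason, without ever modifying the raw data. The `(timestamp, price)` tick layout, the three rules, `max_jump`, and the `enabled` switch are assumptions for illustration; TickZOOM's actual filter rules aren't described in this thread.

```python
import logging
from typing import Iterable, Iterator, Optional, Tuple

log = logging.getLogger("tickfilter")

Tick = Tuple[float, float]  # hypothetical layout: (timestamp_seconds, price)


def filter_ticks(raw: Iterable[Tick], max_jump: float, enabled: bool = True) -> Iterator[Tick]:
    """Yield ticks that pass the filter and log each rejected tick with its reason.

    The raw sequence is only read, never modified.
    """
    last_good: Optional[Tick] = None
    for tick in raw:
        ts, price = tick
        reason = None
        if enabled:
            if price <= 0:
                reason = "non-positive price"
            elif last_good is not None and ts < last_good[0]:
                reason = "timestamp goes backwards"
            elif last_good is not None and abs(price - last_good[1]) > max_jump:
                reason = f"jump of {abs(price - last_good[1]):.2f} exceeds {max_jump}"
        if reason:
            log.info("filtered tick %s: %s", tick, reason)
            continue
        last_good = tick  # later ticks are compared against the last accepted one
        yield tick


logging.basicConfig(level=logging.INFO)
raw = [(1.0, 70.00), (2.0, 70.05), (3.0, 7.00), (4.0, 70.10)]  # made-up data, one bad tick
clean = list(filter_ticks(raw, max_jump=1.0))
print(clean)
```

Because the raw list is untouched, an improved filter can be re-run against the same dirty data, and passing `enabled=False` lets you see every bad tick. One weakness of this sketch: after a genuine price gap larger than `max_jump`, it would keep rejecting ticks, so a real filter needs some way to recover.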
You can even turn off the tick filter so you get to see the bad ticks if you want to verify. Serious automated traders care about this kind of stuff.
You're very smart but still operating on "theory" rather than practice. Do you have an automated trading system running live with real money right now, like I do?
Maybe so, but I doubt it.
So folks, keep all your dirty ticks around to make sure the new, better versions of the tick filter work and really do a better job.
More problematic is being able to determine if you're missing trades during the trading session. Again, you need some statistical parameters to compare against, which means time frequency analysis of clean historical data.
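A sketch of that time-frequency comparison, in Python and under assumed parameters: count ticks per minute of the day over a few historical sessions, average them, and flag minutes in the current session that fall well below the average. The helper names and `min_fraction=0.2` are hypothetical, and the sessions are made-up data.

```python
from collections import Counter
from datetime import datetime


def counts_per_minute(timestamps):
    """Count ticks falling in each minute of the day, keyed like '09:31'."""
    return Counter(ts.strftime("%H:%M") for ts in timestamps)


def baseline_counts(history_days):
    """Average tick count per minute of the day across several historical sessions."""
    total = Counter()
    for day in history_days:
        total.update(counts_per_minute(day))
    return {minute: count / len(history_days) for minute, count in total.items()}


def suspicious_minutes(today, baseline, min_fraction=0.2):
    """Minutes where today's tick count falls below min_fraction of the historical average."""
    today_counts = counts_per_minute(today)
    return [minute for minute, expected in sorted(baseline.items())
            if today_counts.get(minute, 0) < min_fraction * expected]


def session(day, ticks_at_0930, ticks_at_0931):
    """Build a made-up session with the given number of ticks in two minutes."""
    return ([datetime(2009, 3, day, 9, 30, s) for s in range(ticks_at_0930)] +
            [datetime(2009, 3, day, 9, 31, s) for s in range(ticks_at_0931)])


history = [session(2, 10, 10), session(3, 10, 10)]
today = session(4, 10, 1)
flagged = suspicious_minutes(today, baseline_counts(history))
print(flagged)  # ['09:31']
```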
Certainly, use clean historical data but clean it "on the fly" and never physically modify the data as it was received from exchanges/providers. You need that original copy for testing.
That's because in LIVE trading you will get all those dirty ticks and you need to be confident TickZOOM cleans them appropriately or you adjust the algorithm, etc.
I simply don't understand why you have these arbitrary values. 100K ticks of the SPY is entirely different from 100K ticks of an instrument that only averages 50 trades/day.
Apparently you don't know what "max bars back" is. It's not related to ticks at all. It relates to bars, hence the name "max bars back". 100,000 isn't arbitrary; it's a default.
Most people never need more than 200 or 300 max bars back. So 100,000 (when they know what it means) will blow their minds.
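To illustrate why the setting is about bars rather than ticks, here's a Python sketch of a bounded bar history: it keeps at most `max_bars_back` bars no matter how many ticks built them, and the oldest bars fall off the end. `BarHistory` is a hypothetical class, not TickZOOM's implementation; only the 100,000 default comes from the post.

```python
from collections import deque


class BarHistory:
    """Keeps at most max_bars_back of the most recent bars; older bars are discarded."""

    def __init__(self, max_bars_back=100_000):
        self._bars = deque(maxlen=max_bars_back)

    def add(self, bar):
        self._bars.append(bar)  # drops the oldest bar once the limit is reached

    def __getitem__(self, bars_ago):
        """bars_ago=0 is the most recent bar, 1 is the one before it, and so on."""
        if not 0 <= bars_ago < len(self._bars):
            raise IndexError(f"only {len(self._bars)} bars are retained")
        return self._bars[-1 - bars_ago]

    def __len__(self):
        return len(self._bars)


history = BarHistory(max_bars_back=3)
for close in [1, 2, 3, 4, 5]:  # stand-ins for completed bars
    history.add(close)
print(len(history), history[0], history[2])  # 3 5 3
```

Memory grows with the number of bars retained, not with the number of ticks, which is why the same setting means the same thing for a busy instrument and a thinly traded one.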
Wayne