TickZOOM Decision. Open Source and FREE!

jprad · Dec 23, 2008

Quote from TSGannGalt:

It'll be tough...

Learning all the Broker and Datafeed APIs...
The list will keep on:

Growing, growing and growing...

and... once some specification changes:

You get bugged by people to update them... bugging and bugging and bugging you...

Just my 2 cents

Worth every penny, completely agree.

greaterreturn · Dec 23, 2008

Quote from jprad:

You asked for examples of queries that weren't based on symbol and date. A key reversal, a large block trade or doing frequency analysis of price and/or volume all fall under that type of query.

Okay. Then you're right, that falls entirely outside of anything TickZOOM would handle. Still, the way tick data works, it might be faster to process through all the ticks with TickZOOM to find that than to try to load the ticks into a database. Again, is this something you have done? I don't think it's realistic. But that's outside of the TickZOOM discussion, it seems.

Well, the exact protocol would be provider dependent, but yes, in general terms it would be one of the following; a tick that replaces an existing one, is a new one, or an instruction to delete an errant tick.

With regard to data corrections, that's easy to implement. To handle them:

1. You do not want to go back and retroactively change the previous day's bars in the stored files.

2. It's critical to store the data exactly as received from the broker including belated data corrections to exactly simulate what will happen during real live trading.

3. Therefore, the engine can easily receive a correction on the fly, in both historical and real-time mode (we need the format for that stuff), then unwind all the bars and all the processing from that point and do it over with the corrected tick.

Did you know, TickZOOM keeps by default 100,000 max bars back to make sure it will be virtually impossible to run out.

That means a full day of minute bars or even second bars could be unwound, all the indicators and strategies rerun in historical mode, and the trade signal resynced at the end before going back to real time.

That's ALL the more reason for a super fast engine. With TickZOOM that process would take mere hundreds of a millisecond (for one day).
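To make the unwind-and-redo idea concrete, here is a minimal sketch in Python. It is not TickZOOM code: the `Tick`, `Correction` and `Engine` names, the `seq` sequence number and the three correction kinds (replace, insert, delete) are assumptions for illustration, since the real correction format is provider dependent. The raw log keeps everything exactly as received, and a correction only rebuilds bars from the affected minute onward. A real engine would also rerun indicators and strategies over the rebuilt bars.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tick:
    seq: int      # hypothetical provider sequence number
    minute: int   # minute of the session this tick belongs to
    price: float


@dataclass(frozen=True)
class Correction:
    kind: str     # "replace", "insert" or "delete" (assumed protocol)
    tick: Tick    # for "delete" only tick.seq is used


class Engine:
    def __init__(self):
        self.raw_log = []  # everything exactly as received, never edited
        self.ticks = {}    # seq -> Tick, the current corrected view
        self.bars = {}     # minute -> [open, high, low, close]

    def _add_to_bar(self, t):
        bar = self.bars.get(t.minute)
        if bar is None:
            self.bars[t.minute] = [t.price, t.price, t.price, t.price]
        else:
            bar[1] = max(bar[1], t.price)
            bar[2] = min(bar[2], t.price)
            bar[3] = t.price

    def on_tick(self, t):
        # Assumes ticks arrive in sequence order.
        self.raw_log.append(t)
        self.ticks[t.seq] = t
        self._add_to_bar(t)

    def on_correction(self, c):
        self.raw_log.append(c)
        old = self.ticks.get(c.tick.seq)
        if c.kind == "delete":
            self.ticks.pop(c.tick.seq, None)
            affected = [old] if old is not None else []
        else:
            self.ticks[c.tick.seq] = c.tick
            affected = [c.tick] + ([old] if old is not None else [])
        if not affected:
            return
        start = min(t.minute for t in affected)
        # Unwind: drop every bar from the corrected minute onward...
        for minute in [m for m in self.bars if m >= start]:
            del self.bars[minute]
        # ...then redo them from the corrected tick view.
        redo = [t for t in self.ticks.values() if t.minute >= start]
        for t in sorted(redo, key=lambda t: t.seq):
            self._add_to_bar(t)
```

Because the raw log holds both the errant tick and the later correction, a historical run that replays `raw_log` goes through the same unwind path that live trading would.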

So how far back can corrections occur? I vaguely remember they can go back up to one business day. If it's up to one trading day, that's no big deal to implement.

I'm under the impression that your tick stream is compressed, only deltas are saved to reduce storage requirements.

If that's correct then I assume that the ability to go back and modify the tick file is going to be a bit more complex to accomplish.

It's not yet compressed or "deltified", but I plan to do that simply because I get tired of waiting for the 100 megabyte file to download from the server to my local PC each week.
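For readers unfamiliar with "deltified" storage, here is a minimal sketch of delta encoding in Python. It only illustrates the general technique and says nothing about the format TickZOOM will eventually use. Storing prices as integer cents is an assumption made for the example.

```python
def delta_encode(values):
    """Keep the first value whole, then store each value as the
    difference from its predecessor."""
    deltas = []
    previous = 0
    for value in values:
        deltas.append(value - previous)
        previous = value
    return deltas


def delta_decode(deltas):
    """Rebuild the original values with a running sum."""
    values = []
    total = 0
    for delta in deltas:
        total += delta
        values.append(total)
    return values


# Prices in integer cents (an assumption for this example).
prices = [90025, 90050, 90050, 90025]
encoded = delta_encode(prices)   # [90025, 25, 0, -25]
```

The saving comes from the deltas being small numbers that fit in fewer bytes than full prices or timestamps. Note that the encoding is lossless, so the stream can still be kept exactly as received.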

But read what I said above: correcting data in the files would be a very wrong way to do it. It's absolutely critical that the files store precisely what the data provider sends, in the order they send it, with any dirty ticks, data corrections, etc.

That way, when we run historical tests, we simulate as closely as possible the automated live trading environment, where something like that can occur while you're in Hawaii or on a golf course. The engine must be able to handle all those things "on the fly" and must be tested that way during historical runs.

Okay, but be aware that some data providers have a data quality function, so you might receive data corrections well after the trading session, possibly days or weeks after the data was originally captured.

WEEKS? Realistically, how would an automated trading system running live 24/7 while you're climbing Mount Everest handle a correction that comes WEEKS later?

It seems that it must simply discard corrections that come that late. What do you think?

See, whatever you do, you must consider how to handle it in a 24/7 black box with hands-free trading.

Understood. Here's an example; a walk-forward test that only trades the first hour of a regular trading session with an N-minute opening range breakout system.

Okay.

You gave great answers. Important topics. We need to work out the correct requirements for these situations.

How serious are these corrections? How often does it occur? In other words, how urgent is this in relation to the other stuff I need to do on TickZOOM. We need to prioritize.

Wayne

greaterreturn · Dec 23, 2008

Quote from TSGannGalt:

It'll be tough...

Learning all the Broker and Datafeed APIs...
The list will keep on:

Growing, growing and growing...

and... once some specification changes:

You get bugged by people to update them... bugging and bugging and bugging you...

Just my 2 cents

Hey, someone contacted me and says he will work on making TickZOOM interface to IB himself since he uses IB and wants to get started right away.

We agreed I'll do a short video of the code to show him around and he is off and running.

That's the power of open source. Users can improve it themselves.

I will do 30% the humble job of maintaining and testing the code, 30% of supporting (till someone else can do that) and 30% fixing, enhancing.

A variety of people encourage me in P.M.s to take a similar role to that of Linus with Linux and just maintain the central code base where everyone contributes to keep costs very low.

So maybe nonprofit is the way to go.

Sincerely,
Wayne

greaterreturn · Dec 24, 2008

I feel very SHOCKED. Astounded. No AMAZED.

This relates to the Amibroker Formula Language.

Various people have been telling me in PM's that I need to make it easy for Amibroker fans to switch to TickZOOM.

Some have recommended a translator or parser of the AFL language itself.

Someone today actually said that if C# is anything at all like the C language then it will never work.

He went on about how easy to use AFL is compared to C or other languages.

So finally, today, I studied the AFL book carefully to see how similar it was to TickZOOM.

Now, mind you, I was a paid programmer in the "dreaded" C language for almost 10 years.

Lo and behold AFL is the C language. I can find barely anything in the syntax different from C.

They even have the "good ole" printf statement. And it uses = for assignment but == for comparison. These are all C language ideas.

They were honest enough on page 336 to say outright, "this was borrowed from the C language".

I don't know. I was expecting something totally different to C and so you see my SHOCK.

Now I fully understand what has happened.

EasyLanguage was the first trading language but real programmers can tell you it was lacking some very important but basic things you need in a language like the switch statement (which is in AFL and C) and many other things. Those lacking parts made it very hard to do otherwise easy actions.

Amibroker just took C and called it AFL. The C parser is freely available, so they made perhaps minor changes. And people started using it and love the flexibility without ever realizing they were learning the "dreaded" C language.

This is so enjoyable.

Anyway, the beauty of this is that AFL programmers will have ZERO trouble learning and using TickZOOM with C# because it has all that same syntax.

C# just adds some other niceties called object orientation but that's not necessary to learn to write strategies.

In fact it will be far easier for AFL people (who have become real programmers without realizing it) than for EasyLanguage users.

But both groups of users will find the TickZOOM system more powerful over time as more formulas get added.

Sincerely,
Wayne

jprad · Dec 24, 2008

Quote from greaterreturn:

Again, is this something you have done? I don't think it's realistic. But that's outside of the TickZOOM discussion, it seems.

Yes, I have, BDB is great for this sort of stuff.

I'm not sure why you think it's "outside" the discussion though. Remember your comment about how a long bar is bad? Well, that's a prime example of why frequency analysis is a good idea, because all long bars are not created equally.

1. You do not want to go back and retroactively change the previous day's bars in the stored files.

2. It's critical to store the data exactly as received from the broker including belated data corrections to exactly simulate what will happen during real live trading.

Whoa! I completely disagree.

You can't have any confidence that a given trading strategy is robust if you give it a data set with known bad and/or missing data. Moreover, it's the responsibility of the trading system that the strategy is executed in to defend against bad data.

At the instant a tick is received there is no way to know if it's good or bad. The best you can hope for is to statistically determine if the tick is "reasonable", and that test points back to price and volume frequency analysis of the historical dataset, which needs to contain clean data in order to get reliable statistics.

More problematic is being able to determine if you're missing trades during the trading session. Again, you need some statistical parameters to compare against, which means time frequency analysis of clean historical data.
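As a rough illustration of that kind of statistical "reasonableness" test, here is a minimal Python sketch. The function names and the 99% quantile are assumptions chosen for the example, not anything from BDB or TickZOOM. It derives a threshold from the distribution of tick-to-tick price changes in a historical series and checks a new tick against it.

```python
def change_threshold(history, quantile=0.99):
    """Return the tick-to-tick absolute price change at the given
    quantile of the historical distribution."""
    changes = sorted(abs(b - a) for a, b in zip(history, history[1:]))
    index = min(len(changes) - 1, int(quantile * len(changes)))
    return changes[index]


def is_reasonable(last_price, new_price, threshold):
    """A tick is 'reasonable' if its move is no larger than the threshold."""
    return abs(new_price - last_price) <= threshold


history = [100.0, 100.25, 100.0, 100.5, 100.25]
limit = change_threshold(history)
```

The threshold is only as good as the history it was computed from, which is the point being argued here: bad ticks left in `history` inflate the limit.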

Did you know, TickZOOM keeps by default 100,000 max bars back to make sure it will be virtually impossible to run out.

I simply don't understand why you have these arbitrary values. 100K ticks of the SPY is entirely different from 100K ticks of an instrument that only averages 50 trades/day.

See, whatever you do, you must consider how to handle it in a 24/7 black box with hands-free trading.

IMHO, for the average retail trader, an unattended ATS is a disaster waiting to happen.

greaterreturn · Dec 24, 2008

Quote from jprad:

Yes, I have, BDB is great for this sort of stuff.

I'm not sure why you think it's "outside" the discussion though. Remember your comment about how a long bar is bad? Well, that's a prime example of why frequency analysis is a good idea, because all long bars are not created equally.

It's just as easy, if not easier, to do any of this type of analysis inside TickZOOM itself. What I do is write out comma separated values for statistics and load them into a spreadsheet.

For example, you make a 10 line strategy that does nothing but write out the range of every hour bar to a file, plus whether it closed higher or lower than the previous bar, then you load it into a spreadsheet to analyze. That's how I do it. Very simple.
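The actual strategy would be written in C# against the TickZOOM API, which is not shown in this thread. As a language-neutral sketch of the same idea, the Python below assumes the hourly bars are already available as `(label, open, high, low, close)` tuples (a made-up input format) and writes one CSV row per bar with its range and its direction relative to the previous close.

```python
import csv
import io


def write_bar_stats(bars, out):
    """Write the range of each bar and whether it closed higher or
    lower than the previous bar, as comma separated values."""
    writer = csv.writer(out)
    writer.writerow(["bar", "range", "direction"])
    prev_close = None
    for label, _open, high, low, close in bars:
        if prev_close is None:
            direction = ""            # first bar has nothing to compare to
        elif close > prev_close:
            direction = "higher"
        elif close < prev_close:
            direction = "lower"
        else:
            direction = "unchanged"
        writer.writerow([label, round(high - low, 4), direction])
        prev_close = close


# Usage: write to an in-memory buffer here; a real run would pass an open file.
buffer = io.StringIO()
write_bar_stats(
    [("09:00", 100, 102, 99, 101), ("10:00", 101, 101.5, 100, 100.5)],
    buffer,
)
```

The resulting file opens directly in a spreadsheet for sorting, averaging or charting.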

Also, in TickZOOM it's easy to do Average, Median, Standard Deviation, and most other statistics. PLUS, when you get an idea, you can easily GRAPH the statistic you want to see and BDB can't do that.

I personally see zero purpose in putting all the ticks over into a database like BDB. Maybe that's a matter of taste but I believe in doing things the easy way.

I could only remotely see copying bar data to a database for that kind of analysis but ticks don't give you bars directly. And you'll be hard pressed to find anything faster than TickZOOM at converting ticks to bars for your analysis.

So now that you have explained, I feel more certain it's outside this discussion because this is about TickZOOM not data mining a BDB.

You don't need TickZOOM to do that, do you? You can get ticks now and load them into BDB. That's why it's outside this discussion.

Whoa! I completely disagree.

You can't have any confidence that a given trading strategy is robust if you give it a data set with known bad and/or missing data. Moreover, it's the responsibility of the trading system that the strategy is executed in to defend against bad data.

You're missing the point. The TickZOOM engine cleans the data in real time both during back testing and real time trading.

It would be very naive to clean the data on disk because then you don't know for certain how TickZOOM will behave with real data in production.

Everyone, even if you don't have a background in software development, you must get the following concept:

The environment/platform your custom trading rules run for historical testing must exactly duplicate the live environment they trade in as much as possible. Even the slightest difference can have dangerous results since when you run it live it means you haven't tested for some situation whether that be bad ticks, or anything else.

TickZOOM focuses on absolute reliability that your custom trading rules work the same way from testing to production.

At the instant a tick is received there is no way to know if it's good or bad.

What? Well you are doomed to fail at auto trading live then, aren't you? What do you mean "no way"? Systems scrub data in real time all over the world. TickZOOM does it too--automatically.

the best you can hope for is to statistically determine if the tick is "reasonable" and that test points back to price and volume frequency analysis of the historical dataset, which needs to contain clean data in order to get reliable statistics.

Okay, you're very smart. Answer this:

If you supposedly "clean" all your data in the historical dataset and THEN you realize you found a better way of cleaning or discover it has an error or problem in the cleaning algorithm, how will you undo it/redo it?

The answer is simple. Never clean your historical data set. Keep it dirty and always run your tick filter against it any time you want to do statistical analysis, etc.

Then, as the tick filtering algorithm evolves and improves, you can re-run it and make sure it still works.

NOTE: The tick filtering algorithm in TickZOOM is very good, but it's open source so I'm certain it will improve.

Over the months, I found a bad tick that slipped through and adapted the tick filter to handle it. Thankfully, I hadn't "cleaned" my whole historical dataset, so the new filter was easily run and tested against it.

The tick filter logs every tick it filters and why. So you can check the tick filter log to make sure what it filtered was reasonable.

You can even turn off the tick filter so you get to see the bad ticks if you want to verify. Serious automated traders care about this kind of stuff.
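Here is a minimal Python sketch of that workflow: raw data stays untouched, the filter runs on the fly, every rejection is logged with a reason, and the filter can be switched off. The rejection rule (a jump of more than 5% from the last accepted price) is a placeholder assumption and is not TickZOOM's actual tick filtering algorithm.

```python
def filter_ticks(raw_prices, max_jump=0.05, enabled=True):
    """Return (clean, log) without modifying raw_prices.

    log holds one (index, price, reason) entry per rejected tick.
    The jump rule is a stand-in for a real filtering algorithm.
    """
    clean, log = [], []
    last_good = None
    for index, price in enumerate(raw_prices):
        if (
            enabled
            and last_good is not None
            and abs(price - last_good) / last_good > max_jump
        ):
            reason = f"jump of more than {max_jump:.0%} from {last_good}"
            log.append((index, price, reason))
            continue
        clean.append(price)
        last_good = price
    return clean, log


raw = [100.0, 100.1, 10.01, 100.2]               # third tick has a misplaced decimal
clean, log = filter_ticks(raw)                    # filtered view plus a log of rejections
unfiltered, _ = filter_ticks(raw, enabled=False)  # see the bad ticks
```

Because `raw` is never rewritten, a later and better rule can be run against the same data and its log compared with the old one.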

You're very smart but still operating on "theory" rather than practice. Do you have an automated trading system running live with real money right now, like I do?

Maybe so, I doubt it.

So folks, keep all your dirty ticks around to make sure the new better versions of tick filter work and really do a better job.

More problematic is being able to determine if you're missing trades during the trading session. Again, you need some statistical parameters to compare against, which means time frequency analysis of clean historical data.

Certainly, use clean historical data but clean it "on the fly" and never physically modify the data as it was received from exchanges/providers. You need that original copy for testing.

That's because in LIVE trading you will get all those dirty ticks and you need to be confident TickZOOM cleans them appropriately or you adjust the algorithm, etc.

I simply don't understand why you have these arbitrary values. 100K ticks of the SPY is entirely different from 100K ticks of an instrument that only averages 50 trades/day.

Apparently you don't know what "max bars back" is. That's not related to ticks at all. It relates to "bars" and is thus called "max bars back". 100,000 isn't arbitrary but a default.

Most people never need more than 200 or 300 max bars back. So 100,000 (when they know what it means) will blow their minds.
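For anyone new to the term, here is a minimal Python sketch of what a "max bars back" limit usually means: a bounded history of bars that formulas can look back into, with the oldest bar dropped once the limit is reached. The `BarHistory` class is hypothetical and only illustrates the concept; it is not TickZOOM's implementation.

```python
from collections import deque


class BarHistory:
    """Keeps at most max_bars_back bars; older bars are discarded."""

    def __init__(self, max_bars_back=100_000):
        self._closes = deque(maxlen=max_bars_back)

    def add(self, close):
        self._closes.append(close)

    def __getitem__(self, bars_ago):
        """0 is the current bar, 1 the previous bar, and so on."""
        return self._closes[-1 - bars_ago]

    def __len__(self):
        return len(self._closes)


history = BarHistory(max_bars_back=3)
for close in [1, 2, 3, 4, 5]:
    history.add(close)
# history[0] is 5 and history[2] is 3; the bars with closes 1 and 2 are gone.
```

The limit counts bars of whatever interval the chart uses, so it is independent of how many ticks went into each bar.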

Wayne

jprad · Dec 24, 2008

Quote from greaterreturn:

You're very smart but still operating on "theory" rather than practice. Do you have an automated trading system running live with real money right now, like I do?

Maybe so, I doubt it.

Wayne, we already know how well you're doing in that area, remember this post?

http://www.elitetrader.com/vb/showthread.php?s=&postid=2219064#post2219064

FWIW, you've got some good ideas, but you've got some work to do on your people skills if you intend to thrive in the world of open source.

greaterreturn · Dec 24, 2008

Quote from jprad:
FWIW, you've got some good ideas, but you've got some work to do on your people skills if you intend to thrive in the world of open source.

Sorry. Thanks. You are kind to remind me of that. Someone mentioned that the other day in a PM also.

Nobody ever says that who knows me in person--in real life. But sometimes my emails get that reaction.

What I will do from now on is "filter" my posts by holding them at least an hour. Then I can review and edit for people skills with a fresher perspective before posting.

Again thanks.

Now, as far as having good ideas. I must humbly confess that's not true. Usually, what may appear to be a good idea resulted from a "bad idea that was tried".

What I mean is, take for example this database discussion. Do you have any idea how long I spun my wheels trying to use databases with tick data? It was weeks of trying every performance trick in the book but it still took over an hour to select 10,000,000 ticks.

It was most frustrating.

Finally, I did the math on how fast each tick must be loaded and realized it was impossible any other way than binary directly into memory to get the speed necessary.
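To put numbers on that: one hour for 10,000,000 ticks works out to 3,600 s / 10,000,000 = 360 microseconds per tick. The Python sketch below shows the general idea of fixed-size binary records read straight into memory. The record layout (millisecond timestamp, price, size) is an assumption for illustration and is not the TickZOOM file format.

```python
import struct

# Time budget per tick when 10,000,000 ticks take one hour, in microseconds.
budget_us = 3600 * 1_000_000 / 10_000_000  # 360.0

# Hypothetical record layout: int64 time in ms, float64 price, int32 size.
# "<" means little-endian with no padding, so each record is 20 bytes.
RECORD = struct.Struct("<qdi")


def pack_ticks(ticks):
    """Serialize (time_ms, price, size) tuples into one bytes object."""
    return b"".join(RECORD.pack(*tick) for tick in ticks)


def unpack_ticks(buffer):
    """Read every fixed-size record back out of the buffer."""
    return list(RECORD.iter_unpack(buffer))


ticks = [(1230000000000, 900.25, 5), (1230000000250, 900.5, 2)]
blob = pack_ticks(ticks)
restored = unpack_ticks(blob)
```

With fixed-size records there is no parsing, no query planning and no per-row overhead: record N sits at byte offset N times the record size.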

I also remembered doing data warehousing when all our jobs that involved millions of rows usually ran all night long.

Or what about our discussion of correcting ticks in a file. Do you think that's a smart idea? No.

Not at all. First, genius that I am, I filtered all the tick data, up front, during the conversion to TickZOOM format.

Later, I got a bad tick somewhere that it missed. So, after reworking the algorithm I had to go and redo the conversion on all the tick files that I had. A real pain.

Then again, and again. Till finally, it became obviously better to just leave it raw and filter on the fly.

So you see? I don't have good ideas. It just so happens that I tried the bad ideas out already.

Again, I'm sorry to be annoying. Thanks for your patience and for not giving up on me.

Sincerely,
Wayne

maxpi · Dec 24, 2008

You are on the right track, Wayne, little doubt about that. Handling ticks almost down at the firmware level makes eminent sense. I used to write firmware at times, and I was always amazed at how long things took in higher level environments... making absolutely sure that backtests and real operations are the same is paramount. I do that in Ninjatrader and Openquant, and in Tradestation before them. I have to build my own bars in arrays and filter out bad ticks in my strategy code. It's hard to program because I then have to write my own indicators to run on the arrays, but worth it. I've proven that for myself: it sidesteps any and all ambiguities and many bugs in the environment.

Being able to debug in the visual environment of a chart is important to me. I write all my code as indicators and debug it by looking at the charts. That is important. Then I reuse the code in a strategy. Most of these trading environments make us do separate coding for indicators and strategies, I think that is stupid, an indicator with some dll calls IS a strategy, the workflow of development is much easier with that capability: write indicator, debug it on charts, add in dll calls.. there is no parallel development and version control between the indicator and the strategy that way. Openquant did away with the indicators, their strategies can write indications to the charts but it's not that great for debugging code.

greaterreturn · Dec 24, 2008

Quote from maxpi:
Being able to debug in the visual environment of a chart is important to me. I write all my code as indicators and debug it by looking at the charts. That is important. Then I reuse the code in a strategy. Most of these trading environments make us do separate coding for indicators and strategies, I think that is stupid, an indicator with some dll calls IS a strategy, the workflow of development is much easier with that capability: write indicator, debug it on charts, add in dll calls.. there is no parallel development and version control between the indicator and the strategy that way. Openquant did away with the indicators, their strategies can write indications to the charts but it's not that great for debugging code.

While documenting, I'm also renaming and refactoring a little bit.

One thing to know is that the Engine only interacts with "Formula"s.

For techies, it's loosely coupled with only interfaces between the engine and the formula. That way it will be easy to upgrade.

Let me define the terms "Formula", "Strategy" and "Indicator" to frame the rest of this discussion.

Strategy: A bit of code that can open and close positions in a trading instrument.

Indicator: A bit of code that can graph itself onto a chart.

Formula: Either a Strategy or an Indicator, or both.

Okay, in TickZOOM, as far as the engine is concerned it only knows about formulas--so they can be either indicator or strategy or both at the same time.

NOTE: One powerful feature of TickZOOM is that you can "chain" formulas together. For example, you can have a mean reversion strategy. Then another separate one that applies exits and stops to the first one (like a filter). Then another which collects performance and yet another that does money management.

That's very useful because you can disable any one of them to make it easy to debug when something goes wrong.
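A minimal Python sketch of the chaining idea follows. The class and method names (`Formula`, `on_bar`, `run_chain`) are made up for illustration and are not the TickZOOM interfaces. Each formula receives the position proposed by the one before it and may change it, and any link can be disabled while debugging.

```python
class Formula:
    enabled = True

    def on_bar(self, close, position):
        return position


class MeanReversion(Formula):
    """Go long when price is well below a fixed mean, flat at the mean."""

    def __init__(self, mean, band):
        self.mean, self.band = mean, band

    def on_bar(self, close, position):
        if close < self.mean - self.band:
            return 1
        if close >= self.mean:
            return 0
        return position


class StopLoss(Formula):
    """Filters the upstream position: flattens it after too large a loss."""

    def __init__(self, max_loss):
        self.max_loss = max_loss
        self.entry = None

    def on_bar(self, close, position):
        if position == 0:
            self.entry = None
            return 0
        if self.entry is None:
            self.entry = close
        return 0 if self.entry - close > self.max_loss else position


def run_chain(chain, closes):
    """Feed each bar through every enabled formula in order."""
    position, history = 0, []
    for close in closes:
        for formula in chain:
            if formula.enabled:
                position = formula.on_bar(close, position)
        history.append(position)
    return history


stop = StopLoss(max_loss=4)
chain = [MeanReversion(mean=100, band=5), stop]
positions = run_chain(chain, [100, 94, 93, 88, 101])  # [0, 1, 1, 0, 0]
```

Setting `stop.enabled = False` and rerunning shows what the entry logic does on its own, which is the debugging benefit described above.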

In contrast, it's very VERY confusing to try to put ALL that logic into one formula. But be my guest if you're brave. We probably have all tried doing that with EasyLanguage and near pulled our hair out.

So back to your question, you can create a "formula" which means that you can both control positions and graph on the chart while in the same code.

However, a formula just has the raw value of the number of positions. It doesn't have any performance stats, money management logic or entry / exit logic. You would have to manually add all that stuff.

Instead, you do better using a Strategy because it includes all of those things already, plus you can still graph to your heart's content!

TickZOOM makes every effort to erase the lines between indicator and strategy but provide convenient options for debugging.

Now, here's how it typically works when you're using TickZOOM on a day-to-day basis (like me).

You create a strategy from an idea and throw some graphing logic right in there, plus your entry and exit rules and the configuration of the built-in stops, etc.

After a while of playing around, you like the graphing you did but want to try some other entry exit rules without messing up the graphing.

What you do then is create an "Indicator" and copy the graphing logic out of the strategy into the indicator simply as a way of organizing.

Now you can have 2 strategies that both use that one Indicator.

It also simplifies the code in the strategy.

Of course, you can do more graphing directly inside those strategies.

In other words, TickZOOM doesn't impose Indicator or Strategy or Formula boundaries. Use what's comfortable but you'll have more fun and spend less time debugging if you eventually separate different functions like the example code you will see.

Is that a good answer? On the right track?

Sincerely,
Wayne