Journey from investment bank to independent automated trader

I guess you stated earlier the actual "weakness" of SQL with regard to trading applications: MySQL and SQL Server are not optimized to handle linear data. Of course they do the job a database is supposed to do, and I find them more efficient than any Jet/Access database. But we are talking about concurrently accessing amounts of data that SQL simply was not designed for, in terms of processing speed. I would even challenge you to beat Amibroker's proprietary database structure on the speed of calls to the database. Flexibility is not needed when simply reading and processing flat/linear data.

I am not trained in database design; I am simply stating my experience of accessing vast amounts of data, and my experience with SQL in this regard has not been good. I would argue the issue is not a design question when dealing with simple time series. Please correct me if you disagree, and maybe you could provide some more details on how you have set up time series on MS SQL Server. I think it's relevant to this thread, and others may benefit from your knowledge, unless the OP disagrees.




Quote from CloroxCowboy:

Porsches have nice shiny engines, but they will be disappointing to drive if you only use the first 3 gears. :D

Again, it's all about how you use it. I get great results, but I devote a good bit of time to designing the database for efficiency, and keeping in mind which types of requests will be inefficient and which won't. I can't speak to your previous experience, asiaprop, and I don't mean to imply that you weren't getting to "4th gear" but I am curious about how you were composing those requests.

Of course SQL is no more of a magic bullet than any other application. I just feel that it gets a bad rap (unfairly) on this board...and amongst traders in general.

Anyway, I don't want to hijack the OP's journal with this, so maybe I'll start a "SQL: love it/hate it" thread in another category.
 
Quote from asiaprop:

I guess you stated earlier the actual "weakness" of SQL with regard to trading applications: MySQL and SQL Server are not optimized to handle linear data. Of course they do the job a database is supposed to do, and I find them more efficient than any Jet/Access database. But we are talking about concurrently accessing amounts of data that SQL simply was not designed for, in terms of processing speed.

I completely (but politely :)) disagree. It depends in large part on table structures, indexes, etc., not to mention the hardware setup and the way SQL Server (or another RDBMS) is configured on that hardware. I'd love to hear more of your opinions on the thread I linked if you're interested.
 
Hello Lolatency,

Nice Journal....

I was a TS customer for 4 years (not anymore), so a few warnings about TS data:

- If you trade futures, implied prices and sizes on the DOM and prints in T&S are missing. Depending on the market, that can be half the prints...

- Bid/ask data (important for HF systems) is not real time. It's just a snapshot at every tick.

BYE.
 
Quote from CloroxCowboy:

I completely (but politely :)) disagree. It depends in large part on table structures, indexes, etc., not to mention the hardware setup and the way SQL Server (or another RDBMS) is configured on that hardware. I'd love to hear more of your opinions on the thread I linked if you're interested.

I don't mind the thread being hijacked if you want to post your table structure for data. I don't consider this a tremendous edge to hide if the objective is backtesting, but I understand if you want to withhold details.

Let's theorize the construction of a few queryable objects here:

1) Time and sales data with:
i) consolidated tape/exchange timestamp [to millisecond precision or better] and
ii) received timestamp, to assist in making statistical guesses as to when the trade actually happened, and what the depth of book was showing when the trade happened

2) Level-2 data/ Market Depth, at a surface level, with: Picture at a given microsecond, or some coherent organization that shows what the market is doing at any given second. I should be able to load this into a structure from flat files so I can run queries against it and the time and sales.

3) The Level 1 quotes + sizes for each individual ECN


What's your proposal here?
 
Quote from lolatency:

I don't mind the thread being hijacked if you want to post your table structure for data. I don't consider this a tremendous edge to hide if the objective is backtesting, but I understand if you want to withhold details.

Let's theorize the construction of a few queryable objects here:

1) Time and sales data with:
i) consolidated tape/exchange timestamp [to millisecond precision or better] and
ii) received timestamp, to assist in making statistical guesses as to when the trade actually happened, and what the depth of book was showing when the trade happened

2) Level-2 data/ Market Depth, at a surface level, with: Picture at a given microsecond, or some coherent organization that shows what the market is doing at any given second. I should be able to load this into a structure from flat files so I can run queries against it and the time and sales.

3) The Level 1 quotes + sizes for each individual ECN


What's your proposal here?

I think you've got a pretty good structure right there.

I'd probably have a T&S table with symbol, consolidated timestamp, received timestamp, price and volume...primary key (clustered index) on symbol + consolidated timestamp or symbol + received timestamp.

Then the L2 table with symbol, timestamp, market participant ID, price, volume, and bid/ask indicator...primary key on symbol + timestamp.

Finally the L1 table with ecn, symbol, timestamp, price and volume...primary key on symbol + timestamp or ecn + symbol + timestamp depending on the queries.

For best execution leave ecn off the L1 table's primary key and you can join all three tables with the same clustered index. Use merge joins since the data will already be coming in and stored pre-sorted. And try to push some of the calculations onto the server as well, since you're joining the tables anyway...don't just use it for storage and retrieval.

That would be my first attempt at a structure.
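
As a rough illustration of the three-table layout described above, here is a minimal sketch using Python's built-in sqlite3 module. SQLite stands in for SQL Server purely so the example is self-contained; a `WITHOUT ROWID` table stores rows in primary-key order, which is only an approximation of a clustered index. The table names, column names, microsecond integer timestamps, and the `seq` tiebreaker column (added because several events can share one symbol + timestamp) are all assumptions for illustration and are not from the post.

```python
import sqlite3

# Hypothetical schema: names, types and the `seq` tiebreaker are assumptions.
SCHEMA = """
CREATE TABLE time_and_sales (
    symbol          TEXT    NOT NULL,
    consolidated_ts INTEGER NOT NULL,  -- exchange time, microseconds since epoch
    seq             INTEGER NOT NULL,  -- assumed tiebreaker for equal timestamps
    received_ts     INTEGER NOT NULL,  -- local receive time, same units
    price           REAL    NOT NULL,
    volume          INTEGER NOT NULL,
    PRIMARY KEY (symbol, consolidated_ts, seq)
) WITHOUT ROWID;

CREATE TABLE level2 (
    symbol TEXT    NOT NULL,
    ts     INTEGER NOT NULL,
    seq    INTEGER NOT NULL,
    mpid   TEXT    NOT NULL,           -- market participant ID
    price  REAL    NOT NULL,
    volume INTEGER NOT NULL,
    side   TEXT    NOT NULL CHECK (side IN ('B', 'A')),  -- bid/ask indicator
    PRIMARY KEY (symbol, ts, seq)
) WITHOUT ROWID;

CREATE TABLE level1 (
    symbol TEXT    NOT NULL,
    ts     INTEGER NOT NULL,
    seq    INTEGER NOT NULL,
    ecn    TEXT    NOT NULL,           -- kept out of the key, as suggested
    price  REAL    NOT NULL,
    volume INTEGER NOT NULL,
    PRIMARY KEY (symbol, ts, seq)
) WITHOUT ROWID;
"""


def build_db():
    """Create an in-memory database holding the three tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def vwap_by_symbol(conn):
    """Compute volume-weighted average trade price inside the database."""
    rows = conn.execute(
        "SELECT symbol, SUM(price * volume) / SUM(volume) "
        "FROM time_and_sales GROUP BY symbol ORDER BY symbol"
    ).fetchall()
    return dict(rows)


def trades_with_l1(conn, symbol):
    """Join trades to L1 quotes that share the same symbol and timestamp."""
    return conn.execute(
        "SELECT t.consolidated_ts, t.price, q.ecn, q.price "
        "FROM time_and_sales AS t "
        "JOIN level1 AS q ON q.symbol = t.symbol AND q.ts = t.consolidated_ts "
        "WHERE t.symbol = ? "
        "ORDER BY t.consolidated_ts, t.seq, q.seq",
        (symbol,),
    ).fetchall()
```

Because all three tables lead with symbol + timestamp, rows for one symbol sit together in time order, which is the property the merge-join suggestion relies on. How the join is actually executed is up to the query planner of whichever database is used, and `vwap_by_symbol` is just one example of pushing a calculation onto the server rather than pulling raw ticks to the client.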
 
Quote from asiaprop:

sorry but I agree, you suggest nonsense. While having more data is better than less, a lot of empirical evidence shows that testing a high frequency trading strategy over 1 or 2 years of tick data will not give statistically very different results from testing it over 5 years' worth of tick data, regardless of the time frame chosen. If you also test on scrambled tick data (possibly, depending on the core of your strategy, on some sort of BM discretizations generated by iid random vars), you can be pretty much certain you are onto something useful if your system makes it through such tests. I see no valid reason for your 20+ year backtest postulation.

I could show you a high frequency strategy that performs almost identically regardless of whether I backtest it on 1 year of data during the height of the "internet bubble" or 2008 data. Surprised?

Somewhat. What kind of account size would you need to trade 1 (say) S&P 500 contract? What would the worst drawdown be? What's the back tested expected return and standard deviation? What brokerage & slippage per trade have you assumed?
 
sorry but I don't see the relevance of your questions to what I wrote...?

How are trade size, drawdown, stdev, etc. related to the fact that some high frequency trading strategies perform very similarly when backtested over different time frames? By the way, I don't attempt to exploit "arbitrage opportunities", especially not any that may only last for a few weeks. In that sense the OP may come from a completely different angle. I myself find no value in searching for trading opportunities that may vanish within a few weeks when designing an automated trading strategy, nor do I think most hedge funds would. The development costs may be exorbitantly high relative to the expected value of such a project. I work on strategies that harness repetitive market signatures.

Quote from Ash1972:

Somewhat. What kind of account size would you need to trade 1 (say) S&P 500 contract? What would the worst drawdown be? What's the back tested expected return and standard deviation? What brokerage & slippage per trade have you assumed?
 
Quote from asiaprop:

While having more data is better than less, a lot of empirical evidence shows that testing a high frequency trading strategy over 1 or 2 years of tick data will not give statistically very different results from testing it over 5 years' worth of tick data, regardless of the time frame chosen.
................
I could show you a high frequency strategy that performs almost identically regardless of whether I backtest it on 1 year of data during the height of the "internet bubble" or 2008 data. Surprised?

I agree with you in general. I think what matters is the ratio of expected holding period and the backtesting length. Since high-frequency strategies can generate more trades for a given period, the sample size is larger and the backtesting length does not need to be as long.
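
To put rough numbers on the ratio argument above, here is a back-of-the-envelope sketch. It assumes trades are independent, do not overlap, and share the same per-trade standard deviation, which is a strong simplification that real strategies may well violate. The function names and the example figures are hypothetical.

```python
import math


def approx_trade_count(backtest_minutes, holding_minutes):
    """Rough count of back-to-back, non-overlapping trades in a backtest."""
    return backtest_minutes / holding_minutes


def standard_error(per_trade_std, n_trades):
    """Standard error of the mean per-trade return, assuming independent trades."""
    return per_trade_std / math.sqrt(n_trades)


if __name__ == "__main__":
    # Hypothetical trading calendar: 252 days of 390 minutes each.
    minutes_per_year = 252 * 390

    # 1-minute holding period tested over 1 year.
    fast = approx_trade_count(1 * minutes_per_year, 1)
    # 5-day holding period tested over 20 years.
    slow = approx_trade_count(20 * minutes_per_year, 5 * 390)

    print(f"fast strategy: {fast:.0f} trades, SE factor {standard_error(1.0, fast):.4f}")
    print(f"slow strategy: {slow:.0f} trades, SE factor {standard_error(1.0, slow):.4f}")
```

Under these assumptions the short-holding-period strategy gets far more trades out of one year than the slow one gets out of twenty, so its estimate of average per-trade return is tighter despite the shorter backtest. This says nothing about whether market conditions during that one year are representative.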

Just out of curiosity, how long of a holding period would you consider "high-frequency?"
 
I consider high frequency holding periods to be in the sub-minute ballpark, even though I do not currently trade such models myself.

Quote from ezbentley:

I agree with you in general. I think what matters is the ratio of expected holding period and the backtesting length. Since high-frequency strategies can generate more trades for a given period, the sample size is larger and the backtesting length does not need to be as long.

Just out of curiosity, how long of a holding period would you consider "high-frequency?"
 
Quote from asiaprop:

I consider high frequency holding periods to be in the sub-minute ballpark, even though I do not currently trade such models myself.

My understanding is that high-frequency strategies are either market-making or stat arb. Do you agree or are you aware of any other type of HF strategy? I am apparently a newbie trying to satisfy my curiosity.

Since high-frequency trading requires very low commission and very good execution, do you think an independent retail trader can build an infrastructure to trade high frequency profitably?
 