Interesting Article.

jspauld · Nov 10, 2012

Quote from themickey:

If you google survivorship bias, you will get the gist of the meaning, but in trading it can mean curve fitting through a flawed backtesting method: backtesting only on stocks that are still trading today, rather than the proper approach of backtesting on those currently trading AND those that no longer exist. It can also mean backtesting the winners and not the losers.

jspauld, I think you are a noob, no disrespect. I think your system is curve fitting, similar to neural networks, which sounds good in theory but not in practice.

It is possible to get your system to work if you run it only during certain events, i.e. don't trade it at any time of day; run it only when, for example, all markets are rising or falling, or maybe just after the market opens, etc.

Anyhow, thanks for sharing, wishing you luck.
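As a rough illustration of the survivorship bias themickey describes, the sketch below simulates a hypothetical universe of stocks as zero-drift random walks and "delists" any stock the first day it closes below a threshold. All parameters (1,000 stocks, one year of daily steps, a 0.3 delisting level, 4% daily volatility) are assumptions for illustration, not data from this thread. A buy-and-hold backtest restricted to the survivors looks better than the same backtest on the full universe, because the losers have been silently dropped.

```python
import numpy as np

def survivorship_gap(n_stocks=1000, n_days=252, delist_level=0.3, seed=0):
    """Compare mean buy-and-hold return on all stocks vs. survivors only.

    Hypothetical model: each price starts at 1.0 and follows a geometric
    random walk; a stock is delisted the first day it closes below
    `delist_level` and its return is frozen at that close.
    """
    rng = np.random.default_rng(seed)
    log_steps = rng.normal(0.0, 0.04, size=(n_stocks, n_days))
    prices = np.exp(np.cumsum(log_steps, axis=1))

    below = prices < delist_level
    delisted = below.any(axis=1)
    first_hit = below.argmax(axis=1)          # index of the first close below the level

    final = prices[:, -1].copy()
    # Delisted stocks are frozen at the price where they left the universe.
    final[delisted] = prices[delisted, first_hit[delisted]]
    returns = final - 1.0

    return returns.mean(), returns[~delisted].mean(), delisted.mean()

all_mean, survivor_mean, frac_delisted = survivorship_gap()
print(f"mean return, full universe:  {all_mean:+.1%}")
print(f"mean return, survivors only: {survivor_mean:+.1%}")
print(f"fraction delisted:           {frac_delisted:.1%}")
```

Because every survivor never closed below the threshold while every delisted stock ended below it, the survivors-only average is biased upward whenever any stock is delisted.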

Did you even read my article?

I understand what you've said about survivorship bias, but it has no bearing on what I was doing. I was only backtesting using 4 weeks' worth of data.

It is pretty rude to say "I think you are a noob, no disrespect." As you can see from my P&L chart, my algorithm worked very well in practice.

ssrrkk · Nov 11, 2012

Quote from jspauld:

Okay, you are talking about overfitting. I was very aware of this as I set up my system. It really wasn't much of an issue (i.e. my system performed almost as well on new data as it did on training data). It's pretty easy to check for this by setting aside a portion of your training data as validation data. Also, keep in mind that all of my indicators were for very short-term market movements, so each day gave me thousands of data points for each indicator.

The money from selling my first business got used up traveling and starting the second. With regard to giving up easily, I do not feel I did this. As mentioned, I spent four months trying everything I could to improve profitability. I even paid $20,000 to get a new data stream to see if it helped; it didn't. On the other hand, my passion is startups and not really finance, so maybe I did give up too easily.

I can't really respond to the 6 months thing other than to say that seems like a decent chunk of time to me. Especially considering I never knew for sure it was going to work.

Your PL curve proves that you did overfit -- if you think that a year of profits is "proof" that your system did what you intended it to do, then you probably haven't studied random walks enough. By the way, you said earlier that you used hundreds of variables to fit the short-term price action, and then that you used thousands of data points, so you are pretty sure that's enough. However, did you know that the number of data points is irrelevant here? What you need to know is how many degrees of freedom (DOFs) of signal those thousands of data points contain. And it is surely not thousands of DOFs. More like 2, 3, or 5. The only way to know is to use an information-based criterion, such as AIC, or a partial F-test to compare models of different complexity.
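To make the AIC / partial-F suggestion concrete, here is a minimal sketch on synthetic data (not the author's data or indicators). It assumes a linear model fitted by ordinary least squares with Gaussian errors, one genuinely predictive input, and 50 hypothetical inputs that are pure noise. With 2,000 observations, the 50 extra inputs still lower the in-sample residual sum of squares, but AIC (computed up to a constant as n·ln(RSS/n) + 2k) gets worse, and the partial F statistic for the extra inputs stays near 1.

```python
import numpy as np

def fit_rss(X, y):
    """Ordinary least squares fit; returns residual sum of squares and parameter count."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return float(resid @ resid), X.shape[1]

def aic(rss, n, k):
    """Gaussian-likelihood AIC up to an additive constant: n*ln(RSS/n) + 2k."""
    return n * np.log(rss / n) + 2 * k

def partial_f(rss_small, k_small, rss_big, k_big, n):
    """F statistic for H0: the extra (k_big - k_small) coefficients are all zero."""
    return ((rss_small - rss_big) / (k_big - k_small)) / (rss_big / (n - k_big))

rng = np.random.default_rng(0)
n = 2000                                     # "thousands of data points"
signal = rng.standard_normal(n)              # one genuinely predictive input
noise_feats = rng.standard_normal((n, 50))   # 50 hypothetical inputs with no signal
y = 0.3 * signal + rng.standard_normal(n)

ones = np.ones((n, 1))
X_small = np.column_stack([ones, signal])
X_big = np.column_stack([X_small, noise_feats])

rss_s, k_s = fit_rss(X_small, y)
rss_b, k_b = fit_rss(X_big, y)

print("AIC small:", aic(rss_s, n, k_s))
print("AIC big:  ", aic(rss_b, n, k_b))
print("partial F:", partial_f(rss_s, k_s, rss_b, k_b, n))
```

An F value far above 1, judged against the F distribution with (k_big - k_small, n - k_big) degrees of freedom, would suggest the extra inputs carry signal; a value near 1 suggests they are mostly fitting noise.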

Your comment about not knowing whether it was going to work is precisely why exploratory research in this business takes much longer than most people anticipate. Most people who do research will try hundreds or thousands of things and perhaps find a few that have a real edge. On the other hand, if you overfit without knowing it, then you will find something very quickly, guaranteed. And you seem to think that machine learning will somehow magically solve things. I have used ML professionally in other contexts (not trading), such as SVMs, SOMs, ANNs, Bayesian nets, and logistic regression, and they are not magic bullets. They are just one class of statistical modeling algorithms, each with its own strengths and weaknesses.
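The point about trying hundreds or thousands of ideas can be demonstrated with a small simulation. This sketch is illustrative only: it assumes i.i.d. Gaussian daily returns (so no strategy has a real edge) and 2,000 hypothetical trading rules, each going long or short by the sign of a random weighting of the last 10 returns. Selecting the rule with the best in-sample Sharpe ratio over 250 days produces an impressive-looking backtest that does not carry over to the following 1,000 days.

```python
import numpy as np

LAGS, N_RULES, N_DAYS, IN_SAMPLE = 10, 2000, 1250, 250
rng = np.random.default_rng(42)

# Daily returns with no exploitable structure: i.i.d. noise, zero mean.
returns = rng.normal(0.0, 0.01, size=N_DAYS + LAGS)

# Feature row t holds the LAGS returns preceding day t (most recent first).
features = np.column_stack(
    [returns[LAGS - 1 - j : len(returns) - 1 - j] for j in range(LAGS)]
)
target = returns[LAGS:]                       # return earned on day t

# Each hypothetical rule: long/short by the sign of a random weighting of past returns.
weights = rng.standard_normal((LAGS, N_RULES))
positions = np.sign(features @ weights)       # shape (N_DAYS, N_RULES)
pnl = positions * target[:, None]

def sharpe(daily_pnl):
    """Annualised Sharpe ratio per column (zero risk-free rate, 252 trading days)."""
    return np.sqrt(252) * daily_pnl.mean(axis=0) / daily_pnl.std(axis=0)

sharpe_in = sharpe(pnl[:IN_SAMPLE])           # in-sample: first 250 days
sharpe_out = sharpe(pnl[IN_SAMPLE:])          # out-of-sample: remaining 1,000 days
best = int(np.argmax(sharpe_in))

print(f"best in-sample Sharpe:        {sharpe_in[best]:.2f}")
print(f"same rule, out-of-sample:     {sharpe_out[best]:.2f}")
print(f"average out-of-sample Sharpe: {sharpe_out.mean():.2f}")
```

The longer out-of-sample window is a design choice: it makes the out-of-sample Sharpe estimate less noisy, so the gap between "best backtest" and "real performance" is easier to see.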

To reduce the fishiness of your article, I would like to see more statistics on the trades, such as overall Sharpe ratio, max drawdown, total number of shares traded, cents per share earned, average daily P&L, max daily profit, max daily loss, number of winning days, number of losing days, average profit, average loss, a plot of cumulative P&L vs. time, etc.
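Most of the day-level statistics in this list can be computed from a daily P&L series alone. The sketch below assumes a plain list of net daily P&L in dollars (the sample numbers are made up, not taken from the article), annualises the Sharpe ratio with sqrt(252) and a zero risk-free rate (a simplification, since it uses dollar P&L rather than percentage returns), and measures max drawdown as the largest peak-to-trough fall of cumulative P&L. Per-trade figures such as cents per share or the average winning trade would need trade-level records, which this sketch does not model.

```python
import math
from statistics import mean, stdev

def daily_pnl_stats(daily_pnl):
    """Summary statistics from a list of net daily P&L values (dollars)."""
    wins = [p for p in daily_pnl if p > 0]
    losses = [p for p in daily_pnl if p < 0]

    # Max drawdown: largest drop of cumulative P&L below its running peak.
    cumulative, peak, max_dd = 0.0, 0.0, 0.0
    for p in daily_pnl:
        cumulative += p
        peak = max(peak, cumulative)
        max_dd = max(max_dd, peak - cumulative)

    return {
        "sharpe_annualised": math.sqrt(252) * mean(daily_pnl) / stdev(daily_pnl),
        "max_drawdown": max_dd,
        "avg_daily_pnl": mean(daily_pnl),
        "max_daily_profit": max(daily_pnl),
        "max_daily_loss": min(daily_pnl),
        "winning_days": len(wins),
        "losing_days": len(losses),
        "avg_winning_day": mean(wins) if wins else 0.0,
        "avg_losing_day": mean(losses) if losses else 0.0,
    }

stats = daily_pnl_stats([1200, -300, 800, -1500, 2000, 400, -200])
for name, value in stats.items():
    print(f"{name:>18}: {value:,.2f}")
```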

Anyway, as CT10Gov says, I suppose if you really did everything you say in the article, then (a) you put a lot of work into it, and (b) you got extremely lucky, and there is nothing wrong with either of those things.

Craig66 · Nov 11, 2012

Quote from ssrrkk:
Your PL curve proves that you did overfit

I don't really understand how you can draw that conclusion from the given information. Everything was making money in 08/09; to me it just looks like the system decayed with volatility. Lots of 'scalping' systems did the same thing. No big deal...

jspauld · Nov 11, 2012

Quote from ssrrkk:

Your PL curve proves that you did overfit -- if you think that a year of profits is "proof" that your system did what you intended it to do, then you probably haven't studied random walks enough. By the way, you said earlier that you used hundreds of variables to fit the short-term price action, and then that you used thousands of data points, so you are pretty sure that's enough. However, did you know that the number of data points is irrelevant here? What you need to know is how many degrees of freedom (DOFs) of signal those thousands of data points contain. And it is surely not thousands of DOFs. More like 2, 3, or 5. The only way to know is to use an information-based criterion, such as AIC, or a partial F-test to compare models of different complexity.

It's a year of profits with 1000-4000 trades per day. All my trades had a position size of 1-2 contracts, and I had a 50/50 split of long/short positions. It's clearly not a random walk.

The problem here, I believe, is that you don't know the types of indicators I was looking at. For example, I was trading Russell 2000 futures. If NASDAQ futures went up, then my system predicted the Russell futures would also go up (within milliseconds). The NASDAQ moves around all day, so this provided thousands (yes, thousands) of independent data points.
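The NASDAQ-to-Russell example is a lead-lag signal. A minimal way to check for one is to correlate the leader's latest price change with the follower's next price change. The sketch below uses synthetic, step-aligned data in which, by construction, a hypothetical follower copies half of the leader's previous move plus its own noise; real tick data would need careful timestamp alignment and latency handling, which this does not attempt.

```python
import numpy as np

def lead_lag_corr(leader, follower, lag=1):
    """Correlation between the leader's change at step t and the follower's change at t + lag (lag >= 1)."""
    d_lead = np.diff(leader)
    d_follow = np.diff(follower)
    return float(np.corrcoef(d_lead[:-lag], d_follow[lag:])[0, 1])

rng = np.random.default_rng(7)
n = 5000
lead_moves = rng.normal(size=n)
# Hypothetical follower: half of the leader's previous move plus its own noise.
follow_moves = 0.5 * np.concatenate([[0.0], lead_moves[:-1]]) + rng.normal(size=n)

leader = np.cumsum(lead_moves)
follower = np.cumsum(follow_moves)

print("lag 1:", round(lead_lag_corr(leader, follower, lag=1), 3))
print("lag 2:", round(lead_lag_corr(leader, follower, lag=2), 3))
```

In this toy setup the lag-1 correlation is clearly positive and the lag-2 correlation is close to zero, which is the signature of a one-step lead.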

Quote from ssrrkk:

Your comment about not knowing whether it was going to work is precisely why exploratory research in this business takes much longer than most people anticipate. Most people who do research will try hundreds or thousands of things and perhaps find a few that have a real edge. On the other hand, if you overfit without knowing it, then you will find something very quickly, guaranteed. And you seem to think that machine learning will somehow magically solve things. I have used ML professionally in other contexts (not trading), such as SVMs, SOMs, ANNs, Bayesian nets, and logistic regression, and they are not magic bullets. They are just one class of statistical modeling algorithms, each with its own strengths and weaknesses.

Okay, as mentioned, I don't think you are really familiar with what I was doing. I was doing HFT / market making. My indicators were not things I had to research; they were all things I just kind of knew. For example, if there is more size on the bid than on the offer, the likelihood is slightly higher that the price will move up.
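The bid-versus-offer size idea is often expressed as an order-book imbalance ratio. Here is a minimal sketch, assuming only top-of-book bid and ask sizes are available; the 0.3 threshold and the quoting rule are hypothetical illustrations, not the author's parameters or logic.

```python
def book_imbalance(bid_size: float, ask_size: float) -> float:
    """Top-of-book imbalance in [-1, 1]: positive when more size is bid than offered."""
    total = bid_size + ask_size
    if total <= 0:
        return 0.0
    return (bid_size - ask_size) / total

def quote_side(bid_size: float, ask_size: float, threshold: float = 0.3) -> str:
    """Hypothetical rule: join only the bid when the book is bid-heavy, only the offer when offer-heavy."""
    imb = book_imbalance(bid_size, ask_size)
    if imb > threshold:
        return "bid"
    if imb < -threshold:
        return "offer"
    return "both"

print(book_imbalance(120, 40))
print(quote_side(120, 40))
print(quote_side(30, 90))
print(quote_side(50, 60))
```

The ratio is normalised by total size so that the same threshold can be used whether the book is thin or deep.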

As far as machine learning goes, yes, there is a way to 'magically' check whether your system is overfitting the data: you simply keep a validation set. My algo worked on validation data and it worked in live trading. Overfitting was only a minor problem.
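A validation check of the kind described here can be sketched as a chronological hold-out: fit on the earlier part of the data and score on the later part. The data below are synthetic and the 80/20 split is an assumption. A model built from 100 noise-only inputs scores better in-sample than on the held-out data, whereas a real relationship holds up on both. Splitting by time rather than shuffling avoids training on observations that come after the validation period.

```python
import numpy as np

def r_squared(y, y_hat):
    """Coefficient of determination; can be negative out of sample."""
    return 1.0 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)

def holdout_r2(X, y, train_frac=0.8):
    """Fit OLS on the first train_frac of rows (chronologically), score on the rest."""
    cut = int(len(y) * train_frac)
    X1 = np.column_stack([np.ones(len(y)), X])           # add intercept column
    coef, *_ = np.linalg.lstsq(X1[:cut], y[:cut], rcond=None)
    train = r_squared(y[:cut], X1[:cut] @ coef)
    valid = r_squared(y[cut:], X1[cut:] @ coef)
    return float(train), float(valid)

rng = np.random.default_rng(3)
n = 2000
noise_inputs = rng.standard_normal((n, 100))   # 100 hypothetical inputs with no signal
real_input = rng.standard_normal((n, 1))       # one hypothetical input with real signal
y_noise = rng.standard_normal(n)
y_real = 0.5 * real_input[:, 0] + rng.standard_normal(n)

print("noise model (train, validation):", holdout_r2(noise_inputs, y_noise))
print("real model  (train, validation):", holdout_r2(real_input, y_real))
```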

Quote from ssrrkk:

To reduce the fishiness of your article, I would like to see more statistics on the trades, such as overall Sharpe ratio, max drawdown, total number of shares traded, cents per share earned, average daily P&L, max daily profit, max daily loss, number of winning days, number of losing days, average profit, average loss, a plot of cumulative P&L vs. time, etc.

Most of these things could be inferred from my article. I'm not sure you've read it.

ssrrkk · Nov 11, 2012

What were your commission costs per day? What is your commission per trade or per share? What was the commission minimum on a trade?

If you claim to be doing HFT, what was your network latency? What type of hardware and network connection were you using? Were you co-located near an exchange? Were you only using limit orders, or did you ever pay the spread? How frequently did you cancel and resubmit orders? What was the average spread you were earning per trade?

Regarding random walks, does the Russell 2000 have a nonzero autocorrelation at nonzero tau?
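The question refers to the autocorrelation function: the correlation between a series and itself shifted by tau steps. For the increments of a pure random walk it is zero at every nonzero lag, so a persistent nonzero value would point to predictable structure. A minimal sketch, with synthetic data standing in for index returns (nothing here is measured on the Russell 2000):

```python
import numpy as np

def autocorr(x, tau):
    """Sample autocorrelation of series x at lag tau (tau >= 1)."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    return float(np.dot(x[:-tau], x[tau:]) / np.dot(x, x))

rng = np.random.default_rng(11)
steps = rng.normal(size=10_000)
prices = np.cumsum(steps)                 # synthetic random walk
returns = np.diff(prices)                 # its increments: autocorrelation ~0 at every tau

# Hypothetical contrast: negatively autocorrelated increments (a mean-reverting price).
ar = np.empty_like(steps)
ar[0] = steps[0]
for t in range(1, len(steps)):
    ar[t] = -0.3 * ar[t - 1] + steps[t]

for tau in (1, 2, 5):
    print(tau, round(autocorr(returns, tau), 3), round(autocorr(ar, tau), 3))
```

For the contrast series the theoretical value at lag tau is (-0.3)^tau, so the lag-1 estimate should sit near -0.3 while the random-walk increments stay near zero.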

I read through your article and could not "infer" the Sharpe ratio, the average winning trade, or the average losing trade.

You claim your biggest losing days were around $2000. Well, that pretty much means that if you took out commission costs, you didn't have a single losing day for the whole year.

jspauld · Nov 11, 2012

Good questions, and you are right: I failed to include some important details. I had a server co-located in Chicago. Trading the Russell, I believe the latency was around 10-20 ms (I actually mention modeling this in the article). The DAX latency was around 60-140 ms, based on my memory (it had to cross an ocean).

Commissions per contract (per side) were, I think, around 70 cents (US) for the DAX and 40 cents for the Russell. I traded more Russell, so maybe the average was 50 cents.

If we look at my 5 best months, I was averaging around 1800 contracts per day, so maybe my commissions were $900/day. As my program became less profitable, I traded less volume; in fact, my volume chart basically looks like a smoothed version of my P&L chart. This also made me realize that "1000-4000 trades" was incorrect: that was 1000-4000 contracts, so maybe 400-1600 trades per day.
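The commission estimate can be reproduced from the figures quoted above. If a fraction w of contracts are Russell at $0.40 per side and the rest DAX at $0.70, a blended $0.50 implies w = 2/3, and 1,800 contracts per day at $0.50 gives $900/day (treating each counted contract as one side, as the $900 figure implies). The function names below are illustrative only.

```python
def blended_commission(russell_share, russell_rate=0.40, dax_rate=0.70):
    """Average commission per contract side, given the share of contracts traded in Russell."""
    return russell_share * russell_rate + (1 - russell_share) * dax_rate

def russell_share_for(target_rate, russell_rate=0.40, dax_rate=0.70):
    """Solve blended_commission(w) == target_rate for the Russell share w."""
    return (dax_rate - target_rate) / (dax_rate - russell_rate)

w = russell_share_for(0.50)
daily_cost = 1800 * blended_commission(w)     # contracts per day x $ per contract side

print(f"Russell share implied by a $0.50 average: {w:.2f}")
print(f"Estimated commissions per day: ${daily_cost:,.0f}")
```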

I would say 90-95% of my executions were from limit orders; occasionally my system had enough edge to pay the spread. Orders were cancelled ALL THE TIME. Most orders were cancelled; if I had to guess, I would say 80-90%.

"Regarding random walks, does the Russell 2000 have a nonzero autocorrelation at nonzero tau?" -- Sorry, I don't know what this means.

I'm actually not sure about the average winning trade and average losing trade. It's a good question, though.

Spectre2007 · Nov 11, 2012

The trick is to find the everlasting edge. The bid/ask spread is everlasting, but what characteristics of price action create an everlasting edge in the microstructure if you had to give up the bid/ask spread?

ssrrkk · Nov 11, 2012

Okay, so with the TF, were you earning a 1-tick spread on average per round-trip trade? How long were you holding a typical trade? How did you close your positions -- with opposite limit order quotes? If so, how did you ensure they would close quickly enough? How did you ensure a 50/50 long/short split if you were using only limit orders?
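The economics behind the 1-tick question reduce to simple arithmetic: net edge per round trip equals ticks captured times tick value, minus commission on both the entry and exit side. The $10 tick value used here assumes the Russell 2000 mini contract's 0.10-point tick on a $100 multiplier; treat it as an assumption to check against the exchange's contract specifications. The $0.40 per side is the Russell figure quoted earlier in the thread, and the quarter-tick case is purely hypothetical.

```python
def net_per_round_trip(ticks_captured, tick_value=10.0, commission_per_side=0.40, contracts=1):
    """Net P&L of one round trip: ticks x tick value, minus commission on entry and exit."""
    gross = ticks_captured * tick_value * contracts
    fees = 2 * commission_per_side * contracts
    return gross - fees

print(net_per_round_trip(1.0))     # full 1-tick capture per round trip
print(net_per_round_trip(0.25))    # hypothetical: a quarter tick on average after adverse fills
```

Even a small average capture stays positive after commissions at these rates, which is why the question of how much of the spread was actually earned matters so much.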

CT10Gov · Nov 11, 2012

Hummm... the more details he adds, the more I'm swinging from the "why can't he be the lucky mouse" side to "I think he made 500k in simulations".

jspauld · Nov 11, 2012

Quote from Spectre2007:

The trick is to find the everlasting edge. The bid/ask spread is everlasting, but what characteristics of price action create an everlasting edge in the microstructure if you had to give up the bid/ask spread?

I wasn't giving up the spread. The key was bidding/offering at just the right times so I could make money. Very much like a market maker.
