In one sentence, what is your edge for 2019?

Diskreet · Feb 17, 2019

I've taken on a trading partner who complements my abilities by influencing me to make more measured trades and manage risk more carefully.

Handle123 · Feb 17, 2019

schweiz said:
Exactly, not every move has the same potential, you should know how to recognize which ones have more potential and which ones don't.

My personal experience is that trend trades are not luck. You have to find a way to see if, with high probability, the trend will reverse. And that is possible. You only need an advance of 1 minute, that's enough.
Many times what people call luck is just a lack of the specific knowledge needed (this has nothing to do with being smart because of a high education). People tend to think that things that happen without their understanding why are all down to luck. There are a lot of things that I don't understand that look like luck... until someone explains to me why they happened...
Successful intraday trading is a very complex matter. But once you master it...

I will agree that some trades are better than others. In my case, however, I am a scalper who has added a percentage more lots in the hope that these runners hook onto a trend quickly, since my risk management gives the lots only so much time to complete 3-tick targets in ES on 85% of the lots. Even though the system does identify some kind of trend on each trade, the degree of that trend is such that many would not consider it viable for a day trade, whereas reversion to the mean offering a possible 3-5 ticks is my main focus. However, if price quickly continues without coming back to the breakeven stops, the 15% that are left become longer-term day trades, and I consider getting into them luck. The smaller the time frame and the shorter the duration, the more complex it gets; I have been scalping since '92, so I have the hang of it.

ph1l · Feb 17, 2019

userque said:
Interesting ... what platform are you using?

This is software I wrote that runs on Windows 10 with C++, OpenCL, and Cygwin.

Perl and shell scripts gather daily price and reference data for various assets (e.g., stock indexes and ETFs), using curl and (headless) Chrome to retrieve the data.

Perl and shell scripts preprocess the data to do scaling and calculate indicators.

The genetic programming C++ executable (Windows console application) with OpenCL processes the preprocessed data to create rules. OpenCL lets some calculations run on a GPU to get the results much faster. An example of a rule is:

The top line of the rule has a name of something the rule is trying to predict. In this example, rule 1 of model 06 is for signaling a short trade on the S&P 500 at the next bar's close with an exit at the close 10 bars in the future.
This rule looked at 7245 trading days of preprocessed data, would have been hit 2874 times (39.6687 percent of the time) and would have had a positive outcome 2042 times (71.0508 percent of the hits) with a mean gain of 1.69423 percent.
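As a quick check of how those percentages relate to the counts, here is a small Python sketch. The variable names are only illustrative; the numbers are the ones quoted above.

```python
# Counts quoted for rule 1 of model 06.
days = 7245   # trading days of preprocessed data examined
hits = 2874   # days on which the rule would have fired
wins = 2042   # hits that would have had a positive outcome

hit_rate = 100 * hits / days   # percent of days with a hit
win_rate = 100 * wins / hits   # percent of hits with a positive outcome

print(f"hit rate: {hit_rate:.4f}%")
print(f"win rate: {win_rate:.4f}%")
```

Note that the win rate is measured against the hits, not against all 7245 days.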

The body of the rule can be thought of as a high-level assembly language. Each instruction has an operation (e.g., +), one or two operands (e.g., 0.186615 or indTypeA015), and may put the result in a register (e.g., R0) which will be used in later instructions. Indentation shows instructions that would run when the preceding if statement evaluates true. Operands are floating point constants, indicators, or registers. Indicators have types, and an instruction with an operation on two different types of indicators results in NAN (not a number) or false for an if statement. Missing indicator values have a value of NAN, and operations involving NAN result in NAN. The rule would be fired when it returns a value greater than zero.
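To make the typing and NAN behavior concrete, here is a minimal Python sketch of how such a rule body might be interpreted. It is an illustration only, not the actual C++/OpenCL code. The tuple-based instruction format, the function names, the type tags "A" and "B", and the indicator indTypeB002 are all assumptions made for the example, as is treating constants as untyped.

```python
import math
import operator

NAN = float("nan")
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}
CMPS = {"<": operator.lt, ">": operator.gt}

def lookup(operand, bar, regs):
    """Return (value, type_tag) for a constant, register, or indicator."""
    if isinstance(operand, float):
        return operand, None              # constants are untyped (assumption)
    if operand in regs:
        return regs[operand]
    return bar.get(operand, (NAN, None))  # a missing indicator reads as NAN

def types_clash(left_type, right_type):
    return left_type is not None and right_type is not None and left_type != right_type

def typed_apply(op, left, right):
    (lv, lt), (rv, rt) = left, right
    if types_clash(lt, rt) or math.isnan(lv) or math.isnan(rv):
        return NAN, None                  # mixed types or NAN input give NAN
    return OPS[op](lv, rv), (lt if lt is not None else rt)

def typed_compare(cmp, left, right):
    (lv, lt), (rv, rt) = left, right
    if types_clash(lt, rt):
        return False                      # mixed types make an if false
    return CMPS[cmp](lv, rv)              # any comparison with NAN is False

def run(program, bar, regs):
    for ins in program:
        if ins[0] == "if":                # the nested body plays the role of indentation
            _, cmp, a, b, body = ins
            if typed_compare(cmp, lookup(a, bar, regs), lookup(b, bar, regs)):
                result = run(body, bar, regs)
                if result is not None:
                    return result
        elif ins[0] == "set":
            _, dest, op, a, b = ins
            regs[dest] = typed_apply(op, lookup(a, bar, regs), lookup(b, bar, regs))
        elif ins[0] == "return":
            return lookup(ins[1], bar, regs)[0]
    return None

def fires(program, bar):
    """The rule fires when it returns a value greater than zero."""
    result = run(program, bar, {})
    return result is not None and result > 0

# Hypothetical indicator values for one bar: name -> (value, type tag).
bar = {"indTypeA015": (0.5, "A"), "indTypeB002": (1.2, "B")}
same_type = [("set", "R0", "+", "indTypeA015", 0.186615),
             ("if", ">", "R0", 0.5, [("return", "R0")]),
             ("return", -1.0)]
mixed_type = [("set", "R0", "+", "indTypeA015", "indTypeB002"),
              ("return", "R0")]
print(fires(same_type, bar), fires(mixed_type, bar))
```

The first rule stays within one indicator type and returns a positive value, so it fires. The second mixes two indicator types, so its register holds NAN and the rule does not fire.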

The genetic part consists of initialization followed by multiple sequences (generations) of selection, crossover, mutation, evaluation, and survival.

Initialization creates random rules and calculates a fitness measure for each rule. Fitness is based on a risk-adjusted return for a simulated trade.

Selection picks pairs of father and mother rules for crossover and mutation.

For crossover, the father rule gets copied to a son rule, and the mother rule gets copied to a daughter rule. Then a random part of the father's rule gets overlaid at a random location in the daughter's rule, and a random part of the mother's rule gets overlaid at a random location in the son's rule.
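A rough Python sketch of this overlay-style crossover follows. It assumes a rule is a flat list of instructions and that an overlay replaces instructions one for one, so rule lengths do not change. Neither assumption is stated above, and the function names are made up for the example.

```python
import random

def overlay(target, donor, rng):
    """Copy a random slice of donor over a random location in a copy of target."""
    child = list(target)
    length = rng.randint(1, min(len(child), len(donor)))
    src = rng.randrange(len(donor) - length + 1)   # where the slice is taken from
    dst = rng.randrange(len(child) - length + 1)   # where the slice lands
    child[dst:dst + length] = donor[src:src + length]
    return child

def crossover(father, mother, rng):
    son = overlay(father, mother, rng)       # copy of father with part of mother
    daughter = overlay(mother, father, rng)  # copy of mother with part of father
    return son, daughter

rng = random.Random(0)
father = [f"f{i}" for i in range(6)]   # placeholder instructions
mother = [f"m{i}" for i in range(8)]
son, daughter = crossover(father, mother, rng)
print(son)
print(daughter)
```

The parents are left untouched, so they can still compete with their children later.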

For mutation, the fittest of the father and mother rules gets copied to a mutant rule. Then a random number of instructions in the mutant rule are changed to new random instructions.

Evaluation calculates fitness for each son, daughter, and mutant.

Survival picks which of the fathers, mothers, sons, daughters, and mutants are kept for the next generation.
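Put together, one possible shape for the whole loop is sketched below in Python. It is only an outline under several assumptions that are not part of the description above: rules are fixed-length flat lists, selection is a best-of-three tournament, survival keeps the fittest rules from the pooled population and children, and the fitness function is a toy stand-in rather than a risk-adjusted return.

```python
import random

def overlay(target, donor, rng):
    """Copy a random slice of donor over a random location in a copy of target."""
    child = list(target)
    n = rng.randint(1, min(len(child), len(donor)))
    src = rng.randrange(len(donor) - n + 1)
    dst = rng.randrange(len(child) - n + 1)
    child[dst:dst + n] = donor[src:src + n]
    return child

def evolve(fitness, new_instruction, rule_len=8, pop_size=20, generations=30, seed=0):
    rng = random.Random(seed)
    # Initialization: random rules, each with a fitness measure.
    scored = []
    for _ in range(pop_size):
        rule = [new_instruction(rng) for _ in range(rule_len)]
        scored.append((fitness(rule), rule))
    best_per_generation = [max(s[0] for s in scored)]
    for _ in range(generations):
        children = []
        for _ in range(pop_size // 2):
            # Selection: best of three random candidates (assumption).
            f_fit, father = max(rng.sample(scored, 3), key=lambda s: s[0])
            m_fit, mother = max(rng.sample(scored, 3), key=lambda s: s[0])
            # Crossover: son and daughter.
            children.append(overlay(father, mother, rng))
            children.append(overlay(mother, father, rng))
            # Mutation: copy the fitter parent, replace a random number of instructions.
            mutant = list(father if f_fit >= m_fit else mother)
            for i in rng.sample(range(rule_len), rng.randint(1, rule_len)):
                mutant[i] = new_instruction(rng)
            children.append(mutant)
        # Evaluation: fitness for each son, daughter, and mutant.
        scored += [(fitness(child), child) for child in children]
        # Survival: keep the fittest pop_size rules (assumption).
        scored = sorted(scored, key=lambda s: s[0], reverse=True)[:pop_size]
        best_per_generation.append(scored[0][0])
    return scored[0][1], best_per_generation

# Toy problem: "instructions" are floats and fitness rewards values near 0.5.
def toy_fitness(rule):
    return -sum((x - 0.5) ** 2 for x in rule)

best, history = evolve(toy_fitness, lambda rng: rng.random())
print(history[0], history[-1])
```

Because survival here always keeps the best rule found so far, the best fitness can never get worse from one generation to the next. A different survival scheme would not necessarily have that property.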

Perl and shell scripts use the executable to interpret the best rules from multiple models for long and short directions to form a consensus to go long, short, or not trade.

The k-nearest neighbor C++ executable (Windows console application) processes preprocessed data for the model and evaluation. Model data represents price charts for different assets at different times with future results. Evaluation data represents price charts for different assets at a single time (usually the most recent time).

For each instance of evaluation data (i.e., a single chart), the software compares the evaluation data with each instance of the model data to find which model instances have similar charts. The comparison is by a weighted, Euclidean-type distance. More recent times get higher weights in the calculated distance. The results of the closest "k" model instances are combined to form a risk-adjusted result as a prediction.
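A small Python sketch of this kind of comparison is shown below. It is a guess at the general shape rather than the actual executable. It assumes each chart is a list of scaled values ordered oldest to newest, that the weights decay geometrically with age, and that "risk-adjusted" means the mean result minus a penalty times the standard deviation. All names and parameters are hypothetical.

```python
import math
import statistics

def recency_weights(n, decay=0.9):
    """Weight 1.0 for the newest bar, decaying geometrically for older bars (assumption)."""
    return [decay ** (n - 1 - i) for i in range(n)]

def weighted_distance(a, b, weights):
    """Euclidean-type distance with a per-bar weight."""
    return math.sqrt(sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights)))

def knn_score(chart, model, k, weights, risk_aversion=1.0):
    """Combine the future results of the k closest model instances."""
    nearest = sorted(model, key=lambda inst: weighted_distance(chart, inst[0], weights))[:k]
    results = [result for _, result in nearest]
    # Mean result penalized by its spread: one simple notion of risk adjustment.
    return statistics.mean(results) - risk_aversion * statistics.pstdev(results)

# Each model instance: (chart values oldest to newest, future result in percent).
model = [([1.0, 1.0, 1.0], 2.0),
         ([1.0, 1.0, 1.1], 4.0),
         ([5.0, 5.0, 5.0], -10.0)]
weights = recency_weights(3)
score = knn_score([1.0, 1.0, 1.0], model, k=2, weights=weights)
print(score)
```

With these weights, a difference in the newest bar moves two charts further apart than the same difference in an older bar.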

Perl and shell scripts use the executable's output to rank the evaluation assets into something like:

The count column is the "k" which for this example is one percent of the model instances, and the score column has the risk-adjusted prediction (higher values are better). The prediction is for going long at the close of the next bar and exiting at the close 21 bars in the future.

I don't know if either of these methods will work in the future of course, but they were certainly interesting to develop.

userque · Feb 17, 2019

ph1l said:
This is software I wrote that runs on Windows 10 with C++, OpenCL, and Cygwin ...

I am truly impressed!

Definitely appreciate the detailed response! Do you plan on back/forward testing it?

Looks powerful enough to over-fit. Is that a concern? If so, what's the plan for combating it?

Nobert · Feb 18, 2019

tonyf said:
To get the ball rolling, here is my edge: Swing trading illiquid stocks with a very high cash reserve.

Could you give a single ticker? I'm curious how illiquid those are, because for me liquidity is a key factor.

maxinger · Feb 18, 2019

Nobert said:
Could you give a single ticker? I'm curious how illiquid those are, because for me liquidity is a key factor.

Some of the info is useless, some sensible, some nonsense.
So do use good judgement.

Nobert · Feb 18, 2019

maxinger said:
Some of the info is useless, some sensible, some nonsense.
So do use good judgement.

So maybe that was an ironic joke, and I got it wrong and took it for real.

ph1l · Feb 18, 2019

userque said:
I am truly impressed!

Definitely appreciate the detailed response! Do you plan on back/forward testing it?

Looks powerful enough to over-fit. Is that a concern? If so, what's the plan for combating it?

I started forward testing the k-nearest neighbor strategy with some real money this past Friday (bought EPHE iShares MSCI Philippines ETF which had the highest score after February 14). The genetic programming strategy didn't have a signal.

The genetic programming strategy can easily overfit data if given the wrong kinds of inputs. I've been running it daily for 30 models long and 30 models short for predicting the direction of the S&P 500 for 10 trading days after the next day's close. For each model, a script finds the proportion of each of the 169 indicators in the fittest 200 rules. Another script calculates Pearson's correlation coefficient for each pair of models using the proportions of the 169 indicators from each model in the pair.
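For anyone wanting to try the same check, here is a minimal Python sketch of the idea (the real scripts are Perl and shell). It assumes each rule can be reduced to the list of indicator names it uses and that a proportion is an indicator's share of all indicator uses in a model's fittest rules. The function names, model names, and the three-indicator example are made up.

```python
import math
from collections import Counter
from itertools import combinations

def indicator_proportions(rules, indicators):
    """Share of all indicator uses that each indicator accounts for."""
    counts = Counter(ind for rule in rules for ind in rule)
    total = sum(counts[ind] for ind in indicators)
    return [counts[ind] / total for ind in indicators]

def pearson(xs, ys):
    """Pearson's correlation coefficient (raises if either input is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def pairwise_correlations(models, indicators):
    """Correlation of indicator proportions for every pair of models."""
    props = {name: indicator_proportions(rules, indicators)
             for name, rules in models.items()}
    return {(a, b): pearson(props[a], props[b])
            for a, b in combinations(sorted(props), 2)}

# Tiny made-up example: 3 indicators instead of 169, a few rules instead of 200.
indicators = ["a", "b", "c"]
models = {
    "long01": [["a", "a", "b"], ["a"]],
    "long02": [["a", "b"], ["a", "a"]],
    "short01": [["c", "c", "c", "b"]],
}
correlations = pairwise_correlations(models, indicators)
for pair, r in correlations.items():
    print(pair, round(r, 4))
```

Two models that lean on the same indicators in the same proportions get a correlation near 1, while models that favor different indicators get a low or negative value.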

These correlations are high. For example, in my most recent run after the close Friday, February 15, correlations for long-predicting models vs other long-predicting models ranged from 0.95 to 0.99. The correlations for short-predicting models vs other short-predicting models ranged from 0.87 to 0.99. The correlations for long-predicting models vs short-predicting models ranged from 0.81 to 0.96.

I'm no expert on statistics, but I think the high correlations mean the models are finding similar solutions. And when I look at the best rule from each model, I see similar relationships among the indicators. When I tried the genetic programming method with different kinds of indicators, the corresponding model correlations were much lower -- maybe about in the range of -0.10 to 0.60 (from what I remember). The models being similar with my current indicators gives me some confidence they are valid for a while (hopefully for the 10 trading days they predict for).

In addition, the genetic programming strategy won't be relating indicators of different types in the same instruction. For example, it won't compare an oscillator-type indicator with a rate of return-type indicator because that would be meaningless. This run-time typing might help the strategy avoid spurious results.

For the k-nearest neighbor strategy (also run daily), I was thinking it would be unlikely to overfit because it looks at a wide variety of assets. It uses, according to etfdb.com, the 550 passively-managed, unleveraged, non-inverse ETFs with the highest average 3-month volume, not including the asset classes bond, currency, preferred stock, or multi-asset (multi-asset ETFs seem to have a lot of bonds and/or cash).

But it will score the assets differently when using more or less model data (e.g., 5 years vs 20 years). And it isn't always clear to me that using more model data is better because the data could be dominated by assets that existed longer.

IAS_LLC · Feb 19, 2019

ph1l said:
I started forward testing the k-nearest neighbor strategy with some real money this past Friday ...

Good luck, but settle down and don't overthink things that don't matter, brother. That's all I have to offer.

Simples · Feb 19, 2019

ph1l said:
I started forward testing the k-nearest neighbor strategy with some real money this past Friday ...

Getting some skin in the game, though not too much, is good for experience. But for forward testing, have you tried simulation? There is a tendency to think that a longer and harder pursuit leads to results, and that "now I should be ready". Not so in trading. If you lean toward analysis paralysis, it's good not to incur psychological damage from meeting the reality of this business.