Genetic Programming, C project

vicirek · May 29, 2014

volpunter said:
Domain knowledge? There is a guy who cannot make a dime trading his own strategies and now has to sell 120 USD books; are you kidding me? And I have published tons and actively contribute as a quant. But I will not reveal where, because I am happy that particular service is mostly free of cockroaches and freebie suckers like you. Please stick to this website and free Linux ware to satisfy your hunger for freebies. I, in the meantime, earn my money by trading my own ideas and strategies using my own system architecture that I, myself, developed. I have zero inclination to share any of that with others, because it works, and why should I give away what I worked hard on and what works? In case you still do not understand the motivation of those giving away strategy ideas: it's because such ideas do not work (anymore), and hence the author has nothing to lose by publicising them. Still hard to comprehend?

Isn't that what you are doing? Statistics using parameters derived from a loosely defined TA space? They just found a way to monetize it, and judging from the list of parameters in the book, their parameter base is quite extensive. It has to work sometimes, doesn't it?

volpunter · May 29, 2014

You can find as many glorious words as you wish; the fact remains this:

Someone who trades profitably does not have any willingness to give away the recipe for success. Everyone else sells garbage to the masses. Period.

Sergio77 · Jun 22, 2014

MPxtrader said:
Save yourself a lot of trouble... buy the book; the software is free.

http://www.tssbsoftware.com

I think he is charging a lot for support and you will need it with software that was written for DOS.

eusdaiki · Jun 22, 2014

Sergio77 said:
I think he is charging a lot for support and you will need it with software that was written for DOS.

Written for DOS?? :O

Maybe this will come in handy...

http://www.ccs64.com/

fan27 · Jul 7, 2014

Looks like a library has already been written (open source):

http://gaul.sourceforge.net/

fan27

cjbuckley4 · Dec 10, 2014

I realize this is an old thread, but it's an interesting topic that deserves to spend some time at the front! As someone who is currently taking ML classes at the undergraduate and graduate level, I think I can offer you a couple of little pieces of advice that will save you a lot of headaches. The way you've proposed this project seems a little off base to me. When I was first taught about evolutionary learning, it seemed to me like 'God's truth.' The idea that you can solve an optimization problem by emulating nature is something beautiful, and even more beautiful is the fact that your solutions suck if you fail to include enough diversity in each population... truly amazing. From an end-user standpoint, however, there's no difference whether an optimization problem was solved using a brute-force sweep, a GA, or some gradient-based algorithm when applicable (UNLESS YOU MISS THE MAXIMUM!!!). Now, what you're proposing to do (I believe) is to use evolutionary learning to generate entire algorithms, not just find the maxima/minima of a defined function. This is something I know a little about (not a lot), but I think this lecture from MIT might be helpful.

It gives a lot of info about a cool relevant problem. "Evolving Virtual Creatures" by Karl Sims:

In my brief experience with ML, here's some advice:

Regarding Coding:
1. Starting out in C in a Linux environment is a BAD idea, as others have pointed out. Here's why:
As you've already heard on this thread:
a. It'll take you an eon. Most of what you're doing will be prototyping; don't do that in this environment.
As I'll point out:
b. SINGLE BIGGEST REASON: Good genetic algorithm results don't come from just letting a computer run wild; that will never work... computers are good at computing, humans are good at being creative and reasoning. You will constantly need to add new features for your genetic algorithm to permute, which means constantly coding up new functions to add feature vectors to your dataset... and doing that in C on Linux will slow you down enormously.
c. You may need something like OpenCL or CUDA to achieve good performance, so C is good there, but C++ or a language with a wrapper for those is going to make your life much easier.
2. Use other people's code where you can. This is going to be a massive project. You could use libraries to achieve a lot of this, maybe C has some, but look at other options too.
3. If I were you, I would stick to MATLAB, R, Python, or C# for now. I can vouch for MATLAB and R having tons of resources available for this kind of work; I believe Python does too, but I've never used it for ML.

...enough of the usual ET nonsense where we get into off-topic bickering and argue about tools; let's talk a bit about my uninitiated intuitions on implementing GA for this sort of problem in general.

Regarding Implementation:
1. GA is an optimization technique. It's not going to just print a little sheet of good trading strategies or something; it requires that you understand what's going on, how your GA implementation works, and what your input vectors are, and that you give your GA good data and objectives.
2. For the love of God, please, please do not make your fitness function net profit. That's a recipe for disaster. You NEED to include some measure of risk in your fitness function: use the Sharpe ratio or, better, some metric that includes CVaR, but do not use profit as your objective.
3. By the nature of GA, you're going to overfit. There's no avoiding that. You need to understand how the resulting algorithms work and hopefully be able to describe a realistic economic reason why they work. If you're making money, you're doing a service to the market to make it more efficient in some perverse way or another, so unless you can say conclusively what that service is, I'd be wary of deploying any such algorithm. Here's a great thread by a really smart guy on that topic:
http://www.elitetrader.com/et/index.php?threads/why-strategies-make-money.287837/
4. You're gonna need a lot of really clean data.
5. If I were you, I would invert the whole process you're proposing (but I doubt you will, since that's not your objective). I would first come up with a strategy targeting an 'economic opportunity' (see the thread linked above) and then use a genetic algorithm to iteratively get closer to a tradable strategy exploiting this opportunity.
6. If you're really going to use something like GA, then embrace the curve fitting (sometimes): you're trying to find permutations of an algo that make money, so let the computer try permutations. Make all your input functions take variable arguments, and have one argument be a flag to permute or not permute a given variable or feature of the function; then make another input a vector of scalar parameters for that piece of code, decided by your optimization routine... better yet, you could probably define which parts to permute recursively too. It should be organized in some way, but I'm not smart enough to tell you what that way is. Make everything variable: the timeframe you trade on, the discrete time intervals features are calculated on, the assets you trade. I don't really know, but there's probably some fancy computer science theory that ensures literally everything possible is made variable in an orderly fashion. If anyone knows what that is, please point us in the right direction. You will also want a way for a human being to control externally which inputs are permuted and which are not, so that you as the user are actively involved in the optimization process and can use your human judgment, instead of just letting the computer fit anything it wants all at the same time.
7. You'll probably want to record everything. Maybe come up with some way to automatically store results in a database at each iteration with each parameter set; that way you have a record of what works, what doesn't, where your GA tends to get stuck, etc.
8. This is a big one. You're gonna come up with terrible ideas probably 95% of the time using this approach, and the more you allow the computer to replace human creativity, the more likely you are to come up with overfit nonsense. Again, you need to come up with good features to include in your dataset. Look at Karl Sims's video: it's a highly iterative process by a renowned CS hero to create even those simple GA results. The successful implementation of GA is far more a function of good human input than anything else. GAs themselves come in an open-source can that you can grab off the internet and tweak. Good input is your responsibility.
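To make item 2 concrete, here is a minimal Python sketch (Python being one of the prototyping languages recommended above) of a fitness function that penalizes tail risk instead of rewarding raw profit. It assumes per-period strategy returns, a zero risk-free rate, 252 periods per year, and a simple historical CVaR; the function names and the `risk_aversion` weight are hypothetical choices for illustration, not a standard recipe.

```python
import math
import statistics


def sharpe_ratio(returns, periods_per_year=252):
    """Annualized Sharpe ratio of per-period returns, assuming a zero risk-free rate."""
    sd = statistics.stdev(returns)  # sample standard deviation
    if sd == 0:
        return 0.0
    return statistics.mean(returns) / sd * math.sqrt(periods_per_year)


def historical_cvar(returns, alpha=0.95):
    """Historical CVaR (expected shortfall): the average loss over the worst
    (1 - alpha) fraction of periods, reported as a positive number for losses."""
    losses = sorted(-r for r in returns)  # largest loss ends up last
    n_tail = max(1, math.ceil(len(losses) * (1 - alpha)))
    return statistics.mean(losses[-n_tail:])


def risk_adjusted_fitness(returns, risk_aversion=1.0, alpha=0.95):
    """Hypothetical GA fitness: mean return minus a penalty on tail losses."""
    return statistics.mean(returns) - risk_aversion * historical_cvar(returns, alpha)


# Two made-up daily return series with the same net profit (sum = 0.08).
steady = [0.01] * 9 + [-0.01]
spiky = [0.05] * 5 + [-0.01] * 4 + [-0.13]

for name, r in (("steady", steady), ("spiky", spiky)):
    print(name, round(sum(r), 4), round(sharpe_ratio(r), 2),
          round(historical_cvar(r), 4), round(risk_adjusted_fitness(r), 4))
```

Both made-up series earn the same net profit, but the one with a single -13% day scores worse on Sharpe, on CVaR (with only 10 observations the 5% tail is just the worst period), and on the penalized fitness. A net-profit objective would rank them as equals.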

I've never implemented a GA for anything really complex, and I've never built one that designs a new algorithm, only ones that look for optimal parameters over a defined parameter space. The goal of that type of implementation is to save iterations of an optimization algorithm, not to come up with a trading algorithm like you suggest. There are probably people here far more knowledgeable on the topic. I suggest you look on Quora, Google Groups, LinkedIn, etc. to find people really doing this type of thing. Best of luck.
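For the kind of GA described here (searching for optimal parameters over a defined parameter space), combined with items 6 and 7 above, a minimal Python sketch might look like the following. It assumes a small, hypothetical discrete parameter space and a toy scoring function standing in for a real backtest; the `frozen` argument plays the role of the human-controlled "permute or not" flags, and every evaluation is logged to an in-memory SQLite table. Tournament selection, uniform crossover, and resampling mutation are one common setup, not the only one.

```python
import itertools
import json
import random
import sqlite3

# Hypothetical discrete parameter space; names and ranges are illustrative only.
SPACE = {
    "fast_ma": list(range(5, 55, 5)),         # 10 values
    "slow_ma": list(range(50, 210, 10)),      # 16 values
    "stop_pct": [0.5, 1.0, 1.5, 2.0, 3.0],    # 5 values
    "timeframe_min": [1, 5, 15, 60],          # 4 values
}


def toy_backtest(p):
    """Stand-in for a real backtest: a smooth score peaking at (20, 120, 1.5, 15)."""
    return -((p["fast_ma"] - 20) ** 2 / 100 + (p["slow_ma"] - 120) ** 2 / 1000
             + (p["stop_pct"] - 1.5) ** 2 + (p["timeframe_min"] - 15) ** 2 / 100)


def run_ga(fitness_fn, space, frozen=None, pop_size=20, generations=15,
           p_mut=0.2, seed=0):
    """Minimal GA. Genes in `frozen` are pinned by the user and never permuted;
    every evaluation is recorded in an in-memory SQLite table."""
    rng = random.Random(seed)
    frozen = frozen or {}
    free = [k for k in space if k not in frozen]
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE evals (generation INTEGER, params TEXT, fitness REAL)")

    def evaluate(pop, gen):
        scored = []
        for ind in pop:
            f = fitness_fn(ind)
            db.execute("INSERT INTO evals VALUES (?, ?, ?)",
                       (gen, json.dumps(ind, sort_keys=True), f))
            scored.append((f, ind))
        return scored

    def tournament(scored, k=3):
        # Pick k random individuals and return the fittest one.
        return max(rng.sample(scored, k), key=lambda s: s[0])[1]

    pop = [{**{k: rng.choice(space[k]) for k in free}, **frozen}
           for _ in range(pop_size)]
    scored = evaluate(pop, 0)
    for gen in range(1, generations + 1):
        children = [dict(max(scored, key=lambda s: s[0])[1])]  # elitism: keep the best
        while len(children) < pop_size:
            a, b = tournament(scored), tournament(scored)
            child = {k: (a if rng.random() < 0.5 else b)[k] for k in free}  # uniform crossover
            for k in free:
                if rng.random() < p_mut:
                    child[k] = rng.choice(space[k])  # mutation resamples a free gene
            child.update(frozen)
            children.append(child)
        scored = evaluate(children, gen)
    db.commit()
    best_f, best = max(scored, key=lambda s: s[0])
    return best, best_f, db


def brute_force(fitness_fn, space):
    """Exhaustive sweep over the full grid, for comparison."""
    keys = list(space)
    grid = (dict(zip(keys, vals)) for vals in itertools.product(*space.values()))
    best = max(grid, key=fitness_fn)
    return best, fitness_fn(best)


best, best_f, db = run_ga(toy_backtest, SPACE, frozen={"timeframe_min": 15})
n_evals = db.execute("SELECT COUNT(*) FROM evals").fetchone()[0]
print("GA:", best, round(best_f, 4), "after", n_evals, "evaluations")
print("Brute force:", brute_force(toy_backtest, SPACE))
```

With `timeframe_min` pinned to 15, the run makes 320 evaluations (20 individuals times 16 generations), versus 800 points for an exhaustive sweep of the remaining grid (3,200 for the full grid). Elitism means the best score never gets worse from one generation to the next, but nothing guarantees the GA lands on the grid optimum; on a space this small brute force is simpler, and a GA only pays off when the grid is too large to sweep. The `evals` table is the "record everything" log from item 7 and can be queried afterwards.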

maxpi · Dec 10, 2014

IMO, evolution is essentially a real joke. It's been magically empowered with the ability to create.

volpunter · Dec 11, 2014

Very good points made; kudos for the time you invested in writing this. Unfortunately, 99% of the people on this site who have ever stumbled across GA (or even use GA software), let alone the rest of ET, have nothing but the most basic and rudimentary idea about GA, and even less about how it can be applied in the financial space.

Here is my experience in this space and in trading in general:

+ I have invested a considerable amount of time into applications of machine learning, and particularly GAs, to strategy development, and I side with the majority of those who have worked in this space for a very long time: machine learning, GA, and the whole universe of neural networks offer very few opportunities to improve the strategy development life cycle. Parameter optimization is possibly the only one.

+ I claim that roughly 90 out of 100 people on this site who claim they develop strategies (or claim they have tried) have never come up with, and will never come up with, a strategy that is consistently profitable on a risk-adjusted basis. The complexity of that is just way beyond most people's skill level.

+ I go further and claim that one has almost zero chance of succeeding in developing profitable strategy algorithms unless one has traded and risk-managed REAL MONEY, sat in front of screens for tens of thousands of hours, understood the real psychology of markets by having sweated, experienced fear and the greed to take money off the table (often prematurely), and learned to control and direct those very human urges, which are flawed for trading. One may say that running automated strategies eliminates exactly that human psychological component, but without fully understanding it one will be unable to understand how the masses who trade feel and act, and why, when, and to what magnitude we see capitulation moves. A lot of people claim they develop strategies that work for one single set of market dynamics and break down as soon as the cycle moves on to a new state.

+ Someone who cannot make money by trading in a discretionary (manual) fashion will not be able to develop a strategy that makes money.

cjbuckley4 · Dec 11, 2014

Thanks for the props on my reply. I share your sentiment that most people on ET will probably never bring a strategy to fruition. You know enough about my current work to know that I'm in the 90% of ETers who have never deployed a strategy in the live market. I do, however, have tremendous respect for what it takes to get started in this business. I'd like to think I'm taking the right steps to get there, but I guess only time and the markets will tell... I'm trying to be in that 10%. You're also right that GA and ML are just buzzwords that a lot of people gravitate to.

The point I'm most in agreement with is your sentiment that machine learning is kind of nonsense for professional trading. It depends on your definition of ML... if you say everyone who calculates an MLE is doing ML in some form, then maybe more people are; if you say only people who successfully apply neural nets to trading are using machine learning, then obviously the number of successful ML practitioners is going to be very low. I think that in the retail trading community there are probably some misunderstandings about what exactly machine learning is. Machine learning, in my experience (I'm really not an expert either), is no more than a collection of optimization techniques to fit data. If you view machine learning for what it is, then it becomes more of an optimization toolkit, as you suggest, than some magic bullet to create strategies. As I said in my post, the way I learned GA, it's used as a way to decrease the number of iterations necessary to find an optimal solution. For example, maybe you're trying to build a portfolio of 15 of the 500 symbols that mimics the S&P 500. There are 500 choose 15 = 18877913877607917786274849200 combinations!!! (if my basic combinatorics memory serves me!). That seems like a pretty big problem to solve iteratively (if my calculation is right and you assume every backtesting iteration takes 1 second, it would take about 5.986 * 10^20 years), but you can use GA to get closer to a solution in fewer iterations. That seems more like a classic application of GA to trading to me. As I said, I think his problem statement is a bit misguided, but still, I encourage him to try his best to see the project to completion. We learn by doing, so even if it doesn't work, tackling a big problem like this will probably make him a better coder and teach him something. Best of luck to the OP! I hope you prove us all wrong!!
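The combinatorics above can be checked, and the "classic" subset-selection use of a GA sketched, in a few lines of Python. Part 1 recomputes 500 choose 15 with `math.comb` and converts it to years at one backtest per second (assuming 365-day years), which reproduces the order of magnitude quoted above. Part 2 is a hypothetical toy: synthetic returns from a one-factor model stand in for real data, an equal-weight average of all symbols stands in for the (cap-weighted) S&P 500, and the GA picks K symbols to minimize tracking error. Sizes, seeds, and operators are illustrative assumptions.

```python
import math
import random
import statistics

# Part 1: size of the search space for picking 15 of 500 symbols.
n_combos = math.comb(500, 15)
years = n_combos / (365 * 24 * 3600)  # one backtest per second, 365-day years
print(f"{n_combos:.4e} combinations -> {years:.3e} years of brute force")


# Part 2: toy GA that picks K of N synthetic symbols to track an equal-weight index.
def tracking_error(subset, returns, index):
    """Population std. dev. of (equal-weight subset return - index return)."""
    diffs = [sum(returns[i][t] for i in subset) / len(subset) - index[t]
             for t in range(len(index))]
    return statistics.pstdev(diffs)


def subset_ga(returns, index, k, pop_size=30, generations=40, p_mut=0.3, seed=1):
    rng = random.Random(seed)
    n = len(returns)

    def score(s):
        return -tracking_error(s, returns, index)

    pop = [tuple(sorted(rng.sample(range(n), k))) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=score, reverse=True)
        survivors = pop[: pop_size // 2]  # truncation selection keeps the best half
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = rng.sample(survivors, 2)
            child = set(rng.sample(sorted(set(a) | set(b)), k))  # crossover from parents' union
            if rng.random() < p_mut:  # mutation: swap one member for a random outsider
                child.remove(rng.choice(sorted(child)))
                child.add(rng.choice([i for i in range(n) if i not in child]))
            children.append(tuple(sorted(child)))
        pop = survivors + children
    return max(pop, key=score)


data_rng = random.Random(42)
N, T, K = 60, 120, 10
market = [data_rng.gauss(0, 0.01) for _ in range(T)]
betas = [data_rng.uniform(0.5, 1.5) for _ in range(N)]
returns = [[b * m + data_rng.gauss(0, 0.01) for m in market] for b in betas]
index = [sum(returns[i][t] for i in range(N)) / N for t in range(T)]

best = subset_ga(returns, index, K)
print("picked", best, "tracking error", round(tracking_error(best, returns, index), 5))
```

Because truncation selection always keeps the best subset found so far, the result can only match or beat the best member of the random starting population after 1,230 tracking-error evaluations. It is not guaranteed to be the true optimum among the roughly 7.5 * 10^10 possible 10-of-60 subsets, which is the trade-off the post describes: fewer iterations in exchange for "closer to" rather than "exactly at" the solution.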

volpunter · Dec 11, 2014

Very much agree.

I think it all comes down to most being outright lazy, or at the very least not understanding what it takes to succeed in this business. Most want to get rich quick; no shame in admitting that. But it is shameful to assume that a little work and a mediocre approach to trading will get one to the goal of getting rich quick, or of not losing money for that matter. Shameful because it ridicules all those who put in years of effort to study hard and who competed in order to qualify to learn from those who do best; it also ridicules all those who spend a tremendous amount of time learning and applying what they have learned, and I am talking years, not a few weeks or months. This world operates on the principle of meritocracy, not mediocrity. That is what most here do not understand. And because they want to put in as little effort as possible, they buy into all the snake-oil promises that a little optimization here or there, or a little "machine learning" (I hate this term by now because it's totally misused by most), will get them to the goal of being profitable. No, it does not, because most would not even begin to comprehend how to apply the tools at their disposal.

To sum up, the problem is that most search for tools, and once they have them they think that is the end of the road, whereas truly intelligent beings understand that each tool has to be fully understood before it is utilized. But you can only teach someone who wants to learn. Someone lacking intellect or sufficient IQ cannot be taught; stupid stays stupid, as harsh as it sounds.
