I believe a few people here do this, or a similar kind of work, as part of their model building.
Here is the deal:
The data is synthetic, meaning I generated it and I know the rules (the real model) that produced it.
The values are separated by semicolons.
There are 3000 rows - 3000 data points.
Each column is one variable. The first column is the target variable; the other 300 are input variables.
All variables are binary. For the target it's either 1 (true) or 0 (false). For inputs it's 1 (true) or -1 (false).
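For anyone who wants to get started quickly, here is a minimal sketch of reading this layout using only the Python standard library. The function name `parse_rows` and the file name in the comment are my own placeholders, and the two sample rows are made up (3 inputs instead of 300) just to show the shape.

```python
import csv
import io

def parse_rows(lines):
    """Parse semicolon-separated rows into (targets, inputs)."""
    targets, inputs = [], []
    for fields in csv.reader(lines, delimiter=";"):
        if not fields:
            continue  # skip blank lines
        values = [int(v) for v in fields]
        targets.append(values[0])   # first column: target, 1 or 0
        inputs.append(values[1:])   # remaining columns: inputs, 1 or -1
    return targets, inputs

# Made-up sample with 3 inputs per row, for illustration only.
sample = io.StringIO("1;1;-1;1\n0;-1;-1;1\n")
targets, inputs = parse_rows(sample)
print(targets)
print(inputs)

# With the real file (hypothetical name), something like:
# with open("data.csv", newline="") as f:
#     targets, inputs = parse_rows(f)
```

On the real file you would expect 3000 rows with 300 inputs each, so checking `len(targets)` and `len(inputs[0])` is a cheap sanity test.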
The challenge is to find a set of patterns that generated this data set.
A pattern can be in the form of:
if (variable #35 is true) AND (variable #184 is false) then target is "true".
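A pattern of this form can be sketched as a small lookup of required values. This is only an illustration: the names `matches` and `evaluate` are placeholders, the toy data is made up, and I am assuming here that variable numbers count the input columns starting from 1 (so the target column is not counted).

```python
def matches(pattern, row):
    """pattern maps a 1-based input variable number to its required value (1 or -1)."""
    return all(row[var - 1] == value for var, value in pattern.items())

def evaluate(pattern, targets, inputs):
    """Return (number of rows matched, number of matched rows with target == 1)."""
    hits = [t for t, row in zip(targets, inputs) if matches(pattern, row)]
    return len(hits), sum(hits)

# Made-up toy data with 4 inputs per row; the real file has 300.
toy_targets = [1, 0, 1, 0]
toy_inputs = [
    [1, -1, 1, 1],
    [1, 1, -1, 1],
    [1, -1, -1, -1],
    [-1, -1, 1, 1],
]

# "if (variable #1 is true) AND (variable #2 is false) then target is true"
toy_pattern = {1: 1, 2: -1}
covered, true_hits = evaluate(toy_pattern, toy_targets, toy_inputs)
print(covered, true_hits)
```

The example pattern from the text would be written as `{35: 1, 184: -1}` under the same numbering assumption.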
I am not disclosing the complexity of patterns.
There are 300 data points with target = true, and 2700 data points with target = false.
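One consequence of these counts is worth spelling out: a model that always answers "false" is already right 90% of the time, so plain accuracy says little here. A rough sketch of the arithmetic follows; the `lift` helper and the numbers passed to it are hypothetical, not results from the data.

```python
n_true, n_false = 300, 2700
n_rows = n_true + n_false

base_rate = n_true / n_rows            # share of rows with target = true
majority_accuracy = n_false / n_rows   # accuracy of always predicting "false"

def lift(covered, true_hits):
    """How much more often target = true occurs among matched rows than overall."""
    if covered == 0:
        return 0.0
    return (true_hits / covered) / base_rate

print(base_rate, majority_accuracy)

# Hypothetical example: a pattern matching 50 rows, 25 of them with target = true.
print(lift(50, 25))
```

A lift well above 1 suggests a pattern worth a closer look, although with some targets being random noise it is not proof on its own.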
A few notes:
Not all targets are predictable. Some of them were added as random noise.
Not all input variables are relevant. In fact, most of them are irrelevant.
This data is in no way related to any real financial time series (it's synthetic).
I've been toying around (actually doing quite serious work) with neural networks, testing how powerful they are at feature selection and at modelling this kind of data. So far I haven't found a single method or training technique that cracks this problem (with NNs).
If you manage to crack this problem, please feel free to post your results. I will reveal the real model that generated the data so we can compare it with your results. Also, if you don't want to disclose your technique/method/algorithm, don't. I understand this can be proprietary (and very expensive) information. My first priority is to find out whether it's possible to crack the problem. The method itself is secondary.