How to protect trading strategies on a colocated server

heech · Dec 23, 2009

Quote from Jerry030:

A good analogy there is the work done by NSA cryptanalytic units.

Uh, that problem is almost trivial compared to the one you're describing here.

You know the input in that case consists of a few dozen alphanumeric characters, encoded in a known language. You know the "output" from the encryption engine represents a one-to-one mapping from the input.

Furthermore, because you KNOW for a fact the original input can be systematically/programmatically retrieved from the output... it's reasonable to guess some mathematical transform is involved.

That's *much* more than what you know in the case of your hypothetical trading strategy.

Jerry030 · Dec 23, 2009

Quote from heech:

Unless you're using some kind of a toy system... like a TDA/E-Trade "strategy generator"... I find it incredibly difficult to imagine that you could reverse-engineer an algorithm. Even if you know what the raw inputs are, you have no clue what the "features" being used by the strategy are.

I'm 99.99% sure that problem is computationally intractable.

OK, and what is your background in Computational Intelligence or Predictive Analytics?

Mine is about 15 years in data mining, business intelligence and related fields, consulting with major corporations and, on occasion, foreign governments. I work with about 8 major software systems in these fields (SAS, JMP, S-Plus, etc.).

How many do you use?

heech · Dec 23, 2009

Quote from Jerry030:

OK, and what is your background in Computational Intelligence or Predictive Analytics?

Mine is about 15 years in data mining, business intelligence and related fields, consulting with major corporations and, on occasion, foreign governments. I work with about 8 major software systems in these fields (SAS, JMP, S-Plus, etc.).

How many do you use?

In addition to a 15-year career in computer science, I studied quite a bit of pattern recognition and artificial intelligence in my graduate program at MIT. I also worked on an encryption-related project with Ron Rivest... you know, the 'R' behind RSA. So, when I said earlier the problem is computationally intractable, I meant it from a theoretical point of view.

Your experience in "business intelligence and related fields" is fascinating.

At the end of the day, I don't even need to compare resumes, even if mine does outshine yours. I'm fine with discussing facts.

trend2009 · Dec 23, 2009

Besides the system being reverse-engineered by your broker, which is a concern for anyone whether or not they have a colocated server, the major concern with server colocation is that an IT person at the broker could steal your binary code and reverse-engineer it. In that case, generating phony signals does not protect you at all, since your code itself is stolen.

Quote from Jerry030:

As noted on this thread about a dozen posts ago, the intent is to protect a colocated server running a trading system. It was pointed out that one very effective way to protect your system from the network admin folks is to generate many false signals and have them separated into phony and real signals by a different colocated server at a different location. This quite workable approach was rejected because it would take up too much time, since the entire intent of colocation next to the exchange is to beat others by a few milliseconds.

You'd have to throw in a lot of false signals to make it safe from a GA/EA (genetic/evolutionary algorithm), as the system could just bifurcate the data set into two parts: finding a solution for one set (the real signals) and no solution for the random ones, since they are random.

If you really want to discourage folks from messing with the signals from the system, I'd suggest sending out only a few phony signals for a period, then during periods of high volatility varying the mix to lose a ton of money with deliberate bad trades (filtered out by a second server before execution). The folks who thought they could piggyback on the system will lose their shirts.
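The thread never says how the second server would tell phony signals from real ones. One possible mechanism, offered here only as an illustrative assumption, is a keyed message authentication code: the colocated server attaches a valid HMAC-SHA256 tag to real signals and a random tag of the same length to decoys, so only a server holding the shared key can separate them. The `Signal` fields, the key handling, and the function names below are hypothetical.

```python
import hashlib
import hmac
import secrets
from dataclasses import dataclass

# Hypothetical shared secret, known only to the colocated signal server and
# the remote filtering server.
SHARED_KEY = secrets.token_bytes(32)


@dataclass(frozen=True)
class Signal:
    symbol: str
    side: str   # "BUY" or "SELL"
    qty: int
    nonce: str  # unique per signal, so identical orders get different tags
    tag: str    # hex HMAC for real signals, random hex for decoys


def _payload(symbol: str, side: str, qty: int, nonce: str) -> bytes:
    return f"{symbol}|{side}|{qty}|{nonce}".encode()


def make_signal(symbol: str, side: str, qty: int, real: bool, key: bytes = SHARED_KEY) -> Signal:
    """Emit a real signal (valid tag) or a decoy (random tag of equal length)."""
    nonce = secrets.token_hex(8)
    if real:
        tag = hmac.new(key, _payload(symbol, side, qty, nonce), hashlib.sha256).hexdigest()
    else:
        tag = secrets.token_hex(32)  # 64 hex chars, same length as a SHA-256 hexdigest
    return Signal(symbol, side, qty, nonce, tag)


def is_real(sig: Signal, key: bytes = SHARED_KEY) -> bool:
    """Runs on the second server: a signal is real only if its tag verifies."""
    expected = hmac.new(key, _payload(sig.symbol, sig.side, sig.qty, sig.nonce), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, sig.tag)


def filter_real(signals: list[Signal], key: bytes = SHARED_KEY) -> list[Signal]:
    """Drop decoys before anything is sent for execution."""
    return [s for s in signals if is_real(s, key)]
```

This only hides which orders are real from someone watching the order stream. As trend2009 points out, it does nothing if the binary, and with it the key, is stolen, and the extra hop to a second server adds exactly the latency that colocation is meant to remove.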

Jerry030 · Dec 23, 2009

Quote from heech:

Uh, that problem is almost trivial compared to the one you're describing here.

You know the input in that case consists of a few dozen alphanumeric characters, encoded in a known language. You know the "output" from the encryption engine represents a one-to-one mapping from the input.

Furthermore, because you KNOW for a fact the original input can be systematically/programmatically retrieved from the output... it's reasonable to guess some mathematical transform is involved.

That's *much* more than what you know in the case of your hypothetical trading strategy.

Not really.

What makes you think that the folks the NSA wants to track aren't converting their instructions from Arabic to, say, Icelandic or Navajo and then translating them back on the other end? They aren't so stupid that they can't learn little-known languages, at least the 50 words it would take for an attack.

And beyond that, what if you use "eat the apple" for "attack the airport" on alternate Tuesdays and "my foot is sore" for the same instruction on every third Monday, etc.? Do you really think the decoded message reads "launch mortar attack on the Green Zone from the south side of 3rd Ave. and Al Gizera St. at 9AM on Tuesday 10/12/2009"?

By contrast, a trading system is a very constrained universe with few primitives: you know the market from the price feed and the market traded, since few would trade EUR/USD from a price feed of corn futures. You know that the action is either no action (hold the current position, or stay out of the market if not in), buy, sell (to enter or exit a position), or adjust stops. What other primitive functions can you suggest?

I'd say this is a very simple universe compared to the thousands of targets one could decide to attack, the dozens of methods used to attack them, and the ability to cloud this with phony signals (attack commands) in several thousand natural languages and dialects, a vast number of synthetic languages, or a code within a code.

In any case I'd be happy to run the process as a demo, if you are interested. Sometimes it's more interesting to actually do something than to engage in abstract philosophical debate... and sometimes the real purpose is just the fun of debating, right?
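To make the "few primitives" point concrete, here is a minimal sketch that maps every observed interval onto one of the four actions named above. The order-record format (a dict with `type` and `side` keys, or `None` when no order was seen) is a hypothetical stand-in, not any broker's actual message format.

```python
from collections import Counter
from enum import Enum
from typing import Optional


class Action(Enum):
    HOLD = "hold"                # no action: keep the position, or stay flat
    BUY = "buy"                  # enter long or exit a short
    SELL = "sell"                # enter short or exit a long
    ADJUST_STOP = "adjust_stop"  # move a protective stop


def observed_action(order: Optional[dict]) -> Action:
    """Map one observed order record (hypothetical format) to a primitive.

    None means no order was seen in the interval, which is HOLD.
    """
    if order is None:
        return Action.HOLD
    if order.get("type") == "STOP":
        return Action.ADJUST_STOP
    if order["side"] == "BUY":
        return Action.BUY
    if order["side"] == "SELL":
        return Action.SELL
    raise ValueError(f"unrecognised order: {order!r}")


# One observation per bar, as a network observer might log it
bars = [None, {"type": "LIMIT", "side": "BUY"}, None,
        {"type": "STOP", "side": "SELL"}, {"type": "MARKET", "side": "SELL"}]
counts = Counter(observed_action(o) for o in bars)
```

Whether a four-symbol output alphabet makes the mapping from inputs recoverable is exactly what the rest of the thread disputes.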

Jerry030 · Dec 23, 2009

Quote from heech:

In addition to a 15-year career in computer science, I studied quite a bit of pattern recognition and artificial intelligence in my graduate program at MIT. I also worked on an encryption-related project with Ron Rivest... you know, the 'R' behind RSA. So, when I said earlier the problem is computationally intractable, I meant it from a theoretical point of view.

Your experience in "business intelligence and related fields" is fascinating.

At the end of the day, I don't even need to compare resumes, even if mine does outshine yours. I'm fine with discussing facts.

OK, thanks. And your current job these days, if not trading full time, is? For me, in addition to market involvement, I'm currently consulting on a multi-billion-dollar project (annual revenue) for the scientific R&D group at a major global corporation.

In terms of the facts, then, let's be very precise and define the task as creating a series of models that are functionally equivalent to the hidden trading system, such that the actual performance of systems H (hidden) and G (generated) varies by no more than 5% in actual net dollars at the end of 200 trades.
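The proposed success criterion can be written as a small check. This sketch assumes "vary by no more than 5%" means the generated system's cumulative net dollars over the first 200 trades are within 5% of the hidden system's, measured relative to H; the post does not spell out the denominator, and the function name is hypothetical.

```python
def functionally_equivalent(pnl_hidden, pnl_generated, n_trades=200, tolerance=0.05):
    """True if G's net dollars over the first n_trades trades are within
    `tolerance` (relative to H's net) of H's net over the same count."""
    if len(pnl_hidden) < n_trades or len(pnl_generated) < n_trades:
        raise ValueError("need at least n_trades trades from each system")
    net_h = sum(pnl_hidden[:n_trades])
    net_g = sum(pnl_generated[:n_trades])
    if net_h == 0:
        return net_g == 0  # relative error is undefined when H nets zero
    return abs(net_g - net_h) / abs(net_h) <= tolerance


h = [100] * 200  # hypothetical per-trade P&L of the hidden system
print(functionally_equivalent(h, [96] * 200))  # True (4% off)
print(functionally_equivalent(h, [94] * 200))  # False (6% off)
```

A relative test is fragile when H's net is close to zero, so an absolute dollar band may be a more practical variant.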

heech · Dec 23, 2009

Quote from Jerry030:

Not really.

What makes you think that the folks the NSA wants to track aren't converting their instructions from Arabic to, say, Icelandic or Navajo and then translating them back on the other end? They aren't so stupid that they can't learn little-known languages, at least the 50 words it would take for an attack.

If the NSA has little more than 50 words (perhaps a few hundred bytes) of data, even if transmitted in the clear, there's absolutely no way they can take meaning from it... except for meta-data associated with the transmission (the fact a code has never been used before is meaningful information).

At the same time, Navajo or Icelandic or Arabic or any other human language would still be a "known" language, and there are very well-known patterns for each that make decoding it much easier than decoding randomly generated data.

By contrast, a trading system is a very constrained universe with few primitives: you know the market from the price feed and the market traded, since few would trade EUR/USD from a price feed of corn futures. You know that the action is either no action (hold the current position, or stay out of the market if not in), buy, sell (to enter or exit a position), or adjust stops. What other primitive functions can you suggest?

The *point* here is not just to witness the output, but be able to replicate it. The "simpler" the output signal, the "simpler" the input, the more difficult the replication task.
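A toy illustration of the replication problem, using two made-up rules over a short price window: they produce identical outputs on every observed window, yet disagree on a new one, so the observed input/output pairs alone cannot tell you which rule is the hidden one. Both rules and the sample prices are invented for the example.

```python
def rule_a(prices):
    """Made-up hidden rule: buy if the last price is above its 3-bar mean."""
    return prices[-1] > sum(prices[-3:]) / 3


def rule_b(prices):
    """Made-up surrogate: buy if the last bar closed up."""
    return prices[-1] > prices[-2]


observed = [[1, 2, 3], [3, 2, 1], [1, 3, 2]]
unseen = [5, 1, 2]

agree_on_observed = all(rule_a(w) == rule_b(w) for w in observed)  # True
agree_on_unseen = rule_a(unseen) == rule_b(unseen)                 # False
```

Jerry030's answer, later in the thread, is that for cloning only agreement in results matters, not which rule produced them.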

Let's use a standard pattern-recognition example. You want to write an application that recognizes the pattern of "fish". I'll put a 3-year-old next to your application. The 3-year-old will happily (and with very little training) look at items coming down an assembly line and tell you whether each is a 1 (fish) or 0 (no fish).

If your application can replicate the 3-year-old's fish pattern recognition just by looking at raw pixel data, without domain knowledge of the specific feature set the 3-year-old is using (size, color, shape), you're already doing an incredibly impressive job.

Now, let's talk about what a serious quant fund may be doing. It's not a 3-year-old dealing with a linear problem that can be broken down into convenient primary components; it's a 65-year-old Middle Eastern Studies PhD distinguishing between "good" and "bad" poetry written by some Armenian scholar 2000 years in the grave. If you can replicate that recognition process and start reviewing old literature just by monitoring input (a sequence of pixels) and output ("good" versus "bad"), then you deserve the Turing Award.

As far as what I'm doing, I'm trading my own prop account with plans for a statistical arbitrage quant fund in Q1 of next year.

Gcapman · Dec 23, 2009

Pls help me understand the OP's question...

Is he referring to straight-up black-box automated trading strategies?

Or is he also referring to chart settings, i.e., eSignal charts with RSI, ADX, Stochastics, etc. customized with non-default chart settings?

Excuse my lack of knowledge... I'm just trying to understand the thread.

Thanks!

Jerry030 · Dec 23, 2009

Quote from heech:

If the NSA has little more than 50 words (perhaps a few hundred bytes) of data, even if transmitted in the clear, there's absolutely no way they can take meaning from it... except for meta-data associated with the transmission (the fact a code has never been used before is meaningful information).

At the same time, Navajo or Icelandic or Arabic or any other human language would still be a "known" language, and there are very well-known patterns for each that make decoding it much easier than decoding randomly generated data.

The *point* here is not just to witness the output, but be able to replicate it. The "simpler" the output signal, the "simpler" the input, the more difficult the replication task.

Let's use a standard pattern-recognition example. You want to write an application that recognizes the pattern of "fish". I'll put a 3-year-old next to your application. The 3-year-old will happily (and with very little training) look at items coming down an assembly line and tell you whether each is a 1 (fish) or 0 (no fish).

If your application can replicate the 3-year-old's fish pattern recognition just by looking at raw pixel data, without domain knowledge of the specific feature set the 3-year-old is using (size, color, shape), you're already doing an incredibly impressive job.

Now, let's talk about what a serious quant fund may be doing. It's not a 3-year-old dealing with a linear problem that can be broken down into convenient primary components; it's a 65-year-old Middle Eastern Studies PhD distinguishing between "good" and "bad" poetry written by some Armenian scholar 2000 years in the grave. If you can replicate that recognition process and start reviewing old literature just by monitoring input (a sequence of pixels) and output ("good" versus "bad"), then you deserve the Turing Award.

As far as what I'm doing, I'm trading my own prop account with plans for a statistical arbitrage quant fund in Q1 of next year.

Like I said, with hundreds of combinations of actions, players, locations, and languages, the NSA has its hands full in a very complex domain. There are ways around this, such as the method the US Navy used to crack Japanese naval codes during WWII (take an action that will cause the Japanese to make a known, immediate coded transmission).

And, also like I said, trading has a very limited set of primary conditions and actions. Let's forget the fish and kids, as it's apples and oranges. You have a black box on a colocated server sending out possibly dozens or hundreds of orders per day. Given a couple of months' worth of this, it is a much simpler task than military code breaking to create a surrogate that takes the same actions as the original. Note: I'm not saying you will know that the original takes the second-order derivative of a particular wavelet function in relation to some other vector as a buy signal, only that you can generate equivalent output. Who really cares about the details of the process if you can clone it in terms of results?
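A minimal sketch of building such a surrogate from logged inputs and outputs. The thread talks about GA/EA methods; this example swaps in a much simpler technique, a k-nearest-neighbour classifier over the last few one-bar price changes, purely to show the shape of the task. The hidden rule, the feature choice, and the class name are all hypothetical.

```python
import math
from collections import Counter


def features(prices, k=3):
    """Hypothetical feature vector: the last k one-bar price changes."""
    n = len(prices)
    return tuple(prices[i] - prices[i - 1] for i in range(n - k, n))


def hidden_black_box(prices):
    """Stand-in for the unknown system: buy if the last bar closed up."""
    return "BUY" if prices[-1] > prices[-2] else "SELL"


class KNNSurrogate:
    """Clone observed behaviour by majority vote of the k most similar logged windows."""

    def __init__(self, k=1):
        self.k = k
        self.memory = []  # list of (feature_vector, observed_action)

    def fit(self, windows, actions):
        self.memory = [(features(w), a) for w, a in zip(windows, actions)]
        return self

    def predict(self, window):
        f = features(window)
        nearest = sorted(self.memory, key=lambda m: math.dist(m[0], f))[: self.k]
        return Counter(a for _, a in nearest).most_common(1)[0][0]


# Logged inputs and the black box's observed outputs
train = [[1, 2, 3, 4], [4, 3, 2, 1], [1, 2, 3, 2], [3, 2, 1, 2]]
model = KNNSurrogate(k=1).fit(train, [hidden_black_box(w) for w in train])
```

In practice the clone would be judged on held-out data, for example against the 200-trade net-dollar test proposed earlier in the thread.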

All the best with your plans for next year.

We are likely to spend a lot of effort talking past each other, as we operate in different domains: the algo community likes to develop a theory and turn it into a complex set of mathematical formulas; that is fun, and the result has intrinsic beauty. I operate in the world of models and predictive analytics. Given the complexity of the process and product, I know neither how nor why a model works, only that it does. No theory, no elegant set of cool equations, just strange wave functions in n-dimensional hyperspace that predict the future tone, texture, and direction of a market at a given event horizon.

Jerry030 · Dec 24, 2009

Quote from Gcapman:

Pls help me understand the OP's question...

Is he referring to straight-up black-box automated trading strategies?

Or is he also referring to chart settings, i.e., eSignal charts with RSI, ADX, Stochastics, etc. customized with non-default chart settings?

Excuse my lack of knowledge... I'm just trying to understand the thread.

Thanks!

It's a bit foggy at this point.

The discussion started on how to keep a theoretical black-box automated trading application, colocated near the exchange, secure from the hosting company.

It's transmuted into a debate on the relative difficulty of code breaking, if you will: US military and related agencies trying to figure out what our enemies are talking about, versus breaking or duplicating the black-box system on the colocated server, assuming you know the price time series going in and the trades coming out, which any network admin can get with great ease.

Some say it's impossible to use GA/EA and related methods to break the black box, and I maintain that it can be done, mostly because I've done something similar already: creating a failure model for an existing unknown black-box system. It predicts when the unknown system will have losing trades so they can be filtered out. While this isn't recreating the entire system per se, it is in the same ballpark.
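Before any modelling, the observer has to pair each outgoing trade with the prices that preceded it. A sketch of that as-of join, assuming both streams carry comparable timestamps and that the last few ticks before an order are the relevant inputs (the real look-back of a hidden system is unknown). The function name and sample data are hypothetical.

```python
from bisect import bisect_right


def align_trades_to_prices(price_times, prices, trades, window=4):
    """Pair each observed trade with the last `window` prices at or before it.

    price_times: ascending tick timestamps; prices: the matching tick prices.
    trades: iterable of (timestamp, action) as seen on the wire.
    Trades with fewer than `window` prior ticks are skipped.
    """
    pairs = []
    for t, action in trades:
        i = bisect_right(price_times, t)  # count of ticks with time <= t
        if i >= window:
            pairs.append((prices[i - window:i], action))
    return pairs


# Usage with made-up data
price_times = [1, 2, 3, 4, 5, 6]
prices = [10, 11, 12, 11, 13, 14]
trades = [(4.5, "BUY"), (2, "SELL"), (6, "SELL")]
pairs = align_trades_to_prices(price_times, prices, trades, window=3)
```

The resulting `(price_window, action)` pairs are the raw training set any surrogate or failure model would start from.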
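The post does not say how the failure model was built. As a stand-in only, the sketch below buckets the black box's past trades by a context variable, flags buckets whose historical win rate falls below a threshold, and skips new trades that land in a flagged bucket. The bucketing function, thresholds, and volatility field are hypothetical.

```python
from collections import defaultdict


def fit_failure_filter(history, bucket_of, min_win_rate=0.5, min_count=20):
    """Return the buckets where the black box's past trades won too rarely.

    history: iterable of (context, pnl) for the black box's past trades.
    bucket_of: maps a trade's context to a discrete bucket label.
    Only buckets with at least min_count trades are judged.
    """
    wins = defaultdict(int)
    totals = defaultdict(int)
    for context, pnl in history:
        b = bucket_of(context)
        totals[b] += 1
        if pnl > 0:
            wins[b] += 1
    return {b for b, n in totals.items() if n >= min_count and wins[b] / n < min_win_rate}


def should_take(context, bad_buckets, bucket_of):
    """Let a new trade through unless it falls in a flagged bucket."""
    return bucket_of(context) not in bad_buckets


def vol_bucket(ctx):
    # Hypothetical context: a volatility reading attached to each trade
    return "high_vol" if ctx["vol"] > 2.0 else "normal"


history = (
    [({"vol": 1.0}, 10)] * 20 + [({"vol": 1.0}, -5)] * 10
    + [({"vol": 3.0}, 10)] * 9 + [({"vol": 3.0}, -5)] * 21
)
bad = fit_failure_filter(history, vol_bucket)
```

Because the buckets are chosen and scored on the same history, the filter should be checked on out-of-sample trades before it is trusted.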