How to calculate realtime stats?

That being said, in my opinion, and having been down this path in 2016, you're barking up the wrong tree. The hedge funds have Ph.D. physicists, mathematicians, and computer scientists doing the same thing, and they have the fastest market data, news, and systems on earth crunching these numbers. They'll eat up that alpha before it even reaches your network card.

Unless you think Citadel hasn't thought of this, you'd have to be more clever than they are. There's not a lot of room for the retail investor in today's market. It's like playing chess against a computer on the highest level.

I think funds are dumber today than ever. There are very few guys making all the money, and the majority of them are using insider information. Many big, supposedly successful funds are money laundering for various entities, including government-related agendas.

The rest who are making it honestly have tried all the smartest people on the planet and are now using more street-sense trading, based on ideas that are decades old and perfected.

Very few people have the dedication to master and produce an edge, and fewer still can put it into action. Some have succeeded, and they have been at it for decades doing the same thing.

So there are many participants in the market; to think someone sitting at home can't be a hummingbird is just defeatist thinking. Just remember: don't ever play another man's game, and be honest with yourself to go far.

Work hard and find an edge. They are out there; put in the time and find one that works for you.

m
 
What I'm struggling with is how to actually implement the FIFO so that it's efficient; here is what I have so far. I pop the last element and push a new element into the list every 5 minutes (dt = 300 sec). Of course dt could be 1 hour, 5 days, etc., since it needs to be customizable. It seems a convenient way to work, but it still seems slow.


rtq=[
{"TIME":0,'DATA':{'SPY':{'BID':431,'ASK':432},'TSLA':{'BID':744,'ASK':755}}}
,{"TIME":300,'DATA':{'SPY':{'BID':431,'ASK':432},'TSLA':{'BID':744,'ASK':755}}}
,{"TIME":600,'DATA':{'SPY':{'BID':431.5,'ASK':432},'TSLA':{'BID':744,'ASK':755}}}
,{"TIME":900,'DATA':{'SPY':{'BID':431.75,'ASK':432},'TSLA':{'BID':744,'ASK':755}}}
,{"TIME":1200,'DATA':{'SPY':{'BID':431.80,'ASK':432},'TSLA':{'BID':744,'ASK':755}}}
,{"TIME":1500,'DATA':{'SPY':{'BID':431.89,'ASK':432},'TSLA':{'BID':744,'ASK':755}}}
]

import time  # needed for time.time() below

import numpy

# numpy.mean([e['DATA']['SPY']['BID'] for e in rtq])
t0=time.time()
a=numpy.array([e['DATA']['SPY']['BID'] for e in rtq])
numpy.mean(a)
print(time.time()-t0)

--------------

>>> import numpy
>>> # numpy.mean([e['DATA']['SPY']['BID'] for e in rtq])
>>>
>>> t0=time.time()
>>> a=numpy.array([e['DATA']['SPY']['BID'] for e in rtq])
>>> numpy.mean(a)
431.49
>>> print(time.time()-t0)
0.15358877182006836


Note that you probably don't want your t0 and t1 to be a single price quote or you risk running your maths on an outlier.

I did it by first storing 15 minutes of price data in a circular FIFO. Then I'd compute my t0 as an average of n minutes, and my t1 as 15-n minutes. Ex: t0 as an average of 10 mins of prices; t1 as 5 mins of prices.

I was getting price data from IB, where each quote was a VWAP of 250 Nasdaq price points at 1/ms (Nasdaq runs at 1000 ticks/s), so 1000 / 250 = 4 IB quotes per second. 4 quotes/second * 60 seconds * 15 mins = 3600 price points in memory, per instrument tracked. FIFO array: 0 to 3599 XD
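The buffer described above could look roughly like this in Python. This is only a sketch under assumptions: collections.deque with maxlen stands in for the circular FIFO (the original was C++), the 4 quotes/second rate is taken from the post, split_means is a made-up helper name, and the prices are synthetic.

```python
from collections import deque
from itertools import islice

QUOTES_PER_SEC = 4                          # feed rate quoted in the post
WINDOW_MIN = 15                             # minutes of history to keep
MAXLEN = QUOTES_PER_SEC * 60 * WINDOW_MIN   # 3600 samples per instrument


def split_means(prices, n_minutes, quotes_per_sec=QUOTES_PER_SEC):
    """Return (t0, t1): mean of the oldest n minutes, mean of the remainder."""
    cut = n_minutes * 60 * quotes_per_sec
    if not 0 < cut < len(prices):
        raise ValueError("need samples on both sides of the split")
    older = list(islice(prices, 0, cut))
    newer = list(islice(prices, cut, None))
    return sum(older) / len(older), sum(newer) / len(newer)


# deque(maxlen=...) evicts the oldest sample automatically once it is full
prices = deque(maxlen=MAXLEN)
for i in range(MAXLEN + 100):               # synthetic ramp, not market data
    prices.append(100.0 + 0.001 * i)

t0, t1 = split_means(prices, n_minutes=10)  # 10 min vs. the last 5 min
print(len(prices), t0, t1)
```

Averaging each side keeps a single outlier quote from dominating t0 or t1, which is the point made above.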

I then computed the slope and some other statistical stuff, recomputing every 10 seconds. If the stars aligned, my code automatically took a position with a limit sell and a stop.
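The post doesn't say how the slope was computed. One common choice, assumed here, is an ordinary least-squares fit over evenly spaced samples; slope_per_second is a hypothetical helper name, and numpy.polyfit(x, y, 1) would give the same slope if you prefer numpy.

```python
def slope_per_second(prices, dt):
    """Least-squares slope of evenly spaced prices, in price units per second."""
    n = len(prices)
    if n < 2:
        raise ValueError("need at least two samples")
    xs = [i * dt for i in range(n)]          # sample times, dt seconds apart
    mean_x = sum(xs) / n
    mean_y = sum(prices) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, prices))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


# SPY bids from the example queue earlier in the thread, dt = 300 s
spy_bids = [431.0, 431.0, 431.5, 431.75, 431.80, 431.89]
print(slope_per_second(spy_bids, dt=300.0))
```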

The data collection is entirely separate from the statistical calculations. Nowadays they have stream handlers to make this easier; I had to write a lot myself in C++. Today I'd use Python or Kotlin, unless I had a screaming fast data feed (which is effing expensive).
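To illustrate keeping collection separate from the calculations, here is a small sketch using a thread and queue.Queue from the standard library. The collector is a stand-in for a real feed handler (no broker API is used), and the names collector and run_stats are made up for this example.

```python
import queue
import threading
from collections import deque


def collector(out_q, quotes):
    """Stand-in for a feed handler: hands over each quote, then a None sentinel."""
    for quote in quotes:
        out_q.put(quote)
    out_q.put(None)


def run_stats(in_q, window):
    """Drain quotes into a bounded window and return the final window mean."""
    buf = deque(maxlen=window)
    while True:
        item = in_q.get()          # blocks until the collector hands one over
        if item is None:           # sentinel: the feed is finished
            break
        buf.append(item)
    return sum(buf) / len(buf)


q = queue.Queue()
feed = threading.Thread(target=collector, args=(q, [1.0, 2.0, 3.0, 4.0, 5.0]))
feed.start()
result = run_stats(q, window=3)    # mean of the last three quotes
feed.join()
print(result)
```

The stats side never touches the feed directly; it only sees what arrives on the queue, so either side can be swapped out without changing the other.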

I did a ton of work on this stuff.
 
Bravo... impressive start! :thumbsup:

1. Take an OOP approach and create Rtq as a class, so that you can add methods and easily create new queue objects as needed.
2. Use collections.deque for the queue, and wrap it with your class, so that you have some methods available. Make the actual queue private, and create methods to operate thereon
https://docs.python.org/3/library/collections.html#collections.deque
3. I think that all this nesting within the object might be slowing things a bit. Flatten the whole thing, and keep it at one layer.
https://stackoverflow.com/questions/4151320/efficient-circular-buffer

I took a few minutes and coded out a little structure that makes sense to me. Note that this is the old-fashioned way. Async programming is a more modern approach to something like this, and multiprocessing is essential, but this is a great place to start, and I admire your ballz taking a stab at this.

What IDE are you using, and have you heard of something called the Global Interpreter Lock (GIL)?

Code:
# Minimal Python version of the sketch; adjust to taste.
# self._q is private by convention (leading underscore).
# You don't need to store a time per sample because samples arrive at regular
# intervals. Just store the earliest queue time, multiply, and add.
import collections
import statistics

import numpy


class Rtq:
    def __init__(self, queue_length, start_time=0.0):
        # deque(maxlen=...) drops the oldest item automatically when full
        self._q = collections.deque(maxlen=queue_length)
        self.start_time = start_time  # time of the earliest sample

    # Queueing methods
    def push(self, val):
        self._q.append(val)

    def pop(self):
        # FIFO: remove and return the oldest value
        return self._q.popleft()
    # Etc..

    # Stat methods
    def mean(self):
        # You can use numpy or just do the math yourself
        return numpy.mean(self._q)

    def median(self):
        return numpy.median(self._q)

    def mode(self):
        # numpy has no mode(); the standard library does
        return statistics.mode(self._q)
    # Etc..


# Here you create a number of queues in a dictionary
# There are a few ways to speed this up if needed

instruments = {
    "TSLA": {
        "bid": Rtq(101),
        "ask": Rtq(101),
        "last": Rtq(101),
    },
    "AAPL": {
        "bid": Rtq(101),
        "ask": Rtq(101),
        "last": Rtq(101),
    }
}

# Push quotes as they arrive (illustrative values; stats on an empty queue are NaN)
instruments["TSLA"]["bid"].push(744)
instruments["TSLA"]["ask"].push(755)
instruments["AAPL"]["ask"].push(150)

# With OOP and a solid queuing structure, math is easy. Just call the method:
tslaBidMean = instruments["TSLA"]["bid"].mean()
tslaAskMean = instruments["TSLA"]["ask"].mean()
aaplAskMedian = instruments["AAPL"]["ask"].median()

# Also quite scalable. You can create queues for the entire S&P if you want.
 
Good to hear your feedback. I think deque is about what I'm gonna use, but the way I structure it will probably be a bit different from what you suggested.

1. Not sure what you mean by IDE; it's just simple Python code being called in a command window.
2. I did hear of the GIL. I'm using single-threaded processing, which of course is still experimental for now.




IDE means "Integrated Development Environment." It's where you do pretty much everything with coding: write, test, debug, etc.

I strongly recommend that you install PyCharm and write all your code there. Community edition is free. It's infinitely better than the command window for dev, and what the pros use.
https://www.jetbrains.com/pycharm/
 
Okay, then the IDE I use for now is VS Code, which I think would be pretty much the same as PyCharm.


VS Code is pretty good... I use that quite a bit. Much better than the terminal, you must agree!

I'd say install PyCharm and try it. The guy who made the IB educational videos used it. I actually got him on the phone last year and chatted with him for about an hour. He recommended PyCharm over VS Code. It might make things a bit easier for you.

Go to the Educational Vids and filter by "TWS Python API." The UX is a little clumsy; see screenshot below.

Watch all the vids.

This vid will show you how to subscribe to data feeds to fill your FIFO queues:
https://www.interactivebrokers.com/...ideos/tws-python-market-data-candlesticks.php

https://www.interactivebrokers.com/en/index.php?f=14082
 

Attachment: tws_python.PNG (screenshot)
If you already have the current moving average, you don't need to keep all the old values. Imagine if your window was in the billions: your buffer would be huge. You can subtract the mean and add the new value.
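One way to read that suggestion, sketched below with a made-up helper name update_mean: treat the value leaving the window as if it were equal to the current mean, so the update becomes mean - mean/n + new/n. Note this is an approximation (it behaves like an exponential moving average with weight 1/n); an exact sliding-window mean still needs the actual value that drops out.

```python
def update_mean(mean, new_value, n):
    """Replace one 'average' sample with the new one: mean - mean/n + new/n."""
    return mean + (new_value - mean) / n


# Start from the first SPY bid in the thread and fold in the rest, window n = 6
mean = 431.0
for bid in [431.0, 431.5, 431.75, 431.80, 431.89]:
    mean = update_mean(mean, bid, n=6)
print(mean)
```

Only the running mean and n are stored, so memory use stays constant no matter how long the window is.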
 