I was looking over the request again, more to the specifics.
As with the stuff I am putting together, the hardest part (for an app engineer) is data acquisition (and potentially data quality).
I ALSO looked over the answers from others, and the truth is that Quantiacs or Quantopian may not do, as their datasets are not complete or even near complete. Quantiacs is mostly futures, as their customers want the size and liquidity that futures can bring compared to the stock market (as far as I can tell from the work I have done). Quantopian, on the other hand, restricts its data set to a subset of companies and other items that meet certain criteria, and so is also not as complete.
The data set I have contains 11,114 symbols and runs from about 2006 to 2019 (with 2020 pending),
though I have not had time to write code to go back and check for missing parts.
It ALSO has the caveat that it is missing companies that went out of business before the date I initially created the symbol list, since there was no symbol for me to pull data on (survivorship bias). There are a few other things about it, like figuring out how the sources handled splits, so that the method is known to any software one would write against it. (Given I am out of work, I don't have lots of time to work on anything that may not lead to earnings for my wife and family, so things tend to sit due to priorities.)
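Checking for missing parts can start simple. Below is a minimal Python sketch, assuming (hypothetically) that each symbol's bars have been loaded into a dict mapping the symbol to a set of `datetime.date` values. It treats the union of all dates in the database as the trading calendar and only looks between each symbol's own first and last bar. It cannot see a day that is missing for every symbol, and it says nothing about delisted companies that were never in the symbol list.

```python
from datetime import date


def find_missing_days(dates_by_symbol):
    """Return {symbol: [missing dates]} for every symbol that has gaps.

    Assumes dates_by_symbol maps each symbol to a set of datetime.date
    values (a hypothetical in-memory layout, not a real database API).
    """
    # Approximate the trading calendar with every date seen for any symbol.
    calendar = sorted(set().union(*dates_by_symbol.values()))
    missing = {}
    for symbol, dates in dates_by_symbol.items():
        if not dates:
            continue
        first, last = min(dates), max(dates)
        # Only expect bars between the symbol's own first and last date.
        gaps = [d for d in calendar if first <= d <= last and d not in dates]
        if gaps:
            missing[symbol] = gaps
    return missing


# Hypothetical example: BBB has no bar on 2019-01-03.
sample = {
    "AAA": {date(2019, 1, 2), date(2019, 1, 3), date(2019, 1, 4)},
    "BBB": {date(2019, 1, 2), date(2019, 1, 4)},
}
print(find_missing_days(sample))
```

For 11,114 symbols over roughly 14 years this brute-force scan is still small enough to run in memory; a database query doing the same anti-join would scale further.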
The data is OPEN, HIGH, LOW, ADJUSTED CLOSE, VOLUME, DIVIDENDS AND SPLITS (in separate tables).
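Since splits come in a separate table, any software written against the data has to decide how to apply them. One common convention, which may or may not match what the original sources did, is back-adjustment: for a split with ratio r (new shares per old share, so 2.0 for a 2-for-1), every bar before the split date has its prices divided by r and its volume multiplied by r. A minimal sketch with a hypothetical tuple layout; dividends are not handled here.

```python
from datetime import date


def back_adjust_for_splits(bars, splits):
    """Back-adjust OHLCV bars for splits.

    bars:   list of (date, open, high, low, close, volume) tuples (hypothetical layout)
    splits: dict mapping split date -> ratio (new shares per old share)
    Each bar dated before a split has its prices divided, and its volume
    multiplied, by the product of the ratios of all later splits.
    """
    adjusted = []
    for day, op, hi, lo, cl, vol in bars:
        factor = 1.0
        for split_day, ratio in splits.items():
            if day < split_day:
                factor *= ratio
        adjusted.append((day, op / factor, hi / factor, lo / factor, cl / factor, vol * factor))
    return adjusted


# Hypothetical 4-for-1 split taking effect on 2019-06-17.
bars = [
    (date(2019, 6, 14), 500.0, 510.0, 490.0, 499.0, 100),
    (date(2019, 6, 17), 127.0, 131.0, 126.0, 129.0, 400),
]
for bar in back_adjust_for_splits(bars, {date(2019, 6, 17): 4.0}):
    print(bar)
```

If a source's adjusted close already reflects splits, running this on it again would double-adjust, which is why knowing each source's convention matters.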
Most of the open APIs that used to provide this information are now closed.
The Google Finance API has stopped providing the data (see the announcement below).
Investopedia has also closed the door, as its old link gives a 404 error.
This was part of the reason I (temporarily?) stopped collecting data.
As you may have heard from the Google Developers Blog, Google is doing an API spring cleaning. One of the APIs affected is the Google Finance API (both the Portfolio API and the Finance Gadgets and Tools API), which will be shut down on October 20, 2012.
This was not an easy decision to take. We work with a large number of data providers and need to respect our relationships with them. As a result, we had to keep the API restricted to end users, which prevented a meaningful ecosystem from growing around the API. We also realized that we could serve more people better by integrating the data into other Google products rather than requiring them to write code to access the data. For example, check out the GoogleFinance() function in Google Spreadsheets which replicates some of the API's functionality without requiring you to write code.
Thank you for being loyal users of the Google Finance API over the last few years.
Yahoo has made it harder (impossible?) to get the data from their pages in an easy programmatic way;
however, if you want to look at one company, they still provide it:
https://finance.yahoo.com/quote/AAPL/history?p=AAPL
but this page uses AJAX and only populates the whole table as you scroll down,
and it covers 2019 to 2020 (maybe more; I haven't really looked).
The download link is:
https://query1.finance.yahoo.com/v7...eriod2=1581870306&interval=1d&events=history&crumb=wDEQ8G8kzwU
But notice that last part (crumb=wDEQ8G8kzwU). That is a digital crumb, a token tied to the browser session, that basically prevents easy access to the download data for the period you put in. I have not bothered to even try to get around it or work with it (yet?).
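The `period2=1581870306` value visible in the link looks like a Unix timestamp (seconds since 1970-01-01 UTC); that is an inference from the value, not something documented here. If so, date ranges in such a URL are plain epoch seconds, which Python's standard library converts both ways. This does nothing about the crumb, and the truncated part of the URL is left as-is.

```python
from datetime import datetime, timezone


def to_epoch(moment):
    """Convert a timezone-aware datetime to Unix epoch seconds."""
    return int(moment.timestamp())


def from_epoch(seconds):
    """Convert Unix epoch seconds to a UTC datetime."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


print(from_epoch(1581870306).isoformat())  # 2020-02-16T16:25:06+00:00
print(to_epoch(datetime(2006, 1, 1, tzinfo=timezone.utc)))  # 1136073600
```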
NASDAQ provides historical quotes (Date, Close/Last, Volume, Open, High, Low) on their site on a company-by-company basis, but this covers only NASDAQ, not other exchanges.
There are now MANY web businesses that offer historical data.
They provide free accounts that are limited in bandwidth, so if you want more, you have to pay more.
Some are limited by the number of daily requests, and some by the number of requests per minute or hour (the same kind of cap, expressed in different units).
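Whether the cap is per day, per hour, or per minute, the client side looks the same: keep the timestamps of recent requests and wait when the window is full. A minimal sliding-window sketch; the limits are parameters, and no particular provider's numbers are assumed. The clock and sleep functions are injectable so it can be exercised without real waiting (the `FakeClock` is only for demonstration).

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most max_requests calls in any rolling window of window_seconds."""

    def __init__(self, max_requests, window_seconds, clock=time.monotonic, sleep=time.sleep):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.clock = clock
        self.sleep = sleep
        self.sent = deque()  # timestamps of requests still inside the window

    def _expire(self, now):
        # Drop timestamps that have aged out of the rolling window.
        while self.sent and now - self.sent[0] >= self.window_seconds:
            self.sent.popleft()

    def wait(self):
        """Block until one more request is allowed, then record it."""
        now = self.clock()
        self._expire(now)
        while len(self.sent) >= self.max_requests:
            # Sleep until the oldest request in the window expires.
            self.sleep(self.window_seconds - (now - self.sent[0]))
            now = self.clock()
            self._expire(now)
        self.sent.append(now)


class FakeClock:
    """Stand-in clock for demonstration: sleeping just advances time."""

    def __init__(self):
        self.t = 0.0

    def now(self):
        return self.t

    def sleep(self, seconds):
        self.t += seconds


clock = FakeClock()
limiter = SlidingWindowLimiter(2, 60, clock=clock.now, sleep=clock.sleep)
for _ in range(3):
    limiter.wait()  # the third call has to wait for the first to expire
print(clock.t)  # 60.0
```

In real use, `limiter.wait()` would go right before each download request, with something like `SlidingWindowLimiter(250, 86_400)` for a 250-per-day plan.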
As an example (I am NOT endorsing this!):
https://www.worldtradingdata.com/
they allow 250 historical requests a day
(which would take me about 45 days to update my database, given its number of tickers).
Their plus account is $32 a month and lets you make 250,000,
so I would be able to update my DB (historically) in a day.
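The update-time arithmetic is a ceiling division, assuming one request returns one symbol's full history (an assumption; some providers may need several requests per symbol). 11,114 / 250 is about 44.5, so the last symbols land on day 45. The 250,000 figure is treated here as a daily allowance, which the post implies but does not state.

```python
import math


def days_to_refresh(num_symbols, requests_per_day, requests_per_symbol=1):
    """Days needed to pull every symbol once under a daily request cap."""
    return math.ceil(num_symbols * requests_per_symbol / requests_per_day)


print(days_to_refresh(11_114, 250))      # 45
print(days_to_refresh(11_114, 250_000))  # 1
```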
The advantage of this kind of company is that you get good data; you get what you need.
Depending on the company, you may or may not pay a lot. Many of them also provide more granular data than just daily data (like what I have),
including intraday data.
Wife needs me now, so I will have to finish this later.
However, just because you have an idea for looking things up doesn't mean that it will be fruitful. I go through lots of ideas that I think may look good, only to find out in backtesting, over time and across 300 companies (on either Quantopian, Quantiacs, or TD (limited)), that they do not do what I thought they would.
In fact, it's very hard to beat two EMAs (exponential moving averages) and a few crossover rules.
This is why Quantiacs and Quantopian are now trying to shift their work to using the 1,600 data points that Morningstar provides, including social data such as StockTwits post rates. (Given I am on StockTwits, I shudder to think they would use that data, with the rampant lying bears there thinking they can influence the market.)
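For reference, a two-EMA crossover system can be sketched in a few lines. This is one common formulation (smoothing factor alpha = 2 / (span + 1), seeded with the first price); the spans and exact crossover rules are left as parameters because the post does not say which ones it uses. It reports the bar where the fast EMA moves above (buy) or below (sell) the slow EMA; the first bar, where both averages equal the seed price, is not counted as a cross.

```python
def ema(prices, span):
    """Exponential moving average with alpha = 2 / (span + 1), seeded with the first price."""
    alpha = 2.0 / (span + 1)
    out = []
    for price in prices:
        out.append(price if not out else alpha * price + (1 - alpha) * out[-1])
    return out


def crossover_signals(prices, fast_span, slow_span):
    """Return [(index, 'buy' | 'sell')] where the fast EMA crosses the slow EMA."""
    fast, slow = ema(prices, fast_span), ema(prices, slow_span)
    signals, last_sign = [], 0
    for i, (f, s) in enumerate(zip(fast, slow)):
        sign = (f > s) - (f < s)  # +1 fast above, -1 fast below, 0 equal
        if sign == 0:
            continue  # equal averages: wait until one side is chosen
        if last_sign and sign != last_sign:
            signals.append((i, "buy" if sign > 0 else "sell"))
        last_sign = sign
    return signals


prices = [10, 9, 8, 9, 11, 13, 12, 10, 8, 7]
print(crossover_signals(prices, fast_span=2, slow_span=4))  # [(4, 'buy'), (7, 'sell')]
```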
If one is curious why I have this data for myself or am trying this:
I have a Titan X card and can do machine learning coding using the card,
which is why I am teaching myself Python. (Given Bronx Science, 30 years of professional programming with the last 15 in medical research, and many languages, this isn't that hard; the hardest part is grasping how they work with NumPy array data versus the more traditional ways.)
Too bad i am not diverse (and young) enough to be employed any more...
To explain:
Let's say a stock goes up on Monday. Is there a stock that will follow that price a day or two later? Again, nothing new under the sun; it's akin to finding trading pairs, asking the question: are there stocks that follow other stocks' movements at a high enough rate that one could use the action of one to get an edge on the action of another?
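The "does one stock follow another a day or two later" question can be framed as a lagged correlation: line up stock A's daily return on day t with stock B's return on day t + lag, and also count how often the two move in the same direction. A minimal sketch on plain return lists; the function names are hypothetical. With thousands of symbols there are tens of millions of pairs, so some will look strong by chance, and an out-of-sample check matters more than the in-sample number.

```python
import math


def lagged_pairs(leader, follower, lag):
    """Pair leader[t] with follower[t + lag] for every t where both exist."""
    if lag < 0 or lag >= min(len(leader), len(follower)):
        raise ValueError("lag must be in [0, series length)")
    n = min(len(leader), len(follower) - lag)
    return leader[:n], follower[lag:lag + n]


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def lead_lag_stats(leader, follower, lag):
    """Correlation and same-direction rate of leader(t) vs follower(t + lag) returns."""
    xs, ys = lagged_pairs(leader, follower, lag)
    moved = [(x, y) for x, y in zip(xs, ys) if x != 0 and y != 0]
    same = sum((x > 0) == (y > 0) for x, y in moved) / len(moved) if moved else math.nan
    return pearson(xs, ys), same


# Hypothetical daily returns: the follower repeats the leader two days later.
leader = [0.01, -0.02, 0.03, 0.0, -0.01, 0.02, 0.015]
follower = [0.005, -0.003] + leader[:-2]
print(lead_lag_stats(leader, follower, lag=2))
```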
...
I have lots of interesting ideas to try, including a kind of Tierra-based genetic algorithm version,
i.e., create critters with a large ability to make rule sets about stocks, and evolve them to survive on earnings.
I tried something like this with a genetic programming rules generator. The result would be rules like
rule 1 SPYror001BarsAhead_02 <- /fitness 1.29702 /numWins 1104 /numHits 1687 /netWins 521 /mean 0.283658 /perfMeasure 527.289 /winPct 65.4416 /hitPct 31.6214 /numTxns 5335 /numInstr 300 /crc d45583c9f85fbec2
0000: if -0.817802 < QQQpobr005
0001:   if 0.0386963 >= XARpobr005
0002:     if XLPpobr003 > -0.578102
0003:       if QQQpobr003 < -0.477997
0004:         if XESpobr005 > -0.834
0005:           if -0.795197 <= XBIpobr003
0006:             if -0.394997 >= XLEpobr002
0007:               if XLFpobr002 <= -0.259201
0008:                 return 3.19151
0009: if XLEpobr002 < 0.842499
0010:   if 0.805809 > XLFpobr001
0011:     if XLKpobr003 <= 0.178802
0012:       if 0.932503 < IWMpobr001
0013:         return 3.19151
0014: if -0.0888977 >= XLKpobr005
0015:   if XLIpobr004 <= -0.649399
0016:     if XLIpobr001 >= 0.210907
0017:       if XLVpobr003 < 0.545502
0018:         if -0.75 < MDYpobr002
0019:           if -0.528297 <= XLYpobr002
0020:             return 3.19151
0021: if 0.637993 >= XLYpobr003
0022:   if 0.839996 < MDYpobr001
0023:     if SPYpobr005 <= SPYpobr003
0024:       if MDYpobr005 < -0.0628967
0025:         if MDYpobr002 > -0.228203
0026:           return 3.19151
...
return NAN
where the rule classifies whether or not to take a trade. The trade in this example is to go long SPY at the next trading day's close and exit at the following trading day's open.
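The trade's outcome is the percentage change from one day's close to the next day's open. A minimal sketch of computing that series and marking roughly the top third of values, which the post uses as the predicted class. The function names are hypothetical, and the tercile cutoff is taken over the whole sample, which is an assumption (the post does not say how it was computed); in a real backtest a whole-sample cutoff uses future information. Shifting the series so each value lines up with the day the decision is made is left to the caller.

```python
def close_to_next_open_returns(closes, opens):
    """Percent change from each day's close to the following day's open.

    Element i is the return of buying at closes[i] and selling at opens[i + 1].
    """
    return [100.0 * (opens[i + 1] - closes[i]) / closes[i] for i in range(len(closes) - 1)]


def top_third_labels(returns):
    """True for values at or above the cutoff separating roughly the top third."""
    ranked = sorted(returns)
    cutoff = ranked[(2 * len(ranked)) // 3]
    return [r >= cutoff for r in returns]


closes = [100.0, 101.0, 102.0, 100.0]
opens = [99.0, 100.5, 103.0, 101.0]
rets = close_to_next_open_returns(closes, opens)
print(rets[0], top_third_labels(rets))  # 0.5 [False, True, False]
```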
The rule's instructions are either if statements (indentation shows nesting), a return of a numeric value which means the trade should be taken, or return NAN at the end which means the trade should not be taken.
The arguments for the if statements compare an oscillator value that ranges from -1 through 1 to a constant or to a different oscillator value. For example, "0000: if -0.817802 < QQQpobr005" means: if the oscillator value for symbol QQQ with a period of 5 trading days is greater than -0.817802, continue to the next nested instruction. Otherwise, skip to the next non-nested instruction ("0009: if XLEpobr002 < 0.842499").
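The skipping behavior described here amounts to a small interpreter. The sketch below is a hypothetical re-implementation, not the generator's own code: each instruction carries a nesting depth (typed in by hand for instructions 0000 to 0013, inferred from the description that each if nests the next until its return), an if either falls through to its nested instruction or skips everything nested under it, the first return reached is the result, and falling off the end gives NaN (no trade). Oscillator values are passed as a dict keyed by names like `QQQpobr005`.

```python
import math
import operator

COMPARE = {"<": operator.lt, "<=": operator.le, ">": operator.gt, ">=": operator.ge}


def operand(token, features):
    """A token is a numeric constant or an oscillator name such as 'QQQpobr005'."""
    try:
        return float(token)
    except ValueError:
        return features[token]


def evaluate_rule(instructions, features):
    """Run a rule given as (depth, 'if', (lhs, op, rhs)) / (depth, 'return', value) tuples.

    Returns the first return value reached, or NaN if none is reached (don't trade).
    """
    i = 0
    while i < len(instructions):
        depth, kind, arg = instructions[i]
        if kind == "return":
            return arg
        lhs, op, rhs = arg
        i += 1
        if not COMPARE[op](operand(lhs, features), operand(rhs, features)):
            # Condition failed: skip every instruction nested under this if.
            while i < len(instructions) and instructions[i][0] > depth:
                i += 1
    return math.nan


# Instructions 0000-0013 of rule SPYror001BarsAhead_02; depth = nesting level.
RULE = [
    (0, "if", ("-0.817802", "<", "QQQpobr005")),
    (1, "if", ("0.0386963", ">=", "XARpobr005")),
    (2, "if", ("XLPpobr003", ">", "-0.578102")),
    (3, "if", ("QQQpobr003", "<", "-0.477997")),
    (4, "if", ("XESpobr005", ">", "-0.834")),
    (5, "if", ("-0.795197", "<=", "XBIpobr003")),
    (6, "if", ("-0.394997", ">=", "XLEpobr002")),
    (7, "if", ("XLFpobr002", "<=", "-0.259201")),
    (8, "return", 3.19151),
    (0, "if", ("XLEpobr002", "<", "0.842499")),
    (1, "if", ("0.805809", ">", "XLFpobr001")),
    (2, "if", ("XLKpobr003", "<=", "0.178802")),
    (3, "if", ("0.932503", "<", "IWMpobr001")),
    (4, "return", 3.19151),
]

NAMES = ["QQQpobr005", "XARpobr005", "XLPpobr003", "QQQpobr003", "XESpobr005",
         "XBIpobr003", "XLEpobr002", "XLFpobr002", "XLFpobr001", "XLKpobr003", "IWMpobr001"]
flat = dict.fromkeys(NAMES, 0.0)  # hypothetical day where every oscillator reads 0
print(evaluate_rule(RULE, flat))                          # nan
print(evaluate_rule(RULE, {**flat, "IWMpobr001": 0.95}))  # 3.19151
```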
For this example, out of 5335 trading days simulating potentially entering trades from 1998-12-31 through 2020-03-12, the rule would have taken the trade 1687 times while predicting the correct class 1104 times. The class predicted is the top third of the values of percentage changes from the next day's close to the following day's open price. This rule resulted in a simulated mean gain of 0.28%. The mean for all 5335 trading days is 0.03%, so the rule shows significantly better results.
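The rule's header fields can be cross-checked against this paragraph. Assuming winPct = numWins / numHits, hitPct = numHits / numTxns, and netWins = wins minus losing trades (definitions inferred from the posted numbers, which match for both rules in this thread, not taken from the generator's documentation), the figures reproduce to the posted precision. fitness, mean, and perfMeasure are not defined in the post, so they are left out.

```python
def rule_stats(num_wins, num_hits, num_txns):
    """Recompute header fields from raw counts (definitions inferred from the posted numbers)."""
    losses = num_hits - num_wins  # trades taken that were not in the predicted class
    return {
        "winPct": 100.0 * num_wins / num_hits,  # share of taken trades that were right
        "hitPct": 100.0 * num_hits / num_txns,  # share of days the rule fired
        "netWins": num_wins - losses,
    }


print(rule_stats(1104, 1687, 5335))  # SPYror001BarsAhead_02: about 65.4416, 31.6214, 521
print(rule_stats(1036, 1578, 5337))  # SPYror000BarsAhead_02: about 65.6527, 29.5672, 494
```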
Is this similar to your ideas?
rule 1 SPYror000BarsAhead_02 <- /fitness 1.20881 /numWins 1036 /numHits 1578 /netWins 494 /mean 0.426047 /perfMeasure 506.344 /winPct 65.6527 /hitPct 29.5672 /numTxns 5337 /numInstr 300 /crc 77825b3ef8a99c2d
0000: if 0.531403 < XSDpobr004
0001: if 0.318192 <= XMEpobr003
0002: if 0.792603 <= XOPpobr003
0003: if XLEpobr005 >= -0.103798
0004: if XESpobr005 <= XESpobr003
0005: if QQQpobr002 > 0.303307
0006: return 4.38555
0007: if XLKpobr003 > -0.897697
0008: if SPYpobr003 < -0.631897
0009: if -0.569504 >= XLEpobr005
0010: if -0.222298 >= XLKpobr001
0011: if -0.800003 <= XLPpobr004
0012: if -0.801102 <= XSDpobr003
0013: return 4.38555
...
0296: if XLIpobr002 <= 0.3479
0297: if XRTpobr001 > 0.840401
0298: if XSDpobr002 <= -0.181396
0299: return 4.38555
return NAN