I tried to investigate the relationship between the manufacturing durable goods new orders indicator and the spread of cyclical and defensive stocks (consumer discretionary XLY and consumer staples XLP). Following is the Python code and textual comments.
Durable goods orders monthly data is one of the leading economic indicators. In theory, we can use it to predict the future, such as the behavior of the cyclical and defensive stocks spread. See the details here. I have downloaded the indicator data from the FRED here.
First, let's create the spread.
Now let's read the indicator data from the csv file.
Now we create the explanatory variables. I assume that in the first days of the current month, you can find out the data for the last month. This data will have an index of the first day of the previous month. For example, on the first day of December, they publish the November data. This data is indexed by November 1.
Let's say the number of durable goods orders dropped from October to November. You can find it out no earlier than December 1st, or even a few days later. It would be a data analysis mistake to assume that this information became known earlier. That is why I use newOrders['DGORDER'].shift(1) in the code.
Now let's create several resulting variables, i.e. the returns of the spread in the coming 1, 2, and 3 months.
We have the spread data in a daily format. The resulting variables should be in a monthly format, just like our explanatory variables. So we have to use the Pandas resample function.
Now let's create the full DataFrame with all the explanatory variables and results.
First of all, I tried the linear regression.
The results were as follows.
You can try all combinations of explanatory and result variables yourself and make sure none of them work. A different approach is needed. I tried to investigate situations where the indicator falls for 2, 3, or 4 months in a row.
The results were as follows.
After the indicator fell for two consecutive months, cyclical stocks are performing worse than defensive stocks. It is consistent with the hypothesis.
Then I used the t-test for the means of two independent samples to determine if this result was statistically significant.
The full code follows.
The results.
When preparing scientific articles, we want the p-value to be below 0.05. However, for practical applications, higher values are also appropriate. A drop in the indicator for two months in a row looks like a reliable signal to run away from cyclical stocks.
At the same time, when the indicator falls for four consecutive months, stocks behave strangely. Why? Because the Fed immediately floods the system with liquidity? Or is the sample size just too small?
I have also investigated the situations where the indicator has been growing for several months in a row, but I found nothing interesting except this.
Questions:
Durable goods orders monthly data is one of the leading economic indicators. In theory, we can use it to predict the future, such as the behavior of the cyclical and defensive stocks spread. See the details here. I have downloaded the indicator data from the FRED here.
First, let's create the spread.
Code:
import pandas as pd
import pandas_datareader.data as web
import numpy as np
from IPython.core.debugger import set_trace
import plotly.graph_objects as go
# from scipy import stats
import plotly.io as pio
symbol1 = 'XLY' # consumer discretionary
symbol2 = 'XLP' # consumer staples
stock1 = web.DataReader(symbol1, 'yahoo', '1-1-1993')
stock2 = web.DataReader(symbol2, 'yahoo', '1-1-1993')
s2 = pd.DataFrame(columns = stock1.columns)
# set_trace()
# s2['Open'] = stock1['Open']/stock2['Open']
# s2['High'] = stock1['High']/stock2['High']
# s2['Low'] = stock1['Low']/stock2['Low']
s2['Close'] = stock1['Close']/stock2['Close']
s2['Adj Close'] = stock1['Adj Close']/stock2['Adj Close']
s2 = s2.drop('Volume', axis=1)
s2 = s2.dropna()
Now let's read the indicator data from the csv file.
Code:
newOrders = pd.read_csv("DGORDER.csv", index_col='DATE', parse_dates=True)
Now we create the explanatory variables. I assume that in the first days of the current month, you can find out the data for the last month. This data will have an index of the first day of the previous month. For example, on the first day of December, they publish the November data. This data is indexed by November 1.
Let's say the number of durable goods orders dropped from October to November. You can find it out no earlier than December 1st, or even a few days later. It would be a data analysis mistake to assume that this information became known earlier. That is why I use newOrders['DGORDER'].shift(1) in the code.
Code:
newOrders['Trend1months'] = (newOrders['DGORDER'].shift(1) - newOrders['DGORDER'].shift(2))/newOrders['DGORDER'].shift(2)
newOrders['Trend2months'] = (newOrders['DGORDER'].shift(1) - newOrders['DGORDER'].shift(3))/newOrders['DGORDER'].shift(3)
newOrders['Trend3months'] = (newOrders['DGORDER'].shift(1) - newOrders['DGORDER'].shift(4))/newOrders['DGORDER'].shift(4)
Now let's create several resulting variables, i.e. the returns of the spread in the coming 1, 2, and 3 months.
We have the spread data in a daily format. The resulting variables should be in a monthly format, just like our explanatory variables. So we have to use the Pandas resample function.
Code:
spreadReturn1Month = (s2['Adj Close'].resample('BMS').first().shift(-1) - s2['Adj Close'].resample('BMS').first())/s2['Adj Close'].resample('BMS').first()
spreadReturn2Month = (s2['Adj Close'].resample('BMS').first().shift(-2) - s2['Adj Close'].resample('BMS').first())/s2['Adj Close'].resample('BMS').first()
spreadReturn3Month = (s2['Adj Close'].resample('BMS').first().shift(-3) - s2['Adj Close'].resample('BMS').first())/s2['Adj Close'].resample('BMS').first()
Now let's create the full DataFrame with all the explanatory variables and results.
Code:
full = pd.concat([newOrders['Trend1months'], newOrders['Trend2months'], newOrders['Trend3months']], axis=1)
spreads = pd.concat([spreadReturn1Month, spreadReturn2Month, spreadReturn3Month], axis=1)
spreads.columns = ['R1M', 'R2M', 'R3M']
spreads.index = spreads.index.map(lambda t: t.replace(day=1)) # essential operation
full = pd.concat([full, spreads], axis=1)
full.dropna(inplace=True)
First of all, I tried the linear regression.
Code:
import numpy as np
import statsmodels.api as sm
x = full['Trend3months']
y = full['R3M']
x, y = np.array(x), np.array(y)
x = sm.add_constant(x) # essential operation to build the design matrix before fitting the model
model = sm.OLS(y, x)
results = model.fit()
print(results.summary())
The results were as follows.
R-squared: 0.002
Adj. R-squared: -0.002
t-statistics of the explanatory variable: -0.776
p-value: 0.439
You can try all combinations of explanatory and result variables yourself and make sure none of them work. A different approach is needed. I tried to investigate situations where the indicator falls for 2, 3, or 4 months in a row.
Code:
full['Fall2Months'] = (full['Trend1months'] < 0) & (full['Trend1months'].shift(1) < 0)
full['Fall3Months'] = (full['Trend1months'] < 0) & (full['Trend1months'].shift(1) < 0) & (full['Trend1months'].shift(2) < 0)
full['Fall4Months'] = (full['Trend1months'] < 0) & (full['Trend1months'].shift(1) < 0) & (full['Trend1months'].shift(2) < 0) & (full['Trend1months'].shift(3) < 0)
print('Fall 2 Months, Return 1 month')
print('True', full[full['Fall2Months'] == True]['R1M'].mean())
print('False', full[full['Fall2Months'] == False]['R1M'].mean())
print()
print('Fall 2 Months, Return 2 month')
print('True', full[full['Fall2Months'] == True]['R2M'].mean())
print('False', full[full['Fall2Months'] == False]['R2M'].mean())
print()
print('Fall 2 Months, Return 3 month')
print('True', full[full['Fall2Months'] == True]['R3M'].mean())
print('False', full[full['Fall2Months'] == False]['R3M'].mean())
The results were as follows.
Fall 2 Months, Return 1 month
True -0.0024623192321217378
False 0.0044396274379823915
Fall 2 Months, Return 2 month
True -0.006255083071738883
False 0.008920762498261766
Fall 2 Months, Return 3 month
True -0.009361925798950713
False 0.013035536247534455
After the indicator fell for two consecutive months, cyclical stocks are performing worse than defensive stocks. It is consistent with the hypothesis.
Then I used the t-test for the means of two independent samples to determine if this result was statistically significant.
The full code follows.
Code:
from scipy import stats
print('Fall 2 Months, Return 1 month')
print('True', full[full['Fall2Months'] == True]['R1M'].mean())
print('False', full[full['Fall2Months'] == False]['R1M'].mean())
t,p = stats.ttest_ind(full[full['Fall2Months'] == True]['R1M'], full[full['Fall2Months'] == False]['R1M'], equal_var = False)
print('p-value', p)
print()
print('Fall 2 Months, Return 2 month')
print('True', full[full['Fall2Months'] == True]['R2M'].mean())
print('False', full[full['Fall2Months'] == False]['R2M'].mean())
t,p = stats.ttest_ind(full[full['Fall2Months'] == True]['R2M'], full[full['Fall2Months'] == False]['R2M'], equal_var = False)
print('p-value', p)
print()
print('Fall 2 Months, Return 3 month')
print('True', full[full['Fall2Months'] == True]['R3M'].mean())
print('False', full[full['Fall2Months'] == False]['R3M'].mean())
t,p = stats.ttest_ind(full[full['Fall2Months'] == True]['R3M'], full[full['Fall2Months'] == False]['R3M'], equal_var = False)
print('p-value', p)
print('----------------------')
print('Fall 3 Months, Return 1 month')
print('True', full[full['Fall3Months'] == True]['R1M'].mean())
print('False', full[full['Fall3Months'] == False]['R1M'].mean())
t,p = stats.ttest_ind(full[full['Fall3Months'] == True]['R1M'], full[full['Fall3Months'] == False]['R1M'], equal_var = False)
print('p-value', p)
print()
print('Fall 3 Months, Return 2 month')
print('True', full[full['Fall3Months'] == True]['R2M'].mean())
print('False', full[full['Fall3Months'] == False]['R2M'].mean())
t,p = stats.ttest_ind(full[full['Fall3Months'] == True]['R2M'], full[full['Fall3Months'] == False]['R2M'], equal_var = False)
print('p-value', p)
print()
print('Fall 3 Months, Return 3 month')
print('True', full[full['Fall3Months'] == True]['R3M'].mean())
print('False', full[full['Fall3Months'] == False]['R3M'].mean())
t,p = stats.ttest_ind(full[full['Fall3Months'] == True]['R3M'], full[full['Fall3Months'] == False]['R3M'], equal_var = False)
print('p-value', p)
print('----------------------')
print('Fall 4 Months, Return 1 month')
print('True', full[full['Fall4Months'] == True]['R1M'].mean())
print('False', full[full['Fall4Months'] == False]['R1M'].mean())
t,p = stats.ttest_ind(full[full['Fall4Months'] == True]['R1M'], full[full['Fall4Months'] == False]['R1M'], equal_var = False)
print('p-value', p)
print()
print('Fall 4 Months, Return 2 month')
print('True', full[full['Fall4Months'] == True]['R2M'].mean())
print('False', full[full['Fall4Months'] == False]['R2M'].mean())
t,p = stats.ttest_ind(full[full['Fall4Months'] == True]['R2M'], full[full['Fall4Months'] == False]['R2M'], equal_var = False)
print('p-value', p)
print()
print('Fall 4 Months, Return 3 month')
print('True', full[full['Fall4Months'] == True]['R3M'].mean())
print('False', full[full['Fall4Months'] == False]['R3M'].mean())
t,p = stats.ttest_ind(full[full['Fall4Months'] == True]['R3M'], full[full['Fall4Months'] == False]['R3M'], equal_var = False)
print('p-value', p)
print('----------------------')
The results.
Fall 2 Months, Return 1 month
True -0.0024623192321217378
False 0.0044396274379823915
p-value 0.4574989679128597
Fall 2 Months, Return 2 month
True -0.006255083071738883
False 0.008920762498261766
p-value 0.2441344922396886
Fall 2 Months, Return 3 month
True -0.009361925798950713
False 0.013035536247534455
p-value 0.11950403760469207
----------------------
Fall 3 Months, Return 1 month
True -0.0025654197365067386
False 0.0037917519676142907
p-value 0.6940054770617398
Fall 3 Months, Return 2 month
True -0.0034694883984567166
False 0.007259726354037066
p-value 0.6386707707432691
Fall 3 Months, Return 3 month
True -0.004889260790210732
False 0.010555692188711241
p-value 0.5315743880415918
----------------------
Fall 4 Months, Return 1 month
True 0.0012916592321980608
False 0.0034186109984594044
p-value 0.9355384918743199
Fall 4 Months, Return 2 month
True 0.008044100644084017
False 0.006409793564085472
p-value 0.963977859829536
Fall 4 Months, Return 3 month
True 0.03200541474789461
False 0.00843808796970911
p-value 0.47567023526019114
----------------------
When preparing scientific articles, we want the p-value to be below 0.05. However, for practical applications, higher values are also appropriate. A drop in the indicator for two months in a row looks like a reliable signal to run away from cyclical stocks.
At the same time, when the indicator falls for four consecutive months, stocks behave strangely. Why? Because the Fed immediately floods the system with liquidity? Or is the sample size just too small?
I have also investigated the situations where the indicator has been growing for several months in a row, but I found nothing interesting except this.
Grow 3 Months, Return 1 month
True 0.01123047912735774
False 0.002778454327885034
p-value 0.3086477165213826
Questions:
- Am I wrong somewhere in the analysis?
- Was the use of the t-test justified?
- Is it worth using the obtained result for practical trading?