ETF Analysis Using Probabilistic Programming

Alex Shpurov
Mar 4, 2021

https://www.etf-analytics.net/

The main idea behind this article, which is supported by historical experiments and studies, is a question: if an asset is expensive or cheap, what is the chance that its price will move in the opposite direction, and will that move be larger when the asset is too expensive or too cheap?

Consider someone who wants to buy an exchange-traded fund (ETF). ETFs are a type of investment fund that can be bought and sold like regular stocks, and an ETF can specialize in one sector, such as tech or healthcare. The experiment checks whether the price of the stock or ETF is higher or lower than normal. When the price of an ETF is low, negative factors are weighing on it. Once those factors are eliminated, the price goes back to normal. In other words, if the price of an asset is too high or too low, it is expected to return to normal given enough time.

Take, for example, the SPY ETF, which tracks the Standard & Poor's 500 (S&P 500), an index of 500 of the largest companies in the United States. If we know its price history, we can find how fast it grows per year and which events made the price increase or decrease.
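As a rough illustration of "how fast it grows per year", the sketch below computes a compound annual growth rate from two prices. It is plain JavaScript (Node.js), not WebPPL. The function name `annualGrowthRate` and the prices are hypothetical, not data from this article.

```javascript
// Compound annual growth rate: the constant yearly rate that turns
// startPrice into endPrice over the given number of years.
function annualGrowthRate(startPrice, endPrice, years) {
  return Math.pow(endPrice / startPrice, 1 / years) - 1;
}

// Hypothetical example: a price that goes from 100 to 121 in 2 years.
const growth = annualGrowthRate(100, 121, 2);
console.log((growth * 100).toFixed(2) + '% per year'); // prints "10.00% per year"
```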

Standard and Poor 500
S&P 500 recent price (SPY ETF)

Below are the daily price changes of the S&P 500. The x-axis is the daily return and the y-axis is how often each return occurs.

S&P 500 Daily returns

Same chart as above, but as a histogram.

Most daily returns are close to 0%, and the bulk of them fall between -2% and 2%.

In this chart, the x-axis is the daily return and the y-axis is how often that return has occurred historically.

For normally distributed data, about 68% of values lie within one standard deviation of the mean (the two darkest sectors below, between -1 and 1). About 95% of values lie within two standard deviations (between -2 and 2).
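A minimal sketch of how daily returns and their histogram could be computed is shown below, in plain JavaScript (Node.js) rather than WebPPL. The closing prices are made up for illustration, and the helper names `dailyReturnsPct` and `histogram` are assumptions, not part of any library.

```javascript
// Hypothetical closing prices; real SPY data would come from a market data source.
const closes = [100, 100.5, 100, 101.5, 101];

// Daily return in percent: change relative to the previous close.
function dailyReturnsPct(prices) {
  const out = [];
  for (let i = 1; i < prices.length; i++) {
    out.push(((prices[i] - prices[i - 1]) / prices[i - 1]) * 100);
  }
  return out;
}

// Count how many returns fall into each bin of width binWidth (in percent).
// Each bin is keyed by its lower edge.
function histogram(returns, binWidth) {
  const counts = new Map();
  for (const r of returns) {
    const bin = Math.floor(r / binWidth) * binWidth;
    counts.set(bin, (counts.get(bin) || 0) + 1);
  }
  return counts;
}

const returns = dailyReturnsPct(closes);
const hist = histogram(returns, 1);
console.log(returns);
console.log(hist);
```

With a long price history, the same counts plotted against the bin edges give the histogram described above.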

https://en.wikipedia.org/wiki/Standard_deviation

Based on the graph above, the range from -2% to 2% spans roughly two standard deviations on either side of the mean, which covers about 95% of normally distributed data. So when we see a return of 1%, we know it is a normal return. If the return is 3%, we know it is not normal, because normal returns lie between -2% and 2%.
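To make the band arithmetic concrete, here is a small sketch in plain JavaScript (Node.js). It uses the mean (0.13%) and standard deviation (1.14%) of daily returns quoted later in this article. The two-standard-deviation cutoff in `isNormal` is an assumption chosen for illustration, and the function names are hypothetical.

```javascript
// Mean and standard deviation of daily returns in percent, as quoted in the article.
const meanPct = 0.13;
const sdPct = 1.14;

// Interval covering the mean plus or minus k standard deviations.
function band(k) {
  return { low: meanPct - k * sdPct, high: meanPct + k * sdPct };
}

// Number of standard deviations a return lies from the mean.
function sigmasFromMean(returnPct) {
  return (returnPct - meanPct) / sdPct;
}

// Assumed rule for illustration: "normal" means within two standard deviations.
function isNormal(returnPct) {
  return Math.abs(sigmasFromMean(returnPct)) <= 2;
}

console.log(band(1)); // roughly -1.01 to 1.27
console.log(band(2)); // roughly -2.15 to 2.41
console.log(isNormal(1), isNormal(3)); // true false
```

Under these numbers a 1% return sits well inside the one-standard-deviation band, while a 3% return is about 2.5 standard deviations from the mean.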

S&P 500 returns standard deviation bands

Analyzing the return range gives us a good idea of what is happening. Returns below -2% or above 2% are quite rare, but trading needs larger returns and more opportunities. This is where probabilistic programming shows its effectiveness. Probabilistic programming is a relatively new and promising field of artificial intelligence (AI).

Probabilistic programming

Probabilistic programming lets us build systems that help investors make decisions under uncertainty.

Probabilistic programming is a new programming paradigm with multiple implementations in different languages. The one we chose is WebPPL, a JavaScript implementation.

Probabilistic programs have hidden variables and observations. For example, we assume a coin toss has a 50% chance of landing heads and a 50% chance of landing tails. After 10 coin flips, we record the following results: [H,T,H,T,H,H,T,H,T,H]. We want to calculate the hidden variable (in this case, the coin's bias) from the following information:

  1. Our initial, or prior, belief that the coin lands heads 50% of the time (prior probability)
  2. The outcome we have seen (observed) after 10 flips: [H,T,H,T,H,H,T,H,T,H]
  3. We want to update our initial assumption (1) in light of what we observed (2), which gives our adjusted belief (posterior probability)

The WebPPL program will look like the following:

// what we observed
var observedData = ['h', 't', 'h', 't', 'h', 'h', 't', 'h', 't', 'h']
var weightPosterior = Infer({method: 'rejection', samples: 200}, function() {
  // prior: sample a floating point random value from the interval [0..1]
  var coinWeight = sample(Uniform({a: 0, b: 1}))
  // a coin that lands heads with probability 'coinWeight'
  var coin = Bernoulli({p: coinWeight})
  // condition the model on each observed flip
  var obsFn = function(datum) { observe(coin, datum == 'h') }
  mapData({data: observedData}, obsFn)
  return coinWeight
})
// show the posterior distribution of the hidden coin weight
viz(weightPosterior)

The result is a bell-shaped posterior distribution for the coin weight:

calculate biased coin distribution

Based on the graph above, the posterior peaks near 0.6 rather than 0.5. This suggests the coin may be biased: the chance of heads looks closer to 60% than to 50%, although ten flips still leave a lot of uncertainty.
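The coin example is simple enough to check without a sampler. The sketch below, in plain JavaScript (Node.js) rather than WebPPL, uses a grid approximation: it evaluates the uniform prior times the Bernoulli likelihood on evenly spaced candidate weights and normalizes. The grid size and variable names are assumptions made for illustration.

```javascript
const observed = ['h', 't', 'h', 't', 'h', 'h', 't', 'h', 't', 'h'];
const heads = observed.filter((x) => x === 'h').length; // 6
const tails = observed.length - heads;                  // 4

// Candidate coin weights 0, 0.001, ..., 1 (grid size is an arbitrary choice).
const steps = 1000;
const grid = [];
for (let i = 0; i <= steps; i++) grid.push(i / steps);

// Unnormalized posterior: uniform prior (constant) times the likelihood
// p^heads * (1 - p)^tails, then normalized so the weights sum to 1.
const unnorm = grid.map((p) => Math.pow(p, heads) * Math.pow(1 - p, tails));
const total = unnorm.reduce((a, b) => a + b, 0);
const posterior = unnorm.map((w) => w / total);

// Summary statistics of the approximate posterior.
const posteriorMean = grid.reduce((acc, p, i) => acc + p * posterior[i], 0);
const posteriorMode = grid[posterior.indexOf(Math.max(...posterior))];
const probAboveHalf = grid.reduce(
  (acc, p, i) => acc + (p > 0.5 ? posterior[i] : 0), 0);

console.log({ posteriorMean, posteriorMode, probAboveHalf });
```

On this grid the mode lands at 0.6 and the mean near 0.58, with a bit over 70% of the posterior mass above 0.5. That is consistent with a coin that leans toward heads, but it is far from certain.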

ETF Programming

We know that the average daily return of the S&P 500 is 0.13% and its standard deviation is 1.14%. Below is a WebPPL program that updates this estimate using the last five daily returns:

// last 5 daily returns, %
var last5DailyReturnsPct = [0.493978, -0.086600, 0.022948, -0.425600, -0.176598]
var model = function() {
  // prior: sample the mean daily return from a Gaussian
  var x = gaussian({mu: 0.13, sigma: 1.14})
  // condition on the actual observations
  map(function(obs) {
    factor(Gaussian({mu: x, sigma: 1.14}).score(obs))
  }, last5DailyReturnsPct)
  return x
}
// run Bayesian inference
Infer({method: 'MCMC', samples: 100000}, model)

The result is an "adjusted" normal distribution:

S&P 500 returns, posterior distribution

The estimated return is now about -0.1%, down from our initial guess of 0.13%.
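Because both the prior and the likelihood in this model are Gaussian with a known standard deviation, the posterior can also be written in closed form, which makes a handy cross-check on the sampler. The sketch below is plain JavaScript (Node.js), not WebPPL. The function name `gaussianPosterior` is hypothetical, and the formula is the standard conjugate normal update rather than anything specific to this article's tooling.

```javascript
// Prior and noise parameters (percent), and the five observed daily returns.
const priorMean = 0.13;
const priorSd = 1.14;
const noiseSd = 1.14;
const last5 = [0.493978, -0.086600, 0.022948, -0.425600, -0.176598];

// Conjugate update for a Gaussian mean with known noise standard deviation:
// precisions (1 / variance) add, and the posterior mean is the
// precision-weighted average of the prior mean and the data.
function gaussianPosterior(mu0, sd0, sdNoise, data) {
  const priorPrecision = 1 / (sd0 * sd0);
  const noisePrecision = 1 / (sdNoise * sdNoise);
  const sum = data.reduce((a, b) => a + b, 0);
  const postVar = 1 / (priorPrecision + data.length * noisePrecision);
  const postMean = postVar * (mu0 * priorPrecision + sum * noisePrecision);
  return { mean: postMean, sd: Math.sqrt(postVar) };
}

const post = gaussianPosterior(priorMean, priorSd, noiseSd, last5);
console.log(post.mean.toFixed(4), post.sd.toFixed(4));
```

With these inputs the closed form gives a posterior mean just below zero and a posterior standard deviation of about 0.47%, so a value read off a sampled histogram can easily sit a tenth of a percent away from it.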

The example we have shown is quite simple with just one condition, but it gives you an idea of how a probabilistic program works. Probabilistic programs usually have many stochastic rules and they could be as complex as regular programs.

We have built a website where you can browse ETFs as well as Forex and crypto markets and pick the assets you are interested in: https://www.etf-analytics.net/.

For each asset, the program produces a signal on a z-score scale. In most normal situations the signal stays between -1 and 1; in extreme situations it can reach 4 to 6 or even higher. A positive score means the asset is too expensive, and a negative score means it is cheap.

The asset-cash ratio is also calculated. Here it means the share of the portfolio held in the asset rather than in cash (not to be confused with the accounting cash asset ratio, which is the current value of marketable assets and cash divided by current liabilities). When the market is normal and the signal is 0, the ratio is 100% (100% asset, 0% cash). If the signal is +1, which means the market is expensive, the ratio could be about 90% (90% asset, 10% cash). If the market is cheap and the signal is -1, the ratio is 110% (110% asset, with the extra 10% bought with borrowed cash).
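The article does not describe how the site builds its signal, so the following is only a generic sketch of what "a z-score scale" means: how many standard deviations a value lies from the mean of its history. It is plain JavaScript (Node.js). The history values and helper names are hypothetical.

```javascript
// Arithmetic mean of an array of numbers.
function mean(xs) {
  return xs.reduce((a, b) => a + b, 0) / xs.length;
}

// Sample standard deviation (divides by n - 1).
function sampleSd(xs) {
  const m = mean(xs);
  const sumSq = xs.reduce((acc, x) => acc + (x - m) * (x - m), 0);
  return Math.sqrt(sumSq / (xs.length - 1));
}

// z-score: distance of `value` from the historical mean, in standard deviations.
function zScore(value, history) {
  return (value - mean(history)) / sampleSd(history);
}

// Hypothetical history of some indicator and two new readings.
const history = [98, 100, 102, 100, 100];
console.log(zScore(100, history)); // 0: exactly at the historical mean
console.log(zScore(104, history).toFixed(2)); // about 2.83: unusually high
```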
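The three examples above (signal 0, +1 and -1) are consistent with a simple linear rule, sketched below in plain JavaScript (Node.js). The linear form and the 10-points-per-unit step are assumptions fitted to those three examples. The article does not state the actual formula, and the function names are hypothetical.

```javascript
// Assumed linear rule: each unit of signal shifts 10 percentage points
// between the asset and cash. Signal 0 gives 100% in the asset.
function assetSharePct(signal, stepPct = 10) {
  return 100 - stepPct * signal;
}

// Split the portfolio into asset and cash. A negative cash share
// means that amount of cash is borrowed to buy more of the asset.
function allocation(signal) {
  const assetPct = assetSharePct(signal);
  return { assetPct: assetPct, cashPct: 100 - assetPct };
}

console.log(allocation(0));  // { assetPct: 100, cashPct: 0 }
console.log(allocation(1));  // { assetPct: 90, cashPct: 10 }
console.log(allocation(-1)); // { assetPct: 110, cashPct: -10 }
```

A real rule would probably also cap the allocation for extreme signals. That choice is left open here because the source gives no guidance on it.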

Results

The chart below shows the signal for the S&P 500. The signal is shown by the smaller bars at the bottom of the graph.

https://www.etf-analytics.net/dashboard?t=SPY&type=etf

S&P 500 Signals and actively managed value

The tables below show the recent maximum and minimum signals for the S&P 500. They also compare the price at each maximum and minimum within a 6-month window.

A probabilistic program gives you a mathematical way to encode your beliefs about the dynamics you are trying to model. It is effective when you understand the domain and the reasoning behind it. The knowledge comes mostly from historical data, but the models should also include other factors, such as past and present events and the current market situation. This is the main difference from machine learning, which relies mostly on historical data.

Machine learning (ML) is curve fitting and does not know much about the business domain. ML requires a significant amount of data to give the best results. More advanced ML, such as deep neural networks, requires even more training data, which may be hard to obtain. Daily prices, for example, simply do not provide enough data points for ML, but they are sufficient for probabilistic programming.

— —

Trading, including trading ETFs, involves numerous risks, including, among others, market risk, counterparty default risk, and liquidity risk.
All data presented on the page is analytics based on previous history only and is not intended to be used as trading advice.
