White Paper – DQT Crypto Factor Models

1. A Brief Theory of Factor Models – From Nobel to Wall Street

Under the canonical geometric Brownian motion model, a coin’s return \(\tilde{r}_t\) satisfies:
\[ \tilde{r}_t = \frac{dp_t}{p_t} = \alpha_t dt + \sigma_t dB_t. \]
Sticking to the daily time scale, it can be discretized to:
\[ \tilde{r}_d = \alpha_d + \sigma_d z_d = \alpha_d + r_d = \text{trend + de-trended return}, \]
where \(\alpha_d, \sigma_d\) are the daily alpha and vol, and \(z_d\) is a standard normal variable. \(r_d = \sigma_d z_d\) is the de-meaned or de-trended daily return of the coin. Below, whenever we use the symbol \(r\), it always refers to \(r_d\).

Every coin has its own \(r\) and \(z\). Across the crypto universe, they are certainly not independent. A factor model of a given market assumes the existence of a set of common or systematic factors:
\[ f_1, f_2, \cdots, f_K,\]
so that every coin’s return is driven by them:
\[ r(t) = \beta_1 \cdot f_1(t) + \beta_2 \cdot f_2(t) + \cdots + \beta_K \cdot f_K(t) + \delta(t),\]
except for the residual \(\delta(t)\). The residual is idiosyncratic and cannot be explained by the common factors. Its volatility is hence called the “idiosyncratic” or “specific” risk of the coin:
\[ \text{specific risk} = \sigma_s = \mathrm{vol}(\delta(t)) = \sqrt{ \mathrm{var}(\delta(t)) }.\]
The betas are called factor loadings, and reveal how a coin dances with the general rhythms of its ambient market.
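As a minimal sketch of the decomposition above, the following Python snippet estimates the loadings \(\beta_k\) and the specific risk \(\sigma_s\) of a single coin by ordinary least squares. It assumes the factor returns are already known; the function name `estimate_loadings` and all synthetic numbers are hypothetical and serve only as illustration.

```python
import numpy as np

def estimate_loadings(r, F):
    """Least-squares estimate of factor loadings and specific risk for one coin.

    r : (T,) array of de-trended daily returns r(t)
    F : (T, K) array whose columns are the factor returns f_1(t), ..., f_K(t)
    """
    beta, *_ = np.linalg.lstsq(F, r, rcond=None)  # minimizes ||r - F @ beta||
    delta = r - F @ beta                          # residual delta(t)
    spec_vol = delta.std(ddof=1)                  # sigma_s = vol(delta)
    return beta, spec_vol

# Synthetic data (all numbers are made up for illustration).
rng = np.random.default_rng(0)
T, K = 5000, 3
F = 0.02 * rng.standard_normal((T, K))
beta_true = np.array([1.0, -0.5, 0.3])
r = F @ beta_true + 0.01 * rng.standard_normal(T)

beta_hat, spec_vol = estimate_loadings(r, F)
print(beta_hat, spec_vol)
```

With enough observations, the estimated loadings should land close to `beta_true` and the estimated specific vol close to the simulated residual vol of 0.01.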

Factor models trace back to the celebrated 3-factor and 5-factor models of Nobel Laureate Eugene Fama et al. Mainstream TradFi factor-model vendors (e.g., Axioma or MSCI-Barra) have further engineered them in great detail so that they are actually applicable to modern portfolio management. It is like going from the general theory of combustion to the actual delivery of a Mercedes-Benz to your garage! So is our own effort for cryptos!

For more about the basic theory of factor models, we refer to the following two external sources:

“Linear Factor Models,” a friendly online page by Nobel Laureate Prof. William F. Sharpe at Stanford.
“Factor Models,” MIT OpenCourseWare, Math 18.S096, by Dr. Peter Kempthorne.

2. Why PCA Factor Models for Cryptos

For any given market, there are two types of factor models: Fundamental vs. PCA/Statistical.

2.1 Fundamental Factor Models

The fundamental factors are observable economic or financial metrics. For example, the 3 factors in Fama-French are the excess market return over the risk-free rate, the excess return of small caps over large caps, and the excess return of value stocks over growth stocks.

A modern vendor model may include even more fundamental factors to capture macroeconomic, microeconomic, and company-specific financial metrics, e.g., exchange rates, 10-year interest rates, market capitalizations, liquidity scores, momentum, or debts, asset values, equity values, and their ratios.

The fundamental factors are typically correlated. For example, interest rates and liquidity scores could be negatively correlated, since high interest rates imply high borrowing or capital-raising costs. Among all factors for equities, the sector and industry-group factors are always crucial since companies are heavily influenced by sector trends. For example, all AI-related stocks are currently gaining unprecedented momentum due to the genuine dawning of the AI era.

2.2 PCA Factor Models

Fundamental factors are always derived from directly observable metrics, and hence also called “explicit.”

For a given market, exhausting the set of explicit fundamental factors is not as easy as it sounds. Going from the 3 factors of Fama-French to 15 or 30 in typical vendor models attests to this challenge. Too few factors leak systematic risk into the residuals, while too many factors result in overfitting.

Thus arises the alternative – PCA-based statistical factor models, where factors are “implicit.” These implicit factors are not pre-defined via economic or financial metrics. Instead, the observed market return data govern what they should be. Hence, this is a purely data-driven approach.

Principal Component Analysis (PCA) efficiently captures the low-dimensional subspaces around which massive data tend to cluster. It has solid theoretical foundations rooted in both linear algebra and statistics, and also enjoys well-established and efficient algorithms. For now and even far into the future of AI, DQT predicts its lasting power in the arena of big data analytics.

For factor models, each principal component supplies a factor, with its singular value revealing its significance for gauging the given market data (of coin returns). When components with lower significance are weeded out, PCA naturally results in a factor model. The retained factors are implicit but possess substantial explanatory power.
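The truncation described above can be sketched with a plain singular value decomposition. This is a generic, unweighted PCA under assumed conventions (factors normalized to be orthonormal over the sample, loadings carrying the singular values); it is not DQT's production scheme, and the function name `pca_factors` and the synthetic data are hypothetical.

```python
import numpy as np

def pca_factors(R, n_factors):
    """Truncated PCA of a (T, N) matrix of de-meaned returns.

    Returns
      factors   : (T, K) orthonormal factor time series
      loadings  : (N, K) loadings, so that R is approximately factors @ loadings.T
      explained : (K,) share of total variance captured by each retained factor
    """
    U, s, Vt = np.linalg.svd(R, full_matrices=False)
    factors = U[:, :n_factors]
    loadings = Vt[:n_factors].T * s[:n_factors]
    explained = s[:n_factors] ** 2 / np.sum(s ** 2)
    return factors, loadings, explained

# Synthetic universe: 20 coins driven by 2 hidden factors plus small noise.
rng = np.random.default_rng(1)
T, N = 200, 20
hidden = rng.standard_normal((T, 2))
B = rng.standard_normal((N, 2))
R = hidden @ B.T + 0.01 * rng.standard_normal((T, N))
R -= R.mean(axis=0)                       # de-mean each coin's return series

factors, loadings, explained = pca_factors(R, n_factors=2)
print(explained, explained.sum())
```

In this toy setup the two retained components should capture nearly all of the variance, since the data were generated from two hidden factors; on real coin returns the decay of the singular values is far more gradual.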

These data-driven factors do not lend themselves to economic or financial interpretation as easily as fundamental factors do. This view is valid but could also be merely a common bias against PCA. Sometimes PCA could reveal more if one zooms in. For example, fundamental factors rarely capture hidden temporal rhythms.

2.3 PCA for Cryptos

The crypto landscape is still rapidly evolving. Certain technical factors such as market size or momentum can be established in the same way as for the equities market, but other fundamental factors are much more difficult to construct.

The equity market enjoys balance-sheet transparency due to the quarterly filing mandates imposed on public firms by the SEC. Due to a lack of industry standards and regulatory requirements, such transparent data are simply not there for cryptos. For example, even the composition of the assets backing stablecoins is foggy.

Many may argue that all information is lying naked on associated blockchains. This could be an illusion as well. The ledgers record transactions, but do not tell how successful the associated project has been. For instance, the transactions of a coin issued by a new social network (intended to compete with Meta or X, say) do not reveal its weekly or monthly increment of subscribers. Nor do the ledgers tell us how many insiders are holding the coin and how much they have been dumping in the past month.

Therefore, at the moment it seems that PCA is the most natural choice for designing the first-generation factor models for the crypto world. DQT projects a waiting period of at least several years before robust fundamental models can meaningfully kick in.

3. Technical Specifications of DQT Factor Models

3.1 Long- and Short-Horizon Models

The attributes of the short-horizon model DQT-CRPT-FM-SH120F11 are:

Factor Types	# Factors	Covered Universe	Calibration Horizon
1 Market + 10 PCA	11	About 170 to 180 coins	120 days (4 months)

The attributes of the long-horizon model DQT-CRPT-FM-LH240F13 are:

Factor Types	# Factors	Covered Universe	Calibration Horizon
1 Market + 12 PCA	13	About 170 to 180 coins	240 days (8 months)

As a benchmark, vendors of a typical equities PCA factor model in TradFi usually settle for a horizon of 12 months and 20 PCA factors, in order to achieve the desired level of statistical stability.
The universes are automatically selected by the algorithm, and refreshed periodically. The algorithm takes into account rankings by both market capitalization and trading volume, as well as the availability of market data over the horizons. The top or popular coins are always covered, except for USD stablecoins and wrapped coins. The algorithm does not discriminate against any particular coin.
The single market factor is calculated by the algorithm. Because cryptos lack a sufficiently diverse index comparable to the S&P 500 for equities, it is nontrivial to properly define “the market” for the entire crypto landscape.
The number of PCA factors is currently based on expert judgment for either model. The factors are orthonormal at the daily level so that the associated factor loadings (\(\beta_k\)’s) are roughly of order 1.0.
Due to the highly skewed distribution of market sizes across the crypto universe, the algorithm employs a proprietary scheme of weighted PCA analysis.
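DQT's market-factor definition and weighting scheme are proprietary and not described here. Purely as a generic illustration of the "1 market + K PCA" layout, the sketch below assumes a cap-weighted market factor and square-root-of-cap-share weights for the PCA step. Both choices, the function name `market_plus_weighted_pca`, and the synthetic inputs are assumptions, not the published method.

```python
import numpy as np

def market_plus_weighted_pca(R, mcap, n_pca):
    """Generic 1-market + n_pca weighted-PCA decomposition (illustrative only).

    R    : (T, N) de-meaned daily returns
    mcap : (N,) market capitalizations
    """
    share = mcap / mcap.sum()
    f_mkt = R @ share                               # assumed cap-weighted market factor
    beta_mkt = R.T @ f_mkt / (f_mkt @ f_mkt)        # per-coin OLS loading on the market
    resid = R - np.outer(f_mkt, beta_mkt)           # returns with the market removed
    w = np.sqrt(share)                              # assumed PCA weights (hypothetical)
    U, s, Vt = np.linalg.svd(resid * w, full_matrices=False)
    f_pca = U[:, :n_pca]                            # orthonormal statistical factors
    beta_pca = resid.T @ f_pca                      # OLS loadings on orthonormal factors
    return f_mkt, beta_mkt, f_pca, beta_pca

# Synthetic inputs: 30 coins over a 120-day horizon with skewed market caps.
rng = np.random.default_rng(2)
R = 0.03 * rng.standard_normal((120, 30))
R -= R.mean(axis=0)
mcap = rng.lognormal(mean=0.0, sigma=2.0, size=30)

f_mkt, beta_mkt, f_pca, beta_pca = market_plus_weighted_pca(R, mcap, n_pca=3)
print(beta_mkt.shape, beta_pca.shape)
```

Because the market factor is regressed out before the PCA step, the statistical factors in this sketch are orthogonal to the market factor by construction.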

3.2 Meaning of the Published Model Parameters

The columns or fields in the downloadable samples of the CSV model files are explained as follows.

Column “date_model_calib” is the day T on which coin return data over [T-horizon, T-1] have been used, where horizon=120 for SH and 240 for LH.
Unlike TradFi factor models, we publish the factor loadings and the factor covariance matrix in a single CSV file. Rows with “value_type==loadings” hold the factor loadings, while rows with “value_type==cov” hold the covariance matrix. The corresponding numerical values are stored in the factor columns with headers “f_0, …, f_K” where K=10 for SH and 12 for LH.
The “alpha” column stores the daily-level alpha values (in percent) for individual coins that are observed on the model calibration day. Here we must reiterate (as in “Cautions and Limitations of Factor Models”) that these alphas are historically realized ones and generally different from predictive alphas. TradFi vendor models do not publish them. Given the theoretical background in Section 1 above, we believe they are relevant.
The three vol columns “spec_vol”, “sys_vol”, and “tot_vol” refer to the specific volatility, systematic volatility, and total volatility for each coin. They obey the Pythagorean relation:
\[ \sigma_{spec}^2 + \sigma_{sys}^2 = \sigma_{tot}^2.\]
In particular, only the specific vol is truly independent information, which is why most equities vendor models in TradFi publish only it. The systematic vol is the vol of the systematic component (i.e., \(\sum_{k=1}^K \beta_k \cdot f_k\) as in Section 1) and can be derived from the factor loadings and the covariance matrix.
The “mcap_rank” column is provided for reference only and can be used to order the coins or rows. It is based on the latest reading of the market snapshot on “date_model_calib.”
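As a small worked example of the relation between the vol columns, the sketch below derives the systematic vol of one coin as \(\sqrt{\beta^\top \Sigma \beta}\), where \(\beta\) is its loading row and \(\Sigma\) the factor covariance matrix, and then combines it with the specific vol via the Pythagorean relation. The numbers are made up, and it is an assumption that the loadings, covariance, and vols share consistent units.

```python
import numpy as np

def systematic_vol(beta, cov):
    """Vol of the systematic component sum_k beta_k * f_k: sqrt(beta' Sigma beta)."""
    return float(np.sqrt(beta @ cov @ beta))

def total_vol(sys_vol, spec_vol):
    """Pythagorean relation: sigma_tot^2 = sigma_sys^2 + sigma_spec^2."""
    return float(np.hypot(sys_vol, spec_vol))

# Hypothetical two-factor example (values are illustrative, not from a model file).
beta = np.array([1.0, 0.5])
cov = np.array([[4.0, 0.0],
                [0.0, 1.0]])
spec_vol = 3.0

sys_vol = systematic_vol(beta, cov)   # sqrt(1.0*4.0 + 0.25*1.0) = sqrt(4.25)
tot_vol = total_vol(sys_vol, spec_vol)
print(sys_vol, tot_vol)
```

The same two functions can be used as a consistency check on a model file: recomputing the systematic and total vols from the loadings, the covariance rows, and “spec_vol” should approximately reproduce the published “sys_vol” and “tot_vol” columns, provided the units assumption holds.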

Finally, clients are encouraged to search the internet for all possible use cases or examples of applying factor models, including but not limited to:

optimal portfolio construction,
optimal portfolio rebalancing,
optimal portfolio liquidation execution,
multi-name systematic market making,
optimal inventory control due to accumulated in-house or dealing positions,
firmwide or aggregated market risks and VaR calculations, and
Monte-Carlo based valuation and Greeks for vanilla or exotic basket options, etc.
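
As one hedged illustration of the risk-aggregation use case, the sketch below computes the volatility of a portfolio from factor-model outputs using \(\sigma_p^2 = w^\top B \Sigma B^\top w + \sum_i w_i^2 \sigma_{spec,i}^2\). It relies on the standard factor-model assumption that the residuals of different coins are uncorrelated; the function name `portfolio_vol` and all numbers are hypothetical.

```python
import numpy as np

def portfolio_vol(w, B, cov, spec_vol):
    """Portfolio vol under a linear factor model.

    w        : (N,) portfolio weights
    B        : (N, K) factor loadings, one row per coin
    cov      : (K, K) factor covariance matrix
    spec_vol : (N,) specific vols, assumed uncorrelated across coins
    """
    exposure = B.T @ w                            # (K,) net factor exposure
    sys_var = exposure @ cov @ exposure           # systematic variance
    spec_var = np.sum((w * spec_vol) ** 2)        # diversifiable variance
    return float(np.sqrt(sys_var + spec_var))

# Hypothetical two-coin, one-factor example.
w = np.array([0.5, 0.5])
B = np.array([[1.0],
              [2.0]])
cov = np.array([[0.0004]])                        # factor vol of 2% per day
spec_vol = np.array([0.03, 0.04])

vol_p = portfolio_vol(w, B, cov, spec_vol)
print(vol_p)
```

Here the net factor exposure is 1.5, so the systematic variance is 1.5² × 0.0004 = 0.0009, the specific variance is 0.25 × 0.0009 + 0.25 × 0.0016 = 0.000625, and the portfolio vol is the square root of their sum, 0.001525.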