#### 1. A Brief Theory of Factor Models – From Nobel to Wall Street

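The instantaneous return of a coin with price \(p_t\) can be modeled as a drift term plus a diffusion term:

\[ \tilde{r}_t = \frac{dp_t}{p_t} = \alpha_t dt + \sigma_t dB_t. \]

Sticking to the daily time scale, it can be discretized to: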

\[ \tilde{r}_d = \alpha_d + \sigma_d z_d = \alpha_d + r_d = \text{trend + de-trended return}, \]

where \(\alpha_d, \sigma_d\) are the daily alpha and vol, and \(z_d\) is a standard normal variable. \(r_d\) is the de-meaned or de-trended daily return of the coin. Below, whenever we use **the symbol \(r\), it always refers to \(r_d\)**.

Every coin may have its own \(r\) or \(z\). Across the crypto universe, they are certainly not independent. A factor model of a given market assumes the existence of a set of ***common or systematic*** factors:

\[ f_1, f_2, \cdots, f_K,\]

so that every coin’s return is driven by them:

\[ r(t) = \beta_1 \cdot f_1(t) + \beta_2 \cdot f_2(t) + \cdots + \beta_K \cdot f_K(t) + \delta(t),\]

except for the residual \(\delta(t)\). The residual is idiosyncratic and cannot be explained by the common factors. Its volatility is hence called the “***idiosyncratic***” or “***specific***” risk of the coin:

\[ \text{specific risk} = \sigma_s = \mathrm{vol}(\delta(t)) = \sqrt{ \mathrm{var}(\delta(t)) }.\]

The betas are called ***factor loadings***, and reveal how a coin dances with the general rhythms of its ambient market.

The best-known examples are the **celebrated 3-factor and 5-factor models** of *Nobel Laureate Eugene Fama* et al. Mainstream TradFi factor-model vendors (e.g., Axioma or MSCI-Barra) further engineer them in great detail so that they are actually applicable to modern portfolio management. It is like going from the general theory of combustion to the actual delivery of a Mercedes-Benz to your garage! So is our own effort for cryptos!

For further reading, here are __two external sources__:

- “**Linear Factor Models**,” a friendly online page by Nobel Laureate Prof. William F. Sharpe at Stanford.
- “*Factor Models*,” MIT’s Open Course, Math 18.S096, by Dr. Peter Kempthorne.
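The decomposition in Section 1 can be illustrated with a minimal simulation. This is only a sketch under stated assumptions: the loadings, volatilities, and sample size below are made-up numbers, the factors are simulated rather than estimated, and ordinary least squares is used purely for illustration (it is not claimed to be how any vendor calibrates loadings).

```python
import numpy as np

rng = np.random.default_rng(0)

T, K = 5000, 3                        # days and number of factors (illustrative)
beta = np.array([1.2, -0.4, 0.7])     # hypothetical factor loadings of one coin
sigma_s = 0.02                        # hypothetical daily specific vol (2%)

f = rng.normal(0.0, 0.01, size=(T, K))     # simulated daily factor returns f_k(t)
delta = rng.normal(0.0, sigma_s, size=T)   # idiosyncratic residual delta(t)
r = f @ beta + delta                       # de-trended coin return r(t)

# Estimate the loadings by least squares, then take the
# specific risk as the volatility of the fitted residual.
beta_hat, *_ = np.linalg.lstsq(f, r, rcond=None)
resid = r - f @ beta_hat
spec_risk = resid.std(ddof=K)              # ddof=K accounts for the K fitted betas

print("estimated loadings:", beta_hat.round(2))
print("estimated specific risk:", round(float(spec_risk), 4))
```

With enough days, the estimated loadings and specific risk should land close to the values used to generate the data, which is the sense in which the residual volatility is "what the factors cannot explain."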

#### 2. Why PCA Factor Models for Cryptos

For any given market, there are two types of factor models: *Fundamental vs. PCA/Statistical*.

###### 2.1 Fundamental Factor Models

The *fundamental* factors are observable economic or financial metrics. For example, in Fama-French the 3 factors are the excess market return over the risk-free rate, the excess return of small caps over large caps, and the excess return of value stocks over growth stocks.

A modern vendor model may include even more fundamental factors to capture macroeconomic, microeconomic, and company-specific financial metrics, e.g., exchange rates, 10-year interest rates, market capitalization, liquidity scores, momentum, or debts, asset values, equity values, and their ratios.

The fundamental factors are typically correlated. For example, interest rates and liquidity scores could be negatively correlated, since high interest rates imply high borrowing or capital-raising costs. Among all factors for equities, the **sector and industry-group** factors are always crucial since companies are heavily influenced by sector trends. For example, all AI-related stocks are currently gaining unprecedented momentum due to the genuine dawning of the AI era.

###### 2.2 PCA Factor Models

Fundamental factors are always derived from directly observable metrics, and are hence also called “**explicit**.”

For a given market, theoretically exhausting the set of explicit fundamental factors is not as easy as it sounds. Going from the 3 factors of Fama-French to 15 or 30 in typical vendor models attests to this challenge. Too few factors leave systematic risk leaking into the residuals, while too many factors result in overfitting.

Thus arises the alternative: PCA-based statistical factor models, where factors are “*implicit*.” These implicit factors are not pre-defined via economic or financial metrics. Instead, *the observed market return data govern what they should be*. Hence, this is a purely data-driven approach.

**Principal Component Analysis** (PCA) efficiently captures the lower-dimensional subspaces that massive data always tend to crowd around. It has solid theoretical foundations rooted in both linear algebra and statistics, and also enjoys well-established and efficient algorithms. For now and even far into the future of AI, DQT predicts its lasting power in the arena of big data analytics.

For factor models, each *principal component* supplies a factor, with its singular value revealing its significance for gauging the given market data (of coin returns). When components with lower significance are weeded out, PCA naturally results in a factor model. The retained factors are implicit but possess substantial explanatory power.

These data-driven factors do not lend themselves to economic or financial explanation as readily as fundamental factors do. This view is valid but could also be merely a common bias against PCA. Sometimes PCA could reveal more if one zooms in. For example, fundamental factors rarely capture hidden temporal rhythms.
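As a rough illustration of how principal components turn into factors, the sketch below runs a plain (unweighted) PCA through the singular value decomposition. The return matrix is synthetic, and the sizes `T`, `N`, `K` are arbitrary assumptions for the example; nothing here reflects an actual calibration.

```python
import numpy as np

rng = np.random.default_rng(1)

T, N, K = 240, 30, 3                  # days, coins, retained factors (illustrative)

# Synthetic de-meaned returns: 3 hidden drivers plus noise (not real market data).
hidden = rng.normal(size=(T, 3)) @ rng.normal(size=(3, N))
R = 0.01 * hidden + 0.005 * rng.normal(size=(T, N))
R = R - R.mean(axis=0)

# SVD: R = U @ diag(s) @ Vt, with singular values s in descending order.
U, s, Vt = np.linalg.svd(R, full_matrices=False)

explained = s**2 / np.sum(s**2)       # share of total variance per component
factors = U[:, :K]                    # T x K factor time series (orthonormal columns)
loadings = Vt[:K].T * s[:K]           # N x K loadings, so R ~ factors @ loadings.T

residual = R - factors @ loadings.T   # part the K retained factors cannot explain

print("variance explained by the top K components:", explained[:K].sum().round(3))
```

The squared singular values rank the components by how much return variance each one carries, so dropping the small ones is what "weeding out" amounts to in practice.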

###### 2.3 PCA for Cryptos

The crypto landscape is still rapidly evolving. Certain *technical* factors such as market size or momentum can be established much as they are for the equities market, but other fundamental factors are much more difficult.

The equity market enjoys balance-sheet transparency due to the quarterly filing mandates on public firms by the SEC. Due to a lack of industry standards and regulatory requirements, such transparent data are simply not there for cryptos. For example, even the compositional information on assets backing up stablecoins is foggy.

Many may argue that all information is lying naked on associated blockchains. This could be an illusion as well. The ledgers record transactions, but do not tell how successful the associated project has been. For instance, the transactions of a coin issued by a new social network (intended to compete with Meta or X, say) do not reveal its weekly or monthly increment of subscribers. Nor do the ledgers tell us how many insiders are holding the coin and how much they have been dumping in the past month.

Therefore, at the moment it seems that PCA is the most natural choice for designing the first-generation factor models for the crypto world. DQT projects a waiting period of at least several years before robust fundamental models can meaningfully kick in.

#### 3. Technical Specifications of DQT Factor Models

###### 3.1 Long- and Short-Horizon Models

The attributes of the short-horizon model **DQT-CRPT-FM-SH120F11** are:

Factor Types | #Factors | Covered Universe | Calibration Horizon |
---|---|---|---|
1 Market + 10 PCA | 11 | About 170 to 180 coins | 120 days (4 months) |

The attributes of the long-horizon model **DQT-CRPT-FM-LH240F13** are:

Factor Types | #Factors | Covered Universe | Calibration Horizon |
---|---|---|---|
1 Market + 12 PCA | 13 | About 170 to 180 coins | 240 days (8 months) |

- To benchmark, for a typical __equities__ PCA factor model in **TradFi**, vendors usually settle for __a horizon of 12 months and 20 PCA factors__, in order to achieve the desired level of statistical stability.
- The universes are automatically selected by the algorithm, and refreshed periodically. The algorithm takes into account __ranking__ in terms of both market capitalization and trading volume, as well as the *availability* of market data over the horizons. The top or popular ones are always covered, except for USD stablecoins and wrapped coins. The algorithm does not discriminate against any particular coin.
- The single __market__ factor is calculated by the algorithm. The lack of sufficiently diverse indices like the S&P500 for equities makes it nontrivial to properly define “__the market__” for the entire crypto landscape.
- The number of PCA factors is currently based on expert judgment for either model. The factors are orthonormal at the daily level so that the associated factor loadings (\(\beta_k\)’s) can be roughly of order 1.0.
- Due to the singular distribution of market sizes of the crypto universe, the algorithm employs a proprietary scheme of weighted PCA.
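To make the "1 Market + 10 PCA" structure concrete, here is one generic way such a factor set could be assembled. Everything in it is an assumption for illustration: the square-root damping of market caps, the weighted-average market factor, and the regress-then-PCA ordering are textbook-style choices, not the proprietary DQT scheme, which is not disclosed here. The data are synthetic.

```python
import numpy as np

rng = np.random.default_rng(2)
T, N, K_pca = 120, 40, 10             # horizon, coins, PCA factors (SH-like sizes)

R = 0.03 * rng.normal(size=(T, N))    # synthetic de-meaned daily returns
mcap = rng.pareto(1.5, size=N) + 1.0  # synthetic, heavy-tailed market caps

# Assumption: damp the heavy tail with a square root so that
# a handful of giant coins do not dominate the market factor.
w = np.sqrt(mcap)
w = w / w.sum()

market = R @ w                        # length-T market factor time series

# Regress each coin on the market factor, then run PCA on what is left over.
beta_mkt = (R.T @ market) / (market @ market)   # length-N market loadings
resid = R - np.outer(market, beta_mkt)

U, s, Vt = np.linalg.svd(resid, full_matrices=False)
pca_factors = U[:, :K_pca]            # T x 10 statistical factors

factors = np.column_stack([market, pca_factors])  # T x 11: "1 Market + 10 PCA"
print(factors.shape)
```

Because the PCA runs on residuals that are already orthogonal to the market series, the statistical factors come out uncorrelated with the market factor over the sample, which keeps the two kinds of factors from competing for the same variance.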

###### 3.2 Meaning of the Published Model Parameters

The columns or fields in the downloadable samples of the CSV model files are explained as follows.

- Column “**date_model_calib**” is the day T on which coin return data over [T-horizon, T-1] have been used, where horizon=120 for SH and 240 for LH.
- Unlike TradFi factor models, we have published the factor loadings and factor covariance matrix via a **single** CSV file. As a result, “**value_type==loadings**” indicates rows for the factor loadings while “**value_type==cov**” indicates rows for the covariance matrix. The corresponding numerical values are stored in the factor columns with headers “f_0, …, f_K” where K=10 for SH and 12 for LH.
- The “**alpha**” column stores the daily-level alpha values (in percent) for individual coins that are observed on the model calibration day. Here we must reiterate (as in “__Cautions and Limitations of Factor Models__”) that these alphas are __historically realized__ ones and generally different from the predictive alphas. TradFi vendor models do not publish them. Given the theoretical background in Section 1 above, we believe they are relevant.
- The three vol columns **“spec_vol”, “sys_vol”, and “tot_vol”** refer to the specific volatility, systematic volatility, and total volatility for each coin. They obey the Pythagorean relation: \[ \sigma_{spec}^2 + \sigma_{sys}^2 = \sigma_{tot}^2\] In particular, only the specific vol is truly independent information, which is why most equities vendor models in TradFi publish it alone. The systematic vol is the vol of the systematic component (i.e., \(\sum_{k=1}^K \beta_k \cdot f_k\) as in Section 1) and can be derived from the factor loadings and the covariance matrix.
- The “**mcap_rank**” column has been provided just FYI and could be useful to order the coins or rows. It is based on the latest reading of the market snapshot on “date_model_calib.”
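The relation between the three vol columns can be checked with a few lines of arithmetic. The sketch below uses made-up loadings, a made-up factor covariance matrix, and a made-up specific vol for a single coin in a 3-factor setting; the variable names are illustrative and the numbers are not taken from any published model file. It relies only on the standard identity that the variance of \(\sum_k \beta_k f_k\) equals \(\beta^\top \Sigma \beta\), and on the residual being uncorrelated with the factors.

```python
import numpy as np

# Hypothetical values for one coin in a 3-factor model (not from any DQT file).
beta = np.array([0.9, 0.3, -0.2])     # factor loadings
cov = np.array([[4.0, 0.0, 0.0],      # factor covariance matrix
                [0.0, 1.0, 0.0],
                [0.0, 0.0, 0.5]])
spec_vol = 2.5                        # specific vol, in the same units as the factors

sys_var = beta @ cov @ beta           # variance of sum_k beta_k * f_k
sys_vol = np.sqrt(sys_var)            # systematic vol

# Pythagorean relation: total variance = systematic variance + specific variance.
tot_vol = np.sqrt(sys_vol**2 + spec_vol**2)

print("sys_vol:", round(float(sys_vol), 4), "tot_vol:", round(float(tot_vol), 4))
```

This is why the systematic and total vols are derivable quantities: given the loadings, the covariance matrix, and the specific vol, the other two follow.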

Finally, clients are encouraged to search the internet for all possible use cases or examples of applying factor models, including but not limited to:

- optimal portfolio construction,
- optimal portfolio rebalancing,
- optimal portfolio liquidation execution,
- multi-name systematic market making,
- optimal inventory control due to accumulated in-house or dealing positions,
- firmwide or aggregated market risks and VaR calculations, and
- Monte-Carlo based valuation and Greeks for vanilla or exotic basket options, etc.
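As one small example of the risk use case in the list above, a factor model lets portfolio variance be split into a systematic part and a specific part without ever forming the full coin-by-coin covariance matrix. The sketch below is a hedged illustration: the loadings `B`, factor covariance `F`, specific vols, and weights are invented numbers, and the VaR figure assumes normally distributed returns and ignores the alpha (trend) term.

```python
import numpy as np

# Hypothetical inputs for a 3-coin, 2-factor example (not real model output).
B = np.array([[1.0, 0.2],             # N x K factor loadings
              [0.8, -0.5],
              [1.3, 0.1]])
F = np.array([[9.0, 0.0],             # K x K factor covariance (daily, percent^2)
              [0.0, 4.0]])
spec_vol = np.array([2.0, 3.0, 5.0])  # daily specific vols (percent)
w = np.array([0.5, 0.3, 0.2])         # portfolio weights

exposure = B.T @ w                    # portfolio-level factor exposures
sys_var = exposure @ F @ exposure     # systematic variance
spec_var = np.sum((w * spec_vol) ** 2)  # specific variance (residuals assumed uncorrelated)
port_vol = np.sqrt(sys_var + spec_var)

z_99 = 2.326                          # approx. 99% one-sided standard normal quantile
var_99 = z_99 * port_vol              # one-day parametric VaR, percent of portfolio value

print("portfolio vol (%):", round(float(port_vol), 3))
print("99% one-day VaR (%):", round(float(var_99), 3))
```

The same result follows from the full covariance matrix \(B F B^\top + \mathrm{diag}(\sigma_{spec}^2)\), but the factor form scales much better as the number of coins grows.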