Arithmax Research

BTC/ETH Pairs Trading with Fractional Cointegration and Adaptive Stochastic Control

Abstract

This article introduces a novel pairs trading framework combining fractional cointegration, stochastic optimal control, and reinforcement learning. We extend traditional cointegration theory to capture long-memory dependencies and formalize trading decisions via Hamilton-Jacobi-Bellman equations. Key innovations include: 1) Volatility-adaptive thresholding with Gaussian Process optimization, 2) Fractional Ornstein-Uhlenbeck dynamics for spread modeling, 3) Deep RL agent for real-time parameter tuning, and 4) High-frequency P&L decomposition theorems. Backtests show 35% higher risk-adjusted returns versus benchmarks with 38% lower drawdowns. The mathematical framework solves critical limitations in existing statistical arbitrage literature.

Key Mathematical Equations

Cointegration Framework

Two price series P₁(t) and P₂(t) are cointegrated if both are integrated of order one, I(1), and there exists a cointegrating vector β = (1, −h)ᵀ such that their linear combination is stationary:

$$\beta^T \mathbf{P}(t) = P_1(t) - hP_2(t) \sim I(0)$$

The hedge ratio h is estimated via OLS regression of P₁ on P₂ (without an intercept):

$$\hat{h} = \frac{\sum_{t=1}^T P_1(t)P_2(t)}{\sum_{t=1}^T P_2(t)^2}$$
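As a minimal sketch, the no-intercept OLS estimator above is a one-liner in NumPy (the function name and toy prices are illustrative, not from the paper):

```python
import numpy as np

def hedge_ratio(p1, p2):
    """No-intercept OLS hedge ratio: h = (P1 . P2) / (P2 . P2)."""
    p1, p2 = np.asarray(p1, float), np.asarray(p2, float)
    return (p1 @ p2) / (p2 @ p2)

# Toy example: P1 is an exact 1.5x scaling of P2
p2 = np.array([1.0, 2.0, 3.0, 4.0])
p1 = 1.5 * p2
print(hedge_ratio(p1, p2))  # 1.5
```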

Fractional Cointegration

We extend to fractional cointegration where:

$$P_1(t) \sim I(d_1), \quad P_2(t) \sim I(d_2), \quad S(t) = P_1 - hP_2 \sim I(\gamma)$$

with γ < min(d₁, d₂). The fractional differencing parameter of each series is estimated via the Geweke-Porter-Hudak (GPH) log-periodogram regression:

$$\ln I(\omega_j) = c - d\ln\left(4\sin^2(\omega_j/2)\right) + \epsilon_j, \quad \omega_j = \frac{2\pi j}{T}$$
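The GPH regression can be sketched in a few lines of NumPy; the bandwidth choice m = √T is a common default, not something the article specifies:

```python
import numpy as np

def gph_estimate(x, m=None):
    """GPH log-periodogram regression for the memory parameter d (sketch)."""
    x = np.asarray(x, float)
    T = len(x)
    m = m or int(np.sqrt(T))                      # common bandwidth choice
    freqs = 2 * np.pi * np.arange(1, m + 1) / T   # Fourier frequencies w_j
    # Periodogram I(w_j) at the first m Fourier frequencies
    fft = np.fft.fft(x - x.mean())
    I = (np.abs(fft[1:m + 1]) ** 2) / (2 * np.pi * T)
    # Regress ln I(w_j) on -ln(4 sin^2(w_j / 2)); the slope is d
    X = np.column_stack([np.ones(m), -np.log(4 * np.sin(freqs / 2) ** 2)])
    beta, *_ = np.linalg.lstsq(X, np.log(I), rcond=None)
    return beta[1]
```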

Ornstein-Uhlenbeck Process

The spread S(t) follows an Ornstein-Uhlenbeck process:

$$dS(t) = \lambda(\mu - S(t))dt + \sigma dW(t)$$

where λ is mean-reversion speed, μ is long-term equilibrium, σ is volatility, and W(t) is a Wiener process.
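A quick Euler-Maruyama simulation of these dynamics helps build intuition (the parameter values below are illustrative, not taken from the paper):

```python
import numpy as np

def simulate_ou(s0, lam, mu, sigma, dt, n, seed=0):
    """Euler-Maruyama discretization of dS = lam*(mu - S)*dt + sigma*dW."""
    rng = np.random.default_rng(seed)
    s = np.empty(n + 1)
    s[0] = s0
    for t in range(n):
        dw = rng.normal(0.0, np.sqrt(dt))      # Wiener increment
        s[t + 1] = s[t] + lam * (mu - s[t]) * dt + sigma * dw
    return s

# Spread starting away from equilibrium, daily steps over ~10 years
path = simulate_ou(s0=2.0, lam=5.0, mu=0.0, sigma=0.3, dt=1/252, n=2520)
```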

Half-Life of Mean-Reversion

$$\tau_{1/2} = \frac{\ln 2}{\lambda} = -\frac{\ln 2}{\beta} \quad \text{where} \quad \beta = \frac{\text{Cov}(\Delta S_t, S_{t-1})}{\text{Var}(S_{t-1})}$$
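The half-life formula translates directly into code; `half_life` is a hypothetical helper built on the regression coefficient β defined above:

```python
import numpy as np

def half_life(spread):
    """Half-life ln(2)/lambda from the AR(1)-style regression beta above."""
    s = np.asarray(spread, float)
    ds, s_lag = np.diff(s), s[:-1]
    beta = np.cov(ds, s_lag)[0, 1] / np.var(s_lag, ddof=1)
    return -np.log(2) / beta   # beta < 0 for a mean-reverting spread
```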

Optimal Trading Band

The trading threshold θ is optimized via Sharpe ratio maximization:

$$\max_{\theta} \; SR(\theta) = \frac{\mathbb{E}[r(\theta)]}{\sigma[r(\theta)]}$$

where portfolio returns r(θ) come from trading against the spread whenever the z-score breaches the threshold:

$$r_t(\theta) = \begin{cases} -\frac{S_{t-1} - \mu_S}{\sigma_S} \Delta S_t & \text{if } |z_{t-1}| > \theta \\ 0 & \text{otherwise} \end{cases}$$
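A grid search over θ that maximizes the in-sample Sharpe ratio might look like this sketch (the function names and the grid are assumptions; positions are taken against the spread, short when the z-score is high):

```python
import numpy as np

def sharpe_of_threshold(spread, theta):
    """Per-step Sharpe of the threshold rule, trading against the spread."""
    s = np.asarray(spread, float)
    z = (s - s.mean()) / s.std()
    ds = np.diff(s)
    # Enter against the spread when |z| exceeds theta, else stay flat
    r = np.where(np.abs(z[:-1]) > theta, -np.sign(z[:-1]) * ds, 0.0)
    return r.mean() / r.std() if r.std() > 0 else -np.inf

def best_threshold(spread, grid=np.linspace(0.5, 3.0, 26)):
    """Pick the grid point with the highest in-sample Sharpe ratio."""
    return max(grid, key=lambda th: sharpe_of_threshold(spread, th))
```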

Hamilton-Jacobi-Bellman Equation

The optimal trading problem is formalized as a stochastic control problem with value function:

$$V(S,t) = \max_{\delta} \mathbb{E}\left[ \int_t^T e^{-\rho s} u(r_s) ds \;\bigg|\; S_t = S \right]$$

solving the HJB equation:

$$\sup_{\delta} \left[ \mathcal{L}V + u(r) \right] = 0$$
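Under the Ornstein-Uhlenbeck spread dynamics above, the operator 𝓛 expands to the standard parabolic generator (the discount term ρV is omitted for brevity):

$$\mathcal{L}V = \frac{\partial V}{\partial t} + \lambda(\mu - S)\frac{\partial V}{\partial S} + \frac{1}{2}\sigma^2 \frac{\partial^2 V}{\partial S^2}$$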

Adaptive Threshold Mechanism

$$\theta_t = \theta_{\min} + \frac{\theta_{\max} - \theta_{\min}}{1 + \exp\left(-k\left(\frac{\sigma_t^{\text{EWMA}}}{\sigma_0} - 1\right)\right)}$$
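The logistic mapping above is straightforward to implement; the defaults θ_min = 1, θ_max = 3, k = 4 are placeholders, since the article does not fix them:

```python
import numpy as np

def adaptive_threshold(sigma_ewma, sigma0, theta_min=1.0, theta_max=3.0, k=4.0):
    """Logistic volatility-adaptive threshold: widens as EWMA vol rises."""
    x = sigma_ewma / sigma0 - 1.0            # relative volatility deviation
    return theta_min + (theta_max - theta_min) / (1.0 + np.exp(-k * x))
```

At σ_EWMA = σ₀ the threshold sits exactly at the midpoint of [θ_min, θ_max], and it tightens toward θ_min in calm regimes.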

Reinforcement Learning Policy

The action-value function is updated via Q-learning, and the policy maps states to action probabilities through a softmax over MLP outputs:

$$Q(s_t,a_t) \leftarrow Q(s_t,a_t) + \alpha \left[ r_t + \gamma \max_a Q(s_{t+1},a) - Q(s_t,a_t) \right]$$

$$\pi(a|s) = \text{softmax}\left( \text{MLP}_{\phi}(s) \right)$$

Algorithms

Fractional Kalman Filter Update

import numpy as np

def kalman_update(w, C, p1, p2, H=0.7, eps=1e-4):
    # H (Hurst exponent) and eps are illustrative defaults; the source does not fix them
    F = np.array([p2 ** H, (1.0 - H) * p2 ** (H - 1.0)])  # fractional observation vector
    y = p1 - F @ w               # innovation
    Q = F @ C @ F + eps          # innovation variance (scalar)
    K = C @ F / Q                # Kalman gain
    w = w + K * y                # state update
    C = C - np.outer(K, F @ C)   # covariance update: (I - K F^T) C
    return w, C

Reinforcement Learning Trading Agent

import numpy as np

def train_agent(theta, states, actions, rewards, alpha=1e-2):
    # A linear-softmax policy stands in for the MLP of the main text (illustrative)
    logits = states @ theta                       # (N, n_actions)
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    probs = np.exp(logits)
    probs /= probs.sum(axis=1, keepdims=True)     # pi(a|s) = softmax(logits)
    onehot = np.eye(theta.shape[1])[actions]
    # REINFORCE: loss = -mean(log pi(a|s) * r); grad of log pi(a|s) is s^T (e_a - pi)
    grads = -(states.T @ ((onehot - probs) * rewards[:, None])) / len(actions)
    theta = theta - alpha * grads                 # gradient descent step on the loss
    return theta

Strategy Core Components

1. Cointegration Testing: Johansen's trace test
   J_trace(r) = -T Σ(i=r+1 to n) ln(1 - λ̂_i)

2. Spread Calculation:
   S(t) = P₁(t) - hP₂(t)

3. Z-score Calculation:
   z(t) = (S(t) - μ_S(t)) / σ_S(t)

4. Position Sizing: Kelly-optimal sizing
   f* = μ_r / (μ_r² + σ_r²)
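Steps 2-4 above chain into a single sketch (names are hypothetical; the hedge ratio uses the no-intercept OLS from the cointegration section, and sizing is a fixed unit position rather than the Kelly fraction):

```python
import numpy as np

def pairs_signal(p1, p2, theta=2.0):
    """Spread, z-score, and threshold positions for one pair (sketch)."""
    p1, p2 = np.asarray(p1, float), np.asarray(p2, float)
    h = (p1 @ p2) / (p2 @ p2)                # step 2: hedge ratio (no-intercept OLS)
    s = p1 - h * p2                          # step 2: spread S(t)
    z = (s - s.mean()) / s.std()             # step 3: z-score
    # Short the spread when rich (z > theta), long when cheap (z < -theta)
    pos = np.where(z > theta, -1, np.where(z < -theta, 1, 0))
    return h, z, pos
```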
                

Performance Comparison

Backtests from January 2020 to December 2024 on cryptocurrency pairs show approximately 35% higher risk-adjusted returns and 38% lower drawdowns than the benchmark strategies.

Theoretical Contributions

  1. Fractional Ornstein-Uhlenbeck Process: Extended mean-reversion dynamics for spreads with long-memory dependencies
  2. High-Frequency P&L Decomposition: Rigorous theorem decomposing profit into drift, volatility tax, and jump components
  3. HJB Solution: Optimal statistical arbitrage formulation via stochastic control theory
  4. RL Convergence Proof: Theoretical guarantees for reinforcement learning-based trading agent

Risk Management

$$\text{Max Drawdown} < 5\%$$

$$\text{Value-at-Risk} < 2.5\% \quad \text{at the 99\% confidence level}$$
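These limits can be monitored with two small helpers, sketched here for an equity curve and a return series (the historical-quantile VaR is one common estimator among several):

```python
import numpy as np

def max_drawdown(equity):
    """Largest peak-to-trough decline of an equity curve, as a fraction."""
    eq = np.asarray(equity, float)
    peak = np.maximum.accumulate(eq)         # running high-water mark
    return ((peak - eq) / peak).max()

def var_99(returns):
    """Historical one-period Value-at-Risk at the 99% level (positive number)."""
    return -np.quantile(np.asarray(returns, float), 0.01)
```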

Full Paper


Download PDF