Analyzing Lead–Lag Effects Between Exchanges Using Flat Files

Price discovery in crypto markets happens across multiple exchanges simultaneously. When prices move, the move may show up on one exchange slightly earlier than on the others before it propagates across the market.

This is known as the lead–lag effect.

However, there is an important caveat:

There is no permanently “leading” exchange. Leadership changes depending on market conditions such as liquidity, trading activity, and time of day.

In this tutorial, you will learn how to analyze lead–lag relationships using historical trade data from the CoinAPI Flat Files API.

By the end of this tutorial, you will:

  • Load real trade data from Flat Files
  • Convert raw trades into time series
  • Measure lead–lag using cross-correlation
  • Interpret which exchange leads (for a given period)
  • Understand why results change over time

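Open your terminal and create the project folder and the script file:

mkdir lead_lag_analysis
cd lead_lag_analysis
touch analysis.py

Open this file in VS Code or your preferred editor.

Install the dependencies and create a folder for the data:

pip install pandas numpy matplotlib
mkdir data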

Place your Flat Files inside:

lead_lag_analysis/
├── analysis.py
└── data/
    ├── IDDI-138123+SC-BINANCE_SPOT_BTC_USDT+S-BTCUSDT.csv.gz
    └── IDDI-108735+SC-OKEX_SPOT_BTC_USDT+S-BTC__002DUSDT.csv.gz

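We use two symbols:

BINANCE_SPOT_BTC_USDT
OKEX_SPOT_BTC_USDT

The script refers to the OKEX symbol as OKX, the exchange's current name.

Why this matters:

  • both pairs share the same quote currency (USDT)
  • this avoids distortion from price differences between USD and USDT
  • what remains reflects timing differences between the exchanges rather than differences between quote assets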

Before writing logic, inspect the dataset.

Run Python:

import pandas as pd
import gzip

with gzip.open("data/IDDI-138123+SC-BINANCE_SPOT_BTC_USDT+S-BTCUSDT.csv.gz", "rt") as f:
    df = pd.read_csv(f, sep=";")

print(df.head())

The raw Binance file starts like this:

time_exchange;time_coinapi;price;base_amount;...
2026-02-01T00:00:00.0060000;...;78741.1;0.10172
2026-02-01T00:00:00.1920000;...;78741.1;0.00006

The raw OKX file starts like this:

time_exchange;time_coinapi;price;base_amount;...
2026-02-01T00:00:00.2290000;...;78739;0.001568
2026-02-01T00:00:00.3180000;...;78739;0.000635

👉 Notice:

  • timestamps are irregular
  • prices are similar but not identical
  • trades occur at different times
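As a quick illustration of the first point, the sketch below measures the gap between consecutive trades. To stay self-contained, it hardcodes the first five Binance timestamps from this tutorial's example output instead of reading the file; the variable names are illustrative.

```python
import pandas as pd

# Timestamps copied from the first Binance trades in this tutorial's example output.
timestamps = pd.to_datetime(
    [
        "2026-02-01T00:00:00.006Z",
        "2026-02-01T00:00:00.192Z",
        "2026-02-01T00:00:00.199Z",
        "2026-02-01T00:00:00.269Z",
        "2026-02-01T00:00:00.284Z",
    ],
    utc=True,
)

# Gap between each trade and the previous one, in milliseconds.
gaps_ms = pd.Series(timestamps).diff().dropna().dt.total_seconds() * 1000
print(gaps_ms.round(0).tolist())
```

The gaps range from 7 ms to 186 ms, which is why the trades have to be mapped onto a regular time grid before the two exchanges can be compared.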

Open analysis.py and paste everything below:

import gzip
from pathlib import Path

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# =========================
# CONFIGURATION
# =========================

DATA_DIR = Path("data")

BINANCE_FILE = DATA_DIR / "IDDI-138123+SC-BINANCE_SPOT_BTC_USDT+S-BTCUSDT.csv.gz"
OKX_FILE = DATA_DIR / "IDDI-108735+SC-OKEX_SPOT_BTC_USDT+S-BTC__002DUSDT.csv.gz"

# Resampling frequency controls the time resolution of the analysis.
# Examples:
# "1s"    = 1 second (recommended for beginners)
# "500ms" = 0.5 seconds
# "100ms" = 0.1 seconds (more precise but noisier)
RESAMPLE_FREQ = "1s"

# Number of lag steps to test on each side (one step = one RESAMPLE_FREQ interval)
MAX_LAG = 10


def load_trades(file_path, name):
    """Load one Flat File and return its trades sorted by exchange timestamp."""
    with gzip.open(file_path, "rt") as f:
        df = pd.read_csv(f, sep=";")

    # Keep only the columns the analysis needs.
    df = df[["time_exchange", "price", "base_amount"]].copy()

    df["time_exchange"] = pd.to_datetime(df["time_exchange"], utc=True)
    df["price"] = pd.to_numeric(df["price"])
    df["base_amount"] = pd.to_numeric(df["base_amount"])

    df = df.dropna().sort_values("time_exchange")
    print(f"{name}: {len(df)} rows loaded")

    return df


def main():
    binance = load_trades(BINANCE_FILE, "binance")
    okx = load_trades(OKX_FILE, "okx")

    print("\nSample Binance:")
    print(binance.head())

    print("\nSample OKX:")
    print(okx.head())

    # Convert irregular trades into a regular grid: last trade price per interval.
    binance_series = (
        binance.set_index("time_exchange")["price"]
        .resample(RESAMPLE_FREQ)
        .last()
    )

    okx_series = (
        okx.set_index("time_exchange")["price"]
        .resample(RESAMPLE_FREQ)
        .last()
    )

    combined = pd.concat([binance_series, okx_series], axis=1)
    combined.columns = ["binance", "okx"]

    # Intervals without trades carry the previous price forward.
    combined = combined.ffill().dropna()

    print("\nMerged sample:")
    print(combined.head())

    returns = combined.pct_change().dropna()

    lags = list(range(-MAX_LAG, MAX_LAG + 1))
    correlations = []

    # shift(lag) with lag > 0 pairs Binance now with OKX `lag` steps earlier,
    # so a positive best lag means OKX leads and a negative one means Binance leads.
    for lag in lags:
        corr = returns["binance"].corr(returns["okx"].shift(lag))
        correlations.append(corr)

    # nanargmax ignores lags whose correlation could not be computed.
    best_lag = lags[int(np.nanargmax(correlations))]

    print("\nBest lag:", best_lag)

    plt.plot(lags, correlations)
    # One lag step equals one RESAMPLE_FREQ interval, which is a second only at "1s".
    plt.xlabel(f"Lag (steps of {RESAMPLE_FREQ})")
    plt.ylabel("Correlation")
    plt.title("Lead–Lag Analysis")
    plt.show()


if __name__ == "__main__":
    main()

Run the script:

python analysis.py

Example output:


Sample Binance:
                     time_exchange    price  base_amount
0 2026-02-01 00:00:00.006000+00:00  78741.1      0.10172
1 2026-02-01 00:00:00.192000+00:00  78741.1      0.00006
2 2026-02-01 00:00:00.199000+00:00  78741.1      0.00025
3 2026-02-01 00:00:00.269000+00:00  78741.1      0.00025
4 2026-02-01 00:00:00.284000+00:00  78741.1      0.00150

Sample OKX:
                     time_exchange    price  base_amount
0 2026-02-01 00:00:00.229000+00:00  78739.0     0.001568
1 2026-02-01 00:00:00.318000+00:00  78739.0     0.000635
2 2026-02-01 00:00:00.504000+00:00  78739.0     0.000063
3 2026-02-01 00:00:00.505000+00:00  78739.0     0.000127
4 2026-02-01 00:00:00.505000+00:00  78739.0     0.000178

Merged sample:
                            binance      okx
time_exchange
2026-02-01 00:00:00+00:00  78741.09  78738.9
2026-02-01 00:00:01+00:00  78739.35  78739.0
2026-02-01 00:00:02+00:00  78738.92  78742.0
2026-02-01 00:00:03+00:00  78741.10  78751.0
2026-02-01 00:00:04+00:00  78730.85  78739.3

Best lag: 0

Trades occur at irregular timestamps across exchanges. To compare price movements, we convert them into a regular time series using a fixed interval (RESAMPLE_FREQ).

In this tutorial, we start with a 1-second interval, which provides a stable and easy-to-interpret baseline.
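The resampling step can be seen on a tiny example. The sketch below uses four made-up trades (the prices and timestamps are hypothetical, not taken from the Flat Files) and applies the same resample, last, and forward-fill sequence as analysis.py.

```python
import pandas as pd

# Hypothetical trades: irregular timestamps, with no trade during second 00:00:02.
trades = pd.DataFrame(
    {
        "time_exchange": pd.to_datetime(
            [
                "2026-02-01T00:00:00.100Z",
                "2026-02-01T00:00:00.900Z",
                "2026-02-01T00:00:01.500Z",
                "2026-02-01T00:00:03.200Z",
            ],
            utc=True,
        ),
        "price": [100.0, 101.0, 102.0, 104.0],
    }
)

# Last trade price in each 1-second bin; bins without trades become NaN.
bars = trades.set_index("time_exchange")["price"].resample("1s").last()

# Forward-fill carries the previous price into empty bins.
filled = bars.ffill()

print(pd.DataFrame({"last": bars, "filled": filled}))
```

The bin for 00:00:02 has no trade, so it is NaN before the fill and repeats 102.0 afterwards. A forward-filled bin yields a return of exactly zero, which is worth keeping in mind when many bins are empty.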

The RESAMPLE_FREQ parameter controls how granular your analysis is.

  • "1s" (1 second): Stable and easy to interpret (recommended for this tutorial)
  • "500ms" or "200ms": Captures finer market dynamics
  • "100ms" or lower: Reveals microstructure effects but may introduce noise

If the resolution is too coarse, you may miss lead–lag effects entirely.

If the resolution is too fine, data may become sparse and correlations unstable.

You can experiment by changing:

RESAMPLE_FREQ = "1s"

to:

RESAMPLE_FREQ = "200ms"

and rerunning the analysis.
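One way to judge whether a resolution is too fine is to count how many resampled bins contain no trade at all. The sketch below does this on a synthetic trade tape (300 random trades in one minute, an assumption made only for illustration); `empty_bin_share` is a hypothetical helper, not part of analysis.py.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Hypothetical trade tape: 300 trades at random times within one minute.
start = pd.Timestamp("2026-02-01T00:00:00Z")
offsets_ms = np.sort(rng.integers(0, 60_000, size=300))
times = start + pd.to_timedelta(offsets_ms, unit="ms")
prices = pd.Series(100 + rng.normal(0, 0.05, size=300).cumsum(), index=times)


def empty_bin_share(series, freq):
    """Fraction of resampled bins that contain no trade."""
    counts = series.resample(freq).count()
    return float((counts == 0).mean())


shares = {freq: empty_bin_share(prices, freq) for freq in ["1s", "500ms", "100ms"]}
for freq, share in shares.items():
    print(f"{freq:>6}: {share:.1%} of bins are empty")
```

On real data you could pass the price series built from each Flat File to the same helper. If most bins are empty at a given frequency, the forward-filled series consists mostly of zero returns and the correlations become less reliable.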

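The script shifts the OKX series, so a negative best lag would mean that Binance leads and a positive best lag that OKX leads. Here, at a 1-second resolution, the best lag is 0: no clear lead–lag relationship is observed between Binance and OKX, and price movements on both exchanges appear highly synchronized within each 1-second interval.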

However, this does not necessarily mean that no lead–lag exists. Instead, it suggests that any lead–lag effect likely occurs at a finer time resolution (e.g., milliseconds), which is not captured at the current aggregation level.
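To interpret a non-zero result, it helps to verify the sign convention of the lag on data where the answer is known. The sketch below builds two synthetic return series (purely hypothetical) in which `a` leads `b` by two steps, then applies the same `corr(... .shift(lag))` expression used in analysis.py, with `a` in the role of Binance and `b` in the role of OKX.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)

# Hypothetical returns: "a" is random noise and "b" repeats "a" two steps later
# (plus a little noise), so "a" leads "b" by 2 steps by construction.
n = 500
a = pd.Series(rng.normal(size=n))
b = a.shift(2) + 0.1 * pd.Series(rng.normal(size=n))

lags = list(range(-5, 6))
# Same expression as in analysis.py: first series fixed, second series shifted.
correlations = [a.corr(b.shift(lag)) for lag in lags]
best_lag = lags[int(np.nanargmax(correlations))]

print("Best lag:", best_lag)  # prints: Best lag: -2
```

With this convention, a negative best lag means the first series (Binance in the script) leads, and a positive best lag means the shifted series (OKX) leads.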

Repeat the analysis with different datasets.

Example:

Date     Leader    Lag
Day 1    Binance   1s
Day 2    OKX       0.5s
Day 3    None      ~0s
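A table like this can also be produced for windows within a single dataset. The following sketch is an assumption about how one might do it, not part of the original script: it generates two hours of synthetic 1-second returns in which leadership flips after the first hour, then estimates the best lag separately for each hour. The `best_lag` helper is hypothetical and mirrors the loop in analysis.py.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)

# Hypothetical 1-second returns for two hours. In hour 1, "a" leads "b" by 2 s;
# in hour 2 the roles flip and "b" leads "a" by 2 s.
index = pd.date_range("2026-02-01", periods=7200, freq="1s", tz="UTC")
base = pd.Series(rng.normal(size=len(index)), index=index)
delayed = base.shift(2).fillna(0.0)
first_hour = index < index[3600]
returns = pd.DataFrame(
    {
        "a": np.where(first_hour, base, delayed),
        "b": np.where(first_hour, delayed, base),
    },
    index=index,
)


def best_lag(window, max_lag=10):
    """Lag (in steps) that maximizes corr(a, b.shift(lag)) within one window."""
    lags = list(range(-max_lag, max_lag + 1))
    correlations = [window["a"].corr(window["b"].shift(lag)) for lag in lags]
    return lags[int(np.nanargmax(correlations))]


# One best-lag estimate per hourly window.
by_hour = {
    str(start): best_lag(window)
    for start, window in returns.groupby(pd.Grouper(freq="60min"))
}
print(by_hour)
```

In this synthetic setup the first hour yields -2 (`a` leads) and the second hour yields 2 (`b` leads), which mirrors how the leader can change from one period to the next in real data.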

Try running the analysis with different resolutions:

  • RESAMPLE_FREQ = "1s"
  • RESAMPLE_FREQ = "500ms"
  • RESAMPLE_FREQ = "200ms"

Compare the results across runs.

You may observe:

  • No clear leader at 1-second resolution
  • Small lead–lag effects emerging at sub-second resolution

This demonstrates how market microstructure becomes more visible at finer time scales.
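The effect of resolution can be reproduced on synthetic data where the true delay is known. In the sketch below, which is an illustration under stated assumptions rather than a result from the Flat Files, exchange `b` shows every price exactly 300 ms after exchange `a`. The `best_lag` helper is hypothetical and follows the same steps as analysis.py.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(7)

# Hypothetical data: a random-walk price on a 100 ms grid for 10 minutes.
# Exchange "a" shows each price immediately; exchange "b" shows it 300 ms later.
index = pd.date_range("2026-02-01", periods=6000, freq="100ms", tz="UTC")
a = pd.Series(100 + rng.normal(0, 0.01, size=len(index)).cumsum(), index=index)
b = a.shift(3).dropna()


def best_lag(series_a, series_b, freq, max_lag=10):
    """Best lag in steps of `freq`, using the same logic as analysis.py."""
    combined = pd.concat(
        [series_a.resample(freq).last(), series_b.resample(freq).last()], axis=1
    )
    combined.columns = ["a", "b"]
    returns = combined.ffill().dropna().pct_change().dropna()
    lags = list(range(-max_lag, max_lag + 1))
    correlations = [returns["a"].corr(returns["b"].shift(lag)) for lag in lags]
    return lags[int(np.nanargmax(correlations))]


results = {}
for freq in ["1s", "500ms", "100ms"]:
    steps = best_lag(a, b, freq)
    results[freq] = steps
    # Convert lag steps into milliseconds so runs at different resolutions are comparable.
    step_ms = pd.Timedelta(freq) // pd.Timedelta("1ms")
    print(f"{freq:>6}: best lag = {steps} steps ({steps * step_ms} ms)")
```

With this setup, the 1-second run reports a best lag of 0, the 500 ms run reports -1 step, and the 100 ms run recovers the 300 ms delay as -3 steps. The delay was present all along; the coarser grids simply could not resolve it.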

Key insights:

  • Lead–lag relationships are highly dependent on the chosen time resolution.
  • At coarser resolutions (e.g., 1 second), markets may appear perfectly synchronized.
  • At finer resolutions (e.g., milliseconds), small but meaningful delays between exchanges can become visible.
  • This reinforces that there is no permanently "leading" exchange: observed leadership depends on both market conditions and the level of analysis.

Best practices:

  • Do not compare BTC/USD with BTC/USDT.
  • Always use the same quote currency on both exchanges.
  • Use high-liquidity exchanges.
  • Remember that results depend on the time period analyzed.

In this tutorial, you used Flat Files to:

  • analyze trade-level data
  • measure lead–lag between exchanges
  • observe how price discovery propagates

The most important takeaway:

There is no single “fastest” exchange — the observed lead–lag relationship depends not only on market conditions, but also on the time resolution used in the analysis.