Phantom symbols are instruments that show up in your symbol list or dataset due to metadata, mapping, or normalization artifacts, but that do not represent a genuinely tradeable instrument at the time you are analyzing. They can be created by symbol renames, duplicated identifiers, stale catalog entries, merged markets, or incorrect joins across exchanges and instrument types.
Including phantom symbols can distort universe size, coverage statistics, and cross-sectional metrics. In backtests, they can create impossible trades (orders on an instrument that didn’t exist) or double-count exposure (the same market represented twice). Phantom symbols also waste API calls and complicate data quality debugging.
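Typical causes include symbol renames, duplicated identifiers, stale catalog entries, merged markets, and incorrect joins across exchanges and instrument types.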
Use stable identifiers (for example, an exchange+instrument-specific symbol ID) and validate that each symbol has a plausible coverage window for the data type you use. When building a point-in-time universe, require that the symbol is eligible at time t and that your requested timestamps lie within its coverage window. Deduplicate using well-defined keys and test joins on small samples.
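The coverage-window check described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the `CatalogEntry` fields (`symbol_id`, `available_from`, `available_to`), the helper names, and the sample IDs are all hypothetical and should be mapped onto whatever your catalog actually provides.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class CatalogEntry:
    # All field names are hypothetical; adapt them to your catalog schema.
    symbol_id: str                     # stable exchange+instrument-specific ID
    available_from: datetime           # first timestamp with data
    available_to: Optional[datetime]   # None means coverage is still open


def covers(entry: CatalogEntry, t: datetime) -> bool:
    """Return True if t lies inside the entry's coverage window."""
    if t < entry.available_from:
        return False
    return entry.available_to is None or t <= entry.available_to


def universe_at(catalog: list[CatalogEntry], t: datetime) -> list[str]:
    """Point-in-time universe: IDs of symbols whose window contains t."""
    return sorted(e.symbol_id for e in catalog if covers(e, t))


def utc(year: int, month: int, day: int) -> datetime:
    """Build a timezone-aware UTC timestamp for the examples below."""
    return datetime(year, month, day, tzinfo=timezone.utc)


# Made-up sample catalog for illustration only.
catalog = [
    CatalogEntry("exch1:spot:1", utc(2020, 1, 1), None),
    CatalogEntry("exch1:spot:2", utc(2021, 6, 1), utc(2022, 3, 1)),
    CatalogEntry("exch1:spot:3", utc(2023, 1, 1), None),
]

universe_mid_2021 = universe_at(catalog, utc(2021, 7, 1))
```

Here `universe_mid_2021` contains only the first two IDs, because the third symbol's window has not yet opened. Requesting data for a timestamp outside a symbol's window is the situation the check is meant to reject.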
No, phantom symbols are not the same as delisted symbols. Delisted (historical) symbols were real instruments that existed and traded in the past. Phantom symbols are artifacts: they appear in your lists but are not valid instruments for the time period you’re analyzing, or they duplicate a real market due to mapping issues.
In backtests, phantom symbols can inflate apparent diversification, create unrealistic opportunity sets, and introduce spurious returns and risks. For example, double-counting a market can make a strategy look smoother than it should, while trading on a symbol that never existed can create fills that could not have occurred.
Use point-in-time universe construction, stable symbol IDs, and strict join keys that include exchange and instrument type. Enforce coverage-window validation and deduplicate by identifier. Finally, add sanity checks (for example, expected counts per venue) to catch anomalies early.
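As a rough sketch of the deduplication and sanity-check steps, the snippet below keys each catalog row on exchange, instrument type, and symbol ID, then flags venues whose symbol count falls outside an expected range. The row layout, the function names, and the expected ranges are assumptions made for illustration; the right bounds depend on your own venues.

```python
from collections import Counter

# Hypothetical strict key: exchange + instrument type + stable symbol ID.
KEY_FIELDS = ("exchange", "instrument_type", "symbol_id")


def dedupe(rows, key_fields=KEY_FIELDS):
    """Keep the first row seen for each strict key."""
    seen = {}
    for row in rows:
        key = tuple(row[f] for f in key_fields)
        seen.setdefault(key, row)
    return list(seen.values())


def venue_count_anomalies(rows, expected):
    """Return {venue: count} for venues whose count is outside (low, high)."""
    counts = Counter(row["exchange"] for row in rows)
    return {
        venue: counts.get(venue, 0)
        for venue, (low, high) in expected.items()
        if not low <= counts.get(venue, 0) <= high
    }


# Made-up sample rows; the second row duplicates the first.
rows = [
    {"exchange": "exch1", "instrument_type": "spot", "symbol_id": 101},
    {"exchange": "exch1", "instrument_type": "spot", "symbol_id": 101},
    {"exchange": "exch1", "instrument_type": "perp", "symbol_id": 101},
    {"exchange": "exch2", "instrument_type": "spot", "symbol_id": 7},
]

unique_rows = dedupe(rows)
anomalies = venue_count_anomalies(unique_rows, {"exch1": (1, 2), "exch2": (5, 50)})
```

Including the instrument type in the key keeps the spot and perpetual rows distinct even though they share an ID, while the exact duplicate is dropped. In this sample only `exch2` is flagged, since one symbol is below its assumed minimum of five.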
An exchange renames a spot pair from ABC/USDT to ABCUSDT and both strings appear in a current catalog. If you join historical trades by base/quote assets and treat both as separate symbols, you may double-count volume and simulate holding two positions that are actually the same market.
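The double-counting in this scenario can be reproduced with a small sketch. It assumes, hypothetically, that the catalog exposes a stable `symbol_id` shared by the old and new strings; the field names, the ID value, and the trade volumes are invented for illustration.

```python
# Both strings describe the same market; symbol_id 101 is a made-up stable ID.
catalog = [
    {"symbol": "ABC/USDT", "base": "ABC", "quote": "USDT", "symbol_id": 101},
    {"symbol": "ABCUSDT", "base": "ABC", "quote": "USDT", "symbol_id": 101},
]

trades = [
    {"base": "ABC", "quote": "USDT", "volume": 5.0},
    {"base": "ABC", "quote": "USDT", "volume": 3.0},
]


def join_by_assets(trades, catalog):
    """Naive join on (base, quote): one output row per matching catalog entry."""
    return [
        {**trade, "symbol": entry["symbol"]}
        for trade in trades
        for entry in catalog
        if (trade["base"], trade["quote"]) == (entry["base"], entry["quote"])
    ]


def collapse_by_id(catalog):
    """Keep one catalog entry per stable ID (the last one listed wins)."""
    by_id = {}
    for entry in catalog:
        by_id[entry["symbol_id"]] = entry
    return list(by_id.values())


naive_volume = sum(row["volume"] for row in join_by_assets(trades, catalog))
clean_volume = sum(
    row["volume"] for row in join_by_assets(trades, collapse_by_id(catalog))
)
```

With the duplicated catalog, each trade matches two entries, so `naive_volume` is twice `clean_volume`. Collapsing the catalog by stable ID before the join removes the phantom copy.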