Olsen Data Filter

Human input errors, as well as the automated quoting algorithms run by market makers, inevitably produce bad data, especially in non-binding quotes for OTC instruments. Exchange-traded instruments suffer somewhat less from these problems, but exchange tick data (transactions as well as quotes) also shows signs of contamination.

Traders watching real-time data graphs on-screen filter out bad data intuitively. Writing filter algorithms may therefore seem trivial, but experience shows how quickly changing markets can make a filter appear either too conservative or too permissive. Developing good filter algorithms borders on work in artificial intelligence; the ability to adapt to rapidly evolving market conditions is paramount.

Any analysis of high-frequency data, as well as risk-management exercises such as VaR calculations, is strongly affected by bad data. Olsen learned this 20 years ago when it began publishing the first papers investigating the behavior of markets at high frequency.

The filter component is embedded in the collector process. The Repository writes every tick that arrives from the live data interface, but it assigns each tick a credibility number between 0 and 1 (the higher the credibility, the more trustworthy the tick). A credibility of 0.5 or above can be regarded as the default threshold. The data retrieval system, RIDE, can however extract data at a custom credibility level, allowing the user to choose how stringent the filtering should be. So, while the credibility number is assigned when the data is written, filtering happens only when the data is retrieved for use in an application.
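
Retrieval-time filtering thus amounts to a simple threshold test on the stored credibility number. The sketch below illustrates the idea in Python; the Tick structure, its field names, and the load_ticks helper are illustrative assumptions and do not reflect the actual RIDE interface.

    from dataclasses import dataclass
    from typing import Iterable, Iterator

    @dataclass
    class Tick:
        timestamp: float     # seconds since epoch
        instrument: str      # e.g. "EUR/USD"
        price: float
        credibility: float   # assigned by the filter at write time, in [0, 1]

    def filter_ticks(ticks: Iterable[Tick], threshold: float = 0.5) -> Iterator[Tick]:
        """Yield only ticks whose credibility meets the caller-chosen threshold."""
        for tick in ticks:
            if tick.credibility >= threshold:
                yield tick

    # A stricter application simply raises the threshold; the stored data
    # itself is never altered:
    #     clean = list(filter_ticks(load_ticks("EUR/USD"), threshold=0.8))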

The embedded filter also maintains a checkpoint of its state for each of the thousands of instruments being collected. If the system has to be restarted, the latest filter state can be recovered from these checkpoints.
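
As a rough illustration of the checkpoint mechanism, the sketch below persists and restores per-instrument filter state. The JSON-file layout, directory name, and function names are assumptions for illustration only; the source does not describe the collector's actual storage format.

    import json
    from pathlib import Path
    from typing import Optional

    CHECKPOINT_DIR = Path("filter_checkpoints")   # hypothetical location

    def save_checkpoint(instrument: str, state: dict) -> None:
        """Persist the current filter state for one instrument."""
        CHECKPOINT_DIR.mkdir(exist_ok=True)
        path = CHECKPOINT_DIR / (instrument.replace("/", "_") + ".json")
        path.write_text(json.dumps(state))

    def load_checkpoint(instrument: str) -> Optional[dict]:
        """Recover the latest filter state after a restart, if a checkpoint exists."""
        path = CHECKPOINT_DIR / (instrument.replace("/", "_") + ".json")
        if path.exists():
            return json.loads(path.read_text())
        return None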


Technical Paper

See the published technical paper on the Olsen Data Filter.