Using Execution Benchmarks - Why?

"However beautiful the strategy, you should occasionally look at the results"
Sir Winston Churchill

Using Execution Benchmarks - why bother?

Benchmarking execution in FX has become standard practice. Benchmarks provide the reference points against which performance is measured, and performance needs to be viewed in both absolute and relative terms. When reviewing execution quality as part of a best execution policy, it is necessary to compare different counterparties, methods of execution (e.g. RFQ, streaming, voice), liquidity venues and execution products (e.g. algos). Such a relative comparison requires the use of standardised benchmarks.

So using benchmarks is a no-brainer. However, selecting appropriate benchmarks, and ensuring they are computed with a standard methodology and consistent market data, is far from a no-brainer. Indeed, it can become a minefield of opaque complexity that is extremely difficult to navigate. In this article we explore some of the larger landmines to look out for, and offer some suggestions on how to approach benchmark selection.

Benchmark Types

Let’s first explore the taxonomy of execution benchmarks. There are many different types of benchmark out there, but they broadly fall into two categories, as illustrated in the figure below: Point In Time or Average. The first of these categories can be further divided into three sub-categories: Risk Transfer, Arrival Price and Fixings. One could argue that fixings such as the WMR Fix are technically not point-in-time benchmarks, as this rate is now computed as the median observed over a 5 minute window. For the sake of simplicity, however, we’ll include WMR in this sub-category together with other ‘snapshots’ taken during the day, such as the ECB Fix.


Even within relatively simple benchmark types, such as TWAP (time-weighted average price), there are layers of complexity. For example, what duration should be chosen? Two types of TWAP are often referred to: Interval and Fixed Period. Interval TWAP is the rate computed over exactly the period that the order was active in the market, whereas Fixed Period TWAPs are computed over fixed durations such as 1 hour or 24 hours.
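To make the distinction concrete, here is a minimal Python sketch that computes both flavours from the same series of mids. The sample data, the one-minute sampling frequency and the function name `twap` are illustrative assumptions rather than any provider's methodology, and the equal weighting is only valid because the samples are evenly spaced.

```python
from datetime import datetime, timedelta
from statistics import fmean


def twap(samples, start, end):
    """Equal-weighted average of mids with start <= timestamp < end.

    Assumes the samples are evenly spaced in time, so a simple mean
    is a time-weighted mean.
    """
    window = [mid for ts, mid in samples if start <= ts < end]
    if not window:
        raise ValueError("no samples in the requested window")
    return fmean(window)


# Hypothetical mids, sampled once a minute for one hour from 09:00.
t0 = datetime(2024, 1, 2, 9, 0)
samples = [(t0 + timedelta(minutes=i), 1.1000 + 0.0001 * i) for i in range(60)]

# Hypothetical order, active in the market from 09:10 to 09:20.
order_start = t0 + timedelta(minutes=10)
order_end = t0 + timedelta(minutes=20)

# Interval TWAP: only the period the order was active.
interval_twap = twap(samples, order_start, order_end)

# Fixed Period TWAP: the full hour, regardless of the order's lifetime.
fixed_twap = twap(samples, t0, t0 + timedelta(hours=1))

print(f"Interval TWAP:     {interval_twap:.5f}")
print(f"Fixed Period TWAP: {fixed_twap:.5f}")
```

In a trending market like this made-up one, the two values differ by well over a pip, so the same fill could look good against one and poor against the other.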

OK, so Arrival Price is just Arrival Price, isn’t it? Not necessarily. There are many different points in the trade lifecycle where an Arrival Price can be snapped. For measuring implementation shortfall, for example, you really should be measuring Arrival Price at multiple decision points: when the portfolio manager first decides to trade, when the order arrives at the execution desk, when the order is placed into the market and when the order is filled. In terms of basic performance measurement, it is the Arrival Price snapped at the last of these time stamps that is typically used. Either way, care should be taken to make sure the definition in use is understood.
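The effect of the snapshot time can be sketched in a few lines of Python. The stage names, the mids and the execution price below are invented for illustration, and `slippage_bps` is a hypothetical helper using one common sign convention (positive means the fill was worse than the benchmark).

```python
def slippage_bps(exec_price, benchmark, side):
    """Signed cost versus a benchmark in basis points.

    Positive means the execution was worse than the benchmark:
    a buy above it, or a sell below it.
    """
    sign = 1 if side == "buy" else -1
    return sign * (exec_price - benchmark) / benchmark * 1e4


# Hypothetical Arrival Price mids snapped at each decision point
# for a single buy order in a rising market.
arrival_mids = {
    "pm_decision": 1.1000,       # portfolio manager decides to trade
    "desk_arrival": 1.1004,      # order reaches the execution desk
    "market_placement": 1.1007,  # order is placed into the market
    "fill": 1.1009,              # order is filled
}
exec_price = 1.1010

costs = {
    stage: slippage_bps(exec_price, mid, "buy")
    for stage, mid in arrival_mids.items()
}

for stage, cost in costs.items():
    print(f"{stage:<17} {cost:6.2f} bps")
```

The same fill costs roughly 9 bps against the portfolio manager's decision price and under 1 bp against the mid at the time of the fill, which is why the time stamp behind any reported Arrival Price needs to be stated.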

Data Sources & Methodologies

OK, so the time at which you measure your Arrival Price is agreed. Surely that must be it? No further cause for confusion? Well, not necessarily. Exactly what rate is taken at that precise time stamp? Is it mid, bid or offer? In our view, market convention should be that it is always the mid. However, this is not always the case, as there are examples of post-trade reports out there that use an Arrival Price based on bid or offer. There’s nothing that says this is technically incorrect; it is just important that the convention is totally transparent. A bigger issue is that it becomes impossible to compare performance versus Arrival Price across providers if one uses bid/offer and another uses mid.
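As a rough illustration of how much the convention matters, the sketch below measures one hypothetical buy fill against the same quote under two conventions. The `Quote` class, the convention label "touch" (offer for a buy, bid for a sell) and all prices are assumptions made for this example only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Quote:
    bid: float
    offer: float

    @property
    def mid(self) -> float:
        return (self.bid + self.offer) / 2


def arrival_price(quote, convention, side):
    """Return the benchmark rate under a given (hypothetical) convention."""
    if convention == "mid":
        return quote.mid
    if convention == "touch":
        # Side of the book the order would cross: offer for buys, bid for sells.
        return quote.offer if side == "buy" else quote.bid
    raise ValueError(f"unknown convention: {convention}")


def cost_bps(exec_price, benchmark, side):
    """Signed cost in basis points; positive is worse than the benchmark."""
    sign = 1 if side == "buy" else -1
    return sign * (exec_price - benchmark) / benchmark * 1e4


# Hypothetical quote at the arrival time stamp and a buy fill.
quote = Quote(bid=1.1000, offer=1.1002)
exec_price = 1.10025

vs_mid = cost_bps(exec_price, arrival_price(quote, "mid", "buy"), "buy")
vs_touch = cost_bps(exec_price, arrival_price(quote, "touch", "buy"), "buy")

print(f"Cost vs mid:   {vs_mid:.2f} bps")
print(f"Cost vs offer: {vs_touch:.2f} bps")
```

With these made-up numbers the identical fill shows about 1.4 bps of cost against mid but under 0.5 bps against the offer, so two reports using different conventions are not comparable without adjustment.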

Surely, once the time stamp and rate are agreed, that is it? No, unfortunately not. There’s still the issue of the data source used to compute the Arrival Price mid (or bid or offer). Each liquidity provider computes its own estimate of mid based on multiple price sources, often using algorithms that take into account fill ratios and the quality of available liquidity. For liquid currencies, during liquid times of the day, you would expect everyone’s mids to be pretty similar. However, in more volatile and less liquid conditions, or for less liquid pairs, differences will arise.

Unlike the Equity market, FX doesn’t have a ‘tape’, i.e. a central price source from which an ‘official’ market rate is published. There are discussions commencing in the industry about developing such a tape for FX, although it is probably fair to assume that this will take considerable time to implement, even if it did have the support of all the major market participants. There are a number of other initiatives seeking to establish a standard market mid for FX, which do at least provide a solution to allow comparative analysis.

And isn’t TWAP a simple, standard calculation, meaning that all liquidity providers must be using the same numbers in their post-trade reports? Nope, unfortunately not. The source of the data is clearly one way that TWAPs will differ, as explored above, but another key factor is the precise methodology. For example, you could construct a TWAP from quoted mid rates sampled at different frequencies. Even if you sample at the same frequency, e.g. every second, do you sample at the beginning of the second, in the middle or at the end? You could also use bid/offer quote data, or even actual traded prices (paid/given), instead of mid data. There are many reasons why TWAPs can differ, which again makes it difficult to perform a like-for-like comparison of performance.
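The sampling question can be illustrated with a short Python sketch. It takes a handful of invented mid-quote updates and builds a one-second TWAP three ways, sampling at the start, middle or end of each second and using the last quote at or before each sample time. The function `sampled_twap`, its `offset` parameter and the tick data are all hypothetical.

```python
from bisect import bisect_right
from statistics import fmean


def sampled_twap(ticks, start, end, step=1.0, offset=0.0):
    """TWAP from the last mid at or before each sample time.

    ticks:  list of (time_in_seconds, mid), sorted by time
    offset: where in each step to sample (0 = start, step / 2 = middle,
            step = end)
    """
    times = [t for t, _ in ticks]
    n_samples = round((end - start) / step)
    sampled = []
    for n in range(n_samples):
        sample_time = start + n * step + offset
        i = bisect_right(times, sample_time) - 1  # last tick <= sample_time
        if i >= 0:
            sampled.append(ticks[i][1])
    if not sampled:
        raise ValueError("no quotes available at any sample time")
    return fmean(sampled)


# Hypothetical mid updates over a three-second window.
ticks = [(0.0, 1.1000), (0.4, 1.1002), (1.2, 1.1001), (1.7, 1.1005), (2.6, 1.1003)]

twap_start = sampled_twap(ticks, 0.0, 3.0, offset=0.0)
twap_middle = sampled_twap(ticks, 0.0, 3.0, offset=0.5)
twap_end = sampled_twap(ticks, 0.0, 3.0, offset=1.0)

print(f"Sampled at start of second:  {twap_start:.6f}")
print(f"Sampled at middle of second: {twap_middle:.6f}")
print(f"Sampled at end of second:    {twap_end:.6f}")
```

Even on identical data, the three choices give three different TWAPs. Over a longer window the gaps would usually shrink, but they need not vanish, and they compound with any differences in data source.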

Benchmark Selection

A commonly asked question is: what benchmark should be used? There is no one right answer, as it depends on a number of factors, including the purpose of the transaction, the objectives of the underlying portfolio and the mandate of the execution desk. It is our view, therefore, that benchmark selection is a client-specific process that should be explained and justified in the best execution policy. It is very likely that, within the same institution, a number of different benchmarks would be appropriate for the varying range of portfolio mandates. Indeed, for an individual portfolio, there is an argument that more than one benchmark would be valid, to allow for measurement of different performance metrics.

Broadly speaking, in terms of trading purpose, FX transactions can be divided into 3 categories:

  1. Alpha – trades that are initiated with the objective of profit maximisation, over a range of different time horizons

  2. Funding – trades that are required in order to fund the purchase of securities or pay dividends etc

  3. Hedging – trades that are required to hedge foreign currency exposure of assets and liabilities back into the base currency of the portfolio or institution

Each category lends itself to different benchmark selection and it may be possible to group appropriate benchmark types for each, as exemplified in the diagram below.


So, for example, it would not make a lot of sense for an alpha trade, where the investment horizon is a matter of minutes, to be solely benchmarked to a 24 hour TWAP measure. In this case, Arrival Price and/or a Fair Value Risk Transfer may be more appropriate. Conversely, for a large funding trade conducted over an entire day, where the objective is to achieve as close to the day’s average as possible, Arrival Price would not be appropriate, whereas an Interval TWAP or VWAP may be better suited. The ‘no one size fits all’ cliché is very valid in the realms of benchmark selection for FX.


The world of execution benchmarks would appear to be a little more involved than one would think. It is clearly an area of the best execution process that requires careful thought at a strategic level in terms of selection, but also at a more tactical level when measuring and comparing performance on an ongoing basis, to ensure that fair comparisons and conclusions are drawn. Selection needs to be client and portfolio specific, so it is important that any post-trade analytics allow for this flexibility. The chosen benchmarks can also be supplemented with other metrics, including measures of market impact and spread earning, to help provide a complete, holistic view of execution performance.

Navigating the potential pitfalls of the benchmark maze is critical in order to ensure appropriate usage of benchmarks, which are a necessary component of any best execution process. ‘You can only manage what you can measure’ is another very appropriate cliché, as benchmarks are key to the measurement process. Careful selection and judicious usage of benchmarks provide the foundation for you to analyse the results of your trading strategies with confidence.