Cluster Analysis for Evaluating Trading Strategies [1]




CONTRIBUTORS
Jeff Bacidore, Managing Director, Head of Algorithmic Trading, ITG, Inc. Jeff.Bacidore@itg.com +1.212.588.4327
Kathryn Berkow, Quantitative Analyst, Algorithmic Trading, ITG, Inc. Kathryn.Berkow@itg.com +1.212.444.6146
Ben Polidore, Director, Algorithmic Trading, ITG, Inc. Benjamin.Polidore@itg.com +1.212.323.3408
Nigam Saraiya, Vice President, Algorithmic Trading, ITG, Inc. Nigam.Saraiya@itg.com 1.212.444.6479

CONTACT
Asia Pacific +852.2846.3500
Canada +1.416.874.0900
EMEA +44.20.7670.4000
United States +1.212.588.4000
info@itg.com
www.itg.com

ABSTRACT
In this paper, we introduce a new methodology that empirically identifies the primary strategies used by a trader using only post-trade fill data. To do this, we apply a well-established statistical clustering technique called k-means to a sample of progress charts. Each chart shows the portion of an order completed by each point in the day and thus serves as a measure of the trade's aggressiveness. Our methodology identifies the primary strategies used by a trader and determines which strategy the trader used for each order in the sample. Once the strategy used for each order has been identified, trading cost analysis (TCA) can be done by strategy. We also discuss ways to exploit this technique to characterize trader behavior, assess trader performance, and suggest the appropriate benchmarks for each distinct trading strategy.

BACKGROUND
Assessing trader performance is challenging because traders often vary their strategies depending on the objectives of each trade. For example, when orders are benchmarked to the open, traders may front-load their trades, perhaps executing a large portion of the trade in the opening auction. For larger, more impactful orders, traders may choose to trade more passively, stretching the order over a longer period of time. Ideally, trading cost analysis (TCA) should take into account the trader's underlying strategy.
In reality, doing so is challenging because (1) it is often unclear how to characterize the underlying strategies used by the trader, and (2) even if the strategies were known, determining which orders belong to which strategy can be difficult if that information is not captured in post-trade databases. In light of these challenges, one common approach to assessing trader performance is to group trades by algorithm as a proxy for the trader's underlying strategy. If traders use specific algorithms to meet their objectives (e.g., Close algorithms for trades benchmarked to the close, VWAP algorithms for trades benchmarked to VWAP), this approach makes sense because the algorithm is the strategy. However, high-touch traders often use algorithms as tactics rather than strategies, switching between different algorithms within a given order. As a result, TCA by algorithm will not yield information about the effectiveness of the trader's hybrid strategy.

[1] This is the submitted version of the following article: "Cluster Analysis for Evaluating Trading Strategies," Jeff Bacidore, Kathryn Berkow, Ben Polidore, and Nigam Saraiya, The Journal of Trading, Vol. 7, No. 3, 2012, Institutional Investor, Inc., which has been published in final form at: www.iijournals.com/doi/abs/10.3905/jot.2012.7.3.006

Another commonly used approach is to evaluate traders' performance in the context of their average aggressiveness. For example, one could look at the average progress chart of a trader to see how passively or aggressively the trader tends to work orders, and assess performance in that context. Such averages may not be meaningful, however, because they aggregate across underlying strategies. For example, Figure 1 shows the aggregate fill progress chart for a single trader. From the graph, it would appear that this trader's underlying strategy is VWAP. In reality, however, this trader may have used multiple strategies that resemble VWAP in aggregate, even if the trader never actually targeted full-day VWAP on a single order.

Figure 1. An example of the aggregate fill progress chart for all orders in a sample dataset. The horizontal axis represents time, from 9:30-9:45 AM (bin 1) to 3:45-4:00 PM (bin 26); the vertical axis represents the percent of the order completed.

Analyzing trader performance correctly requires first identifying the different underlying strategies used by a trader and then aggregating orders by these strategies. In this paper, we present a new methodology that allows us both to identify the core trading strategies used by a trader and to classify each of the trader's orders into these strategies empirically, without having to tag orders prior to execution. To do this, we first create a progress chart for each order and then apply a well-established statistical clustering methodology called k-means to identify the primary strategies used to execute these orders. K-means assigns each order to one of the strategies, allowing for analysis by strategy. This new approach to identifying trading strategies can be very useful when doing TCA, especially for high-touch trading.
First, our methodology can identify the underlying strategies used by each trader. Because the strategies are identified from the data rather than defined in advance, new strategies will be uncovered even as traders change them over time. Second, for desks with multiple traders, our approach can be used to report which strategies the desk uses as a whole and to break down strategy usage by trader. Third, this type of granular, trader-level analysis allows desks to assess relative trader performance as a means to share best practices, rather than simply measuring which trader is best. In particular, this analysis not only identifies which traders outperformed, but also helps explain why they outperformed. Finally, since these strategies can be represented graphically, we can infer what the trader's benchmark may have been for a given trade. For example, for highly front-loaded trades, the open may be the most relevant benchmark, while for back-loaded trades, the closing price may be more appropriate. As noted before, all of this can be done empirically on a post-trade basis, so our approach does not require traders to enter additional data or systems to be adapted to accommodate new post-trade strategy information.
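The claim about Figure 1, that a VWAP-looking aggregate can hide very different strategies, can be checked numerically. The sketch below uses two made-up 26-bin progress charts, one front-loaded and one back-loaded. It shows that their equal-weighted average coincides with a straight-line chart, even though neither individual order was worked that way. Both curve shapes are hypothetical, chosen only for illustration. The straight line in clock time is an assumed stand-in for a VWAP-like schedule; an actual VWAP schedule follows the intraday volume profile.

```python
import numpy as np

N_BINS = 26  # 15-minute bins from 9:30 AM to 4:00 PM

# Fraction of the trading day elapsed at the end of each bin: 1/26, 2/26, ..., 1.
t = np.arange(1, N_BINS + 1) / N_BINS

# Hypothetical progress charts (cumulative fraction of the order filled).
front_loaded = 2 * t - t**2   # concave: fast early, slow late
back_loaded = t**2            # convex: slow early, fast late
uniform = t                   # straight line, an assumed rough proxy for a VWAP-like schedule

# Aggregate chart for a trader who splits orders evenly between the two styles.
aggregate = (front_loaded + back_loaded) / 2


def max_gap(chart, reference=uniform):
    """Largest vertical distance between a progress chart and a reference chart."""
    return float(np.max(np.abs(chart - reference)))


print(f"front-loaded vs uniform: {max_gap(front_loaded):.3f}")
print(f"back-loaded  vs uniform: {max_gap(back_loaded):.3f}")
print(f"aggregate    vs uniform: {max_gap(aggregate):.3f}")
```

Each individual chart deviates from the straight line by a quarter of the order at midday. The aggregate deviates by essentially nothing, so the average alone would suggest a single VWAP-like strategy.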

METHODOLOGY
Our methodology uses the intuition of a progress chart to characterize a trading strategy, but applies a common clustering technique called k-means to divide the aggregate strategy into its component strategies, in the same way a prism divides light into its component colors (as shown in Figure 2). The process begins by creating a progress chart for each order. Specifically, for each 15-minute period in the trading day (26 in total), it computes the cumulative fraction of the order completed by the end of that period, i.e., the progress of the order at that point. The trading strategy itself is represented by the collection of these 26 progress points, an example of which is given in Figure 1. These charts always begin at 0% and end at 100%, and they never decrease as we move from left to right along the x-axis, since they represent the order's cumulative fill progress over the day. We then apply k-means to group them into k distinct trading strategies.

Figure 2. The methodology takes an aggregate progress chart and splits it into its underlying component strategies.

To understand intuitively how k-means works, assume that we break the trading day into 3 bins instead of 26. For each order, we determine the percent of the order that was complete at the end of each bin. For example, suppose the trader worked a 10,000-share order by executing 2,000 shares in bin 1, 1,000 shares in bin 2, and 7,000 shares in bin 3. Our methodology would characterize this order as a progress chart with the values 20%, 30%, and 100%, representing the percent complete at the end of each bin. Since all orders are complete by the end of the last bin, all orders have a value of 100% in bin 3. For this reason, we need only look at the progress at the end of the first two bins when attempting to distinguish between strategies.[2]

In Figure 3, we plot a sample of orders, where each black dot on the graph represents an order.
The x-axis represents the percent of the order completed by the end of bin 1, and the y-axis represents the percent completed by the end of bin 2. The 10,000-share order in the example above is represented as the dot labeled X in Figure 3A. Since this order was 20% complete at the end of bin 1 and 30% complete by the end of bin 2, the point has an x-axis value of 20% and a y-axis value of 30%.

[2] Adding the third bin, where all orders take on a value of 100%, to the k-means methodology provides no useful information for differentiating how the different orders were traded. One can therefore exclude the third bin from the k-means methodology without influencing the results.

Figure 3. Illustration of the k-means algorithm. In Figure 3A, the black dots are the existing, classified observations. The triangle in Figure 3B represents a new order that must be classified, and the squares represent the centers of the two existing clusters. The grey arrows show the distance between the new point and the existing cluster centers. The algorithm assigns the new point to the cluster whose center is the shortest distance from it. The black squares in Figure 3C represent the original cluster centers. The grey square is the updated center of the cluster with the additional order.

Looking at Figure 3A, there are clearly two distinct groups of dots: one cluster in the lower left quadrant and another in the upper right quadrant. Intuitively, these clusters represent the two distinct strategies that the trader used. The former represents orders that were executed slowly, i.e., those that had made relatively little progress by the end of both bin 1 (x-axis) and bin 2 (y-axis). The latter represents orders that were executed more quickly, where progress in both bin 1 and bin 2 is significantly higher.

In two dimensions with a small amount of data, one could do cluster analysis visually, as in Figure 3A. When the data set is large or the number of dimensions is higher, one must rely on statistical techniques to perform the clustering. That is the case here, where we could have thousands of orders, each split into 26 distinct bins. This is where the k-means methodology comes into play. The k-means algorithm begins by assigning k initial cluster centers, which can be specified by the user or selected randomly by the algorithm.
The algorithm then works iteratively through the sample, using a distance metric to assign each observation to the nearest cluster. Figure 3B provides an example of an iteration of k-means. Suppose we add a new observation, represented by the triangle in Figure 3B. K-means computes the distance between that point and the two existing cluster centers, represented by the squares in Figure 3B, to determine the nearest cluster. Since the triangle is closer to the left cluster, k-means assigns it to the left cluster. With the addition of a new data point, however, k-means must now compute a new cluster center. Figure 3C shows the new cluster center, represented by the grey square, which has shifted in the direction of the new observation. When the cluster centers and the assignments of observations stop changing materially, the algorithm stops. At this point, the output contains the k cluster centers, which can be used to characterize each group, as well as the assignment of each observation to a cluster.[3] In our specific application, the center of a group is the average progress chart of that strategy, and the assignments indicate the strategy that each order most closely resembles.

[3] See Johnson & Wichern (2007) and MacQueen (1967) for a detailed discussion of k-means.
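The steps in this section can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation. `progress_chart` turns per-bin filled shares into a cumulative progress chart and reproduces the 20%, 30%, 100% example above. `kmeans` first makes a MacQueen-style pass that mirrors Figures 3B and 3C: each order is assigned to its nearest center, and that center moves toward it as a running mean. It then refines with Lloyd iterations until no assignment changes. The function names, the Euclidean distance (the text says only "a distance metric"), seeding with the first k orders, and the toy orders are all hypothetical choices made for this sketch.

```python
import numpy as np


def progress_chart(shares_per_bin):
    """Cumulative fraction of the order filled by the end of each time bin."""
    fills = np.asarray(shares_per_bin, dtype=float)
    return np.cumsum(fills) / fills.sum()


def nearest_center(x, centers):
    """Index of the center closest to x in Euclidean distance (as in Figure 3B)."""
    return int(np.argmin(np.linalg.norm(centers - x, axis=1)))


def add_to_center(center, count, x):
    """Running-mean update of a center that already averages `count` orders (Figure 3C)."""
    return center + (x - center) / (count + 1)


def kmeans(charts, k, max_iter=100):
    """Group progress charts into k strategies; returns (centers, labels)."""
    X = np.asarray(charts, dtype=float)

    # Seeding pass (MacQueen-style, assumed): the first k charts become the
    # initial centers; each later chart joins its nearest center, which moves toward it.
    centers = X[:k].copy()
    counts = np.ones(k, dtype=int)
    for x in X[k:]:
        j = nearest_center(x, centers)
        centers[j] = add_to_center(centers[j], counts[j], x)
        counts[j] += 1

    # Refinement (Lloyd-style): reassign every chart, recompute each center as
    # the mean of its members, and stop once no assignment changes.
    labels = None
    for _ in range(max_iter):
        new_labels = np.array([nearest_center(x, centers) for x in X])
        if labels is not None and np.array_equal(new_labels, labels):
            break
        labels = new_labels
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # leave a center in place if its cluster empties
                centers[j] = members.mean(axis=0)
    return centers, labels


# The 10,000-share order from the text: 2,000, 1,000 and 7,000 shares in three bins.
x_chart = progress_chart([2000, 1000, 7000])

# Hypothetical 3-bin orders (shares filled per bin); illustrative only.
orders = [
    [500, 1500, 8000],   # slow start
    [6000, 3000, 1000],  # fast start
    [1000, 1000, 8000],  # slow start
    [7000, 2000, 1000],  # fast start
    [2000, 1000, 7000],  # the order labeled X in Figure 3A
    [5000, 4000, 1000],  # fast start
]
charts = np.array([progress_chart(o) for o in orders])
features = charts[:, :-1]  # drop the final bin, which is always 100%
centers, labels = kmeans(features, k=2)
print("centers:", centers.round(3))
print("labels: ", labels)
```

With 26 bins, the same code applies unchanged: each order supplies 26 per-bin fills and yields 25 features once the always-100% final bin is dropped. Each returned center is the average progress chart of its cluster. In practice, a library routine such as scikit-learn's `sklearn.cluster.KMeans` (Lloyd iterations with k-means++ seeding by default) may be preferable. The hand-rolled version is only meant to mirror the walkthrough in Figure 3.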

EXAMPLE
To demonstrate the methodology's effectiveness, we apply it to a sample of orders sent to two different algorithms over two different trading horizons, and test whether it can identify these four distinct algorithm/trading-horizon combinations. Specifically, the sample includes both half-day and full-day[4] not-held market orders sent to either a VWAP or an implementation shortfall (IS) algorithm[5] between January 1, 2011 and September 30, 2011. We limit our sample to orders greater than 500 shares to ensure that orders were worked over time rather than executed in a single slice by the algorithm.

Without being given any strategy information, k-means identified the four trading strategies and classified orders into them with a high degree of accuracy. The results in Figure 4 show the trading strategies identified in the sample that make up the VWAP-like aggregate progress chart shown in Figure 1. Figure 4A represents half-day VWAP orders, Figure 4B represents full-day VWAP orders, Figure 4C represents full-day IS algo orders (those starting before 9:40 AM), and Figure 4D represents half-day IS algo orders. K-means classified over 98% of the orders correctly. As shown in Table 1, VWAP orders were correctly identified more than 99.5% of the time, and IS orders more than 98% of the time. K-means was therefore able both to identify the four different strategies and to assign orders to each strategy with precision.

[4] Orders considered full-day arrived before 9:40 AM; orders considered half-day arrived between 12:00 and 12:50 PM. All VWAP orders ended after 3:20 PM, but there was no restriction on the end time of IS orders.
[5] Specifically, we include orders sent to ITG Active Algorithm, a single-stock implementation shortfall algorithm.

Figure 4. Trading styles identified from post-trade data; example results for sample full- and half-day VWAP and IS algo orders.

Table 1. Accuracy of k-means in assigning orders to strategies.
Order Type       Accuracy
Half-Day VWAP    99.73%
Full-Day VWAP    99.54%
Full-Day IS      98.58%
Half-Day IS      98.19%

APPLICATIONS
This methodology can be used to assess trader performance in several ways. First, k-means can be used to identify the underlying trading strategies for large client orders. Figure 5 shows the output for a hypothetical client. For this client, we see three distinct fill trajectories: trading into the close (strategy A), front-loaded trading (strategy B), and participation-based trading throughout the day (strategy C).

Another benefit of k-means is its ability to uncover less dominant strategies used by a trader. This is evidenced in Table 2, which shows that only 5% of value was executed via strategy C. Here, k-means uncovered a minority strategy that may have been overlooked in a traditional analysis. In effect, our methodology lets traders experiment with trading strategies in real time without having to change their workflow to capture strategy-level information.

Figure 5. Hypothetical client trades aggregated over the day and grouped by style via k-means. Three distinct trading strategies emerge from the data.

Second, for desks with multiple traders, k-means can help characterize strategies by trader. The diagrams in Figure 6 show trader usage of the strategies identified by k-means. For example, we can see that Trader 1 is the dominant user of strategy C, but C makes up only 25% of Trader 1's trading. Using the k-means results, we can report how often each strategy was used and break down the trades composing each strategy by trader, fund, order size, market capitalization, time period, market conditions, or any combination thereof.

Figure 6. Breakdown of trader usage of strategies for the hypothetical client analysis shown in Figure 5 and Table 2: traders within strategies (Figure 6B) and strategies within traders (Figure 6A).

Beyond usage patterns, the k-means output allows us to evaluate trades against appropriate benchmarks and to identify which strategies are most successful. Why compare all executions to the close benchmark if 10% of orders were actually front-loaded and 5% were traded with a VWAP algorithm? The k-means results implicitly suggest the benchmark a given trader may have been targeting, which helps to evaluate performance more accurately. For example, Trader 1 may use strategy A when benchmarked to the close, B when benchmarked to the open, and C when benchmarked to VWAP.
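Once each order carries a strategy label, producing a summary like Table 2 is essentially a group-by. The sketch below assumes a hypothetical per-order record holding a side, an executed value, an average execution price and a set of benchmark prices. For each strategy, it reports the order count, the share of total value, and value-weighted performance in basis points against each benchmark. The record layout, the value weighting, and the sign convention (positive means the execution beat the benchmark) are all assumptions for illustration. The paper does not state how its Table 2 figures were computed, and the sample orders are made up.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class ClassifiedOrder:
    strategy: str      # strategy label assigned by k-means, e.g. "A", "B", "C"
    side: int          # +1 for a buy, -1 for a sell
    value: float       # executed notional value
    avg_price: float   # average execution price
    benchmarks: dict   # benchmark name -> benchmark price (hypothetical layout)


def performance_bps(order, name):
    """Performance versus one benchmark in bps; positive = beat the benchmark (assumed convention)."""
    bench = order.benchmarks[name]
    return order.side * (bench - order.avg_price) / bench * 1e4


def summarize_by_strategy(orders, benchmark_names):
    """Order count, % of total value, and value-weighted bps for each strategy."""
    total_value = sum(o.value for o in orders)
    groups = defaultdict(list)
    for o in orders:
        groups[o.strategy].append(o)

    summary = {}
    for strategy, members in sorted(groups.items()):
        value = sum(o.value for o in members)
        row = {"orders": len(members), "pct_value": 100 * value / total_value}
        for name in benchmark_names:
            row[name] = sum(o.value * performance_bps(o, name) for o in members) / value
        summary[strategy] = row
    return summary


# Hypothetical orders, already labeled by the clustering step.
orders = [
    ClassifiedOrder("A", +1, 3_000_000, 100.10, {"Arrival": 100.00, "Close": 100.00}),
    ClassifiedOrder("A", -1, 1_000_000, 99.80, {"Arrival": 100.00, "Close": 100.00}),
    ClassifiedOrder("C", +1, 1_000_000, 99.95, {"Arrival": 100.00, "Close": 100.05}),
]
table = summarize_by_strategy(orders, ["Arrival", "Close"])
for strategy, row in table.items():
    print(strategy, {key: round(val, 2) for key, val in row.items()})
```

Value weighting makes large orders dominate each strategy's row, which matches the "% Value" framing of Table 2. An order-weighted average would be a one-line change if that better reflects how a desk judges its traders.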
Table 2 indicates that strategy A is performing well versus the close benchmark, strategy B is performing well versus the arrival and open benchmarks, and strategy C is performing well versus the VWAP benchmarks. These results are intuitive, since traders likely target different benchmarks with different strategies. The ability to infer benchmarks is especially useful for traders whose systems do not pass benchmark information through to their post-trade databases.

Table 2. Performance results for hypothetical client orders grouped into the trading styles illustrated in Figure 5. Performance is in basis points (bps) versus each benchmark.

Strategy  Orders   % Value  Arrival  Open  Close  Prev. Close  Day VWAP  Interval VWAP
A         10,334   46%       -3       -1     1       1          -8         8
B         17,957   49%       -6        4    -2     -12          -2         2
C          3,940    5%      -17      -13    -6      -9           2         1

Finally, our methodology can help evaluate trader performance in the context of the underlying trading strategies. If a given trader is underperforming or outperforming his peers, our methodology can help identify the strategies driving his relative performance. For example, if Trader 1 strongly underperforms his peers, the cause may be his overuse of strategy C, which Table 2 shows is the worst-performing strategy relative to the arrival (pre-trade) benchmark. More generally, Table 2 shows which strategies do best against each benchmark, implicitly suggesting how to execute future trades.

CONCLUSION
In this paper, we provide a new methodology for identifying trading strategies using only post-trade data. Specifically, we apply a well-established statistical technique called k-means both to identify the primary strategies used by a trader and to classify each order into one of these strategies. This approach is particularly useful because it does not require changes to trader workflows or post-trade systems to capture strategy or benchmark information. Once the underlying strategies have been identified and the orders classified, TCA can be done by strategy. Analysis by strategy is crucial because the choice of strategy is often the primary determinant of a trader's performance. Visual representations of the underlying strategies naturally suggest the trader's benchmark, yielding relevant and useful analysis.
Results can be communicated both visually and numerically, making this a practical tool for any trader.

REFERENCES
Johnson, R. A., and D. W. Wichern. Applied Multivariate Statistical Analysis, Sixth Edition. Upper Saddle River, New Jersey: Pearson Prentice Hall, 2007.
MacQueen, J. B. "Some Methods for Classification and Analysis of Multivariate Observations." Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probability, Vol. 1. Berkeley, CA: University of California Press, 1967, pp. 281-297.

© 2012 Investment Technology Group, Inc. All rights reserved. Not to be reproduced or retransmitted without permission. 50112-22067

Broker-dealer products and services offered by ITG Inc., member FINRA, SIPC. These materials are for informational purposes only, and are not intended to be used for trading or investment purposes or as an offer to sell or the solicitation of an offer to buy any security or financial product. The information contained herein has been taken from trade and statistical services and other sources we deem reliable but we do not represent that such information is accurate or complete and it should not be relied upon as such. No guarantee or warranty is made as to the reasonableness of the assumptions or the accuracy of the models or market data used by ITG or the actual results that may be achieved. These materials do not provide any form of advice (investment, tax or legal). ITG Inc. is not a registered investment adviser and does not provide investment advice or recommendations to buy or sell securities, to hire any investment adviser or to pursue any investment or trading strategy. The positions taken in this document reflect the judgment of the individual author(s) and are not necessarily those of ITG.