1 Exploiting the Amazon.com People Who Bought Also Bought Algorithm in Reagent Selection
Christian Tyrchan, Niklas Falk and Jonas Boström
2 Setting the Scene
The current trend is to treat drug discovery projects as processes. Creativity might be hampered, and is there little room for serendipity?
We need new ways of working: we want creative users who do not feel stuck in processes.
Making novel compounds is at the heart of drug design.
Thus, the aim of the current work is to enhance discovery by surfacing reagents from deep in the catalog that our chemists wouldn't find on their own, using a novel approach where similarity is based on users (not structures).
3 Internet Success Stories
New Technologies. New Sciences.
Finite State Machines.
Item-to-Item Collaborative Filtering (new approaches to improve searches).
4 Recommendation Systems
Recommendation systems are best known for their use on e-commerce Web sites. They attempt to present items that are likely to be of interest to the user.
The idea of recommending items at checkout is nothing new.
5 The Harry Potter Shopping Cart
Amazon.com saw the opportunity to personalize impulse buys.
6 The Harry Potter Shopping Cart
The idea of recommending items at checkout is nothing new.
7 Recommendation Systems
Typically, a recommender system compares the user's profile to some reference characteristics and seeks to predict the 'rating' that a user would give to an item they had not yet considered. It should help a customer find and discover new, relevant, and interesting items.
Two main categories (based on how the recommendations are made):
Content-based recommendations (the information item): the user will be recommended items similar to the ones the user preferred in the past.
Collaborative recommendations (the social environment): the user will be recommended items that people with similar taste liked in the past.
8 Content-based and Collaborative Systems
Content-based recommendations: only the movies that have a high degree of similarity to the user's preferences would be recommended.
Collaborative recommendations: start by finding a set of customers whose purchased items overlap the user's purchased items. The algorithm aggregates items from these similar customers, eliminates items the user has already purchased, and recommends the remaining items to the user. The focus is on finding similar users; each user is represented as an N-dimensional vector of items.
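The collaborative steps described on this slide (find overlapping customers, aggregate their items, drop what the user already has) can be sketched roughly as follows. This is only an illustration: the customer names, the item names, and the choice to weight each candidate by the size of the overlap are assumptions made for the example, not details taken from the slides.

```python
from collections import Counter


def recommend_user_based(purchases, user, top_n=3):
    """Recommend items bought by customers whose purchases overlap the user's.

    purchases: dict mapping customer -> set of purchased items (hypothetical data).
    """
    own = purchases[user]
    scores = Counter()
    for other, items in purchases.items():
        if other == user:
            continue
        overlap = len(own & items)
        if overlap == 0:
            continue  # no shared purchases, so not a "similar customer"
        # Aggregate the other customer's items, skipping what the user already has.
        # Weighting by overlap size is an assumption of this sketch.
        for item in items - own:
            scores[item] += overlap
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:top_n]]


# Hypothetical purchase histories.
purchases = {
    "alice": {"book_a", "book_b"},
    "bob": {"book_a", "book_b", "book_c"},
    "carol": {"book_b", "book_d"},
    "dave": {"book_e"},
}
print(recommend_user_based(purchases, "alice"))
```

Note that every request loops over all customers, which is the online cost that makes this approach hard to scale.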
9 Recommendations needed to work...
From sparse data: often just a few purchases.
It needed to be fast: high quality in real time.
The system needed to scale to massive numbers: huge amounts of data.
The algorithm must respond immediately to new information: customer data is volatile.
None of the existing methods was good enough. Traditional collaborative filtering does little or no offline computation, and its online computation scales with the number of customers and catalog items, so the algorithm is impractical on large data sets. Content-based recommendations bring nothing new (unless randomization is added).
10 Item-to-Item Collaborative Filtering
Item-to-item collaborative filtering matches each of the user's purchased items to similar items, then combines those similar items into a recommendation list. To determine the most-similar match for a given item, the algorithm builds a similar-items table by finding items that customers tend to purchase together.
Amazon.com's item-to-item approach computes the cosine between binary vectors representing the purchases in a user-item matrix. Given two vectors of attributes, A and B, the cosine similarity, cos(θ), is represented using a dot product and magnitude as:
$$\cos\theta = \frac{A \cdot B}{\lVert A \rVert \, \lVert B \rVert}$$
Recommendations are based on the items most similar to the query item.
Greg Linden et al. "Amazon.com Recommendations: Item-to-Item Collaborative Filtering", IEEE Internet Computing, 2003, 7,
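A minimal sketch of the item-to-item idea, assuming the check-out data is available as a mapping from user to the set of items they took. The function names, the toy data, and the rule of keeping the best similarity per candidate when combining lists are illustrative assumptions, not the authors' implementation.

```python
import math
from collections import defaultdict


def build_similar_items(checkouts):
    """Build a similar-items table from a dict of user -> set of items.

    Returns a dict of item -> list of (other_item, cosine), most similar first.
    """
    users_of = defaultdict(set)
    for user, items in checkouts.items():
        for item in items:
            users_of[item].add(user)

    table = {}
    for a in users_of:
        sims = []
        for b in users_of:
            if a == b:
                continue
            # For binary vectors the dot product is the number of shared users
            # and each norm is the square root of the item's user count.
            common = len(users_of[a] & users_of[b])
            if common:
                cos = common / math.sqrt(len(users_of[a]) * len(users_of[b]))
                sims.append((b, cos))
        sims.sort(key=lambda pair: (-pair[1], pair[0]))
        table[a] = sims
    return table


def recommend(table, owned, top_n=3):
    """Combine the similar-items lists of everything the user already has."""
    scores = defaultdict(float)
    for item in owned:
        for other, cos in table.get(item, []):
            if other not in owned:
                # Keeping the best score per candidate is an assumption of this sketch.
                scores[other] = max(scores[other], cos)
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:top_n]]


# Hypothetical check-out data.
checkouts = {
    "u1": {"A", "B"},
    "u2": {"A", "B", "C"},
    "u3": {"A", "C"},
    "u4": {"D"},
}
table = build_similar_items(checkouts)
print(recommend(table, {"B"}))
```

The table can be built offline (for example overnight), so the lookup at query time only touches the rows for the items the user already has.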
11 Since it works for Amazon.com, why not try it...
...to help medicinal chemists select reagents from chemical databases, and to enhance discovery by surfacing reagents from deep in the catalog that our chemists wouldn't find on their own.
12 Exploiting the Amazon.com People Who Bought Also Bought Algorithm in Reagent Selection
Not only suggesting new reagents, but also solving problems? For example, suggesting possible bioisosteres.
[Reaction scheme: reductive amination. The final product may be genotoxic.]
Design idea to avoid AMES positives.
The AMES test is one measure of genetic toxicity. Aromatic amines are often unwanted fragments in drug design (genotoxic). Regulatory view: if carcinogenic in animals, it will be a carcinogen in man.
13 Strategy
Collect data set of chemical reagents.
Get check-out information.
Generate similarity matrix using cosine similarities.
Import matrix into an Oracle database.
Display recommendations (ISIS/db query): items (reagents) which are most similar to the query item (reagent).
14 Reagent Data Set
Extract reagents in the stockroom (CIMS) checked out in the last 5 years.
Filter: amount != 0.
Tweak 1: canonical SMILES generated; counter-ion salts were removed (and reagents merged); unique compound IDs assigned.
Grouping (tweak 2): assign reagents into 10 functional classes by SMARTS mapping.
Reagents could be mapped onto the 10 functional classes. 194 unique chemists.
[Chart: reagents versus times checked out; annotation: checked out only once.]
15 Tweak 1: counter-ions
Ca. 5000 entries include a counter-ion. Different salts should give the same results. For example, the reagent 3,3,3-TRIFLUOROPROPYLAMINE exists both with and without the hydrochloride salt (3,3,3-TRIFLUOROPROPYLAMINE HYDROCHLORIDE).
The salts are removed, and the data are merged for the vectors.
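The merge step might look like the sketch below. It assumes that salt stripping has already been done elsewhere and has produced a mapping from each stockroom entry to its parent compound; the entry IDs, the `parent_of` mapping, and the function name are hypothetical.

```python
from collections import defaultdict


def merge_salt_forms(checkouts, parent_of):
    """Merge check-out data of salt forms into their parent compound.

    checkouts: dict of entry id -> set of users who checked the entry out.
    parent_of: dict of salt entry id -> parent entry id (assumed precomputed).
    """
    merged = defaultdict(set)
    for entry, users in checkouts.items():
        # Entries without a known parent are kept under their own id.
        merged[parent_of.get(entry, entry)] |= users
    return dict(merged)


# Hypothetical entries: the free amine and its hydrochloride salt.
checkouts = {
    "amine_free": {"u1"},
    "amine_hcl": {"u1", "u2"},
    "other": {"u3"},
}
parent_of = {"amine_hcl": "amine_free"}
merged = merge_salt_forms(checkouts, parent_of)
print(merged)
```

After merging, both forms contribute to a single column of the user-by-item matrix, so a query for either form gives the same recommendations.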
16 Tweak 2: functional classes
A search for amines should only recommend other amines.
[Reaction scheme: reductive amination]
Functional groups of the 10 classes (dual functionalities counted twice):
primary and secondary amines
acids, acid halides, anhydrides, carbamates, carbonates, esters
aromatic halides
alkyl halides
sulphonyl chlorides
alcohols
aldehydes, ketones
boronic acids, trifluoroborates
isocyanates, isothiocyanates
alpha halide ketones
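One way to enforce "amines only recommend amines" is to filter the ranked list by functional class, as in this sketch. The reagent IDs, the class labels, and the `classes_of` lookup are hypothetical; a reagent maps to a set of classes because dual functionalities are counted in both.

```python
def filter_by_class(query, ranked, classes_of):
    """Keep only recommendations sharing a functional class with the query.

    ranked: list of (reagent, similarity) pairs, most similar first.
    classes_of: dict of reagent -> set of functional class labels (assumed given).
    """
    query_classes = classes_of[query]
    return [
        (reagent, sim)
        for reagent, sim in ranked
        if classes_of.get(reagent, set()) & query_classes
    ]


# Hypothetical class assignments and similarity ranking.
classes_of = {
    "R1": {"amine"},
    "R2": {"amine", "alcohol"},
    "R3": {"aldehyde"},
}
ranked = [("R3", 0.9), ("R2", 0.7)]
print(filter_by_class("R1", ranked, classes_of))
```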
17 Similarities
Data are binary: a user checked out a reagent (1), or not (0).
[Table: user-by-item matrix with items C001, C002, C003 as columns and users Anthony Nicholls, Andrew Grant, Morten Langgard as rows; 1 = checked out, 0 = not checked out. The cosine between C001 and C003 is computed from these columns.]
[Histogram: binned Amazon.com similarities, frequency per bin, almost all-against-all.*]
*Roughly 85% of the reagents belong in the zero bin.
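For binary check-out vectors the cosine reduces to a count of shared users, which may make the table easier to read. In the sketch below, U_A and U_B denote the sets of users who checked out reagents A and B; this notation and the example vectors are assumptions introduced for illustration, not the values from the slide.

```latex
\begin{align*}
\cos\theta_{AB}
  &= \frac{A \cdot B}{\lVert A \rVert \, \lVert B \rVert}
   = \frac{|U_A \cap U_B|}{\sqrt{|U_A| \, |U_B|}} \\
% Hypothetical example with three users: A = (1, 1, 0), B = (1, 0, 1)
\cos\theta_{AB}
  &= \frac{1 \cdot 1 + 1 \cdot 0 + 0 \cdot 1}{\sqrt{2} \, \sqrt{2}}
   = \frac{1}{2} = 0.5
\end{align*}
```

Two reagents that were never checked out by the same chemist have an empty intersection and therefore a similarity of exactly zero, which is consistent with most values falling in the zero bin.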
18 Architecture
Oracle and MDL ISIS/Base; not a web-based system.
User-by-item matrix: user rows, item columns.
Overnight updates possible.
19 Results
What does the frontend look like?
Yet another similarity measure?
A dream come true?
Possible ways forward.
Other info revealed.
20 Frontend, and That Little Bit Extra
Original CIMS. CIMS-Recommend.
Available amount. Location.
21 Amazon.com vs Other Similarities
Lingos and 3 fingerprints are calculated (ECFP6, FPFP6, MDL Public Keys). The top-X hits from each method are compared to the top-X Amazon hits.
[Table: overlap (%) with the Amazon hits per MaxHits setting, for ECFP6, FPFP6, Lingo and MDL Public Keys, with example hit lists (HitNo, MolName) for the Amazon and FP/Lingos methods.]
Results show that Amazon recommendations are, more or less, orthogonal to other searching techniques.
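The overlap comparison can be sketched as a simple set intersection of the two top-X lists. The slides do not say how the percentage was normalized, so dividing by X is an assumption here, and the reagent IDs are made up for the example.

```python
def top_x_overlap(amazon_hits, other_hits, x):
    """Percentage of the top-x Amazon-style hits also found in the other top-x list.

    Both inputs are ranked lists of reagent ids, most similar first.
    Dividing by x is an assumption of this sketch.
    """
    if x <= 0:
        raise ValueError("x must be positive")
    shared = set(amazon_hits[:x]) & set(other_hits[:x])
    return 100.0 * len(shared) / x


# Hypothetical ranked hit lists for one query reagent.
amazon_hits = ["C1", "C2", "C3", "C4"]
ecfp_hits = ["C3", "C9", "C8", "C1"]
print(top_x_overlap(amazon_hits, ecfp_hits, 4))
print(top_x_overlap(amazon_hits, ecfp_hits, 2))
```

A consistently low overlap across queries is what "orthogonal" would look like in these terms.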
22 Amazon.com vs Other Similarities
Top 10 structures selected from the Amazon-like selection and the ECFP4 fingerprint method for two queries.
[Figure: Amazon top 10 and ECFP4 top 10 structures.]
23 Exploiting Recommendation Systems in Reagent Selection
Design idea to avoid AMES positives.
[Reaction scheme: reductive amination]
Search the database for aniline, and get "Chemists who requested aniline also requested:" [structures], all AMES negatives.
The advantage of such a feature is the inherent knowledge transfer. In the dream scenario, such a reagent suggestion could solve an existing problem.
24 Medicinal Chemistry Poll
Pre-defined sets?
Too diverse recommendations?
Already better, since I get everything in one go!
25 Most Frequently Checked-Out Reagents
Other information is easily accessible: just ask the right question.
[Tables: top 5 amines and top 5 aldehydes, with columns No. checked-out and Reagent.]
26 Summary
Recommendation systems are useful alternatives to search algorithms since they help users to discover items they might not have found by themselves.
We presented a novel, dynamic similarity measure: personalized information was used to produce reagent recommendations, using Amazon.com's item-to-item collaborative filtering technique.
Low threshold for trying: the first prototype was finished within 1-2 weeks (as all infrastructure was in place).
Maintaining: data can readily be updated nightly or weekly.
"In the dream scenario such a [reagent] suggestion could solve an existing problem." Not there just yet (too little data; need more info).
Our recommendations are, more or less, orthogonal to other similarity measures.
Positive comments in a small MedChem poll.
In the end, what we want is happy, satisfied customers!
27 Acknowledgments
Jens Sadowski, for presenting!
28 Exploiting the Amazon.com People Who Bought Also Bought Algorithm in Reagent Selection
Abstract. Amazon.com's "People who bought [this book] also bought [these books]" is a popular feature on numerous web sites nowadays. The use of such a recommender system can be exploited in many areas, also in drug design. In the current work, a system to recommend reagents has been developed, using the item-to-item collaborative filtering technique. The goal is to enhance discovery, surfacing reagents from deep in our corporate reagent database; reagents that medicinal chemists might not have found on their own. Another potential advantage of using personalized information is the inherent knowledge transfer. That is, in a dream scenario a reagent recommendation could solve an existing problem. Moreover, this novel similarity measure differs from other similarity measures, as it is based on user-item information and not on descriptions of molecular structures. It will be shown that the recommendations are, more or less, orthogonal to other methods.