LifeStyle Targeting on Big Data using RapidMiner

Maksim Drobyshev, Lifestyle Marketing Ltd NCT Group, Moscow

Abstract

Raw purchase and social data can be used to predict customers' behavior towards offers within promotions by using the RapidMiner LifeStyle Targeting extension¹. The approach has been tested on tens of original retail databases with different data structures, without any data cleaning or preparation. A novel algorithm was developed to forecast the target value from a variable number of auto-generated attributes. For every customer, the data can include a different number of records, each containing a date, a target field, and other text or numeric fields from several data sources. All combinations of values, words, and periods in all fields generate the customer and offer characteristics used to forecast offer results for every customer. Up to four billion unique sparse characteristics (attributes) are supported. Student's t-test confirms the confidence of the revealed customer-offer portrait, which takes the form of a decision tree. The algorithm places no limitation on the structure of numeric characteristics. Ordered values can also be transformed to a numeric order index, and textual values to a numeric count of phrase usage. LifeStyle Targeting is only one example of the many applications where target value prediction based on poorly structured or auto-generated characteristics can be useful. The generation of target values and characteristics in the LifeStyle Targeting operator is separate from the LifeStyle Segmentation executable, which can therefore be used by other operators. A restricted free Windows version of LifeStyle Segmentation is delivered with the operator.

1. Business model and Target value

LifeStyle Targeting is a tool for predicting the sales value of a customer's future purchases based on purchases and other events in a past (History) period and on planned offers.
Observations used to forecast the effect of promotions on future purchases are built from the most recent available purchase data, so their future purchase period is unknown. To collect statistics for prediction, all past observations that fit in the data are generated, for which customer behavior is known in both the target and history periods.

¹ Patent pending.
An offer consists of controlled parameters such as promoted goods, price, etc. In the case of NetFlix film ratings (Picture 1), they are FilmName and Year.

Picture 1: Screen-shot of setting parameters for LifeStyle Targeting

The target value of every observation (profit) is calculated as the sum of the amount field in the target period minus the cost of the promotion. For the presented NetFlix rating prediction, the prognosis is made for the future rating of the film minus the minimal required rating, equal to four stars out of five (Picture 2). The subtraction is done for clarity, to mark the threshold: if the prognosis of the target value is greater than zero, the offer can be presented to the customer.

Picture 2: Target Values and Periods

Only past data can be used for the prognosis, so the data is divided into 30-day target periods, each preceded by a 360-day history period.
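
The period division and the two target value definitions above can be sketched as follows. This is a minimal illustration, not the extension's implementation: the function names (`split_periods`, `rating_target`, `profit_target`) are hypothetical, and the sketch assumes that target periods are tiled backwards from the last day of data without overlapping, which the text does not state explicitly.

```python
from datetime import date, timedelta

TARGET_DAYS = 30    # target period length given in the text
HISTORY_DAYS = 360  # history period length given in the text
MIN_RATING = 4      # minimal required rating (four stars out of five)


def split_periods(first_day, last_day):
    """Yield (history_start, target_start, target_end) windows, newest first.

    Windows are half-open: history covers [history_start, target_start)
    and target covers [target_start, target_end). Tiling the target
    periods backwards from last_day is an assumption of this sketch.
    """
    target_end = last_day + timedelta(days=1)
    while True:
        target_start = target_end - timedelta(days=TARGET_DAYS)
        history_start = target_start - timedelta(days=HISTORY_DAYS)
        if history_start < first_day:
            break  # not enough past data for a full history period
        yield history_start, target_start, target_end
        target_end = target_start


def rating_target(rating):
    """Target value for the rating example: rating minus the threshold."""
    return rating - MIN_RATING


def profit_target(amounts, promotion_cost):
    """Target value for retail data: amounts in the target period minus cost."""
    return sum(amounts) - promotion_cost


windows = list(split_periods(date(2004, 1, 1), date(2005, 12, 31)))
```

Each window yields one observation per customer with known behavior in both periods; a positive target value marks an offer worth presenting.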
2. Characteristics

A customer's habit, as a part of lifestyle, is usually connected with the consumption of some goods and services and leaves several signs in the customer's purchase data, which make it possible to predict target values from other related signs. Characteristics represent such signs of a customer's habits, for example the value of purchases of goods with particular words and phrases in goods or category names, or words and phrases in questionnaires and CRM logs. In the NetFlix data, Customer ID, Date, Rating, and Year were used, and additional information was added for some films from a DVD database (Picture 3):

Picture 3: Rating and Film Data

For a typical supermarket with tens of thousands of stock keeping units (articles) on sale, the number of unique characteristics in real projects amounts to millions, but every particular customer has only a tiny subset of them. LifeStyle Segmentation supports up to four billion unique characteristics in a single project.

Picture 4: Generated Characteristics
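
To make the idea of sparse, auto-generated characteristics concrete, here is a simplified sketch. It assumes that a characteristic is a (field, value) pair counted over one customer's history records, with words and adjacent word pairs standing in for "words and phrases". The function `generate_characteristics` and the record layout are hypothetical, and the combinations with periods mentioned in the abstract are left out.

```python
from collections import Counter


def generate_characteristics(records, text_fields, value_fields):
    """Return a sparse {(field, value): count} mapping for one customer.

    records is a list of dicts, one per history record. Only the
    characteristics that actually occur are stored, so each customer
    keeps a tiny subset of all characteristics seen in the project.
    """
    counts = Counter()
    for record in records:
        for field in text_fields:
            words = str(record.get(field, "")).lower().split()
            for word in words:
                counts[(field, word)] += 1
            # Adjacent word pairs serve as simple two-word phrases.
            for left, right in zip(words, words[1:]):
                counts[(field, left + " " + right)] += 1
        for field in value_fields:
            if field in record:
                counts[(field, record[field])] += 1
    return dict(counts)


history = [
    {"FilmName": "The Matrix", "Year": 1999},
    {"FilmName": "The Matrix Reloaded", "Year": 2003},
]
chars = generate_characteristics(history, ["FilmName"], ["Year"])
```

Storing only the non-zero entries is what keeps millions of unique characteristics manageable, since any single customer touches very few of them.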
When making a prognosis of the offer value for a customer, both offer and customer characteristics are important. Their generation is detailed in Picture 3. Given the high proportion of duplicates, differing abbreviations, misspellings, and other noise in purchase data, extracting keywords from textual fields proved to be good practice. Stemmed fields can also be added to the data, so that their prognostic value is automatically tested against that of the other data.

3. Customer-Offer Portrait

The Customer-Offer portrait divides all observations into segments, choosing at each step the division with the maximum additional profit compared to the parent segment, provided that Student's t-test passes at the given confidence. Additional profit is calculated as the sum of the deviations of the target values in the sub-segment from the average target value in the parent segment. Together, these conditions ensure that the maximum financial result is obtained at every level and that the noise level stays below a user-defined threshold.

In Picture 5, the root segment is marked with the symbol R. At every level, the best characteristic and its value divide the segment into sub-segments, appending the symbols 1 and 0 for the positive and negative sub-segments. If for any reason a characteristic is suppressed or becomes too noisy according to Student's t-test, LifeStyle Segmentation chooses the next-best characteristic for the segment division.

Picture 5: Decision tree and the Segments.
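
One level of this division can be sketched as below. The sketch rests on several assumptions that are not taken from LifeStyle Segmentation itself: a split tests only the presence or absence of a characteristic rather than a threshold on its value, Welch's unequal-variance t statistic stands in for Student's t-test, and the critical value comes from a large-sample normal approximation. All function names are hypothetical.

```python
import math
from statistics import NormalDist, mean, variance


def additional_profit(sub_targets, parent_mean):
    """Sum of target value deviations in a sub-segment from the parent mean."""
    return sum(t - parent_mean for t in sub_targets)


def welch_t(a, b):
    """Welch's t statistic for two samples (each needs at least two values)."""
    diff = mean(a) - mean(b)
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    if se == 0:
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return diff / se


def best_split(observations, confidence=0.90):
    """Pick the characteristic whose positive sub-segment adds most profit.

    observations is a list of (characteristics_dict, target_value) pairs.
    Returns (characteristic, additional_profit), or None if no split
    passes the significance check.
    """
    parent_mean = mean(t for _, t in observations)
    # Two-sided critical value, normal approximation (assumption).
    critical = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    keys = {key for chars, _ in observations for key in chars}
    best = None
    for key in sorted(keys, key=repr):
        pos = [t for chars, t in observations if key in chars]
        neg = [t for chars, t in observations if key not in chars]
        if len(pos) < 2 or len(neg) < 2:
            continue
        if abs(welch_t(pos, neg)) < critical:
            continue  # too noisy at the requested confidence
        gain = additional_profit(pos, parent_mean)
        if best is None or gain > best[1]:
            best = (key, gain)
    return best


sample = [({"action": 1}, t) for t in (2.0, 1.8, 2.2, 2.1)]
sample += [({"drama": 1}, t) for t in (-1.0, -0.8, -1.2, -0.9)]
split = best_split(sample)
```

Applying `best_split` recursively to the positive and negative sub-segments would produce a tree of the kind shown in Picture 5; a characteristic that fails the significance check is simply skipped, so the next-best one is chosen.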
As seen in the decision tree, the joined DVD-database characteristics are also useful for the forecast, despite the poor joining quality (intentionally, no data cleaning was done). As a performance example, this 15-level decision tree with ~5k nodes for ~730k observations, ~240k unique characteristics, and 90% confidence was built in 4.5 hours on a notebook (2.7 GHz). The corresponding throughput is ~2 million characteristic items per second.

4. Prognosis

In Prognosis mode, the same characteristics are generated with the same algorithm for all possible combinations of customers in the last target period of the data and all possible offers. After division into segments according to the best characteristics and their values, the average target value in the segment is used as the prognosis. Offers are then filtered for positive prognosis values and can be used for suggestions. The consistency of the process ensures that new offers are generated by the same algorithm, and segmented by the same, most financially valuable characteristics, as the observations in the historical data.

Acknowledgements

We would like to thank Simon Fischer of Rapid-I for his time, valuable help and suggestions.