New Approaches to Idea Generation and Consumer Input in the Product Development Process


New Approaches to Idea Generation and Consumer Input in the Product Development Process

by Olivier Toubia

Ingénieur, Ecole Centrale Paris, 2000
M.S. Operations Research, Massachusetts Institute of Technology, 2002

SUBMITTED TO THE SLOAN SCHOOL OF MANAGEMENT IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY AT THE MASSACHUSETTS INSTITUTE OF TECHNOLOGY

JUNE 2004

© 2004 Massachusetts Institute of Technology. All rights reserved.

Signature of Author: Sloan School of Management, May 20th, 2004

Certified by: John R. Hauser, Kirin Professor of Marketing, Thesis Supervisor

Accepted by: Birger Wernerfelt, Professor of Management Science, Chair, Sloan Doctoral Program

New Approaches to Idea Generation and Consumer Input in the Product Development Process

by Olivier Toubia

Submitted to the Sloan School of Management on May 20th, 2004 in partial fulfillment of the requirements for the Degree of Doctor of Philosophy in Management Science

ABSTRACT

This thesis consists of five related essays which explore new approaches to help design successful and profitable new products. The primary focus is the front end of the process, where the product development team is seeking improved input from customers and improved ideas for developing products based on that input.

Essay 1 examines whether carefully tailored ideation incentives can improve creative output. The influence of incentives on idea generation is studied using a formal model of the ideation process. A practical, web-based, asynchronous ideation game is developed, allowing the implementation and test of various incentive schemes. Using this system, an experiment is run, which demonstrates that incentives do have the capability to improve idea generation, confirms the predictions from the theoretical analysis, and provides additional insight on the mechanisms of ideation.

Essay 2 proposes and tests new adaptive question design and estimation algorithms for partial-profile conjoint analysis. The methods are based on the identification and characterization of the set of parameters consistent with a respondent's answers. This feasible set is a polyhedron defined by equality constraints, each paired-comparison question yielding a new constraint. Polyhedral question design attempts to reduce the feasible set of parameters as rapidly as possible. Analytic Center estimation relies on the center of the feasible set. The proposed methods are evaluated relative to established benchmarks using simulations, as well as a field test with 330 respondents.

Essay 3 introduces polyhedral methods for choice-based conjoint analysis, and generalizes the concept of D-efficiency to individual adaptation. The performance of the methods is evaluated using simulations, and an empirical application to the design of executive education programs is described.

Essay 4 generalizes the existing polyhedral methods for adaptive choice-based conjoint analysis by taking response error into account in the adaptive design and estimation of choice-based polyhedral questionnaires. The validity of the proposed approach is tested using simulations.

Essay 5 studies the impact of Utility Balance on efficiency and bias. A new definition of efficiency (M-efficiency) is also introduced, which recognizes the necessity to match preference questions with the quantities used in the ultimate managerial decisions.

Thesis Supervisor: John R. Hauser
Title: Kirin Professor of Marketing

Acknowledgments

When I first started being a student at MIT in September 1999, my goal was to quickly get a Master's in Operations Research. I had just gotten engaged. Today, on May 9th, 2004, I am writing the acknowledgment section of my doctoral dissertation in marketing. My son Eitan has just wished my wife Alexandra her first Mother's Day. What happened? One hypothesis is that I have been brainwashed and manipulated. Another hypothesis is that I have had the immense privilege of being surrounded with outstanding people over the past five years. Let us focus on the second hypothesis.

Actually, it started well before September 1999. My parents have always provided me with love, support, and advice. They have taught me important values and helped me develop myself, without trying to force me into any predefined path. The well-being of their children (and now, grandchildren) has always been their first priority, and they have always supported our decisions "as long as you're healthy and happy and if that's what you want." I hope to always remember their lesson that taking care of others can be more rewarding than taking care of oneself. My late maternal grandfather, Poal, has been my greatest role model. I have always admired him, his strength, his wisdom, his willpower and his perspicacity. I hope and I believe that he would have approved of the career that I have adopted, and I like to think that he might have made similar decisions in similar situations. My late maternal grandmother, Mamounie, gave me her unconditional love and affection, and transmitted to me the value of academic achievement. Their memories have been with me all along, and will continue to guide my professional and personal lives. My paternal grandparents Mammy and Daddy have been a model of hope and optimism, have taught me how to love and cherish life, and have shown me the importance of family.
My brother Didier and my sister Valérie have excelled at the difficult job of having me as a little brother, giving me advice and leading the way for me. Then, of course, came Alexandra, who supported my decision to go study abroad and with whom I have shared all the exciting, happy and uncertain moments of this adventure. She has been my wife, my friend, my companion and my supporter. I would never have been able to have a balanced life at MIT without her. For me she gave up her hope for a predictable life, and we are now ready to share whatever will come next. Finally, my cousin Isabelle and her family Philippe, Nicolas and Lionel have been our family away from home. It has been great to share holidays with them, and to see Nicolas and Lionel grow.

At MIT, I have had the privilege to meet and work with some incredibly brilliant professors and fellow students. First, of course, my advisor, John Hauser, has been a perfect academic father. John has taught me how to apply mathematical tools to the study of marketing phenomena, how to balance scientific inquiry with managerial relevance, how to apply the history of science to analyze the worth of a research project, and how to expose research. His perfectionism, his extremely high personal standards, and his dedication to hard work have been inspiring. He has been a mentor for me at all levels, training me on research, teaching, and on how to be a good husband and father. He has also been incredibly human, reassuring me in the most troubled times. Duncan Simester has continuously amazed me with his ability to understand, analyze and critique virtually any intellectual argument. He has taught me how important it is to appropriately position a research project and to isolate the essence of an idea. Ely Dahan has been one of the most influential persons in my decision to pursue a PhD. Beyond a colleague, he has been a friend, and we will never forget this 4th of July weekend with him and his family in Hyannis.
I would also like to thank Dan Ariely for his benevolent feedback on my research, and for his generous funding. He has been an example of dedication to research, and his PhD seminars fed not

only my body, but also my brain. Drazen Prelec has supported me and advised me on my job market paper. I admire his calm and the depth of his thoughts. I have also been fortunate to interact with John Little, who confirmed my hypothesis that the most brilliant and accomplished researchers are often also the most friendly and caring human beings. Glen Urban also confirmed this hypothesis. He introduced me to teaching and gave me an opportunity to better prepare for my next job. Birger Wernerfelt, by exposing me to his research paradigm, has forced me to challenge my own and, I believe, to improve the rigor and the scientific value of my research. JinGyo Kim introduced me to the world of hierarchical Bayes and helped me acquire some indispensable tools. I hope that Shane Frederick made me a slightly smarter person, by asking all those I.Q. questions at the lunch table. Rosa Blackwood gave me some advice and support, often working extra time for me although she does not even work for me! The marketing group would not be the same without her good humor and her great cakes. Sandra Crawford-Jenkins shared with me her music and her kindness. I will also surely miss my fellow students. My office mate Wei Wu could not have been more pleasant. My former office mate Robert Zeithammer made a little room for me in his office and guided me through my first year. On Amir has been like a big brother, giving me advice and sharing his wisdom with me. Jiwoong Shin has gone through the job market with me and offered me some guidance while studying for the general exams. Leonard Lee has been a lot of fun to be around and to study with. Nina Mazar has brought her sophisticated and delicate touch to the group. Kristina Shampanier has contributed her wit and critical thinking. Ray Weaver, with his great sense of humor, his MBA and his two children, has not been a typical first-year student. I hope that he will graduate before Shane turns him into an addicted gambler.
I would like to conclude by thanking a very special person, who in just three months has changed my life forever and introduced me to the joys of fatherhood. I am proud to give to my son Eitan Toubia his first Google hit.

Table of Contents

Acknowledgments

Chapter 1: Idea Generation, Creativity, and Incentives
  1. Introduction
  2. Theoretical Analysis
  3. An Ideation Game
  4. Experiment
  5. Summary and Future Research
  References
  Appendix 1: Proofs of the Propositions

Chapter 2: Fast Polyhedral Adaptive Conjoint Estimation
  Polyhedral Methods for Conjoint Analysis
  Polyhedral Question Design and Estimation
  Monte Carlo Simulations
  Results of the Initial Monte Carlo Experiments
  The Role of Self-Explicated Questions
  Empirical Application and Test of Polyhedral Methods
  Results of the Field Test
  Product Launch
  Conclusions and Future Research
  References
  Appendix 1: Mathematics of Fast Polyhedral Adaptive Conjoint Estimation
  Appendix 2: Internal Validity Tests for Laptop Computer Bags

Chapter 3: Polyhedral Methods for Adaptive Choice-Based Conjoint Analysis
  Introduction
  Existing CBC Question Design and Estimation Methods
  Polyhedral Question Design Methods
  Monte Carlo Experiments
  Application to the Design of Executive Education Programs
  Conclusions and Research Opportunities

  Endnotes
  References
  Appendix: Mathematics of Polyhedral Methods for CBC Analysis

Chapter 4: Non-Deterministic Polyhedral Methods for Adaptive Choice-Based Conjoint Analysis
  Introduction
  Basic Polyhedral Methods for Choice-Based Conjoint Analysis
  Taking Response Error into Account
  Simulations
  Conclusions and Future Research
  References

Chapter 5: Properties of Preference Questions: Utility Balance, Choice Balance, Configurators, and M-Efficiency
  Motivation
  Efficiency in Question Design
  Utility Balance for the Metric Paired-Comparison Format
  Choice Balance for the Stated-Choice Format
  Configurators
  Generalizations of Question Focus: M-Efficiency
  Summary and Future Research
  References
  Appendix

Chapter 1: Idea Generation, Creativity, and Incentives

Abstract

Idea generation (ideation) is critical to the design and marketing of new products, to marketing strategy, and to the creation of effective advertising copy. However, there has been relatively little formal research on the underlying incentives with which to encourage participants to focus their energies on relevant and novel ideas. Several problems have been identified with traditional ideation methods. For example, participants often free ride on other participants' efforts because rewards are typically based on the group-level output of ideation sessions. This paper examines whether carefully tailored ideation incentives can improve creative output. I begin by studying the influence of incentives on idea generation using a formal model of the ideation process. This model identifies conditions under which it is efficient to simply reward participants based on their contributions, and other conditions under which it is not. I show that in the latter case, the group's output can be improved by rewarding participants for their impact on the other participants' contribution. I then develop a practical, web-based, asynchronous ideation game, which allows the implementation and test of various incentive schemes. Using this system, I run an experiment, which demonstrates that incentives do have the capability to improve idea generation, confirms the predictions from the theoretical analysis, and provides additional insight on the mechanisms of ideation.

1. Introduction

Idea generation (ideation) is critical to the design and marketing of new products, to marketing strategy, and to the creation of effective advertising copy. In new product development, for example, idea generation is a key component of the front end of the process, often called the "fuzzy front end" and recognized as one of the highest leverage points for a firm (Dahan and Hauser 2002). The best known idea generation methods have evolved from brainstorming, developed by Osborn in the 1950s (Osborn 1957). However, dozens of studies have demonstrated that groups generating ideas using traditional brainstorming are less effective than individuals working alone (see Diehl and Stroebe 1987 or Lamm and Trommsdorff 1973 for a review). One cause identified for this poor performance is free riding (Williams et al. 1981, Kerr and Bruun 1983, Harkins and Petty 1982). In particular, participants free ride on each other's creative effort because the output of idea generation sessions is typically considered at the group level, and participants are not rewarded for their individual contributions. This suggests that idea generation could be improved by providing appropriate incentives to the participants. Note that this free riding effect is likely to be magnified as firms seek input from customers (who typically do not have a strong interest in the firm's success) at an increasingly early stage of the new product development process (von Hippel and Katz 2002).

Surprisingly, agency theory appears to have paid little attention to the influence of incentives on agents' creative output. On the other hand, classic research in social psychology suggests that incentives might actually have a negative effect on ideation. For example, the Hull-Spence theory (Spence 1956) predicts an enhancing effect of rewards on performance in simple tasks but a detrimental effect in complex tasks. The reason is that rewards increase indiscriminately all response tendencies. If complex tasks are defined as tasks in which there is a predisposition to make more errors than correct responses, then this predisposition is amplified by rewards. Similarly, the social facilitation paradigm (Zajonc 1965) suggests that, insofar as incentives are a source of arousal, they should enhance the emission of dominant, well-learned responses, but inhibit new responses, leading people to perform well only at tasks with which they are familiar. McCullers (1978) points out that incentives enhance performance when it "relies on making simple, routine, unchanging responses" (p. 4) but that the role of incentives is far less

clear in situations that depend heavily on flexibility, conceptual and perceptual openness, or creativity. McGraw (1978) identifies two conditions under which incentives will have a detrimental effect on performance: first, when the task is interesting enough for subjects that the offer of incentives is a superfluous source of motivation; second, when "the solution to the task is open-ended enough that the steps leading to a solution are not immediately obvious" (p. 34).

In this paper I examine whether carefully tailored ideation incentives can improve creative output. In Section 2, I study the influence of incentives on idea generation using a formal model of the ideation process. In Section 3, as a step towards implementing and testing the conclusions from Section 2, I present a practical, web-based, asynchronous ideation game, which allows the implementation and test of various incentive schemes. Finally, in Section 4, I describe an initial experiment conducted using this system. This experiment demonstrates that incentives do have the capability to improve idea generation, is consistent with the predictions from the theoretical analysis, and provides additional insight on the mechanisms of ideation. Section 5 concludes and suggests some possibilities for future research.

2. Theoretical Analysis

The goal of this section is to develop a theoretical framework allowing the study of idea generation incentives. One characteristic of ideas is that they are usually not independent, but rather build on each other to form streams of ideas. This is analogous to academic research, in which papers build on each other, cite each other, and constitute streams of research. Let us define the contribution of a stream of ideas as the moderator's valuation for these ideas (for simplicity I shall not distinguish between the moderator and his or her client who uses and values the ideas). The expected contribution of a stream of ideas can be viewed as a function of the number of ideas in that stream.

Let us assume that the expected contribution of a stream of n ideas is a non-decreasing, concave function of n, such that each new idea has a non-negative expected marginal contribution and such that there are diminishing returns to new ideas within a stream. Note that the analysis would generalize to the case of increasing, or inverted U-shaped, returns. (This might characterize situations in which the first few ideas in the stream build some foundations, allowing subsequent ideas to carry the greatest part of the contribution.)

The model proposed here is based on Kuhn's notion of an "essential tension" between convergent thinking and divergent thinking (Kuhn 1977, Simonton 1988). Divergent thinking

involves changing perspectives and trying new approaches to problems, while convergent thinking relies on linear and logical steps and tends to be more incremental. Kuhn claims that since these two modes of thought are inevitably in conflict, "the ability to support a tension that can occasionally become almost unbearable is one of the prime requisites for the very best of scientific research" (p. 6). Although Kuhn's focus is the history of science, his insights apply to idea generation. If several streams of ideas have been established in an idea generation session, a participant has a choice between contributing to one of the existing streams, and starting a new stream. If he or she decides to contribute to an existing stream, he or she might have to choose between contributing to a longer, more mature stream, and a newer, emerging one. If we assume that each stream of ideas reflects a different approach to the topic of the session, then the problem faced by the participant can be viewed equivalently as that of deciding which approach to follow in his or her search for new ideas. I assume that some approaches can be more fruitful than others, and that the participant does not know a priori the value of each approach but rather learns it through experience. He or she then has to fulfill two potentially contradictory goals, reflecting Kuhn's essential tension: increasing short-term contribution by following an approach that has already proven fruitful, versus improving long-term contribution by investigating an approach on which little experience has been accumulated so far.

For example, let us consider a group of participants generating ideas on "How to improve the impact of the UN Security Council?" (which was the topic of the experimental sessions reported in Section 4). Let us assume that at one point in the session, two streams of ideas have emerged: a long stream, composed of ten ideas, all approaching the topic from a legal standpoint, and a shorter stream, with only two ideas, both approaching the topic from a monetary standpoint. Each participant then has to decide whether to search for new ideas following a legal, or a monetary approach to the topic. The legal approach has already proven fruitful, whereas the monetary approach is more risky but potentially more promising.

This problem is typical in decision making and is well captured by a representation known as the multi-armed bandit model¹ (Bellman 1961). For simplicity, let us first restrict ourselves to a two-armed bandit, representing a choice between two approaches A and B. Let us assume that by spending one unit of time thinking about the topic using approach A (respectively B), a participant generates a relevant idea on the topic with unknown probability p_A (respectively p_B). Participants hold some beliefs on these probabilities, which are updated after each trial. Note that starting a new stream of ideas can be viewed as a special case in which a participant tries an approach that has not been used successfully yet.

Let us complement this classical multi-armed bandit model by assuming that when N_A trials of approach A have been made, a proportion p̂_A of which were successful, then the next idea found using approach A has contribution α^(p̂_A·N_A), that is, α raised to the number of ideas already found using approach A (similarly for approach B). This assumption captures the fact that new ideas in a stream have diminishing marginal contribution (0 ≤ α ≤ 1 by assumption). In the context of our previous example, this assumption implies that the third idea following the monetary approach is likely to have a greater marginal contribution than the eleventh idea following the legal approach, which increases the attractiveness of the riskier monetary approach.

Some problems (like mathematical problems or puzzles) have objective solutions. Once a solution has been proposed, there is usually little to build upon. Such topics would be represented by a low value of α. On the other hand, many problems considered in idea generation are more open-ended (trying to improve a product or service) and are such that consecutive ideas refine and improve each other. Such situations correspond to higher values of α. The parameter α could also depend on the preferences of the user(s) of the ideas. For example, the organizer of the session might value many unrelated, rough ideas (low α), or, alternatively, value ideas that lead to a deeper understanding and exploration of certain concepts (high α).

To analyze the tension in ideation, I first identify and characterize three stages in the idea generation process. Next I consider the simple incentive scheme consisting of rewarding participants based on their individual contributions, and show that although it leads to optimal search when α is small enough, it does not when α is large. For large α, this suggests the need for incentives that interest participants in the future of the group's output. I next show that, although such incentives allow aligning the objectives of the participants with that of the moderator of the session, they might lead to free riding. Avoiding free riding requires further tuning of the rewards and can be addressed by rewarding participants more precisely for the impact of their contributions.

1 Bandit is a reference to gambling machines (slot machines) that are often called one-armed bandits. In a multi-armed bandit, the machine has multiple levers (arms) and the gambler can choose to pull any one of them at a time.

Three stages in the idea generation process

Let us consider a two-period model, in which a single participant searches for ideas, and study the conditions under which it is optimal to follow the approach that has the higher versus the lower prior expected probability of success. For simplicity, I shall refer to the former as the better-looking approach and to the latter as the worse-looking approach. I assume that the prior beliefs on p_A and p_B at the beginning of period 1 are independent and such that p_A ~ Beta(n_S^A, n_F^A) and p_B ~ Beta(n_S^B, n_F^B), with n_S^A, n_F^A, n_S^B, n_F^B ≥ 1.² The corresponding expected probabilities of success are p̄_A = E(p_A) = n_S^A/N_A and p̄_B = E(p_B) = n_S^B/N_B, where N_A = n_S^A + n_F^A and N_B = n_S^B + n_F^B can be interpreted as the total numbers of trials observed for A and B respectively prior to period 1.³ For these assumptions the variance of the beliefs is inversely proportional to the number of trials.

Two effects influence this model. First, if the uncertainty on the worse-looking approach is large enough compared to the uncertainty on the better-looking approach, it might be optimal to choose the worse-looking approach in period 1 although it implies a short-term loss. In particular, if the expected probability of success of the uncertain, worse-looking approach is close enough to that of the more certain, better-looking approach, a success with the worse-looking approach in period 1 might lead to updated beliefs that will make this approach much more appealing than the currently better-looking one. For example, let us assume that the better-looking approach at the beginning of period 1, A, has a very stable expected probability of success which is around 0.61. By stable we mean that it will be similar at the beginning of period 2 whether a success or failure is observed for this approach in period 1 (N_A is very large). Let us assume that the worse-looking approach, B, has an expected probability of success of 0.60 at the beginning of period 1. However a success with this approach would increase the posterior probability to 0.67 while a failure would decrease this posterior probability to 0.50.⁴ Then, playing A in both periods would lead to a total expected contribution of 0.61 + 0.61 = 1.22 (ignoring discounting for simplicity). Playing B in period 1 would lead to an expected total contribution of 0.60 + 0.60×0.67 + 0.40×0.61 ≈ 1.25 (play B in period 2 iff it was successful in period 1), which is higher.

Second, if the number of trials (and successes) on the better-looking approach gets large, this approach becomes over-exploited (due to the decreasing marginal returns) and it might become more attractive to choose the worse-looking approach: although the probability of success is lower, the contribution in case of success is higher.

Let us define (we define similar quantities for B): ST(A), the expected contribution from period 1 obtained by choosing approach A in period 1 (ST stands for short-term); and LT(A), the expected contribution from period 2, calculated at the beginning of period 1, if A is played in period 1. By definition, it is optimal to play A in period 1 if and only if ST(A) + LT(A) ≥ ST(B) + LT(B). If p̄_A is such that p̄_A > p̄_B, then ST(A) > ST(B) unless A has been over-exploited and ST(B) ≥ ST(A). The following propositions characterize three stages in the idea generation process, labeled exploration, exploitation, and diversification. If A has been over-exploited, diversification is optimal and B should be played in period 1. When A has not been over-exploited yet, then if α is small enough, the exploitation of A is always optimal. In this case, exploring B would be too costly: at the same time as the participant would learn more about B, he or she would also heavily decrease the attractiveness of this approach. On the other hand, when α is large enough, then if B is uncertain enough compared to A, the exploration of B might be optimal in period 1.

2 These beliefs can be interpreted as corresponding to a situation in which the participant starts with some initial prior uniform beliefs on p_A and p_B (p_A ~ Beta(1,1) and p_B ~ Beta(1,1)) and observes, prior to period 1, n_S^A − 1 successes for A, n_F^A − 1 failures for A, n_S^B − 1 successes for B, and n_F^B − 1 failures for B. Because beta priors are conjugates for binomial likelihoods (Gelman et al. 1995), the posterior beliefs after these observations (which are used as priors at the beginning of period 1) are p_A ~ Beta(n_S^A, n_F^A) and p_B ~ Beta(n_S^B, n_F^B).
3 The uniform prior can be interpreted as resulting from the observation of one success and one failure for each approach.
4 These probabilities would result from N_B = 5, n_S^B = 3, n_F^B = 2, implying that p̄_B = 3/5 and that after a success p_B is updated to 4/6.
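The two-period comparison in this example can be sketched in a few lines of code. This is an illustration, not code from the thesis: the helper names are mine, arm A is treated as a hypothetical stable arm with mean 0.61, B's belief is Beta(3, 2) as in the footnote, and small rounding differences from the in-text figures are expected.

```python
def beta_mean(s, f):
    """Mean of a Beta(s, f) belief: s / (s + f)."""
    return s / (s + f)

def two_period_value(p_bar_a, n_s_b, n_f_b):
    """Expected total contribution of each period-1 choice (no discounting).

    Arm A is assumed stable (its mean is unchanged by one observation);
    arm B's Beta belief is updated after the period-1 outcome.
    """
    p_bar_b = beta_mean(n_s_b, n_f_b)
    # Exploit A in both periods.
    value_a = p_bar_a + p_bar_a
    # Explore B in period 1, then play whichever arm looks better in period 2.
    p_b_success = beta_mean(n_s_b + 1, n_f_b)   # posterior mean after a success
    p_b_failure = beta_mean(n_s_b, n_f_b + 1)   # posterior mean after a failure
    value_b = p_bar_b + (p_bar_b * max(p_b_success, p_bar_a)
                         + (1 - p_bar_b) * max(p_b_failure, p_bar_a))
    return value_a, value_b

va, vb = two_period_value(0.61, 3, 2)
print(round(va, 2), round(vb, 2))  # exploring B narrowly beats exploiting A
```

The `max` in period 2 captures the option value of exploration: after a success, B's posterior mean (4/6 ≈ 0.67) overtakes A, while after a failure the participant simply falls back to A.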

Exploration is characterized by a short-term loss incurred in order to increase long-term contribution.

Proposition 1a: For all p̄_A, N_B, α < 1, for all p̄_B close enough to p̄_A such that p̄_B < p̄_A, if A has been over-exploited (ST(B) ≥ ST(A)), then diversification is optimal in period 1.

Proposition 1b: For all p̄_A, N_B, for all α close enough to 0, for all p̄_B close enough to p̄_A such that p̄_B < p̄_A, if A has not been over-exploited (ST(A) > ST(B)), then the exploitation of A is optimal in period 1.

Proposition 1c: For all p̄_A, N_B, for all α close enough to 1 such that α < 1, for all p̄_B close enough to p̄_A such that p̄_B < p̄_A, there exists 0 < N_A* < N_A** (0 < V_A** < V_A*) such that:
  The exploitation of A is optimal if N_A ≤ N_A* (Var(p_A) ≥ V_A*).
  The exploration of B is optimal if N_A* ≤ N_A ≤ N_A** (V_A** ≤ Var(p_A) ≤ V_A*).
  Diversification is optimal if N_A** ≤ N_A (Var(p_A) ≤ V_A**).

The proofs to the propositions are in Appendix 1. Note that the above propositions focus on the more interesting case where the expected probabilities of success of the two approaches are close. The identification and characterization of these three stages provides a useful background for the study of idea generation incentives. In particular, short-term focus is optimal in the diversification and exploitation stages, such that participants trying to maximize their own contributions behave as if they were maximizing the group's output. In the exploration stage, however, participants have to forego some short-term contribution in order to increase the group's future output. In this case problems arise if participants are not given incentives to internalize the influence of their present actions on the future of the group.
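The over-exploitation condition that triggers diversification can be illustrated under the α-decay assumption. This is a hypothetical sketch: the helper names and all numeric values are mine, and the expected immediate contribution of an arm is taken to be its success probability times the decayed contribution of its next idea.

```python
def expected_short_term(p_bar, alpha, ideas_found):
    """Expected immediate contribution of an arm whose stream already
    contains `ideas_found` ideas: success probability times decayed value."""
    return p_bar * alpha ** ideas_found

def diversification_threshold(p_bar_a, p_bar_b, alpha, max_ideas=1000):
    """Smallest length of stream A at which ST(A) <= ST(B) for a fresh stream B."""
    for s in range(max_ideas):
        if expected_short_term(p_bar_a, alpha, s) <= expected_short_term(p_bar_b, alpha, 0):
            return s
    return None

# With slowly decaying returns (alpha close to 1), the better-looking arm A
# stays attractive for several ideas before diversification takes over.
print(diversification_threshold(0.62, 0.60, 0.99))  # -> 4
print(diversification_threshold(0.62, 0.60, 0.90))  # -> 1 (fast decay)
```

The faster the returns decay (smaller α), the sooner the longer stream becomes over-exploited, which is consistent with exploitation being short-lived for low α.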

Rewarding participants for their individual contributions

Consider the incentive scheme that consists of rewarding participants based on their individual contributions. In practice, this could be operationalized by having an external judge rate each idea sequentially, and rewarding each participant based on the ratings of his or her ideas. This scheme is probably among the first ones that come to mind, and addresses the free-riding issue mentioned in the introduction. There are also many situations in which individual contribution is likely to be the most sophisticated metric available. It is then useful to identify some conditions under which basing incentives on this metric leads participants to behave in a way that maximizes the group's total contribution, and conditions under which it does not.

Let us introduce a second participant into the model. Assume that, at each period, both players simultaneously choose one of the two approaches. At the end of period 1, the outcomes of both players' searches are observed by both players, and players update their beliefs on p_A and p_B.⁵ Each player's payoff is proportional to his or her individual contribution, i.e., to the sum of the contributions of his or her ideas from the two periods. The contribution of the group is equal to the sum of the contributions of each player. Note that α applies equally for both players, such that if a player finds an idea using a certain approach, the marginal return on this approach is equally decreased for both participants. The assumption that players independently choose which approach to follow and then share the output of their searches is probably more descriptive of electronic ideation sessions (in which participants are seated at different terminals from which they type in their new ideas and read the other participants' ideas) than it is of traditional face-to-face sessions. As will be seen later, electronic sessions have been shown to be more effective than face-to-face sessions (Gallupe et al. 1992, Nunamaker et al. 1987).

Let us consider the subgame-perfect equilibrium (SPE) of the game played by the participants, as well as the socially optimal strategy, i.e., the strategy that maximizes the total expected contribution of the group.

5 Equivalently, we could assume that players report the outcome of their t=1 search only if it is successful, by submitting a new idea. In this case if a player does not submit an idea at the end of period 1, the other player can infer that her search was unsuccessful. Finally, we could interpret this assumption as meaning that success and failure correspond to good and fair ideas, and that both types of ideas are worth submitting.

Low-α case (rapidly decreasing returns)

When α is low, the marginal contribution of successive ideas following the same approach diminishes quickly. In this case, exploration is never optimal, which lowers the value of group interactions. Consequently, actions that are optimal for self-interested participants trying to maximize their own contributions are also optimal for the group. This is captured by the following proposition:

Proposition 2: For all N1, N2, there exists α* such that for all α with 0 < α < α*, and for all p1, p2 such that p2 < p1 and p2 is close enough to p1, if participants are rewarded based on their own contributions, then a strategy (x, y) is played in period 1 in a SPE of the game iff it is socially optimal.

High-α case (slowly decreasing returns)

When α is close to 1, there are situations in which exploration is optimal. However, exploration results in a short-term loss suffered only by the explorer, and a potential long-term benefit enjoyed by the whole group (participants are assumed to share their ideas). There are then some cases in which it is socially optimal that at least one player explores in period 1, but no player will do so in any SPE of the game. This is illustrated by the following proposition when α=1:

Proposition 3: If α=1, then for all p1, for all p2 < p1 close enough to p1, there exists N_low < N_high (V_low < V_high) such that if N1 is large enough (Var(p̃1) low enough), then
- N2 < N_high (Var(p̃2) > V_low) implies that it is not socially optimal for both players to choose approach 1 in period 1 (at least one player should explore)
- N_low < N2 < N_high (V_low < Var(p̃2) < V_high) implies that both players choose approach 1 in period 1 in any SPE of the game (no player actually explores)

Interesting participants in the future of the group's output

Proposition 3 demonstrates a potential misalignment problem when building on ideas matters, that is, when the marginal returns to subsequent ideas are high. In order to overcome this problem, participants should be forced to internalize the effect of their present actions on the future of the other participants' contribution. To study the dynamic nature of the incentives implied by this observation, let us consider a more general framework, with an infinite horizon, N participants, and a per-period discount factor δ. Assume that participant i's output from period t, y_it, has a probability density function f(y_it | x_t, ..., x_1, (y_jτ)_{j=1..N, τ=1..t−1}), where x_jτ is participant j's action in period τ, and that participant i's contribution from period t is C_it = C_it((y_jτ)_{j=1..N, τ=1..t}). The two-armed bandit model considered earlier is a special case of this general model.

Interesting participants in the group's future output allows aligning the incentives, but it might also lead to free-riding. To illustrate this, let us consider the incentive scheme that consists of rewarding a participant for his or her action in period t according to a weighted average between his or her contribution in period t and the other participants' contribution in period t+1.[6] More precisely, player i receives a payoff for his or her action in period t proportional to [γ·C_it + (1−γ)·δ·Σ_{j≠i} C_{j,t+1}]. Let us call S(γ) such a scheme. We have the following proposition:

Proposition 4: In any infinite horizon game with discount factor δ < 1, S(γ) aligns the objectives of the moderator and the participants for any form of the contribution and output functions iff S(γ) is equivalent to a proportional sharing rule in which player i receives a payoff for his or her actions in period t proportional to Σ_j C_jt.

Intuitively, in period t participant i gets, in addition to a share of C_it as a reward for his or her actions in period t, a share of the other participants' contribution from period t (Σ_{j≠i} C_{j,t}) as a reward for his or her actions in period t−1. However, the fact that this reward is attributed to his or her action in period t−1 is irrelevant in period t, since this action has already been taken. Hence

[6] Proposition 4 would also hold with a weighted average between the participant's contribution in period t and a discounted sum of the other participants' contribution in subsequent periods.

the scheme is equivalent to giving the participant in period t a reward equal to a weighted average between his or her contribution in period t and the other participants' contribution in the same period.

The equivalence to a proportional sharing rule in this example illustrates how group incentives align the objectives at the same time as they introduce free riding (Holmstrom 1982), even in a dynamic setting. In particular, although the objective function of the participants is proportional to that of the moderator, it is possible for a participant to be paid without participating in the search. This imposes a cost on the moderator. More precisely, let us assume that at each period, players have an option not to search for ideas, or to search at a per-period cost c, and that the first-best level of effort is for all the players to perform a search at each period. Then if the reward given to participant i at period t is equal to β·Σ_j C_jt, β has to be high enough for the incentive compatibility (IC) constraints of the participants to be satisfied (i.e., such that each participant is willing to search in each period). In this case free riding forces the moderator to redistribute a higher share of the created value to the participants.[7]

Rewarding Participants for their Impact

In order to address free riding while still aligning the objectives, one option is to reward each participant more precisely for that part of the group's future contribution that depends directly on his or her actions, i.e., for his or her impact on the group. In particular, the following proposition considers a modification of the above scheme that rewards participant i for his or her action in period t based on a weighted average between his or her contribution in period t, and the impact of this action on the group's future contribution (let us denote such a scheme by S'(γ)).
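As a concrete reading of the scheme S(γ) defined earlier, the period-t reward can be written as a small function. The dictionary-based representation of contributions is our own illustrative convention, not code from the thesis:

```python
def s_gamma_payoff(gamma, delta, contributions, i, t):
    """Reward to player i for his or her action in period t under S(gamma):
    gamma * C_it + (1 - gamma) * delta * (sum of the OTHER players'
    period-(t+1) contributions). `contributions[(j, tau)]` holds C_{j,tau}."""
    others_next = sum(c for (j, tau), c in contributions.items()
                      if j != i and tau == t + 1)
    return gamma * contributions[(i, t)] + (1 - gamma) * delta * others_next
```

Setting γ = 1 recovers the pure own-contribution scheme; γ < 1 mixes in the others' next-period contribution, which is what Proposition 4 shows collapses to a proportional sharing rule when alignment is required for all contribution functions.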
Under this alternative scheme, the objectives of the participants remain aligned with that of the moderator, because the only component removed from the participant's objective function is not under his or her control, and free riding is reduced, because a participant does not get rewarded for the other participants' contribution unless it is tied to his or her own actions. This reduces the share of the created value that needs to be redistributed to the participants. (The impact

[7] Note that I assume that the moderator's valuation for the ideas is high enough compared to c, such that it is possible and optimal for him or her to induce the participants to search in each period.

of participant i's action in period t is defined here as Σ_{τ=t+1}^{+∞} δ^(τ−t) λ(t,τ) Σ_{j≠i} [C_jτ − C̃_jτ(i,t)], where C̃_jτ(i,t) is the expected value of C_jτ obtained when y_it is null and all players maximize the group's expected output at each period, and λ(t,τ) = δ^(τ−t)(1−δ)/(δ−δ^τ) is such that Σ_{t=1}^{τ−1} λ(t,τ) = 1 for all τ.)

Proposition 5: S'(1/2) is such that the objectives of the participants are aligned with that of the moderator, and the total expected payoffs distributed in the session are lower than under a proportional sharing rule.

Summary of this section

Three stages can be defined in the idea generation process: exploration, exploitation and diversification. When the marginal contribution of new ideas following the same approach decreases quickly, exploration is never optimal and rewarding participants based on their individual contributions leads to optimal search. When the marginal contribution of new ideas following the same approach decreases slowly, exploration can be optimal and rewarding participants based on their individual contributions leads to suboptimal search. In the latter case, the group's output can be improved by rewarding each participant based on a weighted average between his or her individual contribution and his or her impact on the group's contribution.

This last result can be translated into a testable hypothesis. First, let us note that this result, like the others summarized above, assumes that all participants search for ideas at each period. Hence the different incentive schemes should be compared holding participation constant. The following hypothesis, which will be examined in the experiment in Section 4, is then sufficient for the result to hold:

Hypothesis: For a given level of participation, if the marginal contribution of new ideas following the same approach is slowly diminishing, the total contribution of the group is greater if participants are rewarded for their impact than it is if they are rewarded for their individual contributions.

The application and test of the insights from this theoretical analysis necessitate an idea generation system that is compatible with the incentive schemes considered in this section, i.e., one which allows the measurements necessary for their implementation. Moreover, for the theoretical results to be managerially relevant, groups generating ideas using this system should perform better when incentives are present than when they are not, and perform better than groups or individuals generating ideas with other established methods. In particular, this idea generation system should not inhibit, and possibly should enhance, participants' creative abilities. Section 3 proposes a particular system.

3. An Ideation Game

I now describe an incentive-compatible ideation game that allows testing and implementing the insights derived in Section 2. In this game, participants score points for their ideas, the scoring scheme being adjustable to reward individual contribution, impact, or a weighted average of the two. The design of the game is based on the idea generation, bibliometric, and contract theory literatures. Table 1 provides a summary of the requirements imposed on the system and how they were addressed.

Table 1: Ideation game requirements and corresponding solutions

  Requirement                        Proposed solution
  Address production blocking        Asynchronous
  Address fear of evaluation        Anonymous
  Measure contribution and impact    Objective measures; mutual monitoring; ideas structured into trees
  Prevent cheating                   Relational contract; mutual monitoring

Addressing fear of evaluation and production blocking

Although the primary requirement imposed on this idea generation system is to allow implementing and testing the incentive schemes treated in Section 2, it should also be compatible with the existing literature on idea generation. In particular, two main issues have been identified with classical (face-to-face) idea generation sessions, in addition to free riding (Diehl and Stroebe 1987). The first one, production blocking, happens with classical idea generation sessions when participants are unable to express themselves simultaneously. The second one, fear of evaluation, corresponds to the fear of negative evaluation by the other participants, the moderator, or external judges. These two issues have been shown to be reduced by electronic idea generation sessions (Gallupe et al. 1992, Nunamaker et al. 1987), in which participation is asynchronous (therefore reducing production blocking) and in which the participants are anonymous (therefore reducing fear of evaluation). The online system proposed here is anonymous and asynchronous as well. In particular, participants create (or are given) an anonymous login and password, and log on to the idea generation session at their convenience (the session typically lasts a few days).

Measuring the contribution and the impact of participants

One simple way to measure the contribution and impact of participants would be to have them evaluated by external judges. However, this approach has limitations. First, it is likely to trigger fear of evaluation. Second, the evaluation criteria of the moderator, the judges and the participants might differ. Third, it increases the cost of the session. The proposed ideation game instead adopts a structure that provides objective measures of these quantities. These objective measures do not require additional evaluation, are computed based on well-stated rules, and offer continuous feedback on the performance of the participants.
Bibliometric research suggests that the number of ideas submitted by a participant should be a good measure of his or her contribution, and that his or her number of citations should be a good measure of his or her impact (King 1987). The proposed structure adapts the concept of citation count from the field of scientific publications (which are the focus of bibliometry) to idea generation. An example is provided in Figure 1. Ideas are organized into trees, such that the son of an idea appears below it, in a different color and with a different indentation (one level further to the right). The relation between a father-idea and a son-idea is that the son-idea builds on its father-idea. More precisely, when a participant enters a new idea, he or she has the option to place it at the top of a new tree (by selecting "enter an idea not building on any previous idea") or to build on an existing idea (by entering the identification number of the father-idea and clicking on "build on/react to idea #..."). Note that an idea can have more than one son, but at most one father (the investigation of more complex graph structures allowing multiple fathers is left for future research). The assignment of a father-idea to a new idea is left to the author of the new idea (see below how incentives are given to make the correct assignment).

Figure 1: Example of an ideation game

The impact of an idea y can then be measured by considering the ideas that are lower than y in the same tree. In the experiment described in the next section, impact was simply measured by counting these descendant ideas. More precisely, participants rewarded for their impact scored one point for each new idea that was entered in a tree below one of their ideas. For example, if idea #2 built on idea #1, and idea #3 on idea #2, then when idea #3 was posted, one point was given to the authors of both idea #1 and idea #2. Participants rewarded for their own contributions scored one point per idea. The exploration of alternative measures is left for future research.

If a participant decides to build on an idea, a menu of conjunctive phrases (e.g. "More precisely", "On the other hand", "However") is offered to him or her (a blank option is also available), in order to facilitate the articulation of the links between ideas. Pretests suggested that these phrases were useful to the participants.

Preventing cheating

Although this structure provides potential measures of the contribution and impact of the different participants, if the scoring system were entirely automatic and if no subjective judgment were ever made, participants could cheat by simply posting irrelevant ideas in order to score unbounded numbers of points. Consequently, some level of subjective judgment seems inevitable. These judgments, however, should keep fear of evaluation to a minimum, and limit costly interventions by external judges. This is addressed by carefully defining the powers of the moderator of the session. First, the moderator does not have the right to decide which ideas deserve points. However, he or she has the power to expel participants who are found to cheat repeatedly, i.e., who submit ideas that clearly do not address the topic of the session or who wrongly assign new ideas to father-ideas. This type of relation between the participants and the moderator is called a relational contract in agency theory. In this relationship the source of motivation for the agent is the fear of termination of his or her relation with the principal (Baker et al. 1998, Levin 2003). In the present case, if the moderator checks the session often enough, it is optimal for participants to only submit ideas which they believe are relevant. In order to reduce the frequency with which the moderator needs to check the session, this relational contract is complemented by a mutual monitoring mechanism (Knez and Simester 2001, Varian 1990).
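The descendant-count scoring rule described above (one point to the author of every ancestor idea when a new idea is posted) can be sketched as follows. The data layout is hypothetical and for illustration only; it is not the actual PHP/MySQL implementation:

```python
def impact_points(parent):
    """Points per idea under the Impact scoring: an idea earns one point
    each time a new idea is posted anywhere below it in its tree.
    `parent` maps an idea id to its father-idea id (None for tree roots)."""
    points = {idea: 0 for idea in parent}
    for idea in parent:
        node = parent[idea]
        while node is not None:   # walk up to the root, crediting ancestors
            points[node] += 1
            node = parent[node]
    return points
```

With the example from the text (idea #2 builds on idea #1, and idea #3 on idea #2), idea #1 earns two points (one when #2 is posted, one when #3 is posted) and idea #2 earns one point.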
In particular, if participant i submits an idea which is found to be fraudulent by participant j, then j can challenge this idea.[8] This freezes the corresponding stream of ideas (no participant can build on a challenged idea), until the moderator visits the session and determines who, between i and j, was right.[9] The participant who was judged to be wrong then pays a fee to the other participant. The amount of this fee was fine-tuned based on pretests and was set

[8] In the implementation used so far, participants cannot challenge an idea if it has already been built upon.

[9] The idea is deleted if j was right.

to 5 points in the experiment. This mechanism is economical because it reduces the frequency with which the moderator needs to visit the session, without increasing its cost, since challenges result in a transfer of points between participants (no extra points need to be distributed). Beyond the cost factor, another argument in favor of having participants monitor each other is the finding by Collaros and Anderson (1969) that fear of evaluation is greater when the evaluation is done by an expert judge rather than by fellow participants. Note that in equilibrium, there should be no challenges, because participants should only submit relevant ideas and challenge irrelevant ideas. Indeed, in the experiment, there was only one challenge.

Practical implementation

This ideation game was programmed in PHP, using a MySQL database. Its structure and the instructions were fine-tuned based on two pretests. The pretests used participants from a professional marketing research/strategy firm. One pretest addressed an internal problem (improving the workspace) and a second pretest addressed an external problem (how to re-enter a market). In the implementation, the moderator also has the ability to enter comments, either on specific ideas or on the overall session. Participants also have the ability to enter comments on specific ideas. This allows them, for example, to ask for clarification or to make observations that do not directly address the topic of the session. The difference between comments and ideas is that comments are not required to contribute positively to the session, and are not rewarded. Finally, note that the implementation choices (e.g., the amount of the fee in case of a challenge), although determined by the pretests, are likely to be improved with further experimentation.

4. Experiment

The experiment presented in this paper has two primary goals. The first goal is to test whether incentives have the power to improve the output of idea generation sessions.
Recall that some research in social psychology suggests that incentives will have a negative influence on performance. The second goal is to verify the hypothesis, stated in Section 2, that under certain conditions, the total contribution of the group can be improved by rewarding each participant for the impact of his or her contribution, as opposed to his or her own contribution. This experiment also allows studying further the influence of incentives on the dynamics of idea generation.

Experimental design

The experiment used the ideation game described in the previous section, and had three conditions. The only difference between conditions was the incentive system, that is, the manner in which points were scored in the game. Each participant was randomly assigned to one idea generation session, and each idea generation session was assigned to one of the three conditions (all participants in a given session were given the same incentive scheme). The points scored during the session were later translated into cash rewards. Recall that the hypothesis derived in Section 2 relies on two assumptions: an equal level of participation, and a slowly diminishing marginal contribution. The latter is implied by the systematic measures of contribution and impact used in this experiment, which value all ideas equally. The former was addressed by calibrating the value of points in the different conditions based on a pretest, such that participants in all conditions could expect similar payoffs.[10]

A total of 78 participants took part in the experiment over a period of 20 days, signing up for the sessions at their convenience. Three sets of parallel sessions were run (one session per condition in each set), defining three waves of participants.[11] Each session lasted up to five days, and within each wave, the termination time of the three sessions was the same.[12] The first condition (the Flat condition) was such that participants received a flat reward of $10 for participation. In this condition, no points were scored (points were not even mentioned). In the second condition (the Own condition), participants were rewarded based exclusively on their own contributions, and scored one point per idea submitted (each point was worth $3).

[10] In this pretest, a group of approximately 15 employees from a market research firm generated ideas on "How can we better utilize our office space?". Points were scored using the same scheme as in the Impact condition.
Although points were not translated into monetary rewards in this pretest, participants found the game stimulating and felt directly concerned with the topic, resulting in 40 ideas submitted. Calibration was done conservatively, such that based on the pretest, participants in the Flat condition should expect slightly higher payoffs than participants in the Own condition, who in turn should expect slightly higher payoffs than participants in the Impact condition.

[11] The first wave consisted of the first 36 participants who signed up and had 12 participants per condition, the second wave consisted of the following 20 participants, with 7 participants in the Flat and in the Own conditions and 6 in the Impact condition, and the last wave consisted of the last 22 participants, with 8 in the Flat condition, and 7 in the Own and Impact conditions.

[12] This termination time was usually driven by budget considerations. Messages were posted on the Impact and Own sessions informing the participants that the session was over. The Flat sessions were not interrupted, but only the ideas submitted before the termination of the other sessions were taken into account.

In the third condition (the Impact condition), each participant was rewarded based exclusively on the impact of his or her ideas. Participants scored one point each time an idea was submitted that built on one of their own ideas (see the previous section for the details of the scoring scheme). One interpretation of this scoring rule could be that participants scored one point for each "citation" of one of their ideas. Each point was worth $2 for the first two waves of participants; the value of points was decreased by half to $1 for the last wave. The results for the third wave were similar to those for the first two waves; hence the same analysis was applied to the data from the three waves.

In order to provide a strong test of the hypothesis that incentives can improve idea generation, an engaging topic, as well as some motivated participants, were selected. In particular, the topic was: "How can the impact of the UN Security Council be increased?" (note that the experiment was run in March 2003, at the time of the US-led war in Iraq), and the participants were recruited at an anti-war walkout in a major metropolitan area on the east coast, as well as on the campus of an east coast university. Three graduate students in political science at the same university (naïve to the hypotheses) were later hired as expert judges, and were asked to evaluate independently the output of the sessions on several qualitative criteria (the judges were paid $50 each for their work).

Verification of the theoretical hypothesis

As noted earlier, participants in this experiment were rewarded as if all ideas were equally valued by the moderator (there was no discounting of ideas depending on their position in a tree). More precisely, participants were rewarded as if the value of the parameter α from Section 2 were equal to 1, i.e., as if contribution were measured by the number of ideas.
With α=1, the theoretical hypothesis proposed at the end of Section 2 suggests that the total number of ideas should be higher when participants are rewarded for their impact (measured by their number of "citations") than when they are rewarded for their number of ideas. Note the counter-intuitive nature of this prediction: in the Own condition, although participants are trying to maximize their number of ideas and know that the other participants in the group have the same objective, they produce fewer ideas than the participants in the Impact condition, who are rewarded for their impact on the group. The intuition behind this prediction is that participants

who are rewarded for their number of ideas are less likely to explore new approaches, which leads to less inspiring ideas, which leads to fewer ideas in total. The data are consistent with this prediction: compared to the Own condition, participants in the Impact condition submitted significantly more unique ideas (p-value < 0.05),[13] where the number of unique ideas is defined as the number of ideas minus the number of redundant ideas (an idea is classified as redundant if it was judged as redundant by any of the three judges).

Other quantitative results

Both the quantitative and qualitative results are summarized in Table 2. The quantitative results are averaged across participants. Compared to the Own condition, participants in the Impact condition not only submitted significantly more unique ideas, they were also significantly more likely to submit at least one unique idea (p-value < 0.04), submitted significantly more unique ideas conditioning on submitting at least one (p-value < 0.01), and submitted ideas that were on average 76% longer. Participants in the Own condition, in turn, performed better than participants in the Flat condition, although not significantly so on all metrics.[14]

Quality ratings

The qualitative results reported in Table 2 were obtained by averaging the ratings of the three judges.[15] The ratings were found to have a reliability (Rust and Cooil 1994) above the 0.7 benchmark often used in market research (Boulding et al. 1993).[16]

[13] One cannot assume that the numbers of ideas submitted by different participants are independent, since participants are influenced by the ideas submitted by the other members of their group. An endogeneity issue arises, because the number of ideas submitted by participant i depends on the number of ideas submitted by participant j, and vice versa.
In order to limit this issue, a regression was run with the number of ideas submitted by participant i as the dependent variable, and three dummies (one per condition), as well as the number of ideas that were submitted by the other members of participant i's group before he signed up, as independent variables. This last independent variable is still endogenous, but it is pre-determined, leading to consistent estimates of the parameters (Greene 2000).

[14] p-values below 0.8, 0.7, and 0.04 respectively for the number of unique ideas, for the proportion of participants who submitted at least one unique idea, and for the number of unique ideas given that at least one was submitted.

[15] All ratings, except for the number of star ideas, are on a 10-point scale.

[16] For each measure and for each judge, the ratings were normalized to range between 0 and 1. Then ratings were classified into three intervals: [0; 1/3[, [1/3; 2/3[, and [2/3; 1].
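The regression just described (condition dummies plus the pre-determined count of ideas posted by a participant's group before he or she signed up) can be sketched with ordinary least squares. The data below are invented for illustration; this is not the estimation code actually used in the thesis:

```python
import numpy as np

def fit_idea_regression(n_ideas, condition, prior_peer_ideas):
    """Regress each participant's idea count on three condition dummies plus
    the (pre-determined) number of ideas posted by the participant's group
    before he or she signed up. Returns [b_flat, b_own, b_impact, b_prior]."""
    cols = {"flat": 0, "own": 1, "impact": 2}
    X = np.zeros((len(n_ideas), 4))
    for row, cond in enumerate(condition):
        X[row, cols[cond]] = 1.0          # one dummy per condition
    X[:, 3] = prior_peer_ideas            # pre-determined regressor
    coef, *_ = np.linalg.lstsq(X, np.asarray(n_ideas, float), rcond=None)
    return coef
```

Because the peer-idea count is fixed before the participant acts, it is pre-determined rather than simultaneously determined, which is what makes the least-squares estimates consistent despite the within-group dependence.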

The sessions in the Impact condition were judged to have a larger overall contribution, more star ideas (i.e., very good ideas), more breadth, more depth, and to have ideas that on average were more novel, more thought-provoking and more interactive (interactivity is defined here as the degree to which a son-idea builds on its father-idea and improves upon it). Hence the results suggest that incentives do have the capacity to improve idea generation, and are consistent with the hypothesis formulated in Section 2.

Table 2: Results

                                                                   Impact   Own    Flat
  Quantitative results
    Number of unique ideas per participant                            –       –      –
    Proportion of participants who posted at least one unique idea   68%     54%    48%
    Number of unique ideas given that at least one was posted         –       –      –
    Number of words per idea                                          –       –      –
  Qualitative ratings
    Total contribution                                                –       –      –
    Number of star ideas                                              –       –      –
    Breadth                                                           –       –      –
    Depth                                                             –       –      –
    Novelty                                                           –       –      –
    Thought-provoking                                                 –       –      –
    Interactivity                                                     –       –      –
  * = Impact significantly larger than Own at the 0.05 level. † = Own significantly larger than Flat at the 0.05 level.

Reconciling the results with the social psychology literature

As was mentioned in the introduction, the social psychology literature seems to predict that incentives are more likely to hurt idea generation. This prediction relies in great part on the observation that idea generation is not a task for which there exist easy algorithmic solutions, achievable by the straightforward application of certain operations. McGraw (1978) contrasts tasks such that the path to a solution is well mapped and straightforward with tasks such that

this path is more complicated and obscure. He argues that incentives are likely not to help with this second type of task, but notes that it is nonetheless possible "for reward to facilitate performance on such problems in the case where the algorithm (leading to a solution) is made obvious" (p. 54). Perhaps such facilitation might have happened in the present experiment. In particular, the structure of the ideation game might have made the mental steps leading to new ideas more transparent, allowing the participants to approach the task in a more algorithmic manner.

Incentives and motivation

Table 2 indicates that fewer participants in the Flat condition submitted at least one idea. Further analysis suggests that this might be due to a lower level of motivation of these participants. More precisely, 100% of the participants who submitted at least one unique idea in the Flat condition did so in the first 30 minutes after signing up, versus 65% for the Impact condition and 64% for the Own condition. In particular, 8% of the participants who submitted at least one idea in the Impact condition did so more than three hours after signing up, as did some of those in the Own condition. This is consistent with the hypothesis that all three conditions faced the same distribution of participants, but that when incentives were present, participants tried harder to generate ideas and did not give up as easily.

Dynamics of the sessions

The study of the dynamics of the ideation sessions, although not directly testing the theoretical predictions from Section 2, provides additional insight on the influence of incentives on the idea generation process. These exploratory analyses also provide direction for future research. First, I study whether the quantitative results are due to participants posting more ideas at each visit, or visiting the session more often. Both the Impact and Own conditions give clear incentives to submit more ideas at each visit.
In addition, we might expect participants in the Impact condition to have been more concerned with the evolution of the session, leading to more frequent visits. In the experiment, the time at which each idea was submitted was recorded. Let us then define "pockets" of ideas such that two successive ideas by the same participant are in the same pocket if they were submitted less than 20 minutes apart.[17] Table 3 reports the average number of ideas per pocket for the participants who posted at least one idea, as well as the average number of pockets. Consistent with our predictions, participants in the Own and Impact conditions submitted significantly more ideas at each of their visits to the session compared to participants in the Flat condition. Furthermore, participants in the Impact condition visited the session significantly more often.

Table 3: Dynamics

                                      Impact   Own    Flat
  Number of ideas per pocket            –**     –*      –
  Number of pockets per participant     –**     –       –
  * significantly larger than Flat, p < 0.1. ** significantly larger than Flat, p < 0.03.

[17] Similar results were obtained when defining pockets using 10 or 30 minutes.

Next, I study how incentives influenced the overall number of ideas submitted in the session as a function of time. Figure 2 (corresponding to the first wave of participants; the graphs for the other two waves have similar characteristics) shows that the process in the Impact condition is S-shaped, whereas the process in the Flat condition tends to be smoother.[18] The process in the Own condition is somewhat intermediary between these two extremes. In order to better understand the shapes of these graphs, it is useful to plot similar graphs at the individual level. These graphs suggest that the differences in shape of the aggregate graphs reflect the fact that participants in the Impact condition relied more heavily on group interactions. In the Impact condition, the individual processes have the same characteristics as the aggregate graphs (a period of inactivity followed by a period of high activity which finally stabilizes). The periods of high activity start at similar times for the different participants of a session, which accounts for the shape of the aggregate graphs. On the other hand, the groups in the Flat condition displayed no particular structure: participation is spread throughout the session and participants appear to be largely independent, resulting in smooth aggregate graphs.
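The pocket counts underlying Table 3 are easy to compute from submission timestamps. A minimal sketch (times in minutes for a single participant; the representation is ours, not the analysis code actually used):

```python
def pockets(times, gap_minutes=20):
    """Split one participant's sorted submission times into pockets:
    successive ideas less than `gap_minutes` apart share a pocket."""
    groups = []
    for t in times:
        if groups and t - groups[-1][-1] < gap_minutes:
            groups[-1].append(t)   # same visit continues
        else:
            groups.append([t])     # a new pocket starts
    return groups
```

The average pocket size then proxies for ideas per visit, and the number of pockets proxies for the number of visits, which is how the two rows of Table 3 separate the two effects.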
[18] Time on the x axis is the total cumulative time spent by participants in the session when the idea was submitted, divided by the number of participants. For example, if an idea was posted at 5pm, and at this time two participants had signed up, one at 2pm and one at 3pm, then the reported time is [(5-2)+(5-3)]/2 = 2.5 hours. A simpler measure of time, the total clock time elapsed since the first participant signed up, gives similar qualitative results.
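The cumulative-time measure in the footnote can be sketched in a few lines; this reproduces the footnote's 2.5-hour example (the function name and data layout are illustrative assumptions).

```python
def normalized_time(idea_time, signup_times):
    """Total cumulative time spent in the session by signed-up participants
    at the moment an idea is posted, divided by the number of participants."""
    active = [s for s in signup_times if s <= idea_time]
    return sum(idea_time - s for s in active) / len(active)

# Footnote example: idea posted at 5pm, participants signed up at 2pm and
# 3pm -> [(5-2)+(5-3)]/2 = 2.5 hours.
t = normalized_time(5, [2, 3])
```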

Figure 2: Number of ideas as a function of time
[Number of ideas vs. time (hours); one curve each for the "Flat", "Own", and "Impact" conditions.]

In conclusion, the study of the dynamics of the sessions indicates that incentives influence not only the quantity and the quality of the ideas submitted, but also the manner in which they are submitted. This is reflected in both within- and between-participant patterns. The theoretical study of these patterns could be addressed in future research, for example by extending the model proposed in Section 2.

Limitations and alternative explanations

The hypothesis from Section 2 assumes that participation in the Own and Impact conditions was similar, i.e., participants spent comparable amounts of time searching for ideas. If this assumption were not true, then this variation might provide an alternative explanation for the results. Three effects, potentially testable with future experiments, might lead to such alternative explanations: a non-monetary effect, an indirect incentives effect, and a direct incentives effect. A non-monetary effect might attribute higher participation in the Impact condition to a greater level of stimulation, resulting from richer interactions between participants. The indirect incentives effect is based on the hypothesis that, if ideas in the Impact condition were more inspiring, participants would be able to generate more ideas in the same amount of time,

making participation more attractive. The direct incentives effect hypothesizes that the expected payoff per idea was perceived as higher in the Impact condition. In particular, although the monetary value of points was calibrated to equate the expected payoff per idea in the two conditions, the calibration was based on a pretest using a different topic, and did not take into account participants' subjective beliefs. The non-monetary and indirect incentive effects would also advocate the use of impact as an incentive, but for different theoretical reasons. Further experiments could identify their magnitudes, providing direction for future theoretical research. For example, one might control the amount of time spent by the participants on the task by running the sessions in a laboratory. The third effect would make the comparison of the Impact and Own conditions problematic and cast some doubt on the validity of rewarding participants for their impact. In order to address this concern, another experiment could be run, in which the number of points earned by an idea in the Impact condition is bounded. For example, if ideas in this condition score points only for the first five subsequent ideas in the tree, and if the monetary value of a point is five times higher in the Own condition, then the expected payoff per idea is objectively lower in the Impact condition than it is in the Own condition.

5. Summary and Future Research

In this paper, I propose a quantitative framework for the study of a mostly qualitative topic, idea generation, with a focus on the effect of incentives. I first derive an analytical model of the idea generation process. Based on this model, I identify conditions under which rewarding participants for their individual contributions induces desirable actions, and other conditions under which it does not. In the latter case, performance can be improved by rewarding each participant for the impact of his or her contribution.
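The arithmetic behind the bounded-Impact design described above is simple to check. The point value below is a hypothetical number chosen for illustration; only the 5:1 ratios come from the text.

```python
# Assumed dollar value of one Impact point (any positive value works).
v = 0.10

# Bounded Impact: an idea scores points only for the first five subsequent
# ideas in its tree, so it earns at most 5 points.
max_impact_payoff_per_idea = 5 * v

# Own condition: one point per idea, each point worth five times more.
own_payoff_per_idea = 1 * (5 * v)

# The Impact cap equals the Own payoff exactly, so the *expected* Impact
# payoff per idea (strictly below the cap whenever some ideas attract fewer
# than five followers) is objectively lower than the Own payoff.
```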
I then develop an incentive compatible ideation game, which allows implementing the insights from the theoretical analysis and testing some of the predictions. Finally, I show experimentally that incentives have the capability to improve idea generation, in a manner consistent with the theory. Several opportunities for future research can be identified in addition to the ones mentioned throughout the paper. First, it might be interesting to extend the theoretical model to take into account dimensions of idea generation other than the choice of which approach to follow. Second, it might be useful to generalize the experimental findings using different topics, and

other (potentially subjective) measures of the contribution and impact of the participants. Third, although the experimental findings in this paper are consistent with the theoretical prediction, they do not test the causal relationship between the speed of diminishing returns and the validity of rewarding impact, which could be addressed by additional experiments. Finally, it would be interesting to identify conditions under which incentives are more likely to enhance or inhibit idea generation, as well as conditions under which the ideation game presented here is more or less likely to enhance participants' creativity. More generally, future research might examine how the experimental findings reported in this paper relate to classic results in social psychology. In a different paper (author 2003), I attempt to address these questions using major recent work in idea generation (Goldenberg et al. 1999a, Goldenberg et al. 1999b, Goldenberg and Mazursky 2002). In particular, there exist at least two opposing views regarding which cognitive skills should be encouraged in ideation sessions. The first view is that participants should be induced to think in a random fashion. This widely held belief, based on the assumption that anarchy of thought increases the probability of creative ideas, has led to the development of ideation tools such as brainstorming (Osborn 1957), Synectics (Prince 1970), and lateral thinking (De Bono 1970). In contrast, recent papers suggest that structure, and not randomness, is the key to creativity (Goldenberg et al. 1999a). This structured view, according to which creativity can be achieved through the identification and application of some regularities in previous creative output, has led to systematic approaches to idea generation. Two important examples are inventive templates (Goldenberg et al. 1999b, Goldenberg and Mazursky 2002), and TRIZ (Altshuller 2000).
In author 2003, I try to shed some light on the apparent contradiction between these two views, and to identify conditions under which each one of them is more likely to be valid. I then show how this analysis applies to the questions mentioned above.

References

Altshuller, Genrich (2000), The Innovation Algorithm, Technical Innovation Center, Inc., Worcester, MA.

Baker, G., R. Gibbons and K. Murphy (1998), "Relational Contracts and the Theory of the Firm," mimeo, MIT.

Bellman, R. (1961), Adaptive Control Processes: A Guided Tour, Princeton University Press.

Boulding, William, Richard Staelin, Valarie Zeithaml, and Ajay Kalra (1993), "A Dynamic Process Model of Service Quality: From Expectations to Behavioral Intentions," Journal of Marketing Research, 30 (February), 7-27.

Collaros, Panayiota A., and Lynn R. Anderson (1969), "Effect of Perceived Expertness Upon Creativity of Members of Brainstorming Groups," Journal of Applied Psychology, Vol. 53.

Dahan, Ely, and John R. Hauser (2002), "Product Development: Managing a Dispersed Process," in the Handbook of Marketing, Barton Weitz and Robin Wensley, Editors.

De Bono, Edward (1970), Lateral Thinking: A Textbook of Creativity, Ward Lock Educational, London.

Diehl, Michael, and Wolfgang Stroebe (1987), "Productivity Loss in Brainstorming Groups: Toward the Solution of a Riddle," Journal of Personality and Social Psychology, Vol. 53, No. 3, 497-509.

Gallupe, Brent R., Lana M. Bastianutti, and William H. Cooper (1991), "Unblocking Brainstorms," Journal of Applied Psychology, Vol. 76, No. 1, 137-142.

Gelman, Andrew, John S. Carlin, Hal S. Stern, and Donald B. Rubin (1995), Bayesian Data Analysis, Chapman & Hall/CRC.

Goldenberg, J., D. Mazursky, and S. Solomon (1999a), "Creative Sparks," Science, 285, 1495-1496.

Goldenberg, J., D. Mazursky, and S. Solomon (1999b), "Toward Identifying the Inventive Templates of New Products: A Channeled Ideation Approach," Journal of Marketing Research, 36 (May), 200-210.

Goldenberg, J., and D. Mazursky (2002), Creativity in Product Innovation, Cambridge University Press.

Greene, William H. (2000), Econometric Analysis, Prentice Hall International, Inc.

Harkins, Stephen G., and Richard E. Petty (1982), "Effects of Task Difficulty and Task Uniqueness on Social Loafing," Journal of Personality and Social Psychology, Vol. 43, No. 6, 1214-1229.

Holmstrom, Bengt (1982), "Moral Hazard in Teams," Bell Journal of Economics, 13(2), 324-340.

Kerr, Norbert L., and Steven E. Bruun (1983), "Dispensability of Member Effort and Group Motivation Losses: Free-Rider Effects," Journal of Personality and Social Psychology, Vol. 44, 78-94.

King, Jean (1987), "A Review of Bibliometric and Other Science Indicators and Their Role in Research Evaluation," Journal of Information Science, 13, 261-276.

Kuhn, Thomas S. (1977), "The Essential Tension: Tradition and Innovation in Scientific Research," in T. S. Kuhn (Ed.), The Essential Tension: Selected Readings in Scientific Tradition and Change, Chicago, IL: University of Chicago Press.

Knez, Marc, and Duncan Simester (2001), "Firm-Wide Incentives and Mutual Monitoring at Continental Airlines," Journal of Labor Economics, Vol. 19, No. 4.

Lamm, Helmut and Gisela Trommsdorff (1973), "Group Versus Individual Performance on Tasks Requiring Ideational Proficiency (Brainstorming): A Review," European Journal of Social Psychology, 3 (4), 361-388.

Levin, J. (2003), "Relational Incentive Contracts," American Economic Review, forthcoming.

McCullers, J.C. (1978), "Issues in Learning and Motivation," in M.R. Lepper & D. Greene (Eds.), The Hidden Costs of Reward (pp. 5-18), Hillsdale, NJ: Erlbaum.

McGraw, K.O. (1978), "The Detrimental Effects of Reward on Performance: A Literature Review and a Prediction Model," in M.R. Lepper & D. Greene (Eds.), The Hidden Costs of Reward, Hillsdale, NJ: Erlbaum.

Nunamaker, Jay F., Jr., Lynda M. Applegate, and Benn R. Konsynski (1987), "Facilitating Group Creativity: Experience with a Group Decision Support System," Journal of Management Information Systems, Vol. 3, No. 4 (Spring).

Osborn, A.F. (1957), Applied Imagination (Rev. Ed.), New York: Scribner.

Prince, George M. (1970), The Practice of Creativity: A Manual for Dynamic Group Problem Solving, New York: Harper & Row.
Rust, Roland, and Bruce Cooil (1994), "Reliability Measures for Qualitative Data: Theory and Implications," Journal of Marketing Research, Vol. XXXI (February), 1-14.

Simonton, Dean K. (1988), Scientific Genius: A Psychology of Science, Cambridge University Press.

Spence, K.W. (1956), Behavior Theory and Conditioning, New Haven: Yale University Press.

Toubia, Olivier (2003), "Bounded Rationality Models of Idea Generation," working paper.

Varian, Hal R. (1990), "Monitoring Agents with Other Agents," Journal of Institutional and Theoretical Economics (JITE), Vol. 146, 153-174.

Von Hippel, Eric, and Ralph Katz (2002), "Shifting Innovation to Users via Toolkits," Management Science, Vol. 48, No. 7, 821-833.

Williams, Kipling, Stephen Harkins, and Bibb Latané (1981), "Identifiability as a Deterrent to Social Loafing: Two Cheering Experiments," Journal of Personality and Social Psychology, Vol. 40, 303-311.

Zajonc, Robert B. (1965), "Social Facilitation," Science, Vol. 149 (July), 269-274.

Appendix: Proofs of the Propositions

Let us define the following quantities for approach A (similar quantities are defined for approach B):

ST(A) = α·p̄_A : the expected payoff from period 2 obtained by choosing approach A in period 2;
S_A^+ : the expected payoff from period 2 obtained by playing A in period 2, given that A was played in period 1 and was successful;
F_A^+ : the expected payoff from period 2 obtained by playing A in period 2, given that A was played in period 1 and was unsuccessful;
LT(A) = p̄_A·max{S_A^+, ST(B)} + (1 − p̄_A)·max{F_A^+, ST(B)} : the expected payoff from period 2, calculated at the beginning of period 1, if A is played in period 1.

Updating beliefs after one more observation of approach A gives S_A^+ = α·(N_A·p̄_A + 1)/(N_A + 1) and F_A^+ = α·N_A·p̄_A/(N_A + 1).

Proof of Proposition 1a: First, note that N_A and N_B are restricted to integer values, and that p̄_A ∈ [1/N_A, (N_A − 1)/N_A] and p̄_B ∈ [1/N_B, (N_B − 1)/N_B]. Let us first show that for all p̄_B and N_B, for all α < 1, and for all p̄_A close enough to p̄_B such that p̄_A < p̄_B, ST(A) ≥ ST(B) implies N_A ≤ N_B: ST(A) is decreasing in N_A; when p̄_A = p̄_B, ST(A) > ST(B) if N_A = N_B − 1; by continuity, this is also true for p̄_A close enough to p̄_B such that p̄_A < p̄_B, hence ST(A) ≥ ST(B) implies N_A ≤ N_B. Then we have that ST(A) ≥ ST(B) implies S_A^+ ≥ S_B^+: substituting the updated beliefs, this amounts to comparing α·(N_A·p̄_A + 1)/(N_A + 1) with α·(N_B·p̄_B + 1)/(N_B + 1), which follows from N_A ≤ N_B and p̄_A close enough to p̄_B. Then, when ST(A) ≥ ST(B), there are two cases:

1. S_A^+ ≤ ST(B). This implies S_B^+ ≤ S_A^+ ≤ ST(B) ≤ ST(A). Then LT(A) ≥ ST(B) and LT(B) = ST(A), and so ST(A) + LT(A) ≥ ST(A) + ST(B) ≥ ST(B) + LT(B).

2. S_A^+ > ST(B). Then LT(A) ≥ p̄_A·S_A^+ + (1 − p̄_A)·ST(B) and LT(B) = p̄_B·max{S_B^+, ST(A)} + (1 − p̄_B)·ST(A) (because F_B^+ < ST(B) ≤ ST(A)). Then ST(A) + LT(A) ≥ ST(B) + LT(B) holds if p̄_A·(S_A^+ − ST(B)) ≥ p̄_B·(max{S_B^+, ST(A)} − ST(A)), i.e., if p̄_A·(S_A^+ − ST(B)) ≥ p̄_B·(S_B^+ − ST(A)) (because S_A^+ > ST(B)). Substituting the expressions for S_A^+, S_B^+, ST(A), and ST(B), and using ST(A) ≥ ST(B), this reduces to a comparison of the posterior means (N_A·p̄_A + 1)/(N_A + 1) and (N_B·p̄_B + 1)/(N_B + 1), which is satisfied because N_A ≤ N_B.

Proof of Proposition 1b: For all p̄_B, p̄_A < p̄_B, and N_B, let N* = min{N : p̄_A·N > 1}. If α is small enough, then N_A ≥ N* implies ST(B) > ST(A): this holds whenever α < α', where α' is the discount rate for which equality obtains at the boundary p̄_A = 1/N_A (using p̄_A ≥ 1/N_A and p̄_A < p̄_B). Then, if α is small enough, we have S_A^+ < ST(B) and F_A^+ < S_A^+ < ST(B), which implies LT(A) = ST(B). Then, since LT(B) ≥ ST(B) = LT(A) and ST(B) > ST(A), we have ST(B) + LT(B) ≥ ST(A) + LT(A).

Proof of Proposition 1c: Given p̄_A, N_A must be such that N_A·p̄_A ≥ 1, i.e., N_A must be at least as large as N_min = max{1/p̄_A, 1/(1 − p̄_A)}. Let us show that for all p̄_B, for all N_B, for all α close enough to 1 such that α < 1, and for all p̄_A close enough to p̄_B such that p̄_A < p̄_B, we have the following (viewing ST(A) and LT(A) as functions of N_A, holding the number of past successes of A fixed):

1. ST(A) > ST(B) for N_A = N_min, and lim_{N_A→+∞} ST(A) < ST(B);
2. lim_{N_A→+∞} LT(A) < LT(B);
3. ST(A) is monotonically decreasing in N_A;
4. LT(A) is monotonically non-increasing in N_A;
5. ST(A) = ST(B) implies LT(A) < LT(B).
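The belief updates used in these proofs follow from a standard posterior-mean calculation. The derivation below is a sketch consistent with the definitions above, under the assumption that beliefs equal the observed fraction of successes (it is a reading of the model, not a verbatim restoration of the original appendix).

```latex
% If beliefs equal the observed success fraction, \bar{p}_A = n^S_A / N_A,
% then one additional observation of approach A updates the expected
% success probability to
\bar{p}_A^{\,+} = \frac{n^S_A + 1}{N_A + 1} = \frac{N_A\,\bar{p}_A + 1}{N_A + 1}
\quad\text{(success)},
\qquad
\bar{p}_A^{\,-} = \frac{n^S_A}{N_A + 1} = \frac{N_A\,\bar{p}_A}{N_A + 1}
\quad\text{(failure)}.
```

Multiplying each updated belief by the discount rate α gives the quantities S_A^+ and F_A^+ used throughout the appendix.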

These conditions imply that there exist thresholds N* and N** with N_min ≤ N* < N**: ST(A) + LT(A) − [ST(B) + LT(B)] is decreasing in N_A and negative for N_A > N** (where N** is such that N_A > N** implies ST(A) < ST(B)), positive for N_A < N*, and so by continuity and monotonicity ST(A) + LT(A) − [ST(B) + LT(B)] is non-negative for N_A ≤ N* (exploitation), non-positive for N* ≤ N_A ≤ N** (exploration), and negative for N_A > N** (diversification). Note that if ST(A) + LT(A) < ST(B) + LT(B) already for N_A = N_min, then N* < N_min. Let us prove conditions 1 to 5.

1) For N_A = N_min, ST(A) = α·p̄_A > ST(B) = α·p̄_B for all α < 1 and p̄_A close enough to p̄_B, and lim_{N_A→+∞} ST(A) = 0 < ST(B).

2) When N_A goes to +∞, the limits are LT(B) = ST(B) = α·p̄_B and LT(A) = p̄_A·S_A^+ + (1 − p̄_A)·F_A^+ = α·p̄_A·(p̄_A·N_A·(α − 1) + N_A + α)/(N_A + 1) < LT(B), because p̄_A·N_A·(α − 1) + N_A + α < N_A + 1.

3) is trivial.

4) First, note that S_A^+ is decreasing in N_A, because both factors in its expression are positive and decreasing in N_A. Comparing S_A^+, F_A^+, and ST(B), we then have four cases:

- ST(B) > F_A^+ > S_A^+: LT(A) is constant in N_A.
- ST(B) > S_A^+ > F_A^+: LT(A) is constant in N_A.
- S_A^+ > ST(B) > F_A^+: LT(A) = p̄_A·S_A^+ + (1 − p̄_A)·ST(B) is decreasing in N_A because S_A^+ is.
- S_A^+ > F_A^+ > ST(B): LT(A) = p̄_A·S_A^+ + (1 − p̄_A)·F_A^+; the derivative of LT(A) with respect to N_A equals 0 when α = 1 and, for α close enough to 1 (using a Taylor series expansion), has the sign of an expression that is negative.

So LT(A) is non-increasing in N_A if α is close enough to 1 and p̄_A is close enough to p̄_B.

5) The proof is similar to that of Proposition 1a: ST(A) = ST(B) implies N_A > N_B and S_A^+ < S_B^+. Then, if α is close enough to 1, S_B^+ > ST(A) = ST(B) for all p̄_A ∈ [1/N_A, (N_A − 1)/N_A], and ST(B) + LT(B) > ST(A) + LT(A) follows from the expressions for S_A^+ and S_B^+.

Proof of Proposition 2: Let us assume that if both players find an idea using the same approach in a given period, we determine randomly which player submits his or her idea first. This is equivalent to having player 1's expected payoff equal to p²·(α^n + α^(n+1))/2 + p·(1 − p)·α^n if n ideas have been submitted using the approach at the beginning of the period and if the probability of success is p. Recall that p̄_A and p̄_B are respectively in [1/N_A, (N_A − 1)/N_A] and [1/N_B, (N_B − 1)/N_B]. Since the lower (resp. upper) bounds of these intervals are strictly positive (resp. strictly smaller than 1), there exists α small enough such that in period 2 both players play the approach with the smallest number of successes so far (no matter what the expected probabilities are, they cannot counterbalance the effect of the discount rate). In the special case in which the two approaches have the same number of successes, the players choose different approaches (because choosing the same approach leads to much lower payoffs if α is small enough).

Let us first consider the case N_A > N_B. Then, if p̄_A is close enough to p̄_B, n_A^S > n_B^S. Let us show that if α is small enough, then the only SPE is for both players to play (A, B) in period 1, and that this is also socially optimal (if α is small enough, then any term discounted by more than α^(n_B^S) is negligible):

R(1) = (A, B): ST(A, B) + E[(1 − p)²]·ST(A, B | n_B^F + 1) > (1 − p̄)·ST(B, B | n_B^F + 1), where ST(A, B) is the expected instantaneous payoff (divided by α^(n_B^S)) if the players play (A, B), and ST(A, B | n_B^F + 1) is the similar quantity if both players play (A, B) after a failure on B.

This inequality holds because ST(A, B) + E[(1 − p)²]·ST(A, B | n_B^F + 1) > ST(A, B) > ST(A, B | n_B^F + 1) > (1 − p̄)·ST(B, B | n_B^F + 1).

R(2) = (A, B): as above, this holds for α small enough, because (1 − p̄)·ST(A, B | n_B^F + 1) plus the continuation payoff exceeds ST(B, B). Playing (A, B) is socially optimal if α is small enough, because it is more likely that at least one idea will be found using B if both players play (A, B) in period 1. The same argument applies to the case N_A ≤ N_B, in which case n_A^S < n_B^S if p̄_A is close enough to p̄_B.

Proof of Proposition 3: Let us consider the subgame perfect equilibria of the game that consists in choosing which approach to play at t = 1 and t = 2. In period 2, the players each play the approach with the highest expected probability of success. In period 1, each player's expected payoff from playing A or B, given the other player's choice, combines the instantaneous expected payoff with the expected period-2 payoff under the corresponding belief updates; the resulting expressions depend on p̄_A, p̄_B, N_A, N_B, and on the numbers of successes n_A^S and n_B^S already recorded on the two approaches.

Let us consider the following condition:

(a) p̄_B < (n_B^S + 1)/(N_B + 1).

For all p̄_A < p̄_B and N_A such that (a) is satisfied, if N_B is large enough (i.e., the beliefs on bandit B are stable enough), then the following conditions are sufficient for both players to choose A at t = 1 in any SPE of the game, and for this not to be socially optimal:

(b) R(1) = (A, A): given that player 2 plays A, player 1's expected payoff from playing A exceeds his or her expected payoff from playing B;
(c)/(c′) R(2) = (A, A): given that player 1 plays A, player 2's expected payoff from playing A exceeds his or her expected payoff from playing B; this condition takes the form (c) or (c′) depending on whether the updated posterior mean (n^S + 1)/(N + 1) exceeds or falls short of the relevant belief;
(d)/(d′), (e) conditions under which the profile (A, B) (equivalently (B, A)) is socially better than (A, A);
(f) a lower bound on N_A.

The sets {(a), (b), (c), (d′), (f)} and {(a), (b), (c′), (d), (e)} are each sufficient. Let us consider p̄_B such that p̄_B − p̄_A = γ, with γ small enough that the numerators in (d) and (d′) are positive.

Looking at the first set of conditions, (a), (b), (c), (d′), (f): (f) imposes a lower bound on N_A; for γ small enough, (f) and (d′) can be satisfied simultaneously, as can (f) and (a), (b) and (a), (c) and (a), (b) and (d′), and (c) and (d′). So (a), (b), (c), (d′), (f) can be simultaneously satisfied if p̄_A is above a threshold.

Now let us look at the other set of sufficient conditions, (a), (b), (c′), (d), (e): first, (c′) implies (a), so we can ignore (a). Then, for γ small enough, (e) and (c′) can be satisfied simultaneously, as can (e) and (d), (b) and (d), and (b) and (c′).

So we have that (a), (b), (c′), (d), (e) can be satisfied if p̄_A is below a threshold. So we have that for all p̄_A, for all p̄_B close enough to p̄_A, there exist N_low < N_high such that if N_B is large enough, then N_low < N_A < N_high implies that both players choose A at t = 1 in any SPE of the game, but that this is not socially optimal. Finally, note that if N_A < N_high then it is not socially optimal for both players to choose A.

Proof of Proposition 4: At period t, player i receives (1 − γ)·Σ_{j≠i} C_{j,t−1} from period (t − 1), and γ·C_{it} directly from period t. His or her expected discounted payoff from period t on is therefore

E[ Σ_{τ=t}^{+∞} δ^(τ−t)·( γ·C_{iτ} + (1 − γ)·δ·Σ_{j≠i} C_{jτ} ) ],

which is proportional to the moderator's objective function only if γ = 1/2, in which case the participant's revenue from period t is proportional to the group's contribution in period t.

Proof of Proposition 5: Under S(1/2), participant i's payoff in period τ is proportional to

π(i, τ) = C_{iτ} + Σ_{t=1}^{τ} λ(t, τ)·Σ_{j≠i} [ C_{jτ} − C̃_j(i, t) ] = C_{iτ} + Σ_{j≠i} C_{jτ} − Σ_{t=1}^{τ} λ(t, τ)·Σ_{j≠i} C̃_j(i, t),

because Σ_{t=1}^{τ} λ(t, τ) = 1. The last term being a constant, the objectives are aligned. Also, if a proportional sharing rule β·Σ_j C_{jt} is such that participants search in each period, then the participants also search in each period under S(1/2) with the same β (what matters is the difference between the payoff obtained from searching and the payoff obtained from not searching, which is unaffected), and the total payoff distributed is lower.

Chapter 2: Fast Polyhedral Adaptive Conjoint Estimation

Abstract

We propose and test new adaptive question design and estimation algorithms for partial-profile conjoint analysis. Polyhedral question design focuses questions to reduce a feasible set of parameters as rapidly as possible. Analytic center estimation uses a centrality criterion based on consistency with respondents' answers. Both algorithms run with no noticeable delay between questions. We evaluate the proposed methods relative to established benchmarks for question design (random selection, D-efficient designs, Adaptive Conjoint Analysis) and estimation (Hierarchical Bayes). Monte Carlo simulations vary respondent heterogeneity and response errors. For low numbers of questions, polyhedral question design does best (or is tied for best) for all tested domains. For high numbers of questions, efficient fixed designs do better in some domains. Analytic center estimation shows promise for high heterogeneity and for low response errors; Hierarchical Bayes for low heterogeneity and high response errors. Other simulations evaluate hybrid methods, which include self-explicated data. A field test (330 respondents) compared methods on both internal validity (holdout tasks) and external validity (actual choice of a laptop bag worth approximately $100). The field test is consistent with the simulation results and offers strong support for polyhedral question design. In addition, marketplace sales were consistent with conjoint-analysis predictions.

1. Polyhedral Methods for Conjoint Analysis

We propose and test (1) a new adaptive question design method that attempts to reduce respondent burden while simultaneously improving accuracy and (2) a new estimation procedure based on centrality concepts. For each respondent, the question design method dynamically adapts the design of the next question using that respondent's answers to previous questions. Because the methods make full use of high-speed computations and adaptive, customized local web pages, they are ideally suited for web-based panels. The adaptive method interprets question design as a mathematical program and estimates the solution to the program using recent developments based on the interior points of polyhedra. The estimation method also relies on interior point techniques and is designed to provide robust estimates from relatively few questions. The question design and estimation methods are modular and can be evaluated separately and/or combined with a range of existing methods.

Adapting question design within a respondent, using that respondent's answers to previous questions, is a difficult dynamic optimization problem. Adaptation within respondents should be distinguished from techniques that adapt across respondents. Sawtooth Software's Adaptive Conjoint Analysis (ACA) is the only published method of which we are aware that attempts to solve this problem (Johnson 1987, 1991). In contrast, aggregate customization methods, such as the Huber and Zwerina (1996), Arora and Huber (2001), and Sandor and Wedel (2001, 2002) algorithms, adapt designs across respondents based on either pretests or Bayesian priors. ACA uses a data-collection format known as metric paired-comparison questions, and relies on balancing utility between the pairs subject to orthogonality and feature balance. We provide an example of a metric paired-comparison question in Figure 1. To date, aggregate customization methods have focused on a stated-choice data-collection format known as choice-based conjoint (CBC; e.g., Louviere, Hensher, and Swait 2000).
Polyhedral methods can be used to design either metric paired-comparison questions or choice-based questions. In this paper we focus on metric paired-comparison questions because this is one of the most widely used and applied data-collection formats for conjoint analysis (Green, Krieger and Wind 2001, p. S66; Ter Hofstede, Kim, and Wedel 2002). In addition, metric paired-comparison questions are common in computer-aided interviewing, have proven reliable in previous studies (Reibstein,

Bateson, and Boulding 1988; Urban and Katz 1983), provide interval-scaled data with strong transitivity properties (Hauser and Shugan 1980), provide valid and reliable parameter estimates (Leigh, MacKay, and Summers 1984), and enjoy wide use in practice and in the literature (Wittink and Cattin 1989). We are extending polyhedral methods to CBC formats (Toubia, Hauser, and Simester 2003). Our goal is an initial evaluation of polyhedral methods relative to existing methods under a variety of empirically relevant conditions. We do not expect that any one method will always out-perform the benchmarks, nor do we intend that our findings be interpreted as criticism of any of the benchmarks. Our findings indicate that polyhedral methods have the potential to enhance the effectiveness of existing conjoint methods by providing new capabilities that complement existing methods.

Figure 1: Metric Paired-Comparison Format for I-Zone Camera Redesign

Because the methods are new and adopt a different estimation philosophy, we use Monte Carlo experiments to explore their properties. The Monte Carlo experiments explore the conditions under which polyhedral methods are likely to do better or worse than extant methods. We demonstrate practical domains where polyhedral methods show promise relative to a representative set of widely applied and studied methods. The findings also highlight opportunities for future research by illustrating domains where improvements are necessary and/or where extant methods are likely to remain superior.

We also undertake a large-scale empirical test involving a real product, a laptop computer bag worth approximately $100. Respondents first completed a series of web-based conjoint questions chosen by one of three question design methods (the methods were assigned randomly). After a filler task, respondents in the study were given $100 to spend on a choice set of five bags. Respondents received their chosen bag together with the difference in cash between the price of their chosen bag and the $100. We compare question design and estimation methods on both internal and external validity. Internal validity is evaluated by comparing how well the methods predict several holdout conjoint questions. External validity is evaluated by comparing how well the different conjoint methods predict which bag respondents later chose to purchase using their $100. The paper is structured as follows. We begin by describing polyhedral question design and analytic center estimation for metric paired-comparison tasks. Detailed mathematics are provided in the appendix, and open-source code is available from http://mitsloan.mit.edu/vc. We next describe the design and results of the Monte Carlo experiments. Finally, we describe the field test and the comparative results. We close with a description of the launch of the laptop bag, a summary of the findings, and suggestions for future research.

2. Polyhedral Question Design and Estimation

We begin with a conceptual description that highlights the geometry of the conjoint-analysis parameter space. We illustrate the concepts with a 3-parameter problem because 3-dimensional spaces are easy to visualize and explain. The methods generalize easily to realistic problems that contain ten, twenty, or even one hundred product features. Indeed, relative to existing methods, the polyhedral methods are proposed for larger numbers of product features. By a parameter, we refer to a partworth that needs to be estimated.
For example, twenty features with two levels each require twenty parameters because we can scale to zero the partworth of the least preferred level. Similarly, ten three-level features also require twenty parameters. Interactions among features require still more parameters. Suppose that we have three features of an instant camera: picture quality, picture taking (1-step vs. 2-step), and styling covers (changeable vs. permanent). If we scale the least desirable level of each feature to zero, we have three non-negative parameters to estimate, u1, u2, and u3, reflecting the additional utility (partworth) associated with the most desirable level of

each feature. [9] The measurement scale on which the questions are asked imposes natural boundary conditions. For example, the sum of the partworths of Camera 1 minus the sum of the partworths of Camera 2 can be at most equal to the maximum scale difference. In practice, the partworths only have relative meaning, and so scaling allows us to impose a wide range of boundary conditions without loss of generality. Therefore, in order to better visualize the algorithm, we impose a constraint that the sum of the parameters does not exceed some large number (e.g., 100). Under this constraint, prior to any data collection, the feasible region for the parameters is the 3-dimensional bounded polyhedron in Figure 2a. Suppose that we ask the respondent to evaluate a pair of profiles that vary on one or more features, and the respondent says (1) that he or she prefers profile C1 to profile C2 and (2) provides a rating, a, to indicate the strength of his or her preference. Assuming for the moment that the respondent answers without error, this introduces an equality constraint that the utility associated with profile C1 exceeds the utility of C2 by an amount equal to the rating. If we define u = (u1, u2, u3)^T as the 3x1 vector of parameters, z_l as the 3x1 vector of product features for the left profile, and z_r as the 3x1 vector of product features for the right profile, then, for additive utility, this equality constraint can be written as z_l^T u − z_r^T u = a. We can use geometry to characterize what we have learned from this response. Specifically, we define x = z_l − z_r, such that x is a 3x1 vector describing the difference between the two profiles in the question. Then, x^T u = a defines a hyperplane through the polyhedron in Figure 2a. The only feasible values of u are those that are in the intersection of this hyperplane and the polyhedron. The new feasible set is also a polyhedron, but it is reduced by one dimension (2 dimensions rather than 3 dimensions).
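The geometry described above, two profile vectors, their difference x = z_l − z_r, and the answer constraint x^T u = a, can be sketched numerically. The profiles, rating, and candidate partworths below are invented for illustration.

```python
import numpy as np

# Three partworths, constrained to the bounded polyhedron
# u >= 0, u1 + u2 + u3 <= 100 (the visualization constraint in the text).
def in_initial_polyhedron(u):
    return bool(np.all(u >= 0) and u.sum() <= 100)

# Hypothetical binary profiles: 1 = most desirable level, 0 = least.
z_left  = np.array([1, 0, 1])     # profile C1
z_right = np.array([0, 1, 1])     # profile C2
x = z_left - z_right              # question vector

# An errorless metric answer a adds the equality constraint x . u = a,
# i.e., a hyperplane slicing the polyhedron.
a = 30.0
def satisfies_answer(u):
    return bool(np.isclose(x @ u, a))

u_candidate = np.array([40.0, 10.0, 20.0])   # 40 - 10 = 30, so it lies on the slice
```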
ecause smaller olyhedra mean fewer arameter values are feasible, questions that reduce the size of the initial olyhedron as fast as ossible lead to more recise estimates of the arameters. 9 In this examle, we assume referential indeendence which imlies an additive utility function. We can handle interactions by relabeling features. For examle, a x interaction between two features is equivalent to one four-level feature. We hold to this convention throughout the aer. 50
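The feasibility logic above is easy to sketch in code. The snippet below is an illustrative toy, not the thesis implementation (the function names are ours): it checks whether a candidate partworth vector is consistent with the boundary constraints and with each question-answer pair, where a tolerance `delta = 0` enforces the exact hyperplane constraint x'u = a, and `delta > 0` anticipates the error band discussed below.

```python
# Sketch (not the thesis code): the feasible set of partworths is
# u >= 0, sum(u) <= 100, plus one constraint per answered metric
# paired-comparison question. All names here are illustrative.

def utility_difference(x, u):
    """Compute x'u for question vector x and partworth vector u."""
    return sum(xi * ui for xi, ui in zip(x, u))

def is_feasible(u, questions, answers, delta=0.0, bound=100.0):
    """True if partworths u are consistent with all question-answer pairs.

    With delta = 0 each answer imposes the equality constraint x'u = a;
    with delta > 0 it imposes the pair of inequalities a - d <= x'u <= a + d.
    """
    if any(ui < 0 for ui in u) or sum(u) > bound:
        return False
    return all(abs(utility_difference(x, u) - a) <= delta
               for x, a in zip(questions, answers))

# Three-parameter camera example: one question comparing profiles that
# differ on features 1 and 2 (x = z_left - z_right), answered with a = 30.
questions = [[1, -1, 0]]
answers = [30]

print(is_feasible([50, 20, 10], questions, answers))            # u1 - u2 = 30: on the hyperplane -> True
print(is_feasible([50, 30, 10], questions, answers))            # u1 - u2 = 20: off the hyperplane -> False
print(is_feasible([50, 30, 10], questions, answers, delta=10))  # within the error band -> True
```

Each additional question-answer pair appends one more constraint, shrinking the set of vectors for which `is_feasible` returns `True`.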

Figure 2: Respondent's Answers Affect the Feasible Region. (a) Metric rating without error; (b) Metric rating with error.

However, in any real problem we expect a respondent's answer to contain error. We can model this error as a probability density function over the parameter space (as in standard statistical inference). Alternatively, we can incorporate imprecision in a response by treating the equality constraint x1ᵀu = a1 as a set of two inequality constraints: a1 − δ ≤ x1ᵀu ≤ a1 + δ. In this case, the hyperplane defined by the question-answer pair has a width determined by δ. The intersection of the initial polyhedron and the fat hyperplane is now a three-dimensional polyhedron, as illustrated in Figure 2b.

When we ask more questions we constrain the parameter space further. Each question, if asked carefully, will result in a hyperplane that intersects a polyhedron, resulting in a smaller polyhedron: a thin region in Figure 2a or a fat region in Figure 2b. Each new question-answer pair slices the polyhedron in Figure 2a or 2b, yielding more precise estimates of the parameter vector u.

We incorporate prior information about the parameters by imposing constraints on the parameter space. For example, if u_m and u_h are the medium and high levels, respectively, of a feature, then we impose the constraint u_m ≤ u_h on the polyhedron. Previous research suggests that these types of constraints enhance estimation (Johnson 1999; Srinivasan and Shocker 1973). We

now examine question design for metric paired-comparison data by dealing first with the case in which subjects respond without error (Figure 2a). We then describe how to modify the algorithm to handle error (e.g., Figure 2b).

Selecting Questions to Shrink the Feasible Set Rapidly

The question design task describes the design of the profiles that respondents are asked to compare. Questions are more informative if the answers allow us to estimate partworths more quickly. For this reason, we select the respondent's next question in a manner that is likely to reduce the size of the feasible set (for that respondent) as fast as possible.

Consider for a moment a 20-dimensional problem (without errors in the answers). As in Figure 2a, a question-based constraint reduces the dimensionality by one. That is, the first question reduces a 20-dimensional set to a 19-dimensional set; the next question reduces this set to an 18-dimensional set, and so on. After the twelfth question, for example, we reach an 8-dimensional set: 8 dimensions = 20 parameters − 12 questions. Without further restriction, the feasible parameters are generally not unique: any point in the 8-dimensional set (polyhedron) is still feasible. However, the 8-dimensional set might be quite small and we might have a very good idea of the partworths. For example, the first twelve questions might be enough to tell us that some features, say picture quality, styling covers, and battery life, have large partworths, while other features, say folding capability, light selection, and film ejection method, have very small partworths. If this holds across respondents then, during an early phase of a product development process, the product development team might feel they have enough information to focus on the key features.

Although the polyhedral algorithm is designed for high-dimensional spaces, it is hard to visualize 20-dimensional polyhedra. Instead, we illustrate the polyhedral question design method in a situation where the remaining feasible set is easy to visualize. Specifically, by generalizing our notation slightly to q questions and p parameters, we define a as the q×1 vector of answers and X as the q×p matrix with rows equal to xᵢᵀ for each question (recall that xᵢ is a p×1 vector). Then the respondent's answers to the first q questions define a (p−q)-dimensional hyperplane given by the equation Xu = a. This hyperplane intersects the initial p-dimensional polyhedron to give us a (p−q)-dimensional polyhedron. In the example of p = 20 parameters and q = 18

questions, the result is a 2-dimensional polyhedron that is easy to visualize. One such 2-dimensional polyhedron is illustrated in Figure 3.

Figure 3: Choice of Question (2-dimensional slice). (a) Two question-answer pairs; (b) One question, two potential answers.

Our task is to select questions that reduce the 2-dimensional polyhedron as fast as possible. Mathematically, we select a new question vector, x, and the respondent answers this question with a new rating, a. We add the new question vector as the last row of the question matrix and we add the new answer as the last row of the answer vector. While everything is really happening in p-dimensional space, the net result is that the new hyperplane will intersect the 2-dimensional polyhedron in a line segment (i.e., a 1-dimensional polyhedron). The slope of the line will be determined by x and the intercept by a. We illustrate two potential question-answer pairs in Figure 3a. The slope of the line is determined by the question, the specific line by the answer, and the remaining feasible set by the line segment within the polyhedron. In Figure 3a one of the question-answer pairs, (x2, a2), reduces the feasible set more rapidly than the other question-answer pair, (x1, a1). Figure 3b repeats the question-answer pair (x2, a2) and illustrates an alternative answer to the same question, (x2, a2′).
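A toy calculation illustrates why the orientation of the cut matters. Here we assume an elongated rectangular feasible set as a stand-in for the polyhedron of Figure 3 (the rectangle, its dimensions, and the function names are ours, purely for illustration): a cut induced by a question vector parallel to the long axis leaves a much shorter feasible segment than a cut induced by a vector parallel to the short axis.

```python
import math

# Toy illustration: the feasible set is an elongated rectangle
# [0, 10] x [0, 2]; a question vector x induces the cut x'u = a,
# a line perpendicular to x. Questions parallel to the long axis
# leave shorter (smaller) remaining feasible segments.

def segment_length(x, a):
    """Approximate length of {u in the rectangle : x'u = a}."""
    d = (-x[1], x[0])                      # direction of the cut line
    norm = math.hypot(d[0], d[1])
    d = (d[0] / norm, d[1] / norm)
    nx2 = x[0] ** 2 + x[1] ** 2
    u0 = (a * x[0] / nx2, a * x[1] / nx2)  # one point on the line
    step = 0.001
    inside = 0
    t = -20.0
    while t <= 20.0:                       # walk along the line, count points in the rectangle
        u = (u0[0] + t * d[0], u0[1] + t * d[1])
        if 0 <= u[0] <= 10 and 0 <= u[1] <= 2:
            inside += 1
        t += step
    return inside * step

long_axis_cut = segment_length((1, 0), 5)   # x parallel to the long axis
short_axis_cut = segment_length((0, 1), 1)  # x parallel to the short axis
print(round(long_axis_cut, 1), round(short_axis_cut, 1))   # 2.0 10.0
```

The long-axis question leaves a segment of length 2; the short-axis question leaves a segment of length 10, so the former pins down the partworths far faster.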

If the polyhedron is elongated as in Figure 3 then, in most cases, questions that imply line segments perpendicular to the longest axis of the polyhedron are questions that result in the smallest remaining feasible sets. Also, because the longest axis is in some sense a bigger target, it is more likely that the respondent's answer will select a hyperplane that intersects the polyhedron. From analytic geometry we know that hyperplanes (line segments in Figure 3) are perpendicular to their defining vectors (x). Thus, we can reduce the feasible set as fast as possible (and make it more likely that answers are feasible) if we choose question vectors that are parallel to the longest axis. For example, both line segments based on x2 in Figure 3b are shorter than the line segment based on x1 in Figure 3a.

If we can develop an algorithm that works in any p-dimensional space, then we can generalize this intuition to any question, q, such that q ≤ p. After receiving answers to the first q questions, we could find the longest axis of the (p−q)-dimensional polyhedron of feasible parameter values. We could then ask the question based on a vector that is parallel to this axis. The respondent's answer creates a hyperplane that intersects the polyhedron to produce a new polyhedron. We address later the cases where respondents' answers contain error and where q > p.

Centrality Estimation

Polyhedral geometry also gives us a means to estimate the parameter vector, u, when q ≤ p. Recall that, after question q, any point in the remaining polyhedron is consistent with the answers the respondent has provided. If we impose a diffuse prior that any feasible point is equally likely, then we would like to select the point that minimizes the expected error. This point is the center of the feasible polyhedron or, more precisely, the polyhedron's center of gravity. The smaller the feasible set, either due to better question design or more questions (higher q), the more precise the estimate. If there were no respondent errors, then the estimate would converge to its true value when q = p (the feasible set becomes a single point, with zero dimensionality). For q > p the same point would remain feasible. As we discuss below, this changes when responses contain error.

This technique of estimating partworths from the center of a feasible polyhedron is related to that proposed by Srinivasan and Shocker (1973, p. 350), who suggest using a linear program to find the innermost point that maximizes the minimum distance from the hyperplanes that bound the feasible set. Philosophically, the proposed polyhedral method makes maximum

use of the information in the constraints and then takes a central estimate based on what is still feasible. Carefully chosen questions shrink the feasible set rapidly. We then use a centrality estimate that has proven to be a surprisingly good approximation in a variety of engineering problems. More generally, the centrality estimate is similar in some respects to the proven robustness of linear models and, in some cases, to the robustness of equally-weighted models (Dawes and Corrigan 1974; Einhorn 1971; Huber 1975; Moore and Semenik 1988; Srinivasan and Park 1997).

Interior-Point Algorithms and the Analytic Center of a Polyhedron

To select questions and obtain intermediate estimates, the proposed heuristics require that we solve two non-trivial mathematical programs. First, we must find the longest axis of a polyhedron (to select the next question) and, second, we must find the polyhedron's center of gravity (to provide a centrality estimate). If we were to define the longest axis of a polyhedron as the longest line segment in the polyhedron, then one method to find the longest axis would be to enumerate the vertices of the polyhedron and compute the distances between the vertices. However, solving this problem requires checking every extreme point, which is computationally intractable (Gritzmann and Klee 1993). In practice, solving the problem would impose noticeable delays between questions. Also, the longest line segment in a polyhedron may not capture the concept of a longest axis. Finding the center of gravity of the polyhedron is even more difficult and computationally demanding.

Fortunately, recent work in the mathematical programming literature has led to extremely fast algorithms based on projections within the interior of polyhedrons (much of this work started with Karmarkar 1984). Interior-point algorithms are now used routinely to solve large problems and have spawned many theoretical and applied generalizations. One such generalization uses bounding ellipsoids. In 1985, Sonnevend demonstrated that the shape of a bounded polyhedron can be approximated by proportional ellipsoids centered at the analytic center of the polyhedron. The analytic center is the point in the polyhedron that maximizes the geometric mean of the distances to the boundaries of the polyhedron. It is a central point that approximates the center of gravity of the polyhedron, and it finds practical use in engineering and optimization. Furthermore, the axes of the ellipsoids are well-defined and intuitively capture the concept of an

axis of a polyhedron. For more details see Freund (1993), Nesterov and Nemirovskii (1994), Sonnevend (1985a, 1985b), and Vaidya (1989).

Polyhedral Question Design and Analytic Center Estimation

We illustrate the proposed process in Figure 4, using the same two-dimensional polyhedron depicted in Figure 3. The algorithm proceeds in four steps. We first find a point in the interior of the polyhedron. This is a simple linear programming (LP) problem and runs quickly. Then, following Freund (1993), we use Newton's method to make the point more central. This is a well-formed problem and converges quickly to yield the analytic center, as illustrated by the black dot in Figure 4. We next find a bounding ellipsoid based on a formula that depends on the analytic center and the question matrix, X. We then find the longest axis of the ellipsoid (the diagonal line in Figure 4) with a quadratic program that has a closed-form solution. The next question, x, is based on the vector most nearly parallel to this axis. A formal (mathematical) description of each step is provided in Appendix A.

Analytically, this algorithm works well in higher-dimensional spaces. For example, Figure 5 illustrates the algorithm when (p − q) = 3, where we reduce a 3-dimensional feasible set to a 2-dimensional feasible set. Figure 5a illustrates a polyhedron based on the first q questions. Figure 5b illustrates a bounding 3-dimensional ellipsoid, the longest axis of that ellipsoid, and the analytic center. The longest axis defines the question that is asked next which, in turn, defines the slope of the hyperplane that intersects the polyhedron. One such hyperplane is shown in Figure 5c. The respondent's answer locates the specific hyperplane. The intersection of the selected hyperplane and the 3-dimensional polyhedron is a new 2-dimensional polyhedron, such as that in Figure 4. This process applies (in higher dimensions) from the first question to the p-th question. For example, the first question implies a hyperplane that cuts the first p-dimensional polyhedron such that the intersection yields a (p − 1)-dimensional polyhedron.

The polyhedral algorithm runs extremely fast. We have implemented the algorithm for the web-based empirical test described later in this paper. Based on this example, with ten two-level features, respondents noticed no delay in question design nor any difference in speed versus a fixed design. For a demonstration see the website referenced earlier.
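The analytic center that anchors these steps can be illustrated on the starting polyhedron u1 ≥ 0, u2 ≥ 0, u1 + u2 ≤ 100. The sketch below uses plain gradient ascent on the sum of log-distances to the boundaries as a stand-in for the LP-plus-Newton procedure described above; it is illustrative only, not the thesis code, and the step size and iteration count are our own choices.

```python
# Minimal sketch of analytic-center computation for the polyhedron
# u1 >= 0, u2 >= 0, u1 + u2 <= 100. The analytic center maximizes
# log(u1) + log(u2) + log(100 - u1 - u2), the (log of the) geometric
# mean of the distances to the boundaries up to constants.

def analytic_center(steps=20000, lr=0.5):
    u1, u2 = 10.0, 60.0              # any interior starting point
    for _ in range(steps):
        slack = 100.0 - u1 - u2      # slack in the constraint u1 + u2 <= 100
        g1 = 1.0 / u1 - 1.0 / slack  # gradient of the log-barrier objective
        g2 = 1.0 / u2 - 1.0 / slack
        u1 += lr * g1
        u2 += lr * g2
    return u1, u2

u1, u2 = analytic_center()
print(round(u1, 1), round(u2, 1))    # both close to 100/3, the symmetric center
```

By symmetry the maximizer is u1 = u2 = 100/3, which is also (for this triangle) close to the center of gravity, illustrating why the analytic center is a good practical surrogate for it.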

Figure 4: Bounding Ellipsoid and the Analytic Center (2 dimensions).

Inconsistent Responses and Error-Modeling

Figures 2, 3, 4, and 5 illustrate the geometry when respondents answer without error. However, real respondents are unlikely to be perfectly consistent. It is more likely that, for some q < p, the respondent's answers will be inconsistent and the polyhedron will become empty. That is, we will no longer be able to find any parameters, u, that satisfy the equations that define the polyhedron, Xu = a. Thus, for real applications, we extend the polyhedral algorithm to address response errors. Specifically, we adjust the polyhedron in a minimal way to ensure that some parameter values are still feasible. We do this by modeling errors, δ, in the respondent's answers such that a − δ ≤ Xu ≤ a + δ (recall Figure 2b). We then choose the minimum errors such that these constraints are satisfied. This same modification covers estimation for the case of q > p. Appendix A provides the mathematical program (OPT4) that we use to estimate u and δ. The algorithm is easily modified to incorporate alternative error formulations, such as least-

squares or minimum sum of absolute deviations, rather than this minimax criterion.¹⁰ Exploratory simulations suggest that the algorithm is robust to the choice of error criterion.

Figure 5: Question Design with a 3-Dimensional Polyhedron. (a) Polyhedron in 3 dimensions; (b) Bounding ellipsoid, analytic center, and longest axis; (c) Example hyperplane determined by question vector and respondent's answer.

To implement this policy for analytic center estimation, we use a two-stage algorithm. In the first stage we treat the responses as if they occurred without error: the feasible polyhedron shrinks rapidly and the analytic center is a working estimate of the true parameters. However, as soon as the feasible set becomes empty, we adjust the constraints by adding or subtracting errors, where we choose the minimum errors, δ, for which the feasible set is non-empty. The analytic center of the new polyhedron becomes the working estimate and δ becomes an index of response error.¹¹ As with all of our heuristics, the accuracy of our error-modeling method is tested with simulation. While estimates based on this heuristic seem to converge for the domains that we test (reduced mean errors, no measured bias), we recognize the need for further exploration of this heuristic, together with the development of a formal error theory.

Addressing Other Practical Implementation Issues

Implementation raises several additional issues. Alternative solutions to these issues may yield more or less accurate parameter estimates, and so the performance of the polyhedral methods in the validation tasks is a lower bound on the performance of this class of methods.

Product profiles with discrete features. In most conjoint analysis problems the features are specified at discrete levels, as in Figure 1. This constrains the elements of the x vector to be 1, −1, 0, or 0, depending on whether the left profile, the right profile, neither profile, or both profiles have the high feature, respectively. In this case we choose the vector that is most nearly parallel to the longest axis of the ellipsoid. Because we can always recode multi-level features or interacting features as binary features, the geometric insights still hold even if we otherwise simplify the algorithm.

Restrictions on question design. For a p-dimensional problem we may wish to vary fewer than p features in any paired-comparison question. For example, Sawtooth Software (1996, p. 7) suggests that: "Most respondents can handle three attributes after they've become familiar with the task. Experience tells us that there does not seem to be much benefit from using more than three attributes." We incorporate this constraint by restricting the set of questions over which we search when finding a question vector that is parallel to the longest axis of the ellipsoid.

¹⁰ Technically, the minimax criterion is called the ∞-norm. To handle least-squares errors we use the 2-norm, and to handle average absolute errors we use the 1-norm. Either is a simple modification to OPT4 in Appendix A.
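The minimal-error adjustment described above can be illustrated with a one-parameter toy. In the sketch below, interval arithmetic and bisection stand in for the mathematical program (OPT4) used in the thesis, and all names are our own: the goal is the smallest δ for which the band constraints a − δ ≤ xu ≤ a + δ admit some feasible u.

```python
# Sketch of the minimal-error adjustment: find the smallest delta for
# which some u satisfies a_i - delta <= x_i * u <= a_i + delta for every
# answered question. One-parameter toy with x_i > 0 assumed, so each
# constraint is an interval for u and feasibility is interval intersection.

def min_delta(xs, answers, lo_bound=0.0, hi_bound=100.0):
    def feasible(delta):
        lo, hi = lo_bound, hi_bound
        for x, a in zip(xs, answers):
            lo = max(lo, (a - delta) / x)
            hi = min(hi, (a + delta) / x)
        return lo <= hi

    lo, hi = 0.0, 1000.0
    for _ in range(60):                # bisection on delta
        mid = (lo + hi) / 2
        if feasible(mid):
            hi = mid
        else:
            lo = mid
    return hi

# Answers 30 and 40 to the same one-feature question are inconsistent
# (no single u matches both exactly); delta = 5 restores feasibility at u = 35.
print(round(min_delta([1.0, 1.0], [30.0, 40.0]), 6))   # 5.0
```

As in the two-stage algorithm, δ = 0 as long as the answers are mutually consistent, and δ grows only when the exact constraints have emptied the feasible set.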
First question. Unless we have prior information before any question is asked, the initial polyhedron of feasible utilities is defined by the boundary constraints. If the boundary constraints are symmetric, the polyhedron is also symmetric and the polyhedral method offers

¹¹ By construction, δ grows (weakly) with the number of questions, q; thus a better measure of fit might be mean error, the error divided by the number of questions, an analogy to mean-squared error in regression.

little guidance for the choice of the first question. In these situations we choose the first question for each respondent so that it helps improve estimates of the population means by balancing how often each feature level appears in the set of questions answered by all respondents. In particular, for the first question presented to each respondent we choose feature levels that appeared infrequently in the questions answered by previous respondents.

Question design when the parameter set becomes infeasible. Analytic center estimation is well-defined when the parameter set becomes infeasible, but question design is not. Thus, in the simulations we use a random question design heuristic when the parameter set is infeasible.¹² This provides a lower bound on what might be achieved.

Programming. The optimization algorithms used for the simulations are written in Matlab and are available at the website cited earlier. We also provide the simulation code and demonstrations of web-based applications. All code is open-source.

3. Monte Carlo Simulations

Polyhedral methods for conjoint analysis are new and untested. Although interior-point algorithms and the centrality criterion have been successful in many engineering problems, we are unaware of any prior application to marketing problems. Thus, we turn first to Monte Carlo experiments to identify circumstances in which polyhedral methods may contribute to the effectiveness of current methods.

Monte Carlo simulations offer at least three advantages for the initial test of a new method. First, they facilitate comparison of different techniques in a range of domains, such as varying levels of respondent heterogeneity and response accuracy. We can also evaluate combinations of the techniques, for example, mixing polyhedral question design with extant estimation methods. Second, simulations resolve the issue of identifying the correct answer. In studies involving actual customers, the true partial utilities are unobserved. In simulations the true partial utilities are constructed so that we can compare how well alternative methods identify the true utilities from noisy responses. Finally, other researchers can readily replicate the findings. However, simulations do not guarantee that real respondents behave as simulated, nor do they reveal which domain is likely to best summarize field experience. Thus, following the simulations, we examine a field test that matches one of the simulated domains.

Many papers have used the relative strengths of Monte Carlo experiments to study conjoint techniques, providing insights on interactions, robustness, continuity, feature correlation, segmentation, new estimation methods, new data-collection methods, post-analysis with Hierarchical Bayes methods, and comparisons of ACA, CBC, and other conjoint methods. Although we focus on specific benchmarks, there are many comparisons in the literature of these benchmarks to other methods (see reviews and citations in Green 1984; Green, Krieger, and Wind 2001, 2002; Green and Srinivasan 1978, 1990; Hauser and Rao 2003; Moore 2003).

We test polyhedral question design versus three question design benchmarks, and analytic center estimation versus Hierarchical Bayes estimation. The initial simulations vary respondent heterogeneity, accuracy of respondent answers, and the number of questions. In a second set of simulations we also consider the role of self-explicated responses and vary the accuracy of self-explicated responses.

Respondent Heterogeneity, Response Errors, and Number of Questions

We focus on a design problem involving ten features, where a product development team is interested in learning the incremental utility contributed by each feature. We follow convention and scale to zero the partworth of the low level of a feature and, without loss of generality, bound it by 100. This results in a total of ten parameters to estimate (p = 10). We anticipate that the polyhedral methods are particularly well-suited to solving problems in which there are a large number of parameters relative to the number of responses from each individual (q < p). Thus, we vary the number of questions from slightly less than the number of parameters (q = 8) to comfortably more than the number of parameters (q = 16).

¹² Earlier implementations, including the field test, used ACA question design when the parameter set became infeasible. Further analysis revealed that it is better to switch to random question design than ACA question design when the parameter set becomes infeasible. Fortunately, this makes the performance of polyhedral question design in the field test conservative.
We simulate each respondent's partworths by drawing independently and randomly from a normal distribution with mean 50 and variance σ_u², truncated to the range [0, 100]. We explored the sensitivity of the findings to this specification by testing different methods of drawing partworths, including beta distributions that tend to yield more similar partworths (inverted-U-shaped distributions), more diverse partworths (U-shaped distributions), or moderately diverse partworths (uniform distributions). Sensitivity analyses for key findings did not suggest much variation. Nonetheless, this is an important area for more systematic future research. By manipulating the standard deviation of the normal distribution we explore a relatively homogeneous population (σ_u = 10) and a relatively heterogeneous population (σ_u = 30). These values were chosen because they are comparable to those used elsewhere in the literature, because they result in moderate-to-low truncation, and because their range illustrates how the accuracy of the methods varies with heterogeneity.

To simulate the response to each metric paired-comparison (PC) question, we calculate the true utility difference between each pair of product profiles by multiplying the design vector by the vector of true partworths: xᵀu. We assume that the respondents' answers to the questions equal the true utility difference plus a zero-mean normal response error with variance σ_c². The assumption of normally distributed error is common in the literature and appears to be a reasonable assumption about PC response errors (Wittink and Cattin 1981 report no systematic effects due to the type of error distribution assumed). We select response errors, σ_c, comparable to those used in the literature. Specifically, to illustrate the range of response errors we use both a low response error (σ_c = 10) and a high response error (σ_c = 40).¹³ For each comparison, we simulate 500 respondents (in five sets of 100).

Question Design Benchmarks

We compare the Polyhedral question design method against three benchmarks: Random question design, a Fixed design, and the question design used by Adaptive Conjoint Analysis (ACA).¹⁴ For the Random benchmark, the feature levels are chosen randomly and equally likely. The Fixed design provides another non-adaptive benchmark. For q > p, we select the (q = 16) design with an algorithm that seeks the highest obtainable D-efficiency (Kuhfeld, Tobias, and Garratt 1994). Efficiency is not defined for q < p; thus, for q = 8, we follow the procedure established by Lenk et al. (1996) and choose questions randomly from an efficient design for q = 16.
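The simulation setup just described can be sketched as follows. This is an illustrative reconstruction, not the thesis Matlab code (function and parameter names are ours): partworths are drawn from a normal distribution truncated to [0, 100], and each metric paired-comparison answer equals the true utility difference x'u plus zero-mean normal response error.

```python
import random

# Sketch of the simulation setup: p = 10 partworths per respondent drawn
# from N(50, sd^2) truncated to [0, 100], and metric paired-comparison
# answers equal to the true utility difference x'u plus normal error.

def draw_partworths(p=10, mean=50.0, sd=10.0, rng=random):
    u = []
    while len(u) < p:
        draw = rng.gauss(mean, sd)
        if 0.0 <= draw <= 100.0:          # truncate by redrawing
            u.append(draw)
    return u

def simulate_answer(x, u, error_sd, rng=random):
    true_diff = sum(xi * ui for xi, ui in zip(x, u))
    return true_diff + rng.gauss(0.0, error_sd)

rng = random.Random(0)
u = draw_partworths(rng=rng)
x = [1, -1, 1, 0, 0, 0, 0, 0, 0, 0]       # profiles differ on three features
print(len(u), all(0 <= ui <= 100 for ui in u))        # 10 True
print(isinstance(simulate_answer(x, u, error_sd=10.0, rng=rng), float))  # True
```

Truncation by redrawing is one simple way to realize the truncated normal; the thesis does not specify the truncation mechanism, so this detail is an assumption.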
We choose ACA question design as our third benchmark because it is the industry and academic standard for within-respondent adaptive question design. For example, in 1991 Green,

¹³ Response errors in the literature, often reported as a ratio of error variance to true variance (heterogeneity), vary considerably. In our case, the respondent's answer, a, is the difference between the sums of the u_f's. Thus the variance of a is a multiple of σ_u². For our situation, the percent errors vary from 8% to 57%.

¹⁴ Beginning in this section we capitalize the question design and estimation methods for easy reference. We retain lower case for generic descriptions.

Krieger and Agarwal (p. 215) stated that "in the short span of five years, Sawtooth Software's Adaptive Conjoint Analysis has become one of the industry's most popular software packages for collecting and analyzing conjoint data," and they go on to cite a number of academic papers on ACA. Although accuracy claims vary, ACA appears to predict reasonably well in many situations (Johnson 1991; Orme 1999).

The ACA method includes five sections: an unacceptability task (that is often skipped), a ranking of the features, a series of self-explicated (SE) questions, the metric paired-comparison (PC) questions, and purchase intentions for calibration concepts. The question design procedure has not changed since it was originally programmed for the Apple II computer in the late 1970s (Orme and King 2002). It adapts the PC questions based on intermediate estimates (after each question) of the partworths. These intermediate estimates are based on an OLS regression using the SE and PC responses and ensure that the pairs of profiles are nearly equal in estimated utility (utility balance). Additional constraints restrict the overall design to be nearly orthogonal (features and levels are presented independently) and balanced (features and levels appear with near equal frequency).

To avoid handicapping the ACA question design in the initial simulations, we simulate the SE responses without adding error. In particular, Sawtooth Software asks for SE responses using a 4-point scale, in which the respondent states the relative importance of improving the product from one feature level to another (e.g., adding automatic film ejection to an instant camera). We set the SE responses equal to the true partworths, but discretize the answer to match the ACA scale. Our code was written using Sawtooth Software's documentation together with interactions with the company's representatives. We then confirmed the accuracy of the code by asking Sawtooth Software to re-estimate partworths for a small sample of data.

Estimation Benchmark

The two estimation methods are the Analytic Center (AC) method described earlier and Hierarchical Bayes (HB) estimation. Hierarchical Bayes estimation uses data from the population to inform the distribution of partworths across respondents and, in doing so, estimates the posterior mean of respondent-level partworths with an algorithm based on Gibbs sampling and the Metropolis-Hastings algorithm (Allenby and Rossi 1999; Arora, Allenby and Ginter 1998; Johnson 1999; Lenk et al. 1996; Liechty, Ramaswamy and Cohen 2001; Sawtooth Software 2001; Yang, Allenby and Fennell 2002). For ACA question design, Sawtooth Software recommends HB as their most accurate estimation method (Sawtooth Software 2001). For this initial comparison, for all question design methods, we use data from the SEs as starting values and we use the SEs to constrain the rank-order of the levels for each feature (Sawtooth Software 2001, p. 3).¹⁵

Criterion

To compare the performance of each benchmark we calculate the mean absolute accuracy of the parameter estimates (true vs. estimated values, averaged across parameters and respondents). We chose to report mean absolute error (MAE) rather than root mean squared error (RMSE) because the former is less sensitive to outliers and is more robust over a variety of induced error distributions (Hoaglin, Mosteller and Tukey 1983; Tukey 1960). However, as a practical matter, the qualitative implications of our simulations are the same for both error measures. Indeed, except for a scale change, the results are almost identical for both MAE and RMSE. This is not surprising; for normal distributions the two measures differ only by a factor of (2/π)^(1/2). The results are based on the average of five simulations, each with 100 respondents. To reduce unnecessary variance among question design methods, we first draw the partworths and then use the same partworths to evaluate each question design method. The use of multiple draws makes the results less sensitive to spurious effects from a single draw.

4. Results of the Initial Monte Carlo Experiments

We begin with the results obtained from using eight (q = 8) paired-comparison questions. This is the type of domain for which Polyhedral question design and Analytic Center estimation were developed (more parameters to estimate than there are questions). Moreover, within this domain there are generally a range of partworths that are feasible, and so the polyhedron is not empty.
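The (2/π)^(1/2) relation between MAE and RMSE for normal errors noted above, E|X| = σ·sqrt(2/π) for X ~ N(0, σ²), is easy to verify by simulation. This check is illustrative only and is not part of the thesis code:

```python
import math
import random

# For normally distributed errors, RMSE / MAE should approach
# sqrt(pi/2) ~ 1.25, since E|X| = sigma * sqrt(2/pi) for X ~ N(0, sigma^2).

rng = random.Random(42)
errors = [rng.gauss(0.0, 10.0) for _ in range(200000)]

mae = sum(abs(e) for e in errors) / len(errors)
rmse = math.sqrt(sum(e * e for e in errors) / len(errors))

print(round(rmse / mae, 2))     # close to sqrt(pi/2) ~ 1.25
```

This is why, except for scale, MAE and RMSE lead to the same qualitative conclusions under (approximately) normal error distributions.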
In our simulations the polyhedron contains feasible answers for an average of 7.97 questions when response errors are low. When response errors are high, this average drops.

¹⁵ Another version of Sawtooth Software's HB algorithm also uses the SEs to constrain the relative partworths across features. We test this version in our next set of simulations. This enables us to isolate the impact of the paired-comparison question design algorithm.

Table 1 reports the MAE in the estimated partworths for a complete crossing of question design methods, estimation methods, response error, and heterogeneity. The best results (lowest error) in each column are indicated by bold text. In Table 2 we reorganize the data to indicate the directional impact of either heterogeneity or response errors on the performance of the question design and estimation methods. In particular, we average the performance of each question design method across estimation methods (and vice versa). To indicate the directional effect of heterogeneity we average across response errors (and vice versa).

Table 1: Comparison of Question Design and Estimation Methods for q = 8 (Mean Absolute Errors). Rows cross the question design methods (Random, Efficient Fixed, ACA, Polyhedral) with the estimation methods (AC, HB); columns cross population heterogeneity (homogeneous, heterogeneous) with response error (low, high). Smaller numbers indicate better performance. * For each column, lowest error or not significantly different from lowest (p < 0.05). All others are significantly different from lowest.

Table 2: Directional Implications of Response Errors and Heterogeneity for q = 8 (Mean Absolute Errors). Rows are the question design methods (Random, Efficient Fixed, ACA, Polyhedral) and the estimation methods (AC, HB); columns contrast homogeneous vs. heterogeneous populations and low vs. high response errors. Smaller numbers indicate better performance. * For each column within question design or estimation, lowest error or not significantly different from lowest (p < 0.05). All others are significantly different from lowest.

Question Design Methods

The findings indicate that when there are only a small number of PC questions, the Polyhedral question design method performs well compared to the other three benchmarks. This conclusion holds across the different levels of response error and heterogeneity. The improvement over the Random question design method is reassuring, but perhaps not surprising. The improvement over the Fixed method is also not surprising when there are a small number of questions, as it is not possible to achieve the balance and orthogonality goals that the fixed method seeks.

The comparison with ACA question design is more interesting. Further investigation reveals that the relatively poor performance of the ACA method can be attributed, in part, to endogeneity bias resulting from utility balance, the method that ACA uses to adapt questions. To understand this result we first recognize that any adaptive question design method is, potentially, subject to endogeneity bias. Specifically, the qth question depends upon the answers to the first q − 1 questions. This means that the qth question depends, in part, on any response errors in the first q − 1 questions. This is a classical problem, which often leads to bias (see, for example, Judge et al. 1985, p. 571). Thus, adaptivity represents a tradeoff: we get better estimates more quickly,

but with the risk of endogeneity bias. In our simulations, the absolute bias with ACA questions and ACA estimation is approximately 6.6% of the mean when averaged across domains. This is statistically significant, in part, because of the large sample size in the simulations. Polyhedral question design is also adaptive and it, too, could lead to biases. However, in all four domains, the bias for ACA questions is significantly larger than the bias for Polyhedral questions (2.0% on average for AC). The endogeneity bias in ACA questions appears to be from utility-balanced question design; it is not removed with HB estimation. While further analyses of endogeneity bias are beyond the scope of this paper, they represent an interesting topic for future research. In particular, it might be possible to derive estimation methods that correct for these endogeneity biases.

Estimation Methods

For homogeneous populations, Hierarchical Bayes consistently performed better than Analytic Center estimation, irrespective of the question design method. The performance differences were generally large. Hierarchical Bayes estimation uses population-level data to moderate individual estimates. If the population is homogeneous, then, at the individual level, the ratio of noise to true variation is higher and so moderating this variance through population-level data improves accuracy. However, if the population is heterogeneous, then reliance on population data makes it more difficult to identify the true individual-level variation and Analytic Center estimation does better. For a heterogeneous population, the combination of Polyhedral question design and Analytic Center estimation was significantly more accurate than any other combination of question design or estimation method (Table 2). The findings also suggest that Hierarchical Bayes is relatively more accurate when response errors are high, while Analytic Center estimation is more likely to be favored when response errors are low.
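Utility balance, the adaptive criterion attributed to ACA above, can be illustrated with a small sketch (hypothetical binary profiles and interim partworths; this is not Sawtooth Software's algorithm): among candidate profile pairs, select the pair whose estimated utilities are most nearly equal. Such questions are informative, but they also depend on the interim estimates, which is the route by which response errors feed back into question selection.

```python
def utility(profile, partworths):
    """Utility of a binary profile under the current partworth estimates."""
    return sum(w for x, w in zip(profile, partworths) if x == 1)

def most_utility_balanced(pairs, partworths):
    """Return the profile pair whose estimated utilities are closest."""
    return min(pairs, key=lambda p: abs(utility(p[0], partworths)
                                        - utility(p[1], partworths)))

# Interim estimates after a few questions (made-up numbers).
estimates = [3.0, 1.0, 2.0]
candidates = [
    ((1, 0, 0), (0, 1, 0)),  # utilities 3 vs 1 -> gap 2
    ((1, 0, 0), (0, 1, 1)),  # utilities 3 vs 3 -> gap 0 (utility balanced)
    ((1, 1, 0), (0, 0, 1)),  # utilities 4 vs 2 -> gap 2
]
print(most_utility_balanced(candidates, estimates))  # ((1, 0, 0), (0, 1, 1))
```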
The reliance of Hierarchical Bayes on population-level data may also explain the role of response errors. If response errors are large, much of the individual-level variance is due to noise. Population-level data are less sensitive to response errors (due to aggregation) and so reliance on these data helps to improve accuracy. On the other hand, when response errors are low, the polyhedron stays feasible longer and the Analytic Center method appears to do a better job of identifying individual-level variation.
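The intuition for why pooling helps under homogeneity can be sketched with a toy shrinkage estimator (our illustration, not the actual Hierarchical Bayes sampler): each individual's noisy estimate is pulled toward the population mean, with more pull when individual-level noise dominates true heterogeneity.

```python
import random
import statistics

random.seed(7)

def simulate(tau, sigma, n=2000):
    """Compare raw individual estimates with estimates shrunk toward the
    population mean, when true partworths ~ N(0, tau^2) and each raw
    estimate adds N(0, sigma^2) response noise."""
    true = [random.gauss(0.0, tau) for _ in range(n)]
    raw = [t + random.gauss(0.0, sigma) for t in true]
    pop_mean = statistics.fmean(raw)
    w = tau**2 / (tau**2 + sigma**2)  # optimal weight with known variances
    shrunk = [w * r + (1 - w) * pop_mean for r in raw]
    mae = lambda est: statistics.fmean(abs(e - t) for e, t in zip(est, true))
    return mae(raw), mae(shrunk)

# Homogeneous population (small tau), noisy answers: pooling wins clearly.
raw_err, shrunk_err = simulate(tau=1.0, sigma=5.0)
print(raw_err > shrunk_err)  # True
```

As heterogeneity (tau) grows relative to noise (sigma), the optimal weight approaches one and the pooled and raw estimates converge, which mirrors the pattern described in the text.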

Additional Paired-Comparison Questions

Although polyhedral methods were developed primarily for situations with only a relatively small number of questions, there remain important applications in which a larger number of questions can be asked of each respondent. To examine whether the potential accuracy advantage of polyhedral methods for low q leads to a loss of accuracy at high q, we re-examined the performance of each method after sixteen paired-comparison questions (q = 16). Recall that the Polyhedral method is only used to design questions when the polyhedron contains feasible responses. For low response errors the polyhedron is typically empty after 8 questions, while for high response errors this generally occurs at around 6 or 7 questions. Once the polyhedron is empty we choose questions randomly. Because the Polyhedral question design method is only responsible for around half of the questions, we use the label Poly/Random. The findings are reported in Table 3, which is analogous to Table 2. They reveal the emergence of Fixed question design methods in some domains. Asking a larger number of questions results in more complete coverage of the parameter space. This increases the importance of orthogonality and balance, the criteria used in efficient Fixed question design. With more complete coverage, the ability to customize questions to focus on specific regions of the question space becomes less important, mitigating the advantage offered by adaptive techniques. However, even after sixteen questions there remain domains in which Polyhedral question design can improve performance. The Poly/Random method appears to be at least as accurate as the Fixed design when the population is homogeneous and/or response errors are high. Its advantage for low q does not seem to be particularly harmful for high q, especially for high response errors. Table 3 also suggests that Analytic Center estimation remains a useful estimation procedure when populations are heterogeneous and/or response errors are low.
We note that this result is consistent with Andrews, Ansari, and Currim (2002, p. 87), who conclude individual-level models overfit the data. They test OLS rather than the Analytic Center method and do not test adaptive methods.

Table 3
Directional Implications of Response Errors and Heterogeneity for q = 16 (Mean Absolute Errors)
[Cell values are not recoverable from this transcription. Rows: question design (Random, Efficient Fixed, ACA, Poly/Random) and estimation method (AC, HB); columns: homogeneous vs. heterogeneous population, low vs. high response error.]
Smaller numbers indicate better performance. * For each column within question design or estimation, lowest error or not significantly different from lowest (p < 0.05). All others are significantly different from the lowest.

In summary, our Monte Carlo experiments suggest that there are domains in which Polyhedral question design and/or Analytic Center estimation improve the accuracy of conjoint analysis, but there are also domains better served by extant methods. Specifically, Polyhedral question design shows promise for low numbers of questions, such as the fuzzy front end of product development and/or web-based interviewing. For larger numbers of questions, efficient Fixed designs appear to be best, but Poly/Random question design does well, especially when response errors are high and populations are homogeneous. Analytic Center estimation shows promise for heterogeneous populations and/or low response errors, where the advantage of an individual-respondent focus is strongest. Hierarchical Bayes estimation is preferred when populations are more homogeneous and response errors are large.

5. The Role of Self-Explicated Questions

Hybrid conjoint models refer to methods that combine both compositional methods, such as self-explicated (SE) questions, and decompositional methods, such as metric paired-comparison (PC) questions, to produce new estimates. Although there are instances in which

methods that use just one of these data sources outperform or provide equivalent accuracy to hybrid methods, there are many situations and product categories in which hybrid methods improve accuracy (e.g., Green 1984). An important hybrid from the perspective of evaluating polyhedral methods is ACA, the most widely used method for adaptive metric paired-comparison questions. While ACA's question design algorithm has remained constant since the late 1970s, its estimation procedures have evolved to address the incommensurability of the SE and PC scales. Its default estimation procedure relies on an ordinary least squares (OLS) regression that weighs the SE and the PC data in proportion to the number of questions asked (Sawtooth Software 2002, Version 5). [26] We label the current version weighted hybrid estimation and denote it by the acronym WHSE (the SE suffix indicates reliance on SE data). Sawtooth Software also incorporates the SE responses in their Hierarchical Bayes estimation procedure by using the SEs to constrain the estimates of partworths, both within a feature and between features, to satisfy the ordinal conditions imposed by the SE data. For example, if a respondent's responses to the SE questions indicate that picture quality is more important than battery life, then the Hierarchical Bayes parameters are restricted to satisfy this condition. We denote this algorithm with the acronym HBSE to indicate that the SE responses play a larger role in the estimation. We also create a polyhedral hybrid by extending AC estimation to incorporate SE responses. To do so, we introduce constraints on the feasible polyhedron similar to those used by HBSE. For example, we impose a condition that picture quality is more important than battery life by using an inequality constraint on the polyhedron to exclude points in the partworth space that breach this condition. When the polyhedron becomes empty, we extend OPT4 to incorporate both the PC and SE constraints.
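A minimal sketch of how an SE-derived ordinal condition can be encoded as an inequality constraint on the feasible partworth set (illustrative feature names and tolerance; the actual method characterizes the polyhedron analytically rather than testing candidate points):

```python
FEATURES = ["picture_quality", "battery_life", "weight"]

def feasible(u, pc_constraints, se_orderings, tol=1.0):
    """u: candidate partworth vector (dict keyed by feature).
    pc_constraints: (profile_a, profile_b, stated_difference) triples; each
    requires u.(a - b) to match the metric PC answer within a tolerance.
    se_orderings: (more_important, less_important) feature pairs from SE
    answers, each imposing the inequality u[more] >= u[less]."""
    for a, b, diff in pc_constraints:
        lhs = sum(u[f] * (a[f] - b[f]) for f in FEATURES)
        if abs(lhs - diff) > tol:
            return False
    return all(u[hi] >= u[lo] for hi, lo in se_orderings)

u = {"picture_quality": 30.0, "battery_life": 40.0, "weight": 10.0}
pc = [({"picture_quality": 1, "battery_life": 0, "weight": 1},
       {"picture_quality": 0, "battery_life": 1, "weight": 0}, 0.0)]
se = [("picture_quality", "battery_life")]  # SE: picture quality matters more
print(feasible(u, pc, se))  # False: the point violates the SE ordering
```

Points that satisfy the PC answer but breach the SE ordering are excluded, which is the effect of the inequality constraint described in the text.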
We distinguish this method from the Analytic Center method by adding a suffix to the acronym: ACSE. To compare WHSE, HBSE, and ACSE to their purebred progenitors, we must consider the accuracy of the SE data. If the SE data are perfectly accurate, then a model based on SEs alone will predict perfectly and the hybrids would be almost as accurate.

26. Earlier versions of ACA either weighed the scales equally (Version 3) or selected weights to fit purchase-intention questions (Version 4).

On the other hand, if the SEs are extremely noisy, then the hybrids may actually predict worse than methods that do

not use SE data. To examine these questions, we undertook a second set of simulation experiments. To simulate SE responses we assume that respondents' answers to SE questions are unbiased but imprecise. In particular, we simulate response error in the SE answers by adding to the vector of true partworths, u_r, a vector of independent identically-distributed normal error terms with variance σ²_se. We simulate two levels of SE response error: low error relative to PC responses (σ_se = 20) and high error relative to PC responses (σ_se = 70), truncating the SEs to a 0-to-100 scale. [27] We expect that these benchmarks should bound empirical situations. Recall that in the first set of simulations we assumed no SE errors (σ_se = 0), but discretized the scale. For consistency, we also use a discrete scale when there are non-zero SE errors. Based on these SE errors, we redo the simulations for each level of PC response errors and heterogeneity. We summarize the results with Table 4 for lower numbers of questions (q = 8), where we report the most accurate question-design/estimation methods for each level of response error and heterogeneity. For ease of comparison, the earlier results (from Table 1) are summarized in the column labeled Initial Simulations. There are three results of interest. First, when the SEs are more accurate than the PCs, then the hybrids do well. In this situation, the PC question design method matters less: Polyhedral, Fixed, and Random hybrids are not significantly different in accuracy. Second, the insights obtained from Tables 1-3 for population-level versus individual-level estimation continue to hold: HB or HBSE do well in homogeneous domains while AC or WHSE do well in heterogeneous domains. Third, when the SEs are noisy relative to the PCs, then the hybrid methods do not do as well as the purebred methods. Indeed, we expect a crossover point at some intermediate level of relative accuracy. Table 4 also highlights the emergence of WHSE in some domains. [28]
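The SE-response simulation described above (unbiased partworths plus i.i.d. normal error, truncated to the 0-to-100 scale) can be sketched as follows; the partworth values are made up and the sigma levels are illustrative:

```python
import random

random.seed(42)

def simulate_se_responses(true_partworths, sigma_se):
    """Add N(0, sigma_se^2) error to each true partworth and truncate the
    result to the 0-100 SE scale."""
    responses = []
    for u in true_partworths:
        noisy = u + random.gauss(0.0, sigma_se)
        responses.append(min(100.0, max(0.0, noisy)))
    return responses

true_u = [15.0, 80.0, 45.0, 95.0]       # hypothetical partworths
low_error = simulate_se_responses(true_u, sigma_se=20.0)
high_error = simulate_se_responses(true_u, sigma_se=70.0)
print(all(0.0 <= r <= 100.0 for r in low_error + high_error))  # True
```

The truncation step is what produces the censoring rates reported in the footnote: larger error variance pushes more draws past the scale boundaries.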
27. For high SE errors truncation is approximately 50%; for low SE errors truncation is approximately 0% and %, respectively, for low and high heterogeneity. This is consistent with our manipulation of high vs. low information content of the SEs.

28. If WHSE were not available, Polyhedral ACSE is best for low response errors and Polyhedral HBSE is best for high response errors. These results suggest that there is room for future research on how best to incorporate SE data with AC estimation.

Of the hybrids tested, WHSE is the only method that makes use of the interval-scale properties of the SEs. These metric properties appear to help when the signal-to-noise ratio is high (more variation in

true partworths, less error in the SEs). This result suggests that other methods which use interval-scaled properties of the SEs should do well in these domains, a topic for further hybrid development (e.g., Ter Hofstede, Kim, and Wedel 2002). In summary, as in biology, where genetically-diverse offspring often have traits superior to their purebred parents, heterosis in conjoint analysis improves predictive accuracy in some domains. Furthermore, Polyhedral question design remains promising in these domains and many of the insights from our earlier simulations still hold for hybrid methods. Finally, we expect and obtain analogous results for larger numbers of questions (q = 16). We could identify no additional insight beyond Tables 1-4.

Table 4
The Impact of Self-Explicated (SE) Questions (q = 8)

Heterogeneity | Response Errors | Initial Simulations (no SEs) | Relatively Accurate SEs | Relatively Noisy SEs
Homogeneous | Low error | Polyhedral HB; Fixed HB | Polyhedral HBSE; Fixed HBSE; Random HBSE | Polyhedral HB; Fixed HB
Homogeneous | High error | Polyhedral HB; ACA HB | Polyhedral HBSE; Fixed HBSE; Random HBSE | Polyhedral HBSE
Heterogeneous | Low error | Polyhedral AC | Polyhedral WHSE; Fixed WHSE; Random WHSE | Polyhedral AC
Heterogeneous | High error | Polyhedral AC | Polyhedral WHSE; Fixed WHSE; Random WHSE | Polyhedral AC

6. Empirical Application and Test of Polyhedral Methods

While tests of internal validity are common in the conjoint-analysis literature, tests of external validity at the individual level are rare. [29] A search of the literature revealed four studies that predict choices in the context of natural experiments and one study based on a lottery choice.

29. Some researchers report aggregate predictions relative to observed market share. See Bucklin and Srinivasan (1991), Currim (1981), Green and Srinivasan (1978), Griffin and Hauser (1993), Hauser and Gaskin (1984), McFadden (2000), Page and Rosenbaum (1989), and Robinson (1980).

Wittink and Montgomery (1979), Srinivasan (1988), and Srinivasan and Park (1997) all use conjoint analysis to predict MBA job choice. Samples of 48, 45, and 96 student subjects, respectively, completed a conjoint questionnaire prior to accepting job offers. The methods were compared on their ability to predict actual job choices. First-preference predictions ranged from 64% to 76%, versus random-choice percentages of 26-36%. In another natural experiment, Wright and Kriewall (1980) used conjoint analysis (LINMAP) to predict college applications by 120 families. They were able to correctly predict 20% of the applications when families were prompted to think seriously about the features measured in conjoint analysis; 15% when they were not. This converts to a 6% improvement relative to their null model. Leigh, MacKay, and Summers (1984) allocated undergraduate business majors randomly to twelve different conjoint tasks designed to measure partworths for five features. Respondents indicated their preferences for ten calculators offered in a lottery. There were no significant differences among methods, with first-preference predictions in the range of 26-41% and percentage improvements of 8%. The authors also compared the performance of estimates based solely on SE responses and observed similar performance to the conjoint methods. In this section, we test polyhedral methods with an empirical test involving an innovative laptop-computer carrying bag. Our test differs from the natural-experiment studies because it is based on a controlled experiment in which we chose Pareto sets of product features. At the time of our study, the product was not yet on the market and so respondents had no prior experience with it. The bag includes a range of separable product features, such as the inclusion of a mobile-phone holder, side pockets, or a logo. We focused on nine product features, each with two levels, and included price as a tenth feature.
Price is restricted to two levels ($70 and $100), the extreme prices for the bags in both the internal and external validity tests. We estimated the partworths associated with prices between $70 and $100 by linear interpolation. A more detailed description of the product features can be found on the website cited earlier in this paper. Because ACA is the dominant industry method for adaptive question design, we chose a product category where we expected ACA to perform well: a category where separable product features would lead to moderately accurate SE responses. We anticipate that SE responses are more accurate in categories where customers make purchasing decisions about features separately by choosing from a menu of features. In contrast, we expect SE responses to be less accurate for products where the features are typically bundled together, so that customers have little

experience in evaluating the importance of the individual features. If Polyhedral question design and/or estimation does well in this category, then, based on Table 4, we expect it to do well in categories where SE responses are less accurate.

Research Design

Subjects were randomly assigned to one of the three conjoint question design methods: Polyhedral (2 cells), Fixed, or ACA. We omitted Random question design because the Fixed question design method dominates Random design in Tables 2 and 3. After completing the respective conjoint tasks, all of the respondents were presented with the same validation exercises. The internal validation exercise involved four holdout metric paired-comparison (PC) questions, which occurred immediately after the sixteen PC questions designed by the respective conjoint methods. The external validation exercise was the selection of a laptop computer bag from a choice set of five bags. This exercise occurred in the same session as the conjoint tasks and holdout questions, but was separated from these activities by a filler task designed to cleanse memory (see Table 5).

Table 5
Detailed Research Design

Row | Polyhedral 1 | Fixed | Polyhedral 2 | ACA
1 | | | Self-explicated | Self-explicated
2 | Polyhedral paired comparison | Fixed paired comparison | Polyhedral paired comparison | ACA paired comparison
3 | Internal validity task | Internal validity task | Internal validity task | Internal validity task
4 | | | Purchase intentions | Purchase intentions
5 | Filler task | Filler task | Filler task | Filler task
6 | External validity task | External validity task | External validity task | External validity task

Conjoint Tasks

Recall that ACA requires five sets of questions. Pretests confirmed that all of the features were acceptable to the target market, allowing us to skip the unacceptability task. This left four remaining tasks: ranking of levels within features, self-explicated (SE) questions, metric paired-comparison (PC) questions, and purchase intention (PI) questions. ACA uses the SE questions to select the PC questions; thus the SE questions in ACA must come first, followed by the PC questions and then the PI questions. To test ACA fairly, we adopted this question order for the ACA condition. The Fixed and Polyhedral question design techniques do not require SE or PI questions. Because asking the SE questions first could create a question-order effect, we asked only PC questions (not the SE or PI questions) prior to the validation task in the Fixed condition. To investigate the question-order effect we included two polyhedral data collection procedures: one that matched the Fixed design (Polyhedral 1) and one that matched ACA (Polyhedral 2). In Polyhedral 1 only PC questions preceded the validation task, while in Polyhedral 2 all of the questions preceded the validation task. This enables us to (a) explore whether the SE questions affect the responses to the PC questions and (b) evaluate the hybrid estimation methods that combine data from PC and SE questions. [30] The complete research design, including the question order, is summarized in Table 5. Questions associated with the conjoint tasks are highlighted in green (Rows 1, 2, and 4), while the validation tasks are highlighted in yellow (Rows 3 and 6). The filler task is highlighted in blue (Row 5). In this design, Polyhedral 1 can be matched with Fixed; Polyhedral 2 can be matched with ACA.

Internal Validity Task: Holdout PC Questions

Each of the question design methods designed sixteen metric paired-comparison (PC) questions. Following these questions, respondents answered four holdout PC questions, a test used extensively in the literature.
The holdout profiles were randomly selected from an independent efficient design of sixteen profiles and did not depend on prior answers by that respondent. There was no separation between the sixteen initial questions and the four holdout questions, so that respondents were not aware that the questions were serving a different role.

30. Although the SE responses are collected in the Polyhedral 2 condition, they are not used in Analytic Center estimation or Polyhedral question design. However, they do provide the opportunity to test hybrid estimation methods.

Filler Task

The filler task was designed to separate the conjoint tasks and the external validity task. It was hoped that this separation would mitigate any memory effects that might influence how accurately the information from the conjoint tasks predicted which bags respondents chose in the external validity tasks. The filler task was the same in all four experimental conditions and comprised a series of questions asking respondents about their satisfaction with the survey questions. There was no significant difference in the responses to the filler task across the four conditions.

External Validity Task: Final Bag Selection

Respondents were told that they had $100 to spend and were asked to choose among five bags. The five bags shown to each respondent were drawn randomly from an orthogonal fractional factorial design of sixteen bags. This design was the same across all four experimental conditions, so that there was no difference, on average, in the bags shown to respondents in each condition. The five bags were also independent of responses to the earlier conjoint questions. The price of the bags varied between $70 and $100, reflecting the difference in the anticipated market price of the features included with each bag. By pricing the bags in this manner we ensured that the choice set represented a Pareto frontier, as recommended by Elrod, Louviere, and Davey (1992), Green, Helsen, and Shandler (1988), and Johnson, Meyer, and Ghosh (1989). Respondents were instructed that they would receive the bag that they chose. If the bag was priced at less than $100, they were promised cash for the difference. In order to obtain a complete ranking, we told respondents that if one or more alternatives were unavailable, they might receive a lower ranked bag. The page used to solicit these rankings is presented in Figure

6. [31] At the end of the study the chosen bags were distributed to respondents together with the cash difference (if any) between the price of the selected bag and $100.

Self-Explicated and Purchase Intention Questions

The self-explicated questions asked respondents to rate the importance of each of the ten product features. For a fair comparison to ACA, we used the wording for the questions, the (four-point) response scale, and the algorithm for profile selection proposed by Sawtooth Software (1996). For the purchase intention questions, respondents were shown six bags and we asked how likely they were to purchase each bag. We adopted the wording, response scale, and algorithms for profile selection suggested by Sawtooth Software.

Figure 6
Respondents Choose and Keep a Laptop Computer Bag

31. We acknowledge two tradeoffs in this design. The first is an endowment effect because we endow each respondent with $100. The second is the lack of a no-bag option. While both are interesting research opportunities and quite relevant to market forecasting, a priori neither should favor one of the three methods relative to the others; we expect no interaction between the endowment/forced-choice design and PC question design and leave such investigations to future research. However, the forced-choice design might add noise to the most-accurate method relative to less-accurate methods. This would make it more difficult to achieve significant differences and is, thus, conservative. Pragmatically, we designed the task to maximize the power of the statistical comparisons of the four treatments. The forced choice also helped to reduce the (substantial) cost of this research.

Subjects

The subjects (respondents) were first-year MBA students. They were not informed about the objectives of the study, nor had they taken a course in which conjoint analysis was taught in detail. We received 330 complete responses (there was one incomplete response) from an invitation to 360 students, a response rate of over 91%. Pure random assignment (without quotas) yielded 80 subjects for the ACA condition, 88 for the Fixed condition, and 162 for the Polyhedral conditions, broken out as 88 for the standard question order (Polyhedral 1) and 74 for the alternative question order (Polyhedral 2). The questionnaires were pretested on a total of 69 subjects drawn from professional market research and consulting firms, former students, graduate students in Operations Research, and second-year students in an advanced marketing course that studied conjoint analysis. The pretests were valuable for fine-tuning the question wording and the web-based interfaces. By the end of the pretest, respondents found the questions unambiguous and easy to answer. Following standard scientific procedures, the pretest data were not merged with the experimental data. However, analysis of this small sample suggests that the findings agree directionally with those reported here, albeit not at the same level of significance.

Figure 7
Example Screens from Questionnaires
(a) Price as change from $100
(b) Introduction of sleeve feature
(c) Metric paired-comparison (PC) question
(d) Self-explicated (SE) questions

Additional Details

Figure 7 illustrates some of the key screens in the conjoint analysis questionnaires. In Figure 7a respondents are introduced to the price feature. Figure 7b illustrates one of the dichotomous features: the closure on the sleeve. This is an animated screen that provides more detail as respondents move their pointing devices past the picture. Figure 7c illustrates one of the PC tasks. Respondents were asked to rate their relative preference for two profiles that varied on three features. Both text and pictures were used to describe the profiles. In the pictures, features that did not vary between the products were chosen to coincide with the respondent's preferences for feature levels obtained in tasks such as Figure 7b. The format was identical

for all four experimental treatments. Finally, Figure 7d illustrates the first three self-explicated questions. The full questionnaires for each treatment are available on the website cited earlier in this paper. We note that some of these website improvements (e.g., dynamically changing pictures) are not standard in Sawtooth Software's implementation; thus, our tests should be considered a test of ACA question design (and estimation) rather than a test of Sawtooth Software's commercial implementation.

7. Results of the Field Test

To evaluate the conjoint methods we calculated the Spearman rank-order correlation between the predicted and actual rankings for the five bags shown to each respondent. [32] We report the results in Table 6 using the same benchmark methods that we used for the Monte Carlo simulations.

Table 6
External Validity Tests: Correlation with Actual Choice (Larger numbers indicate better performance)
[Correlation values and sample sizes are not recoverable from this transcription. The table reports, for methods without SE data (AC and HB estimation under Fixed vs. Polyhedral questions) and for methods that use SE data (WHSE, ACSE, and HBSE estimation under ACA vs. Polyhedral questions), the correlations after 8 and after 16 questions, together with sample sizes.]

32. As an alternative metric, we compared how well the methods predicted which product the respondents favored. The two metrics provide a similar pattern of results and so, for ease of exposition, we focus on the correlation measure. There are additional reasons to focus on correlations. First-choice prediction is a dichotomous variable highly dependent upon the number of items in the choice set. In addition, it provides less power because it has higher variance than the Spearman correlation, which is based on a rank order of five items.
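The Spearman rank-order correlation for a ranking of five items can be computed directly from the rank differences; a small sketch with a made-up predicted ranking:

```python
def spearman(rank_a, rank_b):
    """Spearman rank-order correlation for two rankings without ties."""
    n = len(rank_a)
    d2 = sum((a - b) ** 2 for a, b in zip(rank_a, rank_b))
    return 1 - 6 * d2 / (n * (n**2 - 1))

actual    = [1, 2, 3, 4, 5]  # respondent's stated ranking of the five bags
predicted = [2, 1, 3, 4, 5]  # ranking implied by the estimated partworths
print(spearman(actual, predicted))  # 0.9
```

With five items the statistic takes relatively few distinct values, but it still uses the full rank order rather than only the first choice, which is the power argument made in the footnote.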

In the simulation analysis we had the luxury of large sample sizes (500 respondents) and we were able to completely control for respondent heterogeneity. Although the sample sizes in Table 6 are large compared to previous tests of this type, they are small compared to the simulation analysis. As a result, none of the differences across methods are significant at the 0.05 level in independent-sample t-tests. However, these independent-sample t-tests do not use all of the information available in the data. We also evaluate significance by an alternative method that pools the correlation measures calculated after each additional PC question. This results in a total of sixteen observations for each respondent. To control for heteroscedasticity we estimate a separate intercept for each question number. We also controlled for respondent heterogeneity in the randomly-assigned conditions with a null model that assumes that the ten laptop bag features are equally important. If, despite the random assignment of respondents to conditions, the responses in one condition are more consistent with the null model, then the comparisons would be biased in favor of this condition. We control for such potential heterogeneity by including a measure describing how accurately the equal-weights (null) model performs on the predictive correlations. The complete specification for this model is described in Equation 1, where r indexes the respondents and q indexes the number of PC questions used in the partworth estimates. The α's and β's are coefficients in the regression and ε_rq is an error term.

(1)   Correlation_rq = Σ_{q=1}^{16} α_q Question_q + Σ_{m=1}^{M} β_m Method_m + γ EqualWeights_r + ε_rq

The Question and Method terms refer to dummy variables identifying the question and method effects. The EqualWeights variable measures the correlation obtained for respondent r between the actual rankings and the rankings obtained from an equal-weights model.
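The pooled specification in Equation 1 amounts to ordinary least squares on question dummies, method dummies, and the equal-weights covariate. A minimal sketch on synthetic data (tiny dimensions and made-up coefficients, and without the robust-standard-error step described in the text):

```python
import random
import numpy as np

random.seed(0)

# Synthetic panel: R respondents, Q question numbers, two methods.
R, Q = 40, 4
rows, y = [], []
for r in range(R):
    method = r % 2                      # 0 = base method, 1 = comparison method
    ew = random.uniform(0.0, 1.0)       # equal-weights (null model) correlation
    for q in range(Q):
        x = [0.0] * Q + [float(method), ew]
        x[q] = 1.0                      # separate intercept per question number
        rows.append(x)
        # Data-generating process: alpha_q + beta * method + gamma * ew
        y.append(0.1 * (q + 1) + 0.15 * method + 0.5 * ew)

X = np.array(rows)
beta_hat = np.linalg.lstsq(X, np.array(y), rcond=None)[0]
print(round(beta_hat[Q], 3), round(beta_hat[Q + 1], 3))  # 0.15 0.5
```

With noiseless synthetic data the method effect (β) and the equal-weights coefficient (γ) are recovered exactly; in the field data they would be estimated with robust standard errors to account for the panel structure.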
Under this specification, the β coefficients represent the expected increase or decrease in this correlation across questions due to Method m relative to an arbitrarily chosen base method. Positive (negative) values for the β coefficients indicate that the correlations between the actual and predicted rankings are higher (lower) for Method m than for the base method. We further control for potential heteroscedasticity introduced by the panel nature of the data by calculating robust standard errors (White 1980). We also estimated a random effects model, but there was almost no difference in the coefficients of interest. Moreover, the Hausman specification test favored the fixed-effects specification. The findings are summarized in Table 7.

Table 7
External Validity Tests: Conclusions from the Multivariate Analysis

Comparison of Estimation Methods
  Fixed Questions (without SE questions): HB > AC
  Polyhedral Questions (without SE questions): AC >> HB
  ACA Questions (with SE questions): WHSE > HBSE > ACSE
  Polyhedral Questions (with SE questions): HBSE >>> WHSE > ACSE

Comparison of Question Design Methods
  AC Estimation (without SE questions): Polyhedral >>> Fixed
  HB Estimation (without SE questions): Polyhedral >>> Fixed
  WHSE Estimation (with SE questions): Polyhedral > ACA
  ACSE Estimation (with SE questions): Polyhedral >> ACA
  HBSE Estimation (with SE questions): Polyhedral >>> ACA

Method m > Method n: Method m is more accurate than Method n but the difference is not significant. Method m >> Method n: Method m is significantly more accurate than Method n (p < 0.05). Method m >>> Method n: Method m is significantly more accurate than Method n (p < 0.01).

Comparison of Estimation Methods

We compare the accuracy of the different estimation methods by comparing the findings in Table 6 within a column (for a specific set of questions) and looking to Table 7 for significance tests. This comparison holds the question design constant and varies the estimation method. For those experimental cells that were designed to obtain estimates without the SE questions, Hierarchical Bayes and Analytic Center estimation offer similar predictive accuracy for Fixed questions, but Analytic Center estimation performs better for Polyhedral questions. If SE responses are available, the preferred estimation method appears to depend upon both the question design method and the number of PC responses used in the estimation. For Polyhedral questions HBSE performs extremely well for low numbers of PC questions, perhaps due to its use of population-level data. However, increasing the number of PC responses yields less improvement in the accuracy of HBSE relative to WHSE. After sixteen questions all three

estimation methods converge to comparable accuracy levels, suggesting that there are sufficient data at the individual level to provide estimates that need not depend on population distributions. When using PC questions designed by ACA, WHBSE outperforms HBSE, albeit not significantly so.

Comparison of Question Design Methods

The findings in Table 6 also facilitate comparison of the question design methods. Comparing across columns (within rows) in Table 6 holds the estimation method constant and varies the question design. The findings favor the two conditions in which the Polyhedral question design was used. When the SE measures were not collected, the Polyhedral question design yielded significantly (p < 0.01) more accurate predictions than the Fixed design. This holds true irrespective of the estimation method. When SE responses were collected, the Polyhedral question design was more accurate than ACA across every estimation method, although the difference was not significant for WHBSE. Detailed investigation reveals that, for every estimation method we tested, the estimates derived using the Polyhedral questions outperform the corresponding estimates derived using ACA questions after each and every question number.

The Incremental Predictive Value of the SE Questions and of PC Questions

In this category, Table 6 suggests that hybrid methods that use both SE and PC questions consistently outperform methods that rely on PC questions alone. Thus, in this category, the SE questions provide incremental predictive ability. We caution that the product category was chosen at least in part because the SE responses were expected to be accurate. The simulations suggest that this improvement in accuracy may not hold in all domains. We also evaluate whether the PC responses contributed incremental accuracy.
Predictions that use the SE responses alone (without the PC responses) yield an average correlation with actual choice that is lower than the performance of the best methods that use both SE and PC responses, and comparable at q = 16 to that of the methods that do not use SE responses (see Table 6). We conclude (in this category) that the sixteen PC questions provide roughly the same amount of information as the ten SE questions and that, for methods that use both, the PC data add incremental predictive ability. This conclusion is consistent with previous evidence in

the literature (Green, Goldberg, and Montemayor 1981; Huber et al. 1993; Johnson 1999; Leigh, MacKay, and Summers 1984).

The Internal Validity Task

We repeated the analysis of question design and estimation methods using the correlation measures from the internal validity (holdout questions) task. Details are in the Appendix. The results for internal validity are similar to the results for external validity. However, there are two differences worth noting. First, while HBSE predicted better than WHBSE for the Polyhedral question design in the choice task, there was no significant difference in the holdout task. Second, while Polyhedral question design was significantly better than Fixed design for the choice task, there were no significant differences for the holdout task.

Question Order Effects: Polyhedral 1 versus Polyhedral 2

The Polyhedral 1 and Polyhedral 2 conditions varied in question order; the SE questions preceded the PC questions in one condition but not in the other. Otherwise, both methods used the same question design algorithm, an algorithm that does not use SE data. Nonetheless, question order might influence the accuracy of the PC responses. If the SE questions wear out or tire respondents, causing them to pay less attention to the PC questions, we might expect that inclusion of the SE questions will degrade the accuracy of the PC responses. Alternatively, the SE questions may improve the accuracy of the PC questions by acting as a training or warm-up task which helps respondents clarify their values, increasing the accuracy of the PC questions (Green, Krieger, and Agarwal 1991; Huber et al. 1993; Johnson 1991). By comparing the two experimental cells we investigate whether the prior SE questions affected the accuracy of the respondents' PC responses. The predictive accuracies of the two conditions are not statistically different for AC estimation, the preferred estimation method from Table 7. This suggests that by the sixteenth question any wear-out or warm-up/learning had disappeared.
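Comparisons like the one above reduce to a two-sample t-test on per-respondent predictive correlations in the two order conditions. A minimal sketch using Welch's unequal-variance statistic; the arrays are hypothetical illustrations, not the study's data:

```python
import math

def welch_t(sample_a, sample_b):
    """Welch's two-sample t statistic for unequal variances."""
    na, nb = len(sample_a), len(sample_b)
    mean_a = sum(sample_a) / na
    mean_b = sum(sample_b) / nb
    var_a = sum((x - mean_a) ** 2 for x in sample_a) / (na - 1)
    var_b = sum((x - mean_b) ** 2 for x in sample_b) / (nb - 1)
    return (mean_a - mean_b) / math.sqrt(var_a / na + var_b / nb)

# Hypothetical per-respondent correlations in the two order conditions.
poly1 = [0.81, 0.77, 0.85, 0.79, 0.83]
poly2 = [0.80, 0.78, 0.84, 0.80, 0.82]
t = welch_t(poly1, poly2)
```

A |t| well below conventional critical values, as here, would indicate no detectable order effect.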
However, there might still be an effect for the early questions. When we estimate the performance of AC estimation with a version of the earlier regression specification, the effect is not significant for the external validity task (t = 0.68), but is significant for the internal validity task (t = 2.60). In summary, the evidence is mixed. There is no evidence that the SE questions improve or degrade the accuracy of the PC questions for the choice task, but they might improve accuracy for the holdout task. Further testing is warranted. For example, if the first cut of the polyhedron

is critical, then it is important that the first PC question be answered as accurately as feasible. The use of SE questions might sensitize the respondent and enhance the accuracy of the first PC question. Alternatively, researchers might investigate other warm-up questions, such as those in which the respondent configures an ideal product (Dahan and Hauser 2002).

Summary of the Field Test

In the field test, Polyhedral question design appears to be the most accurate of the tested question design methods. When SE data are available, the most accurate estimation methods were the HBSE and WHBSE hybrids. If SE data were unavailable, the most accurate estimation method was AC for Polyhedral questions and HB for Fixed questions. To compare the field test to the simulations, we must identify the relevant domain. Fortunately, estimates of heterogeneity and PC response errors are a by-product of the Hierarchical Bayes estimation, and we can use HB to estimate SE errors. These estimates suggest high levels of heterogeneity and PC response errors (σ_c), but moderately low SE response errors (σ_u). When SEs are unavailable, the simulations predict that in this domain: (1) Polyhedral question design should be better than Fixed for both estimation methods, (2) AC should be much better than HB for Polyhedral questions, and (3) AC should remain better than HB for Fixed questions, but the difference should not be as large. The significant findings in Table 7 are consistent with (1) and (2). Contrary to (3), HB is better for Fixed questions, but not significantly so. For accurate SEs, the simulations predict that in this domain: (4) Polyhedral questions will remain strong for hybrid estimation methods, but the differences among question design methods will be smaller for hybrid methods than for purebred methods, (5) hybrid estimation methods will outperform the purebred methods, and (6) WHBSE will outperform ACSE (in the detailed simulation data for this domain, HBSE also outperforms ACSE).
Predictions (4), (5), and (6) hold true in the field test. It is always difficult to compare field data to simulations because, despite experimental controls, there may be unobserved phenomena in the field test that are not captured in the simulations. However, the two types of data are remarkably consistent, albeit not perfectly so.
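The multivariate comparisons in this section regress per-question predictive correlations on method dummies and use White (1980) heteroskedasticity-robust standard errors. A minimal numpy sketch of that computation; the design matrix and outcomes are simulated illustrations, not the study's data:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical panel: 100 observations, intercept plus one method dummy.
n = 100
method = np.repeat([0.0, 1.0], n // 2)           # 0 = base method, 1 = Method m
X = np.column_stack([np.ones(n), method])        # design matrix
beta_true = np.array([0.60, 0.05])               # base correlation, method effect
y = X @ beta_true + rng.normal(0.0, 0.1, n)      # simulated correlations

# OLS estimates of the beta coefficients.
XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y
resid = y - X @ beta_hat

# White (1980) sandwich estimator: (X'X)^-1 X' diag(e^2) X (X'X)^-1.
meat = X.T @ (resid[:, None] ** 2 * X)
cov_white = XtX_inv @ meat @ XtX_inv
robust_se = np.sqrt(np.diag(cov_white))

t_stats = beta_hat / robust_se
```

The sandwich estimator leaves the point estimates unchanged and adjusts only the standard errors, which is why the fixed-effects coefficients in Table 7 are reported with robust significance tests.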

8. Product Launch

Subsequent to our research, Timbuk2 launched the laptop bags with features similar to those tested, including multiple sizes, custom colors, logo options, accessory holders (PDA and cellular phone), mesh pockets, and laptop sleeves. Timbuk2 considers the product a success; it is selling well and is profitable. We now compare the laboratory experiment and the national launch. However, we do so with caution because the goal of the field test was to compare methods rather than to forecast the national launch. By design we used a student sample rather than a national sample, offered only two color combinations, and did not offer the large-size bag. Furthermore, one tested feature, the boot, was not included in the national launch because production cost (and feasibility) exceeded the price that could be justified. One feature, a bicycle strap, was added based on managerial judgment. There were five comparable features that appeared in both the field test and the national launch. With the above caveats in mind, the correlation of the predicted feature shares from the conjoint analyses with those observed in the marketplace was 0.9, which was significant. (By feature share we mean the percent of customers who chose each of the five features.) Predictions with various null models were not significant. Unfortunately, these data do not provide sufficient power to compare the relative accuracies of the methods nor to report correlations to more than one significant figure.

9. Conclusions and Future Research

Recent developments in math programming provide new methods for designing metric paired-comparison questions and estimating partworths. The question design method uses a multidimensional polyhedron to characterize feasible parameters and focuses questions to reduce the size of the polyhedron as fast as possible. The estimation method uses the analytic center to approximate the center of the polyhedron. This centrality estimate summarizes what is known about the feasible set of parameters.
Our goals in this paper were to (1) propose practical algorithms using polyhedral methods, (2) demonstrate their feasibility, (3) test their potential in a variety of domains, (4) compare their theoretical accuracy relative to existing methods, and (5) compare their predictive accuracy relative to existing methods in a realistic empirical situation. The field test was designed to match one of the theoretical domains that was favorable to existing

methods. The overall conclusion is that polyhedral methods are worth further development, experimentation, and study. Detailed findings are summarized in Table 8.

Table 8: Detailed Summary of Findings

Feasibility
- The Polyhedral question-design method can design questions in real time for a realistic number of parameters.
- The Analytic Center estimation heuristic yields real-time partworth estimates that provide reasonable accuracy with little or no bias.

Monte Carlo Experiments
- Polyhedral question design shows the most promise for lower numbers of questions, where it does well in all tested heterogeneity and response-error domains. Fixed question design remains best for larger numbers of questions, but Poly/Random does well for homogeneous populations and high response errors.
- AC estimation shows promise for heterogeneous populations and low response errors; HB performs well when the population is homogeneous and response errors are high.
- When self-explicated (SE) data are available and relatively noisy:
  - Polyhedral question design continues to perform well.
  - Purebred estimation methods may be preferred.
- When SE data are available and relatively accurate:
  - Question design is relatively less important.
  - For homogeneous populations, HBSE is the best estimation method.
  - For heterogeneous populations, WHBSE is the best estimation method.

External-Validity Experiment
- The field test domain had high heterogeneity and PC response errors, and moderately low SE response errors. The findings are consistent with the Monte Carlo results in this domain.
- Polyhedral question design shows promise irrespective of the availability of SE data. This is true for all tested estimation methods.
- When SE data are unavailable, Analytic Center estimation is a viable alternative to Hierarchical Bayes estimation, especially when paired with Polyhedral question design.
- When SE data are available, existing hybrid estimation methods (HBSE and WHBSE) appear to outperform the ACSE hybrid.
Product Launch
- The laptop bags were launched to the market, but we must be cautious when evaluating predictive accuracy because there were many differences between the empirical experiment and the market launch. In addition, there were insufficient data for relative comparisons. With these caveats, the conjoint analyses correlate well with the marketplace outcome.
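The launch comparison above reduces to a correlation between predicted and observed feature shares across the five comparable features. A minimal sketch with hypothetical share values (the actual shares are not reported here):

```python
import math

def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical predicted vs. observed shares for five features.
predicted = [0.62, 0.48, 0.75, 0.30, 0.55]
observed = [0.60, 0.50, 0.70, 0.35, 0.52]
r = pearson(predicted, observed)
```

With only five features, such a correlation has little power to discriminate among methods, which is why the text reports it to a single significant figure.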

We are encouraged by the performance of polyhedral methods in the Monte Carlo and external-validity experiments. We feel that, with further research, new algorithms based on polyhedral methods have the potential to become easier to use, more accurate, and applicable in a broader set of domains. We close by highlighting some of the opportunities.

Theoretical Improvements

Error theory. Our proposed heuristic (OPT4) provides a practical means by which to use the Analytic Center to obtain partworth estimates from noisy data. Although the maximum error, δ, is likely to increase with the number of questions, the mean error, δ/q, is likely to decrease. Furthermore, the Analytic Center estimates appear to be unbiased, except when there is endogeneity in question design. However, we have not yet developed a formal proof of either hypothesis. More generally, it might be possible to derive the error-handling heuristics from more fundamental distributional assumptions.

Algorithmic improvements for question design. There are a number of ways in which Polyhedral question design might be improved. For example, the proposed algorithm reverts to random selection when the polyhedron becomes empty (around q = 8 when p = 10). We might explore algorithms that use relaxed constraints (OPT4) after the initial polyhedron becomes empty. Multi-step look-ahead algorithms might yield further improvements.

Improved hybrid estimation. Analytic Center estimation appears to do quite well when self-explicated (SE) data are not available or when SE data are relatively noisy. However, when the SE data are relatively accurate, existing methods are better. In particular, for some domains a method that directly exploits the metric properties of the SEs seems to be best. We proposed ACSE as a natural way to mix AC estimation with SE constraints, but there might be other methods that make better use of the metric properties of the SEs.
Table 4 also suggests that SE metric properties might be useful with Hierarchical Bayes hybrids, such as the Ter Hofstede, Kim, and Wedel (2002) algorithm.

Complexity constraints. Evgeniou, Boussios, and Zacharia (2002) demonstrate that Support Vector Machines (SVM) can improve estimation by automatically balancing the complexity of the partworth specification with fit. These researchers are now exploring a hybrid between Analytic-Center and SVM estimation for stated-choice data, an exciting development that can

deal with non-linearities in polyhedral specifications. We might also extend the algorithm in the Appendix to incorporate complexity considerations directly, either by using relaxed constraints (thick hyperplanes) or by the direct use of OPT4 in the definition of the hyperplanes.

Monte Carlo Experiments

Addressing the why. Our heuristics are designed to obtain better questions by reducing the feasible polyhedron as rapidly as possible. In an analogy to the theory of efficient questions, e.g., D-efficiency, this heuristic focuses the next question on that portion of the parameter space where we have the least information. Future Monte Carlo experiments might test this hypothesis or identify an alternative explanation. Such theory might lead to further improvements in the algorithm.

Learning and wear-out. The comparison of question order (Polyhedral 1 vs. Polyhedral 2) suggests that the SE questions might increase the accuracy of the paired-comparison questions by acting as warm-up questions. This is an example of the more general issue that response errors might depend upon q. For example, exploratory simulations, using σ_c = κ1 − κ2·q, suggest that learning (positive κ2) causes mean absolute error (MAE) to decline more rapidly as the number of questions, q, increases. Wear-out (negative κ2) yields a U-shaped function. Other simulations might explore further the many behavioral issues affecting response accuracy (Tourangeau, Rips, and Rasinski 2000).

Interactions. Analytic Center estimation is a by-product of Polyhedral question design. Tables 6 and 7 suggest that there might be interactions among question design and estimation methods in terms of MAE or predictive ability. Although none of these interactions were significant in our data, interactions are worth further exploration. For example, while AC does well when the SE data are unavailable or relatively noisy, ACSE does not do as well for relatively accurate SE data.

Other domains.
Our simulations explore heterogeneity, response error, and the number of questions. We made a number of decisions, such as the manner in which we specified heterogeneity (normally distributed with mean at the center of the range) and response error (normally distributed with no bias). Initial simulations suggested that the results were not sensitive to these assumptions, but more systematic exploration might identify interesting phenomena that we were not able to explore. Simulations for extremely large p or q might also yield new insight.
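The simulation decisions described above can be sketched as follows. The partworth range (0 to 100) and the distributional assumptions follow the text; the specific sizes and noise magnitudes are illustrative, not the study's settings:

```python
import numpy as np

rng = np.random.default_rng(7)

p, q, n_respondents = 10, 16, 100   # parameters, questions, respondents
sigma_beta, sigma_c = 30.0, 20.0    # heterogeneity and PC response error (illustrative)

# Heterogeneity: partworths normally distributed around the center of the 0-100 range.
true_u = rng.normal(50.0, sigma_beta, size=(n_respondents, p))

# Question profiles: each row of X is the difference of two binary profiles (-1, 0, +1).
X = rng.integers(-1, 2, size=(q, p)).astype(float)

# Unbiased, normally distributed response error added to the true answers.
answers = true_u @ X.T + rng.normal(0.0, sigma_c, size=(n_respondents, q))
```

Varying sigma_beta and sigma_c over a grid reproduces the heterogeneity-by-response-error domains of the Monte Carlo experiments.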

Other criteria. We chose MAE as our evaluative criterion. Exploratory simulation suggested that root mean squared error (RMSE) appeared to be proportional to MAE. We might also explore other criteria such as predictive accuracy (analogous to Table 6), the dollar value of product features (Jedidi, Manchanda, and Jagpal 2003; Ofek and Srinivasan 2002), or the incremental explanatory power of each question and question type.

Empirical Tests in Other Categories

Known applications. Polyhedral methods are beginning to diffuse. Sawtooth Software, Inc. now offers a polyhedral option in its ACA software, Harris Interactive, Inc. has begun initial testing, and National Family Opinion, Inc. is exploring feasibility. Sawtooth Software has completed an empirical test of internal validity using a Poly/ACA question design algorithm (Orme and King 2002). In their data, on average, the ACA portion chose 63% of the paired-comparison questions. They observed no significant differences between the methods after q = 30. We have not been able to obtain, for their application, estimates of heterogeneity, PC response error, SE response error, or performance for low q.

Other product categories. We chose a category with separable features. In this category, the SE data were relatively accurate and, thus, the ACA benchmark was not handicapped. This domain matched one of the 3x2x2 classes of categories in Table 4, and the empirical data were consistent with the Monte Carlo data. We hypothesize that the Monte Carlo experiments are consistent with the eleven other classes of categories, but further empirical tests might explore this hypothesis further.
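The evaluative criteria discussed above (MAE, and the RMSE that appeared proportional to it) reduce to simple aggregates over estimated versus true partworths. A minimal sketch with hypothetical values:

```python
import math

def mae(estimates, truth):
    """Mean absolute error between estimated and true partworths."""
    return sum(abs(e - t) for e, t in zip(estimates, truth)) / len(truth)

def rmse(estimates, truth):
    """Root mean squared error between estimated and true partworths."""
    return math.sqrt(sum((e - t) ** 2 for e, t in zip(estimates, truth)) / len(truth))

estimated = [52.0, 41.0, 78.0, 30.0]
true_vals = [50.0, 45.0, 70.0, 35.0]
errors = (mae(estimated, true_vals), rmse(estimated, true_vals))
```

RMSE weights large errors more heavily, so it always weakly exceeds MAE; when the two move proportionally, as the exploratory simulations suggested, the choice between them does not change the rankings of methods.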

References

Allenby, Greg M. and Peter E. Rossi (1999), "Marketing Models of Consumer Heterogeneity," Journal of Econometrics, 89, (March/April).

Arora, Neeraj, Greg M. Allenby, and James L. Ginter (1998), "A Hierarchical Bayes Model of Primary and Secondary Demand," Marketing Science, 17, 1.

---- and Joel Huber (2001), "Improving Parameter Estimates and Model Prediction by Aggregate Customization in Choice Experiments," Journal of Consumer Research, 28, (September).

Bucklin, Randolph E. and V. Srinivasan (1991), "Determining Interbrand Substitutability Through Survey Measurement of Consumer Preference Structures," Journal of Marketing Research, 28, (February).

Currim, Imran S. (1981), "Using Segmentation Approaches for Better Prediction and Understanding from Consumer Mode Choice Models," Journal of Marketing Research, 18, (August).

Dahan, Ely and John R. Hauser (2002), "The Virtual Customer," Journal of Product Innovation Management, 19, 5, (September).

Dawes, Robyn M. and Bernard Corrigan (1974), "Linear Models in Decision Making," Psychological Bulletin, 81, (March).

Einhorn, Hillel J. (1971), "Use of Nonlinear, Noncompensatory Models as a Function of Task and Amount of Information," Organizational Behavior and Human Performance, 6, 1-27.

Elrod, Terry, Jordan Louviere, and Krishnakumar S. Davey (1992), "An Empirical Comparison of Ratings-Based and Choice-Based Conjoint Models," Journal of Marketing Research, 29, 3, (August).

Evgeniou, Theodoros, Constantinos Boussios, and Giorgos Zacharia (2002), "Generalized Robust Conjoint Estimation," Working Paper, (Fontainebleau, France: INSEAD).

Freund, Robert (1993), "Projective Transformations for Interior-Point Algorithms, and a Superlinearly Convergent Algorithm for the W-Center Problem," Mathematical Programming, 58.

----, R. Roundy, and M. J. Todd (1985), "Identifying the Set of Always-Active Constraints in a System of Linear Inequalities by a Single Linear Program," Working Paper, MIT Sloan School of Management.

Green, Paul E. (1984), "Hybrid Models for Conjoint Analysis: An Expository Review," Journal of Marketing Research, 21.

----, Kristiaan Helsen, and Bruce Shandler (1988), "Conjoint Internal Validity Under Alternative Profile Presentations," Journal of Consumer Research, 15, (December).

----, Stephen M. Goldberg, and Mila Montemayor (1981), "A Hybrid Utility Estimation Model for Conjoint Analysis," Journal of Marketing, 45, (Winter).

----, Abba Krieger, and Manoj K. Agarwal (1991), "Adaptive Conjoint Analysis: Some Caveats and Suggestions," Journal of Marketing Research, 28, (May).

----, ----, and Jerry Wind (2001), "Thirty Years of Conjoint Analysis: Reflections and Prospects," Interfaces, 31, 3, Part 2 of 2, (May-June), S56-S73.

----, ----, and ---- (2002), "Buyer Choice Simulators, Optimizers, and Dynamic Models," Advances in Marketing Research: Progress and Prospects, (Philadelphia, PA: Wharton Press).

---- and V. Srinivasan (1978), "Conjoint Analysis in Consumer Research: Issues and Outlook," Journal of Consumer Research, 5, 2, (September).

---- and ---- (1990), "Conjoint Analysis in Marketing: New Developments With Implications for Research and Practice," Journal of Marketing, 54, 4, (October).

Griffin, Abbie and John R. Hauser (1993), "The Voice of the Customer," Marketing Science, 12, 1, (Winter), 1-27.

Gritzmann, P. and V. Klee (1993), "Computational Complexity of Inner and Outer j-Radii of Polytopes in Finite-Dimensional Normed Spaces," Mathematical Programming, 59.

Hadley, G. (1961), Linear Algebra, (Reading, MA: Addison-Wesley Publishing Company, Inc.).

Hauser, John R. and Steven P. Gaskin (1984), "Application of the 'Defender' Consumer Model," Marketing Science, 3, 4, (Fall).

---- and Vithala Rao (2003), "Conjoint Analysis, Related Modeling, and Applications," Advances in Marketing Research: Progress and Prospects, Jerry Wind, ed., forthcoming.

---- and Steven M. Shugan (1980), "Intensity Measures of Consumer Preference," Operations Research, Vol.
28, No. 2, (March-April).

Hoaglin, David C., Frederick Mosteller, and John W. Tukey (1983), Understanding Robust and Exploratory Data Analysis, (New York, NY: John Wiley & Sons, Inc.).

Huber, Joel (1975), "Predicting Preferences on Experimental Bundles of Attributes: A Comparison of Models," Journal of Marketing Research, 12, (August).

----, Dick R. Wittink, John A. Fiedler, and Richard Miller (1993), "The Effectiveness of Alternative Preference Elicitation Procedures in Predicting Choice," Journal of Marketing Research.

---- and Klaus Zwerina (1996), "The Importance of Utility Balance in Efficient Choice Designs," Journal of Marketing Research, 33, (August).

Jedidi, Kamel, Puneet Manchanda, and Sharan Jagpal (2003), "Measuring Heterogeneous Reservation Prices for Product Bundles," Marketing Science, 22, 1, (Winter), forthcoming.

Johnson, Eric J., Robert J. Meyer, and Sanjoy Ghose (1989), "When Choice Models Fail: Compensatory Models in Negatively Correlated Environments," Journal of Marketing Research.

Johnson, Richard (1987), "Accuracy of Utility Estimation in ACA," Working Paper, Sawtooth Software, Sequim, WA, (April).

---- (1991), "Comment on 'Adaptive Conjoint Analysis: Some Caveats and Suggestions'," Journal of Marketing Research, 28, (May).

---- (1999), "The Joys and Sorrows of Implementing HB Methods for Conjoint Analysis," Working Paper, Sawtooth Software, Sequim, WA, (November).

Judge, G. G., W. E. Griffiths, R. C. Hill, H. Lutkepohl, and T. C. Lee (1985), The Theory and Practice of Econometrics, (New York, NY: John Wiley and Sons).

Karmarkar, N. (1984), "A New Polynomial Time Algorithm for Linear Programming," Combinatorica, 4.

Kuhfeld, Warren F., Randall D. Tobias, and Mark Garratt (1994), "Efficient Experimental Design with Marketing Research Applications," Journal of Marketing Research, 31, 4, (November).

Leigh, Thomas W., David B. MacKay, and John O. Summers (1984), "Reliability and Validity of Conjoint Analysis and Self-Explicated Weights: A Comparison," Journal of Marketing Research, 21, 4, (November).

Lenk, Peter J., Wayne S. DeSarbo, Paul E. Green, and Martin R.
Young (1996), "Hierarchical Bayes Conjoint Analysis: Recovery of Partworth Heterogeneity from Reduced Experimental Designs," Marketing Science, 15, 2.

Liechty, John, Venkatram Ramaswamy, and Steven Cohen (2001), "Choice-Menus for Mass Customization: An Experimental Approach for Analyzing Customer Demand With an Application to a Web-based Information Service," Journal of Marketing Research, 38, 2, (May).

Louviere, Jordan J., David A. Hensher, and Joffre D. Swait (2000), Stated Choice Methods: Analysis and Application, (New York, NY: Cambridge University Press).

McFadden, Daniel (2000), "Disaggregate Behavioral Travel Demand's RUM Side: A Thirty-Year Retrospective," Working Paper, University of California, Berkeley, (July).

Moore, William L. (2003), "A Cross-Validity Comparison of Conjoint Analysis and Choice Models," Working Paper, (Salt Lake City, Utah: University of Utah), (February).

---- and Richard J. Semenik (1988), "Measuring Preferences with Hybrid Conjoint Analysis: The Impact of a Different Number of Attributes in the Master Design," Journal of Business Research.

Nesterov, Y. and A. Nemirovskii (1994), Interior-Point Polynomial Algorithms in Convex Programming, (Philadelphia, PA: SIAM).

Ofek, Elie and V. Srinivasan (2002), "How Much Does the Market Value an Improvement in a Product Attribute?," Marketing Science, 21, 4, (Fall).

Orme, Bryan (1999), "ACA, CBC, or Both?: Effective Strategies for Conjoint Research," Working Paper, Sawtooth Software, Sequim, WA.

---- and W. Christopher King (2002), "Improving ACA Algorithms: Challenging a Twenty-Year-Old Approach," Presentation at the Advanced Research Technology Conference, (June).

Page, Albert L. and Harold F. Rosenbaum (1989), "Redesigning Product Lines with Conjoint Analysis: How Sunbeam Does It," Journal of Product Innovation Management, 4.

Reibstein, David, John E. G. Bateson, and William Boulding (1988), "Conjoint Analysis Reliability: Empirical Findings," Marketing Science, 7, 3, (Summer).

Robinson, P. J. (1980), "Applications of Conjoint Analysis to Pricing Problems," in Market Measurement and Analysis: Proceedings of the 1979 ORSA/TIMS Conference on Marketing, David B.
Montgomery and Dick Wittink, eds., (Cambridge, MA: Marketing Science Institute).

Sandor, Zsolt and Michel Wedel (2001), "Designing Conjoint Choice Experiments Using Managers' Prior Beliefs," Journal of Marketing Research, 38, 4, (November).

---- and ---- (2002), "Profile Construction in Experimental Choice Designs for Mixed Logit Models," Marketing Science, 21, 4, (Fall).

Sawtooth Software, Inc. (1996), ACA System: Adaptive Conjoint Analysis, ACA Manual,

(Sequim, WA: Sawtooth Software, Inc.).

---- (2002), The ACA/Hierarchical Bayes Technical Paper, (Sequim, WA: Sawtooth Software, Inc.).

---- (2002), ACA 5.0 Technical Paper, (Sequim, WA: Sawtooth Software, Inc.).

Sonnevend, G. (1985a), "An Analytic Center for Polyhedrons and New Classes of Global Algorithms for Linear (Smooth, Convex) Programming," Proceedings of the 12th IFIP Conference on System Modeling and Optimization, Budapest.

---- (1985b), "A New Method for Solving a Set of Linear (Convex) Inequalities and its Applications for Identification and Optimization," Preprint, Department of Numerical Analysis, Institute of Mathematics, Eötvös University, Budapest.

Srinivasan, V. (1988), "A Conjunctive-Compensatory Approach to the Self-Explication of Multiattributed Preferences," Decision Sciences.

---- and Chan Su Park (1997), "Surprising Robustness of the Self-Explicated Approach to Customer Preference Structure Measurement," Journal of Marketing Research, 34, (May).

---- and Allan D. Shocker (1973), "Linear Programming Techniques for Multidimensional Analysis of Preferences," Psychometrika, 38, 3, (September).

Ter Hofstede, Frenkel, Youngchan Kim, and Michel Wedel (2002), "Bayesian Prediction in Hybrid Conjoint Analysis," Journal of Marketing Research, 39, (May).

Tourangeau, Roger, Lance J. Rips, and Kenneth Rasinski (2000), The Psychology of Survey Response, (New York, NY: Cambridge University Press).

Toubia, Olivier, Duncan Simester, and John R. Hauser (2003), "Polyhedral Methods for Adaptive Choice-Based Conjoint Analysis," Working Paper, (Cambridge, MA: Center for Innovation in Product Development, MIT), (February).

Tukey, John W. (1960), "A Survey of Sampling from Contaminated Distributions," in I. Olkin, S. Ghurye, W. Hoeffding, W. Madow, and H. Mann, eds., Contributions to Probability and Statistics, (Stanford, CA: Stanford University Press).

Urban, Glen L. and Gerald M. Katz (1983), "Pre-Test Market Models: Validation and Managerial Implications," Journal of Marketing Research, 20, (August), 221-234.

Vaidya, P.
(1989), "A Locally Well-Behaved Potential Function and a Simple Newton-Type Method for Finding the Center of a Polytope," in N. Megiddo, ed., Progress in Mathematical Programming: Interior Points and Related Methods, (New York, NY: Springer).

White, H. (1980), "A Heteroskedasticity-Consistent Covariance Matrix Estimator and a Direct Test for Heteroskedasticity," Econometrica, 48.

Wittink, Dick R. and Philippe Cattin (1981), "Alternative Estimation Methods for Conjoint Analysis: A Monte Carlo Study," Journal of Marketing Research, 18, (February).

---- and David B. Montgomery (1979), "Predictive Validity of Trade-off Analysis for Alternative Segmentation Schemes," 1979 AMA Educators' Conference Proceedings, Neil Beckwith, ed., (Chicago, IL: American Marketing Association).

Wright, Peter and Mary Ann Kriewall (1980), "State-of-Mind Effects on Accuracy with which Utility Functions Predict Marketplace Utility," Journal of Marketing Research, 17, (August).

Yang, Sha, Greg M. Allenby, and Geraldine Fennell (2002), "Modeling Variation in Brand Preference: The Roles of Objective Environment and Motivating Conditions," Marketing Science, 21, 1, (Winter).

Appendix: Mathematics of Fast Polyhedral Adaptive Conjoint Estimation

Consider the case of p parameters and q questions, where q ≤ p. Let u_j be the j-th parameter of the respondent's partworth function and let u be the vector of parameters. Without loss of generality we assume binary features, such that u_j is the high level of the j-th feature, and constrain the values of the u_j to lie between 0 and 100. For more levels we simply recode the u vector and impose constraints such as u_m ≤ u_h. We handle such inequality constraints by adding slack variables, v_hm ≥ 0, such that u_h = u_m + v_hm. Let r be the number of externally imposed constraints, of which r_1 are inequality constraints.

Let z_il be the vector describing the left-hand profile in the i-th paired-comparison question and let z_ir be the vector describing the right-hand profile. The elements of these vectors are binary indicators taking on the values 0 or 1. Let X be the q × p matrix whose rows are x_i = z_il − z_ir for i = 1 to q. Let a_i be the respondent's answer to the i-th question and let a be the q × 1 vector of answers for i = 1 to q. Then, if there were no errors, the respondent's answers would imply X u = a. To handle the additional constraints, we augment these equations such that X becomes a (q+r) × (p+r_1) matrix, a becomes a (q+r) × 1 vector, and u becomes a (p+r_1) × 1 vector. These augmented relationships form a polyhedron,

P = { u in R^{p+r_1} : X u = a, u ≥ 0 }.

We begin by assuming that P is non-empty, that X is full rank, and that no j exists such that u_j = 0 for all u in P. We later indicate how to handle these cases.

Finding an Interior Point of the Polyhedron

To begin the algorithm we first find a feasible interior point of P by solving a linear program, LP1 (Freund, Roundy, and Todd 1985). Let e be a (p+r_1) × 1 vector of 1's and let 0 be a (p+r_1) × 1 vector of 0's; the y_j's and θ are parameters of LP1, and y is the (p+r_1) × 1 vector of the y_j's. (When clear in context, inequalities applied to vectors apply element by element.)
LP1 is given by:

(LP1)   max Σ_{j=1}^{p+r_1} y_j,   subject to:   X u − θ a = 0,   θ ≥ 1,   u ≥ y ≥ 0,   y ≤ e.

If (u*, y*, θ*) solves LP1, then θ*^{-1} u* is an interior point of P whenever y* > 0. If some of the y_j* equal 0, then there are some j's for which u_j = 0 for all u in P. If LP1 is infeasible, then P is empty. We address these cases later in this appendix.

Finding the Analytic Center

The analytic center is the point in P that maximizes the geometric mean of the distances from the point to the faces of P. We find the analytic center by solving OPT1:

(OPT1)   max Σ_{j=1}^{p+r_1} ln(u_j),   subject to:   X u = a,   u > 0.

Freund (1993) proves with projective methods that a form of Newton's method will converge rapidly for OPT1. To implement Newton's method we begin with the feasible point from LP1 and improve it with a scalar, α, and a direction, d, such that u + α d is close to the optimal solution of OPT1. (d is a (p+r_1) × 1 vector of d_j's.) We then iterate subject to a stopping rule. We first approximate the objective function with a quadratic expansion in the neighborhood of u:

(A1)   Σ_j ln(u_j + d_j) ≈ Σ_j ln(u_j) + Σ_j d_j/u_j − (1/2) Σ_j d_j²/u_j².

If we define U as the (p+r_1) × (p+r_1) diagonal matrix of the u_j's, then the optimal direction solves OPT2:

(OPT2)   max e'U^{-1}d − (1/2) d'U^{-2}d,   subject to:   X d = 0.

Newton's method solves OPT1 quickly by exploiting an analytic solution to OPT2. To see this, consider first the Karush-Kuhn-Tucker (KKT) conditions for OPT2. If z is a vector parameter of the KKT conditions that is unconstrained in sign, then the KKT conditions can be written as:

(A2)   U^{-2} d − U^{-1} e = X' z
(A3)   X d = 0.

Multiplying (A2) on the left by X U² gives X d − X U e = X U² X' z. Applying (A3) to this equation gives −X U e = X U² X' z. Since U e = u and X u = a, we have −a = X U² X' z. Because

X is full rank and U is positive, we invert XU²X^T to obtain z = −(XU²X^T)^{-1} a. Now replace z in (A2) by this expression and multiply by U² to obtain d = u − U²X^T(XU²X^T)^{-1} a.

According to Newton's method, the new estimate of the analytic center is given by u_new = u + αd = U(e + αU^{-1}d). There are two cases for α. If ‖U^{-1}d‖ < 1/4, then we use α = 1 because u is already close to optimal and e + αU^{-1}d > 0. Otherwise, we compute α with a line search.

Special Cases

If X is not full rank, XU²X^T might not invert. We can either select questions such that X is full rank or we can make it so by removing redundant rows. Suppose that x_k is a row of X such that x_k = Σ_{i=1, i≠k}^{q+r} β_i x_i. Then if a_k = Σ_{i=1, i≠k}^{q+r} β_i a_i, we remove x_k. If a_k ≠ Σ_{i=1, i≠k}^{q+r} β_i a_i, then P is empty and we employ OPT4, described later in this appendix.

If in LP1 we detect cases where some y_j*'s = 0, then there are some j's for which u_j = 0 for all u in P. In that case, we can still find the analytic center of the remaining polyhedron by removing those j's and setting u_j = 0 for those indices. If P is empty we employ OPT4.

Finding the Ellipsoid and its Longest Axis

If ū is the analytic center and U is the corresponding diagonal matrix, then Sonnevend (1985a, 1985b) demonstrates that E ⊆ P ⊆ E_{p+r'}, where E = {u | Xu = a, (u − ū)^T U^{-2} (u − ū) ≤ 1} and E_{p+r'} is constructed proportional to E by replacing 1 with (p+r'). Because we are interested only in the direction of the longest axis of the ellipsoids, we can work with the simpler of the proportional ellipsoids, E. Let g = u − ū; then the longest axis will be a solution to OPT3.

(OPT3)  max g^T g,  subject to:  g^T U^{-2} g ≤ 1,  Xg = 0

OPT3 has an easy-to-compute solution based on the eigenstructure of a matrix. To see this we begin with the KKT conditions (where φ and γ are parameters of the conditions).

(A4)  g = φ U^{-2} g + X^T γ
(A5)  φ (g^T U^{-2} g − 1) = 0
(A6)  g^T U^{-2} g ≤ 1,  Xg = 0,  φ ≥ 0
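The Newton iteration derived above (direction d = u − U²X^T(XU²X^T)^{-1}a, full step when ‖U^{-1}d‖ < 1/4, damped step otherwise) can be sketched in a few lines of numpy. This is our illustration, not the thesis code; in particular, the damping rule α = 1/(1 + ‖U^{-1}d‖) is our stand-in for the unspecified line search (it keeps every u_j strictly positive).

```python
import numpy as np

def analytic_center(X, a, u0, tol=1e-8, max_iter=100):
    """Newton's method for OPT1: max sum(ln u) s.t. Xu = a, starting from a
    strictly feasible interior point u0 (e.g., the output of LP1).
    Uses the direction d = u - U^2 X'(X U^2 X')^{-1} a from (A2)-(A3)."""
    u = u0.astype(float).copy()
    for _ in range(max_iter):
        U2 = np.diag(u**2)                     # the matrix U^2
        z = np.linalg.solve(X @ U2 @ X.T, a)   # (X U^2 X')^{-1} a
        d = u - U2 @ X.T @ z                   # Newton direction
        lam = np.linalg.norm(d / u)            # ||U^{-1} d||
        if lam < tol:
            break
        # full step near the center; damped step otherwise (our choice)
        alpha = 1.0 if lam < 0.25 else 1.0 / (1.0 + lam)
        u = u + alpha * d
    return u
```

Each step preserves Xu = a because Xd = 0, so the iterates stay on the affine slice of the polyhedron.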

It is clear that g^T U^{-2} g = 1 at the optimum, else we could multiply g by a scalar greater than 1 and still have g feasible. It is likewise clear that φ is strictly positive, else we obtain a contradiction by left-multiplying (A4) by g^T and using Xg = 0 to obtain g^T g = 0, which contradicts g^T U^{-2} g = 1. Thus, the solution to OPT3 must satisfy g = φU^{-2}g + X^Tγ, g^T U^{-2} g = 1, Xg = 0, and φ > 0. We rewrite (A4)-(A6) by letting I be the identity matrix and defining η = 1/φ and ω = γ/φ.

(A7)  (U^{-2} − ηI) g = −X^T ω
(A8)  g^T U^{-2} g = 1
(A9)  Xg = 0,  φ > 0

We left-multiply (A7) by X and use (A9) to obtain XU^{-2}g = −XX^Tω. Since X is full rank, XX^T is invertible and we obtain ω = −(XX^T)^{-1} XU^{-2} g, which we substitute into (A7) to obtain (I − X^T(XX^T)^{-1}X) U^{-2} g = η g. Thus, the solution to OPT3 must be an eigenvector of the matrix M ≡ (I − X^T(XX^T)^{-1}X) U^{-2}. To find out which eigenvector, we left-multiply (A7) by g^T and use (A8) and (A9) to obtain η g^T g = 1, or g^T g = 1/η where η > 0. Thus, to solve OPT3 we maximize 1/η by selecting the smallest positive eigenvalue of M. The direction of the longest axis is then given by the associated eigenvector of M. We then choose the next question such that x_{q+1} is most nearly collinear with this eigenvector, subject to any constraints imposed by the questionnaire design. (For example, in our simulation we require that the elements of x_{q+1} be −1, 0, or 1.) The answer to x_{q+1} defines a hyperplane orthogonal to x_{q+1}.

We need only establish that the eigenvalues of M are real. To do this we recognize that M = PU^{-2} where P = (I − X^T(XX^T)^{-1}X) is symmetric, i.e., P = P^T. Then if η is an eigenvalue of M, det(PU^{-2} − ηI) = 0, which implies that det[U(U^{-1}PU^{-1} − ηI)U^{-1}] = 0. This implies that η is an eigenvalue of U^{-1}PU^{-1}, which is symmetric. Thus, η is real (Hadley 1961).

Adjusting the Polyhedron so that it is Non-Empty

P will remain non-empty as long as respondents' answers are consistent.
However, in any real situation there is likely to be a q < p such that P is empty. To continue the polyhedral algorithm, we adjust P so that it is non-empty. We do this by replacing the equality constraint,

Xu = a, with two inequality constraints, Xu ≤ a + δ and Xu ≥ a − δ, where δ is a q × 1 vector of errors, δ_i, defined only for the question-answer imposed constraints. We solve the following optimization problem. Our current implementation uses the ∞-norm, where we minimize the maximum δ_i, but other norms are possible. The advantage of using the ∞-norm is that OPT4 is solvable as a linear program.

(OPT4)  min ‖δ‖_∞,  subject to:  Xu ≤ a + δ,  Xu ≥ a − δ,  u ≥ 0

At some point such that q > p, extant algorithms will outperform OPT4 and we can switch to those algorithms. Alternatively, a researcher might choose to switch to constrained regression (norm-2) or mean-absolute error (norm-1) when q > p. Other options include replacing some, but not all, of the equality constraints with inequality constraints. We leave these extensions to future research.
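Once the ∞-norm is written with an auxiliary scalar t = max_i δ_i, OPT4 becomes a small linear program. A minimal sketch of that reduction (our formulation, using scipy):

```python
import numpy as np
from scipy.optimize import linprog

def opt4(X, a):
    """OPT4: when Xu = a has no nonnegative solution, find u >= 0 minimizing
    the worst-case violation t = max_i |x_i'u - a_i|, as a linear program
    over the stacked variables (u, t)."""
    m, n = X.shape
    c = np.concatenate([np.zeros(n), [1.0]])      # minimize t
    ones = np.ones((m, 1))
    A_ub = np.vstack([np.hstack([X, -ones]),      #  Xu - t <= a
                      np.hstack([-X, -ones])])    # -Xu - t <= -a
    b_ub = np.concatenate([a, -a])
    bounds = [(0, None)] * (n + 1)
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
    return res.x[:n], res.x[-1]                   # (u, worst-case violation)
```

For two inconsistent answers requiring u_1 + u_2 = 100 and u_1 + u_2 = 60, the minimax compromise sets u_1 + u_2 = 80 with violation 20; for a consistent system the violation is zero.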

Appendix 2: Internal Validity Tests for Laptop Computer Bags

Table 2.1. Correlation with Actual Response¹

Methods without SE data:
  Columns: After 8 questions (Fixed Questions, Polyhedral Questions); After 16 questions (Fixed Questions, Polyhedral Questions)
  Rows: Analytic Center (AC); Hierarchical Bayes (HB); Sample size

Methods that use SE data:
  Columns: ACA Questions, Polyhedral Questions (after 8 and after 16 questions)
  Rows: WHB-SE Estimation; ACA-SE Estimation; HB-SE Estimation; Sample size

¹ The missing observations reflect respondents who gave the same response for all four holdout questions (in which case the correlations were undefined).

Table 2.2. Conclusions from the Multivariate Analysis

Without SE questions, comparison of estimation methods:
  Fixed Questions: HB > AC
  Polyhedral Questions: AC >>> HB
Without SE questions, comparison of question-design methods:
  AC Estimation: Polyhedral > Fixed
  HB Estimation: Fixed > Polyhedral

With SE questions, comparison of estimation methods:
  ACA Questions: WHB-SE > HB-SE >>> ACA-SE
  Polyhedral Questions: WHB-SE > HB-SE > ACA-SE
With SE questions, comparison of question-design methods:
  WHB-SE Estimation: Polyhedral >>> ACA
  ACA-SE Estimation: Polyhedral >>> ACA
  HB-SE Estimation: Polyhedral >>> ACA

Chapter 3: Polyhedral Methods for Adaptive Choice-Based Conjoint Analysis

Abstract

Choice-based conjoint analysis (CBC) is used widely in marketing for product design, segmentation, and marketing strategy. We propose and test a new polyhedral question-design method that adapts each respondent's choice sets based on previous answers by that respondent. Individual adaptation appears promising because, as demonstrated in the aggregate customization literature, question design can be improved based on prior estimates of the respondent's partworths, information that is revealed by respondents' answers to prior questions. The algorithm designs questions that quickly reduce the set of partworths that are consistent with the respondent's choices. Recent polyhedral interior-point algorithms provide the rapid solutions necessary for real-time computation. To identify domains where individual adaptation is promising (and domains where it is not), we evaluate the performance of polyhedral CBC methods with Monte Carlo experiments. We vary magnitude (response accuracy), respondent heterogeneity, estimation method, and question-design method in a 4x2x2x3 experiment. The estimation methods are hierarchical Bayes (HB) estimation and analytic-center (AC) estimation. The latter is a new individual-level estimation procedure that is a by-product of polyhedral question design. The benchmarks for individual adaptation are random designs, orthogonal designs, and aggregate customization. The simulations suggest that polyhedral question design does well in many domains, particularly those in which heterogeneity and partworth magnitudes are relatively large. We close by describing an empirical application to the design of executive education programs in which 354 web-based respondents answered stated-choice tasks with four service profiles each. The findings confirm the feasibility of implementing polyhedral CBC methods with actual respondents, test an important design criterion (choice balance), and provide empirical data on convergence.

1. Introduction

Choice-based conjoint analysis (CBC) describes a class of techniques that are amongst the most widely adopted market research methods. In CBC tasks respondents are presented with two or more product profiles and asked to choose the profile that they prefer (see Figure 1 for an example). This contrasts with other conjoint tasks that ask respondents to provide preference ratings for product attributes or profiles. Because choosing a preferred product profile is often a natural task for respondents, consistent with marketplace choice, supporters of CBC have argued that it yields more accurate responses. CBC methods have been shown to perform well when compared against estimates of marketplace demand (Louviere, Hensher, and Swait 2000).

Figure 1: Example of a CBC Task for the Redesign of Polaroid's I-Zone Camera

Important academic research investigating CBC methods has sought to improve the design of the product profiles shown to each respondent. This has led to efficiency improvements,

yielding more information from fewer responses. Because an increasing amount of market research is conducted on the Internet, new opportunities for efficiency improvements have arisen. Online processing power makes it feasible to adapt questions based on prior responses. To date the research on adaptive question design has focused on adapting questions based on responses from prior respondents ("aggregate customization"). Efficient designs are customized based on parameters obtained from pretests or from managerial judgment. Examples of aggregate customization methods include Huber and Zwerina (1996), Arora and Huber (2001), and Sandor and Wedel (2001).

In this study we propose a CBC question-design method that adapts questions using the previous answers from that respondent ("individual adaptation"). The design of each choice task varies according to that respondent's selections in prior choice tasks. The approach is motivated in part by the success of aggregate customization, which uses the responses from other respondents to design more efficient questions. The algorithm that we propose focuses on what is not known about the partworths (given the respondent's answers to prior questions) and seeks to quickly reduce the set of partworths that are consistent with the respondent's choices. To achieve this goal we focus on four design criteria: non-dominance, feasibility, choice balance, and symmetry. In later discussion we also describe an analogy between the proposed algorithm and D-efficiency. After data are collected with these adaptive questions, partworths can be estimated with standard methods (aggregate random-utility or hierarchical Bayes methods). As an alternative, we propose and test an individual-level estimation method that relies on the analytic center of a feasible set of parameters.

Our proposal differs in both format and philosophy from other individual-level adaptive conjoint analysis (ACA) methods.
We focus on stated-choice data rather than ACA's metric paired comparisons, and we focus on analogies to efficient design rather than ACA's utility balance subject to orthogonality goals. Polyhedral methods are also feasible for metric paired-comparison data (Toubia, Simester, Hauser and Dahan 2003). However, as will become apparent, there are important differences between the metric paired-comparison algorithm and the algorithm proposed in this paper.

The remainder of the paper is organized as follows. We begin by reviewing existing CBC question design and estimation methods. We next propose a polyhedral approach to the

design of CBC questions. We then evaluate the proposed polyhedral methods using a series of Monte Carlo simulations, where we hope to demonstrate the domains in which the proposed method shows promise (and where existing methods remain best). We compare performance against three question-design benchmarks, including an aggregate customization method that uses prior data from either managerial judgment or pretest respondents. Because we expect that individual adaptation is most promising when responses are accurate and/or respondents are heterogeneous, we compare the four question-design methods across a range of customer heterogeneity and response-error domains, while also varying the estimation method. We then describe an empirical application of the proposed method to the design of executive education programs at a major university. The paper concludes with a review of the findings, limitations, and opportunities for future research.

2. Existing CBC Question Design and Estimation Methods

To date, most applications of CBC assume that each respondent answers the same set of questions, or that the questions are either blocked across sets of respondents or chosen randomly. For these conditions, McFadden (1974) shows that the inverse of the covariance matrix, Σ, of the MLE estimates is proportional to:

(1)  Σ^{-1} = R Σ_{i=1}^{q} Σ_{j=1}^{J_i} P_ij (z_ij − Σ_{k=1}^{J_i} P_ik z_ik)(z_ij − Σ_{k=1}^{J_i} P_ik z_ik)'

where R is the effective number of replicates; J_i is the number of profiles in choice set i; q is the number of choice sets; z_ij is a binary vector describing the j-th profile in the i-th choice set; and P_ij is the probability that the respondent chooses profile j from the i-th choice set. Without loss of generality, we use binary vectors in the theoretical development to simplify notation and exposition. Multi-level features are used in both the simulations and the application.
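Equation 1 is straightforward to compute for a candidate design. A small numpy sketch (our code, not the thesis implementation; R, the effective number of replicates, defaults to 1):

```python
import numpy as np

def logit_information(choice_sets, u, R=1.0):
    """Inverse covariance (information) matrix of Equation 1 for MNL estimates.
    choice_sets: list of (J_i x p) arrays of profile feature vectors z_ij.
    u: partworth vector used to compute the logit choice probabilities P_ij."""
    p = len(u)
    info = np.zeros((p, p))
    for Z in choice_sets:
        v = np.exp(Z @ u)
        P = v / v.sum()              # logit choice probabilities P_ij
        zbar = P @ Z                 # probability-weighted profile, sum_k P_ik z_ik
        for z_j, P_j in zip(Z, P):
            dev = z_j - zbar
            info += R * P_j * np.outer(dev, dev)
    return info
```

With zero partworths (all choices equally likely), a single binary choice set {(1,0), (0,1)} yields the information matrix [[0.25, -0.25], [-0.25, 0.25]], the a priori value used by "orthogonal efficient" designs.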
We can increase precision by decreasing a measure (norm) of the covariance matrix, that is, by either increasing the number of replicates or increasing the terms in the summations of Equation 1. Equation 1 also demonstrates that the covariance of logit-based estimates depends on the choice probabilities, which, in turn, depend upon the partworths. In general, the experimental design that provides the most precise estimates will depend upon the parameters.

Many researchers have addressed choice-set design. One common measure is D-efficiency, which seeks to reduce the geometric mean of the eigenvalues of Σ (Kuhfeld, Tobias,

and Garratt 1994). If u represents the vector of partworths, then the confidence region for the maximum likelihood estimates (û) is an ellipsoid defined by (u − û)' Σ^{-1} (u − û) ≤ c, where the constant c sets the confidence level (Greene 1993, p. 190). The lengths of the axes of this ellipsoid are determined by the eigenvalues of the covariance matrix, so that minimizing the geometric mean of these eigenvalues shrinks the confidence region around the estimates. Because efficiency depends on the partworths, it is common to assume a priori that the stated choices are equally likely. For this paper we will label such designs as "orthogonal efficient designs."

Arora and Huber (2001), Huber and Zwerina (1996), and Sandor and Wedel (2001) demonstrate that we may improve efficiency by using data from either pretests or prior managerial judgment. These researchers improve D-efficiency by relabeling, which permutes the levels of features across choice sets; swapping, which switches two feature levels among profiles in a choice set; and cycling, which is a combination of rotating the levels of a feature and swapping them. The procedures stop when no further improvement is possible. Simulations suggest that these procedures improve efficiency and, hence, reduce the number of respondents that are necessary. Following the literature, we label these designs as "aggregate customization."

Estimation

In classical logit analysis partworths are estimated with maximum likelihood techniques. Because it is rare that a respondent will be asked to make enough choices to estimate partworth values for each respondent, the data usually are merged across respondents to estimate population-level (or segment-level) partworth values. Yet managers often want estimates for each respondent. Hierarchical Bayes (HB) methods provide (posterior) estimates of partworths for individual respondents by using population-level distributions of partworths to inform individual-level estimates (Allenby and Rossi 1999; Arora, Allenby and Ginter 1998; Johnson 1999; Lenk et al. 1996).
In particular, HB methods use data from the full sample to iteratively estimate both the posterior means (and distribution) of individual-level partworths and the posterior distribution of those partworths at the population level. The HB method is based on Gibbs sampling and the Metropolis-Hastings algorithm. Liechty, Ramaswamy, and Cohen (2001) demonstrate the effectiveness of HB for choice menus, while Arora and Huber (2001) show that it is possible to improve the efficiency of HB estimates with choice sets designed using Huber-Zwerina relabeling and swapping. In other recent research, Andrews, Ainslie, and Currim (2002, p. 479) present evidence that HB models and finite mixture models estimated from simulated scanner-panel data "recover household-level parameter estimates and predict holdout choice about equally well except when the number of purchases per household is small."

3. Polyhedral Question Design Methods

We extend the philosophy of customization by developing algorithms to adapt questions for each respondent. Stated choices by each respondent provide information about parameter values for that respondent that can be used to select the next question(s). In high dimensions (high p) this is a difficult dynamic optimization problem. We address this problem by making use of extremely fast algorithms based on projections within the interior of polyhedra (much of this work started with Karmarkar 1984). In particular, we draw on the properties of bounding ellipsoids discovered in theorems by Sonnevend (1985a, 1985b) and applied by Freund (1993), Nesterov and Nemirovskii (1994), and Vaidya (1989).

We begin by illustrating the intuitive ideas in a two-dimensional space with two product profiles (J_i = 2) and then generalize to a larger number of dimensions and multichotomous choice (the simulations and the application are based on multichotomous choice). The axes of the space represent the partworths (utilities) associated with two different product attributes, u_1 and u_2. A point in this space has a value on each axis, and will be represented by a vector of these two partworths. The ultimate goal is to estimate the point in this space (or distribution of points) that best represents each respondent. The question-design goal is to focus precision toward the points that best represent each respondent. This goal is not unlike D-efficiency, which seeks to minimize the confidence region for estimated partworths. Without loss of generality we scale all partworths in the figures to be non-negative and bounded from above.
Following convention, the partworth associated with the least-preferred level is set arbitrarily to zero.³

Suppose that we have already asked i−1 stated-choice questions and suppose that the hexagon (polyhedron) in Figure 2 represents the partworth vectors that are consistent with the respondent's answers. Suppose further that the i-th question asks the respondent to choose between two profiles with feature levels z_i1 and z_i2. If there were no response errors, then the respondent would select Profile 1 whenever (z_i11 − z_i21)u_1 + (z_i12 − z_i22)u_2 ≥ 0, where z_ijf refers to the f-th feature of z_ij and u_f denotes the partworth associated with the f-th feature. This inequality constraint defines a separating line or, in higher dimensions, a separating hyperplane. In the absence of response errors, if the respondent's true partworth vector is above the separating hyperplane the respondent chooses Profile 1; if it is below, the respondent chooses Profile 2. Thus, the respondent's choice of profiles updates our knowledge of which partworth vectors are consistent with the respondent's preferences, shrinking the feasible polyhedron.

Figure 2: Stated-Choice Responses Divide the Feasible Region

Selecting Questions

Questions are more informative if they reduce the feasible region more rapidly. To implement this goal we adopt four criteria. First, neither profile in the choice set should dominate the other profile. Otherwise, we gain no information when the dominating profile is chosen and the partworth space is not reduced. Second, the separating hyperplane should intersect with and divide the feasible region derived from the first i−1 questions. Otherwise, there could be an answer to the i-th question that does not reduce the feasible region. A corollary of these criteria is that, for each profile, there must be a point in the feasible region for which that profile is the preferred profile.

The i-th question is more informative if, given the first i−1 questions, the respondent is equally likely to select each of the J_i profiles. This implies that, a priori, each answer to the i-th question should be approximately equally likely, a criterion we call choice balance. Choice balance will shrink the feasible region as rapidly as feasible. For example, if the points in the feasible region are equally likely (based on i−1 questions), then the predicted likelihood, π_ij, of choosing the j-th region is proportional to the size of that region. The expected size of the region after the i-th question is then proportional to Σ_{j=1}^{J_i} π_ij², which is minimized when π_ij = 1/J_i.⁴ Arora and Huber (2001, p. 275) offer a further motivation for choice balance based on D-efficiency. For two product profiles, the inverse covariance matrix, Σ^{-1}, is proportional to a weighted sum of π_ij(1 − π_ij), which is also maximized for π_ij = 1/2.

The choice-balance criterion will hold approximately if we favor separating hyperplanes that go through the center of the feasible polyhedron and cut the feasible region approximately in half. This is illustrated in Figure 3a, where we favor bifurcation cuts relative to sliver cuts that yield unequal-sized regions. If the separating hyperplane is a bifurcation cut, both the non-domination and feasibility criteria are satisfied automatically.

Figure 3: Comparing Cuts and the Resulting Feasible Regions. (a) bifurcation cuts; (b) short- vs. long-axis cuts

However, not all bifurcation cuts are equally robust. Suppose that the current feasible region is elongated, as in Figure 3b, and we must decide between many separating hyperplanes, two of which are illustrated. One cuts along the long axis and yields long, thin feasible regions, while the other cuts along the short axis and yields feasible regions that are more symmetric. The long-axis cut focuses precision where we already have high precision, while the short-axis cut focuses precision where we now have less precision. For this reason, we prefer short-axis cuts, which make the post-choice feasible regions reasonably symmetric. We can also motivate this criterion relative to D-efficiency. D-efficiency minimizes the geometric mean of the axes of the confidence ellipsoid, a criterion that tends to make the confidence ellipsoids more symmetric.

For two profiles we favor non-dominance, feasibility, choice balance, and post-choice symmetry if we select profiles such that (a) the separating hyperplane goes through the center of the feasible region and (b) the separating hyperplane is perpendicular to the longest axis of the feasible polyhedron as defined by the first i−1 stated choices. To implement these criteria we propose the following heuristic algorithm.

Step 1: Find the center and the longest axis of the polyhedron based on i−1 questions.

Step 2: Find the two intersections between the longest axis and the boundary of the polyhedron. Each intersection point is defined by a partworth vector, and the difference between these vectors defines a hyperplane that is perpendicular to the longest axis.

Because respondents choose from profiles rather than partworth vectors, we need a third step to identify profiles that yield this separating hyperplane:

Step 3: Select a profile corresponding to each intersection point such that the separating hyperplane divides the region into two approximately equal sub-regions.

The basic intuition remains the same when we extend this heuristic algorithm to more than two profiles (J_i > 2), although the geometry becomes more difficult to visualize (some hyperplanes become oblique, but the regions remain equi-probable).
We first address this generalization and then describe several implementation issues, including how we use utility maximization to select the profiles in Step 3.
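To make Steps 1 and 2 concrete, the sketch below works with the inequality form {u : Au ≤ b} of the feasible region that stated choices generate (each choice adds rows, and the bounds 0 ≤ u_f ≤ 100 add more). It is our illustration under our own implementation choices: a damped-Newton log-barrier maximization for the center, and the eigenvector of the smallest eigenvalue of a standard bounding-ellipsoid matrix at the analytic center for the longest axis.

```python
import numpy as np

def analytic_center_ineq(A, b, u0, iters=100):
    """Step 1: analytic center of {u : Au <= b}, i.e., maximize sum(ln(b - Au)),
    by damped Newton from a strictly feasible start u0 (our implementation)."""
    u = u0.astype(float).copy()
    for _ in range(iters):
        s = b - A @ u                        # positive slacks
        grad = -A.T @ (1.0 / s)              # gradient of the log barrier
        H = A.T @ np.diag(1.0 / s**2) @ A    # negative of the Hessian
        d = np.linalg.solve(H, grad)
        lam = np.sqrt(d @ H @ d)             # Newton decrement
        u = u + d / (1.0 + lam)              # damped step keeps all slacks > 0
        if lam < 1e-10:
            break
    return u

def longest_axis(A, b, u):
    """Step 2: direction of the longest axis of the bounding ellipsoid
    {g : g'(A'S^-2 A)g <= 1} at the center u, i.e., the eigenvector
    associated with the smallest eigenvalue of the ellipsoid matrix."""
    s = b - A @ u
    H = A.T @ np.diag(1.0 / s**2) @ A
    w, V = np.linalg.eigh(H)                 # eigenvalues in ascending order
    return V[:, 0]
```

For the rectangle 0 ≤ u_1 ≤ 100, 0 ≤ u_2 ≤ 50 the center is (50, 25), and the longest axis points along u_1, the dimension about which least is known, which is exactly where a short-axis cut would be placed.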

Selecting More than Two Profiles

In a choice task with more than two profiles, the respondent's choice defines more than one separating hyperplane. The hyperplanes that define the i-th feasible region depend upon the profile chosen by the respondent. For example, consider a choice task with four product profiles, labeled 1, 2, 3, and 4. If the respondent selects Profile 1, then we learn that the respondent prefers Profile 1 to Profile 2, Profile 1 to Profile 3, and Profile 1 to Profile 4. This defines three separating hyperplanes; the resulting polyhedron of feasible partworths is the intersection of the associated regions and the prior feasible polyhedron. In general, J_i profiles yield J_i(J_i − 1)/2 possible hyperplanes. For each of the J_i choices available to the respondent, J_i − 1 hyperplanes contribute to the definition of the new polyhedron. The full set of hyperplanes, and their association with stated choices, defines a set of J_i convex regions, one associated with each answer to the stated-choice question.

We extend Steps 1 to 3 as follows. Rather than finding the longest axis, we find the ⌈J_i/2⌉ longest axes and identify the J_i points where these axes intersect the polyhedron. If J_i is odd, we select randomly among the vectors intersecting the ⌈J_i/2⌉-th longest axis. We associate profiles with each of the J_i partworth vectors by solving the respondent's maximization problem for each vector (as described next). Our solution to the maximization problem assures that the hyperplanes go (approximately) through the center of the polyhedron. The approximation arises because we design the profiles from a discrete attribute space; they would pass exactly through the analytic center if the attribute space were continuous. It is easy to show that such hyperplanes divide the feasible region into J_i collectively exhaustive and mutually exclusive convex sub-regions of approximately equal size (except for the regions' indifference borders, which have zero measure).
Non-dominance and feasibility remain satisfied, and the resulting regions tend toward symmetry. Because the separating hyperplanes are defined by the profiles associated with the partworth vectors (Step 3), not the partworth vectors themselves (Step 2), some of the hyperplanes do not line up with the axes. For J_i > 2 the stated properties remain approximately satisfied based on the wedges formed by the J_i − 1 hyperplanes. Later in the paper we examine the effectiveness of the proposed heuristic for J_i = 4. Simulations examine overall accuracy, and an empirical test examines whether feasibility and choice balance are achieved for real respondents.
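Choice balance can be checked by brute force: sample partworth vectors from the prior feasible region and tabulate which profile each draw prefers. A Monte Carlo sketch of that check (our illustration; for simplicity the prior region is the unit box rather than a question-constrained polyhedron):

```python
import numpy as np

def choice_shares(profiles, n_samples=20000, seed=0):
    """Monte Carlo check of choice balance: sample partworths uniformly over
    the prior box [0, 1]^p and record which profile each draw prefers.
    Well-balanced profiles should give shares near 1/J."""
    rng = np.random.default_rng(seed)
    Z = np.asarray(profiles, dtype=float)
    U = rng.uniform(0.0, 1.0, size=(n_samples, Z.shape[1]))
    winners = np.argmax(U @ Z.T, axis=1)   # preferred profile for each draw
    return np.bincount(winners, minlength=len(Z)) / n_samples
```

For two single-feature profiles the shares come out near 1/2 each, the choice-balance target; lopsided shares would flag a sliver cut.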

Implementation

Implementing this heuristic raises challenges. Although it is easy to visualize (and implement) the heuristic with two profiles in two dimensions, practical CBC problems require implementation with J_i profiles in large p-dimensional spaces, with p-dimensional polyhedra and (p−1)-dimensional hyperplane cuts. Furthermore, the algorithm should run sufficiently fast so that there is little noticeable delay between questions.

The first challenge is finding the center of the current polyhedron and the J_i/2 longest axes (Step 1). If we define the longest axis of a polyhedron as the longest line segment in the polyhedron, then we would need to enumerate all vertices of the polyhedron and compute the distances between the vertices. Unfortunately, for large p this problem is computationally intractable (Gritzmann and Klee 1993); solving it would lead to lengthy delays between questions for each respondent. Furthermore, this definition of the longest axes of a polyhedron may not capture the intuitive concepts that we used to motivate the algorithm. Instead, we turn to Sonnevend's theorems (1985a, 1985b), which state that the shape of polyhedra can be approximated with bounding ellipsoids centered at the analytic center of the polyhedron. The analytic center is the point that maximizes the geometric mean of the distances to the boundaries. Freund (1993) provides efficient algorithms to find the analytic centers of polyhedra. Once the analytic center is found, Sonnevend's results provide analytic expressions for the ellipsoids. The axes of the ellipsoids are well-defined and capture the intuitive concepts in the algorithm. The longest axes are found with straightforward eigenvalue computations for which there are many efficient algorithms. With well-defined axes it is simple to find the partworth vectors on the boundaries of the feasible set that intersect the axes (Step 2). We provide technical details in an appendix.

To implement Step 3 we must define the respondent's utility maximization problem.
We do so in an analogy to economic theory. For each of the J_i utility vectors on the boundary of the polyhedron we obtain the j-th profile, z_ij, by solving:

(OPT1)  max z_ij' u_ij,  subject to:  z_ij' c ≤ M,  elements of z_ij ∈ {0, 1}

where u_ij is the utility vector chosen in Step 2, c is a vector of feature costs, and M is a budget constraint. We implement (approximate) choice balance by setting c equal to the analytic center of the feasible polyhedron (ū_i) computed after the first i−1 questions. At optimality the constraint in OPT1 will be approximately binding, which implies that z_ij' ū_i ≈ z_ik' ū_i ≈ M for all k ≠ j. It is only approximate due to the integrality constraints in OPT1 (elements of z_ij ∈ {0, 1}). This assures that the J_i(J_i − 1)/2 separating hyperplanes go (approximately) through the analytic center. The binding constraints are all bifurcations, which tends to make the regions approximately equal in size. Finally, we also know that the solution to OPT1 ensures that each of the separating hyperplanes passes through the feasible polyhedron, because each profile is preferred at the utility vector to which it corresponds.

Solving OPT1 for profile selection (Step 3) is a knapsack problem, which is well-studied and for which efficient algorithms exist. M is an arbitrary constant that we draw randomly from a compact set (up to m times) until all profiles in a stated-choice task are distinct. If some profiles are not distinct we use those that are distinct. If none of the profiles are distinct, then we ask no further questions (in practice this is a rare occurrence in both simulated and empirical situations).

OPT1 also illustrates the relationship between choice balance and utility balance, a criterion in aggregate customization. In our algorithm, the J_i profiles are chosen to be equally likely based on data from the first i−1 questions. In addition, for the partworths at the analytic center of the feasible region the utilities of all profiles are approximately equal. However, utility balance holds only at the analytic center, not throughout the feasible region. Thus, while, a priori, the profiles are equally likely to be chosen, it will be rare that the respondent is indifferent among the profiles. Hence, choice balance is unlikely to lead to respondent fatigue, and we observed none in our empirical application.

We illustrate the algorithm for J_i = 4 with the two-dimensional example in Figure 4.
We begin with the current polyhedron of feasible partworth vectors (Figure 4a). We then use Freund's algorithm to find the analytic center of the polyhedron, as illustrated by the black dot in Figure 4a. We next use Sonnevend's formulae to find the equation of the approximating ellipsoid and obtain the J_i/2 longest axes (Figure 4b), which correspond to the J_i/2 smallest eigenvalues of the matrix that defines the ellipsoid. We then identify J_i target partworth vectors by finding the intersections of the J_i/2 axes with the boundaries of the current polyhedron (Figure 4c). Finally, for each target utility vector we solve the budget-constrained utility maximization to identify the J_i product profiles. The J_i product profiles each imply J_i − 1 hyperplanes (illustrated for Profile 1 in Figure 4d). If the respondent chooses Profile 1, this implies a new, smaller polyhedron defined by the separating hyperplanes. As drawn in Figure 4d, one of the hyperplanes is redundant; this is less likely in higher dimensions. Were we to draw all J_i(J_i − 1)/2 = 6 hyperplanes, they would divide the polyhedron into mutually exclusive and collectively exhaustive convex regions of approximately equal size. We continue for q questions or until the maximization no longer yields distinct profiles.

Figure 4: Bounding Ellipsoids and the Analytic Center of the Polyhedra. (a) find the analytic center; (b) find Sonnevend's ellipsoid and axes; (c) partworth values on the boundary of the polyhedron; (d) cuts imply the region associated with Profile 1

Incorporating Managerial Constraints and Other Prior Information

Previous research suggests that prior constraints enhance estimation (Johnson 1999; Srinivasan and Shocker 1973). For example, self-explicated data might constrain the rank order of partworth values across features. Such constraints are easy to incorporate and shrink the feasible polyhedron. Most conjoint analysis studies use multi-level features, some of which are ordinally scaled (e.g., picture quality). For example, if u_fm and u_fh are the medium and high levels of feature f, we add the constraint u_fm ≤ u_fh to the feasible polyhedron.⁵ We similarly incorporate information from managerial priors or pretest studies.

Response Errors

In real questionnaires there are likely response errors in stated choices. When there are response errors, the separating hyperplanes are approximations rather than deterministic cuts. For this and other reasons, we distinguish question selection and estimation. The algorithm we propose is a question-selection algorithm. After the data are collected we can estimate the respondents' partworths with most established methods, which address response error formally. For example, polyhedral questions can be used with classical random-utility models or HB estimation. It remains an empirical question as to whether or not response errors counteract the potential gains in question selection due to individual-level adaptation. Although we hypothesize that the criteria of choice balance and symmetry lead to robust stated-choice questions, we also hypothesize that individual-level adaptation will work better when response errors are smaller. We examine these issues in the next section.

Analytic-Center (AC) Estimation

The analytic center of the ith feasible polyhedron provides a natural summary of the information in the first i stated-choice responses. This summary measure is a good working estimate of the respondent's partworth vector.
It is a natural byproduct of the question-selection algorithm and is available as soon as each respondent completes the ith stated-choice question. Such estimates might also be used as starting values in HB estimation, as estimates in classical Bayes updating, and as priors for aggregate customization. Analytic-center estimates also give us a means to test the ability of the polyhedral algorithm to implement the feasibility and choice-balance criteria. Specifically, if we use the ith AC

estimate to forecast choices for q > i choice sets, it should predict 100% of the first i choices (feasibility) and 1/J_i of the last q − i choices (choice balance). When J_i does not vary with i, the internal predictive percentage should approximately equal [i + (1/J_i)(q − i)]/q. We examine this statistic in the empirical application later in the paper.

We can give the AC estimate a statistical interpretation if we assume that the probability of a feasible point is proportional to its distance to the boundary of the feasible polyhedron. In this case, the analytic center maximizes the likelihood of the point (the geometric mean of the distances to the boundary).

AC estimation estimates each respondent's partworth vector based on data only from that respondent. This advantage is also a disadvantage because, unlike Hierarchical Bayes estimation, AC estimation does not use information from other respondents. This suggests an opportunity to improve the accuracy of AC estimates by using data from the population distribution of partworths. While the full development of such an AC algorithm is beyond the scope of the paper, we can test its potential by using the (known) population distribution as a Bayesian prior to update AC estimates. We hypothesize that AC estimates will be less accurate (relative to HB) when the respondents are homogeneous, but that this disadvantage can be offset with the development of an AC-Bayesian hybrid.

Incorporating Null Profiles

Many researchers prefer to include a null profile as an additional profile in the choice set (as in Figure 1). Polyhedral concepts generalize readily to include null profiles. If the null profile is selected from choice set i, then we add the constraints z̄*_k · ū ≥ z̄_ij · ū for all profiles j in choice set i and all choice sets k in which the null profile was not chosen, where z̄*_k denotes the profile chosen from choice set k (given that the null profile was not chosen in choice set k).
Intuitively, these constraints recognize that if the null profile is selected in one choice set, then all of the alternatives in that choice set have a lower utility than the profiles selected in other choice sets (excluding other choice sets where the null was chosen). Alternatively, we can expand the parameter set to include the partworth of an outside option and write the appropriate constraints. After incorporating these constraints the question-design heuristic (and analytic-center estimation) proceed as described above. We leave practical implementation, Monte Carlo testing, and empirical applications with null profiles to future research.
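The feasibility and choice-balance benchmark described above reduces to a one-line calculation; a small sketch (function name assumed):

```python
def expected_internal_hit_rate(i, q, J):
    # Feasibility: the AC estimate after i choice sets reproduces all i
    # observed choices.  Choice balance: it matches only 1/J of the
    # remaining q - i choices, since the J profiles are a priori
    # equally likely to be chosen.
    return (i + (1.0 / J) * (q - i)) / q
```

For example, with q = 12 questions and J = 4 profiles per choice set, the benchmark rises from 0.25 before any questions are asked to 1.0 after all twelve.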

Metric Paired-Comparison Questions

Polyhedral methods are also feasible for metric paired-comparison questions. In particular, Toubia et al. (2003) propose a polyhedral method for metric paired-comparison data that also uses Sonnevend's ellipsoids. However, the metric-pairs and CBC formats present fundamentally different challenges, resulting in very different polyhedral algorithms. For example, each metric-pairs question defines equality constraints, which reduce the dimensionality of the feasible polyhedron. In the CBC algorithm the inequality constraints do not reduce the dimensionality of the feasible polyhedron. Furthermore, the metric-pairs polyhedron becomes empty after sufficient questions. The metric-pairs algorithm must revert to an alternative question-selection method, and the metric-pairs analytic-center algorithm must address infeasibility. In the CBC algorithm, the polyhedron always remains feasible. Moreover, because the metric-pairs algorithm identifies the partial profiles directly, the utility-maximization knapsack algorithm is new to the CBC algorithm.

Summary

Polyhedral (ellipsoid) algorithms provide a means to adapt stated-choice questions for each respondent based on that respondent's answers to the first i−1 questions. The algorithms are based on the intuitive criteria of non-dominance, feasibility, choice balance, and symmetry, and represent an individual-level analogy to D-efficiency. Specifically, the polyhedral algorithm focuses questions on what is not known about the partworth vectors and does so by seeking a small feasible region. Choice balance, symmetry, and the shrinking ellipsoid regions provide analogies to D-efficiency, which seeks questions to minimize the confidence ellipsoid for maximum-likelihood estimates. While both polyhedral question design and aggregate customization are compatible with most estimation methods, including AC estimation, the two methods represent a key tradeoff.
Polyhedral question design adapts questions for each respondent but may be sensitive to response errors. Aggregate customization uses the same design for all respondents, but is based on prior statistical estimates that take response errors into account. This leads us to hypothesize that polyhedral methods will have their greatest advantages relative to existing methods (question design and/or estimation) when responses are more accurate and/or when respondents' partworths are more heterogeneous. We next examine individual-level adaptation and AC estimation with Monte Carlo experiments.

4. Monte Carlo Experiments

We use Monte Carlo experiments to investigate whether polyhedral methods show sufficient promise to justify further development and to identify the empirical domains in which the potential is greatest. Monte Carlo experiments are widely used to evaluate conjoint analysis methods, including studies of interactions, robustness, continuity, attribute correlation, segmentation, new estimation methods, and new data-collection methods. In particular, they have proven particularly useful in the first tests of aggregate customization and in establishing domains in which aggregate customization is preferred to orthogonal designs.

Monte Carlo experiments offer several advantages for an initial test of new methods. First, with any heuristic, we need to establish computational feasibility. Second, Monte Carlo experiments enable us to explore many domains and control the parameters that define those domains. Third, other researchers can readily replicate and extend Monte Carlo experiments, facilitating further exploration and development. Finally, Monte Carlo experiments enable us to control the true partworth values, which are unobserved in studies with actual consumers. However, Monte Carlo experiments are but the first step in a stream of research. Assumptions must be made about characteristics that are not varied, and these assumptions represent limitations. In this paper we explore domains that vary in terms of respondent heterogeneity, response accuracy (magnitude), estimation method, and question-design method. This establishes a 4 × 2³ experimental design. We encourage subsequent researchers to vary other characteristics of the experiments.

Structure of the Simulations

For consistency with prior simulations, we adopt the basic simulation structure of Arora and Huber (2001). Arora and Huber varied response accuracy, heterogeneity, and question-design method in a 2³ experiment using Hierarchical Bayes (HB) estimation.
Huber and Zwerina (1996) had earlier used the same structure to vary response accuracy and question design with classical estimation, while more recently Sandor and Wedel (2001) used a similar structure to compare the impact of prior beliefs. The Huber-Zwerina and Arora-Huber algorithms were aggregate customization methods based on relabeling and swapping. These algorithms work best for stated-choice problems in

which relabeling and swapping are well-defined. We expand the Arora and Huber design to include four levels of four features for four profiles, which ensures that complete aggregate customization and orthogonal designs are possible. Sandor and Wedel included cycling, although they note that cycling is less important in designs where the number of profiles equals the number of feature levels.⁶

Within a feature, Arora and Huber choose partworths symmetrically with expected magnitudes of −β, 0, and +β. They vary response accuracy by varying β. Larger β's imply higher response accuracy because the variance of the Gumbel distribution, which defines the logit model, is inversely proportional to the squared magnitude of the partworths (Ben-Akiva and Lerman 1985, p. 105, property 3). For four levels we retain the symmetric design with magnitudes of −β, −β/3, +β/3, and +β. Arora and Huber model heterogeneity by allowing partworths to vary among respondents according to normal distributions with variance σ_β². They specify a coefficient of heterogeneity as the ratio of the variance to the mean. Specifically, they manipulate low response accuracy with β = 0.5 and high response accuracy with β = 1.5. They manipulate high heterogeneity with σ_β²/β = 2.0 and low heterogeneity with σ_β²/β = 0.5. Given these values, they draw each respondent's partworths from a normal distribution with a diagonal covariance matrix. Each respondent then answers the stated-choice questions with probabilities determined by a logit model based on that respondent's partworths.

Arora and Huber compare question selection using root mean squared error (RMSE). For comparability, we adopt the same criterion and, in addition, report three more intuitive metrics. We select magnitudes and heterogeneity that represent the range of average partworths and heterogeneity that we might find empirically.
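A minimal sketch of this simulation structure follows. The names are assumptions, and we take the variance of each individual partworth to be the heterogeneity coefficient times the absolute mean, consistent with the diagonal-covariance setup just described:

```python
import math
import random

def draw_partworths(beta, het, n_features=4, rng=random):
    # Mean partworths -b, -b/3, +b/3, +b for each four-level feature;
    # individual draws are normal with variance het * |mean|.
    means = [-beta, -beta / 3.0, beta / 3.0, beta]
    return [rng.gauss(m, math.sqrt(het * abs(m)))
            for _ in range(n_features) for m in means]

def logit_choice(profile_utilities, rng=random):
    # Logit response: add i.i.d. Gumbel errors and pick the maximum.
    noisy = [u - math.log(-math.log(rng.random()))
             for u in profile_utilities]
    return max(range(len(noisy)), key=noisy.__getitem__)
```

Because the Gumbel noise has fixed variance, raising β makes the deterministic part of utility dominate, which is why larger β corresponds to higher response accuracy.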
While we could find no meta-analyses for these values, we did have available to us data from a proprietary CBC application (D-efficient design, HB estimation) in the software market. The study included data from almost 1,200 home consumers and over 600 business customers. In both data sets β ranged from approximately −3.0 to +2.4. We chose our high magnitude (3.0) from this study, recognizing that other studies might have even higher magnitudes. For example, Louviere, Hensher and Swait (2000) report stated-choice estimates (logit analysis) that are in the range of 3.0 and higher. After selecting β for high magnitudes, we set the low-magnitude β to the level chosen by Arora and Huber. In the empirical data, the estimated variances ranged from 0.2 to 16.9 and the

heterogeneity coefficient varied from 0.3 to 2.1.⁷ To approximate this range and to provide symmetry with the magnitude coefficient, we manipulated high heterogeneity with a coefficient of three times the mean. Following Arora and Huber, we manipulated low heterogeneity as half the mean. We feel that these values are representative of those that might be obtained in practice. Recall that, as a first test of polyhedral methods, we seek to identify domains that can occur in practice and for which polyhedral methods show promise. More importantly, these levels illustrate the directional differences among methods and, hence, provide insight for further development.

Experimental Design

In addition to manipulating magnitude (two levels) and heterogeneity (two levels), we manipulate estimation method (two levels) and question-design method (four levels). The estimation methods are Hierarchical Bayes and Analytic Center estimation. The former is well-established and incorporates information from other respondents in each individual estimate. The latter is the only feasible method, of which we are aware, that provides individual-level estimates using only information from that respondent. The question-design methods are random, orthogonal designs with equally-likely priors, aggregate customization (Arora-Huber), and polyhedral methods. To simulate aggregate customization, we assume that the pretest data are obtained costlessly and, based on these data, we apply the Arora-Huber algorithm. Specifically, we simulate an orthogonal pretest that uses the same number of respondents as in the actual study. For the orthogonal design we adopt the Arora-Huber methods as detailed in Huber and Zwerina (1996, pp. 310-312). We set q = 16 so that orthogonal designs, relabeling, and swapping are well-defined. Exploratory simulations suggest that the estimates become more accurate as we increase the number of questions, but the relative comparisons of question design and estimation for q = 8 and for q = 24 provide similar qualitative insights.⁸
For each combination of question-design method, estimation method, heterogeneity level, and magnitude level we simulated 1,000 respondents.⁹

Practical Implementation Issues

In order to implement the polyhedral algorithm we made two implementation decisions: (1) We randomly drew M up to thirty times (m = 30) for the simulations. We believe that the accuracy of the method is relatively insensitive to this decision. (2) Because, prior to the first question, the polyhedron is symmetric, we selected the first question by randomly choosing from among the axes. Other decisions may yield greater (or lesser) accuracy; hence the performance of the polyhedral methods tested in this paper should be considered a lower bound on what is possible with further improvement. For example, future research might use aggregate customization to select the first question.

All polyhedral optimization, question-selection, and estimation algorithms are described in the Appendix and implemented in Matlab code. The web-based application described later in this paper uses Perl and HTML for web-page presentation. All code (and the orthogonal design) is available at mitsloan.mit.edu/vc and is open-source.

Comparative Results of the Monte Carlo Experiments

Table 1 reports four metrics describing the simulation results in a table format similar to Arora and Huber: root mean squared error (RMSE); the percentage of respondents for whom each question-design method has the lowest RMSE; the hit rate; and the average correlation between the true and estimated partworths. The hit rate measures the percentage of times each method predicts the most-preferred profile and is based on 1,000 sets of holdout profiles. We use italic bold text to indicate the best question-design method for each estimation method within an experimental domain (and any other methods that are not statistically different from the best method). Tables 2a and 2b summarize the best question-design method and the best estimation method, respectively, for each metric in each domain. The entries in these tables correspond to the best overall method (question design × estimation) for each domain.

For comparability between estimation methods we first normalized the partworths to a constant scale. Specifically, for each respondent, we normalize both the true partworths and the estimated partworths so that their absolute values sum to the number of parameters and their values sum to zero for each feature.
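This scaling might be sketched as follows (function name assumed; assumes at least one partworth remains nonzero after centering):

```python
def normalize_partworths(u, levels_per_feature):
    # Center each feature's block of partworths to sum to zero, then
    # rescale so absolute values sum to the number of parameters.
    out, start = [], 0
    for n in levels_per_feature:
        block = u[start:start + n]
        mean = sum(block) / n
        out.extend(v - mean for v in block)
        start += n
    scale = len(out) / sum(abs(v) for v in out)
    return [v * scale for v in out]
```

Applying the same transformation to both true and estimated partworths puts RMSEs on a common, unit-free scale.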
In this manner, the RMSEs can be interpreted as a percent of the mean partworths. Within an estimation method, subject to statistical confidence, this scaling does not change the relative comparisons among question-design methods. This scaling has the added advantage of making the results roughly comparable in units for the different manipulations of magnitude (response accuracy) and heterogeneity.

This scaling addresses two issues. First, analytic-center estimation is unique to a positive linear transformation and thus focuses on the relative values of the partworths as required by many managerial applications. Second, unscaled logit analyses confound the magnitude of the stochasticity of observed choice behavior with the magnitude of the partworths. Our scaling enables us to focus on the relative partworths. For volumetric forecasts, we recommend the methods proposed and validated by Louviere, Hensher, and Swait (2000). These methods are well-documented and reliable and have been proven appropriate for matching disparate scales.

Table 1: Monte Carlo Simulation Results by Magnitude and Heterogeneity. For each domain (magnitude × heterogeneity) and each question-design method (random, orthogonal, customized, polyhedral), the table reports RMSE, percent best, hit rates, and correlations under both HB and AC estimation. Best, or not significantly different than best, at p < 0.05 shown in italic bold. [Numeric entries are not recoverable in this transcription.] Note that lower values of RMSE reflect increased accuracy, while higher values of percent best, hit rates, and correlations denote increased accuracy.

Table 2a: Comparison Summary for Question Design. Best question-design method(s) by domain for RMSE, percent best, hit rates, and correlations. In both high-magnitude domains, polyhedral design is best on all four metrics. In the low-magnitude, low-heterogeneity domain, orthogonal design is best or tied (with customized) on all metrics. In the low-magnitude, high-heterogeneity domain there are ties among the random, customized, polyhedral, and orthogonal designs. [Cell-level entries are garbled in this transcription.]

Table 2b: Comparison Summary for Estimation. Best estimation method by domain for the same four metrics. HB is best in most domains; AC appears among the best methods primarily when heterogeneity is high, and in the high-magnitude, low-heterogeneity domain HB is best on all four metrics. [Cell-level entries are garbled in this transcription.]

Question-Design Methods

As hypothesized, polyhedral question design performs well in the high-magnitude (high response accuracy) domains. For these domains, polyhedral question design is best, or tied for best, for all metrics and for both estimation methods. These domains favor individual-level adaptation because the design of subsequent questions is based on more accurate information from

the previous answers. When magnitudes are low, the greater response error works against individual adaptation and polyhedral question design does not do as well.

The impact of heterogeneity is more complex. We expect individual-level adaptation to perform well when respondents are heterogeneous. When both magnitudes and heterogeneity are high, conditions favor polyhedral question design and it performs well. In this domain, the accuracy of the data enables individual-level adaptation to be efficient. However, for low magnitudes the disadvantages of high response errors appear to offset the need for customization (high heterogeneity). In this domain, polyhedral question design is best only for AC estimation, perhaps because the two polyhedral methods are complementary: polyhedral question design shrinks the feasible region rapidly, making AC estimation more accurate. We expect low heterogeneity to reduce the need for customization and low response accuracy (low magnitude) to work against deterministic customization. In this domain, as predicted, polyhedral methods do not do as well as orthogonal and aggregate customized designs.

Perhaps one surprise in Table 1 is the strong performance of random designs relative to orthogonal designs, especially for HB estimation and when either magnitudes or heterogeneity are high. We believe that this relative performance can be explained by two phenomena. First, orthogonal designs are optimal only when the partworths are zero. For higher magnitudes orthogonal designs are further from optimal (Huber and Zwerina 1996; Arora and Huber 2001). For example, when we compute D-errors for orthogonal and random designs in the high-magnitude domains, the D-errors are higher for orthogonal questions than for random questions. Second, HB uses inter-respondent information effectively. As Sandor and Wedel (2003) illustrate, multiple designs are more likely to contribute incremental information about the population.
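A D-error comparison of this kind rests on the multinomial-logit information matrix, summed over choice sets at assumed partworths; the D-error is then a determinant-based norm of its inverse. A pure-Python sketch of the information matrix (the function name and brute-force loops are illustrative):

```python
import math

def mnl_information(choice_sets, beta):
    # Sum over choice sets of X' (diag(p) - p p') X evaluated at beta,
    # computed in the equivalent centered form sum_j p_j (x_j - xbar)(x_j - xbar)'.
    k = len(beta)
    info = [[0.0] * k for _ in range(k)]
    for X in choice_sets:                 # X: one profile (row) per alternative
        utils = [sum(bf * xf for bf, xf in zip(beta, row)) for row in X]
        mx = max(utils)
        w = [math.exp(u - mx) for u in utils]
        tot = sum(w)
        p = [wi / tot for wi in w]        # logit choice probabilities
        xbar = [sum(p[j] * X[j][a] for j in range(len(X))) for a in range(k)]
        for a in range(k):
            for c in range(k):
                info[a][c] += sum(
                    p[j] * (X[j][a] - xbar[a]) * (X[j][c] - xbar[c])
                    for j in range(len(X)))
    return info
```

At zero partworths the probabilities are balanced, which is the regime in which orthogonal designs are optimal; at larger β the probabilities become lopsided and orthogonal designs lose efficiency, which is the phenomenon noted above.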
The accuracy of random designs relative to fixed, orthogonal designs is consistent with simulations of differentiated designs as proposed by Sandor and Wedel (2003).

Finally, we note three aspects of Table 1. First, Table 1 replicates the Arora-Huber simulations when the domains and metric are matched (Arora and Huber use HB and report RMSE). In the Arora-Huber simulations, aggregate customization is superior to orthogonal designs when magnitudes are high and when heterogeneity is high. Second, there are ties in Table 1, especially for the low-magnitude and high-heterogeneity domain, perhaps because high heterogeneity favors customization while low magnitudes make customization more sensitive to errors. Even when we increase the sample size to 2,500 (for RMSE) we are unable to break the ties. In this

domain, for most practical problems, question design appears less critical. Third, performance varies slightly by metric, especially when there are ties and especially for low magnitudes.

Estimation Methods

The findings in Table 1 also facilitate the comparison of estimation methods. We have already noted that AC performs well when matched with polyhedral questions for high heterogeneity. In most other domains (and for most metrics), HB is more accurate. In theory, the advantages of HB come from a number of properties, one of which is the use of population-level data to moderate the individual-level estimates (shrinkage). We expect this to be particularly beneficial when the population is homogeneous, because the population-level data provide more information about individual preferences in these domains. This is consistent with Table 1, and may help to explain why AC's relative performance improves when the population is more heterogeneous.

Before we reject AC estimation for the domains in which HB is now superior, we investigate the theoretical potential to improve AC by replacing the AC estimate with a convex combination of the AC estimate and the population mean (shrinkage). If we knew the population mean (β̄), its variance (σ_β²), and the accuracy of AC (RMSE), then classical Bayes updating provides a formula by which to implement shrinkage. To test shrinkage, we use the known population mean, its variance, and the RMSE from Table 1. With these values we compute a single parameter for each domain, α, with which to weigh the population mean. Using these theoretical α's, a convex combination of AC and the population mean provides the best overall estimate in all four domains (RMSEs of 0.863, 0.87, 0.50, and 0.403, respectively; the last three are significantly best at the 0.05 level). In practice, we do not know α, so we must estimate it from the data. Optimal estimation is beyond the scope of this paper, but surrogates, such as using either HB or classical logit analysis to estimate α, should perform well.
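The shrinkage test can be sketched as follows. The weight formula shown is the standard normal-normal updating weight and is an assumption on our part; the text does not spell out the exact form of α used:

```python
def shrinkage_weight(rmse_ac, pop_var):
    # Precision-weighted alpha: weight on the population mean grows as
    # the AC estimate's error variance (RMSE^2) grows relative to the
    # population variance.
    err_var = rmse_ac ** 2
    return err_var / (err_var + pop_var)

def shrink_estimate(u_ac, pop_mean, alpha):
    # Convex combination of the AC estimate and the population mean.
    return [alpha * m + (1.0 - alpha) * u for u, m in zip(u_ac, pop_mean)]
```

When the AC estimate is exact (RMSE = 0) no weight goes to the population mean; as its error grows, the estimate is pulled toward the mean, mimicking HB's shrinkage.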
We conclude that AC shows sufficient promise to justify further development.

Summary of Monte Carlo Experiments

We summarize the results of the Monte Carlo experiments as follows:

• Polyhedral question design shows the most promise when magnitudes are high (response errors are low).

• If magnitudes are low (response errors are high) it may be best not to customize designs; fixed orthogonal questions appear to be more accurate than polyhedral or customized methods.

• HB estimation does well in all domains.

• AC estimation performs well when matched with polyhedral question design and when heterogeneity is high.

• AC estimation shows sufficient promise to justify further development. Preliminary analysis suggests that a modified Bayesian AC estimate is particularly promising if an optimal means can be found to estimate a single parameter, α.

Like many new technologies, we hope polyhedral question design and analytic-center estimation will improve further with use, experimentation, and evolution (Christensen 1998).

5. Application to the Design of Executive Education Programs

Polyhedral methods for CBC have been implemented in at least one empirical application. We describe this application briefly as evidence that it is feasible to implement the proposed methods using actual respondents. Furthermore, the data provide empirical estimates of magnitude and heterogeneity, enable us to test the feasibility and choice-balance criteria, and provide insight on the convergence of AC and HB estimates. Because the managerial environment required that we implement a single question-design method, we cannot compare question-design methods in this first proof-of-concept application.

Feature Design and Sample Selection for the Polyhedral Choice-Based Conjoint Study

The application supported the design of new executive education programs for a business school at a major university. The school was the leader in twelve-month executive advanced-degree programs, with a fifty-year track record that has produced alumni who include CEOs, prime ministers, and a secretary general of the United Nations. However, the demand for twelve-month programs was shrinking because it has become increasingly difficult for executives to be away from their companies for twelve months.
The senior leadership of the school was considering a radical redesign of their programs. As part of this effort, they sought input from potential students. Based on two qualitative studies and detailed internal discussions, the

senior leadership of the school identified eight program features that were to be tested with conjoint analysis. The features included program focus (3 levels), format (4 levels), class composition (3 levels of interest focus, 4 age categories, 3 types of geographic focus), sponsorship strategy (3 levels), company focus (3 levels), and tuition (3 levels). This 4² × 3⁶ design is relatively large for CBC applications (e.g., Huber 1997; Orme 1999), but proved feasible with professional web design. We provide an example screenshot in Figure 5 (with the university logo and tuition levels redacted). Prior to answering these stated-choice questions, respondents reviewed detailed descriptions of the levels of each feature and could access these descriptions at any time by clicking the feature's logo.

Figure 5: Example Web-Based Stated-Choice Task for the Executive-Education Study.

After wording and layout were refined in pretests, potential respondents were obtained from the Graduate Management Admissions Council through their Graduate Management Admissions Search Service. Potential respondents were selected based on their age, geographic location, educational goals, and GMAT scores. Random samples were chosen from three strata: those within driving distance of the university, those within a short airplane flight, and those within a moderate airplane flight. Respondents were invited to participate via e-mails from the

Director of Executive Education. As an incentive, respondents were entered in a lottery in which they had a 1-in-10 chance of receiving a university-logo gift worth approximately $100. Pretests confirmed that the respondents could comfortably answer twelve stated-choice questions (recall that the respondents were experienced executives receiving minimal response incentives). Of those respondents who began the CBC section of the survey, 95% completed the section. The overall response rate was within ranges expected for both proprietary and academic web-based studies (Couper 2000; De Angelis 2001; Dillman et al. 2001; Sheehan 2001).¹⁰ No significant differences were found between the partworths estimated for early responders and those estimated for later responders.

The committee judged the results intuitive, but enlightening, and adequate for managerial decision-making. Based on the results of the conjoint study and internal discussions, the school has redesigned and retargeted its twelve-month programs with a focus on both global leadership and innovation/entrepreneurship. Beginning with the class of 2004, it is adding a new, flexible format to its traditional offerings.

Technical Results

The data provide an opportunity to examine several technical issues. First, we estimate the magnitude and heterogeneity parameters using HB. We obtained estimates of magnitude (β) that ranged from 1.0 to 3.4, averaging 2.5. The heterogeneity coefficient ranged from 0.9 to 3.1, averaging 1.9. The observed magnitudes and the observed heterogeneity coefficients span the ranges addressed in the Monte Carlo simulations.

Hit rates are more complex. By design, polyhedral questions select the choice sets that provide maximum information about the feasible set of partworths. If the Analytic-Center (AC) estimates remain feasible through the twelfth question, they obtain an internal hit rate of 100% (by design). They remained feasible and this internal hit rate was achieved.
This hit rate is not guaranteed for HB estimates, which, nonetheless, do quite well with internal hit rates of 94%. As described earlier in this paper, we can examine internal consistency by comparing the hit rates based on AC estimates from the first i questions to their theoretical value, [i + (1/J_i)(q − i)]/q. The fit is almost perfect (adj. R² ≈ 1), suggesting that polyhedral question design was able to achieve excellent choice balance for the respondents in this study. Because these internal hit rates are not guaranteed for HB, we plot the hit rates for both estimation methods in Figure 6a. On this metric, HB is more concave than AC, doing better for low i but less well for high i.

To gain further insight we adopt an evaluation method used by Kamakura and Wedel (1995, p. 316) to examine how rapidly estimates converge to their final values. Following their structure, we compute the convergence rates as a function of the number of stated-choice tasks (i) using scaled RMSE to maintain consistency with Arora and Huber (2001) and with the Monte Carlo simulations. From question 3 onward, AC and HB are quite close, achieving roughly equal convergence. HB does less well for i = 1 and 2, most likely because individual-level variation (in a population with moderately high heterogeneity) counterbalances the benefit to HB of the population-level data.

Figure 6: Empirical Hit Rates and RMSE Convergence for the Executive-Education Study. (a) Hit Rates. (b) RMSE Convergence.

Based on this initial application we conclude that adaptive polyhedral choice-based questions are practical and achieve both feasibility and choice balance. Analytic-center estimation appears to be comparable to HB and is deserving of further study, perhaps with the Bayesian hybrids suggested earlier in the text.

6. Conclusions and Research Opportunities

Research on stated-choice question design suggests that careful selection of choice sets has the potential to increase accuracy and reduce costs by requiring fewer respondents, fewer questions, or both. This is particularly true in choice-based conjoint analysis because the most efficient design depends upon the true partworth values. In this paper we explore whether the

success of aggregate customization can be extended to individual-level adaptive question design. We propose heuristics for designing profiles for each choice set. We then rely on new developments in dynamic optimization to implement these heuristics. As a first test, we seek to identify whether or not the proposed methods show promise in at least some domains. It appears that such domains exist. Like many proposed methods, we do not expect polyhedral methods to dominate in all domains and, indeed, they do not. However, we hope that by identifying promising domains we can inspire other researchers to explore hybrid methods and/or improve the heuristics.

While polyhedral methods are feasible empirically and show promise, many challenges remain. For example, we might allow fuzzy constraints for the polyhedra. Such constraints might provide greater robustness at the expense of precision. Future simulations might explore other domains, including non-diagonal covariance structures, probit-based random-utility models, mixtures of distributions, and finite mixture models. Recently, Ter Hofstede, Kim, and Wedel (2002) demonstrated that self-explicated data could improve HB estimation for full-profile conjoint analysis. Polyhedral estimation handles such data readily; hybrids might be explored that incorporate both self-explicated and stated-choice data. Future developments in dynamic optimization might enable polyhedral algorithms that look many steps ahead.

We close by recognizing research on other optimization algorithms for conjoint analysis. Evgeniou, Boussios, and Zacharia (2003) propose support vector machines (SVMs) to balance complexity of interactions with fit. They are currently exploring hybrids based on SVMs and polyhedral methods.

Endnotes

1. This is equivalent to maximizing the pth root of the determinant of Σ. Other norms include A-efficiency, which maximizes the trace of Σ/p, and G-efficiency, which maximizes the maximum diagonal element of Σ.

2. The Huber-Zwerina and Arora-Huber algorithms maximize det Σ based on the mean partworths and do so by assuming the mean is known from pretest data (or managerial judgment). Sandor and Wedel (2001) include, as well, a prior covariance matrix in their calculations. They then maximize the expectation of det Σ, where the expectation is over the prior subjective beliefs.

3. In the application described later in the paper we use warm-up questions to identify the lowest level of each feature (a common solution to this issue).

4. For J_i profiles, equally-sized regions also maximize entropy, defined as -Σ_j π_ij log π_ij. Formally, maximum entropy is equal to the total information obtainable in a probabilistic model (Hauser 1978, Theorem 1).

5. In the theoretical derivation we used binary features without loss of generality for notational simplicity. An ordinal multi-level feature constraint is mathematically equivalent to a constraint linking two binary features.

6. In the designs that we use, the efficiency of the Sandor and Wedel algorithm is approximately equal to the efficiency of the Huber and Zwerina algorithm.

7. There was also an outlier with a mean of 0.0 and a variance of 0.88, implying a heterogeneity coefficient of 9.0. Such cases are clearly possible, but less likely to represent typical empirical situations. Researchers who prefer a unitless metric for heterogeneity can rescale using the standard deviation rather than the variance. For our two-level manipulation, directional differences are the same.

8. The estimates at q = 16 are approximately 5% more accurate than those at q = 8, and the estimates at q = 24 are approximately 2% more accurate than those at q = 16.

9. The simulations are based on ten sets of one hundred respondents.
To reduce unnecessary variance, the same true partworths for each of the 1,000 respondents are used for each question design method.

10. Couper (2000, p. 384) estimates a 10% response rate for open-invitation studies and a 20-25% response rate for studies with pre-recruited respondents. De Angelis (2001) reports click-through rates of 3-8% from a list of 8 million records. In a study designed to optimize response rates, Dillman et al. (2001) compare mail, telephone, and web-based surveys. They obtain a response rate of 13% for the web-based survey. Sheehan (2001) reviews all published studies referenced in Academic Search Elite, Expanded Academic Index, ArticleFirst, Lexis-Nexis, PsycLit, Sociological Abstracts, ABI-Inform, and ERIC and finds response rates dropping at 4% per year. Sheehan's data suggest an average response rate for 2001 of 25.5%. The response rate in the Executive Education study was 26%.

References

Allenby, Greg M. and Peter E. Rossi (1999), "Marketing Models of Consumer Heterogeneity," Journal of Econometrics, 89 (March/April).

Andrews, Rick L., Andrew Ainslie, and Imran S. Currim (2002), "An Empirical Comparison of Logit Choice Models with Discrete versus Continuous Representations of Heterogeneity," Journal of Marketing Research, 39 (November).

Arora, Neeraj, Greg M. Allenby and James L. Ginter (1998), "A Hierarchical Bayes Model of Primary and Secondary Demand," Marketing Science, 17, 1.

---- and Joel Huber (2001), "Improving Parameter Estimates and Model Prediction by Aggregate Customization in Choice Experiments," Journal of Consumer Research, 28 (September).

Ben-Akiva, Moshe and Steven R. Lerman (1985), Discrete Choice Analysis: Theory and Application to Travel Demand, (Cambridge, MA: MIT Press).

Christensen, Clayton (1998), The Innovator's Dilemma: When New Technologies Cause Great Firms to Fail, (Boston, MA: Harvard Business School Press).

Couper, Mick P. (2000), "Web Surveys: A Review of Issues and Approaches," Public Opinion Quarterly, 64.

De Angelis, Chris (2001), "Sampling for Web-based Surveys," Annual Marketing Research Conference, Atlanta, GA (September 3-6).

Dillman, Don A., Glenn Phelps, Robert Tortora, Karen Swift, Julie Kohrell, and Jodi Berck (2001), "Response Rate and Measurement Differences in Mixed Mode Surveys Using Mail, Telephone, Interactive Voice Response, and the Internet," (Pullman, WA: Social and Economic Sciences Research Center, Washington State University).

Evgeniou, Theodoros, Constantinos Boussios, and Giorgos Zacharia (2003), "Generalized Robust Conjoint Estimation," Working Paper, (Fontainebleau, France: INSEAD), May.

Freund, Robert (1993), "Projective Transformations for Interior-Point Algorithms, and a Superlinearly Convergent Algorithm for the W-Center Problem," Mathematical Programming, 58.

----, R. Roundy, and M. J. Todd (1985), "Identifying the Set of Always-Active Constraints in a System of Linear Inequalities by a Single Linear Program," Working Paper, Sloan School of Management, MIT.

Greene, William H. (1993), Econometric Analysis, 2E, (Englewood Cliffs, NJ: Prentice-Hall, Inc.).

Gritzmann, P. and V. Klee (1993), "Computational Complexity of Inner and Outer j-Radii of Polytopes in Finite-Dimensional Normed Spaces," Mathematical Programming, 59.

Hadley, G. (1961), Linear Algebra, (Reading, MA: Addison-Wesley Publishing Company, Inc.).

Hauser, John R. (1978), "Testing the Accuracy, Usefulness and Significance of Probabilistic Models: An Information Theoretic Approach," Operations Research, 26, 3 (May-June).

Huber, Joel (1997), "What We Have Learned from 20 Years of Conjoint Research," Research Paper Series, (Sequim, WA: Sawtooth Software, Inc.).

---- and Klaus Zwerina (1996), "The Importance of Utility Balance in Efficient Choice Designs," Journal of Marketing Research, 33 (August).

Johnson, Richard (1999), "The Joys and Sorrows of Implementing HB Methods for Conjoint Analysis," Research Paper Series, (Sequim, WA: Sawtooth Software, Inc.).

Kamakura, Wagner and Michel Wedel (1995), "Life-Style Segmentation with Tailored Interviewing," Journal of Marketing Research, 32 (August).

Karmarkar, N. (1984), "A New Polynomial Time Algorithm for Linear Programming," Combinatorica, 4.

Kuhfeld, Warren F., Randall D. Tobias, and Mark Garratt (1994), "Efficient Experimental Design with Marketing Research Applications," Journal of Marketing Research, 31, 4 (November).

Lenk, Peter J., Wayne S. DeSarbo, Paul E. Green, and Martin R. Young (1996), "Hierarchical Bayes Conjoint Analysis: Recovery of Partworth Heterogeneity from Reduced Experimental Designs," Marketing Science, 15, 2.

Liechty, John, Venkatram Ramaswamy, and Steven Cohen (2001), "Choice-Menus for Mass Customization: An Experimental Approach for Analyzing Customer Demand with an Application to a Web-based Information Service," Journal of Marketing Research, 38, 2 (May).

Louviere, Jordan J., David A. Hensher, and Joffre D. Swait (2000), Stated Choice Methods: Analysis and Application, (New York, NY: Cambridge University Press).

McFadden, Daniel (1974), "Conditional Logit Analysis of Qualitative Choice Behavior," in Frontiers in Econometrics, P. Zarembka, ed., (New York: Academic Press).

Nesterov, Y. and A. Nemirovskii (1994), Interior-Point Polynomial Algorithms in Convex Programming, (Philadelphia, PA: SIAM).

Orme, Bryan (1999), "ACA, CBC, or Both?: Effective Strategies for Conjoint Research," Working Paper, (Sequim, WA: Sawtooth Software).

Sandór, Zsolt and Michel Wedel (2001), "Designing Conjoint Choice Experiments Using Managers' Prior Beliefs," Journal of Marketing Research, 38, 4 (November).

---- and ---- (2003), "Differentiated Bayesian Conjoint Choice Designs," Working Paper, (Ann Arbor, MI: University of Michigan Business School).

Sheehan, Kim (2001), "E-mail Survey Response Rates: A Review," Journal of Computer-Mediated Communication, 6, 2 (January).

Sonnevend, G. (1985a), "An 'Analytic Center' for Polyhedrons and New Classes of Global Algorithms for Linear (Smooth, Convex) Programming," Proceedings of the 12th IFIP Conference on System Modeling and Optimization, Budapest.

---- (1985b), "A New Method for Solving a Set of Linear (Convex) Inequalities and its Applications for Identification and Optimization," Preprint, Department of Numerical Analysis, Institute of Mathematics, Eötvös University, Budapest.

Srinivasan, V. and Allan D. Shocker (1973), "Linear Programming Techniques for Multidimensional Analysis of Preferences," Psychometrika, 38, 3 (September).

Ter Hofstede, Frenkel, Youngchan Kim, and Michel Wedel (2002), "Bayesian Prediction in Hybrid Conjoint Analysis," Journal of Marketing Research, 39 (May).

Toubia, Olivier, Duncan I. Simester, John R. Hauser, and Ely Dahan (2003), "Fast Polyhedral Adaptive Conjoint Estimation," Marketing Science, forthcoming.

Vaidya, P. (1989), "A Locally Well-Behaved Potential Function and a Simple Newton-Type Method for Finding the Center of a Polytope," in N.
Megiddo, ed., Progress in Mathematical Programming: Interior Points and Related Methods, (New York: Springer-Verlag).

Appendix: Mathematics of Polyhedral Methods for CBC Analysis

This appendix is designed to be self-contained. Related math programming involved in finding an interior point, the analytic center, and the Sonnevend ellipsoid is presented in detail by Toubia et al. (2003) for their metric-pairs algorithm. We include the modified math programming formulations here for completeness. We caution readers that there are important differences between the stated-choice formulations as detailed in this paper and the metric-pair formulations presented by Toubia et al. The stated-choice algorithm and the knapsack problem obviously do not arise in the metric-pairs setting.

Definitions and Assumptions

It is helpful to begin with several definitions:

u_f : the f-th parameter of the respondent's partworth function, where u_f >= 0 is the high level of the f-th feature (we assume binary features without loss of generality) and Σ_f u_f = 100

p : the number of (binary) features

u : the p-dimensional vector of parameters u_f

r' : the number of externally imposed constraints, of which r'' are inequality constraints

z_ij : the vector describing the j-th profile in the i-th choice set, where j = 1 indexes the respondent's choice from each set

X : the q(J-1) x p matrix with rows x_ij = z_i1 - z_ij for i = 1 to q and j = 2 to J (to simplify notation, we drop the i subscript from J_i)

Inequality constraints are incorporated by adding slack variables. For example, if there are multiple levels and u_m <= u_h, then u_h = u_m + v_hm with v_hm >= 0. If there were no errors, the respondent's choices would imply X u >= 0, where 0 is a vector of 0's. We add slack variables and augment u such that X u = 0. We incorporate the additional constraints by augmenting these equations so that u and X include r'' additional slack variables and r' additional equations. This forms a polyhedron, P_CBC = {u in R_+^{p+q(J-1)+r'} : X u = a, u >= 0}, where a contains non-zero elements due to the external constraints.
We begin by assuming that P_CBC is non-empty, that X is full rank, and that no f exists such that u_f = 0 for all u in P_CBC.

Interior Point Math Program

To find a feasible interior point, solve the following linear program (see Freund, Roundy and Todd 1985):

max Σ_{f=1}^{p+q(J-1)+r'} y_f, subject to: X u = θ a, θ >= 1, u - y >= 0, y <= e, y >= 0

where e is a vector of 1's. Let (u*, y*, θ*) denote a solution. If y* > 0, then θ*^{-1} u* is an interior point of P_CBC. If y*_f = 0, then u_f = 0 for all u in P_CBC. If the linear program is infeasible, then P_CBC is empty.

Analytic Center Math Program

Solve the following math program:

max Σ_{f=1}^{p+q(J-1)+r'} ln(u_f), subject to: X u = a, u > 0

We do so using an algorithm developed by Freund (1993) that begins with the feasible point u^0 that was found earlier. At each iteration we set u^{t+1} = u^t + α^t d^t, where d^t is found using the following quadratic approximation of the objective function:

Σ_f ln(u_f^t + d_f) ~ Σ_f ln(u_f^t) + Σ_f d_f / u_f^t - (1/2) Σ_f d_f^2 / (u_f^t)^2

If U^t is a diagonal matrix of the u_f^t's, then d^t solves:

max e^T (U^t)^{-1} d - (1/2) d^T (U^t)^{-2} d, subject to: X d = 0

Using the Karush-Kuhn-Tucker (KKT) conditions, d^t = u^t - (U^t)^2 X^T [X (U^t)^2 X^T]^{-1} a. If ||(U^t)^{-1} d^t|| < 0.25, then u^t is already close to optimal and we set α^t = 1. Otherwise we find the optimal α^t with a line search. The program continues to convergence at ū. If P_CBC is empty we employ the error-modeling procedure in Toubia et al. (2003). Note however that P_CBC will not be empty if CBC questions are chosen with the polyhedral algorithm. If X is not full rank, X (U^t)^2 X^T might not invert. There are two practical solutions: (1) select questions such that X is full rank or (2) make X full rank by removing redundant rows (see Toubia et al. 2003). If when searching for feasibility we identify some f's for which u_f = 0 for all u in P_CBC, we can find the analytic center of the remaining polyhedron by removing those f's and setting u_f = 0 for those indices.

The Longest Axes of the Sonnevend Ellipsoid

If ū is the analytic center and Ū is the corresponding diagonal matrix, then Sonnevend (1985a, 1985b) demonstrates that E is contained in P_CBC, which is contained in E_{p+q(J-1)+r'}, where E = {u : X u = a, (u - ū)^T Ū^{-2} (u - ū) <= 1} and E_{p+q(J-1)+r'} is constructed proportional to E by replacing 1 with (p+q(J-1)+r'). Because we are interested only in the direction of the longest axes of the ellipsoids, we can work with the simpler of the proportional ellipsoids, E.
Let g = u - ū; then the longest axis is a solution to:

max g^T g, subject to: g^T Ū^{-2} g <= 1, X g = 0

Using the KKT conditions, the solution to this problem is the eigenvector of the matrix (I - X^T (X X^T)^{-1} X) Ū^{-2} (I - X^T (X X^T)^{-1} X) that is associated with its smallest positive eigenvalue. The direction of the next longest axis is given by the eigenvector associated with the second smallest positive eigenvalue, etc.

Selecting Profiles for Target Partworth Values

We select the values of the z_ij's for the next question (i = q+1) based on the longest axes. Each axis provides two target values. For odd J we randomly select from target values derived from the [(J+1)/2]-th eigenvector. To find the extreme estimates of the parameters, u_ij, we solve for the points where u_i1 = ū + α_1 g_1, u_i2 = ū - α_2 g_1, u_i3 = ū + α_3 g_2, and u_i4 = ū - α_4 g_2 intersect P_CBC (the generalization to J > 4 is straightforward). For each α we do this by increasing α until the first constraint in P_CBC is violated.

To find the profiles in the choice set we select, as researcher-determined parameters, feature costs, c, and a budget, M. Without such constraints, the best profile is trivially the profile with all features set to their high levels. Subject to this budget constraint, we solve the following knapsack problem with dynamic programming:

(OPT) max z_ij · u_ij, subject to: z_ij · c <= M, elements of z_ij in {0, 1}

For multi-level features we impose constraints on OPT so that only one level of each feature is chosen. In the algorithms we have implemented to date, we set c = ū and draw M from a uniform distribution on [0, 50], redrawing M (up to thirty times) until all four profiles are distinct. If distinct profiles cannot be identified, then it is likely that P_CBC has shrunk sufficiently for the managerial problem. For null profiles, extend the constraints accordingly, as described in the text.
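The budget-constrained problem (OPT) is a standard 0/1 knapsack and can be solved exactly by dynamic programming over integer budgets. The sketch below is illustrative (the helper name `knapsack_profile` is hypothetical, and it assumes integer feature costs, e.g. a rounded ū, which the thesis does not specify):

```python
def knapsack_profile(values, costs, budget):
    """Solve (OPT): maximize z.u subject to z.c <= M, z binary.

    values: target partworths u_ij for each binary feature;
    costs:  integer feature costs c (assumed integer-scaled here);
    budget: integer budget M.  Returns the 0/1 profile z.
    """
    n = len(values)
    # best[m] = (value, profile) achievable with total cost <= m
    best = [(0.0, [0] * n) for _ in range(budget + 1)]
    for f in range(n):
        # iterate budgets in descending order so each feature is used once
        for m in range(budget, costs[f] - 1, -1):
            candidate = best[m - costs[f]][0] + values[f]
            if candidate > best[m][0]:
                profile = best[m - costs[f]][1][:]
                profile[f] = 1
                best[m] = (candidate, profile)
    return best[budget][1]
```

For multi-level features the same recursion applies with the added bookkeeping that at most one level per feature is selected, as noted above.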

Chapter 4: Non-Deterministic Polyhedral Methods for Adaptive Choice-Based Conjoint Analysis

Abstract

Polyhedral methods have been introduced recently both for metric paired-comparison conjoint analysis (Toubia, Simester, Hauser and Dahan 2003) and for choice-based conjoint analysis (Toubia, Hauser, Simester 2004). Although these methods appear promising in simulation tests as well as in field experiments, they systematically underestimate response error. In particular, polyhedral questionnaires are typically designed as if responses contained no error. In this paper I generalize the polyhedral method for choice-based conjoint analysis introduced by Toubia et al. (2004), and allow response error to be taken into account in the design as well as the estimation of the conjoint questionnaire. More precisely, I consider a hypothetical process in which each conjoint question would be randomly discarded with a positive probability. With this process, the posterior distribution on the partworths is a mixture of uniform distributions supported by polyhedra. I generalize the design criteria of choice balance and post-choice symmetry to mixtures of polyhedra. The deterministic polyhedral method of Toubia et al. (2004) becomes a special case in which the mixture contains only one distribution. I use simulations to test the validity of this non-deterministic approach. The simulations suggest that when response error is high, the non-deterministic approach improves both polyhedral question selection and polyhedral estimation.

1. Introduction

Polyhedral methods have been introduced recently both for metric paired-comparison conjoint analysis (Toubia, Simester, Hauser and Dahan 2003) and for choice-based conjoint analysis (Toubia, Hauser, Simester 2004). These methods appear promising in simulation tests as well as in field experiments. They are based on a new approach to conjoint analysis, relying on the identification of feasible sets of parameters consistent with the respondent's answers. However, one major limitation of the polyhedral methods introduced so far is their treatment of response error. In particular, response error is systematically underestimated, and polyhedral questionnaires are typically designed as if responses contained no error. Consequently, simulation experiments suggest that polyhedral methods are most useful when response error is low, while other methods appear better when response error is high.

In this paper I offer another interpretation of the polyhedral method for choice-based conjoint analysis introduced by Toubia et al. (2004). This new interpretation allows generalizing the method to take response error into account. The deterministic method of Toubia et al. (2004) becomes a special case of the broader set of methods introduced in this paper.

The paper is structured as follows. In Section 2, I briefly review the deterministic polyhedral method of Toubia et al. (2004). In Section 3, I extend this method to take response error into account. I test the performance of the new method in Section 4, using simulations. Section 5 concludes and suggests avenues for future research.

2. Basic Polyhedral Methods for Choice-Based Conjoint Analysis

In this section I briefly review the method proposed by Toubia, Hauser and Simester (2004).

Choice-based questions as sets of constraints

Toubia et al. (2004) propose viewing the answers to a choice-based conjoint questionnaire as a set of inequality constraints.
In particular, if the vectors x_j and x_k represent the j-th and k-th alternatives in a choice question, and if the vector U represents the respondent's partworths, then the respondent's utilities for alternatives j and k are respectively x_j·U + ε_j and x_k·U + ε_k, where ε_j and ε_k represent response error. The respondent will choose alternative j over alternative k only if (x_j - x_k)·U + (ε_j - ε_k) >= 0. Each choice between J alternatives can be represented by a set of (J-1)

such inequality constraints on U and ε. The overall information available on U and ε is represented by the following set of constraints:

X·U + ε >= 0, U >= 0, e·U = 100

where X represents all the binary comparisons implied by the conjoint questionnaire, U >= 0 results from setting to 0 the partworth of the lowest level of each attribute, and e is a vector of 1's (e·U = 100 is a scaling constraint that is imposed without loss of generality).

Polyhedra as feasible regions

Consider the following optimization problem:

Minimize ||ε|| subject to: X·U + ε >= 0, U >= 0, e·U = 100

This optimization problem reflects a typical minimum-distance estimation procedure. However, this problem usually does not have a unique solution. (Non-uniqueness of the solution is more likely if the number of choice questions is moderately low compared to the number of parameters.) In this case, the optimal ε is equal to 0 and the set of solutions in U is defined by Ω = {U : X·U >= 0, U >= 0, e·U = 100}. The method of Toubia et al. (2004) relies on the observation that Ω is a well-known, well-studied mathematical object called a polyhedron. As noted earlier, this polyhedron represents all U that are feasible when there is no response error (ε = 0).

Question design

Let Ω_k be the polyhedron corresponding to the feasible region after question k. The next feasible polyhedron Ω_{k+1} will be defined by the intersection of Ω_k and the set of points satisfying the constraints corresponding to the (k+1)-st question. Let us assume that the (k+1)-st question consists of a choice between J alternatives. Potential answers to the question divide Ω_k into J collectively exhaustive and mutually exclusive subpolyhedra Ω^1_{k+1}, Ω^2_{k+1}, ..., Ω^J_{k+1}, such that Ω_{k+1} is equal to Ω^j_{k+1} if and only if the respondent chooses alternative j in question (k+1). Toubia et al. (2004) adopt two criteria when designing choice questions (they mention two additional criteria, which are implied by choice balance):

Choice balance: Ω^1_{k+1}, Ω^2_{k+1}, ..., Ω^J_{k+1} should be of similar sizes.
Post-choice symmetry: Ω^1_{k+1}, Ω^2_{k+1}, ..., Ω^J_{k+1} should be as spherical as possible.

The first criterion minimizes the expected size of the next feasible polyhedron Ω_{k+1}, while the second criterion makes uncertainty on U similar in all dimensions. These two criteria are operationalized by the two following principles:

Principle 1: A respondent with a utility vector equal to the center of Ω_k should be indifferent between all J alternatives.

Principle 2: Ω^1_{k+1}, Ω^2_{k+1}, ..., Ω^J_{k+1} should divide Ω_k along its longest axes.

Toubia et al. (2004) implement these principles by adopting the following procedure, repeated for all questions k:

a. Compute the analytic center of Ω_k, C_k. The analytic center is the point that maximizes the geometric mean of the distances to the boundaries of the polyhedron.

b. Approximate Ω_k by an ellipse E_k centered at C_k using Sonnevend's (1985a, b) theorems.

c. Find the J/2 longest axes of E_k (if J is odd find the (J+1)/2 longest axes).

d. Define U_1, ..., U_J as the intersections between the longest axes of the ellipse E_k and the polyhedron Ω_k.

e. Solve the J following knapsack problems (j = 1, ..., J): Maximize x·U_j subject to x·C_k <= M, where M is a randomly drawn constant.

f. The solutions to these knapsack problems, x*_1, ..., x*_J, are the alternatives presented in the next choice question.

In the above procedure, Principle 1 is approximately satisfied by using x·C_k <= M as the constraint in the knapsack problems in step e (at optimality the constraints will be approximately binding, resulting in x*_1·U ~ x*_2·U ~ ... ~ x*_J·U ~ M). Principle 2 is approximately satisfied by choosing U_1, ..., U_J as the intersections between the polyhedron and the longest axes of its approximating ellipse.

Estimation

Choice-based conjoint questionnaires designed by a polyhedral method can be estimated using any estimation procedure. The question-selection procedure provides an alternative estimate after question k: the analytic center of the feasible polyhedron, C_k. (Analytic Center estimation can be applied to any set of choice questions, and is not restricted to polyhedral questionnaires.)

Performance

Toubia et al. (2004) evaluate the performance of both their question-selection algorithm and of Analytic Center (AC) estimation, using the same simulation design as Arora and Huber (2001). This 2x2 design varies response accuracy and respondent heterogeneity. Polyhedral question selection is compared to randomly generated questions, orthogonal designs, and aggregate customization designs (Huber and Zwerina 1996, Arora and Huber 2001). Analytic center estimation is compared to hierarchical Bayes. The results of the simulations can be summarized as follows (see Tables 1 to 3 in Toubia et al. 2004 for details):

Polyhedral question selection achieves an improvement over traditional methods when response error is low. However, it does not perform better than traditional methods when response error is high.

Analytic center estimation does not perform as well as hierarchical Bayes when the population is homogeneous, achieves similar performance when heterogeneity is high and response error is high, and achieves superior performance when heterogeneity is high and response error is low.

The fact that the polyhedral methods (question selection as well as estimation) proposed by Toubia et al. (2004) perform relatively better when response error is low might be due to their deterministic nature. In particular, recall that the definition of the polyhedron relies on the assumption that there is no response error. In the next section I attempt to improve those methods by developing a more general framework that allows taking response error into account.
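The simulation comparisons above rely on synthetic respondents who answer choice questions with logit error. A minimal sketch of such a synthetic respondent follows; the `accuracy` scaling of true utilities is my shorthand for the response-accuracy manipulation, not the exact Arora-Huber parameterization:

```python
import numpy as np

def simulate_choice(profiles, partworths, accuracy=1.0, rng=None):
    """One simulated stated choice: deterministic utility accuracy * x_j.U
    plus i.i.d. Gumbel noise, i.e. a logit respondent.  Higher accuracy
    means lower response error.  Returns the index of the chosen profile."""
    rng = rng if rng is not None else np.random.default_rng()
    x = np.asarray(profiles, dtype=float)
    utilities = accuracy * (x @ np.asarray(partworths, dtype=float))
    utilities = utilities + rng.gumbel(size=len(x))
    return int(np.argmax(utilities))
```

With a large accuracy the respondent almost always picks the truly best profile; as accuracy approaches zero, choices approach uniform randomness.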

3. Taking response error into account

Polyhedra and probability density functions

Let Ω be a polyhedron. Let us define P_Ω as the uniform probability density function supported by Ω:

P_Ω(x) = 0 if x is not in Ω
P_Ω(x) = P_Ω(y) for all x, y in Ω

If we assume a flat (uniform) prior on U, if Ω is our feasible polyhedron, and if we assume that there is no response error, it is easy to show that P_Ω is the posterior distribution of U. In particular, the posterior distribution is 0 for all points outside of the polyhedron, and it is proportional to the prior distribution for all points inside the polyhedron. Hence, the center of the feasible polyhedron can be interpreted as the posterior expected value of U. More importantly, minimizing the size of the polyhedron can be viewed as minimizing the posterior variance of U, and making the feasible polyhedron as spherical as possible can be viewed as making the posterior variance of U similar in all dimensions.

Introduction of response error: one question

Let Ω_0 be the initial polyhedron defined by the identifying constraints {U >= 0, e·U = 100}. For ease of exposition, let us first consider the special case in which we present the respondent with only one choice question between J alternatives. Without loss of generality, let us assume that the respondent chooses alternative 1. Let Ω_1 be the intersection of Ω_0 with the set of points satisfying the constraints corresponding to the selection of alternative 1. Ω_1 would be the new polyhedron under the deterministic approach of Toubia et al. (2004). Suppose now that instead of systematically adding the constraints corresponding to the respondent's choice, we were to take them into account only with probability α, and to simply ignore the question with probability 1 - α. In this case, our next feasible polyhedron would be Ω_1 with probability α, and it would remain Ω_0 with probability 1 - α. Our posterior distribution on U would then be a mixture of two distributions, α P_{Ω_1} + (1 - α) P_{Ω_0}.
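The role of P_Ω is easy to see in a discrete sketch: with a flat prior over a finite set of candidate partworth vectors, the error-free posterior is uniform over the candidates inside the polyhedron and zero outside. The helpers below are illustrative (the constraint set {X·U >= 0, U >= 0, e·U = 100} is taken from the text; the function names are hypothetical):

```python
import numpy as np

def in_polyhedron(u, X):
    """Membership in Omega = {U : X.U >= 0, U >= 0, e.U = 100}."""
    u = np.asarray(u, dtype=float)
    return bool(np.all(u >= 0) and np.isclose(u.sum(), 100.0)
                and np.all(X @ u >= 0))

def posterior_weights(candidates, X):
    """Flat prior over candidate vectors -> discrete analogue of P_Omega:
    equal posterior mass on candidates inside the polyhedron, zero outside."""
    inside = np.array([in_polyhedron(u, X) for u in candidates], dtype=float)
    return inside / inside.sum()
```

The mixture α P_{Ω_1} + (1 - α) P_{Ω_0} is then just a weighted average of two such uniform distributions.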
I later explore how response error can be captured by appropriately choosing the parameter α. Ignoring a question is different from assuming that the respondent made an error on this

question. Let us denote by α' the probability that the respondent's choice actually matches his or her truly preferred alternative. α will be chosen as a function of α' such that ignoring the question with probability (1 - α) is approximately equivalent (i.e., leads to the same posterior distribution) to assuming that the respondent made an error with probability (1 - α').

Generalization to multiple questions

Let us now generalize the procedure to longer questionnaires. Let us still consider a process that ignores each question with probability (1 - α). After k questions, the distribution of U would be:

f(U) = (Σ_s w_s P_{Ω_s}) / (Σ_s w_s)     (1)

where the summation is over all subsets s of {1, ..., k}. In particular, a set of constraints can be associated with each subset s of questions, resulting in a polyhedron Ω_s and a corresponding distribution P_{Ω_s}. The weight w_s = α^{|s|} (1 - α)^{k - |s|} is the probability that s would be the subset of questions taken into account in this hypothetical process (|s| is the number of elements in s). We set w_s = 0 if the polyhedron Ω_s is empty.[33] The deterministic approach is a special case of Equation 1, in which α = 1.

More complex priors

We have assumed so far that the prior was uniform and characterized by the distribution P_{Ω_0} corresponding to the polyhedron Ω_0 = {U : U >= 0, e·U = 100}. However, any mixture of polyhedra could be used to characterize the initial prior. For example, a normal prior on U could be approximated by a mixture of uniform distributions with increasingly large supports and small weights.

Estimation

An estimate of U is provided by:

[33] This is why we have to normalize the weights in Equation 1 and divide them by Σ_s w_s. If w_s were not set to 0 when the polyhedron is empty, then Equation 1 would simply be f(U) = Σ_s w_s P_{Ω_s}.

Û = (Σ_s w_s C_s) / (Σ_s w_s), where C_s is the analytic center of Ω_s.

Question design

The two basic principles for question selection used by Toubia et al. (2004) can be applied to the non-deterministic framework proposed here. In particular, a respondent with a utility vector equal to the posterior expected value of U after question k should be approximately indifferent between the alternatives in the (k+1)-st choice set. In addition, the mixtures of polyhedra associated with each alternative in this choice set should divide the initial mixture along its longest axes. This raises the issue of defining and computing the longest axes of a mixture of polyhedra.

Longest axis of a mixture of polyhedra

Let v_s be the longest axis of the polyhedron Ω_s. Intuitively, the longest axis of the mixture Σ_s w_s Ω_s should be aligned with the longest axes of the most likely polyhedra in the mixture. More precisely, the longest axis should capture, or summarize, the directions of the most likely polyhedra in the mixture. I define the longest axis of the mixture of polyhedra as the vector v* that maximizes Σ_s w_s (v_s'·v)^2. (This expression is a norm on the set of inner products {v_s·v}_s.) Let us define V as the 2^k x p matrix (p being the dimension of U) obtained by stacking the transposed longest axes of each polyhedron in the mixture (each row of V is a transposed longest axis). Let us define Π as the 2^k x 2^k diagonal matrix with elements corresponding to the probabilities associated with each subset (i.e., Π_ss = w_s). We have: Σ_s w_s (v_s'·v)^2 = v'·V'·Π·V·v. The vector v* maximizing Σ_s w_s (v_s'·v)^2 is then simply the eigenvector associated with the largest eigenvalue of V'ΠV. (This matrix being symmetric and positive semi-definite, its eigenvalues are all real and non-negative.) I simply define the second longest axis as the eigenvector associated with the second largest eigenvalue, and so on and so forth. By construction, the longest axes are

orthogonal to one another, as they are in the case of a single ellipse. (The procedure just defined is analogous to factor analysis.)

Detailed algorithm

Question selection is summarized as follows:

a. Compute Û = (Σ_s w_s C_s) / (Σ_s w_s).

b. Approximate each polyhedron in the mixture by an ellipse and compute the corresponding longest axis v_s.

c. Find the J/2 longest axes of the mixture (if J is odd find the (J+1)/2 longest axes).

d. Define U_1, ..., U_J as the intersections between the longest axes of the mixture and the polyhedron Ω_0 = {U : U >= 0, e·U = 100}.

e. Solve the following J knapsack problems (j = 1, ..., J): Maximize x·U_j subject to x·Û <= M, where M is a randomly drawn constant.[34]

f. The solutions to these problems, x*_1, ..., x*_J, are the alternatives presented in the next choice question.

Practical implementation

When k questions have been asked of the respondent, the distribution of U is a mixture of 2^k polyhedra. As k gets large, the computation of the analytic centers and longest axes of each polyhedron in the mixture takes too long to solve between questions in a web-based setting. However, aspects of the structure of the problem can be exploited to reduce the amount of computation required between questions. First, it is unnecessary to compute the analytic centers and longest axes of all 2^k polyhedra in the mixture after question k: if the analytic centers and longest axes of the polyhedra involved after question (k-1) have been saved, the computation of only 2^{k-1} additional polyhedra

[34] In the simulations in this paper, M was drawn up to 30 times, until all solutions to the knapsack problems were different. If all solutions to the knapsack problems were still similar after 30 draws, the questionnaire stopped. If there were only K < J different solutions after 30 draws, these K options were the alternatives presented to the respondent.

is required. Indeed, the set of 2^k polyhedra can be divided into those in which question k is ignored and those in which it is taken into account. The analytic center and longest axis of any polyhedron in which question k is ignored have already been computed before question k was asked, and can hence be reused.

Second, the computation of the analytic center of a polyhedron involves two steps, one of which can almost always be bypassed in our case. The first step consists in finding an interior point of the polyhedron by solving a linear program; the second step consists in applying Newton's method using the interior point as a starting point. After question k, the polyhedron Ω_{1...k} obtained by taking all k questions into account is a subset of all the other polyhedra in the mixture. Hence the analytic center of Ω_{1...k} is an interior point of all these polyhedra. This can be exploited by first computing the analytic center of Ω_{1...k} and using it as a starting point in Newton's method for the other polyhedra. This appears to reduce by a factor of approximately two the time required for the computation of the analytic centers.

Despite these simplifying techniques, the computations required between questions quickly become too time consuming. To address this issue, I approximate the distribution of U by a mixture of a subset of the 2^k polyhedra. In particular, I sort the probability weights w_s in decreasing order and compute the analytic centers and longest axes of the polyhedra starting with the most likely ones, until a preset computing-time limit is reached. In the following simulations, this time limit was 1 second. More precisely, if S_limit is the subset of polyhedra used to approximate the mixture, then Û in step a above is approximated with (Σ_{s in S_limit} w_s C_s) / (Σ_{s in S_limit} w_s), and the longest axis of the mixture in step c is approximated by the vector maximizing Σ_{s in S_limit} w_s (v_s'·v)^2 instead of Σ_s w_s (v_s'·v)^2.
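Equation 1's weights and the longest-axes-of-a-mixture construction can both be sketched directly. Below, `mixture_weights` enumerates the 2^k subsets s with w_s = α^|s| (1-α)^(k-|s|), zeroing empty polyhedra and renormalizing as in the text, and `mixture_longest_axes` returns the top eigenvectors of V'ΠV. Both are illustrative sketches with hypothetical signatures, not the thesis code:

```python
import numpy as np
from itertools import combinations

def mixture_weights(k, alpha, empty=frozenset()):
    """Weights of Equation 1: w_s = alpha^|s| (1-alpha)^(k-|s|) over all
    subsets s of questions {1,...,k}; subsets whose polyhedron is empty
    get weight 0, and the remaining weights are renormalized."""
    weights = {}
    for size in range(k + 1):
        for s in combinations(range(1, k + 1), size):
            w = alpha ** size * (1 - alpha) ** (k - size)
            weights[s] = 0.0 if s in empty else w
    total = sum(weights.values())
    return {s: w / total for s, w in weights.items()}

def mixture_longest_axes(axes, weights, n_axes=2):
    """Longest axes of the mixture: eigenvectors of V' Pi V (rows of V
    are the polyhedra's longest axes v_s', Pi = diag(w_s)) associated
    with the largest eigenvalues; they maximize sum_s w_s (v_s'.v)^2."""
    V = np.vstack([np.asarray(v, dtype=float) for v in axes])
    M = V.T @ np.diag(weights) @ V        # symmetric positive semi-definite
    eigvals, eigvecs = np.linalg.eigh(M)
    order = np.argsort(eigvals)[::-1]     # descending eigenvalues
    return [eigvecs[:, i] for i in order[:n_axes]]
```

Truncating to the most likely polyhedra, as described above, then amounts to keeping only the subsets with the largest w_s before calling either function.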
Computation of α

This paper introduces response error by considering a process in which each question would be ignored with probability (1 − α). This process was adopted because it is congruent with polyhedral methods. A more direct introduction of response error would result in a mixture of convex and non-convex sets for which no convenient approximation is known.

However, our choice of α should make this process equivalent to one in which there exists response error and we adjust our posterior distribution accordingly. For example, we should be more likely to ignore a question when response error is larger (i.e., α should be smaller). The link between α and response error should also reflect the fact that ignoring a question is different from recognizing that the respondent made an error on this question. An error from the respondent does tell us that the selected alternative is not the one with the highest true utility, which in itself is informative. We now derive the appropriate α.

Let α'_k be the probability that the respondent selected his or her truly preferred alternative in question k. For simplicity, let us consider a particular noise structure in which the respondent chooses his or her truly preferred alternative with probability α'_k and chooses each of the other J−1 alternatives with equal probability (1−α'_k)/(J−1). Note that in the following simulations, choices were made according to the logistic probabilities, making this assumption an approximation.³⁵

Let p(U) be the mixture of uniform distributions corresponding to our prior distribution on U before question k. Applying Bayes' rule, the posterior density of any vector U is proportional to α'_k·p(U) if U is consistent with the answer to question k, and it is proportional to ((1−α'_k)/(J−1))·p(U) if U is inconsistent with that answer. Both posterior densities are proportional to p(U), and the ratio is:

(2)    (J−1)·α'_k / (1−α'_k)

Let us now consider the process introduced in this paper in which we ignore question k with probability 1−α_k. The posterior distribution of U is then the mixture α_k·p_1(U) + (1−α_k)·p(U), where p_1(U) is the posterior distribution obtained if the prior is p(U) and the constraints corresponding to question k are taken into account. In particular, p_1(U) = 0 if U is outside of the corresponding feasible region Ω, and p_1(U) = p(U)/P(Ω), where P(Ω) = ∫_Ω p(U)·dU, if U is in Ω.
The posterior density of a point U consistent with the answer to question k is then α_k·(p(U)/P(Ω)) + (1−α_k)·p(U).

³⁵ In real situations both this assumption and the assumption of logistic probabilities are approximations, and which one better describes actual consumer choices is an empirical question.

The posterior density of a point U inconsistent with that answer is (1−α_k)·p(U). Both posterior densities are again proportional to p(U), and the ratio is now equal to:

(3)    [α_k/P(Ω) + (1−α_k)] / (1−α_k)

Let us consider the average case in which P(Ω) = 1/J, which is the target probability dictated by the choice-balance criterion (Principle 1). With P(Ω) = 1/J, the ratios (2) and (3) are equal if:

(4)    [J·α_k + (1−α_k)] / (1−α_k) = (J−1)·α'_k / (1−α'_k)   ⟺   α_k = (J·α'_k − 1) / (J−1)

If α_k satisfies Equation 4 and if choice balance is satisfied, then ignoring question k with probability (1−α_k) yields the same posterior distribution as recognizing that the respondent made an error on question k with probability (1−α'_k).³⁶ In the case where P(Ω) is different from 1/J, i.e., choice balance is not achieved, the two processes are only approximately equivalent.

For simplicity, in the simulations reported in this paper, I chose the same α for all respondents and all questions. In particular, a parameter α' was estimated based on a pretest, and α was computed as a function of α' according to Equation 4. α' was computed using the following procedure:

a. An estimate of the population mean of U, Ū, was obtained from a pretest (100 respondents assigned to an orthogonal design in the simulations).

b. A set of R random questions (R = 100 in the simulations) was generated and the logistic probabilities based on Ū were computed for each question (probabilities p_1^r, ..., p_J^r corresponding to the J alternatives of the r-th question). The probability that the respondent chose his or her truly preferred alternative on question r was α'_r = max{p_1^r, ..., p_J^r}. An initial estimate of α', denoted α'_ini, was obtained as (Σ_{r=1}^R α'_r)/R.

³⁶ Equation 4 assumes that α'_k > 1/J, i.e., that the choice question is informative.
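Equation 4 is a one-line conversion from the truthful-choice probability α'_k to the question-retention probability α_k. A hedged sketch (the function name is mine); the test of equality between ratios (2) and (3) under P(Ω) = 1/J is included as a comment:

```python
def alpha_from_alpha_prime(alpha_prime, J):
    """Equation 4: the probability alpha_k of keeping question k, chosen so that
    ignoring the question with probability (1 - alpha_k) matches the posterior
    implied by a truthful-choice probability alpha'_k among J alternatives."""
    assert alpha_prime > 1.0 / J, "question must be informative (alpha' > 1/J)"
    return (J * alpha_prime - 1.0) / (J - 1.0)

# With J = 4 alternatives and a 70% chance of a truthful choice:
alpha = alpha_from_alpha_prime(0.70, J=4)   # (4*0.7 - 1) / 3 = 0.6
# Check: with P(Omega) = 1/J, ratio (3) = (J*alpha + (1-alpha)) / (1-alpha)
# equals ratio (2) = (J-1)*alpha' / (1-alpha'); here both equal 7.
```

Note the boundary behavior: a perfectly accurate respondent (α' = 1) gives α = 1 (never ignore), and an uninformative question (α' = 1/J) gives α = 0.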

c. N respondents (N = 100 in the simulations) were simulated, all with U = Ū, using α = (J·α'_ini − 1)/(J−1) from Equation 4. For each respondent, the logistic probabilities were saved, and α' was estimated as in step b.

Note that step b was necessary because an initial value of α was required in order to design the questions in step c. In theory, step c should have been repeated until convergence, i.e., until the value of α used to design the questionnaires was the same as the one obtained from the choice probabilities. However, the value of α obtained from the choice probabilities did not appear to be very sensitive to the one used to design the questionnaires, and only one iteration was performed in the simulations. Thus, the simulations will understate the performance of an algorithm that is repeated until convergence. Note also that the pretest information required by this procedure is the same as that required by traditional aggregate customization (Huber and Zwerina 1996; Arora and Huber 2001). This facilitated the comparison between the two methods.

4. Simulations

Design

The design of the simulations reported in this section was similar to that adopted by Arora and Huber (2001) and Toubia et al. (2004). As in Toubia et al. (2004), the number of features was 4, the number of levels per feature was 4, and the number of alternatives per choice set, J, was 4. This allowed the construction of orthogonal designs using the procedure described by Arora and Huber (2001). As in prior studies, the partworths of the four levels came from a normal distribution with means −β, −β/3, β/3, and β respectively, and with standard deviation σ_β (the covariance between partworths was 0). The magnitude of the partworths (which influences the level of response error) was controlled by the parameter β, with high magnitude represented by β = 3 and low magnitude by β = 0.5. The heterogeneity of the population was controlled by the ratio σ_β/β, which was set to 3 in the case of high heterogeneity and 0.5 in the case of low heterogeneity.
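The partworth-generation step of this design can be sketched as follows. This is a rough illustration of the stated distributional assumptions (level means −β, −β/3, β/3, β with standard deviation σ_β = heterogeneity ratio × β), not the original simulation code; the function name is mine.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_partworths(n_resp, beta, het_ratio, n_features=4, n_levels=4):
    """Draw respondents' partworths: for each feature, the four level means are
    -beta, -beta/3, beta/3, beta; individual partworths are normal around those
    means with standard deviation sigma_beta = het_ratio * beta."""
    means = np.array([-beta, -beta / 3.0, beta / 3.0, beta])
    sigma = het_ratio * beta
    return rng.normal(loc=np.tile(means, n_features),
                      scale=sigma,
                      size=(n_resp, n_features * n_levels))

# High magnitude (beta = 3), low heterogeneity (sigma_beta / beta = 0.5):
U = simulate_partworths(n_resp=100, beta=3.0, het_ratio=0.5)
```

Each row of U is one simulated respondent's 16-dimensional partworth vector; the four cells of the 2x2 design correspond to (beta, het_ratio) in {0.5, 3} x {0.5, 3}.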
For each combination of magnitude and heterogeneity, I simulated 10 sets of 100 respondents. In addition to manipulating magnitude (two levels) and heterogeneity (two levels), I also varied question design (5 different question-selection methods) and estimation (3 different estimation methods). More precisely, for each simulated respondent, I considered five sets of questions:

1. A random design.
2. An orthogonal design.
3. A design obtained by aggregate customization, using swapping and relabeling (Arora and Huber 2001).
4. A questionnaire designed by the deterministic polyhedral method of Toubia et al. (2004).
5. A questionnaire designed by the non-deterministic polyhedral method proposed in this paper.

For each respondent, each set of questions was estimated by:

1. Hierarchical Bayes (using the 100 respondents in the corresponding set).
2. The deterministic AC (analytic center) estimation method of Toubia et al. (2004).
3. The non-deterministic AC estimation method proposed in this paper.

Note that both aggregate customization and the non-deterministic polyhedral method described in this paper require a prior estimate of the population mean. For each of the 10 sets of respondents, this was provided by the population estimate obtained when running hierarchical Bayes on the orthogonal designs.

Results

I compare the methods on their ability to recover respondents' true utility vectors. Table 1 reports the root mean squared error (RMSE) achieved by each combination of question design and estimation method. Tables 2 and 3 summarize the best question-design method and the best estimation method, respectively. For comparability between estimation methods, I normalized the partworths to a constant scale. Specifically, for each respondent, I normalized both the true partworths and the estimated partworths so that their absolute values summed to the number of parameters and their values summed to zero for each feature (Toubia et al. 2004). This scaling addresses two issues. First, polyhedral estimation is unique to a positive linear transformation and thus focuses on the relative values of the partworths. Second, unscaled logit analyses confound the magnitude of the stochasticity of observed choice behavior with the magnitude of the partworths.
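The normalization and RMSE computation described above can be sketched as follows (a rough illustration with helper names of my choosing). The scaling makes any positive linear transformation of the true partworths score as a perfect recovery, which is the property the text requires for comparing polyhedral and logit-based estimates:

```python
import numpy as np

def normalize_partworths(u, n_features=4, n_levels=4):
    """Rescale a partworth vector so that levels of each feature sum to zero
    and the absolute values sum to the number of parameters."""
    u = np.asarray(u, float).reshape(n_features, n_levels)
    u = u - u.mean(axis=1, keepdims=True)      # zero-sum within each feature
    u = u * (u.size / np.abs(u).sum())         # |values| sum to n_params
    return u.ravel()

def rmse(true_u, est_u, **kw):
    """Root mean squared error between normalized true and estimated partworths."""
    t = normalize_partworths(true_u, **kw)
    e = normalize_partworths(est_u, **kw)
    return float(np.sqrt(np.mean((t - e) ** 2)))

u = [1.0, 2.0, 3.0, 4.0] * 4
same = rmse(u, u)                          # 0.0
shifted = rmse(u, [2 * x + 5 for x in u])  # also 0.0: a positive linear
                                           # transformation is scored as perfect
```

The per-feature centering removes additive shifts and the absolute-value scaling removes multiplicative ones, so only the relative pattern of the partworths is compared.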

Question-design methods

Not surprisingly, taking response error into account is more useful to polyhedral methods when response error is higher. In particular, when response error is high (low magnitude) and heterogeneity is high, non-deterministic polyhedral questions coupled with non-deterministic AC estimation outperform deterministic polyhedral questions. However, when response error is large and heterogeneity is low, although non-deterministic AC estimation outperforms deterministic AC estimation, deterministic and non-deterministic polyhedral questions coupled with non-deterministic AC estimation achieve similar performance, and do not perform as well as orthogonal designs coupled with non-deterministic AC estimation.

When response error is low (high magnitude), the performance of non-deterministic polyhedral questions is usually slightly lower than that of deterministic polyhedral questions. However, non-deterministic polyhedral questions still outperform the other question-design methods, with any of the three estimation methods. I hypothesize that the difference in performance between deterministic and non-deterministic polyhedral questions in the low response error cases is due to the fact that only a subset of the polyhedra in the mixture was considered (see "Practical Implementation"). Future research might explore the sensitivity of performance to this approximation, which would allow isolating the benefits of the conceptual framework proposed in this paper from the limitations introduced by implementation choices. Certainly with improved computer speed and memory (Moore's Law) the need for these approximations will disappear.

Estimation methods

Toubia et al. (2004) note that deterministic AC estimation performs better (relative to hierarchical Bayes) when heterogeneity is high. Their simulations also reveal that it performs better when response error is low (magnitude is high). Table 1 indicates that, when response error is high, deterministic AC estimation can be significantly improved by taking error into account.
In particular, non-deterministic AC estimation outperforms both deterministic AC estimation and HB estimation in the low magnitude / high heterogeneity and the low magnitude / low heterogeneity conditions. When response error is low (high magnitude), non-deterministic AC estimation does not always perform as well as deterministic AC estimation. As with question-design methods, the difference in performance between deterministic and non-deterministic AC estimation when response error is low is possibly due to the approximations in the implementation of the non-deterministic method.

Table 1
Simulation Results (RMSE)
Columns: HB, Deterministic AC, Non-deterministic AC

Low magnitude, High heterogeneity:
  Random; Orthogonal; Customized: 0.696*; Polyhedral Deterministic; Polyhedral Non-deterministic: *, *

Low magnitude, Low heterogeneity:
  Random; Orthogonal: 0.775*, 0.898*, 0.708*; Customized; Polyhedral Deterministic; Polyhedral Non-deterministic

High magnitude, High heterogeneity:
  Random; Orthogonal; Customized; Polyhedral Deterministic; Polyhedral Non-deterministic: 0.478*, 0.393*, 0.459*, *

High magnitude, Low heterogeneity:
  Random; Orthogonal; Customized; Polyhedral Deterministic; Polyhedral Non-deterministic: 0.38*, 0.47*, *, *

* Best or not significantly different from best at p < 0.05. The comparisons are done across question designs for each estimation method and for each combination of magnitude / heterogeneity.

Table 2
Comparison Summary for Question Design

Magnitude   Heterogeneity   Lowest RMSE
Low         High            Non-deterministic Polyhedral
Low         Low             Orthogonal
High        High            Deterministic Polyhedral
High        Low             Deterministic Polyhedral / Non-deterministic Polyhedral

Table 3
Comparison Summary for Estimation

Magnitude   Heterogeneity   Lowest RMSE
Low         High            Non-deterministic AC
Low         Low             Non-deterministic AC
High        High            Deterministic AC
High        Low             HB

Summary of the simulation results

The simulations suggest that:

- Taking response error into account improves the performance of polyhedral question selection when response error is high and heterogeneity is high. However, the improvement requires coupling non-deterministic polyhedral question design with non-deterministic AC estimation.
- Taking response error into account improves the performance of polyhedral estimation when response error is high. In this case, non-deterministic AC estimation also outperforms hierarchical Bayes estimation.

5. Conclusions and Future Research

In their conclusion, Toubia et al. (2004) note: "many challenges remain. For example, we might allow fuzzy constraints for the polyhedra." In this paper I attempt to address this challenge by generalizing their approach and allowing response error to be taken into account in the design as well as the estimation of polyhedral questionnaires. One difficulty comes from the necessity to deal only with convex polyhedra, which would not be satisfied if response error were introduced directly. Instead, I consider a hypothetical process in which each conjoint question would be randomly discarded with a positive probability. With this process, the posterior distribution on the partworths is a mixture of uniform distributions supported by polyhedra. The deterministic polyhedral method becomes a special case in which the mixture contains only one distribution. I generalize the criteria of choice balance and post-choice symmetry used by Toubia et al. (2004) to mixtures of polyhedra. The probability of discarding a question is chosen such that the hypothetical process under consideration yields a posterior distribution that is approximately equal to the one obtained if response error were directly taken into account.

I use simulations to test the validity of this non-deterministic approach. The simulations suggest that when response error is high, the non-deterministic approach improves both polyhedral question selection and polyhedral estimation.

There are many exciting areas for future research. I suggest two compelling directions. First, in the current implementation, the prior distribution of the partworths (before the beginning of the questionnaire) is uniform. It would be interesting to explore more sophisticated priors, in particular mixtures of uniform distributions that approximate more complex distributions. Second, although the simulations are promising, it is an important future direction to validate the method using a field experiment. This work has begun.

References

Arora, Neeraj and Joel Huber (2001), "Improving Parameter Estimates and Model Prediction by Aggregate Customization in Choice Experiments," Journal of Consumer Research, 28 (September).

Huber, Joel and Klaus Zwerina (1996), "The Importance of Utility Balance in Efficient Choice Designs," Journal of Marketing Research, 33 (August).

Sonnevend, G. (1985a), "An 'Analytic Center' for Polyhedrons and New Classes of Global Algorithms for Linear (Smooth, Convex) Programming," in Control and Information Sciences, Vol. 84, Berlin: Springer Verlag.

Sonnevend, G. (1985b), "A New Method for Solving a Set of Linear (Convex) Inequalities and its Applications for Identification and Optimization," preprint, Department of Numerical Analysis, Institute of Mathematics, Eötvös University, Budapest.

Toubia, Olivier, John R. Hauser, and Duncan I. Simester (2004), "Polyhedral Methods for Adaptive Choice-Based Conjoint Analysis," Journal of Marketing Research, 41 (February), 116-131.

Toubia, Olivier, Duncan Simester, John R. Hauser, and Ely Dahan (2003), "Fast Polyhedral Adaptive Conjoint Estimation," Marketing Science, 22, No. 3 (Summer).

Chapter 5: Properties of Preference Questions: Utility Balance, Choice Balance, Configurators, and M-Efficiency

Abstract

This paper uses stylized models to explore efficient preference questions. The formal analyses provide insights on criteria that are common in widely-used question-design algorithms. We demonstrate that the criterion of metric utility balance leads to inefficient questions, absolute biases, and relative biases among partworths. On the other hand, choice balance for stated-choice questions (the analog of utility balance for metric questions) can lead to efficiency under some conditions, especially for discrete features. When at least one focal feature is continuous, choices are imbalanced at optimal efficiency. This choice imbalance declines as responses become more accurate and as non-focal features increase. The analyses of utility and choice balance provide insights on the relative success of metric- and choice-based adaptive polyhedral methods and suggest future improvements. We then use the formal models to explore a new form of data collection that has become popular in e-commerce and web-based market research: configurators. We study the stylized fact that questions normally used as warm-up questions in conjoint analysis are surprisingly accurate in predicting consumer behavior that is relevant to e-commerce applications. For these applications, new data suggest that warm-up questions are, perhaps, more useful managerially than paired-comparison tradeoff questions. We examine and generalize this insight to demonstrate the advantages of matching tradeoff, configurator, and range questions to the relevant managerial decisions. This further generalizes to M-efficiency (managerial efficiency), an alternative criterion for question-design algorithms. Designs which minimize D-errors and A-errors may not minimize M_D- and M_A-errors, which measure precision relevant to managerial decisions. We provide examples in which simple modifications to question design increase M-efficiency.

1. Motivation

Preference measurement is crucial to managerial decisions in marketing. Methods such as conjoint analysis, voice-of-the-customer methods, and, more recently, configurators are used widely for product development, advertising, and strategic marketing. In response to this managerial need, marketing scientists have developed many sophisticated and successful algorithms to select questions efficiently. For example, Figure 1 illustrates two popular formats that have been studied widely.

Figure 1
Popular Formats for Conjoint Analysis
(a) Metric paired-comparison format (laptop bags)
(b) Stated-choice format (Executive Education)

For the metric-pairs format (Figure 1a), questions can be selected with efficient designs, with Adaptive Conjoint Analysis (ACA), or with polyhedral methods (Kuhfeld, Tobias, and Garratt 1994; Sawtooth 2002; Toubia, Simester, Hauser, and Dahan 2003). For the stated-choice format (Figure 1b), recent advances adapt questions based on pretests, managerial beliefs, or prior questions, and do so either by selecting an aggregate design that is replicated for all subsequent respondents or by adapting the question design for each respondent (Arora and Huber 2001; Huber and Zwerina 1996; Johnson, Huber, and Bacon 2003; Kanninen 2002; Sandor and Wedel 2001, 2002; Toubia, Hauser, and Simester 2003). These question-design methods are used with a variety of estimation methods (regression, hierarchical Bayes estimation, mixed logit models,

analytic center approximations, and hybrid methods) and have proven to provide more efficient and more accurate data with which to estimate partworths.

Figure 2
Example Configurators
(a) Configurator from Dell.com
(b) Innovation Toolkit for Industrial Flavorings
(c) Design Palette for Pickup Trucks
(d) User Design for Laptop Computer Bags

Recently, as preference measurement has moved to web-based questionnaires, e-commerce firms and academic researchers have experimented with a new format in which respondents (or e-commerce customers) select features to configure a preferred product. The best-known e-commerce example is Dell.com (Figure 2a); however, the format is becoming popular on many websites, especially automotive websites (e.g., kbb.com's Auto Choice Advisor, vehix.com, autoweb.com). For marketing research, configurators are known variously as innovation toolkits, design palettes, and user-design formats. They have been instrumental in the design of flavorings for International Flavor and Fragrance, Inc. (Figure 2b), new truck platforms for

General Motors (Figure 2c), and new messenger bags for Timbuk2 (Figure 2d). (Figures from Dahan and Hauser 2002; Thomke and von Hippel 2002; Urban and Hauser 2003; von Hippel 2001.)

Respondents find configurators easy to use and a natural means to express preferences. However, because configurators focus on the respondent's preferred product, they have been used to obtain partworth estimates only when respondents are asked to repeat the task with varying stimuli (e.g., Liechty, Ramaswamy, and Cohen 2001).

In this paper, we examine question design for all three forms of preference measurement: metric pairs, stated choices, and configurators. We use formal modeling methods to examine key properties that have been used by the algorithms proposed to date. With these models we seek to understand basic properties in order to help explain why some algorithms perform well and others do not. These models also provide insight on the role of configurators in preference measurement and indicate the strengths and weaknesses of this new form of data collection relative to more-standard methods. The models themselves are drawn from empirical experience and represent, in a stylized manner, the basic properties of question design. The formal models also provide insight on the relationship of question design to managerial use, suggesting, for example, that some measurement formats are best for developing a single product, some for product-line decisions, and some for e-commerce decisions. These insights lead to revised managerial criteria (M_A-error and M_D-error).

The paper is structured as follows. We first examine the concept of utility balance as applied to the metric paired-comparison format and demonstrate that this criterion leads to both bias and inefficiency for that format. This analysis also lends insight on recent comparisons between ACA and polyhedral methods (Toubia, et al. 2003). We then examine a related concept for the stated-choice format.
The literature often labels this concept utility balance but, for the purpose of this paper, we label it choice balance to indicate its relationship to stated-choice data. Our analyses indicate that, unlike utility balance for metric data, choice balance for stated-choice data can lead to improved efficiency under some conditions. In other cases, optimal efficiency requires unbalanced choices. Our analyses both reflect insights from recent algorithmic papers and provide new insights on how choice balance changes as a function of the characteristics of the problem.

We then examine data that suggest that configurator questions provide more useful information for e-commerce decisions than tradeoff questions (Figure 2). We explore this observation with a stylized model to suggest how question formats are best matched to managerial decisions. These insights generalize to M-efficiency.

We define our scope, in part, by what we do not address. Although we draw experience from and seek to provide insight for new algorithms, algorithmic design is not the focus of this paper. Nor do we wish to enter the debate on the relative merits of metric-rating versus stated-choice data. For now we leave that debate to other authors such as Elrod, Louviere, and Davey (1992) or Moore (2003). We study both data formats (and configurators) and use formal models to gain insights on all three forms of preference questions. We do not address hierarchical models of respondent heterogeneity, but rather focus on each respondent individually. We recognize the practical importance of hierarchical models, but abstract from such estimation to focus on other basic properties. The recent emergence of individual-specific question designs and/or differentiated designs suggests that our insights should extend to such analyses (Johnson, Huber, and Bacon 2003; Sandor and Wedel 2003; Toubia, Hauser, and Simester 2003). But that remains future research.

2. Efficiency in Question Design

Before we begin our analyses, we review briefly the concepts of efficient question design. The concept of question efficiency has evolved to represent the accuracy with which partworths are estimated. Efficiency focuses on the standard errors of the estimates. Let p be the vector of partworths to be estimated and p̂ the vector of estimated partworths. Then, if p̂ is (approximately) normally distributed with covariance matrix Σ, the confidence region for the estimates is an ellipsoid defined by (p̂ − p)'·Σ⁻¹·(p̂ − p) ≤ c, where the constant c is set by the confidence level (Greene 1993, p. 190).³⁷
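For metric data, Σ is typically (X'X)⁻¹ for design matrix X, so the confidence ellipsoid above can be computed directly from the questions asked. A hedged sketch (the function name and toy designs are mine) comparing two designs through the trace and determinant of Σ, the quantities behind the A- and D-error criteria used in this chapter:

```python
import numpy as np

def a_and_d_errors(X):
    """With information matrix X'X and Sigma = (X'X)^{-1}, return the A-error
    (trace of Sigma) and the D-error (det(Sigma)^(1/p), a common normalization)."""
    sigma = np.linalg.inv(X.T @ X)
    p = X.shape[1]
    return np.trace(sigma), np.linalg.det(sigma) ** (1.0 / p)

# Orthogonal rows versus nearly collinear rows for two parameters:
X_orth = np.array([[1.0, 1.0], [1.0, -1.0]])
X_corr = np.array([[1.0, 1.0], [1.0, 0.9]])
a_orth, d_orth = a_and_d_errors(X_orth)   # A-error 1.0, D-error 0.5
a_corr, d_corr = a_and_d_errors(X_corr)   # both errors much larger
```

Shrinking either norm of Σ shrinks the confidence ellipsoid; the orthogonal design dominates the correlated one on both criteria.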
For most estimation methods, Σ depends upon the questions that the respondent is asked and, hence, an efficient set of questions minimizes the confidence ellipsoid. This is implemented as minimizing a norm of the matrix Σ. A-errors are based on the trace of Σ, D-errors on the determinant of Σ, and G-errors on the maximum diagonal element. Because the determinant is often the easiest to optimize, most

³⁷ In practice, Σ is unknown and is replaced with its estimated value. For simplicity we assume that Σ is known, which should change neither the insights nor the results of our analyses.

empirical algorithms work with D-efficiency (Kuhfeld, Tobias, and Garratt 1994, p. 547). G-efficiency is rarely used. Because the determinant of a matrix is the product of its eigenvalues and the trace is the sum of its eigenvalues, A-errors and D-errors are often highly correlated (Kuhfeld, Tobias, and Garratt 1994, p. 547; Arora and Huber 2001, p. 274). In this paper we use both, choosing the criterion which leads to the most transparent insight. In some cases the results are easier to explain if we work directly with Σ⁻¹, recognizing that det(Σ) = [det(Σ⁻¹)]⁻¹.

3. Utility Balance for the Metric Paired-Comparison Format

The metric-pairs format (Figure 1a) is widely used for commercial conjoint analysis (Green, Krieger, and Wind 2001, p. S66; Ter Hofstede, Kim, and Wedel 2002). Although the format has been used academically and commercially for more than two decades, its most common application is in the adaptive component of ACA (Green, Krieger, and Agarwal 1991; Hauser and Shugan 1980; Johnson 1991).³⁹ Metric utility balance is at the heart of ACA. With the goal that a difficult choice provides better information for further refining utility estimates, ACA presents to the respondent pairs of concepts that are as nearly equal as possible in estimated utility (Orme 1999; Sawtooth Software 2002). This criterion has appeal. For example, in the study of response latencies, Haaijer, Kamakura, and Wedel (2000, p. 380) suggest that respondents take more time on choice sets that are balanced in utility and, therefore, make less error-prone choices. Green, Krieger, and Agarwal (1991), quoting a study by Huber and Hansen (1986), ascribe better predictive ability to greater respondent interest in these difficult-to-judge pairs. Finally, Shugan (1980) provides a theory to suggest that the cost of thinking is inversely proportional to the square of the utility difference.
There are clear advantages in terms of respondent interest, attention, and accuracy for metric utility-balanced questions.

³⁸ The trace of a matrix does not necessarily equal the inverse of the trace of the inverse matrix; thus an empirical algorithm that maximizes the trace of Σ⁻¹ does not necessarily minimize the trace of Σ. However, the eigenvalues are related, especially when the eigenvalues of Σ are similar in magnitude, as they are in empirical situations.

³⁹ For example, Dr. Lynd Bacon, Vice President for Market Research and Executive Vice President of NFO WorldGroup, confirms that metric pairs remain one of the most popular forms of conjoint analysis. Sawtooth Software claims that ACA is "the most popular form of conjoint analysis in the world," likely has the largest installed base, and recently accounted for 37% of their sales. Private communications and Marketing News.

However, recent simulation and empirical evidence suggests that alternative criteria provide question designs that lead to more accurate partworth estimates (Toubia, et al. 2003). Furthermore, adaptation implies endogeneity in the design matrix, which, in general, leads to bias. Thus, we explore whether or not the criterion of metric utility balance has adverse characteristics, such as systematic biases, that offset the advantages in respondent attention. We begin with a stylized model.

Stylized Model

For simplicity we consider products with two features, each of which is specified at two levels. We assume further that the features satisfy preferential independence such that we can write the utility of a profile as an additive sum of the partworths for the features of the profile (Keeney and Raiffa 1976; Krantz, et al. 1971). Without loss of generality, we scale the low level of each feature to zero. Let p_1 be the partworth of the high level of feature 1 and let p_2 be the partworth of the high level of feature 2. For these assumptions there are four possible profiles which we write as {0, 0}, {0, 1}, {1, 0}, and {1, 1} to indicate the levels of features 1 and 2, respectively. With this notation we have the true utilities given by:

u(0, 0) = 0    u(1, 0) = p_1    u(0, 1) = p_2    u(1, 1) = p_1 + p_2

For metric questions in which the respondent provides an interval-scaled response, we model response error as an additive, zero-mean random variable, e. For example, if the respondent is asked to compare {1, 0} to {0, 1}, the answer is given by a = p_1 − p_2 + e. We denote the probability distribution of e by f(e). We denote the estimates of the partworths, estimated from responses to the questions, by p̂_1 and p̂_2.

Adaptive (Metric) Utility Balance

Without loss of generality, consider the case where p_1 > p_2 > 0, such that the off-diagonal question ({0,1} vs. {1,0}) is the most utility-balanced question. We assume that this ordinal relationship is based on prior questions.
We label the error associated with the most utility-balanced question as e_ub and label errors associated with subsequent questions as either e_1 or e_2. We now consider an algorithm which adapts questions based on utility balance. Adaptive metric utility balance implies the following sequence.

First question:   a_1 = p_1 − p_2 + e_ub
Second question:  a_2 = p_1 + e_1 if e_ub < p_2 − p_1 (Case 1)
                  a_2 = p_2 + e_2 if e_ub ≥ p_2 − p_1 (Case 2)

Suppose that e_ub ≥ p_2 − p_1 (Case 2). The second question measures p_2 directly (p̂_2 = p_2 + e_2) and the first answer is used to recover p_1 (p̂_1 = a_1 + p̂_2 = p_1 + e_ub + e_2). Then:

E[p̂_2] = p_2 + E[e_2] = p_2
E[p̂_1] = p_1 + E[e_2] + E[e_ub | e_ub ≥ p_2 − p_1] = p_1 + E[e_ub | e_ub ≥ p_2 − p_1] > p_1

Suppose instead that e_ub < p_2 − p_1 (Case 1). Then p̂_1 = p_1 + e_1 and p̂_2 = p̂_1 − a_1 = p_2 + e_1 − e_ub, so:

E[p̂_1] = p_1 + E[e_1] = p_1
E[p̂_2] = p_2 + E[e_1] − E[e_ub | e_ub < p_2 − p_1] = p_2 − E[e_ub | e_ub < p_2 − p_1] > p_2

Thus, under both Case 1 and Case 2, one of the partworths is biased upward for any zero-mean distribution that has non-zero measure below p_2 − p_1. The intuition underlying this proof is that the second question in the adaptive algorithm depends directly on the random error, a direct version of endogeneity bias (Judge, et al. 1985). We demonstrate how to generalize this result with a pseudo-regression format, p̂ = (X'X)⁻¹(X'a) = p + (X'X)⁻¹·X'e, where a and e are the answer and error vectors and X is the question matrix. The second row of X then depends on the magnitude of e_ub relative to p_2 − p_1. Specifically, the first row of X is [1, −1] and its second row can be written as [b, 1−b], where b = 1 if e_ub is small (below p_2 − p_1) and b = 0 otherwise. The bias is then given by the following equation, where k = 1 or 2:

(1)    E[p̂ − p] = E[(X'X)⁻¹·X'e] = E[ ( (1−b)·e_ub + e_k ,  e_k − b·e_ub )' ]

Equation 1 illustrates the phenomenon clearly: when b = 0, p̂_1 inherits the conditioned error e_ub; when b = 1, p̂_2 inherits −e_ub. Hence either p̂_1 or p̂_2 is biased upward. These arguments generalize readily to more than two features and more than two questions. The basic insight is that the errors from earlier questions determine the later questions (X) in a systematic way, the essence of endogeneity bias.
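The endogeneity argument above can be checked by simulation: conditioning the second question on the noisy first answer gives whichever partworth is backed out from the balanced question a positive conditional mean from e_ub. A Monte Carlo sketch (the parameter values and function name are mine, not the thesis simulator):

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_bias(p1=1.0, p2=0.6, sd=0.5, n=100_000):
    """Two-feature adaptive utility balance: question 1 compares {1,0} with
    {0,1}; question 2 then measures whichever partworth the (noisy) first
    answer singled out, so the other estimate inherits the conditioned e_ub."""
    e_ub = rng.normal(0.0, sd, n)          # error on the balanced question
    e2 = rng.normal(0.0, sd, n)            # error on the second question
    a1 = p1 - p2 + e_ub                    # answer to the balanced question
    case2 = e_ub >= p2 - p1                # first answer says feature 1 wins
    # Case 2: measure p2 directly and back out p1 = a1 + p2_hat;
    # Case 1: measure p1 directly and back out p2 = p1_hat - a1.
    p1_hat = np.where(case2, a1 + (p2 + e2), p1 + e2)
    p2_hat = np.where(case2, p2 + e2, (p1 + e2) - a1)
    return p1_hat.mean() - p1, p2_hat.mean() - p2

bias1, bias2 = simulate_bias()   # both positive: upward bias in each case
```

With these toy values both biases are on the order of sd·f(p_2 − p_1), i.e. largest when the true partworths are close, which is the "relative bias" point made next in the text.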

Relative Bias

Bias alone does not necessarily impact managerial decisions because partworths are unique only to a positive linear transformation. Thus, we need to establish that the bias is also relative, i.e., that the bias depends on the magnitude of the true partworth. Because the bias is proportional to the conditional expectation of e_ub, we need to show that this conditional expectation declines as the gap between the true partworths increases, to demonstrate that bias is greater for smaller (true) partworths than it is for larger (true) partworths. We do so by direct differentiation in the Appendix. Thus, we can state the following proposition.

Proposition 1. For the stylized model, on average, metric utility balance biases partworths upward and does so differentially depending upon the true values of the partworths.

When we expand the adaptive questions to a more complex world, in which there are more than two features, the intuition still holds; however, the magnitude of the bias depends upon the specific manner by which metric utility balance is achieved and on the accuracy of the respondent's answers. As an illustration, we built a 6-question simulator for a two-partworth problem in which, after the first question, all subsequent questions were chosen to achieve metric utility balance based on partworths estimated from prior questions. Based on 6,000 regressions each, biases ranged from 1% for relatively large true partworths to 3% for relatively small partworths.⁴⁰ To examine the practical significance we cite data from Toubia, et al. (2003), who simulate ACA question design for a 10-partworth problem. In their data, biases were approximately 6.6% of the magnitudes of the partworths at eight questions when averaged across the conditions in their experimental design. The biases are significant at the 0.05 level.

The Effect of Metric Utility Balance on Efficiency

Although metric utility balance leads to bias, it might be the case that adaptation leads to greater efficiency. For D-errors, minimizing the determinant of Σ is the same as maximizing the determinant of Σ⁻¹.
For metric data, Σ^-1 = X'X (for appropriately centered X). From this expression we see two things immediately. First, as Kanninen (2002, p. 217) and Kuhfeld et al. (1994, p. 547) note, for a given set of levels, efficiency tends to push features to their extreme values. (These statements assume that the maximum levels are selected to be as large as is reasonable for the empirical problem. Levels which are too extreme risk additional respondent error not explicitly acknowledged by the statistical model used in question design. See Delquié 2003.) Second, perfect metric utility balance imposes a linear constraint on the columns of X because Xp = 0. This constraint induces X'X to be singular. When X'X is singular, the determinant will be zero and D-errors increase without bound. Imperfect metric utility balance leads to det(X'X) ≈ 0 and extremely large D-errors. A-errors behave similarly. Thus, greater metric utility balance tends to make questions inefficient.⁴¹

Insights on the Relative Performance of Metric Polyhedral Methods

The analysis of metric utility balance provides insights on the relative performance of metric polyhedral methods. Metric polyhedral methods focus on asking questions to reduce the feasible set of partworths as rapidly as possible by solving an eigenvalue problem related to Sonnevend ellipsoids. These ellipsoids focus questions relative to the axis about which there is the most uncertainty. Polyhedral methods do not attempt to impose metric utility balance. In algorithms to date, polyhedral methods are used for (at most) the first n questions, where n equals the number of partworths. However, metric polyhedral methods also impose the constraint that the rows of X be orthogonal (Toubia et al. 2003, Equation 9). Thus, after n questions, X will be square, non-singular, and orthogonal (XX' = αI implies X'X = αI).⁴² Subject to scaling, this orthogonality relationship minimizes D-error (and A-error).

40. The simulator is available from http://mitsloan.mit.edu/vc. The simulator is designed to demonstrate the phenomenon; we rely on Toubia et al. (2003) for an estimate of the empirical magnitude. The simulator chooses x1 uniformly on [0,1] and x2 based on utility balance. The absolute values of the response errors are approximately half those of the dependent measure. The magnitude of the bias changes if the response errors change, but the direction of the bias remains.
Thus, while the adaptation inherent in metric polyhedral methods leads to endogenous question design, the orthogonality constraint appears to overcome the tendency of endogeneity to increase errors. Toubia et al. (2003) report at most a 2% bias for metric polyhedral question selection, significantly less than observed for ACA question selection.

41. The simulator cited in the acknowledgements section of this paper also contains an example in which metric utility balance can be imposed. The more closely utilities are balanced, the worse the fit.
42. Orthogonal rows assume that XX' = αI where α is a proportionality constant. Because X is non-singular, X and X' are invertible; thus X'X = X^-1(XX')X = X^-1(αI)X = αI.
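The singularity induced by perfect metric utility balance can be checked directly. In the sketch below (our illustration; the partworths and question rows are values we chose), every utility-balanced question row is orthogonal to the partworth vector, so X'X is singular and D-errors are unbounded, while an orthogonal design keeps det(X'X) strictly positive.

```python
import numpy as np

p = np.array([2.0, 1.0])                  # assumed true partworths

# Perfectly utility-balanced questions: each row x_i satisfies x_i . p = 0,
# so X p = 0 and X'X is singular (det = 0, D-errors increase without bound).
X_balanced = np.array([[1.0, -2.0],
                       [2.0, -4.0],
                       [-1.0, 2.0]])

# An orthogonal design ignores utility balance and stays full rank.
X_orthogonal = np.array([[1.0, 1.0],
                         [1.0, -1.0]])

det_balanced = np.linalg.det(X_balanced.T @ X_balanced)
det_orthogonal = np.linalg.det(X_orthogonal.T @ X_orthogonal)
```

Any design whose rows come close to satisfying the utility-balance constraint drives this determinant toward zero, which is the imperfect-balance case discussed above.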

Summary

Metric utility balance may retain respondent interest and encourage respondents to think hard about tradeoffs; however, these benefits come at a price. Metric utility balance leads to bias, relative bias, and lowered efficiency. On the other hand, the inherent orthogonality criterion (rather than utility balance) in metric polyhedral methods enables metric polyhedral question design to achieve the benefits of adaptation with significantly less bias and, apparently, without reduced efficiency.

4. Choice Balance for the Stated-Choice Format

In the stated-choice format (Figure 1b), two profiles that have the same utility will be equally likely to be chosen; thus choice balance is related to utility balance. However, because the underlying statistical models are fundamentally different, the implications of choice balance for stated-choice questions are different from the implications of metric utility balance for metric paired-comparison questions.

Choice balance is an important criterion in many of the algorithms used to customize stated-choice questions. For example, Huber and Zwerina (1996) begin with orthogonal, level-balanced, minimal-overlap designs and demonstrate that swapping and relabeling these designs for greater choice balance increases D_B-efficiency. Arora and Huber (2001, p. 275) test these algorithms in a variety of domains with hierarchical Bayes methods, commenting that the maximum D_B-efficiency is achieved by sacrificing orthogonality for better choice balance.⁴³ Orme and Huber (2001, p. 28) further advocate choice balance. Toubia, Hauser, and Simester (2003) use choice balance to increase accuracy in adaptive choice-based questions. Finally, the concept of choice balance is germane to the principle, used in the computer adaptive testing literature, that a test is most effective when its items are neither too difficult nor too easy, such that the probabilities of correct responses are close to 0.50 (Lord 1970).
To understand better why choice balance improves efficiency while metric utility balance degrades efficiency, we recognize that, for logit estimation of partworths with stated-choice data, Σ depends upon the true partworths and, implicitly, on response error. Specifically, McFadden (1974) shows that:

43. D_B-efficiency recognizes that Σ depends on the true partworths. See Equation 2.

(2) Σ^-1 = R Σ_{i=1..q} Σ_{j=1..J_i} ( x_ij - Σ_{k=1..J_i} x_ik P_ik )' P_ij ( x_ij - Σ_{k=1..J_i} x_ik P_ik )

where R is the effective number of replicates; J_i is the number of profiles in choice set i; q is the number of choice sets; x_ij is a row vector describing the jth profile in the ith choice set; and P_ij is the probability that the respondent chooses profile j from the ith choice set. Equation 2 has a more transparent form when all choice sets are binary, in particular:

(3) Σ^-1 = R Σ_{i=1..q} ( x_i1 - x_i2 )' P_i (1 - P_i) ( x_i1 - x_i2 ) = R Σ_{i=1..q} d_i' P_i (1 - P_i) d_i

where d_i is the row vector of differences in the features for the ith choice set. In Equation 3 it is clear that choice balance (P_i → ½) tends to increase a norm of Σ^-1. However, the choice probabilities (P_i) are a function of the feature differences (d_i); thus pure choice balance may not maximize either the determinant (or the trace) of Σ^-1.

First recognize that the true choice probabilities in the logit model are functions of the utilities, d_i p. If we hold the utilities constant, we also hold the choice probabilities constant. Subject to this linear utility constraint, we can increase efficiency without bound by increasing the absolute magnitudes of the d_i's. This result is similar to that for the metric-pairs format; the experimenter should choose feature differences that are as large as reasonable, subject to the linear constraint imposed by d_i p and subject to any errors (outside the model) introduced by large feature differences (Delquié 2003). Kanninen (2002) exploits these properties effectively by selecting one continuous feature (e.g., price) as a focal feature with which to manipulate d_i p, while treating all remaining features as discrete by setting them to their maximum or minimum levels. She reports substantial increases in efficiency for both binary and multinomial choice sets (p. 223). Kanninen uses numerical means to select the optimal levels of the continuous focal feature. We gain further insight with formal analysis.
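Equation 3 is straightforward to compute for binary choice sets. The sketch below is our illustration (the partworths are assumed values): it builds Σ^-1 from the difference vectors d_i and shows that, holding the magnitude of d_i fixed, a balanced question (P near ½) yields a larger trace than an imbalanced one.

```python
import numpy as np

def info_matrix_binary(D, p, R=1.0):
    """Equation 3: Sigma^-1 = R * sum_i d_i' P_i (1 - P_i) d_i for binary
    logit choice sets.  D is (q, n); row i is the feature-difference d_i."""
    P = 1.0 / (1.0 + np.exp(-(D @ p)))        # logit choice probabilities
    return R * (D * (P * (1.0 - P))[:, None]).T @ D

p = np.array([1.0, 1.0])                      # assumed partworths
D_balanced = np.array([[1.0, -1.0]])          # d.p = 0 -> P = 0.5
D_imbalanced = np.array([[1.0, 1.0]])         # d.p = 2 -> P ~ 0.88, same |d|

tr_bal = np.trace(info_matrix_binary(D_balanced, p))
tr_imb = np.trace(info_matrix_binary(D_imbalanced, p))
```

The next paragraphs show this is not the whole story: because P_i itself moves with d_i, scaling d_i up can more than compensate for the loss of balance.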
In particular, we examine the optimality conditions for the focal feature, holding the others constant. For greater transparency we use binary questions and the trace of Σ^-1 rather than the determinant.⁴⁴ The trace is given by:

(4) trace(Σ^-1) = R Σ_{i=1..q} Σ_{k=1..K} d_ik² P_i (1 - P_i)

If we assume, without loss of generality, that the focal feature is the Kth feature, then the first-order conditions for the focal feature are:

(5) d_iK = (p_K / 2)(P_i1 - P_i2) Σ_{k=1..K} d_ik²  for all i

where d_ik is the level of the kth feature difference in the ith binary question. We first rule out the trivial solution of d_ik = 0 for all k. Not only would this imply a choice between two identical alternatives, but the second-order conditions imply a minimum. A slightly more interesting case occurs when perfect choice balance is already achieved by the K-1 discrete features. We return to this solution later; however, in the focal-feature case it occurs with zero probability (i.e., on a set of zero measure). The more interesting solutions for a continuous focal feature require that d_iK be non-zero and, hence, that P_i1 ≠ P_i2. Thus, when d_iK can vary continuously, Equation 5 implies that optimal efficiency requires choice imbalance.

There are two equivalent solutions to Equation 5 corresponding to a right-to-left switch of the two alternatives. That is, the sign of d_iK must match the sign of P_i1 - P_i2. Thus, without loss of generality, we examine situations where d_iK is positive (see Appendix). We plot the marginal impact of d_iK on trace(Σ^-1) for illustrative levels of the non-focal feature difference.⁴⁵ Following Arora and Huber (2001) and Toubia, Hauser and Simester (2003), we use the magnitude of the partworths as a measure of response accuracy: higher magnitudes indicate higher response accuracy. Figure 3 illustrates that efficiency is maximized for non-zero focal feature differences. Figure 3 also illustrates that the optimal level of the focal feature difference depends upon the level of the non-focal features and upon the magnitude. The degree of choice balance (P_i1 - P_i2) is a non-linear function of the feature differences.
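A one-question numerical check of Equations 4 and 5 (with illustrative partworths and a non-focal level that we assumed): scanning the focal difference shows the trace is maximized at a clearly non-zero d_K, where the choice probability is far from ½, consistent with the first-order condition.

```python
import math

def trace_term(d_focal, d_nonfocal=0.5, p_nonfocal=1.0, p_focal=1.0):
    """One binary question's contribution to Equation 4:
    (sum_k d_k^2) * P * (1 - P), with a logit choice probability P."""
    P = 1.0 / (1.0 + math.exp(-(d_nonfocal * p_nonfocal + d_focal * p_focal)))
    return (d_nonfocal ** 2 + d_focal ** 2) * P * (1.0 - P)

grid = [i / 100.0 for i in range(-500, 501)]
d_star = max(grid, key=trace_term)                 # efficiency-maximizing d_K
P_star = 1.0 / (1.0 + math.exp(-(0.5 + d_star)))   # choice probability there
balanced = trace_term(-0.5)                        # utility-balanced: P = 0.5
```

In this run the optimum lands near d_K ≈ 2.2 with P ≈ 0.94, more than doubling the balanced question's contribution to the trace.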
Nonetheless, we can identify the systematic impact of both non-focal features and response accuracy. In particular, formal analyses reveal that choice balance increases with greater response accuracy and with greater levels of the non-focal feature differences. We formalize these insights in two propositions. The proofs are in the Appendix.

Proposition 2. As response accuracy increases, choice balance increases and the level difference in the focal feature decreases.

Proposition 3. As the levels of the non-focal features increase, choice balance increases.

Figure 3: Optimal Efficiency and Choice Imbalance. Both panels plot efficiency (trace) against the focal feature difference: (a) low vs. high non-focal feature levels; (b) low vs. high magnitude in the logit model.

Quantal Features

In many applications, all features are binary: they are either present or absent. Examples include four-wheel steering on automobiles, cell-phone holders on laptop bags, and auto focus on cameras. In these cases, the focal-feature method cannot be used for maximizing efficiency. However, discrete-search algorithms are feasible to maximize a norm on Σ^-1. Arora and Huber (2001) and Sándor and Wedel (2001) provide two such algorithms. Sándor and Wedel (2002) extend these concepts to a mixed logit model.

An interesting (and common) case occurs when the experimenter limits the number of features that vary, perhaps due to cognitive load (Sawtooth Software 1996, p. 7; Shugan 1980).

44. For a focal feature, the trace and the determinant have related maxima, especially if all but the focal feature are set to their minimum or maximum values.
45. For low magnitudes, the partworths of both features are set equal to 1.0 in these plots. For high magnitudes they are doubled. The low level of the non-focal feature (absolute value) is 0.5; the high level is larger in absolute value.

For a given number of binary features (±1), Σ_k d_ik² is fixed. In this case, it is easy to show that Equation 4 is maximized when choices are as balanced as feasible.⁴⁶ This same phenomenon applies when the researcher requires that all features vary.

Insights on the Relative Performance of Choice-Based Polyhedral Methods

Like metric polyhedral methods, choice-based polyhedral methods rely upon Sonnevend ellipsoids. However, for stated-choice data, respondents' answers provide inequality constraints rather than equality constraints. Rather than selecting question vectors parallel to the longest axis of the ellipsoid, choice-based polyhedral methods use the ellipsoid to identify target partworths at the extremes of the feasible region. The choice-based algorithm then solves a utility maximization (knapsack) problem to identify the features of the profiles. The knapsack problem assures that the constraints go (approximately) through the analytic center of the region. This criterion provides approximate a priori choice balance. (By a priori we mean that, given the prior distribution of feasible points, the predicted choice probabilities are equal.) In this manner, the algorithm achieves choice balance without requiring utility balance throughout the region. Utility balance only holds for the analytic center of the region, a set of zero measure.

The applications of choice-based polyhedral methods to date have focused on discrete features implemented as binary features. For discrete features, choice balance achieves high efficiency. For continuous features, Proposition 2 suggests algorithmic improvements. Specifically, the knapsack problem might be modified to include trace(Σ^-1) in the objective function. However, unlike a knapsack solution, which is obtained rapidly, the revised math program may need to rely on new, creative heuristics.

Summary

Choice balance affects the efficiency of stated-choice questions quite differently than metric utility balance affects metric paired-comparison questions.
This, despite the fact that choice balance is related to ordinal utility balance. If at least one feature is continuous, then that feature can be used as a focal feature and the most efficient questions require choice imbalance

46. If there are a maximum number of features that can be non-zero, rather than a fixed number, it is possible, although less likely, that a maximum might occur for less than the maximum number of features. This case can be checked numerically for any empirical problem. Nonetheless, the qualitative insight holds.

(and non-zero levels of the focal feature). However, we show formally that more accurate responses and higher non-focal features lead to greater choice balance. If features are discrete and the number which can vary is fixed, then choices should be as balanced as feasible. Finally, this analysis helps explain the relative performance of stated-choice polyhedral methods and aggregate customization methods.

5. Configurators

In Sections 3 and 4 we explored two popular formats for preference questions, suggesting that, for the metric paired-comparison format, utility balance led to bias, relative bias and inefficiency while, for the choice-based format, choice balance could be efficient for discrete features but not necessarily for continuous focal features. We now explore a relatively new format, configurators, in which respondents focus on the features one at a time and are not asked to compare profiles in the traditional sense. Neither utility balance nor choice balance apply to configurators. Nonetheless, analysis of this new format leads to insights that generalize to all three question formats and helps us understand which format is most efficient in which situations.

In configurators, respondents are typically given a price for each feature level and asked to indicate their preferences for that price/feature combination (review Figure 1). Configurator questions are equivalent to asking the respondent whether the partworths of the features (the p_k's) are greater than or less than the stated prices. These data appear quite useful. For example, Liechty, Ramaswamy and Cohen (2001) use ten repeated configurator measures per respondent to estimate successfully hierarchical Bayes probit models.

Stylized Fact: Warm-Up Questions Do Surprisingly Well in Predicting Feature Choice

Figure 4 presents an empirical example. In data collected by Toubia et al. (2003), respondents were shown five laptop bags worth approximately $100 and allowed to choose among those bags.
Various conjoint methods based on metric paired-comparison questions collected prior to this task predicted these profile choices quite well.

Figure 4: Warm-Up Questions Are Effective Predictors of Configurator Tasks. Panel (a) plots the percent of features predicted correctly and panel (b) the predicted percentile of the chosen profile, each against the number of conjoint questions, with and without the initial (warm-up) questions.

Before completing the metric paired-comparison task, in order to familiarize respondents with the features, respondents answered warm-up questions. They were asked to indicate the preferred level of each feature. Such warm-up questions are common. We re-analyzed Toubia et al.'s data in two ways. One conjoint estimation method did not use the warm-up questions. The other conjoint estimation method included monotonic constraints based on the warm-up questions.⁴⁷ One month after the initial conjoint-analysis questionnaire, the respondents were recontacted and given a chance to customize their chosen bag with a configurator. Figure 4 plots the ability of the conjoint analyses to predict configurator choice as a function of the number of metric paired-comparison questions that were answered.

For both predictive criteria, the ability to predict configurator choice is surprisingly flat when initial warm-up questions are included. (Predictions are less flat for forecasts of the choice of five laptop bags, Toubia et al. 2003, Table 6.) For configurator predictions, the nine warm-up questions appear to provide more usable information than the sixteen paired-comparison questions. This empirical observation is related to observations by Evgeniou, Boussios, and Zacharia (2003), whose simulations suggest the importance of ordinal constraints. While, at first, Figure 4 appears to be counter-intuitive, it becomes more intuitive when we realize that the

47. In the method labeled "with initial questions," the warm-up questions were used to determine which of two feature levels was preferred (e.g., red or black).
In the method labeled "without initial questions," such ordering data were not available to the estimation algorithm. In this particular plot, for consistency, we use the analytic-center method of Toubia et al.; however, the basic insights are not limited to this estimation method.

warm-up questions are similar to configurator choice and the paired-comparison questions are similar to the choice among five (Pareto) profiles. We formalize this intuition with our stylized model.

Stylized Model to Explain the Efficiency of Warm-Up Questions

For transparency assume metric warm-up questions. This enables us to compare the warm-up questions to paired-comparison questions without confounding a metric-vs.-ordinal characterization. We consider the same three questions used previously, rewriting e_ub as e_t for mnemonic exposition of paired-comparison tradeoff questions. Without loss of generality we incorporate feature prices into the definition of each feature. We need not assume p1 > p2 > 0 as we did before. The three questions are:

(Q1) a1 = p1 + e1
(Q2) a2 = p2 + e2
(Q3) a3 = p1 - p2 + e_t

We now consider stylized estimates obtained from two questions. In our stylized world of two features at two levels, Q1 and Q2 can be either metric warm-up questions or metric paired-comparison questions. Q3 can only be a metric paired-comparison question. Assuming all questions are unbiased, we examine which combination of questions provides the greatest precision (efficiency): Q1+Q2, Q1+Q3, or Q2+Q3.

Consider feature 1. Because feature price is incorporated in the feature, if p_k ≥ 0, then feature choice is correctly predicted if p̂_k is non-negative. (We use similar reasoning for feature 2.) Simple calculations (see Appendix) reveal that the most accurate combination of two questions is Q1+Q2 if and only if Prob[e_k ≥ -p_k] > Prob[e_k + e_t ≥ -p_k]. Since -p_k is a negative number, these conditions hold for most reasonable density functions. For example, in the Appendix we demonstrate that, if the errors are independent and identically distributed normal random variables, then this condition holds.

Proposition 4. For zero-mean normally distributed (i.i.d.) errors, the warm-up questions (Q1+Q2) provide more accurate estimates of feature choice than do pairs of questions that include a metric paired-comparison question (Q3).
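Proposition 4 can be checked by simulation. Below is a sketch of our own; the partworth values (which incorporate feature prices) are arbitrary choices. It compares the warm-up pair Q1+Q2 with the pair Q1+Q3 on the fraction of correctly predicted feature choices.

```python
import numpy as np

def feature_hit_rate(design, p1=0.5, p2=-0.3, n=400_000, seed=0):
    """Fraction of feature choices predicted correctly (predict 'include
    feature k' iff its estimated partworth is non-negative)."""
    rng = np.random.default_rng(seed)
    e1, ex = rng.normal(size=(2, n))
    if design == "warmup":                 # Q1 + Q2
        p1_hat, p2_hat = p1 + e1, p2 + ex
    elif design == "tradeoff":             # Q1 + Q3, with a3 = p1 - p2 + e_t
        p1_hat = p1 + e1
        p2_hat = p1_hat - (p1 - p2 + ex)   # p2_hat = a1 - a3
    else:
        raise ValueError(design)
    hit1 = ((p1_hat >= 0) == (p1 >= 0)).mean()
    hit2 = ((p2_hat >= 0) == (p2 >= 0)).mean()
    return (hit1 + hit2) / 2

w = feature_hit_rate("warmup")
t = feature_hit_rate("tradeoff")
```

The tradeoff answer a3 contaminates the estimate of p2 with two error draws instead of one, which is exactly the intuition the proposition formalizes.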

Intuitively, metric warm-up questions parse the errors more effectively than tradeoff questions when the focus is on feature choice rather than holistic products. The questions focus precision where it is needed for the managerial decisions. We provide a more formal generalization of these concepts in the next section. The intuition does not depend upon the metric properties of the warm-up questions and applies equally well to ordinal questions. We invite the reader to extend Proposition 4 to ordinal questions using the M-efficiency framework that is explored in the next section.

6. Generalizations of Question Focus: M-Efficiency

The intuition drawn from Proposition 4 suggests that it is best to match preference questions carefully to the managerial task. For example, metric warm-up questions are similar to configurator questions and may be the most efficient for predicting feature choice. On the other hand, tradeoff questions are similar to the choice of a single product from a Pareto set. Estimating partworths from tradeoff questions may be best if the managerial goal is to predict consumer choice when a single new product is launched into an existing market. We now examine this more general question. We begin with the stylized model for two questions and then suggest a generalization to an arbitrary number of questions.

Stylized Analysis: Matching Question Selection to Managerial Focus

With metric questions (and preferential independence) we can ask one more independent question, which we call a range question:⁴⁸

(Q4) a4 = p1 + p2 + e_r

We then consider three stylized product-development decisions.

Launch a single product. The product-development manager has one shot at the market and has to choose the single best product to launch, perhaps because of shelf-space or other limitations. The manager is most interested in the difficult tradeoffs, especially if the two features are substitutes. The manager seeks precision on p1 - p2.
Launch a product line (platform decision).

48. Such questions are feasible with a metric format. They would be eliminated as dominated alternatives in an ordinal format such as stated-choice questions.

The product-development manager is working on a platform from which to launch multiple products, perhaps because of synergies in production. The manager is most interested in the range of products to offer (the breadth of the platform). The manager seeks precision on p1 + p2.

Launch a configurator-based e-commerce website. The product-development manager plans to sell products with a web-based configurator, perhaps because of flexible manufacturing capability. However, there is a cost to maintaining a broad inventory and there is a cognitive load on the consumer based on the complexity of the configurator. The manager seeks precision on p1 and on p2.

For these stylized decisions we expect intuitively that we gain the most precision by matching the questions to the managerial decisions. Table 1 illustrates this intuition. For decisions on a single product launch, we gain the greatest precision by including tradeoff questions (Q3); for decisions on product lines we gain the greatest precision by including range questions (Q4); and for decisions on e-commerce we gain the greatest precision by focusing on feature-specific questions (Q1 and Q2). These analytical results, which match question format to managerial need (for metric data), are comparable in spirit to the empirical results of Marshall and Bradlow (2002), who examine congruency between the formats of the profile and validation data (metric vs. ranking vs. choice) in hybrid conjoint analysis. Marshall and Bradlow's analyses suggest that profile data require less augmentation with self-explicated data when they are congruent with the validation data.

Table 1: Error Variances Due to Alternative Question Selection

                           Tradeoff Questions   Range Questions   Feature Focus
Single-product launch      σ²                   3σ²               σ²
Product-line decisions     3σ²                  σ²                σ²
e-commerce website         3σ²/2                3σ²/2             σ²/2
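The logic of Table 1 can be reproduced with ordinary least squares. In the sketch below (ours; for simplicity every question has unit error variance, so the absolute numbers need not match the table's units) each two-question design is scored on the variance of the managerially relevant combination of partworths.

```python
import numpy as np

def target_variance(X, c, sigma2=1.0):
    """OLS variance of the estimate of c'p from metric answers a = X p + e."""
    return float(sigma2 * c @ np.linalg.inv(X.T @ X) @ c)

X_tradeoff = np.array([[1.0, 0.0], [1.0, -1.0]])   # Q1 and Q3
X_range = np.array([[1.0, 0.0], [1.0, 1.0]])       # Q1 and Q4
X_feature = np.array([[1.0, 0.0], [0.0, 1.0]])     # Q1 and Q2

launch = np.array([1.0, -1.0])     # single-product launch: p1 - p2
line = np.array([1.0, 1.0])        # product line: p1 + p2

def avg_partworth_variance(X):
    """e-commerce focus: average variance of the individual partworths."""
    return float(np.trace(np.linalg.inv(X.T @ X))) / 2
```

As expected, the tradeoff design is sharpest on p1 - p2, the range design on p1 + p2, and the feature-focus design on the individual partworths.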

M-Efficiency

The insights from Table 1 are simple, yet powerful, and generalize readily. In particular, the stylized managerial decisions seek precision on Mp, where M is a linear transform of the partworth vector p. For example, if the manager is designing a configurator and wants precision on the valuation of features relative to price, then the manager wants to concentrate precision not on the partworths per se, but on specific combinations of the partworths. We indicate this managerial focus with a matrix, M_c, where the last column corresponds to the price partworth; the manager wants precision on the estimate of M_c p. This focus matrix provides criteria, analogous to D-errors and A-errors, by which we can design questions.

M-Efficiency for Metric Data

We begin with metric data to illustrate the criteria. For metric data, the covariance of the estimate of Mp is proportional to Σ_M = M(X'X)^-1 M' (Judge et al. 1985, p. 57). By defining efficiency based on Σ_M rather than Σ, we focus precision on the partworths that matter most to the manager's decision. We call this focus managerial efficiency (M-efficiency) and define M-errors as follows for suitably scaled X matrices:

M_D-error = q det( M(X'X)^-1 M' )^(1/n)
M_A-error = q trace( M(X'X)^-1 M' ) / n

The managerial focus affects D-errors and A-errors differently. M_D-error is defined by the determinant of the modified covariance matrix and, hence, focuses on the geometric mean of the estimation variances (eigenvalues). All else equal, the geometric mean is minimized if the estimation variances are equal in magnitude. On the other hand, M_A-error is defined by the trace of the covariance matrix and, hence, focuses on the arithmetic mean of the estimation variances. Thus, if the manager wants differential focus on some combinations of partworths and is willing to accept less precision on other combinations, then M_A-errors might be a better metric than M_D-errors.

To illustrate M-efficiency, we provide an example in which we accept higher A- and D-errors in order to reduce M_A- and M_D-errors. Consider an orthogonal array, X. For this design, X'X = 12 I_6, where I_6 is a 6x6 identity matrix. The orthogonal array minimizes both D-errors and A-errors. It does not necessarily minimize either M_D- or M_A-errors, as we illustrate with the five-feature configurator matrix, M_c, whose kth row contrasts the partworth of the kth feature with the price partworth (the last column):

M_c =
[ 1 0 0 0 0 -1 ]
[ 0 1 0 0 0 -1 ]
[ 0 0 1 0 0 -1 ]
[ 0 0 0 1 0 -1 ]
[ 0 0 0 0 1 -1 ]

To increase M-efficiency, we perturb X so that X_M'X_M is closer to M_c'M_c while maintaining full rank. We choose X_M such that

X_M'X_M = α[ (1 - β) I_6 + β M_c'M_c ]

where α and β are scalars, β ∈ [0,1], and α is chosen to maintain constant question magnitudes.⁴⁹ By construction, X_M is no longer an orthogonal array; hence both A-errors and D-errors will increase. However, M_A- and M_D-errors decrease because X_M is more focused on the managerial problem. For example, with β = 4/5, M_A-errors decrease by over 20%; M_D-errors decrease, but not as dramatically. Larger β's turn out to be better at reducing M_D-errors, but less effective at reducing M_A-errors.

Decreases in M_A-errors are also possible for square M matrices. However, we cannot improve M_D-errors for square M matrices because, for square M, the question design that minimizes D-error will also minimize M_D-error. This is true by the properties of the determinant: det(Σ_M) = det(M(X'X)^-1 M') = det(M) det((X'X)^-1) det(M') = det(M') det(M) det((X'X)^-1) = det(M'M) det((X'X)^-1). As an illustration, suppose the manager wants to focus on configurator prices (as in M_c), but also wants to focus on the highest price posted in the configurator. To capture this focus, we make M square by adding another row to M_c, [1 1 1 1 1 -1]. To decrease M_A-error we unbalance X to make price more salient in the question design. For example, a 40% unbalancing decreases M_A-errors by 5% with only a slight increase in M_D-errors (<1%).

We chose these examples to be simple, yet illustrative. We can create larger examples where the decreases in M-errors are much larger. Our perturbation algorithms are simple; more complex algorithms might reduce M-errors further. M-efficiency appears to be an interesting and relevant criterion on which to focus future algorithmic development.

M-Efficiency for Stated-Choice Questions

The same arguments apply to stated-choice questions. The M-efficiency criteria become either the determinant (M_D-error) or the trace (M_A-error) of M(Z'PZ)^-1 M', where Z is a probability-balanced X matrix and P is a diagonal matrix of choice probabilities. See Arora and Huber (2001, p. 274) for notation. As in Section 4, there will be a tradeoff between choice balance and the levels of focal features. For M-errors, that tradeoff will be defined on combinations of the features rather than the features themselves.

49. Recall that efficiency criteria tend to push feature levels to the extremes of the feasible space. Thus, to compare different designs we must attempt to keep the magnitudes of the features constant. In this example, we implement that constraint by keeping the trace of X_M'X_M equal to the trace of X'X. Our goal is to provide an example where M-efficiency matters. We can also create examples for other constraints; the details would be different, but the basic insights remain.
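The perturbation argument can be sketched numerically. The code below is our reconstruction; the 12-question scale and the trace-matching normalization are assumptions consistent with footnote 49. It compares the M_A-error of the orthogonal Gram matrix q I_6 with the Gram matrix α[(1-β)I_6 + β M_c'M_c] at β = 4/5.

```python
import numpy as np

M_c = np.hstack([np.eye(5), -np.ones((5, 1))])   # configurator focus (5 x 6)
n, q = 6, 12                                     # 6 partworths, 12 questions (assumed)
target_trace = n * q                             # keep question magnitudes constant

def gram(beta):
    """X_M'X_M = alpha[(1 - beta) I + beta M_c'M_c], alpha set by the trace."""
    G = (1.0 - beta) * np.eye(n) + beta * (M_c.T @ M_c)
    return G * (target_trace / np.trace(G))

def ma_error(G):
    """M_A-error up to its constant scaling: trace(M (X'X)^-1 M')."""
    return float(np.trace(M_c @ np.linalg.inv(G) @ M_c.T))

base = ma_error(gram(0.0))    # orthogonal design: X'X = q I
tuned = ma_error(gram(0.8))   # perturbed toward the managerial focus
```

In this sketch the M_A-error falls by roughly a fifth at β = 4/5 while the design stays full rank: the only zero eigenvalue of M_c'M_c lies along a direction that M_c itself annihilates, so the focused covariance remains finite.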
Summary of Sections 5 and 6

Configurator questions, based on analogies to e-commerce websites, are a relatively recent form of preference questions. Empirical data suggest that such questions are surprisingly good at predicting feature choice, in large part because they focus precision on managerial issues that are important for e-commerce and mass customization. This insight generalizes. Intuitively, precision (efficiency) is greatest when the question format is matched to the managerial decision.

M_A-errors (and M_D-errors) provide criteria by which to match format to managerial relevance. These criteria can be used to modify question-design algorithms. There remain the challenges of new algorithmic development. Existing algorithms have been optimized and tested for D- and D_B-efficiency. However, Figure 4, Table 1, and the concept of M-errors have the potential to refocus these algorithms effectively.

7. Summary and Future Research

In the last few years great strides have been made in developing algorithms to select questions for both metric paired-comparison preference questions and for stated-choice questions. Algorithms have been developed for a wide range of estimation methods including regression, logit, probit, mixed logit, and hierarchical Bayes (both metric and choice-based). In some cases the algorithms focus on static designs and in other cases on adaptive designs. This existing research has led to preference questions that are more efficient and more accurate, as evidenced by both simulation and empirical experiments.

Summary

In this paper, we use stylized models to gain insight on the properties of algorithms and the criteria they optimize. The models abstract from the complexity of empirical estimation to examine properties transparently. In key cases, the stylized models provide insight on empirical facts. While some of the following insights formalize that which was known from prior algorithmic research, many insights are new and provide a basis for further development:

- Metric utility balance leads to relative biases in partworth estimates.
- Metric utility balance is inefficient.
- Choice balance, while related to utility balance, affects stated-choice questions differently than utility balance affects metric questions. When features are discrete and the number of features which can vary is fixed, choice balance is a component of optimal efficiency. Otherwise, optimal efficiency implies some choice imbalance.
- As response accuracy and/or the levels of non-focal features increase, optimal efficiency implies greater choice balance.

- Warm-up (configurator) questions are surprisingly accurate for predicting feature choice because they focus precision on the feature-choice decision. This concept generalizes. Tradeoff, range, or configurator questions provide greater efficiency when they are matched to the appropriate managerial problem.
- M-errors provide criteria by which both metric and stated-choice questions can be focused on the managerial decisions that will be made based on the data. Simple algorithms (perturbation and/or unbalancing) can decrease M-errors effectively.

The formal models also provide a theory with which to understand the accuracy of recently proposed polyhedral methods. Specifically:

- Metric polyhedral methods appear to avoid the bias and inefficiency of utility balance by maintaining orthogonality while focusing questions where partworth estimates are less precise.
- To date, stated-choice polyhedral methods have been applied for discrete features. Hence, their use of approximate choice balance makes them close to efficient. However, stated-choice polyhedral methods for continuous features might be improved with algorithms that seek some choice imbalance.

Future Research

This paper, with its focus on stylized models, takes a different tack than recent algorithmic papers. The two tacks are complementary. Our stylized models can be extended to explain simulation results in Andrews, Ainslie and Currim (2002), empirical results in Moore (2003), or to explore the efficient use of self-explicated questions (Ter Hofstede, Kim and Wedel 2002). On the other hand, insights on utility balance, choice balance, and M-efficiency can be used to improve algorithms for both aggregate customization and polyhedral methods. Finally, interest in configurator-like questions is growing as more marketing research moves to the web and as e-commerce managers demand marketing research that matches the decisions they make daily. We have only begun to understand the nature of these questions.

References

Andrews, Rick L., Andrew Ainslie, and Imran S. Currim (2002), "An Empirical Comparison of Logit Choice Models with Discrete versus Continuous Representations of Heterogeneity," Journal of Marketing Research, 39 (November).

Arora, Neeraj and Joel Huber (2001), "Improving Parameter Estimates and Model Prediction by Aggregate Customization in Choice Experiments," Journal of Consumer Research, 28 (September).

Dahan, Ely and John R. Hauser (2002), "The Virtual Customer," Journal of Product Innovation Management, 19, 5 (September).

Delquié, Philippe (2003), "Optimal Conflict in Preference Assessment," Management Science, 49, 1 (January), 102-115.

Elrod, Terry, Jordan Louviere, and Krishnakumar S. Davey (1992), "An Empirical Comparison of Ratings-Based and Choice-Based Conjoint Models," Journal of Marketing Research, 29, 3 (August).

Evgeniou, Theodoros, Constantinos Boussios, and Giorgos Zacharia (2003), "Generalized Robust Conjoint Estimation," working paper, Fontainebleau, France: INSEAD, May.

Green, Paul E., Abba Krieger, and Manoj K. Agarwal (1991), "Adaptive Conjoint Analysis: Some Caveats and Suggestions," Journal of Marketing Research, 28 (May), 215-222.

Green, Paul E., Abba Krieger, and Yoram Wind (2001), "Thirty Years of Conjoint Analysis: Reflections and Prospects," Interfaces, 31, 3, Part 2 (May-June), S56-S73.

Greene, William H. (1993), Econometric Analysis, 2E, Englewood Cliffs, NJ: Prentice-Hall, Inc.

Haaijer, Rinus, Wagner Kamakura, and Michel Wedel (2000), "Response Latencies in the Analysis of Conjoint Choice Experiments," Journal of Marketing Research, 37 (August).

Hauser, John R. and Steven M. Shugan (1980), "Intensity Measures of Consumer Preference," Operations Research, 28, 2 (March-April).

Huber, Joel and David Hansen (1986), "Testing the Impact of Dimensional Complexity and Affective Differences of Paired Concepts in Adaptive Conjoint Analysis," Advances in Consumer Research, 14, Melanie Wallendorf and Paul Anderson, eds., Provo, UT: Association for Consumer Research.

Huber, Joel and Klaus Zwerina (1996), "The Importance of Utility Balance in Efficient Choice Designs," Journal of Marketing Research, 33 (August).

Johnson, Richard (1991), "Comment on 'Adaptive Conjoint Analysis: Some Caveats and Suggestions'," Journal of Marketing Research, 28 (May), 223-225.

Johnson, Richard, Joel Huber, and Lynd Bacon (2003), "Adaptive Choice-Based Conjoint," Proceedings of the Sawtooth Software Conference, April.

Judge, G. G., W. E. Griffiths, R. C. Hill, H. Lütkepohl, and T. C. Lee (1985), The Theory and Practice of Econometrics, New York, NY: John Wiley and Sons.

Kanninen, Barbara (2002), "Optimal Design for Multinomial Choice Experiments," Journal of Marketing Research, 39 (May), 214-227.

Keeney, Ralph and Howard Raiffa (1976), Decisions with Multiple Objectives: Preferences and Value Tradeoffs, New York, NY: John Wiley & Sons.

Krantz, David H., R. Duncan Luce, Patrick Suppes, and Amos Tversky (1971), Foundations of Measurement, New York, NY: Academic Press.

Kuhfeld, Warren F., Randall D. Tobias, and Mark Garratt (1994), "Efficient Experimental Design with Marketing Research Applications," Journal of Marketing Research, 31, 4 (November).

Liechty, John, Venkatram Ramaswamy, and Steven Cohen (2001), "Choice-Menus for Mass Customization: An Experimental Approach for Analyzing Customer Demand With an Application to a Web-Based Information Service," Journal of Marketing Research, 38, 2 (May).

Lord, Frederic M. (1970), "Some Test Theory for Tailored Testing," in Wayne H. Holtzman, ed., Computer-Assisted Instruction, Testing, and Guidance, New York, NY: Harper and Row.

Louviere, Jordan J., David A. Hensher, and Joffre D. Swait (2000), Stated Choice Methods: Analysis and Application, New York, NY: Cambridge University Press.

Mahajan, Vijay and Jerry Wind (1992), "New Product Models: Practice, Shortcomings and Desired Improvements," Journal of Product Innovation Management, 9.

Marshall, Pablo and Eric T. Bradlow (2002), "A Unified Approach to Conjoint Analysis Models," Journal of the American Statistical Association, 97, 459 (September).

McFadden, Daniel (1974), "Conditional Logit Analysis of Qualitative Choice Behavior," in P. Zarembka, ed., Frontiers in Econometrics, New York, NY: Academic Press.

Moore, William L. (2003), "Cross-Validity Comparison of Conjoint Analysis and Choice Models," working paper, Salt Lake City, UT: University of Utah, February.

Orme, Bryan (1999), "ACA, CBC, or Both: Effective Strategies for Conjoint Research," working paper, Sequim, WA: Sawtooth Software.

Orme, Bryan and Joel Huber (2000), "Improving the Value of Conjoint Simulations," Marketing Research, 12, 4 (Winter), 12-20.

Sandor, Zsolt and Michel Wedel (2001), "Designing Conjoint Choice Experiments Using Managers' Prior Beliefs," Journal of Marketing Research, 38, 4 (November).

Sandor, Zsolt and Michel Wedel (2002), "Profile Construction in Experimental Choice Designs for Mixed Logit Models," Marketing Science, 21, 4 (Fall).

Sandor, Zsolt and Michel Wedel (2003), "Differentiated Bayesian Conjoint Choice Designs," working paper, University of Michigan Business School.

Sawtooth Software (2002), "ACA 5.0 Technical Paper," Sawtooth Software Technical Paper Series, Sequim, WA: Sawtooth Software, Inc.

Shugan, Steven M. (1980), "The Cost of Thinking," Journal of Consumer Research, 7, 2 (September), 99-111.

Ter Hofstede, Frenkel, Youngchan Kim, and Michel Wedel (2002), "Bayesian Prediction in Hybrid Conjoint Analysis," Journal of Marketing Research, 39 (May).

Thomke, Stefan and Eric von Hippel (2002), "Customers as Innovators: A New Way to Create Value," Harvard Business Review (April).

Toubia, Olivier, John R. Hauser, and Duncan I. Simester (2003), "Polyhedral Methods for Adaptive Choice-Based Conjoint Analysis," forthcoming, Journal of Marketing Research.

Toubia, Olivier, Duncan Simester, John R. Hauser, and Ely Dahan (2003), "Fast Polyhedral Adaptive Conjoint Estimation," forthcoming, Marketing Science.

Urban, Glen L. and John R. Hauser (2003), "'Listening In' to Find and Explore New Combinations of Customer Needs," forthcoming, Journal of Marketing.

von Hippel, Eric (2001b), "Perspective: User Toolkits for Innovation," Journal of Product Innovation Management, 18.

Appendix

Relative Bias Due to Metric Utility Balance

Proposition 1. For the stylized model, on average, metric utility balance biases partworths upward and does so differentially depending upon the true values of the partworths.

Proof. In the text we have already shown that one of the partworths is biased upward for any zero-mean distribution that has non-zero measure below $-a$. We must now demonstrate that $E[\tilde{e}_{ub} \mid \tilde{e}_{ub} \ge -a] > 0$ and is decreasing in $a$ for $a \ge 0$ and density functions which have non-zero density below $-a$. Differentiating the conditional expectation with respect to $a$:

$$\frac{\partial}{\partial a} E[\tilde{e}_{ub} \mid \tilde{e}_{ub} \ge -a]
= \frac{\partial}{\partial a} \frac{\int_{-a}^{\infty} e_{ub} f(e_{ub})\, de_{ub}}{\int_{-a}^{\infty} f(e_{ub})\, de_{ub}}
= \frac{-a\, f(-a)\,[1 - F(-a)] - f(-a)\, E[\tilde{e}_{ub} \mid \tilde{e}_{ub} \ge -a]\,[1 - F(-a)]}{[1 - F(-a)]^{2}} < 0,$$

where the last step recognizes that both terms in the numerator are negative whenever $f(e_{ub})$ has density below $-a$ and $a > 0$ (the conditional expectation is then strictly positive). When $a = 0$, the first term vanishes and the second term is still negative. ∎

Choice Balance vs. Magnitude and Non-focal Features

From Equation 3 in the text we know that $\operatorname{trace}(\Sigma) = \sum_i \bigl(\sum_k d_{ik}^2\bigr) P_i (1 - P_i)$. Because this function is separable in $i$, we focus on a single $i$ and drop the $i$ subscript. Rewriting $P_i$ in terms of the partworths, $a = \sum_{k=1}^{K} \beta_k d_k$, allowing $m$ to scale the magnitude of the partworths, and using $P(1-P) = (e^{ma} + e^{-ma} + 2)^{-1}$ for the binary logit, we obtain for a given $i$

$$T \equiv \frac{\sum_{k=1}^{K} d_k^2}{e^{ma} + e^{-ma} + 2}.$$

We assume, without loss of generality, that $\beta_K > 0$ and $y_K \equiv \sum_{k=1}^{K-1} \beta_k d_k \le 0$. Define $s_K = \sum_{k=1}^{K-1} d_k^2$, so that $a = y_K + \beta_K d_K$, and let $d_K^*$ be the value of $d_K$ that maximizes $T$.

Lemma 1. For $y_K \le 0$ and $\beta_K > 0$, $d_K^* \ge 0$.

Proof. Assume $d_K^* < 0$, so that $a^* = y_K + \beta_K d_K^* < 0$. Consider $d_K^{**} = (-a^* - y_K)/\beta_K > 0$, which gives $a^{**} = -a^* > 0$ and so leaves the denominator of $T$ unchanged. Because $a^*$ and $y_K$ are of the same sign, $d_K^{**} = (-a^* - y_K)/\beta_K \ge (y_K - a^*)/\beta_K = -d_K^*$, so the numerator of $T$ is at least as large (strictly larger when $y_K < 0$) while the denominator is unchanged, and we have the result by contradiction. In the special case $y_K = 0$, $T(-d_K) = T(d_K)$, hence we can also restrict ourselves to $d_K \ge 0$. ∎

Lemma 2. For $\beta_K > 0$ and $y_K < 0$, the trace has no interior minimum in $d_K$ on $(0, \infty)$.

Proof. Writing $T = u/v$ with $u = s_K + d_K^2$ and $v = e^{ma} + e^{-ma} + 2$, $T'$ has the same sign as $h(d_K) \equiv 2 d_K v - m \beta_K u\, (e^{ma} - e^{-ma})$. The first-order condition $h(d_K^*) = 0$ requires $e^{ma^*} - e^{-ma^*} > 0$, hence $a^* > 0$. Differentiating $h$ and substituting the first-order condition shows that $h'(d_K^*)$ has the same sign as $H \equiv (e^{ma^*} - e^{-ma^*}) - m \beta_K d_K^* (e^{ma^*} + e^{-ma^*})$. Because $\tanh(x) < x$ for $x > 0$,

$$\frac{H}{e^{ma^*} + e^{-ma^*}} = \tanh(m a^*) - m \beta_K d_K^* < m a^* - m \beta_K d_K^* = m y_K < 0.$$

Thus every interior critical point is a strict local maximum, so $T$ can have no interior minimum on $(0, \infty)$. ∎

Proposition 2. As response accuracy increases, choice balance increases and the level difference in the focal feature decreases.

Proof. Assume $\beta_K > 0$ and $y_K \le 0$ without loss of generality, and write $T = u(d_K)/v(d_K, m)$. For $m > 0$, $d_K^*(m)$ satisfies the first-order condition $f \equiv u'/u = v'/v \equiv g$, where $u$ does not depend on $m$. Direct computation gives $g(d_K, m) = m \beta_K \tanh(m a / 2)$, which is strictly increasing in $m$ whenever $a > 0$. Hence, for $m > m_o$,

$$f(d_K^*(m_o)) = g(d_K^*(m_o), m_o) < g(d_K^*(m_o), m),$$

so $T'(d_K^*(m_o), m) < 0$. By Lemma 2, the derivative remains non-positive for all $d_K > d_K^*(m_o)$; thus $d_K^*(m) < d_K^*(m_o)$, which proves the second part of the proposition. Because $0 < y_K + \beta_K d_K^*(m) < y_K + \beta_K d_K^*(m_o)$, the utility difference shrinks, which proves the first part. ∎

Proposition 3. As the levels of the non-focal features increase, choice balance increases.

Proof. We now consider changes in the non-focal features: $m$ scales the non-focal features but does not scale $d_K$, so $T = (m^2 s_K + d_K^2)/(e^{\bar a} + e^{-\bar a} + 2)$ with $\bar a = m y_K + \beta_K d_K$. Lemmas 1 and 2 still hold because the proofs follow the same principles; without loss of generality $\beta_K > 0$ and $y_K \le 0$, which implies $d_K^* \ge 0$ and $\bar a^* = m y_K + \beta_K d_K^* > 0$ for any $m > 0$. Fix $m_o > 0$ and consider $m_o + dm_o$ with $dm_o > 0$. Adjusting $d_K$ by $dd_K = -y_K\, dm_o / \beta_K \ge 0$ holds $\bar a$, and hence the denominator of $T$, constant. A first-order (Taylor) comparison of the two sides of the first-order condition at the adjusted point, ignoring higher-order terms, shows that $T' < 0$ there. Hence $0 < d_K^*(m_o + dm_o) < d_K^*(m_o) + dd_K$ and $0 < \bar a^*(m_o + dm_o) < \bar a^*(m_o)$. Thus choice balance increases. ∎
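The bias term in Proposition 1, $E[\tilde e_{ub} \mid \tilde e_{ub} \ge -a]$, can be checked numerically for normal errors, where it has a closed form (the inverse Mills ratio). The choice of $\sigma = 1$ below is an illustrative assumption:

```python
import math

def truncated_mean(a, sigma=1.0):
    # E[e | e >= -a] for e ~ Normal(0, sigma^2), via the inverse Mills
    # ratio: sigma * phi(z) / (1 - Phi(z)) with z = -a / sigma.
    z = -a / sigma
    phi = math.exp(-z * z / 2.0) / math.sqrt(2.0 * math.pi)
    Phi = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return sigma * phi / (1.0 - Phi)

# The conditional mean is strictly positive (upward bias) and declines
# as the true utility difference a grows (differential bias).
for a in (0.0, 0.5, 1.0, 2.0):
    print(a, round(truncated_mean(a), 4))
```

The values are strictly positive and strictly decreasing in $a$, matching both parts of Proposition 1; at $a = 0$ the bias equals $\sqrt{2/\pi} \approx 0.798$.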

Relative Efficiency of Configurator Questions for E-commerce Decisions

Lemma 3. If $\tilde e_1$, $\tilde e_2$, and $\tilde e_t$ are zero-mean, i.i.d. normally distributed, then $\Pr[\,|\tilde e_l| \le a\,] > \Pr[\,|\tilde e_l \pm \tilde e_t| \le a\,]$ for $l = 1, 2$.

Proof. If the errors are independent and identically distributed zero-mean normal random variables with variance $\sigma^2$, then $\tilde e_T \equiv \tilde e_l \pm \tilde e_t$ is zero-mean normal with variance $2\sigma^2$. For $a > 0$, the comparison reduces to

$$\int_{-a}^{a} f(e \mid \sigma^2)\, de \;>\; \int_{-a}^{a} f(e_T \mid 2\sigma^2)\, de_T,$$

where $f(\cdot \mid \sigma^2)$ is a normal density with variance $\sigma^2$. Changing the variable of integration with $g = e_T/\sqrt{2}$ turns the right-hand side into $\int_{-a/\sqrt{2}}^{a/\sqrt{2}} f(g \mid \sigma^2)\, dg$, which is strictly smaller than the left-hand side because the interval of integration is strictly shorter. ∎

Proposition 4. For zero-mean normally distributed (i.i.d.) errors, the warm-up questions (Q1 + Q2) provide more accurate estimates of feature choice than do pairs of questions that include a metric paired-comparison question (Q3).

Proof. With Q1 + Q2 or Q1 + Q3, the error in estimating $p_1$ is based directly on $\tilde e_1$, and the probability of correct prediction is $\Pr[\,|\tilde e_1| \le a_1\,]$. With Q2 + Q3 we obtain $p_1$ by adding Q2 and Q3, so the error in estimating $p_1$ is based on $\tilde e_2 + \tilde e_t$ and the probability of correct prediction is $\Pr[\,|\tilde e_2 + \tilde e_t| \le a_1\,]$. Thus, for Feature 1, Q1 + Q2 and Q1 + Q3 are superior to Q2 + Q3 if $\Pr[\,|\tilde e_1| \le a_1\,] > \Pr[\,|\tilde e_2 + \tilde e_t| \le a_1\,]$. For Feature 2, similar arguments show that Q1 + Q2 and Q2 + Q3 are superior to Q1 + Q3 if $\Pr[\,|\tilde e_2| \le a_2\,] > \Pr[\,|\tilde e_1 - \tilde e_t| \le a_2\,]$. Thus, Q1 + Q2 is superior to both Q1 + Q3 and Q2 + Q3 if the condition in the text holds. Finally, by Lemma 3 this condition holds for zero-mean, i.i.d., normal errors. ∎
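Lemma 3's variance-doubling comparison is easy to verify numerically: for a zero-mean normal variable, $\Pr[\,|X| \le a\,]$ has a closed form via the error function. Unit variance below is an illustrative assumption:

```python
import math

def p_within(a, var):
    # P(|X| <= a) for X ~ Normal(0, var): erf(a / sqrt(2 * var)).
    return math.erf(a / math.sqrt(2.0 * var))

sigma2 = 1.0
for a in (0.25, 0.5, 1.0, 2.0):
    single = p_within(a, sigma2)           # one error term e_l alone
    combined = p_within(a, 2.0 * sigma2)   # e_l + e_t (or e_l - e_t): variance doubles
    print(a, round(single, 4), round(combined, 4))
```

For every $a > 0$ the single-error probability strictly dominates, which is exactly the condition Proposition 4 requires.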


Buffer Capacity Allocation: A method to QoS support on MPLS networks** Buffer Caacity Allocation: A method to QoS suort on MPLS networks** M. K. Huerta * J. J. Padilla X. Hesselbach ϒ R. Fabregat O. Ravelo Abstract This aer describes an otimized model to suort QoS by mean

More information

Asymmetric Information, Transaction Cost, and. Externalities in Competitive Insurance Markets *

Asymmetric Information, Transaction Cost, and. Externalities in Competitive Insurance Markets * Asymmetric Information, Transaction Cost, and Externalities in Cometitive Insurance Markets * Jerry W. iu Deartment of Finance, University of Notre Dame, Notre Dame, IN 46556-5646 [email protected] Mark J. Browne

More information

Modeling and Simulation of an Incremental Encoder Used in Electrical Drives

Modeling and Simulation of an Incremental Encoder Used in Electrical Drives 10 th International Symosium of Hungarian Researchers on Comutational Intelligence and Informatics Modeling and Simulation of an Incremental Encoder Used in Electrical Drives János Jób Incze, Csaba Szabó,

More information

COST CALCULATION IN COMPLEX TRANSPORT SYSTEMS

COST CALCULATION IN COMPLEX TRANSPORT SYSTEMS OST ALULATION IN OMLEX TRANSORT SYSTEMS Zoltán BOKOR 1 Introduction Determining the real oeration and service costs is essential if transort systems are to be lanned and controlled effectively. ost information

More information

Monitoring Frequency of Change By Li Qin

Monitoring Frequency of Change By Li Qin Monitoring Frequency of Change By Li Qin Abstract Control charts are widely used in rocess monitoring roblems. This aer gives a brief review of control charts for monitoring a roortion and some initial

More information

CFRI 3,4. Zhengwei Wang PBC School of Finance, Tsinghua University, Beijing, China and SEBA, Beijing Normal University, Beijing, China

CFRI 3,4. Zhengwei Wang PBC School of Finance, Tsinghua University, Beijing, China and SEBA, Beijing Normal University, Beijing, China The current issue and full text archive of this journal is available at www.emeraldinsight.com/2044-1398.htm CFRI 3,4 322 constraints and cororate caital structure: a model Wuxiang Zhu School of Economics

More information

Storage Basics Architecting the Storage Supplemental Handout

Storage Basics Architecting the Storage Supplemental Handout Storage Basics Architecting the Storage Sulemental Handout INTRODUCTION With digital data growing at an exonential rate it has become a requirement for the modern business to store data and analyze it

More information

Static and Dynamic Properties of Small-world Connection Topologies Based on Transit-stub Networks

Static and Dynamic Properties of Small-world Connection Topologies Based on Transit-stub Networks Static and Dynamic Proerties of Small-world Connection Toologies Based on Transit-stub Networks Carlos Aguirre Fernando Corbacho Ramón Huerta Comuter Engineering Deartment, Universidad Autónoma de Madrid,

More information

A Multivariate Statistical Analysis of Stock Trends. Abstract

A Multivariate Statistical Analysis of Stock Trends. Abstract A Multivariate Statistical Analysis of Stock Trends Aril Kerby Alma College Alma, MI James Lawrence Miami University Oxford, OH Abstract Is there a method to redict the stock market? What factors determine

More information

CRITICAL AVIATION INFRASTRUCTURES VULNERABILITY ASSESSMENT TO TERRORIST THREATS

CRITICAL AVIATION INFRASTRUCTURES VULNERABILITY ASSESSMENT TO TERRORIST THREATS Review of the Air Force Academy No (23) 203 CRITICAL AVIATION INFRASTRUCTURES VULNERABILITY ASSESSMENT TO TERRORIST THREATS Cătălin CIOACĂ Henri Coandă Air Force Academy, Braşov, Romania Abstract: The

More information

Rummage Web Server Tuning Evaluation through Benchmark

Rummage Web Server Tuning Evaluation through Benchmark IJCSNS International Journal of Comuter Science and Network Security, VOL.7 No.9, Setember 27 13 Rummage Web Server Tuning Evaluation through Benchmark (Case study: CLICK, and TIME Parameter) Hiyam S.

More information

Sage HRMS I Planning Guide. The Complete Buyer s Guide for Payroll Software

Sage HRMS I Planning Guide. The Complete Buyer s Guide for Payroll Software I Planning Guide The Comlete Buyer s Guide for Payroll Software Table of Contents Introduction... 1 Recent Payroll Trends... 2 Payroll Automation With Emloyee Self-Service... 2 Analyzing Your Current Payroll

More information

Automatic Search for Correlated Alarms

Automatic Search for Correlated Alarms Automatic Search for Correlated Alarms Klaus-Dieter Tuchs, Peter Tondl, Markus Radimirsch, Klaus Jobmann Institut für Allgemeine Nachrichtentechnik, Universität Hannover Aelstraße 9a, 0167 Hanover, Germany

More information

Service Network Design with Asset Management: Formulations and Comparative Analyzes

Service Network Design with Asset Management: Formulations and Comparative Analyzes Service Network Design with Asset Management: Formulations and Comarative Analyzes Jardar Andersen Teodor Gabriel Crainic Marielle Christiansen October 2007 CIRRELT-2007-40 Service Network Design with

More information

TOWARDS REAL-TIME METADATA FOR SENSOR-BASED NETWORKS AND GEOGRAPHIC DATABASES

TOWARDS REAL-TIME METADATA FOR SENSOR-BASED NETWORKS AND GEOGRAPHIC DATABASES TOWARDS REAL-TIME METADATA FOR SENSOR-BASED NETWORKS AND GEOGRAPHIC DATABASES C. Gutiérrez, S. Servigne, R. Laurini LIRIS, INSA Lyon, Bât. Blaise Pascal, 20 av. Albert Einstein 69621 Villeurbanne, France

More information

Applications of Regret Theory to Asset Pricing

Applications of Regret Theory to Asset Pricing Alications of Regret Theory to Asset Pricing Anna Dodonova * Henry B. Tiie College of Business, University of Iowa Iowa City, Iowa 52242-1000 Tel.: +1-319-337-9958 E-mail address: [email protected]

More information

Multi-Channel Opportunistic Routing in Multi-Hop Wireless Networks

Multi-Channel Opportunistic Routing in Multi-Hop Wireless Networks Multi-Channel Oortunistic Routing in Multi-Ho Wireless Networks ANATOLIJ ZUBOW, MATHIAS KURTH and JENS-PETER REDLICH Humboldt University Berlin Unter den Linden 6, D-99 Berlin, Germany (zubow kurth jr)@informatik.hu-berlin.de

More information

Computational Finance The Martingale Measure and Pricing of Derivatives

Computational Finance The Martingale Measure and Pricing of Derivatives 1 The Martingale Measure 1 Comutational Finance The Martingale Measure and Pricing of Derivatives 1 The Martingale Measure The Martingale measure or the Risk Neutral robabilities are a fundamental concet

More information

Stochastic Derivation of an Integral Equation for Probability Generating Functions

Stochastic Derivation of an Integral Equation for Probability Generating Functions Journal of Informatics and Mathematical Sciences Volume 5 (2013), Number 3,. 157 163 RGN Publications htt://www.rgnublications.com Stochastic Derivation of an Integral Equation for Probability Generating

More information

An inventory control system for spare parts at a refinery: An empirical comparison of different reorder point methods

An inventory control system for spare parts at a refinery: An empirical comparison of different reorder point methods An inventory control system for sare arts at a refinery: An emirical comarison of different reorder oint methods Eric Porras a*, Rommert Dekker b a Instituto Tecnológico y de Estudios Sueriores de Monterrey,

More information

What Makes an Effective Coalition?

What Makes an Effective Coalition? MARCH 2011 What Makes an Effective Coalition? Evidence-Based Indicators of Success Funded by and reared for: TCC Grou Team and Acknowledgements This aer was reared by Jared Raynor with extensive research

More information

The Economics of the Cloud: Price Competition and Congestion

The Economics of the Cloud: Price Competition and Congestion Submitted to Oerations Research manuscrit (Please, rovide the manuscrit number!) Authors are encouraged to submit new aers to INFORMS journals by means of a style file temlate, which includes the journal

More information

Sang Hoo Bae Department of Economics Clark University 950 Main Street Worcester, MA 01610-1477 508.793.7101 [email protected]

Sang Hoo Bae Department of Economics Clark University 950 Main Street Worcester, MA 01610-1477 508.793.7101 sbae@clarku.edu Outsourcing with Quality Cometition: Insights from a Three Stage Game Theoretic Model Sang Hoo ae Deartment of Economics Clark University 950 Main Street Worcester, M 01610-1477 508.793.7101 [email protected]

More information

Synopsys RURAL ELECTRICATION PLANNING SOFTWARE (LAPER) Rainer Fronius Marc Gratton Electricité de France Research and Development FRANCE

Synopsys RURAL ELECTRICATION PLANNING SOFTWARE (LAPER) Rainer Fronius Marc Gratton Electricité de France Research and Development FRANCE RURAL ELECTRICATION PLANNING SOFTWARE (LAPER) Rainer Fronius Marc Gratton Electricité de France Research and Develoment FRANCE Synosys There is no doubt left about the benefit of electrication and subsequently

More information

How To Understand The Difference Between A Bet And A Bet On A Draw Or Draw On A Market

How To Understand The Difference Between A Bet And A Bet On A Draw Or Draw On A Market OPTIA EXCHANGE ETTING STRATEGY FOR WIN-DRAW-OSS ARKETS Darren O Shaughnessy a,b a Ranking Software, elbourne b Corresonding author: [email protected] Abstract Since the etfair betting exchange

More information

The Online Freeze-tag Problem

The Online Freeze-tag Problem The Online Freeze-tag Problem Mikael Hammar, Bengt J. Nilsson, and Mia Persson Atus Technologies AB, IDEON, SE-3 70 Lund, Sweden [email protected] School of Technology and Society, Malmö University,

More information

A Virtual Machine Dynamic Migration Scheduling Model Based on MBFD Algorithm

A Virtual Machine Dynamic Migration Scheduling Model Based on MBFD Algorithm International Journal of Comuter Theory and Engineering, Vol. 7, No. 4, August 2015 A Virtual Machine Dynamic Migration Scheduling Model Based on MBFD Algorithm Xin Lu and Zhuanzhuan Zhang Abstract This

More information

INFERRING APP DEMAND FROM PUBLICLY AVAILABLE DATA 1

INFERRING APP DEMAND FROM PUBLICLY AVAILABLE DATA 1 RESEARCH NOTE INFERRING APP DEMAND FROM PUBLICLY AVAILABLE DATA 1 Rajiv Garg McCombs School of Business, The University of Texas at Austin, Austin, TX 78712 U.S.A. {[email protected]} Rahul

More information

Secure synthesis and activation of protocol translation agents

Secure synthesis and activation of protocol translation agents Home Search Collections Journals About Contact us My IOPscience Secure synthesis and activation of rotocol translation agents This content has been downloaded from IOPscience. Please scroll down to see

More information