Marriage and Online Mate-Search Services: Evidence From South Korea

Soohyung Lee¹
Department of Economics and MPRC
University of Maryland, College Park
[email protected]

First Version: November, 2007
This Version: October, 2009

Abstract

This paper examines the implications of online mate search for marriage, using data from a Korean online matchmaking company. Using the estimated marital preferences, I find that customized recommendations from an online matchmaker and individuals' own online search generate similar marital sorting, though customized recommendations result in dates more often than individuals' own online search. When compared to traditional offline search, online search generates different marital sorting and may account for changes in marital sorting observed in Korea since 1991. Finally, the estimated preferences were recently used by the company to change its recommendation system, dramatically improving its success rate.

Keywords: Marriage, Online Search, Internet, Assortative Matching, Market Designs
JEL Classification Numbers: D02; J12; C15

¹ This was the main chapter of my PhD thesis, previously called "Preferences and Choice Constraints in Marital Sorting: Evidence from Korea." I thank Peter Klenow, Luigi Pistaferri, John Pencavel and Michèle Tertilt for their advice and support throughout this project. I have benefited from discussions with seminar participants at Stanford, University of Minnesota, Cornell, Penn State, University of Maryland, Harvard Business School, UIUC, MIT, Rand, SMU, NUS, Collegio Carlo Alberto, Bocconi, Tokyo, Korea and KERI. I thank Ran Abramitzky, Mark Duggan, John Hatfield, John Ham, Han Hong, Ali Hortaçsu, Jakub Kastl, Yuan Chuan Lien, Ben Malin, Sri Nagavarapu, Muriel Niederle, Minjung Park, Alex Ponce-Rodriguez, Felix Reichling, Azeem Shaikh, and Joanne Yoong for detailed comments; Ken Judd, Hyunok Lee and Zsolt Sándor for sharing their computational expertise; and the B.F. Haley and E.S. Shaw Fellowship and Stanford Graduate Research Opportunity Fellowship for financial support. I am indebted to Woong-Jin Lee, Heui-Gil Lee, Kang-Yong Ahn, and Hye-Rim Kim for sharing the data.

1 Introduction

Rapid adoption of the Internet has influenced many aspects of people's behavior. The search for a mate is no exception. In many countries, people can use an online platform to post and respond to notes to a potential dating partner (e.g., Yahoo Personals and Match.com). Alternatively, they can use an online matchmaking service that suggests a potential dating partner based on their personal characteristics (e.g., eharmony and Chemistry.com). Use of online search can change the size and composition of people's choice sets of potential mates. In addition, by receiving an online matchmaker's customized recommendations for potential mates, people may date and marry certain types of individuals more often than otherwise. Therefore, use of online mate-search services can affect people's marriage decisions and result in marital sorting different from that generated by traditional offline mate-search processes.

The goal of this paper is to examine the implications of the increasingly wide use of online mate-search services for marriage by addressing the following two questions: Do customized recommendations from an online matchmaker lead individuals to different decisions about dating and marriage, as compared to their own online search? How has the increasingly wide use of online mate-search services impacted marital sorting in the population?

To address these questions, I study the South Korean marriage market, which is a useful setting because of the early adoption of online mate-search services and their widespread use. In Korea, the services emerged in the late 1980s, and in 2005 eight percent of newlyweds met their spouse through these services (Korea Marriage Culture Institute, 2005).² I use Korean vital records, combined with an unusually rich dataset from a major Korean online matchmaking company. The dataset provides detailed information on over 20,000 users, 13.4 percent of whom have gotten married through the service.
It includes information about whom each user dated and married. Moreover, it has information about proposed dates that were turned down, which is rarely available in datasets typically used in the literature. Users' characteristics are mostly verified by legal documents. The dataset from the company includes a wide spectrum of the Korean population in terms of age, education, geographic location, and many other dimensions.

² According to Madden and Lenhard (2006), three percent of the sample of U.S. Internet users met their spouse through the Internet, including online mate-search services, and one percent met on a blind date or through a dating service. In terms of service providers, in January 2006, the two most popular companies in the U.S. were Yahoo Personals and Match.com, which were established in 1997 and 1995, respectively. eharmony, which provides online matchmaking services and was established in 2000, was ranked 7th in the same survey.

The company allows users to find a dating partner of the opposite sex in two ways. The user can directly browse other users' profiles on the company's website and request a first date, or the company can suggest a first date with another user who has no ongoing relationship. I will use the term "proposal" to refer to an event in which two users consider going on a date (or marriage) with each other, and "partner" to refer to the person who is asked out by a user or suggested to another user by the company.

I estimate users' preferences for their spousal characteristics by analyzing the proposals initiated by the company, which constitute 87 percent of all proposals. The inference of users' preferences is possible because the company suggests a wide variety of partners in terms of observable characteristics.³

To infer users' preferences, I develop a model in which an individual can have multiple dates with a partner before making a marriage decision. Within my model, multiple dates result from a desire to learn more about one's partner. Following Hitsch et al. (forthcoming), I assume that the marriage utility function may depend on sex and the similarity between a husband's and wife's characteristics. I estimate the model using a Laplace-type estimator suggested by Chernozhukov and Hong (2003). The estimation results suggest that for income and physical attractiveness, both men and women prefer someone who possesses these characteristics in abundance, regardless of their own traits. However, they prefer marrying a person who is similar to themselves in terms of age, height, religion, geographical location, and the industry in which one works.
For educational attainment and father's educational attainment, men prefer women who are similar to them, whereas women prefer men with high educational attainment.

³ Suppose that the company initiates a proposal only if a man and a woman have the same characteristics. Since there is no variation in partners' traits, users' responses to being asked on a date are explained by an unobservable random shock, not by partners' traits. Therefore, we cannot quantify the extent to which a person values a partner's observable trait.

Next, I examine individuals' decisions for dates and marriage when the online service recommends a potential mate, as compared to their own search via the company's website. Using the estimated preferences, I compute the probability that a proposal initiated by a user would move to the next stage of the relationship if the proposal were made by the company, and compare it with the actual outcomes. I find that the probability of a user accepting a first date with another user is significantly higher if the company introduces the two to each other than if the potential mate directly contacts the individual. However, conditional on having a first date, the probability of a proposal turning into a second date or marriage remains similar regardless of who initiates the proposal. In terms of marital sorting, I find that the sorting patterns among users whose spouse was suggested by the company are similar to those among users who directly contacted (or were contacted by) their spouse. These results imply that recommendations of an online matchmaker can reduce search cost by raising a user's acceptance rate but do not change marital sorting, as compared to individuals' own online search.

Although online matchmaking services generate marital sorting similar to individuals' own online search, it is still possible that online mate search generates sorting patterns different from traditional offline mate-search processes, thus changing marital sorting in the population. According to the census of newlyweds in Korea, marital sorting between 1991 and 2005 changed in the following ways: the probability of an individual marrying a spouse whose trait is the same as his/her own has decreased for hometown; increased for marital history (never-married vs. not); and remained similar for educational attainment.

I undertake two exercises to examine the possibility that wider use of online mate-search services changed marital sorting in Korea. In the first, I weight the users of the online matchmaking service to replicate the characteristics of the average individual (for each sex) in the census of newlyweds. I then compute how likely this average individual is to marry someone with the same traits as his/her own.
I find that if an individual uses the online matchmaking company, he/she is less likely to marry a spouse with the same hometown but more likely to marry a spouse with the same marital history. The prediction for the probability of marrying a spouse with the same educational attainment is ambiguous. Therefore, the wider adoption of online mate-search services could explain the patterns observed in the census between 1991 and 2005.

In the second exercise, I use the estimated preferences to compute the male-optimal stable matching with the Gale-Shapley algorithm (1962) and calculate the probability of the average individual marrying a spouse with the same traits. I use the male-optimal stable matching from the Gale-Shapley algorithm because I find it generates sorting comparable to the sorting among users of the online matchmaking service who ultimately marry. This exercise allows me to simulate marriages where the distribution of traits for both men and women in the matchmaking company is representative of the population, whereas, in the first exercise, the distribution of only one sex's traits is representative. The results are qualitatively similar to the findings from the first exercise, providing further evidence that the wider use of online mate-search services may account for changes in marital sorting in Korea from 1991 to 2005.

This paper is closely related to three strands of research. The first is the literature estimating marital preferences (e.g., Abramitzky et al., 2009; Angrist, 2002; Banerjee et al., 2009; Bisin et al., 2004; Choo and Siow, 2006; Fernandez et al., 2005; Fisman et al., 2006, 2008; Hitsch et al., 2006, forthcoming; Kurzban and Weeden, 2005; and Wong, 2003). Among studies in this strand of literature, the overall analytic framework of this paper is most closely related to Hitsch et al. (forthcoming), who estimate people's preferences based on first-date outcomes on a U.S. online platform and use the Gale-Shapley algorithm to predict marital sorting if people use the online platform. This paper builds upon their original contribution and other studies in this literature in three important ways. First, to recover marital preferences, my analysis uses both dating and marriage decisions as well as user characteristics that have, to a large extent, been verified by third parties.⁴ Second, to the best of my knowledge, this paper is the first to compare matching outcomes initiated by individuals with those initiated by an online intermediary (i.e., matchmaker). Third, this paper uses the wider adoption of online mate-search services to understand the time trend in marital sorting in a country.

⁴ My findings on marital preferences generally confirm the findings in the literature. For example, my results are consistent with the findings of Fisman et al. (2006) and Hitsch et al. (forthcoming) that men value appearance more than women do. Banerjee et al. (2009) find preferences for similar social background (i.e., caste), which is similar to my finding that men prefer women with similar family backgrounds.

A second related literature studies online search and online labor market intermediaries (e.g., Autor 2001, 2008; Kuhn and Skuterud, 2004; Bagues and Labini, 2008). This paper adds the marriage market to the list of search markets that have been affected by online search. My finding that the wider use of online mate-search services may account for the decline in marital sorting by geographical location in Korea is consistent with that in Bagues and Labini (2008). They find that the introduction of online job search in Italy increased workers' geographical mobility.

Third, this paper is related to studies of market design (e.g., Niederle and Roth, 2003, 2008; Niederle and Yariv, 2008). They highlight the possibility that a well-designed centralized matching system can improve the welfare of market participants as compared to a decentralized search. For example, Niederle and Roth (2003) find that residents' geographical mobility in the U.S. market for gastroenterologists increased under the centralized system. Their finding is consistent with mine in the sense that an online matchmaking company that offers a centralized matching system generates sorting patterns different from those generated under a decentralized traditional dating environment. Moreover, the matchmaking company used my estimated marital preferences to update its matching algorithm for generating a proposal, and increased the probability of a proposal turning into an actual date by a factor of 2. This fact suggests that insights from the market design literature can be beneficially applied to a wider range of economic environments such as marriage markets, in addition to the school-choice and kidney-exchange problems which have been extensively studied in the literature.

A brief overview of the remainder of this paper is as follows. Section 2 describes the institutional background and the data. Sections 3 and 4 present an empirical framework for estimation and the results, respectively. Section 5 provides the results of the counterfactual analysis. I then discuss several potential issues, such as selection bias, in Section 6. Section 7 concludes.
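The male-optimal stable matching used in the second exercise can be illustrated with a minimal sketch of men-proposing deferred acceptance (Gale and Shapley, 1962). This is not the paper's implementation: the function name, the toy participants, and the explicit preference lists are hypothetical, whereas in the paper the rankings would be derived from the estimated marriage utilities.

```python
from collections import deque

def male_optimal_matching(men_prefs, women_prefs):
    """Men-proposing deferred acceptance (Gale and Shapley, 1962).

    men_prefs[m]   : list of acceptable women, most preferred first
    women_prefs[w] : list of acceptable men, most preferred first
    Returns a dict mapping each matched woman to her partner.
    """
    # rank[w][m] is the position of man m in woman w's list (lower is better).
    rank = {w: {m: i for i, m in enumerate(prefs)}
            for w, prefs in women_prefs.items()}
    next_choice = {m: 0 for m in men_prefs}  # index of the next woman to try
    engaged_to = {}                          # woman -> man currently held
    free_men = deque(men_prefs)

    while free_men:
        m = free_men.popleft()
        if next_choice[m] >= len(men_prefs[m]):
            continue  # m has exhausted his list and stays single
        w = men_prefs[m][next_choice[m]]
        next_choice[m] += 1
        if m not in rank.get(w, {}):
            free_men.append(m)               # w finds m unacceptable
        elif w not in engaged_to:
            engaged_to[w] = m                # w tentatively accepts
        elif rank[w][m] < rank[w][engaged_to[w]]:
            free_men.append(engaged_to[w])   # w trades up; old partner is free
            engaged_to[w] = m
        else:
            free_men.append(m)               # w rejects m; he tries the next one
    return engaged_to

# Hypothetical toy market with two men and two women.
men = {"A": ["X", "Y"], "B": ["X", "Y"]}
women = {"X": ["B", "A"], "Y": ["A", "B"]}
matching = male_optimal_matching(men, women)
print(matching)
```

Because men propose, the resulting stable matching is the one that every man weakly prefers among all stable matchings, which is why it is called male-optimal.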
2 Industry and Data

2.1 Industry

Online matchmaking companies emerged in Korea in the late 1980s and rapidly expanded their market. These matchmaking companies typically provide access to an Internet database where users can browse one another's profiles; the companies then use a computerized algorithm to introduce singles to each other. These users are recruited through advertisements and pay a fixed advance fee for a pre-specified period, usually a year.

The use of online matchmaking services is quite common in Korea (see Table 1). According to the Korea Marriage Culture Institute (KMCI), 7.6 percent of couples who married in 2005 met through matchmaking companies. The use of online services is small among young people but still non-negligible. Similar results are found in another study of young Internet users conducted by a Korean research organization, Pollever.

Table 1: Route of Finding a Spouse

Survey conductor: KMCI; Pollever
Sample: 305 couples married in 2005 (KMCI); 1,941 unmarried internet users (Pollever)
Rows:
- Survey year
- Fraction of men
- Age groups: 29 and younger; and older
- Fraction of survey participants who are college students, graduates, or beyond*
- Route of finding a spouse/dating partner by age groups** (KMCI: all, (1), (2), (3); Pollever: all)
  - Online matchmaking companies
  - Internet/Club
  - Friends, College, or Work Place
  - Family/Relatives/Matchmakers
  - Others

* In the 2005 marriage register, the fraction of people with tertiary education was percent.
** Definition of age groups: (1) younger than 30, (2) between 30 and 33, and (3) older than 34. The survey by Pollever does not provide statistics broken down by age group.
Source: see Section 2.1.

2.2 Data

The main dataset for this study comes from a major Korean online matchmaking company, which helps its users find a spouse among other users of the opposite sex. I have detailed information about 20,689 individuals who started using the company's services between January 2002 and June 2006, including their individual characteristics, stated marital preferences, and the history of dating outcomes.

Figure 1: Regions of South Korea

2.2.1 Motivation of Users and Reliability of Information

The annual membership of the company cost 900,000 won in 2007 (approximately 900 US dollars), which is about 3.5 percent of the average annual income in Korea.⁵ The fraction of users who have married as a result of using the matchmaking service is 13.4 percent. Because of the high membership cost and the significant fraction of users getting married, I think it is reasonable to assume users are primarily motivated to seek marriage rather than casual dating.

The information users provide about their characteristics is subject to several checks by the company. As much as possible, key information is legally verified (e.g., age, education, employment, marital status) or independently evaluated by the company (e.g., a facial grade). For characteristics for which the company does not require third-party verification (e.g., income and height), the company monitors the accuracy of the information via user feedback. The company routinely surveys its users about their experiences and asks them to verify the correctness of other users' information. The company's contract specifies that the service will be terminated if a user is found to have provided incorrect information.

Table 2: Users' Characteristics 1

This table compares characteristics of users in the matchmaking dataset (MM) with the official marriage register (MR).

Columns: Matchmaking dataset (All; Married); MR
Year: January, 2002 ~ June, 2006 (matchmaking dataset); ~2005 (MR)
Number of individuals: 20,689 (All); 1,594 (Married); 2,477,648 (MR)

Composition (percentage):
- Women; Divorced; Non-Korean
- Age: 26 and younger; and older
- Educational attainment: Middle School or less; High School; College or more (Technical College; University; Master's and Ph.D.)
- Region: Seoul or Gyeonggi; Gangwon; Chungcheong; Jeolla; Gyeongsang; Jeju and others
- Hometown: Seoul or Gyeonggi; Gangwon; Chungcheong; Jeolla; Gyeongsang; Jeju and others

Table 3: Users' Characteristics 2

This table compares users of the matchmaking service with the general population. For population data, the top panel uses the WS ( ) and the bottom panel uses the PT (2004).

Columns: Matchmaking dataset (Jan. 2002 ~ June, 2006); General Population (WS ( ) for the top panel, PT (2004) for the bottom panel)

Distribution across industries (percentage):
- Agriculture, forestry, fishing, Mining
- Manufacturing
- Public, electric power, gas, water supply
- Construction
- Wholesales & retail trade, consumer goods, restaurants & hotels
- Transportation, storage, communication
- Finance & insurance
- Real estate rental & business services
- Education services
- Health & social welfare
- Entertainment, housekeeping, personal service
- International & other foreign institution
- Others or unemployed

Annual income (10,000 won):
- Mean
- Mean between 5th and 95th percentiles (general population: N.A.)
- Median (general population: N.A.)

Gender-specific physical traits (matchmaking dataset value, followed by the PT (2004) range in brackets**):
- Height (feet, inches)
  - 34 and younger: Men; Women
  - 35 and older: Men 5'8" [5'4", 5'7"]; Women
- Weight (lb)
  - 34 and younger: Men [153.2, 157.0]; Women [116.0, 120.4]
  - 35 and older: Men [151.9, 158.3]; Women [123.9, 131.0]
- Body Mass Index*
  - 34 and younger: Men 22.8 [22.6, 24.0]; Women 19.0 [20.3, 21.7]
  - 35 and older: Men 23.0 [24.7, 25.0]; Women 19.4 [22.8, 25.1]

* BMI = 703 * weight (pounds) / (height (inches))^2
** [a,b] denotes the case where the corresponding statistic ranges from a to b.
* obs with no income: very initial men %, women percent.
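The BMI definition in the note to Table 3 can be written as a short function. This is only a sketch of that formula; the function names and the sample height and weight below are illustrative assumptions, not values from the matchmaking dataset.

```python
def feet_inches_to_inches(feet, inches):
    """Convert a height given in feet and inches to total inches."""
    return 12 * feet + inches

def bmi_imperial(weight_lb, height_in):
    """Body mass index from pounds and inches: 703 * weight / height^2."""
    return 703 * weight_lb / height_in ** 2

# Hypothetical person: 5 feet 8 inches tall, 155 pounds.
height = feet_inches_to_inches(5, 8)
example_bmi = bmi_imperial(155, height)
print(round(example_bmi, 1))
```

The factor 703 converts the pounds-per-square-inch ratio to the usual kilograms-per-square-meter scale.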

2.2.2 Comparison between Users and the General Population

I use four separate nationally representative datasets because no single population-based dataset captures all the features observed in my data. The closest analog to the matchmaking dataset is the marriage register (MR). The MR, which covers the population of newlyweds in South Korea in a given year, provides information about the husband's and wife's age, education, residence, hometown, and marital history (never-married vs. not). I use the MR as a baseline for drawing comparisons to the general population, and I supplement the analysis with three other datasets: the Basic Statistics Survey of Wage Structure (WS) for industries and income, the National Household Income and Expenditure Survey (HIS) for income of husbands and wives, and the Survey of Physical Traits of Koreans (PT) for height and weight.

I find that, in terms of observable traits, the users of the company represent a wide spectrum of Koreans. As shown in Tables 2 and 3, the users include all types of Koreans in terms of marital status, educational attainment, geographical location, and industry. However, the users overrepresent people who are older, more educated, and currently live in, or are originally from, Seoul and its surroundings (i.e., Gyeonggi province).

As discussed earlier, the company does not request legal documents to verify a user's reported income and physical traits. To gauge the reliability of the information, I compare the average income and BMI of users with those of people in the population who have the same characteristics as users in terms of age, gender, and educational attainment (for income); see Appendix C.1.⁶ As shown in the middle of Table 3, the average income among users is over 40 million won (about 40,000 dollars), larger than the average annual income of 30 million won in the population with the same characteristics as users. However, excluding people whose reported income is below the 5th percentile or above the 95th percentile among users, the 10 percent trimmed mean income is comparable to that of the population. The average height and weight of the matchmaking company's users are remarkably similar to those in the PT.

⁵ In contrast, online dating services in the United States, such as Yahoo Personals and eharmony, cost about 160 to 250 dollars for a comparable one-year contract.
⁶ Appendix was separately submitted.
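The 10 percent trimmed mean described above, which drops reported incomes below the 5th or above the 95th percentile before averaging, can be sketched as follows. The function name and the sample incomes are hypothetical, and the percentile interpolation rule (`method="inclusive"`) is an assumption, since the paper does not state which rule was used.

```python
import statistics

def trimmed_mean(values, lower_pct=5, upper_pct=95):
    """Mean of the values lying between the given percentiles (inclusive)."""
    # quantiles(n=100) returns the 99 cut points for percentiles 1 to 99.
    cuts = statistics.quantiles(values, n=100, method="inclusive")
    low, high = cuts[lower_pct - 1], cuts[upper_pct - 1]
    kept = [v for v in values if low <= v <= high]
    return statistics.fmean(kept)

# Hypothetical reported incomes (10,000 won) with one very large value.
incomes = [3000] * 19 + [100000]
print(statistics.fmean(incomes), trimmed_mean(incomes))
```

In this toy sample the single very large report pulls the plain mean far above the typical income, while the trimmed mean is unaffected, which mirrors the reason for trimming in the comparison with the population.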

2.2.3 Stated Marital Preferences

The company surveys users, asking them to rank the three most important traits for their prospective spouse, as well as any religion or geographic location that they wish to avoid (see Appendix Table A.1). Male users most often choose appearance as their top priority (44.6 percent), followed by personality (33.7 percent), and occupation and income (11.0 percent). In contrast, female users choose occupation and income (55.6 percent) most often, followed by personality (26.8 percent), and appearance (5.1 percent). A Kolmogorov-Smirnov test shows that the distribution of female users' top priority is statistically different from that of male users. This gender difference in stated marital preferences is consistent with the findings in Fisman et al. (2006) and Hitsch et al. (forthcoming), both of whom find that women put greater weight on income while men respond more to physical attractiveness. Most users are open to all religions and geographic locations.

2.2.4 Search System and Dating Outcomes

Each user can find a partner for a date in two ways: he/she can search the company's database independently or have the company suggest a partner. In the first case, the user accesses the company's database via a website. The database contains users' profiles with the user's photograph, education level, names of schools attended, occupation, geographic location, birth order, and number of siblings. For online security and privacy reasons, the company does not immediately reveal income, weight, parental marital status, or parental wealth, but this information can be obtained prior to a first date by asking a staff member. Having found a suitable profile, the user can then send an electronic note to propose a first date (a user-initiated proposal). Users cannot initiate a proposal to other users if they have an ongoing relationship with any user of the company.
In the second case, the company may introduce two users based on its algorithm (a company-initiated proposal). First, the company assigns each user a single-dimensional index (called OSI) based on all of the user's observable characteristics except geographical location, marital status, religion, and age. The OSI is intended to measure the extent to which a user should be attractive to the opposite sex as a spouse. The OSI ranges from 25 to 98, and the higher the OSI, the more attractive a user is expected to be. The weight assigned to each characteristic is based on surveys of the company's staff members, who are experienced in assisting users. Note that the way the weights are assigned remains the same throughout the period covered by my dataset. Next, the company selects a male and a female user whose OSIs are, on average, similar to each other among users who have no ongoing relationship; it then sends an electronic note to the two users, along with each other's profile. This means that, more often than not, the company's algorithm generates a proposal between two users whose observables are similar to each other. However, there are still large variations in partners' index (and thus observables) among company-initiated proposals.

The large variation among partners' observables is important for identifying users' preferences for spousal traits. To see this point, consider a case in which users are the same except for educational attainment and the company generates a proposal to two users only if the two have the same educational attainment. Then, the probability of a user accepting a date (or marriage) depends only on an unobservable shock; thus, we cannot learn people's preferences on spousal educational attainment by analyzing the dataset.

Table 4: Search System

The top panel shows the distribution of partners' overall attractiveness index depending on users' index quintiles. Q1 is the lowest quintile and Q5 is the top quintile. Statistics in parentheses present the cumulative density of the corresponding statistics. The middle and bottom panels show the distribution of partners' educational attainment and facial grade, respectively, given the men's own characteristics. Statistics in parentheses show the fraction of females who have the corresponding characteristics.

Quintile of users' own overall index (OSI):      Q1*       Q2        Q3        Q4        Q5
Men: partner's index
  Mean (cumulative density, %)                 (18.34)   (30.98)   (44.56)   (59.95)   (76.01)
  SD
  MIN (cumulative density, %)                   (0.01)    (0.01)    (0.04)    (0.01)    (0.04)
  MAX (cumulative density, %)                  (99.71)   (99.92)   (99.99)   (99.99)  (100.00)
Women: partner's index
  Mean (cumulative density, %)                 (30.34)   (45.70)   (58.11)   (70.78)   (83.95)
  SD
  MIN (cumulative density, %)                   (0.01)    (0.01)    (0.01)    (0.01)    (0.03)
  MAX (cumulative density, %)                  (99.88)   (99.99)   (99.99)   (99.99)  (100.00)
* Q1 refers to the lowest quintile and Q5 to the highest.

Men's educational attainment (columns): High School; Tech./Univ.; Master/Ph.D.
Women's educational attainment (rows):
- High School (8.59)
- Tech. College or University (75.89) 60.95
- Master's or Ph.D. (15.52) 5.02

Men's facial grade (columns): A; B ~ C; D ~ F
Women's facial grade (rows):
- A (8.74)
- B ~ C (79.90)
- D ~ F (11.36)

Table 5: Description of Search Outcomes

                                   Proposals   First Date   Second Date   Marriage**
Men
  No. of users with obs.>0*            9,538        8,911         6,690        1,370
  [Percentage out of all users]        [100]      [93.43]       [70.14]      [14.37]
  Median
  Mean
  Standard Deviation
Women
  No. of users with obs.>0*           11,151       10,006         7,351        1,409
  [Percentage out of all users]        [100]      [89.73]       [65.92]      [12.64]
  Median
  Mean
  Standard Deviation
Proposals
  No. of all proposals               360,509       58,845        14,886        1,537
  [conditional survival rate]                    [16.32%]      [25.30%]     [10.32%]
  No. of user-initiated proposals     44,986        4,547            1,
  [conditional survival rate]                    [10.11%]      [26.63%]     [10.57%]

* The unit of observation is a proposal which reaches each stage. For example, users with obs.>0 for a second date means the number of users who have at least one proposal that reaches the second date.
** There is a discrepancy between the number of male and female users who eventually married because 185 male users and 224 female users married persons who joined the matchmaking company prior to

The distribution of OSIs for partners suggested by the company is shown in Table 4. I classify the members into ten groups based on gender and quintile of their own OSI. For each of the groups, I calculate the mean, standard deviation, minimum, and maximum value of the partners' index. To gauge the magnitude of the statistics, I also include the cumulative density of the corresponding statistics in parentheses. For example, the first row shows that 18.34 percent of female users have an OSI lower than the average OSI of women suggested to men in the first quintile (Q1). Regardless of a user's own OSI quintile, the minimum OSI among partners belongs to the first percentile and the maximum OSI among partners is in the 99th percentile.
Similarly, I find that users receive suggestions to meet all types of partners in terms of education, facial grade, marital history, and other characteristics.

Once a proposal is made, either by the company or by a user, the company contacts the users to check whether they would like to have a first date. If two users agree to have a first date, then the company contacts each of them after the first date and asks whether they would like to meet again for a second date. This response is recorded. Although the company does not examine the results of any subsequent dates in the same automatic fashion, a staff member assigned to each user regularly contacts his/her user and follows up on whether the proposal eventually resulted in marriage.

Table 5 shows that there are 9,538 male users and 11,151 female users. All users in the dataset have at least one proposal; about 91 (68) percent of users have at least one actual first (second) date; about 13 percent of users get married to a person they found through the matchmaking company. The median user has about 27 proposals, 4 first dates, and 2 second dates. There are 360,509 proposals in the dataset, 16.3 percent of which (58,845 proposals) reach a first date. Among the proposals reaching a first date, 25.3 percent reach a second date, and among the proposals reaching a second date, 10.3 percent result in a marriage. As shown in the bottom panel of Table 5, user-initiated proposals constitute only 12.5 percent of the total⁷ and their probability of reaching a first date is 10 percent, much lower than the average of company-initiated proposals (17 percent). This fact suggests the possibility that the company's recommendation of a dating partner may affect a user's decision, especially for a first date.

2.2.5 Patterns of Sorting

Table 6 presents the degree of sorting among users over the stages of the relationship. I calculate statistics measuring the degree of sorting for three groups: pairs who both wanted to have a first date, pairs who both wanted to have a second date, and couples who married. Column (1) presents the corresponding statistics among pairs formed by randomly drawing a man and a woman among users (random matching). The difference in sorting between the actual outcomes and random matching reveals the degree of sorting.
Table 6 shows that users positively sort on all dimensions, and the degree of sorting across various dimensions is generally similar at different relationship stages.

7 The high ratio of company-initiated proposals to user-initiated proposals can be attributed to two factors. First, we can observe a user-initiated proposal only if at least one user wants to have a first date. Thus, the total number of profiles users reviewed for a first date is not necessarily smaller than the number of company-initiated proposals. Second, the company frequently sends a proposal to each user when the user is not in an ongoing relationship. For example, when a user declines a proposal, the company proposes a first date with another user within four days (median value).

population data, whereas those in column (7) are computed using weights based on women. In column (8), measures of sorting along industry and income are computed using weights based on men and women as shown in the Basic Statistical Survey of Wage Structure, because the HIS is not a representative sample of workers. When the HIS is used, the statistics using weights based on husbands are presented first, followed by those based on wives' weights. Def. 1 classifies education into 4 categories (high school or less, technical college, university, and Master's or Ph.D.), and Def. 2 classifies education as either high school or less or college or more.

Table 6: Sorting Pattern
Columns: (1) Random; All Proposals: (2) 1st date, (3) 2nd date, (4) Married; User-Initiated Proposals: (5) 1st date, (6) 2nd date, (7) Married
Rows: Mean difference of age; Fraction of couples with the same education, same marital history, same region, same hometown, same industry; Income correlation

Generally, the sorting patterns in the user-initiated proposals are comparable to the overall sorting (columns (5) to (7)), although the degree of sorting along region and hometown is lower.

3 Empirical Framework

In analyzing a user's decision problem, one faces the choice of using a fully specified dynamic model or using a model approximating the user's optimization problem. I choose the second option for the following reason. Although my dataset is exceptionally rich compared to standard datasets on dating and marriage, three important data limitations related to measuring opportunity cost are impediments to the first option. First, the dataset does not have information on the date when a user decided to continue or discontinue a relationship. This information is key for determining how long a user needs to wait for a partner's response, which is essential to quantify the opportunity cost of accepting a proposal.
Second, there is no information about the number of dates between the second date and marriage or, if the couple did not eventually marry, about which partner ended the dating relationship. Third, the dataset has little information on a user's own search process, such as what other profiles he/she browses in the database or whom he/she meets outside the matchmaking service. Thus, I employ a model in which a user's opportunity cost of accepting a date is approximated as a function of the user's observable characteristics, instead of introducing the additional assumptions that would be necessary for estimating a fully specified dynamic model with my data.

It is important to note that my model, which is described in the sections below, is consistent with the empirical facts shown in Section 2. I allow for the possibility of gender-specific preferences because the distribution of the stated importance of spousal traits differs between men and women. I introduce learning about a partner's traits over the course of the relationship to generate multiple dates with the same partner. In addition, my model generates some qualitative predictions similar to those of a fully specified dynamic model (see Section 3.1). The remainder of this section presents an empirical model, discusses issues that arise in estimating the model, and presents identification and estimation methods.

3.1 Individual's Problem

I begin by introducing some terminology and notation. A pair $(m, w)$ refers to a specific combination of man $m$ and woman $w$ and also to the proposal between $m$ and $w$, since the company will introduce $m$ and $w$ only once. Subscript $s \in \{1, 2, 3\}$ indicates the stage of the relationship for two users. Stage 1 represents the decision to have a first date, stage 2 represents the decision to have a second date, and stage 3 contains the marriage decision. Superscript $M$ or $W$ indicates the gender of the decision maker in the pair. A binary variable $Y_s^M(m, w)$ is one if man $m$ wants to continue a relationship with $w$ at stage $s$ and zero otherwise. Likewise, $Y_s^W(m, w)$ is one if woman $w$ wants to do so. I define the outcome of a proposal between $m$ and $w$ as the sequence $\{Y_1^M(m, w), Y_1^W(m, w), Y_2^M(m, w), Y_2^W(m, w), Y_3(m, w)\}$, where $Y_3(m, w)$ is the product of the two users' responses at $s = 3$ (i.e., $Y_3(m, w) = Y_3^M(m, w)\, Y_3^W(m, w)$). Note that $Y_2^M(m, w)$ and $Y_2^W(m, w)$ are observable only if $Y_1^M(m, w) = Y_1^W(m, w) = 1$, and $Y_3(m, w)$ is observable only if $Y_2^M(m, w) = Y_2^W(m, w) = 1$.
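The observability rule for a proposal's outcome can be sketched in code. This is a minimal illustration under stated assumptions, not the paper's implementation: the class name `ProposalOutcome`, the function `observe`, and the use of `None` for unobserved responses are all hypothetical names introduced here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProposalOutcome:
    """Observed outcome of one proposal (m, w); None marks an unobserved response."""
    y1_m: int                      # man's stage-1 response (first date)
    y1_w: int                      # woman's stage-1 response
    y2_m: Optional[int] = None     # man's stage-2 response (second date)
    y2_w: Optional[int] = None     # woman's stage-2 response
    y3: Optional[int] = None       # joint marriage outcome, Y3 = Y3^M * Y3^W


def observe(y1_m, y1_w, y2_m, y2_w, y3_m, y3_w) -> ProposalOutcome:
    """Mask latent responses according to the observability rule in the text."""
    out = ProposalOutcome(y1_m, y1_w)
    # Stage-2 responses are observed only if both users accepted a first date.
    if y1_m == 1 and y1_w == 1:
        out.y2_m, out.y2_w = y2_m, y2_w
        # Only the joint marriage decision is observed, and only after
        # both users accepted a second date.
        if y2_m == 1 and y2_w == 1:
            out.y3 = y3_m * y3_w
    return out
```

For instance, `observe(1, 0, 1, 1, 1, 1)` keeps only the two stage-1 responses, because the woman declined the first date and nothing later can be observed.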
Because the notation is symmetric, from now on I describe the model for the case where $m$ receives a proposal from the company to date $w$. Let $U^M(m, w)$ denote $m$'s utility from marrying $w$, and let $U_s^M(m, w)$ denote the corresponding expected utility given the information available at stage $s$. $R_s^M(m)$ denotes the utility from ending a relationship at stage $s$, which is the expected utility from waiting for a new proposal in the next period. I assume $R_s^M(m)$ depends on the stage because the number of days a user waits for a partner's response may vary by stage. $p_s^M(m, w)$ is $m$'s expected

probability, evaluated at stage $s$, that $m$ and $w$ eventually get married. If $m$ wants to continue a relationship with $w$ at stage $s$ but $m$ and $w$ eventually do not marry each other, then $m$ receives the utility $d_s^M(m, w)$. $d_s^M(m, w)$ can be interpreted as the utility from just having a first (second) date if it is positive, or as the disutility from being rejected at each stage of the relationship if it is negative. If man $m$ wants to continue a relationship with woman $w$ at stage $s$, then he will get $U_s^M(m, w)$ with probability $p_s^M(m, w)$ and $R_s^M(m) + d_s^M(m, w)$ with probability $1 - p_s^M(m, w)$. If he does not want to continue, he receives the utility $R_s^M(m)$. Comparing these two payoffs and dividing by $p_s^M(m, w)$ shows that, if $p_s^M(m, w) > 0$, the condition under which $m$ wants to continue a relationship with $w$ at stage $s$ (i.e., $Y_s^M(m, w) = 1$) is

$$U_s^M(m, w) + \frac{1 - p_s^M(m, w)}{p_s^M(m, w)}\, d_s^M(m, w) - R_s^M(m) > 0. \tag{1}$$

In the rest of this paper, I use the term reservation utility to refer to the sum of the last two terms in Eq. (1). This model generates two qualitative predictions that would arise from a fully specified dynamic model. First, in a fully specified dynamic model, $m$ can accept a date with $w$ because he has an option value to reject her at stage 2 or 3. My model can generate this prediction if $m$ gets sufficiently large utility from just having a date and the expected probability of eventually marrying $w$ is not one (e.g., $d_1^M(m, w) > 0$ and $p_1^M(m, w) < 1$). Second, in a fully specified dynamic model, $m$ can reject a date with $w$ because $m$ expects the probability that $w$ wants to marry him to be low and he will suffer if he wants a date with $w$ but $w$ rejects him. My model also allows for this possibility: if $m$ has a large disutility from rejection and the probability of eventually marrying $w$ is not one (i.e., $d_s^M(m, w) < 0$ and $p_s^M(m, w) < 1$), then $m$ can reject a date (or marriage) with $w$.

3.2 Preferences

Let $X_m$ and $X_w$ be the vectors of characteristics of man $m$ and woman $w$, respectively.
$X_m(i)$ ($X_w(i)$) denotes its $i$-th element. The utility that $m$ receives from marrying $w$ is a

function of the observable attributes of $m$ and $w$ and a pair-specific random utility $\epsilon_{m,w}^M$:

$$U^M(m, w) = \sum_i \left\{ \alpha_i^M X_m(i) + \beta_i^M X_w(i) + \gamma_i^M\, h\big(X_m(i), X_w(i)\big) \right\} + \epsilon_{m,w}^M \tag{2}$$

where $h(x, y) = (x - y)^2$ if $x$ and $y$ are continuous, and $h(x, y) = \mathbf{1}(x = y)$ otherwise. The variable $\epsilon_{m,w}^M$ summarizes the characteristics of $w$ that $m$ cares about but that are unobservable to researchers (e.g., personality). It is drawn from a $N(0, (\sigma_\epsilon^M)^2)$ distribution and is independent of $\epsilon_{m,w}^W$, $\epsilon_{m',w'}^M$, and $\epsilon_{m',w'}^W$ for all $(m', w') \neq (m, w)$.

This utility function has two key features. First, it allows men and women to have different utility functions because $\{\alpha^M, \beta^M, \gamma^M\}$ can differ from $\{\alpha^W, \beta^W, \gamma^W\}$. Second, the utility function depends on the interaction between a husband's and a wife's characteristics because of the function $h(X_m(i), X_w(i))$ in Eq. (2). If either of the parameters $\{\gamma^M, \gamma^W\}$ is not zero, then two men may rank potential mates differently depending on their own characteristics, and thus the estimated utility function may imply complementarity between a husband's and a wife's characteristics.

Table 8 presents the attributes that may affect a user's utility from marriage. Some of the attributes require additional explanation. First, the variable facial grade ranges from A to F, where a facial grade of A is the most attractive and F is the least attractive.⁸ Second, the variable hours worked is the average number of hours worked per year given a worker's gender, age group, educational attainment, and industry, constructed from the population wage surveys. I assume that after controlling for income and hours worked, individuals are indifferent about their spouse's industry. I take this approach for reasons of parsimony, in order to reduce the computational burden of estimation. Third, Body Mass Index (BMI) is a height-adjusted measure of weight and ranges between 18.5 and 24.9 for normal-weight adults 20 years old and older.⁹ Fourth, primary care provider is a binary variable that is one if a man is the eldest son or if a woman is the eldest daughter and has no male siblings. This indicates whether a user is likely to be the primary care provider for his or her parents, in which case the user may need to share the burden with his/her spouse. Marital status of parents is a binary variable

8 In the data, the distribution of facial grades is as follows: A (7.1 percent), B (38.3 percent), C (42.7 percent), and D~F (9.6 percent).

9 Source: U.S. Centers for Disease Control and Prevention, Department of Health and Human Services.

that is zero if the biological parents of a user are alive and still married to each other. Finally, I define a binary variable, hometown conflict, that is one if a user from Jeolla meets a partner from Gyeongsang, because substantial political tensions exist between these two regions.

3.3 Expected Utility from Marriage and Learning Processes

At each stage of decision, $m$ forms an expectation of the utility from marrying $w$ based on the available information set $\Omega_{m,w,s}^M$ (i.e., $U_s^M(m, w) = E(U^M(m, w) \mid \Omega_{m,w,s}^M)$). This section presents two types of learning processes that govern $m$'s expectation. In Type 1, $m$ acquires additional information about $w$'s characteristics that are not revealed in the online database but are observable to researchers. In Type 2, $m$ acquires additional information about $w$'s characteristics that are unobservable to researchers (i.e., $\epsilon_{m,w}^M$ in Eq. (2)).

Type 1 Learning Process: Linear Projection

Let $X_w^1$ and $X_w^2$ denote $w$'s characteristics observable to $m$ at stage 1 and at stage 2, respectively.¹⁰ Because the dataset does not provide information about the exact range of a partner's characteristics obtained by a user prior to a first date, I make the following assumption: $X_w^1$ is all the characteristics included in the utility function, except for the four variables (denoted by $X_w^2$) that are not presented in the online database (i.e., income, parental wealth, BMI, and parental marital status). I assume that at stage 1 $m$ predicts $w$'s income as a linear function of her education and hours worked and predicts $w$'s parental wealth as a linear function of her father's educational attainment. I assume that $X_w^1$ is not correlated with $w$'s BMI and parental marital status.¹¹

10 In theory, I could assume that some observable traits become observable only after a second date. However, in that case, estimation is more difficult because, after a second date, only the joint marriage decision is observable, not each user's response regarding marriage.

11 Although I introduce these assumptions to reduce the computational burden, they seem plausible. For example, I find that a user's income is mainly accounted for by education and hours worked, and parental wealth is accounted for by father's education. In an OLS regression of income on the entire set of characteristics, education and hours worked account for over 93 percent of R-squared. In an OLS regression of parental wealth on the entire set of characteristics, father's education accounts for over 50 percent of R-squared. For BMI, over 92 percent of users have a normal weight.
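The Type 1 assumption, that a user predicts a partner's income from her education and hours worked by a linear projection, can be illustrated with a short sketch. Everything below is hypothetical: the data are synthetic, education is coded as a single numeric index for brevity (the paper treats education as categorical), and the helper names `fit_linear_projection` and `predict` are introduced here only for illustration.

```python
import numpy as np


def fit_linear_projection(Z, y):
    """OLS coefficients (intercept first) from projecting y on the columns of Z."""
    Z1 = np.column_stack([np.ones(len(Z)), Z])
    coef, *_ = np.linalg.lstsq(Z1, y, rcond=None)
    return coef


def predict(coef, Z):
    """Linear prediction of y given characteristics Z."""
    Z1 = np.column_stack([np.ones(len(Z)), Z])
    return Z1 @ coef


# Synthetic, purely illustrative data (not the paper's dataset).
rng = np.random.default_rng(0)
n = 500
educ = rng.integers(0, 4, size=n).astype(float)   # hypothetical education index
hours = rng.normal(2000.0, 200.0, size=n)         # hypothetical annual hours worked
log_income = 1.0 + 0.3 * educ + 0.001 * hours + rng.normal(0.0, 0.1, size=n)

# Stage-1 expectation: income is not shown online, so it is replaced by its
# linear projection on characteristics that are shown (education, hours).
Z = np.column_stack([educ, hours])
coef = fit_linear_projection(Z, log_income)
expected_log_income = predict(coef, Z)
```

In this sketch the stage-1 expected utility would use `expected_log_income` in place of the true value, and the true value would enter only from stage 2 onward.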

3.3.2 Type 2 Learning Process: Bayesian Updating

I assume that man $m$ receives a noisy signal $\zeta_{m,w,s}^M$ of woman $w$'s true type $\epsilon_{m,w}^M$ when the two actually meet in person (i.e., at stage $s$ with $s \geq 2$). I assume that a signal $\zeta_{m,w,s}^M$ is the sum of the true type $\epsilon_{m,w}^M$ and noise $\nu_{m,w,s}^M$. The noise is assumed to be normally distributed with mean zero and variance $(\sigma_\nu^M)^2$. Man $m$ uses Bayes' rule to update the expectation of $\epsilon_{m,w}^M$ from the observed signals.¹² The assumption of no Type 2 learning at $s = 1$ is used for identification and discussed further in Section 3.6. Given the information set at stage $s$, the distribution of $\epsilon_{m,w}^M$ can be written as:

$$
\begin{aligned}
\epsilon_{m,w}^M \mid \Omega_{m,w,1}^M &\sim N\left(0,\ (\sigma_\epsilon^M)^2\right) \\
\epsilon_{m,w}^M \mid \Omega_{m,w,s}^M &\sim N\left( \frac{(\sigma_\nu^M)^{-2} \sum_{i=2}^{s} \zeta_{m,w,i}^M}{(\sigma_\epsilon^M)^{-2} + (s-1)(\sigma_\nu^M)^{-2}},\ \frac{1}{(\sigma_\epsilon^M)^{-2} + (s-1)(\sigma_\nu^M)^{-2}} \right) \quad \text{for } s \geq 2
\end{aligned} \tag{3}
$$

where

$$\Omega_{m,w,1}^M = \{X_m, X_w^1\}, \quad \Omega_{m,w,2}^M = \{X_m, X_w^1, X_w^2, \zeta_{m,w,2}^M\}, \quad \Omega_{m,w,3}^M = \{X_m, X_w^1, X_w^2, \zeta_{m,w,2}^M, \zeta_{m,w,3}^M\}.$$

Having multiple dates with $w$ improves the precision of $m$'s prediction of $\epsilon_{m,w}^M$, since the conditional variance of $w$'s unobserved attributes, $Var(\epsilon_{m,w}^M \mid \Omega_{m,w,s}^M)$, decreases in $s$.

3.4 Reservation Utility

I assume that $R_s^M(m)$ depends on four components. The first component, $c_s^M$, is a gender-stage-specific common component. The second component, $L_m$, is the number of singles of the opposite sex per km$^2$ in the region where $m$ lives. This component captures the option value of finding a spouse outside the matchmaking service.¹³ The third component, a user-specific random utility $\eta_m$, incorporates unobserved user characteristics, such as willingness to marry. The fourth component, $\omega_{m,w,s}^M$, is a purely idiosyncratic shock that is correlated with neither observables nor other random variables. It is normally distributed and its variance at stage 1 is assumed to be one. Thus, I have:

$$R_s^M(m) = c_s^M + \chi^M L_m + \eta_m + \omega_{m,w,s}^M \tag{4}$$

with $\eta_m \sim N(0, (\sigma_\eta^M)^2)$, $\omega_{m,w,s}^M \sim N(0, (\sigma_{\omega,s}^M)^2)$, and $(\sigma_{\omega,1}^M)^2 = 1$.

Next, I assume $p_s^M(m, w)$ is the likelihood that a man whose type is the same as $m$ would marry a woman whose type is the same as $w$, if the man were to get married. Using the marriage registers (MR), I define the type of a person based on age group, education, and location. I then compute $p_s^M(m, w)$ by dividing the number of new marriages between men whose type is the same as $m$ and women whose type is the same as $w$ by the number of new marriages by men whose type is the same as $m$. Lastly, I assume $d_s^M(m, w)$ to be constant given gender and stage ($d_s^M$).

12 Examples of papers that employ a Bayesian learning process include Parent (2002), Gibbons et al. (2005), and Brien et al. (2006).

13 I examined an alternative specification using both $L_m$ and the sex ratio. I find that the sex ratio is not statistically significant at a conventional level after controlling for $L_m$.

3.5 Issues Regarding Estimation

Non-randomness of Proposals

Proposals in the data are not randomly generated. With the company-initiated proposals, the non-randomness generated by the company's algorithm results in over-sampling of observations when two users' OSIs (and thus observables) are similar. The user-initiated proposals are observed only if at least one user wants a first date with the other user. When estimating the model, I use only company-initiated proposals for two reasons: first, the majority of proposals (87 percent) are company-initiated; second, I do not have information about whom the user browsed without asking out. Without information on a user's own search process, we cannot weigh the importance of one user-initiated proposal relative to one company-initiated proposal.

To address the non-randomness due to the company's algorithm, I construct weights as described below. I first classify users into 1,136 groups based on gender, OSI decile, age group, geographical location, and marital history. I then compute the probability that a user in group $i$ gets a proposal with a user in group $j$.
For proposals between groups $i$ and $j$, I use as weights the probability of observing a proposal between $i$ and $j$ under random matching divided by the observed probability.¹⁴

14 To see the need to use weights, consider a simple example as follows: $Y_m^* = \alpha_1 + \alpha_2 \mathbf{1}(X_m = X_w) + \epsilon$, where $Y_m = \mathbf{1}(Y_m^* > 0)$, $X_m, X_w \in \{0, 1\}$, and $\epsilon \sim N(0, 1)$. Then, the estimate of $\alpha_2$ is $\Pr(Y_m = 1 \mid X_m = X_w) - \Pr(Y_m = 1) = (1 - \Pr(X_m = X_w))(\Pr(Y_m = 1 \mid X_m = X_w) - \Pr(Y_m = 1 \mid X_m \neq X_w))$. If the company's algorithm generates proposals for people whose types are similar to each other, then $(1 - \Pr(X_m = X_w))$ among the observed proposals is lower than $(1 - \Pr(X_m = X_w))$ under random matching.

Table 7: Year of Membership Purchase and Representation in the Sample (distribution of users' tenure by first year of membership purchase)
Columns: Men: (1) Users, (2) Proposals; Women: (3) Users, (4) Proposals
Rows: year of membership purchase, including a partial-year (Jan ~ June) row; Sum

Censoring

Of the company-initiated proposals, 2.6 percent are censored, either because two users had a first date but the data do not have information about their second date or marriage, or because the two users had the first two dates but the data do not have information about whether or not they married. To estimate the model, I assume that the censoring occurs at random. Because only a small fraction of proposals are censored, the estimation results change little when I alternatively assume that none of the censored proposals eventually resulted in marriage.

Sample Distribution of User-Specific Unobserved Reservation Utility

I assume that the distribution of user-specific unobserved reservation utility ($\eta_m$ in Eq. (4)) in my data is the same as the population distribution, assumed to be $N(0, (\sigma_\eta^M)^2)$ for men and $N(0, (\sigma_\eta^W)^2)$ for women. However, it is possible that those who have a high value of $\eta_m$ may remain at the service and thus be over-represented in the proposals. Alternatively, it is also possible that those who have a high value of $\eta_m$ may get disappointed by the quality of other users and stop using the service, thus making them under-represented.¹⁵ Table 7 shows that, except for people joining in the first half of 2006, the number of new users and their share in the proposals remain similar across users' years of membership purchase. Thus, this assumption can be plausible if, among users who join the company in the same year, the distribution of $\eta_m$ for men who left the service early is balanced out by those who used the service longer. I discuss the robustness check regarding this issue in a later section.

15 For example, about 13 percent of users who started to use the service between 2002 and 2005 stopped using the service after three months although they did not get married during this period and could have used the service for nine more months.

Identification and Estimation Methods

Parameters are identified only up to scale because a user's response at a given stage is binary (i.e., whether to continue the relationship with a partner). Thus, I normalize the total variance at the first stage to one by assuming that the variance of the random shocks $\omega_{m,w,1}^M$ and $\omega_{m,w,1}^W$ is one and that users do not receive a noisy signal of a partner's type at the first stage. Because of the restrictions across stages, the variances of the composite random variables at stages 2 and 3 are identified (see Appendix A for the details).

I use a Laplace type estimator (LTE) as suggested by Chernozhukov and Hong (2003), who show that the LTE performs well in applications where the parameter dimension is high and many local optima exist. The LTEs are defined similarly to Bayesian estimators, but use more general objective functions, such as a method-of-moments criterion, in place of the likelihood function in Bayesian estimators. For the LTE, I define the objective function to minimize the distance between actual moments and simulated moments from the model. The moments of interest consist of five categories: the probability of accepting a first date for a man and that for a woman, the probability of accepting a second date for a man and that for a woman, and finally the probability of a man and a woman getting married to each other. Each probability is multiplied by instrumental variables that are the users' observable characteristics. In total, I have 172 identifiable parameters and 245 moments (see Appendix B).
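To make the idea of a Laplace type estimator concrete, the following toy sketch draws from a quasi-posterior proportional to the exponential of minus a method-of-moments objective, using random-walk Metropolis, and reports the mean of the draws. It is a deliberately simplified illustration and not the paper's estimator: there is one parameter and one analytical moment rather than 172 parameters and 245 simulated moments, the prior is assumed flat, and the function names, step size, and number of draws are assumptions made here.

```python
import numpy as np


def gmm_objective(theta, x):
    """(n/2) * g' W g for the single moment g = mean(x) - theta, with W = 1/var(x)."""
    g = x.mean() - theta
    return 0.5 * len(x) * g * g / x.var()


def laplace_type_estimate(x, n_draws=5000, burn_in=1000, step=0.1, seed=0):
    """Quasi-posterior mean under a flat prior, computed by random-walk Metropolis."""
    rng = np.random.default_rng(seed)
    theta = 0.0
    log_q = -gmm_objective(theta, x)       # log quasi-posterior, up to a constant
    draws = np.empty(n_draws)
    for t in range(n_draws):
        proposal = theta + step * rng.normal()
        log_q_prop = -gmm_objective(proposal, x)
        # Accept with probability min(1, exp(log_q_prop - log_q)).
        if np.log(rng.uniform()) < log_q_prop - log_q:
            theta, log_q = proposal, log_q_prop
        draws[t] = theta
    return draws[burn_in:].mean()


# Synthetic data with true mean 2.0 (illustrative only).
x = np.random.default_rng(1).normal(2.0, 1.0, size=200)
theta_hat = laplace_type_estimate(x)
```

Because the estimate is an average of draws rather than the result of a local optimizer, this construction does not need to locate the global minimum of the objective directly, which is the property the text cites as the reason for using the LTE.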
4 Estimation Results

This section presents the estimated parameters of the model, trade-offs between spousal income and other traits, and the model fit.

Table 8: Estimation Results (Baseline Model)
This table presents the estimation results for users' surplus from marriage. Details of the regressors are in Appendix Table A.2. For each variable, the table reports an estimate and a standard error (SE) separately for men and for women.

Variable | Source and unit
Age: own | Birth certificate, 10 yrs
Age: spouse | Birth certificate, 10 yrs
Age: sq. diff. | Birth certificate, 10 yrs
Edu: own ≤ high school | Diploma
Edu: own = tech. college | Diploma
Edu: own = master's or Ph.D. | Diploma
Edu: spouse ≤ high school | Diploma
Edu: spouse = tech. college | Diploma
Edu: spouse = master's or Ph.D. | Diploma
Edu: own = spouse | Diploma
Industry: own = spouse | Proof of employment
Hours worked: own | Author's calculation
Hours worked: spouse | Author's calculation
Hours worked: sq. diff. | Author's calculation
Log income: own | Reported*
Log income: spouse | Reported*
Log income: sq. diff. | Reported*
Dad's edu: own ≤ high school | Reported
Dad's edu: own = tech. college | Reported
Dad's edu: own = master's or Ph.D. | Reported
Dad's edu: spouse ≤ high school | Reported
Dad's edu: spouse = tech. college | Reported
Dad's edu: spouse = master's or Ph.D. | Reported
Dad's edu: own = spouse | Reported
Log parental wealth: own | Reported*
Log parental wealth: spouse | Reported*
Log parental wealth: sq. diff. | Reported*
Facial grade: own = A | Company's evaluation
Facial grade: own = B | Company's evaluation
Facial grade: own = D~F | Company's evaluation
Facial grade: spouse = A | Company's evaluation
Facial grade: spouse = B | Company's evaluation
Facial grade: spouse = D~F | Company's evaluation
Facial grade: own = spouse | Company's evaluation
Height: own | Reported, 1 meter
Height: spouse | Reported, 1 meter
Height: sq. diff. | Reported, 1 meter
Body Mass Index: own | Reported
Body Mass Index: spouse | Reported
Body Mass Index: sq. diff. | Reported
Marital history: own = ever divorced | Legal documents
Marital history: spouse = ever divorced | Legal documents
Marital history: own = spouse | Legal documents
Primary care provider: own = yes | Legal documents
Primary care provider: spouse = yes | Legal documents
Primary care provider: own = spouse | Legal documents

Table 8: Estimation Results (cont.)

Variable | Source and unit
Region: own = spouse | Legal documents
Religion: own = spouse | Reported
Hometown: own = spouse | Legal documents
Hometown conflict: yes | Legal documents
Parental marital status: own | Legal documents
Parental marital status: spouse | Legal documents
Parental marital status: own = spouse | Legal documents
Density: own | Author's calculation
Inverse of the success rate at s=1 | Author's calculation
Inverse of the success rate at s=2 | Author's calculation
Inverse of the success rate at s=3 | Author's calculation
s.d. of composite shocks at s=2 ($\sigma_2^M$)
s.d. of composite shocks at s=3 ($\sigma_3^M$)
cov. btw. shocks at s=2 and s=3 ($k^M$)
s.d. of random reservation util. ($\sigma_\eta^M$)
No. of proposals: 165,896
No. of users: 14,818
* The unit of income (parental wealth) is 10,000 won. The log of the variable is divided by 10 for scaling.

4.1 Net Utility from Marriage

The estimated parameters described in Section 3 are presented in Table 8 in such a way that a positive coefficient on a variable implies that, ceteris paribus, a user is more likely to want to continue a relationship with a partner as the value of the variable increases. Many traits are statistically significant at a conventional level for explaining dating and marriage decisions, suggesting that people consider a large number of partner traits when they make their decisions on dating and marriage. In particular, parental socioeconomic status, such as father's education and parental wealth, still affects people's decisions, even after controlling for a large number of individual characteristics.¹⁶ This finding suggests that the impact of family background on marital sorting can be important in studying intergenerational mobility.

The estimated parameters governing the utility from the interaction between a husband's and a wife's traits (i.e., $\gamma^M$ and $\gamma^W$ in Eq. (2)) are statistically different from zero for many traits.
This implies that people may have different preference rankings for potential mates, depending on their own characteristics. For instance, consider age. On average, male users receive higher utility by marrying a young partner; a one-unit increase in spousal age lowers the utility from marriage. However, the larger the age difference between a husband and a wife, the lower the utility. Thus, the optimal age of a user's spouse can vary by the user's own age. Figure 2 (the solid line) plots the expected utility that the median man (33 years old) will receive from marrying a woman, as a function of the woman's age. The graph shows that, ceteris paribus, the median man considers a 27-year-old woman to be ideal. Similarly, the median woman (30 years old) considers a 32-year-old man to be ideal (the dashed line in Figure 2). Preferences for similar types are also observed for height and marital history. On the other hand, for some characteristics, the utility from marrying a good type dominates the utility from marrying a similar type. For example, regardless of a user's own facial grade, both men and women strictly prefer a spouse with better facial features (i.e., A > B > C > {D~F}). For spousal income, the more a spouse earns, the higher the utility both the median man and the median woman enjoy in marriage (Figure 3).

Figure 2: Utility from Spouse's Age (vertical axis: utility from marriage; separate lines for men and women)
Figure 3: Utility from Spouse's Income (vertical axis: utility from marriage; separate lines for men and women)

16 Charles et al. (2006) also find positive marital sorting by parental wealth, even after controlling for individual characteristics, among married couples in the United States.

Interesting gender differences in preferences are observed for educational attainment and father's educational attainment. Consider two men ($m_1$, $m_2$) and two women ($w_1$, $w_2$). Suppose the highest educational attainment of $m_1$ and $w_1$ is a college degree and that of $m_2$ and $w_2$ is a master's degree. The estimation results suggest that man $m_1$ receives higher utility from marrying $w_1$ than marrying $w_2$ (0.15 vs. ). Similarly, man $m_2$ receives higher utility from marrying $w_2$ than marrying $w_1$ (0.03 vs. 0.00). Therefore, the two men prefer marrying a woman whose educational attainment is the same as theirs. In contrast, both women receive higher utility from marrying $m_2$ than

marrying $m_1$. Thus, women prefer marrying a man with high educational attainment regardless of their own.¹⁷ We observe the same pattern for father's educational attainment.

People in a region where there are many singles of the opposite sex have a higher reservation utility. This may reflect the fact that a high density of available singles increases the opportunity of finding a spouse more attractive than the current partner. A user's expectation of the probability of marriage affects the user's responses differently across stages. For example, if there are many marriages between men and women whose types are the same as $m$ and $w$, respectively, then $m$ is more likely to accept a first date with $w$, but less likely to accept a second date with $w$. The estimated covariance of the composite random shocks implies that, compared to having only a first date, having a second date reduces uncertainty due to imperfect information about the partner's unobservable type by up to 47 percent for men and 20 percent for women (see Appendix A).

4.2 Trade-offs between Spouse's Income and Traits

I compute the trade-offs between spousal income and other traits to gauge the magnitude of these traits' contribution to the utility from marriage. I take the median man and woman in terms of all observable characteristics and compute how much of a spouse's annual income they are willing to forgo to marry someone who is identical to their spouse along all but one dimension. Columns (1) and (2) of Table 9 report the results. For example, the median man whose facial grade is C is willing to forgo about 159 million won to marry a person whose facial grade is two notches higher than the median woman (i.e., facial grade C to A). The columns show that having attractive facial features and having a desirable height provide a user's spouse with a sizable utility, which would have to be compensated by a large amount of income.
To marry a spouse with facial grade A, men are willing to forgo much more spousal income than women, which is consistent with the findings in Fisman et al. (2006) and Hitsch et al. (forthcoming). Interestingly, however, women's willingness to pay to marry a spouse with the optimal height is comparable to men's. The amount of spousal income a user is willing to forgo in order to marry someone with desirable educational attainment is similar to that for finding a spouse with a desirable father's educational attainment, but less than that for having a mate with facial grade A.

17 For high school educated men and women, preferences for spousal educational attainment depend on the specification (see Tables A.2 and A.3 in the Appendix for further comparison).

Table 9: Trade-offs
The first row shows the preference ranking of partners varying education and facial grade but holding all other conditions constant. The subsequent rows present the annual income of partners that the median men (or women) are willing to forgo in order to change their partner's characteristics.
Columns: Baseline: (1) Median Men, (2) Median Women; Alternative I: (3) Median Men, (4) Median Women
Facial grade: C to A (*); C to B; C to D (**)
Height: Median of the opposite sex: 5'4", 5'8", 5'4", 5'8"; Optimal height: 5'6", 5'10", 5'5", 5'11"; Median to Optimal
Education: Univ to High (high school); Univ to Tech (technical college); Univ to Master's/Ph.D. (**)
Father's Education: High to Tech (technical college); High to Univ (university); High to Master's/Ph.D.
Unit: million won (roughly equivalent to 1,000 US dollars). * Maximum value in the sample, ** Minimum value in the sample.

4.3 Goodness of Fit

Table 10: Model Fit
$P(Y_1^M = 1)$ is the probability of a man accepting a first date; $P(Y_2^M = 1 \mid Y_1^M = Y_1^W = 1)$ is the probability of a man accepting a second date conditional on having a first date; $P(Y_1^W = 1)$ and $P(Y_2^W = 1 \mid Y_1^M = Y_1^W = 1)$ are similarly defined for women. $P(Y_3 = 1 \mid Y_1^M = Y_1^W = Y_2^M = Y_2^W = 1)$ is the probability of a pair getting married after two dates. $R(m, w)$ describes the outcome of a proposal among 8 possible events.
Columns: Data (Company): (1) unweighted, (2) weighted; Prediction: (3) Baseline, (4) Alternative I, (5) Alternative II
Rows: $P(Y_1^M = 1)$; $P(Y_1^W = 1)$; $P(Y_2^M = 1 \mid Y_1^M = Y_1^W = 1)$; $P(Y_2^W = 1 \mid Y_1^M = Y_1^W = 1)$; $P(Y_3 = 1 \mid Y_2^M = Y_2^W = 1)$

To examine the goodness of fit of the model, I present a subset of the 245 moments in Table 10. Columns (1) and (2) show the statistics from the raw data and the weighted ones,

respectively. For example, in the data, a male (female) user on average accepts a first date with the probability of [...] (0.266). When we weight the sample to correct for the nonrandomness of proposals, the weighted probability of accepting a first date becomes [...] for men and [...] for women. Column (3) presents the predicted statistics based on the estimated model. Overall, the model fits the data well, although the predicted acceptance rate for a first date is slightly higher than in the actual data.

5 Counterfactual Analysis

In this section, I examine the outcome of a proposal depending on whether the proposal is initiated by the matchmaking company or by another user. Then, I study the implications of wider use of online search for marital sorting in Korea.

5.1 Role of Online Matchmakers in Mate Search

Around the world, a growing number of online mate-search services recommend potential mates to their users. Therefore, it is important to examine to what extent the recommendation of a potential mate by an online matchmaker affects an individual's decisions regarding dating and marriage. In particular, if recommendations by online matchmakers affect people's marriage decisions, then the algorithms that the matchmakers use for recommendation can change search outcomes, including marital sorting.

Ex ante, it is not clear whether a matchmaker's recommendations would have any impact on people's dating or marriage decisions, because people do not need to take the recommendations. On the other hand, since an individual cannot perfectly observe a potential mate's characteristics, the individual may form his/her expectation of the utility from marriage with the potential mate depending on the channel through which the individual encounters the potential mate: whether a third party (e.g., a matchmaker) introduces them or the potential mate contacts the individual.18 In theory, a user in my dataset may perceive partners suggested by the company as better because the company has more information about them, such as income and family background, or because the user thinks a person with low pair-specific random utility (e.g., an aggressive person) is more likely to ask another user out. Note that in both examples, a user will be more likely to have a first date with a partner if the partner was introduced by the company.

18 Studies on retirement saving decisions document that people are more likely to choose a saving option selected as a default and suggest, as a possible cause of the finding, that people may perceive the default option to be the best based on the assumption that the service provider has more information than they do (e.g., Beshears et al., 2008).

Table 11: Role of Online Matchmakers in Acceptance Behavior
Columns: Data (User) (1); Prediction, Baseline (2), Alternative I (3), and Alternative II (4).
Decision for a first date: male recipients, $P(Y_1^M = 1)$, 0~[...]; female recipients, $P(Y_1^W = 1)$, 0~[...]
Decision for a second date: $P(Y_2^M = 1 \mid Y_1^M = Y_1^W = 1)$; $P(Y_2^W = 1 \mid Y_1^M = Y_1^W = 1)$
Decision for a marriage: $P(Y_3 = 1 \mid Y_2^M = Y_2^W = 1)$
[remaining cell values not preserved]

Therefore, I empirically examine to what extent the recommendation by an online matchmaker affects an individual's decisions, by investigating a user's behavior regarding willingness to continue a relationship. If a proposal is initiated by a user, not by the company, then the dataset does not tell us which of the two users initiated the proposal. However, we can infer who did by analyzing users' responses. Among user-initiated proposals, 77.1 percent were accepted for a first date by men but not by women (thus, male user-initiated); 12.8 percent were not accepted by men but were accepted by women (thus, female user-initiated); and the remaining 10.1 percent were accepted by both men and women (ambiguous case). Therefore, we can compute the maximum (minimum) probability that a female user would accept a male user-initiated proposal for a first date by assuming that all the ambiguous proposals are initiated by male (female) users. I report the range of the actual acceptance rate in column (1) of Table 11 and the predicted rate if the company introduced the two to each other in column (2).
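The bounding argument above can be illustrated with a short calculation. This is only a sketch based on the three shares quoted in the text; the function name and its arguments are hypothetical, and the resulting bounds are my own arithmetic, not values taken from Table 11.

```python
def acceptance_bounds(share_initiator_only, share_both):
    """Bounds on the recipient's first-date acceptance rate.

    share_initiator_only: share of proposals accepted only by the
        presumed initiator's side (recipient declined).
    share_both: share accepted by both sides (initiator ambiguous).
    """
    # Minimum: every ambiguous proposal came from the other side,
    # so no proposal from this side was accepted by its recipient.
    lower = 0.0
    # Maximum: every ambiguous proposal came from this side.
    upper = share_both / (share_initiator_only + share_both)
    return lower, upper


# Shares quoted in the text (percent of user-initiated proposals).
MALE_ONLY, FEMALE_ONLY, BOTH = 77.1, 12.8, 10.1

# Women responding to male-initiated proposals.
female_low, female_high = acceptance_bounds(MALE_ONLY, BOTH)
# Men responding to female-initiated proposals.
male_low, male_high = acceptance_bounds(FEMALE_ONLY, BOTH)

print(round(female_high, 3), round(male_high, 3))
```

Under these assumptions the lower bound is zero for both sexes, which matches the ranges in Table 11 starting at 0.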
I find that a male user is at least 14 percentage points more likely to want to have a first date with a woman if the woman is introduced by the company, as compared to the case where the woman directly asks him out, although the results for females depend on how the ambiguous proposals are classified. Conditional on having a first date, the probability of a proposal reaching a second date or marriage remains similar regardless of who initiates the proposal. I find qualitatively the same patterns regardless of whether a user and his/her partner have the same trait in terms of education, father's education, and industry. This finding implies that the company's recommendation increases the overall acceptance rate but appears not to increase the acceptance rate for certain types of partners more than for others, and thus does not change sorting patterns. This is consistent with the earlier finding in Section 2.2.5 that the patterns of sorting among user-initiated proposals are similar to those among company-initiated proposals.

5.2 Online Mate Search and Marital Sorting

In the top panel of Table 12, I report the fraction of newlyweds in Korea who have a spouse with the same characteristics. I find that marital sorting between 1991 and 2005 changed in the following ways: the probability of an individual marrying a spouse whose trait is the same as his/her own has decreased for hometown, increased for region and marital history (never-married vs. divorced), and remained similar for educational attainment (high school or less vs. college or more). To gauge the extent to which changes in the underlying distribution of people's traits account for this time trend, I perform the following exercise. For each trait, I first compute the fraction of couples who would have a common value for this trait if a man and a woman were randomly selected from among the newlyweds. I then regress a dummy variable indicating whether a husband and a wife have a common value for this trait on a constant, a calendar year, and this fraction. The bottom panel of Table 12 presents the results of this regression for each trait. The results suggest that changes in the underlying distribution of people's traits alone cannot account for the time trend.19

19 The finding of a time trend of marital sorting is robust to probit and logit specifications.

Table 12: Aggregate Trend of Marital Sorting
Columns (1) to (4) in the top panel present the mean of a dummy variable indicating whether newlyweds in the marriage registrar have the same education level, live in the same region, grew up in the same hometown, and have the same marital history, respectively. Numbers in brackets show the corresponding statistics under random matching. Columns (1) to (4) in the bottom panel present a regression analysis, where the dependent variable is the corresponding dummy variable in each column.
Traits: Education* (1), Region (2), Hometown (3), Marital History (4)
Degree of Marital Sorting (year labels and unbracketed values not preserved):
[0.539] [0.303] [0.235] [0.899]
[0.518] [0.320] [0.237] [0.874]
[0.503] [0.331] [0.238] [0.828]
[0.504] [0.350] [0.238] [0.783]
[0.534] [0.356] [0.238] [0.777]
Regression: OLS
Sorting (random matching): [...] (0.002) (0.002) (0.006) (0.001)
Year/[...]: [...] (0.001) (0.001) (0.001) (0.000)
Observations: 1,246,887; 1,250,266; 1,250,266; 1,247,450
R-squared: [...]
* High school or less vs college or more

I perform two exercises to examine the possibility that wider use of online mate-search services accounts for the time trend in marital sorting. Note that in South Korea, the fraction of newlyweds who met their spouse through online mate-search services increased from nearly zero in 1991 to 8 percent in [...]. In the first exercise, I take an average man and an average woman among the newlyweds in the population and compute their likelihood of marrying a spouse with the same traits in the population (column (1) of Table 13). I also compute the corresponding statistics for people who have the same characteristics as the average individuals but use the online matchmaking company (column (3) for the average man and column (4) for the average woman). I compute the statistics in column (3) by weighting the male users to match the distribution of newlywed men in the population in terms of education, marital history, region, hometown, industry, and age group. Similarly, I compute the statistics in column (4) for women.
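One common way to implement the reweighting described above is cell-based post-stratification, in which each user is weighted by the ratio of the population share to the sample share of his or her cell. The sketch below assumes this approach; the paper does not state its exact weighting formula, and the function names, cell labels, and toy data are all hypothetical.

```python
from collections import Counter


def poststratification_weights(sample_cells, population_shares):
    """Weight each sampled user by population share / sample share of the cell."""
    n = len(sample_cells)
    counts = Counter(sample_cells)
    return [population_shares[c] / (counts[c] / n) for c in sample_cells]


def weighted_mean(values, weights):
    """Weighted average of an outcome, e.g. a same-trait dummy."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Toy example: college graduates are over-represented among users.
cells = ["college", "college", "college", "high_school"]
population = {"college": 0.5, "high_school": 0.5}  # assumed shares
same_trait = [1, 1, 0, 0]  # 1 if the user married a same-trait spouse

weights = poststratification_weights(cells, population)
print(weighted_mean(same_trait, weights))
```

In this toy example the unweighted mean is 0.5, while the weighted mean falls to one third because the single high school user stands in for half of the population. In practice a cell would be a combination of education, marital history, region, hometown, industry, and age group.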
Comparing column (1) with columns (3) and (4) shows that if the average person in the population and the corresponding person among the users have the same marital preferences, then using online mate-search services will change his or her likelihood of marrying a spouse with the same trait in the following way: he/she is less likely to marry a spouse of the same type in terms of hometown and industry, but more likely to marry a spouse with the same marital history. The prediction for the probability of marrying a spouse with the same educational attainment varies by sex. Therefore, the wider adoption of online mate-search services could explain the time trend of marital sorting since [...].

Table 13: Adoption of Online Marriage Market Intermediaries and Marital Sorting
Columns (1) and (3) show the marital sorting in the population. Column (2) shows the predicted marital sorting in 1991, if the OLS results in Table 2 are employed and the underlying distribution of traits is the same as in [...]. Column (4) shows calculated marital sorting in the case that nobody found their spouse via online matchmaking services in 2005, and column (5) computes the fraction of changes accounted for by the use of online matchmaking services. Columns (6) and (7) present marital sorting when the entire population uses online matchmaking services. The difference between the two is whether the simulation allows users to prefer being single rather than marrying an available partner (column (6)) or not (column (7)).
Column headings as given: Actual; Prediction; 1991 level; Weighted (men); Weighted (women); Male-opt. stable matching; Female-opt. stable matching; popularity; numbered (1) to (6).
Rows, fraction of couples with: same education*; same marital history; same region; same hometown; same industry (0.365~[...]) [remaining cell values not preserved]
* High school or less vs college or more

In the second exercise, I use the estimated preferences to compute the male-optimal stable matching with the Gale-Shapley algorithm (1962) and calculate the probability of the average individual marrying a spouse with the same traits. I use the male-optimal stable matching from the Gale-Shapley algorithm because I find it generates sorting comparable to the sorting among users of the online matchmaking service who ultimately marry (see Appendix D). This exercise also allows me to simulate marriages where the distribution of traits for both men and women in the matchmaking company is representative of the population, whereas in the results of the previous analysis, the distribution of only one sex's traits is representative.

To do the second exercise, I first sample the users of the matchmaking company with weights to match the distribution of characteristics in the sample with their distribution in the population.
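For reference, the male-proposing deferred-acceptance procedure of Gale and Shapley (1962), which yields the male-optimal stable matching, can be sketched as follows. This is a generic textbook version, not the code used for the simulations. The function name and the toy preference lists are hypothetical, and in the exercise the rankings would come from the simulated expected utilities. A person left off someone's list is treated as unacceptable, which allows agents to remain single.

```python
from collections import deque


def male_optimal_stable_matching(men_prefs, women_prefs):
    """Men propose in order of preference; women hold their best offer so far.

    men_prefs[m] lists acceptable women, most preferred first;
    women_prefs[w] lists acceptable men, most preferred first.
    Returns a dict mapping each matched man to his partner.
    """
    # rank[w][m] is the position of m in w's list (lower is better).
    rank = {w: {m: i for i, m in enumerate(p)} for w, p in women_prefs.items()}
    next_choice = {m: 0 for m in men_prefs}  # index of next woman to try
    fiance = {}  # woman -> man she currently holds
    free_men = deque(men_prefs)
    while free_men:
        m = free_men.popleft()
        if next_choice[m] >= len(men_prefs[m]):
            continue  # list exhausted: m stays single
        w = men_prefs[m][next_choice[m]]
        next_choice[m] += 1
        if m not in rank[w]:
            free_men.append(m)  # w finds m unacceptable
        elif w not in fiance:
            fiance[w] = m  # w was free and accepts provisionally
        elif rank[w][m] < rank[w][fiance[w]]:
            free_men.append(fiance[w])  # w trades up; old partner is free
            fiance[w] = m
        else:
            free_men.append(m)  # w rejects m
    return {m: w for w, m in fiance.items()}


# Toy market in which both men rank X first and X prefers B.
men = {"A": ["X", "Y"], "B": ["X", "Y"]}
women = {"X": ["B", "A"], "Y": ["A", "B"]}
matching = male_optimal_stable_matching(men, women)
print(matching)
```

In the toy market, B displaces A at X and A then matches with Y. When several stable matchings exist, this procedure returns the one that every man weakly prefers.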
I assume that people have two dates with all possible candidates and rank them based on the observables and random shocks. Thus, for each possible pair of a man and a woman, I draw a set of random shocks that are a mixture of noisy signals of a partner's type and a pure random component in reservation utility. I then compute the expected utility from marrying each other and construct people's preference rankings over all potential mates. With the computed expected utilities, I compute the male-optimal stable matching by the Gale-Shapley algorithm (1962) and the marital sorting among the hypothetical marriages. I iterate this process an additional nine times and report the average statistics of marital sorting in column (5) of Table 13. Just as in the first exercise, the use of online matchmaking services generates less sorting along hometown and industry and more sorting along marital history, as compared to the actual sorting in the population. This finding suggests that marital sorting in the population has become similar to the sorting among users of the online matchmaking company.

In Korea, the two largest companies, one of which provided my dataset, account for more than 80 percent of sales in the industry (Korea Fair Trade Commission, 2004), and the other leading company is known to have business practices similar to the source of my dataset. Therefore, the wider use of online mate-search services may account for the changes in marital sorting in the population from 1991 to [...].

6 Robustness Checks

6.1 Range of the Type 1 Learning

To a certain extent, a user can get information about another user's traits that are not included in the online database if the user contacts the company's staff member assigned to him/her. I estimate an alternative model (referred to as Alternative I) in which users can see all observable characteristics of other users prior to a first date. I find that, as shown in column (3) of Table 11 and column (2) of Table 14, the estimated Alternative I generates results for the counterfactual analysis that are quantitatively comparable to the baseline model, although the estimated preferences for spousal educational attainment are not the same as those in the baseline (see Appendix Table A.2).20

6.2 Lifetime Income

It is possible that people value the lifetime income prospects of their spouse, rather than the spouse's current income per se. I estimate another model (Alternative II) in which a user's present discounted value of lifetime income (PDV) replaces the user's income. To compute the PDV, I use WS, a population income survey, to estimate the growth rate of income and the probability of being laid off, specific to sex, education, age, and industry. Based on the estimation results, I compute the PDV as the sum of annual expected income discounted by the average interest rate (see Appendix C.3). The estimated model yields results for the counterfactual analysis similar to the baseline case: see column (4) of Table 11 and column (3) of Table 14.

20 In particular, in the baseline model, a college educated woman prefers a man with a master's degree or Ph.D. the most, then a high school graduate, followed by a college graduate and a graduate of a technical college. On the other hand, in the alternative model, the preference ranking between a college graduate and a high school graduate is reversed.

Table 14: Robustness Checks
This table presents simulation results using alternative estimates. Columns (1) and (2) are observed marital sorting and the baseline model prediction when the entire population uses the online matchmaking services. Columns (3) to (8) present results using estimates based only on first-date decisions, and columns ([...]) employ a fixed-effects linear probability model. Column (4) uses the response of partners as a control variable; column (5) uses the duration of using the matchmaking company as a control variable; column (7) allows the possibility that high school educated and college educated may have different utility functions; and column (8) allows for the possibility that people may value the degree of dissimilarity between their own and partner's traits differently depending on the direction of dissimilarity. [...] based on male optimal [...]
Columns: Model, Baseline (1); Alternative Models, I (Sec. 6.1) (2), II (Sec. 6.2) (3), III (Sec. 6.3) (4), and IV (Sec. 6.5) (5).
Rows, percentage of couples with: same education*; same marital history; same region; same hometown; same industry [cell values not preserved]
* High school or less vs college or more

6.3 Selection on Willingness to Marry

It is possible that the distribution of people's willingness to marry in my sample may not be the same as the population's distribution. Since people's willingness to marry is individual-specific, we can remove this potential source of bias by using a model with individual fixed effects. In particular, I use a linear probability model with individual fixed effects to recover marital preferences and then re-perform the counterfactual analysis. For simplicity, I use only first-date outcomes, because Alternative I generates similar results to the baseline model and the parameters in Alternative I relevant to this robustness check are identifiable by analyzing the first-date decision. Column (4) of Table 14 presents the results, which are comparable to the baseline ones.
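The idea behind the fixed-effects linear probability model can be illustrated with the within transformation: demeaning the acceptance dummy and the partner trait inside each user removes any user-specific willingness to marry. The sketch below uses a single regressor and invented toy data; the function name is hypothetical, and the actual specification contains many partner characteristics.

```python
from collections import defaultdict


def within_estimator(user_ids, x, y):
    """Slope of y on x after demeaning both within each user.

    Requires x to vary within at least one user; otherwise the
    denominator is zero and the slope is not identified.
    """
    groups = defaultdict(list)
    for i, uid in enumerate(user_ids):
        groups[uid].append(i)
    sxy = sxx = 0.0
    for idx in groups.values():
        mean_x = sum(x[i] for i in idx) / len(idx)
        mean_y = sum(y[i] for i in idx) / len(idx)
        for i in idx:
            sxy += (x[i] - mean_x) * (y[i] - mean_y)
            sxx += (x[i] - mean_x) ** 2
    return sxy / sxx


# Toy data: u1 is reluctant to date, u2 is eager (different fixed effects).
users = ["u1"] * 4 + ["u2"] * 4
same_education = [0, 0, 1, 1, 0, 0, 1, 1]  # partner shares user's education
accepted = [0, 0, 0, 1, 0, 1, 1, 1]  # first-date acceptance dummy

slope = within_estimator(users, same_education, accepted)
print(slope)
```

Here the estimated effect of a same-education partner on the acceptance probability is 0.5, identified only from variation in proposals received by the same user.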
6.4 Selection on Marital Preferences

It is possible that the preferences for spousal traits among the users may be very different from those in the population, but marital sorting in the population may randomly become more like that among the users. Although directly addressing this problem is impossible given the available data, I attempt to measure the importance of this bias as follows. I calibrate the parameter governing the preferences for spouse-working-in-the-same-industry in order to match the marital sorting among marriages based on the Gale-Shapley algorithm with the actual marital sorting in the population. I then examine the implication of the magnitude of the calibrated parameter. I find that the magnitude of the calibrated parameter suggests that, for both men and women, working in the same industry outweighs a spouse's appearance, educational attainment, and many other important characteristics. This implication appears to be implausible since, in several surveys of people's priorities for spousal traits (e.g., Gallup Korea, 2007 and DUO, 2007), single Koreans rarely indicate that working in the same industry is an important factor in the marriage decision. Therefore, selection on marital preferences may not entirely account for the difference in marital sorting between the population and the users of online mate-search services.

6.5 Heterogeneous Marital Utility Function

I relax the assumption in the baseline model that all men (or women) have the same marriage utility function. In particular, I use a model that assumes high school graduates and college graduates value their spouse's educational attainment differently, but is otherwise the same as Alternative I in Section 6.3. The results of this counterfactual analysis using the alternative estimates are shown in column (5) of Table 14 and are comparable to the baseline results in column (1) of the table.

6.6 Possibility of Polygamy

One may be concerned that the estimated model does not restrict users from having multiple marriage partners, although polygamy is illegal in Korea. To check whether the absence of this restriction is quantitatively important, I simulate users' responses to the company-initiated proposals by drawing expected utility from marrying partners using the estimates in Section 4. I find that polygamy rarely occurs: averaging across ten simulations, only two out of over 14,000 users have multiple spouses.

7 Conclusion

The findings of this paper suggest several directions for future research. Using the estimated users' preferences, we can design an alternative algorithm to improve the

probability of a proposal resulting in an actual date or marriage. Recently, the company partially adopted the estimation results to revise its algorithm for introducing two users. The new algorithm increased the probability of a proposal turning into an actual first date by a factor of 2, suggesting that the potential gain from designing an alternative matching algorithm can be large. Next, my estimation results show that parental socioeconomic status directly affects an individual's marriage decisions, even after controlling for the individual's socioeconomic status. Therefore, it may be useful to examine intergenerational mobility in an environment in which parental socioeconomic status partially determines not only a child's educational attainment but also the child's marriage. Note that previous studies have focused on the first mechanism but not on both. Finally, using a full dynamic search framework to analyze a two-sided search market, such as a marriage market, will be interesting, and I leave the task of extending my model to incorporate the full dynamic search aspects for future research.

References

Abramitzky, Ran, Adeline Delavande, and Luís Vasconcelos, Marrying Up: The Role of Sex Ratio in Assortative Matching, 2009, Working Paper, Stanford University.
Angrist, Joshua, How Do Sex Ratios Affect Marriage and Labor Markets? Evidence from America's Second Generation, The Quarterly Journal of Economics, August 2002, 117 (3).
Autor, David, Wiring the Labor Market, Journal of Economic Perspectives, 2001, 15 (1).
Autor, David, The Economics of Labor Market Intermediation: An Analytic Framework, in David Autor, ed., Studies of Labor Market Intermediation, Chicago: University of Chicago Press.
Bagues, Manuel F. and Mauro Sylos Labini, Do On-Line Labor Market Intermediaries Matter? The Impact of AlmaLaurea on the University-to-Work Transition, in David Autor, ed., Studies of Labor Market Intermediation, Chicago: University of Chicago Press.
Banerjee, Abhijit, Esther Duflo, Maitreesh Ghatak, and Jeanne Lafortune, Marry for What? Mate Selection in Modern India, Working Paper, MIT.

Beshears, John, James Choi, David Laibson, and Brigitte Madrian, The Importance of Default Options for Retirement Saving Outcomes: Evidence from the United States, in Stephen J. Kay and Tapen Sinha, eds., Lessons from Pension Reform in the Americas, Oxford: Oxford University Press, 2008.
Bisin, Alberto, Giorgio Topa, and Thierry Verdier, Religious Intermarriage and Socialization in the United States, Journal of Political Economy, 2004, 112.
Brien, Michael J., Lee A. Lillard, and Steven Stern, Cohabitation, Marriage, and Divorce in a Model of Match Quality, International Economic Review, 2006, 47.
Charles, Kerwin, Liqian Ren, and Erik Hurst, The Nature and Consequences of Marital Sorting by Parental Wealth, 2006, Working Paper, University of Chicago.
Chernozhukov, Victor and Han Hong, An MCMC Approach to Classical Estimation, Journal of Econometrics, 2003, 115.
Choo, Eugene and Aloysius Siow, Who Marries Whom and Why, The Journal of Political Economy, February 2006, 114 (1).
DUO, Survey of Priorities for Spousal Traits, Technical Report.
Fernández, Raquel, Nezih Guner, and John Knowles, Love and Money: A Theoretical and Empirical Analysis of Household Sorting and Inequality, The Quarterly Journal of Economics, January 2005, 120 (1).
Fisman, Raymond, Sheena S. Iyengar, Emir Kamenica, and Itamar Simonson, Gender Differences in Mate Selections: Evidence from a Speed Dating Experiment, The Quarterly Journal of Economics, May 2006, 121.
Fisman, Raymond, Sheena S. Iyengar, Emir Kamenica, and Itamar Simonson, Racial Preferences in Dating: Evidence from a Speed Dating Experiment, Review of Economic Studies, 2008, 75.
Gale, David and Lloyd S. Shapley, College Admissions and the Stability of Marriage, The American Mathematical Monthly, January 1962, 69 (1).
Gallup-Korea, Survey on Ideal Spouse among Korean, Technical Report.
Gibbons, Robert, Lawrence F. Katz, Thomas Lemieux, and Daniel Parent, Comparative Advantage, Learning, and Sectoral Wage Determination, Journal of Labor Economics, October 2005, 23 (4).
Hitsch, Günter J., Ali Hortaçsu, and Dan Ariely, What Makes You Click? Mate Preferences and Matching Outcomes in Online Dating, Working Paper, University of Chicago.

Hitsch, Günter J., Ali Hortaçsu, and Dan Ariely, Matching and Sorting in Online Dating Markets, American Economic Review, Forthcoming.
Korea Consumer Association, Survey of Matchmaking Services Providers.
Korea Labor Institute, Labor Statistics, The Korean Labor Institute.
Korea Marriage Culture Institute, Survey of the Korean Marriage Culture.
Korean Agency of Technology and Standards, Survey of Physical Traits of Koreans.
Kuhn, Peter and Mikal Skuterud, Internet Job Search and Unemployment Durations, American Economic Review, March 2004, 94 (1).
Kurzban, Robert and Jason Weeden, HurryDate: Mate Preferences in Action, Evolution and Human Behavior, 2005, 26 (3).
Madden, Mary and Amanda Lenhart, Online Dating, Technical Report, PEW/INTERNET, March.
Niederle, Muriel and Alvin E. Roth, Unraveling Reduces Mobility in a Labor Market: Gastroenterology with and without a Centralized Match, The Journal of Political Economy, December 2003, 111 (6).
Niederle, Muriel and Alvin E. Roth, The Effects of a Central Clearinghouse on Job Placement, Wages, and Hiring Practices, in David Autor, ed., Studies of Labor Market Intermediation, The University of Chicago Press.
Niederle, Muriel and Leeat Yariv, Matching Through Decentralized Markets, Working Paper, Stanford University.
Parent, Daniel, Matching, Human Capital, and the Covariance Structure of Earnings, Labour Economics, 2002, 9 (3).
Pollever, Survey of Korean Marriage.
Republic of Korea. Fair Trade Commission, Press Release, March.
Republic of Korea. Ministry of Labor, Basic Statistical Survey of Wage Structure.
Republic of Korea. Ministry of Labor, Labor Demand Survey.
Republic of Korea. National Statistical Office, National Population and Fertility Survey.
Republic of Korea. National Statistical Office, National Household Income and Expenditure Survey.
Wong, Linda Y., Structural Estimation of Marriage Models, Journal of Labor Economics, July 2003, 21 (3).