Journal of Interactive Advertising Fall 2010

site), or "班" (tutoring classes). A default level exists, and there are a total of six levels. To gain some intuitive understanding, we took three keywords as examples and decomposed their semantic components. We present the results in Table 1.

Table 1. Examples for Keyword Recoding
Keyword | 北京家教 (Tutoring in Beijing) | 初三物理家教 (Physics tutoring for students in the third year of secondary school) | 小学英语辅导班 (Primary school English instructing class)
Place | 北京 (Beijing) | NA | NA
Action | NA | NA | NA
Grade | NA | 初三 (third year of secondary school) | 小学 (primary school)
Subject | NA | 物理 (physics) | 英语 (English)
Purpose | 家教 (tutoring) | 家教 (tutoring) | 辅导 (instructing)
Classification | NA | NA | 班 (class)

Furthermore, we summed the clickthrough volume over the 38-day data collection period to derive a final clickthrough volume for each keyword. To study the influence of keyword rank, we treated it as a categorical variable with three categories: Category 1 consists of keywords with an average rank between 1 and 1.5, Category 2 of those with an average rank from 1.5 to 2.5, and Category 3 of those with an average rank higher than 2.5. In addition, we obtained the length of each keyword by counting each Chinese character as two characters; therefore, the keywords "北京家教," "初三物理家教," and "小学英语辅导班" are 8, 12, and 14 characters in length, respectively. In total, the data include 648 samples.

Model Setup

Because the dependent variable (clickthrough volume) is a form of count data, we introduced Poisson regression to the analysis. We assume that the probability that the clickthrough volume $Y_i$ for the ith keyword equals k is

\[ P(Y_i = k) = \frac{\lambda_i^{k} e^{-\lambda_i}}{k!}, \]

where $\lambda_i$ is the average clickthrough volume for the ith keyword. Although clickthrough volume is an integer, the average clickthrough volume is continuous. Because $\lambda_i$ has a large variance, we carried out a logarithmic transformation.
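The variable construction and the Poisson assumption described so far can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' code: the function names are hypothetical, the treatment of the boundary values 1.5 and 2.5 is an assumption (the text does not say which category they belong to), and counting non-Chinese characters as one character each is also an assumption.

```python
import math


def keyword_length(keyword: str) -> int:
    """Keyword length with each Chinese character counted as 2.

    Assumption: any character outside the CJK Unified Ideographs
    block counts as 1.
    """
    return sum(2 if "\u4e00" <= ch <= "\u9fff" else 1 for ch in keyword)


def rank_category(avg_rank: float) -> int:
    """Map an average rank to Category 1, 2, or 3."""
    if avg_rank <= 1.5:  # assumption: 1.5 itself falls in Category 1
        return 1
    if avg_rank <= 2.5:  # assumption: 2.5 itself falls in Category 2
        return 2
    return 3


def poisson_pmf(k: int, lam: float) -> float:
    """P(Y = k) for a Poisson variable with mean lam > 0.

    Computed in log space so that large k does not overflow.
    """
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))


if __name__ == "__main__":
    for kw in ["北京家教", "初三物理家教", "小学英语辅导班"]:
        print(kw, keyword_length(kw))
```

For the three example keywords, `keyword_length` reproduces the lengths of 8, 12, and 14 characters reported in the text.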
The Poisson regression model is:

\[ \log(\lambda_i) = \beta_0 + \beta_1\,\mathrm{rank1}_i + \beta_2\,\mathrm{rank2}_i + \beta_3\,\mathrm{length}_i + \sum_{j=1}^{6} \sum_{k=1}^{K_j} \gamma_{jk}\, x_{ijk}, \]

where $\lambda_i$ is the average clickthrough volume for the ith keyword; $\mathrm{rank1}_i$ and $\mathrm{rank2}_i$ refer to the first and second category of rank, with the third category as a baseline; and $\mathrm{length}_i$ is the length of the ith keyword. We recoded the keyword as six factors, so $x_{ijk}$ refers to the kth level under the jth (j = 1, ..., 6) factor of the ith keyword. It is a dummy variable, for which $k = 1, \dots, K_j$, where $K_j$ means the total number of levels under the jth factor.

RESULTS

Descriptive Analysis

We summarize the descriptive statistics for clickthrough volume, rank, and length in Table 2.

Table 2. Descriptive Analysis for Clickthrough Volume, Rank, and Keyword Length
Variable | Maximum | Median | Minimum | Mean | Standard Deviation
Clickthrough volume
Rank
Keyword length

As Table 2 shows, the largest clickthrough volume for a keyword is 3,143, and the smallest is just 1 click. The median clickthrough volume is 3, which means the distribution of this variable is strongly right-skewed: at least half of the keywords were clicked no more than 3 times, but the most-clicked keyword received 3,143 clicks. The highest rank is 1, and the lowest is 3, because this variable was recoded into three categories. The lengths of the keywords range from 4 to 18 characters.

For the basic characteristics of the keywords, we conducted a descriptive analysis of the six factors. Of the 22 place levels, the default accounts for 10.34% of total keywords, indicating that a notable share of keywords do not include any specific place information. The keywords containing "北京" (Beijing) and "天津" (Tianjin) account for 6.64% and 6.17%, respectively, of total keywords. The action variable is dominated by the default level (94.91%), followed by "找" (searching) at 5.09%. The default level of grade also accounts for the largest proportion (50.31%), followed by "高中" (senior secondary) and "小学" (primary school), with shares of 13.73% and 11.88%, respectively. Similarly, of the six subject levels, the default accounts for 55.56%, followed by "数学" (mathematics) and "英语" (English), which represent 14.51% and 13.43%, respectively. For the purpose variable, we included no default level, and most of the keyword sample consists of "家教" (tutoring, 69.44%), followed by "辅导" (instructing, 21.76%). Finally, classification mainly consists of the default level, at 83.95%, followed by "班" (classes) at 8.64%.
Regression Results

Judging from the overall fit of the model, the chi-squared goodness of fit is 22,688, with a p-value of less than .0001, so the model is significant. The estimated parameters of all variables, the standard errors, and the p-values are in Tables 3-9. The first level in each table is the benchmark; for this study, the default level of each factor, except purpose, provides that benchmark.

Table 3. Regression Results: Intercept, Rank, and Keyword Length
Variable | Parameter | Standard Error | p-value
Intercept | | | <.0001
Rank (Category 1) | | | <.0001
Rank (Category 2) | | | <.0001
Keyword length | | | <.0001

Table 3 shows that the coefficients of the two rank categories are positive, so keywords in rank Category 1 or 2 receive a larger clickthrough volume than keywords in Category 3. Specifically, the coefficient of Category 1 is smaller than that of Category 2, which indicates that the clickthrough volume is greatest for keywords in Category 2. Furthermore, the keyword length coefficient is negative; that is, the longer the keyword, the lower the clickthrough volume.
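To make the sign-based reading of Table 3 concrete: under the log link of a Poisson regression, a coefficient β multiplies the expected clickthrough volume by exp(β) for a one-unit change in the corresponding variable, holding the others fixed. The sketch below illustrates this with hypothetical coefficient values chosen only to match the signs and ordering described in the text; they are not the paper's estimates, and the names are assumptions.

```python
import math


def rate_ratio(beta: float) -> float:
    """Multiplicative effect on the expected count under a log link."""
    return math.exp(beta)


# Hypothetical values for illustration only, NOT the estimates in Table 3.
# They mirror the reported pattern: both rank coefficients positive,
# Category 1 smaller than Category 2, keyword length negative.
beta_rank1 = 0.4
beta_rank2 = 0.9
beta_length = -0.1

if __name__ == "__main__":
    for name, beta in [
        ("rank Category 1 vs. Category 3", beta_rank1),
        ("rank Category 2 vs. Category 3", beta_rank2),
        ("one additional character of length", beta_length),
    ]:
        print(f"{name}: rate ratio = {rate_ratio(beta):.3f}")
```

A rate ratio above 1 corresponds to a positive coefficient (more expected clicks than the baseline), and a ratio below 1 corresponds to a negative coefficient (fewer expected clicks).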

Table 4. Regression Result: Place
Beijing | <.0001
Changchun | <.0001
Chengdu | <.0001
Dalian | <.0001

All levels for the place factor are significant and negative. That is, the clickthrough volume of keywords that include place information is significantly less than that of keywords without any geographic information. The average regression coefficient for place is

Table 5. Regression Result: Action
找 (searching for) | <.0001

Action is also significant, and the coefficient is negative, indicating that keywords that do not express any action earn the largest clickthrough volumes.

Table 6. Regression Result: Grade
初三 (third year in junior secondary school) | <.0001
初中 (junior secondary school) | <.0001
高中 (senior secondary) | <.0001

The regression coefficients for all grade levels are significant and negative. The clickthrough volumes for keywords containing information about grades thus are significantly lower than those with no grade information. The mean of the regression coefficients for all grade levels is

Table 7. Regression Result: Subject
化学 (chemistry) | <.0001
数学 (mathematics) | <.0001
物理 (physics) | <.0001

Similar to the previous factors, all subject levels are significant, with negative regression coefficients. The clickthrough volumes of keywords containing information about subjects are significantly less than those of keywords without subject-related information. The mean of the regression coefficients for the subject factor at all levels is

Table 8. Regression Result: Purpose
课外辅导 (extracurricular instructing) | .0000
辅导 (instructing) | <.0001
家教 (tutoring) | <.0001
家教辅导 (instructing by tutor) | <.0001

Because there is no default level for the purpose factor, we use the level selected by the statistical software SAS as a benchmark. As Table 8 shows, the regression coefficients for all levels are significant; some are positive, whereas others are negative, and the coefficient of "家教" is the greatest. Therefore, a keyword that includes "家教" has a greater clickthrough volume than the others.

Table 9. Regression Result: Classification
班 (class) | <.0001
网 (Web site) | <.0001
信息 (information) | <.0001

