Unlock the Value of Open Data - Clustering Taiwan's Real Estate Data for Market Structure Analysis

1 Sheng-Chi Chen, 2 Chien-hung Liu
1,2 Department of Management Information Systems, National Chengchi University, Taipei, Taiwan
@nccu.edu.tw, 2 [email protected]

Abstract

In recent years, data mining has been a rapidly growing area used for knowledge discovery in databases. By combining information technology with data mining techniques, large amounts of data can be explored, analyzed and converted into useful information and knowledge. Despite this upward trend, the use of cluster analysis for market structure analysis of Taiwan's real estate has received limited attention in past research. This paper aims to fill this gap by applying cluster analysis to actual-sold real estate data acquired from the government open database. The findings provide important insights into the real estate market structures of Taipei City and New Taipei City. This paper suggests that future scholars consider open market information and expert domain knowledge to add more commercial value to the analytical results.

Keywords: Data Mining, Cluster Analysis, Ward's Method, Real Estate Transaction.

1. Introduction

In the era of the knowledge economy, it is critical for enterprises to know how to make better use of information technology and business data in order to formulate the best business and customer strategies and create more business value. Traditionally, businesses pursuing an e-business strategy tended to focus on data aggregation and integration, or solely on automating labor-intensive processes. However, as information technology evolves, the power of software and hardware increases, and the volume of data grows rapidly, most enterprises today have started to realize the urgency and importance of leveraging the data they already have on hand for competitive advantage.

By nature, real estate is a long-lasting durable good that is also an investment.
These characteristics make it very different from general merchandise. Traditionally, the real estate industry values any information related to real estate products, such as the current status of the buyer or seller (i.e. the reason to buy or sell a particular property), ownership, and any transaction-related information. Real estate agents essentially make their profit by brokering transactions between sellers and buyers. Equipped with critical information, real estate agents can act as a powerful intermediary between sellers and buyers and can expedite the transaction process. However, buyers and sellers do not hold the same information as their agents do; the agents' advantage is mainly due to information asymmetry. As a result, many disputes arise among buyers, sellers and real estate agents during the transaction process. This information asymmetry and lack of transparency not only reduce trust but also contribute to rising house prices. The conflict reflects how important openness and transparency of real estate transaction information are to the general public.

The Taiwanese government expects to curb real estate speculation through policy measures such as the luxury tax and real estate price disclosure. For example, to reduce the lack of transparency and the asymmetry in real estate price information, the Real Estate Value Laws were promulgated in 2011, and the general public is now able to access timely and trustworthy information from the government. The recent real estate policy reform in Taiwan, along with the worldwide trend toward open government data, has opened up many opportunities for researchers and industry practitioners to leverage open government data to create more business value.

This research applies a data mining approach, specifically cluster analysis, to real estate market data of Taiwan in order to better understand the structure of the real estate market.
In addition, this research identified several segments for Taipei City and New Taipei City that have similarities as well as dissimilarities, which helps provide more insight into the real estate market structures of the two cities and the possible family types behind different product segments. The research results not only provide insights into the market structure of real estate in Taiwan but also describe product-customer relationships that may help consumers select suitable products or help agencies recommend suitable products to consumers.

International Journal of Digital Content Technology and its Applications (JDCTA), Volume 8, Number 5, October

2. Literature review

2.1. Real estate transactions

In the real estate marketplace, buyers often lack sufficient information to locate the properties they are interested in, while sellers have difficulty obtaining timely price information. As a result, real estate agencies can often manipulate market information without being found out in the first place. However, more and more customers (buyers and sellers) have disputes with real estate agencies during and after transactions. As time has gone by, calls from the general public for real estate reform have intensified. Consequently, the Real Estate Broking Management Act was enacted by congress in 1999 in order to raise the professionalism of the real estate agency industry and safeguard consumer interests in Taiwan. The ideal functions that real estate agents provide to buyers and sellers are to create an efficient marketplace, reduce information search costs, and reduce moral hazard costs before and after the transaction. Many scholars have addressed the relationship between the design of the real estate regulatory framework and market efficiency [7]. By nature, real estate products tend to be non-standardized, so prices are determined by case-by-case negotiation. In addition, transaction data is often difficult to access, inaccurate, or not available in real time. Today, the majority of transactions in Taiwan are completed through real estate agents [3].
What bothers real estate buyers is that they can neither get reliable information from real estate agents nor acquire accurate information elsewhere. The lack of transparency and the asymmetry of information intensify the rise in real estate prices, and potential buyers suffer from a high Misery Index. The Taiwan government therefore started to pay attention to these issues. For example, Taiwan's Legislative Yuan passed the Real Estate Value Law, which requires real estate buyers, real estate agents and land administration agents to register the actual transaction prices of properties within 30 days of a deal being closed or face a fine. The Act came into force on August 1, 2012. The government and the general public expect the Act to make real estate transactions transparent and to create a sound trading environment [10].

2.2. Data mining

Data mining has grown quickly into an important topic in database applications. Its objective is to discover knowledge hidden in large volumes of data. Data mining helps analyze large amounts of data through automatic or semi-automatic approaches and builds effective models and rules [6]. Some scholars consider data mining a process of searching and analyzing data to find the useful information hidden in it [8]. Other scholars define data mining as knowledge discovery from databases, data warehouses, or other forms of large data storage, which extracts meaningful knowledge including patterns, relationships or changes. From a technical perspective, it refers to the different forms and approaches of extracting information and knowledge from large volumes of data, which may include data visualization, machine learning and statistical techniques [4]. From a business perspective, data mining is expected to extract potential, hidden and useful knowledge, patterns or trends from large volumes of daily transaction data.
Today many governments have started to promote open data policies in the hope that businesses and the general public will use their innovation and capabilities to identify meaningful and valuable information. However, little empirical research applying data mining techniques to real estate transaction data is found in the literature.

2.3. Cluster analysis

Cluster analysis is a statistical classification technique used to reveal patterns, relationships, and structures in large volumes of data. Data are divided into groups based on similarity, such that data within a cluster are homogeneous while data in different clusters are heterogeneous. Cluster analysis can identify classification rules in seemingly messy data. It is useful for market structure analysis: identifying groups of similar products according to competitive measures of similarity [11]. Its advantage is that users do not need a full understanding of the target being analyzed; in that sense it is purely data driven. In other words, an anonymous data set can be split into groups relying purely on the data, without the user understanding it. The disadvantage is that users cannot predict what kinds of clusters will be produced and consequently have to interpret the results themselves [1].

Among clustering techniques, K-means is a popular algorithm in data mining; the term was first used by James MacQueen in 1967 [2]. Given D, a data set of n objects, and k, the number of clusters to form, K-means clustering starts by randomly selecting k observations as the initial centroids (also called centers of mass) of the k clusters. The algorithm then assigns each of the n data points to its closest centroid, and new centroids are computed from the resulting clusters. This iterative process repeats until the algorithm finds k clusters with minimized within-cluster variance and maximized variance between clusters. The advantage of K-means is that it is quick and easy. However, K-means is not appropriate when the data set is too large or its density is too diverse. In addition, K-means requires users to specify the number of clusters to be formed.
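The K-means loop described above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the implementation used in this research: the function names, the `seed` and `max_iter` parameters, the rule for empty clusters, and the toy points (floor area in ping, total price in million NT dollars) are all made up for the example.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_point(points):
    """Coordinate-wise mean (the center of mass) of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def k_means(data, k, max_iter=100, seed=0):
    """Basic K-means; returns (centroids, labels)."""
    rng = random.Random(seed)
    # Step 1: pick k observations at random as the initial centroids.
    centroids = rng.sample(data, k)
    labels = None
    for _ in range(max_iter):
        # Step 2: assign every point to its closest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in data
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centroids no longer move
        labels = new_labels
        # Step 3: recompute each centroid as the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(data, labels) if lab == c]
            if members:  # assumption: an empty cluster keeps its old centroid
                centroids[c] = mean_point(members)
    return centroids, labels


# Made-up toy points: (floor area in ping, total price in million NTD).
toy = [(15.0, 10.5), (16.0, 11.0), (14.5, 10.0),
       (52.0, 41.0), (55.0, 43.5), (50.5, 40.0)]
centroids, labels = k_means(toy, k=2)
print(labels)
print(centroids)
```

Because the initial centroids are drawn at random, different seeds can give different label numbering and, on harder data, different final clusters, which is one reason the number of clusters and the starting points matter in practice.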
For example, when a user specifies k groups for a data set D of n objects, the K-means algorithm randomly selects k data points as initial centroids, and the iterative process does not end until all k clusters satisfy the conditions mentioned earlier. In other words, K-means is an iterative search for the center of mass of each cluster.

Ward's method, also called Ward's minimum variance method, is another popular algorithm for cluster analysis, applied especially in hierarchical cluster analysis. It was proposed by Joe H. Ward Jr. in 1963 [5]. Initially, Ward's method treats every single data point as a cluster; at each step it then merges the pair of clusters whose merger yields the smallest within-cluster variance. The key difference between the two is that K-means requires users to specify k, the number of clusters to form, while Ward's method does not need k in advance: the number of groups is determined from the merge hierarchy based on the objective function, the minimized total within-cluster variance.

3. Research method

Data mining is the analysis step of the knowledge discovery process in large volumes of data [8], such as a database or data warehouse. The process often contains a series of steps, such as data preparation, data mining and modeling, analysis, and application according to a predetermined objective. Outputs from data mining models assist the decision-making process. A typical data mining process requires five steps [7], as shown in Figure 1. The goal of problem definition is to define the project objectives from a business perspective and then decide which data mining problem is to be solved. Data preparation defines the scope of the data to be included and carries out the tasks that make the final data set ready for data mining, including cleaning, normalization, transformation, feature extraction and selection [9].
The data mining step includes data sampling, feature selection, analysis and processing, attribute changes, model building and evaluation. The result analysis step mainly validates the outputs of data mining; accuracy can be measured by comparing the pattern learned on the training set with that on the test set. Finally, the learned and validated knowledge is applied in the business as planned. The impact on the business is collected and evaluated, which completes the data mining cycle.
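To make the description of Ward's method in Section 2.3 concrete, the following is a minimal sketch of agglomerative clustering with Ward's criterion in plain Python. The merge cost used is the standard increase in total within-cluster sum of squares, |A||B| / (|A| + |B|) times the squared distance between the two cluster means. The function names and the toy points are illustrative assumptions. The paper does not state how the number of clusters was read from the hierarchy; looking for a sharp jump in merge cost, as in the last lines, is one common heuristic and is shown here only as an assumption.

```python
def ward_merge_cost(size_a, mean_a, size_b, mean_b):
    """Increase in total within-cluster sum of squares if A and B are merged."""
    gap = sum((x - y) ** 2 for x, y in zip(mean_a, mean_b))
    return size_a * size_b / (size_a + size_b) * gap


def ward_clustering(data, n_clusters):
    """Agglomerative clustering with Ward's criterion.

    Returns (clusters, merge_costs): clusters is a list of lists of row
    indices, merge_costs holds the cost of each merge in the order performed.
    """
    # Start with every observation in its own cluster.
    clusters = [[i] for i in range(len(data))]
    means = [tuple(float(v) for v in row) for row in data]
    merge_costs = []
    while len(clusters) > n_clusters:
        # Find the pair whose merge increases within-cluster variance least.
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                cost = ward_merge_cost(len(clusters[i]), means[i],
                                       len(clusters[j]), means[j])
                if best is None or cost < best[0]:
                    best = (cost, i, j)
        cost, i, j = best
        size_i, size_j = len(clusters[i]), len(clusters[j])
        # The merged mean is the size-weighted average of the two means.
        means[i] = tuple((size_i * a + size_j * b) / (size_i + size_j)
                         for a, b in zip(means[i], means[j]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j], means[j]
        merge_costs.append(cost)
    return clusters, merge_costs


# Made-up toy points forming two well-separated groups.
toy = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.1),
       (8.0, 8.0), (8.2, 7.9), (7.9, 8.3)]
clusters, _ = ward_clustering(toy, n_clusters=2)
print(clusters)

# Running the merges down to one cluster exposes every merge cost; a sharp
# jump in the final costs hints at where to stop merging (assumed heuristic).
_, all_costs = ward_clustering(toy, n_clusters=1)
print(all_costs)
```

This brute-force version rescans all cluster pairs at every step, so it is only suitable for small data sets such as a few hundred records; library implementations use faster update formulas.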

Figure 1. Research Framework

3.1. Problem identification

Property means a fixed object on the land or a house, together with the right to transfer its ownership. House refers to a readily available house or a presale house and the right to transfer its ownership, while Broking Agency refers to a company or corporation dealing in real estate brokerage or sales. Real estate transaction data had been held by real estate agencies and was not opened to the general public until August 1, 2012, at the government's request. The objective of data mining is to identify hidden, valuable information in large amounts of transaction data. This research aims to understand the real estate market structure of the Greater Taipei region (Taipei City and New Taipei City). Cluster analysis is applied to government open data to identify groups of similar products according to competitive measures of similarity. The results provide insights into the market structures of Taipei City and New Taipei City.

The data used for this research was acquired from Taiwan's government open data platform (DATA.GOV.TW), which contains more than 1,890 data sets on wide and diverse subjects such as transportation and distribution, real estate transactions, prices and consumption, and environmental monitoring. This research uses the real estate transaction data, described in Table 1, for data mining.

Table 1. Real Estate Transaction Data Content

Type          Description
Source        Ministry of the Interior
Key field     Total amount of rental; Area of rental land (square feet); Area of rental building (square feet); Land use zoning
Format        XML, CSV, TXT
Cities        21
Record        14,000
Period
Renewal date  1st and 16th of each month

Owing to the inherently regional nature of real estate transaction data, an analysis of all cities together would be hard to interpret. Thus, this research uses only the transaction data of Taipei City and New Taipei City as the data mining target.
There are 3,982 records available for data mining analysis, including 1,415 records for Taipei City and 2,567 records for New Taipei City.

3.2. Data preparation

Raw data is designed mainly for operations rather than analysis, so it is not always easy to identify relationships directly from it. Therefore, to make data mining a more insightful tool, this research puts more effort into the pre-processing step, including variable selection, data cleaning and data transformation, which are explained in the following sections.

3.2.1. Variable selection

The data set includes 25 variables, but not all of them are used in this research. To select the candidate variables for cluster analysis, this research first excludes irrelevant and redundant variables. The candidate variables are OBJECT OF TRADE, TOTAL FLOOR AREA TRANSFERRED (Square Meters), Type of Building, Completion Date of Building, TOTAL PRICE, and UNIT PRICE of BUILDING (Per Square Meter). The candidate variables are further reduced according to their level of importance after each cluster analysis is conducted.

3.2.2. Data cleaning

Data cleaning removes incomplete, inaccurate or irrelevant records and irregular outliers from the database and keeps only the necessary data. In the data set, the object of transaction can be land, house, or car park, and land use zoning includes industrial land, farming land, cemetery and residential land. This research keeps only house as the object of transaction and excludes land and car park, as they are irrelevant to the objective of this research.

3.2.3. Data transformation

To make the raw data more readable and easier to analyze, it is first cleaned in the data cleaning step so that outliers are removed. Several variables are then transformed into new variables. For example, the completion date of a building is stored in the data set as a date that begins with the Taiwan year. We first extract the Taiwan year, 102 in this example, from the raw data and subtract it from the current year to create AGE OF BUILDING. This year is 103 in the Taiwan calendar, so the age is 1 in this case. The unit of FLOOR AREAS transferred is converted from square meters to ping, a unit of building size used in Japan, Korea and Taiwan. One ping is equivalent to approximately 3.3 square meters. This unit is more suitable for communication in Taiwan, so an extended variable is created to serve this purpose.
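The cleaning and transformation steps described above can be sketched as follows. This is a hedged illustration rather than the authors' code: the field names (`object_of_trade`, `completion_date`, `floor_area_sqm`), the `YYYMMDD` layout of the completion date, and the conversion factor of 3.3058 square meters per ping are assumptions made for the example; only the Taiwan year 103 and the "keep house only" rule come from the text.

```python
SQM_PER_PING = 3.3058    # assumed conversion factor: 1 ping is about 3.3058 m^2
CURRENT_ROC_YEAR = 103   # the text states that "this year" is 103 in Taiwan years


def building_age(completion_date, current_roc_year=CURRENT_ROC_YEAR):
    """Age in years from a completion date in the Taiwan (ROC) calendar.

    Assumes a 'YYYMMDD' string such as '1020315' (hypothetical layout);
    shorter strings such as '990101' are left-padded with zeros first.
    """
    roc_year = int(completion_date.zfill(7)[:3])
    return current_roc_year - roc_year


def sqm_to_ping(square_meters):
    """Convert a floor area from square meters to ping."""
    return square_meters / SQM_PER_PING


def prepare(records):
    """Keep house transactions only and add the two derived variables."""
    prepared = []
    for rec in records:
        if rec["object_of_trade"] != "house":
            continue  # land and car park records are excluded
        row = dict(rec)
        row["age_of_building"] = building_age(rec["completion_date"])
        row["floor_area_ping"] = sqm_to_ping(rec["floor_area_sqm"])
        prepared.append(row)
    return prepared


# Made-up sample records using the hypothetical field names above.
sample = [
    {"object_of_trade": "land", "completion_date": "0950101",
     "floor_area_sqm": 0.0},
    {"object_of_trade": "house", "completion_date": "1020315",
     "floor_area_sqm": 82.645},
]
for row in prepare(sample):
    print(row["age_of_building"], round(row["floor_area_ping"], 2))
```

Keeping the derived values as extra fields, instead of overwriting the originals, mirrors the "extended variable" idea in the text and leaves the raw square-meter figures available for later checks.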
After the data preparation step, there are 1,260 records available for data mining analysis, including 346 records for Taipei City and 824 records for New Taipei City.

3.3. Data mining

This research aims to identify the grouping relationships between real estate products and prices, and thus applies cluster analysis to the actual price registered data of Taipei City and New Taipei City separately. After the data preparation step, the processed data sets differ from the original ones. The two data sets are loaded, the relationships among variables are explored, the number of clusters is determined, and the data sets are clustered. The first step of cluster analysis is to determine the objective and the variables to be included. This research follows a two-stage cluster analysis. First, Ward's method is used to determine the number of clusters. Second, three algorithms (Ward, Average and Centroid) are used to build the cluster models. Starting from the number of clusters suggested by Ward's method, this research also takes a further step to fine-tune the number of clusters. In addition to the two-stage approach, variables are selected according to their level of importance. Many candidate models are created and compared, and the candidate that is meaningful and interpretable is chosen as the final model. This search is iterative and often time consuming.

4. Result analysis

This research applies cluster analysis techniques to the pre-processed data of Taipei City and New Taipei City and produces clusters for each city separately. The variables used in the first data mining model are OBJECT OF TRANSACTION, FLOOR AREAS, BUILDING TYPE, AGE OF BUILDING, TOTAL PRICE, and UNIT PRICE (unit: ping).
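Each cluster in the result tables is characterized by its number of records, its share of the sample, and the averages of the clustering variables. A minimal sketch of that summary step is given below, assuming each record is a dictionary and that cluster labels have already been produced by some clustering run; the function name, field names and toy values are hypothetical and are not taken from the paper's data.

```python
from collections import defaultdict


def cluster_profiles(rows, labels, variables):
    """Per-cluster record count, share of the sample, and variable means."""
    groups = defaultdict(list)
    for row, label in zip(rows, labels):
        groups[label].append(row)
    total = len(rows)
    profiles = {}
    for label, members in sorted(groups.items()):
        profile = {
            "records": len(members),
            "share": len(members) / total,
        }
        for var in variables:
            profile[var] = sum(m[var] for m in members) / len(members)
        profiles[label] = profile
    return profiles


# Made-up toy records and labels, for illustration only.
rows = [
    {"age_of_building": 10, "floor_area_ping": 20.0, "total_price": 9.0e6},
    {"age_of_building": 20, "floor_area_ping": 30.0, "total_price": 11.0e6},
    {"age_of_building": 5, "floor_area_ping": 80.0, "total_price": 60.0e6},
]
labels = [4, 4, 1]
variables = ["age_of_building", "floor_area_ping", "total_price"]
for segment_id, profile in cluster_profiles(rows, labels, variables).items():
    print(segment_id, profile)
```

A table built this way makes very small clusters easy to spot, since their record count and share appear next to their averages.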

At each model building iteration, variables are reduced and new models are developed. After many iterations, the final model shows more meaningful results. The three variables used in the final model are AGE OF BUILDING, FLOOR AREA, and TOTAL PRICE. All of them have a high level of importance, suggesting that these are the most important factors considered by customers (buyers or sellers) of a house. The results of cluster analysis for Taipei City and New Taipei City are shown in the respective sections below.

4.1. Cluster analysis of Taipei City

The characteristics of the clusters of Taipei City are shown in Figure 2 and Table 2. AGE OF BUILDING, FLOOR AREAS (unit: ping) and TOTAL PRICE came out as the most important criteria. The final model for Taipei City suggests 5 clusters.

Figure 2. Cluster scatter of Taipei City

City Big/Extended Family: Cluster 1 contains the fewest records, only 5, equal to 1% of the total sample. Its FLOOR AREAS (79.16 ping) implies a 5-bedroom product. This product is suitable for a big or extended family, meaning a family that extends beyond the immediate family and consists of grandparents or relatives all living in the same household.

City Three-generation Family: Cluster 2, with a sample size of 16 records, groups products with an average area of 52.71 ping (roughly equal to a 4-bedroom product) and a total price of 41,226,875 NT dollars.

City Couple with Dependents: Cluster 5, with a sample size of 100 records, groups products that are suitable for families with two dependents.

City Young Family: Cluster 4 contains more records than any other cluster, 154 in total, or around 45% of the total sample. It shows a lower price with a mid-range FLOOR AREAS (25.13 ping), roughly a small two-bedroom apartment. This product is very popular because of its lower TOTAL PRICE and adequate housing space, even though the buildings are older. This product cluster is suitable for a young couple with a young dependent.

City Single or Newly Married: Cluster 3, with a sample size of 71 records, has the lowest TOTAL PRICE (10,596,030 NT dollars), the youngest AGE OF BUILDING (12.3 years) and the smallest FLOOR AREAS among all clusters. This distinguishing characteristic reflects a uniquely popular product in Taiwan called the small luxury apartment, which is suitable for single or newly married customers.

Table 2. Cluster analysis of Taipei City

Segment ID  SEGMENT NAMING                AGE OF BUILDING  FLOOR AREAS (ping)  TOTAL PRICE (NTD)
1           City Big/Extended Family                       79.16               ,953,
2           City Three-generation Family                   52.71               41,226,875
5           City Couple with Dependents                                        ,721,804
4           City Young Family                              25.13               ,691,419
3           City Single or Newly Married  12.3                                 10,596,030

4.2. Cluster analysis of New Taipei City

The characteristics of the clusters of New Taipei City are shown in Figure 3 and Table 3. AGE OF BUILDING, FLOOR AREAS (unit: ping) and TOTAL PRICE came out as the most important criteria. The final model for New Taipei City suggests 6 clusters.

Figure 3. Cluster scatter of New Taipei City

Metro Young Family: Cluster 1 has an average TOTAL PRICE of 8,201,718 NT dollars and contains more records than any other cluster, 323 in total, or around 40% of the total sample. This result shows that this type of housing product is the most popular in New Taipei City. Besides, its floor area of about 26 ping implies a two-bedroom layout suitable for a young couple with one child or none. From a marketing perspective, real estate companies can use this insight to prepare products in advance for customers who are interested in this housing option. This product is similar to Cluster 4 of Taipei City except for the total price. Thus, we name it Metro Young Family to distinguish it from the City Young Family segment.

Metro Big/Extended Family: Cluster 2, with a sample size of 12 records, has characteristics similar to Cluster 1 of Taipei City. This product is also suitable for a big or extended family, meaning a family that extends beyond the immediate family and consists of grandparents or relatives all living in the same household. Because of the wide difference in TOTAL PRICE, we name this cluster Metro Big/Extended Family to show both the similarity and the dissimilarity between the two clusters.

Metro Couple with Dependents: Cluster 4, with a sample size of 223 records, has product characteristics similar to those of Cluster 5 of Taipei City, except that the price differs by almost a factor of two.

Metro Single or Newly Married: Cluster 5, with a sample size of 252 records, has product characteristics similar to those of Cluster 3 of Taipei City. However, the price differs by a factor of two, and this product is not as luxurious as the small luxury apartment mentioned earlier.

Cluster 3 and Cluster 6 contain extreme values compared with the average, and each has a sample size of only 1 record. Cluster 3 represents a unique housing type: its AGE OF BUILDING is less than 3 years and its TOTAL PRICE is 60,360,000 NT dollars. This cluster implies that there are local rich buyers even in price-friendly New Taipei City, which differs from what the general public assumes, as this price is high enough to buy many housing products in Taipei City.
Cluster 6 likewise contains only 1 record, a building aged 23 years with a TOTAL PRICE of 3,780,000 NT dollars. Cluster 3 and Cluster 6 identify two extremes that could offer insight into the local rich; however, the sizes of Clusters 3 and 6 are too small to be of value for marketing strategies. They are named Metro Local Rich I and Metro Local Rich II respectively.

Table 3. Cluster analysis of New Taipei City

Segment ID  SEGMENT NAMING                 AGE OF BUILDING  FLOOR AREAS (ping)  TOTAL PRICE (NTD)
1           Metro Young Family                                                  8,201,718
2           Metro Big/Extended Family                                           ,319,167
3           Metro Local Rich I                                                  60,360,000
4           Metro Couple with Dependents                                        ,619,
5           Metro Single or Newly Married                                       ,345,196
6           Metro Local Rich II            23                                   3,780,000

To sum up, excluding the 2 extremes, the overall result shows that the TOTAL PRICE of most buildings in New Taipei City is less than 30 million NT dollars (equal to 988,084 US dollars), with the majority ranging from 5 million to 12 million NT dollars (equal to 164,681 to 395,233 US dollars), while most FLOOR AREAS are less than 79 ping, with the majority between 19 and 36 ping. The results for Taipei City show that all housing products cost more than 10 million NT dollars, with relatively small FLOOR AREAS compared with New Taipei City. Interestingly, Cluster 3 of Taipei City (segment: City Single or Newly Married) shows a unique lifestyle product, the small luxury suite, which costs more than 1 million NT dollars per ping with a total floor area of only 15 ping, roughly equal to a studio or small one-bedroom layout. Notably, this study did not include township in the cluster analysis, as it might dilute the differences among clusters and hinder interpretation.

5. Contribution

It is important to present what data mining discovers in a way that everyone can understand, to interpret the implications behind the data efficiently, and to help end users make better judgments and timely decisions. Results from analysis can be classified into four types: (1) results from common sense, (2) possible results (many related but not validated results), (3) results that are not clear and hard to evaluate (possible results but hard to explain), and (4) impossible results (results impossible to occur). In the evaluation stage of data mining, data miners should verify (1) results from common sense and (2) possible results, and should make an effort to interpret (3) results that are not clear and hard to evaluate, or leave them to end users for further exploration.

In the process of analysis, this study included different variables to explore the possible cluster structure within the transaction data, identify cluster relationships and finally interpret the findings. Cluster analysis is an unsupervised learning technique that requires careful variable selection and multiple trials in order to identify cluster relationships that make sense. In the experiment design, this study first consolidated the real estate open data of Taipei City and New Taipei City and filtered out seven variables as candidates, in order to run a cluster analysis of the Greater Taipei region (Taipei City plus New Taipei City). It then compared those results with the cluster analyses of Taipei City and New Taipei City run separately, so as to test the nature of the data. Meanwhile, in the variable selection process, each model adjusted its input variables according to their level of importance.
The results show that clustering the two cities separately is better than clustering them as a whole. We believe this study demonstrates good use of the real estate open data provided by the Taiwanese government for data mining, especially cluster analysis, and provides insightful market structure results. The contribution this study makes to practice is threefold:

(1) It offers a holistic view of the market structure to house buyers, sellers and real estate agents, breaking down the barrier created when real estate agencies hold information on their side.
(2) It provides industry with a business direction and a data mining approach to leverage its own data. Real estate companies can compare their own cluster analysis results with those of this study to quickly understand how they perform, identify sales gaps, and discover new opportunities in different segments of the market structure.
(3) It sets a new model for using the government open data platform to mine real estate open data, and provides insights into how to leverage the big data opportunity created by the government for business value.

6. Conclusion

Data mining is a rapidly growing area that produces many research reports, new systems and prototypes. For example, many new applications based on early data mining research and a wide array of algorithms have gradually unlocked the value of data residing in databases. The diversification of data mining approaches, including machine learning and statistical research, multiplies the efficacy of advanced knowledge discovery. Cluster analysis is widely used across different fields; however, there are no universally agreed criteria for judging its results. Thus, selecting proper criteria to evaluate a cluster analysis is critical. Moreover, the use of cluster analysis for market structure analysis has received limited attention in real estate research in Taiwan.
This research acquires the real estate actual price registered data from the government open data platform, applies cluster analysis to explore the data of Taipei City and New Taipei City, and takes experience from real estate agencies into consideration. The results show that AGE OF BUILDING, FLOOR AREAS (unit: ping) and TOTAL PRICE carry different meanings in different geographic areas. For example, this research identifies a unique cluster whose properties have a small average FLOOR AREAS and a very high UNIT PRICE and are located in the prime area of Taipei City. This result reflects a unique product segment matching a special type of lifestyle. Taipei City and New Taipei City have many clusters with similar FLOOR AREAS, which shows similar needs for space and similar family types behind these clusters. Interestingly, however, the TOTAL PRICE of these similar clusters is so different that it shows that, even where the need for space is similar, the needs regarding geographic location are not. These findings reflect that the product choices of households in the city area and in the metropolitan area are driven by different needs and wealth levels. They also support the decision of this study to conduct cluster analysis separately for Taipei City and New Taipei City rather than analyzing them as a whole.

Including the opinions of experienced professionals from the real estate industry helps define the evaluation criteria and interpret the results. Future researchers may consider including more market information to better adjust the evaluation criteria so that the research results add more value. In addition, it is suggested that different data mining techniques be used to analyze real estate transaction data (total market data), which would enhance the efficacy of government policy, create more value for businesses and the general public, and eventually reach a win-win situation for all stakeholders.

7. Acknowledgment

Data source: Government Open Data Platform - Real Estate Actual Price Registered Data

8. References

[1] A. K. Jain, Data clustering: 50 years beyond K-means, Pattern Recognition Letters, vol. 31, no. 8.
[2] J. B. MacQueen, Some Methods for Classification and Analysis of Multivariate Observations, In Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probability, Berkeley, University of California Press, vol. 1, 1967.
[3] J. D. Benjamin, G. D. Jud, and G. S. Sirmans, What do we know about real estate brokerage?, Journal of Real Estate Research, vol. 20, no. 1, pp. 5-30.
[4] J. Han and M. Kamber, Data Mining: Concepts and Techniques, Morgan Kaufmann Publishers.
[5] J. H. Ward Jr., Hierarchical Grouping to Optimize an Objective Function, Journal of the American Statistical Association, vol. 58, no. 301, 1963.
[6] M. J. A. Berry and G. Linoff, Data Mining: For Marketing, Sales, and Customer Support, Wiley Computer Publishing.
[7] T. J. Miceli, Information costs and the organization of the real estate brokerage industry in the U.S. and Great Britain, AREUEA Journal, vol. 16, no. 2.
[8] U. Fayyad, G. Piatetsky-Shapiro, and P. Smyth, The KDD Process for Extracting Useful Knowledge from Volumes of Data, Communications of the ACM, vol. 39, no. 11, pp. 27-34.
[9] S. Kotsiantis, D. Kanellopoulos, and P. Pintelas, Data Preprocessing for Supervised Learning, International Journal of Computer Science, vol. 1, no. 2.
[10] Real Estate Broking Management Act, Laws and Regulations Retrieving System, Ministry of the Interior, Taiwan, available at Word=%E4%B8%8D%E5%8B%95%E7%94%A2
[11] G. Shmueli, N. R. Patel, and P. Bruce, Data Mining for Business Intelligence, John Wiley & Sons Inc.