PRICE SEARCH ENGINE (ÖZET)

The influence of the internet on everyday life and the extent of its use grow day by day. Initially used only for communication and for obtaining information, the internet now offers its users many other functions. One of the areas in which it is most widely used is online shopping. Thanks to the steadily growing number of online shopping sites, users can now do their shopping conveniently over the internet.

With this growth, however, a new problem has appeared: making the most suitable purchase and finding the most suitable provider for a given product. To shop at the best conditions, many sites have to be visited and compared, and when more than one product is to be bought the problem becomes practically unmanageable.

In this project, a Price Search Engine has been developed that lets users obtain, from a single web page, information about the most suitable online shopping conditions for the product or products they are looking for and, in the light of this information, shop more cheaply and more quickly. The project essentially consists of two stages: collecting product and product price information from online shopping sites, and presenting the collected information to users through a web site. To provide this functionality, six project components that work in interaction with one another were designed and developed:

- Core Library
- Database
- Site Adding Wizard
- Crawler
- Product Integration XML Web Service
- Website

As a result of the development and testing carried out, the project components were completed and a working system that provides the expected features was obtained.
PRICE SEARCH ENGINE (SUMMARY)

The influence of the internet on daily life and the extent of its use grow day by day. In its early days the internet was used mainly for communication and for access to basic information; today it offers users many other services. One of its most widespread uses is online shopping. Through online shopping and e-commerce websites, whose number increases day by day, users can meet their shopping needs online.

With the rapid increase in the number of online shopping websites, the problem of finding the most suitable provider has emerged. To shop at the best conditions, users have to search many online shopping websites and compare them, and the problem becomes considerably harder when a user wants to buy more than one product, or a group of products, together from a single website.

In this project, a Price Search Engine has been developed that enables users to gather information about a product, or a group of products, that they intend to buy online. In the light of this information, users can shop online more cheaply and much faster. The project basically consists of collecting product and product price information from online shopping websites and presenting the collected information to users on the system web site. To achieve this functionality, six separate modules that interact with one another were designed and developed:

- Core Library
- Database
- Site Adding Wizard
- Crawler
- Product Integration XML Web Service
- Website

As an innovation, a new approach to the crawling mechanism was developed; it is explained under the Pattern Selector, Site Adding Wizard and Crawler headings below. In addition, two features were developed that do not exist in the price search engines currently active in our country:

- Storing the product price information collected in earlier crawling runs and presenting it to users through a chart interface
- Enabling users to search for more than one product and compare the results, so that the products can be bought together from a single online shopping website (a sketch of such a comparison is given after this list)
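The summary does not spell out how the group-search comparison is performed; the following is only a minimal sketch, assuming the goal is to find the online shopping website with the lowest total price for the whole basket. The names BasketComparer and FindCheapestSite and the in-memory price table are illustrative; in the real system the per-site prices would come from the database filled by the Crawler.

    using System.Collections.Generic;
    using System.Linq;

    // Hypothetical sketch: pick the website with the lowest total price for a basket.
    static class BasketComparer
    {
        // prices[site][product] = final price of the product on that site.
        public static string FindCheapestSite(
            IDictionary<string, IDictionary<string, decimal>> prices,
            IList<string> basket)
        {
            string bestSite = null;
            decimal bestTotal = decimal.MaxValue;

            foreach (var site in prices)
            {
                // Skip sites that do not carry every product in the basket.
                if (!basket.All(p => site.Value.ContainsKey(p)))
                    continue;

                decimal total = basket.Sum(p => site.Value[p]);
                if (total < bestTotal)
                {
                    bestTotal = total;
                    bestSite = site.Key;
                }
            }
            return bestSite;   // null if no single site carries the whole basket
        }
    }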
Pattern Selector:

Most price search engines use either product integration or crawling to collect the product information published on online shopping websites. When crawling is used, the general approach is to develop a separate search method for each website to be crawled. The reason is that a crawler obtains only the raw HTML source of a page (through the HttpWebRequest and WebResponse classes), and extracting specific fields from arbitrary markup with a single automatic parsing method is practically impossible, even when artificial intelligence techniques are applied. Thus, for a crawler to collect accurate information, the common practice is to write a dedicated search method for each website.

This project aims at a generic crawling technique that can crawl any online shopping website with a single search method. To achieve this, generic product info and product price info data structures were designed first.

The generic product info structure consists of the following elements:

- Product name
- Product category
- Product brand
- Product image
- Address of the product detail page
- Last updated date
- Stock status

The generic product price info structure consists of the following elements:

- Raw price (excluding taxes)
- Final price (including taxes)
- Special price (e.g. money order discount)
- Discount price (e.g. last five days' discount)
- Last updated date

With these generic structures a generic crawling method becomes possible. A human operator is needed only once, while an online shopping website is being added to the system: the operator shows the system the product info and product price info fields on any product detail page of that website, and from then on the crawling of all products on the website is carried out automatically by the Crawler.

The Pattern Selector is the technique used while the human operator marks these fields. An HTML page can be represented as an HTML document tree, and some nodes of this tree contain the information that the operator marks for the system. When the operator clicks a node on the page to mark it as a product info node, the Pattern Selector analyses the node and determines its position in the document tree. On websites built with technologies such as ASP.NET or JSP, the nodes of the document tree carry id attributes, so their positions can easily be identified. On websites built with technologies such as PHP or classic ASP, however, some nodes have no id attribute, and determining their positions becomes much harder. For this case, a method was developed in the Pattern Selector that traces the node upwards until it reaches the root element (usually the <html/> element), recording at each step the node's order among the children of its parent. The route formed in this way through the document tree is called a TagPath. Starting from the root element and following the TagPath in reverse, the same node can be found again, so the Crawler can reach any node marked as product info or product price info through either its id attribute or its TagPath. Because the pages of a website are generated dynamically, their structure stays the same, and this operation remains valid for every product detail page of that website.
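The TagPath idea can be illustrated with a short sketch. The project's actual code is not given in this summary, so the example below is only an approximation: it assumes an HTML parser such as HtmlAgilityPack, and the names TagPathHelper, Build and Resolve are illustrative, not the project's own.

    using System.Collections.Generic;
    using System.Linq;
    using HtmlAgilityPack;

    static class TagPathHelper
    {
        // Walk from the clicked node up to the document root, recording the
        // position of each node among its parent's element children.
        // (A node with an id attribute can be located directly; the TagPath
        // is needed for nodes without one.)
        public static List<int> Build(HtmlNode node)
        {
            var path = new List<int>();
            while (node.ParentNode != null && node.NodeType == HtmlNodeType.Element)
            {
                var siblings = node.ParentNode.ChildNodes
                    .Where(n => n.NodeType == HtmlNodeType.Element)
                    .ToList();
                path.Add(siblings.IndexOf(node));
                node = node.ParentNode;
            }
            path.Reverse();               // store the route in root-to-node order
            return path;
        }

        // Follow the recorded indices back down from the root to find the
        // same node again on a freshly parsed copy of the page.
        public static HtmlNode Resolve(HtmlDocument doc, IList<int> path)
        {
            HtmlNode current = doc.DocumentNode;
            foreach (int index in path)
            {
                var children = current.ChildNodes
                    .Where(n => n.NodeType == HtmlNodeType.Element)
                    .ToList();
                if (index >= children.Count) return null;   // page layout changed
                current = children[index];
            }
            return current;
        }
    }

Build records, from the clicked node up to the root, each node's index among its parent's element children; Resolve walks the same indices back down on a newly downloaded copy of the page to reach the corresponding node again.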
Project Modules:

The modules developed in the project are explained below.

Core Library: This module consists of the following structures:

- Data modelling classes (entity classes) that represent the data structures used in the project domain
- Manager classes for the entity classes (entity manager classes) that handle database operations such as create, read, update and delete
- Data access classes that run commands on the SQL database
- Business tier classes that perform various operations and are shared by the other project modules

Database: The database consists of tables that correspond to the entity classes and tables used by the website.

Site Adding Wizard: This module is a desktop application used by a human operator to add online shopping websites to the system for crawling. It consists of an integrated web browser control and the Pattern Selector. The application presents a wizard-style interface that asks the operator to point out each required information field on a product detail page of the website to be added. The operator simply clicks on the field requested by the application; when a field is clicked, the Pattern Selector determines the position of the underlying HTML node in the page's document tree. Finally, the operator clicks a button to save the information about that online shopping website to the database.

Crawler: The Crawler is a desktop application that runs in the system tray of the operating system. It keeps track of the online shopping websites that the human operator has added to the system with the Site Adding Wizard. For each website in the system, the Crawler creates a thread and starts a crawling operation on that website. Crawling starts from the home page and follows the hyperlinks on each page within the same website. If the crawled page is a product detail page, the product info and product price info are extracted, using the id attributes or the TagPath of each required field, and saved to the database (a sketch of this step is given below). For every product found, the product image is also downloaded, resized and saved into the product images folder of the system website. If the Crawler is closed by the user or interrupted by an exception, the crawling state of each website is saved to disk, and when the application is restarted it continues from where it left off.
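As a rough illustration of the extraction step, the sketch below downloads a product detail page with the HttpWebRequest and WebResponse classes mentioned earlier and reads the marked fields through their saved TagPaths. It reuses the hypothetical TagPathHelper from the Pattern Selector sketch; CrawlProductPage and the field dictionary are likewise illustrative names, not the project's own code, and details such as character encoding and error handling are omitted.

    using System.Collections.Generic;
    using System.IO;
    using System.Net;
    using HtmlAgilityPack;

    static class CrawlerSketch
    {
        // Download a product detail page and read the marked fields through
        // their saved TagPaths (see the Pattern Selector sketch above).
        public static Dictionary<string, string> CrawlProductPage(
            string url, IDictionary<string, List<int>> fieldTagPaths)
        {
            var request = (HttpWebRequest)WebRequest.Create(url);
            string html;
            using (var response = (HttpWebResponse)request.GetResponse())
            using (var reader = new StreamReader(response.GetResponseStream()))
            {
                html = reader.ReadToEnd();
            }

            var doc = new HtmlDocument();
            doc.LoadHtml(html);

            var values = new Dictionary<string, string>();
            foreach (var field in fieldTagPaths)
            {
                // Resolve each saved TagPath on the freshly downloaded page.
                HtmlNode node = TagPathHelper.Resolve(doc, field.Value);
                if (node != null)
                    values[field.Key] = node.InnerText.Trim();
            }
            return values;   // e.g. { "ProductName": "...", "FinalPrice": "..." }
        }
    }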
Product Integration XML Web Service: For online shopping websites that prefer direct integration of product information from their own databases, a web service has been developed. The web service exposes a method that takes product info and product price info and saves them to the database, after authenticating the calling web service user with the username and password created for that user by the system operator and kept in the database.

Website: The website module presents the products collected by the Crawler, or added through the integration web service, to the users. Users can search for a single product or for a group of products, and view results consisting of product info, product price info (also shown in a chart interface), and other users' comments and votes on the product. A user can also comment on a product, vote for it, and add it to a personal alarm list; whenever the price of a product in the user's alarm list changes, the user is informed by e-mail.

Conclusion: As a result of the development and test phases, the project modules were completed and a working system that satisfies the expected features was obtained. However, the test results show that, given that most popular online shopping websites carry very large product ranges of tens of thousands of products, the crawling speed is still too low for a realistic environment. The Crawler application could therefore be improved to run in parallel on many computers in order to achieve a faster crawling process.