A Collaborative Filtering Recommendation Algorithm Based On User Clustering And Item Clustering
|
|
|
- Jade Pierce
- 9 years ago
- Views:
Transcription
1 A Collaborative Filtering Recommendation Algorithm Based On User Clustering And Item Clustering GRADUATE PROJECT TECHNICAL REPORT Submitted to the Faculty of The School of Engineering & Computing Sciences Texas A&M University-Corpus Christi Corpus Christi, TX in Partial Fulfillment of the Requirements for the Degree of Master of Science in Computer Science by SHIVANI REDDY ATIGADDA Fall 2014 Committee Members Dr. MARIO A GARCIA Committee Chairperson Dr. DAVID THOMAS Committee Member
2 i
3 ABSTRACT Recommendations that are personalized help the users in getting the list of items that are of their interest in e-commerce sites. Majority of recommender systems use Collaborative Filtering techniques to generate recommendations to their users. This project implements an information filtering technique called as Collaborative Filtering for generating personalized recommendations in movies for user. Collaborative Filtering is of two types, namely, collaborative filtering based on users and collaborative filtering based on items. Collaborative Filtering based on users is more expensive computationally but it produces better results. Collaborative Filtering based on users is not preferred because it encounters the problems of Scalability when the number of users increases. Therefore, we use item-based Collaborative Filtering which is an alternative method. Collaborative Filtering, which is based on items uses two techniques- Pearson correlation technique and Adjusted cosine technique for calculating the similarity between items and to generate recommendations to users. In this Project both the above techniques are used and to measure the accuracy of the predictions generated by these techniques, Root Mean Square Error is computed. ii
4 CONTENTS Contents Page No. 1. INTRODUCTION Recommender System Collaborative filtering Data Set Scalability Correlation Analysis 4 2. LITERATURE REVIEW Existing Methods 8 3. NARRATIVE System Overview System Features Problem Statement Proposed System Software Requirements Specification Functional Requirements Non Functional Requirements Usecase Diagram DESIGN Detailed Design Architecture of Proposed System Algorithm 19 iii
5 4.4 Class Diagram Sequence Diagram IMPLENTATION Java Technology Steps of Implementation TESTING Testing Objectives Various Methods Of Testing Levels Of Testing Unit Testing IMPLEMENTATION RESULTS Generating Recommendations Viewing Search Results Calculating Error Results CONCLUSION FUTURE ENHANCEMENTS REFERENENCES 48 Appendix 50 iv
6 LIST OF TABLES Table No. Name Page No. 1.1 User-item rating Different Data Sets Use Case User-item rating Test Case Test Case Test Case Test Case Test Case 5 35 v
7 LIST OF FIGURES Figure No. Name of the Figure Page No. 1.1 MovieLens Use case diagram Block diagram Flow chart System Architecture Pearson Similarity Formula Adjusted Cosine Similarity Formula Prediction Formula Class Diagram Sequence Diagram Test Case Test Case Test Case Test Case Test Case Test Case Base Case Test Case 38 vi
8 6.6.3 Program to calculate rmse error RMSE error Generating Recommendations Generating Recommendations Viewing Search Results Viewing Search Results Calculating Error rates Error rates 46 vii
9
10 INTRODUCTION 1.1 Recommender Systems With the increase in the use of e-commerce sites, it has become very easy for the users to find the items of their interest without wasting a lot of time. Websites like Amazon and Ebay examples for Recommender systems which provide recommendations to the users based on their search history and purchase history. Recommender systems provide recommendations of almost all the items ranging from books to movies to music. FaceBook and Twitter are also recommender sites which provide recommendations for friends. NetFlix.com is very famous as a movie recommender website. Yahoo News and Google news are very famous for news [3]. When a user tries to find an item using search engines, for example, Google Search engine and Yahoo Search engine, the user needs to type the exact name of the item. The data in the internet is huge which makes it very difficult for the user to find the items of his interest. Hence, there is a need for a system which learns the likes and dislikes of the user and generates recommendations based on his interest [1]. Many algorithms need to be used while designing a recommender system. Recommender systems employ Information Filtering technique that focuses on providing the recommendations of the items to the users that are likely to be of the users interest [2]. A recommender system is defined as: if U is the set of users and I is the set of all possible items that can be recommended, then there exists a function from U*I to R where R is a totally ordered set of nonnegative integers or real numbers within a certain range. 1.2 Collaborative Filtering The main use of Collaborative Filtering methods is in the field of Business to consumer e-commerce where the recommendations are provided to the user by the owner. The owner provides recommendations to the customer based on his search history and past purchase history 1
11 [4]. Collaborative Filtering techniques are used in the e-commerce sites like Amazon and Ebay [5] which deal with items on a very large scale. The importance of Collaborative Filtering methods is: guessing the usefulness of a particular item for a particular user depending upon the previous history of other users. Consider Table1.1, where 'U' and 'S' represent user and item respectively. U1, U2,U3, U4, U5 represent users and S1, S2, S3, S4, S5 represent items. Assume that we want to suggest a particular item for 'U4'(user 4), S5(item 5) is recommended for the U4(user 4) who is similar to U3 (user 3) as both of them have given high rating for the item S3 User 4 is likely to like item S3. The recommendations are made by comparing the likes and dislikes of users who have taste that is similar to that of the current user. Table1.1 user-item rating table S1 S2 S3 S4 S5 U U2 1 2 U U ? 1.3 Data Set A data set is a collection of data, usually presented as in table 1.2. In this project the data is taken from the dataset provided by MovieLens site and is internally stored in the form of arrays. Table 1.2 shows the list of recommender websites- Movielens, Eachmovie and NetFlix are movie recommender sites, Jester joke recommends jokes, BookCrossing recommends books and Newsgroups recommends news. The table shows which information is provided by these websites like the demographic data, ratings, clean-up and non-rating. 2
12 Table 1.2 Different data sets Demographic Rating Cleaned Non rating Movielens yes Yes yes JesterJoke yes Yes yes Each Movie yes Yes BookCrossing yes Yes Netflix yes Yes News Groups yes Yes yes The data is taken from Movielens site, shown in figure 1.1. The users can download the information for free from this site and use it for testing. The data set in the site consist of 943 users and 1682 movies where each user has given ratings for at least 20 movies. This data given to the program that I have designed which stores this raw data in the form of arrays. The data from the arrays is used for further computations. Figure 1.1 Movielens 3
13 1.4 Scalability The data available in the websites is very huge. The data comprises of both data of the users and the data related to the items available on that website. It is very important for the websites to function properly with the increasing number of items and users day by day. Many items are added to the websites for instance, e-commerce sites daily and many new users are added which makes it very important for a site to scale well. Scalability is an important factor for the websites. 1.5 Correlation Analysis Correlation analysis focuses on analyzing the relation between two random variables or observed variables. Correlations are important for predicting the relationship between entities. For instance, consider prepaid CPL electricity billing process. This site provides information of the available balance along with the details like in how many days the balance is going to fall to 0$. They estimate the average usage cost based on the previous week's usage details. Two random variables are said to be dependent if they do not satisfy probabilistic independence condition. The correlation coefficients, denoted by 'ρ' or 'r', for calculating the degree of correlation. The method that is commonly used is 'Pearson correlation coefficient' method. Pearson Correlation technique is used to calculate the linear relationship between two variables [9]. In this project Pearson correlation technique is used to calculate the relation among two entities and provide suggestions to the user. 4
14 LITERATURE REVIEW The concept of recommender systems emerged in mid-1990s. In past 10 years there has been a tremendous growth in the development of recommender sites. The people using the recommender systems is increasing exponentially which makes it very important for these systems to generate recommendations that are close to the items of users interest. Jia Zhou and Tiejan Luo [4], has published a paper on Collaborative Filtering applications. The paper describes about the collaborative filtering techniques which were currently in used in that generation. It is stated that the Collaborative Filtering techniques used in that generation could be divided into heuristic-based method and model-based method. The paper discusses about the limitations of the Collaborative Filtering techniques in that generation and suggests some improvements to increase the recommendation capabilities of the systems. SongJie Gong and Zhejiang [10] proposes that 'personalized recommendation systems' are widely utilized in e-commerce websites to provide recommendations to its users. The paper states that the recommendation systems use Collaborative Filtering technique which has been successful in providing recommendations. A technique to solve the common problems that are encountered in recommender systems namely, sparsity and scalability is suggested in this paper. This paper suggests the recommender system which combines both user clustering and item clustering can be used to provide recommendations. This approach is employed to provide recommendations in this project which makes the prediction smoother. In this approach, item clustering is done using the two techniques Pearson correlation technique and Adjusted cosine similarity technique to find the similarity between the items. Then, users are clustered depending on alikeness between the user targeted and cluster center. Users are grouped into clusters based on their likes and dislikes for an item and every cluster has a center. The authors state that the proposed method is more accurate than the traditional method in generating recommendations. Robert M Bell and Yehuda Koren [11], state that recommender systems provide recommendations to the users based on past user-item relationship. Based on past user-item relationship the neighbors are computed which makes the prediction easy. The weights of all the neighbors are calculated separately and are interpolated concurrently for many interactions to 5
15 provide optimized solution to the problem. The proposed method is stated to provide recommendation in 0.2 milliseconds. The training also takes less time unlike very lengthy time in large scale applications. The proposed method was tested on Netflix data which consisted of 2.8 million queries which was processed in 10 minutes. Micheal Pazzani [12] discusses about recommending data sources for news articles or web sites after learning the taste of the user by learning his profile. This paper mentions various types of information that can be considered to learn the profile of a user. Based on ratings given by a user to different sites, ratings that other users have given to those sites and demographic information about users the recommendations can be made. This paper describes how the above information can be combined to provide recommendations to the users. Lee W. S [13], proposed a method in which he assumes that each user is likely to belong to any one of the 'm' clusters and the rating of each user depends upon one of the items that belong to the n cluster of items. Bayesian sequential probability is used to calculate the performance of this method. Heuristic approximations are proposed to Bayesian sequential probability for making experiments on the data set comprising of the ratings of movies. The method suggested is believed to have good performance and tested results are observed to be near to the actual values. 'Liu et al.' [14] suggested a 'hybrid recommendation system' to eliminate the issues of scalability and sparsity. Based on the similarity between rated and non-rated items, weighted average ratings are computed. The Collaborative Filtering techniques are applied on the matrix consisting of user and item ratings. The additional method proposed to solve the scalability problem is dividing the users into different clusters based on the common features. All the users having common taste are present in a group and the target user would belong to any one of the groups. Yang et al. [15] proposed a novel Collaborative Filtering technique to solve the disadvantages that are found in the current Collaborative Filtering technique. The disadvantages are, the conventional technique is over assertive, removes some important information from the user's profile and often makes conclusions that are not trust worthy. 'Yang et al.' divided the jointly 6
16 rated items into three classes based on the difference in the given ratings to the items that are of equal weight in the same class. Koren [16] stated that the user tastes change with the changing time and developed a recommendation system implementing temporary data into 'factor modeling' and 'item-item neighbor' modeling. [21] The author also suggested a novel 'neighbor model' in which he considers both implicit data and the explicit data. 'Salakhutdinov and Srebro' [17] presented a weighted form to achieve uniformity. To follow standard rules is a prominent attribute of a system for finishing the 'user item' rating lattice in 'Collaborative Filtering'. Be that as it may, the performance of the system is bad when entrances of the 'user item' rating framework are inspected non-consistently. Keeping in mind the end goal to take care of the issue, they proposed a follow standard weighted by the recurrence of clients. Takács et al. [20] proposed "a few grid factorization (MF) based routines (i.e., a regularized MF, a quick semi-positive MF, an exact energy based MF, an incremental variation of MF)". What's more, they plot a remedy technique for MF. 'Shambour and Lu' [18] investigated the 'recommender frameworks' in the setting of the e- government space, and stated a reliable business accomplice proposal e-administrations for little to-medium organizations. They created (1) an understood trust sifting proposal methodology fusing trust engendering and 'Jaccard metric' and (2) a client based 'CF' suggestion methodology upgraded via J'accard metric'. Furthermore, to make the preferences of the two methodologies, they created a mixture trust-upgraded CF suggestion approach 'Tecf' which incorporates both methodologies. 'Yin and Peng' [19] given a reasonable schema to the execution correlation among another suggestion calculation and the delegate/generally acknowledged calculations. They introduced a relative assessment of eight CF calculations in point of interest (i.e., k-closest neighbor (KNN), solitary worth deterioration (SVD), non-negative MF, weighted non-negative MF, central part examination (PCA) KNN, Svd knn, Nmf knn, and Eigentaste) on two datasets (i.e., Jester and Movielens) by utilizing three quality measurements (i.e., root mean square blunder (RMSE), review, and standardized separation based execution measure (NDPM). 7
17 The ultimate goal of all the methods proposed above is to provide accurate recommendations to the users. The main problems encountered by the recommender systems is scalability, sparsity and cold start. The most common methods used in recommender systems is 'Pearson correlation' and 'Adjusted cosine similarity' methods. I have utilized both these methods to measure the alikeness among the items using item clustering and computed the Root Mean Square Error for each of these methods to show the accuracy of the predictions. 2.1 Existing Methods Amazon and YouTube uses Item Clustering Collaborative Filtering technique. Last.fm and Reddit use Collaborative Filtering technique [21]. Pandora uses Content based approach. Facebook, MySpace, LinkedIn use 'Collaborative Filtering technique' to make friend suggestions, groups and other social connections by observing the network of connections between a user and people present in their connections [22]. Twitter [22] makes use of several signals and in-memory calculations for suggesting whom to follow. Netflix is a hybrid system. They make recommendations by comparing the watching and searching habits of similar users (Collaborative Filtering) as well as by offering movies that share characteristics with films that a user has rated highly (Content based Filtering). Pandora [22] utilizes the features of a song or artist for tuning into a station which plays music with alike features. Feedback given by the users is used to tune the station, not considering some features like dislikes and gives more importance to the other features like liking a particular song. This is a 'Content Based approach'. With little information Pandora can get started and has limited scope. If the song is alike to the original seed, only then it is suggested. In Last.fm, [22] a station is created in which songs are suggested to the users based on history of the user and compares the taste of current user against other users. Last.fm plays songs which the users with similar taste listen frequently. This is an example of 'User based Collaborative Filtering' because suggestions are made by considering other users choice. 8
18 Last.fm needs huge data related to a particular user to make reliable suggestions. This experiences cold start problem. NARRATIVE 3.1 System Overview With the increase in the use of e-commerce sites, it has become very easy for the users to find the items of their interest without wasting a lot of time. Websites like Amazon and Ebay examples for Recommender systems which provide recommendations to the users based on their search history and purchase history. Recommender systems provide recommendations of almost all the items ranging from books to movies to music. FaceBook and Twitter are also recommender sites which provide recommendations for friends. NetFlix.com is very famous as a movie recommender website. Yahoo News and Google news are very famous for news [3]. 3.2 System Features The main focus of recommender systems is the user-rating data of the operating system prior to introducing the algorithms of recommender systems, therefore, we define the terms related to this Information that will be used throughout: User: A user is referred to that person who uses the system and the system suggests items to that person. The total set of users belongs to one community. Item: An item is an entity in a 'Recommender system' which can be a song or a movie or any product. Rating is a number given by a user to a particular item on a particular scale based on liking of that person. profile. Profile: the ratings given to all items by a particular user is regarded as the user's 9
19 User-Item Matrix: The matrix which contains ratings given by a user to an item is defined as 'User-Item matrix'. 'Recommender systems' consider a set of user profiles; by having a collection of ratings of the available content, this set provides a source of data that is not valid and that can be used to provide recommendations for each user. Ratings can be collected either explicitly or implicitly; In general, recommendation process has the following phases: i. Learn likes and dislikes of a user. ii. Compare him to other users. iii. Recommend items based on the rating patterns of his and the ones similar to him. There are many variations possible in the three phases but the purpose remains the same. Learning likes and dislikes is the process of feedback. It is important to learn likes and dislikes in the best possible manner so that results can be personalized as well as possible. There are broadly two forms of feedback, namely, implicit feedback and explicit feedback. As the name implies, implicit feedback is taken implicitly without asking the user to fill up data. It is done by using web logs, by analyzing user activity in terms of downloads, ratings given etc. Analyzing implicit data is relatively difficult because the behavior of users is not deterministic. Not rating a movie can either mean that he did not like the movie or it can also mean he liked it but was not interested in rating it. Predicting the likes and dislikes this way is error prone and also, data is very sparse. But, in some situations, implicit feedback is the only way possible. For example, in music recommendations, implicit feedback is the way to go as a song has a lot of possible dimensions/traits to it and it is not possible for a user to rate them on all of them or even a subset. Thus, in such circumstances, listening habits of users serve as the best option. On the other hand, movies and books are great examples of explicit feedback as they can be rated on a scale of a number. As each one has its pros and cons, most recommender systems learn about a new user by using explicit feedback and then keep improving the user profile by using implicit feedback 10
20 3.3 Problem Statement The existing recommender systems only used one similarity measure to chalk out neighbors for users as stated in "Towards an Introduction to Collaborative Filtering" [4]. But, each similarity measure has its own drawbacks and thus, all those drawbacks cause faulty and bad recommendations. Pearson correlation coefficient is the most widely cited in the literature but it is not so suitable when the users being compared are active and inactive [4]. Thus, in such situations, adjusted cosine works better because it takes into consideration the average of the users. Also, when the number of co rated movies is less, which means the movies that are rated by users under consideration, then, there has to be a way to bring normalcy to the comparison [4]. 3.4 Proposed System: The proposed system implements an information filtering technique called as Collaborative Filtering. Broadly, collaborative filtering can be of two types, namely, 'User-based collaborative filtering' and 'item-based collaborative filtering' [4]. Although, 'user based collaborative filtering' has been proved to have produced better results, it is highly computationally expensive and does not scale well with increase in the number of users. Thus, we favored item-based collaborative filtering. We used two different similarity techniques, namely, 'Pearson correlation coefficient' and 'Adjusted cosine similarity' and we also implemented global effects for the dataset. For measuring the error rate, we used Root Mean Square Error (RMSE) [4]. 3.5 Software Requirements Specification Following are the software requirements specifications: Purpose: 11
21 Recommender system is developed using the algorithms proposed. This makes the user recommendations more reliable. This project aims at minimizing the flaws present in the traditional systems Specific Requirements This section describes the hardware software requirements of the project System Requirements Below are the specifications of software and hardware needed for the execution of the project Software Requirements The software needed for the demonstration of the project are: Operating System Language User Interface : Any Windows Operating System. : JAVA for implementing the Algorithms & JSP. : HTML Hardware Requirements The hardware requirements for the project are: Processor : Pentium IV RAM : 500 MB HARDDISK : 40 GB 3.6 Functional Requirements 12
22 The functions that have to be performed by the proposed system are listed below: Tabulating the user-item rating data The main use of Collaborative Filtering recommendation algorithm is to predict the rating of selected item which is not rated by the user depending upon the item-ratings of items that the user has already rated. Similarity between Items: Similarity computation is the bottleneck in a collaborative filtering system. There are several ways of doing it. One of the most popular ways of computing similarity is by using Pearson Correlation co-efficient (PCC) & adjusted cosine similarity. Selection of Neighbors Neighbors are selected based on the similarity with the current user. Threshold based selection is a technique in which a user is considered to be the neighbor of the current user if the similarity exceeds a threshold value. Prediction using Similarities in Item-Item: Once comparisons between the user and the rest of the community of recommenders (regardless of the method applied) are complete, predicted ratings of unrated content can be computed. As above, there are a number of means of computing these predictions. 3.7 Non-functional Requirements Following are Non-functional requirements: Performance 13
23 The software should use very less amount of memory. The processor must be used efficiently by the process. User must complete the operations in a short interval of time. Reliability When user calls a software over a specific period of time, the software must deliver the expected services. It the product provides wrong services the product is not reliable. Availability The software must provide proper service to the user when it is run. The requested services should be delivered in time. Security Only admin user should know the id and password. All data is protected and not accessible by others. Maintainability Our project was highly flexibility and maintainability if we want we want we can add any new features to this project at any time. It is more understandable to others who want to maintain this project. Portability It is portable it can be use any system where the specified requirements are satisfied 3.8 Use Case Diagram The Use diagram is represented in figure 3.1 and table
24 Figure 3.1 Table 3.1 Use Case Usecase Functionn Rate Items Search movies Get recommendations Rate the movies To search movies Generate recommandations 15
25 DESIGN 4.1Detailed Design Design phase involves planning different stages for implementing the Software. A good design helps in developing a good software with less effort. The system is totally menu driven. It is a very powerful tool for interactive control the menu structure is designed in such a way that the user can navigate easily by selecting the links. In a system, result of a process can be seen in the output. An efficient output design improves the system s association with the user and helps in making a decision. The design of output specification carried out with much user friendliness. The system being developed provides the output in the form of screens. The system also provides messages for user friendliness. The screen is provided with help menus and messages that they can help users at difficult situations. Also provides error message, wherever necessary. In the user login screen, the user enters the login id. After login into the system the user home page opens & it contains four links.. The links are home, get recommendations, rate movies & logout. There is search option also. Figure represents the block diagram for the system. 16
26 Figure Block diagram When clicked on the home link, the movies rated most number of times are displayed along with these the movies rated most highly are also present in home page. When clicked on the get recommendations link the movies are recommended for the user using both the methods pearson & adjusted methods. When clicked on rate movies, the movies which were unrated by the user are displayed along with the options of the rating from 1-5. At a time only one movie can rated. Figure represents the flow chart. 17
27 Figure Flow chart When clicked on the search option by entering the data in the search space, the details of the movie along with prediction are displayed on the screen using both Pearson & Adjusted Cosine similarity. The error rates are found on the command prompt by running the class, the mean error rates are displayed for five test cases for all the methods. For measuring the accuracy of the techniques(pearson and Adjusted cosine), the data is divided into two parts base case and test case, where base case is used for machine learning to predict the ratings for a movie that is not listed in that base case list. The test case has the list of actual ratings which is used for comparing with the calculated values. I have designed a program which calculates the prediction for that unlisted movie in base case using both the techniques and compares its values to the actual rating present in the test case. These values are displayed in the command prompt. Refer to Figure 6.6.1, Figure 6.6.2, Figure and Figure
28 4.2 Architecture of the proposed system This structure in figure 4.1 shows the system architecture and how the system is implemented and achieves the functionalities proposed. The data is read from the input files and is tabulated as a item-user rating table. Then the similarity analysis is done on the data by applying the suitable methods like Pearson and adjusted cosine similarity. The similarity analysis helps in identify the neighbors and generate the prediction. The effectiveness of the methods are observed by their error rates calculated from the test cases data which is actual data split into 80-20, 80 for learning and 20 for testing. Then the differences are squared followed by mean of them is calculated and square root is calculated for each user item having rating greater than zero. All these error rates mean is presented. Input data files Reading & tabulating data Calculate similarity Show Predictions Test data cases Calculate the difference of predicions Root mean square error Show error rates Figure System Architecture 4.3 Algorithm The algorithms used in the project are Pearson correlation technique and Adjusted cosine similarity technique for calculating the similarity between the items. Tabulating the user-item rating data The task of the 'collaborative filtering recommendation algorithm' is to make suggestions to the user about an item that the user has not rated based on the past history of the user. The 'user-item rating' database is used for making observations [4]. 19
29 Each user has item-rating pairs, and can be represented in a 'user-item table', whichh has ratings 'Rij' that are given by i th user for the j th item, as shown in table 4.1 Table 4.1 User item rating S1 S2 S3 S4 S5 U U2 1 2 U U ? Similarity between Items: Similarity computation is the bottleneck in a collaborative filtering system. There are several ways of doing it. One of the most popular ways of computing similarity is by using Pearson Correlation co-efficient (PCC) as mentioned in [4]. The formula is in figure 4.2 Figure 4.2 Pearson Similarity Formula PCC shows how proportional are changes between two given variables. The units of the variables does not matter in the case of pcc.the similarity is between -1 and shows least similarity and +1 shows highest similarity. Another way of doing it is Adjusted Cosine Similarity is as shown in figure 4.3 Figure 4.3 Adjusted cosine similarity formula In cosine similarity, the similarity is never negative and so, it is always between 0 and 1. Adjustedd cosine similarity is better than Cosine similarity because it takes into consideration the 20
30 user average. This implies that if a user gives always 5 to each movie and another gives 1 or 2 to all, then, that pattern of ratings is taken into consideration when recommending movies [4]. Prediction using Similarities in Item-Item: Once comparisons between the user and the rest of the community of recommenders (regardless of the method applied) are complete, predicted ratings of unrated content can be computed [4]. As above, there are a number of means of computing these predictions. Here we present one method, guessing the rating P i of an item I for user a is computed as a "weighted average of neighbor ratings R b,i ". The first is a weighted average of neighbor ratings as shown in figure 4.4 Figure 4.4 Prediction formula 4.4 CLASS DIAGRAM Class diagrams are the diagrams used in object-oriented systems for modeling. Various objects and how each object is related to other object are depicted in a class diagram Collaborative class The major purpose of this class is reading the data from the files and calculates the basic operations on the data like averages and identifying top items in the data. The functions present in the class are gettopmovies() which identifies the movies which are rated most number of times, minmax() function returns either minimum or maximum based on the arguments, getusermovielist() returns all the movies rated by the use,. Search() returns the movie id of the movie searched by the user, UserAverage() function calculates and returns the average of all the ratings given by him, getrate() function takes the rating given by the user and stores in the files. 21
31 4.4.2 CFAlgorithms class The major purpose of the this class is implementing the similarity analysis like pearson and adjusted cosine similarity. The functions present in the class are calcpear() which calculate the pearson similarity between the items, calcadjustedcosine() which calculate the adjusted cosine similarity between the items, gettopsimilar() returns the neighbors of the itemid passed, getrecommendation() returns the recommendations to the user passed RMSE Class The main purpose of the RMSE class is to calculate the error rates of the methods used in the recommendations generation. The main functions in this class is computermse(), it is used for calculating the error rates for all the methods. Figure 4.5 Class diagram 22
32 4.5 SEQUENCE DIAGRAM A sequence diagram is a diagram which shows the flow of the project. It shows various modules and also the messages passed in between them. In the Figure 4.6 different modules of the project are shown. Rate items, Search items, Recommendation and Predict rate are the modules and the function of each module is shown in Figure 4.6. Figure 4.6 Sequence diagram 23
33 IMPLEMENTATION 5.1 Java Technology 'Applets and applications' are different programs in Java Programming language. An Applet is a small application written in java which runs in a java enabled browser [23]. Applets are very useful in developing interactive websites. An application is defined as a 'standalone program' which runs on Java platform. An application server is used to support clients on the network. Proxy servers, mail servers, and print servers are different types of servers. The main advantages of application servers are high security, good performance, centralized configuration data integrity and code integrity. Another specialized program is a servlet. "A servlet can almost be thought of as an applet that runs on the server side". "Java Servlets are a popular choice for building interactive web applications, replacing the use of CGI scripts". "Servlets are similar to applets in that they are runtime extensions of applications". "Instead of working in browsers, though, servlets run within Java Web servers, configuring or tailoring the server".[23] 5.2 Steps of Implementation Step 1: Tabulating the 'user-item' rating data The ratings given by a user to an item are tabulated in a 'User-Item' rating table. This data is used to make predictions. 24
34 Step 2: Similarity between Items: Computing similarity is by using Pearson Correlation co-efficient (PCC) and Adjusted Cosine Similarity. Step 3: Selecting Neighbors Neighbors are selected based on a threshold value which is calculated by using Threshold-based selection technique. If the similarity value exceeds a threshold value that user is considered to be a neighbor. 25
35 TESTING 6.1 Testing Objectives Testing process constitues checking errors in the code. The main objective of testing is to find the undiscovered errors in the code. Testing is a very important module in software life cycle. 6.2 Various methods of testing There are various methods of testing the code White Box Testing: To check the control structure of the program White box Testing can be used. Test cases ensure that all the functionalities of the software have been tested at least once. Black Box Testing: Black box testing is designed to check if the software validations are properly done without considering the internal working of the software. 6.3 Levels of Testing The various levels of testing are listed below: Unit testing: In Unit testing all the modules of the software are tested to check if they are in accordance with the modules provided during the design phase. Testing the internal logic of the code is the main motive of Unit testing. White box testing techniques are utilized to check if the control structure is good and does maximum error detection. System Testing: 26
36 In System testing the software is checked to see if the software meets the requirements. In this project, lowest level of unit testing is done to test the data, reports and stored procedures that are developed. Testing Process: Software testing is done for finding the errors in the software. This is the last phase before software is delivered 6.4 UNIT TESTING Test Case 1 This test case checks if ratings are read properly using the dataset taken from movielens website. Test case 1 is represented in table 6.1 it describes the objective of the test which is to check if the values are read properly, the input given is the data set taken from MovieLens and the expected output and the output obtained are the user ratings. Table 6.1 Test case 1 Test Name Objective Input Expected output Original output Unit test Check if ratings are read properly using data in files Data in files Users ratings Users ratings Error info Solution for error
37 Test performed by Shivani Dated on The Figure 6.1 shows the page which loads proper data from the data set. The home page displays the list of movies which the user has given ratings for. Figure 6.1 Test case Test Case 2 This test case is to check if the recommendations are generated properly. The table 6.2 describes the objective which is to check if the recommendations are generated properly. The recommendations are given based on the values obtained from the algorithms. 28
38 Table 6.2 Test case 2 Test Name Objective Unit test Check if recommendations are generated Input Expected output Original output Ratings of user Recommendations for user Recommendations for user Error info Solution for error Test performed by Shivani Dated on Figure 6.2 shows the page which displays the recommendations which are obtained from Pearson Correlation method and Adjusted Cosine similarity methods. When the user clicks on GET RECOMMENDATIONS option he gets recommendations computed by using both the techniques. 29
39 Figure 6.2 Test case Test Case 3 This test case checks if the search option is working properly. Search option is present at the home page which can be used to search details of a particular movie. Table 6.3 Test case 3 Test Name Objective Unit test Check if Proper function of search option Input Expected output Original output Movies data in files Similar movies of searched movie Similar movies of searched movie Error info
40 Solution for error Test performed by Shivani Dated on Figure 6.3 Test case Test Case 4 This test case is used for testing if the ratings are updated properly. The user can rate the movies which he has not rated before. This test case is to ensure that the ratings are updated properly. 31
41 Table 6.4 Test case 4 Test Name Objective Unit test Check if ratings are updated properly Input Expected output Original output Movies data in files Updated ratings Updated ratings Error info Solution for error Test performed by Shivani Dated on Figure shows the page where user can provide his rating. Figure shows that the value has been updated. 32
42 Figure Test case 4.1 Figure Test case
43 6.4.5 Test Case 5 This test case is to check if the validations are properly made. When a user clicks on rate option without selecting the number ranging between 1-5 an error is prompted. Table 6.5 Test case 5 Test Name Objective Unit test Check if Proper validations are made Input Expected output Original output Movies data in files Error when movie not rated Error when movie not rated Error info Solution for error Test performed by Shivani Dated on
44 Figure 6.5 shows that error is thrown when user clicks rating without selecting the number. Figure 6.5 Test case Test Case 6 This Test case is for error calculation to check the accuracy of the algorithms used. As mentioned in 4.1, The error rates are found on the command prompt by running the class, the mean error rates are displayed for five test cases for all the methods. For measuring the accuracy of the techniques(pearson and Adjusted cosine), the data is divided into two parts base case and test case, where base case is used for machine learning to predict the ratings for a movie that is not listed in that base case list. The test case has the list of actual ratings which is used for comparing with the calculated values. I have designed a program which calculates the prediction for that unlisted movie in base case using both the techniques and compares its values to the actual rating present in the test case. 35
45 Figure shows the base case data which is fed to the program for learning purpose. The first column represents user, second column shows the movie and the third column is for movie ratings. It can be noticed that user 1 has not given rating for movie 6. Figure Base case 36
46 The following Figure is the actual data which is used to compare with the values obtained using the algorithms Pearson correlation and Adjusted cosine similarity. Figure Test Case 37
47 Figure shows the program rmse.java which is used to test the accuracy of the algorithms. This program takes the data from base case, computes the recommendations using Pearson correlation technique and adjusted cosine technique. It calculates root mean square error for the actual values which are present in the test case file and the values obtained from these techniques. Figure Program to calculate rmse 38
48 The following figure shows the root mean square errors of Adjusted cosine method and Pearson Correlation technique. It is seen that the error rate for Pearson is less than the error from Adjusted Cosine method, which indicates that Pearson correlation method is a better method for using in recommender systems. Figure rmse errors 39
49 IMPLEMENTATION RESULTS 7.1 Generating Recommendations The main aim of the project is to provide recommendations to the users using two algorithms, Pearson Correlation and Adjusted cosine similarity methods. In the figure 7.1, we can see the recommendations generated for the user 450 using Pearson correlation analysis. The recommendations are obtained by clicking on the get recommendation link. Top 15 recommendations are displayed for the user. Apart from the recommendations the prediction values are also shown in each recommendation. Figure 7.1 Generating recommendations 1 40
50 In the below figure7.2, the recommendations generated using adjusted cosine similarity method is shown for user 450. The top 15 recommendations are shown using this method, apart from the recommendation the prediction value is also shown. The adjusted cosine similarity method results are shown in the above figure. Figure 7.2 Generating recommendations 2 41
51 7.2 Viewing Search Results In the figure 7.3, the search details are displayed for item searched using the textbox shown in the top right. The details of movie shown are count of the users who rated this movie, the average rating given to that movie, the rating given by the current user (if any). Rate option given to the user if he did not rate the movie. Figure 7.3 Viewing search results 1 42
52 In the figure 7.4, apart from the movie details in the search option the similar movies are also presented to the user. The similar movies for the searched item are also presented using both the similarity methods. The top 15 movies similar to the searched item are displayed. Both the similarity methods are used for calculating the similarity for the searched item. Figure 7.4 Viewing Search results 2 43
53 7.3 Calculating the Error Rates In the above figure 7.5, the error rates are shown for all the similarity methods for prediction generation. This test is done on five test cases with learning to test ratio. The above error rates which are well below the 1.0 indicate the effectiveness of the system. The main motive of a 'recommender system' is to provide accurate suggesttions to a user. In my project I implemented two algorithms for providing recommendations. Calculating rmse values for these both techniques helps us in understanding which technique gives more accurate results. As already mentioned in 4.2 and 6.6 the figure 7.5 shows values which are calculated for 4 different set of test cases. In all the cases Pearson correlation is found to be more accurate. Figure 7.5 Calculating Errors 44
54 CONCLUSION Two methods are used to predict ratings for a given item for a given customer. They are Pearson correlation coefficient, adjusted cosine similarity and. The root mean square error (RMSE) for all the two cases are compared and following comparison is made: Figure 8.1 Error rates Clearly, Pearson correlation is not as good as adjusted cosine similarity as explained in Test Case 6. The reason is that Pearson correlation gives a biased result whenever the number of co-rated items is less as shown in figure 8.1. For example, when comparing two items, one of which has hundred ratings as 5 and the other having just 5 ratings, all 5, their Pearson correlation 45
55 coefficient will come as 1 (which shows most similar) because it doesn t take into consideration the fact that the number of ratings for the second one is less. Adjusted cosine takes into consideration the difference in rating habit of users and thus, subtracts the user average from the ratings. FUTURE ENHANCEMENTS Also, I feel that RMSE is not the right way to measure the accuracy of a recommender system. It is a topic of hot debate in recommender system research and we second that. Recommender systems were originally meant for showing users items that they did not know were there. It is about discovery. Using RMSE, all we are testing is how accurate we are in predicting something which has already been rated. This is an effective way of testing if our algorithms are working correctly, but this is not how a recommender system should be judged. We should have statistical measures to correctly judge user satisfaction, the fact that when did he discover things he was not aware of and if he/she was happy/unhappy about it. I would like to say that similarity computation is the most expensive part of the collaborative filtering process and it is an unsolved problem. The rating data is always sparse because if the rating data is not sparse and full of data, then, the recommendation process becomes trivial. There are several other ways to compute similarity, like trust-based collaborative filtering, where there is a circle of trust for all users and only that circle is asked for recommendations. There is one scenario where a few experts are only used to rate movies and their opinion is used for all others. 46
56 REFERENCES [1] Introduction to Recommender Systems Handbook. Francesco Ricci, Lior Rokach and Bracha Shapira, 2011 [2] Recommender Systems for the web J.Ben Schafer, Joseph A Konstan John T Reidl, 2006 [3] Recommender System- Wikipedia, the free encyclopedia, 2013 [4] Towards an Introduction to Collaborative Filtering. Jia Zhou, Tiejian Luo, 2009 [5] Implementation of Item and Content based Collaborative Filtering Techniques based on ratings average for Recommender Systems. Rohini Nair, Kavita Kelkar,University of Mumbai, 2013 [6] Extraction and integration of MovieLens and IMDb Data. Veronika Peralta, 2007 [7] Scalability-Wikipedia, the free encyclopedia [8] Correlation and dependence- Wikipedia, the free encyclopedia [9] Statistical dependence: Independence, Correlation and dependence, Copula, Spearman s rank correlation coefficient [10] A Collaborative Filtering Recommendation Algorithm based on User Clustering and Item Clustering. SongJie Gong, Zhejiang Business Technology Institute, 2010 [11] Improved Neighborhood-based Collaborative Filtering. Robert M. Bell and Yehuda Koren, 2007 [12] A Framework for Collaborative, Content-Based and Demographic Filtering. Micheal Pazzani, University of California, 2004 [13] Online Clustering for collaborative filtering. Lee W.S, School of Computing Technical Report,
57 [14] A hybrid collaborative filtering recommendation mechanism for P2P networks. Z. Liu, W. QU, H. Li, C.Xie, 2010 [15] Recommendation based on rational inferences in collaborative filtering. J.M. Yang, K.F. Li, D.F. Zhang, 2013 [16] Collaborative Filtering with temporal dynamics. Y. Koren, 2009 [17] Collaborative Filtering in a Non-Uniform World. R. Salakhutdinov, N. Sebro, 2010 [18] A Hybrid trust-enhanced collaborative filtering recommendation approach for personalized government-to-business e-services. Q. Shambour, 2011 [19] A careful assessment of recommendation algorithms related to dimension reduction techniques. C.X. Yin, Q. K. Peng, 2012 [20] Scalable collaborative filtering approaches for large recommender systems. G. Takacs, 2009 [21] Amazon.com Recommendations Item-to-Item Collaborative Filtering. Greg Linden, Brent Smith, and Jeremy York, 2006 [22] [23] The Java Tutorial, Third Edition. Mary Campione, Kathy Walrath, Alison Huml 48
58 APPENDIX A1 User Manual 5.3 Sample Coding public void calcpearson() { int i,j; int itemsrateduser[][]=col.getitemsrate(); for(i=0;i<itemstotal;i++) { for(j=i;j<itemstotal;j++) { float num=(float) 0.0; float denom1=(float) 0.0; float denom2=(float) 0.0; int coratecount=0; float iavg=col.getitemaverage(i); float javg=col.getitemaverage(j); for(int k=0;k<col.getitemratecount(i);k++) { int m=itemsrateduser[i][k]-1; if(rate[m][j]!=0) 49
59 { num+=(rate[m][i]-iavg)*(rate[m][j]-javg); denom1+=(rate[m][i]-iavg)*(rate[m][i]-iavg); denom2+=(rate[m][j]-javg)*(rate[m][j]-javg); coratecount++;} } pearson[i][j]=(float) (num/math.sqrt(denom1*denom2)); if(coratecount<15){pearson[i][j]=pearson[i][j]*coratecount/15;} if(coratecount==0)pearson[i][j]=-1; pearson[j][i]=pearson[i][j]; } } } public void calcadjustedcosine() { int i,j; int itemsrateduser[][]=col.getitemsrate(); for(i=0;i<itemstotal;i++) { for(j=i;j<itemstotal;j++) 50
60 { float num=(float) 0.0; float denom1=(float) 0.0; float denom2=(float) 0.0; int coratecount=0; for(int k=0;k<col.getitemratecount(i);k++) { int m=itemsrateduser[i][k]-1; float uavg=col.getuseraverage(m); if(rate[m][j]!=0) { num+=(rate[m][i]-uavg)*(rate[m][j]-uavg); denom1+=(rate[m][i]-uavg)*(rate[m][i]-uavg); denom2+=(rate[m][j]-uavg)*(rate[m][j]-uavg); coratecount++; } } adjustcosine[i][j]=(float) (num/math.sqrt(denom1*denom2)); if(coratecount<15){adjustcosine[i][j]=adjustcosine[i][j]*coratecount/15;} if(coratecount==0)adjustcosine[i][j]=-1; adjustcosine[j][i]=adjustcosine[i][j];} } } 51
RECOMMENDATION SYSTEM
RECOMMENDATION SYSTEM October 8, 2013 Team Members: 1) Duygu Kabakcı, 1746064, [email protected] 2) Işınsu Katırcıoğlu, 1819432, [email protected] 3) Sıla Kaya, 1746122, [email protected]
Recommendation Tool Using Collaborative Filtering
Recommendation Tool Using Collaborative Filtering Aditya Mandhare 1, Soniya Nemade 2, M.Kiruthika 3 Student, Computer Engineering Department, FCRIT, Vashi, India 1 Student, Computer Engineering Department,
Collaborative Filtering. Radek Pelánek
Collaborative Filtering Radek Pelánek 2015 Collaborative Filtering assumption: users with similar taste in past will have similar taste in future requires only matrix of ratings applicable in many domains
The Need for Training in Big Data: Experiences and Case Studies
The Need for Training in Big Data: Experiences and Case Studies Guy Lebanon Amazon Background and Disclaimer All opinions are mine; other perspectives are legitimate. Based on my experience as a professor
! E6893 Big Data Analytics Lecture 5:! Big Data Analytics Algorithms -- II
! E6893 Big Data Analytics Lecture 5:! Big Data Analytics Algorithms -- II Ching-Yung Lin, Ph.D. Adjunct Professor, Dept. of Electrical Engineering and Computer Science Mgr., Dept. of Network Science and
Recommender Systems: Content-based, Knowledge-based, Hybrid. Radek Pelánek
Recommender Systems: Content-based, Knowledge-based, Hybrid Radek Pelánek 2015 Today lecture, basic principles: content-based knowledge-based hybrid, choice of approach,... critiquing, explanations,...
BUILDING A PREDICTIVE MODEL AN EXAMPLE OF A PRODUCT RECOMMENDATION ENGINE
BUILDING A PREDICTIVE MODEL AN EXAMPLE OF A PRODUCT RECOMMENDATION ENGINE Alex Lin Senior Architect Intelligent Mining [email protected] Outline Predictive modeling methodology k-nearest Neighbor
Usage of OPNET IT tool to Simulate and Test the Security of Cloud under varying Firewall conditions
Usage of OPNET IT tool to Simulate and Test the Security of Cloud under varying Firewall conditions GRADUATE PROJECT REPORT Submitted to the Faculty of The School of Engineering & Computing Sciences Texas
Understanding Web personalization with Web Usage Mining and its Application: Recommender System
Understanding Web personalization with Web Usage Mining and its Application: Recommender System Manoj Swami 1, Prof. Manasi Kulkarni 2 1 M.Tech (Computer-NIMS), VJTI, Mumbai. 2 Department of Computer Technology,
A puzzle based authentication method with server monitoring
A puzzle based authentication method with server monitoring GRADUATE PROJECT REPORT Submitted to the Faculty of The School of Engineering & Computing Sciences Texas A&M University-Corpus Christi Corpus
Implementation of Botcatch for Identifying Bot Infected Hosts
Implementation of Botcatch for Identifying Bot Infected Hosts GRADUATE PROJECT REPORT Submitted to the Faculty of The School of Engineering & Computing Sciences Texas A&M University-Corpus Christi Corpus
IPTV Recommender Systems. Paolo Cremonesi
IPTV Recommender Systems Paolo Cremonesi Agenda 2 IPTV architecture Recommender algorithms Evaluation of different algorithms Multi-model systems Valentino Rossi 3 IPTV architecture 4 Live TV Set-top-box
Data Mining for Web Personalization
3 Data Mining for Web Personalization Bamshad Mobasher Center for Web Intelligence School of Computer Science, Telecommunication, and Information Systems DePaul University, Chicago, Illinois, USA [email protected]
Scalable Collaborative Filtering with Jointly Derived Neighborhood Interpolation Weights
Seventh IEEE International Conference on Data Mining Scalable Collaborative Filtering with Jointly Derived Neighborhood Interpolation Weights Robert M. Bell and Yehuda Koren AT&T Labs Research 180 Park
Automated Collaborative Filtering Applications for Online Recruitment Services
Automated Collaborative Filtering Applications for Online Recruitment Services Rachael Rafter, Keith Bradley, Barry Smyth Smart Media Institute, Department of Computer Science, University College Dublin,
NS DISCOVER 4.0 ADMINISTRATOR S GUIDE. July, 2015. Version 4.0
NS DISCOVER 4.0 ADMINISTRATOR S GUIDE July, 2015 Version 4.0 TABLE OF CONTENTS 1 General Information... 4 1.1 Objective... 4 1.2 New 4.0 Features Improvements... 4 1.3 Migrating from 3.x to 4.x... 5 2
Utility of Distrust in Online Recommender Systems
Utility of in Online Recommender Systems Capstone Project Report Uma Nalluri Computing & Software Systems Institute of Technology Univ. of Washington, Tacoma [email protected] Committee: nkur Teredesai
Machine Learning using MapReduce
Machine Learning using MapReduce What is Machine Learning Machine learning is a subfield of artificial intelligence concerned with techniques that allow computers to improve their outputs based on previous
A Workbench for Comparing Collaborative- and Content-Based Algorithms for Recommendations
A Workbench for Comparing Collaborative- and Content-Based Algorithms for Recommendations Master Thesis Pat Kläy from Bösingen University of Fribourg March 2015 Prof. Dr. Andreas Meier, Information Systems,
Software Engineering I CS524 Professor Dr. Liang Sheldon X. Liang
Software Requirement Specification Employee Tracking System Software Engineering I CS524 Professor Dr. Liang Sheldon X. Liang Team Members Seung Yang, Nathan Scheck, Ernie Rosales Page 1 Software Requirements
Intelligent Web Techniques Web Personalization
Intelligent Web Techniques Web Personalization Ling Tong Kiong (3089634) Intelligent Web Systems Assignment 1 RMIT University [email protected] ABSTRACT Web personalization is one of the most
Pharos Control User Guide
Outdoor Wireless Solution Pharos Control User Guide REV1.0.0 1910011083 Contents Contents... I Chapter 1 Quick Start Guide... 1 1.1 Introduction... 1 1.2 Installation... 1 1.3 Before Login... 8 Chapter
Planning the Installation and Installing SQL Server
Chapter 2 Planning the Installation and Installing SQL Server In This Chapter c SQL Server Editions c Planning Phase c Installing SQL Server 22 Microsoft SQL Server 2012: A Beginner s Guide This chapter
Test Run Analysis Interpretation (AI) Made Easy with OpenLoad
Test Run Analysis Interpretation (AI) Made Easy with OpenLoad OpenDemand Systems, Inc. Abstract / Executive Summary As Web applications and services become more complex, it becomes increasingly difficult
Application Notes for Configuring Dorado Software Redcell Enterprise Bundle using SNMP with Avaya Communication Manager - Issue 1.
Avaya Solution & Interoperability Test Lab Application Notes for Configuring Dorado Software Redcell Enterprise Bundle using SNMP with Avaya Communication Manager - Issue 1.0 Abstract These Application
KEYWORD SEARCH OVER PROBABILISTIC RDF GRAPHS
ABSTRACT KEYWORD SEARCH OVER PROBABILISTIC RDF GRAPHS In many real applications, RDF (Resource Description Framework) has been widely used as a W3C standard to describe data in the Semantic Web. In practice,
Big Data Analytics. Lucas Rego Drumond
Big Data Analytics Lucas Rego Drumond Information Systems and Machine Learning Lab (ISMLL) Institute of Computer Science University of Hildesheim, Germany Going For Large Scale Application Scenario: Recommender
Content-Boosted Collaborative Filtering for Improved Recommendations
Proceedings of the Eighteenth National Conference on Artificial Intelligence(AAAI-2002), pp. 187-192, Edmonton, Canada, July 2002 Content-Boosted Collaborative Filtering for Improved Recommendations Prem
RECOMMENDATION SYSTEM USING BLOOM FILTER IN MAPREDUCE
RECOMMENDATION SYSTEM USING BLOOM FILTER IN MAPREDUCE Reena Pagare and Anita Shinde Department of Computer Engineering, Pune University M. I. T. College Of Engineering Pune India ABSTRACT Many clients
eg Enterprise v5.2 Clariion SAN storage system eg Enterprise v5.6
EMC Configuring Clariion and SAN and Monitoring Monitoring storage an system EMC an eg Enterprise v5.2 Clariion SAN storage system eg Enterprise v5.6 Restricted Rights Legend The information contained
An Oracle White Paper July 2011. Oracle Primavera Contract Management, Business Intelligence Publisher Edition-Sizing Guide
Oracle Primavera Contract Management, Business Intelligence Publisher Edition-Sizing Guide An Oracle White Paper July 2011 1 Disclaimer The following is intended to outline our general product direction.
How In-Memory Data Grids Can Analyze Fast-Changing Data in Real Time
SCALEOUT SOFTWARE How In-Memory Data Grids Can Analyze Fast-Changing Data in Real Time by Dr. William Bain and Dr. Mikhail Sobolev, ScaleOut Software, Inc. 2012 ScaleOut Software, Inc. 12/27/2012 T wenty-first
End User Guide The guide for email/ftp account owner
End User Guide The guide for email/ftp account owner ServerDirector Version 3.7 Table Of Contents Introduction...1 Logging In...1 Logging Out...3 Installing SSL License...3 System Requirements...4 Navigating...4
A Visualization System and Monitoring Tool to Measure Concurrency in MPICH Programs
A Visualization System and Monitoring Tool to Measure Concurrency in MPICH Programs Michael Scherger Department of Computer Science Texas Christian University Email: [email protected] Zakir Hussain Syed
Optimal Service Pricing for a Cloud Cache
Optimal Service Pricing for a Cloud Cache K.SRAVANTHI Department of Computer Science & Engineering (M.Tech.) Sindura College of Engineering and Technology Ramagundam,Telangana G.LAKSHMI Asst. Professor,
Software Requirements Specification
METU DEPARTMENT OF COMPUTER ENGINEERING Software Requirements Specification SNMP Agent & Network Simulator Mustafa İlhan Osman Tahsin Berktaş Mehmet Elgin Akpınar 05.12.2010 Table of Contents 1. Introduction...
SESSION DEPENDENT DE-IDENTIFICATION OF ELECTRONIC MEDICAL RECORDS
SESSION DEPENDENT DE-IDENTIFICATION OF ELECTRONIC MEDICAL RECORDS A Thesis Presented in Partial Fulfillment of the Requirements for the Degree Bachelor of Science with Honors Research Distinction in Electrical
NETWRIX EVENT LOG MANAGER
NETWRIX EVENT LOG MANAGER QUICK-START GUIDE FOR THE ENTERPRISE EDITION Product Version: 4.0 July/2012. Legal Notice The information in this publication is furnished for information use only, and does not
Tableau Server 7.0 scalability
Tableau Server 7.0 scalability February 2012 p2 Executive summary In January 2012, we performed scalability tests on Tableau Server to help our customers plan for large deployments. We tested three different
Oracle Service Bus Examples and Tutorials
March 2011 Contents 1 Oracle Service Bus Examples... 2 2 Introduction to the Oracle Service Bus Tutorials... 5 3 Getting Started with the Oracle Service Bus Tutorials... 12 4 Tutorial 1. Routing a Loan
Data Mining Yelp Data - Predicting rating stars from review text
Data Mining Yelp Data - Predicting rating stars from review text Rakesh Chada Stony Brook University [email protected] Chetan Naik Stony Brook University [email protected] ABSTRACT The majority
vcenter Operations Management Pack for SAP HANA Installation and Configuration Guide
vcenter Operations Management Pack for SAP HANA Installation and Configuration Guide This document supports the version of each product listed and supports all subsequent versions until a new edition replaces
Building Java Servlets with Oracle JDeveloper
Building Java Servlets with Oracle JDeveloper Chris Schalk Oracle Corporation Introduction Developers today face a formidable task. They need to create large, distributed business applications. The actual
Visualisation in the Google Cloud
Visualisation in the Google Cloud by Kieran Barker, 1 School of Computing, Faculty of Engineering ABSTRACT Providing software as a service is an emerging trend in the computing world. This paper explores
Job Reference Guide. SLAMD Distributed Load Generation Engine. Version 1.8.2
Job Reference Guide SLAMD Distributed Load Generation Engine Version 1.8.2 June 2004 Contents 1. Introduction...3 2. The Utility Jobs...4 3. The LDAP Search Jobs...11 4. The LDAP Authentication Jobs...22
SYSTEM REQUIREMENTS SPECIFICATIONS FOR THE PROJECT INVENTORY CONTROL SYSTEM FOR CALCULATION AND ORDERING OF AVAILABLE AND PROCESSED RESOURCES
SYSTEM REQUIREMENTS SPECIFICATIONS FOR THE PROJECT INVENTORY CONTROL SYSTEM FOR CALCULATION AND ORDERING OF AVAILABLE AND PROCESSED RESOURCES GROUP 9 SIMANT PUROHIT BARTLOMIEJ MICZEK AKSHAY THIRKATEH ROBERT
A Social Network-Based Recommender System (SNRS)
A Social Network-Based Recommender System (SNRS) Jianming He and Wesley W. Chu Computer Science Department University of California, Los Angeles, CA 90095 [email protected], [email protected] Abstract. Social
Evaluation of different Open Source Identity management Systems
Evaluation of different Open Source Identity management Systems Ghasan Bhatti, Syed Yasir Imtiaz Linkoping s universitetet, Sweden [ghabh683, syeim642]@student.liu.se 1. Abstract Identity management systems
NoSQL and Hadoop Technologies On Oracle Cloud
NoSQL and Hadoop Technologies On Oracle Cloud Vatika Sharma 1, Meenu Dave 2 1 M.Tech. Scholar, Department of CSE, Jagan Nath University, Jaipur, India 2 Assistant Professor, Department of CSE, Jagan Nath
recommendation in e-commerce
recommendation in e-commerce Luminis Recommendation Services We are leaving the age of information and entering the age of recommendation Chris Anderson Authors: Hans Bossenbroek Hans Gringhuis Luminis
BillQuick Agent 2010 Getting Started Guide
Time Billing and Project Management Software Built With Your Industry Knowledge BillQuick Agent 2010 Getting Started Guide BQE Software, Inc. 2601 Airport Drive Suite 380 Torrance CA 90505 Support: (310)
Using News Articles to Predict Stock Price Movements
Using News Articles to Predict Stock Price Movements Győző Gidófalvi Department of Computer Science and Engineering University of California, San Diego La Jolla, CA 9237 [email protected] 21, June 15,
SYSTEM REQUIREMENTS...
Contents INTRODUCTION... 1 BillQuick HR Setup Checklist... 2 SYSTEM REQUIREMENTS... 3 HARDWARE REQUIREMENTS... 3 SOFTWARE REQUIREMENTS... 3 Operating System Requirements... 3 Other System Requirements...
Sports Management Information Systems. Camilo Rostoker November 22, 2002
Sports Management Information Systems Camilo Rostoker November 22, 2002 Introduction We are in the information age The availability of technology has brought forth a new problem domain how do we manage
L1: Introduction to Hadoop
L1: Introduction to Hadoop Feng Li [email protected] School of Statistics and Mathematics Central University of Finance and Economics Revision: December 1, 2014 Today we are going to learn... 1 General
Fast Analytics on Big Data with H20
Fast Analytics on Big Data with H20 0xdata.com, h2o.ai Tomas Nykodym, Petr Maj Team About H2O and 0xdata H2O is a platform for distributed in memory predictive analytics and machine learning Pure Java,
Content-Based Recommendation
Content-Based Recommendation Content-based? Item descriptions to identify items that are of particular interest to the user Example Example Comparing with Noncontent based Items User-based CF Searches
HELP DESK SYSTEMS. Using CaseBased Reasoning
HELP DESK SYSTEMS Using CaseBased Reasoning Topics Covered Today What is Help-Desk? Components of HelpDesk Systems Types Of HelpDesk Systems Used Need for CBR in HelpDesk Systems GE Helpdesk using ReMind
Safewhere*Identify 3.4. Release Notes
Safewhere*Identify 3.4 Release Notes Safewhere*identify is a new kind of user identification and administration service providing for externalized and seamless authentication and authorization across organizations.
Chapter 7. Using Hadoop Cluster and MapReduce
Chapter 7 Using Hadoop Cluster and MapReduce Modeling and Prototyping of RMS for QoS Oriented Grid Page 152 7. Using Hadoop Cluster and MapReduce for Big Data Problems The size of the databases used in
PATROL Console Server and RTserver Getting Started
PATROL Console Server and RTserver Getting Started Supporting PATROL Console Server 7.5.00 RTserver 6.6.00 February 14, 2005 Contacting BMC Software You can access the BMC Software website at http://www.bmc.com.
Characterizing Task Usage Shapes in Google s Compute Clusters
Characterizing Task Usage Shapes in Google s Compute Clusters Qi Zhang 1, Joseph L. Hellerstein 2, Raouf Boutaba 1 1 University of Waterloo, 2 Google Inc. Introduction Cloud computing is becoming a key
Windows Server 2012 Server Manager
Windows Server 2012 Server Manager Introduction: Prior to release of Server Manager in Windows Server 2008, Enterprise solution was to use different third party vendors which includes CA, HP utilities
Data Transfer Management with esync 1.5
ewon Application User Guide AUG 029 / Rev 2.1 Content Data Transfer Management with esync 1.5 This document explains how to configure the esync 1.5 Server and your ewons in order to use the Data Transfer
Similarity Search in a Very Large Scale Using Hadoop and HBase
Similarity Search in a Very Large Scale Using Hadoop and HBase Stanislav Barton, Vlastislav Dohnal, Philippe Rigaux LAMSADE - Universite Paris Dauphine, France Internet Memory Foundation, Paris, France
Online Content Optimization Using Hadoop. Jyoti Ahuja Dec 20 2011
Online Content Optimization Using Hadoop Jyoti Ahuja Dec 20 2011 What do we do? Deliver right CONTENT to the right USER at the right TIME o Effectively and pro-actively learn from user interactions with
BUILDER 3.0 Installation Guide with Microsoft SQL Server 2005 Express Edition January 2008
BUILDER 3.0 Installation Guide with Microsoft SQL Server 2005 Express Edition January 2008 BUILDER 3.0 1 Table of Contents Chapter 1: Installation Overview... 3 Introduction... 3 Minimum Requirements...
SC-T35/SC-T45/SC-T46/SC-T47 ViewSonic Device Manager User Guide
SC-T35/SC-T45/SC-T46/SC-T47 ViewSonic Device Manager User Guide Copyright and Trademark Statements 2014 ViewSonic Computer Corp. All rights reserved. This document contains proprietary information that
OVERVIEW HIGHLIGHTS. Exsys Corvid Datasheet 1
Easy to build and implement knowledge automation systems bring interactive decision-making expertise to Web sites. Here s proven technology that provides customized, specific recommendations to prospects,
How To Set Up An Intellicus Cluster And Load Balancing On Ubuntu 8.1.2.2 (Windows) With A Cluster And Report Server (Windows And Ubuntu) On A Server (Amd64) On An Ubuntu Server
Intellicus Cluster and Load Balancing (Windows) Intellicus Enterprise Reporting and BI Platform Intellicus Technologies [email protected] www.intellicus.com Copyright 2014 Intellicus Technologies This
Online Courses Recommendation based on LDA
Online Courses Recommendation based on LDA Rel Guzman Apaza, Elizabeth Vera Cervantes, Laura Cruz Quispe, José Ochoa Luna National University of St. Agustin Arequipa - Perú {r.guzmanap,elizavvc,lvcruzq,eduardo.ol}@gmail.com
PREA: Personalized Recommendation Algorithms Toolkit
Journal of Machine Learning Research 13 (2012) 2699-2703 Submitted 7/11; Revised 4/12; Published 9/12 PREA: Personalized Recommendation Algorithms Toolkit Joonseok Lee Mingxuan Sun Guy Lebanon College
http://docs.trendmicro.com
Trend Micro Incorporated reserves the right to make changes to this document and to the products described herein without notice. Before installing and using the product, please review the readme files,
Client/server is a network architecture that divides functions into client and server
Page 1 A. Title Client/Server Technology B. Introduction Client/server is a network architecture that divides functions into client and server subsystems, with standard communication methods to facilitate
Polynomial Neural Network Discovery Client User Guide
Polynomial Neural Network Discovery Client User Guide Version 1.3 Table of contents Table of contents...2 1. Introduction...3 1.1 Overview...3 1.2 PNN algorithm principles...3 1.3 Additional criteria...3
Configuring and Monitoring Hitachi SAN Servers
Configuring and Monitoring Hitachi SAN Servers eg Enterprise v5.6 Restricted Rights Legend The information contained in this document is confidential and subject to change without notice. No part of this
Introduction. Regards, Lee Chadwick Managing Director
User Guide Contents Introduction.. 2 Step 1: Creating your account...3 Step 2: Installing the tracking code.. 3 Step 3: Assigning scores to your pages.....4 Step 4: Customising your lead bands..5 Step
Oracle Data Miner (Extension of SQL Developer 4.0)
An Oracle White Paper October 2013 Oracle Data Miner (Extension of SQL Developer 4.0) Generate a PL/SQL script for workflow deployment Denny Wong Oracle Data Mining Technologies 10 Van de Graff Drive Burlington,
High Level Design Distributed Network Traffic Controller
High Level Design Distributed Network Traffic Controller Revision Number: 1.0 Last date of revision: 2/2/05 22c:198 Johnson, Chadwick Hugh Change Record Revision Date Author Changes 1 Contents 1. Introduction
MapReduce and Distributed Data Analysis. Sergei Vassilvitskii Google Research
MapReduce and Distributed Data Analysis Google Research 1 Dealing With Massive Data 2 2 Dealing With Massive Data Polynomial Memory Sublinear RAM Sketches External Memory Property Testing 3 3 Dealing With
THE BUSINESS VALUE OF AN ERP SYSTEM
THE BUSINESS VALUE OF AN ERP SYSTEM AJMAL BEG THE BUSINESS VALUE OF AN ERP SYSTEM AJMAL BEG ii Copyright c 2010 by Ajmal Beg. All rights reserved. This technology described in this publication is based
Zinstall HDD User Guide
Zinstall HDD User Guide Thank you for purchasing Zinstall. If you have any questions, issues or problems, please contact us: Toll-free phone: (877) 444-1588 International callers: +1-877-444-1588 Support
4, 2, 2014 ISSN: 2277 128X
Volume 4, Issue 2, February 2014 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Recommendation
Social Media Mining. Data Mining Essentials
Introduction Data production rate has been increased dramatically (Big Data) and we are able store much more data than before E.g., purchase data, social media data, mobile phone data Businesses and customers
RingStor User Manual. Version 2.1 Last Update on September 17th, 2015. RingStor, Inc. 197 Route 18 South, Ste 3000 East Brunswick, NJ 08816.
RingStor User Manual Version 2.1 Last Update on September 17th, 2015 RingStor, Inc. 197 Route 18 South, Ste 3000 East Brunswick, NJ 08816 Page 1 Table of Contents 1 Overview... 5 1.1 RingStor Data Protection...
Tips and Tricks SAGE ACCPAC INTELLIGENCE
Tips and Tricks SAGE ACCPAC INTELLIGENCE 1 Table of Contents Auto e-mailing reports... 4 Automatically Running Macros... 7 Creating new Macros from Excel... 8 Compact Metadata Functionality... 9 Copying,
Hadoop and Map-Reduce. Swati Gore
Hadoop and Map-Reduce Swati Gore Contents Why Hadoop? Hadoop Overview Hadoop Architecture Working Description Fault Tolerance Limitations Why Map-Reduce not MPI Distributed sort Why Hadoop? Existing Data
GRAVITYZONE HERE. Deployment Guide VLE Environment
GRAVITYZONE HERE Deployment Guide VLE Environment LEGAL NOTICE All rights reserved. No part of this document may be reproduced or transmitted in any form or by any means, electronic or mechanical, including
Performance Comparison of Database Access over the Internet - Java Servlets vs CGI. T. Andrew Yang Ralph F. Grove
Performance Comparison of Database Access over the Internet - Java Servlets vs CGI Corresponding Author: T. Andrew Yang T. Andrew Yang Ralph F. Grove [email protected] [email protected] Indiana University
Oracle Primavera P6 Enterprise Project Portfolio Management Performance and Sizing Guide. An Oracle White Paper October 2010
Oracle Primavera P6 Enterprise Project Portfolio Management Performance and Sizing Guide An Oracle White Paper October 2010 Disclaimer The following is intended to outline our general product direction.
