Context-aware Search for Personal Information Management Systems

Size: px
Start display at page:

Download "Context-aware Search for Personal Information Management Systems"

Transcription

1 Context-aware Search for Personal Information Management Systems Jidong Chen 1,2 Wentao Wu 1 Hang Guo 2 Wei Wang 1 1 Fudan University, Shanghai, China 2 EMC Research China, Beijing, China jidong.chen@emc.com, wentaowu@fudan.edu.cn, hang.guo@emc.com, weiwang1@fudan.edu.cn Abstract With the fast growth of disk capacity in personal computers, keyword search over personal data (a.k.a. desktop search) is becoming increasingly important. Nonetheless, desktop search has been shown to be more challenging than traditional Web search. Modern commercial Web search engines heavily rely on structural information (i.e., hyperlinks between Web pages) to rank their search results. However, such information is not available in the circumstance of desktop search. Therefore, state-of-the-art desktop search systems such as Google Desktop Search usually leverage pure textbased ranking approaches (e.g., TF-IDF), which often fail to give promising rankings due to the misinterpretation of user intention. We observed that in desktop search, the semantics of keyword queries are often context-aware, i.e., they are related to the current activity state (e.g., writing a paper, navigating a website, etc.) of the user. In this paper, we present a novel context-aware search framework by taking this activity information into consideration. Specifically, we use Hidden Markov Model (HMM) to capture the relationships between user s access actions (e.g., opening/closing files, sending/receiving s, etc.) and activity states. The model is learned from user s past access history and is used to predict user s current activity upon the submission of some keyword query. We further propose a ranking scheme with this predicted context information incorporated. Experimental evaluation demonstrates both the effectiveness of the proposed context-aware search method and the enhancement to user s search experience. 1 Introduction With the fast advance in storage technologies, disks with capacity from hundreds of gigabytes to even several terabytes are now very common in personal computers. As a result, the number of documents stored in the local file system also grows very quickly, and functionalities that support effective search for a particular document is of ever growing importance. However, achieving such kind of functionalities turns out to be quite challenging. Unlike in the case of Web search, where powerful ranking schemes such as PageRank [22] and HITS [19] can be employed, there is no structural information (i.e., hyperlinks between webpages) that can be directly used to rank documents inside personal computers in similar ways. Therefore, state-of-the-art desktop search engines, such as Google Desktop Search, usually adopt pure textbased ranking approaches (e.g., TF-IDF scores). Nonetheless, TF-IDF scores do not always work well, for the following two reasons: The same keyword query can have multiple meanings. For example, apple may stand for the name of either a fruit or a company. Michael Jordan may refer to either a basketball player or a computer science professor. TF-IDF scores only consider the frequency that the keyword appears in the documents, and cannot distinguish such ambiguities. This problem has already been extensively studied in Web search. For instance, in [21], clickthrough data is analyzed to learn user s preference over different semantics of the query. But to the best of our knowledge, there is still no previous work on distinguishing query semantics in desktop search. The goal of desktop search is to find some particular item. Even if the keyword query is not ambiguous (i.e., its meaning is unique), a document with high TF- IDF score often does not mean it is the particular item that the user is trying to find. Consider, for example, a computer science student studying machine learning can keep hundreds of related papers in his personal computer. Suppose he wants to find some recent paper about Bayesian Network that he has downloaded from the ACM digital library. Using Bayesian Network as keywords may not help since there may be still dozens of papers talking about this topic, and those

2 documents with high TF-IDF scores are most likely classic papers or surveys in this area. Note that this is quite different from Web search, in which users usually are interested in seeking relevant pieces of information (a.k.a. navigation). The goal of seeking particular items in desktop search has also been phrased as re-finding known items [12] in the literature. We observed that, the semantics of keyword queries are often highly related to the context when they are issued. For example, a computer science student John may wish to search for a related paper written by Michael Jordan when he is writing his thesis, while he may be interested in the playing history of Michael Jordan when he is reading some NBA news. While intuitively this context information is valuable in addressing the first problem listed above, it is also helpful in resolving the second one. For instance, the paper about Bayesian Network may be downloaded several days ago when John was also writing his thesis. It thus provides important evidence for us to infer that this paper is highly relevant to the context writing thesis. As a result, it is reasonable to boost the rank of this paper when the same context happens again. A natural question is then whether it is practical to leverage this kind of context information. In the case of Web search, such context information usually cannot be automatically captured by Web search engines, since due to security consideration, Web browsers are not allowed to monitor user s behavior outside the browser. Therefore, the Web browser is unaware of context information such as writing thesis, and this piece of information is lost when the query is issued. However, desktop search engines, on the other hand, do not have such security issues since everything they are concerned with is confined within user s personal computer. As a result, they can have more access privilege to the local system, and thus have higher chance to record this context information, which can then be probably used to help learn the semantics of the query. In this paper, we address the problem of improving the quality of keyword search over personal data by taking context information into consideration. To achieve this, we have developed a novel desktop search system called imecho. Figure 1 illustrates its architecture. The context here is defined as the user s activity state, such as writing a paper, navigating a website, etc., which will be referred to as task in this paper. When performing a task, the user may have a series of access events. For example, when writing a paper, the user may first open the document file, then type in some text, and finally save and close the file. Before closing the file, he may also browse some web pages and copy a couple of references into the paper. Based on this observation, we propose to infer the current task by analyzing these access events. As shown in Figure 1, in imecho, such events are recorded by various Figure 1: System Architecture user activity monitors 1. We then build a user model upon the events collected. As will be elaborated in Section 2, we use Hidden Markov Model (HMM) to capture the relationships between user s access events and (latent) tasks. The model is built and periodically updated offline. At runtime, when user submits a keyword query, his recent access events are used by the model to predict the current task. This information is finally incorporated into imecho s ranking scheme, which will promote those items that are more related to user s current task. The rest of the paper is organized as follows: In Section 2 and Section 3, we discuss the user model and contextaware ranking scheme in details, respectively. Experimental results are presented in Section 4, and related work is summarized in Section 5. We conclude the paper in Section 6. 2 User Model The user model lies in the core of imecho. We assume that at any time, the user is performing some task, and we use an empty task to address the special case when the user is idle. Under this perspective, the behavior of the user could 1 The user activity monitors are application-specific. For example, we use VSTO (Visual Studio Tools for Office) to implement monitors for Microsoft Word/Excel/Powerpoint/Outlook.

3 be viewed as a sequence of tasks. However, these tasks are latent. What could be directly observed and recorded are the access events of the user, which also consist of a sequence. It is then quite natural to make the Hidden Markov Model (HMM) [23] a good candidate for the user model, which elegantly captures the relationship between one observable sequence and one hidden sequence. Meanwhile, another basic hypothesis lying behind HMM is that the current task should only depend on its previous task, which also makes sense in most cases under the circumstances of desktop activities. An HMM can be formally represented as λ = (T, R, π, A, B), in which T is a set of tasks; R is a set of resources, e.g., files, Web pages, s, etc.; π stands for the initial probability vector of tasks, i.e., π(i) is the probability that the user is currently performing task i when no access event is observed; A denotes the T T transition matrix of tasks, i.e., A(i, j) is the transition probability from task i to task j; B is the T R association matrix between tasks and resources, i.e., B(i, j) is the probability that resource j is accessed when the current task is i. Training an HMM is a well-known process in research community. After appropriately initializing parameters π, A, B, running the Baum-Welch algorithm [25] will give us a locally optimized HMM λ = (T, R, π, A, B ), with respect to the user activity log. This optimized HMM will be used as the user model in imecho. According to previous researches (e.g., [23]), giving uniform initial estimates to the π and A parameters is adequate in almost all cases, which therefore is also done in imecho. The challenge here is to give good estimates to the B parameter. In this section, we elaborate how to estimate B. Specifically, we propose a burst-based task mining framework to establish the relationships between tasks and resources. A burst is an impulse of access events in the user activity log, which intuitively indicates the start of a new task. The whole framework is divided into two steps: i) burst detection; and ii) burst-task mapping. Next, we first describe the user activity log, which is closely related to our framework, and then illustrate each of the above two steps in details. 2.1 User Activity Log Here we briefly describe the information we recorded in the user activity log L. Logically, L can be viewed as a sequence of access events. Each event e is represented as a triple e = (r e, t e, a e ), where r e is the URI (for Uniform Resource Identifier) of the resource, t e is the time when the access event e occurs, and a e is the particular type of action. For example, the action on a Microsoft Word Figure 2: Bursts for John s one afternoon document may be Open, Close, Save, etc.. For all the application monitors we implemented, r e and t e can be precisely recorded for every e. Actually, this can be done even if we do not develop specific monitors for most applications, since nearly all modern operating systems now provide some API to notify access events, which can be used to implement a monitor overseeing the entire file system. Due to this general setting, we assume that the availability of r e and t e is independent of our specific implementation in imecho. However, a e may be specific to different applications, depending on the API exposed by the software provider. Therefore, for applications that do not have fine-granularity event notification API, we simply set a e to be the default type Access. The user model described in this paper will not rely on the availability of a e, and we will discuss some possible refinements in Section if a e is also available. 2.2 Burst Detection We first use an example to illustrate the concept of a burst. Suppose Figure 2 shows the activities in one afternoon of John, our computer science student introduced in Section 1. EXAMPLE 1. (JOHN S ONE AFTERNOON) At 1 pm John began to do some investigation on Machine Learning, he read some papers and web pages. That was the first burst of his activities. At around 1:30 he felt tired so he took a nap. There was no user activities when he was sleeping. Half an hour later he woke up and began to work on his demo program. He started his editor and opened several documents for references. At 2:30 he began to write the code. He focused on programming so he opened fewer documents at that time. At 3 he encountered some problem with a software package that was used in his code. Therefore he searched his PC and read some documents. Finally the problem was solved at 3:30. At 4 John received an important from his co-author, who found a subtle issue in their paper. John had to pause his current task by saving the code already written. He read the paper mentioned in his co-author s and got the problem. He remembered that the solution was in a web page. So he accessed Internet and found that web page. Then he replied to his co-author at 4:10. After that John made himself a cup of coffee and had some relax. Almost no activities were detected between 4:10

4 and 4:30. John continued writing his code after 4:30, and there were quite a few activities from 4:30 to 4:45. Finally he finished his work at 5. In the above example, John conducted three tasks in that afternoon: t 1 : investigation on Machine Learning, which was started at 1 and ended at 1:30. t 2 : code writing, which was started at 2 and paused at 4. At 4:30 it was resumed and finally finished at 5. t 3 : replying , which was lasted from 4:00 to 4:10. We observed that, each time a new task is started or resumed, there is a burst of activities. A burst can be part of a task, or a complete task. Previous research [18] has also shown that the appearance of a topic in a document stream is signaled by a burst of activity, with certain features rising sharply in frequency as the topic emerges. Therefore, by employing some burst detection technology, it is possible to discover tasks from the stream of access events. Formally, a burst b can be represented as a time interval b = [τ s, τ e ], where τ s and τ e are the start and end time of the burst, respectively. We use the burst detection algorithm proposed in [18]. Figure 2 also shows bursts detected by applying this algorithm 2. Here, bursts could be located at different levels, with respect to their intensiveness. The intensiveness of a burst is measured by computing the length of the time gap between successive access events within the burst. The higher level the burst is at, the higher intensiveness it has. For example, the highest level (Level 3) in Figure 2 consists of time periods involving intensive activities of John, such as writing code and replying . Due to the nature of the algorithm, any burst at level k + 1 will be completely contained in some burst at level k. As a result, these bursts can be further organized into a hierarchical structure called burst tree, as shown in Figure 3. A burst b at level k+1 becomes a child of a burst b at level k if b is contained in b. The root of the tree is actually a virtual burst that spans the whole time interval. Our next step is to identify tasks from the bursts detected. 2.3 Burst-Task Mapping As illustrated in Example 1, bursts at higher levels are usually signals of some tasks. However, the mapping from bursts to tasks is not one-toone. Rather, it is quite common that a task may contain several bursts. For example, the code writing task (t 2 ) of John contains the bursts B3a, B3b and B2c. On the other 2 Each rectangle stands for a single burst. The label of the rectangle represents the identity and the level of the burst. For example, the first burst of level 1 is marked with B1a, the second burst of level 3 is marked with B3b, and so forth. t1 B2a B3a t2 B1a Root B1b B2b B2c t2 B3b B3c B3d t2 t3 Figure 3: Burst Tree hand, it is very unlikely that a burst could belong to multiple tasks at the same time. Therefore, the mapping from bursts to tasks should be many-to-one. Formally, let B and T be the set of bursts and tasks, respectively. Equivalently, B could also be viewed as the set of nodes in the burst tree. A burst-task mapping m is a surjective function from some subset B B to T. A further comparison of Figure 2 and 3 leads to the following property of m. PROPERTY 2.1. Let Child(b) = {c 1, c 2,..., c k } be the children of node b in the burst tree, and Child(b) B. If m(c 1 ) = m(c 2 ) = = m(c k ) = t T, then b B and m(b) = t. Intuitively, Property 2.1 says that, if all children of some burst b could be mapped to the same task t, then b itself could also be mapped to task t. This is due to the observation that, some long tasks may be paused and resumed several times during the whole time period. EXAMPLE 2. (BURST-TASK MAPPING) In Figure 3, suppose B2a is mapped to t 1 ; B3a, B3b, and B3d are mapped to t 2 ; and B3c is mapped to t 3. Since B2a is the only child of B1a, according to Property 2.1, B1a could also be mapped to t 1. Similarly, B2c could be mapped to t 2, since B3d is its unique child. Property 2.1 then naturally suggests a bottom-up approach to find the burst-task mapping m. We first determine corresponding tasks for the bursts at the leaf nodes of the tree. We then climb up the tree and inductively determine the tasks for each non-leaf node, with respect to Property 2.1. The remaining problem is how to determine tasks for each leaf node. This is difficult, since at the beginning we even have no idea of how many tasks there are at all! The tasks t 1, t 2, and t 3 in Example 1 are actually the result from the task detection algorithm instead of the prior knowledge we have.

5 To overcome this issue, we need to use some information beyond merely time in the user activity log. Recall that as described in Section 2.1, the user activity log L also records the URI for each resource accessed in the event. Intuitively, if two bursts could be mapped to the same task, then it is highly likely that the sets of resources accessed in the two bursts are similar. Therefore we could identify tasks by checking the similarity of bursts. Formally, let R(b) be the set of resources accessed in the burst b 3. Two bursts b 1 and b 2 are said to be similar, denoted as Sim(b 1, b 2 ), if s(r(b 1 ), R(b 2 )) > δ, where s(a, B) is some similarity function computing the similarity of the sets A and B, and δ is some prespecified threshold. Any reasonable similarity function could be used. In our implementation, we define (2.1) s(a, B) = A B max( A, B ), and set δ = 0.5. Algorithm 1 summarizes the task mining framework discussed so far. Line 3 to 5 first mark all m(b) as Nil, i.e., the mapping from burst b to some task is not defined yet. Line 6 to 14 try to determine the mapping for leaf nodes of the burst tree, while Line 15 to 20 try to determine the mapping for non-leaf nodes, with respect to Property 2.1. When determining the mapping for a leaf node b, we check whether there is some other leaf burst b that is similar to b and has already been mapped to some task t (line 8). If such a b exists, we simply map b to t as well (line 9), otherwise we map b to a new task (line 11). Finally, we obtain tasks by merging corresponding bursts (line 21 to 30). A burst b is considered as (part of) some task if: i) m(b) is determined (i.e., m(b) Nil); and ii) m(p arent(b)) is not determined (i.e., m(p arent(b)) = Nil), where P arent(b) is the parent burst of b in the burst tree. Figure 3 also shows the bursts that will be treated as (part of) some task in the end (circled by the red ellipses), after running Algorithm 1 on the burst tree. The sets of resources of these bursts will be merged to form the set of resources of the corresponding task (line 25). In Figure 3, B1a will be merged to t 1 ; B3a, B3b and B2c will be merged to t 2 ; and B3c will be merged to t 3. The set of tasks T returned by Algorithm 1 is then used to initialize the HMM λ = (T, R, π, A, B), where R = t T R(t). Suppose t T. For any resource r R, if r R(t), we define p(r t) = n(r, t)/n(t), where n(r, t) 4 represents the number of accesses to r in task t, and n(t) = r R(t) n(r, t). If r R(t), we simply set p(r t) = 0. This completes the initialization of the parameter B, and we have discussed the initialization of the parameters π and A at the beginning of Section 2. 3 We also use R(t) to denote the set of resources accessed in the task t. 4 n(r, t) is obtained by summing up the number of events involving r in the corresponding bursts that are mapped to t. Algorithm 1: Task Mining Input: L: the user activity log Output: T : the set of tasks 1 /* Let T be the burst tree. */ 2 T BurstDetection(L) 3 foreach b Nodes(T ) do 4 m(b) Nil 5 end 6 i 1 7 foreach b Leafs(T ) do 8 if b Leafs(T ) s.t. Sim(b, b ) and m(b ) = t then 9 m(b) t 10 else 11 m(b) t i 12 i i end 14 end 15 foreach b NonLeafs(T ) do 16 /* Let Child(b) = {c 1, c 2,..., c k }. */ 17 if m(c 1 ) = m(c 2 ) = = m(c k ) = t then 18 m(b) t 19 end 20 end 21 T 22 foreach b Nodes(T ) do 23 if m(b) Nil and m(p arent(b)) = Nil then 24 if m(b) T then 25 R(m(b)) R(m(b)) R(b) 26 else 27 T T {CreateT ask(b)} 28 end 29 end 30 end 31 return T 2.4 Discussion We discuss two related issues in this section, namely, i) possible refinements to tasks and ii) updates to the user model Task Refinements If more information is available in the user activity log, some possible refinements of the tasks could be performed. Here we only discuss the case when the action type a e of an event e is known, since it is true in imecho (although it is not required in the framework discussed in this section). Since we use some similarity function to determine whether we should merge two bursts according to the overlap of their sets of resources, it is likely that sometimes two bursts not merged could actually be mapped to the same task. The semantic information of action types may help in this situation. For example, if we find that

6 resource r (e.g., some Microsoft Word document) is opened in burst b 1, but is never closed until burst b 2, where b 1 and b 2 are successive, then b 1 and b 2 are likely to be mapped to the same task, even if their similarity fails to pass the threshold δ Model Update The user model is trained using the user activity log. However, the access history will evolve over time, and as a result, new tasks beyond those used in the training stage will appear as time goes by. An immediate question is then how often the model should be updated. It is hard to imagine that a particular task would be repeated one year later. On the other hand, if we update the model every day, effective predication will then be difficult to achieve, since only limited number of repeated tasks could be observed. Therefore the time interval between two updates should be neither too long nor too short. We argue that one week may be a good choice and this setting is currently used in imecho, although the user is allowed to change it freely. The reason is simply that people usually organize their work schedule week by week. A related question is whether there is certain number of repeated tasks during a week. Actually, according to the user study in [12], about 40% tasks are reperformed by the user within a week. 3 Context-aware Ranking Based on the user model introduced in the previous section, we propose a context-aware ranking framework. Top-ranked items then stand for desktop resources that are most likely to be accessed next by the user at the time when he submits the query. In this section, we first formalize the context-aware ranking problem, and then propose our ranking function. 3.1 Problem Definition Suppose user submits a keyword query q at time k. Fix some length l 5 such that 0 l k, and let L k be the snippet of the user activity log consisting of events occurring in the time window [k l + 1, k]. The context-aware ranking problem is defined as follows: DEFINITION 1. (CONTEXT-AWARE RANKING) Given the user model λ = (T, R, π, A, B), the log snippet L k, and the keyword query q, the context-aware ranking problem is to find some function f : R R + {0} such that for any r 1, r 2 R, f(r 1 ) > f(r 2 ) if and only if r 1 is more relevant than r 2, with respect to λ, q and L k. Here R + is the set of all positive real numbers. 3.2 Ranking Function Without loss of generality, we could require that 0 f 1. It is then quite natural to 5 We set l = 10 in imecho by default, which is adjustable. let f be the conditional probability p(r q, λ, L k ). We have (3.2) p(r q, λ, L k ) = p(r, q, λ, L k) p(q, λ, L k ) t T = p(r, q, t, λ, L k). p(q, λ, L k ) Since p(r, q, t, λ, L k ) = p(r, q t, λ, L k ) p(t λ, L k ) p(λ, L k ), and p(q, λ, L k ) = p(q λ, L k ) p(λ, L k ), we then have (3.3) p(r q, λ, L k ) = t T What s more, since (3.4) p(r, q t, λ, L k ) = p(r, q, t, λ, L k) p(t, λ, L k ) p(r, q t, λ, L k ) p(t λ, L k ). p(q λ, L k ) = p(q, t λ, L k, r) p(λ, L k, r), p(t, λ, L k ) and we assume that given λ, L k and r, q and t are independent of each other, therefore (3.5) p(q, t λ, L k, r) = p(q λ, L k, r) p(t λ, L k, r). Substituting Eq. (3.4) and (3.5) into Eq. (3.3), and notice the fact that we obtain (3.6) p(r t, λ, L k ) = p(t λ, L k, r) p(λ, L k, r), p(t, λ, L k ) p(r q, λ, L k ) = t T p(q λ, L k, r) p(r t, λ, L k ) p(t λ, L k ). p(q λ, L k ) Clearly, q is independent of λ and L k. Thus (3.7) p(q λ, L k ) = p(q). On the other hand, (3.8) p(q λ, L k, r) = p(q, λ, L k, r) p(λ, L k, r) = p(q, λ, L k r) p(λ, L k r), and again due to the independence of q with λ and L k, (3.9) p(q, λ, L k r) = p(q r) p(λ, L k r). Therefore, by substituting Eq. (3.7), (3.8), and (3.9) into Eq. (3.6), we get (3.10) p(r q, λ, L k ) = p(q r) p(q) [p(r t, λ, L k ) p(t λ, L k )]. t T

7 Since p(q r) p(r) = p(r q) p(q), we finally have (3.11) p(r q, λ, L k ) = p(r q) p(r) [p(r t, λ, L k ) p(t λ, L k )]. t T Eq. (3.11) can be interpreted quite naturally. p(r q) could be viewed as the relevance of r and q when context information is not considered, while p(r) is the prior probability that r is accessed even without the query q. The summation in Eq. (3.11) is the relevance of r and the current context information of the user, namely, the probability that user may next access r with respect to the user model λ and the most recent access history L k. For notational convenience, we now define (3.12) f 1 (r) = p(r q) p(r), and (3.13) f 2 (r) = t T[p(r t, λ, L k ) p(t λ, L k )]. As a result, f(r) = p(r q, λ, L k ) could be simply rewritten as (3.14) f(r) = f 1 (r) f 2 (r). However, directly using Eq. (3.14) suffers some problems. First, f 1 (r) and f 2 (r) may have different magnitude scales. Second, user may desire some control over f(r), by making the context-free part (i.e., f 1 (r)) and the contextaware part (i.e., f 2 (r)) tunable. We thus introduce some tuning factor α (0 α 1), and refine f(r) to be (3.15) f(r) = f 1 (r) 1 α f 2 (r) α. In our implementation, given λ = (T, R, π, A, B), L k, and q, to compute f 1 (r), we set p(r q) = s(r, q), where s(r, q) represents the cosine similarity score (i.e., TF-IDF score) between r and q, which is widely used in term-vector based information retrieval models. s(r, q) could be directly obtained by querying the full-text index in imecho, which is implemented by using the Lucene 6 library (See Figure 1). 1 p(r) is simply set to be R, i.e., we do not have prior knowledge of user s preference over resources, which is hard to measure. In practice, we could even set p(r) = 1 since it does not affect the ranking result, and we then simply have f 1 (r) = s(r, q). To compute f 2 (r), we first get the current task t now by applying the well-known Viterbi algorithm [23] to λ and L k. We then compute each summand in f 2 (r) by setting p(t λ, L k ) = p(t t now ), which could be obtained from A, and setting p(r t, λ, L k ) = p(r t), which could be obtained from B Experimental Evaluations We now experimentally evaluate our context-aware search approach from two aspects: i) the search overhead, and ii) the effectiveness of the context-aware ranking. 4.1 Experiment Design As pointed out in [12], evaluation of personal search systems is an extremely difficult task, due to the privacy issues concerning personal information. Meanwhile, incorporating tasks into the evaluation makes the problem even harder, since different people may understand the concept of task in slightly different ways. For instance, in Example 1, some people may not treat t 3 as a task. To overcome the privacy issue, we developed a logging module (different from the application monitors and user activity log) in imecho to record the query history of the user, which could be browsed by the user in the query viewer (see Figure 1). To overcome the multiple interpretation problem of tasks, we have to do some individual user studies. We invited 10 volunteers to take part in the experiment. Each of the participants installs imecho on his own personal computer. We first help them be familiar with imecho s functionality and user interface. Each user then uses imecho s indexing module to index the resources (files, s, etc.) that are expected to be related to the user s daily work in the following week. imecho will also automatically update the index when new resources appear in the directories specified by the user. The experiment lasts for one week. The user activity log of the first three days is used to train the model, and the system is tested in the later two days. During the training phase, the participants just perform their daily work as usual. In the testing phase, participants are encouraged to submit 10 queries to imecho when they want to search some resource in their computer. For each query submitted, imecho s logging module will record the information of the query, the set of resources retrieved and their rankings, and also user s click actions on these resources. To see the effect of the tuning factor α in the ranking function, we have to ask participants to adjust α to be 0, 0.2, 0.5, and 0.8, respectively, and check ranking results for each of these rankings. By setting α = 0, we actually ignore the context-aware part in the ranking function, and in this case the items are purely ranked by their TF-IDF scores (i.e., f(r) = f 1 (r) = s(r, q)). After some anonymization (e.g., replacing URI s by integers), such information is collected to analyze the effectiveness of the proposed ranking framework. 4.2 Performance Evaluation Due to the diversity of the configuration of participants personal computers and the privacy issue, here we only report some performance statistics on a typical laptop with 2.0 GHz Intel Dual Core CPU and 2GB main memory, which is used by one of the authors (not in the 10 participants).

8 Figure 4: Comparison of the Average Overhead Data Set. The data set contains 9,431 desktop files in 1,019 directories. The average directory depth is 9 with the longest being 15. On average, directories contain 10.3 subdirectories and files, with the largest containing 241 ones. 75% of the files are smaller than 16 KB, and 95% of the files are smaller than 40 KB. The largest file is of size 21.5MB. The user log produced by the event monitor records totally 1,601 desktop events. Since the user s context is computed at the query time, there is some overhead for query context estimation. We compare the overhead with the search cost of Lucene. The results are given in Figure 4. In the case of desktop search, it is not common that the number of search results will grow to above 500. Therefore the overhead introduced by the context-aware search module is trivial (less than 10%) comparing to overall search costs. 4.3 Effectiveness Evaluation We next compare the effectiveness of our context-aware ranking scheme with the ranking scheme based on TF-IDF score (i.e., ranking scheme used by Lucene and other commercial desktop systems). Our evaluation consists of two parts. First, we present a case study of some user s search experience with imecho, and the information released here is upon his agreement. Second, we analyze the feedbacks (i.e., clickthrough history recorded by imecho s logging module) on the search results from all the 10 participants, to give quantitative comparisons between the two ranking schemes Case Study Participant A issued the query PageRank model twice in the testing phase of the experiment when he was performing two different tasks: Task 1: To implement the PageRank model in his program. Task 2: To compare different ranking models. Though the query terms are the same, they were used for quite different purposes. His first task was to write some code that implements the PageRank algorithm, while the second task was to compare PageRank with other ranking models like HITS. The ranking results of the two queries are given in Table 1. To save space, only the top 5 results are shown in each case. The first row gives the results ranked by pure TF- IDF scores got from Lucene (i.e., by setting α = 0), while imecho s results (by setting α = 0.8) under Task 1 and 2 are shown in the second and third row, respectively. Both the final score (i.e., f(r)) and the context-aware part of the score (i.e., f 2 (r)) are shown in the last column of Table 1. Here are some explanations about the results. The ranking in the second row (when the user was in Task 1) is exactly the same as the ranking returned from Lucene. This is because no results were ever accessed by the user when the query was issued, which means these resources are not closely relevant to Task 1. But when the user was in Task 2, according to Table 1, the context-aware scores (f 2 (r)) of some documents are increased although their context-free scores (f 1 (r)) are relatively low. User A recalled that he accessed these promoted documents when he was in Task 2. The above case study shows that taking context information into consideration could sometimes better serve the user queries by correctly predicting user s search intention. Traditional desktop search engines will return the same results under both Task 1 and Task 2, which are not what the user wants in both cases Quantitative Comparison We next quantitatively compare the effectiveness of the two ranking schemes, namely, the ranking scheme based on pure TF-IDF scores and the one by also considering context information. Precision and Recall We first apply the well-known measures precision and recall to evaluate ranking quality. In general, precision measures the ability of a system to return only relevant results. It is defined as: Precision = # of relevant results # of results returned by the scheme. On the other hand, recall measures the ability of a system to return all relevant results, and is defined as: Recall = # of relevant results # of relevant results returned by both schemes. The definition of recall here follows [9], since in general it is extremely difficult to measure the total number of

9 Table 1: The ranking results of User A Ranking Scheme Ranking Result List f 1 (r) f(r) / f 2 (r) Jena-2.5.5/doc/javadoc/com/hp/hpl/jena/rdf/model/Model.html openrdf-sesame-2.1.3/docs/system/ch03.html Lucene (TF-IDF) openrdf-sesame-2.1.3/docs/system/ch06.html i.e. α = 0 Jena-2.5.5/doc/images/Ont-model-layers-import.png N/A Jena-2.5.5/doc/images/Ont-model-layers.png imecho (T1) α = 0.8 imecho (T2) α = 0.8 Jena-2.5.5/doc/javadoc/com/hp/hpl/jena/rdf/model/Model.html / openrdf-sesame-2.1.3/docs/system/ch03.html / openrdf-sesame-2.1.3/docs/system/ch06.html / Jena-2.5.5/doc/images/Ont-model-layers-import.png / Jena-2.5.5/doc/images/Ont-model-layers.png / The PageRank Citation Ranking- Bringing Order to the Web (1998).pdf / Inside PageRank.pdf / RandomWalks.ppt / Jena-2.5.5/doc/javadoc/com/hp/hpl/jena/rdf/model/Model.html / openrdf-sesame-2.1.3/docs/system/ch03.html / Average Precision Average Recall Lucene HMM Rank (a) Average precision Lucene HMM Rank (b) Average recall Figure 5: Comparison of average precision and recall relevant resources to the query in the entire system. Instead, we define this set of relevant resources to be the union of the results treated as relevant by the user under each ranking scheme. For each query, we get the top 10 results returned by each ranking scheme. The results that are actually clicked by the users are marked as relevant. We then compute average precision and recall over the 100 queries in total for both ranking schemes. Figure 5 compares the average precision and recall of top-k results returned by both ranking schemes, where k increases from 1 to 10. For the HMM based context-aware ranking scheme used in imecho, the statistics are computed over the results returned when setting α = 0.5. We can see that the context-aware ranking method outperforms the TF-IDF based ranking method in terms of both precision and recall. MRR and Top-k% Clicks We then use the Mean Reciprocal Rank 7 (MRR for short) and the Top-k% Clicks as the metrics. In our scenario, the reciprocal rank of a query is defined as the reciprocal of the rank of the first relevant result. The mean reciprocal rank is then the average of the reciprocal ranks over all the queries. Formally, supposing there are N queries q 1,..., q N and the rank of user s first choice in the result list of q i is r i, MRR is computed as: MRR = 1 N N i=1 1 r i. The top-k% clicks metric denotes the percentage of the user s clicks that fall into the top k% results returned by the ranking algorithm. For example, if the user chooses the first and third result out of the ten retrieved results, the top-10% and top-20% clicks are 0.5, while the top-30% clicks (and 7 reciprocal rank

10 also top-k% clicks with k > 30) is 1. Formally, we define (a) MRR Top-k% Clicks = # of clicks in top k% of the results. # of total clicks Intuitively, MRR and top-k% clicks measure the user s preference over the top-ranked results. Figure 6(a) shows the MRR of imecho (α = 0.5) and Lucene, respectively. It is clear that imecho s ranking results are more favorable by the users. The average top-k% clicks of imecho and Lucene are shown in Figure 6(b). We can see that imecho s top-10% clicks is almost 2 times higher than that of Lucene. We thus conclude that top ranked results in imecho are more preferable than those in Lucene. Impact of the tuning factor α Average Precision Average Recall (b) TOPK Figure 6: MRR and Top k% Clicks =0.2 =0.5 = Rank (a) Average precision =0.2 =0.5 = Rank (b) Average relative recall Figure 7: Average precision and recall with different α s We further investigate the impact of the tuning factor α to the ranking results in imecho. As shown in Figure 7, when α increases, i.e., the weight of context-aware part of the score is increased, the precision is significantly improved while the recall is also slightly improved as well. This implies that many times, our ranking scheme is successful in predicting user s search intention when the query is issued. 5 Related Work In our previous work [8] and [6], we developed XSearcher (renamed as the current name imecho in [6]), an associative memory based desktop search system to enhance traditional keyword-based desktop search. However, the goal of XSearcher is quite different from the work described in this paper. XSearcher focuses on helping users in finding resources accessed a long time ago. The basic idea is to establish semantic associations among resources. For example, a document can be associated with an if it once served as an attachment of that . The key insight here is that, the associations among resources can play the same role as hyperlinks among Web pages, and ranking algorithms akin to PageRank could then be applied in desktop search. In this way, it is possible for users to find relevant items that may not contain the keywords in the issued query, as long as they are reachable by following paths starting from some resources hit by the keywords. Nonetheless, the approach used in that work cannot solve the two problems highlighted in Section 1, and the work in this paper can be viewed as complementary to that work, which helps users better seeking resources recently accessed by considering the context information. Figure 8 illustrates the big picture of our work on personal search. The effectiveness of the imecho system by leveraging the context-aware search framework introduced in this paper has also been shown in our very recent demo [7]. Many research prototypes and commercial systems have been developed to support various forms of search on per-

11 Figure 8: The big picture of our work on personal search sonal desktop resources. The most prominent desktop search applications for Windows include Google desktop search and Microsoft Windows Desktop Search, and the Beagle open source project for Linux [1]. Apple Inc. also integrated an advanced desktop search application (named Spotlight Search) into their operating system, Mac OS Tiger. All these industrial desktop search engines support search over a variety of file types, yet none of them incorporates personalization, i.e. no user preferences are discovered to provide personalized results. In addition, there are currently only limited insights into the question of how to rank desktop search results. When ranking is available in some of these applications, it is usually performed according to some variations of TF-IDF, a textual-relevance-based criterion used in classic information retrieval. Some PIM systems have been constructed in order to facilitate re-finding of various stored resources on the desktop. Stuff I ve Seen [11] for example provides a unified index of the data that a person has seen on his computer, regardless of its type. Based on the fact that the user has already seen the information, contextual cues such as time, author, thumbnails and previews can be used to search for and present information. Similarly, MyLifeBits [13] targets storing locally all digital media of each person, including documents, images, audio and videos. They organize these data into collections and connect related resources with links. Haystack [16] emphasizes the relationship between a particular individual and her corpus. It automatically creates connections between documents with similar content and it exploits usage analysis to extend the desktop search results set. Feldspar [5] is a linkbased desktop search prototype with new interface. Users can propose associative queries via a well-designed interface rather than simple keywords. But user activities are not tracked in Feldspar for associations and no activity-based associations are mined from user access patterns. Semex [10] is also a link-based personal information system. It employs a fancy reference reconciliation algorithm to integrate data from different sources and construct content-based associations by extracting metadata. Beagle++ [9] is a semantic desktop search prototype, which proposed various activity specific heuristics to generate links between resources that associate desktop resources. Beagle++ also logs user activities, such as attachment saving and file downloading, to generate associations, but it only focuses on associations from predefined user actions between web pages, messages and files, not mining from a sequence of user access activities. Their approach was also limited to specific desktop contexts (e.g., publications, or web pages), whereas in our methods we explore much more general information sources such as file access patterns, which are applicable to any desktop resource. Unlike traditional one-size-fits-all search engines, personalized search systems attempt to take into account preferences of individual users in order to improve the relevance of search results and the overall retrieval experience. Personalization techniques have been developed in diversified ways for web search. In a nutshell, the techniques can be classified into three categories, namely, content based personalization, link-based personalization, and function-based personalization [17]. Content-based personalization deals with the relevance measure of Web pages and the user s queries. In this approach, the query is modified to adapt the search results for the specific user. In order to manage user interests, a content-based personalization technique is used to construct user profiles, which store user interests derived from each user s search history [20]. Link-based personalization performs personalization based on link analysis techniques. Traditional link analysis techniques, like the PageRank algorithm, compute scores that reflect a democratic importance with no preferences in regard to any particular pages. However, in reality, a user may have a set of preferred pages in mind. The link-based personalized searching techniques redefine the importance of Web pages according to different users preferences such as bookmarks as a set of preferred pages [14]. The function-based personalization first discovers user preferences on the search results from clickthrough data and then the ranking function is optimized according to the discovered preferences [15, 21]. However, these personalized techniques still miss contextual information often resulting or inferable from explicit and implicit user activities. For the personalization in desktop search, the desktop environment is comparably limited in the sense that we will be able to describe most relevant contexts more easily. It is possible to obtain the complete trace of user activity on his desktop, and therefore his accurate interests, goals, and preferences can be discovered for the contextaware search.

12 Context-aware search has been mainly presented in web search for query suggestions [4] and query classification [2]. Search contexts from users search or browsing logs are effective for disambiguating Web queries and can help improve the quality of multiple search services [20]. However, in their approaches, only clickthrough information is considered as an important part of context. They also take little consideration of the user behavior at the time of querying. Though a recent work [3] also applied the HMM model in context-aware search, it only focused on Web search by considering only the query strings and simple clickthrough data. In comparison, we exploit the task context by analyzing the access events in the user activity log and use the Hidden Markov Model to capture the relationships between tasks and resources, which models the user s behavior at a higher level of semantics. In [24], besides clickthrough information, the authors also propose to treat preceding queries as implicit user feedback when considering the ranking of the documents. Nonetheless, the study was still confined in the sense of Web search, and it could be an interesting future work for imecho to also incorporate preceding user queries as additional context information. 6 Conclusion In this paper, we propose an HMM based context-aware search approach to help users improve search experience on personal desktop systems. The user model is built by analyzing the access events recorded in the user activity log, which captures the semantic relationship between user behaviors (tasks in this paper) and resources. In particular, we propose a task mining algorithm to detect tasks from the log, which are then used to initialize the user model. Based on this model, we further propose a novel ranking scheme by taking the context information into consideration when answering user queries. The proposed framework has been implemented in the imecho system, and the experimental evaluation shows that many times imecho could outperform traditional TF-IDF based ranking schemes. References [1] Gnome beagle desktop search. [2] H. Cao, H. Hu, D. Shen, D. Jiang, J. Sun, E. Chen, and Q. Yang. Context-aware query classification. In SIGIR, [3] H. Cao, D. Jiang, J. Pei, E. Chen, and H. Li. Towards contextaware search by learning a very large variable length hidden markov model from search logs. In WWW, [4] H. Cao, D. Jiang, J. Pei, Q. He, Z. Liao, E. Chen, and H. Li. Context-aware query suggestion by mining click-through and session data. In KDD, [5] D. Chau, B. Myers, and A. Faulring. What do do when search fails:finding information by association. In CHI, [6] J. Chen, H. Guo, W. Wu, and W. Wang. imecho: an associative memory based desktop search system. In CIKM, pages , [7] J. Chen, H. Guo, W. Wu, and W. Wang. imecho: a contextaware desktop search system. In SIGIR, pages , [8] J. Chen, H. Guo, W. Wu, and C. Xie. Search your memory! - an associative memory based desktop search system. In ACM SIGMOD, [9] P. Chirita, S. Ghita, W. Nejdl, and R. Paiu. Beagle++: Semantically enhanced searching and ranking on the desktop. In ESWC, [10] X. Dong and A. Halevy. A platform for personal information management and integration. In CIDR, [11] S. Dumais, E. Cutrell, J. Cadiz, G. Jancke, R. Sarin, and D. Robbins. Stuff i ve seen: a system for personal information retrieval and re-use. In SIGIR, [12] D. Elsweiler and I. Ruthven. Towards task-based personal information management evaluations. In Proceedings of the international ACM SIGIR conference on research and development in information retrieval, pages 23 30, [13] J. Gemmell, G. Bell, R. Lueder, S. Drucker, and C. Wong. Mylifebits: fulfilling the memex vision. In Proceedings of the ACM Conference on Multimedia, [14] T. Haveliwala. Topic-sensitive pagerank. In WWW, [15] T. Joachims. Optimizing search engines using clickthrough data. In ACM SIGKDD, [16] D. Karger. Haystack: A customizable general-purpose information management tool for end users of semistructured data. In CIDR, [17] Y. Ke, L. Deng, W. Ng, and D. L. Lee. Web dynamics and their ramifications for the development of web search engines. Comput. Netw. J. (Special Issue on Web Dynamics), 50: , [18] J. Kleinberg. Bursty and hierarchical structure in streams. Data Mining and Knowledge Discovery, 7(4): , Octomber [19] J. M. Kleinberg. Authoritative sources in a hyperlinked environment. J. ACM, 46(5): , [20] F. Liu, C. Yu, and W. Meng. Personalized web search for improving retrieval effectiveness. IEEE Trans. Knowl. Data Eng., 16:28 40, [21] W. NG, L. Deng, and D. Lee. Mining user preference using spy voting for search engine personalization. ACM Transactions on Internet Technology, 7(4), [22] L. Page, S. Brin, R. Motwani, and T. Winograd. The pagerank citation ranking: Bring order to the web. Technical report, Stanford University, [23] L. Rabiner. A tutorial on hidden markov models and selected applications in speech recognition. Proceedings of the IEEE, 77(2): , February [24] X. Shen, B. Tan, and C. Zhai. Context-sensitive information retrieval using implicit feedback. In SIGIR, pages 43 50, [25] L. Welch. Hidden markov models and the baum-welch algorithm. IEEE Information Theory Society Newsletter, 53(4):1 13, 2003.

Dynamical Clustering of Personalized Web Search Results

Dynamical Clustering of Personalized Web Search Results Dynamical Clustering of Personalized Web Search Results Xuehua Shen CS Dept, UIUC xshen@cs.uiuc.edu Hong Cheng CS Dept, UIUC hcheng3@uiuc.edu Abstract Most current search engines present the user a ranked

More information

So today we shall continue our discussion on the search engines and web crawlers. (Refer Slide Time: 01:02)

So today we shall continue our discussion on the search engines and web crawlers. (Refer Slide Time: 01:02) Internet Technology Prof. Indranil Sengupta Department of Computer Science and Engineering Indian Institute of Technology, Kharagpur Lecture No #39 Search Engines and Web Crawler :: Part 2 So today we

More information

How To Cluster On A Search Engine

How To Cluster On A Search Engine Volume 2, Issue 2, February 2012 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: A REVIEW ON QUERY CLUSTERING

More information

Semantic Search in Portals using Ontologies

Semantic Search in Portals using Ontologies Semantic Search in Portals using Ontologies Wallace Anacleto Pinheiro Ana Maria de C. Moura Military Institute of Engineering - IME/RJ Department of Computer Engineering - Rio de Janeiro - Brazil [awallace,anamoura]@de9.ime.eb.br

More information

A UPS Framework for Providing Privacy Protection in Personalized Web Search

A UPS Framework for Providing Privacy Protection in Personalized Web Search A UPS Framework for Providing Privacy Protection in Personalized Web Search V. Sai kumar 1, P.N.V.S. Pavan Kumar 2 PG Scholar, Dept. of CSE, G Pulla Reddy Engineering College, Kurnool, Andhra Pradesh,

More information

Data Mining in Web Search Engine Optimization and User Assisted Rank Results

Data Mining in Web Search Engine Optimization and User Assisted Rank Results Data Mining in Web Search Engine Optimization and User Assisted Rank Results Minky Jindal Institute of Technology and Management Gurgaon 122017, Haryana, India Nisha kharb Institute of Technology and Management

More information

ISSN: 2321-7782 (Online) Volume 2, Issue 3, March 2014 International Journal of Advance Research in Computer Science and Management Studies

ISSN: 2321-7782 (Online) Volume 2, Issue 3, March 2014 International Journal of Advance Research in Computer Science and Management Studies ISSN: 2321-7782 (Online) Volume 2, Issue 3, March 2014 International Journal of Advance Research in Computer Science and Management Studies Research Article / Paper / Case Study Available online at: www.ijarcsms.com

More information

International Journal of Engineering Research-Online A Peer Reviewed International Journal Articles available online http://www.ijoer.

International Journal of Engineering Research-Online A Peer Reviewed International Journal Articles available online http://www.ijoer. REVIEW ARTICLE ISSN: 2321-7758 UPS EFFICIENT SEARCH ENGINE BASED ON WEB-SNIPPET HIERARCHICAL CLUSTERING MS.MANISHA DESHMUKH, PROF. UMESH KULKARNI Department of Computer Engineering, ARMIET, Department

More information

A PREDICTIVE MODEL FOR QUERY OPTIMIZATION TECHNIQUES IN PERSONALIZED WEB SEARCH

A PREDICTIVE MODEL FOR QUERY OPTIMIZATION TECHNIQUES IN PERSONALIZED WEB SEARCH International Journal of Computer Science and System Analysis Vol. 5, No. 1, January-June 2011, pp. 37-43 Serials Publications ISSN 0973-7448 A PREDICTIVE MODEL FOR QUERY OPTIMIZATION TECHNIQUES IN PERSONALIZED

More information

LDA Based Security in Personalized Web Search

LDA Based Security in Personalized Web Search LDA Based Security in Personalized Web Search R. Dhivya 1 / PG Scholar, B. Vinodhini 2 /Assistant Professor, S. Karthik 3 /Prof & Dean Department of Computer Science & Engineering SNS College of Technology

More information

Optimization of Search Results with Duplicate Page Elimination using Usage Data A. K. Sharma 1, Neelam Duhan 2 1, 2

Optimization of Search Results with Duplicate Page Elimination using Usage Data A. K. Sharma 1, Neelam Duhan 2 1, 2 Optimization of Search Results with Duplicate Page Elimination using Usage Data A. K. Sharma 1, Neelam Duhan 2 1, 2 Department of Computer Engineering, YMCA University of Science & Technology, Faridabad,

More information

Search and Information Retrieval

Search and Information Retrieval Search and Information Retrieval Search on the Web 1 is a daily activity for many people throughout the world Search and communication are most popular uses of the computer Applications involving search

More information

How To Write A Summary Of A Review

How To Write A Summary Of A Review PRODUCT REVIEW RANKING SUMMARIZATION N.P.Vadivukkarasi, Research Scholar, Department of Computer Science, Kongu Arts and Science College, Erode. Dr. B. Jayanthi M.C.A., M.Phil., Ph.D., Associate Professor,

More information

Web Database Integration

Web Database Integration Web Database Integration Wei Liu School of Information Renmin University of China Beijing, 100872, China gue2@ruc.edu.cn Xiaofeng Meng School of Information Renmin University of China Beijing, 100872,

More information

RANKING WEB PAGES RELEVANT TO SEARCH KEYWORDS

RANKING WEB PAGES RELEVANT TO SEARCH KEYWORDS ISBN: 978-972-8924-93-5 2009 IADIS RANKING WEB PAGES RELEVANT TO SEARCH KEYWORDS Ben Choi & Sumit Tyagi Computer Science, Louisiana Tech University, USA ABSTRACT In this paper we propose new methods for

More information

Subordinating to the Majority: Factoid Question Answering over CQA Sites

Subordinating to the Majority: Factoid Question Answering over CQA Sites Journal of Computational Information Systems 9: 16 (2013) 6409 6416 Available at http://www.jofcis.com Subordinating to the Majority: Factoid Question Answering over CQA Sites Xin LIAN, Xiaojie YUAN, Haiwei

More information

Search Result Optimization using Annotators

Search Result Optimization using Annotators Search Result Optimization using Annotators Vishal A. Kamble 1, Amit B. Chougule 2 1 Department of Computer Science and Engineering, D Y Patil College of engineering, Kolhapur, Maharashtra, India 2 Professor,

More information

Personalization of Web Search With Protected Privacy

Personalization of Web Search With Protected Privacy Personalization of Web Search With Protected Privacy S.S DIVYA, R.RUBINI,P.EZHIL Final year, Information Technology,KarpagaVinayaga College Engineering and Technology, Kanchipuram [D.t] Final year, Information

More information

Profile Based Personalized Web Search and Download Blocker

Profile Based Personalized Web Search and Download Blocker Profile Based Personalized Web Search and Download Blocker 1 K.Sheeba, 2 G.Kalaiarasi Dhanalakshmi Srinivasan College of Engineering and Technology, Mamallapuram, Chennai, Tamil nadu, India Email: 1 sheebaoec@gmail.com,

More information

Coveo Platform 7.0. Microsoft Dynamics CRM Connector Guide

Coveo Platform 7.0. Microsoft Dynamics CRM Connector Guide Coveo Platform 7.0 Microsoft Dynamics CRM Connector Guide Notice The content in this document represents the current view of Coveo as of the date of publication. Because Coveo continually responds to changing

More information

Distributed forests for MapReduce-based machine learning

Distributed forests for MapReduce-based machine learning Distributed forests for MapReduce-based machine learning Ryoji Wakayama, Ryuei Murata, Akisato Kimura, Takayoshi Yamashita, Yuji Yamauchi, Hironobu Fujiyoshi Chubu University, Japan. NTT Communication

More information

Automatic Advertising Campaign Development

Automatic Advertising Campaign Development Matina Thomaidou, Kyriakos Liakopoulos, Michalis Vazirgiannis Athens University of Economics and Business April, 2011 Outline 1 2 3 4 5 Introduction Campaigns Online advertising is a form of promotion

More information

MALLET-Privacy Preserving Influencer Mining in Social Media Networks via Hypergraph

MALLET-Privacy Preserving Influencer Mining in Social Media Networks via Hypergraph MALLET-Privacy Preserving Influencer Mining in Social Media Networks via Hypergraph Janani K 1, Narmatha S 2 Assistant Professor, Department of Computer Science and Engineering, Sri Shakthi Institute of

More information

Blog Post Extraction Using Title Finding

Blog Post Extraction Using Title Finding Blog Post Extraction Using Title Finding Linhai Song 1, 2, Xueqi Cheng 1, Yan Guo 1, Bo Wu 1, 2, Yu Wang 1, 2 1 Institute of Computing Technology, Chinese Academy of Sciences, Beijing 2 Graduate School

More information

One of the fundamental kinds of Web sites that SharePoint 2010 allows

One of the fundamental kinds of Web sites that SharePoint 2010 allows Chapter 1 Getting to Know Your Team Site In This Chapter Requesting a new team site and opening it in the browser Participating in a team site Changing your team site s home page One of the fundamental

More information

Usage Analysis Tools in SharePoint Products and Technologies

Usage Analysis Tools in SharePoint Products and Technologies Usage Analysis Tools in SharePoint Products and Technologies Date published: June 9, 2004 Summary: Usage analysis allows you to track how websites on your server are being used. The Internet Information

More information

SEARCH ENGINE OPTIMIZATION USING D-DICTIONARY

SEARCH ENGINE OPTIMIZATION USING D-DICTIONARY SEARCH ENGINE OPTIMIZATION USING D-DICTIONARY G.Evangelin Jenifer #1, Mrs.J.Jaya Sherin *2 # PG Scholar, Department of Electronics and Communication Engineering(Communication and Networking), CSI Institute

More information

Optimization of Image Search from Photo Sharing Websites Using Personal Data

Optimization of Image Search from Photo Sharing Websites Using Personal Data Optimization of Image Search from Photo Sharing Websites Using Personal Data Mr. Naeem Naik Walchand Institute of Technology, Solapur, India Abstract The present research aims at optimizing the image search

More information

Managing DICOM Image Metadata with Desktop Operating Systems Native User Interface

Managing DICOM Image Metadata with Desktop Operating Systems Native User Interface Managing DICOM Image Metadata with Desktop Operating Systems Native User Interface Chia-Chi Teng, Member, IEEE Abstract Picture Archiving and Communication System (PACS) is commonly used in the hospital

More information

131-1. Adding New Level in KDD to Make the Web Usage Mining More Efficient. Abstract. 1. Introduction [1]. 1/10

131-1. Adding New Level in KDD to Make the Web Usage Mining More Efficient. Abstract. 1. Introduction [1]. 1/10 1/10 131-1 Adding New Level in KDD to Make the Web Usage Mining More Efficient Mohammad Ala a AL_Hamami PHD Student, Lecturer m_ah_1@yahoocom Soukaena Hassan Hashem PHD Student, Lecturer soukaena_hassan@yahoocom

More information

Filtering Noisy Contents in Online Social Network by using Rule Based Filtering System

Filtering Noisy Contents in Online Social Network by using Rule Based Filtering System Filtering Noisy Contents in Online Social Network by using Rule Based Filtering System Bala Kumari P 1, Bercelin Rose Mary W 2 and Devi Mareeswari M 3 1, 2, 3 M.TECH / IT, Dr.Sivanthi Aditanar College

More information

Evaluator s Guide. PC-Duo Enterprise HelpDesk v5.0. Copyright 2006 Vector Networks Ltd and MetaQuest Software Inc. All rights reserved.

Evaluator s Guide. PC-Duo Enterprise HelpDesk v5.0. Copyright 2006 Vector Networks Ltd and MetaQuest Software Inc. All rights reserved. Evaluator s Guide PC-Duo Enterprise HelpDesk v5.0 Copyright 2006 Vector Networks Ltd and MetaQuest Software Inc. All rights reserved. All third-party trademarks are the property of their respective owners.

More information

CYCLOPE let s talk productivity

CYCLOPE let s talk productivity Cyclope 6 Installation Guide CYCLOPE let s talk productivity Cyclope Employee Surveillance Solution is provided by Cyclope Series 2003-2014 1 P age Table of Contents 1. Cyclope Employee Surveillance Solution

More information

What to Mine from Big Data? Hang Li Noah s Ark Lab Huawei Technologies

What to Mine from Big Data? Hang Li Noah s Ark Lab Huawei Technologies What to Mine from Big Data? Hang Li Noah s Ark Lab Huawei Technologies Big Data Value Two Main Issues in Big Data Mining Agenda Four Principles for What to Mine Stories regarding to Principles Search and

More information

Dynamics of Genre and Domain Intents

Dynamics of Genre and Domain Intents Dynamics of Genre and Domain Intents Shanu Sushmita, Benjamin Piwowarski, and Mounia Lalmas University of Glasgow {shanu,bpiwowar,mounia}@dcs.gla.ac.uk Abstract. As the type of content available on the

More information

DYNAMIC QUERY FORMS WITH NoSQL

DYNAMIC QUERY FORMS WITH NoSQL IMPACT: International Journal of Research in Engineering & Technology (IMPACT: IJRET) ISSN(E): 2321-8843; ISSN(P): 2347-4599 Vol. 2, Issue 7, Jul 2014, 157-162 Impact Journals DYNAMIC QUERY FORMS WITH

More information

Sustaining Privacy Protection in Personalized Web Search with Temporal Behavior

Sustaining Privacy Protection in Personalized Web Search with Temporal Behavior Sustaining Privacy Protection in Personalized Web Search with Temporal Behavior N.Jagatheshwaran 1 R.Menaka 2 1 Final B.Tech (IT), jagatheshwaran.n@gmail.com, Velalar College of Engineering and Technology,

More information

Mirtrak 6 Powered by Cyclope

Mirtrak 6 Powered by Cyclope Mirtrak 6 Powered by Cyclope Installation Guide Mirtrak Activity Monitoring Solution v6 is powered by Cyclope Series 2003-2013 Info Technology Supply Ltd. 2 Hobbs House, Harrovian Business Village, Bessborough

More information

Computing and Communications Services (CCS) - LimeSurvey Quick Start Guide Version 2.2 1

Computing and Communications Services (CCS) - LimeSurvey Quick Start Guide Version 2.2 1 LimeSurvey Quick Start Guide: Version 2.2 Computing and Communications Services (CCS) - LimeSurvey Quick Start Guide Version 2.2 1 Table of Contents Contents Table of Contents... 2 Introduction:... 3 1.

More information

Identifying Best Bet Web Search Results by Mining Past User Behavior

Identifying Best Bet Web Search Results by Mining Past User Behavior Identifying Best Bet Web Search Results by Mining Past User Behavior Eugene Agichtein Microsoft Research Redmond, WA, USA eugeneag@microsoft.com Zijian Zheng Microsoft Corporation Redmond, WA, USA zijianz@microsoft.com

More information

Topical Authority Identification in Community Question Answering

Topical Authority Identification in Community Question Answering Topical Authority Identification in Community Question Answering Guangyou Zhou, Kang Liu, and Jun Zhao National Laboratory of Pattern Recognition Institute of Automation, Chinese Academy of Sciences 95

More information

Stop searching, find your files and emails fast.

Stop searching, find your files and emails fast. Stop searching, find your files and emails fast. Copernic Desktop Search Home allows individuals to instantly search their files, e-mails, and e-mail attachments stored anywhere on their PC's hard drive.

More information

Research Statement Immanuel Trummer www.itrummer.org

Research Statement Immanuel Trummer www.itrummer.org Research Statement Immanuel Trummer www.itrummer.org We are collecting data at unprecedented rates. This data contains valuable insights, but we need complex analytics to extract them. My research focuses

More information

The Scientific Data Mining Process

The Scientific Data Mining Process Chapter 4 The Scientific Data Mining Process When I use a word, Humpty Dumpty said, in rather a scornful tone, it means just what I choose it to mean neither more nor less. Lewis Carroll [87, p. 214] In

More information

Data Mining Yelp Data - Predicting rating stars from review text

Data Mining Yelp Data - Predicting rating stars from review text Data Mining Yelp Data - Predicting rating stars from review text Rakesh Chada Stony Brook University rchada@cs.stonybrook.edu Chetan Naik Stony Brook University cnaik@cs.stonybrook.edu ABSTRACT The majority

More information

Flattening Enterprise Knowledge

Flattening Enterprise Knowledge Flattening Enterprise Knowledge Do you Control Your Content or Does Your Content Control You? 1 Executive Summary: Enterprise Content Management (ECM) is a common buzz term and every IT manager knows it

More information

Supporting Privacy Protection in Personalized Web Search

Supporting Privacy Protection in Personalized Web Search Supporting Privacy Protection in Personalized Web Search Kamatam Amala P.G. Scholar (M. Tech), Department of CSE, Srinivasa Institute of Technology & Sciences, Ukkayapalli, Kadapa, Andhra Pradesh. ABSTRACT:

More information

Research of Postal Data mining system based on big data

Research of Postal Data mining system based on big data 3rd International Conference on Mechatronics, Robotics and Automation (ICMRA 2015) Research of Postal Data mining system based on big data Xia Hu 1, Yanfeng Jin 1, Fan Wang 1 1 Shi Jiazhuang Post & Telecommunication

More information

ELPUB Digital Library v2.0. Application of semantic web technologies

ELPUB Digital Library v2.0. Application of semantic web technologies ELPUB Digital Library v2.0 Application of semantic web technologies Anand BHATT a, and Bob MARTENS b a ABA-NET/Architexturez Imprints, New Delhi, India b Vienna University of Technology, Vienna, Austria

More information

ATLAS.ti for Mac OS X Getting Started

ATLAS.ti for Mac OS X Getting Started ATLAS.ti for Mac OS X Getting Started 2 ATLAS.ti for Mac OS X Getting Started Copyright 2014 by ATLAS.ti Scientific Software Development GmbH, Berlin. All rights reserved. Manual Version: 5.20140918. Updated

More information

TOWARDS SIMPLE, EASY TO UNDERSTAND, AN INTERACTIVE DECISION TREE ALGORITHM

TOWARDS SIMPLE, EASY TO UNDERSTAND, AN INTERACTIVE DECISION TREE ALGORITHM TOWARDS SIMPLE, EASY TO UNDERSTAND, AN INTERACTIVE DECISION TREE ALGORITHM Thanh-Nghi Do College of Information Technology, Cantho University 1 Ly Tu Trong Street, Ninh Kieu District Cantho City, Vietnam

More information

Chapter-1 : Introduction 1 CHAPTER - 1. Introduction

Chapter-1 : Introduction 1 CHAPTER - 1. Introduction Chapter-1 : Introduction 1 CHAPTER - 1 Introduction This thesis presents design of a new Model of the Meta-Search Engine for getting optimized search results. The focus is on new dimension of internet

More information

An Efficient Algorithm for Web Page Change Detection

An Efficient Algorithm for Web Page Change Detection An Efficient Algorithm for Web Page Change Detection Srishti Goel Department of Computer Sc. & Engg. Thapar University, Patiala (INDIA) Rinkle Rani Aggarwal Department of Computer Sc. & Engg. Thapar University,

More information

WEB SITE OPTIMIZATION THROUGH MINING USER NAVIGATIONAL PATTERNS

WEB SITE OPTIMIZATION THROUGH MINING USER NAVIGATIONAL PATTERNS WEB SITE OPTIMIZATION THROUGH MINING USER NAVIGATIONAL PATTERNS Biswajit Biswal Oracle Corporation biswajit.biswal@oracle.com ABSTRACT With the World Wide Web (www) s ubiquity increase and the rapid development

More information

Domain Classification of Technical Terms Using the Web

Domain Classification of Technical Terms Using the Web Systems and Computers in Japan, Vol. 38, No. 14, 2007 Translated from Denshi Joho Tsushin Gakkai Ronbunshi, Vol. J89-D, No. 11, November 2006, pp. 2470 2482 Domain Classification of Technical Terms Using

More information

IQ MORE / IQ MORE Professional

IQ MORE / IQ MORE Professional IQ MORE / IQ MORE Professional Version 5 Manual APIS Informationstechnologien GmbH The information contained in this document may be changed without advance notice and represents no obligation on the part

More information

The Best of Both Worlds Sharing Mac Files on Windows Servers

The Best of Both Worlds Sharing Mac Files on Windows Servers The Best of Both Worlds Sharing Mac Files on Windows Servers March, 2008 1110 North Glebe Road Suite 450 Arlington, VA 22201 phone: 800.476.8781 or +1.703.528.1555 fax: +1.703.527.2567 or +1.703.528.3296

More information

Make search become the internal function of Internet

Make search become the internal function of Internet Make search become the internal function of Internet Wang Liang 1, Guo Yi-Ping 2, Fang Ming 3 1, 3 (Department of Control Science and Control Engineer, Huazhong University of Science and Technology, WuHan,

More information

Understanding Web personalization with Web Usage Mining and its Application: Recommender System

Understanding Web personalization with Web Usage Mining and its Application: Recommender System Understanding Web personalization with Web Usage Mining and its Application: Recommender System Manoj Swami 1, Prof. Manasi Kulkarni 2 1 M.Tech (Computer-NIMS), VJTI, Mumbai. 2 Department of Computer Technology,

More information

Visualizing the Top 400 Universities

Visualizing the Top 400 Universities Int'l Conf. e-learning, e-bus., EIS, and e-gov. EEE'15 81 Visualizing the Top 400 Universities Salwa Aljehane 1, Reem Alshahrani 1, and Maha Thafar 1 saljehan@kent.edu, ralshahr@kent.edu, mthafar@kent.edu

More information

Intelligent Log Analyzer. André Restivo <andre.restivo@portugalmail.pt>

Intelligent Log Analyzer. André Restivo <andre.restivo@portugalmail.pt> Intelligent Log Analyzer André Restivo 9th January 2003 Abstract Server Administrators often have to analyze server logs to find if something is wrong with their machines.

More information

Using XACML Policies as OAuth Scope

Using XACML Policies as OAuth Scope Using XACML Policies as OAuth Scope Hal Lockhart Oracle I have been exploring the possibility of expressing the Scope of an OAuth Access Token by using XACML policies. In this document I will first describe

More information

NS DISCOVER 4.0 ADMINISTRATOR S GUIDE. July, 2015. Version 4.0

NS DISCOVER 4.0 ADMINISTRATOR S GUIDE. July, 2015. Version 4.0 NS DISCOVER 4.0 ADMINISTRATOR S GUIDE July, 2015 Version 4.0 TABLE OF CONTENTS 1 General Information... 4 1.1 Objective... 4 1.2 New 4.0 Features Improvements... 4 1.3 Migrating from 3.x to 4.x... 5 2

More information

Dissecting the Learning Behaviors in Hacker Forums

Dissecting the Learning Behaviors in Hacker Forums Dissecting the Learning Behaviors in Hacker Forums Alex Tsang Xiong Zhang Wei Thoo Yue Department of Information Systems, City University of Hong Kong, Hong Kong inuki.zx@gmail.com, xionzhang3@student.cityu.edu.hk,

More information

Mining Text Data: An Introduction

Mining Text Data: An Introduction Bölüm 10. Metin ve WEB Madenciliği http://ceng.gazi.edu.tr/~ozdemir Mining Text Data: An Introduction Data Mining / Knowledge Discovery Structured Data Multimedia Free Text Hypertext HomeLoan ( Frank Rizzo

More information

Microsoft FAST Search Server 2010 for SharePoint Evaluation Guide

Microsoft FAST Search Server 2010 for SharePoint Evaluation Guide Microsoft FAST Search Server 2010 for SharePoint Evaluation Guide 1 www.microsoft.com/sharepoint The information contained in this document represents the current view of Microsoft Corporation on the issues

More information

Data Mining and Database Systems: Where is the Intersection?

Data Mining and Database Systems: Where is the Intersection? Data Mining and Database Systems: Where is the Intersection? Surajit Chaudhuri Microsoft Research Email: surajitc@microsoft.com 1 Introduction The promise of decision support systems is to exploit enterprise

More information

How much can Behavioral Targeting Help Online Advertising? Jun Yan 1, Ning Liu 1, Gang Wang 1, Wen Zhang 2, Yun Jiang 3, Zheng Chen 1

How much can Behavioral Targeting Help Online Advertising? Jun Yan 1, Ning Liu 1, Gang Wang 1, Wen Zhang 2, Yun Jiang 3, Zheng Chen 1 WWW 29 MADRID! How much can Behavioral Targeting Help Online Advertising? Jun Yan, Ning Liu, Gang Wang, Wen Zhang 2, Yun Jiang 3, Zheng Chen Microsoft Research Asia Beijing, 8, China 2 Department of Automation

More information

Index Terms Domain name, Firewall, Packet, Phishing, URL.

Index Terms Domain name, Firewall, Packet, Phishing, URL. BDD for Implementation of Packet Filter Firewall and Detecting Phishing Websites Naresh Shende Vidyalankar Institute of Technology Prof. S. K. Shinde Lokmanya Tilak College of Engineering Abstract Packet

More information

Revenue Optimization with Relevance Constraint in Sponsored Search

Revenue Optimization with Relevance Constraint in Sponsored Search Revenue Optimization with Relevance Constraint in Sponsored Search Yunzhang Zhu Gang Wang Junli Yang Dakan Wang Jun Yan Zheng Chen Microsoft Resarch Asia, Beijing, China Department of Fundamental Science,

More information

Vendor briefing Business Intelligence and Analytics Platforms Gartner 15 capabilities

Vendor briefing Business Intelligence and Analytics Platforms Gartner 15 capabilities Vendor briefing Business Intelligence and Analytics Platforms Gartner 15 capabilities April, 2013 gaddsoftware.com Table of content 1. Introduction... 3 2. Vendor briefings questions and answers... 3 2.1.

More information

Effective Data Retrieval Mechanism Using AML within the Web Based Join Framework

Effective Data Retrieval Mechanism Using AML within the Web Based Join Framework Effective Data Retrieval Mechanism Using AML within the Web Based Join Framework Usha Nandini D 1, Anish Gracias J 2 1 ushaduraisamy@yahoo.co.in 2 anishgracias@gmail.com Abstract A vast amount of assorted

More information

KEYWORD SEARCH IN RELATIONAL DATABASES

KEYWORD SEARCH IN RELATIONAL DATABASES KEYWORD SEARCH IN RELATIONAL DATABASES N.Divya Bharathi 1 1 PG Scholar, Department of Computer Science and Engineering, ABSTRACT Adhiyamaan College of Engineering, Hosur, (India). Data mining refers to

More information

A Stock Pattern Recognition Algorithm Based on Neural Networks

A Stock Pattern Recognition Algorithm Based on Neural Networks A Stock Pattern Recognition Algorithm Based on Neural Networks Xinyu Guo guoxinyu@icst.pku.edu.cn Xun Liang liangxun@icst.pku.edu.cn Xiang Li lixiang@icst.pku.edu.cn Abstract pattern respectively. Recent

More information

Managing large sound databases using Mpeg7

Managing large sound databases using Mpeg7 Max Jacob 1 1 Institut de Recherche et Coordination Acoustique/Musique (IRCAM), place Igor Stravinsky 1, 75003, Paris, France Correspondence should be addressed to Max Jacob (max.jacob@ircam.fr) ABSTRACT

More information

Performance Tuning for the Teradata Database

Performance Tuning for the Teradata Database Performance Tuning for the Teradata Database Matthew W Froemsdorf Teradata Partner Engineering and Technical Consulting - i - Document Changes Rev. Date Section Comment 1.0 2010-10-26 All Initial document

More information

LDAP andUsers Profile - A Quick Comparison

LDAP andUsers Profile - A Quick Comparison Using LDAP in a Filtering Service for a Digital Library João Ferreira (**) José Luis Borbinha (*) INESC Instituto de Enghenharia de Sistemas e Computatores José Delgado (*) INESC Instituto de Enghenharia

More information

BillQuick Agent 2010 Getting Started Guide

BillQuick Agent 2010 Getting Started Guide Time Billing and Project Management Software Built With Your Industry Knowledge BillQuick Agent 2010 Getting Started Guide BQE Software, Inc. 2601 Airport Drive Suite 380 Torrance CA 90505 Support: (310)

More information

QAD Business Intelligence Dashboards Demonstration Guide. May 2015 BI 3.11

QAD Business Intelligence Dashboards Demonstration Guide. May 2015 BI 3.11 QAD Business Intelligence Dashboards Demonstration Guide May 2015 BI 3.11 Overview This demonstration focuses on one aspect of QAD Business Intelligence Business Intelligence Dashboards and shows how this

More information

Tracking Entities in the Dynamic World: A Fast Algorithm for Matching Temporal Records

Tracking Entities in the Dynamic World: A Fast Algorithm for Matching Temporal Records Tracking Entities in the Dynamic World: A Fast Algorithm for Matching Temporal Records Yueh-Hsuan Chiang, AnHai Doan 2, Jeffrey F. Naughton 3 University of Wisconsin Madison 20 W Dayton Street, Madison,

More information

SCHOLARSHIP OVERVIEW... & & 2 LOGGING IN TO OPEN SCHOLARSHIP... 2 SUBMITTING YOUR THESIS...

SCHOLARSHIP OVERVIEW... & & 2 LOGGING IN TO OPEN SCHOLARSHIP... 2 SUBMITTING YOUR THESIS... Electronic Thesis & Dissertation (ETD) Submission Guide MFA Candidates in Visual Art, Sam Fox School of Design & Visual Arts A Step- by- Step Guide to Submitting Your Thesis to OPEN SCHOLARSHIP OVERVIEW...

More information

Introduction to Bioinformatics AS 250.265 Laboratory Assignment 6

Introduction to Bioinformatics AS 250.265 Laboratory Assignment 6 Introduction to Bioinformatics AS 250.265 Laboratory Assignment 6 In the last lab, you learned how to perform basic multiple sequence alignments. While useful in themselves for determining conserved residues

More information

White Paper April 2006

White Paper April 2006 White Paper April 2006 Table of Contents 1. Executive Summary...4 1.1 Scorecards...4 1.2 Alerts...4 1.3 Data Collection Agents...4 1.4 Self Tuning Caching System...4 2. Business Intelligence Model...5

More information

Common Questions and Concerns About Documentum at NEF

Common Questions and Concerns About Documentum at NEF LES/NEF 220 W Broadway Suite B Hobbs, NM 88240 Documentum FAQ Common Questions and Concerns About Documentum at NEF Introduction...2 What is Documentum?...2 How does Documentum work?...2 How do I access

More information

System Requirement Specification for A Distributed Desktop Search and Document Sharing Tool for Local Area Networks

System Requirement Specification for A Distributed Desktop Search and Document Sharing Tool for Local Area Networks System Requirement Specification for A Distributed Desktop Search and Document Sharing Tool for Local Area Networks OnurSoft Onur Tolga Şehitoğlu November 10, 2012 v1.0 Contents 1 Introduction 3 1.1 Purpose..............................

More information

Research and Development of Data Preprocessing in Web Usage Mining

Research and Development of Data Preprocessing in Web Usage Mining Research and Development of Data Preprocessing in Web Usage Mining Li Chaofeng School of Management, South-Central University for Nationalities,Wuhan 430074, P.R. China Abstract Web Usage Mining is the

More information

Continuous Fastest Path Planning in Road Networks by Mining Real-Time Traffic Event Information

Continuous Fastest Path Planning in Road Networks by Mining Real-Time Traffic Event Information Continuous Fastest Path Planning in Road Networks by Mining Real-Time Traffic Event Information Eric Hsueh-Chan Lu Chi-Wei Huang Vincent S. Tseng Institute of Computer Science and Information Engineering

More information

Oracle Service Bus Examples and Tutorials

Oracle Service Bus Examples and Tutorials March 2011 Contents 1 Oracle Service Bus Examples... 2 2 Introduction to the Oracle Service Bus Tutorials... 5 3 Getting Started with the Oracle Service Bus Tutorials... 12 4 Tutorial 1. Routing a Loan

More information

Monitoring Replication

Monitoring Replication Monitoring Replication Article 1130112-02 Contents Summary... 3 Monitor Replicator Page... 3 Summary... 3 Status... 3 System Health... 4 Replicator Configuration... 5 Replicator Health... 6 Local Package

More information

Practical Graph Mining with R. 5. Link Analysis

Practical Graph Mining with R. 5. Link Analysis Practical Graph Mining with R 5. Link Analysis Outline Link Analysis Concepts Metrics for Analyzing Networks PageRank HITS Link Prediction 2 Link Analysis Concepts Link A relationship between two entities

More information

Viral Marketing in Social Network Using Data Mining

Viral Marketing in Social Network Using Data Mining Viral Marketing in Social Network Using Data Mining Shalini Sharma*,Vishal Shrivastava** *M.Tech. Scholar, Arya College of Engg. & I.T, Jaipur (Raj.) **Associate Proffessor(Dept. of CSE), Arya College

More information

Vector Asset Management User Manual

Vector Asset Management User Manual Vector Asset Management User Manual This manual describes how to set up Vector Asset Management 6.0. It describes how to use the: Vector AM Console Vector AM Client Hardware Inventory Software Inventory

More information

Semantic Concept Based Retrieval of Software Bug Report with Feedback

Semantic Concept Based Retrieval of Software Bug Report with Feedback Semantic Concept Based Retrieval of Software Bug Report with Feedback Tao Zhang, Byungjeong Lee, Hanjoon Kim, Jaeho Lee, Sooyong Kang, and Ilhoon Shin Abstract Mining software bugs provides a way to develop

More information

Getting Started with Access 2007

Getting Started with Access 2007 Getting Started with Access 2007 Table of Contents Getting Started with Access 2007... 1 Plan an Access 2007 Database... 2 Learning Objective... 2 1. Introduction to databases... 2 2. Planning a database...

More information

A MACHINE LEARNING APPROACH TO FILTER UNWANTED MESSAGES FROM ONLINE SOCIAL NETWORKS

A MACHINE LEARNING APPROACH TO FILTER UNWANTED MESSAGES FROM ONLINE SOCIAL NETWORKS A MACHINE LEARNING APPROACH TO FILTER UNWANTED MESSAGES FROM ONLINE SOCIAL NETWORKS Charanma.P 1, P. Ganesh Kumar 2, 1 PG Scholar, 2 Assistant Professor,Department of Information Technology, Anna University

More information

Enhance Preprocessing Technique Distinct User Identification using Web Log Usage data

Enhance Preprocessing Technique Distinct User Identification using Web Log Usage data Enhance Preprocessing Technique Distinct User Identification using Web Log Usage data Sheetal A. Raiyani 1, Shailendra Jain 2 Dept. of CSE(SS),TIT,Bhopal 1, Dept. of CSE,TIT,Bhopal 2 sheetal.raiyani@gmail.com

More information

Enhancing the Ranking of a Web Page in the Ocean of Data

Enhancing the Ranking of a Web Page in the Ocean of Data Database Systems Journal vol. IV, no. 3/2013 3 Enhancing the Ranking of a Web Page in the Ocean of Data Hitesh KUMAR SHARMA University of Petroleum and Energy Studies, India hkshitesh@gmail.com In today

More information

MPD Technical Webinar Transcript

MPD Technical Webinar Transcript MPD Technical Webinar Transcript Mark Kindl: On a previous Webinar, the NTAC Coordinator and one of the Co-Chairs of the NTAC introduced the NIEM MPD specification, which defines releases and IEPDs. In

More information

Performance evaluation of Web Information Retrieval Systems and its application to e-business

Performance evaluation of Web Information Retrieval Systems and its application to e-business Performance evaluation of Web Information Retrieval Systems and its application to e-business Fidel Cacheda, Angel Viña Departament of Information and Comunications Technologies Facultad de Informática,

More information

NJCU WEBSITE TRAINING MANUAL

NJCU WEBSITE TRAINING MANUAL NJCU WEBSITE TRAINING MANUAL Submit Support Requests to: http://web.njcu.edu/its/websupport/ (Login with your GothicNet Username and Password.) Table of Contents NJCU WEBSITE TRAINING: Content Contributors...

More information