User Behavior Analysis Using Alignment Based Grammatical Inference from Web Server Access Log



Ramesh Thakur, Suresh Jain, and Narendra S. Chaudhari

Abstract: The application of data mining techniques to the World Wide Web is referred to as Web mining. Web-based organizations collect large volumes of data for their operations. Analysis of such data can help an organization work better (marketing strategy, services, evaluation of effectiveness, promotional campaigns, etc.). This type of analysis requires the discovery of meaningful relationships in the large collection of primarily unstructured data stored in Web server access logs. We propose a new approach for automatically learning (context-free) grammar rules from server access log text (positive set) samples, based on the alignment between sentences. Our approach works on pairs of unstructured sentences that have one or more words in common.

Index Terms: Web usage mining, computational learning, grammatical inference, alignment profile, information extraction.

I. INTRODUCTION
We consider the web the largest knowledge base ever developed and made available to the public. It is becoming increasingly necessary for users to utilize automated tools and to analyze their usage patterns. This gives rise to the need for server-side intelligent systems that can efficiently mine for knowledge of server usage. Web mining can be broadly defined as the discovery and analysis of useful information from the World Wide Web. The discovery of user access patterns from web servers is known as web usage mining. The details of web server usage are generally gathered automatically by the web server in web server access logs, which are unstructured data sets (text files). Web usage mining has several applications [1], such as analysis of massive volumes of click stream or click flow data, personalization for a user, and determining the access behavior of users. Information extraction from textual data has various applications, such as semantic search [2].
(Manuscript received August 15, 2012; revised December 21, 2012. Ramesh Thakur is with the International Institute of Professional Studies, Devi Ahilya University, Indore, India (e-mail: r_thakur@rediffmail.com). Suresh Jain is with the KCB Technical Academy, Indore, India (e-mail: suresh.jain@rediffmail.com). Narendra S. Chaudhari is with the Indian Institute of Technology, Indore, India (e-mail: nc183@gmail.com).)

If the sentences conform to a language described by a known grammar, several techniques exist to generate the syntactic structure of these sentences; parsing [3] is one such technique that relies on knowledge of the grammar. In automated grammar learning, the task is to infer grammar rules from given information about the target language. Sentences are given as examples for such learning. If an example belongs to the target language, it is called a positive example; otherwise it is called a negative example. Gold [4] showed that not all languages can be inferred from positive examples only. A language that can be inferred by looking at a finite number of positive examples only is said to be identifiable in the limit [4]. By this theorem, it is in general not possible to identify the target language from positive examples alone.

One main approach for learning some subclasses of regular languages is to split the states of a deterministic finite automaton (DFA) [5]. Prefix tree acceptors are often constructed from the given samples as a starting DFA, and they are useful for modeling positive samples. Other approaches include learning by queries [6], learning from structural information [7], learning subclasses of languages [8], learning by genetic algorithms [9], neural networks [10], and Markov approaches [11]; other related work can be found in [12]-[15].

In this paper we propose a grammar inference methodology for web server log files of unstructured data, to automate the construction of context-free grammar rules and to facilitate the process of information extraction.
We use a grammatical inference methodology based on alignment between the texts of a given collection of sentences from the web server log text file (an unstructured document). This method works on pairs of unstructured sentences that have one or more words in common. When two sentences are divided into equal parts (the same set of words) and unequal parts (different sets of words), these parts are taken as possible constituents of the grammar.

DOI: 10.7763/IJFCC.2013.V2.223

II. WEB LOGS
A web log file [16] records activity information about web users' requests on a web server. The main source of raw data is the web server log, which is stored on the web server for debugging purposes. Log files are stored in three different places: (i) server-side logs, (ii) proxy-side logs, and (iii) client-side logs.

A. Web Log Structure
Web server logs are plain text (ASCII) files that are independent of the server platform. Generally there are four types of server logs: transfer log, agent log, error log, and referrer log. A web log [17] is the file to which the web server writes information each time a user requests a resource from that particular site. Most web servers use the common log format. The following fragment is a server log file entry [19].

123.123.123.123 - - [26/Apr/2000:00:23:48 -0400] "GET /pics/wpaper.gif HTTP/1.0" 200 6248 "http://www.jafsoft.com/asctortf/" "Mozilla/4.05 (Macintosh; I; PPC)"
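As an illustration only, an entry like the one above can be split into named fields with a regular expression. The pattern and the names `LOG_PATTERN` and `parse_entry` are assumptions made for this sketch; they are not part of the paper, and real server logs may need a more tolerant pattern.

```python
import re

# Hypothetical pattern for one entry: common log format fields, optionally
# followed by the quoted referrer and user-agent fields.
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<authuser>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
    r'(?: "(?P<referrer>[^"]*)" "(?P<agent>[^"]*)")?'
)


def parse_entry(line):
    """Return the fields of one log entry as a dict, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None


entry = parse_entry(
    '123.123.123.123 - - [26/Apr/2000:00:23:48 -0400] '
    '"GET /pics/wpaper.gif HTTP/1.0" 200 6248 '
    '"http://www.jafsoft.com/asctortf/" "Mozilla/4.05 (Macintosh; I; PPC)"'
)
print(entry["host"], entry["method"], entry["url"], entry["status"])
```

All values are returned as strings; converting the status and size to integers is left to the caller.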

The log entry carries the following information.
1) Remote IP address or domain name: An IP (IPv4) address is a 32-bit host address defined by the Internet Protocol. A domain name is used to determine a unique IP address for any host on the Internet.
2) Authuser: User name and password, when the server requires user authentication.
3) Entering and exiting date and time.
4) Mode of request: GET, POST, or HEAD.
5) Status: The HTTP status code returned to the client.
6) Bytes: The content length, etc.

III. ALIGNMENT BASED LEARNING
Alignment Based Learning (ABL) is based on alignment information [18]. In ABL, a pairwise alignment is computed for each pair of input sentences by finding their equal and unequal parts. A pairwise alignment is an arrangement of two sequences that shows where the two sequences are similar and where they differ. A good alignment shows the most significant similarities and the fewest differences. A score, called the alignment score, is assigned to an alignment to measure its goodness. The scoring scheme is usually defined on the pairing of different constituents, with a gap penalty for shifts in the alignment. An example alignment for the two sentences

Boys like fresh red, green apples
All boys like to eat red, green apples

is

[Boys like] fresh [red, green apples]
All [boys like] to eat [red, green apples]

Words that are located above each other and that are equal in the alignment are called matches. The shifts caused by insertions or deletions are called gaps. In alignment-based systems, more gaps mean less similarity. Words that are located above each other and that are not equal in the alignment are called substitutions. In an alignment, if there is a substitution, then the two sub-sentences are said to be aligned in the same slot. Here the slot denotes the position at which the sub-sentences are located in the alignment. For example, "fresh" and "to eat" are aligned in the same slot, shown in braces:

Boys like {fresh} red, green apples
All boys like {to eat} red, green apples

In the alignment phase, the parts of the sentences identified by the alignment are considered as possible constituents.
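The split into equal and unequal parts described above can be sketched as follows. This is an assumption-laden illustration, not the authors' implementation: it uses `difflib.SequenceMatcher` from the Python standard library as a stand-in for a scored alignment with gap penalties, and the function name `align` is hypothetical.

```python
from difflib import SequenceMatcher


def align(first, second):
    """Split two sentences into equal and unequal word groups.

    Returns a list of (kind, words_from_first, words_from_second) tuples,
    where kind is "equal" for matches and "unequal" for gaps or substitutions.
    """
    a, b = first.split(), second.split()
    matcher = SequenceMatcher(a=a, b=b, autojunk=False)
    parts = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        kind = "equal" if tag == "equal" else "unequal"
        parts.append((kind, a[i1:i2], b[j1:j2]))
    return parts


for kind, left, right in align("boys like fresh red, green apples",
                               "all boys like to eat red, green apples"):
    print(kind, left, right)
```

For the two example sentences, the word groups "fresh" and "to eat" come out as one unequal pair, which corresponds to the slot shown in braces above; "all" appears as an unequal group with an empty counterpart, which corresponds to a gap.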
Non-terminals are assigned to the parts that possibly form constituents. Such assignments are called hypotheses.

[Ram] sees the [large, green] orange
[My mother] sees the [yellow] orange

The above hypotheses are used to create grammar rules by assigning new symbols to represent the sub-sentences in square brackets, which are also called constituents:

S → A sees the B orange
A → Ram
A → My mother
B → yellow
B → large green

If only a single sentence of an unknown language is available, it is very hard, if not impossible, to say anything about the language; we can only conclude that it is a sentence. However, if two sentences are available, it is possible to find the parts of the sentences that are the same in both and the parts that are not (provided that some words are the same and some words are different in both sentences). The comparison of two sentences falls into one of three categories.
1) The sentences are completely different.
2) All words in the two sentences are the same.
3) Some of the words in the sentences are the same in both and some are different.

A. Alignment Based Grammar Inference
Sentences belonging to the third case offer two possibilities for selecting constituents for grammar (CFG) rule extraction: (1) select the equal parts as constituents, or (2) select the unequal parts as constituents. Without loss of generality, consider the following simple example, in which the selected parts are shown in brackets.
1) Mother eats [biscuit].
2) Mother eats [cake].
3) [Mother eats] biscuit.
4) [Mother eats] cake.
In case one (sentences 1 and 2) the unequal parts are selected as constituents. In case two (sentences 3 and 4) the equal parts are selected as constituents. The resulting grammars are shown in Table I.

TABLE I: CONSTITUENTS INFERENCE COMPARISON
Method          Structure                Grammar
Equal parts     [Mother eats] biscuit    S → X biscuit
                [Mother eats] cake       S → X cake
                                         X → Mother eats
Unequal parts   Mother eats [biscuit]    S → Mother eats X
                Mother eats [cake]       X → biscuit
                                         X → cake

When the unequal parts of sentences are taken to be constituents, the result is a more compact grammar than when the equal parts of sentences are taken to be constituents. In other words, the grammar is more compressed.

B. Overlapping Constituents
While extracting a context-free grammar from unstructured web access log text, overlaps should never occur within the tree structure. We use a dedicated data structure for selecting constituents: whenever a candidate constituent is selected for grammar generation, it is first looked up in the constituent data structure. If it already exists, the same rule is returned; otherwise a new grammar rule is returned.
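The lookup described above (return the existing rule if the constituent is already known, otherwise create a new one) can be sketched with a dictionary. The class `ConstituentTable`, the method `symbol_for_slot`, and the `N1`, `N2`, ... naming scheme are assumptions made for this illustration; the paper does not specify the data structure it uses.

```python
class ConstituentTable:
    """Maps each known constituent (a tuple of words) to one non-terminal."""

    def __init__(self):
        self._by_constituent = {}
        self._count = 0

    def symbol_for_slot(self, alternatives):
        """Return one non-terminal for all word groups aligned in the same slot."""
        keys = [tuple(words) for words in alternatives]
        known = [self._by_constituent[k] for k in keys if k in self._by_constituent]
        if known:
            symbol = known[0]            # constituent already exists: reuse its rule
        else:
            self._count += 1
            symbol = f"N{self._count}"   # hypothetical naming scheme
        for key in keys:
            self._by_constituent.setdefault(key, symbol)
        return symbol

    def rules(self):
        """List the stored rules in insertion order."""
        return [f"{symbol} -> {' '.join(words)}"
                for words, symbol in self._by_constituent.items()]


table = ConstituentTable()
print(table.symbol_for_slot([["Ram"], ["My", "mother"]]))       # N1
print(table.symbol_for_slot([["large", "green"], ["yellow"]]))  # N2
print(table.symbol_for_slot([["Ram"], ["Father"]]))             # N1 (reused)
print(table.rules())
```

Keying the table on the word tuple keeps the lookup cheap, at the cost of treating two constituents as the same only when their words match exactly.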

IV. FINDING PATTERNS FROM WEB ACCESS LOG UNSTRUCTURED TEXT

A. Context-Free Grammar
We define an alphabet $\Sigma$ as a finite set of symbols. A string over an alphabet $\Sigma$ is a finite ordered sequence of symbols from $\Sigma$. The length of a string $\alpha$ is the number of symbols in the string, counted with repetition, and is denoted by $|\alpha|$ (e.g. $|aabbcc| = 6$). The empty string, denoted by $\varepsilon$, is the string of length zero.

A CFG (context-free grammar) $G$ is a four-tuple $(N, \Sigma, P, S)$, where $N$ is the set of non-terminals, $\Sigma$ is the alphabet (its elements are also called terminal symbols), $P$ is the set of production rules, and $S \in N$ is the start symbol. Conventionally $A, B, \ldots$ denote non-terminals, $a, b, \ldots$ denote terminals, and $\alpha, \beta, \ldots$ represent strings in $(N \cup \Sigma)^*$. The productions of a CFG are of the form $A \to \alpha$; $A$ is called the left-hand side (LHS) and $\alpha$ the right-hand side (RHS). We define an $A$-production to be a production with the non-terminal $A$ as its LHS. Given the production $A \to \alpha$, we say that $\beta\alpha\gamma$ is derived from $\beta A \gamma$ in one step, and we denote this by $\beta A \gamma \Rightarrow \beta\alpha\gamma$. If $\delta$ is derived from $\gamma$ in zero or more steps, the derivation is denoted by $\gamma \Rightarrow^* \delta$. The language of $G$, denoted by $L(G)$, is the set of all terminal strings that can be derived from the start symbol $S$. Formally,

$L(G) = \{\, w \in \Sigma^* \mid S \Rightarrow^* w \,\}$.

A sentence $S$ of length $|S| = n$ is a non-empty list of words $[w_1, w_2, \ldots, w_n]$. The words are considered elementary. A word $w_i$ in sentence $S$ is written as $S[i] = w_i$. Our algorithm learns the grammar (CFG) from a set of sentences. These sentences are stored in a list called the corpus. Note that in our case the corpus is the web access log file.

B. Corpus
A corpus $C$ of size $|C| = n$ is a list of sentences $[S_1, S_2, \ldots, S_n]$.

C. Constituent
A constituent in sentence $S$ is a tuple $c^S = \langle b, e, n \rangle$ where $0 \le b \le e \le |S|$; $b$ and $e$ are indices in $S$ denoting respectively the beginning and the end of the constituent, and $n$ is the non-terminal of the constituent, taken from the set of non-terminals. $S$ may be omitted when its value is clear from the context.

D. Sub-Sentence or Word Group
A sub-sentence or word group of sentence $S$ is a list of words $v^S_{i \ldots j}$ such that $S = u + v^S_{i \ldots j} + w$ (where $+$ is defined to be the concatenation operator on lists), where $u$ and $w$ are lists of words and $v^S_{i \ldots j}$ with $i \le j$ is a list of $j - i$ elements such that for each $k$ with $1 \le k \le j - i$: $v^S_{i \ldots j}[k] = S[i + k]$. A sub-sentence may be empty (when $i = j$) or it may span the entire sentence (when $i = 0$ and $j = |S|$). $S$ may be omitted if its meaning is clear from the context.

E. Substitutability
Sub-sentences $u$ and $v$ are substitutable for each other if
1) the sentences $S_1 = t + u + w$ and $S_2 = t + v + w$ (with $t$ and $w$ sub-sentences) are both valid, and
2) for each $k$ with $1 \le k \le |u|$ it holds that $u[k] \notin v$, and for each $l$ with $1 \le l \le |v|$ it holds that $v[l] \notin u$.
Note that this definition of substitutability allows for the substitution of empty sub-sentences. We assume that for two sub-sentences to be substitutable, at least one of the two needs to be non-empty. For example, consider the following case.
3) a. Mother eats biscuit.
   b. Mother eats cake.
In this case, the words "biscuit" and "cake" are the unequal parts of the sentences. These are the only words that are substitutable according to the definition. The word groups "eats biscuit" and "cake" are not substitutable, since the first condition in the definition does not hold ($t$ = "Mother" in 3a and $t$ = "Mother eats" in 3b). The word groups "eats biscuit" and "eats cake" are not substitutable either, since they clash with the second condition: the word "eats" is present in both word groups. The advantage of this notion of substitutability is that substitutable word groups can be found easily by searching for the unequal parts of sentences.

V. WEB ACCESS LOG MINING ALGORITHM
We split the problem of web access mining into the following phases.
1) Data cleaning: Before we can apply the algorithm, we need to eliminate the irrelevant items from the server access log file, so that the file contains a set of strings that have only data useful for mining. Elimination of irrelevant items can be accomplished by checking the suffix of the URL name.
For instance, all log entries with filename suffixes such as gif, jpeg, GIF, JPEG, and map can be removed. For our analysis the user name and password are also removed. The data cleaning phase may also be used to translate the data in the context of information extraction.
2) Finding constituents: The sentences are scanned and, based on the alignment information among the sentences, the constituents (i.e. equal and unequal parts) are identified.
3) Overlap check: The resulting constituents are checked for overlapping, and if no overlap exists then a new rule is added to the result.
4) Multiple production alternatives: If a constituent occurs in such a position that multiple production alternatives are possible ($u \mid w$), then the new production is $Y \to u \mid w$.

A. Algorithm
Input: A corpus C of flat sentences of the access log (strings)
Output: Set of CFG rules R
Begin
  Initialize rule set R = ∅
  While ∃ α ∈ C with |α| > 1 do
    For β ∈ C and β ≠ α do
      {D}, {S} ← FindSubstitutableSubsentences(α, β)
      // {D} and {S} are the sets of distinct and identical sub-sentences, respectively
      For γ ∈ {D} do
        // assign a non-terminal to the constituent that is a distinct part of α and β
        If |α| > 1 then
          N = select the next non-terminal for γ if no overlapping exists; otherwise return the same non-terminal present in R
          R = R ∪ {N → γ}
          // apply the replacement rule to each string in the corpus
          Update(C, N → γ)
        End if
      End for
    End for
  End while
End

B. Constituent Selection
The alignment learning phase may generate unwanted overlapping constituents. Since we assume that the underlying grammar of the corpus is a context-free grammar, we want to know the most appropriate disambiguated structure of the sentences of the corpus. We use two different approaches for the selection of constituents.
1) Assume that the constituent learned first is correct. This means that when new constituents overlap with older ones, the new ones are ignored.
2) Select constituents based on their support factor. The algorithm computes the support factor of a constituent by counting the number of times the sequence of words in the constituent occurs, normalized by the total number of constituents in the corpus C:

$SF_\gamma = \dfrac{(\text{count of } \gamma \text{ in sentence}) \times (\text{length of } \gamma)}{\text{length of sentence}}$

where N is the number of sentences in the corpus and γ is the selected constituent.
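The main loop of the algorithm above can be sketched as follows. This is a simplified illustration under several assumptions, not the authors' implementation: `difflib.SequenceMatcher` stands in for the alignment step, each sentence pair is visited once, the corpus is not rewritten between iterations, the support factor is not applied, and the names `infer_rules` and `X1`, `X2`, ... are hypothetical.

```python
from difflib import SequenceMatcher
from itertools import combinations, count


def infer_rules(corpus):
    """Infer rules (lhs, rhs_tuple) from the unequal parts of sentence pairs."""
    sentences = [sentence.split() for sentence in corpus]
    symbols = {}      # constituent (tuple of words) -> non-terminal
    rules = set()
    fresh = count(1)  # numbering for new non-terminals
    for a, b in combinations(sentences, 2):
        opcodes = SequenceMatcher(a=a, b=b, autojunk=False).get_opcodes()
        tags = {op[0] for op in opcodes}
        if "equal" not in tags or tags == {"equal"}:
            continue  # completely different or identical: nothing to learn
        rhs_a, rhs_b = [], []
        for tag, i1, i2, j1, j2 in opcodes:
            left, right = tuple(a[i1:i2]), tuple(b[j1:j2])
            if tag == "equal":
                rhs_a.extend(left)
                rhs_b.extend(right)
                continue
            # Reuse the non-terminal of a known constituent, else create one.
            nt = symbols.get(left) or symbols.get(right) or f"X{next(fresh)}"
            for part in (left, right):
                if part:
                    symbols.setdefault(part, nt)
                rules.add((nt, part))  # an empty part stands for a gap
            rhs_a.append(nt)
            rhs_b.append(nt)
        rules.add(("S", tuple(rhs_a)))
        rules.add(("S", tuple(rhs_b)))
    return rules


for lhs, rhs in sorted(infer_rules(["Ram sees the large green orange",
                                    "My mother sees the yellow orange"])):
    print(lhs, "->", " ".join(rhs))
```

For the two example sentences this yields one sentence rule with two slots and two alternatives per slot, which has the same shape as the grammar S → A sees the B orange given in Section III.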
VI. EXPERIMENTAL RESULTS
In this paper we have extracted a context-free grammar, using the alignment-based learning approach, from seven days of web access log text files of http://www.dauniv.ac.in, available as an unstructured data source. Various analyses have been carried out to identify user behavior. For simplicity the time and size are ignored. A sample web access log file is shown in Fig. 1.

123.456.78.9 - - [25/Apr/2011:03:94:41 -0580] "GET/A.html HTTP/1.0" 200 3290 "Mozilla/4.05 (Macintosh;)"
123.456.78.9 - - [25/Apr/2011:03:05:34 -0500] "GET/B.html HTTP/1.0" 200 2050 A.html "Mozilla/4.05 (Macintosh;)"
123.456.78.9 - - [25/Apr/2011:03:05:39 -0500] "GET/C.html HTTP/1.0" 200 4130 - "Mozilla/4.05 (Macintosh;)"
123.456.78.9 - - [25/Apr/2011:03:06:02 -0500] "GET/D.html HTTP/1.0" 200 5096 B.html "Mozilla/4.05 (Macintosh;)"
123.456.78.9 - - [25/Apr/2011:03:10:45 -0500] "GET/G.html HTTP/1.0" 200 9430 - "Mozilla/4.05 (Macintosh;)"
123.456.78.9 - - [25/Apr/2011:03:12:23 -0500] "GET/D.html HTTP/1.0" 200 7220 - "Mozilla/4.05 (Macintosh;)"
123.156.78.9 - - [25/Apr/2011:03:07:55 -0500] "GET/R.html HTTP/1.0" 200 8140 L.html "Mozilla/4.05 (Macintosh;)"
123.156.78.9 - - [25/Apr/2011:03:09:50 -0500] "GET/C.html HTTP/1.0" 200 1820 A.html "Mozilla/4.05 (Macintosh;)"
123.156.78.9 - - [25/Apr/2011:3:10:02 -0500] "GET/A.html HTTP/1.0" 200 2270 - "Mozilla/4.05 (Macintosh;)"
209.458.78.2 - - [25/Apr/2011:05:05:22 -0500] "GET/A.html HTTP/1.0" 200 3290 - "Mozilla/4.05 (Macintosh;)"
209.458.78.3 - - [225/Apr/2011:05:06:03 -0500] "GET/A.html HTTP/1.0" 200 1680 - "Mozilla/4.05 (Macintosh;)"
Fig. 1. Sample web server access log file

A. Grammar Rule Extraction for User Identification
User identification means identifying individual users by observing their IP addresses. To identify users, we propose the following rules: if there is a new IP address, then there is a new user; if the IP address is the same but the operating system or browsing software is different, we assume a different agent type, and therefore the IP address represents a different user.
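The two identification rules above amount to treating each distinct (IP address, user agent) pair as one user. A minimal sketch, assuming the entries have already been reduced to such pairs; the function name `identify_users` and the sample pairs are made up for illustration:

```python
def identify_users(entries):
    """Number each (IP address, user agent) pair in order of first appearance."""
    users = {}
    for ip, agent in entries:
        # A new IP, or a known IP with a different agent, counts as a new user.
        users.setdefault((ip, agent), len(users) + 1)
    return users


sample = [
    ("123.456.78.9", "Mozilla/4.05 (Macintosh;)"),
    ("123.456.78.9", "Mozilla/4.7 [en]C-SYMPA (Win95; U)"),
    ("209.458.78.2", "Mozilla/4.05 (Macintosh;)"),
    ("123.456.78.9", "Mozilla/4.05 (Macintosh;)"),
]
for (ip, agent), number in identify_users(sample).items():
    print(number, ip, agent)
```

This heuristic cannot separate two people who share both an address and a browser, which is a limitation of the rules themselves rather than of the sketch.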
For each user, the following grammar rules are extracted using alignment-based learning; the time of access has been ignored (Table II).

1) Iteration

TABLE II: CONSTITUENTS AND GRAMMAR RULES BY ITERATION
Iteration I1
  Constituents: GET/A.html HTTP/1.0, GET/B.html HTTP/1.0, GET/C.html HTTP/1.0, ..
  Grammar:      S → 123.456.78.9 [25/Apr/2011] X 200 Mozilla/4.05 (Macintosh;)
                X → GET/A.html HTTP/1.0
                X → GET/B.html HTTP/1.0
                ..
Iteration I2
  Constituents: Mozilla/4.05 (Macintosh;), Mozilla/4.7 [en]C-SYMPA (Win95; U)
  Grammar:      S → 123.456.78.9 [25/Apr/2011] X 200 Z
                X → GET/A.html HTTP/1.0
                X → GET/B.html HTTP/1.0
                Z → Mozilla/4.05 (Macintosh;)
Iteration In
  ..

B. Output
After applying the proposed algorithm, the following set of grammar rules results:

S → 123.156.78.9 [25/Apr/2011] XYZ
S → 209.158.78.2 [25/Apr/2011] XYZ
S → 209.158.78.3 [25/Apr/2011] XYZ
X → GET/A.html HTTP/1.0, GET/B.html HTTP/1.0, ..
Y → 200, Y → 305, ..
Z → Mozilla/4.05 (Macintosh;), ..

TABLE III: USER PROFILE
Day   No. of entries   No. of IP addresses   No. of unique users   Failures
1     63567            7587                  576                   2931
2     61264            7632                  613                   2463
3     87565            19103                 1766                  1738
4     64536            8340                  868                   795
5     75535            12638                 1039                  1421
6     67342            22706                 2033                  2304
7     88233            32657                 1765                  2897

We interpret this as follows: the start symbol represents the complete text of an individual user. We can see that a text begins with a fixed preamble, followed by a variable number of occurrences of XYZ (page accesses), each of which represents a listing in the server log file. Here the non-terminals XYZ represent a reference to a web page. The data fields for each listing can be extracted by mapping the text symbols to their actual
content. Domain-specific heuristics can then be used to identify the semantic meaning of the different fields. In this web server analysis, domain knowledge is not used in grammar generation, so this approach can easily be applied to other types of web access analysis (Table III).

VII. CONCLUSION
We have demonstrated the use of alignment-based grammatical inference to infer grammar rules (CFG) for web server access log files, i.e. the identification of grammatical rules from a given symbolic sample of sentences in a language. Our algorithm uses the distinctions between sentences to find possible constituents during the alignment learning phase and selects the most probable constituents. For our experiments the dauniv.ac.in server log data sheets have been used. Our approach employs alignment similarities among the sentences to formulate the grammar of the data sheets. The resulting grammar has been analyzed to identify user behavior on the web server. The results of the analysis are of use to system administrators, web designers, and others for marketing planning, web personalization, etc.

REFERENCES
[1] R. Cooley, B. Mobasher, and J. Srivastava, "Web mining: Information and pattern discovery on the World Wide Web," in Proc. IEEE Computer Society, 1997, pp. 558.
[2] P. Palaga, L. Nguyen, U. Leser, and J. Hakenberg, "High-performance information extraction with AliBaba," in EDBT, 2009, pp. 1140-1143.
[3] J. Allen, Natural Language Understanding, 2nd ed. Redwood City, CA, USA: The Benjamin/Cummings Publishing Company, Inc., 1995.
[4] E. M. Gold, "Language identification in the limit," Inf. Cont., vol. 10, no. 5, pp. 447-474, 1967.
[5] P. Dupont, L. Miclet, and E. Vidal, "What is the search space of the regular inference," in Proc. ICGI, Lecture Notes in Artificial Intelligence, vol. 862, Heidelberg, Berlin: Springer, 1994, pp. 25-37.
[6] D. Angluin, "Queries and concept learning," Mach. Learn., vol. 2, no. 4, pp. 319-342, 1988.
[7] Y. Sakakibara, "Learning context-free grammars from structural data in polynomial time," Theor. Comput. Sci., vol. 76, pp. 223-242, 1990.
[8] D. Angluin, "Learning k-bounded context-free grammars," Yale Univ., New Haven, CT, Yale Tech. Rep. RR-557, 1987.
[9] Y. Sakakibara, "Learning context-free grammars using tabular representations," Pattern Recognit., vol. 38, no. 9, pp. 1372-1383, 2005.
[10] Y. Sakakibara and M. Golea, "Simple recurrent networks as generalized hidden Markov models with distributed representations," in Proc. IEEE Int. Conf. Neural Netw., IEEE Comput. Soc., 1995, pp. 979-984.
[11] K. Kersting, L. D. Raedt, and T. Raiko, "Logical hidden Markov models," J. Artif. Intell. Res., vol. 25, pp. 425-456, 2006.
[12] N. S. Chaudhari and Xiangrui Wang, "Language structure using fuzzy similarity," IEEE Trans. on Fuzzy Systems, vol. 17, no. 5, pp. 1011-1024, Oct. 2009.
[13] R. Thakur, S. Jain, and N. S. Chaudhari, "Incremental discovery of sequential patterns from semi-structured documents using grammatical inference," in Proc. ICDCIT 2012, LNCS, vol. 7154, Springer-Verlag Berlin Heidelberg, 2012, pp. 269.
[14] P. Adriaans and M. Vervoort, "The EMILE 4.1 grammar induction toolbox," Lecture Notes in Computer Science, vol. 2484, pp. 293-295, 2002.
[15] V. R. Borkar, K. Deshmukh, and S. Sarawagi, "Automatically extracting structure from free text addresses," in Bulletin of the IEEE Computer Society Technical Committee on Data Engineering, IEEE, 2000.
[16] J. Srivastava, R. Cooley, M. Deshpande, and P. N. Tan, "Web usage mining: Discovery and applications of usage patterns from web data," in Proc. SIGKDD Explorations, vol. 1, no. 2, Jan. 2000, pp. 12-33.
[17] World Wide Web Consortium, The Common Log File format. [Online]. Available: http://www.w3.org/daemon/user/config/
[18] M. V. Zaanen, "Implementing alignment-based learning," in Proc. ICGI (Lecture Notes in Computer Science), vol. 2484, 2002, pp. 312-314.
[19] A Web server log file explained. [Online]. Available: http://www.jafsoft.com/searchengines/log_sample.html

Ramesh Kumar Thakur was born in Samastipur, Bihar, India, in 1974. He received the B.E. degree in computer science and engineering and the M.E. degree in computer engineering, and is currently pursuing a Ph.D. in computer engineering, at Devi Ahilya University, Indore. In 1998 he was appointed Asst. Prof. at SVITS, Indore, India. In 2001 he joined the Department of Computer Engineering at Devi Ahilya University, Indore, India, as Asst. Prof., and since 2007 he has been working as Associate Prof. at Devi Ahilya University, Indore. He is involved in coordinating graduate-level and postgraduate-level training programs in computer science for the university. During 2011 he was a visiting professor at the Indian Institute of Technology, Indore, India. He is a member of IEEE. He has published many research papers in various national and international journals and has participated in many conferences. His area of research is information extraction.

Suresh Jain is a director and professor of computer engineering at Sushila Devi Bansal College of Engineering, Indore. He completed his Bachelor of Engineering at MANIT Bhopal, his Master of Engineering at SGSITS Indore, and his Ph.D. in computer science at DAVV. He has over 25 years of experience in academics and research. He served Devi Ahilya University for over 20 years in the capacities of Lecturer, Reader, and Professor of Computer Engineering. He has published more than 80 research papers in reputed journals and conferences. He teaches courses on artificial intelligence, computer graphics, theory of computation, and DBMS. He is a life member of CSI, IEEE, and ISTE. He is guiding research in the fields of machine learning, web mining, and information retrieval.

Narendra S. Chaudhari completed his undergraduate, graduate, and doctoral studies at the Indian Institute of Technology (IIT), Mumbai, India, in 1981, 1983, and 1988 respectively. Dr. Chaudhari has shouldered many senior-level administrative positions in universities in India as well as abroad. A few notable assignments include: Dean - Faculty of Engineering Sciences, Devi Ahilya University, Indore; Member - Executive Council, Devi Ahilya University, Indore; Coordinator - International Exchange Program, Nanyang Technological University, Singapore; and Deputy Director - GameLAB, Nanyang Technological University, Singapore. Currently he is Dean - Research and Development, Indian Institute of Technology (IIT) Indore, and Member - Advisory Board, ITM University, Gwalior (M.P.). He has done significant research work on game AI, novel neural network models such as binary neural nets and bidirectional nets, context-free grammar parsing, and graph isomorphism problems. He has supervised more than 18 doctoral students and more than 80 Master's students. He has delivered invited talks and presented his research results in several countries, including America, Australia, Canada, Germany, Hungary, Japan, and the United Kingdom. Institutes where he has given talks on his research include the Massachusetts Institute of Technology (MIT), USA; Nagoya Institute of Technology (NIT), Nagoya, Japan; Manchester Metropolitan University (MMU), Manchester (U.K.); and Beijing Normal University (BNU), Beijing (P.R. China). He has more than 240 publications in top-quality international conferences and journals. He has been invited as a keynote speaker at many conferences in the areas of soft computing, game AI, and data management. He has been a referee and reviewer for a number of premier conferences and journals, including IEEE Transactions and Neurocomputing. Dr. Chaudhari is a Fellow of the Institution of Engineers, India (IE-India), a Fellow of the Institution of Electronics and Telecommunication Engineers (IETE) (India), a Senior Member of the Computer Society of India, a Senior Member of IEEE, USA, a Member of the Indian Mathematical Society (IMS), a Member of the Cryptology Research Society of India (CRSI), and a member of many other professional societies.