NATURAL LANGUAGE PROCESSING AND LEGAL KNOWLEDGE EXTRACTION
|
|
|
- Emory Charles
- 10 years ago
- Views:
Transcription
1 NATURAL LANGUAGE PROCESSING AND LEGAL KNOWLEDGE EXTRACTION Simonetta Montemagni, Giulia Venturi Istituto di Linguistica Computazionale «Antonio Zampolli» (ILC-CNR) Summer School LEX2013-5th September 2013
2 Bridging the gap between text and knowledge: the crucial role of NLP tools Natural Language Processing tools Knowledge is mostly conveyed through text Content access requires understanding the linguistic structure Texts Structured knowledge We need a bridge to overcome the gap between text and knowledge Technologies based on Natural Language Processing allows accessing the domain-specific knowledge contained in texts structuring the textual content
3 From text to knowledge: the main challenge in the legal domain One of the main obstacles to progress in the field of artificial intelligence and law is the natural language barrier L. Thorne McCarty, International Conference on AI and Law (ICAIL-2007) Raw materials of the law are embodied in natural language (cases, statutes, regulations, etc.) Legal knowledge is heavily intertwined with natural language and common sense and therefore inherits all the hard problems that these imply Knowledge-based legal information systems need to access the content embedded in legal texts
4 IUSEXPLORER Legal search engine gathering Italian different sources of law (case laws, legislation, jurisprundence, journals, etc.)
5 IUSEXPLORER: an example of word search query danno (damage) Ambiguity between the verb and the noun
6 IUSEXPLORER: an example of word search query danno patrimoniale (patrimonial damage) It returns the single word (damage and patrimonial), the multi-word and also the negation
7 IUSEXPLORER Advanced search engine which provides customers with access to billions of searchable documents It is still linguistically rudimentary it does not exploit the potential offered by language technologies it does not support semantic queries allowing an advanced access to documents Need for increasingly sophisticated applications based on Natural Language Processing technologies for effectively accessing the content embedded in texts
8 Summary From text to knowledge The general approach Natural Language Processing tools What and what for The main challenges of the legal domain Legal Knowledge Extraction Identification and extraction of domain-relevant knowledge Semantic annotation of legal texts
9 From text to knowledge: the general approach Textual content (implicit knowledge) Dynamic content structuring Linguistic annotation Structured knowledge (explicit knowledge)
10 Natural Language Processing techniques and knowledge extraction Linguistic annotation tools Structured knowledge Lexico-semantic resources Knowledge extraction tools Balanced Cooperative Approach
11 Linguistic annotation tools: what Linguistic annotation the process in charge of reconstructing and making explicit the linguistic structure underlying texts State-of-the-art tools are based on machine-learning algorithms Annotation process as probabilistic classification task Basic requirements robustness to minimize failures due to lexical gaps, particularly complex linguistic constructions as well as ill-formed input accuracy of achieved results efficiency to deal with huge amounts of textual data portability to different domains, textual genres, linguistic registers, other languages incrementality of analysis
12 Linguistically annotated text Linguistic annotation: an incremental process text Sentence Splitter Splits the text into sentences Tokenizer Morphological analyzer PoS Tagger Dependency parser Segments each sentence into orthographic units (tokens) Assigns the possible morphological analyses to each token Selects the appropriate morphological interpretation in the specific context Identifies dependency relations between tokens (e.g. subject, object, etc.)
13 Linguistic annotation: an example text Sentence Splitter Il danno non poteva essere sottovalutato. Il sig. Rossi decise perciò di chiamare l avvocato. (The damage could not be understimated. Mr. Rossi decided therefore to call the lawyer.) Tokenizer Morphological analyzer PoS Tagger Dependency parser
14 Linguistic annotation: an example text Sentence Splitter Tokenizer Morphological analyzer Il danno non poteva essere sottovalutato. Il sig. Rossi decise perciò di chiamare l avvocato. (The damage could not be understimated. Mr. Rossi decided therefore to call the lawyer.) - Il danno non poteva essere sottovalutato. (The damage could not be understimated.) - Il sig. Rossi decise perciò di chiamare l avvocato. (Mr. Rossi decided therefore to call the lawyer.) PoS Tagger Dependency parser
15 Linguistic annotation: an example text Sentence Splitter Tokenizer Morphological analyzer PoS Tagger Dependency parser id 1 Il Il danno non poteva essere sottovalutato. Il sig. Rossi decise perciò di chiamare l avvocato. (The damage could not be understimated. Mr. Rossi decided therefore to call the lawyer.) - Il danno non poteva essere sottovalutato. (The damage could not be understimated.) - Il sig. Rossi decise perciò di chiamare l avvocato. (Mr. Rossi decided therefore to call the lawyer.) 2 danno 3 non form 4 poteva 5 essere 6 sottovalutato CoNLL tabular representation schema
16 Linguistic annotation: an example text Sentence Splitter Tokenizer Morphological analyzer PoS Tagger Il danno non poteva essere sottovalutato. Il sig. Rossi decise perciò di chiamare l avvocato. (The damage could not be understimated. Mr. Rossi decided therefore to call the lawyer.) - Il danno non poteva essere sottovalutato. (The damage could not be understimated.) - Il sig. Rossi decise perciò di chiamare l avvocato. (Mr. Rossi decided therefore to call the lawyer.) id form lemma PoS Feats 1 Il il RD MS 2 danno danno;dare S;V MS;P3IP 3 non non BN NULL Dependency parser 4 poteva potere V S3II 5 essere essere V F 6 sottovalutato sottovalutare V MSPR CoNLL tabular representation schema
17 Linguistic annotation: an example text Sentence Splitter Tokenizer Morphological analyzer PoS Tagger Dependency parser Il danno non poteva essere sottovalutato. Il sig. Rossi decise perciò di chiamare l avvocato. (The damage could not be understimated. Mr. Rossi decided therefore to call the lawyer.) - Il danno non poteva essere sottovalutato. (The damage could not be understimated.) - Il sig. Rossi decise perciò di chiamare l avvocato. (Mr. Rossi decided therefore to call the lawyer.) id form lemma PoS Feats 1 Il il RD MS 2 danno danno S MS 3 non non BN NULL 4 poteva potere V S3II 5 essere essere V F 6 sottovalutato sottovalutare V MSPR CoNLL tabular representation schema
18 Linguistic annotation: an example text Sentence Splitter Tokenizer Morphological analyzer PoS Tagger Dependency parser Il danno non poteva essere sottovalutato. Il sig. Rossi decise perciò di chiamare l avvocato. (The damage could not be understimated. Mr. Rossi decided therefore to call the lawyer.) - Il danno non poteva essere sottovalutato. (The damage could not be understimated.) - Il sig. Rossi decise perciò di chiamare l avvocato. (Mr. Rossi decided therefore to call the lawyer.) id form lemma PoS Feats head DEP 1 Il il RD MS 2 DET 2 danno danno S MS 6 SUBJ_PASS 3 non non BN NULL 6 NEG 4 poteva potere V S3II 6 MODAL 5 essere essere V F 6 AUX 6 sottovalutato sottovalutare V MSPR 0 ROOT CoNLL tabular representation schema
19 Linguistic annotation: an example text Sentence Splitter Tokenizer Morphological analyzer PoS Tagger Dependency parser Il danno non poteva essere sottovalutato. Il sig. Rossi decise perciò di chiamare l avvocato. (The damage could not be understimated. Mr. Rossi decided therefore to call the lawyer.) - Il danno non poteva essere sottovalutato. (The damage could not be understimated.) - Il sig. Rossi decise perciò di chiamare l avvocato. (Mr. Rossi decided therefore to call the lawyer.) id form lemma PoS Feats head DEP 1 Il il RD MS 2 DET 2 danno danno S MS 6 SUBJ_PASS 3 non non BN NULL 6 NEG 4 poteva potere V S3II 6 MODAL 5 essere essere V F 6 AUX 6 sottovalutato sottovalutare V MSPR 0 ROOT CoNLL tabular representation schema
20 Linguistic annotation: what for Linguistic annotation plays a crucial role in accessing the content of texts by making it explicit the linguistic structure through which knowledge is encoded Starting point for several Knowledge Extraction tasks extracting domain-relevant knowledge structuring the extracted knowledge in semantic resources, e.g. lexicons, thesauri, domain-specific ontologies (Ontology Learning) semantic indexing of text collections on the basis of the extracted knowledge Linguistic annotation and knowledge extraction increasingly complex knowledge extraction tasks differentially exploit individual levels of linguistic annotation Structuring of the extracted knowledge Extraction of domainrelevant knowledge Linguistic annotation Text collection
21 From text to knowledge: the general approach Textual content (implicit knowledge) Dynamic content structuring Linguistic annotation Structured knowledge (explicit knowledge)
22 The legal domain: the main challenges The typical knowledge acquisition bottleneck as knowledge is mostly conveyed through text, content access requires understanding the linguistic structure The peculiarity of legal language and its impact on NLP tools Legal syntax is convoluted and unnatural (McCarty, NaLEA 2009) with respect to ordinary language What is the performance of state-of-the-art NLP tools on legal texts? Discriminate between legal and regulated domain knowledge By its very nature, law deals with behaviour in the world: domain independent concepts of law are tainted with concepts referring to the world the legal domain is about
23 The knowledge acquisition bottleneck Technologies in the area of knowledge management are typically confronted with the problem of processing linguistic structure Particularly relevant in the legal domain, where law is strictly dependent on its linguistic expression Why legal language processing? Why parse statutes? To extract their logical structure, to refine the semantics of the domain, to develop a domain ontology (McCarty, 2009) What are the domain-specific issues to be addressed when processing legal language? Whether and to what extent legal language differs from ordinary language Impact of recorded differences on the performance of NLP tools
24 The legal domain: the main challenges The typical knowledge acquisition bottleneck as knowledge is mostly conveyed through text, content access requires understanding the linguistic structure The peculiarity of legal language and its impact on NLP tools Legal syntax is convoluted and unnatural (McCarty, NaLEA 2009) with respect to ordinary language What is the performance of state-of-the-art NLP tools on legal texts? Discriminate between legal and regulated domain knowledge By its very nature, law deals with behaviour in the world: domain independent concepts of law are tainted with concepts referring to the world the legal domain is about
25 The peculiarity of legal language Legal texts differ significantly with respect to ordinary language texts typically correlated with syntactic complexity Differences recorded at different annotation levels long sentences wrt newswire texts 50,00 IT_newspapers IT_regional&national_laws IT_EU_laws 50,00 EN_newspapers EN_EU_laws 40,00 40,00 30,00 30,00 20,00 10,00 20,00 10,00 0,00 0,00 Avg sentence length Avg sentence length
26 The peculiarity of legal language Italian: Legal texts differ significantly with respect a corpus to of newspapers ordinary language texts typically correlated with syntactic complexity English: a collection of laws enacted by the European Commission, Italian State and Regions a corpus of newspapers a collection of laws enacted by the European Commission Differences recorded at different annotation levels long sentences wrt newswire texts 50,00 IT_newspapers IT_regional&national_laws IT_EU_laws 50,00 EN_newspapers EN_EU_laws 40,00 40,00 30,00 30,00 20,00 10,00 20,00 10,00 0,00 0,00 Avg sentence length Avg sentence length
27 The peculiarity of legal language Legal texts differ significantly with respect to ordinary language texts typically correlated with syntactic complexity Differences recorded at different annotation levels long sentences wrt newswire texts Italian: a corpus of newspapers a collection of laws enacted by the European Commission, Italian State and Regions English: a corpus of newspapers a collection of laws enacted by the European Commission high % of prepositions and low % of verbs, adverbs, pronouns IT_newspapers IT_regional&national_laws IT_EU_laws EN_newspapers EN_EU_laws Prepositions Nouns Verbs Adverbs Pronouns 0 Prepositions Nouns Verbs Adverbs Pronouns
28 The peculiarity of legal language Legal texts differ significantly with respect Italian: to ordinary language texts typically correlated with syntactic complexity Differences recorded at different annotation levels long sentences wrt newswire texts high % of prepositions and low % of verbs, adverbs, pronouns a corpus of newspapers a collection of laws enacted by the European Commission, Italian State and Regions English: a corpus of newspapers a collection of laws enacted by the European Commission deep sequences of embedded prepositional complement chains 1,6 1,5 1,4 IT_newspapers IT_regional&national_laws IT_EU_laws 1,6 1,5 1,4 EN_newspapers EN_EU_laws 1,3 1,3 1,2 1,2 1,1 1,1 1 avg depth of embedded complement chains 1 avg depth of embedded complement chains
29 The peculiarity of legal language Legal texts differ significantly with respect Italian: to ordinary language texts typically correlated with syntactic complexity Differences recorded at different annotation levels long sentences wrt newswire texts high % of prepositions and low % of verbs, adverbs, pronouns a corpus of newspapers a collection of laws enacted by the European Commission, Italian State and Regions English: a corpus of newspapers a collection of laws enacted by the European Commission deep sequences of embedded prepositional complement chains 1,6 1,5 1,4 IT_newspapers IT_regional&national_laws IT_EU_laws 1,6 1,5 1,4 EN_newspapers EN_EU_laws 1,3 1,3 1,2 1,2 1,1 1,1 1 avg depth of embedded complement chains 1 avg depth of embedded complement chains
30 The peculiarity of legal language Legal texts differ significantly with respect to ordinary language texts typically correlated with syntactic complexity Differences recorded at different annotation levels long sentences wrt newswire texts high % of prepositions and low % of verbs, adverbs, pronouns IT_newspapers 22 long sequences of consecutive prepositional complement 20 IT_EU_laws long dependency links deep syntactic trees 14 - Statistical parsers have a drop in accuracy when analyzing long distance dependencies (McDonald and Nivre, 2007) - Parse tree depth is a well-known feature reflecting sentence complexity length of dependency links length of dependency links EN_newspapers EN_EU_laws parse tree depth IT_regional&national_laws parse tree depth
31 The impact of legal language on NLP tools What is the performance of state-of-the-art NLP tools on legal texts? A key issue for all NLP-based Knowledge Extraction tasks Generally speaking, a dramatic drop of accuracy is reported when syntactic parsers are tested on domains outside of the data from which they are trained or developed on Recently, two initiatives focused on dependency parsing of legal texts which represents a prerequisite for any advanced legal text processing task Domain Adaptation Track at Evalita 2011 Italian SPLeT-2012 Shared Task on Dependency Parsing of Legal Texts Italian and English both aimed at obtaining a clear idea of the current performance of state-of-the-art dependency parsing systems against legal texts investigating techniques for adapting state-of-the-art dependency parsing systems to the legal domain
32 The impact of legal language on NLP tools Results of the Dependency Parsing subtask of the SPLeT-2012 Shared Task on Dependency Parsing of Legal Texts Goal: testing the performance of general parsing systems on legal texts Accuracy results for Italian: Participant System Newspaper test Reg/Nat legal test EU legal test For both Italian and English, lower performance of parsing systems on legal texts wrt newspapers Accuracy results for English: Participant System Newspaper test EU legal test Different performances across different subvarietes of legal language Significant drops on the IT regional and national texts 2 out of the 3 participant systems do not show a significant drop of accuracy when tested on EU legal texts
33 The legal domain: the main challenges The typical knowledge acquisition bottleneck as knowledge is mostly conveyed through text, content access requires understanding the linguistic structure The peculiarity of legal language and its impact on NLP tools Legal syntax is convoluted and unnatural (McCarty, NaLEA 2009) with respect to ordinary language What is the performance of state-of-the-art NLP tools on legal texts? Discriminate between legal and regulated domain knowledge By its very nature, law deals with behaviour in the world: domain independent concepts of law are tainted with concepts referring to the world the legal domain is about
34 Discriminate between legal and regulated domain knowledge «As any legal source legislation, contracts, precedence-law reveals immediately: the majority of concepts in an individual source refers to specific domains of social activities. These domains are called world knowledge.» Breuker & Hoekstra 2004 Domain-specific terms of law are tainted with terms referring to the world the legal domain is about e.g. national provision, fundamental principle & hazardous substance, active ingredient «Therefore it is not surprise that one may find that many legal ontologies are mixtures of epistemological and ontological perspectives.» Breuker & Hoekstra 2004 Discriminating between legal and regulated domain terms and/or concepts is key in constructing a legal semantic resource It is closely related to the reusability and interoperability issue
35 Discriminate between legal and regulated domain knowledge According to the ontology design criteria, the level of generality in which concepts are organized is a distinctive characteristic Three different kinds of ontologies: top or upper-level ontologies (general concepts) core ontologies (top-level domain-specific concepts, e.g. legal) domain-specific ontologies (which organize world knowledge) Breuker & Hoekstra 2004: LRI-Core layers: foundational and legal core share anchors (high level concepts typical for law)
36 From text to knowledge: the general approach Textual content (implicit knowledge) Dynamic content structuring Linguistic annotation Structured knowledge (explicit knowledge)
37 Legal Knowledge Extraction: focus on Identification, extraction and structuring of domain-relevant knowledge Goal: constructing semantic resources such as domain-specific ontologies or lexicons Semantic annotation of legal texts Goal: content-based access and querying
38 Legal Knowledge Extraction: focus on Identification, extraction and structuring of domain-relevant knowledge Goal: constructing semantic resources such as domain-specific ontologies or lexicons Semantic Focus annotation on the Ontology of legal Learning texts The Goal: construction content-based of Legal access Ontologies and referred querying to as the «missing link» (Valente and Breuker, 2004) between Artificial Intelligence and Law and Legal Theory. Key process since the emergence of the Semantic Web (Van Engers et al., 2008)
39 Ontology Learning The various steps of Ontology Learning from texts can be arranged in a layer cake of increasingly complex subtasks (Buitelaar, Cimiano and Magnini, 2005) x, y (sufferfrom(x, y) ill(x)) cure (dom:doctor, range:disease) Axioms & Rules (Other) Relations is_a (DOCTOR, PERSON) DISEASE:=<Int,Ext,Lex> {disease, illness} disease, illness, hospital Taxonomy (Concept Hierarchies) Concepts Synonyms Terms
40 Ontology Learning First step of each Ontology Learning process: Terminology Extraction «Terms are linguistic realizations of domain-specific concepts and are therefore central to further, more complex tasks» (Buitelaar et al., 2005) x, y (sufferfrom(x, y) ill(x)) cure (dom:doctor, range:disease) Axioms & Rules (Other) Relations is_a (DOCTOR, PERSON) DISEASE:=<Int,Ext,Lex> {disease, illness} disease, illness, hospital Taxonomy (Concept Hierarchies) Concepts Synonyms Terms
41 Ontology Learning: Terminology Extraction Terms may consist of a single wordform so-called simple (or one-word) terms e.g. artist two or more wordforms, called multi-word (or complex) terms e.g. art movement Term extraction process articulated into two fundamental steps: identifying term candidates from text filtering through the candidates to separate terms from non-terms Different statistical measures are used For the extraction of simple terms: frequency occurrence distribution, measures of statistical relevance such as TF/IDF (Term Frequency/Inverse Term Frequency), etc. For the extraction of multi-word terms: association strength measures such as Mutual Information, C-NC Value, Log-likelihood, etc.
42 Ontology Learning The next step is the semantic structuring of the extracted terminology definition of concepts and relations between them x, y (sufferfrom(x, y) ill(x)) cure (dom:doctor, range:disease) Axioms & Rules (Other) Relations is_a (DOCTOR, PERSON) DISEASE:=<Int,Ext,Lex> {disease, illness} disease, illness, hospital Taxonomy (Concept Hierarchies) Concepts Synonyms Terms
43 Ontology Learning: Semantic Structuring The extracted terms are organized into fragments of taxonomical chains simple and multi-word terms are structured in a vertical hierarchy on the basis of their internal linguistic structure (head sharing) riduzione isa isa isa isa isa riduzione dei consumi riduzione dell inquinamento riduzione della produzione riduzione delle emissioni isa isa riduzione dell inquinamento acustico riduzione delle emissioni inquinanti
44 Ontology Learning: to sum up Knowledge extraction in two steps: Term Extraction: detection of single and multi-word terms Semantic Structuring: definition of concepts and relations between them text NLP tools Tokenizer Morphological Analyser POS Tagger Dependency Analyser Ontology Learning Terminology extraction Semantic structuring
45 Ontology learning in the legal domain: so far Overview of existing Legal Ontologies: Núria Casellas, Legal Ontology Engineering. Methodologies, Modeling Trends and the Ontology of Professional Judicial knowledge, 2011 Approaches to semi-automatically induce legal domain ontologies from texts focus on definitions in German court decisions from which legal concepts are identified together with relevant terminology and relations Walter and Pinkal (2006) extraction of domain relevant terminology from which domain relevant concepts are derived together with relations linking them Lame (2000, 2005): French Saias and Quaresma (2005): Portuguese Völker et al. (2008): Spanish Lenci at al. (2009): Italian ontology modelling LKIF Core ontology (Hoekstra et al., 2007) LOIS (Peters et al., 2005) OPJK (Casellas, 2008) DALOS (Agnoloni et al., 2009)
46 Ontology Learning: exemplifying Terminology Extraction Focus on the term extractor developed by ItaliaNLP Lab at ILC-CNR (Bonin et al., 2010) It follows a multilayered and contrastive approach to overcome the need to discriminate between legal and world knowledge It singles out legal terms, e.g. law, legislative decree (legal knowledge), from regulated-domain terms, e.g. consumer, hazardous substance (world knowledge) Tested in different case studies Corpus of environmental laws (Bonin et al., 2010) EU Directives (394,088 tokens) Case Law corpus (LIDER-Lab, Scuola Superiore Sant Anna, Pisa) Case law on personal offence (1,206,831 tokens) Case Law corpus (Lazari & Venturi, 2012) Case law on state liability (933,077 tokens)
47 Ontology Learning: exemplifying Terminology Extraction The multi-layered architecture developed by the ItaliaNLP Lab Input text NLP tools Tokenization Morphological analysis & PoS-tagging Lemmatization Extraction of candidate terms Linguistic filters Statistical filters Final list of terms ranked on contrastive score List of candidate single and complex terms ranked on statistical filters score Contrastive ranking Wrt a top list of opendomain terms Wrt a top list of terms from a different regulated domain
48 Ontology Learning: exemplifying Terminology Extraction Linguistic annotation until the Part-Of-Speech and Lemmatization levels E.g. Il piano nazionale di riduzione delle emissioni in nessun caso può esonerare un impianto dal rispetto della pertinente normativa comunitaria, compresa la direttiva 96/61/CE (The national emission reduction plan may under no circumstances exempt a plant from the provisions laid down in relevant Community legislation, including inter alia Directive 96/61/EC) Forma Lemma CPoSTag PosTag Tratti morfologici Il il R RD num=s gen=m piano piano S S num=s gen=m nazionale nazionale A A num=s gen=n di di E E _ riduzione riduzione S S num=s gen=f delle di E EA num=p gen=f emissioni emissione S S num=p gen=f in in E E _ nessun nessun D DI num=s gen=m caso caso S S num=s gen=m può potere V VM num=s per=3 mod=i ten=p esonerare esonerare V V mod=f Forma Lemma CPoSTag PosTag Tratti morfologici un un R RI num=s gen=m impianto impianto S S num=s gen=m dal da E EA num=s gen=m rispetto rispetto S S num=s gen=m della di E EA num=s gen=f pertinente pertinente A A num=s gen=n normativa normativa S S num=s gen=f comunitaria comunitario A A num=s gen=f,, F FF _ compresa comprendere V V num=s mod=p gen=f la il R RD num=s gen=f direttiva direttiva S S num=s gen=f 96/61/CE. 96/61/CE. S SP _
49 Ontology Learning: exemplifying Terminology Extraction The multi-layered architecture developed by the ItaliaNLP Lab Input text NLP tools Tokenization Morphological analysis & PoS-tagging Lemmatization Extraction of candidate terms Linguistic filters Statistical filters Final list of terms ranked on contrastive score List of candidate single and complex terms ranked on statistical filters score Contrastive ranking Wrt a top list of opendomain terms Wrt a top list of terms from a different regulated domain
50 Ontology Learning: exemplifying Terminology Extraction Single terms Linguistic filters: nouns, e.g. impianto (plant), direttiva (directive) Statistical filters: frequency of occurrence in the input text Corpus of European directives in the environmental domain (Bonin et al., 2010) impianto 1, amministratore 1, emissione 1, gas 1, sostanza 1, energia 1, serra 1, produzione 1, deposito 1, tabella 1, riduzione 1, stoccaggio 1, veicolo 1, quota 1, protocollo 1, fonte 1, costruttore 1, elettricità 1, inquinamento 1, autovettura 1, aria 1, strategia 1, unità 1, carbonio 1, quantità 1, acqua 1, gestore 1, misurazione 1, conte 1, trasporto 1,
51 Ontology Learning: exemplifying Terminology Extraction Single terms Linguistic filters: nouns, e.g. impianto (plant), direttiva (directive) Statistical filters: frequency of occurrence in the input text Multi-word terms Linguistic filters: noun+preposition+noun, e.g. riduzione di emissione (emission reduction); noun+adjective (S+A), e.g. piano nazionale (national plan), normativa comunitaria (Community legislation) Statistical filters: C-NC Value (Frantzi & Ananiadou 1999), assessing the likelihood for a term of being a well-formed and relevant multi-word term Corpus of European directives in the environmental domain (Bonin et al., 2010) impianto 1, amministratore 1, emissione 1, gas 1, sostanza 1, energia 1, serra 1, produzione 1, deposito 1, tabella 1, riduzione 1, stoccaggio 1, veicolo 1, quota 1, protocollo 1, fonte 1, costruttore 1, elettricità 1, inquinamento 1, autovettura 1, aria 1, strategia 1, unità 1, carbonio 1, quantità 1, acqua 1, gestore 1, misurazione 1, conte 1, trasporto 1, gas a effetto serra 505, norma di articolo 481, emissione di gas a effetto serra 428, amministratore di registro 421, gas a effetto 395, effetto serra 326, riduzione di emissione 322, emissione di gas 305, parlamento europeo 282, energia da fonte rinnovabile 265, piano nazionale di assegnazione 220, autorità competente 216, energia da fonte 211, conto di deposito 200, cambiamento climatico 195, paese in via di sviluppo 190, quota di emissione 184, fonte energetico rinnovabile 169, fonte rinnovabile 168, qualità di aria 163, tabella relativo al piano nazionale 135, procedura di regolamentazione con controllo 132, emissione specifico 129, amministratore centrale 121, fonte energetico 117, sistema comunitario 116, piano nazionale 112, parte di presente protocollo 112, sito di stoccaggio 112, presente protocollo 108,
52 Ontology Learning: exemplifying Terminology Extraction Input text NLP tools Extraction of candidate terms Linguistic filters Statistical filters Ranking of statistical filters autorità competente riferimento al presente direttivo destinatario di presente direttivo valore limite di emissione destinatario di presente decisione limite di emissione sostanza pericoloso giorno successivo anno precedente danno ambientale Output of the statistical filters: Open domain terms, legal domain terms, domain-specific terms (belonging to the environmental domain) are mixed
53 Ontology Learning: exemplifying Terminology Extraction The multi-layered architecture developed by the ItaliaNLP Lab Input text NLP tools Tokenization Morphological analysis & PoS-tagging Lemmatization Extraction of candidate terms Linguistic filters Statistical filters Final list of terms ranked on contrastive score List of candidate single and complex terms ranked on statistical filters score Contrastive ranking Wrt a top list of opendomain terms Wrt a top list of terms from a different regulated domain
54 Ontology Learning: exemplifying Terminology Extraction Input Output text of the 1st contrastive Extraction phase: of Open domain terms candidate are pruned, terms but legal NLP domain tools terms, domain-specific Linguistic Statistical terms (belonging to filters the environmental filters domain) are still mixed Ranking of statistical filters autorità competente riferimento al presente direttivo destinatario di presente direttivo valore limite di emissione destinatario di presente decisione limite di emissione sostanza pericoloso giorno successivo anno precedente danno ambientale Contrastive ranking 1st contrastive phase valore limite destinatario di presente limite di emissione valore limite di emissione sostanza pericoloso aria ambiente riferimento al presente direttivo autorità competente destinatario di presente direttivo Contrast against a top list of terms from a general language corpus (newspaper)
55 Ontology Learning: exemplifying Terminology Extraction Input text Output of the 2nd contrastive Extraction phase: of candidate terms legal domain terms are singled out by NLP tools domain-specific Linguistic terms (belonging Statistical to the environmental filters domain) filters Ranking of statistical filters autorità competente riferimento al presente direttivo destinatario di presente direttivo valore limite di emissione destinatario di presente decisione limite di emissione sostanza pericoloso giorno successivo anno precedente danno ambientale Final term list (2nd contrastive phase) sostanza pericoloso salute umano sviluppo sostenibile principio attivo inquinamento atmosferico norma nazionale testo di disposizione testo di disposizione essenziale disposizione nazionale funzionamento di mercato interno Contrastive ranking 1st contrastive phase valore limite destinatario di presente limite di emissione valore limite di emissione sostanza pericoloso aria ambiente riferimento al presente direttivo autorità competente destinatario di presente direttivo Contrast against a top list of terms from a corpus of European directives regulating a different domain (consumer protection) Contrast against a top list of terms from a general language corpus (newspaper)
56 Ontology Learning: using extracted terminology to build a legal ontology The DALOS (Drafting Legislation with Ontology based Support) European project (Agnoloni et al., 2009) Aimed at providing law-makers with linguistic and knowledge management tools to be used in the legislative processes, in particular within the phase of legislative drafting enhancing accessibility and alignment of legislation at European level Architecture of the DALOS Knowledge Organization System (DALOS ontology) the Ontological layer, containing the conceptual modelling at a language independent level the Lexical layer, containing multi-lingual terminology conveying the concepts represented at the Ontological layer
57 Ontology Learning: using extracted terminology to build a legal ontology The DALOS (Drafting Legislation with Ontology based Support) project Lexical layer Terms are automatically extracted from a corpus of Consumer Protection laws automatically organized into taxonomical structures linked to their translation equivalent Ontological layer Domain-specific concepts and their relationships manually defined by domain experts
58 Legal Knowledge Extraction: focus on Identification, extraction and structuring of domain-relevant knowledge Goal: constructing semantic resources such as domain-specific ontologies or lexicons Semantic annotation of legal texts Goal: content-based access and querying
59 Semantic annotation of legal texts: towards a virtuous circle Textual content (implicit knowledge) Incremental process of annotation acquisition annotation: knowledge acquired from linguistically annotated texts is projected back onto texts for extra linguistic information to be annotated and further knowledge layers to be extracted Structured knowledge (explicit knowledge) Dynamic content structuring Linguistic annotation
60 Semantic annotation of legal texts: what for Tasks requiring NLP-enabled knowledge extraction Legal Argumentation Mining Legal case elements and factors Extraction Legal Text Summarization Court decision Structuring Legal Metadata Extraction Legal definition Extraction Legal citation Extraction Legal Information Retrieval
61 Semantic annotation of legal texts: example (1) Legal case elements and factors Extraction for Legal Argumentation Mining Adam Wyner (tomorrow morning) NLP tools used to make explicit relevant legal facts and legal roles starting from their linguistic realization in a collection of legal cases E.g. the Appellee, Defendant, Plaintiff, etc. E.g. the Disclosure-in- Negotiation fact (i.e. the fact that the plaintiff disclosed information during negotiation with defendant) The annotation are the building blocks of a language of formal rules
62 Semantic annotation of legal texts: example (2) Legal definition Extraction Walter and Pinkal, 2006: from German court decisions NLP tools are used to identify legal definitions on the basis of the linguistic realization of definiendum and definiens One-family row-houses have insufficient noise insulation if the separating wall is onelayered The linguistic structure is transformed to a semantic representation by a series of heuristic rules Promising step for Ontology Learning purposes
63 Semantic annotation of legal texts: example (3) Legal Metadata Extraction Focus on MELT (Metadata Extraction from Legal Texts) system jointly developed by ILC and ITTIG It combines a set of tools which transform a plain text in XML, detect references and classify provisions (i.e. xmleges tools) a suite of NLP tools for the analysis of Italian texts It aims at supporting the consolidation of legislative texts process (in force law) It provides a formalized representation of textual amendments by a metadata set Repeal, substitution and integration The text modification is performed on the metadata interpretation
64 Semantic annotation of legal texts: example (3)
65 Semantic annotation of legal texts: example (3) Legal Metadata Extraction Focus on MELT (Metadata Extraction from Legal Texts) system jointly developed by ILC and ITTIG An example All articolo 1, comma 1, della legge 8 febbraio 2001, n. 12, la lettera d) è abrogata (In article 1, paragraph 1, of the act 8 February 2001, n. 12, letter d) is repealed)
66 Semantic annotation of legal texts: example (3) Legal Metadata Extraction Focus on MELT (Metadata Extraction from Legal Texts) system jointly developed by ILC and ITTIG An example All REF mod31-rif2#art1-com1, la lettera d) è abrogata (In REF mod31-rif2#art1-com1, letter d) is repealed)
67 Semantic annotation of legal texts: example (3) Legal Metadata Extraction Focus on MELT (Metadata Extraction from Legal Texts) system jointly developed by ILC and ITTIG An example The sentence was linguistically analyzed ay a shallow syntactic level of analysis
68 Semantic annotation of legal texts: example (3) Legal Metadata Extraction Focus on MELT (Metadata Extraction from Legal Texts) system jointly developed by ILC and ITTIG An example The annotation of informative metadata was carried out by a finite-state compiler which uses a specialized grammar covering the amendment types considered on the basis of patterns formalized in terms of regular expressions operating over sequences of chunks
69 Conclusion Natural Language Processing techniques represent a key ingredient for Legal Knowledge Management Knowledge Creation: Legal Ontologies and Lexicons Natural Language Processing tools Texts Structured knowledge Hopefully, thanks to NLP Legal Search Engines will be able to access the content embedded in texts more effectively Knowledge Use: Intelligent content access
70 Conclusion One of the main obstacles to progress in the field of artificial intelligence and law is the natural language barrier L. Thorne McCarty, International Conference on AI and Law (ICAIL-2007) Natural Language Processing combined with Knowledge Extraction techniques can help removing or at least penetrating the natural language barrier in the AI&Law field
71 Credits The NLP tools and techniques have been developed in the framework of the activities of the people of ItaliaNLP Lab at the Istituto di Linguistica Computazionale Antonio Zampolli (ILC-CNR) Special thanks to Felice Dell Orletta
72 On-line demos Linguistic analysis of Italian and English texts ools&hl=it_it&tmid=tm_source Term extraction from Italian and English texts ools&hl=it_it&tmid=tm_term_extractor
73 References Ontology Learning Buitelaar, P. Cimiano, P., Magnini, B. (Eds.) Ontology Learning from Text: Methods, Evaluation and Applications. Frontiers in Artificial Intelligence and Applications Series, Vol. 123, IOS Press, July Cimiano, Philipp Ontology Learning and Population from Text: Algorithms, Evaluation and Applications. 2006, XXVIII, Springer, Maedche, Alexander Ontology Learning for the Semantic Web. Kluwer Academic Publishers, Ontology Learning in the legal domain (1) Núria Casellas, Legal Ontology Engineering. Methodologies, Modeling Trends and the Ontology of Professional Judicial knowledge. Law, Governance and Technology Series (Vol. 3), Springer-Verlag, 2011 Valente A. and Breuker J., Ontologies: the missing link between Legal Theory and AI & Law. In A. Soeteman (ed.) Legal Knowledge Based Systems JURIX 94, pp , Van Engers T., Boer A., Breuker J., Valente A., Winkels R., Ontologies in the legal domain. In Chen et al. (eds.) Digital Government. E-Government Research, Case studies and Implementation, Springer, pp , Cimiano, P., Völker, J. Text2Onto - A Framework for Ontology Learning and Data-driven Change Discovery. In Montoyo et al. (eds.), Proceedings of the 10th International Conference on Applications of Natural Language to Information Systems (NLDB). pp Springer, Alicante, Spain, June Lame, G. Knowledge acquisition from texts towards an ontology of French law. in Proceedings of the International Conference on Knowledge Engineering and Knowledge Management Managing Knowledge in a World of Networks (EKAW-2000), Juan-les-Pins, Lame, G. Using NLP techniques to identify legal ontology components: concepts and relations. In Benjamins et al. (eds.), Law and the Semantic Web. Legal Ontologies, Methodologies, Legal Information Retrieval, and Applications. Lecture Notes in Computer Science, Volume 3369: , Saias, J. and P. Quaresma A Methodology to Create Legal Ontologies in a Logic Programming Based Web Information Retrieval System. In Benjamins et al. (eds.), Law and the Semantic Web. Legal Ontologies, Methodologies, Legal Information Retrieval, and Applications. Lecture Notes in Computer Science, Volume 3369: , 2005.
74 References Ontology Learning in the legal domain (2) Breuker J and R. Hoekstra, Epistemology and Ontology in Core Ontologies: FOLaw and LRI-Core, two core ontologies for law, in Proceedings of the Workshop on Core Ontologies in Ontology Engineering (EKAW04), Northamptonshire, UK, pp , T. Agnoloni, L. Bacci, E. Francesconi, W. Peters, S. Montemagni, G. Venturi, 2008, A two-level knowledge approach to support multilingual legislative drafting, in Joost Breuker, Pompeu Casanovas, Michel C.A. Klein, Enrico Francesconi (eds.), Law, Ontologies and the Semantic Web - Channelling the Legal Information Flood, Frontiers in Artificial Intelligence and Applications, Springer, Volume 188, 2008, pp Alessandro Lenci, Simonetta Montemagni, Vito Pirrelli, Giulia Venturi, 2008, Ontology learning from Italian legal texts, in Joost Breuker, Pompeu Casanovas, Michel C.A. Klein, Enrico Francesconi (eds.), Law, Ontologies and the Semantic Web - Channelling the Legal Information Flood, Frontiers in Artificial Intelligence and Applications, Springer, Volume 188, 2008, pp Francesconi E., Montemagni S., Peters W., Tiscornia D. (eds.), 2010, SEMANTIC PROCESSING OF LEGAL TEXTS, LNCS 6036, Springer PART II - Legal Text Processing and Construction of Knowledge Resources Automatic Identification of Legal Terms in Czech Law Texts by Karel Pala, Pavel Rychlý, Pavel Šmerk Integrating a Bottom-Up and Top-Down Methodology for Building Semantic Resources for the Multilingual Legal Domain by Enrico Francesconi, Simonetta Montemagni, Wim Peters, Daniela Tiscornia Ontology Based Law Discovery by Alessio Bosca, Luca Dini Multilevel Legal Ontologies by Gianmaria Ajani, Guido Boella, Leonardo Lesmo, Marco Martin, Alessandro Mazzei, Daniele P. Radicioni, Piercarlo Rossi Francesca Bonin, Felice Dell'Orletta, Giulia Venturi, Simonetta Montemagni, 2010, Singling out Legal Knowledge from World Knowledge. An NLP-based approach, In Proceedings of LOAIT 2010, Fiesole, 7 July 2010
75 References Legal Language Processing (computational linguistics literature) Dell Orletta F., Marchi S., Montemagni S., Plank B. e Venturi G. (2012), The SPLeT 2012 Shared Task on Dependency Parsing of Legal Texts. In Proceedings of the 4th Workshop on Semantic Processing of Legal Texts, held in conjunction with LREC 2012, Istanbul, Turkey, 27th May Giuseppe Attardi, Daniele Sartiano and Maria Simi (2012), Active Learning for Domain Adaptation of Dependency Parsing on Legal Texts. In Proceedings of the 4th Workshop on Semantic Processing of Legal Texts Alessandro Mazzei, Cristina Bosco (2012), Simple Parser Combination. In Proceedings of the 4th Workshop on Semantic Processing of Legal Texts Dell Orletta F., Marchi S., Montemagni S., Venturi G., Agnoloni T. e Francesconi E. (2012), Domain Adaptation for Dependency Parsing at Evalita In Working Notes of EVALITA 2011, 24th 25th January 2012, Rome, Italy, ISSN Barbara Plank and Anders Søgaard (2012), Experiments in newswire-to-law adaptation of graphbased dependency parsers. In Working Notes of EVALITA 2011, 24th 25th January 2012, Rome, Italy, ISSN Giuseppe Attardi, Maria Simi, and Andrea Zanelli (2012), Domain Adaptation by Active Learning. In Working Notes of EVALITA 2011, 24th 25th January 2012, Rome, Italy, ISSN Venturi G. (2012), Design and Development of TEMIS: a Syntactically and Semantically Annotated Corpus of Italian Legislative Texts. In Proceedings of the 4th Workshop on Semantic Processing of Legal Texts, held in conjunction with LREC 2012, Istanbul, Turkey, 27th May Venturi G. (2008), Parsing Legal texts. A contrastive study with a View to Knowledge Management Applications. In Proceedings of the Ist Workshop Semantic Processing of Legal Texts, held in conjunction with LREC, Marrakech, Morocco, May 26 1, Venturi G. (2012) Lingua e diritto: una prospettiva linguistico-computazionale, tesi di dottorato dell Università di Torino, ottobre 2012.
76 References Semantic Processing of legal texts Francesconi E., Montemagni S., Peters W., Tiscornia D. (eds.), 2010, SEMANTIC PROCESSING OF LEGAL TEXTS, LNCS 6036, Springer Bartolini R., Lenci A., Montemagni S., Pirrelli V., Soria C., Automatic classification and analysis of provisions in legal texts: a case study. In A.R. Meersman, Z. Tari, A. Corsaro (eds.) On the Move to Meaningful Internet Systems (OTM 2004 Workshops), LNCS, vol. 3292, pp Springer-Verlag, Mazzei A., Radicioni D.P. and Brighi R., NLP-based extraction of modificatory provisions semantics. In Proceedings of the 12th International Conference on Artificial Intelligence and Law (ICAIL 2009), pp , Barcelona, Spain, Spinosa P.L., Giardiello G., Cherubini M., Marchi S., Venturi G. and Montemagni S., NLP-based metadata extraction for legal text consolidation. In Proceedings of the 12th International Conference on Artificial Intelligence and Law (ICAIL 2009), pp , Barcelona, Spain, Walter S. and Pinkal M., Definitions in Court Decisions. Automatic Extraction and Ontology Acquisition. In Proceedings of the 2009 Conference on Law, Ontologies and the Semantic Web, IOS Press, pp , Wyner A., Towards annotating and extracting textual legal case elements. In Proceedings of the IV Workshop on Legal Ontologies and Artificial Intelligence Techniques (LOAIT 2010), pp. 9-18, Fiesole, Italia, Wyner A. and Peters W., Lexical semantics and expert legal knowledge towards the identification of legal case factors. In Proceedings of Legal Knowledge and Information Systems (JURIX) Conference, pp , Liverpool, United Kingdom, 2010a. IOS Press. Wyner A. and van Engers T., From argument in natural language to formalised argumentation: Components, prospects and problems. In Proceedings of the Worskhop on Natural Language Engineering of Legal Argumentation (NaLEA2009), Barcelona, Spain, Zhang P. and Koppaka L., Semantics-base legal citation network. In Proceedings of the 11 International Conference on Artificial Intelligence and Law, pp , 2007.
Curriculum vitae et studiorum. Giulia Venturi
Curriculum vitae et studiorum Personal information Giulia Venturi Name: Date and place of birth: Place of residence: Place of domicile: E-mail: Giulia Venturi 31/01/1982, Moncalieri (TO) via Bonzanigo,
Semantic Processing of Legal Texts (SPLeT-2012) Workshop Programme
Semantic Processing of Legal Texts (SPLeT-2012) Workshop Programme 9:00-11:30 General Session 9:00 09:10 Introduction by Workshop Chairs 9:10 09:40 Giulia Venturi, Design and Development of TEMIS: a Syntactically
Efficient Techniques for Improved Data Classification and POS Tagging by Monitoring Extraction, Pruning and Updating of Unknown Foreign Words
, pp.290-295 http://dx.doi.org/10.14257/astl.2015.111.55 Efficient Techniques for Improved Data Classification and POS Tagging by Monitoring Extraction, Pruning and Updating of Unknown Foreign Words Irfan
PoS-tagging Italian texts with CORISTagger
PoS-tagging Italian texts with CORISTagger Fabio Tamburini DSLO, University of Bologna, Italy [email protected] Abstract. This paper presents an evolution of CORISTagger [1], an high-performance
International Journal of Scientific & Engineering Research, Volume 4, Issue 11, November-2013 5 ISSN 2229-5518
International Journal of Scientific & Engineering Research, Volume 4, Issue 11, November-2013 5 INTELLIGENT MULTIDIMENSIONAL DATABASE INTERFACE Mona Gharib Mohamed Reda Zahraa E. Mohamed Faculty of Science,
Approaches of Using a Word-Image Ontology and an Annotated Image Corpus as Intermedia for Cross-Language Image Retrieval
Approaches of Using a Word-Image Ontology and an Annotated Image Corpus as Intermedia for Cross-Language Image Retrieval Yih-Chen Chang and Hsin-Hsi Chen Department of Computer Science and Information
Special Topics in Computer Science
Special Topics in Computer Science NLP in a Nutshell CS492B Spring Semester 2009 Jong C. Park Computer Science Department Korea Advanced Institute of Science and Technology INTRODUCTION Jong C. Park, CS
Overview of the EVALITA 2009 PoS Tagging Task
Overview of the EVALITA 2009 PoS Tagging Task G. Attardi, M. Simi Dipartimento di Informatica, Università di Pisa + team of project SemaWiki Outline Introduction to the PoS Tagging Task Task definition
Architecture of an Ontology-Based Domain- Specific Natural Language Question Answering System
Architecture of an Ontology-Based Domain- Specific Natural Language Question Answering System Athira P. M., Sreeja M. and P. C. Reghuraj Department of Computer Science and Engineering, Government Engineering
Chapter 8. Final Results on Dutch Senseval-2 Test Data
Chapter 8 Final Results on Dutch Senseval-2 Test Data The general idea of testing is to assess how well a given model works and that can only be done properly on data that has not been seen before. Supervised
Evalita 09 Parsing Task: constituency parsers and the Penn format for Italian
Evalita 09 Parsing Task: constituency parsers and the Penn format for Italian Cristina Bosco, Alessandro Mazzei, and Vincenzo Lombardo Dipartimento di Informatica, Università di Torino, Corso Svizzera
Natural Language Database Interface for the Community Based Monitoring System *
Natural Language Database Interface for the Community Based Monitoring System * Krissanne Kaye Garcia, Ma. Angelica Lumain, Jose Antonio Wong, Jhovee Gerard Yap, Charibeth Cheng De La Salle University
Extraction of Legal Definitions from a Japanese Statutory Corpus Toward Construction of a Legal Term Ontology
Extraction of Legal Definitions from a Japanese Statutory Corpus Toward Construction of a Legal Term Ontology Makoto Nakamura, Yasuhiro Ogawa, Katsuhiko Toyama Japan Legal Information Institute, Graduate
CINTIL-PropBank. CINTIL-PropBank Sub-corpus id Sentences Tokens Domain Sentences for regression atsts 779 5,654 Test
CINTIL-PropBank I. Basic Information 1.1. Corpus information The CINTIL-PropBank (Branco et al., 2012) is a set of sentences annotated with their constituency structure and semantic role tags, composed
Natural Language to Relational Query by Using Parsing Compiler
Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 4, Issue. 3, March 2015,
Domain Knowledge Extracting in a Chinese Natural Language Interface to Databases: NChiql
Domain Knowledge Extracting in a Chinese Natural Language Interface to Databases: NChiql Xiaofeng Meng 1,2, Yong Zhou 1, and Shan Wang 1 1 College of Information, Renmin University of China, Beijing 100872
Integration of a Multilingual Keyword Extractor in a Document Management System
Integration of a Multilingual Keyword Extractor in a Document Management System Andrea Agili *, Marco Fabbri *, Alessandro Panunzi +, Manuel Zini * * DrWolf s.r.l., + Dipartimento di Italianistica - Università
Semantic annotation of requirements for automatic UML class diagram generation
www.ijcsi.org 259 Semantic annotation of requirements for automatic UML class diagram generation Soumaya Amdouni 1, Wahiba Ben Abdessalem Karaa 2 and Sondes Bouabid 3 1 University of tunis High Institute
C o p yr i g ht 2015, S A S I nstitute Inc. A l l r i g hts r eser v ed. INTRODUCTION TO SAS TEXT MINER
INTRODUCTION TO SAS TEXT MINER TODAY S AGENDA INTRODUCTION TO SAS TEXT MINER Define data mining Overview of SAS Enterprise Miner Describe text analytics and define text data mining Text Mining Process
The Neurona Ontology: A Data Protection Compliance Ontology
The Neurona Ontology: A Data Protection Compliance Ontology Sergi Torralba 1, Núria Casellas 1, Juan-Emilio Nieto 1, Albert Meroño 1 Antoni Roig 1, Mario Reyes 2, Pompeu Casanovas 1 1 Institute of Law
Collecting Polish German Parallel Corpora in the Internet
Proceedings of the International Multiconference on ISSN 1896 7094 Computer Science and Information Technology, pp. 285 292 2007 PIPS Collecting Polish German Parallel Corpora in the Internet Monika Rosińska
Open Domain Information Extraction. Günter Neumann, DFKI, 2012
Open Domain Information Extraction Günter Neumann, DFKI, 2012 Improving TextRunner Wu and Weld (2010) Open Information Extraction using Wikipedia, ACL 2010 Fader et al. (2011) Identifying Relations for
xmlegeseditor: an OpenSource Visual XML Editor for supporting Legal National Standards
xmlegeseditor: an OpenSource Visual XML Editor for supporting Legal National Standards Tommaso Agnoloni, Enrico Francesconi, Pierluigi Spinosa {agnoloni,francesconi,spinosa}@ittig.cnr.it Institute of Legal
An Overview of a Role of Natural Language Processing in An Intelligent Information Retrieval System
An Overview of a Role of Natural Language Processing in An Intelligent Information Retrieval System Asanee Kawtrakul ABSTRACT In information-age society, advanced retrieval technique and the automatic
Kybots, knowledge yielding robots German Rigau IXA group, UPV/EHU http://ixa.si.ehu.es
KYOTO () Intelligent Content and Semantics Knowledge Yielding Ontologies for Transition-Based Organization http://www.kyoto-project.eu/ Kybots, knowledge yielding robots German Rigau IXA group, UPV/EHU
Search and Data Mining: Techniques. Text Mining Anya Yarygina Boris Novikov
Search and Data Mining: Techniques Text Mining Anya Yarygina Boris Novikov Introduction Generally used to denote any system that analyzes large quantities of natural language text and detects lexical or
Survey Results: Requirements and Use Cases for Linguistic Linked Data
Survey Results: Requirements and Use Cases for Linguistic Linked Data 1 Introduction This survey was conducted by the FP7 Project LIDER (http://www.lider-project.eu/) as input into the W3C Community Group
Taxonomy learning factoring the structure of a taxonomy into a semantic classification decision
Taxonomy learning factoring the structure of a taxonomy into a semantic classification decision Viktor PEKAR Bashkir State University Ufa, Russia, 450000 [email protected] Steffen STAAB Institute AIFB,
Why are Organizations Interested?
SAS Text Analytics Mary-Elizabeth ( M-E ) Eddlestone SAS Customer Loyalty [email protected] +1 (607) 256-7929 Why are Organizations Interested? Text Analytics 2009: User Perspectives on Solutions
Shallow Parsing with Apache UIMA
Shallow Parsing with Apache UIMA Graham Wilcock University of Helsinki Finland [email protected] Abstract Apache UIMA (Unstructured Information Management Architecture) is a framework for linguistic
Curriculum Vitae et Studiorum
Curriculum Vitae et Studiorum MARIA FEDERICO March 2011 Personal Information Name and Surname: Maria Federico Work address: Dipartimento di Ingegneria dell'informazione, Università di Modena e Reggio Emilia,
Web Mining. Margherita Berardi LACAM. Dipartimento di Informatica Università degli Studi di Bari [email protected]
Web Mining Margherita Berardi LACAM Dipartimento di Informatica Università degli Studi di Bari [email protected] Bari, 24 Aprile 2003 Overview Introduction Knowledge discovery from text (Web Content
DATA MODEL FOR STORAGE AND RETRIEVAL OF LEGISLATIVE DOCUMENTS IN DIGITAL LIBRARIES USING LINKED DATA
DATA MODEL FOR STORAGE AND RETRIEVAL OF LEGISLATIVE DOCUMENTS IN DIGITAL LIBRARIES USING LINKED DATA María Hallo 1, Sergio Luján-Mora 2, and Alejandro Mate 3 1 Department of Computer Science, National
Terminology Extraction from Log Files
Terminology Extraction from Log Files Hassan Saneifar 1,2, Stéphane Bonniol 2, Anne Laurent 1, Pascal Poncelet 1, and Mathieu Roche 1 1 LIRMM - Université Montpellier 2 - CNRS 161 rue Ada, 34392 Montpellier
Search and Information Retrieval
Search and Information Retrieval Search on the Web 1 is a daily activity for many people throughout the world Search and communication are most popular uses of the computer Applications involving search
72. Ontology Driven Knowledge Discovery Process: a proposal to integrate Ontology Engineering and KDD
72. Ontology Driven Knowledge Discovery Process: a proposal to integrate Ontology Engineering and KDD Paulo Gottgtroy Auckland University of Technology [email protected] Abstract This paper is
Towards a RB-SMT Hybrid System for Translating Patent Claims Results and Perspectives
Towards a RB-SMT Hybrid System for Translating Patent Claims Results and Perspectives Ramona Enache and Adam Slaski Department of Computer Science and Engineering Chalmers University of Technology and
ONTOLOGY-BASED APPROACH TO DEVELOPMENT OF ADJUSTABLE KNOWLEDGE INTERNET PORTAL FOR SUPPORT OF RESEARCH ACTIVITIY
ONTOLOGY-BASED APPROACH TO DEVELOPMENT OF ADJUSTABLE KNOWLEDGE INTERNET PORTAL FOR SUPPORT OF RESEARCH ACTIVITIY Yu. A. Zagorulko, O. I. Borovikova, S. V. Bulgakov, E. A. Sidorova 1 A.P.Ershov s Institute
A Service Modeling Approach with Business-Level Reusability and Extensibility
A Service Modeling Approach with Business-Level Reusability and Extensibility Jianwu Wang 1,2, Jian Yu 1, Yanbo Han 1 1 Institute of Computing Technology, Chinese Academy of Sciences, 100080, Beijing,
Multi language e Discovery Three Critical Steps for Litigating in a Global Economy
Multi language e Discovery Three Critical Steps for Litigating in a Global Economy 2 3 5 6 7 Introduction e Discovery has become a pressure point in many boardrooms. Companies with international operations
Facilitating Business Process Discovery using Email Analysis
Facilitating Business Process Discovery using Email Analysis Matin Mavaddat [email protected] Stewart Green Stewart.Green Ian Beeson Ian.Beeson Jin Sa Jin.Sa Abstract Extracting business process
Schema documentation for types1.2.xsd
Generated with oxygen XML Editor Take care of the environment, print only if necessary! 8 february 2011 Table of Contents : ""...........................................................................................................
Semantic Analysis of Natural Language Queries Using Domain Ontology for Information Access from Database
I.J. Intelligent Systems and Applications, 2013, 12, 81-90 Published Online November 2013 in MECS (http://www.mecs-press.org/) DOI: 10.5815/ijisa.2013.12.07 Semantic Analysis of Natural Language Queries
How the Computer Translates. Svetlana Sokolova President and CEO of PROMT, PhD.
Svetlana Sokolova President and CEO of PROMT, PhD. How the Computer Translates Machine translation is a special field of computer application where almost everyone believes that he/she is a specialist.
How To Write A Summary Of A Review
PRODUCT REVIEW RANKING SUMMARIZATION N.P.Vadivukkarasi, Research Scholar, Department of Computer Science, Kongu Arts and Science College, Erode. Dr. B. Jayanthi M.C.A., M.Phil., Ph.D., Associate Professor,
Comprendium Translator System Overview
Comprendium System Overview May 2004 Table of Contents 1. INTRODUCTION...3 2. WHAT IS MACHINE TRANSLATION?...3 3. THE COMPRENDIUM MACHINE TRANSLATION TECHNOLOGY...4 3.1 THE BEST MT TECHNOLOGY IN THE MARKET...4
PiQASso: Pisa Question Answering System
PiQASso: Pisa Question Answering System Giuseppe Attardi, Antonio Cisternino, Francesco Formica, Maria Simi, Alessandro Tommasi Dipartimento di Informatica, Università di Pisa, Italy {attardi, cisterni,
LINGUISTIC PROCESSING IN THE ATLAS PROJECT
LINGUISTIC PROCESSING IN THE ATLAS PROJECT Leonardo LESMO, Alessandro Mazzei, Daniele Radicioni Interaction Models Group Dipartimento di Informatica e Centro di Scienze Cognitive Università di Torino {lesmo,mazzei,radicion}@di.unito.it
A Framework for Ontology-Based Knowledge Management System
A Framework for Ontology-Based Knowledge Management System Jiangning WU Institute of Systems Engineering, Dalian University of Technology, Dalian, 116024, China E-mail: [email protected] Abstract Knowledge
ONLINE RESUME PARSING SYSTEM USING TEXT ANALYTICS
ONLINE RESUME PARSING SYSTEM USING TEXT ANALYTICS Divyanshu Chandola 1, Aditya Garg 2, Ankit Maurya 3, Amit Kushwaha 4 1 Student, Department of Information Technology, ABES Engineering College, Uttar Pradesh,
Building a Question Classifier for a TREC-Style Question Answering System
Building a Question Classifier for a TREC-Style Question Answering System Richard May & Ari Steinberg Topic: Question Classification We define Question Classification (QC) here to be the task that, given
Generating SQL Queries Using Natural Language Syntactic Dependencies and Metadata
Generating SQL Queries Using Natural Language Syntactic Dependencies and Metadata Alessandra Giordani and Alessandro Moschitti Department of Computer Science and Engineering University of Trento Via Sommarive
Doctoral Consortium 2013 Dept. Lenguajes y Sistemas Informáticos UNED
Doctoral Consortium 2013 Dept. Lenguajes y Sistemas Informáticos UNED 17 19 June 2013 Monday 17 June Salón de Actos, Facultad de Psicología, UNED 15.00-16.30: Invited talk Eneko Agirre (Euskal Herriko
Tool Support for Model Checking of Web application designs *
Tool Support for Model Checking of Web application designs * Marco Brambilla 1, Jordi Cabot 2 and Nathalie Moreno 3 1 Dipartimento di Elettronica e Informazione, Politecnico di Milano Piazza L. Da Vinci,
Parsing Software Requirements with an Ontology-based Semantic Role Labeler
Parsing Software Requirements with an Ontology-based Semantic Role Labeler Michael Roth University of Edinburgh [email protected] Ewan Klein University of Edinburgh [email protected] Abstract Software
Domain Adaptive Relation Extraction for Big Text Data Analytics. Feiyu Xu
Domain Adaptive Relation Extraction for Big Text Data Analytics Feiyu Xu Outline! Introduction to relation extraction and its applications! Motivation of domain adaptation in big text data analytics! Solutions!
An Ontology Reference Model for Normative Acts
An Ontology Reference Model for Normative Acts Pedro Paulo F. Barcelos 1, Renata S. S. Guizzardi 2, Anilton S. Garcia 1 1 Electrical Engineering Department PPGEE 2 Department of Computer Science - PPGI
Text Analytics. A business guide
Text Analytics A business guide February 2014 Contents 3 The Business Value of Text Analytics 4 What is Text Analytics? 6 Text Analytics Methods 8 Unstructured Meets Structured Data 9 Business Application
Tibetan-Chinese Bilingual Sentences Alignment Method based on Multiple Features
, pp.273-280 http://dx.doi.org/10.14257/ijdta.2015.8.4.27 Tibetan-Chinese Bilingual Sentences Alignment Method based on Multiple Features Lirong Qiu School of Information Engineering, MinzuUniversity of
Module Catalogue for the Bachelor Program in Computational Linguistics at the University of Heidelberg
Module Catalogue for the Bachelor Program in Computational Linguistics at the University of Heidelberg March 1, 2007 The catalogue is organized into sections of (1) obligatory modules ( Basismodule ) that
Electronic Critical Edition of Ancient Digital Manuscript Sources
Institute for Computational Linguistics Pisa Andrea Bozzi Electronic Critical Edition of Ancient Digital Manuscript Sources Archivi e biblioteche: dalla memoria del passato al web Cagliari November 25-26,
Transformation of Free-text Electronic Health Records for Efficient Information Retrieval and Support of Knowledge Discovery
Transformation of Free-text Electronic Health Records for Efficient Information Retrieval and Support of Knowledge Discovery Jan Paralic, Peter Smatana Technical University of Kosice, Slovakia Center for
ISSN: 2278-5299 365. Sean W. M. Siqueira, Maria Helena L. B. Braz, Rubens Nascimento Melo (2003), Web Technology for Education
International Journal of Latest Research in Science and Technology Vol.1,Issue 4 :Page No.364-368,November-December (2012) http://www.mnkjournals.com/ijlrst.htm ISSN (Online):2278-5299 EDUCATION BASED
Computer Standards & Interfaces
Computer Standards & Interfaces 35 (2013) 470 481 Contents lists available at SciVerse ScienceDirect Computer Standards & Interfaces journal homepage: www.elsevier.com/locate/csi How to make a natural
Building the Multilingual Web of Data: A Hands-on tutorial (ISWC 2014, Riva del Garda - Italy)
Building the Multilingual Web of Data: A Hands-on tutorial (ISWC 2014, Riva del Garda - Italy) Multilingual Word Sense Disambiguation and Entity Linking on the Web based on BabelNet Roberto Navigli, Tiziano
Customizing an English-Korean Machine Translation System for Patent Translation *
Customizing an English-Korean Machine Translation System for Patent Translation * Sung-Kwon Choi, Young-Gil Kim Natural Language Processing Team, Electronics and Telecommunications Research Institute,
Building Ontology Networks: How to Obtain a Particular Ontology Network Life Cycle?
See discussions, stats, and author profiles for this publication at: http://www.researchgate.net/publication/47901002 Building Ontology Networks: How to Obtain a Particular Ontology Network Life Cycle?
A Knowledge-Poor Approach to BioCreative V DNER and CID Tasks
A Knowledge-Poor Approach to BioCreative V DNER and CID Tasks Firoj Alam 1, Anna Corazza 2, Alberto Lavelli 3, and Roberto Zanoli 3 1 Dept. of Information Eng. and Computer Science, University of Trento,
How to make Ontologies self-building from Wiki-Texts
How to make Ontologies self-building from Wiki-Texts Bastian HAARMANN, Frederike GOTTSMANN, and Ulrich SCHADE Fraunhofer Institute for Communication, Information Processing & Ergonomics Neuenahrer Str.
Interactive Dynamic Information Extraction
Interactive Dynamic Information Extraction Kathrin Eichler, Holmer Hemsen, Markus Löckelt, Günter Neumann, and Norbert Reithinger Deutsches Forschungszentrum für Künstliche Intelligenz - DFKI, 66123 Saarbrücken
From Terminology Extraction to Terminology Validation: An Approach Adapted to Log Files
Journal of Universal Computer Science, vol. 21, no. 4 (2015), 604-635 submitted: 22/11/12, accepted: 26/3/15, appeared: 1/4/15 J.UCS From Terminology Extraction to Terminology Validation: An Approach Adapted
CIRGIRDISCO at RepLab2014 Reputation Dimension Task: Using Wikipedia Graph Structure for Classifying the Reputation Dimension of a Tweet
CIRGIRDISCO at RepLab2014 Reputation Dimension Task: Using Wikipedia Graph Structure for Classifying the Reputation Dimension of a Tweet Muhammad Atif Qureshi 1,2, Arjumand Younus 1,2, Colm O Riordan 1,
ADVANCED GEOGRAPHIC INFORMATION SYSTEMS Vol. II - Using Ontologies for Geographic Information Intergration Frederico Torres Fonseca
USING ONTOLOGIES FOR GEOGRAPHIC INFORMATION INTEGRATION Frederico Torres Fonseca The Pennsylvania State University, USA Keywords: ontologies, GIS, geographic information integration, interoperability Contents
Overview of the TACITUS Project
Overview of the TACITUS Project Jerry R. Hobbs Artificial Intelligence Center SRI International 1 Aims of the Project The specific aim of the TACITUS project is to develop interpretation processes for
