How To Build A Portuguese Web Search Engine
- Caren Ferguson
- 5 years ago
Transcription
1 The Case for a Portuguese Web Search Engine
Mário J. Gaspar da Silva, FCUL/DI and LASIGE/XLDB, [email protected]

Community Web
- Community webs have distinct social patterns (Gibson, 1998).
- Community webs can be identified through Web linkage (Kumar, 1999).
2 Portuguese Web
- There is an identifiable community web, which we call the Portuguese Web: the web of the people directly related to Portugal.
- This is NOT a small community web: Portugal has a population of 10M and 3+ M users.
- No identifiable topic.

Community Identification
- Flake, Lawrence & Giles: a community web is the subset of all web pages that are referenced mostly from within that community web.
- Seeds + Expectation-Maximization iterative procedure:
  - A max-flow/min-cut algorithm performs the E step.
  - A focused crawler performs the M step.
- National web: some tweaking required.
  - Some sites may be too close to the web graph center.
  - Many do not have incoming links at all.
  - More identification features are needed (language, ...).
3 Portuguese Web
- Seeds:
  - All sites registered under the .pt TLD.
  - Sites hinted by users and verified.
- M step: web pages linked from a .pt site and hosted under .COM, .NET, .ORG, .TV and .tk.
- E step: written in Portuguese.

Tumba! (Temos um Motor de Busca Alternativo!, "We Have an Alternative Search Engine!")
- Public service
- Community web search engine
- Web archive
- Research infrastructure
See it in action at
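The membership rule on this slide can be read as a small predicate. The following is a minimal sketch under one reading of the slide: a page belongs to the Portuguese Web if it is hosted under .pt, or was hinted by a user and verified, or is hosted under one of the listed generic TLDs, is linked from a .pt site, and is written in Portuguese. The function and parameter names are hypothetical, and the language label is assumed to come from a separate language detector.

```python
from urllib.parse import urlparse

# TLDs that the slide lists in addition to .pt.
OTHER_TLDS = {"com", "net", "org", "tv", "tk"}


def tld(url: str) -> str:
    """Return the last label of the host name, lowercased ('' if no host)."""
    host = urlparse(url).hostname or ""
    return host.rsplit(".", 1)[-1].lower()


def in_portuguese_web(url, linked_from_pt=False, language=None,
                      user_verified=False):
    """Decide membership under the assumed reading of the slide."""
    t = tld(url)
    if t == "pt" or user_verified:
        # Seeds: every .pt site, plus user-hinted sites that were verified.
        return True
    # Other TLDs: must be linked from a .pt site and written in Portuguese.
    return t in OTHER_TLDS and linked_from_pt and language == "pt"
```

For example, `in_portuguese_web("http://example.com/x", linked_from_pt=True, language="pt")` is accepted, while the same page detected as English is rejected.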
4 Motivations for a Portuguese Web Search Engine
- Sociological: what kind of stuff do the Portuguese people look for on the web?
- Cultural: information preservation.
- Linguistic: what language do these people use to communicate? What vocabulary? An easy way to build corpora.
- Security & protection.

Tumba!
- Modest effort: 1 professor, 4-5 graduate students, 4-5 servers for 2 years.
- Still beta!
  - Fault tolerance will require substantially more hardware (replication).
  - Periodic updates will demand more storage.
  - Full-time operators?
- Encouraging feedback.
5 Statistics
- Up to 20,000 queries/day.
- 3.5 million documents under .pt: the deepest crawl!
- 95% of responses in under 0.5 s.

Tumba! components: Web Crawlers, Repository, Indexing Engine, Ranking Engine, Presentation Engine
6 Tumba! components: Web Crawlers, Repository, Indexing Engine, Ranking Engine, Presentation Engine

Crawling + archiving [diagram]
- Seed URLs: .PT DNS Authority, User Input
- Web
- ViúvaNegra (Crawling Engine)
- Versus (Meta-data Repository)
- WebStore (Contents Repository)
7 Versus

Basic Idea
- Combine the Versions & Workspaces model for engineering data management with parallel processing techniques.
- Designed for web data warehousing applications in general.
- Web data repository with a time dimension: the ability to see a web as it was at some point in the past.

Versus - Class Model
[class diagram; classes: Source (abstract), PartitionKey, Layer, Version, VersionProperty, Facet; attributes shown: externid, name, value, externaltime, content]
- A Source is a reference to a Web document.
- A Version is a snapshot of a source at a given instant.
- A Layer represents a time unit in the repository.
- A PartitionKey is a property associated with a source, and therefore with every version of it, used for partitioning.
- A VersionProperty is a property associated with a certain version.
- A Facet holds a reference to a content associated with a version.
8 Layers & Resolvers
- Layer: aggregates the set of versions created within a crawling period.
- Resolver: a function that defines the layers upon which a client operates.
- Access to a specific version of a content is resolved based on the layers (and search order) to be considered.
- Incremental crawls are merged with stored data through this mechanism.

Facets
- Alternative views of crawled and archived contents:
  - Text view
  - Links view
  - Converted views
- Serve both search engine and preservation requirements.
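The layer and resolver mechanism can be illustrated with a small in-memory model. This is only a sketch, assuming a resolver is simply an ordered list of layer names that is searched front to back; the class and method names are hypothetical and are not the Versus API.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Version:
    source: str       # reference to a Web document (here, its URL)
    layer: str        # layer (crawling period) in which the snapshot was taken
    content_key: str  # key of the stored content


@dataclass
class Repository:
    # layer name -> {source -> Version}
    layers: dict = field(default_factory=dict)

    def add(self, version: Version) -> None:
        self.layers.setdefault(version.layer, {})[version.source] = version

    def resolve(self, source: str, layer_order: list):
        """Return the first version of `source` found in the given layers.

        `layer_order` plays the role of the resolver: it names the layers
        a client operates on and the order in which they are searched.
        """
        for name in layer_order:
            version = self.layers.get(name, {}).get(source)
            if version is not None:
                return version
        return None
```

With a full crawl in layer "2003-01" and an incremental crawl in "2003-02", resolving with `["2003-02", "2003-01"]` returns the newest version of each page and falls back to the full crawl for pages the incremental crawl did not revisit, while `["2003-01"]` alone shows the web as it was in the earlier period.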
9 WebStore - Architecture
[diagram: Client; VCR API Java Library; NFS; READ_ONLY and WRITABLE volumes on host1, host2, host3, ..., hostn; RAID disks]
- Contents are spread over multiple volumes.
- When the storage capacity of a volume is exhausted, it becomes read-only.
- New volumes are added as needed to augment capacity or improve performance.

WebStore Software Architecture
- Clients
- WebStore API
- WebStore Volume Management (content keys, duplicates management, compression)
- Network File System, Lustre, IBM SAN File System, ...
10 Content Addressing
- WebStore receives a content and returns a key.
- Duplicates receive the already assigned key.
- About 30% of URLs are duplicates.
- The key encodes: volume + location of the contents in the volume + checksum.

Preservation Issues
- No backups!
- Volumes are implemented with low-cost hard disks as network appliances.
- Self-describing volumes, accessible through standard protocols.
- Assumes human infrastructure. Historical archives do too!
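The content-addressing scheme, together with the read-only volume rule from the previous slide, can be sketched as follows. This is an illustration rather than the WebStore implementation: the class names, the `volume:offset+length:checksum` key layout, the use of SHA-1, and the byte-array volumes are all assumptions, and duplicates are detected by checksum alone.

```python
import hashlib


class Volume:
    """One storage volume; marked read-only once it cannot take a write."""

    def __init__(self, name, capacity):
        self.name = name
        self.capacity = capacity  # bytes (hypothetical; tiny in examples)
        self.data = bytearray()
        self.read_only = False


class WebStoreSketch:
    def __init__(self, volume_capacity):
        self.volume_capacity = volume_capacity
        self.volumes = {"vol0": Volume("vol0", volume_capacity)}
        self.current = "vol0"
        self.key_by_checksum = {}

    def put(self, content: bytes) -> str:
        checksum = hashlib.sha1(content).hexdigest()
        if checksum in self.key_by_checksum:
            # Duplicate content: hand back the key assigned earlier.
            return self.key_by_checksum[checksum]
        vol = self.volumes[self.current]
        if vol.data and len(vol.data) + len(content) > vol.capacity:
            # Current volume is exhausted: freeze it and add a new one.
            vol.read_only = True
            name = f"vol{len(self.volumes)}"
            vol = self.volumes[name] = Volume(name, self.volume_capacity)
            self.current = name
        offset = len(vol.data)
        vol.data += content
        # Key = volume + location in the volume + checksum.
        key = f"{vol.name}:{offset}+{len(content)}:{checksum}"
        self.key_by_checksum[checksum] = key
        return key

    def get(self, key: str) -> bytes:
        name, location, checksum = key.split(":")
        offset, length = (int(x) for x in location.split("+"))
        content = bytes(self.volumes[name].data[offset:offset + length])
        if hashlib.sha1(content).hexdigest() != checksum:
            raise ValueError("checksum mismatch")
        return content
```

Because the key itself names the volume and the location, a reader needs no central lookup table to fetch a content, which fits the slide's point about self-describing volumes.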
11 Tumba! components: Web Crawlers, Repository, Indexing Engine, Ranking Engine (SIDRA), Presentation Engine

Query Processing Architecture (indexing phase) [diagram]
- Versus (Meta-data Repository)
- WebStore (Contents Repository)
- Index DataStructs Generator
- Page Attributes (Authority)
- Word Index
12 SIDRA - Word Index Data Structure
- 2 files:
  - Term -> {docID}
  - <Term, docID> -> {hit}
- Hit = position + attrib
- DocIDs are assigned in static rank order.

SIDRA Index Range Partitioning
- docids index
- hits index
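The two index structures can be sketched in memory as two dictionaries. This is a simplified illustration, not SIDRA's file format: the function names are hypothetical, the static rank is assumed to be a score per URL where higher is better, and range partitioning is not shown.

```python
from collections import defaultdict


def assign_docids(static_rank):
    """Give docID 0 to the best-ranked page, 1 to the next, and so on.

    `static_rank` maps URL -> score (higher is better); ties are broken
    by URL so that the assignment is deterministic.
    """
    ordered = sorted(static_rank, key=lambda u: (-static_rank[u], u))
    return {url: docid for docid, url in enumerate(ordered)}


def build_index(docs, docids):
    """Build the two structures from tokenized documents.

    `docs` maps URL -> list of (term, attrib) pairs in page order.
    Returns (docid_index, hits_index):
      docid_index: term -> ascending list of docIDs
      hits_index:  (term, docID) -> list of (position, attrib) hits
    """
    docid_index = defaultdict(list)
    hits_index = defaultdict(list)
    # Visit pages in docID order so every posting list stays sorted,
    # i.e. already in static rank order.
    for url in sorted(docs, key=docids.get):
        d = docids[url]
        for position, (term, attrib) in enumerate(docs[url]):
            postings = docid_index[term]
            if not postings or postings[-1] != d:
                postings.append(d)
            hits_index[(term, d)].append((position, attrib))
    return docid_index, hits_index
```

Since docIDs follow the static rank, reading a posting list front to back yields the best-ranked matches first, which is what lets the query phase stop after the first N results.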
13 SIDRA - Ranking Engine
[diagram: Word Index (several), Query Server, Query Broker, Page Attributes]

Addressing Multi-dimensionality
- Generalization: page rank (a page importance measure) is but one of the possible ranking contexts.
- Query Servers may index data according to other dimensions: time, location, ...
- Query Brokers perform the results fusion.
14 Tumba! components: Web Crawlers, Repository, Indexing Engine, Ranking Engine, Presentation Engine

Query processing architecture (run-time phase) [diagram]
- Page Attributes
- Word Index
- Query Processing & Ranking Engine
- Presentation Engine
- WebStore (Contents Repository)
15 Tumba! User Interface

Time Navigation (Internet Archive vs. Tumba!)
- See previous versions of any page
  - Internet Archive: yes.
  - Tumba!: yes, a query to Versus gives the previous versions stored.
- Navigate on a previous date
  - Internet Archive: experimental.
  - Tumba!: requires the definition of a resolve function for each period.
- Search at a past time
  - Internet Archive: not supported.
  - Tumba!: enabled by an appropriate choice of indexes.
16 Conclusion
- Tumba! demonstrates a running search engine for a relatively large (national, 10M people) community.
- Integrates tools specific to this community:
  - Term tools (spell checking, pronunciation, dictionary, ...)
  - Integration with deep webs
- Supports archiving for preservation.
- Scalable software architecture that operates on low-cost hardware.
- Testbed for research activities in Computational Processing of Portuguese.
Q?
17 References

XMLBASE
- João Campos, Mário J. Silva. Versus: A Model for a Web Repository. CRC'01 - 4ª Conferência de Redes de Computadores, Covilhã, Novembro de
- Daniel Gomes, Mário J. Silva. Tarântula - Sistema de Recolha de Documentos da Web. CRC'01 - 4ª Conferência de Redes de Computadores, Covilhã, Novembro de
- Daniel Gomes, João Campos, Mário J. Silva. Versus: A Web Repository. WDAS-2002: Workshop on Distributed Data & Structures, Paris,

Tumba!
- Bruno Martins, Mário J. Silva. Is it Portuguese? Language detection in large document collections. CRC'01 - 4ª Conferência de Redes de Computadores, Covilhã, Novembro de
- Miguel Costa, Mário J. Silva. Ranking no Motor de Busca TUMBA. CRC'01 - 4ª Conferência de Redes de Computadores, Covilhã, Novembro de
- Mário J. Silva. Tumba!: Relatório de Actividades de 2003 e Plano para 2003. Relatório Técnico TUMBA TR-02-1, Grupo XLDB da Faculdade de Ciências da Universidade de Lisboa, Dezembro de
- Mário J. Silva. The Case for a Portuguese Web Search Engine. Relatório Técnico DI/FCUL TR-03-3, Departamento de Informática da Faculdade de Ciências da Universidade de Lisboa, Março de
- Rachel Aires, Sandra Aluísio, Paulo Quaresma, Diana Santos, Mário J. Silva. An initial proposal for cooperative evaluation on information retrieval in Portuguese. Propor' VI Encontro para o Processamento Computacional da Língua Portuguesa Escrita e Falada, Faro, Junho de (accepted)
- José Borbinha, Nuno Freire, Mário J. Silva, Bruno Martins. Internet Search Engines and OPACs: Getting the best of two worlds. ElPub ICCC/IFIP 7th International Conference on Electronic Publishing (accepted).
- Mário J. Silva. Tumba! A Web Search and Archive Combination.
- Mário J. Silva. The Case for a Portuguese Web Search Engine,

Language Identification
- Problem: identify pages in domains other than .pt that are written in Portuguese (during the crawl).
- Approach: use a categorization technique based on n-gram analysis (Dunning 1994, Adams 1997).
- Open problem: how to identify Brazilian Portuguese pages?
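To make the n-gram idea concrete, here is a toy classifier that compares character-trigram profiles by cosine similarity. It is a deliberately simplified stand-in for the statistical techniques cited on the slide, not the method Tumba! uses; the two one-sentence training samples and all function names are made up for illustration, and a real system would train on large corpora.

```python
import math
from collections import Counter


def trigrams(text: str) -> Counter:
    """Count character trigrams of the lowercased, space-normalized text."""
    t = " " + " ".join(text.lower().split()) + " "
    return Counter(t[i:i + 3] for i in range(len(t) - 2))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two trigram count vectors."""
    dot = sum(a[g] * b[g] for g in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


# Tiny illustrative training samples (hypothetical).
PROFILES = {
    "pt": trigrams("o motor de busca da web portuguesa recolhe "
                   "as páginas escritas em português"),
    "en": trigrams("the search engine for the english web collects "
                   "pages written in english"),
}


def identify(text: str) -> str:
    """Return the label of the profile most similar to the text."""
    sample = trigrams(text)
    return max(PROFILES, key=lambda lang: cosine(sample, PROFILES[lang]))
```

A crawler could call `identify` on the extracted text of each page outside .pt and keep only pages labelled "pt". Telling Brazilian from European Portuguese, the open problem on the slide, would need profiles trained on each variant and is not attempted here.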
18 Crawling Policy
- Search engine requirements:
  - Home page and all important pages
  - Up to X pages
  - Anti-spam protection
  - Frequent updating
- Archival system requirements:
  - Anything that is worth preserving
  - Images
- We can support both within our community, with incremental crawling and indexing, multiple indexes and index selection.

Matching & Ranking Algorithm
- Phase 1: Query Matching
  - QueryServers fetch matching docids (pre-sorted in static ranking order).
  - QueryBrokers merge results using a distributed merge-sort algorithm (preserves ranking order).
- Phase 2: Ranking
  - Pick the first N (1000) results from phase 1.
  - Compute the final rank using hits data:
    - Are terms also in the title?
    - What is the distance among query terms in the page?
    - Are terms in bold or italic?
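The two-phase algorithm can be sketched in a few lines. This assumes each Query Server returns an ascending list of matching docIDs (ascending docID meaning descending static rank) and that servers hold disjoint sets of documents; the scoring weights and function names are invented for illustration and are not the weights Tumba! uses.

```python
import heapq
from itertools import islice


def broker_merge(server_results, n=1000):
    """Phase 1: merge sorted docID lists and keep the first n.

    heapq.merge performs a k-way merge of already sorted inputs, so the
    static ranking order is preserved without re-sorting.
    """
    return list(islice(heapq.merge(*server_results), n))


def final_score(docid, hits_by_term):
    """Phase 2: combine the static order with evidence from the hits.

    `hits_by_term` maps each query term to its (position, attrib) hits
    in this document. All weights below are arbitrary assumptions.
    """
    score = 1.0 / (1 + docid)  # earlier docID = better static rank
    for hits in hits_by_term.values():
        if any(attrib == "title" for _, attrib in hits):
            score += 2.0       # term also appears in the title
        if any(attrib in ("bold", "italic") for _, attrib in hits):
            score += 0.5       # term is emphasized
    firsts = [min(p for p, _ in hits) for hits in hits_by_term.values() if hits]
    if len(firsts) > 1:
        # Crude proximity: distance between first occurrences of the terms.
        score += 1.0 / (1 + max(firsts) - min(firsts))
    return score


def rerank(candidates, hits):
    """Order candidates by final score; ties keep the static order."""
    return sorted(candidates, key=lambda d: -final_score(d, hits.get(d, {})))
```

Limiting phase 2 to the first N candidates keeps the more expensive hit-based scoring bounded regardless of how many documents match the query.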
19 Scalability Analysis
[diagram: Word Index (several), Query Server, Query Broker, Presentation Engine, Page Attributes, VCR (Contents Repository)]
- User requests may be balanced among multiple Presentation Engines.
- Contents may be replicated.
- Requests may be balanced among multiple Query Brokers.
- Page Attributes may be replicated.
- Query Brokers may balance requests to multiple Query Servers.
- Multiple Query Servers for a Word Index.
- Word indexes may be replicated.
