Selected Practical Software Development Lessons From A Large Digital Library System. Tibor Šimko <tibor.simko@cern.ch> August 2011 / openlab talk
|
|
- Griffin Lawrence
- 8 years ago
- Views:
Transcription
1 Selected Practical Software Development Lessons From A Large System <tibor.simko@cern.ch> Department of Information CERN August 2011 / openlab talk
2 Outline 1 2 3
3 Outline 1 2 3
4 What is? library in which collections are stored in digital formats (as opposed to print, microform, or other media) and accessible by computers (1) institutional document repositories (2) world-wide subject-based information systems Example #1: CERN Document Server managing CERN and selected non-cern high-energy physics and related documents since 1993 more than 1,000,000 records articles, books, theses, photos, videos, and more powered by, free digital library software
5 What is? library in which collections are stored in digital formats (as opposed to print, microform, or other media) and accessible by computers (1) institutional document repositories (2) world-wide subject-based information systems Example #1: CERN Document Server managing CERN and selected non-cern high-energy physics and related documents since 1993 more than 1,000,000 records articles, books, theses, photos, videos, and more powered by, free digital library software
6 CDS: Collection Tree
7 CDS: Search for Books
8 CDS: Search for Photos
9 CDS Features: Commenting
10 Features: Reviewing
11 CDS: Create Personal Alert
12 CDS: Add to Personal Basket
13 CDS: Display Personal Basket
14 CDS: Organize and Share Your Baskets
15 CDS: Journals and Bulletins
16 What is digital library? Example #2: INSPIRE world-wide high-energy physics information system run by CERN, DESY, FNAL, SLAC metadata curation since 1960s, technology since 2007 citation analysis, author/affiliation analysis close partnership with arxiv and ADS
17 INSPIRE: full-text search
18 INSPIRE: cite summary
19 INSPIRE: citation history
20 INSPIRE: author pages
21 Outline 1 2 3
22 Key Features navigable collection tree (regular, virtual) powerful search engine Google-like speed for up to 5M records combined metadata, reference and fulltext search flexible metadata (MARC, OA) handling any kind of document (multimedia) customizable input, formatting and linking personalization and collaborative features: alerts, baskets, groups, reviews, comments internationalisation (28 languages) open source, GNU General Public License co-developed by CERN (2002 ), EPFL (2004 ), DESY/FNAL/SLAC (2008 ), CfA (2009 ) installed at 30 institutions world-wide
23 Architecture: Overview Author
24 Architecture: Overview Author Ingestion
25 Architecture: Overview Author Ingestion Database
26 Architecture: Overview Author Sources Ingestion Database
27 Architecture: Overview Author Sources Processing Ingestion Database
28 Architecture: Overview Author Sources Processing Ingestion Database Dissemination User
29 Architecture: Overview Author Sources Ingestion Librarian Processing Database Curation Dissemination User
30 Architecture: Overview Author Sources Ingestion Librarian Processing Database Curation Overview Dissemination User
31 Modules: Ingestion Author
32 Modules: Ingestion Author WebSubmit WebSession, WebAccess
33 Modules: Ingestion Author WebSubmit WebSession, WebAccess metadata full-text document Metadata Full-text
34 Modules: Ingestion Author WebSubmit WebSession, WebAccess metadata BibConvert MARCXML BibUpload BibSched Metadata full-text document Full-text
35 Modules: Ingestion OAI Data Source Author BibHarvest WebSubmit WebSession, WebAccess metadata BibConvert MARCXML BibUpload BibSched Metadata full-text document Full-text
36 Modules: Ingestion OAI Data Source Author Non-OAI Data Source BibHarvest WebSubmit WebSession, WebAccess metadata ElmSubmit BibConvert MARCXML full-text document BibUpload BibSched Metadata Full-text
37 Modules: Ingestion OAI Data Source Author Non-OAI Data Source BibHarvest WebSubmit WebSession, WebAccess metadata ElmSubmit BibConvert MARCXML full-text document BibUpload BibSched Ingestion Metadata Full-text
38 Modules: Processing Metadata Full-text
39 Modules: Processing Metadata RefExtract Full-text BibClassify
40 Modules: Processing Clusters BibIndex Metadata RefExtract Full-text BibClassify
41 Modules: Processing WebColl Clusters BibIndex Metadata RefExtract Full-text BibClassify
42 Modules: Processing WebColl Clusters BibIndex Metadata RefExtract Full-text BibRank BibClassify
43 Modules: Processing WebColl Clusters BibIndex Metadata RefExtract Full-text BibRank BibClassify BibFormat
44 Modules: Processing WebColl Clusters BibIndex Metadata RefExtract Full-text BibRank BibClassify BibFormat Processing
45 Modules: Dissemination Clusters Metadata Full-text
46 Modules: Dissemination Clusters Metadata Full-text WebSearch User
47 Modules: Dissemination Clusters Metadata Full-text WebBasket WebSearch User
48 Modules: Dissemination Clusters Metadata Full-text WebTag WebBasket WebSearch User
49 Modules: Dissemination Clusters Metadata Full-text WebTag WebBasket WebSearch WebAlert User
50 Modules: Dissemination Clusters Metadata Full-text WebTag WebBasket WebSearch WebAlert BibHarvest User OAI Harvester
51 Modules: Dissemination Clusters Metadata Full-text WebTag WebBasket WebSearch WebAlert BibHarvest WebComment User OAI Harvester
52 Modules: Dissemination Clusters Metadata Full-text WebTag WebBasket WebSearch WebAlert BibHarvest WebComment User OAI Harvester WebMessage
53 Modules: Dissemination Clusters Metadata Full-text WebTag WebBasket WebSearch WebAlert BibHarvest WebComment User OAI Harvester WebJournal WebMessage
54 Modules: Dissemination Clusters Metadata Full-text WebTag WebBasket WebSearch WebAlert BibHarvest WebComment User OAI Harvester WebJournal WebMessage BibCirculation
55 Modules: Dissemination Clusters Metadata Full-text WebTag WebBasket WebSearch WebAlert BibHarvest WebComment User OAI Harvester WebJournal WebMessage BibCirculation WebStat
56 Modules: Dissemination Clusters Metadata Full-text WebTag WebBasket WebSearch WebAlert BibHarvest WebComment User OAI Harvester WebJournal WebMessage BibCirculation WebStat WebHelp
57 Modules: Dissemination Clusters Metadata Full-text WebTag WebBasket WebSearch WebAlert BibHarvest WebComment User OAI Harvester WebJournal WebMessage BibCirculation WebStat Dissemination WebHelp
58 Modules: Curation Metadata Librarian Full-text
59 Modules: Curation Metadata BibEdit Librarian Full-text
60 Modules: Curation MultiEdit Metadata BibEdit Librarian Full-text
61 Modules: Curation MultiEdit Metadata BibEdit Librarian Full-text BatchUploader
62 Modules: Curation MultiEdit Metadata BibEdit Librarian Full-text BatchUploader BibCheck
63 Modules: Curation MultiEdit Metadata BibEdit Librarian Full-text BatchUploader BibCheck BibCirculation
64 Modules: Curation MultiEdit Metadata BibEdit Librarian BibDocFile Full-text BatchUploader BibCheck BibCirculation
65 Modules: Curation MultiEdit BibClassify Metadata BibEdit Librarian BibDocFile Full-text BatchUploader BibCheck BibCirculation
66 Modules: Curation MultiEdit BibClassify Metadata BibEdit Librarian BibDocFile Full-text BatchUploader RefExtract BibCheck BibCirculation
67 Modules: Curation MultiEdit BibClassify Metadata BibEdit Librarian BibDocFile Full-text BatchUploader RefExtract BibCheck BibCatalog Tasks BibCirculation
68 Modules: Curation MultiEdit BibClassify Metadata BibEdit Librarian BibDocFile Full-text BatchUploader BibKnowledge RefExtract BibCheck BibCatalog Knowledge Bases Tasks BibCirculation
69 Modules: Curation MultiEdit BibExport BibClassify Metadata BibEdit Librarian BibDocFile Full-text BatchUploader BibKnowledge RefExtract BibCheck BibCatalog Knowledge Bases Tasks BibCirculation
70 Modules: Curation BibMatch MultiEdit BibExport BibClassify Metadata BibEdit Librarian BibDocFile Full-text BatchUploader BibKnowledge RefExtract BibCheck BibCatalog Knowledge Bases Tasks BibCirculation
71 Modules: Curation BibMatch MultiEdit BibExport BibClassify Metadata BibEdit Librarian BibDocFile Full-text BibMerge BatchUploader BibKnowledge RefExtract BibCheck BibCatalog Knowledge Bases Tasks BibCirculation
72 Modules: Curation BibMatch MultiEdit BibExport BibClassify Metadata BibEdit Librarian BibDocFile Full-text BibMerge BatchUploader BibKnowledge RefExtract Curation BibCheck BibCatalog Knowledge Bases Tasks BibCirculation
73 Modules: Summary 33 modules codebase 290,000 lines of Python code 12,000 lines of JavaScript code 6,000 lines of XSL code 5,000 lines of autotools code 75 authors since inception 25 authors and contributors in 2010 many short-term students importance of informal coding standards 10 years of development started at CERN, first release in 2002 now co-developed world-wide (EU, US) lego programming... but no silver bullet
74 Outline 1 2 3
75 Why Python? easy to read and understand (good for many temporary developers) suitable for rapid prototyping (good for organic-growth software development model) write code to throw it away
76 Art of Ikebana Japanese art of flower arrangement way of flowers natural shapes, graceful lines minimalism disciplined art form in which nature and humanity are brought together
77 Art of Ikebana Programming Java? new Callable() { public Object call(object x) { return x.times(k) } } Python! lambda x: k * x
78 Art of Ikebana Programming Java? new Callable() { public Object call(object x) { return x.times(k) } } Python! lambda x: k * x
79 Speeding Up Python bytecode interpreted language: what about speed? Cython permits to write C extensions easily combining efficiency of C with high-levelness of Python Example: intbitset.pyx ctypedef unsigned long long int word_t ctypedef struct IntBitSet: int size int allocated word_t trailing_bits int tot word_t *bitset
80 Outline 1 2 3
81 Why Git? good for distributed teams offline development possible pull on demand collaboration model (as opposed to shared push collaboration model) inherent,natural code review process commit early, commit often (to private repositories) rebase and clean (before pushing for public consumption) interplay with SVN
82 Git Branches C 1 master
83 Git Branches C 1 C 2 master
84 Git Branches v1.0.0 C 1 C 2 C 3 master
85 Git Branches v1.0.0 C 1 C 2 C 3 C 4 master
86 Git Branches v1.0.0 M 1 maintenance C 1 C 2 C 3 C 4 master
87 Git Branches v1.0.0 M 1 M 2 maintenance C 1 C 2 C 3 C 4 master
88 Git Branches v1.0.0 M 1 M 2 maintenance C 1 C 2 C 3 C 4 C 5 master
89 Git Branches v1.0.1 v1.0.0 M 1 M 2 M 3 maintenance C 1 C 2 C 3 C 4 C 5 master
90 Git Branches v1.0.1 M 4 v1.0.0 M 1 M 2 M 3 maintenance C 1 C 2 C 3 C 4 C 5 master
91 Git Branches v1.0.1 M 4 v1.0.0 M 1 M 2 M 3 maintenance C 1 C 2 C 3 C 4 C 5 master N 1 next
92 Git Branches v1.0.1 M 4 v1.0.0 M 1 M 2 M 3 maintenance C 1 C 2 C 3 C 4 C 5 C 6 master N 1 next
93 Git Branches v1.0.1 M 4 v1.0.0 M 1 M 2 M 3 maintenance C 1 C 2 C 3 C 4 C 5 C 6 master N 1 N 2 next
94 Git Branches v1.0.1 M 4 v1.0.0 M 1 M 2 M 3 maintenance v1.1.0 C 1 C 2 C 3 C 4 C 5 C 6 C 7 master N 1 N 2 next
95 Git Branches v1.0.2 v1.0.1 M 4 M 5 v1.0.0 M 1 M 2 M 3 maintenance v1.1.0 C 1 C 2 C 3 C 4 C 5 C 6 C 7 master N 1 N 2 next
96 Git Branches v1.0.2 v1.0.1 M 4 M 5 v1.0.0 M 1 M 2 M 3 maintenance v1.1.0 C 1 C 2 C 3 C 4 C 5 C 6 C 7 master N 1 N 2 next maint release maintenance branch master new feature branch next things not yet release-ready
97 Git Development M 1 C 1 N 1 maint master next
98 Git Development B 1 some-bugfix M 1 C 1 N 1 maint master next
99 Git Development B 1 some-bugfix M 1 M 2 maint C 1 C 2 master next N 1 N 2
100 Git Development B 1 B 2 some-bugfix M 1 M 2 maint C 1 C 2 master next N 1 N 2
101 Git Development B 1 B 2 merge some-bugfix M 1 M 2 M 3 maint C 1 C 2 master next N 1 N 2
102 Git Development B 1 B 2 merge some-bugfix M 1 M 2 M 3 maint merge C 1 C 2 C 3 master next N 1 N 2
103 Git Development B 1 B 2 merge some-bugfix M 1 M 2 M 3 maint merge C 1 C 2 C 3 master next N 1 N 2 F 1 some-new-feature
104 Git Development B 1 B 2 merge some-bugfix M 1 M 2 M 3 maint merge C 1 C 2 C 3 master next N 1 N 2 F 1 F 2 some-new-feature
105 Git Development B 1 B 2 merge some-bugfix M 1 M 2 M 3 maint merge C 1 C 2 C 3 C 4 master merge next N 1 N 2 F 1 F 2 some-new-feature
106 Git Development B 1 B 2 merge some-bugfix M 1 M 2 M 3 maint merge C 1 C 2 C 3 C 4 master merge merge next N 1 N 2 N 3 F 1 F 2 some-new-feature
107 Git Development B 1 B 2 merge some-bugfix M 1 M 2 M 3 maint merge C 1 C 2 C 3 C 4 master merge merge next N 1 N 2 N 3 F 1 F 2 some-new-feature E 1 some-experimental-feature
108 Git Development B 1 B 2 merge some-bugfix M 1 M 2 M 3 maint merge C 1 C 2 C 3 C 4 master merge merge next N 1 N 2 N 3 F 1 F 2 some-new-feature E 1 E 2 some-experimental-feature
109 Git Development B 1 B 2 merge some-bugfix M 1 M 2 M 3 maint merge C 1 C 2 C 3 C 4 master merge merge next N 1 N 2 N 3 N 4 F 1 F 2 merge some-new-feature E 1 E 2 some-experimental-feature
110 Git collaboration model
111 Outline 1 2 3
112 Unit testing test-driven development when appropriate e.g. before/while developing strip_accents(), write: Example: search_engine_tests.py class TestStripAccents(unittest.TestCase): """Test for handling of UTF-8 accents.""" def test_strip_accents(self): """search engine - stripping of accented letters""" self.assertequal("memememe", search_engine.strip_accents('mémêmëmè')) self.assertequal("memememe", search_engine.strip_accents('mémêmëmè'))
113 Functional testing functional/acceptance/regression testing testbed site (Atlantis of Institute Fictive Science) e.g. Python mechanize module to emulate browser Example: websearch_regression_tests.py class WebSearchSearchEnginePythonAPITest(unittest.TestCase): "Check typical search engine Python API calls on the demo data." def test_search_engine_python_api_for_failed_query(self): "websearch - search engine Python API for failed query" self.assertequal([], perform_request_search(p='aoeuidhtns')) def test_search_engine_python_api_for_successful_query(self): "websearch - search engine Python API for successful query" self.assertequal([8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 47], perform_request_search(p='ellis'))
114 Web testing sometimes we need to run tests in real browser e.g. pages with heavy JavaScript using Selenium IDE extension for Firefox record and replay browser actions test for text existence or non-existence on pages test for link labels and targets Example: test_search_ellis.html <tr><td>open</td> <td> <td></td> </tr> <tr><td>type</td> <td>p</td> <td>ellis</td> </tr> <tr><td>clickandwait</td> <td>action_search</td> <td></td> </tr> <tr><td>verifytextpresent</td> <td>1. Thermal conductivity of dense quark matter and cooling of stars</td> <td></td> </tr>
115 Outline 1 2 3
116 Designing A Search Engine performance-driven design assumptions: high number of selects, low number of updates fast searching, slow indexation cache everything cacheable search functionality: search for words, phrases, regular expressions search in any field, authors, titles, etc index design: forward indexes: word1 [rec1, rec2,... ] word2 [rec2, rec7,... ] reverse indexes: rec1 [word1, word8,... ] rec2 [word1, word2,... ] Zipf s law on word frequency: few words occur very often (e.g. the) most words are infrequent (even e.g. boson)
117 Search Engine Under Cover
118 Measuring the Performance three important speed factors to consider: speed of finding sets (DB Server) speed of demarshaling sets (DB Web App Server) speed of intersecting sets (Web App Server) Example: speed of various parts (2002, before optimization) action / query: "CERN 2002" "of the this" fetching 0.28 sec 0.34 sec demarshaling 0.78 sec 1.10 sec adding colls 0.37 sec 0.63 sec intersecting 0.64 sec 1.19 sec total search time 2.07 sec 3.22 sec
119 Optimizing Data Structures data structures tested: sorted (lists, Patricia trees) unsorted (hashed sets, binary vectors) fast prototyping: (Python, Lisp in 2002) throw-away coding to test ideas Example: lists vs dicts, 350K sets in 800K universe marshaling lists bytes in 1.33 sec demarshaling lists items in 0.10 sec merging lists items in 0.34 sec intersecting lists items in 0.35 sec marshaling dicts bytes in 0.87 sec demarshaling dicts items in 0.36 sec merging dicts items in 0.09 sec intersecting dicts items in 0.15 sec
120 ... and the winner is: binary vectors found the best compromise! using Numeric Python module (in 2002) typical search time gain: 4.0 sec 0.2 sec (in 2002) typical indexing time loss: 7 hours 4 days (in 2002) mostly spare data modelled via mostly dense data structure? free your mind, think critically further optimization: Numeric module not addressing real bits, only bytes so home-made intbitset C extension in 2007 addressing real bits (factor of 8 already) saving space, saving (indexing) time
121 Outline 1 2 3
122 Splitting Web App Server and DB Server load of CDS Web and DB servers at the split time: split leads to efficient use of OS resources by lone, non-competing Web and DB daemon processes
123 Load-Balanced Setup useful for LHC First Beam Day rush situations with many concurrent visitors Apache mod_proxy_balancer App Worker 1 User Load Balancer App Worker 2 DB Server App Worker 3
124 Measuring Scalability using siege to simulate concurrent users and to measure throughput on a sample of typical URLs Example: inspirebeta.net under gentle siege $ siege -d 1 -c 20 -t 1m -f inspirebeta_urls.txt Transactions: 1329 hits Availability: % Elapsed time: secs Data transferred: MB Response time: 0.41 secs Transaction rate: trans/sec Throughput: 0.62 MB/sec Concurrency: 8.96 Successful transactions: 1329 Failed transactions: 0 Longest transaction: 3.05 Shortest transaction: 0.01
125 selected lessons from building a digital library system 300,000 LOCs from 75 authors over 10 years value of rapid prototyping value of organic-growth software development model value of coding aesthetics and minimalism morale from selected anecdotes? Never Lose A Holy Curiosity (A. Einstein)
Outline. Selected Practical Software Development Lessons From A Large Digital Library System. Tibor Šimko <tibor.simko@cern.ch>
Selected Practical Software Development Lessons From A Large System Department of Information CERN August 2010 / openlab talk Outline 1 2 3 What is? library in which collections are
More informationCERN Document Server
CERN Document Server Document Management System for Grey Literature in Networked Environment Martin Vesely CERN Geneva, Switzerland GL5, December 4-5, 2003 Amsterdam, The Netherlands Overview Searching
More informationComparison of Selected Software Systems for Creation of Digital Libraries. from the Field of Open Source for the Needs of the NRGL STL
Comparison of Selected Software Systems for Creation of Digital Libraries from the Field of Open Source for the Needs of the NRGL STL This document contains detailed characteristics and orientation comparison
More informationInvenio: A Modern Digital Library for Grey Literature
Invenio: A Modern Digital Library for Grey Literature Jérôme Caffaro, CERN Samuele Kaplun, CERN November 25, 2010 Abstract Grey literature has historically played a key role for researchers in the field
More informationWorking for Digital Library Services
CERN Summer Student Report Working for Digital Library Services Author: Ivi Dimopoulou Supervisor: Charalampos Tzovanakis August 21, 2015 Abstract The present report intends to briefly outline the work
More informationPerformance Testing Crash Course
Performance Testing Crash Course Dustin Whittle @dustinwhittle The performance of your application affects your business more than you might think. Top engineering organizations think of performance not
More informationUsing Redis as a Cache Backend in Magento
Using Redis as a Cache Backend in Magento Written by: Alexey Samorukov Aleksandr Lozhechnik Kirill Morozov Table of Contents PROBLEMS WITH THE TWOLEVELS CACHE BACKEND CONFIRMING THE ISSUE SOLVING THE ISSUE
More informationAdvanced Visualizations Tools for CERN Institutional Data
Advanced Visualizations Tools for CERN Institutional Data September 2013 Author: Alberto Rodríguez Peón Supervisor(s): Jiří Kunčar CERN openlab Summer Student Report 2013 Project Specification The aim
More informationCDS Invenio - a software solution for National Repository of Grey Literature
CDS Invenio - a software solution for National Repository of Grey Literature Tomáš Müller National Technical Library, Prague, Czech Republic HUtomas.muller@techlib.czU Third Seminar on Providing Access
More informationMission-Critical Database with Real-Time Search for Big Data
Mission-Critical Database with Real-Time Search for Big Data February 17, 2012 Slide 1 Overview About MarkLogic Why MarkLogic Case Studies Technology and Features Slide 2 About MarkLogic 10 years in business
More informationPERFORMANCE TESTING CRASH COURSE
PERFORMANCE TESTING CRASH COURSE Dustin Whittle dustinwhittle.com @dustinwhittle San Francisco, California, USA Technologist, Traveler, Pilot, Skier, Diver, Sailor, Golfer What I have worked on Developer
More informationThe New ADS Search Interface and API
The New ADS Search Interface and API Alberto Accomazzi - @aaccomazzi for the ADS team - @adsabs 28 September 2013 IVOA Kona Saturday, September 28, 13 The ADS Classic System No frameworks available in
More information1. Comments on reviews a. Need to avoid just summarizing web page asks you for:
1. Comments on reviews a. Need to avoid just summarizing web page asks you for: i. A one or two sentence summary of the paper ii. A description of the problem they were trying to solve iii. A summary of
More informationUsing S3 cloud storage with ROOT and CernVMFS. Maria Arsuaga-Rios Seppo Heikkila Dirk Duellmann Rene Meusel Jakob Blomer Ben Couturier
Using S3 cloud storage with ROOT and CernVMFS Maria Arsuaga-Rios Seppo Heikkila Dirk Duellmann Rene Meusel Jakob Blomer Ben Couturier INDEX Huawei cloud storages at CERN Old vs. new Huawei UDS comparative
More informationTHE HELMHOLTZ INVENIO REPOSITORY PROJECT :
THE HELMHOLTZ INVENIO REPOSITORY PROJECT : BETWEEN ORGANIZATIONAL TASKS, NEEDS OF SCIENTISTS, AND CONSTRAINTS 1 Structure of the presentation Libraries of DESY, FZJ and GSI in the Helmholtz Association
More informationDistributed Version Control
Distributed Version Control Faisal Tameesh April 3 rd, 2015 Executive Summary Version control is a cornerstone of modern software development. As opposed to the centralized, client-server architecture
More informationCiteSeer x in the Cloud
Published in the 2nd USENIX Workshop on Hot Topics in Cloud Computing 2010 CiteSeer x in the Cloud Pradeep B. Teregowda Pennsylvania State University C. Lee Giles Pennsylvania State University Bhuvan Urgaonkar
More informationZero-Touch Drupal Deployment
Zero-Touch Drupal Deployment Whitepaper Date 25th October 2011 Document Number MIG5-WP-D-004 Revision 01 1 Table of Contents Preamble The concept Version control Consistency breeds abstraction Automation
More informationA programming model in Cloud: MapReduce
A programming model in Cloud: MapReduce Programming model and implementation developed by Google for processing large data sets Users specify a map function to generate a set of intermediate key/value
More informationTuning Tableau Server for High Performance
Tuning Tableau Server for High Performance I wanna go fast PRESENT ED BY Francois Ajenstat Alan Doerhoefer Daniel Meyer Agenda What are the things that can impact performance? Tips and tricks to improve
More informationLecture 3: Scaling by Load Balancing 1. Comments on reviews i. 2. Topic 1: Scalability a. QUESTION: What are problems? i. These papers look at
Lecture 3: Scaling by Load Balancing 1. Comments on reviews i. 2. Topic 1: Scalability a. QUESTION: What are problems? i. These papers look at distributing load b. QUESTION: What is the context? i. How
More informationInformatica Data Director Performance
Informatica Data Director Performance 2011 Informatica Abstract A variety of performance and stress tests are run on the Informatica Data Director to ensure performance and scalability for a wide variety
More informationSearch-based business intelligence and reverse data engineering with Apache Solr
Search-based business intelligence and reverse data engineering with Apache Solr M a r i o - L e a n d e r R e i m e r C h i e f T e c h n o l o g i s t This talk will o Give a brief overview of the AIR
More informationBibliometrics and Transaction Log Analysis. Bibliometrics Citation Analysis Transaction Log Analysis
and Transaction Log Analysis Bibliometrics Citation Analysis Transaction Log Analysis Definitions: Quantitative study of literatures as reflected in bibliographies Use of quantitative analysis and statistics
More informationVersion Control with Git. Linux Users Group UT Arlington. Rohit Rawat rohitrawat@gmail.com
Version Control with Git Linux Users Group UT Arlington Rohit Rawat rohitrawat@gmail.com Need for Version Control Better than manually storing backups of older versions Easier to keep everyone updated
More informationOpenShift on you own cloud. Troy Dawson OpenShift Engineer, Red Hat tdawson@redhat.com November 1, 2013
OpenShift on you own cloud Troy Dawson OpenShift Engineer, Red Hat tdawson@redhat.com November 1, 2013 2 Infrastructure-as-a-Service Servers in the Cloud You must build and manage everything (OS, App Servers,
More informationElle Décor Lookbook ipad Application
Elle Décor Lookbook ipad Application www.appnovation.com Elle Décor Lookbook ipad Application Contents 1.0 Background P. 3 2.0 Project Overview P. 4 3.0 Goals & Requirements P. 5 4.0 Development P. 8 P.2
More informationNew Features in XE8. Marco Cantù RAD Studio Product Manager
New Features in XE8 Marco Cantù RAD Studio Product Manager Marco Cantù RAD Studio Product Manager Email: marco.cantu@embarcadero.com @marcocantu Book author and Delphi guru blog.marcocantu.com 2 Agenda
More informationWeb Applications Testing
Web Applications Testing Automated testing and verification JP Galeotti, Alessandra Gorla Why are Web applications different Web 1.0: Static content Client and Server side execution Different components
More informationJavaScript Applications for the Enterprise: From Empty Folders to Managed Deployments. George Bochenek Randy Jones
JavaScript Applications for the Enterprise: From Empty Folders to Managed Deployments George Bochenek Randy Jones Enterprise Development What is it? Source Control Project Organization Unit Testing Continuous
More informationCloud Powered Mobile Apps with Azure
Cloud Powered Mobile Apps with Azure Malte Lantin Technical Evanglist Microsoft Azure Agenda Mobile Services Features and Demos Advanced Features Scaling and Pricing 2 What is Mobile Services? Storage
More informationModern CI/CD and Asset Serving
Modern CI/CD and Asset Serving Mike North 10/20/2015 About.me Job.new = CTO Job.old = UI Architect for Yahoo Ads & Data Organizer, Modern Web UI Ember.js, Ember-cli, Ember-data contributor OSS Enthusiast
More informationArchiving, Indexing and Accessing Web Materials: Solutions for large amounts of data
Archiving, Indexing and Accessing Web Materials: Solutions for large amounts of data David Minor 1, Reagan Moore 2, Bing Zhu, Charles Cowart 4 1. (88)4-104 minor@sdsc.edu San Diego Supercomputer Center
More informationPerformance monitoring at CERN openlab. July 20 th 2012 Andrzej Nowak, CERN openlab
Performance monitoring at CERN openlab July 20 th 2012 Andrzej Nowak, CERN openlab Data flow Reconstruction Selection and reconstruction Online triggering and filtering in detectors Raw Data (100%) Event
More informationCase Study. SaaS Based Multi-Store Market Place. www.brainvire.com 2013 Brainvire Infotech Pvt. Ltd Page 1 of 5
Case Study SaaS Based Multi-Store Market Place Page 1 of 5 Client Requirement Magento Multi-Store Ecommerce Management is a web based virtual mall. It s an e- commerce virtual mall cum SaaS based model
More informationAcronym: Data without Boundaries. Deliverable D12.1 (Database supporting the full metadata model)
Project N : 262608 Acronym: Data without Boundaries Deliverable D12.1 (Database supporting the full metadata model) Work Package 12 (Implementing Improved Resource Discovery for OS Data) Reporting Period:
More informationVersion Control Systems: SVN and GIT. How do VCS support SW development teams?
Version Control Systems: SVN and GIT How do VCS support SW development teams? CS 435/535 The College of William and Mary Agile manifesto We are uncovering better ways of developing software by doing it
More informationThe Learn-Verified Full Stack Web Development Program
The Learn-Verified Full Stack Web Development Program Overview This online program will prepare you for a career in web development by providing you with the baseline skills and experience necessary to
More informationDJANGOCODERS.COM THE PROCESS. Core strength built on healthy process
DJANGOCODERS.COM THE PROCESS This is a guide that outlines our operating procedures and coding processes. These practices help us to create the best possible software products while ensuring a successful
More informationOpen is as Open Does: Lessons from Running a Professional Open Source Company
Open is as Open Does: Lessons from Running a Professional Open Source Company Leon Rozenblit, JD, PhD Founder and CEO at Prometheus Research, LLC email: Leon@PrometheusResearch.com twitter: @leon_rozenblit
More informationA Tool for Evaluation and Optimization of Web Application Performance
A Tool for Evaluation and Optimization of Web Application Performance Tomáš Černý 1 cernyto3@fel.cvut.cz Michael J. Donahoo 2 jeff_donahoo@baylor.edu Abstract: One of the main goals of web application
More informationDISQUS. Building Scalable Web Apps. David Cramer @zeeg
DISQUS Building Scalable Web Apps David Cramer @zeeg Agenda Terminology Common bottlenecks Building a scalable app Architecting your database Utilizing a Queue The importance of an API Performance vs.
More informationPaaS - Platform as a Service Google App Engine
PaaS - Platform as a Service Google App Engine Pelle Jakovits 14 April, 2015, Tartu Outline Introduction to PaaS Google Cloud Google AppEngine DEMO - Creating applications Available Google Services Costs
More informationFunctional Requirements for Digital Asset Management Project version 3.0 11/30/2006
/30/2006 2 3 4 5 6 7 8 9 0 2 3 4 5 6 7 8 9 20 2 22 23 24 25 26 27 28 29 30 3 32 33 34 35 36 37 38 39 = required; 2 = optional; 3 = not required functional requirements Discovery tools available to end-users:
More informationRegression & Load Testing BI EE 11g
Regression & Load Testing BI EE 11g Venkatakrishnan J Who Am I? Venkatakrishnan Janakiraman Over 8+ Years of Oracle BI & EPM experience Managing Director (India), Rittman Mead India Blog at http://www.rittmanmead.com/blog
More informationRevision control systems (RCS) and
Revision control systems (RCS) and Subversion Problem area Software projects with multiple developers need to coordinate and synchronize the source code Approaches to version control Work on same computer
More informationViSION Status Update. Dan Savu Stefan Stancu. D. Savu - CERN openlab
ViSION Status Update Dan Savu Stefan Stancu D. Savu - CERN openlab 1 Overview Introduction Update on Software Defined Networking ViSION Software Stack HP SDN Controller ViSION Core Framework Load Balancer
More informationBeyond Unit Testing. Steve Loughran Julio Guijarro HP Laboratories, Bristol, UK. steve.loughran at hpl.hp.com julio.guijarro at hpl.hp.
Beyond Unit Testing Steve Loughran Julio Guijarro HP Laboratories, Bristol, UK steve.loughran at hpl.hp.com julio.guijarro at hpl.hp.com Julio Guijarro About Us Research scientist at HP Laboratories on
More informationContinuous Integration
Continuous Integration WITH FITNESSE AND SELENIUM By Brian Kitchener briank@ecollege.com Intro Who am I? Overview Continuous Integration The Tools Selenium Overview Fitnesse Overview Data Dependence My
More informationWHITE PAPER. Domo Advanced Architecture
WHITE PAPER Domo Advanced Architecture Overview There are several questions that any architect or technology advisor may ask about a new system during the evaluation process: How will it fit into our organization
More informationAn Integrated Library System on the CERN Document Server
Universidade de Évora Master s Thesis in Computer Science Engineering An Integrated Library System on the CERN Document Server CERN-THESIS-2010-115 30/04/2010 Author Joaquim Jorge Rodrigues Silvestre Supervisors
More informationVersion Control with Git. Kate Hedstrom ARSC, UAF
1 Version Control with Git Kate Hedstrom ARSC, UAF Linus Torvalds 3 Version Control Software System for managing source files For groups of people working on the same code When you need to get back last
More informationLoad and Performance Load Testing. RadView Software October 2015 www.radview.com
Load and Performance Load Testing RadView Software October 2015 www.radview.com Contents Introduction... 3 Key Components and Architecture... 4 Creating Load Tests... 5 Mobile Load Testing... 9 Test Execution...
More informationAnalysing log files. Yue Mao (mxxyue002@uct.ac.za) Supervisor: Dr Hussein Suleman, Kyle Williams, Gina Paihama. University of Cape Town
Analysing log files Yue Mao (mxxyue002@uct.ac.za) Supervisor: Dr Hussein Suleman, Kyle Williams, Gina Paihama University of Cape Town ABSTRACT A digital repository stores a collection of digital objects
More informationRegression & Load Testing BI EE 11g
Regression & Load Testing BI EE 11g Venkatakrishnan J Who Am I? Venkatakrishnan Janakiraman Over 8+ Years of Oracle BI & EPM experience Managing Director (India), Rittman Mead India Blog at http://www.rittmanmead.com/blog
More informationPrepared By : Manoj Kumar Joshi & Vikas Sawhney
Prepared By : Manoj Kumar Joshi & Vikas Sawhney General Agenda Introduction to Hadoop Architecture Acknowledgement Thanks to all the authors who left their selfexplanatory images on the internet. Thanks
More informationScheduling and Load Balancing in the Parallel ROOT Facility (PROOF)
Scheduling and Load Balancing in the Parallel ROOT Facility (PROOF) Gerardo Ganis CERN E-mail: Gerardo.Ganis@cern.ch CERN Institute of Informatics, University of Warsaw E-mail: Jan.Iwaszkiewicz@cern.ch
More informationContents Introduction... 5 Deployment Considerations... 9 Deployment Architectures... 11
Oracle Primavera Contract Management 14.1 Sizing Guide July 2014 Contents Introduction... 5 Contract Management Database Server... 5 Requirements of the Contract Management Web and Application Servers...
More informationVarious Load Testing Tools
Various Load Testing Tools Animesh Das May 23, 2014 Animesh Das () Various Load Testing Tools May 23, 2014 1 / 39 Outline 3 Open Source Tools 1 Load Testing 2 Tools available for Load Testing 4 Proprietary
More informationUsing GitHub for Rally Apps (Mac Version)
Using GitHub for Rally Apps (Mac Version) SOURCE DOCUMENT (must have a rallydev.com email address to access and edit) Introduction Rally has a working relationship with GitHub to enable customer collaboration
More informationDas HappyFace Meta-Monitoring Framework
Das HappyFace Meta-Monitoring Framework B. Berge, M. Heinrich, G. Quast, A. Scheurer, M. Zvada, DPG Frühjahrstagung Karlsruhe, 28. März 1. April 2011 KIT University of the State of Baden-Wuerttemberg and
More informationCase Study. Web Application for Financial & Economic Data Analysis. www.brainvire.com 2013 Brainvire Infotech Pvt. Ltd Page 1 of 1
Case Study Web Application for Financial & Economic Data Analysis www.brainvire.com 2013 Brainvire Infotech Pvt. Ltd Page 1 of 1 Client Requirement This is a highly customized application for financial
More informationChapter 7. Using Hadoop Cluster and MapReduce
Chapter 7 Using Hadoop Cluster and MapReduce Modeling and Prototyping of RMS for QoS Oriented Grid Page 152 7. Using Hadoop Cluster and MapReduce for Big Data Problems The size of the databases used in
More informationThe Murchison Widefield Array Data Archive System. Chen Wu Int l Centre for Radio Astronomy Research The University of Western Australia
The Murchison Widefield Array Data Archive System Chen Wu Int l Centre for Radio Astronomy Research The University of Western Australia Agenda Dataflow Requirements Solutions & Lessons learnt Open solution
More informationBig Data Storage, Management and challenges. Ahmed Ali-Eldin
Big Data Storage, Management and challenges Ahmed Ali-Eldin (Ambitious) Plan What is Big Data? And Why talk about Big Data? How to store Big Data? BigTables (Google) Dynamo (Amazon) How to process Big
More informationHow SourceForge is Using MongoDB
How SourceForge is Using MongoDB Rick Copeland @rick446 rick@geek.net Geeknet, page 1 SF.net BlackOps : FossFor.us User Editable! Web 2.0! (ish) Not Ugly! Geeknet, page 2 Moving to NoSQL FossFor.us used
More informationGitflow process. Adapt Learning: Gitflow process. Document control
Adapt Learning: Gitflow process Document control Abstract: Presents Totara Social s design goals to ensure subsequent design and development meets the needs of end- users. Author: Fabien O Carroll, Sven
More information1 How to Monitor Performance
1 How to Monitor Performance Contents 1.1. Introduction... 1 1.2. Performance - some theory... 1 1.3. Performance - basic rules... 3 1.4. Recognizing some common performance problems... 3 1.5. Monitoring,
More informationEducational Collaborative Develops Big Data Solution with MongoDB
CASE STUDY OVERVIEW Educational Collaborative Develops Big Data Solution with MongoDB INDUSTRIES Education, Nonprofit LOCATION Durham, NC PROJECT LENGTH 1 year, 5 months APPLICATION SUPPORTED Data driven
More informationDynamic Content Acceleration: Lightning-Fast Web Apps with Amazon CloudFront and Amazon Route 53
Dynamic Content Acceleration: Lightning-Fast Web Apps with Amazon CloudFront and Amazon Route 53 Constantin Gonzalez, Solutions Architect Amazon Web Services Germany GmbH 2014 Amazon.com, Inc. and its
More informationContent. Development Tools 2(63)
Development Tools Content Project management and build, Maven Version control, Git Code coverage, JaCoCo Profiling, NetBeans Static Analyzer, NetBeans Continuous integration, Hudson Development Tools 2(63)
More informationSUCCESFUL TESTING THE CONTINUOUS DELIVERY PROCESS
SUCCESFUL TESTING THE CONTINUOUS DELIVERY PROCESS @huibschoots & @mieldonkers INTRODUCTION Huib Schoots Tester @huibschoots Miel Donkers Developer @mieldonkers TYPICAL Experience with Continuous Delivery?
More informationThe continuum of data management techniques for explicitly managed systems
The continuum of data management techniques for explicitly managed systems Svetozar Miucin, Craig Mustard Simon Fraser University MCES 2013. Montreal Introduction Explicitly Managed Memory systems lack
More informationThe Synergy Between the Object Database, Graph Database, Cloud Computing and NoSQL Paradigms
ICOODB 2010 - Frankfurt, Deutschland The Synergy Between the Object Database, Graph Database, Cloud Computing and NoSQL Paradigms Leon Guzenda - Objectivity, Inc. 1 AGENDA Historical Overview Inherent
More informationPractical Options for Archiving Social Media
Practical Options for Archiving Social Media Content Summary for ALGIM Web-Symposium Presentation 03/05/11 Euan Cochrane, Senior Advisor, Digital Continuity, Archives New Zealand, The Department of Internal
More informationFAQs. This material is built based on. Lambda Architecture. Scaling with a queue. 8/27/2015 Sangmi Pallickara
CS535 Big Data - Fall 2015 W1.B.1 CS535 Big Data - Fall 2015 W1.B.2 CS535 BIG DATA FAQs Wait list Term project topics PART 0. INTRODUCTION 2. A PARADIGM FOR BIG DATA Sangmi Lee Pallickara Computer Science,
More informationVimeo and the Open Source Community
Vimeo and the Open Source Community Summary Contact 1. Introduction 2. Architecture 3. FOSS 4. Conclusions Vittorio Giovara Video Encoding Engineer vittorio@vimeo.com https://vimeo.com/vittoriog FOSS 2
More informationDigital Asset Management (DAM) Protecting, preserving, retrieving and distributing digital assets
Digital Asset Management (DAM) Protecting, preserving, retrieving and distributing digital assets What is DAM? Digital asset management (DAM) consists of management tasks and decisions surrounding the
More informationEMC E20-120. EMC Content Management Foundation Exam(CMF) http://www.examskey.com/e20-120.html
EMC E20-120 EMC Content Management Foundation Exam(CMF) TYPE: DEMO http://www.examskey.com/e20-120.html Examskey EMC E20-120 exam demo product is here for you to test the quality of the product. This EMC
More informationIntegrating Apache Spark with an Enterprise Data Warehouse
Integrating Apache Spark with an Enterprise Warehouse Dr. Michael Wurst, IBM Corporation Architect Spark/R/Python base Integration, In-base Analytics Dr. Toni Bollinger, IBM Corporation Senior Software
More informationSharePoint Server 2010 Capacity Management: Software Boundaries and Limits
SharePoint Server 2010 Capacity Management: Software Boundaries and s This document is provided as-is. Information and views expressed in this document, including URL and other Internet Web site references,
More informationHow To Analyze Log Data On A Web Site
WSG Log Analysis Review I've reviewed ten different log analysis packages based on the following criteria: Analysis priorities (must include ) Traffic patterns (daily, hourly, etc) Most visited urls Referrer
More informationSource Control Systems
Source Control Systems SVN, Git, GitHub SoftUni Team Technical Trainers Software University http://softuni.bg Table of Contents 1. Software Configuration Management (SCM) 2. Version Control Systems: Philosophy
More informationSIDN Server Measurements
SIDN Server Measurements Yuri Schaeffer 1, NLnet Labs NLnet Labs document 2010-003 July 19, 2010 1 Introduction For future capacity planning SIDN would like to have an insight on the required resources
More informationMagento & Zend Benchmarks Version 1.2, 1.3 (with & without Flat Catalogs)
Magento & Zend Benchmarks Version 1.2, 1.3 (with & without Flat Catalogs) 1. Foreword Magento is a PHP/Zend application which intensively uses the CPU. Since version 1.1.6, each new version includes some
More informationVersion Control! Scenarios, Working with Git!
Version Control! Scenarios, Working with Git!! Scenario 1! You finished the assignment at home! VC 2 Scenario 1b! You finished the assignment at home! You get to York to submit and realize you did not
More informationBig Data Visualization with JReport
Big Data Visualization with JReport Dean Yao Director of Marketing Greg Harris Systems Engineer Next Generation BI Visualization JReport is an advanced BI visualization platform: Faster, scalable reports,
More informationZend Server 4.0 Beta 2 Release Announcement What s new in Zend Server 4.0 Beta 2 Updates and Improvements Resolved Issues Installation Issues
Zend Server 4.0 Beta 2 Release Announcement Thank you for your participation in the Zend Server 4.0 beta program. Your involvement will help us ensure we best address your needs and deliver even higher
More informationScalable Web Programming. CS193S - Jan Jannink - 1/12/10
Scalable Web Programming CS193S - Jan Jannink - 1/12/10 Administrative Stuff Computer Forum Career Fair: Wed. 13, 11AM-4PM (Just in case you hadn t seen the tent go up) Any problems with MySQL setup? Review:
More informationFact Sheet In-Memory Analysis
Fact Sheet In-Memory Analysis 1 Copyright Yellowfin International 2010 Contents In Memory Overview...3 Benefits...3 Agile development & rapid delivery...3 Data types supported by the In-Memory Database...4
More informationSuccessful PaaS and CI in the Cloud
Successful PaaS and CI in the Cloud Steven G. Harris steven.g.harris@cloudbees.com @stevengharris AgileALM/EclipseCon 2012 Platform as a Service As-a-Service Examples Today SaaS PaaS "Cloud computing is
More informationClient-aware Cloud Storage
Client-aware Cloud Storage Feng Chen Computer Science & Engineering Louisiana State University Michael Mesnier Circuits & Systems Research Intel Labs Scott Hahn Circuits & Systems Research Intel Labs Cloud
More informationSeattle Innovative Design
About Us is a design communications company based in Seattle, Washington. We specialize in interactive and print design. Our services are web site development, Search Engine Optimization, online store
More informationInformation Retrieval Elasticsearch
Information Retrieval Elasticsearch IR Information retrieval (IR) is the activity of obtaining information resources relevant to an information need from a collection of information resources. Searches
More informationShoal: IaaS Cloud Cache Publisher
University of Victoria Faculty of Engineering Winter 2013 Work Term Report Shoal: IaaS Cloud Cache Publisher Department of Physics University of Victoria Victoria, BC Mike Chester V00711672 Work Term 3
More informationUsing Toaster in a Production Environment
Using Toaster in a Production Environment Alexandru Damian, David Reyna, Belén Barros Pena Yocto Project Developer Day ELCE 17 Oct 2014 Introduction Agenda: What is Toaster Toaster out of the box Toaster
More informationData sharing in the Big Data era
www.bsc.es Data sharing in the Big Data era Anna Queralt and Toni Cortes Storage System Research Group Introduction What ignited our research Different data models: persistent vs. non persistent New storage
More informationhttp://support.oracle.com/
Oracle Primavera Contract Management 14.0 Sizing Guide October 2012 Legal Notices Oracle Primavera Oracle Primavera Contract Management 14.0 Sizing Guide Copyright 1997, 2012, Oracle and/or its affiliates.
More informationDATA SCIENCE CURRICULUM WEEK 1 ONLINE PRE-WORK INSTALLING PACKAGES COMMAND LINE CODE EDITOR PYTHON STATISTICS PROJECT O5 PROJECT O3 PROJECT O2
DATA SCIENCE CURRICULUM Before class even begins, students start an at-home pre-work phase. When they convene in class, students spend the first eight weeks doing iterative, project-centered skill acquisition.
More information