System Requirement Specification for A Distributed Desktop Search and Document Sharing Tool for Local Area Networks



OnurSoft
Onur Tolga Şehitoğlu
November 10, 2012
v1.0

Contents

1 Introduction
  1.1 Purpose
  1.2 Project Scope
  1.3 Definitions, Acronyms, and Abbreviations
  1.4 Overview
2 Background
  2.1 Document Management Systems
  2.2 Desktop Search Tools
  2.3 Peer to Peer Networking
3 Overall Description
  3.1 Product Perspective
  3.2 Product Functions
  3.3 User Types, Constraints and Dependencies
4 Product Features
  4.1 External Interfaces
    4.1.1 Authentication
    4.1.2 File System
    4.1.3 Network Search Request
    4.1.4 Network Download Request
    4.1.5 User Interface, Search
    4.1.6 User Interface, Download
    4.1.7 User Interface, Settings
    4.1.8 User Interface, Connections
    4.1.9 User Interface, Logs
  4.2 Software Functions
    4.2.1 Search index construction
    4.2.2 Authentication
    4.2.3 Authorization
    4.2.4 Distributed search
    4.2.5 Distributed download
    4.2.6 Version management
  4.3 Performance requirements and Design constraints
    4.3.1 Performance
    4.3.2 Security
    4.3.3 Flexibility
References

Chapter 1 Introduction

1.1 Purpose

This document describes the requirements of a distributed desktop search and document sharing software called DistShare. It describes the required product features, constraints and dependencies, and forms a basis for the design and development phases of the project.

1.2 Project Scope

Document exchange among the members of a project is a daily routine activity at the core of the project life cycle. The DistShare project proposes a solution for storing and sharing documents through peer to peer communication, supported by distributed content (i.e. keyword and meta-data) search and authentication.

DistShare is a tool for sharing the local documents of a desktop user with other members of the local area network. It is essentially a combination of a desktop search tool and a document management system. DistShare watches the documents of a local user and constructs a document index, as any desktop search tool does. It also keeps track of the sharing properties of documents, in other words who can access them. It then lets users execute a distributed content search on all computers in the DistShare network, where each user sees all documents that contain the search words and are accessible to that user. After the search, users use the DistShare peer to peer download service to receive the documents. If the same document is available on multiple hosts, it is downloaded in parallel.

1.3 Definitions, Acronyms, and Abbreviations

Document management system: A platform for storing, accessing and sharing documents. Most systems include mechanisms for authentication and version control of documents.

Desktop search tool: Software that keeps track of the documents on a local disk and provides fast access to them through name, meta-data and content keywords.

P2P: Peer to peer network.

1.4 Overview

This document is the software requirements specification for DistShare. Chapter 2 gives background on document management systems, desktop search tools and peer to peer networking. Chapter 3 gives the overall description of the project, its main functions, dependencies and constraints. Chapter 4 gives the specific requirements of the project: system interfaces, functional requirements, and performance and design requirements.

Chapter 2 Background

2.1 Document Management Systems

A document management system is a document store that provides access to multiple users. Such systems are usually realized as central web based repositories that can be accessed through web browsers. They also offer remote file system links, so that users can access documents in the repository as if they were local documents. A document management system has the following components [10]:

- document meta-data
- integration (direct access through file system or other application)
- indexing
- searching
- versioning
- storage
- retrieval
- security
- distribution
- workflow
- capture (printed documents, fax, OCR)
- collaboration
- publishing
- reproduction

The first eight items above also fall within the purpose of DistShare. The remaining items are mostly related to printed documents and the business flow of documents, so they are out of the scope of this project.

Content management systems are similar to document management systems but focus especially on web content and documents.

Some existing document management tools are [1, 3, 2].

2.2 Desktop Search Tools

Desktop search tools provide quick access to documents through previously generated indexes. As the number of files on local filesystems increases, searching for a specific file name, file attribute or content keyword becomes a common and repeated task. In order to accelerate this operation, desktop search software generates a local index that maps search items to document paths.

The basic indexing items are file names and file attributes. In addition to these, meta-data of documents, such as the artist of a music file or the author of a text file, can be extracted from the document content and indexed. Another type of index is the reverse (inverted) index, which maps the individual words contained in documents to the documents that contain them, as web search engines do. A well known framework for document indexing is Lucene [7].

In order to create indexes, a desktop search tool scans all local documents and generates the database. After creating the initial index, these tools keep the index up to date through periodic rescans or by listening to the file change events of the system. Beagle [8], Metatracker [9] and Google Desktop are examples of such tools.

DistShare needs to repeat most of the tasks of a desktop search tool for a group of user supplied share directories. In addition, it needs to keep track of the authentication information of the documents.

2.3 Peer to Peer Networking

One major drawback of central file sharing is the large volume of data and the bandwidth it requires. In a distributed environment, however, copies of a document can be distributed over a group of smaller capacity hosts and, on demand, downloaded in parallel from multiple hosts with smaller bandwidth. This idea is realized in peer to peer networks, where each client is also a server for an object and shares it through peer to peer connections, in contrast to all clients connecting to a single master location. This gives the network resilience and redundancy by nature. If some of the nodes or links fail, a peer to peer network can keep operating on the remaining nodes and links.

Objects in a peer to peer network are addressed through a Distributed Hash Table (DHT), which is basically a mapping from document digest to document meta-data that is stored, searched and redistributed among all nodes of the network.

Since DistShare aims to let everyone share documents with everyone else in the network, it is by nature suitable for peer to peer networking. However, authentication and authorization issues need to be solved, and a distributed search facility is also needed.

There are many frameworks and tools for peer to peer networking, such as [4, 5, 6].
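The digest to meta-data mapping described above can be illustrated with a minimal sketch. It assumes SHA-256 as the digest and uses a single in-memory dictionary as a stand-in for the table. A real DHT spreads the entries over the nodes of the network, and the class and field names here are hypothetical, not taken from any P2P framework.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class DocumentEntry:
    """Meta-data stored under a document digest (field names are hypothetical)."""
    name: str
    size: int
    hosts: set = field(default_factory=set)


def content_digest(data: bytes) -> str:
    """Return the hex SHA-256 digest used as the table key (an assumed choice)."""
    return hashlib.sha256(data).hexdigest()


class DigestTable:
    """In-memory stand-in for a DHT: digest -> DocumentEntry.

    A real DHT would distribute these entries among the nodes; one dict is
    used here only to show the mapping itself.
    """

    def __init__(self):
        self._entries = {}

    def publish(self, host: str, name: str, data: bytes) -> str:
        """Register that `host` holds a document with this content."""
        digest = content_digest(data)
        entry = self._entries.setdefault(digest, DocumentEntry(name, len(data)))
        entry.hosts.add(host)
        return digest

    def lookup(self, digest: str):
        """Return the entry for a digest, or None if it is unknown."""
        return self._entries.get(digest)


table = DigestTable()
key_a = table.publish("host-a", "report.txt", b"quarterly report")
key_b = table.publish("host-b", "report-copy.txt", b"quarterly report")
```

Because the key is derived from the content, identical copies on different hosts end up under the same entry, which is what makes parallel download from several hosts possible.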

Chapter 3 Overall Description

3.1 Product Perspective

DistShare is a tool that works on a desktop computer and interacts with the local filesystem, with the user for setup, and with a peer to peer network for its search and download facilities. It may work along with existing desktop search tools, i.e. use their databases, or maintain its own search functionality. Apart from that, it will work as a self contained tool. It will have some functions of a document management system, such as search and retrieval of documents and marking them to be shared. It will use one of the existing protocols for maintaining a distributed hash table of documents and for the distributed search and download facilities.

3.2 Product Functions

DistShare consists of the following basic tasks:

1. Search index construction
2. Authorization
3. Authentication
4. Distributed search
5. Distributed download
6. Version management

Search index construction is the on demand construction and incremental maintenance of document indexes.
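As an illustration of what such an index might look like, the following is a minimal sketch of a word to document index with incremental maintenance. It is not the DistShare design: the class name, the tokenizer and the rule that every query word must match are assumptions made for the example, and a framework such as Lucene would replace all of this in practice.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> set:
    """Lower-case the text and return its distinct words."""
    return set(re.findall(r"\w+", text.lower()))


class InvertedIndex:
    """Maps each word to the set of document paths containing it."""

    def __init__(self):
        self._postings = defaultdict(set)  # word -> document paths
        self._words_of = {}                # document path -> its words

    def index_document(self, path: str, text: str) -> None:
        """Add a document, replacing any earlier version of it."""
        self.remove_document(path)
        words = tokenize(text)
        self._words_of[path] = words
        for word in words:
            self._postings[word].add(path)

    def remove_document(self, path: str) -> None:
        """Drop every posting that belongs to the document."""
        for word in self._words_of.pop(path, set()):
            self._postings[word].discard(path)
            if not self._postings[word]:
                del self._postings[word]

    def search(self, query: str) -> set:
        """Return the paths of documents containing all query words."""
        words = tokenize(query)
        if not words:
            return set()
        result = None
        for word in words:
            paths = self._postings.get(word, set())
            result = set(paths) if result is None else result & paths
        return result


index = InvertedIndex()
index.index_document("/share/a.txt", "Budget report for the project")
index.index_document("/share/b.txt", "Project meeting notes")
```

Re-indexing a path first removes its old postings, so an updated file never leaves stale words behind. That is the incremental maintenance half of the task.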

Authorization is the setting of the sharing properties of local documents so that they are remotely accessible to a specific group of authenticated users. Otherwise documents are neither displayed in searches initiated by other hosts nor downloadable.

Authentication is the mechanism by which users prove their identity to remote hosts. DistShare needs a means of making sure that the host making a request is carrying out the operation on behalf of the correct user.

Distributed search is the tool's ability to retrieve search results from other hosts in the P2P network. A search initiated on the local host is repeated on the other P2P hosts and the results are collected.

Distributed download is executed when the user asks to download a file found by a search. If exactly the same file is available on multiple hosts, it is downloaded in parallel to utilize the network bandwidth.

Version management is the system's ability to maintain multiple revisions of the same file. The system keeps track of local changes and tries to associate the file with its origin, so that when the file is searched, results with version options can be returned.

3.3 User Types, Constraints and Dependencies

The users of the system are desktop users working on a local area network or the Internet. All users are in the same class.

The operating environment is a desktop computer. Any of Linux, Windows or Mac OS can be considered as the target platform. Interoperability can be expected, since platform independent documents are to be shared. A desktop with a graphical interface is expected for user friendliness.

The major constraint in the system is security. Ordinary users should not be able to access the documents of the desktop user.

Another constraint is the lack of central control. As in other P2P applications, almost all information should be kept in the network nodes, not on a central server. There should be no central control other than an authentication server.

There are freely available libraries for document indexing, P2P and authentication. The project depends on such libraries.

Chapter 4 Product Features

4.1 External Interfaces

4.1.1 Authentication

Each user of the software needs to be authenticated in order to access other hosts. Authentication information should be sharable among the other members of the P2P network. Authentication can be operating system/workgroup based or network based. Depending on the design choices, multiple authentication mechanisms could be supported.

Input is the operating system or a network service. Output is the identification of the user.

4.1.2 File System

All files that are marked to be shared should be watched on the filesystem. On manual invocation, on periodic invocation or as files change, the system needs to update its information about the file.

Input is the file on a local file system and output is the database keeping the search indexes.

4.1.3 Network Search Request

The system should wait for network requests in order to answer remote search requests.

Input is a search sentence and the identity of the remote user. Output is the list of matching files.

4.1.4 Network Download Request

When a matched document is requested, it is served by the hosts holding the file locally. The software needs to serve such requests.

Input is the file locator and the identity of the remote user. Output is the file content, probably in fragments.

4.1.5 User Interface, Search

The major operation in the graphical user interface is the search. A search operation consists of a search sentence, i.e. a list of keywords to search for or an expression in a specific search syntax. This search is sent to the active members of the P2P network and to the local host.

Input is the search sentence. Output is the list of files that are matched, their details and their locators (the locations they can be downloaded from).

4.1.6 User Interface, Download

The user clicks the file to be downloaded and the file is downloaded, probably from multiple locations.

Input is the file locators. Output is the file download progress and the status of the download.

4.1.7 User Interface, Settings

Authorization and other settings of local files can be configured by the user. The authorization and watch settings per directory or file can be set in the user interface.

Input is the file or directory and the settings to be applied.

4.1.8 User Interface, Connections

Existing connections, live hosts and users should be displayed.

Input is the network and system state. Output is the list of hosts, users and ongoing connections.

4.1.9 User Interface, Logs

The history of recent activity is listed: executed searches, remotely activated searches, downloaded files and files downloaded by remote hosts.

Input is the system log database. Output is the list of logs.
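The inputs and outputs of the two network interfaces (4.1.3 and 4.1.4) could be carried in small typed messages. The sketch below is only one possible shape. Every class and field name is hypothetical, and JSON is an assumed wire encoding, since the specification leaves the protocol to the design phase.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class SearchRequest:
    """Input of 4.1.3: who is asking and what they search for."""
    user: str
    query: str


@dataclass
class SearchMatch:
    """One element of the output of 4.1.3."""
    name: str
    locator: str
    host: str


@dataclass
class DownloadRequest:
    """Input of 4.1.4: a fragment of the file behind a locator."""
    user: str
    locator: str
    offset: int
    length: int


# Lookup table used by decode() to rebuild the right message class.
_TYPES = {cls.__name__: cls for cls in (SearchRequest, SearchMatch, DownloadRequest)}


def encode(message) -> bytes:
    """Serialize a message together with its type name."""
    payload = {"type": type(message).__name__, "body": asdict(message)}
    return json.dumps(payload).encode("utf-8")


def decode(raw: bytes):
    """Rebuild a message object from its serialized form."""
    payload = json.loads(raw.decode("utf-8"))
    return _TYPES[payload["type"]](**payload["body"])


request = DownloadRequest(user="alice", locator="abc123", offset=0, length=4096)
round_trip = decode(encode(request))
```

Carrying the user identity in every request lets the serving host apply its authorization rules before answering.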

4.2 Software Functions

4.2.1 Search index construction

4.2.1.1 Meta-data extraction

For each file, the software shall extract meta-data and keywords and update the index accordingly. Meta-data extraction covers text documents, office documents, images, music, video and other possible meta-data types based on file type.

Activated internally. Input is a file path. Based on the file type, the result is written to the search index database.

Severity: must

4.2.1.2 Index reconstruction

On demand, the software shall browse all local files, extract meta-data and construct the indexes.

Activated by user. Input is all files to be watched. 4.2.1.1 is invoked.

4.2.1.3 Index update

When a local file under share control is updated, the database should be incrementally updated. The update can be instantaneous if the system supports it, or can be done through periodic rescans for changed files.

Activated internally by the system on file change. 4.2.1.1 is repeated for the file and the database is updated.

4.2.2 Authentication

4.2.2.1 System authentication

If the system supports workgroup or network based authentication trust, system authentication can provide the identity. The software should take this authentication information from the system.

Activated by user on first execution. The operating system and environment are interacted with. Result is success or failure and the user's identity.
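The periodic rescan variant of index update (4.2.1.3) can be sketched as a comparison of two directory snapshots. This is a minimal illustration that assumes modification times are a sufficient change signal. The function names are hypothetical, and an implementation using file change events of the operating system would look different.

```python
import os


def scan_directory(root: str) -> dict:
    """Return {path: modification time} for every file below root."""
    snapshot = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            path = os.path.join(dirpath, filename)
            snapshot[path] = os.stat(path).st_mtime
    return snapshot


def diff_snapshots(old: dict, new: dict):
    """Compare two snapshots and return (added, changed, removed) path sets."""
    added = set(new) - set(old)
    removed = set(old) - set(new)
    changed = {path for path in set(old) & set(new) if old[path] != new[path]}
    return added, changed, removed
```

On each rescan, meta-data extraction (4.2.1.1) would be repeated for the added and changed paths, and the index entries of the removed paths would be deleted.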

4.2.2.2 Third party authentication

All members of the network can be authenticated through a trusted third party. Single login servers and Kerberos authentication are examples.

Activated by user on first execution. An external service is interacted with. Result is success or failure and the user's identity.

Severity: demanded

4.2.2.3 Public key authentication

Public key exchange through P2P confirmation, as in PGP, is another mechanism for authentication. The software might support this authentication via user to user pairing. This mechanism might work as in social networks, where connection requests are confirmed by both parties.

Activated by user. Input is the user's identity information and the other user to connect to. The other user gets the request. If the other user approves, authentication is successful and the process is repeated automatically afterwards, without user invocation. Otherwise it fails.

Severity: nice to have

4.2.3 Authorization

4.2.3.1 Individual user permissions

The user shall be able to give individual users permission to search and download files.

Activated by user. Input is a file and a user identity. The authorization database is updated.

4.2.3.2 User grouping

The user should be able to group users and define permissions based on user groups as well.

Activated by user. Input is a group of users and a group name. The authorization database is updated. Group add, group delete and group membership modification cases are handled.

Severity: demanded
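Individual and group based permissions (4.2.3.1 and 4.2.3.2) can be sketched with a small in-memory authorization database. The class and method names are hypothetical, and a real implementation would persist this data. The sketch only shows how a file can become accessible either directly or through group membership.

```python
from collections import defaultdict


class AuthorizationDB:
    """In-memory sketch of per-file user and group permissions."""

    def __init__(self):
        self._groups = defaultdict(set)        # group name -> user identities
        self._user_grants = defaultdict(set)   # file path -> user identities
        self._group_grants = defaultdict(set)  # file path -> group names

    def add_to_group(self, group: str, user: str) -> None:
        self._groups[group].add(user)

    def remove_from_group(self, group: str, user: str) -> None:
        self._groups[group].discard(user)

    def grant_user(self, path: str, user: str) -> None:
        self._user_grants[path].add(user)

    def grant_group(self, path: str, group: str) -> None:
        self._group_grants[path].add(group)

    def is_allowed(self, path: str, user: str) -> bool:
        """True if the user is granted directly or via any granted group."""
        if user in self._user_grants.get(path, set()):
            return True
        return any(
            user in self._groups.get(group, set())
            for group in self._group_grants.get(path, set())
        )


db = AuthorizationDB()
db.add_to_group("team", "alice")
db.grant_group("/share/plan.txt", "team")
db.grant_user("/share/plan.txt", "bob")
```

Because group membership is resolved at check time, removing a user from a group takes effect immediately for every file shared with that group.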

4.2.3.3 Public and authenticated groups

Authorization can be granted as public, in which case everyone can download without authentication, or as authenticated, in which case anyone who is authenticated to the system can download. These two groups should be supported.

Severity: demanded

4.2.3.4 User/group exclusion

The user should be able to define permissions that exclude users and groups, so that everyone but the selected users can access.

Severity: demanded

4.2.3.5 Directory based permissions

The user should be able to give directories default permissions, so that all files under that directory, recursively, have a default set of permissions.

User initiated. Input is a directory and a set of permissions. The authorization database is updated. All files under the directory inherit the permissions afterwards.

Severity: demanded

4.2.3.6 Multiple access levels

Permission can be given at multiple levels. For example, operations like search, download and create revision can be granted as different levels.

Severity: demanded

4.2.4 Distributed search

4.2.4.1 Keeping track of alive hosts

The system shall keep track of the currently active instances of the software that are connected. Depending on the underlying mechanism, hosts in the broadcast network, registered hosts and hosts learned through P2P host exchange should be monitored for being alive.

Activated periodically. The host search is repeated, status results are collected and stored.
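Keeping track of alive hosts (4.2.4.1) can be sketched as a table of last response times with a timeout. The sketch assumes that some discovery mechanism, not shown here, reports each response. The class name and the 30 second default are illustrative choices, not requirements.

```python
class AliveHostTracker:
    """Remembers when each host last answered and reports the live ones."""

    def __init__(self, timeout_seconds: float = 30.0):
        self.timeout_seconds = timeout_seconds
        self._last_seen = {}  # host -> time of last response

    def record_response(self, host: str, now: float) -> None:
        """Store the time at which a host answered a status probe."""
        self._last_seen[host] = now

    def alive_hosts(self, now: float) -> list:
        """Return the hosts that answered within the timeout, sorted by name."""
        return sorted(
            host
            for host, seen in self._last_seen.items()
            if now - seen <= self.timeout_seconds
        )

    def forget_stale(self, now: float) -> None:
        """Drop hosts whose last response is older than the timeout."""
        stale = [
            host
            for host, seen in self._last_seen.items()
            if now - seen > self.timeout_seconds
        ]
        for host in stale:
            del self._last_seen[host]


tracker = AliveHostTracker(timeout_seconds=30.0)
tracker.record_response("host-a", now=0.0)
tracker.record_response("host-b", now=20.0)
```

Passing the current time as an argument keeps the sketch deterministic. A running system would pass a monotonic clock reading such as `time.monotonic()`.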

4.2.4.2 Local query execution

The local user can invoke a query on the local search database and the resulting files are listed. The query can be on filename, meta-data or file content.

Activated by user. Search query is the input. The search is executed on the local database and the results are output.

4.2.4.3 Query language support

Queries can be guided through a query language. Partial matches, full matches, multiple word matches and meta-data matches can be supported.

Input is the search sentence. The sentence is parsed and verified. As a result, a query procedure is generated.

Severity: demanded

4.2.4.4 Query executed by a remote user

The software shall wait on a network port and carry out the queries sent by remote users. Query results are filtered by the identity of the user. Only permitted results are shown.

Activated by network user. The identity of the user and the search sentence are input. The query is executed as in 4.2.4.2 and the result is returned over the network.

4.2.4.5 Query executed on a remote host

The software shall repeat the user's query on all alive hosts, collect the results and display them.

Activated by user. Input is the search sentence. The list of alive hosts is retrieved. For each host the query is executed as in 4.2.4.4 and the results are shown in the user interface.

4.2.4.6 Search result detail

Based on the search query and the type of match, the information of each match is displayed. This information contains the filename, owner and meta-data.

Internally activated. Input is the matched file and the query. Output is the detailed information about the file and the query type.

4.2.4.7 Result snippets

In case of a keyword match, the portion of the matched file containing the keywords, or a document summary, is to be displayed.

Internally activated. Input is the matched file and the query. Output is the snippet text.

Severity: nice to have

4.2.4.8 Query result caching

Results of the last executed queries are cached for repeated queries and for revisiting the history.

Internally activated. Input is the search result. The result is written to the cache database. Affects 4.2.4.2 and 4.2.4.4.

Severity: nice to have

4.2.5 Distributed download

4.2.5.1 Keeping track of the files to download

Search results contain a locator, probably a distributed hash table identity of the file. This identity should be searchable through all alive hosts. Multiple hosts may contain the same file. Results respect the user's identity and authorization: if the user is not authorized, the file is not displayed. Each file published to the DHT should contain the information necessary for download access.

4.2.5.2 Handling of file download request

The software shall serve a request to download a portion of a file if the request comes from an authorized user.

Activated over network. Input is the file locator, the identity of the requester and the part of the file that is requested. The requested file part is transferred over P2P.
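Serving one fragment of a file to an authorized requester (4.2.5.2) can be sketched as follows. The sketch makes several assumptions: the locator has already been resolved to a local path, authorization is represented by a plain set of user identities, and the fragment is accompanied by a SHA-256 digest so that the receiver can check it. All function names are hypothetical.

```python
import hashlib


def read_fragment(path: str, offset: int, length: int) -> bytes:
    """Read `length` bytes of the file starting at byte `offset`."""
    with open(path, "rb") as handle:
        handle.seek(offset)
        return handle.read(length)


def serve_fragment(path, offset, length, requester, authorized_users):
    """Return (data, digest) for the fragment, or refuse the request."""
    if requester not in authorized_users:
        raise PermissionError(f"{requester} may not download {path}")
    data = read_fragment(path, offset, length)
    return data, hashlib.sha256(data).hexdigest()


def verify_fragment(data: bytes, expected_digest: str) -> bool:
    """Receiver-side check that a fragment arrived unmodified."""
    return hashlib.sha256(data).hexdigest() == expected_digest
```

A downloader asking several hosts for different offsets of the same file can apply `verify_fragment` to each part and re-request only the parts that fail.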

4.2.5.3 Downloading a file from remote hosts

A download request initiated by the user should be sent to the remote hosts containing the file, and the file should be downloaded. P2P download semantics are followed.

Initiated by user. Input is the file selected by the user. The file request is sent to the remote hosts containing the file. If there are multiple hosts, requests are sent for separate parts. The download is executed in the background and the success status is displayed to the user when available.

4.2.5.4 File integrity control

The file locator and the DHT should contain a digest of the file and of its fragments (see footnote 1). This digest information should be used to check whether fragments were downloaded correctly. P2P frameworks have mechanisms for this. A file download is only completed when all parts of the file pass the integrity check.

Internal operation. Input is the meta-data of the file, containing the fragment checksums, and the fragment that has just been received. Output is the marking of the fragment as complete or not.

Footnote 1: Most P2P applications support fragmentation of large files and keep a summary of integrity information per fragment.

4.2.5.5 Download auto share

The user should be able to configure whether a downloaded file inherits the permissions and is shared for other users to download.

Internal operation. Input is the path of the file that completed download. The file is added to the search index of shared files.

Severity: demanded

4.2.6 Version management

4.2.6.1 Version tracking

When a file is changed on disk, the user is asked whether a new revision is to be created. The document's link to its original file is kept and revision semantics are followed in all search and download operations.

Initiated by system on file write in a shared directory. Input is the new file and the old file information. Output is the new revision stored in the search index.

Severity: nice to have

4.2.6.2 Automated Versioning

The user can configure a specific file or directory to get versions automatically with some maximum period. In other words, after a predefined time interval, when the document is rewritten a version is created automatically without user intervention.

Severity: nice to have

4.2.6.3 All versions download

When multiple versions exist for a document, all versions can be downloaded by the user with a single click.

Modification of 4.2.5.3 with an all download option that downloads all revisions.

Severity: nice to have

4.3 Performance requirements and Design constraints

4.3.1 Performance

In order to provide fair interaction, search queries should be answered in a couple of seconds. This implies preconstruction of the search indexes. Processing the documents on demand per query is not acceptable.

Download performance should be optimized to take advantage of P2P networking.

4.3.2 Security

Since the most private documents of a user could be revealed, security design is of utmost importance in the system. Proper authentication and authorization should be provided without annoying the user.

4.3.3 Flexibility

The set of file types supported in search is not static. For example, most of the electronic document formats in use today have been introduced in the last ten years, and new formats such as EPUB and DjVu are still being introduced.

The system's ability to adapt as new formats are introduced is therefore very important. The user should not need to reinstall the software in order to introduce a new format to meta-data extraction (4.2.1.1).
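One way to meet this flexibility requirement is a registry of meta-data extractors keyed by file extension, to which new extractors can be added at run time. The following is a minimal sketch under that assumption. The registry, the decorator and the example `.txt` extractor are hypothetical names, and a real system would probably load extractors from separately installed plug-ins.

```python
import os

# Maps a lower-case file extension to its extractor function.
_EXTRACTORS = {}


def register_extractor(extension: str):
    """Decorator that registers an extractor for one file extension."""
    def decorator(func):
        _EXTRACTORS[extension.lower()] = func
        return func
    return decorator


def extract_metadata(path: str) -> dict:
    """Return generic meta-data plus whatever a registered extractor adds."""
    metadata = {"name": os.path.basename(path), "size": os.path.getsize(path)}
    extension = os.path.splitext(path)[1].lower()
    extractor = _EXTRACTORS.get(extension)
    if extractor is not None:
        metadata.update(extractor(path))
    return metadata


@register_extractor(".txt")
def extract_text(path: str) -> dict:
    """Example extractor: count the words of a plain text file."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return {"word_count": len(handle.read().split())}
```

Files with an unknown extension still get the generic name and size entries, so they remain searchable by file name until an extractor for their format is registered.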

References

[1] OpenDocMan web site, http://www.opendocman.com/.
[2] Dropbox web site, https://www.dropbox.com/.
[3] Google docs web site, https://docs.google.com/.
[4] Gnutella in Wikipedia, http://en.wikipedia.org/wiki/Gnutella.
[5] Bittorrent in Wikipedia, http://en.wikipedia.org/wiki/BitTorrent.
[6] Direct Connect in Wikipedia, http://en.wikipedia.org/wiki/Direct_Connect_(file_sharing).
[7] Lucene web site, http://lucene.apache.org/core/.
[8] Beagle web site, http://www.beagle-project.org/.
[9] Meta Tracker web site, http://projects.gnome.org/tracker/.
[10] Wikipedia article on Document Management Systems, http://en.wikipedia.org/wiki/dms/.