Longitudinal Study of Contents and Elements in the Scientific Web environment 1
José Luis Ortega, Isidro F. Aguillo and José Antonio Prieto
Internet Lab, Centro de Información y Documentación Científica (CSIC). Joaquín Costa, Madrid (Spain).
{jortega, isidro}@cindoc.csic.es

1 This paper is a pre-print of: Ortega, J. L.; Aguillo, I. F., and Prieto, J. A. (2006) Longitudinal Study of Contents and Elements in the Scientific Web environment, Journal of Information Science, 32(4):

Abstract. The aim of this work is the longitudinal study of the evolution and the state of 738 web sites at two different points in time (1997 and 2004). It tries to establish the rate of growth and decay of the Web and of all the web elements. To this end, the structure and the contents of these web sites are extracted through a crawler and compared at the two moments in time. The main results confirm a growth of web contents and elements, although there is also a high degree of web content decay. The results suggest that in the seven-year period covered by this study the Web is characterized by both strong dynamism and instability.

Keywords: Webometrics; Web persistence; Web growth; Web decay; Linkrot

1. Introduction

Since the beginning of the World Wide Web, different growth behavior patterns have been studied. Pennock et al. [1] discovered that the incoming links of a web site grow with time in accordance with a power law. According to the Internet Systems Consortium [2], web domains have been growing since 1994 at a similar rate. However, the OCLC Web Characterization Project [3], carried out between 1998 and 2000, warns that although the WWW keeps growing, content contribution rates slowed by 1% in that period.

Nevertheless, there is a gap in the literature on web decay, that is, the disappearance of pages in the World Wide Web. Harter and Kim [4] were the first to study the ephemeral nature of the Web, detecting that a third of the electronic citations in e-journals were not available. Lawrence et al. [5] also studied the problems of electronic citations, obtaining similar results. Koehler [6,7,8], one of the most prolific authors in this field, monitored 360 pages and 343 web sites over several years, finding that in 2001 the operative pages had reduced 34.4% and in 2003, 33.8%. Nelson and Allen [9] tested the contents of different e-libraries during one year, finding only 3% of unavailable objects (linkrot). However, they warn that these media are more stable than the rest of the World Wide Web and that their results have to be considered carefully. Fetterly et al. [10], continuing the work of Cho and García-Molina [11], studied the evolution and persistence of 150 million pages for 11 weeks and found that larger pages change more often and more deeply than smaller ones. Bar-Ilan and Peritz [12] queried the term "informetrics" in the most important search engines for 5 years, with the intention of studying the evolution of that discipline on the Web, and found a disappearance rate of 40%. Wouters, Hellsten and Leydesdorff [13] studied the time span features of Google and Altavista and detected great variability. Ortega et al. [14] also detected that Google query results decayed following the pattern of radioactive isotope decay.

2. Objectives

The aim of this paper is to study the state and evolution of 738 web sites at two different moments in time, 1997 and 2004. It intends to establish the increase and decrease of several web objects, to detect the different growth patterns in the web sites studied and to describe the persistence of these objects over time. It also tries to analyse the relationship between several web elements, with the intention of finding out their behaviour at these two moments in time. These web sites were crawled in 1997 and 2004, and the results compared with the intention of analysing their evolution.

3. Methodology

In 1997, web sites were analysed by NetCarta.com [15].
This web site gathered 1000 high-quality web sites in terms of importance and contents. For this reason, most of these web sites are directories, e-libraries and information resources for scientists. These web sites were analysed with the WebMapper 2.0 software of NetCarta. 921 of these web sites were downloaded to develop this study.

In 2004, in order to compare against the results obtained in 1997, these web sites were analysed again with the software Microsoft Site Analyst. This software was used because WebMapper was acquired by Microsoft and merged with Site Analyst, so Microsoft Site Analyst was the only software that could open the reports generated in 1997. For this reason, this study is limited to the features of this software and to the arrangement of elements supplied by this commercial crawler. The software works at different levels and defines a web site according to the URL inserted. Thus, a web site can be an institutional domain, a directory or a single page, and the software then extracts information only from these units. Table 1 shows the elements that Site Analyst generates in the crawl process and that are analysed in this study [16].

Element: Description
Images: GIF, JPEG, and other types of images.
Gateways: Representations of CGI scripts.
Internet: Links to FTP, Telnet, Mailto, WAIS, NNTP, Gopher, and all other Internet services (except HTTP).
Applications: Java applets, executable files, PDF files, Microsoft Word documents, PostScript files, and other applications.
Audio: WAV, AIFF, AU, and other audio files.
Video: MPEG and other video file types.
Text: TXT files and other text files (other than HTML pages), including plain text.
Pages: Number of pages in the web site.
Internal Links: Links from the web site that point to its own pages.
Outlinks: Links from the web site that point to pages in other web sites.

Table 1. Elements generated by Site Analyst and their description.

On first inspection, fewer than half of these web sites had changed their address, specifically 427 (46.3%); 183 (19.8%) had disappeared or had produced failures in the conversion to Microsoft Site Analyst, since comparing both crawls required reopening the WebMapper files in Site Analyst; and only 311 (33.7%) had remained constant. Finally, excluding the disappeared and faulty web sites, 738 web sites were analysed. The following URL contains ( the 921 resources obtained in 1997 and the 738 analysed in 2004.

Next, the data of each web site were extracted from the final reports of Microsoft Site Analyst through a small program written in VBS, and were recorded in a Microsoft Access database. Finally, they were analysed in a Microsoft Excel spreadsheet.

4. Research field

The web sites analysed are significant research web sites which have been working from 1997 until 2004. These web sites are characterised by having a great volume of information and act as an information resource for the scientific community. Table 2 shows the distribution of these web sites according to institutional domain. More than half of the web sites belong to the academic and scholarly domain (56.91%), followed by a considerable government presence (18.56%). The economic sector, by contrast, represents only 10.03%. As we can see, the commercial sector was hardly present in 1997, as the Web was almost exclusively used by academics, and the non-profit sector dominated the Web.

Table 2. Web sites by institutional sector (columns: Sectors, Web sites, Percentage; rows: University, Government, Organisations, Commercial, TOTAL).

In Table 3, the web sites are presented by country, assigned first from the TLD of each site and then through a heuristic exploration. The web sites of the United States account for more than half of the sites studied (52.85%), followed at a distance by the United Kingdom (7.45%) and Canada (6.23%). However, there is only a minor presence of French (1.22%) and Japanese (0.95%) web sites. It is understandable that the United States dominates the net; nevertheless, it is surprising that other countries which carry considerable weight in science, such as France and Japan, were poorly represented, which could suggest that the Web was still expanding.
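The first step of the country assignment described above, reading the country from each site's top-level domain, can be sketched as follows. This is a minimal illustration and not the authors' procedure: the function name `country_from_url`, the small `CCTLD_TO_COUNTRY` mapping and the example URLs are hypothetical. Sites under generic TLDs (.edu, .com, .org) are returned as unresolved, which is where the heuristic exploration mentioned in the text would be needed.

```python
from urllib.parse import urlparse

# Hypothetical, deliberately incomplete mapping from country-code TLDs
# to the country labels used in Table 3.
CCTLD_TO_COUNTRY = {
    "uk": "UK", "ca": "CA", "de": "DE", "it": "IT", "au": "AU",
    "fi": "FI", "fr": "FR", "nl": "NL", "jp": "JP",
}

def country_from_url(url):
    """Return a country label from the URL's TLD, or None when the TLD
    is generic (.edu, .com, ...) and needs manual inspection."""
    host = urlparse(url).hostname or ""
    tld = host.rsplit(".", 1)[-1].lower()
    return CCTLD_TO_COUNTRY.get(tld)

# Example URLs are invented for illustration only.
examples = ["http://www.example.ac.uk/library/", "http://www.example.edu/"]
for url in examples:
    print(url, "->", country_from_url(url))
```

Returning `None` for generic TLDs, rather than guessing a country, keeps the automatic step separate from the manual one.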
Table 3. Web sites by country TLD (columns: Countries, Web Sites, Percentage; rows: USA, UK, CA, DE, IT, AU, FI, FR, NL, JP, Other Countries, TOTAL).

5. Results

Next, the results of the crawl process carried out in 2004 and their comparison with the initial data of 1997 are discussed.

Figure 1. General evolution in the elements of the web sites (Pages, Internal Links, OutLinks, Images, Multimedia, Applications).
Web elements      1997       2004       Growth
Pages             183,488    1,444,
Internal Links    458,456    15,000,
OutLinks          145,       ,
Images            102,504    1,076,
Multimedia        14,401     92,
Applications      10,        ,
TOTAL             914,464    18,557,

Table 4. General growth of elements in web sites.

Table 4 shows that the number of web elements grew substantially. The elements that show the highest rate of growth are Internal Links (32.72 times) and Applications (25.33 times), and the elements that show the lowest growth rate are Outlinks (4.67 times) and Multimedia (6.43 times). It is significant that the number of pages, the main element in a web site, only increased by 7.87 times.

Figure 1 shows the high number of Internal Links with regard to the other elements in 2004. This could be due to the improvement of page navigability, resulting from better information architecture and better web design: as Koehler [8] observed, the percentage of navigational pages increases with respect to the number of content pages over time, which confirms the proliferation of internal links for navigational reasons.

Another element with a significant increase is Applications. This suggests a growing use of scripts and programming languages used to build web pages, such as ASP or PHP. It should be noted that Applications also contains text formats such as .pdf (Portable Document Format) and .ps (PostScript), which are the formats mainly used on the Web for spreading scientific results (articles, reports, etc.), and this suggests the adoption of new formats to disseminate knowledge on the Web. It can also be seen that in 1997 the number of Images was smaller than the number of Outlinks and is now almost double, which confirms the great weight of graphical elements in present web design.
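The growth factors quoted for Table 4 are simple ratios of the 2004 count to the 1997 count. A minimal sketch of that arithmetic follows. The function names are hypothetical, and because several 2004 figures are incomplete in this copy, the example works backwards from the 1997 counts and the factors quoted in the text rather than asserting exact 2004 totals.

```python
def growth_factor(count_1997, count_2004):
    """Ratio of the later count to the earlier one ("times" in the text)."""
    if count_1997 <= 0:
        raise ValueError("the 1997 count must be positive")
    return count_2004 / count_1997

def implied_count(count_1997, factor):
    """Approximate 2004 count implied by a 1997 count and a growth factor."""
    return count_1997 * factor

# 1997 counts from Table 4 and growth factors quoted in the text.
reported = {
    "Pages": (183_488, 7.87),
    "Internal Links": (458_456, 32.72),
}

for element, (count_1997, factor) in reported.items():
    approx_2004 = implied_count(count_1997, factor)
    print(f"{element}: about {approx_2004:,.0f} in 2004")
```

Since the quoted factors are rounded to two decimals, the implied counts are approximations only.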
Web elements      1997       2004       Growth
Gateways          4,735      30,
Other Protocol    52,        ,
Audio             667        3,
Video             856        5,
Text              9,321      51,
Other Media       12,878     83,
TOTAL             80,        ,

Table 5. Increase of multimedia and other elements.

Table 5 shows the increase of the remaining elements. Note that Audio, Video and Other Media are included in the Multimedia element of the previous table. The increase of these other elements (3.95 times) is much lower than that of the elements studied before. The element with the highest growth is Video (6.88 times) and the lowest is Other Protocol (2.76 times). Thus, the multimedia elements (audio, video, etc.) have not increased much, probably due to the low use of these formats to disseminate scientific results, although they were introduced on the Web some time ago.

Figure 2. Percentage and increase of web sites by type of element (legend: Constant, Decrease, Growth).
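Figure 2 groups the web sites, element by element, into those whose count grew, decreased or stayed constant between the two crawls. A minimal sketch of that tally is shown below, assuming each site's counts for one element are available as a (1997, 2004) pair. The function names and the sample counts are hypothetical and do not come from the study's data.

```python
from collections import Counter

def classify_change(count_1997, count_2004):
    """Label a site's change for one element between the two crawls."""
    if count_2004 > count_1997:
        return "growth"
    if count_2004 < count_1997:
        return "decrease"
    return "constant"

def tally_sites(pairs):
    """Return {label: (number of sites, percentage of sites)}."""
    labels = Counter(classify_change(old, new) for old, new in pairs)
    total = sum(labels.values())
    return {label: (n, round(100 * n / total, 2)) for label, n in labels.items()}

# Invented (1997, 2004) counts of one element for five sites.
sample = [(10, 40), (5, 5), (8, 2), (0, 0), (3, 9)]
print(tally_sites(sample))
```

A site that never had the element (0 in both crawls) is counted as constant here, which is one possible reading of the "=" category.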
Table 6. Percentage and increase of web sites by type of element (columns: Element, sites, %, Increment; for each of Pages, Internal Links, Outlinks, Gateways, Images, Other Protocol, Audio, Video, Applications, Text and Other Media, one row each for decrease "<", increase ">" and constant "=").

Figure 2 and Table 6 show the behaviour of the web sites according to the elements studied. Table 6 gives the number of web sites where each element has increased, decreased or remained constant since 1997. For instance, there are 253 (34.28%) web sites that have increased their number of plain text files (Text element) since 1997, 244 (33.06%) web sites that have decreased the number by 0.06 times and 241 (32.66%) sites that have the same number of files as in 1997. In this way, it can be seen how the increase of different elements affects certain web sites. Figure 2 also shows the infrequent use of the Video, Audio and Text elements since 1997.

The number of sites which show increases in the most important formats, such as Pages, Internal Links, Outlinks, Images, Gateways and Other Protocol, is similar to the number of sites showing decreases. Thus, although all the elements have grown overall, there is a significant percentage of sites in which some elements have decreased. This shows that the widespread growth seen in Table 5 is not present in all the web sites studied, and that the pattern of increase and decrease is irregular. From this we can say that there is no single pattern in the evolution of the different elements of a web site.

            Added    Changed   Vanished
Pages       %        17.09%    80.67%
Images      %        11.17%    80.34%
Gateways    %        4.32%     65.08%
Media       %        7.56%     65.49%
Internet    %        10.62%    77.72%
Average     %        10.76%    75.22%

Table 7. Average of added, changed and missed elements.

Table 7 shows the persistence of several web elements relative to the crawl carried out in 1997. First of all, the percentage of added elements is very high in all cases, confirming the strong growth of the WWW in these seven years. 17% of Pages have changed their URL over the seven years, which is close to the rate of 2.2% per year detected by Koehler [6]. The average of all elements that have changed their URL represents only a small percentage (10.76%). This indicates that the level of content reorganisation is low unless this leads to a modification of the page content, although the unchanged elements (24.88%) have a large redistribution. However, the percentage of vanished elements (75.22%) is very high, because only 2 of 10 elements remain unchanged since 1997, indicating the low level of content persistence on the Web.
The percentage of unchanged pages, 19.3% (2.7% per year), is also in line with Koehler [6].

The relationships between different elements have been studied with the aim of detecting how the evolution of certain elements affects others. Tables 8 and 9 show two correlation matrices among the most significant elements.
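The correlation matrices report coefficients written as ρ. A minimal sketch of how one such coefficient could be obtained is given below, assuming that ρ denotes Spearman's rank correlation (the text does not name the method or the software used). The helper names and the per-site counts are invented for illustration, and the sketch does not compute the two-tailed significance level quoted under the tables.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j share this rank
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Invented per-site counts of two elements (one value per web site).
pages = [12, 40, 7, 150, 63, 25]
images = [30, 85, 10, 120, 400, 41]
print(f"rho = {spearman_rho(pages, images):.3f}")
```

Working on ranks rather than raw counts makes the coefficient less sensitive to the few very large sites that dominate web data.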
1997        Images   Internet   Pages   Outlinks
Images      **
Internet    **
Pages       0.684**
OutLinks    ** **

** Correlation is significant at the 0.01 level (2-tailed).

Table 8. Correlation matrix between the main elements in 1997.

For 1997 (Table 8), there is a high degree of correlation between the rest of the Internet services and Outlinks (ρ=0.68), which suggests that Outlinks were only designed to connect the Web with other Internet services, confirming that this was a period when the WWW had not yet absorbed the rest of the services (FTP, Telnet, Mail, etc.). There is also a correlation between Pages and Images (ρ=0.684), which suggests that graphics were an important part of web page design at that time.

2004             Images   Gateways   Internet   Applications   Text   Pages   Internal Links   Outlinks
Images           ** **
Gateways         **
Internet         **
Applications     0.704** ** 0.748**
Text             **
Pages            0.938** ** **
Internal Links   **
Outlinks         ** 0.445** **

** Correlation is significant at the 0.01 level (2-tailed).

Table 9. Correlation matrix between the main elements in 2004.

In 2004, there are changes in the correlations. Table 9 highlights that Images have increased their correlation with Pages (ρ=0.938), indicating the heavy presence of graphical formats in present web design and that the creation of pages runs parallel to the growth in images. Moreover, in 2004 there is a new correlation between Pages and Applications (ρ=0.748): because the Applications element contains textual formats such as pdf and ps, this suggests a meaningful relationship between pages and the supply of new scientific contents in different formats. On the other hand, the Applications element also contains dynamic pages (ASP, PHP), which reinforces this relationship because these are one type of web page. There is also an important correlation between Outlinks and Gateways (ρ=0.464), which demonstrates the proliferation of web-based databases.
In the same sense, the correlation detected in 1997 between Internet and Outlinks decreased in 2004 (ρ=0.445), confirming the hegemonic presence of the Web with respect to the rest of the Internet services, because outlinks now point to the Web more than to other Internet services. Finally, the correlation between Internal Links and Pages (ρ=0.632) confirms the spread of navigational pages, because Internal Links act as structural elements which organise the pages of a web site. Therefore, the more pages there are, the more internal links will exist to arrange these contents.

6. State and Permanence

The permanence and stability of the outlinks of the 738 web sites in 1997 were studied. 145,092 outlinks were counted and checked with the software Xenu's Link Sleuth [17]. Table 10 shows the distribution of the outlinks according to their status.

Status              Frequency   Percentage
not found                       ,44%
Ok                              ,33%
object moved                    ,55%
no such host                    ,56%
Timeout                         ,20%
Forbidden request               ,76%
no connection                   ,56%
server error        177         0,12%
invalid path        121         0,08%
Redirection         103         0,07%
Other               469         0,32%
TOTAL                           ,00%

Table 10. Outlinks status.

The percentage of valid links, if we consider the redirections, is very low (25.40%), while the group of broken (linkrot) or non-operative links is almost three quarters of all outlinks (74.28%). This percentage is similar to the average number of missed elements (75.22%) shown in Table 7. This suggests that the number of missed elements and the percentage of broken links have a similar relationship with respect to stability, because the more elements disappear, the more broken links will exist.

7. Conclusions

This study shows two different moments in the evolution of the Web. On the one hand, in 1997 the Web was a young service that was yet to gain prominence. On the other hand, in 2004 this service was consolidated on the Internet as the main gateway to access the net.

This longitudinal view demonstrates that the Web, since 1997, has been characterised by exponential growth, although the rate of growth of different web elements (pages, links, formats, etc.) is not the same. As we have seen in Table 6, the growth pattern for every element in a web site is yet to be determined. Certain web sites increase in one element or decrease in others in similar percentages. This is why we consider that it is hard to know the evolution of the Web in general, because each web site evolves in a particular way. The high standard deviation in the distribution of objects detected by Koehler [6] confirms our assessment. We think that estimating the evolution of the Web is a very complex task and that, in order to do this, it is necessary to take a wide and heterogeneous sample to obtain satisfactory results. This sample is limited to the scientific field and cannot be extended to the whole Web. Moreover, this sample represents directories and information sources, which is why the results are only representative of this type of web page.

However, according to the results obtained in this survey, we can claim that the observed growth is due to the high contribution of contents, which hides the substantial elimination rate of the Web; in other words, the Web grows at the expense of the deletion of previous contents. This fact is reinforced by the persistence of contents and URLs in these seven years: 75.22% of the original contents have disappeared and the broken links have increased in a similar percentage (74.28%). This fact is disguised by the strong contribution of new contents (1568%) in these seven years. In the future, we can ask whether this rate of content contribution will increase or decrease, whether the Web will stop growing or whether the contents will become more stable. We encourage future work to answer these questions.
This study has also tried to identify the relationships between different web elements. The correlations found allow us to see how these relationships have developed (or not) between 1997 and 2004. For instance, the significant correlation between Pages and Images, which increased over time (1997, ρ=0.684; 2004, ρ=0.938), suggests that images are used more as a graphic element in web page design than as content in themselves. By contrast, the gradual loss of correlation between Outlinks and Internet (1997, ρ=0.68; 2004, ρ=0.445) suggests that links from the Web to other Internet services are disappearing, due to the large amount of Web contents and the gradual absence of the remaining services such as Telnet, WAIS, Gopher, etc., in favour of the Web. The correlations between web elements show how these elements interact with each other and how they structure a web site.

Both the growth of Internal Links (32.72 times) and of Applications (25.33 times) demonstrate that the tested web sites have reached a period of maturity. On the one hand, the growth of Internal Links means more complexity in the design and structure of a web site, as well as more content arrangement. On the other hand, the growth of Applications means there is a higher proportion of science-related contents, because these formats (pdf, ps, etc.) are used to publish final structural contents such as articles and reports. However, the low use of multimedia elements (6.43 times) suggests that many web sites use the Web in a traditional way and do not fully exploit the facilities that the technology offers. We think that the Web is the best vehicle to disseminate scientific results in ways that are not easily achieved with more traditional methods, e.g. the use of audio via the Web by the Physics and Biology communities and the use of video via the Web in Psychology or Surgery.

8. References

[1] D. Pennock, G.W. Flake, S. Lawrence, E.J. Glover, C.L. Giles, Winners don't take all: Characterizing the competition for links on the web, Proc. Natl. Acad. Sci. USA 99 (8) (2002) Available at: (accessed 28 October 2005).
[2] Internet Systems Consortium, Inc, Redwood, CA. (2004). Available at: (accessed 28 October 2005).
[3] E.T. O'Neill, B.F. Lavoie, R. Bennet, Trends in the Evolution of the Public Web, D-Lib Magazine 9 (4) (2003).
[4] S. Harter, H. Kim, Electronic journals and scholarly communication: a citation and reference study, Information Research 2 (1) (1996) paper 9. Available at: (accessed 28 October 2005)
[5] S. Lawrence, F. Coetzee, E. Glover, D. Pennock, G. Flake, F. Nielsen, B. Krovetz, A. Kruger, L. Giles, Persistence of Web References in Scientific Research, IEEE Computer 34 (2) (2003)
[6] W. Koehler, An Analysis of Web page and Web site constancy and permanence, Journal of the American Society for Information Science, 50 (2) (1999)
[7] W. Koehler, Web page change and persistence: a four-year longitudinal study, Journal of the American Society for Information Science and Technology, 53 (2) (2002)
[8] W. Koehler, A longitudinal study of Web pages continued: a report after six years, Information Research, 9 (2) (2004) paper 174. Available at: (accessed 28 October 2005)
[9] M. Nelson, B. Allen, Object persistence and availability in digital libraries, D-Lib Magazine 8 (1) (2002). Available at: (accessed 28 October 2005)
[10] D. Fetterly, M. Manasse, M. Najork, J.L. Wiener, A Large-Scale Study of the Evolution of Web pages, Software Practice and Experience 1 (1) (2003) 1-27
[11] J. Cho, H. García-Molina, The evolution of the web and implications for an incremental crawler, Proceedings of the 26th International Conference on Very Large Databases, (2000)
[12] J. Bar-Ilan, B.C. Peritz, Evolution, Continuity, and Disappearance of Documents on a Specific Topic on the Web: A Longitudinal Study of Informetrics, Journal of the American Society for Information Science and Technology, 55 (11) (2004)
[13] P. Wouters, I. Hellsten, L. Leydesdorff, Internet Time and the reliability of Search Engines, First Monday, 9 (10) (2004) Available at: (accessed 28 October 2005)
[14] J.L. Ortega, J.A. Prieto, N. Arroyo, V.M. Pareja, I.F. Aguillo, Análisis de la persistencia y del estado de páginas web en los resultados de Google, 9ª Jornadas Españolas de Documentación FESABID 2005, Madrid, 14 y 15 de Abril (2005). Available at: (accessed 28 October 2005)
[15] NetCarta.com (1997) NetCarta WebMap Library. Available at: (accessed 16 April 1997)
[16] N. Arroyo, V. Pareja, I. Aguillo, Description of Web Data in D3.1. Deliverable. IST (2003) Available at: Data description.pdf (accessed 28 October 2005)
[17] Xenu's Link Sleuth. Ver. 1.2f [s. l.]: Tilman Hausherr, c Software. Available at: (accessed 28 October 2005)
More informationFig (1) (a) Server-side scripting with PHP. (b) Client-side scripting with JavaScript.
Client-Side Dynamic Web Page Generation CGI, PHP, JSP, and ASP scripts solve the problem of handling forms and interactions with databases on the server. They can all accept incoming information from forms,
More informationInternet Technologies. World Wide Web (WWW) Proxy Server Network Address Translator (NAT)
Internet Technologies World Wide Web (WWW) Proxy Server Network Address Translator (NAT) What is WWW? System of interlinked Hypertext documents Text, Images, Videos, and other multimedia documents navigate
More informationIntroduction to Web Technology. Content of the course. What is the Internet? Diana Inkpen
Introduction to Web Technology Content of the course Diana Inkpen The Internet and the WWW. Internet Connectivity. Basic Internet Services. University of Ottawa School of Information Technology and Engineering
More informationSOA, case Google. Faculty of technology management 07.12.2009 Information Technology Service Oriented Communications CT30A8901.
Faculty of technology management 07.12.2009 Information Technology Service Oriented Communications CT30A8901 SOA, case Google Written by: Sampo Syrjäläinen, 0337918 Jukka Hilvonen, 0337840 1 Contents 1.
More informationLesson Overview. Getting Started. The Internet WWW
Lesson Overview Getting Started Learning Web Design: Chapter 1 and Chapter 2 What is the Internet? History of the Internet Anatomy of a Web Page What is the Web Made Of? Careers in Web Development Web-Related
More informationManagement of Storage Devices and File Formats in Web Archive Systems
The Ninth International Symposium on Operations Research and Its Applications (ISORA 10) Chengdu-Jiuzhaigou, China, August 19 23, 2010 Copyright 2010 ORSC & APORC, pp. 356 361 Management of Storage Devices
More informationANALYZING OF THE EVOLUTION OF WEB PAGES BY USING A DOMAIN BASED WEB CRAWLER
- 151 - Journal of the Technical University Sofia, branch Plovdiv Fundamental Sciences and Applications, Vol. 16, 2011 International Conference Engineering, Technologies and Systems TechSys 2011 BULGARIA
More informationTHE RANKING WEB NEW INDICATORS FOR NEW NEEDS. 2 nd International Workshop on University Web Rankings CCHS-CSIC, Madrid (Spain).
2 nd International Workshop on University Web Rankings CCHS-CSIC, Madrid (Spain). April 21 st 2009 THE RANKING WEB NEW INDICATORS FOR NEW NEEDS Isidro F. Aguillo Cybermetrics Lab. CCHS-CSIC isidro.aguillo@cchs.csic.es
More informationCitations in scientific communication
**** 1 Citations in scientific communication Citing information sources in your documents 2 Citing information sources in your documents: introduction When writing a document, in most cases you should
More informationProtocolo HTTP. Web and HTTP. HTTP overview. HTTP overview
Web and HTTP Protocolo HTTP Web page consists of objects Object can be HTML file, JPEG image, Java applet, audio file, Web page consists of base HTML-file which includes several referenced objects Each
More informationMultimedia Systems Design
32027 Multimedia Systems Design Core Concepts of the WWW The WWW principle of universal readership is that once information is available, it should be accessible from any type of computer, in any country,
More informationN-CAP Users Guide Everything You Need to Know About Using the Internet! How Firewalls Work
N-CAP Users Guide Everything You Need to Know About Using the Internet! How Firewalls Work How Firewalls Work By: Jeff Tyson If you have been using the internet for any length of time, and especially if
More informationIC3 Internet and Computing Core Certification Guide
IC3 Internet and Computing Core Certification Guide Global Standard 4 Living Online Lesson 12: The World Wide Web CCI Learning Solutions Inc. 1 Lesson Objectives the difference between the Internet, the
More informationThe Open University s repository of research publications and other research outputs. Age differences in graduate employment across Europe
Open Research Online The Open University s repository of research publications and other research outputs Age differences in graduate employment across Europe Other How to cite: Little, Brenda and Tang,
More informationInternet Jargon. Address: See Uniform Resource Locator.
Internet Jargon Address: See Uniform Resource Locator. Address Bar: Part of the window in Internet Explorer that displays the URL of the web site currently being viewed. This is also the location where
More informationNovell ZENworks Asset Management 7.5
Novell ZENworks Asset Management 7.5 w w w. n o v e l l. c o m October 2006 USING THE WEB CONSOLE Table Of Contents Getting Started with ZENworks Asset Management Web Console... 1 How to Get Started...
More informationUSE OF INFORMATION SOURCES AMONGST POSTGRADUATE STUDENTS IN COMPUTER SCIENCE AND SOFTWARE ENGINEERING A CITATION ANALYSIS YIP SUMIN
USE OF INFORMATION SOURCES AMONGST POSTGRADUATE STUDENTS IN COMPUTER SCIENCE AND SOFTWARE ENGINEERING A CITATION ANALYSIS YIP SUMIN A dissertation submitted in partial fulfillment of requirements for the
More informationGenericServ, a Generic Server for Web Application Development
EurAsia-ICT 2002, Shiraz-Iran, 29-31 Oct. GenericServ, a Generic Server for Web Application Development Samar TAWBI PHD student tawbi@irit.fr Bilal CHEBARO Assistant professor bchebaro@ul.edu.lb Abstract
More informationScientific Research Activity and Communication Measured With Cybermetrics Indicators
Scientific Research Activity and Communication Measured With Cybermetrics Indicators Isidro F. Aguillo, Begoña Granadino, José L. Ortega, and José A. Prieto Laboratorio de Internet, CINDOC-CSIC, Joaquín
More informationCOURSE CONTENT FOR WINTER TRAINING ON Web Development using PHP & MySql
COURSE CONTENT FOR WINTER TRAINING ON Web Development using PHP & MySql 1 About WEB DEVELOPMENT Among web professionals, "web development" refers to the design aspects of building web sites. Web development
More informationScholarly Use of Web Archives
Scholarly Use of Web Archives Helen Hockx-Yu Head of Web Archiving British Library 15 February 2013 Web Archiving initiatives worldwide http://en.wikipedia.org/wiki/file:map_of_web_archiving_initiatives_worldwide.png
More information1. Digital Asset Management User Guide... 2 1.1 Digital Asset Management Concepts... 2 1.2 Working with digital assets... 4 1.2.1 Importing assets in
1. Digital Asset Management User Guide........................................................................... 2 1.1 Digital Asset Management Concepts........................................................................
More informationHACKER INTELLIGENCE INITIATIVE. The Secret Behind CryptoWall s Success
HACKER INTELLIGENCE INITIATIVE The Secret Behind 1 1. Introduction The Imperva Application Defense Center (ADC) is a premier research organization for security analysis, vulnerability discovery, and compliance
More informationUser Guide to the Content Analysis Tool
User Guide to the Content Analysis Tool User Guide To The Content Analysis Tool 1 Contents Introduction... 3 Setting Up a New Job... 3 The Dashboard... 7 Job Queue... 8 Completed Jobs List... 8 Job Details
More informationERIE COMMUNITY COLLEGE COURSE OUTLINE A. COURSE TITLE: CS 103 - WEB DEVELOPMENT AND PROGRAMMING FUNDAMENTALS
ERIE COMMUNITY COLLEGE COURSE OUTLINE A. COURSE TITLE: CS 103 - WEB DEVELOPMENT AND PROGRAMMING FUNDAMENTALS B. CURRICULUM: Mathematics / Computer Science Unit Offering C. CATALOG DESCRIPTION: (N,C,S)
More informationEmail Data Protection. Administrator Guide
Email Data Protection Administrator Guide Email Data Protection Administrator Guide Documentation version: 1.0 Legal Notice Legal Notice Copyright 2015 Symantec Corporation. All rights reserved. Symantec,
More informationLink Analysis and Site Structure in Information Retrieval
Link Analysis and Site Structure in Information Retrieval Thomas Mandl Information Science Universität Hildesheim Marienburger Platz 22 31141 Hildesheim - Germany mandl@uni-hildesheim.de Abstract: Link
More informationGUIDE TO WEBSITES AND E-COMMERCE
GUIDE TO WEBSITES AND E-COMMERCE Version 1.0, 26-Sept-01 This document is available from www.webcentro.com.au 2001, WebCentro WebCentro Guide To Websites And E-commerce CONTENTS 1. What is a Website? 1
More informationCreating Interactive PDF Documents with CorelDRAW
Creating Interactive PDF Documents with CorelDRAW By Steve Bain When it comes to choosing file formats for digital publishing, the Adobe Portable Document Format (PDF) is the winner by far. It's essentially
More informationA Corpus Linguistics-based Approach for Estimating Arabic Online Content
A Corpus Linguistics-based Approach for Estimating Arabic Online Content Anas Tawileh Systematics Consulting anas@systematics.ca Mansour Al Ghamedi King Abdulaziz City for Science and Technology mghamdi@kacst.edu.sa
More informationWeb Design and Development ACS-1809
Web Design and Development ACS-1809 Chapter 1 9/9/2015 1 Pre-class Housekeeping Course Outline Text book : HTML A beginner s guide, Wendy Willard, 5 th edition Work on HTML files On Windows PCs Tons of
More informationTHE SERVICES A UNIVERSITY WEBSITE SHOULD OFFER
THE SERVICES A UNIVERSITY WEBSITE SHOULD OFFER J.L. BERNIER, M. BARCHÉIN, A. CAÑAS, C. GÓMEZ-VALENZUELA AND J.J. MERELO Dpto. Arquitectura y Tecnología de Computadores, Universidad de Granada, Granada,
More informationSUBJECT CODE : 4074 PERIODS/WEEK : 4 PERIODS/ SEMESTER : 72 CREDIT : 4 TIME SCHEDULE UNIT TOPIC PERIODS 1. INTERNET FUNDAMENTALS & HTML Test 1
SUBJECT TITLE : WEB TECHNOLOGY SUBJECT CODE : 4074 PERIODS/WEEK : 4 PERIODS/ SEMESTER : 72 CREDIT : 4 TIME SCHEDULE UNIT TOPIC PERIODS 1. INTERNET FUNDAMENTALS & HTML Test 1 16 02 2. CSS & JAVASCRIPT Test
More informationM3-R3: INTERNET AND WEB DESIGN
M3-R3: INTERNET AND WEB DESIGN NOTE: 1. There are TWO PARTS in this Module/Paper. PART ONE contains FOUR questions and PART TWO contains FIVE questions. 2. PART ONE is to be answered in the TEAR-OFF ANSWER
More informationRanking Web of repositories: End users point of view?
Brasilia (Brazil), 19 th October 2012 Ranking Web of repositories: End users point of view? Isidro F. Aguillo Editor of the Rankings Web Cybermetrics Lab CSIC. Spain 2 Agenda A classification of repositories
More informationHow To Understand The History Of The Web (Web)
(World Wide) Web WWW A way to connect computers that provide information (servers) with computers that ask for it (clients like you and me) uses the Internet, but it's not the same as the Internet URL
More informationInstructions for Access to Summary Traffic Data by GÉANT Partners and other Organisations
Contract Number: IST-2000-26417 Project Title: Deliverable D8 : Instructions for Access to Summary Traffic Data by GÉANT Partners and other Organisations Contractual Date: 31 May 2002 Actual Date: 14 August
More informationHTTP. Internet Engineering. Fall 2015. Bahador Bakhshi CE & IT Department, Amirkabir University of Technology
HTTP Internet Engineering Fall 2015 Bahador Bakhshi CE & IT Department, Amirkabir University of Technology Questions Q1) How do web server and client browser talk to each other? Q1.1) What is the common
More information11.5 E-THESIS SUBMISSION PROCEDURE (RESEARCH DEGREES)
11.5 E-THESIS SUBMISSION PROCEDURE (RESEARCH DEGREES) 1 E-THESIS SUBMISSION PROCEDURE File format: E-Thesis - the following file formats will be accepted for deposit: Format Minimum version PDF 6.0 Microsoft
More informationInternet Technologies_1. Doc. Ing. František Huňka, CSc.
1 Internet Technologies_1 Doc. Ing. František Huňka, CSc. Outline of the Course 2 Internet and www history. Markup languages. Software tools. HTTP protocol. Basic architecture of the web systems. XHTML
More informationBy : Khalid Alfalqi Department of Computer Science, Umm Al-Qura University
By : Khalid Alfalqi Department of Computer Science, Umm Al-Qura University History of Web History of the Internet Basic Web System Architecture URL DNS Creating Static and Dynamic Information Security
More informationArti Tyagi Sunita Choudhary
Volume 5, Issue 3, March 2015 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Web Usage Mining
More informationThe FAO Open Archive: Enhancing Access to FAO Publications Using International Standards and Exchange Protocols
The FAO Open Archive: Enhancing Access to FAO Publications Using International Standards and Exchange Protocols Claudia Nicolai; Imma Subirats; Stephen Katz Food and Agriculture Organization of the United
More informationEnhanced Library Database Interface at NTU Library
Enhanced Library Database Interface at NTU Library Nurhazman Abdul Aziz Michael Tan Siew Chye Hazel Loh Nanyang Technological University Library Abstract This project sought to develop an integrated framework
More informationEnd User Guide The guide for email/ftp account owner
End User Guide The guide for email/ftp account owner ServerDirector Version 3.7 Table Of Contents Introduction...1 Logging In...1 Logging Out...3 Installing SSL License...3 System Requirements...4 Navigating...4
More informationHow To Analyze Web Server Log Files, Log Files And Log Files Of A Website With A Web Mining Tool
International Journal of Advanced Computer and Mathematical Sciences ISSN 2230-9624. Vol 4, Issue 1, 2013, pp1-8 http://bipublication.com ANALYSIS OF WEB SERVER LOG FILES TO INCREASE THE EFFECTIVENESS
More informationC Consiglio Nazionale delle Ricerche
DL00 Experience in implementing a Document Delivery Service Francesco Gennai,Marina B uzzi,laura Abba Istituto perle ApplicazioniTelem atiche {Francesco.G ennai,m arina.buzzi,laura.abba}@ iat.cnr.it Silvana
More informationA Concept for an Electronic Magazine
TERENA-NORDUnet Networking Conference (TNNC) 1999 1 A Concept for an Electronic Magazine Alexander von Berg Helmut Pralle University of Hanover, Institute for Computer Networks and Distributed Systems
More informationXtreeme Search Engine Studio Help. 2007 Xtreeme
Xtreeme Search Engine Studio Help 2007 Xtreeme I Search Engine Studio Help Table of Contents Part I Introduction 2 Part II Requirements 4 Part III Features 7 Part IV Quick Start Tutorials 9 1 Steps to
More informationCSE 203 Web Programming 1. Prepared by: Asst. Prof. Dr. Maryam Eskandari
CSE 203 Web Programming 1 Prepared by: Asst. Prof. Dr. Maryam Eskandari Outline Basic concepts related to design and implement a website. HTML/XHTML Dynamic HTML Cascading Style Sheets (CSS) Basic JavaScript
More informationWeb Log Mining: A Study of User Sessions
Web Log Mining: A Study of User Sessions Maristella Agosti and Giorgio Maria Di Nunzio Department of Information Engineering University of Padua Via Gradegnigo /a, Padova, Italy {agosti, dinunzio}@dei.unipd.it
More informationSCHOOL DISTRICT OF ESCAMBIA COUNTY
SCHOOL DISTRICT OF ESCAMBIA COUNTY JOB DESCRIPTION Programmer Analyst I Web Technologies PROGRAMMER ANALYST I WEB TECHNOLOGIES QUALIFICATIONS: (1) Bachelor s Degree from an accredited educational institution
More informationAnalyzing Download Time Performance of University Websites in India
, pp.1-6 http://dx.doi.org/10.14257/ijwse.2014.1.1.01 Analyzing Time Performance of University Websites in India G. Sreedhar Associate Professor Department of Computer Science, Rashtriya Sanskrit Vidyapeetha
More informationEnriched Links: A Framework For Improving Web Navigation Using Pop-Up Views
Enriched Links: A Framework For Improving Web Navigation Using Pop-Up Views Gary Geisler Interaction Design Laboratory School of Information and Library Science University of North Carolina at Chapel Hill
More informationAge differences in graduate employment across Europe
November 2008 The Flexible Professional in the Knowledge Society new demands on higher education in Europe (Report 5) Age differences in graduate employment across Europe Report to HEFCE by Centre for
More informationChapter 1 Programming Languages for Web Applications
Chapter 1 Programming Languages for Web Applications Introduction Web-related programming tasks include HTML page authoring, CGI programming, generating and parsing HTML/XHTML and XML (extensible Markup
More informationOzgur Aktunc Assistant Professor of Software Engineering St. Mary s University
Ozgur Aktunc Assistant Professor of Software Engineering St. Mary s University WORLD INTERNET USAGE AND POPULATION STATISTICS World Regions Population ( 2010 Est.) Internet Users Dec. 31, 2000 Internet
More informationA CLIENT-ORIENTATED DYNAMIC WEB SERVER. Cristina Hava Muntean, Jennifer McManis, John Murphy 1 and Liam Murphy 2. Abstract
A CLIENT-ORIENTATED DYNAMIC WEB SERVER Cristina Hava Muntean, Jennifer McManis, John Murphy 1 and Liam Murphy 2 Abstract The cost of computer systems has decreased continuously in recent years, leading
More informationBradford Scholars Digital Preservation Policy
DIGITAL PRESERVATION The value of the research outputs produced by staff and research students at the University of Bradford cannot be over emphasised in demonstrating the scientific, societal and economic
More informationBusiness Process Management with @enterprise
Business Process Management with @enterprise March 2014 Groiss Informatics GmbH 1 Introduction Process orientation enables modern organizations to focus on the valueadding core processes and increase
More informationFunctional Requirements for Digital Asset Management Project version 3.0 11/30/2006
/30/2006 2 3 4 5 6 7 8 9 0 2 3 4 5 6 7 8 9 20 2 22 23 24 25 26 27 28 29 30 3 32 33 34 35 36 37 38 39 = required; 2 = optional; 3 = not required functional requirements Discovery tools available to end-users:
More informationModule 5 The Internet as an Information Resource
Module 5 The Internet as an Information Resource Lesson 2 How to search for Information on the Internet. UNESCO EIPICT MODULE 5. LESSON 2 1 Scope What are the ways to find information on the Internet?
More informationSYLLABUS & COURSE OUTLINE
1 GENERAL INFORMATION SYLLABUS & COURSE OUTLINE Course Title and Number: CMAT 212-WF1 (5073) Interactive Multimedia Design Instructor s Name: James House, Jr. Office Phone: 301-784-5308 e-mail: jhouse@allegany.edu
More information1 Introduction: Network Applications
1 Introduction: Network Applications Some Network Apps E-mail Web Instant messaging Remote login P2P file sharing Multi-user network games Streaming stored video clips Internet telephone Real-time video
More informationProgramming SIP Services University Infoline Service
Programming SIP Services University Infoline Service Tatiana Kováčiková, Pavol Segeč Department of Information Networks University of Zilina Moyzesova 20, 010 26 SLOVAKIA Abstract: Internet telephony now
More informationRemoving Web Spam Links from Search Engine Results
Removing Web Spam Links from Search Engine Results Manuel EGELE pizzaman@iseclab.org, 1 Overview Search Engine Optimization and definition of web spam Motivation Approach Inferring importance of features
More informationAnalyzing the Different Attributes of Web Log Files To Have An Effective Web Mining
Analyzing the Different Attributes of Web Log Files To Have An Effective Web Mining Jaswinder Kaur #1, Dr. Kanwal Garg #2 #1 Ph.D. Scholar, Department of Computer Science & Applications Kurukshetra University,
More informationA Multimedia Call Centre on the Internet
A Multimedia Call Centre on the Internet Chai Kiat Yeo, Siu Cheung Hui, *Ing Yann Soon School of Computer Engineering *School of Electrical & Electronics Engineering Nanyang Technological University Nanyang
More information