A VISION OF THE ROLE AND FUTURE OF WEB ARCHIVES
Kalev H. Leetaru
Graduate School of Library and Information Science, University of Illinois


Imagine a world in which libraries and archives had never existed: no institution had ever systematically collected or preserved our collective cultural past, and every book, letter, or document was created, read, and then immediately thrown away. What would we know about our past? Yet that is precisely what is happening with the web. More and more of our daily lives occur within the digital world, yet more than two decades after the birth of the modern web, the libraries and archives of this world are still just being formed.

We've reached an incredible point in society. Every single day a quarter billion photographs are uploaded to Facebook, 300 billion emails are sent, and 340 million tweets are posted to Twitter. There are more than 644 million websites, with 150,000 new ones added each day, and upwards of 156 million blogs. Even more incredibly, the growth rate of content creation in the digital world is exploding. The entire New York Times over the last 60 years contained around 3 billion words; more than 8 billion words are posted to Twitter every single day. That's right: every 24 hours there are 2.5 times as many words posted to Twitter as appeared in every article of every issue of the paper of record of the United States over the last half century. By some estimates there have been 50 trillion words in all of the books published over the last half-millennium; at its current growth rate, Twitter will reach that milestone less than three years from now. Nearly a third of the planet's population is now connected to the internet, and there are as many cell phones as there are people on earth.

Yet, for the most part, we consume all of this information as it arrives and discard it just as quickly, giving little thought to posterity. That's where web archives come in: to make sure that a few years, decades, centuries, and millennia from now we will still have at least a partial written record of human society at the dawn of the twenty-first century.

THE WEB ARCHIVE IN TODAY'S WORLD

The loss of the Library of Alexandria, once the greatest library on earth, created an enormous hole in our understanding of the ancient world. Imagine if that library had not only persisted to the present day, but had continued to collect materials through the millennia. Yet in the web era we are repeating this cycle of loss, not through a fire or other sudden event like the one that destroyed the Library of Alexandria, but through inaction: we are simply not collecting it. The dawn of the digital world exists in the archives of just a few organizations. Many mailing lists and early services like Gopher have largely been lost, while organizations such as Google have invested considerable resources in resurrecting others like USENET. The earliest years of the web are gone forever, but beginning in 1996 the Internet Archive began capturing snapshots, giving us one of the few records of the early iterations of this world. Organizations like the International Internet Preservation Consortium (IIPC) are helping to bring web archivists from across the world and across disciplines together to share experiences and best practices and to forge collaborations that advance these critical efforts.

UNINTENDED USES

Archives exist to preserve a sample of the world for future generations. They accept that they cannot archive everything and don't try to: they operate as opportunistic collectors. Traditional humanities and social sciences scholarship was designed around these limitations: the tradition of deep reading of a small number of works in the humanities was born of this model. Yet a new generation of researchers is increasingly using archives in ways they weren't intended for, and these researchers need a far greater array of information on how those archives are created in order to anticipate biases and their impacts on findings.

The Library of Congress's Chronicling America site, while technically a web-delivered digital library rather than a web archive, offers an example of why greater insight into the archiving process is critical for research. Using the site recently for a project, my search returned ten times as many hits for my topic in El Paso, Texas newspapers as it did for New York City. Further inspection showed this was because Chronicling America had more content from El Paso newspapers during this time period than from New York City papers, not because El Paso papers covered my topic in more detail. Part of this issue stems from the acquisition model of Chronicling America: each individual state determines the order in which it digitizes the newspapers printed within its borders. One state might begin with smaller papers while another begins with larger papers; one state might digitize a particular year from every paper, while another might digitize the entirety of each paper in turn. Chronicling America also excludes papers that have already been digitized by commercial vendors: thus New York City's largest paper, the New York Times, is not present in the archive. This landscape introduces significant artifacts into searches, but normalization procedures can help address them. Doing so, however, requires a bibliography listing every page from every paper included in the archive. That would have allowed me to convert my search results from a raw count of matching newspaper pages into a percentage of all pages from each city, accounting for there being more content from El Paso than from New York City.

This is even clearer when conducting searches of the historic New York Times. A search of the Times for any keyword over the period 1945 to the present will show its use declining by 50% over that period. This is not a reflection of the term declining in use, but of the fact that the Times itself shrank by more than half over this period. Similarly, searches covering the year 1978 will show an 88-day period in which the term was never used. This is not because the term dropped out of favor during that period, but because a machinist's strike halted the paper's publication entirely. Having an index of the total number of articles published each day (and thus the possible universe of articles the term could have been used in) allows the raw counts to be normalized to yield the true picture of the term's usage. However, no web archive today offers such a master index of its holdings.
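To make the normalization concrete, the sketch below converts raw hit counts into rates against the archive's total holdings per city. It assumes the kind of per-city page index (the "bibliography" argued for above) that no archive currently publishes; all numbers and names are illustrative.

```python
# Sketch: normalizing raw keyword hits against an archive's holdings.
# Assumes a hypothetical per-city index of total digitized pages -- the
# "bibliography" argued for above. All numbers here are illustrative.

raw_hits = {"El Paso": 1200, "New York City": 130}         # pages matching the query
total_pages = {"El Paso": 48_000, "New York City": 9_500}  # all archived pages, per city

def normalized_rate(city: str) -> float:
    """Return hits as a percentage of all archived pages for the city."""
    return 100.0 * raw_hits[city] / total_pages[city]

for city in raw_hits:
    print(f"{city}: {raw_hits[city]} raw hits, "
          f"{normalized_rate(city):.2f}% of archived pages")
```

The raw disparity between the two cities then becomes a difference in rates that reflects coverage rather than the accident of what happened to be digitized first.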
One of the core optimizations used by web crawlers can have a significant impact on certain classes of research. Nearly every web archive uses crawlers designed to measure the rate of change of a site (i.e., how often, on average, pages on that site change) in order to crawl sites that change often faster than those that rarely change. This allows bandwidth and disk storage to be prioritized towards sites that change often, rather than storing a large number of identical snapshots of a site that never changes. However, sometimes it is precisely that rare change that is most interesting. For example, when studying how White House press releases had changed, I was examining pages that should never show any change whatsoever, and when there was a change, I needed to know the specific day on which it occurred to reconcile it with the political winds of the time. However, the low rate of change on that portion of the site meant that snapshots were often months or even years apart, making it impossible to narrow some changes down below the level of several years.

In other analyses, the dynamic alteration of the recrawl rate is itself a problem. For example, when studying the inner workings of the Drudge Report over the last half decade, a key research question revolved around the rate at which various elements of that site changed. If the rate of snapshotting was being varied by a software algorithm based on the very phenomena I was measuring, that would strongly bias my findings. In that particular case I was lucky enough to find a specialty archive that existed solely to archive the Drudge Report, and which had collected snapshots every two minutes, nonstop, for more than six years. This is not an easy problem, as archives must balance their very limited resources between crawling for new pages and recrawling existing pages looking for changes. Within recrawling, they must balance the need to pinpoint changes to the narrowest timeframe possible against ensuring they capture as many changes as possible from high-velocity sites.

Finally, the very notion of what constitutes change varies dramatically among research projects. Has a page changed if it still looks the same, but an HTML tag was changed? What if the title changes, or the background color? Does a change in the navigation bar at the top count the same as a change to the body text? There are as many answers to these questions as there are research projects, and no single solution satisfies them all. When looking at changes to White House press releases, only a change to a page title or body text counted as change, while the Internet Archive counted all of the myriad edits and additions to the White House navigation bar as changes. This required downloading every single snapshot of each page and applying our own filters to extract and compare the body text ourselves. One possible solution might be the incorporation of hybrid hierarchical structural and semantic document models that allow a user to indicate which areas of the document he or she cares about and to return only those snapshots in which that section has changed.
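Until such document models exist, researchers can approximate them by extracting only the region they care about from each snapshot and comparing fingerprints of that region alone, which is essentially what our White House study did by hand. A minimal sketch, assuming the snapshots are local HTML files in chronological order and that the text of interest lives in a <div id="content"> element (both assumptions are illustrative; it requires the beautifulsoup4 package):

```python
# Sketch: keep only those snapshots in which the *body text* changed,
# ignoring navigation chrome. The <div id="content"> selector is an
# illustrative assumption; real pages differ.
import hashlib
from bs4 import BeautifulSoup

def body_fingerprint(html: str) -> str:
    """Hash only the user-designated section of the page."""
    soup = BeautifulSoup(html, "html.parser")
    section = soup.find("div", id="content")
    text = section.get_text(" ", strip=True) if section else ""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def changed_snapshots(paths: list[str]) -> list[str]:
    """Return snapshots whose section fingerprint differs from the previous one."""
    changed, last = [], None
    for path in paths:
        with open(path, encoding="utf-8") as f:
            fp = body_fingerprint(f.read())
        if fp != last:
            changed.append(path)
        last = fp
    return changed
```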
WHAT TO KEEP?

As noted in the introduction to this blog post, the digital world is experiencing explosive growth, producing more content in a few hours than was produced in the greater part of a century in the print era. This growth is giving us an incredible view of global society and enabling communication, collaboration, and social research at scales unimaginable even a decade ago, yet the richer this archive becomes, the harder it is to archive. The very volume of material that makes the web so exciting as a communications platform means there is simply too much of it to keep. Even in the era of books there were too many for any one library to keep, but at least we could assume that some library somewhere was probably collecting the books that we weren't: an assumption that isn't necessarily true in the digital world yet.

An age-old mechanism for dealing with overflow is to determine which works are the most important and which can be discarded. Yet how do we decide what constitutes noise and what should be kept? Talk to a historian writing a biography of a historic figure and he or she will likely point to routine day-to-day letters and diary entries as a critical source of information on that person's mood, feelings, and beliefs. Emerging research on using Twitter to forecast the stock market or measure public sentiment is finding that the key patterns emerge only when one considers the entirety of all 340 million tweets each day. A tweet of "I'm outside hanging the laundry, such a beautiful day" might at first seem a prime candidate for discarding, but by its very nature it reflects an author feeling calm, secure, and relaxed: population-level dynamics of great interest to social scientists. Another mechanism is to discard highly similar works, such as multiple editions of the same work. Yet an emerging area of research on the web is the tracing of memes: variations of a quote or story that evolve as they are forwarded across users and communities, much like a real-time version of the telephone game. It is critical for such research to be able to access every version of a story, not just the most recent.

The rise of dual electronic-plus-print publishing pipelines has created the need to collect two copies of a work, instead of a single authoritative print edition. Digital editions of books released as websites may include videos, photographs, multimedia, and interactive features that provide a very different experience from the print copy. Even in subject domains where print is still the official record, digital has become the de facto record through its ease of access. How many citizens travel to their nearest Federal Depository Library and browse the latest edition of the Public Papers of the President to find press releases and statements by their government? Most turn instead to the White House's website, yet a study I co-authored in 2008 found that official US government press releases on the White House website were being continually edited, with key information added and removed and dates changed over time to reflect changing political realities. In a world in which information is so easily changed, and even supposedly immutable government material changes with the click of a mouse, how do we as web archivists capture this world and make it available?

This brings up one very critical distinction between the print and digital eras: the concept of change. In the print era, an archive simply needed to collect an item as it was published. If a book was subsequently changed, the publisher would issue a new edition and notify the library of its availability. A book sitting on a shelf was static: if 20 libraries each held a copy of that book, they could be reasonably certain that all 20 copies were identical. In the digital era, we must constantly scour for new pages to archive, but we also have a new role: checking our existing archive for change. Every single page ever saved by the archive must be rechecked on a regular basis to see if it has changed.

Websites don't make this easy. A study of the Chicago Tribune I conducted for the Center for Research Libraries in 2011 found there was no single master list of articles published on the Tribune's site each day, and the RSS feeds were sorted by popularity, not date. To ensure one archived every new article posted to the site, an archivist would have to monitor all 105 main topic pages on the Tribune's site every few hours or risk losing new articles on a news-heavy day, as the sketch below illustrates. At the level of the web as a whole, one can monitor the DNS domain registry to get a continually updated list of every domain name in existence. However, even this provides only a list of websites like cnn.com, not a list of all of the pages on each site.
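What that monitoring looks like in practice is a polling loop that diffs the links it sees. In the sketch below the site URLs, the link filter, and the polling interval are all invented for illustration (it requires the requests and beautifulsoup4 packages):

```python
# Sketch: discovering new articles on a site with no master article list,
# by polling every topic page and diffing the links seen. The URLs and
# the "/story/" filter are illustrative assumptions, not a real site map.
import time
import requests
from bs4 import BeautifulSoup

TOPIC_PAGES = [f"https://news.example.com/section/{i}" for i in range(105)]
seen: set[str] = set()

def poll_once() -> list[str]:
    """Return article URLs not seen on any previous poll."""
    new = []
    for page in TOPIC_PAGES:
        html = requests.get(page, timeout=30).text
        for a in BeautifulSoup(html, "html.parser").find_all("a", href=True):
            url = a["href"]
            if "/story/" in url and url not in seen:  # illustrative filter
                seen.add(url)
                new.append(url)
    return new

while True:
    for url in poll_once():
        print("queue for archiving:", url)
    time.sleep(3 * 60 * 60)  # every few hours, per the discussion above
```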
In the era of books, a library needn't have purchased a work the day it was released, as most books continued to be printed and available for months, if not years, afterwards. A library could wait a year or two until it had sufficient budget or space to collect it. Web pages, on the other hand, may have half-lives measured in seconds to minutes. They can change constantly, with no notice, and the velocity of change can be extreme. In addition, more content is arriving on the web in streaming form. Archiving Twitter requires being able to collect and save over 4,000 messages per second in real time, with no ability to go back for missed ones: a network outage of 10 minutes means 2.5 million tweets lost forever. In the web world, content producers set the schedule for collection, and archivists must adhere to those schedules.
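Architecturally, this makes a stream archiver look less like a crawler than like a logging pipeline: the read loop must never stall, because anything missed is gone. A sketch, assuming a generic line-delimited JSON firehose (the endpoint is hypothetical; it requires the requests package):

```python
# Sketch: archiving a firehose-style stream where missed messages are gone
# forever. The endpoint is hypothetical; the point is the shape of the
# pipeline -- a read loop that never blocks on disk I/O.
import queue
import threading
import requests

STREAM_URL = "https://stream.example.com/firehose"  # hypothetical endpoint
buf: queue.Queue[bytes] = queue.Queue(maxsize=1_000_000)

def writer() -> None:
    """Drain the in-memory buffer to an append-only log on disk."""
    with open("stream.archive.jsonl", "ab") as out:
        while True:
            out.write(buf.get() + b"\n")

threading.Thread(target=writer, daemon=True).start()

# The read loop does nothing but pull bytes off the wire and enqueue them;
# at thousands of messages per second, any stall here is permanent data loss.
with requests.get(STREAM_URL, stream=True, timeout=90) as resp:
    for line in resp.iter_lines():
        if line:
            buf.put(line)
```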
Myron Gutmann, Assistant Director of the National Science Foundation's Directorate for Social, Behavioral & Economic Sciences, argued in a talk earlier this year that in the print era the high cost of producing information meant that whatever was published was worth keeping, because there were so many layers of review. In contrast, the tremendously low cost of publication in the digital era means anyone can publish anything without any form of review. This raises the question, even in scholarly disciplines, of what is worth keeping. If an archive becomes too full, and a massive community of researchers is served by one set of content while just 10 users are served by another collection of material, whose voice matters most in deciding what is deleted? How do we make decisions about what to keep? Historically those decisions were made by librarians or archivists on their own, but as data miners become increasingly heavy users of archives, the question becomes how to engage those communities in these critical decisions.

THE RISE OF THE PARALLEL WEB

When we speak of archiving the web we often think of the web as a single monolithic entity in which all content that is produced or consumed via a web browser is accessible for archiving. The original vision of the web was based on this ideal: an open, unified platform in which all material was available to all users. For the most part this vision survived the early years of the web, as users strove to reach the greatest possible audience. Yet a new trend has emerged over the past half decade, corresponding with the rise of social media: the creation of parallel versions of the web. Every one of those quarter billion photographs uploaded to Facebook each day is posted and consumed via the web, whether through a browser on a desktop or a mobile app on a smartphone. Yet, despite transiting the same physical telecommunications infrastructure as the rest of the web, those photos are stored in a parallel web, owned and controlled entirely by a commercial entity. They are not part of the public web and thus not available to web archives.

In many ways this is no different from the libraries and archives of the print era. Libraries focused on collecting books and pamphlets, while a good deal of communication and culture occurred in letters, diaries, drawings, and artwork that have largely been lost. The difference in the digital era is that instead of being scattered across individual households, all of this material is already being centralized into commercially owned archives and libraries. Not everyone desires every conversation of theirs to be preserved for posterity, but in the print era one had a choice: a letter or diary or photograph was a physical object, held by its owner, that could be passed down to later generations. How many of us have come across a shoebox of old photographs or letters from a grandparent? In the digital era, a company holds that material on our behalf, and while most have terms of service agreeing that we own our material, only one major social media platform today offers an export button that allows us to download a copy of the material we have given it over the years: Google Plus, through Google Takeout. Twitter has recognized the importance of the communications that occur via its service and has made a feed of its content available to the Library of Congress for archiving for posterity. Most others, like Facebook and international platforms such as Weibo or VK (formerly VKontakte), have not. Facebook has in effect become a parallel version of the web, hosted on the web but walled off from it, with no means for users to archive their material for the future.

Twitter offers a shining example of how such platforms can interact with the web archiving community and ensure that their material is archived for future generations. Self-archiving services like Google Takeout offer an intermediate step, in which users at least retain the ability to make their own archival copy of their contributions to the web. As more of the web moves behind paywalls, password protection, and other mechanisms, creating more and more parallel versions of the web, there must be greater discussion within the web archiving community about how we reach out to these services to find ways of ensuring that the users of these communities may archive their material for the future.

DATA MINING

For millennia, scholarship in archives and libraries has meant intensive reading of a small number of works. In the past decade the digital humanities and computational social sciences have led to the growing use of computerized analysis of archives, in which software algorithms identify patterns and point to areas of interest in the data. Digital archives have largely been built around the earlier access modality of deep reading, while computational techniques need rapid access to vast volumes of content, often encompassing the entire archive. New programming interfaces and access policies are needed to enable this new generation of scholarship using web archives. Informal discussions with web archivists suggest a chicken-and-egg dilemma in this regard: data miners want to analyze archives but can't without the necessary programmatic interfaces, while archives for the most part want to encourage use of their collections but don't know what interfaces to support without working with data miners. Few archives today support the programmatic interfaces necessary for automated access to their collections, and those that do tend to be aimed at metadata rather than full-text content, and use library-centric protocols and mindsets. Some have fairly complex interfaces, with very fine-grained toolkits for each possible use scenario. The few that offer data exports present an either-or proposition: you either download a ZIP file of the entire contents of the archive or you get nothing; there is no in-between. There are some bright spots: the National Endowment for the Humanities has made initial steps towards helping archivists and data miners work together through grand challenge programs like its Digging into Data initiative, in which a selection of archives made their content available to awardees for large-scale data mining.

Yet one only has to look at Twitter for a model of what archives could do. Twitter provides only a single small programming interface with a few very basic options, but through that interface it has been able to support an ecosystem serving nearly every imaginable research question and tool. It even offers a tiered cost-recovery model: users needing only small quantities of data (a "sip") can access the feed for free, while the rest are charged at tiered prices based on the quantity of data they need, up to the entirety of all 340 million tweets at the highest level. Finally, the interfaces provided by Twitter are compatible with the huge numbers of analytical, visualization, and filtering tools provided by the Googles and Yahoos of the world with their open cloud toolkits. If archives took the same approach, with a standardized interface like Twitter's, researchers could leverage these huge ecosystems for the study of the web itself.
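What might such a standardized interface look like from the researcher's side? Something as small as the sketch below would suffice: one endpoint, a few parameters, JSON out. The URL scheme and field names are invented for illustration; no archive exposes this today (it requires the requests package).

```python
# Sketch: the kind of minimal, standardized interface argued for above.
# The endpoint, parameters, and JSON fields are hypothetical.
import requests

def list_snapshots(archive: str, url: str, start: str, end: str) -> list[dict]:
    """Ask an archive for every snapshot of `url` in a date range."""
    resp = requests.get(
        f"https://{archive}/api/v1/snapshots",
        params={"url": url, "from": start, "to": end},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["snapshots"]  # e.g. [{"timestamp": ..., "digest": ...}, ...]

for snap in list_snapshots("archive.example.org", "http://whitehouse.gov/news",
                           "2001-01-01", "2003-12-31"):
    print(snap["timestamp"], snap["digest"])
```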
For some archives, the bottleneck has become the size of the data itself, which has grown too large to share over the network. Through a partnership with Google, data miners can request from the HathiTrust a copy of the Google Books archive, consisting of around 750 million pages of material. Instead of receiving a download link, users must pay the cost of purchasing and shipping a box full of USB drives, because networks, even between research universities, simply cannot keep up with the size of today's datasets. In the sciences, some of the largest projects, such as the Large Synoptic Survey Telescope, are going as far as purchasing and housing an entire computing cluster in the same machine room as the data archive and allowing researchers to submit proposals to run their programs on the cluster, because even with USB drives the data is simply too large to copy.

Not all of the barriers to offering bulk data-mining access to archives are technical: copyright and other legal restrictions can present significant complications. Though even here technology can provide a possible alternative: nonconsumptive analysis, in which software algorithms perform surface-level analyses rather than deep reading of text, may satisfy the requirements of copyright. In other cases, transformations of copyrighted material into another form, such as the wordlists of the Google Books Ngrams dataset, may provide possible solutions.
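The wordlist transformation itself is almost trivial, which is part of its appeal: frequencies survive for analysis while the readable work does not. A minimal sketch:

```python
# Sketch: a nonconsumptive transformation -- reduce copyrighted text to a
# bag of word counts. Frequencies survive; the readable work does not.
import re
from collections import Counter

def to_wordlist(text: str) -> Counter:
    """Collapse running text into {word: count}, discarding word order."""
    return Counter(re.findall(r"[a-z']+", text.lower()))

page = "The archive preserves the web. The web forgets."
print(to_wordlist(page).most_common(3))
# [('the', 3), ('web', 2), ('archive', 1)]
```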

Not everyone appreciates or understands the value web archives provide society, and many archives are under constant pressure just to find enough funds to keep the power running. This is an area where partnering with researchers may help: there are only a few sources of funding for the creation and operation of web archives, compared with the myriad funding opportunities for research. The increased bandwidth, hardware load, and other resource requirements of large data-mining projects come at a real cost, but at the same time such projects directly demonstrate the value of those archives to new audiences and disciplines that may be able to partner with those archives on proposals, potentially opening new funding opportunities.

USER INSIGHT

While some archives cannot offer access to their holdings for legal reasons and instead serve only as archives of last resort, most archives would hold little value to their constituents if they could not provide some level of access to the content they have archived. User interfaces today are designed for casual browsing by non-expert users, with simplicity and ease of use as their core principles. As archives become a growing source for scholarly research, they must address several key areas of need in supporting more advanced users:

Inventory. There is a critical need for better visibility into the precise holdings of each archive. With most digital libraries of digitized materials, a visitor can browse through the collection from start to end, though even there one usually can't export a CSV file containing a master list of everything in the collection. Most web archives, on the other hand, are accessible only through a direct lookup mechanism in which the user types in a URL and gets back any matching snapshots. Archives only store copies of material; they don't provide an index to it or even a listing of what they hold: it is assumed that this role is provided elsewhere. For domains that have been deleted or now house unrelated content, this is not always the case. This would be akin to libraries dropping their reading rooms, stacks, and card catalogs and storing all of their books in a robotic warehouse: instead of browsing or requesting a book by title or category, one could only request a book by its ISBN code, which had to be known beforehand, and it was someone else's responsibility to store those codes. A tremendous step forward would be a list from each archive of all of the root domains it holds one or more pages from, but ultimately a list of all URLs, along with the number of snapshots and the dates of those snapshots, would enable an entirely new form of access to these archives. This data could be used by researchers and others to come up with new ways of accessing and interacting with the data held by these archives.
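Even a flat file would be a major step. The sketch below writes the kind of inventory described here, one row per archived URL with its snapshot count and dates; the column layout is illustrative.

```python
# Sketch: a master inventory as a flat CSV -- one row per archived URL,
# with the number of snapshots and their dates. The layout is illustrative.
import csv

def write_inventory(snapshots_by_url: dict[str, list[str]], path: str) -> None:
    """Dump {url: [snapshot dates]} as url, count, semicolon-joined dates."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["url", "snapshot_count", "snapshot_dates"])
        for url, dates in sorted(snapshots_by_url.items()):
            writer.writerow([url, len(dates), ";".join(sorted(dates))])

write_inventory({"http://example.com/": ["2013-01-04", "2013-06-19"]},
                "inventory.csv")
```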
Meta Search. With better inventory data, we could build metasearch tools that act as the digital equivalent of WorldCat for web archives. Web archives today operate more like document archives than libraries: they hold content, but they themselves often have no idea of the full extent of what they hold. A scholar looking for a particular print document might have to spend months or even years scouring archives all over the world looking for one that holds a copy of that document, whereas if she were looking for a book, a simple search on WorldCat would turn up a list of every participating library that holds a copy in its electronic catalog. This is possible because libraries have invested in maintaining inventories of their holdings and standardizing the way those inventories are stored, so that third parties can aggregate them and develop services that allow users to search across them. Imagine being able to type in a URL and see every copy from every web archive in the world, rather than just the copies held by any one archive.
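Given inventory endpoints like the one sketched in the Data Mining section, such a metasearch tool is a short script rather than a grand engineering project. A sketch, with placeholder archive hostnames and the same hypothetical API (it requires the requests package):

```python
# Sketch: WorldCat-style metasearch across web archives, assuming each
# exposed the hypothetical snapshot-listing endpoint sketched earlier.
# The archive hostnames are placeholders.
import requests

ARCHIVES = ["archive-a.example.org", "archive-b.example.org"]

def find_everywhere(url: str) -> dict[str, list[str]]:
    """Map each archive to the snapshot timestamps it holds for `url`."""
    holdings: dict[str, list[str]] = {}
    for host in ARCHIVES:
        resp = requests.get(f"https://{host}/api/v1/snapshots",
                            params={"url": url}, timeout=30)
        if resp.ok:
            holdings[host] = [s["timestamp"] for s in resp.json()["snapshots"]]
    return holdings

for host, stamps in find_everywhere("http://example.com/").items():
    print(f"{host}: {len(stamps)} snapshots")
```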
Specialty Archives. Metasearch would allow federated search across all archives, but it also raises the question of backups of smaller specialty archives. Larger whole-web archives like the Internet Archive still can't possibly archive everything that exists. Specialty archives fill this niche, often with institutional focuses, or through a researcher creating an archive of material on a particular niche topic for her own use. Often these archives are created for a particular research project and then discarded when the paper is published. How do we bring these into the fold? Perhaps some mechanism is needed for allowing those archives to submit their collections to a network of web archives and say, essentially, "if you're interested, here you go." They would need to be marked separately, since their content was produced outside the main archive's processes, but as web crawlers become easier to use and more researchers create their own specialty curated collections, should we have mechanisms that allow them to be archived, to leverage their resources to penetrate areas of the web we might not otherwise reach?

Citability. For archives to be useful in scholarly research, a particular snapshot of a page must have a permanent identifier that can be cited in the references list of a publication. The Internet Archive provides an ideal example: each snapshot has its own permanent URL that includes both the page URL and the exact timestamp of the snapshot, and this URL can be cited in a publication in the same format as any other webpage. Yet not every archive provides this type of access: some make use of AJAX (interactive JavaScript applications) that provide a more desktop-like browsing experience but mask the URL of each snapshot, making it impossible to point others to that copy.

TECHNICAL INSIGHT

In the modern era, libraries and archives have existed decoupled from their researchers: a professional class collected and curated the collections, and scholars traveled to whichever institutions held the materials they needed. Few records exist as to why a given library collected this work rather than that one, and as scholars we simply accept this. Yet perhaps in the digital era we can do better, as most of these decisions are recorded in emails, memos, and other materials, all of them searchable and indexable. Web crawlers are seeded with starting URLs and crawl based on deterministic software algorithms, both of which can be documented for scholars. Most web archives operate as black boxes designed for casual browsing and retrieval of individual objects, without asking too many questions about how an object got there. This is in stark contrast to digitized archives, in which every conceivable piece of metadata is collected. A visitor to the Internet Archive today encounters an odd experience: retrieving a digitized book yields a wealth of information on how that digital copy came to be, from the specific library it came from to the name of the person who operated the scanner that photographed it, while retrieving a web page yields only a list of available snapshot dates.

Snapshot Timestamps. All archives store an internal timestamp recording the precise moment when a page snapshot was downloaded, but their user interfaces often mask this information. For example, when examining changes in White House press releases, we found that clicking on a snapshot for April 4, 2001 in the Internet Archive would always take us to a snapshot of the page we requested, but if we looked in the URL bar (the Internet Archive includes the timestamp of the snapshot in the URL), we noticed that the snapshot we were ultimately given was occasionally from days or weeks before or after our requested date. Upon further research, we found that some archives automatically redirect a user to the nearest date when a given snapshot becomes unavailable due to hardware failure or other reasons. This is ideal behavior for a casual user, but for an expert user tracing how changes in a page correspond to political events occurring each day, it is problematic. Archives should provide a notice when a requested snapshot is not available, allowing the user to decide whether to proceed to the closest available one or select another date.

Page Versus Site Timestamps. Some archives display only a single timestamp for all pages collected from a given site during a particular crawl: usually the time at which the crawlers started archiving that site. Even a medium-sized site may take hours or days to fully crawl once rate limiting and other factors are taken into account, and for some users it is imperative to know the precise moment each page was requested, not when the crawlers first entered the site. Most archives store this information, so it is simply a matter of providing access to it via the user interface for those users who request it.

Crawl Algorithms. Not every site can be crawled in its entirety: some sites may simply be too large or have complex linking structures that make it difficult to find every page, or they may be dynamically generated. Some research questions may be affected by the algorithm used to crawl the site (depth-first vs. breadth-first), the seed URLs used to enter the site (the front page, table-of-contents pages, content pages, etc.), where the crawl was aborted (if it was), which pages errored during the crawl (and thus whose links were not followed), and so on. If, for example, one wishes to estimate the size of a dynamic, database-driven website, such factors can be used to draw estimates of its total size and composition, but only if users can access these technical characteristics of the crawl.

Raw Source Access. Current archives are designed to provide a transparent time-machine view of the web, where clicking on a snapshot attempts to render the page in a modern browser as faithfully as possible to what it looked like when it was captured. However, a page might contain embedded HTML instructions, such as a <META REFRESH> tag or JavaScript code, that automatically forward the browser to a new URL, and this may happen without the user noticing. In our study of White House press releases, we were especially interested in pages that had been blanked out, where a press release had been replaced with a <META REFRESH> tag and an editorial comment inside an HTML comment in the page. Clicking on these pages in the Internet Archive interface simply forwarded us to the new URL indicated by the refresh command, so we had to download the pages via special downloading software in order to review the source code of each page without being redirected. This is a relatively rare scenario, but it would be helpful for archives to provide a "view source" mode, where clicking on a snapshot takes the user directly to the source code of a page instead of trying to display it.
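Until archives offer such a mode, researchers can approximate it by fetching a snapshot's stored bytes and inspecting them before any browser gets a chance to act on them. A sketch that flags blanked-out pages of the kind we encountered, assuming the raw HTML of each snapshot has already been downloaded:

```python
# Sketch: the "view source" workflow described above -- inspect a
# snapshot's raw HTML for a <meta refresh> redirect (and any editorial
# comment) *before* a browser can act on it.
import re

META_REFRESH = re.compile(r'<meta[^>]+http-equiv=["\']?refresh["\']?[^>]*>',
                          re.IGNORECASE)
HTML_COMMENT = re.compile(r"<!--(.*?)-->", re.DOTALL)

def inspect_snapshot(html: str) -> None:
    """Report a <meta refresh> blanking and any editorial HTML comments."""
    if META_REFRESH.search(html):
        print("blanked page: redirects via <meta refresh>")
        for comment in HTML_COMMENT.findall(html):
            print("  comment:", comment.strip()[:80])
    else:
        print("no redirect; page displays normally")

inspect_snapshot('<meta http-equiv="refresh" content="0;url=/new">'
                 '<!-- release withdrawn -->')
```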
Crawler Physical Location. Several major foreign news outlets embargo content or present different selections or versions of their content depending on where the visitor's computer is physically located. A visitor accessing such a site will see a very different picture depending on whether she is in the United States, the United Kingdom, China, or Russia. This is a growing issue, as more sites adopt content management systems that dynamically adjust the structure and layout of the site for each individual visitor based on their actions as they click through the site. Analyses of such sites require information on where the crawlers were physically located and the exact order of the pages they requested from the site. As with the other recommendations listed above, this information is already held by most archives; it is simply a matter of making it more available to users.

FIDELITY AND LINKAGE

Fidelity. Modern web archiving platforms capture not only the HTML code of a page, but also interpret the HTML and associated CSS files to compile a list of the images, CSS files, JavaScript code, and other files necessary to properly display the page, and archive these as well. The rise of interactive and highly multimedia web pages is challenging this approach, as pages may have embedded Flash or AJAX/JavaScript applications, streaming video, and embedded widgets displaying information from other sites. No longer limited to high-design or highly technical sites, these features are making their way into more traditional websites, such as the news media. For example, the BBC's website includes both Flash and JavaScript animations on its front page, while the Chicago Tribune's front page includes Flash animations that respond to mouseovers and animate or perform other actions. The BBC also includes an embedded JavaScript widget that displays advertisements, and both sites include extensive embedded streaming Flash-based video. Many of these tools reference data or JavaScript code on other sites: many sites now use Google's Visualization API toolkit for interactive graphs and displays, simply linking to the code housed on Google's site. On the one hand, we might dismiss advertisements and embedded content as not worth archiving, yet a rich literature in the advertising discipline addresses the psychological impact of advertisements and other sidebar material on the processing of information in the web era. Even digitized historical newspaper archives have been careful to offer access to the entire scanned page image, rather than just the article text, so that scholars can study advertisements and layout. Excluding dynamic content will make it impossible for scholars of the future to understand how advertisements were used on the web. Yet simply saving a copy of a Flash or AJAX widget may not be sufficient, as technical dependencies may render it unexecutable 20 years from now. One possibility might be creating a screen capture of each page as it is archived, to provide at least a coarse snapshot of what that page looked like to a visitor of the time period.
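Modern headless browsers make the screen-capture idea straightforward to prototype at crawl time. A sketch using the Playwright package (one of several browser-automation options; the URL and file names are placeholders):

```python
# Sketch: capturing a rendered screenshot alongside each crawled page, so
# future scholars can at least see what a visitor saw even if the embedded
# code no longer executes. Uses the Playwright package; URL is a placeholder.
from playwright.sync_api import sync_playwright

def archive_with_screenshot(url: str, stem: str) -> None:
    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        page.goto(url, wait_until="networkidle")
        with open(f"{stem}.html", "w", encoding="utf-8") as f:
            f.write(page.content())                       # the HTML as rendered
        page.screenshot(path=f"{stem}.png", full_page=True)  # what it looked like
        browser.close()

archive_with_screenshot("https://example.com/", "snapshot-20130101")
```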
Web/Social Linkage. Many sites make use of social media platforms like Twitter and Facebook as part of their overall information ecosystem. For example, the front page of the Chicago Tribune prominently links to its Facebook page, where editors post a curated assortment of links to Tribune content over the course of each day. Visitors "like" stories and post comments on the Facebook summary of each story, creating a rich environment of commentary that exists in parallel to the original webpage on the Tribune site. Other sites allow commentary directly on their webpages through a user comments section; some may allow comments for only a few days after a page is posted, while others may allow comments years later. This social narrative is an integral part of the content seen by visitors of the time, yet how do we properly preserve this material, especially from linked social media profiles?

CONCLUSIONS AND THE ROLE OF ARCHIVES

As web archives mature and expand, a growing question revolves around their role in society. What should their primary missions be, and how can they best fulfill those roles? At their most basic level, I believe web archives fulfill three primary roles: Preservation, Research, and Authentication, in that order.

Preservation. First and foremost, web archives preserve the web. They act as the web equivalent of the archive or library, constantly monitoring for new content, requesting a copy of that content, and keeping a copy of it for posterity. In this role, their mission is to acquire and preserve the web for future generations, with access provided primarily through basic browsing and retrieval. Some archives, for legal reasons, may not even be able to provide access to their holdings during the lifetime of the organizations providing them content, instead holding that material under embargo for a certain number of years while ensuring its continued survival for future generations.

Research. A unique and emerging use of archives is as a research service for scholars. Very few academics, especially in the social sciences and humanities, have the computational expertise or resources to crawl and download large portions of the web for research. Commercial web crawling companies like Google do not provide their data for research, and thus web archives are a fundamentally unique and enabling resource for the study of the web. Even more critically, many key humanities and social science questions revolve around how ideas and communication change over time, and web archives capture the only view of change on the web. In this role, the secondary mission of archives is to provide access to their holdings that goes beyond the basic browsing needed for casual use or deep scholarly reading of a small number of works, towards programmatic tools and access policies that support computational data mining of large portions of their holdings.

Authentication. A final emerging use of archives is as an authentication service. Web data is highly mutable, changing constantly, and there is no way to authenticate whether the page I see today is the same as what I saw yesterday, especially if the change is a small one. It took more than five years for changes to White House press releases to be spotted via copies held in the Internet Archive, and even then the discovery was entirely by accident. Third-party archives allow authentication of what a page looked like at a given moment. One could even imagine a browser plugin that, as a user browsed certain sites on the web (government pages, perhaps, or medical pages), would compare each page with the most recent copy stored by a network of web archives and display an indicator of whether the page has changed since it was last archived, as well as highlight those changes. In this role, the third, peripheral mission of the web archive is to act as a disinterested third party that can authenticate and verify the contents of a given web page at a given moment in time.
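The comparison at the heart of such a plugin is simple, even if the surrounding infrastructure is not. A sketch, assuming a hypothetical archive endpoint that returns a digest of its most recent snapshot; in practice one would hash a normalized body text rather than raw bytes, for the reasons discussed under Fidelity (it requires the requests package):

```python
# Sketch: the check at the core of the verification plugin imagined above.
# The archive endpoint and its JSON shape are hypothetical; hashing raw
# bytes is the crudest possible comparison.
import hashlib
import requests

def changed_since_archived(url: str) -> bool:
    """Compare the live page's digest with the archive's stored digest."""
    live = hashlib.sha256(requests.get(url, timeout=30).content).hexdigest()
    meta = requests.get("https://archive.example.org/api/v1/latest",
                        params={"url": url}, timeout=30).json()
    return live != meta["sha256"]

if changed_since_archived("https://example.gov/press-release"):
    print("page differs from the most recent archived copy")
```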
Wikipedia offers an intriguing vision of what the ultimate web archive might look like. Every edit to every page since the inception of the site has been archived and is available at a mouse click, allowing a visitor or scholar to trace the entire history of every word. Every operation taken on the site, and the complete source code of every algorithm used for its automated processes, are fully documented and made available, offering complete technical transparency. Finally, a dedicated bulk download page is maintained from which researchers may download a ZIP file containing the entirety of the site and every edit ever performed, which has made Wikipedia a mainstay of considerable social and computer science research.

As our digital world continues to grow at a breathtaking pace, and more and more of our daily lives occur within its digital boundaries, we must ensure that web archives are there to preserve our collective global consciousness for future generations.


More information

Sources: Summary Data is exploding in volume, variety and velocity timely

Sources: Summary Data is exploding in volume, variety and velocity timely 1 Sources: The Guardian, May 2010 IDC Digital Universe, 2010 IBM Institute for Business Value, 2009 IBM CIO Study 2010 TDWI: Next Generation Data Warehouse Platforms Q4 2009 Summary Data is exploding

More information

Protecting Data with a Unified Platform

Protecting Data with a Unified Platform Protecting Data with a Unified Platform The Essentials Series sponsored by Introduction to Realtime Publishers by Don Jones, Series Editor For several years now, Realtime has produced dozens and dozens

More information

Power Tools for Pivotal Tracker

Power Tools for Pivotal Tracker Power Tools for Pivotal Tracker Pivotal Labs Dezmon Fernandez Victoria Kay Eric Dattore June 16th, 2015 Power Tools for Pivotal Tracker 1 Client Description Pivotal Labs is an agile software development

More information

Stanford Newspaper Visualization

Stanford Newspaper Visualization Stanford Newspaper Visualization By Lisa Louise Cooke Data Visualization is a growing trend online and Stanford University s Rural West Initiative at the Bill Lane Center for the American West uses this

More information

Digital Collections as Big Data. Leslie Johnston, Library of Congress Digital Preservation 2012

Digital Collections as Big Data. Leslie Johnston, Library of Congress Digital Preservation 2012 Digital Collections as Big Data Leslie Johnston, Library of Congress Digital Preservation 2012 Data is not just generated by satellites, identified during experiments, or collected during surveys. Datasets

More information

QUICK FEATURE GUIDE OF SNAPPII'S ULTRAFAST CODELESS PLATFORM

QUICK FEATURE GUIDE OF SNAPPII'S ULTRAFAST CODELESS PLATFORM QUICK FEATURE GUIDE OF SNAPPII'S ULTRAFAST CODELESS PLATFORM (* Click on the screenshots to enlarge) TABLE OF CONTENTS 1. Visually Develop Mobile Applications 2. Build Apps for Any Android or ios Device

More information

WHAT'S NEW IN SHAREPOINT 2013 WEB CONTENT MANAGEMENT

WHAT'S NEW IN SHAREPOINT 2013 WEB CONTENT MANAGEMENT CHAPTER 1 WHAT'S NEW IN SHAREPOINT 2013 WEB CONTENT MANAGEMENT SharePoint 2013 introduces new and improved features for web content management that simplify how we design Internet sites and enhance the

More information

Wiki Server. Innovative tools for workgroup collaboration and communication. Features

Wiki Server. Innovative tools for workgroup collaboration and communication. Features Wiki Server Innovative tools for workgroup collaboration and communication. Features Single site for group collaboration Project-specific wiki accessible via web browsers on Mac, PC, iphone, and ipod touch

More information

Draft Response for delivering DITA.xml.org DITAweb. Written by Mark Poston, Senior Technical Consultant, Mekon Ltd.

Draft Response for delivering DITA.xml.org DITAweb. Written by Mark Poston, Senior Technical Consultant, Mekon Ltd. Draft Response for delivering DITA.xml.org DITAweb Written by Mark Poston, Senior Technical Consultant, Mekon Ltd. Contents Contents... 2 Background... 4 Introduction... 4 Mekon DITAweb... 5 Overview of

More information

MARKETING KUNG FU SEO: Key Things to Expand Your Digital Footprint. A Practical Checklist

MARKETING KUNG FU SEO: Key Things to Expand Your Digital Footprint. A Practical Checklist MARKETING KUNG FU SEO: Key Things to Expand Your Digital Footprint A Practical Checklist 1 1. Content Development... Page 3 2. Content Organization... Page 4 3. META Data... Page 5 4. Fix Errors... Page

More information

INTRODUCTION TO THE WEB

INTRODUCTION TO THE WEB INTRODUCTION TO THE WEB A beginner s guide to understanding and using the web 3 September 2013 Version 1.2 Contents Contents 2 Introduction 3 Skill Level 3 Terminology 3 Video Tutorials 3 How Does the

More information

Deep analysis of a modern web site

Deep analysis of a modern web site Deep analysis of a modern web site Patrick Lambert November 28, 2015 Abstract This paper studies in details the process of loading a single popular web site, along with the vast amount of HTTP requests

More information

The Essential Guide to Native Advertising. The Rise of a Digital Ad Format and Best Practices for Commanding Audience Attention

The Essential Guide to Native Advertising. The Rise of a Digital Ad Format and Best Practices for Commanding Audience Attention INSIGHT SERIES The Essential Guide to Native Advertising The Rise of a Digital Ad Format and Best Practices for Commanding Audience Attention In digital advertising, ad formats have always fallen into

More information

INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY

INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY A PATH FOR HORIZING YOUR INNOVATIVE WORK A SURVEY ON BIG DATA ISSUES AMRINDER KAUR Assistant Professor, Department of Computer

More information

!!!! CONSENT-BASED EMAIL SYSTEM! INKSWITCH MARKETSPACE! Custom Built Online Marketing That Works For You VERSION 1.0 NOVEMBER 12, 2014

!!!! CONSENT-BASED EMAIL SYSTEM! INKSWITCH MARKETSPACE! Custom Built Online Marketing That Works For You VERSION 1.0 NOVEMBER 12, 2014 INKSWITCH MARKETSPACE Custom Built Online Marketing That Works For You CONSENT-BASED EMAIL SYSTEM VERSION 1.0 NOVEMBER 12, 2014 support@inkswitch.com 2815 W. Pebble Rd. Unit 303, Las Vegas, NV 89123 800-465-5890

More information

SEO Definition. SEM Definition

SEO Definition. SEM Definition SEO Definition Search engine optimization (SEO) is the process of improving the volume and quality of traffic to a web site from search engines via "natural" ("organic" or "algorithmic") search results.

More information

SmallBiz Dynamic Theme User Guide

SmallBiz Dynamic Theme User Guide SmallBiz Dynamic Theme User Guide Table of Contents Introduction... 3 Create Your Website in Just 5 Minutes... 3 Before Your Installation Begins... 4 Installing the Small Biz Theme... 4 Customizing the

More information

Facebook SVEA TRAINING MODULES. www.svea-project.eu

Facebook SVEA TRAINING MODULES. www.svea-project.eu Facebook SVEA TRAINING MODULES www.svea-project.eu Author: Louis Dare, Coleg Sir Gâr Project Coordinator: MFG Baden-Württemberg mbh Public Innovation Agency for ICT and Media Petra Newrly Breitscheidstr.

More information

Moreketing. With great ease you can end up wasting a lot of time and money with online marketing. Causing

Moreketing. With great ease you can end up wasting a lot of time and money with online marketing. Causing ! Moreketing Automated Cloud Marketing Service With great ease you can end up wasting a lot of time and money with online marketing. Causing frustrating delay and avoidable expense right at the moment

More information

Spotfire and Tableau Positioning. Summary

Spotfire and Tableau Positioning. Summary Licensed for distribution Summary Both TIBCO Spotfire and Tableau allow users of various skill levels to create attractive visualizations of data, displayed as charts, dashboards and other visual constructs.

More information

What is Prospect Analytics?

What is Prospect Analytics? What is Prospect Analytics? Everything you need to know about this new sphere of sales and marketing technology and how it can improve your business Table of Contents Executive Summary... 2 The Power of

More information

MANAGEMENT SUMMARY INTRODUCTION KEY MESSAGES. Written by: Michael Azoff. Published June 2015, Ovum

MANAGEMENT SUMMARY INTRODUCTION KEY MESSAGES. Written by: Michael Azoff. Published June 2015, Ovum App user analytics and performance monitoring for the business, development, and operations teams CA Mobile App Analytics for endto-end visibility CA Mobile App Analytics WWW.OVUM.COM Written by: Michael

More information

Firefox for Android. Reviewer s Guide. Contact us: press@mozilla.com

Firefox for Android. Reviewer s Guide. Contact us: press@mozilla.com Reviewer s Guide Contact us: press@mozilla.com Table of Contents About Mozilla Firefox 1 Move at the Speed of the Web 2 Get Started 3 Mobile Browsing Upgrade 4 Get Up and Go 6 Customize On the Go 7 Privacy

More information

Figure 1: A Free, Crowd-Sourced Medical Image Database for Your iphone

Figure 1: A Free, Crowd-Sourced Medical Image Database for Your iphone Figure 1 App Review: The exclusive Instagram for Physicians 2 days ago by David Ahn, MD 3 App Review Featured I ve had that very same idea! was my first response when Dr. Joshua Landy contacted me about

More information

Pinterest has to be one of my favourite Social Media platforms and I m not alone!

Pinterest has to be one of my favourite Social Media platforms and I m not alone! Pinterest has to be one of my favourite Social Media platforms and I m not alone! With 79.3 million users, 50 billion pins and 1 billion boards it is host to an enormous amount of content. But many of

More information

Website Planning Questionnaire. Introduction. Thank you for your interest in the services of The Ultimate Answer!

Website Planning Questionnaire. Introduction. Thank you for your interest in the services of The Ultimate Answer! Website Planning Questionnaire Colleen Rice Nelson Introduction Thank you for your interest in the services of The Ultimate Answer! Every choice and decision you make concerning your website may or may

More information

Executive Dashboard Cookbook

Executive Dashboard Cookbook Executive Dashboard Cookbook Rev: 2011-08-16 Sitecore CMS 6.5 Executive Dashboard Cookbook A Marketers Guide to the Executive Insight Dashboard Table of Contents Chapter 1 Introduction... 3 1.1 Overview...

More information

bigdata Managing Scale in Ontological Systems

bigdata Managing Scale in Ontological Systems Managing Scale in Ontological Systems 1 This presentation offers a brief look scale in ontological (semantic) systems, tradeoffs in expressivity and data scale, and both information and systems architectural

More information

CONTENT MARKETING AND SEO

CONTENT MARKETING AND SEO CONTENT MARKETING AND SEO How to Use Content Marketing and SEO to Reach Customers and Business Goals What do you consider the most essential ingredient for your business s marketing success? In today s

More information

Front-End Performance Testing and Optimization

Front-End Performance Testing and Optimization Front-End Performance Testing and Optimization Abstract Today, web user turnaround starts from more than 3 seconds of response time. This demands performance optimization on all application levels. Client

More information

How to Get Your Website on the Internet: Web Hosting Basics

How to Get Your Website on the Internet: Web Hosting Basics The Web Host Advisor How to Get Your Website on the Internet: Web Hosting Basics Copyright 2012 by The Web Host Advisor Table of Contents Why Do You Want a Website page 3 What Kind of Website do You Want?

More information

DKAN. Data Warehousing, Visualization, and Mapping

DKAN. Data Warehousing, Visualization, and Mapping DKAN Data Warehousing, Visualization, and Mapping Acknowledgements We d like to acknowledge the NuCivic team, led by Andrew Hoppin, which has done amazing work creating open source tools to make data available

More information

Best Practices of Mobile Marketing

Best Practices of Mobile Marketing Best Practices of Mobile Marketing With the advent of iphone, Android phones, and tablets, adoption of the mobile is contagious, and will continue in the coming years as well. The market penetration of

More information

Social Media Monitoring: Engage121

Social Media Monitoring: Engage121 Social Media Monitoring: Engage121 User s Guide Engage121 is a comprehensive social media management application. The best way to build and manage your community of interest is by engaging with each person

More information

IT & Small Businesses. It can help grow your small business and cut cost where you never thought possible.

IT & Small Businesses. It can help grow your small business and cut cost where you never thought possible. It can help grow your small business and cut cost where you never thought possible. Contents Introduction Cutting Cost Saving Time Creating a Competitive Advantages Conclusion 3 4 9 12 13 2 Title of the

More information

INTRODUCING AZURE SEARCH

INTRODUCING AZURE SEARCH David Chappell INTRODUCING AZURE SEARCH Sponsored by Microsoft Corporation Copyright 2015 Chappell & Associates Contents Understanding Azure Search... 3 What Azure Search Provides...3 What s Required to

More information

{ { Calculating Your Social Media Marketing Return on Investment. A How-To Guide for New Social Media Marketers. Peter Ghali - Senior Product Manager

{ { Calculating Your Social Media Marketing Return on Investment. A How-To Guide for New Social Media Marketers. Peter Ghali - Senior Product Manager { { Calculating Your Social Media Marketing Return on Investment A How-To Guide for New Social Media Marketers Peter Ghali - Senior Product Manager This guide provides practical advice for developing a

More information

2015 SEO AND Beyond. Enter the Search Engines for Business. www.thinkbigengine.com

2015 SEO AND Beyond. Enter the Search Engines for Business. www.thinkbigengine.com 2015 SEO AND Beyond Enter the Search Engines for Business www.thinkbigengine.com Including SEO Into Your 2015 Marketing Campaign SEO in 2015 is tremendously different than it was just a few years ago.

More information

Making Your Marketing Interactive

Making Your Marketing Interactive Making Your Marketing Interactive New Opportunities to Engage Customers with Live Chat Companies around the world are using live chat to boost online sales, reduce customer service costs and increase customer

More information

HTML5 & Digital Signage

HTML5 & Digital Signage HTML5 & Digital Signage An introduction to Content Development with the Modern Web standard. Presented by Jim Nista CEO / Creative Director at Insteo HTML5 - the Buzz HTML5 is an industry name for a collection

More information

Bricks And Clicks A Look At Today s Retail Marketing Trends

Bricks And Clicks A Look At Today s Retail Marketing Trends Bricks And Clicks A Look At Today s Retail Marketing Trends A Quick and Easy Guide to Digital Advertising for Local Businesses TABLE OF CONTENTS 3 4 7 11 The New Customer Path to Purchase The Rise of Mobile

More information

Instructional Design Framework CSE: Unit 1 Lesson 1

Instructional Design Framework CSE: Unit 1 Lesson 1 Instructional Design Framework Stage 1 Stage 2 Stage 3 If the desired end result is for learners to then you need evidence of the learners ability to then the learning events need to. Stage 1 Desired Results

More information

Getting Started with WordPress. A Guide to Building Your Website

Getting Started with WordPress. A Guide to Building Your Website Getting Started with WordPress A Guide to Building Your Website dfsdsdf WordPress is an amazing website building tool. The goal of this ebook is to help you get started building your personal or business

More information

How Big Data is Different

How Big Data is Different FALL 2012 VOL.54 NO.1 Thomas H. Davenport, Paul Barth and Randy Bean How Big Data is Different Brought to you by Please note that gray areas reflect artwork that has been intentionally removed. The substantive

More information

Flash Is Your Friend An introductory level guide for getting acquainted with Flash

Flash Is Your Friend An introductory level guide for getting acquainted with Flash Flash Is Your Friend An introductory level guide for getting acquainted with Flash by Tom Krupka A Brief History: Adobe Flash, which was previously called Macromedia Flash, is a set of multimedia technologies

More information

Fig (1) (a) Server-side scripting with PHP. (b) Client-side scripting with JavaScript.

Fig (1) (a) Server-side scripting with PHP. (b) Client-side scripting with JavaScript. Client-Side Dynamic Web Page Generation CGI, PHP, JSP, and ASP scripts solve the problem of handling forms and interactions with databases on the server. They can all accept incoming information from forms,

More information

EMAIL ARCHIVING. What it is, what it isn t, and how it can improve your business operations

EMAIL ARCHIVING. What it is, what it isn t, and how it can improve your business operations EMAIL ARCHIVING What it is, what it isn t, and how it can improve your business operations OVERVIEW: Why businesses are turning to email archiving As your business grows, communication between you and

More information

HTML5 the new. standard for Interactive Web

HTML5 the new. standard for Interactive Web WHITE PAPER HTML the new standard for Interactive Web by Gokul Seenivasan, Aspire Systems HTML is everywhere these days. Whether desktop or mobile, windows or Mac, or just about any other modern form factor

More information