Digital library developers make critical design and implementation decisions in the face of uncertainties about the future. We present a qualitative case study of the Large Synoptic Survey Telescope (LSST), a major astronomy project that will collect and make available large-scale datasets. LSST developers make decisions now, while facing uncertainties about its period of operations (2022-2032). Uncertainties...
This paper explores an interesting new dimension to the challenging problem of predicting long-term scientific impact (LTSI), usually measured by the number of citations accumulated by a paper in the long term. It is well known that early citations (within 1-2 years after publication) acquired by a paper positively affect its LTSI. However, there is no work that investigates if the set of authors...
Building upon a collection with functionality for discovery and analysis has been described by Lynch as a 'layered' approach to digital libraries. Meanwhile, as digital corpora have grown in size, their analysis is necessarily supplemented by automated application of computational methods, which can create layers of information as intricate and complex as those within the content itself. This combination...
News aggregators capably handle the large amount of news that is published nowadays. However, these systems focus on the presentation of important, common information in news, but do not reveal different perspectives on the same topic. Thus, current news aggregators suffer from media bias, i.e. differences in the content or presentation of news. Finding such differences is crucial to reduce the effects...
The web is today's primary publication medium, making web archiving an important activity for historical and analytical purposes. Web pages are increasingly interactive, resulting in pages that are correspondingly difficult to archive. JavaScript enables interactions that can potentially change the client-side state of a representation. We refer to representations that load embedded resources via...
Despite the increasing consumption and popularity of audio-visual materials and non-textual information, recommendation-based information retrieval research regarding these materials remains limited. To provide robust recommendation services to users, it is critical to understand how users describe their needs when they seek audio-visual materials. We conducted a content analysis of 396 recommendation...
Memento TimeMaps list identifiers for archival web captures (URI-Ms). When some URI-Ms are dereferenced, they redirect to a different URI-M instead of a unique representation at the datetime. This suggests that confidently obtaining an accurate count quantifying the number of non-forwarding captures for an Original Resource URI (URI-R) is not possible using a TimeMap alone and that the magnitude of...
In this paper, we explore the concept of augmented document and present a new user experience to digitize a document, modify its layout and edit its content by designing specific interfaces on multi-touch devices and using advanced techniques in document analysis. This framework exploits image processing tools to facilitate manipulations that are natural considering paper documents and complex in their...
OurDigitalWorld has offered a province-wide digital heritage search portal at OurOntario.ca since 2007, and currently indexes digital objects from over 250 GLAM (galleries, libraries, archives, museums) organizations from across Ontario. The British Columbia Library Association's Provincial Digital Library initiative is laying the foundation for a new provincial digital library in British Columbia...
Web Archiving Integration Layer (WAIL) is a desktop application written in Python that integrates Heritrix and OpenWayback. In this work we recreate and extend WAIL from the ground up to facilitate collection-based personal Web archiving. Our new iteration of the software, WAIL-Electron, leverages native Web technologies (e.g., JavaScript, Chromium) using Electron to open new potential for Web archiving...
Physical library collections are valuable and long-standing resources for knowledge and learning. However, managing and finding books or other volumes on a large collection of bookshelves often leads to tedious manual work, especially for large collections where books or other volumes might be missing or misplaced. Recently, deep neural-based models have been successful in detecting and recognizing text...
Web archives preserve an unprecedented abundance of materials regarding major events and transformations in our society. In this paper, we present an approach for building event-centric sub-collections from such large archives, which includes not only the core documents related to the event itself but, even more importantly, documents describing related aspects (e.g., premises and consequences). This...
Digital collections are increasingly used for a variety of purposes. In Europe alone, we can conservatively estimate that tens of thousands of users consult digital libraries daily. The usages are often motivated by qualitative and quantitative research. However, caution must be advised as most digitized documents are indexed through their OCRed version, which is far from perfect, especially for ancient...
The national (non-local) news media has different priorities than the local news media. If one seeks to build a collection of stories about local events, the national news media may be insufficient, with the exception of local news which "bubbles" up to the national news media. If we rely exclusively on national media, or build collections exclusively on their reports, we could be late to...
We address the problem of extracting structured representations of economic events from a large corpus of news articles, using a combination of natural language processing and machine learning techniques. The developed techniques allow for semi-automatic population of a financial knowledge base, which, in turn, may be used to support a range of data mining and exploration tasks. The key challenge...
Large-scale digital libraries such as the HathiTrust contain massive quantities of content combined from heterogeneous collections, with consequential challenges in providing mechanisms for discovery, unified access, and analysis. The HathiTrust Research Center has proposed 'worksets' as a solution for users to conduct their research into the 15 million volumes of HathiTrust content; however, existing...
Users wish to preserve Internet resources for later use. But what is part of and what is not part of an Internet resource remains an open question. In this paper we examine how specific relationships between web pages affect user perceptions of their being part of the same resource. This study presented participants with pairs of pages and asked about their expectation for having access to the second...