Friday, September 12, 2008

Muddiest Point #2

Are we supposed to be digitizing physical materials for our term project? Or are we strictly working with materials that are already digital and only need to be placed into a database? I just want to make sure the team doesn't take on more work than the project already calls for.

Reading Notes Week 3: Representation of Digital Objects

LESK sections 2.1, 2.2, 2.7, chapter 3.
2.1 Computer typesetting:
Almost nothing in commercial publishing is typeset by traditional methods anymore. This shift has made most current written information machine-readable and available in some form via computer.
Typesetting progressed from film manipulated by computer to the laser printer.
Software was developed to handle margin justification and line numbering, e.g. nroff/troff (Bell Labs), Scribe (CMU), and TeX (Stanford).
Between 1990 and 2000, the number of online databases almost doubled; however, most books and journals are still not available in full text via the Internet. The obstacle is economic rather than technical.

2.2 Text Formats:
There is a variety of formats, starting with ASCII. Because ASCII is so simple, it does not cover all the accented characters and symbols used in every language; a new standard has been proposed to solve this.
Large publishers use higher-level systems. The three main standards are MARC, SGML, and HTML.
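ASCII's limits are easy to see in a few lines of Python. This is a minimal sketch assuming the proposed replacement standard is Unicode (Python strings are Unicode, and UTF-8 is one way of storing it); the sample words and function names are my own.

```python
def fits_in_ascii(text: str) -> bool:
    """Return True if every character has an ASCII code (0-127)."""
    try:
        text.encode("ascii")
        return True
    except UnicodeEncodeError:
        # Characters such as "è" or Greek letters have no ASCII code.
        return False


def utf8_size(text: str) -> int:
    """Number of bytes the text occupies when stored as UTF-8."""
    return len(text.encode("utf-8"))


if __name__ == "__main__":
    # Sample words chosen for illustration only.
    for word in ["library", "bibliothèque", "βιβλιοθήκη"]:
        label = "fits in ASCII" if fits_in_ascii(word) else "needs Unicode"
        print(f"{word}: {label}, {utf8_size(word)} bytes in UTF-8")
```

Plain English text is unaffected, which is part of why ASCII lasted so long; the French and Greek words simply have no ASCII representation, while UTF-8 stores them with a few extra bytes.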

2.3 Ways of Searching:
Linear Searching – a search algorithm that goes through a file from beginning to end looking for a string.
Inverted Files – the words to be searched are extracted ahead of time and alphabetized, each with a pointer to where it occurs, so they are readily accessible for repeated searches.
Hash tables or hash coding – computing a code from each word so the system can quickly check whether that word appears in a file.
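The three approaches can be compared in a short Python sketch. The function names and sample documents are hypothetical, the query word is assumed to be lowercase, and a Python `set` stands in for a hand-built hash table; this is a classroom illustration, not Lesk's code.

```python
import bisect
from collections import defaultdict


def linear_search(documents, word):
    """Read every document from beginning to end, comparing each word."""
    hits = []
    for doc_id, text in enumerate(documents):
        for token in text.lower().split():
            if token == word:
                hits.append(doc_id)
                break  # one hit per document is enough
    return hits


def build_inverted_file(documents):
    """Extract each word once and alphabetize it with the documents containing it."""
    postings = defaultdict(set)
    for doc_id, text in enumerate(documents):
        for token in text.lower().split():
            postings[token].add(doc_id)
    words = sorted(postings)  # the alphabetized word list
    return words, [sorted(postings[w]) for w in words]


def search_inverted_file(inverted, word):
    """Binary-search the alphabetized list instead of rereading the text."""
    words, doc_lists = inverted
    i = bisect.bisect_left(words, word)
    if i < len(words) and words[i] == word:
        return doc_lists[i]
    return []


def build_hash_tables(documents):
    """One hash set per document; Python sets are hash tables."""
    return [set(text.lower().split()) for text in documents]


def hash_search(tables, word):
    """Check each document with a single hash lookup rather than scanning its text."""
    return [doc_id for doc_id, table in enumerate(tables) if word in table]


if __name__ == "__main__":
    docs = ["Digital libraries store text",
            "Text retrieval uses inverted files",
            "Images of pages"]
    print(linear_search(docs, "text"))
    print(search_inverted_file(build_inverted_file(docs), "text"))
    print(hash_search(build_hash_tables(docs), "text"))
```

All three calls print `[0, 1]`. The difference is cost: the linear scan rereads all the text for every query, the inverted file pays once to extract and sort the words, and the hash sets answer yes/no membership quickly but, unlike the sorted word list, can't be browsed alphabetically.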

3 Images of Pages

Scanning
Image Formats
Display Requirements
Indexing Images of Pages
Shared Text/Image Systems
Image Storage vs. Book Storage
Large-Scale Projects:
Thesaurus Linguae Graecae (1970s) – machine-readable form of the corpus of classical Greek literature.
Gallica collection – 100,000 texts from the Bibliothèque nationale de France in image format.

Largest online book project: the Million Book Project, led by Raj Reddy at CMU, aims to place 1,000,000 full-text books online, though only older texts that are out of copyright.


Clifford Lynch, “Identifiers and Their Role In Networked Information Applications”. http://www.arl.org/bm~doc/identifier.pdf


Norman Paskin. “Digital Object Identifier (DOI) System”. Encyclopedia of Library and Information Sciences. http://www.doi.org/overview/080625DOI-ELIS-Paskin.pdf
Bibliographic utility identifier numbers, such as OCLC or RLIN numbers, are used for duplicate detection and consolidation when building online union catalog databases. A bibliographic citation can also be viewed as an identifier, though citations vary widely in style and in which data elements they include, depending on editorial policy.
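Here is a minimal sketch of how a shared identifier like an OCLC number lets a union catalog merge duplicate records. The `Record` fields, library names, titles, and identifier values are all invented for illustration; this is not how OCLC's systems actually work internally.

```python
from dataclasses import dataclass


@dataclass
class Record:
    library: str      # contributing library (hypothetical)
    oclc_number: str  # bibliographic utility identifier used as the match key
    title: str        # as cataloged by that library; wording may vary


def build_union_catalog(records):
    """Consolidate records that share an identifier into one catalog entry."""
    catalog = {}
    for rec in records:
        # The first record seen for an identifier supplies the title.
        entry = catalog.setdefault(
            rec.oclc_number, {"title": rec.title, "holdings": []}
        )
        entry["holdings"].append(rec.library)
    return catalog


if __name__ == "__main__":
    records = [
        Record("Library A", "12345", "Example Title"),
        Record("Library B", "12345", "Example title /"),
        Record("Library B", "67890", "Another Example"),
    ]
    for number, entry in build_union_catalog(records).items():
        print(number, entry["title"], entry["holdings"])
```

The two differently punctuated titles would not match as plain strings, but the shared identifier collapses them into a single entry held by both libraries, which is the same problem citation variation creates when a citation is used as the identifier.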

Question for the week: With all these databases coming to fruition, you don't see them reaching much of a popular audience, not even semi-popular sites such as Project Gutenberg. Do Google and other search engines have a negative impact on finding these sites by tending to rank other results higher?

Tuesday, September 9, 2008

Reading Notes: Week 2 with question.

Hussein Suleman and Edward A. Fox. “A Framework for Building Open Digital Libraries”, D-Lib Magazine, December 2001. Volume 7 Number 12. http://www.dlib.org/dlib/december01/suleman/12suleman.html.

Developing standards and definitions for DLs is difficult because of wide variation in design and in the backgrounds of the specialists and designers involved, be they IT staff, librarians, or information architects.

REASONS:
Many DLs are built as a response to the needs of a local community, involving personnel with no prior experience.

Most modern DLs have WWW interfaces -- so they are shaped to resemble the way people use the WWW; however, the way people interact with the WWW is constantly in flux.

Each DL is aimed at meeting the needs of a particular community -- so the underlying program logic varies vastly among systems.

Most DLs are intended to be quick solutions to urgent community needs -- so not much thought goes into planning for future redeployment of the systems.

Because DLs must respond to user needs, they can be complex, so new projects sometimes choose to build from the ground up, since this is cheaper than adapting an existing framework to a new scenario. And as these systems grow more complex, maintenance becomes more time-consuming and difficult.

A natural solution is to create software toolkits built around a standard component model. This would allow broader connections between individual DLs and make it easier to build a field of readily extensible DLs. Adopting some form of component model is widely accepted as good software engineering practice, and doing so could help solidify and advance the DL field.
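To make the idea of a component model concrete, here is a minimal Python sketch in which every component exposes the same hypothetical `list_records()` interface, so components can be stacked. The class names are my own, and Suleman and Fox define their actual components differently, so this only illustrates the general principle.

```python
from typing import Dict, List, Protocol

Record = Dict[str, str]


class Component(Protocol):
    """Shared interface: any component can hand its records to another."""

    def list_records(self) -> List[Record]: ...


class Archive:
    """A local collection built for one community."""

    def __init__(self, records: List[Record]):
        self._records = list(records)

    def list_records(self) -> List[Record]:
        return list(self._records)


class Union:
    """Merges any number of components into one source."""

    def __init__(self, *sources: Component):
        self._sources = sources

    def list_records(self) -> List[Record]:
        return [r for source in self._sources for r in source.list_records()]


class Search:
    """A search service that works on top of whatever component it is given."""

    def __init__(self, source: Component):
        self._source = source

    def find(self, word: str) -> List[Record]:
        return [r for r in self._source.list_records()
                if word.lower() in r["title"].lower()]


if __name__ == "__main__":
    history = Archive([{"title": "Steel Industry Photographs"}])
    theses = Archive([{"title": "A Thesis on Steel Mills"}, {"title": "Rivers"}])
    print(Search(Union(history, theses)).find("steel"))
```

Because `Search` depends only on the shared interface, a new archive can be added to the `Union` without changing the search code, which is the kind of reuse a component model is meant to enable.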



William Y. Arms, Christophe Blanchi, Edward A. Overly. “An Architecture for Information in Digital Libraries”. D-Lib Magazine, February 1997. http://www.dlib.org/dlib/february97/cnri/02arms1.html.

A digital object is a way of structuring information in digital form, some of which may be metadata. Each digital object has a unique identifier, called a handle.

Components of the computer system
1. User interfaces
2. Repository
3. Handle system
4. Search system

This article explores early ideas about storing data within a DL and early theories on how to organize it within an architectural framework. Arms and his co-authors devote much of the article to the handle system and the vital role it plays in the search, select, and retrieve sequence of the basic DL model.
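The way the four components cooperate can be sketched as a toy Python model. The class names, the handle string `test/1`, and the rule that the search system returns handles rather than objects are my own assumptions for illustration; the real Handle System resolves handles over the network to typed data, not to Python objects.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class DigitalObject:
    handle: str                       # unique, persistent identifier
    content: bytes                    # the digital material itself
    metadata: Dict[str, str] = field(default_factory=dict)


class Repository:
    """Stores digital objects and returns them by handle."""

    def __init__(self, name: str):
        self.name = name
        self._objects: Dict[str, DigitalObject] = {}

    def deposit(self, obj: DigitalObject) -> None:
        self._objects[obj.handle] = obj

    def access(self, handle: str) -> DigitalObject:
        return self._objects[handle]


class HandleSystem:
    """Maps each handle to the repository that currently holds the object."""

    def __init__(self):
        self._locations: Dict[str, Repository] = {}

    def register(self, handle: str, repository: Repository) -> None:
        self._locations[handle] = repository

    def resolve(self, handle: str) -> Repository:
        return self._locations[handle]


class SearchSystem:
    """Indexes title words and returns handles, not objects (assumption)."""

    def __init__(self):
        self._index: Dict[str, List[str]] = {}

    def index(self, obj: DigitalObject) -> None:
        for word in obj.metadata.get("title", "").lower().split():
            self._index.setdefault(word, []).append(obj.handle)

    def search(self, word: str) -> List[str]:
        return list(self._index.get(word.lower(), []))


def user_interface_retrieve(word: str, search: SearchSystem,
                            handles: HandleSystem) -> List[DigitalObject]:
    """Search, select handles, resolve each handle, then retrieve the object."""
    return [handles.resolve(h).access(h) for h in search.search(word)]


if __name__ == "__main__":
    repo = Repository("repository-1")
    obj = DigitalObject("test/1", b"scanned page data",
                        {"title": "Pittsburgh Steel History"})
    repo.deposit(obj)
    handles, search = HandleSystem(), SearchSystem()
    handles.register(obj.handle, repo)
    search.index(obj)
    print([o.handle for o in user_interface_retrieve("steel", search, handles)])
```

Because the user interface only ever keeps the handle, an object could move to another repository and remain retrievable as long as its handle registration is updated, which helps explain why the article puts so much weight on the handle system.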


Sandra Payette, Christophe Blanchi, Carl Lagoze, Edward A. Overly. “Interoperability for Digital Objects and Repositories, The Cornell/CNRI Experiments”, D-Lib Magazine, May 1999, Volume 5 Issue 5. http://www.dlib.org/dlib/may99/payette/05payette.html.

This paper focuses on the definition of interoperability in the joint Cornell/CNRI work. Their motivation for this work is the eventual deployment of tested reference implementations of the repository architecture for experimentation and development by fellow digital library researchers. Section 2 summarizes the digital object and repository approach that was the focus of the interoperability experiments. Section 3 describes the set of experiments that tested interoperability at increasing levels of functionality. Section 4 discusses general conclusions and previews future work, including plans to extend the experiments to the point of defining a set of formal metrics for measuring interoperability for repositories and digital objects.

The article is strictly a discussion of repositories and digital objects; however, the writers point out the obvious implications their work will have for the DL field in general. Their experiments aim to make access across separate repositories easier (interoperability). The article also discusses future issues of security and access management in the context of these new interoperability parameters.


Question for the week: How much further have these ideas come since their development in the late 1990s and early 2000s? Do they warrant more complicated charts in the case of the Arms examples, or is the tendency toward simplifying?

Sunday, September 7, 2008

Reading Notes: Setting the Foundations of DL's

Leonardo Candela et al. (2007) Setting the Foundations of Digital Libraries. D-Lib Magazine 13(3-4), March/April 2007. http://www.dlib.org/dlib/march07/castelli/03castelli.html

Presents the core elements of the Delos Manifesto and establishes a framework for the Digital Library structure.
Digital Library: “A possibly virtual organization that comprehensively collects, manages, and preserves for the long term rich digital content, and offers to its user communities specialized functionality on that content, of measurable quality and according to codified policies.”
Digital Library System: “A software system that is based on a defined architecture and provides all functionality required by a particular Digital Library. Users interact with a Digital Library through the corresponding Digital Library System.”
Digital Library Management System: “A generic software system that provides the appropriate software infrastructure both, to produce and administer a Digital Library System incorporating the suite of functionality considered foundational for Digital Libraries and, to integrate additional software offering more refined, specialized, or advanced functionality.”

The Main Concepts of the Digital Library:
Content encompasses the data and information that the Digital Library handles and makes available to its users.
Content: A collection of information objects. It is an umbrella concept applied to all forms of information objects that a DL collects, manages, and delivers.
User: covers the various actors who interact with DL’s. DL’s connect actors with information and sustain their ability to consume and make use of the DL to generate new information. User is another umbrella concept that includes all notions associated with the representation and management of the actor within a DL. It includes such basic concepts as the rights that actors have within the system and the profiles of the actors with elements that personalize the system's performance or correspond to these actors in collaborations.
Functionality: Concept that encapsulates the services a DL offers to its users. The bare minimum of functions would include such examples as new information object registration, search, and browse. The system should seek to direct the functions of the DL so that they reflect the particular requirements of the digital library's users and/or the specific requirements of the data it holds.
Quality: Concept represents the parameters that characterize and evaluate the substance and performance of a DL. Quality can be associated with content, functionality, and also particular information objects or services. Some parameters are objective while others are subjective.
Policy: Concept represents the sets of conditions, rules, terms and regulations governing interaction between the DL and its users, e.g. acceptable user behavior, digital rights management, privacy and confidentiality agreements, charges to users, and collection delivery.
Architecture: Concept refers to the DLS entity and represents a mapping of the functionality and content offered by a DL onto hardware and software components.
-DL’s are amongst the most complex and advanced forms of information systems so interoperability across DL’s is recognized as a considerable feat.
Actors: those interacting with the Digital Library. There are four main kinds of actors that interact with digital library systems:
1. DL End-Users: utilize the DL functionality for the purpose of providing, consuming, and managing the DL Content and some of its other elements. They identify the DL as serving their functional needs. The performance and production of the DL depend on the DL's condition at the time a particular part of its functionality is activated. The condition of the DL relates to the state of its resources. This state changes during the lifetime of the DL according to the functionality activated by users and their contributions. Subcategories: Information Creators, Information Consumers, and Librarians.
2. DL Designers: Utilize knowledge on the semantics of the application domain in order to define, customize, and maintain the DL so that it is properly associated with the information and functional needs of its potential DL End-Users. They provide the functional and content configuration parameters.
3. DL System Administrators: select the software components necessary to construct the DLS. Task: to identify the architectural configuration that best fits the DLS in order to ensure the highest level of quality. The values of the architectural configuration parameters can be changed over the DL’s lifetime.
4. DL Application Developers: develop the software components of DLMS’s and DLS’s, to ensure that the proper levels and functionality are available.

Muddiest Point #1

I'm a little confused about where we should be with the readings. This whole second Labor Day thing always throws me off. I assume we will discuss week 1 at our next class, so are we just heading into week 2 this week?

Thursday, September 4, 2008

Digital Archives and Digital Libraries: same thing?

I'm throwing this little idea out belatedly, since I did my first assignment for Digital Libraries on a digital archive, but I don't see any difference in how a digital archive and a digital library are defined.

I just want to speak quickly about the US digital archive experience. Does a digital archive become a digital library when its purpose becomes access rather than preservation? Digitizing materials gives an archive a twofold purpose: 1.) physical access to the originals can now be more limited, which prolongs their preservation; 2.) digitizing opens up a new purpose, a means of letting people view original materials in almost astronomical numbers. Anyone with access to a computer and the Internet can now directly view the original Constitution, letters between Washington and Jefferson, and vacation photos of the Kennedys at Hyannisport.

Just throwing this out. What does anyone think about this? Meaningful question or totally pointless?

Wednesday, September 3, 2008

Digitizing the Steel City.

I'm sure most of us have read this in the news, but I thought I'd throw a link in just in case. The Carnegie Public Library received a $600,000 grant to digitize over 400,000 pages of materials relating to the history of the steel industry in Pittsburgh. If you haven't, check it out . . .