Sunday, September 28, 2008

Digital Ice Age.


I came upon this article while doing the second half of assignment 2. It discusses some of the major disadvantages of digitization as a preservation and archiving tool. I thought it might be worth a look for those who like a dystopian interpretation of digitization. Follow the link here.

Assignment 2: My Flickr Link

My assignment 2 part 1 link. Pictures of random books and junk I own:
http://flickr.com/photos/30778466@N06/

Friday, September 26, 2008

Reading Notes : Week 5 : XML

Martin Bryan. Introducing the Extensible Markup Language (XML) http://burks.bton.ac.uk/burks/internet/web/xmlintro.htm

What is XML?
-subset of the Standard Generalized Markup Language (SGML)
-designed to make it easy to interchange structured documents over the Internet
-mark where the start and end of each of the logical parts (called elements) of an interchanged document occurs

-XML does not require the presence of a DTD.
-XML system can assign a default definition for undeclared components of the markup.

XML allows users to:
bring multiple files together to form compound documents
identify where illustrations are to be incorporated into text files, and the format used to encode each illustration
provide processing control information to supporting programs, such as document validators and browsers
add editorial comments to a file.
It is important to note, however, that XML is not:
a predefined set of tags, of the type defined for HTML, that can be used to markup documents
a standardized template for producing particular types of documents.
XML is based on the concept of documents composed of a series of entities. Each entity can contain one or more logical elements. Each element can have certain attributes (properties) that describe the way in which it is to be processed.

Unlike other markup languages such as HTML and XHTML, XML clearly identifies the boundaries of document parts, whether a new chapter, a piece of boilerplate text, or a reference to another publication.
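
To make the notes above concrete, here is a minimal sketch in Python using the standard library's xml.etree.ElementTree. The document, its tag names (book, chapter, title, para) and its attributes (number, status) are made up for illustration and do not come from Bryan's article. The sample declares no DTD and still parses, and each element's start and end tags mark the boundaries of a logical part.

```python
import xml.etree.ElementTree as ET

# A hypothetical document: tag and attribute names are invented for this sketch.
# Note that no DTD is declared anywhere.
SAMPLE = """<book>
  <chapter number="1" status="draft">
    <title>Getting Started</title>
    <para>XML marks where each logical part begins and ends.</para>
  </chapter>
  <chapter number="2" status="final">
    <title>Going Further</title>
    <para>No DTD is declared, yet the document still parses.</para>
  </chapter>
</book>"""


def summarize(xml_text):
    """Return (number, status, title) for every chapter element."""
    root = ET.fromstring(xml_text)
    rows = []
    for chapter in root.findall("chapter"):
        # Attributes are read with .get(); child element text with .findtext().
        rows.append((chapter.get("number"), chapter.get("status"), chapter.findtext("title")))
    return rows


chapters = summarize(SAMPLE)
for number, status, title in chapters:
    print(number, status, title)
```

Because the tags are user-defined rather than predefined (as they are in HTML), the program has to be told which element names matter; that is the trade-off of XML not being a fixed tag set.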

Uche Ogbuji. A survey of XML standards: Part 1. January 2004. http://www-128.ibm.com/developerworks/xml/library/x-stand1.html
Extending Your Markup: An XML Tutorial by Andre Bergholz http://www.computer.org/internet/xml/xml.tutorial.pdf
XML Schema Tutorial http://www.w3schools.com/Schema/default.asp

These three sites are tutorials that run through examples of XML and its applications. I initially had a difficult time telling HTML and XML apart, but the W3Schools site has a Web Primer page that lists what the Average Joe needs to know about site development and links to each of the constituent pieces for understanding the WWW: http://www.w3schools.com/web/default.asp

Question: Do HTML and XHTML serve the same purpose? That is, do you use only one or the other on a web page?

Friday, September 19, 2008

Reading Notes: Week 4 Metadata in Digital Libraries

Anne J. Gilliland. Introduction to Metadata, pathways to Digital Information: 1: Setting the Stage http://www.getty.edu/research/conducting_research/standards/intrometadata/setting.html

Metadata: "the sum total of what one can say about any information object at any level of aggregation."

ALL information objects have three features - content, context, and structure - all of which can be reflected through metadata:
Content relates to what the object contains or is about, and is intrinsic.
Context indicates the who, what, why, where, how aspects associated with the object's creation and is extrinsic.
Structure relates to the formal set of associations within or among individual information objects and can be intrinsic or extrinsic.

Library metadata includes indexes, abstracts, and catalog records, created according to cataloging rules and structural and content standards such as MARC, as well as authority forms such as LCSH or the AAT (Art & Architecture Thesaurus). Such bibliographic metadata has been cooperatively created since the ‘60s and made available to repositories and users through automated systems such as bibliographic utilities, OPACs, and online databases.

Archival and manuscript metadata: accession records, finding aids, and catalog records. Archival descriptive standards that have been developed over the past two decades: the MARC Archival and Manuscript Control (AMC) format, published by the LoC (1984) (now integrated into the MARC format for bibliographic description); the General International Standard Archival Description (ISAD (G)), published by the International Council on Archives (1994); and Encoded Archival Description (EAD), adopted as a standard by the Society of American Archivists (SAA) in 1999.

Metadata:
certifies the authenticity and degree of completeness of the content;
establishes and documents the context of the content;
identifies and exploits the structural relationships that exist between and within information objects;
provides a range of intellectual access points for an increasingly diverse range of users; and
provides some of the information an information professional might have provided in a physical reference or research setting.
Metadata provides a Rosetta Stone for decoding information objects into knowledge in the information systems of the 21st century, and provides a base for translating between systems.


Stuart L. Weibel, “Border Crossings: Reflections on a Decade of Metadata Consensus Building”, D-Lib Magazine, Volume 11 Number 7/8, July/August 2005 http://www.dlib.org/dlib/july05/weibel/07weibel.html

A personal reflection on some of the achievements and lessons from Weibel's time on the Dublin Core Metadata Initiative management team. The goal: a starting place for more elaborate description schemes.
What, then, is metadata for?
To harvest and index.
Metadata is useful for images: associating images with text makes them discoverable.
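
As a rough illustration of "associating images with text makes them discoverable," the sketch below attaches a few Dublin Core elements to an image and then matches a search word against them. It assumes Python's standard xml.etree.ElementTree; the wrapper element name "record", the helper names, and the sample values are all hypothetical. Only the Dublin Core element names (title, creator, description) and the namespace URI are taken from the Dublin Core element set itself.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def describe_image(title, creator, description):
    """Build a small metadata record; 'record' is a made-up wrapper element."""
    record = ET.Element("record")
    for name, value in (("title", title), ("creator", creator), ("description", description)):
        field = ET.SubElement(record, f"{{{DC_NS}}}{name}")
        field.text = value
    return record


def matches(record, word):
    """True if the word appears in any metadata field (case-insensitive)."""
    word = word.lower()
    return any(word in (field.text or "").lower() for field in record)


# Hypothetical values: the image itself carries no searchable text,
# so the record is what a harvester or indexer would actually see.
photo = describe_image("Blast furnace at dusk", "Unknown photographer", "Steel mill beside a river")
print(ET.tostring(photo, encoding="unicode"))
print(matches(photo, "steel"))
```

A real harvesting setup would involve far more than this; the point is only that the text fields, not the pixels, are what get indexed.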

The Mongolian/Chinese railroad gauge dilemma: an interoperability challenge across borders, suffering a measure of broken semantics in the process.

The Web demands an international, multicultural approach to standards and infrastructure, but should these be broad-brush standards or a light set?


Question: Weibel mentions Google in reference to international standards definitions. Is there a relationship between search engine groups and Dublin Core, OCLC, and academic databases in the development of international standards?

Friday, September 12, 2008

Muddiest Point #2

Are we supposed to be digitizing physical materials for our term project? Or is it strictly for materials that are already digital, to be placed into a database? I just want to make sure so that there isn't more work for the team than is already presented.

Reading Notes Week 3: Representation of Digital Objects

LESK sections 2.1, 2.2, 2.7, chapter 3.
2.1 Computer typesetting:
Almost nothing in a commercial setting is typeset traditionally. This break from tradition means that most current written information is machine readable and available in some form via computer.
A progression from filmstrips manipulated by computer to the laser printer.
Software was developed to assist with margin justification and line numbering, e.g. nroff/troff (Bell Labs), Scribe (CMU), and TeX (Stanford).
Between 1990 and 2000, the number of online databases almost doubled; however, most books and journals are not available in full text via the internet. The gap is economic rather than technical.

2.2 Text Formats:
A variety of formats, starting with ASCII. Because ASCII is so simple, it does not cover all of the characters and symbols used in all languages. A new standard is proposed to solve this.
Large publishing groups use higher-level systems. The 3 main standards are: MARC, SGML and HTML.
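
A quick way to see the ASCII limitation mentioned in this section is to try encoding a few words. This is a minimal sketch in Python; the notes do not name the newer standard, so the use of Unicode (via the UTF-8 encoding) here is my assumption, and the sample words are arbitrary.

```python
def fits_in_ascii(text):
    """Return True if every character in the text can be encoded as ASCII."""
    try:
        text.encode("ascii")
        return True
    except UnicodeEncodeError:
        return False


# The same word in English, French, and Greek.
samples = ["library", "bibliothèque", "βιβλιοθήκη"]
for word in samples:
    # UTF-8 can encode all three; ASCII can only encode the first.
    print(word, fits_in_ascii(word), len(word.encode("utf-8")))
```

The byte counts show the cost of coverage: characters outside the ASCII range take more than one byte each in UTF-8.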

2.3 Ways of Searching:
Linear Searching – a search algorithm that goes through a file from beginning to end looking for a string.
Inverted Files – elements to be searched are extracted and alphabetized and are then more readily accessible for multiple searches.
Hash tables or coding – computing whether a word appears in a file.
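
The three approaches above can be sketched in a few lines of Python. This is a toy illustration under my own assumptions, not Lesk's code: the function names and the three sample documents are made up, and Python's built-in set stands in for a hash table.

```python
def linear_search(text, term):
    """Scan the text from beginning to end; return the first offset of term, or -1."""
    n, m = len(text), len(term)
    for i in range(n - m + 1):
        if text[i:i + m] == term:
            return i
    return -1


def build_inverted_file(docs):
    """Extract every word, alphabetize, and map each word to the documents containing it."""
    index = {}
    for doc_id, text in docs.items():
        for word in set(text.lower().split()):
            index.setdefault(word, []).append(doc_id)
    return {word: sorted(ids) for word, ids in sorted(index.items())}


def build_hash_table(text):
    """Hash every word so membership can be tested without scanning the file."""
    return set(text.lower().split())


# Hypothetical sample documents.
docs = {1: "the quick brown fox", 2: "the lazy dog", 3: "quick thinking dog"}

offset = linear_search(docs[1], "brown")   # scans the file each time it is called
inverted = build_inverted_file(docs)       # built once, reused for many searches
words_in_doc2 = build_hash_table(docs[2])  # answers "does this word appear?"

print(offset, inverted["quick"], "lazy" in words_in_doc2)
```

The linear search does all of its work at query time, while the inverted file and the hash table do the work up front, which is why they suit repeated searches over the same collection.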

3 Images of Pages

Scanning
Image Formats
Display Requirements
Indexing Images of Pages
Shared Text/ Image Systems
Image Storage vs Book Storage
Large Scale Projects:
Thesaurus Linguae Graecae(1970s)- machine readable form of the corpus of classical Greek lit.
Gallica collection- 100,000 texts from the Bibliothèque nationale de France in image format.

Largest online book project: The Million Book Project by Raj Reddy, CMU- to place 1,000,000 full text books online, although mostly older texts that are out of copyright.


Clifford Lynch, “Identifiers and Their Role In Networked Information Applications”. http://www.arl.org/bm~doc/identifier.pdf


Norman Paskin. “Digital Object Identifier (DOI) System”. Encyclopedia of Library and Information Sciences. http://www.doi.org/overview/080625DOI-ELIS-Paskin.pdf
Bibliographic utility identifier numbers such as the OCLC or RLIN numbers are used in duplicate detection and consolidation in the construction of online union catalog databases. A bibliographic citation can also be viewed as an identifier, with many variations in style and data elements depending on editorial policies.
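
The duplicate detection idea can be sketched as follows: records contributed by different libraries are merged whenever they carry the same utility identifier, even if their titles are typed slightly differently. This is a minimal Python sketch under my own assumptions. The field names (utility_id, title, library), the identifier values, and the records are hypothetical and are not real OCLC or RLIN data or formats.

```python
def consolidate(records):
    """Merge records that share an identifier into one union catalog entry."""
    merged = {}
    for record in records:
        key = record["utility_id"]
        # The first record seen for an identifier supplies the title.
        entry = merged.setdefault(key, {"utility_id": key, "title": record["title"], "held_by": []})
        if record["library"] not in entry["held_by"]:
            entry["held_by"].append(record["library"])
    return list(merged.values())


# Hypothetical incoming records from two libraries.
incoming = [
    {"utility_id": "id-001", "title": "Understanding Digital Libraries", "library": "Library A"},
    {"utility_id": "id-002", "title": "Introduction to Metadata", "library": "Library A"},
    {"utility_id": "id-001", "title": "Understanding digital libraries.", "library": "Library B"},
]

catalog = consolidate(incoming)
for entry in catalog:
    print(entry["utility_id"], entry["title"], entry["held_by"])
```

Matching on the identifier sidesteps the style variations that make citations unreliable as keys: the first and third records differ in capitalization and punctuation but still merge.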

Question for the week: With all these databases coming to fruition, you don't see them reaching much of a popular audience, even such pseudo-popular sites as Project Gutenberg. Do Google and other search engines have a negative impact on finding these sites by tending to place other results higher in search lists?

Tuesday, September 9, 2008

Reading Notes: Week 2 with question.

Hussein Suleman and Edward A. Fox. “A Framework for Building Open Digital Libraries”, D-Lib Magazine, December 2001. Volume 7 Number 12. http://www.dlib.org/dlib/december01/suleman/12suleman.html.

The development of standards and definitions for DLs is difficult because of the wide variation in current designs and in the concentrations of the specialists and designers involved, be they IT staff, librarians, or information architects.

REASONS:
Many DLs are built as a response to the needs of a local community, involving personnel with no prior experience.

Most modern DLs have WWW interfaces -- thus they are formed to resemble the way people use the WWW; however, the way people interact with the WWW is constantly in flux.

Each DL is aimed at meeting the needs of a particular community -- so the underlying program logic varies vastly among systems.

Most DLs are intended to be quick solutions to urgent community needs -- so not much thought goes into planning for future redeployment of the systems.

DLs, because they need to be responsive to user needs, can be complex, so new projects sometimes choose to develop from the ground up, since this is cheaper than adapting a preconceived framework to a new scenario. And as these systems become more complex, maintenance becomes more time-consuming and difficult.

A natural solution would be to create software toolkits built on a standard component model. This would in turn allow for broader connections between particular DLs and make it easier to cultivate an extensible field of individual DLs. It is widely accepted as good software engineering practice to adopt some form of component model, and doing so could solidify and promote the DL field.



William Y. Arms, Christophe Blanchi, Edward A. Overly. “An Architecture for Information in Digital Libraries”. D-Lib Magazine, February 1997. http://www.dlib.org/dlib/february97/cnri/02arms1.html.

A digital object is a way of structuring information in digital form, some of which may be metadata. Each digital object has a unique identifier, called a handle.

Components of the computer system
1. User interfaces
2. Repository
3. Handle system
4. Search system

This article explores early ideas about housing data within a DL and early theories on how to house it within an architectural framework. Arms and company dedicate much of the article to explaining the vital role the handle system plays in the search, select, and retrieve formula of the basic DL model.
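
To picture how the four components fit together, here is a toy Python sketch of the search, select, and retrieve flow. Everything in it is hypothetical: the class names, method names, handle string, and sample object are my own illustration and are not the CNRI Handle System's actual API or the authors' implementation.

```python
class Repository:
    """Stores digital objects, keyed by their handle."""

    def __init__(self, name):
        self.name = name
        self._objects = {}

    def deposit(self, handle, data, metadata):
        self._objects[handle] = {"handle": handle, "data": data, "metadata": metadata}

    def access(self, handle):
        return self._objects[handle]


class HandleSystem:
    """Maps each unique handle to the name of the repository holding the object."""

    def __init__(self):
        self._locations = {}

    def register(self, handle, repository_name):
        if handle in self._locations:
            raise ValueError(f"handle already registered: {handle}")
        self._locations[handle] = repository_name

    def resolve(self, handle):
        return self._locations[handle]


class SearchSystem:
    """Returns the handles of objects whose title contains the query word."""

    def __init__(self):
        self._titles = {}

    def index(self, handle, title):
        self._titles[handle] = title.lower()

    def search(self, word):
        return sorted(h for h, t in self._titles.items() if word.lower() in t)


def retrieve(handle, handle_system, repositories):
    """Resolve the handle to a repository, then fetch the object from it."""
    return repositories[handle_system.resolve(handle)].access(handle)


# Hypothetical setup: one repository, one object.
repo = Repository("repo-a")
handles = HandleSystem()
search = SearchSystem()
handle = "example/0001"
repo.deposit(handle, b"...scanned page bytes...", {"title": "Mill payroll ledger"})
handles.register(handle, repo.name)
search.index(handle, "Mill payroll ledger")

# What a user interface would do: search, select a hit, retrieve by handle.
hits = search.search("ledger")
found = retrieve(hits[0], handles, {repo.name: repo})
print(found["metadata"]["title"])
```

The search system only ever returns handles, so an object could move to a different repository without breaking search results, as long as the handle system's entry is updated.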


Sandra Payette, Christophe Blanchi, Carl Lagoze, Edward A. Overly. “Interoperability for Digital Objects and Repositories, The Cornell/CNRI Experiments”, D-Lib Magazine, May 1999, Volume 5 Issue 5. http://www.dlib.org/dlib/may99/payette/05payette.html.

This paper focuses on the definition of interoperability in the joint Cornell/CNRI work. Their motivation for this work is the eventual deployment of tested reference implementations of the repository architecture for experimentation and development by fellow digital library researchers. Section 2 summarizes the digital object and repository approach that was the focus of the interoperability experiments. Section 3 describes the set of experiments that tested interoperability at increasing levels of functionality. Section 4 discusses general conclusions and a preview of future work, including plans to develop experimentation to the point of defining a set of formal metrics for measuring interoperability for repositories and digital objects.

The article is a strict discussion of repositories and digital objects; the writers, however, suggest the obvious implications their work will have for the DL field in general. Their experiments are goal-driven toward increasing ease of access between separate repositories (interoperability). The article also discusses the future issues of security and access management in the context of new interoperability parameters.


Question for the week: How much further have these ideas come since their development in the late '90s and early 2000s? Do they warrant more complicated charts in the case of the Arms examples, or is the tendency toward simplifying?

Sunday, September 7, 2008

Reading Notes: Setting the Foundations of DL's

Leonardo Candela et al. (2007) Setting the Foundations of Digital Libraries. D-Lib Magazine 13(3-4), March/April 2007. http://www.dlib.org/dlib/march07/castelli/03castelli.html

Presents the core elements of the Delos Manifesto and establishes a framework for the Digital Library structure.
Digital Library: “A possibly virtual organization that comprehensively collects, manages, and preserves for the long term rich digital content, and offers to its user communities specialized functionality on that content, of measurable quality and according to codified policies.”
Digital Library System: “A software system that is based on a defined architecture and provides all functionality required by a particular Digital Library. Users interact with a Digital Library through the corresponding Digital Library System.”
Digital Library Management System: “A generic software system that provides the appropriate software infrastructure both, to produce and administer a Digital Library System incorporating the suite of functionality considered foundational for Digital Libraries and, to integrate additional software offering more refined, specialized, or advanced functionality.”

The Main Concepts of the Digital Library:
Content: encompasses the data and information that the Digital Library handles and makes available to its users. It is an umbrella concept applied to all forms of information objects that a DL collects, manages, and delivers.
User: covers the various actors who interact with DLs. DLs connect actors with information and sustain their ability to consume and make use of the DL to generate new information. User is another umbrella concept that includes all notions associated with the representation and management of the actor within a DL. It includes such basic concepts as the rights that actors have within the system and the profiles of the actors, with elements that personalize the system's performance or correspond to these actors in collaborations.
Functionality: Concept that encapsulates the services a DL offers to its users. The bare minimum of functions would include such examples as new information object registration, search, and browse. The system should seek to direct the functions of the DL so that they reflect the particular requirements of the digital library's users and/or the specific requirements to the data it holds.
Quality: Concept represents the boundaries that characterize and evaluate the substance and performance of a DL. Quality can be associated with content, functionality, and also particular information objects or services. Some boundaries are objective while others are subjective.
Policy: Concept represents the sets of conditions, rules, terms and regulations governing interaction between the DL and its users, e.g. acceptable user behavior, digital rights management, privacy and confidentiality agreements, charges to users, and collection delivery.
Architecture: Concept refers to the DLS entity and represents a mapping of the functionality and content offered by a DL onto hardware and software components.
-DLs are amongst the most complex and advanced forms of information systems, so interoperability across DLs is recognized as a considerable feat.
Actors: those who interact with the Digital Library. There are 4 main categories of actors that interact with digital library systems:
1. DL End-Users: utilize the DL functionality for the purpose of providing, consuming, and managing the DL Content and some of its other elements. They identify the DL as serving their functional needs. The performance and production of the DL depend on the DL's condition at the time a particular part of its functionality is activated. The condition of the DL relates to the state of its resources. This state changes during the lifetime of the DL according to the functionality activated by users and their contributions. Sub categories: Information Creators, Information Consumers, and Librarians.
2. DL Designers: utilize knowledge of the semantics of the application domain in order to define, customize, and maintain the DL so that it is properly aligned with the information and functional needs of its potential DL End-Users. They provide the functional and content configuration boundaries.
3. DL System Administrators: select the software components necessary to construct the DLS. Task: to identify the architectural configuration that best fits the DLS in order to ensure the highest level of quality. The values of the architectural configuration parameters can be changed over the DL’s lifetime.
4. DL Application Developers: develop the software components of DLMSs and DLSs, to ensure that the proper levels of functionality are available.

Muddiest Point #1

I'm a little confused about where we should be with the readings. This whole second Labor Day thing always throws me off. I assume we are going to discuss week 1 at our next class, so we are just heading into week 2 for this week?

Thursday, September 4, 2008

Digital Archives and Digital Libraries: same thing?

I'm throwing this little idea out belatedly, as I did my first assignment for Digital Libraries on a digital archive, but I don't see any difference in how a digital archive and a digital library are defined.

I just want to speak quickly in reference to the US digital archive experience. Does a digital archive become a digital library when its purpose becomes access rather than preservation? Digitizing materials serves a twofold purpose: 1.) physical access to them can now be more limited, which prolongs preservation; 2.) digitizing opens up a new purpose, a means of allowing original materials to be viewed in almost astronomical numbers. Anyone with access to a computer and the internet now has direct viewing access to the original Constitution, letters between Washington and Jefferson, and vacation photos of the Kennedys at Hyannis Port.

Just throwing this out. What does anyone think about this? Meaningful question or totally pointless?

Wednesday, September 3, 2008

Digitizing the Steel City.

I'm sure most of us have read this in the news, but I thought I'd throw a link in just in case. The Carnegie Public Library received a $600,000 grant to digitize over 400,000 pages of materials relating to the history of the steel industry in Pittsburgh. If you haven't, check it out . . .