Friday, November 28, 2008

Muddiest Point - Week 13

Lesk and Stiglitz both cite the need for government involvement in intellectual property rights.  But given that the international economy is ever expanding, how probable and helpful is government-specific involvement in the case of property rights issues?

Week 13 - Intellectual Property Rights

LESK, chapter 11

 

            Intellectual property rights issues are far from solved.  The topic is riddled with legal questions. How much government involvement is required in this new digital realm?  Will protection increase innovation or halt these advances?  This chapter does not try to answer these questions but offers a myriad of examples of present solutions as well as future possibilities.  Lesk pulls out the old, hard debate question: when is government involvement appropriate versus contract and technology solutions?  Democrat or Republican?

 

Joseph E. Stiglitz, “Intellectual-property rights and wrongs”, Daily Times, August 19, 2005. http://www.dailytimes.com.pk/default.asp?page=story_16-8-2005_pg5_12

 

            A balanced view of property and intellectual rights as they pertain to ideas and materials.  Stiglitz argues that overly strong protections can hinder innovation, and he raises questions of fairness in how ideas and products are developed.

 

Clifford Lynch, “Where Do We Go From Here? The Next Decade for Digital Libraries”,  D-Lib Magazine, Volume 11 Number 7/8 July/August 2005, http://www.dlib.org/dlib/july05/lynch/07lynch.html

 

            A very dramatic article; I think some would consider it overly dramatic.  But I do like the setup: the history of digital libraries is uncertain, just as its future surely is. Not in the sense of being in danger, but more along the lines of what its focus will be. That focus feels like it rests firmly in the broader context of Library and Information Science, yet the specificity of its purpose does seem cloudy. Lynch doesn't make any predictions of what the future brings, merely where the future for D-libraries will be.  Location is much easier to predict than future principles.

 

Knowledge lost in Information. Report of the NSF Workshop on Research directions for digital libraries http://www.sis.pitt.edu/~dlwkshop/report.pdf

 

            This report is massive, a collection of varied little essays and articles ranging from usage issues to management and stewardship of the d-library.  And all in the context of the coming decade of the d-library.

Friday, November 21, 2008

Muddiest Point Week 12

Arms discussed that CMU created then deleted temporary user records when users accessed data from d-libraries.  In the current light of post-PATRIOT Act topics, is this temporary record creation the norm for d-libraries?  Are there any libraries that maintain user records rather than deleting them?

Week 12 Security and Economics

ARMS, chapter 7

This chapter looks at two related topics:

1.  Methods for controlling who has access to materials in digital libraries.

2.  Techniques of security in networked computing.

Access Management: The control of access to digital libraries. Some refer to it as "terms and conditions"; in publishing, where the emphasis is usually on generating revenue, the expression is "rights management." The phrases are synonymous.

Framework of access management:

  • Information managers create policies for access. Access is expressed in terms of permitted operations.
  • Policies relate users to digital material. Policies that the information managers establish must take into account relevant laws, and agreements made with others, such as licenses from copyright holders.
  • Authorization specifies access. Users need to be authenticated and their role in accessing materials established. Digital material in the collections must be identified and its authenticity established.  Users request access to the collections, and the request passes through an access management process. Users are authenticated; authorization procedures grant or refuse them permission to carry out operations.
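The framework above (authenticate the user, then check policies that relate roles to permitted operations) can be sketched in a few lines of Python. Everything here, including the user names, roles, and policy table, is invented for illustration and is not from Arms:

```python
# Illustrative sketch of an access-management flow: authenticate a
# user, then consult a policy table mapping (role, collection) pairs
# to permitted operations. All names below are hypothetical.

USERS = {"asmith": {"password": "secret", "role": "student"}}

# Policies relate roles to permitted operations on materials.
POLICIES = {
    ("student", "journal-archive"): {"read"},
    ("librarian", "journal-archive"): {"read", "annotate", "delete"},
}

def authenticate(username, password):
    """Establish the identity of the user (simplest possible check)."""
    user = USERS.get(username)
    if user and user["password"] == password:
        return user["role"]
    return None

def authorize(role, collection, operation):
    """Grant or refuse permission to carry out an operation."""
    return operation in POLICIES.get((role, collection), set())

role = authenticate("asmith", "secret")
print(authorize(role, "journal-archive", "read"))    # True
print(authorize(role, "journal-archive", "delete"))  # False
```

Real systems separate these steps across services (login, license checks, and so on), but the two-stage shape is the same.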

Authentication

  • Authentication establishes the identity of the individual user.
  • The second step is to determine what the user is authorized to do.
  • Variety of techniques are used to authenticate users; some are simple but easy to circumvent, while others are more secure but complex.

Chapter looks at basic methods of security:

  • Encryption is the name given to a group of techniques that are used to store and transmit private information, encoding it in a way that the information appears completely random until the procedure is reversed.
  • Private key encryption is a family of methods in which the key used to encrypt the data and the key used to decrypt the data are the same, and must be kept secret. Private key encryption is also known as single key or secret key encryption
  • Dual key encryption permits all information to be transmitted over a network, including the public keys, which can be transmitted completely openly. For this reason, it has the alternate name of public key encryption.
  • Digital signatures are used to check that a computer file has not been altered. Digital signatures are based on the concept of a hash function. A hash is a mathematical function that can be applied to the bytes of a computer file to generate a fixed-length number.
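The hash idea behind digital signatures is easy to demonstrate with Python's standard hashlib. Note that a real signature scheme would also encrypt the digest with a private key, which this sketch omits; the file contents are made up:

```python
# A fixed-length digest detects any alteration of a file's bytes:
# even a one-character change produces a completely different hash.
import hashlib

original = b"The contents of a computer file."
digest = hashlib.sha256(original).hexdigest()  # fixed-length number

tampered = b"The contents of a computer file!"  # one byte changed
print(len(digest))                                     # 64 hex chars
print(hashlib.sha256(original).hexdigest() == digest)  # True
print(hashlib.sha256(tampered).hexdigest() == digest)  # False
```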

Delay of public key encryption:

  • Patents are part of the difficulty.
  • Agencies such as the CIA claim that encryption technology is a vital military secret and that exporting it would jeopardize the security of the United States. Police forces claim that public safety depends upon their ability to intercept and read any messages on the networks, when authorized by an appropriate warrant. 

William Arms, “Implementing Policies for Access Management”, D-Lib     Magazine,1998. http://www.dlib.org/dlib/february98/arms/02arms.html. 

LESK, chapter 10 “economics” (available in CourseWeb)

ARMS, chapter 6, economics http://www.cs.cornell.edu/wya/DigLib/new/Chapter6.html

Laws and technical solutions must both help with d-library economic and legal issues and frameworks. The chapter discusses a lot of the framework for these legal and economic issues, but reserves a special area of concentration for copyright:

In US law, copyright applies to almost all literary works, including textual materials, photographs, computer programs, musical scores, videos and audio tapes. Major exception: materials created by government employees. Initially, the creator of a work or the employer of the creator owns the copyright. In general, this is considered to be intellectual property that can be bought and sold like any other property.

In France, the creator has personal rights ("moral rights") which can not be transferred. Historically, copyright has had a finite life, but Congress has regularly extended that period. The owner of the copyright has an exclusive right to make copies, to prepare derivative works, and to distribute the copies by selling them or in other ways. It also allows publishers to develop products without fear that their market will be destroyed by copies from other sources. 

2 important concepts in United States law are:

  • First sale applies to a physical object, such as a book. The copyright owner can control the sale of a new book, and set the price, but once a customer buys a copy of the book, the customer has full ownership of that copy and can sell the copy or dispose of it in any way without needing permission.
  • Fair use is a legal right in United States law that allows certain uses of copyrighted information without permission of the copyright owner. Under fair use, reviewers or scholars have the right to quote short passages, and photocopies can be made of an article or part of a book for private study. 4 basic factors are considered:

 

  • the purpose and character of the use, including whether such use is of a commercial nature or is for nonprofit educational purposes;
  • the nature of the copyrighted work;
  • the amount and substantiality of the portion used in relation to the copyrighted work as a whole;
  • the effect of the use upon the potential market for or value of the copyrighted work.


The first sale doctrine and fair use do not transfer easily to digital libraries. While the first sale doctrine can be applied to physical media that store electronic materials, such as CD-ROMs, there is no parallel for information that is delivered over networks. This uncertainty was one of the reasons that led to a series of attempts to rewrite copyright law, both in the United States and internationally. Until 1998, the results were a stalemate, which was probably good. Existing legislation was adequate to permit the first phase of electronic publishing and digital libraries. The fundamental difficulty was to understand the underlying issues. 

Friday, November 14, 2008

Muddiest Point Week 11

Do any scholars, librarians or computer science professionals feel that the digital library field might be better served by being divided into more areas? Perhaps the reason there is so much dispute is because persons in this field really are talking about not only different definitions but also about different functioning libraries or electronic frameworks.

Week 11 Social Issues

Social Aspects of Digital Libraries. The final report of UCLA-NSF Social Aspects of Digital Libraries Workshop. http://is.gseis.ucla.edu/research/dig_libraries/index.html.

 

Goals: to assess existing data that might inform research and to propose a research agenda that would pose new questions.

 

Two definitions of digital libraries: one emphasizing that they extend and enhance existing information storage and retrieval systems, incorporating digital data and metadata in any form; the other emphasizing that design, policy, and practice should reflect the social context in which they exist.

 

Proposal of an information life cycle model to illustrate the flow of human activities in creating, searching, and using information and the stages through which information artifacts may pass: activity, inactivity, and disposal.

 

Research issues:

  • Human-centered: a focus on people, both as individual users and as members of groups and communities, communicators, creators, users, learners, or managers of information. We are concerned with groups and communities as units of analysis as well as with individual users.
  • Artifact-centered: a focus on creating, organizing, representing, storing, and retrieving the artifacts of human communication.
  • Systems-centered: a focus on digital libraries as systems that enable interaction with these artifacts and that support related communication processes.
  • Multiple disciplinary joint projects.
  • Digital libraries should be developed and evaluated in operational, as well as experimental, work environments.
  • The project found a multitude of other research issues: hybrid D-libs, dynamic index and artifact issues, etc.

 

D-libraries represent a set of significant social problems that require human and technological solutions.

 

Attempting to define D-libraries led to two separate though complementary definitions:

  1. "Digital libraries are a set of electronic resources and associated technical capabilities for creating, searching, and using information. In this sense they are an extension and enhancement of information storage and retrieval systems that manipulate digital data in any medium (text, images, sounds; static or dynamic images) and exist in distributed networks. The content of digital libraries includes data, metadata that describe various aspects of the data (e.g., representation, creator, owner, reproduction rights), and metadata that consist of links or relationships to other data or metadata, whether internal or external to the digital library.
  2. Digital libraries are constructed -- collected and organized -- by a community of users, and their functional capabilities support the information needs and uses of that community. They are a component of communities in which individuals and groups interact with each other, using data, information, and knowledge resources and systems. In this sense they are an extension, enhancement, and integration of a variety of information institutions as physical places where resources are selected, collected, organized, preserved, and accessed in support of a user community. These information institutions include, among others, libraries, museums, archives, and schools, but digital libraries also extend and serve other community settings, including classrooms, offices, laboratories, homes, and public spaces."

 

The workshop concluded with the development of a life cycle information model: a cyclical model with hierarchical levels of activity, handling, and process.  They concluded that the individual research problems noted needed to be addressed individually, including exploring the definition of digital libraries used in the workshop project, as it was still thought to be vague in its usage of certain terms, i.e., information, community, library.

 

The Infinite Library, Wade Roush, Technology Review, 2005. (available in CourseWeb)

Could not find text in courseweb.

 

William Y. Arms, “A Viewpoint Analysis of the Digital Library”, D-Lib Magazine, Volume 11 Number 7/8, July/August 2005. http://www.dlib.org/dlib/july05/arms/07arms.html

 

The first federal research program: DARPA's Computer Science Technical Reports Project. There were heated discussions during this research about whether the field of study should be called digital libraries or the digital library. "Should digital libraries be encouraged to develop independently or together?"

 

This article looks at D-libraries from three viewpoints: organizational, technical, and the user. Organizationally, the world consists of separate digital libraries, but for the user, this characteristic is obscured.

  • Is this important? D-library research has neglected to recognize major innovations. Computer scientists resisted the simple technology of the Web. Librarians disparaged the value of Web search engines. Greater emphasis must be placed on the user viewpoint, and less on the technical and organizational.

 

Besides testing libraries on users, users should be more involved in development, not just in criticism.  Rather than just making suggestions, user groups should perhaps be able to see incremental changes and judge usability before and after.

Friday, October 31, 2008

Week 10 Interaction and Evaluation

  1. Arms chapter 8. http://www.cs.cornell.edu/wya/DigLib/new/Chapter8.html. This is useful if you want to learn the real basics of interaction.

This chapter deals primarily with the issue of user interfaces.  Over the past decade, interfaces have had to change due to the broad range of people who now have access.  Originally such systems served only academics and IT people who could handle abstract access interfaces.  They have since become more user friendly, even to the extent of imitating page turning, as in the case of JSTOR and American Memory.

 Usability as a property of the whole system, not just the user interface of a client.

            Conceptual Model:

  • Interface design
  • Functional design
  • Data and metadata
  • Computer systems and networks

 Browsers added a new layer of access to systems and networks, starting in 1993 with Mosaic. "Mobile code gives the designer of a web site the ability to create web pages that incorporate computer programs. Java is a general purpose programming language that was explicitly designed for creating distributed systems, especially user interfaces, in a networked environment. A Java applet is a short computer program. It is compiled into a file of Java byte code and can be delivered across the network to a browser, usually by executing an HTTP command. The browser recognizes the file as an applet and invokes the Java interpreter to execute it." 

 

New conceptual models: DLITE and Pad++.

 

Poor user support is more than an aesthetic handicap. Well-designed interfaces, suitable functionality, and responsive systems make a quantifiable difference to the value of DLs. When a system is difficult to use, users may fail to find important results, may misconstrue the data found, or may give up believing that the system is void of the proper data.

           

  1. Rob Kling and Margaret Elliott "Digital Library Design for Usability" http://www.csdl.tamu.edu/DL94/paper/kling.html

 

During the last decade, software designers have made progress in developing usable systems for products such as word processors. Less attention has been given to usability in DL design.  Two forms of DL usability are discussed: interface and organizational. While the Human-Computer-Interaction research community has helped pioneer design principles to improve interface usability, organizational usability is less well understood. "Design for usability" is a new term that refers to the design of computer systems so that organizational usability is addressed. DL developers need to consider "design for usability" issues during DL system design.

Systems usability: refers to how well people can exploit a computer system's intended functionality.

The authors discuss two key forms of DL usability: interface usability and organizational usability.

Design for usability: refers to the design of computer systems so that they can be effectively integrated into the work practices of specific organizations.

Organizational usability: characterizes the effective "fit" between computer systems (and DLs) and the social organization of computing in specific organizations. Designers must learn the primary characteristics of client organizations.

 

The authors examined five models of computer-system design known in information systems and computer science research and professional communities. Each is a cultural model within a specific organization and is hard to alter. They characterized one design model as the dominant cultural design model in the DL research community. Each of the five has strengths and weaknesses. They therefore propose a new organizationally-sensitive model which has the strongest chance of producing DL systems that most people will find usable in their workplaces.

 

This is a good time for the DL research community to analyze its user systems and DL frameworks, so we can start developing a systematic understanding of the actual working conditions under which users find these models highly usable.

 

Tefko Saracevic, “Evaluation of digital libraries: An overview” http://www.scils.rutgers.edu/~tefko/DL_evaluation_Delos.pdf.

 

An extensive overview of DL evaluation. States that DLs have a short history: discussion of them began in the '60s, but no applicable systems were developed until the mid-1990s.  The writer evaluated over 80 studies on DLs in order to assemble a history outlining the criteria by which D-libraries are observed and explored.  He concludes that theorists and practitioners evaluating DL systems do not seem to agree or engage with each other's observations and work.

 Diagnosis of lack of evaluation:

"Complexity: Digital libraries are highly complex, they are much more than technological systems alone; evaluation of complex systems is very hard; we are just learning how to do this job and have a lot more to learn. In other words, we as yet do not know how to evaluate and we are experimenting with doing it in many different ways. 

Premature: Even though they are exploding and are widespread, it may be too early in the evolution of digital libraries for evaluation. 

Interest: There is no interest in evaluation. Those that do or research digital libraries are interested in doing, building, implementing, breaking new paths, operating … evaluation is of little or no interest, plus there is no time to do it. 

Funding: There are inadequate or no funds for evaluation. Evaluation is time consuming, expensive and requires commitment – all these are in short supply. Grants have minimal or no funds allocated for evaluation. Granting agencies, while professing evaluation, are not allocating programs and budgets for evaluation. If there were funds there would be evaluation. With no funds there is no evaluation.

Culture: evaluation is not a part of the culture in research and operations of digital libraries. It is below the cultural radar. A stepchild. Plus many communities with very different cultures are involved in digital libraries. This particularly pertains to differences between technical and humanists cultures: language and frames of reference, priorities and understandings are different; communication is hard and at times impossible. Under these circumstances evaluation means very different things to different constituencies.

 Cynical: who wants to know or demonstrate actual performance? Are there any emperor clothes around? Evaluation may be subconsciously or consciously suppressed. The ultimate evaluation of digital libraries will revolve around assessing transformation of their context – determining possible enhancing changes in institutions, learning, scholarly publishing, disciplines, small worlds and ultimately society due to digital libraries(10)."

 Ben Shneiderman, Catherine Plaisant, "Designing the User Interface", 4th ed., chapter 1. A good introduction to usability and its application in human-computer interaction   (available in CourseWeb)

 

I cannot find this article on Course Web.

 

 

Muddiest Point Week 10

I can't find certain hyperlinks or materials on Course Web.  Is there a possibility of re-evaluating some of these links, as they may not be current?  In particular, this week's "article" by Shneiderman and Plaisant, as well as the Arms e-text, which seems to migrate around the Cornell servers.

Friday, October 17, 2008

Week 8 Access in Digital Libraries: Part

Chapter 1. Definition and Origins of OAI-PMH. (Available in CourseWeb)

 Todd Miller, Federated Searching: Put It in Its Place . April 15, 2004. http://www.libraryjournal.com/article/CA406012.html&

 Proposing a relationship between federated search engines and library catalogs:

If the catalog is the primary source of information, then access federated searches through the catalog?

Available content is not limited to data stored within the physical library. The content demanded by users is often not cataloged by libraries. Viewing the catalog as the primary source of data does not reflect the current library. Today's libraries are vast information centers; providing books and other cataloged material is only one aspect of the modern library.

 

“Knowledge is power”, true for the patron and for the library. The more libraries enable and engage their information, the more central they become in the lives of their constituency.

 

U.S. Senator Wendell Ford said, "If information is the currency of democracy, then libraries are the banks." Libraries have been made too secure. Google has shown that the most powerful information access approach also happens to be the simplest and easiest. “The most complex and least intuitive interfaces wind up securing information, not facilitating information access.”

 

The Truth About Federated Searching. October 2003. http://www.infotoday.com/it/oct03/hane1.shtml

 

Not all federated search engines can search all databases; most can search Z39.50 and free databases. Federated search engines cannot search all licensed databases for both walk-up and remote users. Why? Authentication: it is difficult to manage for subscription databases, especially for remote users.

 

True de-duplication is not possible.

Relevancy rankings are never totally relevant.

Subscribing to a hosted federated engine, rather than running it as local software, is the best option, given updates and the labor-intensive IT issues. Leave that to the engine's developers.

 

A federated search translates a search into something the native database's engine can understand. It's restricted to the capabilities of the native database's search function. A federated search can't go beyond the parameters set by the native database engine.
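That translation step can be sketched in Python. Both target query syntaxes below are invented purely for illustration; real connectors would emit whatever each native engine actually accepts (Z39.50 queries, vendor CGI parameters, and so on):

```python
# Hedged sketch of the core job of a federated search engine:
# translate one fielded query into each native database's syntax.
# Both target formats here are imaginary.

def translate(field, term, target):
    if target == "database_a":    # imaginary fielded syntax
        return f"{field.upper()}=({term})"
    elif target == "database_b":  # imaginary CGI-style syntax
        return f"q={term}&index={field}"
    raise ValueError(f"unknown target: {target}")

print(translate("title", "digital libraries", "database_a"))
# TITLE=(digital libraries)
print(translate("title", "digital libraries", "database_b"))
# q=digital libraries&index=title
```

The limitation the article notes falls straight out of this design: if a native engine has no notion of, say, proximity search, no translation can add it.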

 

Lynch, Clifford A. (1997).  The Z39.50 Information Retrieval Standard, Part 1: A Strategic View of its Past, Present, and Future.  D-Lib Magazine, April 1997. http://www.dlib.org/dlib/april97/04lynch.html

 

I’ve been dying all these years for a succinct definition of Z39.50 and I finally have it: “ Z39.50 -- properly "Information Retrieval (Z39.50); Application Service Definition and Protocol Specification, ANSI/NISO Z39.50-1995" -- is a protocol which specifies data structures and interchange rules that allow a client machine (called an "origin" in the standard) to search databases on a server machine (called a "target" in the standard) and retrieve records that are identified as a result of such a search.

 

The rather forbidding name "Z39.50" comes from the fact that the National Information Standards Organization (NISO), the American National Standards Institute (ANSI)-accredited standards development organization serving libraries, publishing and information services, was once the Z39 committee of ANSI. NISO standards are numbered sequentially and Z39.50 is the fiftieth standard developed by NISO. The current version of Z39.50 was adopted in 1995, thus superseding earlier versions adopted in 1992 and 1988. It is sometimes referred to as Z39.50 Version 3.“

 

The article is the first part of a two-part story on the history and implementation of the Z39.50 protocol.  It deals primarily with Z39.50 and its use in digital libraries.

 

Norbert Lossau, “Search Engine Technology and Digital Libraries: Libraries Need to Discover the Academic Internet” D-Lib Magazine, June 2004, Volume 10 Number 6. http://www.dlib.org/dlib/june04/lossau/06lossau.html

 

Librarians should not only look to Google and Yahoo but also to other search engines with the means of searching the “Deep Web,” as discussed in last week's class.  I think this is a big conundrum in all areas of IS and LIS: ignorance, or a lazy skimming of the Web rather than exploring it, finding those sites and e-documents that can be pulled from beneath the levels normally searched by the larger engines.  I’ve found many an important document with newer engines like Hakia that purport to be semantic but offer a further advantage: they contain sites pooled and suggested manually by IT and LIS experts and amateurs.

 

Question:  Has there been current research done on semantic engines like Hakia?  I haven’t seen any news pertaining to the idea.  Perhaps I’m not looking “deep” enough?

Thursday, October 9, 2008

Muddiest Point

One of my Digital Library teammates might have beaten me to this question, but . . . is there any issue with the database for the final project having very little coherency? Let me explain a little better: can we pick three media that have no real content connection, just so we can practice with the said formats, if we are developing our own fictitious d-library?

Reading Notes: Week 7: Access in Digital Libraries

LESK chapter 4.
This chapter discusses the varied and disparate non-textual materials that are involved in digital archives. As Lesk comments, it’s not all text pages! Not any more anyway! It runs through 4 main categories: Sound formats, Pictures (formatting by color texture and shape), Speech (more difficult to index and search than images), and Moving Images (Currently being researched but no contemporary solution that is affordable for library functionality.). Lesk discusses the indexing of these items, as well as issues with searches and solutions to these problems.

David Hawking, Web Search Engines: Part 1 and Part 2 IEEE Computer, June 2006.
In 1995 there was much speculation about the vastness of the web and the inability of an engine to search even a usable portion of it. And yet today the Big Three (Google, Yahoo, and MS) all handle about a billion queries a day in over a thousand languages worldwide. This article explores the issues and techniques that these major search engines encounter and resolve.
INFRASTRUCTURE - large search engines manage numerous dispersed data centers. Services from these data centers are built up from clusters of commodity PCs. Individual servers in these data centers can be dedicated to specializations, e.g., crawling, indexing, query processing, snippet generation, link-graph computations, result caching, and insertion of advertising content. The amount of web data that search engines crawl and index is about 400 TB. Crazy!
CRAWLING ALGORITHMS - Uses a queue of URLs to be visited and a system for determining if it has already seen a URL. This requires huge data structures—a list of 20 billion URLs = a TB of data. The crawler initializes with "seed" URLs. Crawling proceeds by making an HTTP request to fetch the page at the first URL in the queue. When the crawler fetches the page, it scans the contents for links to other URLs and adds each previously unseen URL to the queue. Finally, the crawler saves the page content for indexing. Crawling continues until the queue is empty.
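The crawling loop described above can be sketched in a few lines of Python. The in-memory WEB dict stands in for real HTTP fetches so the sketch runs without a network; the page names are made up:

```python
# Sketch of the crawl loop: a queue of URLs to visit plus a "seen"
# set so no URL is fetched twice. WEB maps a URL to its outlinks,
# standing in for actual HTTP requests.
from collections import deque

WEB = {  # hypothetical pages: URL -> outgoing links
    "a.html": ["b.html", "c.html"],
    "b.html": ["a.html", "c.html"],
    "c.html": [],
}

def crawl(seed):
    queue, seen, saved = deque([seed]), {seed}, []
    while queue:                       # continue until queue is empty
        url = queue.popleft()          # fetch page at head of queue
        saved.append(url)              # save page content for indexing
        for link in WEB.get(url, []):  # scan contents for links
            if link not in seen:       # add previously unseen URLs
                seen.add(link)
                queue.append(link)
    return saved

print(crawl("a.html"))  # ['a.html', 'b.html', 'c.html']
```

At web scale the "seen" set is exactly the huge data structure the article mentions (20 billion URLs, about a TB).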

Crawling must address the following issues: Speed, Politeness, Excluded Content, Duplicate Content, Continuous Crawling, and Spam Rejection.

INDEXING ALGORITHMS - “Search engines use an inverted file to rapidly identify indexing terms—the documents that contain a particular word or phrase (J. Zobel and A. Moffat, "Inverted Files for Text Search Engines," to be published in ACM Computing Surveys, 2006).”
REAL INDEXERS - Store additional information in the postings, such as term frequency or positions.
QUERY PROCESSING ALGORITHMS - Average query length is 2.3 words.
By default, current search engines return only documents containing all the query words.
REAL QUERY PROCESSORS – “The major problem with the simple-query processor is that it returns poor results. In response to the query "the Onion" (seeking the satirical newspaper site), pages about soup and gardening would almost certainly swamp the desired result.”
Result quality can be dramatically improved if the query processor scans and sorts results according to a relevance-scoring utility that takes into account the number of query term occurrences, document length, etc. The MSN search engine reportedly takes into account more than 300 factors.
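A minimal relevance scorer of the kind described might weigh query-term occurrences against document length. Real engines combine hundreds of factors, as the article notes for MSN; this sketch shows only the simplest two, on an invented two-document collection:

```python
# Toy relevance scoring: term frequency normalised by document
# length. A document dense in the query term ranks higher than a
# longer one with scattered occurrences.
docs = {  # hypothetical collection
    "onion-news": "the onion the onion satire",
    "soup": "onion soup with extra onion and more onion stock broth",
}

def score(query_terms, text):
    words = text.split()
    occurrences = sum(words.count(t) for t in query_terms)
    return occurrences / len(words)  # occurrences / document length

ranked = sorted(docs, key=lambda d: score(["onion"], docs[d]),
                reverse=True)
print(ranked[0])  # 'onion-news'
```

"soup" has more raw occurrences of "onion" (3 vs. 2), but "onion-news" is shorter and scores higher, which is the length-normalisation point.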
Search engines use many techniques to speed things up – Skipping, Early Termination, Caching, and Clever Assignment of Document Numbers.


M. Henzinger et al., "Challenges in Web Search Engines", ACM SIGIR 2002. http://portal.acm.org/citation.cfm?coll=GUIDE&dl=GUIDE&id=792553
Couldn’t get into this site without a subscription.

Question: I’ve tried using the semantic search engine, HAKIA, and have come up with some perfect hits and some deplorable misses. Do the factors in the Hawking article still apply to semantic searching or are there different factors involved in such a redesigned engine?

Friday, October 3, 2008

Readings: Week 6 Preservation in Digital Libraries

Margaret Hedstrom “Research Challenges in Digital Archiving and Long-term Preservation” http://www.sis.pitt.edu/~dlwkshop/paper_hedstrom.pdf

The major Research Challenges:
1. Digital collections are large, multimedia libraries that are growing exponentially. There is currently no method for preservation in light of an exponentially growing collection that is constantly augmented with new and variable media.
2. Digital preservation of these collections bears more similarity to archive programs than to library-oriented issues. There is a need to develop self-sustaining, self-monitoring, and self-repairing collections.
3. The problems of maintaining digital archives over long periods of time are as much economic, social, and institutional as technological. And there are no current models for this type of long-term endeavor.
4. To develop tools that automatically supply and extract metadata from resources, ingest, restructure and manage metadata over time. And become progressively affordable as the digital archive expands.
5. Future inexpensive, flexible, and effectual infrastructures for collections.

Brian F. Lavoie, The Open Archival Information System Reference Model: Introductory Guide. http://www.dpconline.org/docs/lavoie_OAIS.pdf

The OAIS Reference Model was developed through a joint venture between the CCSDS and ISO to address data-handling issues, specifically digital preservation problems.

An archival information system is “an organization of people and systems that has accepted the responsibility to preserve information and make it available for a Designated Community.” “Open” means the model was developed as a public forum: anyone who wishes to assist in or use the solution is welcome.

The duties of the OAIS model are:
1. Negotiate for and accept appropriate information from information producers
2. Obtain sufficient control of the information in order to meet long-term preservation objectives
3. Determine the scope of the archive’s user community
4. Ensure that the preserved information is independently understandable to the user community, in the sense that the information can be understood by users without the assistance of the information producer
5. Follow documented policies and procedures to ensure the information is preserved against all reasonable contingencies, and to enable dissemination of authenticated copies of the preserved information in its original form, or in a form traceable to the original
6. Make the preserved information available to the user community

The development group has created a fully detailed conceptual model that explores the digital environment, the functional interactions among management, administration, and the user, and even how data would be packaged in the system. However, it is just that, only a model; it has not yet been implemented in practice.


Jones, Maggie, and Neil Beagrie. Preservation Management of Digital Materials: A Handbook. 2001. http://www.dpconline.org/graphics/handbook/index.html
introduction and digital preservation sections.

A handbook developed through the Digital Preservation Coalition that elaborates on preservation management issues. Although designed as an international guide to digital preservation, the handbook notes that it primarily deals with UK issues, particularly legislative matters. It remains current, however, by posting up-to-date links to preservation topics and sites.

Justin Littman. Actualized Preservation Threats: Practical Lessons from Chronicling America. D-Lib Magazine July/August 2007. http://www.dlib.org/dlib/july07/littman/07littman.html

Chronicling America:
1. to support the digitization of historically significant newspapers.
2. to facilitate public access via a Web site.
3. to provide for the long-term preservation of these materials by constructing a digital repository.
The project includes a digital repository component that houses the digitized newspapers, supporting access and facilitating long-term preservation.

Preservation threats encountered: failures of media, software, and hardware. But the worst errors came from operators, i.e., human error such as accidental deletion of files.

Question: Statistically, is operator error always the worst preservation threat found in digital archives?

Sunday, September 28, 2008

Digital Ice Age.


I came upon this article while doing the second half of assignment 2. It discusses some of the major disadvantages to digitization as a preservation and archiving tool. Thought it might be worth a look for those who like a dystopian interpretation of digitization. follow link here.

Assignment 2: My Flickr Link

My assignment 2 part 1 link. Pictures of random books and junk I own:
http://flickr.com/photos/30778466@N06/

Friday, September 26, 2008

Reading Notes : Week 5 : XML

Martin Bryan. Introducing the Extensible Markup Language (XML) http://burks.bton.ac.uk/burks/internet/web/xmlintro.htm

What is XML?
-subset of the Standard Generalized Markup Language (SGML)
-designed to make it easy to interchange structured documents over the Internet
-mark where the start and end of each of the logical parts (called elements) of an interchanged document occurs

-XML does not require the presence of a DTD.
-XML system can assign a default definition for undeclared components of the markup.

XML allows users to:
bring multiple files together to form compound documents
identify where illustrations are to be incorporated into text files, and the format used to encode each illustration
provide processing control information to supporting programs, such as document validators and browsers
add editorial comments to a file.
It is important to note, however, that XML is not:
a predefined set of tags, of the type defined for HTML, that can be used to markup documents
a standardized template for producing particular types of documents.
XML is based on the concept of documents composed of a series of entities. Each entity can contain one or more logical elements, and each element can have attributes (properties) that describe how it is to be processed.

Unlike other markup languages such as HTML and XHTML, XML clearly identifies the boundaries of each document part, whether it is a new chapter, a piece of boilerplate text, or a reference to another publication.
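The entity/element/attribute model described above can be seen in a short parse using Python's standard-library xml.etree.ElementTree. The document, tag names, and attributes here are invented for illustration.

```python
import xml.etree.ElementTree as ET

# A small XML document: user-defined tags mark the start and end of
# each logical element, and attributes describe element properties.
doc = """
<report>
  <chapter id="1" status="draft">
    <title>Preservation</title>
    <para>Digital collections grow exponentially.</para>
  </chapter>
</report>
"""

root = ET.fromstring(doc)
chapter = root.find("chapter")
print(chapter.get("status"))        # attribute value: draft
print(chapter.find("title").text)   # element content: Preservation
```

Note that none of these tags is predefined (as HTML tags are), and the document parses without any DTD, which illustrates the two points above: XML is not a fixed tag set, and a DTD is optional.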

Uche Ogbuji. A survey of XML standards: Part 1. January 2004. http://www-128.ibm.com/developerworks/xml/library/x-stand1.html
Extending Your Markup: an XML tutorial by Andre Bergholz http://www.computer.org/internet/xml/xml.tutorial.pdf
XML Schema Tutorial http://www.w3schools.com/Schema/default.asp

These three sites are tutorials running through examples of XML and its applications. I initially had a difficult time noting the difference between HTML and XML, but the W3 schools site has a page on Web Primer that gives a list for what the Average Joe needs to know about site development and it gives links then to these constituent pieces in understanding the WWW: http://www.w3schools.com/web/default.asp

Question: Do HTML and XHTML serve the same purpose; meaning, do you only use one or the other on a web page?