Wednesday, December 26, 2007

Read this Blog: Stop your presentation before it kills again!

Everyone hates "zombie presentations": created by zombies, presented by zombies, viewed by zombies. Perhaps the worst aspect of zombie presentations is that they actually create zombies in the process! I think each one of us should take a vow, make a resolution, do whatever it takes to free ourselves and our loved ones from the bonds of bullet points. Each of us should commit ourselves to designing, giving and accepting only great presentations in the coming year.

The title of this post comes from an awesome blog entry that I return to frequently, Stop your presentation before it kills again! from the Creating Passionate Users blog by Kathy Sierra and Dan Russell. READ IT, think about it, bookmark it, share it with everyone you know. And most importantly, use it as a basis for raising your expectations!

My co-conspirator and ace DSpace coder Jim Rutherford and I talk about this a lot. The problem is not simply that creators of presentations are not being creative, are not stretching themselves; it's that audience expectations are so incredibly low!

The Stop your presentation... entry provides many good points, but there are a number of other good sites you can check out. One of the great teachers of evangelistic style is Guy Kawasaki; check out Speaking as a Performing Art at his How to Change the World blog. (Be sure to also check out Guy's Art of Innovation talk.) Lawrence Lessig, the influential copyright scholar and thought leader for the Creative Commons initiative, is often considered the Zen master of presentation; his style is discussed in The "Lessig Method" of presentation entry at the Presentation Zen blog.

Update: See this compilation of the Top 10 Presentations Ever, which includes Steven Jobs' 1984 introduction of the Macintosh; Martin Luther King's I Have a Dream speech; Lawrence Lessig's Free Culture talk; and Dick Hardt’s famous Identity 2.0 presentation at OSCON 2005.

Wednesday, December 19, 2007

pf-dspace blog now Zotero (and COinS) compatible!

In our previous post I mentioned the new Zotero/Internet Archive alliance. Since I wrote that I've taken some time to understand a bit more about how Zotero works, and in particular what kinds of markup are required to make web resources "compatible" with Zotero to the extent that they can (a) be detected by the client and (b) added into a local Zotero database at the click of a button. My final step has been to tweak this blog to provide COinS metadata, after a fashion...

Some explanation is in order! After you've installed the Zotero extension to Firefox, when you travel to a site that is compatible with Zotero you will see a small icon on the right side of the address bar; the style of the icon will indicate what type of resource the plugin has detected. Click on the icon; if only one item was detected (one bundle of metadata) it will directly add the item into its local database; if more than one item was detected, it will bring up a list of all the items, and you select which ones you would like to be added. Once the item has been added, you can add additional metadata, notes, etc --- the usual Zotero features.

What's the trick? One way that Zotero "detects" a citation is by way of html SPAN elements of class="Z3988", aka the OpenURL COinS: A Convention to Embed Bibliographic Metadata in HTML standard. I use the Openly Informatics generic COinS generator, a web-based utility in which I enter some metadata and it spews out a bit of markup, which I paste at the end of my blog entry. Now when Zotero-equipped users visit my blog, they will see a collection of citations which they can selectively add to their citation lists -- which of course they will want to do!
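For the curious, here is a rough Python sketch of the kind of markup such a generator produces. It is not the Openly Informatics tool itself: the function name `coins_span` and the sample metadata are made up for illustration, and I'm assuming the key/encoded-value form of an OpenURL ContextObject, carried in the title attribute of an empty SPAN.

```python
from html import escape
from urllib.parse import quote, urlencode


def coins_span(metadata: dict, fmt: str = "journal") -> str:
    """Return an empty SPAN of class Z3988 whose title holds the citation."""
    pairs = [
        ("ctx_ver", "Z39.88-2004"),
        ("rft_val_fmt", f"info:ofi/fmt:kev:mtx:{fmt}"),
    ]
    # Fields describing the cited item (the "referent") get an "rft." prefix.
    pairs += [(f"rft.{key}", value) for key, value in metadata.items()]
    # urlencode percent-encodes each value; escape() then turns the "&"
    # separators into "&amp;" so the string is safe inside an HTML attribute.
    title = escape(urlencode(pairs, quote_via=quote), quote=True)
    return f'<span class="Z3988" title="{title}"></span>'


# Hypothetical metadata for one blog entry.
print(coins_span({
    "atitle": "pf-dspace blog now Zotero (and COinS) compatible!",
    "jtitle": "pf-dspace blog",
    "date": "2007-12-19",
}))
```

The SPAN has no visible content, so readers without Zotero see nothing extra; the citation lives entirely in the title attribute.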

Saturday, December 15, 2007

The Zotero/Internet Archive Alliance: Now things are getting interesting!

With proposal deadlines looming early next year and the holidays rapidly approaching, the reader might have missed the big news from earlier in the week: the Zotero project at the Center for History and New Media and the Internet Archive have announced a major new alliance. It has been described by Dan Cohen, director of CHNM, as really a match made in heaven: a project to provide free and open source software and services for scholars joining together with the leading open library. The initiative is funded by the Andrew W. Mellon Foundation, which has supported earlier Zotero development.

Some of the language in the Chronicle article has caused concern in the library and IR community, especially that this alliance is meant to bypass the library. In her Library2.0 blog Laura Cohen does a great job summarizing these concerns in an entry entitled Zotero Commons: Who Needs Libraries? Note that the follow-up comments to her entry, including some from Dan Cohen himself, are superb.

In addition to his comments in Library2.0, Dan posted an entry in his blog yesterday responding to two misconceptions about the Zotero/IA alliance, that (1) the scope of the Zotero+IA alliance is limited to the Zotero Commons (it's not), and (2) that the Zotero+IA alliance is an end-run around institutional repositories (it's not intended to be). He goes on to say that he wants to ...emphasize that this project does not make IA the exclusive back end for contributions. Indeed, I am aware of several libraries that are already experimenting with using Zotero as an input device for institutional repositories...

There are elements of functionality (proposed or existing today) that we think are exciting, especially various features that will potentially contribute to collaboration between researchers and the care and feeding of scholarly networks. But there are also some fundamental issues incarnate with the sharing of research materials that even a clever initiative like Zotero+IA cannot... it's nearly Christmas, so I won't use that other nine-letter word beginning with 'C' (cue Boris Karloff)...

Tuesday, December 4, 2007

Getting the Most Out of Your Institutional Repository

Yesterday (04 Dec 2007) I had the pleasure of presenting at Getting the Most Out of Your Institutional Repository: Gathering Content and Building Use, an educational workshop hosted by NISO at the National Agricultural Library (NAL) near Washington, DC. It was great to see many familiar faces and to meet in person several colleagues whose work I actively follow.

NISO collected each of our presentations and has linked to them from the workshop agenda page.

A consistent theme that built throughout the day was one of adding value for the individual user/contributor. In my own presentation, the future of dspace: making dspace personal (making dspace social) (pdf) the focus was on giving the individual scholar incentives for "living" within their institution's dspace and especially the role of the institutional repository in scholarly networks populated by researchers with Facebook-driven social networking sensibilities.

For more on this theme, see my earlier comments in this blog and also refer to this recent article in OCLC's NextSpace in which the editors asked nine experts to explore and comment on the trends and behaviors of users of the social Web. During the conference I also mentioned Danah Boyd's recent talk at Harvard's Berkman Center for Internet and Society. See this page for information on that talk and to download audio or video.

I'll update this posting when links to the NISO presentations (and video) become available. Stay tuned! Updated 07 Dec 2007

Friday, November 16, 2007

Web 2.0 & Libraries, Part 2: A Review

Those of you who have been searching for practical and relevant ways to integrate social networking technologies with digital library and repository systems might find the following very useful. The lead author is Michael Stephens of Dominican University; the content is based on his Tame The Web blog as well as his writing, teaching, and roadshows/seminars he has presented with his colleague Jenny Levine over the past two years. Few of the individual bits and pieces were new to me -- see below for some that were! -- but the whole thing presented together was a bit like drinking from a fire hose...

Web 2.0 & Libraries, Part 2: Trends and Technologies

Library Technology Reports, September/October 2007, vol.43/no.5 (ISSN 0024-2586) (pdf of the ToC)

I think this work provides an excellent summary of the current generation of social network technologies, with a particular focus on how librarians are applying these tools to create a set of practices some are labeling Librarian 2.0. It pays particular attention to how patrons are using the technologies, and how librarians need to provide for those patrons. This is not a library SOA discussion!!!!

This is a great collection because I find it somewhat frightening (and very exciting) how far along these pieces have come, especially how easily and seamlessly they can be, and are being, mashed together. Also, there were some services that I didn't even know about, esp. Ning, which enables users to assemble their own social networking services, or Netvibes, which lets you mix together anything.

For me, these really set the bar high for how to "make DSpace personal," which clearly means how to make DSpace fit in...

Wednesday, November 7, 2007

Capturing China's knowledge: The China Digital Museum Project

The HP news site has just published an interesting story about the China Digital Museum Project and the growing use of DSpace for other applications in China. It also includes a good background history on DSpace and a bit on the future of DSpace at HP and HPLabs. Here's a snip from Capturing China's knowledge: Ancient Terracotta warriors, scientific discoveries, even 2008 Olympics to go online:

As the world’s most populous nation, the People’s Republic of China rarely does anything on a small scale -- and its efforts to share its cultural and academic treasures with the rest of the world are no exception.

Using DSpace, a digital archiving system that HP Labs researchers helped create and continue to support, institutions throughout China are putting literally tens of millions of objects online for the first time. Those objects -- or more accurately, digital copies of them -- range from up-to-minute scientific research reports to historic film clips and photos of traditional Chinese sporting events to centuries-old calligraphy and paintings.

One Chinese DSpace initiative involving 18 major universities is well on its way to archiving up to 90 million objects.

A second, scheduled to coincide with the 2008 Summer Olympic Games in Beijing, will hold two terabytes of information -- which, depending on how the information is compressed, is roughly as much content as you’d find in 1,000 feature-length movies, 300,000 photographs or 500,000 song-length music files.
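As a quick sanity check on those comparisons, here is a small Python calculation of the average size per item that each one implies. The only assumption is that "two terabytes" means 2 x 10^12 bytes (decimal units); the function name is mine.

```python
TOTAL_BYTES = 2 * 10**12  # two terabytes, assuming decimal units


def implied_size_mb(item_count: int) -> float:
    """Average size in megabytes if item_count items fill two terabytes."""
    return TOTAL_BYTES / item_count / 10**6


for label, count in [("movie", 1_000), ("photograph", 300_000), ("song", 500_000)]:
    print(f"{label}: about {implied_size_mb(count):,.1f} MB each")
```

That works out to roughly 2 GB per movie, under 7 MB per photograph and 4 MB per song, which are all plausible sizes for compressed files.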

For a more technical review of the China Digital Museum Project, see Rob Tansley's article, Building a Distributed, Standards-based Repository Federation: The China Digital Museum Project in the July/August 2006 issue of D-Lib Magazine.

Wednesday, October 17, 2007

Collective Intelligence in the Institutional Repository: Making DSpace Personal

Surveys of open repository adopters over the past two or three years have clearly highlighted the "institutional" nature of institutional repositories. The motivations for implementing IRs have always been those of the host institution, while the stated benefits to the individual user and contributor have either been those of the institution projected "down" to them, or happen to be shared goals such as enabling greater access to information or providing managed, long-term preservation of artifacts. Meanwhile, some of those same surveys identify sustaining a constant stream of contributions from the community as the chronic threat to the health of repositories; while all open repository platforms have been designed for self-service ingestion, it is a fact that the strongest and most current repositories are those that have professional staff who are responsible for content management, a luxury few institutions can afford. Even those institutions who have implemented mandatory submission policies, especially in light of increasingly "enlightened" publishers' policies on Open Access, still have not been able to achieve high levels of participation. The simple truth is that participation in an IR today represents extra effort for the busy scholar, effort that doesn't add real value to their research, their authorship, or their collaboration with others in their field.

We'd like to give researchers strong incentives to "live" within DSpace --- features that motivate them to spend significant time there, manage their content there, and make formal submission of content into the IR an easier and more natural part of their work. In general, we'd like their personal space or "desktop" within DSpace to be an amplifier of their research activities. For starters, we believe the user should have basic (but in this Web2.0 world, expected) capabilities available to them for relating their current activities and interests to other artifacts in local collections, so we're experimenting with features like item bookmarking and tagging within local collections and using this constructed "context" as a basis for recommending related items. We'd like to leverage this further as a basis for identifying and retrieving related items within that repository's federation (see our earlier notes on pf-dspace in this blog and elsewhere) and especially for identifying colleagues with related interests. And we want to apply this to identifying and harvesting related materials from other, heterogeneous sources such as external blogs, wikis, and web sources.
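To make the idea of a constructed "context" a little more concrete, here is a minimal Python sketch of one way bookmarks and tags could drive recommendations: score each item by how much its tags overlap with the user's context. This is an illustration only, not the code in our DSpace patches; the Jaccard measure, the function names and the sample data are all assumptions.

```python
def jaccard(a: set, b: set) -> float:
    """Overlap between two tag sets, from 0.0 (disjoint) to 1.0 (identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(context_tags: set, item_tags: dict, bookmarked: set, limit: int = 3) -> list:
    """Rank items the user has not bookmarked by tag overlap with their context."""
    scored = [
        (jaccard(context_tags, tags), item_id)
        for item_id, tags in item_tags.items()
        if item_id not in bookmarked
    ]
    # Highest score first; ties are broken by item id so the order is stable.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [item_id for score, item_id in scored if score > 0][:limit]


# Hypothetical items and tags, purely for illustration.
item_tags = {
    "item-1": {"federation", "oai-pmh"},
    "item-2": {"preservation", "policy"},
    "item-3": {"federation", "policy", "rules"},
    "item-4": {"olympics"},
}
bookmarked = {"item-1"}
# The context is the tags on bookmarked items plus a tag the user applied.
context = set().union(*(item_tags[i] for i in bookmarked)) | {"policy"}
print(recommend(context, item_tags, bookmarked))
```

The same scoring could in principle be pointed at other users' tag sets to suggest colleagues with related interests.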

This basic contextualization of the scholar's current focus is really just a starting point, because it represents only a few aspects of the scholarly workflow. The real value to both the scholar and their host institution comes when they can leverage other basic functions of the repository in their core research, including the management of both data and information artifacts, and especially using their repository to manage access to their materials for their distributed colleagues.

In terms of the management and versioning of artifacts, there are certain repository capabilities that the developer community has long come to expect from source code management systems such as SVN and CVS that are curiously foreign to the IR space, but really shouldn't be. As scholarly journals increasingly demand research to be submitted as "packages" containing not only text but also data sets and other content that has been culled from the set of collaborators and authenticated using robust techniques, the proper management of research artifacts in more active ways will become a central function of the IR.

One of the truly exciting aspects of working with the DSpace open source community is that many of these objectives are already on the horizon for members of the DSpace community, and developers across the globe are hard at work implementing various pieces. That said, we still think there needs to be a focus on the needs of that individual scholar, ensuring that as the Facebook(tm) generation takes its place in the DSpace user community, bringing with them as they will their high expectations for contextualized social networking in nearly everything they do, DSpace is more than ready to work for them!

In the coming days Desmond Elliott, one of our ace developers working on DSpace at HPLabs in Bristol (UK), will use this space to describe his awesome patches to DSpace, which include item bookmarking, item recommendations and user tagging. In the near future we hope to be more specific about other aspects of this work, including its name...

Thursday, April 12, 2007

pf-dspace: repository research on an open-source repository platform

In this entry we describe on-going work within Hewlett-Packard Labs to extend the open source DSpace platform to address the problem of digital object repository federation in innovative ways. With the extensions introduced by pf-dspace, repository administrators will be able to manage and replicate information and media across many cooperating institutions in a peer-to-peer fashion using essentially “out-of-the-box” features of the DSpace platform. In this entry we describe our next steps with DSpace@HPLabs, including advancements in policy-based federation management and our plan to contribute artifacts of this work to the institutional repository community through the active DSpace open source network.

1. Why Repository Federation is Interesting

A repository federation is the interconnection of a set of autonomously-managed digital object repositories into one or more larger-scale, distributed collections based on an expressed set of rules or policies that codify the federation's purpose, and the collection properties that its members agree to achieve. Given this definition it's clear that many federation architectures are possible, based on goals as varied as:

  • long-term persistence of information and preservation of digital assets
  • access to information over wide-ranging geographies and difficult conditions
  • construction and management of large-scale collections
  • managing collections with broad topical scope, from widely diverse communities
  • leveraging skills and technical capabilities contributed by diverse organizations
  • distributed, semi-autonomous collection management

Our interest at HPLabs in repository federation was sparked by the China Digital Museum Project (CDMP), an ongoing collaboration involving the Chinese Ministry of Education, HPLabs and several Chinese universities. CDMP provides a large-scale infrastructure based on the DSpace platform upon which a federation of university-based museums store, manage, preserve and disseminate the digitized versions of university museum artifacts. In the final phase of CDMP it is expected that this federation will interconnect more than 100 university museums, each with an estimated 2TB of digital artifacts stored in local DSpace installations.

CDMP has created a replicated collection architecture in which items of interest from the individual remote collections are harvested or “pulled” to complete the particular local collection, according to that collection's defining rules. A given replica might include only the item's metadata, or it might also include the item's composite files. In the case of CDMP, two modified DSpace instances (known as DM-DSpace) are designated as replicating repositories or data centers, and the remaining repositories hold individual collections of local interest.

2. Peer Federation and pf-dspace

Our pf-dspace code generalizes the approach to DSpace federation introduced with DM-DSpace, allowing the platform to implement a wider variety of federation topologies. pf-dspace also makes federation administration more accessible from the administrator's user interface, allowing both simple and complex topologies to be constructed without extra software or setup. pf-dspace has eliminated the need for a separate, centralized node registry, which is a common feature of many previous repository federation implementations. The key to achieving this decentralized node management was the adoption of a distributed “friends” list, in which each repository shares with other nodes basic information about known peers, known as its “friends,” using standard features of the OAI-PMH protocol. Such decentralized node management is just one of the features of peer federation made possible by pf-dspace. Our extensions build on the way DM-DSpace applies standard protocols and introduce some important new management capabilities:

  • pf-dspace uses OAI-SQ/OAI-SQ-F to provide selective (query-based) harvesting, in which metadata from other repositories is retrieved based on keywords and metadata fields. DM-DSpace limited its harvesting to “new” items.
  • pf-dspace introduces improvements in how interactions with nodes known to an individual repository are managed. In particular, node administrators now have the ability to control whether each of a node's “friends” is published to other peers (i.e. is made “public”) or is suppressed, as well as whether that node is harvested (i.e. is “active”). In addition, the pf-dspace code tracks the “live” state of “friends,” i.e. whether or not a network connection can be established.
  • pf-dspace provides the ability to do metadata-only harvesting, which is useful for constructing “virtual” (non-replicating) repositories. The DM-DSpace platform only supports full replication of items.
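The friends list and its “public,” “active” and “live” flags can be sketched in a few lines of Python. This is an illustration rather than the pf-dspace code: I'm assuming the list travels in the optional friends description container of an OAI-PMH Identify response, that newly discovered peers start out neither public nor active until an administrator says otherwise, and all class names, function names and URLs are hypothetical.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

FRIENDS_NS = "http://www.openarchives.org/OAI/2.0/friends/"


@dataclass
class Friend:
    base_url: str
    public: bool = False  # advertise this peer to other nodes?
    active: bool = False  # harvest from this peer?
    live: bool = False    # did the last connection attempt succeed?


def friends_in_identify(xml_text: str) -> list:
    """Extract the baseURLs listed in an Identify response's friends container."""
    root = ET.fromstring(xml_text)
    return [el.text.strip() for el in root.iter(f"{{{FRIENDS_NS}}}baseURL") if el.text]


def merge_friends(known: dict, discovered: list) -> dict:
    """Add newly discovered peers; flags already set on known peers are kept."""
    for url in discovered:
        known.setdefault(url, Friend(base_url=url))
    return known


def published(known: dict) -> list:
    """Peers this node is willing to tell other nodes about."""
    return sorted(f.base_url for f in known.values() if f.public)


def harvest_targets(known: dict) -> list:
    """Peers that are switched on for harvesting and currently reachable."""
    return sorted(f.base_url for f in known.values() if f.active and f.live)


# A trimmed, hypothetical Identify response from a peer.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <Identify>
    <repositoryName>Example node</repositoryName>
    <description>
      <friends xmlns="http://www.openarchives.org/OAI/2.0/friends/">
        <baseURL>http://repo-b.example.org/oai</baseURL>
        <baseURL>http://repo-c.example.org/oai</baseURL>
      </friends>
    </description>
  </Identify>
</OAI-PMH>"""

known = {
    "http://repo-b.example.org/oai": Friend(
        "http://repo-b.example.org/oai", public=True, active=True, live=True
    )
}
merge_friends(known, friends_in_identify(SAMPLE))
print(published(known), harvest_targets(known))
```

Because every node both reads and republishes such lists, knowledge of peers spreads through the federation without any central registry.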

3. What's next for pf-dspace?

An important contribution of pf-dspace has been our practical implementation of the OAI-SQ/OAI-SQ-F extensions to the OAI-PMH protocol, giving repositories the ability to make more selective metadata queries against remote collections. This capability allows individual repositories to accumulate items based on their attributes, which is fundamental to federated collection management. At this writing pf-dspace successfully performs selective harvests and stores them in a physical directory, but it doesn't (yet) map these replicated items onto the appropriate logical collections in the repository. An important next step will be to integrate the AutoMapper plugin to map retrieved items to logical collections, which will involve being able to actually refine the selective queries and associate them with one or more mappings.
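As a rough illustration of what associating queries with mappings might look like, here is a small Python sketch in which each mapping pairs a keyword test on one metadata field with a logical collection. It does not reflect the AutoMapper plugin's actual interface or the OAI-SQ query syntax; the `Mapping` class, the field names and the sample record are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Mapping:
    collection: str  # logical collection to place matching items in
    field: str       # metadata field to inspect, e.g. "subject"
    keyword: str     # case-insensitive substring to look for


def collections_for(record: dict, mappings: list) -> list:
    """Return the logical collections whose mapping matches a harvested record."""
    matched = []
    for m in mappings:
        values = record.get(m.field, [])
        hit = any(m.keyword.lower() in value.lower() for value in values)
        if hit and m.collection not in matched:
            matched.append(m.collection)
    return matched


# Hypothetical mappings and one harvested metadata record.
mappings = [
    Mapping("Archaeology", "subject", "archaeology"),
    Mapping("Sport", "subject", "olympic"),
    Mapping("Terracotta Army", "title", "terracotta"),
]
record = {
    "title": ["Terracotta warriors of Xi'an"],
    "subject": ["Archaeology", "Qin dynasty"],
}
print(collections_for(record, mappings))
```

A single item can land in several logical collections, which is why the function returns a list rather than one name.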

An exciting area of experimentation for pf-dspace will be harvesting objects from a variety of heterogeneous repository-like sources. We believe there is an important opportunity to demonstrate the utility and value of harvesting selected items from "ephemeral" sources and bringing them under the management of institutional repositories. Examples of these sorts of sources include: wikis and blogs; “social networking” sites; decentralized, departmental wikis; social tagging and bookmarking services; mailing list archives; and anything else that we can attach a harvesting interface onto.

4. What's next for DSpace@HPLabs?

HPLabs remains active in the DSpace community at both the advisory and development levels. We are finding that the DSpace platform is an ideal vehicle for certain kinds of repository research, and we look forward to releasing back to the open source community DSpace code patches that we've created as a result of our ongoing research that may be of benefit to the community. Ongoing DSpace-based research at HPLabs currently falls under two categories: the clustering of DSpace instances using open-source tools to achieve robust, large-scale digital repositories; and policy-based management and automation of repository federations. Specific topics that we are exploring include:

  • federating repositories to accomplish goals such as replication, subject-based collections, distributed format migration, etc.
  • automated, event-driven repository management, locally and across federations
  • active integrity assurance of managed items and metadata
  • information-based access control that remains valid over time, despite item transformations
  • continual expansion of the facets of information that are extracted from managed items
  • providing access to new and different consumers of that information

We anticipate that much of our future DSpace federation work will be in policy-based federation management, building on the basic peer federation capability provided by the current pf-dspace extensions. Some next steps include using a distributed, event-condition-action rules approach to marshal sets of autonomously-managed peers into federations that are defined by express sets of collection management policies (implemented as reactive rules) that participants in the federation agree to share. To this end, a promising rules language that we are now experimenting with is Xchange from the Institut für Informatik der Ludwig-Maximilians-Universität München (Univ. of Munich).
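For readers unfamiliar with the event-condition-action pattern, here is a minimal Python sketch of the idea: a rule names the event it reacts to, a condition that must hold, and the action to take. It is not Xchange syntax, and the events, the three-replica policy and the item identifier are invented for illustration.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Rule:
    event: str                         # event type the rule reacts to
    condition: Callable[[dict], bool]  # must hold for the action to fire
    action: Callable[[dict], str]      # what the node does in response


def dispatch(event_type: str, payload: dict, rules: list) -> list:
    """Run every rule registered for this event whose condition holds."""
    return [
        rule.action(payload)
        for rule in rules
        if rule.event == event_type and rule.condition(payload)
    ]


# Hypothetical shared policy: every item should exist on at least three peers.
rules = [
    Rule(
        event="item_added",
        condition=lambda p: p["replicas"] < 3,
        action=lambda p: f"replicate {p['item']} to {3 - p['replicas']} more peer(s)",
    ),
    Rule(
        event="checksum_failed",
        condition=lambda p: True,
        action=lambda p: f"restore {p['item']} from a peer",
    ),
]

print(dispatch("item_added", {"item": "hdl:123/456", "replicas": 1}, rules))
```

Because the policy lives in data rather than in hard-coded logic, peers that agree to join a federation could exchange and update the rule set itself.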

“Policy”-driven federations exist today, but almost always the “policies” have been hard-coded, are not flexible and are not themselves under some kind of lifecycle management. Still, existing platforms such as LOCKSS have been rigorously studied and teach us much; LOCKSS is of particular interest because it is a proven platform that has implemented examples of the kinds of “policies” that DSpace-based federations must also implement. The NARA-funded PLEDGE project (MIT & SDSC) is another example; in that case researchers are examining how to implement preservation policies within institutional repositories, including how to cast expressed policies in machine-interpretable, actionable and verifiable ways. A more recent, related bit of work in this area is the PHAROAH project, a follow-on to LOCKSS that our HPLabs colleagues are involved with.

From a DSpace perspective, our continuing focus will be on providing and maintaining visibility for the repository administrator throughout the policy and object "lifecycles." This includes visibility of the policies, visibility of the assets, visibility over all actions performed on the assets, etc. Achieving this visibility has been at the heart of our approach to pf-dspace, especially as we put control of the federation directly into the hands of the repository administrator and deal with federation management in ways that are directly analogous to collection management itself.

Wednesday, April 11, 2007

About the Bloggers (11 April 2007)

John Erickson has spent many years studying the unique social, legal, and technical problems that arise when managing and disseminating information in the digital environment. At HP Labs John has focused on the policy-based management of distributed, heterogeneous digital object repositories and content processing architectures. He has been an active participant in a number of international metadata and rights management standards efforts and currently serves on the OAI Object Reuse and Exchange (OAI-ORE) advisory committee, the DSpace Architectural Review committee, the Handle System Technical Review committee and the Global Handle System Advisory Committee.

Jim Rutherford is the lead DSpace developer for HP Labs and HP's primary contributor to the DSpace open source community. Jim joined HP Labs in 2006 to work on digital repository research using DSpace, in particular working closely with the China Digital Museum Project (DM-DSpace) team and on the problem of repository federation more generally. His work on generalising and extending key elements of the DM-DSpace codebase has led to several recent presentations; these and other contributions to the DSpace community led recently to his elevation to Committer status within the DSpace open source project.

Welcome to the pf-dspace blog

Jim Rutherford and John Erickson of HPLabs (Bristol, UK and Norwich, VT USA) will use this blog to keep the DSpace and repository research communities up-to-date on progress using the DSpace open repository platform as a basis for repository and digital preservation research.

Their current work is focussed on a standards-based extension to the DSpace platform they call pf-dspace (peer-federation DSpace).