Monday, January 28, 2008

Personal Digital Libraries in Web 2.0

While I hate the whole Web 2.0 moniker (and I'm not a fan of "blog" either), it is the most succinct way to describe the addition of tagging, social networking, collaboration, and (to some extent) the dynamic functionality provided by AJAX to websites. Whisper's original design incorporated these ideas into its digital library design at all levels (personal, project, and community). And now I'm wondering if that, in and of itself, might be dissertation-worthy. (Yes, Jim, I probably should have paid more attention when you were thinking of Whisper from the DL point of view rather than the CSCW view I had.)

So with that in mind, here's what I'm wondering:

- What is the current "discoverability" of a paper? In other words, given a set of papers in a digital library, the search capabilities of the ACM digital library or Google Scholar, tags (generally defined by the authors or publishers) on those papers, and maybe a reading list that my advisor has, how hard would it be for me to find the most relevant papers? Ideally, the group of papers I find includes both recent developments and seminal works, and does not include much "noise" (papers in the field that reference the same seminal works, but are on very different topics).

- Do Web 2.0 technologies make it easier to find the papers I'm looking for? And how can we adapt those technologies to further improve "discoverability"?

Here's how I'd apply the technologies to this issue:

- First, let's distinguish between the overall library, controlled by the hosting organization (ACM, Google, etc.) and a user's personal library. For clarity, I'll refer to the overall collection of works as the Collection, and the user's personal library as the Library.

- Allow users to create a hierarchy of tags within their Library that is browsable in the same way a file system is. There should be some reasonable defaults (provided by the Collection), but the user should be able to create their own hierarchy on top of that one, or edit it to meet their needs (perhaps by further refining the defaults, or by making them more general).

- Allow users to add their own tags to the items in their Library. This could act like a voting system, where the best tags are assigned more often (think of the Family Feud game show). We could explore how useful it would be to include tags from users' Libraries in the item's view from the Collection, but given a large set of users (as the ACM Digital Library has), I suspect there would be too much noise for that to be useful without some amount of filtering (maybe only the 10 most popular user-assigned tags not in the official list are visible). Regardless, user-assigned tags could certainly be used to weight search results within the Collection. Perhaps users could be allowed to remove the default tags from their lists, which would count as a "vote against" that tag and cause it to be weighted less in searches.

- Allow users to attach comments to the items in their Library. They could choose how "public" they want their comments to be (private, restricted to specific friends or groups of friends, open to the world). Comments should be threaded, allowing for discussions to take place within a user's Library, and other users should be able to copy comments and entire threads of comments into the item within their own Library (assuming they have access to view them in the first place). Comment ratings may be useful here, especially when the world at large is allowed to see them and add their own; however, I'm not sure there would be many users with a large enough "following" to make rating comments effective.

- Provide notifications to a user's "friends" (which in this situation probably refers to co-workers and colleagues, rather than drinking buddies and family). These notifications could include a user adding an item to their Library, commenting on an item in their Library, or commenting on a paper in someone else's Library. Perhaps there could be a flag indicating whether or not the user has read the paper, or even a rating on a scale of 1 to 10. That said, ratings may not be useful, as they can be very subjective and people are likely to use different rating scales even within their own Libraries: item A was rated poorly because it was hard to read and item B was rated poorly because it wasn't relevant to the user's current project.
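
To make the tag-voting idea a bit more concrete, here is a rough Python sketch of how votes from users' Libraries might be tallied on an item in the Collection. Everything in it is an assumption made up for illustration: the CollectionItem class, its method names, the 0.1-per-vote weighting, and the floor at zero are not from Whisper or any real digital library. It only shows the three behaviors described above: adding a tag counts as a vote, removing a default tag counts as a vote against, and only the most popular unofficial tags are shown.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class CollectionItem:
    """One paper in the Collection (hypothetical structure, for illustration)."""
    title: str
    official_tags: set[str]
    # Net votes per tag, accumulated from users' Libraries.
    votes: Counter = field(default_factory=Counter)

    def add_user_tag(self, tag: str) -> None:
        """A user tagged this item in their Library: one vote for the tag."""
        self.votes[tag] += 1

    def remove_default_tag(self, tag: str) -> None:
        """A user removed an official tag from their copy: one vote against."""
        if tag in self.official_tags:
            self.votes[tag] -= 1

    def visible_user_tags(self, limit: int = 10) -> list[str]:
        """Most popular user-assigned tags that are not in the official list."""
        unofficial = {t: n for t, n in self.votes.items()
                      if t not in self.official_tags and n > 0}
        # Sort by vote count (descending), then alphabetically to break ties.
        ranked = sorted(unofficial.items(), key=lambda kv: (-kv[1], kv[0]))
        return [t for t, _ in ranked[:limit]]

    def tag_weight(self, tag: str) -> float:
        """Search weight for a tag. Assumed scheme: official tags start at 1.0,
        each net vote shifts the weight by 0.1, and it never drops below 0."""
        base = 1.0 if tag in self.official_tags else 0.0
        return max(0.0, base + 0.1 * self.votes[tag])


paper = CollectionItem("Example paper", {"hci", "cscw"})
for tag in ["tagging", "tagging", "folksonomy", "hci"]:
    paper.add_user_tag(tag)
paper.remove_default_tag("cscw")   # a "vote against" an official tag

print(paper.visible_user_tags(limit=1))
print(paper.tag_weight("hci"), paper.tag_weight("cscw"))
```

A real system would need something smarter than a flat per-vote increment (normalizing by the number of users who hold the item, for instance), but even this much makes it possible to experiment with where the noise threshold sits.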

So the trick to all of this is that, for all I know, it's all been done. I'm thinking I need to spend some time (maybe even a few weeks) doing some "catch-up" background reading. If nothing else, it would be useful to see if anyone has found a way to quantify "ease of discovering the resources you need in a large digital Collection."


