Monday, January 28, 2008

Personal Digital Libraries in Web 2.0

While I hate the whole Web 2.0 moniker (and I'm not a fan of "blog" either), it is the most succinct way to describe the addition of tagging, social networking, collaboration, and (to some extent) the dynamic functionality provided by AJAX to websites. Whisper's original design incorporated these ideas into its digital library at all levels (personal, project, and community). And now I'm wondering if that, in and of itself, might be dissertation-worthy. (Yes, Jim, I probably should have paid more attention when you were thinking of Whisper from the DL point of view rather than the CSCW view I had.)

So with that in mind, here's what I'm wondering:

- What is the current "discoverability" of a paper? In other words, given a set of papers in a digital library, the search capabilities of the ACM Digital Library or Google Scholar, tags (generally defined by the authors or publishers) on those papers, and maybe a reading list that my advisor has, how hard would it be for me to find the most relevant papers? Ideally, the group of papers I find includes both recent developments and seminal works, and does not include much "noise" (papers in the field that reference the same seminal works but are on very different topics).

- Do Web 2.0 technologies make it easier to find the papers I'm looking for? And how can we adapt those technologies to further improve "discoverability"?
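One conventional way to put a number on "how hard would it be to find the most relevant papers," borrowed from information retrieval and assumed here rather than taken from Whisper, is a pair of measures: precision (what fraction of the papers found are relevant; low precision means lots of "noise") and recall (what fraction of the relevant papers, seminal and recent, were actually found). A minimal sketch, with made-up paper IDs; note that neither number captures how much effort the search took.

```python
from typing import Iterable, Tuple


def precision_recall(found: Iterable[str], relevant: Iterable[str]) -> Tuple[float, float]:
    """Return (precision, recall) for a set of retrieved paper IDs.

    precision: share of found papers that are relevant (low = lots of noise)
    recall:    share of relevant papers that were found (low = missed work)
    """
    found_set = set(found)
    relevant_set = set(relevant)
    hits = found_set & relevant_set
    precision = len(hits) / len(found_set) if found_set else 0.0
    recall = len(hits) / len(relevant_set) if relevant_set else 0.0
    return precision, recall


# Hypothetical example: the paper IDs are invented for illustration only.
found = ["seminal-1", "recent-1", "recent-2", "noise-1"]
relevant = ["seminal-1", "seminal-2", "recent-1", "recent-2"]
p, r = precision_recall(found, relevant)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.75 recall=0.75
```

Here one unrelated paper lowers precision, and one missed seminal work lowers recall; an ideal result set would score 1.0 on both.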

Here's how I'd apply the technologies to this issue:

- First, let's distinguish between the overall library, controlled by the hosting organization (ACM, Google, etc.) and a user's personal library. For clarity, I'll refer to the overall collection of works as the Collection, and the user's personal library as the Library.

- Allow users to create a hierarchy of tags within their Library that is browsable in the same way a file system is. There should be some reasonable defaults (provided by the Collection), but the user should be able to create their own hierarchy on top of that one, or edit it to meet their needs (perhaps by further refining the defaults, or by making them more general).

- Allow users to add their own tags to the items in their Library. This could act like a voting system, where the best tags are assigned most often (think of the Family Feud game show). We could explore how useful it would be to include tags from users' Libraries in the item's view from the Collection, but given a large set of users (as the ACM Digital Library has), I suspect there would be too much noise for that to be useful without some amount of filtering (for example, showing only the 10 most popular user-assigned tags that are not in the official list). Regardless, user-assigned tags could certainly be used to weight search results within the Collection. Perhaps users could also be allowed to remove the default tags from items in their Library, which would count as a "vote against" that tag and cause it to be weighted less in searches.

- Allow users to attach comments to the items in their Library. They could choose how "public" they want their comments to be (private, restricted to specific friends or groups of friends, or open to the world). Comments should be threaded, allowing discussions to take place within a user's Library, and other users should be able to copy comments and entire threads of comments into the item within their own Library (assuming they have access to view them in the first place). Comment ratings may be useful here, especially when the world at large is allowed to see them and add their own; however, I'm not sure there would be many users with a large enough "following" to make rating comments effective.

- Provide notifications to a user's "friends" (which in this situation probably refers to co-workers and colleagues, rather than drinking buddies and family). These notifications could include a user adding an item to their Library, commenting on an item in their Library, or commenting on a paper in someone else's Library. Perhaps there could be a flag indicating whether or not the user has read the paper, or even a rating on a scale of 1 to 10. Ratings may not be useful, though, as they can be very subjective, and people are likely to use different rating scales even within their own Libraries: item A was rated poorly because it was hard to read, and item B was rated poorly because it wasn't relevant to the user's current project.
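The tag-voting idea in the list above can be sketched as a small data model. Everything here is a hypothetical illustration, not Whisper's design: the class and function names, the fixed bonus that official tags start with, and the ranking rule are assumptions. The only rules taken from the text are that adding a tag counts as a vote for it, removing a default tag counts as a vote against it, and only the 10 most popular non-official user tags are shown.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class ItemTags:
    """Tag votes for one item in the Collection (hypothetical model)."""
    official: Set[str]                                       # defaults from author/publisher
    votes_for: Counter = field(default_factory=Counter)      # users who added the tag
    votes_against: Counter = field(default_factory=Counter)  # users who removed a default tag

    def add_user_tag(self, tag: str) -> None:
        self.votes_for[tag] += 1

    def remove_default_tag(self, tag: str) -> None:
        # Removing an official tag from one's own Library is a vote against it.
        if tag in self.official:
            self.votes_against[tag] += 1

    def weight(self, tag: str, official_bonus: int = 5) -> int:
        # Official tags start with an arbitrary (assumed) bonus; votes move it up or down.
        base = official_bonus if tag in self.official else 0
        return base + self.votes_for[tag] - self.votes_against[tag]

    def visible_user_tags(self, limit: int = 10) -> List[str]:
        # Only the most popular user-assigned tags that are not already official.
        extra = Counter({t: n for t, n in self.votes_for.items() if t not in self.official})
        return [t for t, _ in extra.most_common(limit)]


def rank(items: Dict[str, ItemTags], query_tag: str) -> List[str]:
    """Order item IDs by tag weight (highest first), dropping items with no positive weight."""
    scored = [(item.weight(query_tag), item_id) for item_id, item in items.items()]
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [item_id for score, item_id in scored if score > 0]


# Hypothetical usage: paper IDs and tags are invented for illustration.
items = {
    "paper-a": ItemTags(official={"cscw"}),
    "paper-b": ItemTags(official={"digital-libraries"}),
}
for _ in range(6):
    items["paper-b"].add_user_tag("cscw")   # six readers tag paper-b as "cscw"
items["paper-a"].remove_default_tag("cscw")  # one reader rejects paper-a's default tag
print(rank(items, "cscw"))  # ['paper-b', 'paper-a']
```

In this sketch, enough reader votes let a paper that was never officially tagged outrank one that was, which is the "Family Feud" effect described above.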

So the trick to all of this is that, for all I know, it's all been done. I'm thinking I need to spend some time (maybe even a few weeks) doing some "catch-up" background reading. If nothing else, it would be useful to see if anyone has found a way to quantify "ease of discovering the resources you need in a large digital Collection."

Thursday, January 24, 2008

Thesis on a Slide - 2008 edition

Here is the new “Thesis On A Slide”, rewritten for 2008. It represents the current (as of this post) thinking about Whisper as a Ph.D. project. Not much has changed yet, though that may change after I talk with my advisor.

Quadrant 1: Background and Problem

Managing collaboration among groups of people, especially when geographically distributed, is a difficult task. Existing tools fall short of users' needs and are not widely used. Outside of technical disciplines, the use of computers to facilitate collaboration is rare (with the exception of e-mail and IM). The problems faced include identity verification, version control, and access control. Supporting larger-scale collaboration for a community gives rise to problems with membership tracking, information publication and project artifact sharing, and information archival. Within technical disciplines, version control and document control systems are frequently used to address some of these issues; other disciplines generally resort to e-mail or disk exchange to share information, and to print journals to publish information and some types of project artifacts. Even setting up a web site that limits access to specific users requires more technical knowledge than most people are willing to acquire. While most people can easily use e-mail to distribute files and information to collaborators, it is a cumbersome solution at best.

Quadrant 2: Hypothesis and Insights

Just as academic communication was significantly improved by the advent of e-mail, academic collaboration can be similarly improved through the use of computer-based tools. We believe a collection of integrated services is necessary to foster academic collaboration: digital library support, community spaces, project spaces, personal spaces, publication support (including peer reviews), and comment boards integrated into most of the tools. Two other important aspects of collaboration are identity verification and access control. Many projects make use of sensitive personal or commercial data; without adequate data protection, none of these projects would be able to use the collaboration tools. These tools must be integrated in a seamless manner and provide integration support for future tools. In addition, we propose the use of comment boards throughout as a means of communication between users.

Quadrant 3: Approach and Validation

We propose Whisper (for Web Information Sharing Project) as a proof-of-concept service that will provide information management and collaboration management tools to academic networks. Whisper will provide versioned data stores to each user, and will allow access to be controlled at a high level (based on academic relationships) and a low level (specifying individual users for inclusion or exclusion). Public comments on posted information will also be allowed.

Social networks such as Facebook, Friendster, and Orkut provide access to a circle of friends in a unique way. Whisper seeks to capture this social networking and apply it to the academic community. We have identified four types of personal relationships and two types of groups that we believe are appropriate for academic networks. The Mentor relationship captures teachers and others who have at one time or another been in a knowledge transfer position. The Student relationship is the other side of a Mentor relationship; specifying one relationship implies the other exists. Collaborators are people who are actively working, or have recently worked, on a project together, or who work on closely related projects, possibly in the same lab. Colleagues are people whose work may be related but who communicate very little about that work, or people who work in the same department or institution but in different areas of research. Project groups consist of people involved in an ongoing research effort. Communities consist of researchers working on similar or related topics. We want to explore how communities and institutions networked through such a system can share their work while it is in progress, while it is being prepared for publication, and after it has been published.
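A minimal sketch of how the two levels of access control described above might combine. It models only the four personal relationships (Project and Community groups are left out), and it assumes a rule order the text does not specify: the owner always has access, explicit exclusion beats everything, explicit inclusion comes next, and otherwise the requester's relationship to the owner decides. All class, function, and user names are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, Optional, Set

# Declaring one side of a relationship implies the other (Mentor <-> Student).
INVERSE = {"mentor": "student", "student": "mentor",
           "collaborator": "collaborator", "colleague": "colleague"}


class Network:
    """Relationship graph: rel[a][b] is the role b plays for a (hypothetical model)."""

    def __init__(self) -> None:
        self.rel: Dict[str, Dict[str, str]] = defaultdict(dict)

    def relate(self, a: str, b: str, role: str) -> None:
        # Record that b is a's <role>, and add the inverse edge automatically.
        self.rel[a][b] = role
        self.rel[b][a] = INVERSE[role]

    def role_of(self, owner: str, user: str) -> Optional[str]:
        return self.rel.get(owner, {}).get(user)


@dataclass
class Policy:
    """Access rule for one resource: coarse roles plus per-user overrides."""
    owner: str
    roles: Set[str] = field(default_factory=set)    # high level, e.g. {"collaborator"}
    include: Set[str] = field(default_factory=set)  # low level: always allowed
    exclude: Set[str] = field(default_factory=set)  # low level: always denied

    def allows(self, net: Network, user: str) -> bool:
        if user == self.owner:
            return True
        if user in self.exclude:  # explicit exclusion wins (assumed precedence)
            return False
        if user in self.include:
            return True
        return net.role_of(self.owner, user) in self.roles


# Hypothetical usage: Alice shares her Library comments with mentors and collaborators, minus Dave.
net = Network()
net.relate("alice", "bob", "mentor")  # Bob is Alice's mentor, so Alice is Bob's student
net.relate("alice", "carol", "collaborator")
net.relate("alice", "dave", "collaborator")
notes = Policy(owner="alice", roles={"mentor", "collaborator"}, exclude={"dave"})
print(net.role_of("bob", "alice"))  # student
print([u for u in ["bob", "carol", "dave", "erin"] if notes.allows(net, u)])  # ['bob', 'carol']
```

The appeal of this arrangement is that most sharing decisions follow from relationships users have already declared, with the per-user lists reserved for exceptions.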

Academic projects are generally limited in their life span, but the artifacts they produce should be available after work on the project ceases. Whisper's project spaces provide a versioned work space, comment board, and access controls to assist project work and artifact dissemination. Projects live within a larger community, where artifacts and results are published. Whisper also provides community spaces with peer-reviewed journals and access controls for both membership and resources.

With most current research available in an electronic format, many personal libraries are moving from the bookshelf to the computer. This saves office space and makes searching the articles faster (most operating systems are able to search inside text, HTML, and PDF files). Digital libraries can also be accessed from any computer on the Internet, or taken into the field on a laptop. Whisper provides each user with their own digital library space and allows comments on each resource. The libraries and comments each have their own individual access controls, allowing users to share resources but keep some (or all) of their commentary private.

The popularity of blogs demonstrates the basic human need to comment on anything and everything in the world. While not traditionally thought of as an academic tool, we believe an integrated blog space can provide a valuable outlet for users. It could be used for anything from mundane announcements to personal musings or thoughts about future research directions and interesting problems.

As the system grows, we will invite others to begin using it for their own work. At the end of the project, users will be asked to share their experiences.

Quadrant 4: Contributions and Schedule

The practical contributions include a standard collaboration environment that captures the working relationships of the users. This system will allow us to explore access control methods (automatic and explicit) for research products. It will also allow exploration of the network relationships themselves.

January 2008
Focus on shared access to personal digital libraries.

April 2008
Focus on shared access to research work (projects).

September 2008
Focus on writing thesis.

December 2008
Thesis Complete

Top Ten Questions - 2008 edition

Here are the new “Top Ten Questions” for 2008. These represent the current (as of this post) thinking about Whisper as a Ph.D. project.

1) What is the main goal of the work?

We will examine the tools and features of collaborative environments, specifically as they relate to academic research and collaboration. We want to understand how collapsing disparate tools into a common system, and harnessing the relationships between users can improve the usability and usefulness of these tools.

2) What are the tangible benefits to society of achieving the goal? (i.e. Why should anyone pay for this?)

Our results can be used to create new environments and tools, as well as improve existing groupware.

3) What are the technical problems that make the goal difficult to achieve? (i.e. Why hasn't this been done already?)

Technical problems include integration issues for authentication, information synchronization, scalability, and usability. Some systems do exist, but many people still prefer to simply use e-mail; this is perhaps the most difficult user behavior we will have to overcome. However, the biggest issue faced in the development of this software is its scope, given the limited resources available to develop it.

4) What are the main elements of the approach?

We will create a testbed that allows users to conduct their collaborative work. The system will use open standards (REST, WebDAV, Subversion, and RSS) to be extensible and compatible with a range of systems. The system will provide users with the tools we initially identify as being useful for collaborative work (community message boards, project file and message spaces, personal digital library spaces, relationship management, etc.). We will select a set of users who will be offered access to the system and compare their experiences collaborating within their lab group with those of a group without access to the system.

5) How does the approach handle the technical problems that have prevented progress in the past? (i.e. Why will this be successful when nobody was able to be in the past?)

Still doing background work to validate what follows...

Previous approaches consisted of expensive proprietary systems, and few (if any) have offered the full range of features we are attempting to include. In addition, a lack of awareness of the tools or a lack of focus on academic research has prevented them from achieving a large user base.

6) What are the unique/novel/critical technologies developed in the approach?

We are applying the concept of a social network to simplify access control specification and make related work easier to find (assuming people have strong ties to those working on similar topics). We are also providing version-controlled file spaces for individual and project files, as well as personal digital library management.

7) What are the potential spin-offs or other applications of this work?

Native desktop clients and new back-end modules to support common tasks (scheduling and calendars), or different types of users (lawyers, politicians, doctors, etc.).

8) How can progress be measured? (i.e. How can anyone tell if/when the project is successful?)

Progress can be measured by system use, where more active users means a more successful project. We will also conduct user surveys to determine which areas of the system need to be improved, which features are superfluous, and which additional features are needed.

9) What has been accomplished thus far?

For up-to-date information, visit the web site http://whisper.cse.ucsc.edu/public/. As of this writing, two prototypes of the system have been created and put aside; in both cases, the implementation technologies proved too unwieldy for a single person to make significant progress. A third prototype is being developed using Ruby on Rails, incorporating a new user interface developed after user testing with the second prototype; it is nearly ready for an initial deployment, with the personal digital library feature ready for use.

10) What is the schedule for the work remaining?

January 2008
Focus on shared access to personal digital libraries.

April 2008
Focus on shared access to research work (projects).

September 2008
Focus on writing thesis.

December 2008
Thesis Complete

Sunday, January 20, 2008

What Came Before: Thesis On A Slide

Here is the original “Thesis On A Slide”. It was developed in mid-2004 as I was beginning work on my Advancement to Candidacy. After working on Whisper for a year and a half, I took a break from school, for the most part, between the summer of 2006 and the beginning of 2008. The revised version of this will be coming soon.

Quadrant 1: Background and Problem

Managing collaboration among groups of people, especially when geographically distributed, is a difficult task. Existing tools fall short of users' needs and are not widely used. Outside of technical disciplines, the use of computers to facilitate collaboration is rare. The problems faced include identity verification, version control, and access control. Supporting larger-scale collaboration for a community gives rise to problems with membership tracking, information publication and project artifact sharing, and information archival. Within technical disciplines, version control and document control systems are frequently used to address some of these issues; other disciplines generally resort to e-mail or disk exchange to share information, and to print journals to publish information and some types of project artifacts. Even setting up a web site that limits access to specific users requires more technical knowledge than most people are willing to acquire. While most people can easily use e-mail to distribute files and information to collaborators, it is a cumbersome solution at best.

Quadrant 2: Hypothesis and Insights

Just as academic communication was significantly improved by the advent of e-mail, academic collaboration can be similarly improved through the use of computer-based tools. We believe a collection of integrated services is necessary to foster academic collaboration: digital library support, community spaces, project spaces, personal spaces, publication support (including peer reviews), and comment boards integrated into most of the tools. Two other important aspects of collaboration are identity verification and access control. Many projects make use of sensitive personal or commercial data; without adequate data protection, none of these projects would be able to use the collaboration tools. These tools must be integrated in a seamless manner and provide integration support for future tools. In addition, we propose the use of comment boards throughout as a means of communication between users.

Quadrant 3: Approach and Validation

We propose Whisper (for Web Information Sharing Project) as a proof-of-concept service that will provide information management and collaboration management tools to academic networks. Whisper will provide versioned data stores to each user, and will allow access to be controlled at a high level (based on academic relationships) and a low level (specifying individual users for inclusion or exclusion). Public comments on posted information will also be allowed.

Social networks such as Friendster and Orkut provide access to a circle of friends in a unique way. Whisper seeks to capture this social networking and apply it to the academic community. We have identified four types of personal relationships and two types of groups that we believe are appropriate for academic networks. The Mentor relationship captures teachers and others who have at one time or another been in a knowledge transfer position. The Student relationship is the other side of a Mentor relationship; specifying one relationship implies the other exists. Collaborators are people who are actively working, or have recently worked, on a project together, or who work on closely related projects, possibly in the same lab. Colleagues are people whose work may be related but who communicate very little about that work, or people who work in the same department or institution but in different areas of research. Project groups consist of people involved in an ongoing research effort. Communities consist of researchers working on similar or related topics. We want to explore how communities and institutions networked through such a system can share their work while it is in progress, while it is being prepared for publication, and after it has been published.

Academic projects are generally limited in their life span, but the artifacts they produce should be available after work on the project ceases. Whisper's project spaces provide a versioned work space, comment board, and access controls to assist project work and artifact dissemination. Projects live within a larger community, where artifacts and results are published. Whisper also provides community spaces with peer-reviewed journals and access controls for both membership and resources.

With most current research available in an electronic format, many personal libraries are moving from the bookshelf to the computer. This saves office space and makes searching the articles faster (most operating systems are able to search inside text, HTML, and PDF files). Digital libraries can also be accessed from any computer on the Internet, or taken into the field on a laptop. Whisper provides each user with their own digital library space and allows comments on each resource. The libraries and comments each have their own individual access controls, allowing users to share resources but keep some (or all) of their commentary private.

The popularity of web logs demonstrates the basic human need to comment on anything and everything in the world. While not traditionally thought of as an academic tool, we believe an integrated web log space can provide a valuable outlet for users. It could be used for anything from mundane announcements to personal musings or thoughts about future research directions and interesting problems.

As the system grows, we will invite others to begin using it for their own work. At the end of the project, users will be asked to share their experiences.

Quadrant 4: Contributions and Schedule

The practical contributions include a standard collaboration environment that captures the working relationships of the users. This system will allow us to explore access control methods (automatic and explicit) for research products. It will also allow exploration of the network relationships themselves.

August 2004
The first working prototype is available to a small group of users via a web browser.

June 2005
A minimal feature complete system is available to an extended group of users.

September 2005
A non-web browser interface is available.

June 2006
A feature complete system is available to everyone.

December 2006
Thesis Complete

What Came Before: Top Ten Questions

Here are the original “Top Ten Questions”. These were developed in mid-2004 as I was beginning work on my Advancement to Candidacy. After working on Whisper for a year and a half, I took a break from school, for the most part, between the summer of 2006 and the beginning of 2008. The revised versions of these will be coming soon.

1) What is the main goal of the work?

We will determine what tools and features are needed to create a comprehensive and compelling academic collaborative environment. We want to understand how the relationships between users can be harnessed to improve the usability and usefulness of such an environment.

2) What are the tangible benefits to society of achieving the goal? (i.e. Why should anyone pay for this?)

Our results can be used to create new environments and tools, as well as improve existing groupware. The project itself will be developed as an open source tool, allowing others to freely install and improve upon it.

3) What are the technical problems that make the goal difficult to achieve? (i.e. Why hasn't this been done already?)

Technical problems include integration issues for authentication, information synchronization, scalability, and usability. Some systems do exist, but many people still prefer to simply use e-mail; this is perhaps the most difficult user behavior we will have to overcome.

4) What are the main elements of the approach?

We will create a testbed that allows users to conduct their collaborative work. The system will use open standards (J2EE, LDAP, WebDAV, Delta/V, and SOAP) to be extensible and compatible with a wide range of systems. The system will provide users with the tools we initially identify as being useful for collaborative work (community message boards, project file and message spaces, personal digital library spaces, relationship management, etc.). As the system matures, we will invite larger groups of people to use it, and encourage other departments and institutions to install it so that we may investigate the role distance plays in the system's usability.

5) How does the approach handle the technical problems that have prevented progress in the past? (i.e. Why will this be successful when nobody was able to be in the past?)

Still doing background work to validate what follows...

Previous approaches consisted of expensive proprietary systems, and none offered the full range of features we are attempting to include. None of them attempted to model the relationships between users as a way of enhancing the functionality and usability of the system.

6) What are the unique/novel/critical technologies developed in the approach?

We are applying the concept of a social network to simplify access control specification and make related work easier to find (assuming people have strong ties to those working on similar topics). Also, by creating a pure web-services application, we are able to provide an open interface to the system. This will allow others to create client applications that run natively on a user's computer, as opposed to constraining users to a web interface. We are also providing version-controlled file spaces for individual and project files, as well as personal digital library management.

7) What are the potential spin-offs or other applications of this work?

Native desktop clients and new back-end modules to support common tasks (scheduling and calendars), or different types of users (lawyers, politicians, doctors, etc.).

8) How can progress be measured? (i.e. How can anyone tell if/when the project is successful?)

Progress can be measured by system use, where more active users means a more successful project. The project will need to reach a critical mass, after which developers external to the project will begin to assist, and organizations external to ours will begin to utilize it. We will also conduct user surveys to determine which areas of the system need to be improved, which features are superfluous, and which additional features are needed.

9) What has been accomplished thus far?

For up-to-date information, visit the web site http://whisper.cse.ucsc.edu/public/. As of this writing, a mockup of the web-based client has been created to show the major functional areas of the system. An architectural diagram showing the internal breakdown of components within the system is also available on the web site. We are building a technology demonstration that will show people and their relationships, with a novel graph visualization of the relationships between them (the same visualization that will be used in the final system).

10) What is the schedule for the work remaining?

December 2004
The first working prototype is available to a small group of users via a web browser.

June 2005
A minimal feature complete system is available to an extended group of users.

September 2005
A non-web browser interface is available.

June 2006
A feature complete system is available to everyone.

December 2006
Thesis Complete

Welcome to Whisper

Today, despite decades of research, computer support for knowledge workers in academia is fragmented and poorly integrated. While office automation and other Computer-Supported Cooperative Work (CSCW) approaches have benefited academics, no single environment today integrates the basic research activities of the academic knowledge worker, including individually or collaboratively writing research papers, sharing research papers, reviewing papers for publication in a journal or conference, and persistently sharing comments and observations on existing literature.

Consider a typical collaborative conference paper. It begins as a file that is exchanged multiple times via e-mail as each author makes contributions. The paper is then uploaded to a web-based review system. Upon acceptance, the connection between the reviews, the paper source, and the final paper version is broken, since the camera-ready paper is submitted to the publisher's own web-based paper manager. The final paper is then sent to an institutional digital library. Researchers download papers from the library, storing them in a folder on their local disk, where they are difficult to search and completely dissociated from their bibliographic metadata. The paper has now crossed five system boundaries, effectively eliminating any possibility of advanced collaboration such as shared annotations and shared collection management.

Whisper's primary goal is to develop an integrated toolset that will reduce the number of system boundaries knowledge must cross as it is developed and eventually published to the greater community.