
Background
Robert Creeley is widely recognized as one of the most influential American poets of the 20th century. Among other accomplishments, he helped to found the Poetics program at SUNY Buffalo, was later appointed distinguished professor of English at Brown University, and served as the New York State Poet Laureate from 1989 to 1991.
Primarily associated with the Black Mountain Poets, he also travelled to San Francisco to explore the emerging Beat poetry scene. There he forged enduring connections with the likes of Allen Ginsberg and Jack Kerouac.
The digital portion of Creeley’s collection contains a number of complete hard drive images from Creeley’s work and home environments, as well as miscellaneous additional media (ZIP drives used for backups, etc.). These materials are part of Creeley’s personal papers held by the Stanford University Libraries Department of Special Collections, which can be browsed online.
Assigning riskiness to data
To make Robert Creeley’s personal files accessible to researchers, we must first ensure that no high-risk Personally Identifiable Information (PII) is present in these materials. This review is carried out by a lab assistant working in our Born-Digital Preservation Lab (BDPL). Borrowing from the Code of Ethics of the Society of American Archivists, the lab assistant balances the needs of “creators, donors, organizations, and communities to ensure that any restrictions applied are appropriate, well-documented, and equitably enforced.”
Stanford University defines three categories of risk: High, Medium, and Low. High Risk data tends to be easy to define and recognize. Examples include Social Security numbers of living people, credit card numbers and other financial account numbers, and other specific, highly sensitive PII (medical record numbers, student ID numbers). High Risk data cannot be released and must be stored on a secure server.
Low Risk data is also generally easy to recognize. Items classified as Low Risk might include previously published materials, public records, and other innocuous, donor-originated materials. These items can be made available online and will be accessible to the general public.
Items that do not fall into these categories can be classified as Medium Risk. These items do not need to be sequestered on a secure server, but are considered too sensitive for publication to the open internet. Medium Risk data is restricted for a shorter length of time than High Risk data, and may be accessed in our reading room. Researchers must sign an agreement before reviewing any items in the reading room.
Our most common examples of Medium Risk data are home addresses, private phone numbers, and private email addresses. In some cases, a bit of incidental research can help to separate Low Risk data from items that actually require the Medium Risk designation (e.g., a phone number used for business purposes, or used in public correspondence, can be classified as Low Risk).
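As a rough illustration only, the three categories described above could be modeled as a small lookup. This is a minimal sketch, not the lab's actual tooling: the labels in `HIGH_RISK_KINDS`, `LOW_RISK_KINDS`, and `CONTACT_KINDS`, the `publicly_used` flag, and the `classify` function are all hypothetical names introduced for this example, and applying the "publicly used" rule to all contact details (not only phone numbers) is an assumption.

```python
from enum import Enum


class Risk(Enum):
    LOW = "Low"
    MEDIUM = "Medium"
    HIGH = "High"


# Hypothetical labels for kinds of data; not an official vocabulary.
HIGH_RISK_KINDS = {"ssn", "credit_card", "financial_account",
                   "medical_record_number", "student_id"}
LOW_RISK_KINDS = {"published_material", "public_record"}
CONTACT_KINDS = {"home_address", "phone_number", "email_address"}

# How each category is handled, paraphrasing the descriptions above.
ACCESS = {
    Risk.HIGH: "not released; stored on a secure server",
    Risk.MEDIUM: "reading room only, after a signed agreement",
    Risk.LOW: "available online to the general public",
}


def classify(kind: str, publicly_used: bool = False) -> Risk:
    """Return the risk category for one reviewed item."""
    if kind in HIGH_RISK_KINDS:
        return Risk.HIGH
    if kind in LOW_RISK_KINDS:
        return Risk.LOW
    if kind in CONTACT_KINDS and publicly_used:
        # For example, a phone number used for business purposes.
        return Risk.LOW
    # Anything that fits neither High nor Low falls through to Medium.
    return Risk.MEDIUM
```

For example, `classify("phone_number")` returns `Risk.MEDIUM`, while `classify("phone_number", publicly_used=True)` returns `Risk.LOW`. In practice the "publicly used" decision is a human judgment reached through research, which is why it is passed in as a flag here and not computed.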
The review process
In keeping with archival principles, we make every effort to accept and preserve these donated materials “as is.” Due to the age and structure of the computer media in our collections, we often need to make complete disk images for preservation purposes. These images can include material that would generally be considered unsuitable for release, such as system files, browser histories, downloaded files, executable files, and deleted (“Trash” or “Recycle Bin”) files.
Additionally, as we become more reliant on computers for multiple tasks, these materials may inadvertently include sensitive data, such as financial account information, or correspondence that was intended to be private. We need to separate these files from those that were intended to be made available to researchers.
Even in materials that have been approved by the donor, we cannot assume that every file is suitable for publication to a global audience on a “forever” internet. Materials must still be reviewed to help guard against inadvertently publishing data that should remain private.
We have implemented a workflow that assigns risk levels in alignment with the University’s risk categories. We primarily rely on an automated tool, Bulk Extractor, to rapidly scan collections for potentially sensitive data. Bulk Extractor is a gold standard in this area, widely used by law enforcement agencies and forensic investigators. The results from our Bulk Extractor scan are then manually reviewed and assigned to the appropriate risk category. Digital collections are not made available to researchers until this review has occurred.
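To give a sense of what the manual review step works from, here is a minimal sketch of reading Bulk Extractor's output. It assumes the tool's text "feature files", in which lines beginning with `#` are comments and each data line holds tab-separated offset, feature, and context fields. The helper names `parse_feature_lines` and `count_hits` are hypothetical, and the file names a caller passes in (for instance `ccn.txt`, `telephone.txt`, or `email.txt`) should be checked against the actual report directory. This is not our production workflow.

```python
from collections import Counter
from pathlib import Path
from typing import Iterable, Iterator, NamedTuple


class Feature(NamedTuple):
    offset: str   # position (or forensic path) where the hit was found
    value: str    # the matched text, e.g. a phone number
    context: str  # surrounding text, useful during manual review


def parse_feature_lines(lines: Iterable[str]) -> Iterator[Feature]:
    """Yield one Feature per data line, skipping '#' comments and blanks."""
    for line in lines:
        line = line.rstrip("\r\n")
        if not line or line.startswith("#"):
            continue
        parts = line.split("\t", 2)
        if len(parts) < 2:
            continue  # malformed line; leave it for manual inspection
        context = parts[2] if len(parts) > 2 else ""
        yield Feature(parts[0], parts[1], context)


def count_hits(report_dir: str, names: Iterable[str]) -> Counter:
    """Count hits per feature file so a reviewer can see where to start."""
    counts: Counter = Counter()
    for name in names:
        path = Path(report_dir) / name
        if not path.exists():
            continue  # that feature file was not produced for this scan
        # utf-8-sig tolerates a leading byte-order mark; bad bytes are replaced.
        with path.open(encoding="utf-8-sig", errors="replace") as handle:
            counts[name] = sum(1 for _ in parse_feature_lines(handle))
    return counts
```

A count like this only tells the reviewer where hits cluster. Each hit still has to be read in context and assigned to a risk category by a person, since automated pattern matches include false positives.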
Working in partnership
This collection was our first experience working in partnership with a processing archivist while performing this data review, rather than performing the review before the processing step. Our workflow can help to identify areas of concern that are not strictly privacy-related, while leaving room for the processing archivist’s discretion in handling these areas.
This can be beneficial for the processing archivist, as we can point out and give guidance on issues that might require special handling. Working with the processing archivist was also helpful to us, as he was able to weigh in on questions we had based on his knowledge of the collection and its context.
For instance, in this collection, some of the computers that were imaged were household computers used by the entire family. In such a case, it may not always be straightforward to identify ownership of specific files, presenting issues of both privacy and copyright. These issues are more properly decided by the processing archivist, but we can help by identifying particular directories or file names that appear to be associated with a particular family member.
Similarly, we were able to identify directories and files where Creeley stored classroom writings by his students. Fair-use copyright exemptions may apply to these items, and the work of other writers has been included in the collection on this basis (for instance, drafts of writings reviewed by Creeley and saved on his computers). However, as a learning institution, we must take particular care to protect the privacy rights of all students. Here again, we would want to call the processing archivist’s attention to the presence and locations of these items. The processing archivist would then be able to review this context and conduct a proper privacy and copyright assessment before deciding whether these items need to be restricted.
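The directory-level flagging described above could be sketched as follows. This is a minimal illustration, assuming a list of relative, forward-slash file paths exported from a disk image. The function name `group_by_directory`, the `depth` parameter, and the sample paths in the usage note are all hypothetical.

```python
from collections import defaultdict
from pathlib import PurePosixPath
from typing import Dict, Iterable, List


def group_by_directory(paths: Iterable[str], depth: int = 2) -> Dict[str, List[str]]:
    """Group relative file paths by their first `depth` directory levels."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for raw in paths:
        # Drop the file name, then keep only the leading directory levels.
        directories = PurePosixPath(raw).parts[:-1]
        key = "/".join(directories[:depth]) or "."
        groups[key].append(raw)
    return dict(groups)
```

Called on paths such as `Users/alice/Mail/inbox.mbx` and `Users/bob/poems/draft.txt`, this would produce one group per user directory, which can then be handed to the processing archivist as a list of locations to review. It does not decide ownership; it only surfaces where files sit.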
As we build experience working with a variety of donated materials, we can advise on common privacy-related issues that we have come across in our collections. With every collection, we refine our process and strive to make our collections more accessible, while also protecting the privacy of the individuals who appear in them.