Evaluating Scholarly Digital Outputs: The Six Layers Approach

Developing standards to evaluate scholarly digital output is one of the most significant problems our generation of digital humanists can work on. The improvement of tools and methods, the elaboration of theoretical perspectives, and (above all else) the development of digital outputs will always be of primary importance. The elaboration of evaluation standards, however, has a broader reach: it is part of our responsibility as good scholarly citizens. Regardless of how digital humanities develops in the next few years, humanists are going to need quality standards that help us distinguish ‘good’ from ‘bad’ work. In our universities this is primarily connected with the administrative requirements of performance and tenure review, but it goes much further than that.

The humanities have always valued quality scholarship; our tools for evaluating it have evolved over hundreds of years. But in many ways our evaluation and review mechanisms are broken. This isn’t only in relation to publishing models and peer review systems that the Journal of Digital Humanities and other initiatives aim to augment. An arguably more fundamental problem is the evaluation of born-digital products (tools, websites, ontologies, data models and so on) that are fundamentally different to anything produced by humanists before. While computer science standards must obviously be considered, the aims of digital humanists (and their technical ability) will often be at odds with them. While traditional humanities standards need to be part of the mix, the domain is too different for them to be applied without considerable adaptation.

As in the analog humanities, it is unlikely that there will ever be one humanities standard for evaluating digital output. Different approaches will be needed for different contexts (universities, libraries, museums etc.) and different digital humanities sub-disciplines (history, classics, literary studies etc.). At a bird’s-eye level, though, it might be possible to come up with broad frameworks that can guide more detailed evaluation. By defining them we will be able to communicate to our peers the standards we’ve chosen to work to.

My feeling is that, in simple terms, there are five levels of standards met by most digital humanities projects, and a sixth that doesn’t really make the grade at all. This isn’t a hierarchical scale as much as a classification framework describing types of projects seen ‘in the wild’. Not all digital humanities outputs are intended to be Category 1, for instance. Some, like blog posts, serve a quite different function. Other projects are produced by people just starting out with a new technology, so there is little chance the product will reach the standard required for tenure or review. They might be experienced digital humanists trying out a new method or experimenting with something likely to fail, or they might be beginners learning the ropes.

In short, these are ‘layers’ that all contribute in important ways to the digital humanities ecosystem. Each layer has a function, and is in many ways interdependent with the others. To denigrate any layer is to undermine our goal of building a broad, inclusive and open ecosystem welcoming of a variety of approaches.

  1. Category 1: The scholar has built the output themselves, or been a key driver in the technical design and building. The output has been driven and project-managed by the scholar, often with external funding, including a high degree of technical input in both the design and build phases. The output is complex and/or wide-ranging (either in terms of project scope or technical complexity) and a highly innovative contribution to the field. It conforms to accepted standards in both digital humanities and computer science. Significant and robust review milestones have been used during all phases of the project, including international feedback. Usage reports (where relevant or possible) indicate high engagement with the output from an international audience. The output has gained widespread recognition in both the scholarly and digital humanities communities, and perhaps broader media. It is sustainable, backed up, and controlled by good data management standards.
  2. Category 2: The scholar has built the output themselves, or been a key driver in the technical design and building (in this category, because the outputs tend to be of smaller scope than Category 1, the expectation is really that the scholar has built it themselves, or been an integral part of the team that did). It either conforms to accepted standards in both digital humanities and computer science, or provides a conscious and challenging departure from them. The product is of limited scope, but represents an innovative contribution to the field and has gained significant recognition in the scholarly community, the digital humanities community, or the broader media. Usage reports (where relevant or possible) indicate high engagement with the output from an international audience.
  3. Category 3: The output has been built by an external service unit or vendor with no technical input from the scholar, but the scholar has been closely involved in the design and build phases, and contributed high quality content of some form (data or text, perhaps). The product conforms to some standards in either digital humanities or computer science, but these are loosely applied and/or incompletely implemented.
  4. Category 4: The output has been built by an external service unit or vendor with no technical input from the scholar. It does not conform to generally accepted standards in either computer science or digital humanities. The scholar, however, has provided high quality content of some form (data or text, perhaps) and the product is of use to general users and researchers.
  5. Category 5: This is a catch-all layer for all the wonderful stuff that the digital world enables: the ephemera of digital scholarship. Examples include blog posts, tweets, small contributions to code repositories, etc. It’s also the category that suggests a slightly relativistic attitude is needed when considering the categories outlined here, because Category 5 outputs are incredibly important to digital humanities. They are our flotsam and jetsam, the glue that keeps the community humming.
  6. Category 6: Rarely seen, and generally politely ignored when they are. Outputs in this category don’t conform to any standards, scholarly or otherwise, indicate little or no understanding of current discourses and practices in digital humanities, and include poor quality data or content.

This is only a very broad-brush framework. As in any other field, the important thing with digital humanities outputs is that their producers understand where their output fits within the broader intellectual context. While this won’t always be the case (we always hope that something will come from left field), it indicates both an understanding of the field and respect for it. In general, though, I expect that builders of digital humanities outputs have consciously designed and positioned their product within the broader landscape of digital humanities, and understand that there is a broader matrix of standards and expectations alive in the community. Although, as the field grows, only Categories 1, 2 and 5 tend to get much airtime, it really doesn’t matter which category the final product falls into… unless it’s Category 6, and even then people don’t tend to get too bothered: it is what it is.


Originally published by James Smithies on September 20, 2012. Revised for the Journal of Digital Humanities December 2012.

About James Smithies

James Smithies is a Senior Lecturer in Digital Humanities at the University of Canterbury, New Zealand. James completed a Ph.D. in History at the University of Canterbury in 2002, and has also worked as a technical writer, senior business analyst and project manager. His scholarly work focuses on New Zealand history, the history of literature, technology and ideas, and the digital humanities.