
Needling the Old Guard: XML in Prosopography

For the last few weeks we have been discussing the ongoing debate in the digital humanities between textual markup and databases. Reading K.S.B. Keats-Rohan’s “Prosopography for Beginners” on her Prosopography Portal, I found it interesting that the tutorial focuses initially and primarily on markup. Essentially, Keats-Rohan outlines three stages to prosopography:
1. “Data modelling”—For Keats-Rohan, this stage is accomplished by marking up texts with XML tags “to define the group or groups to be studied, to determine the sources to be used from as wide a range as possible, and to formulate the questions to be asked.” The markup does far more than that, however, since the tags identify the particular features of sources that need to be recorded. Keats-Rohan covers this activity extensively with eleven separate exercises, each with its own page.
2. “Indexing”—This stage calls for the creation of indexes based on the tag set or DTD developed in stage one. These indexes collect specific types of information, such as “names”, “persons” and “sources”. With the addition of biographical data, they are then worked into a “lexicon”, to which a “questionnaire” (i.e. a set of questions for querying your data points) is applied. Ideally, the tutorial suggests, this is done through the creation of a relational database with appropriately linked tables. A single page is devoted to the explanation of this stage, with the following apology:

It is not possible in the scope of this tutorial to go into detail about issues relating to database design or software options. Familiarity with the principles of a record-and-row relational database has been assumed, though nothing more complex than an Excel spreadsheet is required for the exercises.

…11 lengthy exercises for XML, but you’re assumed to appreciate how relational databases work by filling out a few spreadsheets?
3. “Analysis”—This is, of course, the work of the researcher, once the data collection is complete. This section of the tutorial includes a slightly longer page than stage 2 with 4 sample exercises. The exercises are designed to teach users how prosopographical analysis can be conducted.
It strikes me as incongruous that, for a research method that relies so heavily on the proper application of a relational database model, so little time is devoted to discussing its role in processing data. Instead, Keats-Rohan devotes the majority of her tutorial to formulating an XML syntax that, when all is said and done, only adds an unnecessary level of complexity to processing source data. You could quite easily do away with stage one entirely, create your index categories in stage two as database tables, and process (or “model”) your data at that point, simply by entering it into your database. What purpose does markup serve as a means of organizing your content if you’re just going to reorganize it into a more versatile database structure?
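To make that comparison concrete, here is a minimal sketch in Python of the round trip the tutorial implies: a tagged snippet is parsed, and its contents are then reloaded into linked tables. The tag names (`entry`, `persName`, `source`), the table and column names, and the sample sentence are all invented for illustration; they are not Keats-Rohan’s actual tag set or DTD.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical marked-up snippet; tags and content are illustrative only.
MARKED_UP = """
<entry>
  <source>Charter 12</source>
  <persName>Wulfstan</persName> witnessed a grant to
  <persName>Eadgifu</persName>.
</entry>
"""


def build_indexes(xml_text):
    """Load the tagged source title and personal names into two linked tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE sources (id INTEGER PRIMARY KEY, title TEXT UNIQUE);
        CREATE TABLE names (
            id INTEGER PRIMARY KEY,
            name TEXT NOT NULL,
            source_id INTEGER NOT NULL REFERENCES sources(id)
        );
    """)
    entry = ET.fromstring(xml_text)
    title = entry.findtext("source").strip()
    cur = conn.execute("INSERT INTO sources (title) VALUES (?)", (title,))
    source_id = cur.lastrowid
    # Every tagged name becomes one row pointing back at its source.
    for tag in entry.iter("persName"):
        conn.execute(
            "INSERT INTO names (name, source_id) VALUES (?, ?)",
            (tag.text.strip(), source_id),
        )
    conn.commit()
    return conn


def names_in_source(conn, title):
    """Return the names recorded for a source, in order of entry."""
    rows = conn.execute(
        """SELECT n.name FROM names AS n
           JOIN sources AS s ON s.id = n.source_id
           WHERE s.title = ? ORDER BY n.id""",
        (title,),
    )
    return [row[0] for row in rows]


conn = build_indexes(MARKED_UP)
print(names_in_source(conn, "Charter 12"))
```

In this sketch, the table definitions and the query never touch the markup. The same rows could have been typed straight into the tables, which is the shortcut suggested above.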
Keats-Rohan’s focus on markup starkly illustrates how much more highly humanities scholars value XML than databases. Since the two are useful for quite different purposes, and relational databases have so much to offer humanities scholarship, as prosopographies prove, I am baffled that such a bias persists.

The Implications of Database Design

In studying the database schema for the Prosopography of Anglo-Saxon England (PASE), several features of the design are immediately apparent[1]. Data is organized around three principal tables, or data points: the Person (i.e. the historical figure mentioned in a source), the Source (i.e. a text or document from which information about historical figures is derived), and the Factoid (i.e. the dynamic set of records associated with a particular reference in a source about a person). There are a number of secondary tables as well, such as the Translation, Colldb and EditionInfo tables that provide additional contextual data to the Source table, and the Event, Person Info, Status, Office, Occupation and Kinship tables, among others, that provide additional data to the Factoid table. Looking at these organizational structures, it is clear that the database is designed to pull out information about historical figures based on Anglo-Saxon texts. I admire the versatility of the design and the way it interrelates discrete bits of data (even more impressive when tested using the web interface), but I can’t help but recognize an inherent bias in this structure.
In reading John Bradley and Harold Short’s article “Using Formal Structures to Create Complex Relationships: The Prosopography of the Byzantine Empire—A Case Study”, I found myself wondering at the choices made in the design of both databases. The PBE database structure appears to be very similar, if not identical, to that of PASE. Perhaps it’s my background as an English major (rather than a History major), but I found the structure especially unhelpful in one particular instance: how do I find and search the information associated with a unique author? The design focuses on the historical figures written about in sources rather than on the authors of those sources, which reflects a conscious choice by its creators to value historical figures over authors and sources.
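The three principal tables described above can be sketched roughly as follows, using Python’s built-in `sqlite3` module. This is my own simplification and not the actual PASE schema: the column names are assumptions, the secondary tables (Office, Kinship and so on) are collapsed into a single `kind` column, and the rows are placeholder data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE source (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
    -- A factoid ties one person to one assertion made in one source.
    CREATE TABLE factoid (
        id INTEGER PRIMARY KEY,
        person_id INTEGER NOT NULL REFERENCES person(id),
        source_id INTEGER NOT NULL REFERENCES source(id),
        kind TEXT NOT NULL,       -- stands in for the secondary tables
        assertion TEXT NOT NULL
    );
""")

# Placeholder rows, invented purely for illustration.
conn.execute("INSERT INTO person (id, name) VALUES (1, 'Person A')")
conn.executemany(
    "INSERT INTO source (id, title) VALUES (?, ?)",
    [(1, "Source X"), (2, "Source Y")],
)
conn.executemany(
    "INSERT INTO factoid (person_id, source_id, kind, assertion) VALUES (?, ?, ?, ?)",
    [
        (1, 1, "Office", "holds an office"),
        (1, 2, "Kinship", "is kin to someone"),
    ],
)


def factoids_for_person(conn, name):
    """List (source title, kind, assertion) for every factoid about a person."""
    rows = conn.execute(
        """SELECT s.title, f.kind, f.assertion
           FROM factoid AS f
           JOIN person AS p ON p.id = f.person_id
           JOIN source AS s ON s.id = f.source_id
           WHERE p.name = ?
           ORDER BY f.id""",
        (name,),
    )
    return rows.fetchall()


for row in factoids_for_person(conn, "Person A"):
    print(row)
```

Even in this reduced form, the query starts from the person and reaches the source only through the factoid. The source table here carries nothing but a title, so a question about an author has no obvious place to begin.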
To be fair, the structure does not necessarily preclude the possibility of searching author information, which appears in the Source table, and there is likely something to be said about the anonymous and possibly incomplete nature of certain Anglo-Saxon texts. In examining the PASE interface, the creators appear to have resolved this issue somewhat by allowing users to browse by source, and listing the author’s name in place of the title of the source (which, no doubt, is done by default when the source document has no official title). It is then possible to browse references within the source and to match the author’s name to a person’s name[2]. The decision to organize information in this way, however, de-emphasizes the role of the author and his historical significance, and reduces him to a faceless and neutral authority. This may be to facilitate interpretation; Bradley & Short discuss the act of identifying factoid assertions about historical figures as an act of interpretation, in which the researcher must make a value judgment about what the source is saying about a particular person (8). Questions about the author’s motives would only problematize this act. The entire organization of the database, in fact, results in the almost complete erasure of authorial intent.
What this analysis of PASE highlights for me is how important it is to be aware of the implications of our choices in designing databases and creating database interfaces. The creators of PASE might not have intended to render the authors of their sources so impotent, but the decisions they made in constructing their database tables, in building the user interface, and in their approach to entering factoid data had that ultimate result.

Bradley, J. and Short, H. (n.d.). Using Formal Structures to Create Complex Relationships: The Prosopography of the Byzantine Empire. Retrieved from
PASE Database Schema. (n.d.). [PDF]. Retrieved from
Prosopography of Anglo-Saxon England. (2010, August 18). [Online database]. Retrieved from

[1] One caveat: As I am no expert, what is apparent to me may not be what actually is. This analysis is necessarily based on what I can understand of how PASE and PBE are designed, both as databases and as web interfaces, and it’s certainly possible I’ve made incorrect assumptions based on what I can determine from the structure, not unlike the assumptions researchers must make when identifying factoid assertions (Bradley & Short, 8).
[2] For example, clicking “Aldhelm” the source will list all the persons found in Aldhelm, including Aldhelm 3, bishop of Malmesbury, the eponymous author of the source (or rather, collection of sources). Clicking Aldhelm 3 will provide the Person record, or factoid: Aldhelm, as historical figure. The factoid lists all of the documents attributed to him under “Authorship”. Authorship, incidentally, is a secondary table linked to the Factoid table; based on the structure, it seems that this information is derived from the Colldb table, which links to the Source table. All this to show that it is possible, but by no means obvious how, to search for author information.
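As a rough illustration of the path this footnote traces, the sketch below (Python with `sqlite3`) starts from a person’s name and goes through a factoid and an authorship link to reach the attributed sources. It rests on assumptions: I link the authorship table directly to the source table and leave out Colldb, the column names are mine, and the work titles are placeholders. It should not be read as the real PASE structure.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE source (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
    CREATE TABLE factoid (
        id INTEGER PRIMARY KEY,
        person_id INTEGER NOT NULL REFERENCES person(id)
    );
    -- Assumed shape: one row per (authorship factoid, attributed source).
    CREATE TABLE authorship (
        factoid_id INTEGER NOT NULL REFERENCES factoid(id),
        source_id INTEGER NOT NULL REFERENCES source(id)
    );
""")

# Placeholder rows; only the label 'Aldhelm 3' comes from the footnote.
conn.execute("INSERT INTO person (id, name) VALUES (1, 'Aldhelm 3')")
conn.executemany(
    "INSERT INTO source (id, title) VALUES (?, ?)",
    [(1, "Work A"), (2, "Work B"), (3, "Work C")],
)
conn.execute("INSERT INTO factoid (id, person_id) VALUES (1, 1)")
conn.executemany(
    "INSERT INTO authorship (factoid_id, source_id) VALUES (?, ?)",
    [(1, 1), (1, 2)],
)


def sources_by_author(conn, author_name):
    """Follow person -> factoid -> authorship -> source for one name."""
    rows = conn.execute(
        """SELECT s.title
           FROM person AS p
           JOIN factoid AS f ON f.person_id = p.id
           JOIN authorship AS a ON a.factoid_id = f.id
           JOIN source AS s ON s.id = a.source_id
           WHERE p.name = ?
           ORDER BY s.id""",
        (author_name,),
    )
    return [row[0] for row in rows]


print(sources_by_author(conn, "Aldhelm 3"))
```

The author lookup takes three joins here, while the person-centred view takes one step. That matches how roundabout the same search feels in the interface.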

Brief update

Nothing new this week, unless I come up with something on the fly.  I’m knee-deep in figuring out ethics applications for directed study/thesis research, something I basically need to get done ASAP if I plan on doing any sort of data collection or analysis before the end of the term. I’ve also completed most of the response/review paper assignments required for my courses.

To make life more complicated, some database workshops related to my HUCO course this term have renewed my desire to do a bit of coding. I’ve been toying with the idea of starting a simple PHP/MySQL project, unrelated to coursework, to refresh my memory and hone my (admittedly limited) programming skills. More on that, possibly, if anything comes of it.

You will also notice I’ve changed the look of the blog once more.  It needed a bit of a facelift.