Needling the Old Guard: XML in Prosopography

For the last few weeks we have been discussing the ongoing debate in the digital humanities between textual markup and databases. Reading K.S.B. Keats-Rohan’s “Prosopography for Beginners” on her Prosopography Portal (http://prosopography.modhist.ox.ac.uk/index.htm), I found it interesting that the tutorial focuses initially and primarily on markup. Essentially, Keats-Rohan outlines three stages to prosopography:
1. “Data modelling”—For Keats-Rohan, this stage is accomplished by marking up texts with XML tags “to define the group or groups to be studied, to determine the sources to be used from as wide a range as possible, and to formulate the questions to be asked.” It does far more than that, however, since the tags identify the particular features of the sources that need to be recorded. Keats-Rohan covers this activity extensively with eleven separate exercises, each with its own page.
2. “Indexing”—This stage calls for the creation of indexes based on the tag set or DTD developed in stage one. These indexes collect specific types of information, such as “names”, “persons” and “sources”. With the addition of biographical data, the indexes are then massaged into a “lexicon”, to which a “questionnaire” (i.e. a set of questions to query your data points) is applied. Ideally, it is suggested, this is done through the creation of a relational database with appropriately linked tables. A single page is devoted to the explanation of this stage, with the following apology:

It is not possible in the scope of this tutorial to go into detail about issues relating to database design or software options. Familiarity with the principles of a record-and-row relational database has been assumed, though nothing more complex than an Excel spreadsheet is required for the exercises.

…Eleven lengthy exercises for XML, but you’re assumed to appreciate how relational databases work by filling out a few spreadsheets?
3. “Analysis”—This is, of course, the work of the researcher once the data collection is complete. This section of the tutorial includes a slightly longer page than stage two, with four sample exercises designed to teach users how prosopographical analysis can be conducted.
It strikes me as incongruous that, for a research method that relies so heavily on the proper application of a relational database model, so little time is devoted to discussing its role in processing data. Instead, Keats-Rohan devotes the majority of her tutorial to formulating an XML syntax that, when all is said and done, really only adds an unnecessary level of complexity to processing source data. You could quite easily do away with stage one altogether, create your index categories in stage two as database tables, and process (or “model”) your data at that point, simply by entering it into your database. What purpose does markup serve as a means of organizing your content, if you’re just going to reorganize it into a more versatile database structure?
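To make that concrete, here is a minimal sketch of what skipping stage one might look like: the “names”, “persons” and “sources” indexes created directly as MySQL tables. The table and column names are my own invention, not Keats-Rohan’s, and the design is deliberately simplified.

    -- Hypothetical index tables, created directly in MySQL with no XML stage.
    CREATE TABLE source (
        source_id  INT AUTO_INCREMENT PRIMARY KEY,
        title      VARCHAR(255) NOT NULL,
        date_range VARCHAR(50)            -- e.g. "c. 1066-1087"
    );

    CREATE TABLE person (
        person_id  INT AUTO_INCREMENT PRIMARY KEY,
        name       VARCHAR(255) NOT NULL, -- name form as attested in the sources
        biography  TEXT                   -- the "lexicon" entry
    );

    -- Each mention of a person in a source becomes one row here; the
    -- "questionnaire" is then simply a set of queries run against these tables.
    CREATE TABLE mention (
        mention_id INT AUTO_INCREMENT PRIMARY KEY,
        person_id  INT NOT NULL,
        source_id  INT NOT NULL,
        role       VARCHAR(100),          -- e.g. "witness", "landholder"
        FOREIGN KEY (person_id) REFERENCES person(person_id),
        FOREIGN KEY (source_id) REFERENCES source(source_id)
    );

Entering source data then becomes a matter of INSERT statements (or a simple entry form), and the “data modelling” happens in the schema itself rather than in a tag set.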
Keats-Rohan’s focus on markup starkly emphasizes how much more highly humanities scholars value XML than databases. Since the two are useful for quite different purposes, and relational databases have so much to offer humanities scholarship—as prosopographies prove—I am baffled that such a bias persists.

The Implications of Database Design

In studying the database schema for the Prosopography of Anglo-Saxon England (PASE), several features of the design are immediately apparent[1]. Data is organized around three principal tables, or data points: the Person (i.e. the historical figure mentioned in a source), the Source (i.e. a text or document from which information about historical figures is derived), and the Factoid (i.e. the dynamic set of records associated with a particular reference in a source about a person). There are a number of secondary tables as well, such as the Translation, Colldb and EditionInfo tables that provide additional contextual data to the source, and the Event, Person Info, Status, Office, Occupation and Kinship tables, among others, that provide additional data to the Factoid table. Looking at these organizational structures, it is clear that the database is designed to pull out information about historical figures as they appear in Anglo-Saxon texts.

I admire the versatility of the design and the way it interrelates discrete bits of data (even more impressive when tested using the web interface at http://www.pase.ac.uk), but I can’t help but recognize an inherent bias in this structure. In reading John Bradley and Harold Short’s article “Using Formal Structures to Create Complex Relationships: The Prosopography of the Byzantine Empire—A Case Study”, I found myself wondering at the choices made in the design of both databases. The PBE database structure appears to be very similar, if not identical, to that of PASE. Perhaps it’s my background as an English major—rather than a History major—but I found the structure especially unhelpful in one particular instance: how do I find and search the information associated with a unique author? With its focus on the historical figures written about in sources, rather than the authors of those sources, the creators made a conscious choice to value historical figures over authors and sources. To be fair, the structure does not necessarily preclude the possibility of searching author information, which appears in the Source table, and there is likely something to be said about the anonymous and possibly incomplete nature of certain Anglo-Saxon texts. In examining the PASE interface, the creators appear to have resolved this issue somewhat by allowing users to browse by source and listing the author’s name in place of the title of the source (which, no doubt, is done by default when the source document has no official title). It is then possible to browse references within the source and to match the author’s name to a person’s name[2].

The decision to organize information in this way, however, de-emphasizes the role of the author and his historical significance, and reduces him to a faceless and neutral authority. This may be intended to facilitate interpretation; Bradley & Short discuss the act of identifying factoid assertions about historical figures as an act of interpretation, in which the researcher must make a value judgment about what the source is saying about a particular person (8). Questions about the author’s motives would only problematize this act. The entire organization of the database, in fact, results in the almost complete erasure of authorial intent. What this analysis of PASE highlights for me is how important it is to be aware of the implications of our choices in designing databases and creating database interfaces.
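To make the asymmetry concrete, here is a rough sketch of the two directions of query, using simplified and partly conjectural table and column names (my reading of the schema PDF, not the actual PASE implementation). Asking about a person is the structure’s natural question, because everything hangs off the Factoid table; asking about an author means coming at the data sideways through the Source table.

    -- What do the sources say about this person? (the natural question)
    SELECT p.name, f.factoid_type, s.title
    FROM factoid f
    JOIN person p ON p.person_id = f.person_id
    JOIN source s ON s.source_id = f.source_id
    WHERE p.name = 'Aldhelm 3';

    -- What did this person write? (authorship lives on the source side,
    -- via a hypothetical author_name column or the Colldb/Authorship link)
    SELECT s.title
    FROM source s
    WHERE s.author_name = 'Aldhelm 3';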
The creators of PASE might not have intended to render the authors of their sources so impotent, but the decisions they made in the construction of their database tables, in the user interface, and in the approach to entering factoid data had that ultimate result.

Cited References

Bradley, J. and Short, H. (n.d.). Using Formal Structures to Create Complex Relationships: The Prosopography of the Byzantine Empire—A Case Study. Retrieved from http://staff.cch.kcl.ac.uk/~jbradley/docs/leeds-pbe.pdf

PASE Database Schema. (n.d.). [PDF]. Retrieved from http://huco.artsrn.ualberta.ca/moodle/file.php/6/pase_MDB4-2.pdf

Prosopography of Anglo-Saxon England. (2010, August 18). [Online database]. Retrieved from http://www.pase.ac.uk/jsp/index.jsp


[1] One caveat: As I am no expert, what is apparent to me may not be what actually is.  This analysis is necessarily based on what I can understand of how PASE and PBE are designed, both as databases and as web interfaces, and it’s certainly possible I’ve made incorrect assumptions based on what I can determine from the structure.  Not unlike the assumptions researchers must make when identifying factoid assertions (Bradley & Short, 8).
[2] For example, clicking “Aldhelm” in the source list will list all the persons found in Aldhelm, including Aldhelm 3, bishop of Malmesbury, the eponymous author of the source (or rather, collection of sources). Clicking Aldhelm 3 will provide the Person record, or factoid—Aldhelm as historical figure. The factoid lists all of the documents attributed to him under “Authorship”. Authorship, incidentally, is a secondary table linked to the Factoid table; based on the structure, it seems this information is derived from the Colldb table, which links to the Source table. All this to show that it is possible, but by no means evident, to search for author information.

Brief update

Nothing new this week, unless I come up with something on the fly.  I’m knee-deep in figuring out ethics applications for directed study/thesis research, something I basically need to get done ASAP if I plan on doing any sort of data collection or analysis before the end of the term. I’ve also completed most of the response/review paper assignments required for my courses.

To make life more complicated, some database workshops related to my HUCO course this term have renewed my desire to do a bit of coding. I’ve been toying with the idea of starting a simple PHP/MySQL project—unrelated to coursework—to refresh my memory and hone my (admittedly limited) programming skills. More on that, possibly, if anything comes of it.

You will also notice I’ve changed the look of the blog once more.  It needed a bit of a facelift.

The Knowing and Agency of Information Need

There is a fuzzy distinction between “information” and “knowledge” that is strongly emphasized in Wilson’s article “On User Studies and Information Needs”.  Information exists as a subcategory of knowledge; in terms of the models we’ve previously discussed—in particular, Nonaka and Cook & Brown—knowledge encompasses both the property of information and context, and the activity of interpretation (or “knowing”).  Wilson describes this in his figure for the “Universe of Knowledge” (661).  An alternative interpretation of this model would be to consider the concentric circles as “bodies of knowledge”, and the intersecting lines between “users”, “information systems”, and “information resources” as the action or practice of “knowing”.

The distinction between “information” and “knowledge” becomes fuzzy the instant you introduce agency into the equation—particularly, human agency. As soon as we begin thinking of people accessing, transmitting, and creating information, we also have to start thinking about processes and motivation. The concept of “information needs”, then, is epistemological; as Wilson describes it, an information need arises from a more basic “human need” that may have a physiological, affective or cognitive source, implying that a person must know something before seeking information (663). That initial knowing or knowledge might be implicit or tacit. You might feel hungry and, knowing implicitly that you must eat to resolve this physiological need, you might seek information about the nearest restaurant or supermarket. How you go about doing that would be categorized as “information seeking behaviour”, and would be influenced by context—for instance, what you already know about what restaurants or supermarkets look like, what neighborhood you are in, what kind of restaurant or food you could afford given how much money you have in your purse or wallet, what information resources are most easily available to you, and so on. If you have an iPhone, you might simply locate the nearest restaurant using GIS technology. If not, you might consult a nearby map or directory, or simply look for signs of restaurants. Or you might ask someone. All of these represent different behaviours designed to fulfill an information need.

Once you have located a restaurant, you have satisfied the information need arising from your physiological need—hunger. You have acquired information—namely, where to find the nearest restaurant from your starting point. But you have also acquired a great deal of additional, potentially useful knowledge: about the neighborhood, about other businesses you came across that were not restaurants, about how to find restaurants in general, and so on. What you now know is not limited to the restaurant itself and the meal you are about to have, but includes every new piece of information that you came across throughout the information seeking process—including the process itself. And this knowledge will be available to you the next time you have an information need.

Wilson identifies three definitions of “information” in user studies research (659):

1. Information as a physical entity (a book, a record, a document).

2. Information as a medium, or a “channel of communication” (oral and written).

3. Information as factual data (the explicit contents of a book, or record, or document).

These definitions are useful, but need to be expanded. In his analysis, Wilson only discusses information as being transmitted orally or in writing. There are, however, a number of alternative means for acquiring information. Taking my previous example, you might smell cooked food before you see the marquee above a restaurant. Or you might first notice the image of a hamburger on a sign before reading the words printed underneath. Both of these examples—visual and olfactory information media—demonstrate that messages are transmitted in a variety of ways. Additionally, we cannot forget the context. If I am on a diet, I might ignore the building that smells of French fries and hamburgers. If I am allergic to certain foods, an image of the type of fare served in a particular establishment might turn me off of it. And it is possible that I miss these messages entirely; if I have a cold, maybe I won’t smell the hamburgers and will walk past that particular restaurant, unaware that it could satisfy my need.

Knowledge seeking can also be considered in terms of communication. When I look at a sign, a message containing information is being transmitted to me. Simplistically, this is the “conduit” metaphor for communication, which usually disregards or downplays the notions of context, influence and noise. The communication process is far more complex, but conceptually the metaphor is useful for highlighting the roles of transmitter/speaker, message and receiver/listener. Thomas, Kellogg and Erickson explore this idea in their article by suggesting the alternative “design-interpretation” model. They argue that “getting the right knowledge to people” is only part of the equation, and that “people need to engage with it and learn it” (865). Thomas et al. describe the model as follows:

The speaker uses knowledge about the context and the listener to design a communication that, when presented to and interpreted by the listener, will have some desired effect. (865)

What is important to understand here is the speaker’s (or transmitter’s) application of existing knowledge about the environment and the target audience. When I see the image of the hamburger, I can assume that the restaurant owners put some thought into presenting an appetizing, attractive product that will draw the most clientele. If the image makes my mouth water, the message is received—and if I am then motivated to enter the restaurant, the owners have achieved the desired effect. If, however, I find the image unappealing, the message has failed; not because I don’t understand the information it contains, but because the restaurant owners failed to appropriately apply their knowledge about what potential customers want. Perhaps they lacked the information they needed in order to do this successfully.

Cited References

Thomas, J. C., Kellogg, W. A. and Erickson, T. (2001). The knowledge management puzzle: Human and social factors in knowledge management. IBM Systems Journal, 40(4), 863-884.

Wilson, T. D. (2006). On User Studies and Information Needs.  Journal of Documentation, 62(6), 658-670.

The Commonplace Book—extinct form of critical reading and sensemaking?

I found Robert Darnton’s chapter on the Renaissance tradition of the commonplace book an interesting insight into how people made—and make—sense of what they read. It made me wonder about how this tradition of reading has changed over time. Darnton suggests that today’s reader has learned to read sequentially, while the early modern reader read segmentally, “concentrating on small chunks of text and jumping from book to book” (169). The implication is that, through this transformation of practice, we have lost a critical approach to reading. The commonplace book, as Darnton describes it, was a place where early modern readers collected bits and pieces of texts alongside personal reflections about their significance (149). This activity was a hybrid of reading and writing, making an author of the reader, and serving as a method for “Renaissance self-fashioning”—the grasping for a humanist understanding of the autonomous individual (170). Arguably, in adopting a sequential mode of reading and forgetting the practice of the commonplace book, we have lost a useful tool for making sense of the world and of ourselves.

At the beginning of the chapter, Darnton makes a curious allusion to the present reality, the Digital Age. He writes: “Unlike modern readers, who follow the flow of a narrative from beginning to end (unless they are digital natives and click through texts on machines), early modern Englishmen read in fits and starts and jumped from book to book.” [Emphasis is my own] (149). Clearly he is referring to hypertextual practice, the connective structure of texts on the Web that are joined through a network of inter-referential links and provoke a non-sequential mode of reading. The Web has initiated a number of changes in how we read, write, create and make sense of texts. Hypertextuality is certainly one of them, but I think Darnton only touches upon the tip of the iceberg with this passing reference. While the commonplace book as a genre might be extinct, new hybrid forms of critical reading/writing have taken its place.

Take, for instance, the blogging phenomenon. Many people today write blogs on a vast variety of subjects. Most represent critical responses to other media—articles, videos, images, stories, other blog posts. They are the commonplace book of the digital native. The difference is that the digital native’s commonplace book is accessible to all, and (more often than not) searchable. Consider also the phenomenon of microblogging in the form of Twitter. As an example, I am going to look at my own Twitter feed (http://twitter.com/eforcier – I have attached a page with specific examples). In 140-character segments I carry on conversations, post links to online documents and express my reactions to such texts. It is, in fact, perfectly possible to consider a 21st-century individual’s Twitter feed analogous to the early modern reader’s commonplace book. These activities represent a far more complex mode of reading than Darnton assigns to the contemporary reader. It is a type of reading that is at times segmental, at times sequential, but is remarkable because of the interconnectivity of sources and the critical engagement of the reader that it represents. What is most interesting is that, rather than emphasizing the notion of the autonomous individual, these digital modes of reading/writing emphasize collectivity and community—what could be described as a “Posthuman self-fashioning”.

 

Darnton, R. (2009).  The Case for Books. New York: PublicAffairs.  219p.

***

I have not included the appendix of selected tweets that was submitted along with this assignment, but I’m sure you’ll get the gist by viewing my Twitter page: http://www.twitter.com/eforcier

Forms of Knowledge, Ways of Knowing

The principal premise of Cook & Brown’s “Bridging Epistemologies” is that there are two separate yet complementary epistemologies tied up in the concept of knowledge. The first is found in the traditional definition of knowledge, which describes knowledge as something people possess—a property, in more than one sense of the word. Cook & Brown refer to this as the “epistemology of possession”, and it can be characterized as the “body” of knowledge. The second, the “epistemology of practice”, homes in on the act of knowing found in individual and group activities—it is the capacity of doing. Cook & Brown contend that the interplay between these two distinct forms—which they call the “generative dance”—is how we generate new knowledge, in a manner not unlike Nonaka’s spiral structure of knowledge creation (with one key difference, described below).

Another way I conceptualized this distinction (using analogy, as Nonaka urges, to resolve contradiction and generate explicit knowledge from tacit knowledge (21)) was to consider these two notions of “knowledge”/“knowing” from a linguistic perspective: if knowledge and knowing were distinct properties of the English sentence, knowing would be the verb and knowledge the object. This is supported by Cook & Brown’s emphasis on how “knowledge” can be applied in practice as a tool to complete a task, and can result from the act of knowing (388); “knowing” acts upon (and through) “knowledge”, just as the verb acts upon (or through) the object. The subject—that is, the person or people performing the action—is an essential element both of the formulation of knowledge/knowing and of the sentence. The subject’s relationship to the verb and the object is very similar to the individual’s (or group’s) relationship to knowing and knowledge. The verb represents enaction by the subject—as knowing does—and the object represents that which is employed, derived or otherwise affected by this enaction—as knowledge is. Cook & Brown’s principle of “productive inquiry” and the interaction between knowledge and knowing, then, can be represented by the structure of the sentence.

Cook & Brown’s premise has many important implications for knowledge management. Perhaps the most important of these is the idea that knowledge is abstract, static and required for action (that is, “knowing”) in whatever form it takes, while knowing is dynamic, concrete and related to forms of knowledge. Of these characteristics, the most dramatic must be the static nature of knowledge; in what is Cook & Brown’s most significant break with Nonaka, they state that knowledge does not change or transform. The only way for new knowledge to be created from old knowledge is for it to be applied in practice (i.e. “productive inquiry”). Nonaka perceives knowledge as something malleable that can transform from tacit to explicit and back again, while Cook & Brown unequivocally state that knowledge of one form remains in that form (382, 387, 393, 394-95). For Cook & Brown, each form of knowledge (explicit, tacit, individual and group) performs a unique function (382). The appropriate application of one form of knowledge in practice (the act of knowing) can, however, give rise to knowledge of another form (393).

I found Blair’s article “Knowledge Management: Hype, Hope or Help?” useful as a supplement to Cook & Brown. Blair makes several insightful points about knowledge and knowledge management, such as applying Wittgenstein’s theory of meaning as use to define “knowledge”, identifying abilities, skills, experience and expertise as the human aspect of knowledge, and raising the problem of intellectual property in KM practice. Blair’s most valuable contribution, however, is to emphasize the distinction between the two types of tacit knowledge. This is a point Cook & Brown (and Nonaka) fail to make in their sweeping theoretical models. It is also a point I have struggled with in my readings of Cook & Brown and Nonaka. Tacit knowledge can be either potentially expressible or not expressible (Blair, 1025). An example of tacit knowledge that is “potentially expressible” would be heuristics—the “trial-and-error” lessons learned by experts. Certainly in my own experience, this has been a form of tacit knowledge that can be gleaned in speaking with experts and formally expressed to educate novices (generating “explicit knowledge” through the use of “tacit knowledge”). An example of inexpressible tacit knowledge would be the “feel” of the flute at different levels of its construction described in Cook & Brown’s example of the flutemakers’ study (395-96); this is knowledge that can only be acquired with experience, and no amount of discussion with experts, of metaphor and analogy, will yield a sufficient understanding of what it entails. It is an essential distinction to make, since as knowledge workers we must be able to determine how knowledge is and should be expressed.

 

Cited References

Blair, D. (2002). Knowledge management: Hype, hope, or help? Journal of the American Society for Information Science and Technology 53(12), 1019-1028.

Cook, S. D. N., and Brown, J. S. (1999). Bridging Epistemologies: The Generative Dance between Organizational Knowledge and Organizational Knowing, Organization Science 10(4), 381-400.

Nonaka, I. (1994). A Dynamic Theory of Organizational Knowledge Creation. Organization Science 5(1), 5-37.

Shapiro’s Shakespeare and the “Generative Dance” of his Research

Perhaps the most interesting thing about James Shapiro’s A Year in the Life of William Shakespeare: 1599 is the kind of scholarship that it represents. Drawing upon dozens—likely hundreds—of sources, Shapiro presents a credible depiction of Shakespeare’s life in 1599. Rather than limiting himself to sources that are exclusively about Shakespeare or his plays, Shapiro gathers a mountain of data about Elizabethan England. He consults collections of public records that shed light either on Shakespeare’s own life or the life of his contemporaries, not just to identify the historical inspiration and significance of his plays, but to give us an idea of what living in London as a playwright in 1599 would have been all about. This, to me, is a fascinating use of documentary evidence, and one that few have undertaken so successfully.

Before I go on, I should note that I’m currently working on a directed study in which I am being thoroughly steeped in the objects and principles of knowledge management. It is in light of this particular theoretical context that I read Shapiro and think, “he’s really on to something here.” In their seminal article “Bridging Epistemologies: The Generative Dance Between Organizational Knowledge and Organizational Knowing”, Cook & Brown present a framework in which “knowledge”—the body of skills, abilities, expertise, information, understanding, comprehension and wisdom that we possess—and “knowing”—the act of applying knowledge in practice—interact to generate new knowledge. Drawing upon Michael Polanyi’s distinction between tacit and explicit knowledge, Cook & Brown present a set of distinct forms of knowledge—tacit, explicit, individual and group. They then advance the notion of “productive inquiry”, in which these different forms of knowledge can be employed as tools in an activity—such as riding a bicycle, or writing a book about an Elizabethan dramatist—to generate new knowledge, in forms that perhaps were not possessed before. It is this interaction between knowledge and knowing, producing new knowledge, that constitutes the “generative dance”.

Let’s return for a moment to Polanyi’s tacit and explicit knowledge. The sources Shapiro is working with are, by their nature, explicit, since he is working with documents. The book itself is explicit, since it too is a document, and the knowledge it contains is fully and formally expressed. The activity of taking documentary evidence from multiple sources, interpreting each piece of evidence in the context of the other sources, and finally synthesizing all of it into a book represents more epistemic work than is represented in either the book or the sources by themselves. The activity itself is what Cook & Brown describe as “knowing”, or the “epistemology of practice”. The notions of recognizing context and of interpretation, however, suggest that there’s even more going on here than meets the eye. In this activity, Shapiro is merging these disparate bits of explicit knowledge to develop a hologram of Shakespeare’s 1599. This hologram is tacit—it is an image he holds in his mind that grows more and more sophisticated the more relational historical evidence he finds. Not all of the patterns and connections he uncovers are even expressible until he begins the synthesis, the act of writing his book. Throughout this process, then, new knowledge is constantly being created—both tacit and explicit.

Let’s also consider for a moment Cook & Brown’s “individual” and “group” knowledge. Shapiro’s mental hologram can be safely classified as individual knowledge. And each piece of evidence from a single source is also individual knowledge (though, certainly, some of Shapiro’s sources might represent popular stories or widely known facts, and thus group knowledge). The nature of Shapiro’s work, however—the collective merging of disparate sources—problematizes the individual/group distinction. What arises from his scholarship is neither group knowledge (i.e. knowledge shared among a group of people) nor individual knowledge (i.e. knowledge possessed by an individual), but some sort of hybrid that is not so easily understood.

From a digital humanist perspective, we can think of Shapiro’s scholarship as a relational database (and we have). All of the data and the documentary evidence gets plugged into the database, and connections no one even realized existed are then discovered. We might have many people adding data to the database, sharing bits of personal knowledge. And everyone with access to the database can potentially discover new connections and patterns, and in doing so create new knowledge. Would such a collective be considered group knowledge? Would individual discoveries be individual knowledge? Would the perception of connections be tacit or explicit? It is not altogether clear, because there are interactions occurring at a meta-level: interactions between data, interactions between sources, interactions between users/readers and the sources and the patterns of interacting sources. What is clear is that this interactive “dance” is constantly generating additional context, new forms of knowledge, new ways of knowing.
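As a sketch of what that might look like in practice—entirely hypothetical table and column names, offered only to illustrate the analogy—a shared evidence table can surface exactly these unrealized connections with a simple self-join:

    -- Hypothetical schema: each contributor adds evidence rows tied to a source.
    CREATE TABLE evidence (
        evidence_id  INT AUTO_INCREMENT PRIMARY KEY,
        source_title VARCHAR(255) NOT NULL,  -- e.g. a public record from 1599
        subject      VARCHAR(255) NOT NULL,  -- person, place or event mentioned
        added_by     VARCHAR(100) NOT NULL,  -- which contributor entered the row
        note         TEXT
    );

    -- A "connection no one realized existed": two different sources,
    -- possibly entered by two different people, that mention the same subject.
    SELECT a.source_title, b.source_title, a.subject
    FROM evidence a
    JOIN evidence b
      ON a.subject = b.subject
     AND a.evidence_id < b.evidence_id
    WHERE a.source_title <> b.source_title;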

 

Cook, S. D. N., and Brown, J. S. (1999). Bridging Epistemologies: The Generative Dance between Organizational Knowledge and Organizational Knowing, Organization Science 10(4), 381-400.

Shapiro, J. (2006).  A Year in the Life of William Shakespeare: 1599.  New York: Harper Perennial.  394p.