Network Analysis at the Digital Humanities Summer Institute

I had the good fortune to attend the Digital Humanities Summer Institute last month in Victoria, British Columbia. In pleasant contrast to muggy New England summers, the Pacific Northwest greeted me with beautiful blue skies and moderate, breezy weather. UVic has totem poles on campus, and a deer pressed her nose to the auditorium window as I was giving my colloquium presentation. It all felt very idyllic and charming.

Inner Harbour, Victoria

The impetus for my DHSI trip was Scott Weingart’s class called “Data, Math, Visualization, and Interpretation of Networks: An Introduction.” Scott’s lectures included plenty of space for discussion, as well as scheduled breaks, exercises, and time to try things out on our own. He was admirably both organized and flexible. We spent quite a bit of class time on the theory of network analysis, and some on the math behind the calculations that the various software packages perform, but this was with an eye toward knowing what was happening when we pushed buttons in the programs. We looked at many different types of data that can be usefully organized in network structures. This is perhaps an especially network-y moment in history–all the more reason to be aware of what we are doing and not put data into networks simply because we have Gephi and NodeXL to make spiffy diagrams.

Scott explained a power-law distribution by having us introduce ourselves to each other according to a rule that was iterated across everyone in the class. The first two people introduce themselves to each other. Each new person then joins the group and chooses someone at random, and that person introduces the newcomer to someone they’ve already met. We drew the resulting connections as a graph on the board, then went outside and played a complicated game of catch in which each person could only throw the ball to someone they’d met in the earlier exercise. After person #2 in the network had caught the ball 40 times, we went back inside to graph the results in NodeXL, recreating the calculations we had done by hand. Person #2 (who luckily was good at catching the ball) had far more connections than just about everyone else except person #1. Those of us who had been introduced to someone with few connections ended up with fewer connections ourselves.
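For anyone who wants to replay the introduction exercise without a classroom, here is a minimal Python sketch. It is my own reconstruction, not something from the class: the function name introduction_game is made up, and I am assuming that each newcomer ends up connected both to the randomly chosen person and to the acquaintance that person introduces them to.

```python
import random


def introduction_game(n_people, seed=None):
    """Simulate the classroom introduction rule (a hypothetical reconstruction).

    Returns a dict mapping each person (numbered from 1) to the set of
    people they have met.
    """
    rng = random.Random(seed)
    # The first two people introduce themselves to each other.
    met = {1: {2}, 2: {1}}
    for newcomer in range(3, n_people + 1):
        # The newcomer picks someone already in the group at random...
        chosen = rng.randint(1, newcomer - 1)
        # ...and that person introduces them to one of their acquaintances.
        introduced_to = rng.choice(sorted(met[chosen]))
        # Assumption: the newcomer now knows both of these people.
        met[newcomer] = {chosen, introduced_to}
        met[chosen].add(newcomer)
        met[introduced_to].add(newcomer)
    return met


if __name__ == "__main__":
    network = introduction_game(30, seed=1)
    # List people from most connected to least connected.
    ranked = sorted(network, key=lambda p: len(network[p]), reverse=True)
    for person in ranked[:5]:
        print(person, len(network[person]))
```

Because a well-connected person is more likely to be the acquaintance someone gets introduced to, early arrivals tend to pile up connections, which is the rich-get-richer pattern the exercise was meant to show. Exact numbers will vary with the random seed.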

Working in groups, we tried to create networks representing just six days in the Diary of Samuel Pepys, based on the original text. How to encode his various relationships and connections using nodes and edges? How to begin to keep track of all of the different people? Clearly, as Scott pointed out, the map is not the territory.

We read Franco Moretti’s Stanford Literary Lab pamphlet on character networks in Hamlet, and then set out to recreate his experiment, starting from scratch. Using the text of the play, we created a collaborative spreadsheet recording which characters speak to each other within given scenes. Interesting discussion followed about how much simplification is involved in saying that two characters do or don’t speak to each other, and about how even that determination is not necessarily clear-cut. (We also learned some valuable lessons about data-gathering in the process of creating the spreadsheet.) We then examined the graphs in NodeXL, removing Hamlet, Claudius, Horatio, etc. as Moretti did. We looked at the six principal characters within the court who have a clustering coefficient of 100%: they form a tight network in which every character speaks to every other character. Literary experiments like these can be treated as hypotheses about data, and they can be repeated. What do network analyses of humanities data bring to light, and what do they obscure or reduce?
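To make the clustering coefficient concrete, here is a small Python sketch of how it could be computed from a list of who-speaks-to-whom pairs like the ones in our spreadsheet. This is an illustration under my own assumptions, not NodeXL’s code or Moretti’s method: the function names are hypothetical, the characters are placeholder letters rather than anyone in Hamlet, and "speaks to" is treated as an undirected connection.

```python
from itertools import combinations


def build_graph(pairs):
    """Turn (speaker, listener) pairs into an undirected adjacency dict."""
    graph = {}
    for a, b in pairs:
        if a == b:
            continue  # ignore anyone recorded as speaking to themselves
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def remove_character(graph, name):
    """Return a copy of the graph with one character taken out."""
    return {n: nbrs - {name} for n, nbrs in graph.items() if n != name}


def local_clustering(graph, node):
    """Fraction of a node's neighbour pairs that are themselves connected."""
    neighbours = graph[node]
    k = len(neighbours)
    if k < 2:
        return 0.0  # fewer than two neighbours: no pairs to compare
    linked = sum(1 for u, v in combinations(neighbours, 2) if v in graph[u])
    return linked / (k * (k - 1) / 2)


# Placeholder data: A, B, C and D all speak to one another; E speaks only to D.
pairs = list(combinations("ABCD", 2)) + [("D", "E")]
graph = build_graph(pairs)
print(local_clustering(graph, "A"))  # every pair of A's neighbours is linked
print(local_clustering(graph, "D"))  # E is not linked to D's other neighbours
```

A coefficient of 1.0 (that is, 100%) means everyone a character speaks to also speaks to everyone else in that group. Calling remove_character before recomputing is a rough stand-in for the step of deleting a major character and seeing what the network looks like without them.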

Network graph of the Hamlet court

When I recounted some of the highlights of the DHSI class for colleagues in the DHLab back at Yale, they were particularly interested in the critiques of network analysis as interpretation–how do you know when a network is the right approach? What do you get from it that makes the labor of data entry and wrestling with Gephi worthwhile? As Scott put it, “Many types of information can be fit into networks; that doesn’t mean they should be.” Still, networks are clearly useful in looking at patterns of influence and the exchange of ideas, both of which play into any number of DH projects in history, literature, art, etc.