Category Archives: DH

Culture Analytics @ UCLA Institute for Pure & Applied Mathematics

This spring, I was invited to speak at the Culture Analytics long program at UCLA’s Institute for Pure and Applied Mathematics, during the first workshop week, “Culture Analytics Beyond Text: Image, Music, Video, Interactivity, and Performance.” It was great to hear about so many projects from around the world and to have people who work in computer vision, data analysis, art history, and other fields in the same room, figuring out how to talk across disciplinary lines. Here’s the video of my talk on image analysis in the Vogue Archive.


Text and Data Mining Webinar for CRL


On July 29, Peter Leonard and I presented the webinar “Text and Data Mining in the Humanities and Social Sciences: Strategies and Tools” as part of the Center for Research Libraries’ ongoing series on text and data mining. We had over 175 virtual attendees from a variety of institutions and fielded a number of excellent questions, many too complicated to answer via live chat or in the short Q&A sections (CRL added some of the chat questions into the video). It was great to see such interest in the topic. The range of questions provided a good reminder that we are still in the early days of establishing standards and practices for TDM, especially related to copyright and fair use. The CRL website has more information on the webinar, including our slides.

Network Analysis at the Digital Humanities Summer Institute

I had the good fortune to be able to attend the Digital Humanities Summer Institute last month in Victoria, British Columbia. As a pleasant contrast to muggy New England summers, the Pacific Northwest greeted me with beautiful blue skies and moderate, breezy weather. UVic has totem poles on campus, and a deer pressed her nose to the auditorium window as I was giving my colloquium presentation. It all felt very idyllic and charming.

Inner Harbour, Victoria

The impetus for my DHSI trip was Scott Weingart’s class, “Data, Math, Visualization, and Interpretation of Networks: An Introduction.” Scott’s lectures included plenty of space for discussion, as well as scheduled breaks, exercises, and time to try things out on our own. He was both admirably organized and flexible. We spent quite a bit of class time on the theory of network analysis, and some on the math behind the calculations that the various software packages perform, always with an eye toward understanding what was happening when we pushed buttons in the programs. We looked at many different types of data that can be usefully organized in network structures. This is perhaps an especially network-y moment in history, which is all the more reason to be aware of what we are doing and not put data into networks simply because we have Gephi and NodeXL to make spiffy diagrams.

Scott explained power-law distributions by having us introduce ourselves to each other according to a rule that was iterated across everyone in the class. Two people began by introducing themselves to each other; each new person then came to the group and chose someone at random, and that person introduced the newcomer to someone they had already met. We drew the resulting connections as a graph on the board, then went outside and played a complicated game of catch in which each person could only throw the ball to someone they’d met in the earlier exercise. After person #2 in the network had caught the ball 40 times, we went back inside to graph the results in NodeXL, recreating the calculations we had done by hand. Person #2 (who luckily was good at catching the ball) had far more connections than just about everyone else except person #1. Those of us who had been introduced to someone with few connections tended to end up with fewer connections ourselves.
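The introduction rule and the game of catch can be simulated in a few lines of Python. This is a hypothetical reconstruction, not Scott’s class materials: it assumes each newcomer ends up knowing both the person they chose and the person they were introduced to, that person #1 makes the first throw, and that every throw goes to a uniformly random acquaintance; the class size of 30 and the `seed` values are arbitrary. Because a random acquaintance of a random person is more likely to be someone who already knows many people, well-connected people keep gaining connections. In the game of catch, each person’s share of catches tends over the long run toward their share of connections, since a random walk on a connected, undirected network visits each node in proportion to its number of links.

```python
import random
from collections import Counter


def introduction_network(n_people, seed=0):
    """Grow an acquaintance network with the classroom introduction rule.

    Assumption: each newcomer ends up knowing both the person they chose
    at random and the acquaintance that person introduces them to.
    """
    rng = random.Random(seed)
    # Persons 1 and 2 start by introducing themselves to each other.
    met = {1: {2}, 2: {1}}
    for newcomer in range(3, n_people + 1):
        host = rng.choice(sorted(met))          # someone chosen at random
        friend = rng.choice(sorted(met[host]))  # someone the host has already met
        met[newcomer] = {host, friend}
        met[host].add(newcomer)
        met[friend].add(newcomer)
    return met


def game_of_catch(met, target=2, catches_needed=40, seed=0):
    """Throw the ball only between acquaintances until `target` has caught it enough times."""
    rng = random.Random(seed)
    catches = Counter()
    holder = 1  # assumption: person 1 makes the first throw
    while catches[target] < catches_needed:
        holder = rng.choice(sorted(met[holder]))  # throw to a random acquaintance
        catches[holder] += 1
    return catches


if __name__ == "__main__":
    met = introduction_network(30, seed=1)
    catches = game_of_catch(met, seed=1)
    # Compare each person's number of connections with how often they caught the ball.
    for person in sorted(met, key=lambda p: -len(met[p]))[:5]:
        print(f"person #{person}: {len(met[person])} connections, {catches[person]} catches")
```

The exact numbers depend on the random seed and say nothing about what happened in the actual class; the point is only that the people at the top of the connections list also tend to be the ones catching the ball most often.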

Working in groups, we tried to create networks representing just six days in the Diary of Samuel Pepys, based on the original text. How to encode his various relationships and connections using nodes and edges? How to begin to keep track of all of the different people? Clearly, as Scott pointed out, the map is not the territory.

We read Franco Moretti’s Stanford Literary Lab pamphlet on character networks in Hamlet, and then set out to recreate his experiment, starting from scratch. Using the text of the play, we created a collaborative spreadsheet recording characters who speak to each other within given scenes. An interesting discussion followed about how much simplification is involved in saying that two characters either speak to each other or don’t, and yet even that determination is not always clear-cut. (We also learned some valuable lessons about data-gathering in the process of creating the spreadsheet.) We then examined the graphs in NodeXL, removing Hamlet, Claudius, Horatio, etc. as Moretti did. We looked at the six principal characters within the court, who have a 100% clustering coefficient: they form a tight network in which every character speaks to every other character. Literary experiments like these can be treated as hypotheses about data, and they can be repeated. What do network analyses of humanities data bring to light, and what do they obscure or reduce?
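The spreadsheet-to-network step and the clustering coefficient can also be sketched in plain Python, as a rough stand-in for what NodeXL computes. A character’s local clustering coefficient is 2e / (k(k − 1)), where k is the number of characters they speak with and e is the number of pairs among those characters who also speak to each other; a value of 1 (the “100%” above) means all of a character’s interlocutors talk to one another. The function names, the (scene, speaker, addressee) row layout, and the character names “Hub”, “A”, “B”, “C”, and “D” are hypothetical placeholders, not data from Hamlet or from Moretti’s pamphlet.

```python
from itertools import combinations


def speaking_network(rows):
    """Build an undirected who-speaks-to-whom network from spreadsheet rows.

    Each row is (scene, speaker, addressee); the scene label is only kept
    so that rows can be traced back to the text.
    """
    graph = {}
    for _scene, speaker, addressee in rows:
        if speaker == addressee:
            continue  # skip self-loops, e.g. data-entry slips
        graph.setdefault(speaker, set()).add(addressee)
        graph.setdefault(addressee, set()).add(speaker)
    return graph


def without(graph, removed):
    """Copy the network with the given characters and all their edges removed."""
    return {c: nbrs - removed for c, nbrs in graph.items() if c not in removed}


def clustering(graph, node):
    """Local clustering coefficient: C = 2e / (k(k - 1)).

    k is the number of characters `node` speaks with, and e is the number
    of pairs among them who also speak to each other.
    """
    nbrs = graph[node]
    k = len(nbrs)
    if k < 2:
        return 0.0
    e = sum(1 for a, b in combinations(nbrs, 2) if b in graph[a])
    return 2 * e / (k * (k - 1))


if __name__ == "__main__":
    # Made-up placeholder rows, not data from Hamlet.
    rows = [
        ("1.1", "Hub", "A"), ("1.1", "A", "B"), ("1.2", "B", "C"),
        ("1.2", "C", "A"), ("2.1", "Hub", "B"), ("2.1", "Hub", "C"),
        ("2.2", "Hub", "D"),
    ]
    graph = speaking_network(rows)
    print({c: clustering(graph, c) for c in sorted(graph)})
    # Remove the central character, as Moretti removed Hamlet.
    pruned = without(graph, {"Hub"})
    print({c: clustering(pruned, c) for c in sorted(pruned)})
```

With these placeholder rows, A, B, and C each score 1.0 and Hub scores 0.5 (only three of the six pairs among its four partners speak to each other), while D scores 0.0 because it speaks to Hub alone and, once Hub is removed, is left isolated.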


When I recounted some of the highlights of the DHSI class for colleagues in the DHLab back at Yale, they were particularly interested in the critiques of network analysis as interpretation: how do you know when a network is the right approach? What do you get from it that makes the labor of data entry and wrestling with Gephi worthwhile? As Scott put it, “Many types of information can be fit into networks; that doesn’t mean they should be.” Still, networks are clearly useful in looking at patterns of influence and the exchange of ideas, both of which play into any number of DH projects in history, literature, art, etc.