[Textual Visualization and Analysis: Poetry]

Determining analytical elements

As a poet, I'm interested in continuing to develop my work in new directions. Part of that development includes reviewing my existing work and looking at patterns in that work. By making that discovery and development process visible, I work to understand facets of my poetry that have been previously inaccessible to me.

I decided on three approaches to unraveling meaning in my work:

  • Manuscript scans
  • Word frequency analysis
  • Machine poetry

I developed a workplan, and made a note of which tasks were completed along the way.

Each individual approach had its own learning curve and workflow, which can be seen here.

+ Manuscript scans

The first way I wanted to re-imagine my poems was through their physical manuscript manifestations, and how that messy process could be re-represented in a useful way. To show the kind of palimpsest effect that manual revisions have, I scanned the original manuscript pages for one poem, cropped each scan to a single stanza, and combined the scans into an animated gif that presents them as visual artifacts as well as showing their textual changes.

Results: These manuscript scans are animated here.

+ Word frequency analysis

The next approach gives me a sense of the "invisible" patterns present in the poems. Traditional word frequency analysis can show me the themes that I return to (as expressed in my choice of language), and even the individual words give me a picture of the overall tone of my writing.

I chose thirty-eight works that are currently in a static form; in other words, none of them had been revised in the six months before their selection. These works were already in a digital format as Microsoft Word documents. I converted them to plain text files using a freeware conversion program, and then reviewed and edited them manually. Once they were in the correct format, I installed Cygwin, a collection of free software tools that allowed my Windows workstation to act like a UNIX box. This gave me access to more powerful file manipulation tools.

For example, Cygwin allowed me to use command line shell scripts to process these thirty-eight files efficiently and automatically, once I had renamed all of the electronic files with an ISO-compliant file naming scheme. I concatenated all of the poems together into one large file, and converted it into a list of every word that appears and its frequency. I also converted it into a list of the words themselves, and used that text file with a freeware program to create a tag cloud of my 300 most frequent words, providing a kind of visual gestalt of the frequencies.
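The counting step described above can be sketched in a few lines. This is only an illustration in Python, not the shell scripts actually used under Cygwin; the function names, the rule for what counts as a word, and the sample text are all assumptions made for the example.

```python
import re
from collections import Counter


def word_frequencies(text):
    """Count how often each word appears, ignoring case."""
    # Assumption: a "word" is a run of letters, optionally with
    # internal apostrophes (so "don't" stays one word).
    words = re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())
    return Counter(words)


def top_words(text, n=300):
    """Return the n most frequent words as (word, count) pairs."""
    return word_frequencies(text).most_common(n)


# Hypothetical sample standing in for the concatenated poem files.
sample = "The river keeps the stone. The stone keeps nothing."

for word, count in top_words(sample, n=3):
    print(f"{count}\t{word}")
```

With real files, the text passed in would be the contents of all the poems joined together, and the resulting list of words and counts is the kind of input a tag cloud tool works from.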

Results: Here is the resulting tag cloud, using a free markup tool from TagCrowd.

+ Machine poetry

Fluxus (an art movement whose purpose was to question not just the boundaries between creative disciplines like art, literature, and music, but also to question the very concepts of those disciplines) provides another opportunity for looking at my work from a new perspective. Fluxus was interested in decoupling the art from the artist, and redefining what could be considered a work of art. Because I want to get a sense of where my self or voice is resident in my work, I want to decouple my intent from the act of composition, but not from the initial creation.

I can effect that removal of intent by splitting up the 38 poem files I have selected, which are static or finished works, into one large file of individual lines, including their titles. This file can be seen here. These lines are displayed exactly as they appeared in the poems, and maintain their capitalization, punctuation, and indentation, just as if they had been cut as strips from a paper copy of these works.
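As a rough sketch of that cutting-into-strips step, the Python below splits each poem into its lines while leaving capitalization, punctuation, and leading indentation alone. It is not the process used for this project; the function names and sample poems are hypothetical, and dropping blank lines (stanza breaks) is an assumption of the sketch.

```python
def poem_to_lines(poem_text):
    """Split one poem into its lines, keeping indentation and punctuation."""
    lines = []
    for line in poem_text.splitlines():
        # Assumption: blank lines (stanza breaks) are dropped. Kept lines
        # are untouched apart from removing trailing whitespace.
        if line.strip():
            lines.append(line.rstrip())
    return lines


def build_line_pool(poems):
    """Combine the lines of every poem, titles included, into one list."""
    pool = []
    for poem in poems:
        pool.extend(poem_to_lines(poem))
    return pool


# Hypothetical poems; the first line of each stands in for its title.
poems = [
    "First Title\n\nA line, with a comma\n    an indented line\n",
    "Second Title\n\nAnother line.\n",
]
pool = build_line_pool(poems)
print("\n".join(pool))
```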

These lines are recombined through an automated and randomized process, and displayed dynamically on the browser page using scripts. This is not a strictly random process; the set of lines is small enough that randomization can show repeated lines or stanzas, not unlike some forms of poetry. My interest here lies in whether or not these machine poems will still contain enough of my voice that I would consider them part of my serious work, and not simply an experiment. While most machine poems operate either on an entirely random selection of individual words from a file or database or a kind of "fill in the blank with a part of speech" approach, keeping the integrity of the lines implies that some part of my voice/self/style will remain.
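The recombination can be illustrated with a minimal sketch. The Python below is not the Perl or JavaScript behind the actual Poetry Machine; the function names, the stanza sizes, and the sample lines are hypothetical. It draws whole lines at random with replacement, which is one simple way a small pool of lines ends up repeating.

```python
import random


def machine_poem(pool, stanzas=3, lines_per_stanza=4, rng=None):
    """Build a poem as a list of stanzas, each a list of whole lines."""
    if rng is None:
        rng = random.Random()
    poem = []
    for _ in range(stanzas):
        # choices() samples with replacement, so a line may recur
        # within a stanza or across stanzas.
        poem.append(rng.choices(pool, k=lines_per_stanza))
    return poem


def format_poem(poem):
    """Join lines with newlines and stanzas with a blank line."""
    return "\n\n".join("\n".join(stanza) for stanza in poem)


# Hypothetical lines standing in for the file of cut-up poem lines.
sample_pool = [
    "First Title",
    "A line, with a comma",
    "    an indented line",
    "Another line.",
]

print(format_poem(machine_poem(sample_pool, stanzas=2, lines_per_stanza=3)))
```

Because each line is taken whole, its original capitalization, punctuation, and indentation carry over into the generated poem.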

This has been the most complex part of the project, and has required HTML, a look at XML (especially the Text Encoding Initiative's standards [TEI sample]), XSLT, Perl, JavaScript, and a host of other technology resources.

Results: Here is the resulting exploration and the CGI-based poetry machine, which uses a modified Perl script.

Representative output of this Poetry Machine and another JavaScript version (in static, printable form) are found here.


In Summary

Transforming my poetry with these three approaches has been an informative journey. I have honed my coding in HTML, and re-learned how to read Perl scripts. I taught myself bits of CGI scripting and JavaScript, and how to transform .tifs into animated .gifs using Adobe ImageReady.

Not everything went swimmingly; I had to rework the text files multiple times before the processing did what I wanted. The same is true of the web pages that make up this project; many revisions have taken place, and many structures have been considered, tested, and replaced. At this stage I have yet to see how the transfer of the web materials to a CD will go, but I am hopeful that my preparatory work will make that part of the workflow go smoothly.

There are implications for the future here, certainly. Whole manuscript pages can be scaled and animated to show the process of their transformations visually. Tag clouds are simple to do, after an initial processing of texts is complete, and so could be made in multiples with different emphases. Even though only 18 out of the whole of the 107 digitized texts were tagged with TEI XML tags for verse, those fundamental principles could be applied in the future to tagging all nouns in the poems, and then loading those into a tag cloud. Finally, the Poetry Machines could be made more robust, with a larger corpus and a SQL database on the backend.

As for my art, the Machines are the most intriguing. The idea that intent is a prerequisite for serious poetry seems to have been blurred here. Writing the original lines in one context and then shearing them from that context and placing them in a more arbitrary one does seem to bend the original meanings. However, and this is the interesting part, it doesn't seem to break them entirely. This is a fruitful place to grow novel poems, both in planting the seeds for newly written works, and for stamping them out in a Machine.

A bibliography of digital and print works is located here.

All creative works, generated or otherwise, copyright © 2007-2012 - Ray Henry