When is a webtext?

Mats Dahlström

Lecturer and PhD student, Dept of Library and Information Science, University College of Borås and Gothenburg University.

E-mail: mad@adm.hb.se ; web site: http://www.adm.hb.se/personal/mad/index.htm


This article was first printed in:
Text Technology : The Journal of Computer Text Processing. Vol 11 (2002) : 1, pp. 139-161.


Abstract

There are textual qualities, vaguely discernible in print, supported by new media, and remarkably enhanced by the Web, that suggest the need for further distinctions of textuality. Such qualities include the split between document storage and presentation devices, binarity, dynamics, distribution logistics, kinetics, versatile markup, document layering, and hypertextuality. Any textuality is constrained by the semiotic systems, where production and consumption of text take place. In print culture, text is normally defined as sequence or two-dimensional hierarchy. Digital textuality and its subclass, Webtextuality, add a third dimension: spatial depth. Perhaps even a fourth: time.

The theme of this issue is webtextuality. What, then, is 'webtext'? I suppose you can explain webtext quite simply by saying that it is text, that it is digital, and that it is distributed on the web. That's all very nice, but what does it mean? What do the words 'text' and 'digital' in 'digital text' suggest, and what are the specifics of such text when available on the web, as opposed to those of printed text? A discussion of webtextuality might fruitfully start with some tentative conceptual considerations: what, if anything, makes text on the web particular? The fact that it is digital? What does that imply, anyway? What distinguishes webtext from other digital texts? And what do we talk about when we talk about text? Do printed text, digital text, and ultimately, webtext at all share a feature?

Without being so naive as to neglect the contextual constraints of the text concept and aim for some stable generic definition, I'd like to discuss a print textuality perspective and see to what degree it is challenged by digital media in general and by the web in particular. The purpose of this article is in other words not to put forth new terminological distinctions and definitions, a pursuit not at all suited to the present extent and scope, but rather to exemplify the kind of theoretical problems at hand. Hopefully this will manifest the tension in the text concept and point to a few particulars of digital media and the web, for which traditional text definitions are in need of further distinctions. The various problems the text concept tends to run into are in effect the results of fundamental changes in the semiotic systems in which production and consumption of text take place. In particular, I want to stress that even the plainest, most orthodox web text by itself offers considerable challenges to traditional text concepts rooted in print culture, and causes them to fall short.

Let us keep in mind a very simple example, to which I will return later. Suppose two students want to electronically store the same article published in a web-only magazine. One student (A) chooses to save the HTML text file (i.e. including the markup tags) for future reading in a web browser, while the other (student B) selects all the characters presented by the web browser on the screen, copies them, and pastes them into a word processor document. There would probably be some agreement that the students have stored the same work. But is it also fair to say that the two students have stored the same text? This boils down to the question of whether the markup level, specifically its HTML tags, is to be regarded as extrinsic or intrinsic to the work's text. If extrinsic, the markup level of any text will have to be regarded as "outside" the text itself, i.e. as programming script, paratext or paralinguistic elements. If intrinsic, which of the two students has stored the text, if in fact there is one?
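The difference between what the two students keep can be made concrete in a small sketch. The Python fragment below is an illustration under stated assumptions, not a description of any actual browser: the sample markup, the class name VisibleText and the function displayed_text are invented here, and the standard-library html.parser module stands in, very roughly, for the selecting and copying performed by student B.

```python
from html.parser import HTMLParser

# Hypothetical article source, invented for illustration only.
ARTICLE_HTML = (
    "<html><head><title>When is a webtext?</title>"
    '<meta name="keywords" content="text, web"></head>'
    "<body><h1>When is a webtext?</h1>"
    '<p>Text is a <a href="notes.html">web</a> of signs.</p>'
    "</body></html>"
)


class VisibleText(HTMLParser):
    """Collects character data outside <head>: a crude stand-in for copy and paste."""

    def __init__(self):
        super().__init__()
        self.in_head = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "head":
            self.in_head = True

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False

    def handle_data(self, data):
        # Keep only non-empty character data that a reader would see in the body.
        if not self.in_head and data.strip():
            self.chunks.append(data.strip())


def displayed_text(markup):
    """Return the body's character data joined by single spaces."""
    parser = VisibleText()
    parser.feed(markup)
    parser.close()
    return " ".join(parser.chunks)


student_a = ARTICLE_HTML                  # stores the marked-up file
student_b = displayed_text(ARTICLE_HTML)  # stores only the displayed characters
```

Under these assumptions the two stored objects are different character sequences: student A's copy retains the link target and the keywords, whereas student B's copy retains neither.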

From a bibliographical point of view, this might present a tricky discussion. The act of copying an electronically distributed article can certainly be performed in ways so as to lose important (meta)information contained in the markup tags. How much loss is acceptable for us in order to be able to regard the two versions as the same text? At the end of the day, this all comes down to a very basic concept of text. Is there any way to discover the essential qualities of textual objects that justifiably distinguish them from the members of all other classes of objects?

1. Textonomy

The scope of the text concept has extended significantly during the last decades, resulting in considerable ambiguity, at times at the cost of manageability. There certainly are a number of competing text perspectives to choose from: text as semiotic systems; as material manifestations in, for example, paper form; as social processes [1]; as Wittgensteinian language games; as oral communication; as media type or document type; and even as a general structure (texture) whatsoever. Text, apparently, is being defined specifically for each separate scientific discipline [2], discourse and use. Two major problems appear. Firstly, the predominant definitions and usages of the text concept, deeply rooted in certain historical media conditions, seem to face problems when applied to digitally produced and distributed objects. Secondly, the text definitions of the areas mentioned seem difficult to reconcile. Even within each separate discipline, there are often quite irreconcilable differences in opinion as to what really constitutes a text. It is especially troublesome to identify a concordant working definition that might prove to be adequate for: a) different types of supporting media - paper as well as electronic environments; b) different presentational forms; c) other medial modes besides verbal [3]; d) an analytical basis of late prefixations of text, for instance hypertext.

Already the scholarly area occupied first and foremost with verbal expressions has been split in two: on the one hand modern literary theory and criticism, where text has been the object of metaphorisation and abstraction to the degree that it increasingly denotes signified meaning rather than signifying expressions. On the other hand, and in the background, as it were, fields such as textual criticism and analytical bibliography have assiduously maintained a much more concrete notion of the concept of text. To the latter, document texts themselves [4] are at the focus as signifying records, rather than as the subsequent signified meanings and literary values.

Granted, there is much hyperfiction and experimental e-writing on the web, and scholarly discourse on digital textuality, including hypertextuality, deals with the matter mostly from the point of view of literary theory. Nevertheless, to be able to understand and unambiguously talk about current digital practices and digitally produced textual objects, one needs terms and concepts that precisely try to treat text as an extrinsic expressional entity, at the level of signifiers, rather than as an intrinsic entity at the level of signifieds (e.g. as the implied meanings of the works on the web).

Espen Aarseth has formulated this distinction as one between textonomy and textology, the former dealing with the functions and roles of verbal media themselves, whereas the latter is a "subsequent" matter of e.g. hermeneutics, implied meaning, and intertextuality, i.e. of "semantics, influence, otherness, mental events, intentionality, and so forth" [5].

Textology, such as modern and in particular postmodern literary theory, often exhibits an inability to include in its text concept the material properties of the message carrier, and to consider its importance as a tool in the process of meaning making. A plausible reason for this deficiency is the fact that the material aspects of text and the physical properties of documents, as a result of the historical total dominion of paper media, have been practically transparent and, consequently, considered as unambiguous and therefore trivial, if considered at all [6]. In many cases, however, the very form, structure, and nature of the medium have to be included in a fundamental understanding of the notion of text, in order to be able to cover recent media forms as well as older ones.

Textonomy scholars such as bibliographers or textual critics, on the other hand, show considerable interest in the physical aspects of verbally encoded documents and their appearances, and specifically what the textual and physical evidence tells us about a document's production history. Materiality as a necessary basis is a given in the framework of the diverse subsections of bibliography (analytical, historical, and textual), and might be a path to mutual understanding between scholarly disciplines [7].

Permit us, then, to approach webtextuality at this textonomically literal rather than textologically literary level. Such a pursuit is to a not insignificant degree a matter of defining and separating from one another the objects of metaphors such as document, text, object, and work. Of utmost importance are demarcations: what does a text consist of, when considered textonomically? Probably due to the transparency suggested above, there are remarkably few attempts at defining the concept even within bibliography and textual criticism, and far fewer agreed-upon definitions. In recent bibliographic literature, text even seems to be vanishing from the scene, perhaps a result of the increasing awkwardness of the concept.

2. Text as sequence

We have characterised textonomy as occupied with the "extrinsics" of text. How, then, is extrinsic text to be described? There is, to be sure, an etymological nucleus of the lexical term signifying 'weave' or 'web' [8]. But what is woven together to form this web, and how is this weaving performed?

Although the result of weaving is a web (of signs), the weaving process itself might be described as linear [9], the textual result of which is often regarded as linear, or sequential. Certainly the inherent structure of textual media so far, such as the papyrus roll or the codex book with its series of subsequent sheets bound together to be consumed one after another, supports this sequential view of text. Bibliographers commonly treat text at its immediate sign surface and emphasise this horizontal quality: text is a linear something, a serial something, or more often: a sequential something, i.e. a sequence of signs [10]. Sequentiality is, in other words, a quality inherent in texts and a property distinguishing texts from other classes of objects.

Consequently, bibliographers and textual critics have presented a number of minute distinctions of text, all of them with the common feature of sequentiality. Notable for its elaboration, but not an isolated example, is the one performed by Peter Shillingsburg [11]. Simply put, text as a general concept is to Shillingsburg "the sequence of words and pauses recorded in a document" [12]. He then proceeds to clarify subdistinctions of textual phenomena, but important here is his emphasis on the very sequences of words and pauses. Shillingsburg separates the (ideal) text from the (real) linguistic signs representing the text: "it is possible for the same text to be stored in a set of alphabetical signs, a set of braille signs, a set of electronic signals on a computer tape, and a set of magnetic impulses on a tape recorder. Therefore, it is not accurate to say that the text and the signs or storage medium are the same. If the text is stored accurately on a second storage medium, the text remains the same though the signs for it are different. Each accurate copy contains the same text; inaccurate or variant copies contain new texts" [13]. Differences in opinion as to what might be implied by the expressions "same text" and "stored accurately" are to be expected. Some might object that an "accurate storage" has to observe, in the terms of Jerome McGann [14], "bibliographic codes" (such as typography, nature and quality of the storage medium, or colouring) in order to be able to discuss the "same" text, instead of merely relying on "linguistic signs". This distinction between signified text and its varying signifying signs is an important one. Returning to our initial student example, perhaps one might claim that the two students have stored the same text, using different sets of signs. But then - how do they differ?

In the quest for bibliographic tools to further distinguish between varying states of a text, perhaps we might find some guidance in the work of the bibliographer Rolf Du Rietz and his attempts at defining text as "the sequence in a sequential work" [15]. He distinguishes interconnected subterms as well: the abstract, always immaterially defined work is (equally immaterially) manifested in an abstract ideal text. This ideal text is realised in actualised, so-called natural texts, which we in turn are able to study through their materialisation in material texts, carried by documents [16]. The ideal text is thus the sequence of the sequential (ideal) work, e.g. in the mind of the author, or the intended text in the many copies of a single impression, whereas the natural text is the sequence as realised in actual, material printings or screenic presentations. Material text, then, is the term attributed to the very physical matter forming the graphical images we acknowledge as text in our daily toil: the ink on a piece of paper, or the phosphor in the screen presentation of a computer text. Obviously, in order to be functional at all, this model has to make a further distinction, namely of what is implied by 'sequential work'. Du Rietz's definition is based upon the aesthetics of Lessing, who makes a distinction Du Rietz wants to maintain, between on the one hand stationary works (e.g. sculpture, painting), distinguished by synchronicity, and on the other hand sequential works (e.g. music, literature), distinguished by diachronicity or sequentiality in time and space. Sequential works are those that can be manifested by a text, and they all share being abstract sequences, temporal rather than materially palpable objects.

Du Rietz explicitly intends for his definition to include various art forms, e.g. music, film and choreography, and this might make it apt for digital multi- and hypermedia. But it is also precisely within the realm of digital media that this sequential definition of text begins to face serious problems. In order to bring out some of these problems, let us consider some prominent features that are either enhanced by or completely new to digital media. This will help us understand in what way these features might affect fundamental conditions of textuality. Put in other words: if sequentiality is a feature distinguishing text in general, are there any qualities by which its subclass digital text may be characterised?

3. Qualities of digital texts

3.1 Immateriality

Firstly, digital documents and their texts are immaterial and therefore logically defined, rather than material and therefore physically defined [17].

Works manifested by digital documents cannot, as in print culture, be constituted or defined by the more or less accurate alphanumeric notation of their texts, but rather - and in fact only - by the pattern of signals and tensions at the binary level of the material carrier. The pattern is, with each displaying of the document, hopefully filtered by e.g. software tools so as to render the intended textual strings of the work. This secondary, more or less "lucky strike", must however not form the basis of a defining digital document constitution [18]. Such a task is instead more adequately and reliably performed by the logical pattern.

As a consequence of this immaterial quality, digital texts are no longer absolutely fixed to their carriers, and are thus transportable between carriers, machines, environments and file formats. This also means that the digital document is free from volume constraints in its storage phase: its storage and display are phenomenologically separate. Hence the increasing tendency [19] to terminologically distinguish between text-as-stored and text-as-displayed and treat them as different, a practice quite new to us, cast in the firm mould of print culture.

Digital text is thus ghostly immaterial where printed media are comfortably tactile. The obvious correlation between the length of a printed text and the spatial extension of its artefactual carrier (i.e., the book) simply ceases in the digital realm.

3.2 Binarity

Secondly, all digitally represented art and communication forms are based on the same binary sequences (as a lingua franca) bundled in files [20]. This facilitates [21] media integrated storage and presentation of works (as multi- or rather hypermedia), and radically improves e.g. image processing. This is of course further increased by the steadily developing capacity of computer hardware and cables. The synchronised simulation of different art, communication and media forms might simultaneously blur the traditional boundaries between them.

Another aspect of this: texts in the digital realm are stored and read by the aid of different artifacts and tools, e.g. discs and computer files for storing text, screens for reading it. Print culture on the other hand simultaneously stores and displays text with the use of one and the same artifact, usually a book. So it is that we read digital text indirectly, through a number of layers, filters, and software tools, whereas printed text is accessed in a more immediate way [22].

3.3 Dynamics

Thirdly, digital texts are dynamic. The manifestation forms of document texts are variable and malleable with each user. This might be described by the somewhat abused [23] term interactivity. Depending on certain user input, the output of a document will differ.

In relation to their immaterial quality, digital texts are amenable to easy migration from one carrier to another. This suggests instability and perhaps a loss of authority where printed texts might seem to promise unchanging stability and consequential authority. We must however recognize just how relative this stability concept is. All material text carriers crumble away with time, be it stone, wood, textiles, paper, woodpulp, plastic, or silicon.

Immateriality, binarity, and dynamics are three examples of digital text particulars, suggesting a second dimension to the traditional text concept and perhaps a more fluid kind of text than has been the case in print culture. But there are some aspects of digital texts that are much more challenging to the concept of text. These are the ones that are enhanced especially by a subclass of digital texts, namely those distributed on the web, or webtexts.

4. Qualities of webtexts

The web exhibits a document and text environment of its own. For instance, as part of the Internet it fundamentally alters the logistics of document distribution. Rather than print culture's pre-manufacturing of a fixed number of identical copies distributed by producers to would-be consumers, the web supports the manufacturing of only one set of files made available on a particular hard disk, to which the consumers teleconnect in order to copy the documents by themselves. The responsibility of copying and distributing documents has thus largely shifted from producer to consumer, as has the system of producers, publishers and gatekeepers, affecting who gets to publish what to whom, when, how and in what form. This might have considerable consequences for textological matters such as referentiality, authority and quality, but maintaining the textonomical focus of this article, let us see if and how textonomics, i.e. the very textual expression of documents, are affected by the properties of the web. I would say that there are aspects of the web that add a third, spatial dimensionality to the concept of text. Perhaps even a fourth: time. Let us consider some examples of such webtext aspects, at the same time keeping in mind that these are no absolute polarities, but aspects vaguely discernible in print textuality, then supported by digital textuality, and finally remarkably enhanced by web textuality.

4.1 Kinetics

Firstly, and in analogy to the dynamics of digital texts in general, webtexts are seemingly kinetic. Seemingly, because this kinetic quality is primarily the cognitive result of speed: that of the oscillation between manifestations (counted in milliseconds), as well as that with which webtexts are amenable to emendation, editing, refreshing, and instant republishing.

4.2 Markup. Text as hierarchy

Secondly, markup techniques, especially the general markup languages, make way for a split between form and content, that is a separation of the contents of a text from the way this content is to be visually displayed to a reader.
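The split between content and display can be sketched in a few lines of Python. This is a minimal illustration assuming an invented content structure (CONTENT) and an invented function (render); it does not reproduce the mechanism of any particular markup language or style sheet, but only shows one stored content yielding two different displays.

```python
import textwrap

# Hypothetical content object: structure and wording, but no appearance.
CONTENT = [
    ("heading", "Text as sequence"),
    ("paragraph",
     "Text is commonly described as a sequence of signs recorded in a document."),
]


def render(content, width, heading_style):
    """Compose one display of the content for a given line width and heading style."""
    blocks = []
    for kind, words in content:
        if kind == "heading":
            blocks.append(heading_style(words))
        else:
            blocks.append(textwrap.fill(words, width=width))
    return "\n\n".join(blocks)


# Two displays derived from the same content, differing only in form.
wide = render(CONTENT, 72, str.upper)
narrow = render(CONTENT, 30, lambda words: "* " + words + " *")
```

The two displays differ in line breaks and in the appearance of the heading, while the words of the paragraph and their order remain those of the stored content.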

Vast amounts of malleable and searchable metainformation can be attached to the text in separate markup layers. Due to the growing number of sophisticated markup languages designed for specific document type groups, the level and sheer amount of potential metainformation is increasing. Also, depending on the markup technique used, a particular text can be matched to several different layers exhibiting different levels of markup, depending on the user's needs and interests.

Let us develop this aspect somewhat. The textonomical concept of printed text was described as horizontal and sequential. Encoding, or markup, of a text can include the uncovering of the suggested structures of text (but at a still extrinsic, textonomical level!), entailing the conception of text as vertical, i.e. a web of hierarchical relations [24], thus enhancing a spatiality of the text concept.

Some researchers (such as Allen Renear and Steven DeRose) have tried to define text as an ordered hierarchy of content objects, or OHCO for short. These definitional attempts are still being developed and elaborated [25]. Briefly, the OHCO perspective claims that the essence of texts is that they consist of discrete and discernible elements, which can be explicitly described and declared. Some elements occur in several or even all texts: titles, headings, paragraphs, lists, foot- or endnotes etc. These elements relate hierarchically to one another, and thus a text typically has an essence of hierarchical structure that hopefully can be identified and declared, e.g. in a markup language system. The initial aim of the developers of this definition was fairly pragmatic, namely to compose appropriate strategies for the creation and development of a universal markup scheme for all types of texts and works, on the basis of a commonly accepted definition of text. In their approach, they took a back door entrance, so to speak, by relying upon the contemporary proposed SGML standard, the philosophy of which thereby became a fundamental feature in their description of the essence of text. Huitfeldt [26] even labels SGML as a "Procrustes' bed" for textual theories [27].
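The contrast between text as sequence and text as an ordered hierarchy of content objects may be illustrated with a short Python sketch. The element names and the sample document are invented for the purpose, the functions outline and sequence are hypothetical helpers, and the standard-library xml.etree.ElementTree module is used merely as a convenient parser; nothing here is taken from the OHCO literature itself.

```python
import xml.etree.ElementTree as ET

# Hypothetical marked-up text; the element names are invented for illustration.
SOURCE = (
    "<article>"
    "<title>When is a webtext?</title>"
    "<section><heading>Textonomy</heading>"
    "<paragraph>Text as expression.</paragraph>"
    "<paragraph>Text as meaning.</paragraph></section>"
    "</article>"
)


def outline(element, depth=0):
    """Return (depth, element name) pairs in document order: the hierarchy."""
    rows = [(depth, element.tag)]
    for child in element:
        rows.extend(outline(child, depth + 1))
    return rows


def sequence(element):
    """Return the character data in document order: the flattened sequence."""
    return [chunk for chunk in element.itertext() if chunk.strip()]


root = ET.fromstring(SOURCE)
hierarchy = outline(root)   # the "vertical" view of the text
flattened = sequence(root)  # the "horizontal" view of the same text
```

The same marked-up string thus supports both readings: outline exposes the nesting of declared elements, while sequence discards it and retains only the order of the words.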

What is evident and of utmost importance to remember is the fact that this particular view of text as discrete hierarchical structures of identifiable objects permeates every construction of universally intended markup grammars such as e.g. SGML, and thereby its derivative languages on the web as well, XML and HTML. Granted, markup is not at all particular to web texts, but a feature common to any word-processed text (or indeed to any electronic text). The growth and striking development of the web, however, along with its popularisation of HTML and XML, has certainly brought this markup layering into the light. The widespread use of markup schemes belonging to this large family - along with their demands for well-formed documents and validated markup - increasingly affects current webtextual practice, thereby justifying itself and its definition, in a roundabout way.

Overall, it is striking to study different types of markup schemes (as well as word processing tools) as manifest text conception statements. There is much room for scientific investigations into the inherent nature of different markup schemes, or in other words: their architecture [28]. Michael Sperberg-McQueen has on several occasions [29] touched upon this idea. A markup of a text is, he says, a theory of this text, and a general markup language is a general theory or conception of text [30]. Still, an approach where you expect to be able to extract from general markup languages a general grammar for texts involves considerable difficulties. Firstly, you can't possibly list all elements, extensively and exclusively, in a markup scheme. Rather, it is a question of constructing particular definitions, DTDs, for particular types of works - definitions where some textual traits are included at the expense of others. Secondly, there are features in traditionally paper bound works and their textual representations that we have problems in unambiguously representing digitally, i.e. consciously and explicitly specifying in markup tags, not least because we have grown so accustomed to them that they have become more or less transparent. Thirdly, new (markup) technology offers seamless integration of texts and other communication modes - sound, moving pictures - in a way previously thought to be impossible.

4.3 Fragmentisation

Thirdly, and in relation to the markup separation of form and content, the web is to an increasing degree characterised by fragmentisation of text into discrete textual layers, co-operating in forming temporary, virtually composed displays of what we conceive as documents and texts. This has been the case all along with e.g. Internet's packet distribution of files, and currently with style sheets and XML. An expression of this is the splitting of documents into several segments, each with its own function [31].

This fragmentisation implies a decisive moment in the history of documents, and current development in web textuality might be labelled as deconstruction. Unlike the textological, postmodern deconstruction of the inherent meanings of abstract works, however, it is a textonomical, concrete deconstruction of the textual expression of physically displayed documents.

Even in the case of a simple web page, we are faced with a divorce of the unified document into at least two [32] or three textual layers [33]. Each of these is editable at a minimal level: the binary layer of ones and zeros, the syntactic layer of marked up text along with its markup tags, and finally the presentational layer of displayed text at the temporary screen or at a fixed, laser printed page. Which of these layers, if any, is the primary? One is tempted to answer the second one - that of the syntax as manifested in e.g. markup. Any subsequent layer of display is derivative to the syntax and is always conditioned by the customizable particulars of the filtering tool (e.g. a web browser) used to compose the displayed text.

So perhaps the essential text, along with e.g. its governing markup tags, is to be found in the syntax? This would point to student A having the text, if ever there were only one, and assuming we're talking about a digital-only document. It seems as if we're dealing with potentially kinetic text [34], where display is malleable, and syntax stable. In such a case, the identification of a text ought to be that of the syntactic layer. From this syntax a number of varying, customised displays can be temporarily produced. As well, a derivative can be preserved, e.g. as a printout, being a frozen snapshot of the way a (potentially) kinetic text happened to look at a certain moment in time. The syntactic, marked-up layer resembles Shillingsburg's text and Du Rietz's ideal text as the hub around which the varying manifestation forms rotate.
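The three layers of a simple web page can be laid side by side in a short Python sketch. The sample string is invented, UTF-8 is assumed as the storage encoding, and the removal of tags with a regular expression is a deliberate simplification standing in for the far more elaborate work of a real filtering tool such as a web browser.

```python
import re

# Syntactic layer: the marked-up character string (sample invented here).
syntactic = "<p>Text is a <em>web</em> of signs.</p>"

# Binary layer: the same string as stored, assuming UTF-8 encoding.
stored_bytes = syntactic.encode("utf-8")
binary = " ".join(f"{byte:08b}" for byte in stored_bytes)

# Presentational layer: what a filtering tool might display. Stripping tags
# with a regular expression is a simplification, not a real renderer.
presentational = re.sub(r"<[^>]+>", "", syntactic)
```

In this sketch the binary and the presentational layers are both derived from the syntactic string, which is one way of picturing the syntactic layer as the hub of the document.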

4.4 Hypertextuality

Fourthly, and in close connection to fragmentisation, the web enhances hypertextuality as a structuring principle. While this is a feature not at all alien to printed media, it is certainly much enhanced by digital media, and is indeed a crucial feature of the web, since hypertextuality is in fact the very central nerve system of the WWW protocol, HTTP. The concept of hypertext seems to return to the lexical roots of the text concept as web rather than as a sequential chain of signifiers. Perhaps it is adequate to explain hypertextuality as a quality in different kinds of works, whether paper-based or electronic. This quality entails deviations from the horizontally linear sequence, where links have been established in the flow of the text. The link leads to an associated passage in another text, or in an appended text part, or within the same text, in which the link was established. Thus defined, this very article is hypertextual through its endnotes (textual sequences supplementing the body text) and note numbers ("hyperlink" markers).

Hypertextuality is a quality specific to document architecture, enabling the multisequential [35] structuring and reading of text through various document layers, elements or fragments [36], interconnected by the wormholes of links or footnote markers, and at times possible to reshuffle. This layer linking suggests a three-dimensional spatiality of webtext. The hypertextual structuring of fragments can take on different forms, in varying degrees of complexity and linearity, ranging from strictly locked sequence [37], via axial to hierarchical structures, and finally to completely retinal web forms, where every single fragment is linked to all the others.

The conceptual confusion surrounding the term hypertext is, to a significant extent, a result of the diverging definitions and uses of the text concept itself. If a discipline uses the term text to denote the signified, immaterial work created by an author, it consequently understands by hypertext the inter- and intratextual relations on the level of implied meaning. The equating of hypertextuality with the literary concept of intertextuality then makes a case. Where on the other hand text is used to denote signifiers, i.e. the manifested expression, hypertext as well will be a phenomenon at an expressional level of a document. Criteria such as sequentiality then become central. Where text implies an entity in reception theory and primarily describes the reader's interpretation and (re)creation of the graphic signs, then the concept of hypertext is one of linearity (the way the reader traverses the work). If, finally, text is defined from a narratological point of view, hypertext will foremost be a matter of story structure and plots.

Sequentiality is the ordering of the medially structured textual fragments, and is the fruit of the authorially intended linearity mated with the media conditions (i.e. for production, storage, and distribution) of the document. Due to the statics of the medium, the printed book is normally [38] fixedly monosequential, whereas digital media make possible dynamic sequentiality, allowing authors and readers to affect e.g. web site sequentiality. Text manifestations are always conditioned by their specific media settings, by the machines that mediate them.

Kinetics, markup layering, fragmentisation, and hypertextuality are four examples of webtext particulars, suggesting a third, spatial dimension to the concepts of print and digital text. Let us continue to consider whether these qualities challenge the concept of text, perhaps to the degree that we are hard put distinguishing the subclass of webtexts from the superclasses of digital text and text.

5. Webtext challenges

5.1 Multisequential text

The essence in textonomical text concepts is the proposal that works and their texts are sequential. If by this is implied one single, long sequence, this definition will not cover multisequential hypertext works [39], distributed through web sites or stored on compact discs. At every node, there is a larger or smaller number of links to other nodes, and depending on the path chosen by the user, he or she studies one particular sequence of the work, whereas the other, potential sequences consequently are left dormant. In highly complex hyperworks it is not possible for any one reader to acquaint him- or herself with all fragments of the work. The natural text the user finally will have read as he or she reaches some kind of closure or stops reading, might differ considerably from the one any other reader will prove to have read. In other words textual sequentiality no longer needs to be fixed in the text. It can quite contrarily be thought of as fluid, as a potentially kinetic quality, the temporary expression of which can change with different manifestations, through the particular paths taken and the input performed by each respective user.
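How one stored work can yield several natural texts may be sketched as follows. The small hyperwork below is invented, its node names are arbitrary, and the function reading_paths is a hypothetical helper that assumes a link structure without cycles; it simply enumerates every sequence a reader could traverse from the first node to a point of closure.

```python
# Hypothetical hyperwork: each node lists the nodes its links lead to.
# The structure is assumed to be free of cycles.
LINKS = {
    "start": ["a", "b"],
    "a": ["end"],
    "b": ["a", "end"],
    "end": [],
}


def reading_paths(links, node, closure):
    """Return every link path from node to closure, as lists of node names."""
    if node == closure:
        return [[node]]
    paths = []
    for target in links[node]:
        for rest in reading_paths(links, target, closure):
            paths.append([node] + rest)
    return paths


paths = reading_paths(LINKS, "start", "end")
```

Even this minimal work of four fragments offers three distinct sequences, and a reader who follows one of them leaves the others dormant.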

But we need not even address complex hyperworks for the monosequential text definitions to run into problems. The student example at the beginning of this article is quite enough. Lurking behind the sequentialist text conceptions is the tacit assumption that a text is identified by the accurate rendering and exact positioning of its notation signs (in the case of verbal texts: alphanumerical characters, along with word spacing and punctuation). But which sequence of alphanumerical characters? In the case of the two students, is it the characters manifested in the syntax layer or those in the displayed text layer? Further: when the temporary screen display is the result of the processing of several discrete, customizable segments, and when the perhaps decisive textual manifestation, that of the syntactically marked up text, actually means bringing the alphanumerical signs together with metainformation concerning the same signs into one and the same layer, what happens to the notion of fixed sequences of alphanumerical signs? What, then, is to be regarded as Jerome McGann's bibliographic codes in this respect?

5.2 Multilayered text

We stated earlier an important feature of word-processed and encoded digital texts: the tripartite layering. In other words, there is suddenly an explicit segmentation of natural text, a segmentation where every layer is in principle malleable and editable [40]. The natural text is in such cases represented by three different sign sequences and by, as it were, two different semiotic (notation) systems: binary and alphanumerical notation. The case of the students causes us to ask: are the tags of HTML to be regarded as part of the natural text of a work? We must not make the mistake of dismissing this as a simple matter of textual variants. The syntactic layer might include quite significant features that are not included in, and are therefore invisible at, the presentational layer, e.g. hyperlinks, transclusion tags [41], and elaborate metainformation such as comments or keywords. There might in other words exist significant semantic and linguistic differences between syntax and display, at a qualitative rather than quantitative level.
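The difference between the syntactic and the presentational layer can be illustrated with a small sketch. It uses the HTMLParser class from Python's standard library; the HTML fragment, the URL, and the names LayerSplitter and split_layers are assumptions made for illustration only, and real browsers of course do far more than this when producing a display.

```python
from html.parser import HTMLParser
from typing import List, Tuple

# Hypothetical syntax layer: one marked-up sentence with a link and a comment.
SOURCE = (
    '<p>See <a href="http://example.org/next.htm">the next node</a>.'
    "<!-- draft note --></p>"
)


class LayerSplitter(HTMLParser):
    """Separate the character data that would be displayed from features
    that exist only in the markup (link targets, comments)."""

    def __init__(self) -> None:
        super().__init__()
        self.displayed: List[str] = []
        self.hidden: List[Tuple[str, str]] = []

    def handle_data(self, data: str) -> None:
        # Character data between tags: the sequence a reader sees on screen.
        self.displayed.append(data)

    def handle_starttag(self, tag, attrs) -> None:
        # Link targets are stored in the syntax layer, not in the displayed signs.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.hidden.append(("href", value))

    def handle_comment(self, data: str) -> None:
        # Comments never reach the presentational layer.
        self.hidden.append(("comment", data.strip()))


def split_layers(source: str) -> Tuple[str, List[Tuple[str, str]]]:
    parser = LayerSplitter()
    parser.feed(source)
    parser.close()
    return "".join(parser.displayed), parser.hidden


DISPLAYED, HIDDEN = split_layers(SOURCE)
print(DISPLAYED)
print(HIDDEN)
```

The displayed string and the list of hidden features are two different sign sequences derived from the same source, which is the situation the two students face.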

5.3 Potential text

Closely related to the difference between sequentiality and linearity is the previously hinted-at distinction between text as engineered or authored, and text as displayed or read [42]. Aarseth identifies this distinction as one between textons and scriptons, where the former denote sign strings as they "exist in the text", the latter as they "appear to readers" [43]. There is by nature an inherent linearity in scriptons, whereas textons can be organised in a multitude of ways, and can be fixed or fluid. Hypertextually linked objects (such as complex web sites) consist of textons, some or all of which are picked by the reader to constitute scriptons. Textons, then, are in a way potential text [44], scriptons realised text. Perhaps one might claim that student A preserves the text in a textonic state, from which an appropriate filtering tool (in this case a particular web browser version) can produce scriptons of displayed text over and over. Student B on the other hand has merely preserved an initially scriptonic, algorithmically produced text version [45].

But then again, how far can this textonic thread lead us? Can one single sign be called a texton, and can therefore any collection of uncombined letter signs, such as a box of metal types used in a printing house, be considered a collection of textons, and consequently "potential text"? At first, such questions might seem a trifle silly, but they become pertinent when we again consider the nature of hypertextually complex web sites, where each reading actualises a particular sequence of scriptons from a much larger collection of textons. Are the components (the stored textons) and the manifest model (the particular scriptonic sequence) to be considered one and the same text? Different text versions of the same work? Or does the textonic state represent potential text, and the scriptonic state realised, actualised text?

6. Textuality reconsidered

Of the text perspectives discussed earlier, hierarchy deals with the internal logic of extrinsic text, sequence with the external. They are both nevertheless qualities of textual expressions [46], and become simultaneously visible and malleable in digital media. The same goes for potentiality, multisequentiality, hypertextuality, typography, metatextuality, paratextuality, and what have you. This is precisely what new digital environments such as the web do. They force us to reconsider textual matters. Digital technologies return the deceptively shallow concept of text to the degree of complexity it has always already occupied. Investigative excursions beyond the historical and generic confines of print culture, into modern digital media as well as into older, suppressed textual artifacts, carriers, and technologies, will quickly identify particular qualities of text that are either enhanced or restrained to varying degrees by different media settings. Certain qualities previously supported by print media will have a tough time making it into future generations of media (where not temporarily kept alive by the breathing machine of conservative media chauvinism). Other qualities (e.g. multisequentiality), having been for centuries either completely suppressed by print culture or reduced to particular document types or genres, now face revitalisation. Such a panoramic view emphasises textual practices as long-term patterns across media platforms rather than as systems of differentials.

Each media environment brings its own semiotic conditions. The problems that historically confined definitions at hand tend to run into are the results of fundamental changes in the semiotic systems, where production and consumption of text take place, changes of which digitisation is an example. Web textuality offers a fundamentally new infrastructure for the organisation of textual information, as well as new document architecture. Digital texts in general, and web texts in particular, seem to follow some Heisenbergian principle and change according to how, with what tools, and even when we are looking at them - and looking for them [47]. This infrastructure and architecture will require modified or perhaps even completely new tools for analysis and theory, for instance regarding the discussion of what constitutes works, documents, and texts. Maybe we have to decide, in each particular case, at what formal level we are referring to text. Before asking ourselves "What is a webtext?" perhaps we might better start with "When is a webtext?"

All in all, the sequential text definitions and distinctions within textonomy are unable to satisfactorily describe digital objects in all their two-dimensionality. They are even more impotent when we direct our focus to the 3D (or perhaps even 4D) objects distributed on the web, the webtexts. Bibliography and textual criticism are disciplines in considerable need of renewed theoretical labour. My guess is that they are not alone in this respect.


Notes

1. Thus defined, texts are intertwined with their social contexts: "We can define text (...) by saying that it is language that is (...) doing some job in some context, as opposed to isolated words or sentences that I might put on the blackboard" (Halliday & Hasan, 1985, p. 10). This echoes the 'sociological' definition of documents exhibited by Seely Brown & Duguid, 1996, or David Levy: "Documents are surrogates for people. They are bits of the material world (...) that we create to speak for us and take on jobs for us" (Levy, 2000, p. 25).

2. Even in disciplines you might normally regard as closely related, such as textual criticism versus (primarily postmodernist) literary criticism, where the concept of text in the latter discipline quite often is equivalent to the concept of work in the former.

3. Crucial when considering the multi- or hypermediality of the web.

4. What Michael Buckland (1991, p. 45 ff.) labels "information-as-thing".

5. Aarseth, 1997, p. 15.

6. This circumstance is increasingly addressed (cf. Gumbrecht & Pfeiffer, 1994; Kittler, 1990) as a deficiency, which has subsequently resulted in an inability to recognise the materiality of messages.

7. "As with differences of opinion in library schools, or between library schools and other divisions of universities, or between literature departments and other departments that use and study written and printed matter, it is the centrality of the physical object that will ultimately be the catalyst for mutual understanding" (Tanselle, 1998, p. 21).

8. Text derives from the Latin texere, basically signifying "weave, interlace; build, timber", hence the derivatives 'textile' and 'texture' (cf. Walde & Hofmann, 1938, p. 678). While this Latin etymology is frequently addressed in texts on text, a deeper investigation into the underlying archaeology of the term is rarely performed. Once attempted, numerous interesting sediments will be discovered: there is e.g. a link to the Greek word techné, meaning "craft(smanship), artistry, art, science, cunning", as well as to a word signifying "carpenter, lumberman". Furthermore, the Sanskrit táshti signifies "artfully construct, build". The seemingly separate meanings of "timber" and "weave" are interlaced through the signification of "interlace".

9. "The image of the web is commonly used to suggest non-linearity, but weaving is a fundamentally sequential operation" (Ryan, 1999, p. 105, n. 8).

10. Obviously, linear sequentiality is a subjectively experienced textual phenomenon, in the sense that it is there for anyone who chooses to see it typographically, but so is retinal spatiality for anyone choosing to see the page of text as a graph.

11. Cf. particularly Shillingsburg, 1991, and Shillingsburg, 1996.

12. Shillingsburg, 1996, p. 174. This of course pertains to what is implied by the term document. Unfortunately, Shillingsburg defines document as "[t]he physical vessel (such as a book, manuscript, phonograph record, computer tape) that contains (or incarnates) the text" (ib.), which, taken together with the definition of text, sounds much like a circular definition.

13. ib., p. 47.

14. See e.g. McGann, 1991.

15. Du Rietz, 1998, p. 57 and 67.

16. As readers, we are thus able to study the ideal texts, and consequently the works, only indirectly, through the study of real texts as manifested in material texts and documents.

17. Granted, works on the web depend on hard disks and other material carriers for their storage. The texts however constantly oscillate between different carriers and various manifestations, creating a highly complex document flow including immense numbers of more or less temporary versions, even within single readings.

18. Electronic text, thus virtually manifested, is rather what Katherine Hayles called flickering signifiers, generated by "multilayered coding chains flexibly mutating across interfaces" (Hayles, 2000, ¶ 31).

19. E.g. Ryan, 1999, p. 97, who sees this as a result of various usages of text, depending on which the text will appear as different display layers, either the text as written/engineered or the text as presented/displayed.

20. Causing media semiotician Finnemann (1997) to recognise the fundamental "textuality" of any computer representation, be it digital sound, images, or video. Which is true, given that one accepts binary sequences as "text". Perhaps it will prove practical to define (any) texts as sequences of items belonging to some sort of notation systems. We would then be able to identify text as something that can simultaneously be represented and manifested by several different notation systems (e.g. binary or alphanumerical or musical).

21. Rather than "necessitates". The occasional argument that multimediality is by itself a quality assessment factor in digital texts, is a bit rash. An absurd consequence might be that any product that doesn't take advantage of the particular medium's full modal potential is inferior in quality to the one that does.

22. This is nevertheless only partly true. Even the reading of a printed text depends on "software" in the form of particular literary competence such as familiarity with artifacts or genre conventions.

23. Any reader of text, regardless of its supporting physical carrier, interacts with the text. Interactivity as a relevant distinction can only be called for when designating the reader's potency to affect the production of the very sequences of textual signs displayed on the surface of the page or the screen. Digital texts are in other words characterised by textonomical interactivity, print texts by textological.

24. An echo of this division is audible in current digital markup practice, where two strategies are clearly identified: one can be labelled "logical", exemplified by purist SGML conviction, while the other is "typographic" or "presentational". The logical strategy results in explicitly describing hidden content structures of the text (descriptive markup), while the typographic, orthographic perspective mainly deals with procedurally stating the visual appearance of the text (this is discussed in depth by Renear, 1997).

25. Beginning with DeRose et al, 1990, and recently elaborated in Renear, 1997. For comments and criticism, see e.g. Sperberg-McQueen & Huitfeldt, 1999.

26. Huitfeldt, 1999, p. 140.

27. It was seminally stated (DeRose et al, 1990) that the common basic feature of all types of texts, which thereby might serve as an ingredient in a universal definition of text, was a hierarchical structuring of content objects. This might certainly be a matter of debate. 'Content objects' still have to be defined in their turn. Further, some markup experts (cf. Sperberg-McQueen & Huitfeldt, 1999) have claimed it quite possible to exhibit texts that are not distinguished by clear hierarchical structures, but on the contrary by overlapping structures. In printed drama, for example, you might find colliding cue, line number and verse. Note that these are properties of separate architectures: the architectural logic of the abstract work of art (cue) is in conflict with that of the particular document or edition (line number), which in its turn is due to the properties of the medium at hand. The hierarchical overlap thus becomes a fact when one single layer of markup has to describe the logics of at least two separate regimes.

28. This thread is being woven further in Dahlström & Gunnarsson, 1999.

29. An illuminating example being Sperberg-McQueen, 1991.

30. If we by "text" refer to the sequence(s) of linguistic alphanumerical signs stored in and / or manifested on a physical carrier, a carrier that, together with its text(s), constitutes a document. Markup however typically comprises "extratextual" tags as well, regarding e.g. the very physicality, form, and material particulars of the document. Thus, tag sets are really statements on documents (including the texts) rather than on the mere texts themselves.

31. Noted early on by Jay David Bolter (1991, p. 42 f.): "In the electronic medium several layers of sophisticated technology must intervene between the writer or reader and the coded text. There are so many levels of deferral that the reader or writer is hard put to identify the text at all: is it on the screen, in the transistor memory, or on the disk?"

32. Levy (2000, p. 28) identifies two states: the binary storage, or the "source", and the temporary manifestation in a physical medium, or the "copy": "digital documents are founded on a distinction between a source and the copies produced from it. The source is a digital representation of some kind, a collection of bits. The copies are the sensible impressions or manifestations - text, graphics, sound, whatever - that appear on paper, on the screen, and in the airwaves".

33. This is a characteristic of word processing and markup techniques rather than of page description (such as PDF).

34. A similar chain of thought is trailed by Van Hulle (1999, p. 232).

35. Early hypertext theory had a predilection for equating hypertextuality with non-linearity or non-sequentiality, whereas more recent theory rather identifies multilinearity or multisequentiality, i.e. a presence of several different fragmental paths, each of which sequential per se.

36. Labelled lexias by Landow (1992, p. 4), textons by Aarseth (1997, p. 62), content spaces by Svedjedal (2000, p. 57).

37. Placing the author much more in control of the reader's movements than does the sequentially arranged and therefore seemingly linear (but nevertheless "random access") printed book.

38. But not necessarily, of course. Apart from tearing the book apart by its spine and disrupting the ordering of the pages, a book can even be printed in loose leaves to begin with, enabling the user to construct his or her own sequence of the pages.

39. Or hyperworks for short, a neologism suggested by Svedjedal, 2000, p. 62.

40. To varying degrees, one might add. Different types of tools position themselves at different spaces in the "magnetic field" of interdependent roles, balance, and power between writer, medium, and reader. This has been noted by, among others, Friedrich Kittler, who has spent some thought on just what kind of affecting and influencing power we seem to have (or not have) when working at different levels with digital writing tools. One can't help wondering to what degree such potencies are affected (or not) by the specific architecture of the web.

41. A term coined by Ted Nelson, explained in e.g. his 1993 Literary Machines as the simultaneous display of (parts of) a document in different contexts. Having come across several misunderstandings of the concept, Nelson has lately (Nelson, 1997) attempted another term for the same phenomenon: hyper-sharing. Textual transclusion is, as yet, a practical impossibility on the World Wide Web, whereas transclusion of images is performed on a daily basis (by way of absolute hyper references).

42. A distinction that has not been particularly relevant to previous print culture, where the two states normally coincide, whereas digital culture separates them.

43. Aarseth, 1997, p. 62.

44. Again: not to be confused with the potency of a text as "libretto" to produce signified meaning in the minds of readers.

45. Would this imply that any word-processed or encoded text involves what Aarseth calls a cybertext?

46. Sperberg-McQueen (1991, p. 36): "Texts are both linear and hierarchical".

47. In the digital realm, each document display is confined by the particulars of the hardware, software and environment involved. David Levy (2000, p. 29) states: "two different viewings of the "same" source may differ in important ways - they may not be "the same." Under such circumstances of radical variability, there does not appear to be anything like a stable document or object. (- - -) What you do see at any given moment will be the product both of the local digital source and of the complex technical environment (hardware and software), which is itself changing in complex and unpredictable ways."


References

Aarseth, E. (1997). Cybertext : Perspectives on Ergodic Literature. Baltimore: Johns Hopkins University Press.

Bolter, J. D. (1991). Writing Space : The Computer, Hypertext, and the History of Writing. Hillsdale, NJ: Lawrence Erlbaum.

Buckland, M. (1991). Information and information systems. New York: Greenwood Press.

Dahlström, M., & Gunnarsson, M. (1999). DA Draws a Circle : On Document Architecture and Its Relation to LIS Education and Research. Information Research. 5 (2). Retrieved March 8, 2002, from http://www.shef.ac.uk/~is/publications/infres/paper70.html

DeRose, S. J., Durand, D.G., Mylonas, E., & Renear, A.H. (1990). What Is Text, Really? Journal of Computing in Higher Education. 1 (2), 3-26.

Du Rietz, R. E. (1998). The Definition of 'text'. TEXT : Swedish Journal of Bibliography. 5 (2), 51-69.

Finnemann, N. O. (1997). Modernity Modernised : the cultural impact of computerisation. Aarhus: University of Aarhus, Center for kulturforskning.

Gumbrecht, H. U., & Pfeiffer, K.L. (Eds.). (1994). Materialities of Communication. Stanford: Stanford University Press.

Halliday, M.A.K., & Hasan, R. (1985). Language, context, and text: aspects of language in a social-semiotic perspective. Oxford: Oxford University Press.

Hayles, N. K. (2000). Flickering Connectivities in Shelley Jackson's Patchwork Girl : The Importance of Media-Specific Analysis. Postmodern Culture. 10 (2). Retrieved March 8, 2002, from http://jefferson.village.virginia.edu/pmc/text-only/issue.100/10.2hayles.txt

Huitfeldt, C. (1999). Tekstkoding og tekststrukturer. In E. Aarseth (Ed.), Datahåndbok for humanister (pp. 123-146). Oslo: Ad Notam, Gyldendal.

Huitfeldt, C., & Sperberg-McQueen, C. M. (1999). Concurrent Document Hierarchies in MECS and SGML. Literary and Linguistic Computing. 14 (1), 29-42.

Van Hulle, D. (1999). Authenticity or Hyperreality in Hypertext Editions : Notes Towards a Searchable 'Recherche'. Human IT. 3 (1), 227-244. Retrieved March 8, 2002, from http://www.hb.se/bhs/ith/1-99/dvh.htm

Kittler, F. (1990). Discourse Networks 1800/1900. Stanford, Cal.: Stanford University Press.

Landow, G. P. (1992). Hypertext : The Convergence of Contemporary Critical Theory and Technology. Baltimore: Johns Hopkins University Press.

Levy, D. M. (2000). Where's Waldo? : Reflections on Copies and Authenticity in a Digital Environment. In Authenticity in a Digital Environment (pp. 24-31). Washington, D.C.: Council on Library and Information Resources.

McGann, J. (1991). The Textual Condition. Princeton, N.J.: Princeton University Press.

Nelson, Th. Holm. (1981). Literary Machines. Sausalito: Mindful Press.

Nelson, Th. Holm. (1997). The Future of Information : Ideas, Connections, and the Gods of Electronic Literature. Tokyo: ASCII Corporation.

Renear, A. (1997). Three (Meta)Theories of Textuality. In K. Sutherland (Ed.), Electronic Text : Investigations in Method and Theory (pp. 107-126). Oxford: Clarendon.

Ryan, M.-L. (1999). Cyberspace, Virtuality, and the Text. In M.-L. Ryan (Ed.), Cyberspace Textuality : Computer Technology and Literary Theory (pp. 78-107). Bloomington and Indianapolis: Indiana University Press.

Seely Brown, J., & Duguid, P. (1996). The Social Life of Documents. First Monday. 1 (1). Retrieved March 8, 2002, from http://www.firstmonday.dk/issues/issue1/documents/index.html

Shillingsburg, P. (1996). Scholarly Editing in the Computer Age (3rd ed.). Ann Arbor: The University of Michigan Press.

Shillingsburg, P. (1991). Text as Matter, Concept, and Action. Studies in Bibliography. 44, 31-82.

Sperberg-McQueen, C.M. (1991). Text in the Electronic Age : Textual Study and Text Encoding, with Examples from Medieval Texts. Literary and Linguistic Computing. 6 (1), 34-46.

Svedjedal, J. (2000). The Literary Web : Literature and Publishing in the Age of Digital Production. A Study in the Sociology of Literature. Stockholm: The Royal Library.

Tanselle, G. T. (1998). Literature and Artifacts. Charlottesville: The Bibliographical Society of the University of Virginia.

Walde, A., & Hofmann, J.B. (1938). Lateinisches Etymologisches Wörterbuch (3. Aufl.). Heidelberg: Carl Winter.


Version mounted on the web in October 2003
URL: http://www.adm.hb.se/personal/mad/tt.htm
