In this paper, direct manipulation and other notions of dialogue directness are analyzed according to a layered interaction model. This kind of model identifies a number of levels on which an interaction takes place, using the lower levels to support the communication on the upper levels. This in turn makes it possible to talk more precisely about directness in terms of the levels of the interaction model.
The Nielsen virtual protocol model (Nielsen 1986) is used for the analyses in this paper, since it is specifically intended for detailed analysis of dialogue techniques. A short summary of this model is shown in Table 1. Several other layered interaction models exist and could be used with similar results. Nielsen (1986) shows a comparison between the levels in the virtual protocol model and the levels in three other popular layered models, and Taylor (1988a; 1988b) presents a general discussion of layered models.
Level Number | Name of Layer | Exchanged unit of Information | Text Example | GUI Example |
---|---|---|---|---|
7 | Goal | Real world concepts, external to computer | Remove section of my letter | |
6 | Task | Computer-oriented objects/actions | Delete 6 lines of edited text | |
5 | Semantics | Concrete objects, specific operations | Delete line no 27 | Delete selected lines |
4 | Syntax | Sentences (1 or 2 dimensional sequences or layouts) of tokens | DELETE 27 | Click to the left of the first line; while holding down the shift-key, click to the right of last line; select CUT in menu |
3 | Lexical | Tokens: smallest info-carrying units | DELETE | Click at left of first line |
2 | Alphabetic | Lexemes: primitive symbols | D | Click at (345,120) |
1 | Physical | "Hard I/O", light, sound, movement | Press D-key | Press mouse button |
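As a minimal sketch of how the layering in Table 1 could be written down, the fragment below encodes the seven levels and the text-interface examples from the table. The names `Level`, `TEXT_EXAMPLE`, and `realization_path` are illustrative assumptions and are not part of the virtual protocol model itself.

```python
from enum import IntEnum


class Level(IntEnum):
    """The seven levels of the virtual protocol model, numbered as in Table 1."""
    PHYSICAL = 1
    ALPHABETIC = 2
    LEXICAL = 3
    SYNTAX = 4
    SEMANTICS = 5
    TASK = 6
    GOAL = 7


# Text-interface example for each level, copied from Table 1.
TEXT_EXAMPLE = {
    Level.GOAL: "Remove section of my letter",
    Level.TASK: "Delete 6 lines of edited text",
    Level.SEMANTICS: "Delete line no 27",
    Level.SYNTAX: "DELETE 27",
    Level.LEXICAL: "DELETE",
    Level.ALPHABETIC: "D",
    Level.PHYSICAL: "Press D-key",
}


def realization_path(level):
    """Return the levels from `level` down to the physical level.

    This is the order in which a communication on `level` is translated
    downwards, each level being realized by the level below it.
    """
    return [Level(n) for n in range(level, 0, -1)]


if __name__ == "__main__":
    for lvl in realization_path(Level.SEMANTICS):
        print(lvl.value, lvl.name, "-", TEXT_EXAMPLE[lvl])
```

An error can then be classified by naming the highest level at which the user's message was still correctly formed.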
Most of the examples in this paper involve the Apple Macintosh Finder (operating system user interface). This is neither because the Finder has a badly designed user interface nor because it is the only possible example, but simply because it is so widely used that most readers should be able to try out the examples in a dynamic interaction if they want to. In some sense, the Macintosh Finder serves as the canonical example of direct manipulation interfaces in the same way as the standard Unix user interface often serves as the canonical example of command line based systems.
As a matter of fact, errors frequently do occur when using direct manipulation systems but in many cases the error messages are pretty poor, maybe because the designers have believed the "no errors" claim. As an example, users of the Macintosh Finder file system interface very often get the error message "An Application could not be found for this Document." This error message succeeds in breaking almost all the standard principles for message design (Shneiderman 1982b): It uses computer-oriented terminology rather than user-oriented terminology, it is general rather than specific, and it is not constructive. So it is no surprise that I have fairly often observed users (including in one case a user with a full year's Macintosh experience) having severe problems with this error message. A more specific message would tell the user the type of the document in question and what application it is intended for. If the document was of a type that could be opened by several applications (typically plain ASCII text or a bitmapped image), the message could also be constructive and inform the user if another application was available which would allow users to work with the specified document.
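The kind of specific and constructive message suggested above could be assembled roughly as follows. This is only a sketch: the function name, the wording of the message, and the application names in the usage example are hypothetical and do not describe how the Finder works.

```python
def missing_application_message(document, doc_type, intended_app, alternatives=()):
    """Build a specific, constructive message for a document that cannot be opened.

    `alternatives` lists other applications (hypothetical names here) that
    are available and able to open documents of this type.
    """
    # Specific: name the document, its type, and the application it is intended for.
    parts = [
        f'The document "{document}" is a {doc_type} document intended for '
        f"{intended_app}, which could not be found."
    ]
    # Constructive: tell the user what can be done instead, if anything.
    if alternatives:
        parts.append("It can be opened with " + " or ".join(alternatives) + " instead.")
    return " ".join(parts)


if __name__ == "__main__":
    print(missing_application_message("foo", "plain text", "WriterApp", ["EditorApp"]))
```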
Figure 1. Correct operation for "discard document foo" on the
Macintosh: The icon representing the document file "foo" is dragged from
the window containing "foo" to the Trash icon.
For several other examples of errors in a direct manipulation dialogue, consider the case of deleting a file in the Macintosh Finder by dragging its icon to the Trash icon. The correct operation is shown in Figure 1. It is impossible to make syntax errors in this dialogue, since any movement of the icon is legal and has some meaning.
Figure 2a. Lexical level error: "Foo" is dragged to the Backup icon instead of to the Trash icon.
Figure 2b. Alphabetic level error: "Foo" is released at a pair of screen coordinates which are not inside the Trash icon.
Figure 2a shows an error I have frequently made myself. The file icon is dragged to the icon representing backups, and not to the Trash icon as intended. This mistake constitutes a lexical level error since the syntax of dragging an icon on top of another icon has been correctly specified. It just happens that the second operand of the command has been given an erroneous value. Further analysis shows that the problem is due to a "capture error" (Norman 1983) because the two commands "delete file" and "backup file" start out almost identically by a gesture sweeping the file icon towards the bottom right of the screen. It happened that I had positioned my backup icon next to the Trash icon and that I used backup much more frequently than I used delete. After having observed the frequency of my error and having completed this analysis of the dialogue, my solution was simply to move the backup icon to the lower left part of the screen instead of the lower right. After this change, the gestures for backup and delete start out so differently that I have never since made the error of confusing the two commands.
Figure 2b shows a frequently observed error where the user has moved the document icon to a position just outside the Trash icon. Since most of the outline icon overlaps the Trash icon, the novice user may think that the Trash has been indicated as the destination for the document, but in actual fact, the cursor's "hot spot" is outside the Trash icon and therefore indicates another destination for the document. When the user lets go of the mouse button, the document will not be discarded but will be moved to a new location between the Backup and Trash icons. This is an alphabetic level error because the user is erroneously specifying a point on the screen which is outside the desired region. The experienced user may note the lack of lexical feedback since the Trash icon has not highlighted, thus indicating that it has not been specified, but this absence of feedback is unfortunately easy to miss.
Payne (1990) has also frequently observed users missing the Trash icon. His suggestion for a redesigned interface is to increase the feedback when users do specify the Trash icon as a destination for an icon being moved. He proposes having the outline icon disappear from the screen when it is over the Trash, thus giving a preview of the result of releasing the mouse button. This feedback action would give users information on both the syntax level ("you will have completed an entire command specification if you let go of the mouse button now") and the semantic level ("the action specified by this command will be to delete the document"). This increased feedback suggested by Payne might therefore alleviate the problems novice Macintosh users have been observed to have (Nielsen 1987) because of the generic nature of icon movements, since different feedback could be given depending on the chosen destination. For example, it would be possible to distinguish between the move operation started by dragging the document icon to a folder icon from the same disk as the document and the copy operation started if the folder was on a different disk. Additional possibilities for enhanced feedback include the use of sound effects to illustrate the user's actions (Gaver 1989).
For the specific alphabetic level error discussed here, it would probably be better, however, to redesign the lower levels of the dialogue. My suggestion would be either to allow an icon to be specified as the destination symbol for a command if there was more than 50% overlap between it and the icon being dragged, or to implement a "snap-to" attraction between destination icons and the cursor. The level of analysis presented in this paper is not sufficient to decide for sure whether one of these solutions to the icon trashing problem would be better than Payne's solution. The protocol model can only identify problems and suggest possible improvements. It would still be necessary to perform empirical tests in cases like this where there are several competing solutions.
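The difference between the hot-spot rule and the two suggested redesigns can be made concrete with a small sketch. It assumes rectangular icons, measures overlap as the share of the dragged icon's area that lies inside the destination icon (one possible reading of "more than 50% overlap"), and uses a hypothetical snap radius; none of the names below come from the Macintosh toolbox.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Axis-aligned icon rectangle: top-left corner (x, y), width w, height h."""
    x: float
    y: float
    w: float
    h: float

    def contains(self, px, py):
        return self.x <= px < self.x + self.w and self.y <= py < self.y + self.h


def hits_by_hot_spot(hot_spot, target):
    """Current rule: only the cursor's hot spot decides the destination."""
    return target.contains(*hot_spot)


def overlap_fraction(dragged, target):
    """Share of the dragged icon's area that lies inside the target icon."""
    ix = min(dragged.x + dragged.w, target.x + target.w) - max(dragged.x, target.x)
    iy = min(dragged.y + dragged.h, target.y + target.h) - max(dragged.y, target.y)
    return max(0.0, ix) * max(0.0, iy) / (dragged.w * dragged.h)


def hits_by_overlap(dragged, target, threshold=0.5):
    """First redesign: the target is chosen if the overlap exceeds the threshold."""
    return overlap_fraction(dragged, target) > threshold


def snap_to(hot_spot, target, radius):
    """Second redesign: pull the hot spot to the target's centre when it is
    within `radius` of the target rectangle; otherwise leave it unchanged."""
    px, py = hot_spot
    dx = max(target.x - px, 0.0, px - (target.x + target.w))
    dy = max(target.y - py, 0.0, py - (target.y + target.h))
    if (dx * dx + dy * dy) ** 0.5 <= radius:
        return (target.x + target.w / 2, target.y + target.h / 2)
    return hot_spot


if __name__ == "__main__":
    trash = Rect(100, 100, 32, 32)
    outline = Rect(90, 100, 32, 32)   # dragged outline, mostly over the Trash
    hot_spot = (95, 116)              # cursor hot spot, just outside the Trash
    print(hits_by_hot_spot(hot_spot, trash))
    print(hits_by_overlap(outline, trash))
    print(snap_to(hot_spot, trash, radius=8))
```

In the situation of Figure 2b, the hot-spot rule rejects the Trash while both redesigns accept it. The threshold and the radius would be among the parameters to vary in an empirical test.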
In addition to the lexical and alphabetic errors discussed above, it is even possible to make physical level errors in direct manipulation interfaces. A common example is the manipulation of scrolling lists where the scrolling speed is too fast. When the program runs on a computer with the speed of the original Macintosh Plus, this scrolling works quite nicely, and the user can easily make any desired page number appear in the visible part of the list. Unfortunately, the implementation of the scrolling speed does not take variations in the computer's execution speed into account. When the program is run on the four times faster Macintosh SE/30, the list simply scrolls four times as fast as it did on the Macintosh Plus. This scrolling speed almost always makes users overshoot because they hold down the mouse button too long, thus making an error on the physical level of the dialogue.
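The cause of this problem can be illustrated with a sketch that contrasts scrolling tied to the number of loop iterations with scrolling tied to elapsed time. The iteration rates and the scrolling rate below are hypothetical round numbers chosen only to give a factor of four between the two machines; they are not measurements of the Macintosh Plus or SE/30.

```python
def iterations_while_held(hold_seconds, iterations_per_second):
    """Number of event-loop iterations executed while the button is held."""
    return int(hold_seconds * iterations_per_second)


def lines_scrolled_per_iteration(hold_seconds, iterations_per_second, lines_per_iteration=1):
    """Speed-dependent design: the list scrolls a fixed amount per loop iteration."""
    return iterations_while_held(hold_seconds, iterations_per_second) * lines_per_iteration


def lines_scrolled_per_second(hold_seconds, lines_per_second=10.0):
    """Speed-independent design: the list scrolls a fixed amount per unit of time."""
    return int(hold_seconds * lines_per_second)


if __name__ == "__main__":
    hold = 0.5                # seconds the user holds down the mouse button
    slow, fast = 10.0, 40.0   # hypothetical loop iterations per second
    print(lines_scrolled_per_iteration(hold, slow), lines_scrolled_per_iteration(hold, fast))
    print(lines_scrolled_per_second(hold), lines_scrolled_per_second(hold))
```

With the first design the same physical action scrolls four times as far on the faster machine, whereas the second design gives the same result on both.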
Finally, errors can also occur on the higher levels of the dialogue.
It will often be impossible to avoid goal or task level errors, but in
many cases, the computer can help the user in the case of semantic errors.
As an example, consider the dialogue specifying the file to be opened by
an application. In a traditional, character-based interface, this task
would be realized by a command somewhat like
open foo.txt
If the file foo.txt was a text file and the application required a graphics file, this command would constitute an error on the semantic level of the dialogue even though it was syntactically correct. Therefore, the user would get some kind of error message ranging from "ILLEGAL FILE" to "The file 'foo.txt' is a text file and cannot be opened since [Name-of-Application] only supports graphics files", depending on the attention to usability engineering during the design of the application. In any case, assuming that the user actually wanted to open foo.txt in the current application, the text-based interface did allow the user to express that semantic intention. This action immediately resulted in an error message, thus indicating the user's mistake and hopefully clarifying the situation.
Many modern user interfaces like the Macintosh prevent the user from getting these error messages because they only allow the specification of semantically valid actions. In these systems, users open files by selecting them from a menu listing only those files that are currently legal operands for the open command. Since other files are not on the list, the user cannot click on them, and the interface avoids the "illegal file" message completely. Assume again that a user was in a graphics application and wanted to open the text file foo.txt. Often, the indirect feedback of not listing the file name in the menu will not be sufficient to make users realize their semantic error. Instead, users may conclude that the desired file must have been stored elsewhere in the file system, leading them on a major hunt through their various subdirectories.
Exactly this kind of problem occurred for a user who was the manager of the computer-human interaction group at the National Defence Research Center in a Western European country and was thus no novice user. He was returning to his computer after having been interrupted for slightly more than an hour because of my visit and he wanted to save his work. The Save command was grayed out, however, leading him to exclaim "Save! ... Why won't it Save? How can I get it to Save?" He tried various options without getting the computer's permission to save until he finally remembered that he had actually already saved his work when he was originally interrupted an hour earlier. This user was quite frustrated because it was impossible for him to perform his intended operation and because the computer could provide no feedback in this break-down of a protocol on the task level. An alternative design would have allowed the user to activate even the illegal Save command and would then have provided a simple feedback message somewhat like, "You have made no changes to [Name-of-File] since the last time you saved it."
In situations like those outlined above, having error messages may actually help the user, at least if they are worded according to established human factors principles. Taking the layered protocol view of human-computer interaction implies that negative acknowledgements (error messages) are better than the absence of communication that can easily result from a "no errors" design philosophy.
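The alternative design for the Save command described above can be sketched as follows. The `Document` class and its methods are hypothetical; the point is only that the command stays available and answers with a negative acknowledgement instead of being silently disabled.

```python
class Document:
    """Minimal document model with a flag for unsaved changes."""

    def __init__(self, name):
        self.name = name
        self.dirty = False   # True when there are changes since the last save

    def edit(self):
        self.dirty = True

    def save(self):
        """Always callable: returns a message even when there is nothing to save."""
        if not self.dirty:
            # Negative acknowledgement instead of a grayed-out command.
            return (f"You have made no changes to {self.name} "
                    "since the last time you saved it.")
        self.dirty = False
        return f"Saved {self.name}."


if __name__ == "__main__":
    doc = Document("report")
    doc.edit()
    print(doc.save())
    print(doc.save())   # second attempt explains why nothing happens
```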
For example, setting a tabulator in one desktop publishing program requires the following series of steps:
Each of these steps is actually quite direct and relies on direct manipulation interaction techniques. For example, step 2 is achieved by dragging the dialog box to the desired location, and step 3 involves grabbing the tab marker with the mouse and directly moving it to the desired new setting. In spite of this directness on the lexical level of the dialogue, the complete syntax for the operation turns out to feel very indirect.
Another example of indirectness is based on a complaint by a user (Westland 1988) who reviewed the upgrade of a drawing program in a user group magazine: To perform the task of drawing a rectangle filled with text in the earlier version, one could just draw the rectangle and then immediately start typing from the keyboard. The program would interpret the character input as text which was intended to go into the rectangle. In the revised version one has to first draw the rectangle, then select the "text" mode and use it to define a text region inside the rectangle, and only then type the characters. This later dialogue is more modular and conforms to the general interaction techniques used in the interface. But the user complained about the loss of directness for this frequent special case which now involves two extra steps: 1) select "text", and 2) define input region. The directness in the earlier version was achieved by having a temporal mode for special-case interpretation of keyboard input: If typing occurred as the next action immediately after the drawing of a rectangle, the text went into the rectangle.
Directness sometimes comes at the cost of introducing additional complexity in the dialogue. For example, consider the action of pressing down the mouse button while the cursor is over the name of an icon in the Macintosh Finder and then moving the mouse while holding down the mouse button. If the icon had not previously been selected (highlighted), this action will result in moving the icon. But if the icon was the currently selected object, the same action will result in the selection of part of the icon name. A single gesture on the physical level of the dialogue thus has two different interpretations, depending on the state of the interface. This modedness of the interaction is necessary to achieve two types of directness: The editing of icon names can be achieved directly by selecting part of the name and typing in a replacement text, thus achieving a direct mapping between the input and output languages for the operation. At the same time, grabbing hold of an unselected icon to move it seems to be a very frequent operation during use of the Macintosh Finder. Therefore, the requirements for precision at the physical and alphabetic levels of the dialogue are decreased by allowing the user to achieve the lexical level effect of indicating the icon as the operand for the move operation by pointing to any location within the icon or its name.
A final example of directness cannot be achieved by traditional forms of direct manipulation. Assume that a user of a window system has a large and a small window and that the large window is on top, completely hiding the small window. The user now wants to bring the small window to the front and make it visible, but since it is hidden, there is no way for the user to directly manipulate the small window with the mouse. The standard way to perform this task is to first move the large window off to the side to make the small window visible. The user then grabs the small window and moves it to a third location to make room for moving the large window back to its original position. And the user can then finally move the small window back to its original location where it will now be on the top because it was the last object to be selected. This solution is obviously very indirect.
A faster way to solve the problem is by having a special command to bring the large window to the back, thus implicitly bringing the small window to the front. This solution involves an indirect mapping between the task level and the semantic level of the dialogue, as the user will have to realize a task expressed in terms of one object by a semantic operation expressed in terms of another object.
In contrast, the most direct solution to the problem would involve a menu listing all the windows and allow the user to bring a window to the front by choosing its name from the menu.
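The two faster solutions can be compared in a sketch that models the window stack as a list ordered from front to back. The function names and the alphabetical ordering of the menu are illustrative assumptions and do not refer to any particular window system.

```python
def send_to_back(z_order, name):
    """Indirect solution: act on the large window to uncover the small one."""
    if name not in z_order:
        raise ValueError(f"no window named {name!r}")
    return [w for w in z_order if w != name] + [name]


def bring_to_front(z_order, name):
    """Direct solution: act on the window the task is actually about."""
    if name not in z_order:
        raise ValueError(f"no window named {name!r}")
    return [name] + [w for w in z_order if w != name]


def window_menu(z_order):
    """Menu entries listing every window, including completely hidden ones."""
    return sorted(z_order)


if __name__ == "__main__":
    stack = ["large", "small"]             # front to back; "small" is hidden
    print(window_menu(stack))
    print(bring_to_front(stack, "small"))  # user picks "small" from the menu
    print(send_to_back(stack, "large"))    # same result via the other object
```

With only two windows both operations give the same stack, but only `bring_to_front` is expressed in terms of the object named in the user's task.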
Figure 4. The communication principle in the layered protocol
model. A communication on level i of the model (indicated by the
gray arrow) is realized by an exchange of information on level i-1.
Both the user and the computer will have to translate between the two levels
as indicated by the vertical arrows. The units of information exchanged
at each level of the model are listed in Table 1.
The principle of direct mappings between dialogue levels seems to provide a better way to conceptualize directness in interactive systems. As shown in Figure 4, the layered protocol model involves several transformations between different units of information and we would want those transformations to be as direct mappings as possible to make it easy for the user to move between the levels. The idea of direct mappings is similar to the mathematical notion of isomorphisms; there is a one-to-one function between the two levels and the relationships between the units of information are the same on the two levels and are preserved by the function. For the analysis of dialogues, however, it does not seem possible to use the strict mathematical definition, so the term "direct mapping" is used in a slightly looser sense to indicate a feel of close and direct relation between two interaction levels.
As an example of a direct mapping, consider the operation of deleting six lines in a text editor. In a modern display editor, the user's input syntax would involve selecting the six lines and activating the delete command, utilizing a direct mapping from the user's knowledge of what lines to delete and the user's specification of those lines as an operand to the delete command. The system output would involve the removal of the selected lines from the display, thus making the mapping between the user input and the system output extremely direct following the principle of stimulus-response compatibility (John et al. 1985). In a traditional, line-oriented editor, the user input to realize this task would be something like DELETE 27-32 and the result would be an unchanged display on the syntax level even though the lines would be removed at the semantic level. This is indirect with respect to both the mappings just considered.
The directness of mappings is not always a clear-cut property of a dialogue element. For example, I recently designed a hypertext system (Nielsen 1990) with a front page screen depicting a book cover. This illustration served two functions in the user interface. First, it was an output language element intended to indicate that the screen was the front of a hypertext, thus making it easy for users to recognize the screen and its status when they returned to it after some period of reading. Second, the illustration was intended as the control element for getting into the hypertext proper. By clicking on the book cover as an input action, the user would "open the book" and start reading. It turned out that the illustration served the first purpose very well but failed at the second. In other words, the mapping between the lexical element of a book cover and the semantic notion of "beginning of information" was close enough to be immediately understood. But the mapping between the lexical book cover picture and the semantic notion of opening was too indirect, since a book cover suggests being pulled up, while a mouse button is pushed down during a click. Therefore, the physical level action needed to realize the lexical level element of activating the book illustration interfered with its interpretation as part of the input language. This example shows that we still do not know enough about user interface design to rely on a purely mathematical definition of isomorphisms between the levels of the dialogue.
Even though the directness of a mapping cannot be measured exactly, the concept of direct mappings can still be used to gain a better understanding of several forms of directness in user interfaces. In the rest of this section, direct mappings are used as a unifying principle in the discussion of direct manipulation, transparency, WYSIWYG, immediate command specification, articulatory directness, and computational appliances.
According to the direct mapping view, direct manipulation is not just related to the dialogue's input language. The output language is also of interest for determining the degree of directness in the mapping between the semantic and the syntactic level of the dialogue. For example, consider a multi-font text editor where the user is about to type some text to be inserted at a point between two existing characters, say, A and B. In many systems, this system state would look something like A|B where the vertical bar | is a blinking insertion cursor. When the user starts typing, the font of the new text could be that of A, that of B, that of the previous text between A and B (if the user had just removed some such text), or it could even be something altogether different (if the user had just issued some font changing command). Most text editors provide no visible indication of this semantic state, leaving the user to guess. In most cases, this lack of information does not matter since A, B, and the new text will all have the same font. Otherwise, the user will have to start typing to learn what font will be used, leading to a small degree of indirectness: There is no way to "directly manipulate" (or learn) the font state at the insertion mark since it represents an empty object.
A typical example of the lack of transparency is the complicated installation procedures involved in transferring certain large applications from the floppy disks on which they are sold to the user's hard disk. The user's goal is to move the application to the hard disk and to run it, but it is sometimes necessary for the user to follow long and complicated instructions on how to establish various subdirectories and to copy files from the floppies in a specific order. These computer-oriented concepts and actions are completely unnecessary from the user's goal perspective.
As an example, most mouse-based interfaces have a two-cursor problem (Brooks 1988) where one cursor is used to point and another cursor indicates where text input will appear. Users often confuse these two cursors because of the lack of WYSIWYG in the interface. The pointing cursor that tracks the user's mouse movements is the user's focus of attention ("what you see") in the syntax, but the input ("what you get") changes the output product according to the location of the insertion mark. One possible solution to the two-cursor problem is to follow a strict WYSIWYG interpretation of a single cursor where input appears wherever the pointer happens to be when the user hits the keyboard (Akscyn et al. 1988).
Changing the headers in one early graphical-interface word processor required the user to open a special window for the header. The actual editing followed direct manipulation principles as defined here since there was a direct mapping between the syntax for changing the header and the semantic change in the stored header information. But the interaction technique was not WYSIWYG since the user could not easily translate between the syntax for changing the header and the goal of making the header look a specific way in relation to the rest of the page. For a true WYSIWYG editing of headers, it becomes necessary to make them editable in the main window together with the main sequence of text in the file.
Figure 5. Interface control for specifying borders surrounding
the text in a word processor. Here, the user has specified that a wide
bar is to appear in the left margin.
Figure 5 shows a control for specifying text borders in a word processor with a graphical user interface. This interface has direct manipulation, WYSIWYG, and transparency aspects. The WYSIWYG property comes from the direct correspondence between the position of the borders on the miniature page and their position on the screen and the final printout. The syntax for placing the border lines on the miniature uses a direct manipulation technique corresponding directly to the semantics: Clicking first on the fat line in the palette and then on the left margin of the miniature page. Finally, the semantics of having a wide bar in the left margin correspond directly to the goal of getting a fat line in the margin because the interface has been simplified.
A user who often wanted a wide bar to appear in the left margin could construct a button (perhaps labelled "Make Change Bar") that would have the same effect as calling up the control in Figure 5 and modifying it as shown. This hypothetical interface would be an immediate command specification. As this example shows, macros or other user-constructed customizations will often be necessary for immediate command specification unless the interface designer has conducted a very detailed task analysis.
In many visual interfaces, input actions sometimes involve not just the specification of a single point but the specification of a curve. When the curve is used to drag an icon, the mouse has a higher articulatory directness than a stylus, whereas the stylus has a higher articulatory directness than the mouse when it comes to handwriting input because of the different requirements for the preciseness of the curve.
An example of a measurement of the extent to which a computer served as a computational appliance is the study by Haas (1989) of users revising a text either in a computer editor or using pen and paper. She had test subjects think aloud while they edited the text and she then counted the proportion of their utterances referring to the editing medium instead of the writing task. In the pen and paper condition, only 3% of the utterances referred to the medium, while 14% of the utterances in the computer condition did so. In the ideal case, users would be able to focus completely on their primary task and just deal directly with it without having to think about the computer at all. Haas' study indicates that we still have some way to go to achieve this goal.
In the computational appliance mapping, there is a direct connection between the user's input and the achieved goal, and the computer's output is interpreted as a signal for the real world goal state instead of signs or symbols of the computer's state (Rasmussen 1983). With current computers, the computational appliance mapping is of course easiest to achieve for text editing where the input and output equipment on modern workstations almost completely match the goal of producing sheets of paper with letters on them.
Figure 6. Summary of the direct mappings discussed in section
4. For simplicity, the figure only shows those interaction levels actually
involved in these mappings, but similar mappings could be defined between
the rest of the interaction levels.
Previous work has shown that experienced users find the concept of "user friendliness" more closely related to whether an application is pleasant to work with than to whether it is easy to learn (Nielsen 1989). Even though it is not really known what makes an interface pleasant, it is likely that the various forms of directness contribute to pleasantness. Another likely contributing factor is good-looking graphic design. Further research is required to address this issue.