
The next text-based user interface

August 16, 2007

In an inspiring article, Aza Raskin concludes:

It’s time for new, user-centric command line interfaces to make a comeback. A command line that lets you type or say what you want to do, and the computer does it.

In one sentence, here’s how that interface can become reality:
the next evolutionary step will be a context-sensitive, implicit command-based interface driven by collected user data.

Driving the evolution

With ubiquitous search boxes on the desktop, users will frequently formulate a desired task as a text search. The results will often be step-by-step instructions. Naturally, they will wonder why they have to perform these steps manually instead of having the computer do them. Et voilà, here is the new interface already: just type your query into the DoIt-box instead of the search box. [1]

An Example

The user has taken a photo and wants to get rid of the red-eye effect.

Currently, the typical workflow is quite tedious: do a web search, find a human-written tutorial on how to accomplish the task with a certain program, install that program, open the file in it, follow the steps in the tutorial, and finally save the file.

With a better user interface, the user just types “red-eye removal” or simply “red-eye” and gets her picture fixed.

How it works

Two things are required:

  1. Context information about what the user is currently doing [2]
  2. A huge public database which records users’ interactions with their computers (yes, this smells like Big Brother, but let’s ignore that for a moment).

Staying with the example, the computer knows from the context that the user is viewing a picture and that this picture was taken with a camera. Together with the text “red-eye”, this information is used as a query against the database of what other users have done. The results contain pre-recorded actions which are then executed by the computer. The user might have to choose from multiple result pictures, analogous to web search results.
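To make this concrete, here is a minimal sketch in Python of how such a lookup could work. Everything here is hypothetical: the in-memory list stands in for the shared database, and the scoring is the simplest thing that could rank by popularity and context overlap.

    from collections import Counter

    # Hypothetical stand-in for the shared database: each entry pairs
    # a context and a query with a recorded sequence of actions.
    RECORDINGS = [
        ({"app": "GQView", "origin": "Canon EOS 300X"}, "red-eye",
         ("gimp.open", "gimp.red_eye_removal", "gimp.save")),
        ({"app": "GQView", "origin": "Canon EOS 300X"}, "red-eye",
         ("gimp.open", "gimp.red_eye_removal", "gimp.save")),
        ({"app": "Firefox", "origin": None}, "red-eye",
         ("browser.bookmark_tutorial",)),
    ]

    def doit(query, context):
        """Rank recorded action sequences for a query: each recording
        counts once (popularity), plus a bonus per matching context field."""
        scores = Counter()
        for rec_context, rec_query, actions in RECORDINGS:
            if rec_query != query:
                continue
            overlap = sum(1 for key, value in context.items()
                          if rec_context.get(key) == value)
            scores[actions] += 1 + overlap
        return [list(actions) for actions, _ in scores.most_common()]

    print(doit("red-eye", {"app": "GQView", "origin": "Canon EOS 300X"}))
    # -> [['gimp.open', 'gimp.red_eye_removal', 'gimp.save'],
    #     ['browser.bookmark_tutorial']]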

So it’s basically a socialized text interface (finally, here’s the buzzword[3]). The computer doesn’t even try to interpret the user’s commands; instead, it uses statistics to find the best-matching actions other users have performed before. That’s the key point: data mining on human-contributed data is orders of magnitude more powerful than today’s AI trying to be humane in terms of understanding, syntax forgiveness and so on [4].

Characteristics

This kind of interface is essentially syntax-free. It is not necessary to explicitly use verbs in imperative form. There’s no need to learn any commands, which makes it very efficient for the casual user. You just have to guess which words other users have used to describe the task at hand. Which is, almost by definition, humane.

Some Details

Naturally, this vision leaves a lot of open questions; I’ll try to address some of them:

Gathering Context Information

The way current applications are written makes it difficult to find out what the user is doing. Clearly, some standard protocol is required for polling application state. I believe it can stay quite simple, revealing only very basic data.
For the red-eye example, it would be sufficient to know that the current application is “GQView” and that the active document therein has the description line “Canon EOS 300X”: the different image viewer and camera names will form clusters in the dataset, and the combination of application “GQView” plus origin “Canon EOS 300X” will match a cluster that could be tagged as “viewing a photograph”.
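As an illustration of how simple such a polling protocol could stay, here is a sketch of what an application might return when polled; the field names are invented for this example:

    import json

    def poll_context():
        """What a context-polling protocol might expose: only coarse,
        application-level facts, no document contents."""
        return {
            "app": "GQView",                  # active application
            "document": "IMG_1234.JPG",       # name of the active document
            "description": "Canon EOS 300X",  # e.g. taken from EXIF metadata
        }

    print(json.dumps(poll_context(), indent=2))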

In any case, a full-blown semantic interface[5] is not required. Here again, statistics are superior to an approach which tries to understand the user.

Replay of Recorded User Actions

From a purely technical perspective, this is feasible right now, as it’s just a matter of executing a set of system calls in the correct order. To make it portable and robust, the actions should be recorded at some higher level, probably at the “appname.command” level. The usual security concerns apply, though [6].
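A sketch of replay at the “appname.command” level might look like this; the dispatch table is of course hypothetical, and a real system would hand the commands to the named applications through some scripting interface:

    # Hypothetical dispatch table from "appname.command" strings to
    # implementations; a real system would route these to the named
    # application via its scripting interface.
    HANDLERS = {
        "gimp.open":            lambda doc: print("opening", doc),
        "gimp.red_eye_removal": lambda doc: print("removing red eyes in", doc),
        "gimp.save":            lambda doc: print("saving", doc),
    }

    def replay(actions, document):
        """Replay a recorded action sequence against a document,
        refusing to run anything we don't know how to execute."""
        for action in actions:
            handler = HANDLERS.get(action)
            if handler is None:
                raise RuntimeError("cannot replay unknown action: " + action)
            handler(document)

    replay(["gimp.open", "gimp.red_eye_removal", "gimp.save"], "IMG_1234.JPG")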

Recording User Actions

An important point is that use of the DoIt-box gets recorded just like any other action. This provides valuable feedback. A successful DoIt-query leaves a special trace in the dataset: one of the results has been chosen and there’s no terminating undo-action. This way, useful queries get duplicated in the database, increasing their rank in future DoIt-results. (Compare this mechanism to the “Did you find this information useful?” box some help systems use.)
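In code, this feedback loop is almost trivial. As a sketch, reusing the hypothetical RECORDINGS list from above: a result that was chosen and not undone is simply appended again, and plain frequency counting does the rest.

    def record_feedback(recordings, context, query, chosen, undone):
        """If the user picked a result and did not undo it afterwards,
        duplicate the entry; frequency counting then raises its rank
        in future DoIt-results."""
        if chosen is not None and not undone:
            recordings.append((context, query, tuple(chosen)))

    record_feedback(RECORDINGS,
                    {"app": "GQView", "origin": "Canon EOS 300X"},
                    "red-eye",
                    ["gimp.open", "gimp.red_eye_removal", "gimp.save"],
                    undone=False)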

The biggest challenge, of course, is to retain users’ privacy.
I don’t know if this can be achieved. Even with a fully open and properly anonymized mechanism for sharing your computer actions with the world, fine-grained control is needed over what information may be shared and what should stay private. This would require per-application policies with common-sense presets like “at work” and “at home”.

A somewhat less comprehensive[7] approach would build the database from specialized help forums. These would allow volunteers to perform the desired actions on the target computers. The conversations recorded this way would form the database’s text body. In other words, the forums’ current “explain it to me” mode would be replaced by “do it for me once”. The nice thing is that common queries (the FAQ analog) would not burden the forums, as they yield successful results right away.

Spam Protection

In principle, spam should be filtered out automatically by the ranking system, as it favours results which have actually been applied by users. But a spammer could easily simulate users applying these fake actions and thus increase their rank. So some sort of trust system probably has to be implemented, which in turn might conflict with the privacy goal. No low-hanging fruit here.
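One conceivable shape for such a trust system, sketched below: instead of counting every recorded application once, weight it by the contributor’s trust, so a spammer simulating thousands of fake applications gains little rank. How those trust values would be assigned, and anonymized, is exactly the open problem.

    def trusted_rank(applications, trust):
        """Rank action sequences by trust-weighted application count.
        `applications` is a list of (contributor, actions) pairs;
        unknown contributors get a small default weight."""
        scores = {}
        for contributor, actions in applications:
            key = tuple(actions)
            scores[key] = scores.get(key, 0.0) + trust.get(contributor, 0.1)
        return sorted(scores, key=scores.get, reverse=True)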

Parameters

Not all tasks are digital ones. If the example user typed “brighten” or “too dark” for the photo, it would surely be better to show the classic brightness/contrast sliders than to let her choose from 100 pictures with brightness set from 1 to 100 percent. So at least two different classes of results must be handled. For actions which contain parameters, the difficult part is choosing which parameters should be presented to the user. They should probably be ranked by frequency of use.
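Here is a sketch of how many recordings of one parameterized action could be collapsed into a single result with a control attached, offering the most frequently used values as presets (all names hypothetical):

    from collections import Counter

    def parameter_result(recordings, action, param):
        """Collapse many recordings of one parameterized action into a
        single result: expose the parameter as a slider/control and
        suggest the most frequently used values as presets."""
        values = Counter(rec["params"][param] for rec in recordings
                         if rec["action"] == action and param in rec["params"])
        return {"action": action,
                "control": param,
                "presets": [v for v, _ in values.most_common(3)]}

    recordings = [
        {"action": "gimp.brightness", "params": {"level": 40}},
        {"action": "gimp.brightness", "params": {"level": 40}},
        {"action": "gimp.brightness", "params": {"level": 70}},
    ]
    print(parameter_result(recordings, "gimp.brightness", "level"))
    # -> {'action': 'gimp.brightness', 'control': 'level', 'presets': [40, 70]}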

Ultimately, the following utopian workflow should be possible:
The user wants to make a 3D animation of her photograph flying through space. Searching the internet, she finds an example of the moon rotating around the earth with a portrait projected onto it. The creator of that animation was kind enough to publish the actions required to build the movie (either inside the movie file or in the DoIt-database). Now the user swaps in her own source pictures, adjusts the rotation speed and gets her movie rendered.
In this example, lots of parameters and options are involved. Most of them are integral to the desired task. Only some of them allow variation without degrading the result. Currently, it’s not clear how to identify the latter parameters. However, the aforementioned paper[2] has some interesting thoughts on how parameters’ effects can be defined (and explained) by example.

Coda

So I’m with Braydon Fuller: CLI is not the right term for the future. While anything human-readable must be line-based, this new interface is not even explicitly command-based.

And the power users who want to use a powerful language have been neglected as always…

Footnotes

[1] During Aza’s talk at Google, the phrase “Google DoIt” had been used before, but I think that referred to the final state of Google’s search box being filled up with features like a translator, a calculator, etc., with users being confused about the syntax for accessing these functions.

[2] Bret Victor’s “Magic Ink” paper is well worth reading, not only for its wide definition of “context”:
http://worrydream.com/MagicInk/

[3] Using computers has always been a social process, requiring lots of communication between developers and users, and among these groups. The goal here is to reduce the amount of communication needed for solving the accidental difficulties of computing. The web has already helped in this way, as specialized tutorials often spare users from learning matters which are not essential to their goals. The next step is to minimize the burden of writing and reading these tutorials by making this knowledge available in an automated fashion. After all, it’s the accidental difficulties of computing which leave so many people frustrated with today’s user interfaces.

[4] Consider Amazon.com’s sometimes frighteningly well-matched book suggestions. Another site, last.fm, makes suggestions based on the user’s taste in music; with today’s algorithms, tackling this task directly from the waveform data is not even thinkable. And let’s not forget that Google’s web-search power does not stem from text understanding, but from human-contributed data in the form of links.

[5] Who would be able to design a stable semantic interface for application state, and above all, who would be willing to implement it for each and every application? Quite the other way round: statistics can be used to research the semantics of user actions, but that’s not required here.

[6] From an abstract perspective, it’s a matter of trust (which brings PGP’s Web of Trust to mind). Some sandboxing, like executing the whole data processing on a virtualization server or in a Java applet, might lower the required amount of trust. In any case, a reliable undo function is mandatory, but it doesn’t protect against invisible side effects.

[7] In a way, the collected computing knowledge of mankind is applied every day in the form of (mouse-)commands. Only a small fraction of it is available in the form of easily applicable tutorials. With automated analysis of user actions, a much larger amount of knowledge could be made available with less effort.
