Architectures for intelligent systems

by J. F. Sowa
People communicate with each other in sentences that
incorporate two kinds of information: propositions about some subject, and
metalevel speech acts that specify how the propositional
information is used—as an assertion, a command, a question, or a promise.
By means of speech acts, a group of people who have different areas of
expertise can cooperate and dynamically reconfigure their social
interactions to perform tasks and solve problems that would be difficult
or impossible for any single individual. This paper proposes a framework
for intelligent systems that consist of a variety of specialized
components together with logic-based languages that can express
propositions and speech acts about those propositions. The result is a
system with a dynamically changing architecture that can be reconfigured
in various ways: by a human knowledge engineer who specifies a script of
speech acts that determine how the components interact; by a planning
component that generates the speech acts to redirect the other components;
or by a committee of components, which might include human assistants,
whose speech acts serve to redirect one another. The components
communicate by sending messages to a Linda-like blackboard, in which
components accept messages that are either directed to them or that they
consider themselves competent to handle.
In the years since its founding
conference in 1956, the field of artificial intelligence (AI) has developed an impressive array of specialized components for intelligent systems, including parsers, theorem provers, learning programs, and planning systems. Over the past 40 years, the performance, reliability, and generality of these components have been vastly improved. Their theoretical foundations are much better understood, and they have found their way into applications that are no longer considered part of AI.

The lack of progress in building general-purpose intelligent systems could be explained by several different hypotheses: that human intelligence cannot be simulated on a digital computer; that the ideal architecture for an intelligent system has not yet been found; or that human intelligence is so flexible that no single, fixed architecture can do more than address a narrow range of tasks.
Many people have presented strong, but not completely convincing
arguments for the first hypothesis.1,2,3
In the search for an ideal architecture, others have implemented a variety of, at best, partially successful designs. The purpose of this paper is to explore the third hypothesis by proposing a flexible modular framework that can be tailored to an open-ended variety of architectures for different kinds of applications. The tailoring could be done by a human knowledge engineer who uses specialized design tools, by a planning component that generates speech acts to redirect the other components, or by the components themselves, whose speech acts serve to redirect one another.

The idea of a flexible modular framework (FMF) is not new. It underlies the design principles that have allowed UNIX-like operating systems to be adapted to an enormous range of hardware and applications: small modules, each of which does one thing well; a common mechanism for passing the output of one module to the input of another; a scripting language that serves as the glue for combining modules; and data represented in linear strings of characters.
The first three principles are as valid today as they ever were, but the fourth has been modified to accommodate modules that require data with more structure than linear strings, especially database management systems (DBMSs). The most promising candidate for a glue language is Elephant 2000, which McCarthy4 proposed as a design goal for the programming languages of the future: a language whose statements are speech acts, which can express propositions in logic, refer to previous messages, and commit the speaker to future actions. Although the glue language for intelligent systems should be at a higher level than the scripting languages of conventional operating systems, it can play the same role of combining independently developed modules into an integrated system.
This paper shows how a framework based on these four principles can
support a family of architectures that can easily be tailored for
different kinds of applications. The next section discusses three
logically equivalent notations for an Elephant-like glue language:
controlled natural language for the human interface; conceptual graphs for
components that use graph-based algorithms; and the Knowledge Interchange Format (KIF) for components based on predicate calculus. Later sections discuss the use of controlled natural languages for knowledge acquisition, graph algorithms for reasoning and learning, the relationship between expressive power and computational complexity, and the design and implementation of the flexible modular framework itself.
Notations for logic

McCarthy's Elephant language requires a highly expressive version of logic, but he did not propose any particular notation for it. Although various notations—graphical, linear, or stylized natural language—can express logically equivalent content, they differ in readability and in the kinds of processing they best support.

For his syllogisms, the first version of formal logic, Aristotle defined a highly stylized form of Greek, which became the world's first controlled natural language. During the Middle Ages, Aristotle's sentence patterns were translated to controlled Arabic and controlled Latin, and they became the major form of logic until the twentieth century. Table 1 lists the names of the four types of propositions used in syllogisms and the corresponding sentence patterns that express them.
With letters such as A and B in the sentence patterns, Aristotle introduced the first known use of variables in history. Each letter represents some category, which the Scholastics called praedicatum in Latin and which became predicate in English. If necessary, the verb form "is" may be replaced by "are," "has," or "have" in order to make grammatical English sentences. Although the patterns may look like English, they are limited to a highly constrained syntax and semantics: each sentence has exactly one quantifier, at most one negation, and a single predicate that is true or false of the individuals indicated by the subject. Although Aristotle's syllogisms are the oldest version of formal logic,
they are still an important subset of logic, which forms the foundation
for description logics, such as DAML and OIL.

Another important version of logic is the Horn-clause subset, which is widely used for defining expert system rules and logic programs, such as those written in Prolog. Horn-clause rules can also be stated in a controlled natural language, as in the following two rules, written in Attempto Controlled English, about the circulation of books in a library:

   If a copy of a book is checked out to a borrower and a staff member returns the copy then the copy is available.

   If a staff member adds a copy of a book to the library and no catalog entry of the book exists then the staff member creates a catalog entry that contains the author name of the book and the title of the book and the subject area of the book and the staff member enters the id of the copy and the copy is available.

The Attempto system translates these rules to an executable form in
Prolog. Anyone who can read English can read controlled English as if it
were English, but controlled languages are formal languages that require
some training for an author to stay within their limitations. Tools have
been developed that can help an author translate full natural language to
a controlled natural language (CNL).

In the late nineteenth century, three logically equivalent, but structurally very different notations for first-order logic (FOL) were invented: Frege's Begriffsschrift; the algebraic notation of Peirce, which with a change of symbols by Peano became today's predicate calculus; and Peirce's existential graphs, which are the logical basis for conceptual graphs (CGs). As an example, Figure 1 shows the conceptual graph for the sentence John is going to Boston by bus.
The boxes in Figure 1 are called concepts, and the circles are called conceptual relations. The default quantifier for each concept is the existential, which says that something of the specified type exists; the concept [City: Boston] means that there exists a city named Boston. Each conceptual relation has one or more arcs: (Agnt) links a concept that represents an action to the concept that represents its agent; (Inst) links the action to its instrument; and (Dest) links an action that involves motion to its destination. All the relations in Figure 1 are dyadic, but in general, a conceptual relation may have any number of arcs. Although the display form is quite readable, it is not easy to type or to transmit across a network. Therefore, two interchange formats have been developed: the Conceptual Graph Interchange Format (CGIF) maps directly to and from the display form; and the Knowledge Interchange Format (KIF) maps directly to and from the algebraic notation for predicate calculus. Following is the CGIF representation of Figure 1:
   [Go: *x] [Person: 'John' *y] [City: 'Boston' *z] [Bus: *w] (Agnt ?x ?y) (Dest ?x ?z) (Inst ?x ?w)

This statement captures every detail of the display form except the two-dimensional layout, which is not semantically relevant. If desired, the layout information could be included as structured comments inside the brackets and parentheses that enclose the nodes of the graph. The connections between concepts and relations, which are shown directly by the arcs of the graph in Figure 1, are shown indirectly by labels, such as ?x and ?y in CGIF. Those labels are translated to variables in KIF, as in the following example:
   (exists ((?x go) (?y person) (?z city) (?w bus)) (and (name ?y John) (name ?z Boston) (agnt ?x ?y) (dest ?x ?z) (inst ?x ?w)))

KIF notation is used for many theorem provers and inference engines that are based on predicate calculus. The translations between KIF and CGIF preserve the semantics: a mapping from KIF to CGIF and back to KIF might not generate an identical statement, but it will generate a statement that is logically equivalent. KIF and conceptual graphs can represent the full range of operators and quantifiers of first-order logic, and they have been extended with metalevel features that can be used to define extensions to FOL, including modal logic and higher-order logic. The metalevel features are necessary for representing the speech acts of Elephant 2000, which uses logic to talk about the use of logic.

In natural languages, metalevels are marked by a variety of syntactic features that delimit the context of the metalanguage from the context of the object language. The most obvious delimiters are quotation marks, but similar contexts are introduced by verbs that express what some agent says, thinks, believes, requests, wants, promises, or hopes. As an example, the following English sentence contains two nested levels, which are enclosed in brackets for emphasis: Tom believes [Mary wants [to marry a sailor]]. This sentence is represented by the CG in Figure 2.
The context of Tom's belief is represented by a concept of type Proposition, which contains a nested CG that states the proposition. The context of Mary's desire is represented by a concept of type Situation, which is described by a proposition that is stated by the nested CG. The (Expr) relation represents the experiencer of a mental state, and the (Thme) relation represents the theme. In general, the theme of a belief or an assertion is a proposition, but the theme of a desire must be something physical, such as a situation. Following is the CGIF equivalent of Figure 2:
   [Person: *x1 'Tom'] [Believe *x2] (Expr ?x2 ?x1)
      (Thme ?x2 [Proposition:
         [Person: *x3 'Mary'] [Want *x4] (Expr ?x4 ?x3)
         (Thme ?x4 [Situation:
            [Marry *x5] (Agnt ?x5 ?x3) (Thme ?x5 [Sailor]) ]) ])

And following is the equivalent KIF statement.
   (exists ((?x1 person) (?x2 believe))
      (and (name ?x1 Tom) (expr ?x2 ?x1)
         (thme ?x2
            (exists ((?x3 person) (?x4 want) (?x8 situation))
               (and (name ?x3 Mary) (expr ?x4 ?x3) (thme ?x4 ?x8)
                  (dscr ?x8
                     (exists ((?x5 marry) (?x6 sailor))
                        (and (agnt ?x5 ?x3) (thme ?x5 ?x6)))))))))

The context boxes delimit the scope of quantifiers and other logical operators. The sailor, whose existential quantifier occurs inside the context of Mary's desire, which itself is nested inside the context of Tom's belief, might not exist in reality. Following is another sentence that makes it clear that the sailor does exist: There is a sailor that Tom believes Mary wants to marry. This sentence corresponds to the CG in Figure 3.
The English sentence mentions the sailor before introducing any verb that creates a nested context. Therefore, the concept [Sailor] in Figure 3, with its implicit existential quantifier, is moved outside any nested context. In the CGIF and KIF notations, the concept or the quantifier that refers to the sailor would be moved to the front of the statement. Another possibility, represented by the sentence Tom believes there is a sailor that Mary wants to marry, could be represented by moving the concept [Sailor] into the middle context, which represents Tom's belief. In CGIF and KIF, the corresponding concept or quantifier would also be moved to the context of Tom's belief. As these examples illustrate, conceptual graphs in the display form are more readable than either CGIF or KIF. There are two reasons for the improved readability: first, the cross-references that must be indicated by labels or variables in the linear notations are shown by direct connections between the nodes of the graph; second, the scope of quantifiers and other logical operators is shown by the nesting of the context boxes instead of by the order and grouping of parentheses.
Besides human readability, graphs also have theoretical and computational advantages, which are discussed in the section on graph algorithms.
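As a rough illustration of how graphs of this kind might be stored and serialized by a program, the following Python sketch represents the graph of Figure 1 as a list of concept and relation nodes and emits CGIF-style and KIF-style strings. The sketch is not taken from any of the systems cited in this paper; the class and method names are invented for the example, and it handles only simple existential graphs without nested contexts.

   from dataclasses import dataclass, field
   from typing import Optional

   @dataclass
   class Concept:
       ctype: str                      # e.g. "Go", "Person"
       referent: Optional[str] = None  # e.g. "John"; None for an unnamed existential
       label: str = ""                 # coreference label, e.g. "x"

   @dataclass
   class Relation:
       rtype: str                      # e.g. "Agnt"
       args: tuple = ()                # labels of the concepts it connects

   @dataclass
   class CGraph:
       concepts: list = field(default_factory=list)
       relations: list = field(default_factory=list)

       def to_cgif(self) -> str:
           parts = []
           for c in self.concepts:
               ref = f"'{c.referent}' " if c.referent else ""
               parts.append(f"[{c.ctype}: {ref}*{c.label}]")
           for r in self.relations:
               parts.append("(" + r.rtype + " " + " ".join("?" + a for a in r.args) + ")")
           return " ".join(parts)

       def to_kif(self) -> str:
           decls = " ".join(f"(?{c.label} {c.ctype.lower()})" for c in self.concepts)
           body = [f"(name ?{c.label} {c.referent})" for c in self.concepts if c.referent]
           body += ["(" + r.rtype.lower() + " " + " ".join("?" + a for a in r.args) + ")"
                    for r in self.relations]
           return f"(exists ({decls}) (and {' '.join(body)}))"

   # The graph of Figure 1: "John is going to Boston by bus."
   fig1 = CGraph(
       concepts=[Concept("Go", None, "x"), Concept("Person", "John", "y"),
                 Concept("City", "Boston", "z"), Concept("Bus", None, "w")],
       relations=[Relation("Agnt", ("x", "y")), Relation("Dest", ("x", "z")),
                  Relation("Inst", ("x", "w"))])

   print(fig1.to_cgif())   # [Go: *x] [Person: 'John' *y] [City: 'Boston' *z] [Bus: *w] ...
   print(fig1.to_kif())    # (exists ((?x go) (?y person) ...) (and (name ?y John) ...))

A fuller treatment would also need coreference resolution and the nested contexts illustrated by Figures 2 and 3.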
Using controlled natural languages

During the 1980s, the dominant approach to knowledge acquisition required two kinds of highly trained, highly paid professionals. At the top of Figure 4, a knowledge engineer is interviewing a subject matter expert in order to capture her knowledge and encode it in the arcane formats of an AI system. Meanwhile, computational linguists, who were designing natural-language tools, tried to make them translate NL documents into similar encodings without requiring any human intervention. At the bottom of Figure 4, a physician who is examining a patient scribbles some notes on a sheet of paper, which some clerk will later transcribe for the computer. Then the NL tools will attempt to convert those notes to the formats specified by the knowledge engineer.
There are two things wrong with Figure 4: the top row requires far too much human effort, and the bottom row is expected to process unrestricted natural language without any human assistance. To reduce the cost of two high-priced experts, some developers merged the two roles at the top row into one: either the subject matter expert learned knowledge engineering, or the knowledge engineer learned enough about the subject matter to extract knowledge from documents. Yet people with expertise in both fields became even more expensive to find, hire, and train. Figure 5 shows a better alternative: simplify the tools and the training required by the people who use them. Instead of designing complex NL tools that process documents without human intervention, AI researchers developed simpler knowledge extraction (KE) tools that can extract knowledge from documents with assistance from just one human editor. Furthermore, the editor communicates with the KE tools in a controlled natural language, which people can read without special training.
The editor in Figure 5 represents various people who at different times might play different roles with respect to the subject matter, the computer system, and the people and activities involved with them. Each of the three people mentioned in Figure 4 has a different kind of expertise. Any of them might use KE tools to edit their knowledge or to write a note, a report, or a book that someone else might edit with the aid of KE tools. Following are the three kinds of knowledge, the roles of the two experts in Figure 4, and the way that KE tools can help the editor in Figure 5 do the work of both:
From an editor's point of view, a KE system looks like an intelligent word processor combined with sophisticated tools for searching, classifying, summarizing, and paraphrasing. After the output has been revised by an editor, who might be the original author of the documents, the result can be stored in a knowledge base or be written as an annotation to the documents. As examples of KE tools, Doug Skuce15-17 has designed an evolving series of knowledge extraction and document management tools. All input to the knowledge base, whether generated by the KE tools or entered directly by an editor, is represented in a CNL called ClearTalk. The KE tools have the following advantages over the older systems represented by Figure 4:
As an example, the students in Skuce's operating systems course used the KE tools to map information from on-line Linux** manuals to a knowledge base for a Linux help facility. The people who wrote the manuals were experts, but the students who edited the knowledge base were novice users of both Linux and the KE tools. As another example, Skuce built a simple knowledge base about animals for his 9-year-old daughter's school project. She and her class could browse the knowledge base on the Web, and they had no difficulty in understanding every fact presented in ClearTalk. Over the past 30 years, many natural-language query systems have been developed that are much easier to use than SQL. Unfortunately, one major stumbling block has prevented them from becoming commercially successful: the amount of effort required to define the vocabulary terms and map them to the appropriate fields of the database is a large fraction of the effort required to design the database itself. However, if appropriate KE tools are used to design the database, the vocabulary needed for the query system can be generated as a by-product of the design process. As an example, the RÉCIT system18,19 uses KE tools to extract knowledge from medical documents written in English, French, or German and translates the results to a language-independent representation in conceptual graphs. The knowledge extraction process defines the appropriate vocabulary, specifies the database design, and adds new information to the database. The vocabulary generated by the KE process is sufficient for end users to ask questions and get answers in any of the three languages. Translating an informal diagram to a formal notation of any kind is as difficult as translating informal English specifications to executable programs. But it is much easier to translate a formal representation in any version of logic to controlled natural languages, to various kinds of graphics, and to executable specifications. Walling Cyre and his students have developed KE tools for mapping both the text and the diagrams from patent applications and similar documents to conceptual graphs.20-22 Then they implemented a scripting language for translating the CGs to circuit diagrams, block diagrams, and other graphic depictions. Their tools can also translate CGs to VHDL, a hardware design language used to specify very high-speed integrated circuits (VHSICs). Design and specification languages have multiple metalevels. As an
example, the Unified Modeling Language (UML) has four levels: the
metametalanguage defines the syntax and semantics of the UML notations;
the metalanguage defines the general-purpose UML types; a systems analyst
defines application types as instances of the UML types; finally, the
working data of an application program consists of instances of the
application types. To provide a unified view of all these levels, Olivier
Gerbé and his colleagues at the DMR Consulting Group implemented design
tools that use conceptual graphs as the representation language at every
level.23-27
For his Ph.D. dissertation, Gerbé developed an ontology for using CGs as
the metametalanguage for defining CGs themselves.28
He also applied it to other notations, including UML and the Common KADS
system for designing expert systems. Using that theory, Gerbé and his
colleagues developed the Method Repository System as an authoring
environment for editing, storing, and displaying the methods used by the
DMR consultants. Internally, the knowledge base is stored in conceptual
graphs, but externally, the graphs can be translated to Web pages in
either English or French. About 200 business processes have been modeled
in a total of 80

No single system discussed in this paper incorporates all the features desired in a KE system, but the critical research has been done, and the remaining work requires more development effort than pure research. Figure 6 shows the flow of information from documents to logic and then to documents or to various computational representations. The dotted arrow from documents to controlled languages requires human assistance. The solid arrows represent fully automated translations that have been implemented in one or more systems.
For all these tools, the unifying representation language is logic, which could be represented in KIF, CGs, or other notations specialized for various tools. Aristotelian syllogisms together with Horn-clause rules provide sufficient expressive power to specify a Turing machine, and they support efficient computational mechanisms for executing the specifications. For database queries and constraints, statements in full first-order logic can be translated to SQL. All these subsets, however, use the same vocabulary of natural-language terms, which map to the same ontology of concepts and relations. From the user's point of view, the system communicates in a subset of natural language, and the differences between tools appear to be task-related differences rather than differences in language.
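To make the claim about Horn-clause rules concrete, the following Python sketch applies a simplified version of the first library rule from the Attempto example by naive forward chaining over ground facts. It is an illustrative toy, not part of any system described in this paper, and the predicate names are invented for the example.

   def unify(pattern, fact, bindings):
       """Match one premise pattern against one fact, extending the variable bindings."""
       if len(pattern) != len(fact):
           return None
       result = dict(bindings)
       for p, f in zip(pattern, fact):
           if p.startswith("?"):                 # a variable
               if p in result and result[p] != f:
                   return None
               result[p] = f
           elif p != f:                          # a constant that must match exactly
               return None
       return result

   def match_all(premises, facts, bindings):
       """Yield every binding that satisfies all premises against the current facts."""
       if not premises:
           yield bindings
           return
       for fact in facts:
           b = unify(premises[0], fact, bindings)
           if b is not None:
               yield from match_all(premises[1:], facts, b)

   def substitute(pattern, bindings):
       return tuple(bindings.get(t, t) for t in pattern)

   def forward_chain(facts, rules):
       """Apply the rules repeatedly until no new facts can be derived."""
       facts = set(facts)
       changed = True
       while changed:
           changed = False
           for premises, conclusion in rules:
               for b in list(match_all(premises, facts, {})):
                   new_fact = substitute(conclusion, b)
                   if new_fact not in facts:
                       facts.add(new_fact)
                       changed = True
       return facts

   # Simplified form of the first library rule: if a copy is checked out to a
   # borrower and a staff member returns the copy, then the copy is available.
   rules = [([("checked_out", "?copy", "?borrower"), ("returns", "?staff", "?copy")],
             ("available", "?copy"))]
   facts = [("checked_out", "copy17", "borrower3"), ("returns", "staff1", "copy17")]
   print(forward_chain(facts, rules))   # includes ('available', 'copy17')

Real rule engines, and the Prolog programs generated by Attempto, use indexing and unification over structured terms rather than this exhaustive search, but the logical content of the rule is the same.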
Graph algorithms

For many purposes, graphs are a natural representation that is isomorphic to the structure of an application: maps with cities as nodes and highways as arcs; flow diagrams through programs, electrical wiring, and plumbing; the valence bonds between atoms of an organic molecule; the communication links in a computer network; the reference patterns between documents and Web sites on the Internet; and the semantics of natural languages with their complex phrase structures and anaphoric references. When such networks are represented by strings or matrices, the resulting data structures tend to make inefficient use of storage space, execution time, or both. This section surveys five important components of an intelligent system that can benefit from graph-based algorithms: knowledge storage and retrieval, deduction and theorem proving, learning, analogical reasoning, and the interpretation of natural language.
In all five of these areas, the direct connectivity of CGs and their nested contexts support algorithms that are simpler and more efficient than algorithms on linear strings and tables. For these reasons, most reasoning systems in AI, even those that use linear notations externally, use tree and graph data structures internally. During the 1970s, the database field was embroiled in a controversy between the proponents of the new relational DBMS, which stored data in tables, and the proponents of older DBMS systems, which stored data in networks or hierarchies. For many applications, the network and hierarchical systems had better performance, but the relational systems became the universal standard because their logic-based query languages, such as SQL, were far easier to use than the navigational systems, which required a link-by-link traversal of the networks. The battle for network DBMSs was finally lost when one of the staunchest defenders claimed that ease of use was not important because “programmers enjoy a challenge.” Today, network systems have come back into vogue as the foundation for object-oriented DBMSs, which represent the connections between objects more directly than the now standard RDBMS. Yet the query languages for OODBMSs (object-oriented DBMSs) require the same kind of link-by-link traversals as the navigational methods of the 1970s. Unlike the logic-based SQL standard, the OODBMS query languages require far more programming effort, which must be specialized to the formats of each vendor. To support a more natural interface between humans and computers, Sowa29,30 proposed conceptual graphs as an intermediate language between natural languages and logic-based computer languages. For question-answering systems, a CG derived from a natural language question could be translated to logic-based query languages such as SQL or be matched against the graphs of a network DBMS. In principle, CGs could provide a high-level interface for any DBMS—relational, network, or object-oriented. However, there were two obstacles to using CGs as the universal interface to every DBMS: the natural language processors were not sufficiently robust to generate them, and the algorithms for searching network databases were too slow. The breakthrough in performance that made a CG database efficient was accomplished by Levinson and Ellis,31,32 who developed algorithms that could search a lattice of graphs in logarithmic time. Instead of navigating the networks link by link, their systems could take any query graph q and determine where it fit within the lattice. As a result, it would return two pointers: one would point to the lower sublattice of all graphs that are more specialized than q, and the other would point to the upper sublattice of all graphs that are more generalized than q. For deduction and theorem proving, Peirce33 discovered graph-based rules of inference, which are generalizations and simplifications of the rules of natural deduction by Gentzen.34 The beauty of Peirce's rules is that they make a perfect fit with a system that stores and retrieves graphs in a generalization hierarchy: Peirce's rules are based on the conditions in which any graph p may be replaced by a generalization of p or a specialization of p. Furthermore, the negation of any context reverses the ordering for all graphs in the context: if p is a generalization of q, then ~p is a specialization of ~q. 
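The ordering that such a hierarchy relies on can be sketched in a deliberately simplified form. In the following Python fragment a graph is approximated as a set of relation triples over constant identifiers, so that graph p counts as a generalization of graph q whenever p's triples are a subset of q's; the Levinson-Ellis algorithms handle variables and subgraph matching and organize the graphs in a lattice that can be searched in logarithmic time, neither of which this toy version attempts.

   def is_generalization(p, q):
       """Under the triple-set simplification, p generalizes q if p's triples all occur in q."""
       return p <= q

   def classify(query, stored):
       """Split stored graphs into those more general and those more specialized than query."""
       more_general = [g for g in stored if is_generalization(g, query)]
       more_special = [g for g in stored if is_generalization(query, g)]
       return more_general, more_special

   g1 = frozenset({("agnt", "go1", "john")})         # John is the agent of a going
   g2 = g1 | {("dest", "go1", "boston")}             # ... with a destination
   query = g2 | {("inst", "go1", "bus")}             # ... and an instrument

   print(classify(query, [g1, g2]))   # both stored graphs generalize the query; neither specializes it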
Esch and Levinson35,36 presented algorithms for combining Peirce's rules with search and retrieval from a generalization hierarchy, and one of Levinson's students, Stewart,37 implemented those algorithms in a high-speed theorem prover for first-order logic. Every proposition that was proved, either as a theorem or as an intermediate result, was stored in its appropriate place in the generalization hierarchy together with a pointer to its proof. During a proof, each possible step that could be generated by Peirce's rules was used as a query graph q to determine whether q or any specialization of q had already been proved. If so, the proof was done. A high-speed search and retrieval mechanism for generalization hierarchies of graphs can also be used as the basis for structural learning algorithms. Unlike neural networks and statistical algorithms, whose learning consists of changing numerical weights, a graph-based algorithm can learn arbitrarily large structures represented as graphs. To demonstrate that principle, Levinson38 used his search algorithms in a learning program that would learn to play board games, such as chess, by starting with no knowledge about the game other than the ability to make legal moves. His chess program, called Morph, learned chess by playing games with a tutor called Gnu Chess, which was a master-level program, but it could not improve its performance by learning. At the end of each game, Morph was told whether the game was won, lost, or drawn (usually lost, especially in the early stages of learning). Then Morph would estimate the values of all the intermediate positions achieved during the game by backpropagation from the final value (1.0 for a win, 0.5 for a draw, or 0 for a loss), and save the chess positions with their estimated values as graphs in the hierarchy. When it made a move, Morph would determine all the possible moves, look up the corresponding positions in the hierarchy, find the closest matching positions, and consider their previously estimated values. Morph would then make the move that led to the position with the best estimated value. After playing a few thousand games with its tutor, Morph would have a sufficient database of moves with estimated values to play a decent game of chess. To find analogies, Majumdar39 implemented a system called VivoMind, which represents knowledge in dynamic conceptual graphs. What makes the graphs dynamic are algorithms that pass messages along the nodes of a graph. Each node in a CG corresponds to an object that can pass messages to neighboring nodes. The result is an elegant generalization of the marker-passing algorithms originally implemented by Quillian40 and further developed by Fahlman41 and Hendler.42 For finding analogies, VivoMind has proved to be faster than other analogical reasoners on all the usual test cases. For a knowledge base with N relations, most analogy finders take time proportional to N cubed, but VivoMind finds the same analogies in time proportional to N log N. The contexts of conceptual graphs are based on Peirce's logic of existential graphs and his theory of indexicals. Yet the CG contexts happen to be isomorphic to the similarly nested discourse representation structures (DRSs), which Hans Kamp43,44 developed for representing and resolving indexicals in natural languages. When Kamp published his first version of DRS, he was not aware of Peirce's graphs. When Sowa30 published his book on conceptual graphs, he was not aware of Kamp's work. 
Yet the independently developed theories converged on semantically equivalent representations; therefore, Sowa and Way45 were able to apply Kamp's techniques to conceptual graphs. Such convergence is common in science; Peirce and Frege, for example, started from very different assumptions and converged on equivalent semantics for FOL, which 120 years later is still the most widely used version of logic. Independently developed, but convergent theories that stand the test of time are a more reliable basis for standards than the consensus of a committee. Although graphs are one of the most versatile representations, many good tools use other notations. A framework for intelligent systems should take advantage of different structural properties: some algorithms are more efficient on graphs, and some algorithms are more efficient on strings or tables. The logical equivalence of KIF and CGIF facilitates the mapping from one to the other. Their generality facilitates the integration with other languages that have more restricted expressive power, such as SQL (Structured Query Language), DAML, OIL, RDF (Resource Description Framework), and others. Components based on any of those languages can be integrated with a system that uses KIF and CGs as its primary languages.
Expressive power and computational complexity

The limitations of AI systems have often been blamed on the complexity of the required computations. Various solutions have been proposed, ranging from highly parallel networks that mimic the mechanisms of the human brain, to restricted languages that limit the complexity of the problem definition. A modular architecture could support components that use such strategies for special purposes: neural networks, for example, have been highly successful for pattern recognition, and restricted languages can be highly efficient for specific kinds of problems. For central communications among all components, however, the Elephant language used in the blackboard must be the most expressive, since it must transmit any information that any other component might use or generate. That extreme expressive power raises a question about the complexity of the computations needed to process it.

Computational complexity, however, is not a property of a language, but a property of the problems stated in that language. First-order logic has been criticized as computationally intractable because the proof of an arbitrary FOL theorem may take an exponentially increasing amount of time. That criticism, however, is misleading, since large numbers of problems stated in full FOL are easily solvable. Restrictions on the logic or the notation cannot make an intractable problem solvable; they merely make it impossible to state. The expressive power of Elephant does not slow down the communications from one component to another. The components that receive a communication are responsible for determining what they can do with it.

For certain kinds of problems, first-order logic can be the most efficient way to express them and to solve them. A typical example is answering a query in terms of a relational database. The answer to an SQL query that uses the full expressive power of FOL can be evaluated in at most polynomial time, with the exponent of the polynomial equal to the number of quantifiers in the query. If the quantifiers range over an indexed domain, the evaluation can often be done in logarithmic time. Evaluating a constraint against a relational database is just as efficient as evaluating a query; in fact, every constraint can be translated to a corresponding query that asks for all instances in the database that violate the constraint. In commercial SQL systems, queries and constraints with the expressive power of FOL are routinely evaluated with databases containing gigabytes and terabytes of data.

Although the time to solve an intractable problem may be very long, the time to detect the complexity class of a problem can be very short. Callaghan46 took advantage of syntactic criteria to subdivide the Levinson-Ellis graph hierarchies into several disjoint subhierarchies, each of which is limited to one complexity class. For each subhierarchy, he determined appropriate algorithms for efficiently classifying and searching that hierarchy. To determine the complexity class of any graph, Callaghan computed a signature, or descriptor, that summarizes its complexity properties. Each graph's descriptor would specify easily computable prerequisites (necessary conditions) that any matching graph must meet.
By precomputing the descriptor of a query graph, Callaghan accomplished several goals at once: determining the complexity of the search (tractability or decidability); narrowing the search to a particular class of graphs that have compatible descriptors; or determining whether the query graph lies outside the known complexity classes. Besides the subhierarchies of graphs supported by the Levinson-Ellis algorithms, Callaghan's approach can accommodate any external subsystem for which a suitable descriptor can be computed by simple syntactic tests. Among those subsystems are the relational databases, which are highly optimized for data stored in tables. In fact, the Levinson-Ellis hierarchies are complementary to an RDBMS: the kinds of data that are most efficient with one are the least efficient with the other. Other important subsystems include the specialized query languages of many versions of description logics. If a query graph lies outside of any of the known classes, it can be sent to a general first-order theorem prover. As a result, this approach can accept any query expressible in first-order logic, determine its complexity class, and send it to the most efficient subsystem for processing it. Mapping a smaller logic to a more expressive logic is always possible,
but the reverse mapping usually requires some restrictions. To map
information from a large, rich knowledge base to a smaller, more
efficiently computable one, Peterson, Andersen, and Engel47
developed a system they called the knowledge bus. Their source was
the Cyc** knowledge base,48,49
which contains over 500
For a particular application, the knowledge bus extracts a small subset of the Cyc knowledge base that can be processed more efficiently by simpler tools. Although some information and some potential inferences are lost, the extracted subset has a well-founded semantics that is guaranteed to be free of contradictions. Furthermore, the resulting subset is more portable: the inference engine can be used as an extension to any relational database, and Engel50 has developed techniques for mapping the definitions and axioms to Java classes that can be used in Web-based applications.
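The strategy that runs through this section, examining a problem statement with cheap syntactic tests and then routing it to the subsystem best suited to it, can be sketched in Python as follows. The descriptor fields, thresholds, and subsystem names below are simplified placeholders invented for this illustration, not the criteria actually used by Callaghan or by the knowledge bus.

   from dataclasses import dataclass

   @dataclass
   class Query:
       operators: set      # logical operators in the query, e.g. {"and", "not", "or"}
       max_arity: int      # largest relation arity that occurs
       max_nesting: int    # deepest nesting of negated or modal contexts

   def descriptor(q: Query) -> dict:
       """Cheap, purely syntactic properties of a query (placeholders for this sketch)."""
       return {
           "has_negation": "not" in q.operators,
           "has_disjunction": "or" in q.operators,
           "binary_only": q.max_arity <= 2,
           "shallow": q.max_nesting <= 1,
       }

   def dispatch(q: Query) -> str:
       """Route a query to the subsystem suited to its (estimated) complexity class."""
       d = descriptor(q)
       if not d["has_negation"] and not d["has_disjunction"]:
           return "conjunctive engine (Horn rules or RDBMS joins)"
       if d["binary_only"] and d["shallow"]:
           return "description-logic classifier"
       return "general first-order theorem prover"

   print(dispatch(Query({"and"}, 2, 0)))               # conjunctive engine (Horn rules or RDBMS joins)
   print(dispatch(Query({"and", "or", "not"}, 3, 2)))  # general first-order theorem prover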
A flexible modular framework

A framework based on Elephant and Linda would subsume anything that could be done with a more conventional scripting language. Natural languages can specify procedures with a sequence of imperative statements linked by adverbs such as "then" and "next." A translation of those statements into KIF or CGs would specify the same procedure. But natural languages and their translations into logic could also specify more complex speech acts that could dynamically reconfigure the components of an intelligent system and their ways of interacting.

In the original Linda system, the operators access a blackboard that contains tuples, which consist of sequences of arbitrary data. For a system that supports multiple languages, the first element of the tuple should identify the language so that the Linda system could immediately determine how to interpret the remainder. A general format would have six elements:

1. The language of the message.
2. An identifier for the component that sent the message.
3. An identifier for the intended receiver, if known; otherwise the message is available to any component that considers itself competent to handle it.
4. An identifier for the message itself, so that later messages can refer to it.
5. The speech act, such as assertion, question, command, or promise.
6. The propositional content, stated in the language named by the first element.
The Linda pattern matcher could use an ordinary string comparison for the first five elements of the tuple, but it would require a more general logical unification (of CGs or KIF statements) for the sixth. Unification of messages in controlled natural languages might be difficult to define, and the pattern matcher might need to translate the message to CGs or KIF if a pattern match is necessary. To simulate a conventional scripting language, the destination would always be specified, and the speech act would always be “command.” To access a relational database, the speech act would be “assertion” for an update, “question” for a query, or “definition” for creating a new table with a new format.

At the end of his book, Austin5 specified a large number of possible speech acts, and he insisted that his list was not exhaustive. Following are his five categories, his description of each, and a few of his examples:

Verdictives: the giving of a verdict or finding, as by a jury, arbitrator, or umpire; examples include acquit, grade, estimate, and diagnose.

Exercitives: the exercising of powers, rights, or influence; examples include appoint, order, advise, and warn.

Commissives: committing the speaker to a course of action; examples include promise, guarantee, bet, and oppose.

Behabitives: reacting to other people's behavior and fortunes and expressing attitudes; examples include apologize, congratulate, thank, and challenge.

Expositives: fitting an utterance into the course of an argument or conversation; examples include affirm, deny, concede, illustrate, and postulate.
The verbs listed in these examples illustrate the kinds of speech acts that people commonly perform, but in most cases, they omit the verb that specifies the speech act. A man who stands up in a meeting to shout something in an angry voice seldom begins with the words “I protest.” Yet the people in the audience would recognize that the new speaker is protesting rather than agreeing with the previous speakers. Computers, however, need to be told how to interpret such speech, and an explicit statement of the speech act would enable them to respond more “intelligently.” To illustrate the kinds of speech acts in an AI system, Figure 7 shows a kind of system discussed by Sowa.51 The boxes labeled FMF represent the same flexible modular framework that has been adapted to different kinds of tasks by changing the roles and the kinds of speech acts expected of the users. At the upper left, linguists, logicians, and philosophers are using the FMF to define a general ontology. Logicians could use FMF to enter the definitions and axioms for logical operators, set theory, and basic mathematical concepts and relations. Linguists could use it to enter the grammar rules of natural languages and the kinds of semantic types and relations. Philosophers could use FMF to collaborate with the linguists and logicians in analyzing and defining the fundamental ontologies of space, time, and causality common to all domains of application. The major speech act for these users would be definition, but they might also ask questions about how to use the system, and they might use verdictives to evaluate the work of their colleagues.
In the center of Figure 7, application developers use FMF to enter domain-dependent information about specific applications. Some of them would use FMF to define generic ontologies for industries such as banking, agriculture, mining, education, and manufacturing. Others would start with one or more generic ontologies and combine them or tailor them to a particular business, project, or application. The users in this mode would perform the same kinds of speech acts as the linguists, logicians, and philosophers. But they might put more emphasis on commissives, which would commit them to strict deadlines and performance goals. At the bottom right of Figure 7 the application users might interact with the FMF in an unpredictable number of ways. A business user with a job to do would have different requirements from those of a recreational user. Both, however, might react with behabitives, such as grumbling, complaining, or cursing, when the system does not do what they wish. But unlike more conventional systems, an FMF could apologize, sympathize, and commiserate. The examples shown in Figure 7 do not begin to exploit the kinds of opportunities offered by an FMF that is able to recognize and respond to a wide range of speech acts. An important reason for building an FMF is to explore new ways of interaction, either between computers and humans or among mixed committees of human and computer participants. The explicit recognition and marking of speech acts enables the components of an FMF to interact, negotiate, and cooperate more intelligently among themselves and with their human users.
Implementing the FMF

A major advantage of a flexible modular framework is that it does not have to be implemented all at once. The four design principles, which enabled UNIX-like systems to be implemented on anything from a wearable computer to the largest supercomputers, can also support the growth of intelligent systems from simple beginnings to a complete “society of mind,” as Minsky52 called it. For an initial implementation, each of the four principles could be reduced to the barest minimum, but any of them could be enhanced incrementally without disturbing any previously supported operations.
As an FMF is being developed, it can accommodate any mixture of a variety of components: newly designed components specially tailored for the FMF; legacy systems enclosed in a wrapper that translates their I/O formats to the common language of the blackboard; commercial products that perform specific services; experimental components that are being designed and tested in research projects; an open-ended variety of client interfaces specialized for different applications; remote servers distributed anywhere across the Internet. A blackboard is an ideal platform for supporting hot-swap or plug-n-play components. When a new component is added to the FMF, it would send a message to the blackboard to identify itself and the patterns of messages it accepts. It could then be invoked by any other component whose message matches the appropriate pattern. To take advantage of that flexibility, the Jini system uses the Linda operators to accommodate any kind of I/O device that might be attached to a network. But for intelligent systems, it is even more important to have that flexibility at the center instead of the periphery. Any server anywhere on the Internet could be converted to an intelligent agent by using an FMF as its front end. It could then respond to requests from other FMF servers anywhere else on the Internet. Each FMF would be, in Minsky's terms, a society of mind, and the entire Internet would become a society of societies. Human users could have a personal FMF running on their own computers, which could communicate with any other FMF to request services. The traditional help desks, in which a human expert answers the same questions repeatedly for multiple users, could be replaced by a human teacher or editor, as in Figure 5, who would build a knowledge base. That knowledge base would drive a specialized FMF, which could be consulted by the personal FMF of anyone who asks a relevant question. The intelligence accessible to any user would then be the combined intelligence of his or her personal FMF together with every FMF accessible to it across the Internet.
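A minimal sketch of such a blackboard, written in Python, is given below. The six-element message format follows the general format described earlier; the class names, the matching rule, and the registered component are invented for this illustration, and a real FMF would need logical unification over the content field rather than the simple equality test used here.

   from dataclasses import dataclass
   from typing import Callable, Optional

   @dataclass
   class Message:
       language: str              # e.g. "KIF", "CGIF", or a controlled natural language
       sender: str
       receiver: Optional[str]    # None means: any component that considers itself competent
       msg_id: str                # so that later messages can refer to this one
       speech_act: str            # "assertion", "question", "command", "promise", ...
       content: str               # the proposition, stated in the given language

   class Blackboard:
       def __init__(self) -> None:
           self.handlers = []     # (pattern, handler) pairs

       def register(self, pattern: dict, handler: Callable) -> None:
           """A component announces the pattern of messages it accepts."""
           self.handlers.append((pattern, handler))

       def post(self, msg: Message) -> None:
           """Deliver the message to every component whose registered pattern matches it."""
           for pattern, handler in self.handlers:
               if all(getattr(msg, field) == value for field, value in pattern.items()):
                   handler(msg)

   # A toy query-answering component that accepts KIF questions not addressed to anyone else.
   def answer_question(msg: Message) -> None:
       print(f"{msg.msg_id}: answering a {msg.language} question from {msg.sender}: {msg.content}")

   board = Blackboard()
   board.register({"language": "KIF", "speech_act": "question", "receiver": None}, answer_question)
   board.post(Message("KIF", "planner", None, "m42", "question",
                      "(exists ((?x copy)) (available ?x))"))

Legacy systems or remote servers could be attached in the same way by wrapping their interfaces in handlers that translate between their native formats and the common language of the blackboard.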
Cited references and notes
*Trademark or registered trademark of International Business Machines Corporation. **Trademark or registered trademark of The Open Group, Apple Computer, Inc., Microsoft Corporation, Sun Microsystems, Inc., Linus Torvalds, or Cycorp, Inc.