ELIZA, written in 1964–66, was one of the first programs that conversed in English. A version, called DOCTOR, analyzed language and followed a script to play the role of a psychotherapist. The author, Joseph Weizenbaum [1976], was shocked by the reaction to the program:
A number of practicing psychiatrists seriously believed the DOCTOR computer program could grow into a nearly completely automatic form of psychotherapy. …
I was startled to see how quickly and how very deeply people conversing with DOCTOR became emotionally involved with the computer and how unequivocally they anthropomorphized it. …
Another widespread, and to me surprising, reaction to the ELIZA program was the spread of a belief that it demonstrated a general solution to the problem of computer understanding of natural language.
– J. Weizenbaum [1976]
Over 50 years after ELIZA, natural language systems are much more sophisticated and much more widespread. The issues that Weizenbaum outlined have become more pressing, and new issues have arisen.
Bender et al. [2021] outline many problems with modern systems based on learning from huge corpora. These include the following:
Data: biases in large uncurated datasets include stereotypical and derogatory language along the lines of gender, race, ethnicity, and disability status. Most content on the Internet is created by the privileged, those with the time and access to create content, whereas marginalized people experience harassment that discourages participation in open fora. Increasing the size of a dataset does not guarantee diversity, because while the number of diverse views might increase, their proportion tends not to. Datasets curated for other reasons, such as Wikipedia, have biases about who is included (“notable” people) and how much information is included about each person. Blodgett et al. [2020] provide a review of biases in natural language processing. Bender et al. [2021] recommend allocating significant resources to dataset curation and documentation practices in order to mitigate these biases.
Learning: once you have the data, training the models is very energy intensive. Generating that energy creates greenhouse gases, or diverts energy from other uses that must then be met by sources that create greenhouse gases. Only rich corporations and governments can afford to train these models, and they accrue the benefits, but it is poorer countries that disproportionately bear the risks. The expense of training thus increases inequality.
Use of models: once the models have been trained, they can be used in various ways. Used as generative language models, they are particularly useful for those who want to spread misinformation; they can generate seemingly plausible text which may or may not correspond to the truth. For example, someone could claim to want fiction, ask for a completion of “the atrocities committed by target included”, and then spread the output as factual information in order to recruit people to a cause against target. AI systems are trained to optimize some score, which may not correspond to what the user wants to optimize. McGuffie and Newhouse [2020] discuss the risk of large language models being weaponized by extremists for radicalization.
Large language models are trained on text only, but language understanding also involves meaning; there is typically a world that the text is about, and understanding meaning involves not just the text but the interaction of the text, the world, and the intent of the author [Bender and Koller, 2020; Bisk et al., 2020]. People will assume that the output is the truth, but when the data used for training are not necessarily true and the inference is opaque, there is no reason to have confidence in the outputs. The preoccupation with improving scores on well-defined, but artificial, tasks – those the AI systems are trained to optimize – has diverted resources away from other research.
Feeding AI systems on the world’s beauty, ugliness and cruelty and expecting it to reflect only the beauty is a fantasy.
– Prabhu and Birhane [2020]
Deep learning has had success in scientific fields where large datasets can be created. For example, the problem of protein folding – determining the three-dimensional structure of a protein – is one of the successes of transformer-based models. The predictions of these programs have changed how chemists work:
Today, thanks to programs like AlphaFold2 and RoseTTAFold, researchers like me can determine the three-dimensional structure of proteins from the sequence of amino acids that make up the protein – at no cost – in an hour or two. Before AlphaFold2 we had to crystallize the proteins and solve the structures using X-ray crystallography, a process that took months and cost tens of thousands of dollars per structure.
– M. Zimmer [2022]
Better predictions promise to enable improved medicine, drug design, and understanding of biochemistry, which can have enormous social impact. Ma et al. [2022] used deep learning to identify many new candidates for potential antimicrobial drugs, which may be important as drug-resistant bacteria kill millions of people each year. These programs make predictions that still need to be verified in the real world before being accepted.