Chatbots like Google’s LaMDA or OpenAI’s ChatGPT are neither sentient nor especially intelligent. Nonetheless, boffins believe they can use these large language models to simulate human behavior, inspired by one of the world’s most popular early computer games and some AI code.
The latest effort in this direction comes from six computer scientists – five from Stanford University and one from Google Research: Joon Sung Park, Joseph O’Brien, Carrie Cai, Meredith Ringel Morris, Percy Liang, and Michael Bernstein. The project looks very much like an homage to the classic Maxis game The Sims, which debuted in 2000 and continues under EA in various sequels.
An illustration from Park et al’s paper of their ChatGPT-powered software, showing what each agent is doing in the simulation, along with their conversations
As described in their recent preprint paper, “Generative Agents: Interactive Simulacra of Human Behavior,” the researchers developed a software architecture that “stores, synthesizes, and applies relevant memories to generate believable behavior using a large language model”.
Or more succinctly, they bolted memory, reflection (inference from memories), and planning code onto ChatGPT to create generative agents – simulated personalities that interact and pursue their own goals using text-based communication that approximates natural language.
“In this work, we demonstrate generative agents by populating a sandbox environment, reminiscent of The Sims, with twenty-five agents,” the researchers explain. “Users can observe and intervene as agents plan their days, share news, form relationships and coordinate group activities.”
To see what that looks like, there’s a demo world running on a Heroku instance, built with the Phaser web gaming framework. Visitors can step through a pre-computed replay of a session while these software agents live their lives.
The demo, centering on an agent named Isabella and her attempt to plan a Valentine’s Day party, allows visitors to examine the status data of the simulated personalities. That is, you can click on them and see their text memories and other information about them.
For example, visitors can inspect a memory recorded by generative agent Rajiv Patel at 2023-02-13 20:04:40.
The goal of this research is to go beyond foundational works like the 1960s-era Eliza chatbot, and reinforcement-learning efforts like AlphaStar for StarCraft and OpenAI Five for Dota 2 that focus on adversarial environments with clear goals, toward a software architecture that lends itself to open-ended, programmatic agents.
“A diverse set of approaches to creating believable agents has emerged over the past four decades. In implementation, however, these approaches have often simplified the environment or dimensions of agent behavior to make the effort more manageable,” the researchers explain. “However, their success has largely taken place in adversarial games with easily definable rewards that a learning algorithm can optimize.”
Large language models like ChatGPT, the boffins observe, encode a huge range of human behavior. Thus, given a prompt with sufficiently narrow context, these models can generate plausible human behavior – which could prove useful for automated interaction that isn’t limited to a specific set of pre-programmed questions and answers.
But the models need additional scaffolding to create believable simulated personalities. This is where the memory, reflection, and planning routines come into play.
“Agents perceive their environment, and all perceptions are recorded in a comprehensive record of the agent’s experiences called a memory stream,” the researchers state in their paper.
“Based on their perceptions, the architecture retrieves relevant memories, then uses those retrieved memories to determine an action. These retrieved memories are also used to form longer-term plans and to create higher-level reflections, which are both entered into the memory stream for future use.”
Illustration of the memory stream architecture developed by the academics, taken from their paper
The memory stream is simply a timestamped list of observations, relevant or not, about the agent’s current situation.
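In code, such a stream might look something like the following Python sketch. This is a toy approximation, not the authors' implementation: the class and parameter names are invented, and keyword overlap stands in for the embedding-based relevance scoring a real system would use, alongside the recency and importance factors the paper describes.

```python
import time
from dataclasses import dataclass, field

@dataclass
class Memory:
    """One timestamped observation in an agent's memory stream."""
    text: str
    importance: float  # e.g. a 1-10 score assigned when the memory is stored
    created: float = field(default_factory=time.time)
    last_accessed: float = field(default_factory=time.time)

class MemoryStream:
    """Append-only record of observations; retrieval favours recent,
    important, and (crudely, via keyword overlap) relevant memories."""

    def __init__(self, decay: float = 0.995):
        self.memories: list[Memory] = []
        self.decay = decay  # exponential recency decay per hour

    def record(self, text: str, importance: float) -> None:
        self.memories.append(Memory(text, importance))

    def retrieve(self, query: str, k: int = 3) -> list[Memory]:
        """Return the k memories scoring highest on recency + importance
        + relevance, and mark them as freshly accessed."""
        now = time.time()

        def score(m: Memory) -> float:
            recency = self.decay ** ((now - m.last_accessed) / 3600)
            relevance = len(set(query.lower().split()) & set(m.text.lower().split()))
            return recency + m.importance / 10 + relevance

        top = sorted(self.memories, key=score, reverse=True)[:k]
        for m in top:
            m.last_accessed = now
        return top
```

Retrieved memories would then be pasted into the language model’s prompt to ground the agent’s next action.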
Reflections are a type of memory generated periodically, when importance scores exceed a certain threshold. They are produced by querying the large language model about the agent’s recent experiences to determine what to reflect on, and the query responses are then used to question the model further, asking it things such as “What is Klaus Mueller passionate about?” and “What is the relationship between Klaus Mueller and Maria Lopez?”
The model then generates a response like “Klaus Mueller is dedicated to his research on gentrification,” which is stored and used to shape future behavior. So is the planning module, which creates a daily plan for each agent that can change through interactions with other characters pursuing their own agendas.
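That reflection loop can be sketched roughly in Python. The function name and the `ask_llm` callable are invented for illustration (a real implementation would call an actual LLM API); the threshold-on-summed-importance trigger follows the paper, which uses a value of 150.

```python
from typing import Callable

# Reflection fires when the summed importance of recent events passes a
# threshold (150 in the authors' setup).
REFLECTION_THRESHOLD = 150.0

def maybe_reflect(recent: list[tuple[str, float]],
                  ask_llm: Callable[[str], str]) -> list[str]:
    """recent holds (memory text, importance score) pairs. If they are
    collectively important enough, ask the language model which
    high-level questions they raise, answer each one, and return the
    answers as new reflection memories."""
    if sum(score for _, score in recent) < REFLECTION_THRESHOLD:
        return []  # nothing noteworthy enough to reflect on yet
    context = "\n".join(text for text, _ in recent)
    question_prompt = (
        "Given only the statements below, what are the most salient "
        f"high-level questions we can answer about the subjects?\n{context}"
    )
    questions = [q for q in ask_llm(question_prompt).splitlines() if q.strip()]
    # Each answer becomes a reflection, fed back into the memory stream.
    return [ask_llm(f"{context}\nQuestion: {q}\nAnswer:") for q in questions]
```

Because the answers are written back into the memory stream, later retrievals can surface these higher-level conclusions instead of raw observations.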
This won’t end well
Additionally, the agents successfully communicated with each other, resulting in what the researchers describe as emergent behavior.
“During the two-day simulation, the number of agents who knew about Sam’s candidacy for mayor increased from one (4 percent) to eight (32 percent), and the number of agents who knew about Isabella’s party rose from one (4 percent) to twelve (48 percent), completely without user intervention,” the paper says. “None of those who claimed to know the information had hallucinated it.”
There were a few hallucinations, though. Agent Isabella was aware of agent Sam’s announcement that he was running for mayor, even though the two never had that conversation. And agent Yuriko described her neighbor, Adam Smith, as an economist who wrote Wealth of Nations – a book actually written by the 18th-century economist of the same name.
However, things generally went well in the simulated town of Smallville. Five of the twelve guests invited to the party at Hobbs Cafe showed up. Three didn’t attend due to scheduling conflicts. And the other four expressed interest but didn’t show up. Pretty close to real life, then.
The researchers say their generative agent architecture produced the most believable behavior – as assessed by human raters – compared with versions of the architecture that had reflection, planning, or memory disabled.
At the same time, they conceded that their approach is not without its difficulties.
The agents’ behavior became more unpredictable over time, as memory grew to the point that retrieving the most relevant records became difficult. There was also erratic behavior when the natural language used for memories and interactions failed to carry salient social information.
“For example, the college dorm has a bathroom that can only be occupied by one person despite its name, but some agents assumed the bathroom was for more than one person – because dorm bathrooms tend to accommodate more than one person simultaneously – and chose to enter while another person was inside,” the authors explained.
Likewise, generative agents didn’t always recognize that they couldn’t enter stores after they closed at 5:00 p.m. local time – clearly a mistake. Such problems, the boffins say, can be addressed with more explicit descriptions, such as labeling the dorm bathroom a “single-person bathroom” instead of a “dorm bathroom,” and adding explicit opening hours to store descriptions.
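A toy sketch of that kind of fix follows, with hypothetical place names and a crude string check standing in for the language model’s reading of each description – the point being that the affordance lives in the description text itself.

```python
# Hypothetical environment metadata: affordances stated explicitly in the
# natural-language description, per the authors' suggested fix.
PLACES = {
    "dorm bathroom": "single-person bathroom",
    "grocery store": "store, open 08:00 to 17:00",
}

def can_enter(place: str, occupied: bool, hour: int) -> bool:
    """Crude stand-in for an agent interpreting a place description."""
    desc = PLACES[place]
    if "single-person" in desc and occupied:
        return False  # one occupant at a time
    if "open 08:00 to 17:00" in desc and not 8 <= hour < 17:
        return False  # respect the posted opening hours
    return True
```

With the bare description “dorm bathroom,” nothing here would stop a second agent walking in; the explicit wording is what closes the gap.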
The researchers also note that their approach was expensive — costing thousands of dollars in ChatGPT tokens to simulate for two days — and that more work needs to be done to address bias, inadequate model data, and security.
Generative agents, they observe, “may be vulnerable to prompt hacking, memory hacking – where a carefully crafted conversation could convince an agent of the existence of a past event that never occurred – and hallucination, among other things.”
Well, at least they’re not driving several tons of steel at high speed on public roads. ®