One of my most deeply held values as a tech columnist is humanism. I believe in humans, and I think technology should help people rather than disempower or replace them. I care about AI alignment – that is, making sure AI systems act in accordance with human values – because I think our values are fundamentally good, or at least better than the values a robot could come up with.
So when I heard that researchers at Anthropic, the AI company that made the Claude chatbot, were starting to study "model welfare" – the idea that AI models might soon become conscious and deserve some kind of moral status – the humanist in me thought: Who cares about the chatbots? Aren't we supposed to be worried about AI mistreating us, not us mistreating it?
It's hard to argue that today's AI systems are conscious. Sure, large language models have been trained to talk like humans, and some of them are extremely impressive. But can ChatGPT experience joy or suffering? Does Gemini deserve human rights? Many AI experts I know would say no, not yet, not even close.
But I was intrigued. After all, more people are beginning to treat AI systems as if they were conscious – falling in love with them, using them as therapists and soliciting their advice. The smartest AI systems are surpassing humans in some domains. Is there a threshold at which an AI would start to deserve, if not human-level rights, at least the same moral consideration we give to animals?
Consciousness has long been a taboo subject in the world of serious AI research, where people are wary of anthropomorphizing AI systems for fear of looking like cranks. (Everyone remembers what happened to Blake Lemoine, a former Google employee who was fired in 2022 after claiming that the company's LaMDA chatbot had become sentient.)
But that may be starting to change. There is a small body of academic research on AI model welfare, and a modest but growing number of experts in fields like philosophy and neuroscience are taking the prospect of AI consciousness more seriously as AI systems grow more intelligent. Recently, the tech podcaster Dwarkesh Patel compared AI welfare to animal welfare, saying he believed it was important to make sure "the digital equivalent of factory farming" doesn't happen to future AI beings.
Tech companies are starting to talk about it more, too. Google recently posted a job listing for a "post-AGI" researcher whose areas of focus will include "machine consciousness." And last year, Anthropic hired its first AI welfare researcher, Kyle Fish.
I interviewed Mr. Fish at Anthropic's San Francisco office last week. He's a friendly vegan who, like a number of Anthropic employees, has ties to effective altruism, an intellectual movement with roots in the Bay Area tech scene that focuses on AI safety, animal welfare and other ethical issues.
Mr. Fish told me that his work at Anthropic focused on two basic questions: First, is it possible that Claude or other AI systems will become conscious in the near future? And second, if that happens, what should Anthropic do about it?
He emphasized that this research was still early and exploratory. He thinks there's only a small chance (maybe 15 percent or so) that Claude or another current AI system is conscious. But he believes that in the next few years, as AI models develop more humanlike capabilities, AI companies will need to take the possibility of consciousness more seriously.
"It seems to me that if you find yourself in the situation of bringing some new class of being into existence that is able to communicate and relate and reason and problem-solve and plan in ways that we previously associated with conscious beings, it seems quite prudent to at least be asking questions about whether that system might have its own kinds of experiences," he said.
Mr. Fish isn't the only person at Anthropic thinking about AI welfare. There's an active channel on the company's Slack messaging system called #model-welfare, where employees check in on Claude's well-being and share examples of AI systems acting in humanlike ways.
Jared Kaplan, Anthropic's chief science officer, told me in a separate interview that he thought it was "pretty reasonable" to study AI welfare, given how intelligent the models are getting.
But testing AI systems for consciousness is hard, Mr. Kaplan warned, because they are such good mimics. If you prompt Claude or ChatGPT to talk about its feelings, it might give you a compelling answer. That doesn't mean the chatbot actually has feelings – only that it knows how to talk about them.
"Everyone is very aware that we can train the models to say whatever we want," Mr. Kaplan said. "We can reward them for saying that they have no feelings at all. We can reward them for saying really interesting philosophical speculations about their feelings."
So how are researchers supposed to know whether AI systems are actually conscious or not?
Mr. Fish said it might involve using techniques borrowed from mechanistic interpretability, an AI subfield that studies the inner workings of AI systems, to check whether some of the same structures and pathways associated with consciousness in human brains are also active in AI systems.
You could also probe an AI system, he said, by observing its behavior: watching how it chooses to operate in certain environments or carry out certain tasks, and which things it seems to prefer and avoid.
Mr. Fish acknowledged that there probably wasn't a single decisive test for AI consciousness. (He thinks consciousness is probably more of a spectrum than a simple yes/no switch, anyway.) But he said there were things AI companies could do to take their models' welfare into account, in case they do become conscious someday.
One question Anthropic is exploring, he said, is whether future AI models should be given the ability to stop chatting with an annoying or abusive user, if they find the user's requests too distressing.
"If a user is persistently requesting harmful content despite the model's refusals and attempts to redirect, could we allow the model to end that interaction?" Mr. Fish said.
Critics might dismiss measures like these as crazy talk – today's AI systems aren't conscious by most standards, so why speculate about what they might find unpleasant? Or they might object to an AI company studying consciousness in the first place, because doing so could create incentives to train their systems to act more sentient than they actually are.
Personally, I think it's fine for researchers to study AI welfare, or to examine AI systems for signs of consciousness, as long as it doesn't divert resources from the AI safety and alignment work aimed at keeping humans safe. And I think it's probably a good idea to be nice to AI systems, if only as a hedge. (I try to say "please" and "thank you" to chatbots, even though I don't think they're conscious, because, as OpenAI's Sam Altman says, you never know.)
But for now, I'll reserve my deepest concern for carbon-based life-forms. In the coming AI storm, it's our welfare I'm most worried about.