The more accurate we try to make AI models, the bigger their carbon footprint – with some prompts producing up to 50 times more carbon dioxide emissions than others, a new study has revealed.
Reasoning models, such as Anthropic's Claude, OpenAI's o3 and DeepSeek's R1, are specialized large language models (LLMs) that dedicate more time and computing power to produce more accurate responses than their predecessors.
Yet, aside from some impressive results, these models have been shown to face severe limitations in their ability to crack complex problems. Now, a team of researchers has highlighted another constraint on the models' performance – their exorbitant carbon footprint. They published their findings June 19 in the journal Frontiers in Communication.
"The environmental impact of questioning trained LLMs is strongly determined by their reasoning approach, with explicit reasoning processes significantly driving up energy consumption and carbon emissions," study first author Maximilian Dauner, a researcher at Hochschule München University of Applied Sciences in Germany, said in a press release. "We found that reasoning-enabled models produced up to 50 times more CO₂ emissions than concise response models."
To answer the prompts given to them, LLMs break language down into tokens – chunks of words that are converted into strings of numbers before being fed into neural networks. These neural networks are tuned using training data, from which they learn the probabilities of certain patterns appearing. They then use these probabilities to generate responses.
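As a rough illustration of that tokenization step (a toy sketch, not the study's code – the vocabulary below is invented, whereas real models use learned subword vocabularies with tens of thousands of entries):

```python
# Toy tokenizer: maps word chunks to integer IDs, the form in which
# text is actually fed into an LLM's neural network.
TOY_VOCAB = {"the": 1, "cat": 2, "sat": 3, "on": 4, "mat": 5, ".": 6}

def toy_tokenize(text: str) -> list[int]:
    """Split the text into pieces and look up each piece's integer ID."""
    pieces = text.lower().replace(".", " .").split()
    return [TOY_VOCAB[piece] for piece in pieces]

print(toy_tokenize("The cat sat on the mat."))
# [1, 2, 3, 4, 1, 5, 6] -- this string of numbers, not the words
# themselves, is what the model processes.
```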
Reasoning models attempt to boost accuracy using a process known as "chain of thought." This technique works by breaking a complex problem down into smaller, more digestible intermediate steps that follow a logical flow, mimicking how humans might arrive at the conclusion to the same problem.
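The practical difference between the two styles of model often shows up in how much text is generated before the final answer. A minimal sketch (the question and prompts below are hypothetical, not taken from the study):

```python
# Hypothetical prompts contrasting a concise answer with a
# chain-of-thought answer; a real model API would sit behind these.
question = "A train travels 120 km in 1.5 hours. What is its average speed?"

concise_prompt = f"{question}\nAnswer with only the number."
# Typical concise output: "80 km/h" -- a handful of tokens.

reasoning_prompt = (
    f"{question}\n"
    "Think step by step: restate the given values, choose the formula, "
    "substitute them, and only then give the final answer."
)
# Typical reasoning output spells out speed = distance / time
# = 120 / 1.5 = 80 km/h, producing many more tokens -- and, per the
# study, correspondingly more energy use and CO2.

print(concise_prompt)
print(reasoning_prompt)
```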
However, these models have much higher energy demands than conventional LLMs, posing a potential economic bottleneck for companies and users who want to deploy them. Yet despite some research into the environmental impact of growing AI adoption more generally, comparisons between the carbon footprints of different models remain relatively rare.
The cost of reasoning
To examine the CO₂ emissions produced by different models, the scientists behind the new study asked 14 LLMs 1,000 questions across a range of topics. The models varied in size between 7 billion and 72 billion parameters.
The calculations were carried out using the Perun framework (which analyzes LLM performance and the energy it requires) on an NVIDIA A100 GPU. The team then converted the energy use into CO₂ by assuming that each kilowatt-hour of energy produces 480 grams of CO₂.
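That conversion is a single multiplication; a minimal sketch of the bookkeeping (the 480 g/kWh figure comes from the study, while the example energy reading is invented):

```python
# Convert measured energy use into CO2 emissions using the grid
# intensity assumed in the study: 480 grams of CO2 per kilowatt-hour.
GRID_INTENSITY_G_PER_KWH = 480

def energy_to_co2_grams(energy_kwh: float) -> float:
    """Grams of CO2 attributed to a given energy draw in kWh."""
    return energy_kwh * GRID_INTENSITY_G_PER_KWH

# Hypothetical example: a batch of answers that consumed 2.5 kWh.
print(energy_to_co2_grams(2.5))  # 1200.0 grams, i.e. 1.2 kg of CO2
```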
Their results show that, on average, the reasoning models generated 543.5 tokens per question, compared with just 37.7 tokens for the more concise models. These extra tokens – which amount to more computation – meant that the more accurate reasoning models produced more CO₂.
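A quick back-of-the-envelope check on those averages (the token counts are the study's figures; treating compute as roughly proportional to output tokens is a simplification):

```python
# Average output tokens per question, as reported in the study.
reasoning_tokens = 543.5
concise_tokens = 37.7

# Reasoning models emitted roughly 14 times more tokens per question...
print(round(reasoning_tokens / concise_tokens, 1))  # 14.4

# ...and since every generated token costs extra computation, the
# energy bill -- and the CO2 -- grows along with the token count.
```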
The most accurate model was the 72 billion-parameter Cogito model, which correctly answered 84.9% of the benchmark questions. But Cogito released three times the CO₂ emissions of similar-sized models made to generate more concise answers.
"Currently, we see a clear accuracy-sustainability trade-off inherent in LLM technologies," Dauner said. "None of the models that kept emissions below 500 grams of CO₂ equivalent (the total greenhouse gases released) achieved higher than 80% accuracy in correctly answering the 1,000 questions."
But the problems go beyond accuracy. Questions that required longer reasoning times, such as in algebra or philosophy, led to emissions six times higher than simple look-up queries.
The researchers' calculations also show that emissions depend on the model chosen. To answer 60,000 questions, DeepSeek's 70 billion-parameter R1 model would produce the CO₂ emitted by a round-trip flight between New York and London. Alibaba Cloud's 72 billion-parameter Qwen 2.5 model, however, would be able to answer them with similar accuracy rates for a third of the emissions.
The study's findings aren't definitive; emissions can vary depending on the hardware used and the energy grids that supply their power, the researchers stressed. But they should prompt users to think before deploying the technology, the researchers noted.
"If users know the exact CO₂ cost of their AI-generated outputs, such as casually turning themselves into an action figure, they might be more selective and thoughtful about when and how they use these technologies," Dauner said.