By Aman Adukoorie
AI is rapidly becoming a core part of the modern world. According to McKinsey, 78% of organisations have already integrated AI into at least one of their business functions, while UNESCO estimates over a billion people use generative AI tools daily.
However, the productivity gains from AI come at a price: the immense computational power required to train and deploy AI models demands staggering amounts of energy.
The International Energy Agency projects that by 2026 data centre electricity consumption will exceed 1,000 terawatt-hours, roughly double its 2022 level.
Rising energy demands in the AI sector have sparked significant concern among governments and local stakeholders. In areas surrounding new data centre developments, frequent blackouts have already triggered public protests.
Regulators have responded by instituting monitoring frameworks and guidelines on energy use. The EU AI Act, for instance, requires developers of general-purpose models to document the energy consumption of their models.
The same act also directs the EU Commission to create standards that will ensure the AI industry's practices are resource-efficient and sustainable. To address the concerns of local stakeholders and comply with future regulations, the AI industry must ensure that models are trained and deployed sustainably.
Fortunately, new algorithmic techniques and hardware improvements are being developed that can facilitate the sustainable training and deployment of models.
Algorithmic Improvements
Modern deep neural network-based AI algorithms are inspired by the human brain. They consist of multiple layers of connected ‘neurons’ that take in data, perform simple mathematical operations, and pass the results forward to produce an output.
Each connection has an associated ‘weight’ and each neuron has a ‘bias’; these parameters control how neurons in one layer affect the operations in the next layer. The network is trained by adjusting the weights and biases using large amounts of human data until the model’s output closely matches human behaviour.
Once trained, the network can perform inference: it applies these operations to novel, user-specified inputs and produces human-like results. Current state-of-the-art models use over a trillion weights and biases, so training cycles and inference requests involve an immense number of operations, which in turn requires significant energy resources.
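The layered computation described above can be sketched in a few lines. This is a minimal, illustrative network (the layer sizes and random weights are arbitrary, not drawn from any real model): each layer multiplies its input by a weight matrix, adds a bias, and passes the result forward.

```python
import numpy as np

def relu(x):
    # A simple non-linearity applied between layers
    return np.maximum(0.0, x)

def forward(x, w1, b1, w2, b2):
    # Each 'neuron' computes a weighted sum of its inputs plus a bias
    hidden = relu(x @ w1 + b1)   # first layer of neurons
    return hidden @ w2 + b2      # output layer

rng = np.random.default_rng(0)
w1, b1 = rng.standard_normal((4, 8)), np.zeros(8)   # 4 inputs -> 8 hidden neurons
w2, b2 = rng.standard_normal((8, 2)), np.zeros(2)   # 8 hidden -> 2 outputs

x = rng.standard_normal(4)         # a novel input supplied at inference time
y = forward(x, w1, b1, w2, b2)     # the model's output
```

Every inference request repeats these multiply-and-add operations across every layer; at a trillion parameters, that arithmetic dominates the energy bill.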
Lowering the energy footprint of such models ultimately depends on reducing both the number of operations they perform and the cost of each operation.
Two complementary techniques, pruning and quantisation, do just this. Pruning systematically identifies and removes weights, biases, or even entire neurons that have minimal influence on the model’s predictions.
This can be done during or after training. By eliminating these low-impact components, the network’s effective size is reduced, resulting in fewer computational operations during inference and hence lower energy consumption.
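One common flavour of this idea is magnitude pruning: weights whose absolute value falls below a threshold are assumed to have minimal influence and are zeroed out. The sketch below is illustrative only (the 70% sparsity target and matrix size are arbitrary choices, not a recommendation):

```python
import numpy as np

def prune_by_magnitude(weights, sparsity=0.7):
    # Zero out the smallest-magnitude weights; 'sparsity' is the
    # fraction of weights to remove.
    threshold = np.quantile(np.abs(weights), sparsity)
    mask = np.abs(weights) >= threshold
    return weights * mask, mask

rng = np.random.default_rng(0)
w = rng.standard_normal((8, 8))
pruned, mask = prune_by_magnitude(w, sparsity=0.7)

# Fraction of weights actually removed
sparsity_achieved = 1.0 - mask.mean()
```

Hardware and software that exploit sparsity can then skip the zeroed multiplications entirely, which is where the energy saving comes from.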
Quantisation makes the remaining computations cheaper by representing weights and biases with less precise numbers, such as switching from 32-bit floating-point values to 8-bit integers.
This enables simpler arithmetic and reduces energy use. Together, pruning and quantisation shrink model size, reduce computational workload, and lower power consumption, enabling AI systems that are faster, cheaper to run, and more environmentally sustainable.
Hardware Optimisation
Hardware enhancements also offer a significant pathway toward greater energy efficiency. In-memory computing (IMC) is a particularly helpful innovation. Traditional computing architectures are hampered by a ‘Von Neumann bottleneck’, where data must constantly shuttle between processor units and memory units.
This shuttling uses considerable amounts of energy. IMC overcomes this by executing mathematical operations directly within the memory unit, transforming storage into a functional processing unit.
By eliminating the ‘data tax’ of moving billions of parameters, IMC is markedly more energy-efficient. State-of-the-art in-memory computing chips also leverage memristor crossbars; these systems can perform calculations by using the physical properties of the hardware itself rather than complex logic gates.
By bypassing the traditional bottleneck of standard chips, IMC provides a scalable hardware solution that makes the complex, large-scale processing required in modern deep neural networks significantly more energy-efficient.
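The crossbar idea can be made concrete with a toy model. In a memristor crossbar, the weight matrix is stored as an array of conductances; applying input voltages to the rows causes each cell to pass a current proportional to voltage times conductance (Ohm’s law), and the currents summed along each column (Kirchhoff’s current law) are exactly a matrix-vector product, computed in place in one analog step. The simulation below is a digital analogy of that physics, with arbitrary illustrative values:

```python
import numpy as np

rng = np.random.default_rng(0)
G = rng.uniform(0.0, 1.0, size=(4, 3))   # conductances storing a 4x3 weight matrix
V = rng.uniform(0.0, 1.0, size=4)        # input voltages applied to the rows

# The current collected on each column is I = G^T V:
# the whole matrix-vector product, performed 'inside' the memory array
I = G.T @ V

# The same result built cell by cell: Ohm's law per memristor,
# Kirchhoff's law summing currents down each column
I_check = np.zeros(3)
for row in range(4):
    for col in range(3):
        I_check[col] += G[row, col] * V[row]
```

Because the multiply-accumulate happens in the physics of the array rather than in logic gates, no parameters need to be fetched from memory at all.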
A Call to Action
Energy-efficient AI is no longer a peripheral green initiative; it is a prerequisite for the industry’s survival. To maintain the trust of regulators and the public, the AI industry must ensure that it prioritises sustainability as a core metric, not an afterthought.
Fortunately, by marrying algorithmic refinements like pruning and quantisation with hardware breakthroughs like in-memory computing, AI practitioners can cut their energy footprint without sacrificing performance. The task is clear: before AI can revolutionise the world, it must preserve it.
The author is an experienced quantitative analyst. Views are personal.