The pace of technological innovation has accelerated over the past year, most dramatically in the area of AI. And in 2024, there was no better place to help create these advances than NVIDIA Research.
NVIDIA Research is made up of hundreds of extremely bright people who are pushing the boundaries of knowledge, not only in AI, but across many areas of technology.
Over the past year, NVIDIA Research has laid the foundation for future improvements in GPU performance with major discoveries in circuitry, memory architecture, and sparse arithmetic. The team’s invention of new graphics techniques continues to raise the bar for real-time rendering. And we’ve developed new methods that make AI more efficient, requiring less power and fewer GPU cycles while delivering even better results.
But the most interesting developments of the year concern generative AI.
We are now able to generate not only images and text, but also 3D models, music and sounds. We are also developing better control over what is generated, such as producing realistic humanoid motion and image sequences with a consistent main character.
Applying generative AI to science has produced high-resolution weather forecasts that are more accurate than conventional numerical weather models. AI models have let us accurately predict how blood sugar responds to different foods. And embodied generative AI is being used to develop autonomous vehicles and robots.
And that’s just from this past year. What follows is a deeper dive into some of NVIDIA Research’s biggest generative AI work in 2024. Of course, we continue to develop new AI models and methods, and we expect even more exciting results next year.
ConsiStory: AI-generated images with main character energy
ConsiStory, a collaboration between researchers at NVIDIA and Tel Aviv University, makes it easier to generate multiple images with a consistent main character, an essential capability for storytelling use cases such as illustrating a comic strip or developing a storyboard.
The researchers’ approach introduced a technique called subject-driven shared attention, which cuts the time needed to generate consistent images from 13 minutes to around 30 seconds.
Read the ConsiStory paper.
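To give a rough sense of the shared-attention idea, here is an illustrative sketch, not the authors’ code: the function name, tensor shapes and subject masks below are assumptions. Each image in a batch attends to its own tokens plus the subject tokens of the other images, which ties the subject’s appearance together across generations.

```python
# Toy sketch of shared self-attention across a batch of images (illustrative
# only; not the ConsiStory implementation). Each image attends to its own
# tokens plus the subject-masked tokens of the other images in the batch.
import torch
import torch.nn.functional as F

def shared_self_attention(q, k, v, subject_mask):
    """
    q, k, v:      (batch, tokens, dim) per-image attention inputs
    subject_mask: (batch, tokens) bool, True where a token belongs to the subject
    """
    b, t, d = k.shape
    outputs = []
    for i in range(b):
        other = torch.arange(b) != i
        extra_k = k[other][subject_mask[other]]          # subject tokens from the other images
        extra_v = v[other][subject_mask[other]]
        k_i = torch.cat([k[i], extra_k], dim=0)
        v_i = torch.cat([v[i], extra_v], dim=0)
        attn = F.softmax(q[i] @ k_i.T / d**0.5, dim=-1)  # (tokens, tokens + shared)
        outputs.append(attn @ v_i)
    return torch.stack(outputs)

# Toy usage: 3 images, 16 tokens each, 8-dim features; pretend the first
# 4 tokens of every image belong to the main character.
q = torch.randn(3, 16, 8); k = torch.randn(3, 16, 8); v = torch.randn(3, 16, 8)
mask = torch.zeros(3, 16, dtype=torch.bool); mask[:, :4] = True
print(shared_self_attention(q, k, v, mask).shape)  # torch.Size([3, 16, 8])
```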
Edify 3D: generative AI enters a new dimension
NVIDIA Edify 3D is a foundation model that allows developers and content creators to quickly generate 3D objects that can be used to prototype ideas and populate virtual worlds.
Edify 3D helps creators quickly imagine, design, and conceptualize immersive environments with AI-generated assets. Novice and experienced content creators alike can use the model with text and image prompts. It is now part of NVIDIA Edify, a multimodal architecture for developing visual generative AI.
Read the Edify 3D paper and watch the video on YouTube.
Fugatto: Flexible AI sound machine for music, vocals and more
A team of NVIDIA researchers recently unveiled Fugatto, a foundational generative AI model capable of creating or transforming any mix of music, voice, and sounds based on text or audio prompts.
The model can, for example, create music snippets based on text prompts, add or remove instruments from existing songs, change the emphasis or emotion in a vocal recording, or generate completely new sounds. It could be used by music producers, advertising agencies, video game developers or language learning tool creators.
Read the Fugatto paper.
GluFormer: AI predicts blood sugar levels four years in advance
Researchers from the Weizmann Institute of Science, Tel Aviv-based startup Pheno.AI, and NVIDIA led the development of GluFormer, an AI model that can predict an individual’s future glucose levels and other health metrics based on past blood glucose monitoring data.
Researchers showed that after adding food intake data to the model, GluFormer can also predict how a person’s glucose levels will respond to specific foods and dietary changes, enabling precision nutrition. The research team validated GluFormer on 15 other datasets and found that it generalizes well to predict health outcomes in other groups, including those with prediabetes, type 1 and type 2 diabetes, gestational diabetes and obesity.
Read the GluFormer paper.
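As a loose illustration of the forecasting setup, not the GluFormer architecture, the sketch below shows a small sequence model that reads a window of past continuous glucose monitor (CGM) readings and regresses the next ones. The class name, dimensions and sampling interval are assumptions for illustration only.

```python
# Minimal sketch of CGM-based glucose forecasting (illustrative only; not
# GluFormer). A small transformer encoder summarizes past readings and a
# linear head predicts the next `horizon` readings.
import torch
import torch.nn as nn

class ToyGlucoseForecaster(nn.Module):
    def __init__(self, d_model=64, n_heads=4, n_layers=2, horizon=12):
        super().__init__()
        self.embed = nn.Linear(1, d_model)                  # one scalar glucose value per time step
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, horizon)             # forecast the next `horizon` readings

    def forward(self, cgm):                                 # cgm: (batch, time, 1), e.g. mg/dL
        h = self.encoder(self.embed(cgm))
        return self.head(h[:, -1])                          # predict from the last time step

# Toy usage: 8 subjects, 96 past readings (24 hours at 15-minute intervals).
model = ToyGlucoseForecaster()
past = torch.randn(8, 96, 1)
print(model(past).shape)  # torch.Size([8, 12]) -> next 12 readings
```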
LATTE3D: enabling near-instant generation from text to 3D shape
Another 3D generator released by NVIDIA Research this year is LATTE3D, which converts text prompts into 3D representations within a second, like a speedy virtual 3D printer. Generated in a popular format used by standard rendering applications, the shapes can easily be used in virtual environments for developing video games, advertising campaigns, design projects or virtual training grounds for robotics.
Read the LATTE3D paper.
MaskedMimic: Reconstructing realistic movement for humanoid robots
To advance the development of humanoid robots, NVIDIA researchers introduced MaskedMimic, an AI framework that applies inpainting (the process of reconstructing complete data from an incomplete, or obscured, view) to motion descriptions.
Using partial information, such as a textual description of a movement or head and hand position data from a virtual reality headset, MaskedMimic can fill in the blanks to infer full-body motion. It has become part of NVIDIA Project GR00T, a research initiative aimed at accelerating humanoid robot development.
Read the MaskedMimic paper.
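The sketch below is a toy illustration of the motion-inpainting idea rather than NVIDIA’s MaskedMimic implementation; the skeleton size, network and masking scheme are invented for clarity. The model sees only a few observed body parts, such as the head and hands, plus a per-joint mask flag, and is trained to reconstruct the full pose.

```python
# Toy motion inpainting (illustrative only; not MaskedMimic). Mask most joints
# of a pose and train a network to reconstruct the whole body from what remains.
import torch
import torch.nn as nn

N_JOINTS, DIM = 24, 3                       # toy skeleton: 24 joints, 3-D positions

class ToyMotionInpainter(nn.Module):
    def __init__(self, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(N_JOINTS * (DIM + 1), hidden), nn.ReLU(),   # +1 for the per-joint mask flag
            nn.Linear(hidden, N_JOINTS * DIM),
        )

    def forward(self, joints, mask):
        # joints: (batch, N_JOINTS, 3); mask: (batch, N_JOINTS, 1), 1 = observed
        x = torch.cat([joints * mask, mask], dim=-1).flatten(1)
        return self.net(x).view(-1, N_JOINTS, DIM)                # full-body reconstruction

model = ToyMotionInpainter()
pose = torch.randn(4, N_JOINTS, DIM)
mask = torch.zeros(4, N_JOINTS, 1)
mask[:, [0, 20, 21]] = 1.0                  # pretend only the head and both hands are observed
full_body = model(pose, mask)
loss = nn.functional.mse_loss(full_body, pose)   # train to inpaint the hidden joints
print(full_body.shape, float(loss))
```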
StormCast: improving weather forecasts and climate simulation
In the area of climate science, NVIDIA Research announced StormCast, a generative AI model that emulates atmospheric dynamics. While other machine learning models trained on global data have a spatial resolution of around 30 kilometers and a temporal resolution of six hours, StormCast operates at a 3-kilometer, hourly scale.
The researchers trained StormCast on roughly three and a half years of NOAA climate data from the central United States. When paired with precipitation radar data, StormCast offers forecasts with lead times of up to six hours that are up to 10% more accurate than the U.S. National Oceanic and Atmospheric Administration’s state-of-the-art 3-kilometer regional weather forecast model.
Read the StormCast paper, written in collaboration with researchers at Lawrence Berkeley National Laboratory and the University of Washington.
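Conceptually, such an emulator is rolled out autoregressively: each learned one-hour update is fed back in as the next input until the desired lead time is reached. The sketch below illustrates only that rollout loop; step_one_hour, the grid size and the variable count are placeholders, not StormCast’s actual interface.

```python
# Conceptual autoregressive rollout of a weather emulator (illustrative only;
# not StormCast). A 6-hour forecast is six one-hour model steps.
import numpy as np

def step_one_hour(state: np.ndarray) -> np.ndarray:
    """Placeholder for the learned one-hour update of the atmospheric state."""
    return state + 0.01 * np.random.randn(*state.shape)

def forecast(initial_state: np.ndarray, lead_hours: int = 6) -> list:
    """Roll the emulator forward autoregressively to the requested lead time."""
    states, state = [], initial_state
    for _ in range(lead_hours):
        state = step_one_hour(state)       # each output becomes the next input
        states.append(state)
    return states

# Toy state: a handful of atmospheric variables on a small 3 km grid patch.
initial = np.random.randn(5, 128, 128)     # (variables, y, x)
trajectory = forecast(initial, lead_hours=6)
print(len(trajectory), trajectory[-1].shape)   # 6 (5, 128, 128)
```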
NVIDIA Research sets records in AI, autonomous vehicles and robotics
Throughout 2024, models from NVIDIA Research set records across benchmarks for AI training and inference, route optimization, autonomous driving and more.
NVIDIA cuOpt, an optimization AI microservice used for logistics improvements, holds 23 world records. The NVIDIA Blackwell platform demonstrated world-class performance on the MLPerf industry benchmarks for AI training and inference.
In the field of autonomous vehicles, Hydra-MDP, an end-to-end autonomous driving framework developed by NVIDIA Research, took first place in the End-to-End Driving at Scale track of the Autonomous Grand Challenge at CVPR 2024.
In robotics, FoundationPose, a unified foundation model for 6D object pose estimation and tracking, won first place on the BOP leaderboard for model-based pose estimation of unseen objects.
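For readers unfamiliar with the term, a 6D pose combines three rotational and three translational degrees of freedom, commonly packed into a 4x4 homogeneous transform that maps object-frame points into the camera frame. The snippet below is generic background on that representation, not FoundationPose code.

```python
# Generic 6D pose background (not FoundationPose): rotation + translation as a
# 4x4 homogeneous transform applied to object-frame points.
import numpy as np

def pose_matrix(rotation: np.ndarray, translation: np.ndarray) -> np.ndarray:
    """Build a 4x4 transform from a 3x3 rotation and a 3-vector translation."""
    T = np.eye(4)
    T[:3, :3] = rotation
    T[:3, 3] = translation
    return T

def transform_points(T: np.ndarray, points: np.ndarray) -> np.ndarray:
    """Apply a 6D pose to (N, 3) object-frame points."""
    homogeneous = np.hstack([points, np.ones((len(points), 1))])
    return (homogeneous @ T.T)[:, :3]

# Toy example: a 90-degree yaw plus a 0.5 m translation along x.
yaw = np.array([[0, -1, 0], [1, 0, 0], [0, 0, 1]], dtype=float)
T = pose_matrix(yaw, np.array([0.5, 0.0, 0.0]))
print(transform_points(T, np.array([[1.0, 0.0, 0.0]])))   # -> [[0.5, 1.0, 0.0]]
```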
Learn more about NVIDIA Research, which has hundreds of scientists and engineers around the world. NVIDIA’s research teams focus on topics such as AI, computer graphics, computer vision, self-driving cars, and robotics.