timesmoguls.com
Technology

This is where the data needed to create AI comes from

December 21, 2024

Their findings, shared exclusively with MIT Technology Review, reveal a worrying trend: AI data practices risk massively concentrating power in the hands of a few dominant technology companies.

In the early 2010s, data sets came from a variety of sources, says Shayne Longpre, an MIT researcher who is part of the project.

This information came not only from encyclopedias and the Web, but also from sources such as parliamentary transcripts, telephone calls, and weather reports. Back then, AI datasets were specifically curated and collected from different sources to address individual tasks, Longpre explains.

Then transformers, the architecture behind language models, were invented in 2017, and the AI industry began to see performance improve as models and datasets grew larger. Today, most AI datasets are built by indiscriminately scraping material from the Internet. Since 2018, the web has been the dominant source of datasets across all media, such as audio, images and video, and a gap between scraped data and more curated datasets has emerged and widened.

“In foundation model development, nothing seems to matter more to capabilities than the scale and heterogeneity of the data and the web,” says Longpre. The need for scale has also massively driven the use of synthetic data.

Recent years have also seen the rise of multi-modal generative AI models, capable of generating videos and images. Like large language models, they need as much data as possible, and the best source for this has become YouTube.

For video models, as you can see in this chart, over 70% of the data in both voice and image datasets comes from a single source.

This could be a boon for Alphabet, the parent company of Google, which owns YouTube. While text is distributed across the web and controlled by many different websites and platforms, video data is extremely concentrated on a single platform.
