Apple, NVIDIA and Anthropic allegedly used YouTube videos illegally to train their AI models – Firstpost

Transparency regarding the sources of data used to train AI models has been lacking among AI companies. Recently, criticism was aimed at Apple for not disclosing the origin of the training data used for Apple Intelligence. Image Credit: Apple, Reuters

A recent investigation by Proof News has uncovered that some of the world’s largest tech companies used transcripts from over 173,000 YouTube videos to train their AI models without obtaining permission. The dataset, compiled by EleutherAI, a nonprofit organization, includes transcripts from more than 48,000 YouTube channels and was utilised by companies like Apple, NVIDIA, and Anthropic.

This investigation sheds light on a troubling aspect of AI technology: much of its development relies on data taken from content creators without their consent or compensation.

The dataset consists solely of video transcripts, not actual videos or images, from notable creators such as Marques Brownlee and MrBeast, along with major news outlets like The New York Times, BBC, and ABC News.

Marques Brownlee expressed concern on social media, noting that his data, among others’, had been scraped from YouTube videos without proper authorization.

According to a Google spokesperson, YouTube CEO Neal Mohan has previously stated that using YouTube data to train AI models violates the platform’s terms of service. Apple, NVIDIA, Anthropic, and EleutherAI declined to comment on the matter.

Transparency regarding the sources of data used to train AI models has been lacking among AI companies. Recently, criticism was aimed at Apple for not disclosing the origin of the training data used for Apple Intelligence, its upcoming generative AI platform set to launch on millions of devices this year.

YouTube, renowned as the world’s largest repository of videos, offers not just transcripts but also audio, video, and images, making it a highly desirable dataset for training AI models.

Earlier this year, Mira Murati, OpenAI’s chief technology officer, avoided discussing whether YouTube videos were used to train Sora, OpenAI’s upcoming AI video generation tool, when questioned by The Wall Street Journal. Murati mentioned that the data used was publicly available or licensed.

Alphabet CEO Sundar Pichai reiterated that using data from YouTube to train AI models violates the platform’s terms of service.


Source Link: https://www.firstpost.com/tech/apple-nvidia-and-anthropic-allegedly-used-youtube-videos-illegally-to-train-their-ai-models-13793872.html/amp


Author: BLOGGER