Apple responds to allegations of using YouTube videos to train Apple Intelligence

Apple’s response follows an investigative report on a public dataset that companies such as Apple, NVIDIA, and others of comparable size used to train artificial intelligence models. The report found the dataset contained copyrighted IP, such as transcripts or subtitles of YouTube videos from the most popular creators on the platform.

The report from Proof News alleges that subtitles from more than 170,000 YouTube videos across more than 48,000 channels were scraped and included in a dataset known as the Pile. The report also found that AI companies stated in their research papers that they used the Pile to train certain models. For example, Apple used the Pile to train OpenELM, an AI model released in April, only weeks before the company officially unveiled Apple Intelligence.

The short gap between the release of OpenELM and the Apple Intelligence announcement, combined with the mounting controversy over training AI on copyrighted IP, led to the assumption that the AI model powering Apple Intelligence was trained on YouTube video transcripts. However, that doesn’t seem to be the case: Apple has informed 9to5Mac that Apple Intelligence doesn’t use the OpenELM model and went as far as to say OpenELM doesn’t power any of its AI or machine learning features.

Furthermore, Apple said OpenELM was created purely for research purposes and as a way for Apple to give back to open-source large language model development. Apple also stated it has no plans to create future iterations of the OpenELM model.

It should be noted that Apple, NVIDIA, and any other company that used the Pile dataset to train its AI models didn’t download the YouTube video transcripts themselves, as the dataset was compiled by the non-profit EleutherAI for academic purposes. However, the issue of AI companies acquiring datasets from third parties and later discovering that those datasets contain copyrighted information raises the question of who is responsible for the infringement.

As for Apple, the Cupertino company previously stated Apple Intelligence was trained on “licensed data, including data selected to enhance specific features, as well as publicly available data collected by our web-crawler.”

Important AI-Copyright IP Questions

Do AI companies have a duty of care when it comes to knowing what data is being used to train AI models that will then be used commercially? And is the AI company responsible if that data is later found to be copyrighted IP? Can the training with this data be removed from the AI, or is it simply too late? Moreover, what happens when public academic datasets partially containing copyrighted IP are downloaded and commercialized?

Source Link: https://www.tweaktown.com/news/99398/apple-responds-to-allegations-of-using-youtube-videos-train-intelligence/index.html

Author: BLOGGER