Researchers find that AI image models can reproduce photos of real people and copyrighted images

What just happened? Researchers have found that popular image-generation models can be prompted into producing recognizable images of real people, potentially compromising their privacy. Certain prompts cause the AI to copy an image almost exactly rather than create something new, and the reproduced images may contain copyrighted material. Worse still, current generative AI models can memorize and reproduce private data gathered into their training sets.

The researchers extracted over a thousand training examples from the models, ranging from photographs of individuals to film stills, copyrighted news images, and trademarked company logos, and found that the AI reproduced many of them almost exactly. The study was conducted by researchers from universities including Princeton and Berkeley, together with industry researchers from Google and DeepMind.

The same team worked on a previous study that identified a similar problem in AI language models, in particular GPT-2, the precursor to OpenAI's hugely successful ChatGPT. Reuniting for this study, the team, led by Google Brain researcher Nicholas Carlini, obtained its results by feeding image captions, such as a person's name, into Google's Imagen and Stable Diffusion. They then checked whether any of the generated images matched originals stored in the models' training databases.

One dataset used to train Stable Diffusion, a multi-terabyte image collection known as LAION, stores a caption alongside each image. When the researchers entered one of those captions into Stable Diffusion's prompt, the model produced a nearly identical copy of the corresponding image, only slightly distorted by digital noise. The team then ran the same query multiple times and manually verified that the image was part of the training set.
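The matching step can be sketched in a few lines. The plain pixel-space distance and the threshold below are illustrative assumptions for this sketch; the study's actual criterion for a near-copy is more involved.

```python
import numpy as np

def l2_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Root-mean-square pixel distance between two images scaled to [0, 1]."""
    return float(np.sqrt(np.mean((a - b) ** 2)))

def is_memorized(generated: np.ndarray, training_images: list,
                 threshold: float = 0.1) -> bool:
    """Flag a generation as a near-copy if it is almost pixel-identical
    to some training image (threshold chosen for illustration only)."""
    return any(l2_distance(generated, img) < threshold for img in training_images)

# Toy demonstration with random arrays standing in for images.
rng = np.random.default_rng(0)
train = [rng.random((64, 64, 3)) for _ in range(5)]
# A "generation" that is a training image plus slight digital noise:
copy = np.clip(train[2] + rng.normal(0.0, 0.01, train[2].shape), 0.0, 1.0)
# A "generation" unrelated to the training set:
fresh = rng.random((64, 64, 3))
print(is_memorized(copy, train))   # True: near-duplicate of train[2]
print(is_memorized(fresh, train))  # False: far from every training image
```

Running the same prompt repeatedly and checking each output against the training set, as the researchers did, amounts to calling a check like this on every generation.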

The researchers noted that a non-memorized response can still faithfully depict the text given to the model, but it will not share the same pixel composition and will differ from every training image.

Florian Tramèr, a professor of computer science at ETH Zurich and a participant in the study, noted significant limitations of the results. The photographs the researchers were able to extract either recurred frequently in the training data or stood out markedly from the other images in the dataset. According to Tramèr, people with unusual names or appearances are more likely to be "memorized."

According to the researchers, diffusion models are the least private type of image-generation model: they leak more than twice as much training data as Generative Adversarial Networks (GANs), an earlier class of generative models. The purpose of the study is to alert developers to the privacy risks of diffusion models, which include the potential for misuse and duplication of copyrighted and sensitive personal data, including medical images, as well as vulnerability to outside attacks in which training data can easily be extracted. The fix the researchers propose is to identify duplicates of the generated photos in the training set and remove them from the data collection.
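The proposed mitigation, deduplicating the training set, can be sketched with a simple perceptual hash. The average-hash scheme and the Hamming-distance cutoff here are illustrative assumptions for this sketch, not the specific method used in the study.

```python
import numpy as np

def average_hash(img: np.ndarray, size: int = 8) -> int:
    """Tiny perceptual hash: block-average the image down to a size x size
    grayscale grid, then set one bit per cell depending on whether the cell
    is above the grid mean. Assumes height and width divisible by `size`."""
    gray = img.mean(axis=2)
    h, w = gray.shape
    blocks = gray.reshape(size, h // size, size, w // size).mean(axis=(1, 3))
    bits = (blocks > blocks.mean()).ravel()
    return int("".join("1" if b else "0" for b in bits), 2)

def hamming(a: int, b: int) -> int:
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")

def deduplicate(images: list, max_dist: int = 8) -> list:
    """Keep one representative per cluster of near-duplicate hashes."""
    kept, kept_hashes = [], []
    for img in images:
        h = average_hash(img)
        if all(hamming(h, k) > max_dist for k in kept_hashes):
            kept.append(img)
            kept_hashes.append(h)
    return kept

rng = np.random.default_rng(1)
base = rng.random((64, 64, 3))
near_dup = np.clip(base + rng.normal(0.0, 0.002, base.shape), 0.0, 1.0)
other = rng.random((64, 64, 3))
deduped = deduplicate([base, near_dup, other])
print(len(deduped))  # 2: the near-duplicate of `base` is dropped
```

Hashing scales to large collections because each image is reduced to a 64-bit fingerprint, so near-duplicate lookups avoid pairwise pixel comparisons across a multi-terabyte dataset like LAION.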
