I guess this is growing up?

AI trained on photos from kids’ entire childhood without their consent

Kids "easily traceable" from photos used to train AI models, advocates warn.

Safeguards to keep kids’ data away from AI

When LAION-5B was introduced in spring 2022, it was described as an attempt to replicate OpenAI's dataset and touted as "the largest freely available image-text dataset." With its release, AI researchers cut off from private companies' proprietary datasets had a way to experiment more freely with AI.

Around that time, LAION researchers released a paper saying that LAION anticipated some "potential problems arising from an unfiltered dataset" and "introduced an improved inappropriate content tagging" to make it easier to flag harmful content and to update and improve the dataset.

Back when the dataset was publicly available, users were encouraged "to explore and, subsequently, report further not yet detected content and thus contribute to the improvement of our and other existing approaches," the paper said.

This is essentially what happened with HRW's report this week and is one reason why LAION sees its dataset as more transparent than other large AI datasets.

"In our opinion, this process is not supposed to be a non-transparent closed-door avenue," LAION's paper said. "It should be approached by [a] broad research community, resulting in open and transparent datasets and procedures for model training."

Other researchers could potentially help flag more URLs linking to real kids' images to keep improving the dataset off the back of HRW's research once the dataset is again publicly available.

When HRW contacted LAION about the images roughly a month ago, LAION told HRW that AI models trained on LAION-5B could not reproduce kids' personal data verbatim. But acknowledging other privacy and security risks, LAION began removing links to photos from the dataset while also advising that "children and their guardians were responsible for removing children’s personal photos from the Internet." That, LAION said, would be "the most effective protection against misuse."

Han told Wired that she disagreed, arguing that previously, most of the people in these photos enjoyed "a measure of privacy" because their photos were mostly "not possible to find online through a reverse image search." The people posting likely never anticipated that their rarely clicked family photos would one day, sometimes more than a decade later, become fuel for AI engines.

"Children and their parents shouldn't be made to shoulder responsibility for protecting kids against a technology that's fundamentally impossible to protect against," Han said. "It's not their fault."

Instead, "LAION should take action to prevent the ingestion of children’s personal data into its datasets, and it should also regularly scan for and remove children’s data," Han said.

And lawmakers should urgently intervene to protect children's privacy as AI technologies emerge and proliferate, HRW reported.

In Brazil, legal changes are expected as soon as July.

Last April, the National Council for the Rights of Children and Adolescents published a resolution directing the Ministry of Human Rights and Citizenship "to develop a national policy to protect the rights of children and adolescents in the digital environment within 90 days," HRW reported.

Through that initiative, the children's rights body said that, among other provisions, the policy should specifically cover AI, protect against harassment, establish a right to privacy, and only allow processing of kids' personal data when consent is freely given "in advance" of data collection. Seemingly, that means that soon children could get the right to revoke consent for any AI training on their data, should those provisions be upheld.

In the US, bills introduced in Congress, the DEFIANCE Act and the "Preventing Deepfakes of Intimate Images Act," would narrowly prevent the spread of non-consensual explicit deepfakes, including those that target children and adults. But in Brazil, HRW is advocating for lawmakers to go further and specifically cut kids' personal data entirely out of AI systems.

Brazil's "new policy should prohibit scraping children’s personal data into AI systems, given the privacy risks involved and the potential for new forms of misuse as the technology evolves," HRW recommended. "It should also prohibit the nonconsensual digital replication or manipulation of children’s likenesses. And it should provide children who experience harm with mechanisms to seek meaningful justice and remedy." And Brazil's General Personal Data Protection Law should be updated to adopt "additional, comprehensive safeguards for children’s data privacy," HRW said.

Ars could not immediately reach the Ministry or Han for comment.

This story was updated on June 11 to add comments from HRW researcher Hye Jung Han.
