EDPB: Report of the work undertaken by the ChatGPT Taskforce. The report outlines the common denominator agreed upon by the EU SAs in interpreting the GDPR with respect to OpenAI's ChatGPT. It also contains the questionnaire that the different SAs used when interacting with OpenAI.

Key takeaways:
- One of the most crucial takeaways is that companies should not rely on technical impossibility when implementing GDPR requirements. The EDPB points here to the data-protection-by-design requirements: GDPR compliance must be considered both when determining the means of processing and at the time of the processing itself. Recently raised issues, such as whether LLMs can honour data rectification requests, or concerns that anonymization might make data useless for improving services, should therefore be addressed from the very beginning of building a model and service;
- When relying on the legitimate interest legal basis, remember that adequate safeguards play a special role in reducing undue impact on data subjects and can therefore tip the balancing test in favor of the controller (e.g., define precise collection criteria; ensure that certain data categories, like special categories of data, are not collected or are removed after collection; delete or anonymize personal data collected via web scraping before the training stage, etc.). Also, if you use prompts for training purposes, a sufficient level of transparency towards users about this is another crucial factor when performing the LIA;
- Do not shift the responsibility for GDPR compliance onto your users by stating in your Terms or other user-facing documents that users are responsible for what they enter into prompts (data put into your system is your responsibility). For example, here's how Google Gemini puts it (https://lnkd.in/gjkxwjgF): "Please don't enter confidential information in your conversations or any data you wouldn't want a reviewer to see or Google to use to improve our products, services, and machine-learning technologies";
- To comply with the transparency requirements, users should be informed that the generated text, although syntactically correct, may be biased or made up. This, however, is not an argument for ignoring the data accuracy principle (the report says you must abide by it in any case);
- It is also worth checking the questionnaire at the end of the report to see where your gaps might be (e.g., have your DPIAs, LIAs, and purpose compatibility assessments in good shape). Also check your contracts with processors or joint controllers, as you may need to show [if you are the sole controller] how you ensured that no other party (for example, another company) decides the purposes and means of the processing of personal data in the context of the LLM and software infrastructure. #GDPR #AI #privacy
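The pre-training safeguards the report describes (precise collection criteria, dropping excluded sources, anonymising scraped personal data before the training stage) could be sketched roughly as below. This is only an illustration: the record format, domain list, and regex-based scrubbing are my assumptions, not OpenAI's or any real pipeline, and regex substitution is far weaker than genuine anonymisation.

```python
import re

# Hypothetical pre-training safeguard sketch (illustrative only):
# drop records from excluded sources, then replace direct identifiers
# with placeholders before the data reaches the training stage.

EXCLUDED_SOURCES = {"social-media.example"}          # precise collection criteria
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")

def scrub(text: str) -> str:
    """Replace direct identifiers (emails, phone numbers) with placeholders."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    return PHONE_RE.sub("[PHONE]", text)

def filter_records(records):
    """Exclude listed sources entirely; scrub everything else."""
    for rec in records:
        if rec["source_domain"] in EXCLUDED_SOURCES:
            continue  # source excluded from collection altogether
        yield {**rec, "text": scrub(rec["text"])}

sample = [
    {"source_domain": "news.example",
     "text": "Contact jane@doe.com or +44 20 7946 0958."},
    {"source_domain": "social-media.example",
     "text": "My profile says a lot about me."},
]
print(list(filter_records(sample)))
```

A real deployment would need proper named-entity detection and a defensible anonymisation standard, but the shape of the safeguard (exclude, then scrub, before training) is what the report points at.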
Vadym Honcharenko’s Post
Co-founder of the AI, Tech & Privacy Academy, LinkedIn Top Voice, Ph.D. Researcher, Polyglot, Latina, Mother of 3. Subscribe to my AI policy & regulation newsletter (29,700+ subscribers)
🚨 BREAKING: The EDPB has just published its ChatGPT Taskforce Report, and there is a big 🐘 ELEPHANT IN THE ROOM 🐘. Read this:

➡ On web scraping and the "collection of training data, pre-processing of the data and training," the report recognizes that OpenAI relies on legitimate interest to collect and process personal data to train ChatGPT (OpenAI states that on a hidden page in its Help Center, as I've discussed previously). On that, the report says:

1. "It has to be recalled that the legal assessment of Article 6(1)(f) GDPR should be based on the following criteria: i) existence of a legitimate interest, ii) necessity of processing, as the personal data should be adequate, relevant and limited to what is necessary in relation to the purposes for which they are processed and iii) balancing of interests (...)"

2. "As already stated by the former Article 29 Working Party, adequate safeguards play a special role in reducing undue impact on data subjects and can therefore change the balancing test in favor of the controller. While the assessment of the lawfulness is still subject to pending investigations, such safeguards could inter alia be technical measures, defining precise collection criteria and ensuring that certain data categories are not collected or that certain sources (such as public social media profiles) are excluded from data collection. Furthermore, measures should be in place to delete or anonymise personal data that has been collected via web scraping before the training stage."

➡ What they are basically saying is that scraping to train ChatGPT under legitimate interest might be possible if technical measures such as the ones described above are used.

🚨 Problems:

➡ If they are relying on legitimate interest, transparency obligations must be respected. Article 14(5)(b) of the GDPR says that in cases where it's not possible to notify data subjects of the information being processed (such as in the context of scraping), "the controller shall take appropriate measures to protect the data subject’s rights and freedoms and legitimate interests, including making the information publicly available."
- ChatGPT's training data is not publicly available.
- Unlike with a search engine, the data subject cannot exercise simple data subject rights, for example, the right to be forgotten.

➡ They are not anonymizing personal data, as the system still outputs information about people.

➡ They are closing licensing deals with Reddit and other platforms that contain personal data. Obviously, they are not avoiding personal data and don't plan to.

🚨 Legitimate interest has been totally distorted here. Either the EDPB says that legitimate interest works differently for OpenAI and other AI companies relying on scraping to train AI (and explains why), or it requires them to comply with legitimate interest under the GDPR.

➡ Read the report below

#AI #ChatGPT #GDPR #AItraining #AIregulation #AIpolicy
Noyb has filed a complaint about ChatGPT with the Austrian Data Protection Authority. https://lnkd.in/guPNpg6K

From the complaint: "ChatGPT cannot correct information, cannot selectively block information and any data subject must simply life [sic] with that situation – according to the controller. ChatGPT seems to take the view that it can simply spread false information and is (other than any media company or other controller) not liable for it."

Basically, OpenAI says they can't block some personal data (in this case a fabricated birthday) without blocking all data about that person. The fundamental issue here is of course hallucinations, and every LLM will have this issue. When these hallucinations are applied to individual people, there's trouble brewing. This particular part of the complaint gets at the catch-22 inherent in the issue: "The controller does not seem to have any option to actually correct false information, but can only 'hide' at the final output stage of the processing. Even if all data would be blocked, the false information would still be present in the system – just not shown to users."

"Present in the system" is a strange way to put it, because of course the erroneous birthday wasn't in the system at all but was made up on the spot. It's the "gen" in GenAI that so often gets glossed over: it's a lack of data in the first place that causes the offending output. If OpenAI were somehow able to remove an individual's personal data from the trained model, the model would then be more likely to produce incorrect data about that person.

GDPR requires that personal data be accurate, but this is fundamentally at odds with how LLMs work. There's no easy fix here, and "ChatGPT can make mistakes. Consider checking important information." isn't going to cut it.
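The "hide at the final output stage" mechanism the complaint describes can be sketched in a few lines. Everything here is hypothetical (the toy model, the blocklist, the function names); the point is structural: the generation step still fabricates the data, and the filter only suppresses it afterwards, which is why blocking is not the same as rectification.

```python
# Illustrative sketch of output-stage blocking, the only remedy the
# complaint says OpenAI offered. The toy "model" stands in for an LLM
# that fabricates a birthday; nothing here is OpenAI's implementation.

BLOCKED_NAMES = {"Jane Doe"}  # data subjects whose mentions must be suppressed

def toy_model(prompt: str) -> str:
    # Stand-in for an LLM: invents a plausible-sounding birthday
    # regardless of whether any ground-truth data exists.
    return "Jane Doe was born on 12 March 1979."

def guarded_answer(prompt: str) -> str:
    raw = toy_model(prompt)  # the false statement is still generated here
    if any(name in raw for name in BLOCKED_NAMES):
        return "I can't share information about this person."
    return raw

print(guarded_answer("When was Jane Doe born?"))
```

Note that nothing in `guarded_answer` corrects `toy_model`: remove the filter and the fabricated birthday reappears, which is exactly the gap between suppression and the GDPR's rectification right.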
How to configure #ChatGPT from OpenAI to be as #GDPR-compliant as possible and what else you need to consider for data protection. 👇 The integration of ChatGPT into company processes offers opportunities to increase efficiency and innovation, but also harbours data protection challenges. Companies are therefore required to take proactive measures to ensure data protection compliance when using the popular AI chatbot. 👉 Our expert Vasiliki Paschou explains in practical terms what you should bear in mind when using ChatGPT in your company. 👇
How to use ChatGPT in compliance with the GDPR | activeMind.legal
https://www.activemind.legal
EDPB ChatGPT Task Force issues its report. Quick summary of some key emerging themes below. Importantly, the report is non-binding and contains only preliminary views.

1. "Technical impossibility" cannot be invoked to justify non-compliance.

2. Transparency: the Art. 14(5)(b) impossible/disproportionate-effort exception could apply (consistent with what the ICO has been saying). It is important to tell direct users that their content (prompts, uploaded files and feedback) may be used for training.

3. Legitimate interests remains viable (i.e. they have not ruled it out, which is a positive given the very narrow approach the Dutch SA is adopting). In assessing the legal basis, it is useful to split the processing into:
I. collection of training data, including scraping or reuse of datasets
II. pre-processing, including filtering
III. training
IV. prompts and model outputs
V. training the model with prompts

4. SCD: an unsurprising reiteration that public accessibility does not imply "manifestly made public" for the purpose of Art. 9(2) GDPR - the individual must have intended, explicitly and by a clear affirmative action, to make the data accessible to the general public.

5. Welcome acceptance that case-by-case examination of scraped data is not possible (specifically in the context of SCD). There is a particular focus on filtering pre-collection (precise collection criteria and exclusion of sources like public social media profiles) and post-collection (deletion/anonymisation). This is similar to what the CNIL has been saying. However, the Task Force did not take the opportunity to go as far as the CNIL in giving comfort to controllers about inadvertent residual SCD processing.

6. Accuracy:
a. Distinction between inputs and outputs, and regulatory recognition that provision of factually accurate information is not the purpose.
b. Focus on the need to provide proper information on the probabilistic output-creation mechanism and its limited level of reliability, including an explicit reference to the possibility that generated text may be biased or made up.
c. Transparency measures are beneficial, but there is an indication that regulators may not see them as sufficient to ensure compliance with the accuracy principle.

7. Data subject rights: strong focus on the ability of individuals to exercise their rights in an easily accessible manner.

https://lnkd.in/ey6DMX6m
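The pre-collection side of point 5 (precise collection criteria, excluding sources like public social media profiles before scraping) amounts to filtering URLs before they are ever fetched. A minimal sketch, with an entirely illustrative exclusion list and helper name:

```python
from urllib.parse import urlparse

# Hypothetical pre-collection filter: exclude configured source domains
# (e.g. public social media profiles) before any scraping happens.
# The domain list and function name are assumptions for illustration.

EXCLUDED_DOMAINS = {"facebook.com", "x.com", "instagram.com"}

def should_collect(url: str) -> bool:
    """Return False for URLs on excluded domains (including subdomains)."""
    host = urlparse(url).hostname or ""
    return not any(host == d or host.endswith("." + d)
                   for d in EXCLUDED_DOMAINS)

urls = [
    "https://news.example/article/1",
    "https://www.facebook.com/some.profile",
]
print([u for u in urls if should_collect(u)])
```

Filtering at this stage, rather than after collection, is what distinguishes a "precise collection criterion" from post-collection deletion or anonymisation.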
edpb_20240523_report_chatgpt_taskforce_en.pdf
edpb.europa.eu
Microsoft-backed startup OpenAI (MSFT.O) on Monday found itself the target of a privacy complaint by advocacy group NOYB for allegedly not fixing incorrect information provided by its generative AI chatbot ChatGPT, which may breach EU privacy rules. ChatGPT, which kickstarted the GenAI boom in late 2022, can mimic human conversation and perform tasks such as creating summaries of long texts, writing poems and even generating ideas for a theme party.

NOYB said the complainant in its case, who is also a public figure, asked ChatGPT about his birthday and was repeatedly provided incorrect information instead of the chatbot telling users that it does not have the necessary data. The group said OpenAI refused the complainant's request to rectify or erase the data, saying that it was not possible to correct the data, and that it also failed to disclose any information about the data processed, its sources or recipients. NOYB said it had filed a complaint with the Austrian data protection authority asking it to investigate OpenAI's data processing and the measures taken to ensure the accuracy of personal data processed by the company's large language models.

"It's clear that companies are currently unable to make chatbots like ChatGPT comply with EU law when processing data about individuals," Maartje de Graaf, NOYB data protection lawyer, said in a statement. "If a system cannot produce accurate and transparent results, it cannot be used to generate data about individuals. The technology has to follow the legal requirements, not the other way around," she said.

In the past, OpenAI has acknowledged the tool's tendency to respond with "plausible-sounding but incorrect or nonsensical answers," an issue it considers challenging to fix. www.poshcomply.com
💡 The GDPR grants data subjects rights regarding their personal data, and helping data subjects exercise these rights is the sacred duty of every controller. Are these rights exercisable if a controller processes personal data based on a large language model (LLM)? At this stage of development of AI systems with LLMs, the answer is rather NO❌ than YES✅. And this may become a significant GDPR-compliance challenge for AI system developers and the many companies that have already enriched their services with AI.

⚡ noyb, the well-known data privacy activists, have significantly raised the level of discussion on this issue by filing a complaint with the Datenschutzbehörde (DSB), the Austrian data protection authority, against OpenAI as ChatGPT's controller: https://lnkd.in/eZqvQWmk.

👉 7 key points from the complaint (see the PDF for details, https://lnkd.in/dqdFrYG4):
1️⃣ When asked for a specific person's date of birth, ChatGPT produced different and incorrect answers each time.
2️⃣ Seemingly, ChatGPT was just making them up.
3️⃣ The data subject sent OpenAI requests for access and erasure.
4️⃣ OpenAI did not provide any data from the LLM in response to the access request. What data was used to train the model remains unclear.
5️⃣ In response to the erasure request, OpenAI explained that it is impossible to make ChatGPT stop giving incorrect information only about the date of birth.
6️⃣ The only thing that can be done is to stop giving out any information about the person.
7️⃣ However, the LLM will still hold the data; it will just be filtered out of the responses.

In short, ChatGPT is "hallucinating" about personal data, and nothing can be done about it yet🙊.

⚔ noyb asks the DSB to declare a violation of Art. 12(3), 15, and 5(1)(d) of the GDPR by OpenAI, impose corrective measures, and a fine. We will keep an eye on this extremely interesting case.
⚖ Regarding the "hallucinations" of LLMs, we recently wrote about the European Commission's requests to Microsoft in the context of Digital Services Act (DSA) compliance for Bing's AI-enabled features: https://lnkd.in/eiHsYXjC.

💡 If you have any questions about the GDPR or about developing or implementing AI systems in a compliant manner, we at ESPE are always ready to help: legal@especg.com📨, https://especg.com/. You can learn more about ESPE's services on our website:
🔹 Data Protection: https://lnkd.in/e56pvSut,
🔹 Legal Support for AI Development: https://lnkd.in/exJvWeec.

#noyb #datasecurity #aiact #ai #artificialintelligence #dataprivacy #intellectualproperty #gdpr #expertconsulting #ChatGPT #LLM #OpenAI #inspiration #ESPE
ChatGPT provides false information about people, and OpenAI can’t correct it
noyb.eu