OpenAI updates data usage policies after data breach, states content sent via API is not used to train LLMs; ChatGPT not included in the update.
ChatGPT, Translation, and Confidentiality: ‘We May Use the Data’
There are no specific terms of use for OpenAI’s consumer services, which is how the company classifies ChatGPT. What happens to content submitted to ChatGPT for translation is not spelled out in the user terms of service, where most people would expect to find it. Instead, the answer is found in OpenAI’s Data Controls Frequently Asked Questions (FAQs) and various linked documents. The terms of use, which the company calls data usage policies, govern its API services. These policies have changed since March 2023, when OpenAI confirmed a data breach caused by a bug in an open-source library used by ChatGPT. Under pressure from public criticism over the breach, the company updated the policies to address data confidentiality and security concerns. Under the updated policies, data that customers submit via the API will not be used to train or improve OpenAI’s models.
OpenAI stated that a vulnerability in the Redis open-source library used by ChatGPT caused some active users’ chat history to become visible to other users active at the same time. It also acknowledged that some payment information belonging to premium users had been leaked in March, but played down the potential consequences of this breach.
Enter at Your Own MT Risk
One of the documents linked in the data usage policies is a general statement of how data is used when it is transmitted through the company’s consumer services: “When you use our non-API consumer services ChatGPT or DALL-E, we may use the data you provide us to improve our models. You can switch off training in ChatGPT settings (under Data Controls) to turn off training for any conversations created while training is disabled …”
These policies make no distinction between the free service and the paid subscription, ChatGPT Plus. The paid version merely keeps the service available during periods of high demand; OpenAI also claims it is faster and offers priority access to new features.
For API usage, OpenAI states in its data usage policies: “OpenAI will not use data submitted by customers via our API to train or improve our models, unless you explicitly decide to share your data with us for this purpose. You can opt-in to share data. Any data sent through the API will be retained for abuse and misuse monitoring purposes for a maximum of 30 days, after which it will be deleted (unless otherwise required by law).”
Language translation is just one of the many tasks ChatGPT can perform, and the updated data usage policies make no specific mention of content submitted for that purpose. Users who search the support area of OpenAI’s site (titled “Advice and answers from the OpenAI Team”) for such a mention are redirected to the Data Controls FAQs.
Lock That Door
It is still too early in the evolution of LLMs to see large-scale use of their translation capabilities, but there have been some early integrations with translation management systems. These will depend on robust encryption and security features such as two-factor authentication to keep this data secure.
As an example of easy yet risky access, Samsung employees had been using ChatGPT for translation and other tasks until the company’s leadership prohibited use of the AI tool altogether in April 2023, citing security concerns.
Unfortunately for the general (non-paying) public, sensitive information cannot be considered secure when submitted to free translation services. In an example that predates LLM-based translation by a few years, confidential texts submitted to Translate.com’s free service surfaced in search engines such as Google and Microsoft’s Bing in 2017, after which the company admitted that translations were “sent to our community to improve accuracy.”
Other providers, such as DeepL Translate, have also made news over questions about data management. DeepL has separate terms and conditions for its free MT product and its Pro version. Under a section titled “Processing of the submitted Texts” (a header that reads like unedited German-to-English MT), the terms for the free version state that uploaded content, as well as the translations generated and any post-edits, are processed for an unspecified period to train neural networks and translation algorithms.
What these MT providers have in common with ChatGPT, as far as translation is concerned, is that users are made responsible for how they use data, confidential or otherwise. In all cases, users also allow the companies to use their data for various purposes unless they opt out.