Interview: Corinne Benoliel, Optimizing Regulatory Compliance in Cosmetics: Navigating AI ? via ZOOM#35

Optimizing Regulatory Compliance in Cosmetics: Navigating AI and RAG tools and Avoiding Hallucinations

AI and Cosmetic Regulation: Key Insights to Date

Artificial Intelligence (AI) is profoundly transforming numerous sectors, and the cosmetics industry is no exception. With the emergence of free and accessible tools, brands, formulators, and even consumers can now leverage this technology. However, this democratization raises a pivotal question: is AI capable of meeting, with the necessary rigor, the requirements of a regulatory framework as stringent as that governing cosmetic products? This interview explores, through a question-and-answer format, the overarching challenges associated with AI and its application within the field of cosmetic regulation.

General Information on AI

1. How would you define the fundamentals of Generative AI?

Artificial Intelligence refers to algorithms capable of analyzing data, learning, and responding to queries formulated by humans. Generative AI further distinguishes itself by generating new data. When engaging with AI, it is essential to master the following terminology:

  • LLM (Large Language Model): A generative AI model, built on a neural network and trained on massive datasets, that learns to generate natural language through probabilistic modeling.
  • Prompt: An instruction or a set of data provided by a human to the AI. Its clarity is paramount to ensure the AI responds accurately and addresses the specific problem statement.
  • Chatbot: A computer program designed to simulate and process human conversation. It requires iterative dialogue to “challenge” the system and refine its output.
  • Training: The process by which the AI learns to predict the subsequent word in a sentence based on the preceding context. An AI must undergo rigorous “training” to achieve a satisfactory level of proficiency in the user’s target domain and to generate pertinent responses.
  • Hallucination: A false or misleading response generated by the AI, presented as an established fact.
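To make the "Training" and "LLM" entries above more concrete, here is a deliberately tiny sketch of next-word prediction. It is not how a real LLM is built (those rely on neural networks and far larger datasets); it only counts which word follows which in a made-up toy corpus, and the function names are hypothetical.

```python
from collections import Counter, defaultdict

def train_bigram(corpus):
    """Count, for each word, which words follow it in the training text."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        for current, following in zip(words, words[1:]):
            counts[current][following] += 1
    return counts

def next_word_probabilities(counts, word):
    """Turn raw follow-up counts into probabilities for the given word."""
    followers = counts.get(word.lower(), Counter())
    total = sum(followers.values())
    return {w: n / total for w, n in followers.items()} if total else {}

# Toy training text, made up for illustration only.
corpus = [
    "the preservative is safe",
    "the preservative is restricted",
    "the preservative is safe at low concentration",
]
model = train_bigram(corpus)
probs = next_word_probabilities(model, "is")
print(probs)
```

In this toy corpus, "safe" follows "is" two times out of three, so it receives the highest probability. The model reproduces what is frequent in its training text and has no means of checking whether the statement is true, which is one way to picture how a fluent but false answer can arise.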

2. In your view, how should the framework for confidentiality and data protection be defined when interacting with AI?

The primary concern when querying an AI is confidentiality. To address it, it is essential to understand the extraterritorial reach of the 2018 US CLOUD Act (Clarifying Lawful Overseas Use of Data Act). In short, this federal law contains provisions that authorize U.S. authorities to access data hosted by American companies, even when such data is stored abroad, including within Europe.

These provisions therefore conflict directly with the GDPR (General Data Protection Regulation) and with European laws protecting business data, as they allow data to be transferred without European consent or oversight. Given that most AI systems are hosted on servers operated by American companies, confidentiality is not guaranteed. Even Mistral, a fully French entity, may rely on servers operated by American providers, including ones located on French soil.

To ascertain an AI’s level of confidentiality, it is imperative to consult the Terms of Service (ToS). As these documents are invariably dense and complex, the involvement of legal counsel or cybersecurity specialists may prove necessary.

3. What are the fundamental principles to observe when utilizing AI?

  • Legal Responsibility: The human author who produces content based on data derived from an AI bears full legal liability.
  • Data Integrity: The user must ensure that sensitive information, confidential data (not in the public domain), or personally identifiable information (PII) is not transmitted within prompts.
  • Due Diligence: Should any doubt arise regarding sensitive data, the AI’s Terms of Service must be reviewed prior to use.
  • Input Dependency: The outputs delivered by the AI are strictly dependent on its input data.
  • Critical Assessment: AI users must critically evaluate the accuracy of the provided data, its transparency (availability of sources), associated confidentiality and security (notably hosting locations), the intellectual property rights of the obtained data, and the justification for utilizing AI in light of its environmental impact.
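As an illustration of the "Data Integrity" principle above, the following minimal sketch masks obvious identifiers before a prompt leaves the user's machine. The regular expressions are crude assumptions, the `redact` function and the formula code `FX-1234` are hypothetical, and such a filter does not replace a review of the AI's Terms of Service.

```python
import re

# Rough patterns, assumed for illustration; real PII detection needs more care.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE = re.compile(r"\+?\d[\d .-]{7,}\d")

def redact(prompt, confidential_terms=()):
    """Mask e-mail addresses, phone-like numbers, and user-listed confidential terms."""
    cleaned = EMAIL.sub("[EMAIL]", prompt)
    cleaned = PHONE.sub("[PHONE]", cleaned)
    for term in confidential_terms:
        cleaned = re.sub(re.escape(term), "[CONFIDENTIAL]", cleaned, flags=re.IGNORECASE)
    return cleaned

raw = "Contact jane.doe@example.com or +33 1 23 45 67 89 about formula FX-1234."
safe_prompt = redact(raw, confidential_terms=["FX-1234"])
print(safe_prompt)
```

The list of confidential terms has to come from the user, since only the user knows which names, codes, or figures are not in the public domain.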

1. More specifically within the cosmetics sector, how should bibliographic research be conducted using AI?

Artificial Intelligence can facilitate the identification and collation of targeted scientific publications, primarily due to its conversational interface. The user may pose precise queries, refine searches through successive iterations, and rapidly obtain a selection of pertinent documents.

To improve the reliability of results, it is essential to utilize an AI equipped with a RAG (Retrieval-Augmented Generation) architecture. This method integrates a large language model with an advanced search engine, enabling the AI to query scientific or regulatory databases in real time before generating a response. Consequently, the information provided is anchored in referenced sources, thereby mitigating the risks of extrapolation or error.
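The retrieve-then-generate sequence described above can be sketched as follows. This is a simplified illustration under stated assumptions: retrieval is done by plain word overlap (production systems typically use a dedicated search engine or embedding-based search), the passages are placeholders rather than real regulatory text, the function names are hypothetical, and the final call to a language model is left out.

```python
import re

def tokenize(text):
    """Lower-case the text and return its set of words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(query, documents, top_k=2):
    """Rank documents by the number of words they share with the query."""
    query_words = tokenize(query)
    scored = [(len(query_words & tokenize(doc["text"])), doc) for doc in documents]
    scored = [item for item in scored if item[0] > 0]  # drop unrelated documents
    scored.sort(key=lambda item: item[0], reverse=True)
    return [doc for _, doc in scored[:top_k]]

def build_grounded_prompt(query, passages):
    """Assemble a prompt that confines the model to the retrieved passages."""
    context = "\n".join(f"[{p['id']}] {p['text']}" for p in passages)
    return (
        "Answer using only the passages below and cite their identifiers. "
        "If they do not contain the answer, say so.\n\n"
        f"Passages:\n{context}\n\nQuestion: {query}"
    )

# Placeholder document store; identifiers and texts are invented for the example.
documents = [
    {"id": "DOC-1", "text": "Placeholder passage on skin sensitization data for substance X."},
    {"id": "DOC-2", "text": "Placeholder passage on packaging migration tests."},
    {"id": "DOC-3", "text": "Placeholder passage on sensitization classification of substance X."},
]
query = "What is the sensitization potential of substance X?"
passages = retrieve(query, documents)
prompt = build_grounded_prompt(query, passages)
print(prompt)
```

Because each passage carries an identifier, any statement in the generated answer can be traced back to the document it came from, which is what makes subsequent verification by an expert practical.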

Caveat: Even when employing RAG, expert intervention remains indispensable to analyze the delivered outputs, systematically verify cited sources, and cross-reference information, as AI systems may occasionally omit granular details or misinterpret complex data. For optimal research, it is advisable to use specialized tools and formulate queries using precise keywords and appropriate filters (dates, document types, etc.).

For instance, in the context of research regarding the sensitizing potential of a cosmetic substance, utilizing an AI with a RAG approach reduces “hallucinations” by drawing on reliable and precise sources, such as PubMed, SCCS opinions, or ECHA data, provided these sources are explicitly specified within the query.
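One possible way to specify the sources and filters systematically is to assemble the query from named fields, as in the hedged sketch below. The `build_research_prompt` helper and its wording are hypothetical, not a feature of any particular tool; the substance and CAS number are those of the case study discussed later in this interview.

```python
def build_research_prompt(substance, cas_number, endpoint, sources, published_after=None):
    """Compose a query that names the substance, the endpoint, and the sources to consult."""
    lines = [
        f"Substance: {substance} (CAS {cas_number})",
        f"Question: what is known about its {endpoint}?",
        "Consult only the following sources: " + ", ".join(sources) + ".",
        "Cite the source of every statement and say so if no source covers the question.",
    ]
    if published_after is not None:
        # Optional date filter, as suggested for targeted searches.
        lines.append(f"Only consider documents published after {published_after}.")
    return "\n".join(lines)

prompt = build_research_prompt(
    substance="phenoxyethanol",
    cas_number="122-99-6",
    endpoint="skin sensitization potential",
    sources=["PubMed", "SCCS opinions", "ECHA data"],
)
print(prompt)
```

Keeping the fields separate makes it easy to reuse the same structure for another substance or endpoint and to record exactly which query produced which answer.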

2. In your view, which open-access AI tools could be beneficial within the field of cosmetic regulation?

As a general principle, priority should be given to AI systems that explicitly cite their sources.

Below are several recommended solutions categorized by use case:

Use Case                      | Recommended Solutions
Information Retrieval         | Le Chat, ChatGPT, NotebookLM, Perplexity
Bibliographic Research        | Consensus, Elicit, Perplexity, SciSpace AI, Scopus AI, Web of Science Research Assistant
Document Drafting             | Le Chat, ChatGPT, Claude, Copilot, Gemini, Perplexity, QuillBot
Translation                   | ChatGPT, DeepL, Google Translate, QuillBot
Scientific Manuscript Writing | ChatGPT, Elicit, Paperpal, Perplexity, QuillBot, Rubriq
Scientific Figure Generation  | Adobe Firefly, BioRender, DALL·E 3, Flux 1.1, Gemini, Midjourney, Wolfram Alpha
Computer Code Development     | ChatGPT, Claude Code, Cursor, GitHub Copilot, Mistral, Perplexity

3. In your view, is it possible to delegate the data collection required for drafting Part A of a Cosmetic Product Safety Report (CPSR) to AI?

The information necessary for completing Part A of the Safety Report is centralized by the Responsible Person. This individual gathers data from suppliers (regarding raw materials and packaging) and internal teams (such as formulators for finished product data).

The collection of this data cannot be fully automated or delegated to an AI, due to its dispersed nature and inherent diversity. Once the documents have been compiled, they may be stored within a secure environment (a corporate server or a French or European hosting provider).

An AI utilizing the RAG (Retrieval-Augmented Generation) method can then extract pertinent information, provided that the prompts are specific and adapted to both the various regulatory chapters and the structural format of the documents (tables, lists, prose, etc.).

For instance, it is more efficient to provide the AI with a precise list of allergens to be identified—according to current regulations—rather than requesting their identification without guidance. Nevertheless, expert intervention remains indispensable to validate the outputs, particularly to ensure that certain allergens (present in non-fragrance raw materials, for example) are correctly identified.
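The same explicit list can also serve as a simple, deterministic cross-check on what the AI extracted. The sketch below is an assumption-laden illustration: the function name is hypothetical, the ingredient text is invented, and the four allergens shown are only a small illustrative subset, so the complete list under current regulations must be supplied by the expert.

```python
import re

def find_listed_allergens(ingredient_text, allergen_list):
    """Return the allergens from allergen_list whose names appear in the ingredient text."""
    found = []
    for name in allergen_list:
        # Whole-word, case-insensitive match on the allergen name.
        pattern = r"\b" + re.escape(name) + r"\b"
        if re.search(pattern, ingredient_text, flags=re.IGNORECASE):
            found.append(name)
    return found

# Illustrative subset only, not the full regulatory list.
allergens = ["Limonene", "Linalool", "Citral", "Geraniol"]
# Invented ingredient list for the example.
inci = "Aqua, Glycerin, Citrus Aurantium Dulcis Peel Oil, Limonene, Linalool, Phenoxyethanol"
print(find_listed_allergens(inci, allergens))  # prints ['Limonene', 'Linalool']
```

A name match of this kind only finds allergens that are written out in the document. It cannot detect an allergen that is present in a raw material but not declared in the text, which is precisely where expert validation remains necessary.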

Given the diversity of document formats and the time required for prompt engineering (which necessitates human expertise), it is legitimate to question how much time this approach actually saves.

4. Could you provide a practical case study of bibliographic research?

To illustrate the aforementioned points, I conducted a test regarding the sensitizing potential of phenoxyethanol (CAS 122-99-6) by providing the AI with the following instruction: “Can you identify the sensitization potential of phenoxyethanol?”. Three open-access AI systems indicated that this preservative is a sensitizer, even specifying it as a skin sensitizer (H317).

Following several iterations and specific guidance on my part toward ECHA and SCCS sources, I obtained the following data:

Phenoxyethanol is not classified as a skin sensitizer in Annex VI of the CLP Regulation, and there are no current intentions to classify it for this hazard. The SCCS, in its opinion SCCS/1628/21, confirms that phenoxyethanol is not considered a skin sensitizer at current concentrations of use, though it recommends maintaining the 1% limit as a precautionary measure.

This approach accelerates the bibliographic review but necessitates the intervention of a human expert to verify the sources and their overall relevance.

In summary, the RAG process employed here allowed the AI to draw on reliable and contextualized data, thereby strengthening the pertinence and traceability of the results. The fundamental principles for information retrieval remain the cross-referencing of sources and the verification of obtained data, while remaining vigilant about the fact that certain AI systems may store and reuse queries.

Contact

Corinne Benoliel
Doctor of Pharmacy, microbiologist and safety assessor
Founder and manager of Institut Scientis
corinne.benoliel@scientis.fr