Guidelines on Cosmetic Efficacy Testing on Humans. Ethical, Technical, and Regulatory Requirements in the Main Cosmetics Markets

15 July 2016


Vincenzo Nobile^*

Farcoderm s.r.l, Member of Complife Group, Via Mons Angelini, 21, 27028 San Martino Siccomario, Pavia, Italy

Via Mons Angelini, 21
27028 San Martino Siccomario, Pavia, Italy
Tel: +39-0382-25504
E-mail: vincenzo.nobile@farcoderm.com

Received: January 18, 2016 Accepted: February 05, 2016 Published: February 07, 2016

Citation: Nobile V (2016) Guidelines on Cosmetic Efficacy Testing on Humans. Ethical, Technical, and Regulatory Requirements in the Main Cosmetics Markets. J Cosmo Trichol 2:107. doi:10.4172/jctt.1000107

Copyright: © 2016 Nobile V. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Introduction

The protection of consumers from misleading claims concerning the efficacy and other characteristics of cosmetic products is at the core of the worldwide regulatory framework. Cosmetic products are required to be effective when used by consumers under normal, labelled or foreseeable conditions of use [1]. In Europe, Regulation (EU) No 655/2013 [2] clearly states: “Claims for cosmetic products, whether explicit or implicit, shall be supported by adequate and verifiable evidence regardless of the types of evidential support used to substantiate them, including where appropriate expert assessments”. The evidential support for cosmetic claims should take into account state-of-the-art practices; studies should be relevant to the product and to the benefit claimed, should follow well-designed and well-conducted methodologies (valid, reliable and reproducible), and should respect ethical considerations.

However, despite the increase in regulatory requirements, only a few standards and guidelines exist (Table 1). This gap, which has been recognized for quite some time, calls for an international solution or agreement to standardize the technical requirements of cosmetic efficacy testing studies on humans. Since the mid-1990s, the EEMCO group (European Group on Efficacy Measurement of Cosmetics and Other Topical Products) has published a variety of so-called guidelines for several different skin measurement parameters [8–18]. This manuscript introduces the ethical, technical and regulatory requirements that should guide cosmetic efficacy testing.

…/…

Technical requirements

To promote sound scientific design of research, each cosmetic efficacy testing study protocol should include detailed information on how the study is to be carried out (Table 3) [23].

The saying “the claim dictates the test” is a good starting point when considering whether a clinical study is needed and, if so, of what type. Different study designs have different strengths and limitations. The gold standard of clinical study designs is the randomized controlled trial (RCT). In this design, treatments (e.g. active ingredient and placebo, or active ingredient and benchmark) are allocated to subjects in a random and unpredictable sequence. Blinding is also one of the major concerns in study design: if a fully blinded RCT is not feasible, an observer-blind or single-blind design, in which the assessor or the subject is unaware of the test product assignment, may be sufficient. To reduce the variability due to differences between individuals, designs with intraindividual comparison of test products (e.g. half-face/half-body application, multiple applications on the forearm, etc.) are generally preferred in cosmetic testing, if feasible. Random allocation of the test products and of the untreated control or placebo product should be used in these designs. In addition to RCTs, a number of other study types from the clinical trial field are also applicable to cosmetic efficacy studies. The choice of study design and type should take into account the nature of the claim (e.g. ingredient claims, performance claims, sensory/aesthetic claims, combination claims, comparison claims) and the strength of the study design in relation to consumer expectations. The main types of studies used in cosmetic efficacy testing are: i) sensory properties studies, ii) consumer studies, iii) expert grading studies, and iv) instrumental measurement studies.
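As an illustration of random allocation in an intraindividual (half-face) design, the following minimal Python sketch assigns two products to the left and right side of each subject's face using permuted blocks of two, so that each product appears on each side equally often. The function name, the product labels, the subject codes and the fixed seed are assumptions made for this example only; a real study would follow the randomization procedure defined in its protocol.

```python
import random


def allocate_half_face(subject_ids, products=("test", "placebo"), seed=2016):
    """Assign two products to the left/right face side of each subject.

    Permuted blocks of two subjects keep the left/right assignment balanced.
    All names and the seed are hypothetical, chosen for illustration.
    """
    rng = random.Random(seed)  # fixed seed so the list can be reproduced
    first, second = products
    orders = [(first, second), (second, first)]
    allocation = {}
    for start in range(0, len(subject_ids), 2):
        block = orders[:]      # one subject per order within each block
        rng.shuffle(block)     # random, unpredictable order inside the block
        for subject, (left, right) in zip(subject_ids[start:start + 2], block):
            allocation[subject] = {"left": left, "right": right}
    return allocation


if __name__ == "__main__":
    subjects = [f"S{i:02d}" for i in range(1, 11)]  # hypothetical subject codes
    for subject, sides in allocate_half_face(subjects).items():
        print(subject, sides)
```

In a blinded study the resulting list would be kept by someone other than the assessor, and the products would be identified only by neutral codes.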

The sensory properties of a product are fundamental in cosmetic science and can help in understanding how consumer perception relates to consumer needs and claimed benefits. Sensory properties contribute substantially to whether a product is liked, and thus used, by consumers. For example, if a cosmetic product is perceived as unpleasant to touch, it is unlikely to be used voluntarily even if it is potentially beneficial to skin health. Additionally, some skin care products are designed so that their primary benefit is perceptual rather than a tangible benefit to the skin. The assessment method used will depend on the sensory attributes being examined and the claim required [24]. Trained panels of volunteers with high levels of sensory acuity can define the language and descriptors of key performance attributes of products [25]. Trained panels are usually valuable in prototype testing, in comparing product properties, and in market comparison. However, trained-panel assessment does not necessarily equate to consumer preference. Alternatively, naïve panels can provide useful spontaneous responses to product concepts. In both cases, regulators' confidence in this type of testing ranges from low to very low.

Consumer studies are primarily used to capture the consumer's response to the cosmetic product (e.g. self-perceived efficacy, sensory properties, attitudes toward buying the product, and purchase intentions). Consumer studies are performed under real-life conditions on a representative panel of the target population. The test product should be supplied in an anonymous pack in order to avoid any bias related to brand strength (“halo” effects). After a variable period of use, consumer preferences are reported on a self-assessment questionnaire; online surveys can also be carried out. One of the frequent challenges of consumer studies is that the desired claim is not captured exactly by the wording of the questions. Sample size is another major concern. Other important considerations include questionnaire design and layout, avoiding leading questions, and ensuring balanced response scales. Consumer studies are distrusted by many regulators and should not be used alone for efficacy testing, but rather as part of an expert grading or instrumental study.

Expert grading is carried out by a professional (dermatologist, make-up artist, hairdresser, etc.) on a variety of characteristics [26–28]. Digital pictures of the test area should be taken under standard and reproducible light conditions. Picture scoring should be carried out by the expert in random order and under blind conditions. The level of expertise must be consistent within any given study, and training or other validation is essential whether the grading is conducted by a dermatologist, an ophthalmologist or a non-clinical scientist. One of the frequent challenges of expert grading studies is the reference scale used for scoring. As a general rule, scoring scales should take into account consumer perception and the visibility of the effect. Well-designed studies based on expert grading are accepted by regulators.

Instrumental measurements were made possible by the birth of bioengineering techniques and have continued to grow in importance in assessing skin characteristics [8,10–15,18]. The technology of these instruments is constantly being updated and their accuracy continues to improve. The area of skin imaging has expanded rapidly in recent years, with 2D and 3D image analysis devices and software readily available to quantify features such as pore size, eye bags, facial wrinkles, scalp hair and cellulite [29–36]. All of these approaches provide quantitative results that can be expressed as a percentage variation (e.g. moisturization increased by 20%, wrinkles reduced by 30%, etc.). This type of approach has been criticized on the basis that the changes measured can be too small to be perceptible by the consumer. To strengthen the relationship between the instrumental measurement and the claim, and its relevance for the consumer, expert grading or consumer testing (e.g. a self-assessment questionnaire) should always be carried out alongside the instrumental measurement.
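The percentage variation mentioned above can be computed per subject and then averaged, as in the following minimal Python sketch. The readings are invented illustrative numbers in arbitrary instrument units, not data from any study, and averaging per-subject changes is only one of several possible summary conventions.

```python
from statistics import mean


def percent_change(baseline, follow_up):
    """Percentage variation of a single reading versus its baseline value."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (follow_up - baseline) / baseline * 100.0


def mean_percent_change(baseline_values, follow_up_values):
    """Mean of the per-subject percentage variations (paired readings)."""
    if len(baseline_values) != len(follow_up_values):
        raise ValueError("each subject needs a baseline and a follow-up value")
    return mean(
        percent_change(b, f) for b, f in zip(baseline_values, follow_up_values)
    )


if __name__ == "__main__":
    # Hypothetical skin hydration readings (arbitrary units) for three subjects.
    t0 = [30.0, 40.0, 50.0]
    t28 = [36.0, 48.0, 60.0]
    print(f"Mean variation vs baseline: {mean_percent_change(t0, t28):.1f}%")
```

A percentage computed this way describes the size of the measured change only; whether that change is statistically significant and perceptible to the consumer has to be established separately.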

The choice of the “right” study population is critical to the study. Investigators should be inclusive in selecting participants, and inclusion and non-inclusion criteria should be clearly defined in the study protocol. Inclusion and non-inclusion criteria have the joint goal of identifying a population in which it is relevant to assess the impact of the cosmetic product use on outcomes. The design of the inclusion criteria should take into account: i) the target population (those who are intended to use the cosmetic product), ii) the maximization of the generalizability of the study findings, and iii) the complexity and cost of recruitment. For example, if the outcome of interest is moisturizing efficacy, it is necessary to enrol subjects showing the clinical signs of skin dryness or of a tendency toward dry skin. Borderline, altered and/or pathological skin conditions (e.g. xerosis) should be avoided, since these fall outside the field of application of cosmetics. In the same way, the non-inclusion criteria should be parsimonious, because unnecessary exclusions may diminish the generalizability of the results, make it more difficult to recruit subjects, and increase the complexity and cost of recruitment [37]. Particular attention should be paid to stratification by particular characteristics (e.g. age, sex, cosmetic preferences, etc.). Stratification may be desirable in certain study designs (e.g. consumer studies) and undesirable in others. When stratification is not properly addressed, study findings depend on the extent of the stratification, which limits their generalizability. At baseline, investigators should collect enough information (e.g. age, gender, ethnicity, skin condition) to describe the study participants and to help others judge the generalizability of the findings. In addition, a well-documented baseline description of the study participants allows comparison between the study groups (for parallel study designs) or between studies.

Another important concern related to the study population is the sample size. The elements of a sample size calculation are: i) the estimated outcome (expected effect), ii) the α (type I) error level, iii) the statistical power (1 − β, where β is the type II error level), and iv) the standard deviation of the measurement [38]. Often, the estimated outcome and the standard deviation of the measurement can be obtained from historical data collected with similar measurement procedures, since expected effects often fall within a similar range. When no existing data are available, a pilot study should be conducted. Type I and type II error levels are usually set at 0.05 (5% chance of a “false positive”) and ≤0.2 (at most a 20% chance of a “false negative”, i.e. at least 80% statistical power), respectively. Studies with a limited and/or inadequate sample size may produce misleading conclusions, waste time and money, and are unethical.
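The four elements listed above can be combined as in the following minimal Python sketch, which uses the standard normal-approximation formulas for a two-sided comparison of means: n = 2·((z₁₋α/₂ + z₁₋β)·σ/δ)² per group for a parallel design, and n = ((z₁₋α/₂ + z₁₋β)·σ_d/δ)² for an intraindividual (paired) design, where σ_d is the standard deviation of the within-subject differences. The function names and the input values are assumptions for illustration. The normal approximation slightly underestimates the sample size compared with t-based calculations, and it makes no allowance for drop-outs.

```python
from math import ceil
from statistics import NormalDist


def _z_sum(alpha, power):
    """z(1 - alpha/2) + z(power) for a two-sided test."""
    std_normal = NormalDist()
    return std_normal.inv_cdf(1 - alpha / 2) + std_normal.inv_cdf(power)


def n_per_group_parallel(delta, sd, alpha=0.05, power=0.80):
    """Subjects per group needed to detect a mean difference `delta`
    between two independent groups with common standard deviation `sd`."""
    return ceil(2 * (_z_sum(alpha, power) * sd / delta) ** 2)


def n_paired(delta, sd_diff, alpha=0.05, power=0.80):
    """Subjects needed in a paired design, where `sd_diff` is the
    standard deviation of the within-subject differences."""
    return ceil((_z_sum(alpha, power) * sd_diff / delta) ** 2)


if __name__ == "__main__":
    # Hypothetical inputs: expected difference of 5 units, SD of 10 units.
    print("Parallel groups:", n_per_group_parallel(delta=5, sd=10), "per group")
    print("Paired design:  ", n_paired(delta=5, sd_diff=10), "subjects")
```

With these hypothetical inputs the sketch gives 63 subjects per group for the parallel design and 32 subjects for the paired design, which illustrates why intraindividual comparisons are attractive when they are feasible.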

The outcome measurements should reflect the main efficacy claim or safety issue to be addressed by the study. The choice of outcome measurements affects both the ability of the study to answer the question and its cost. Studies should include several measurements to increase the robustness of the study and the opportunity for secondary analysis. However, one or two outcomes, or “primary endpoint(s)”, must be chosen to assess the extent of the product effect and to provide proof of the claimed effect. For example, the efficacy of a moisturizing product is properly assessed by measuring the stratum corneum water content, while the effect of the product on the skin barrier, measured by evaporimetry, can be used to assess the mechanism of action of the product.

Products can be applied by the investigator (controlled, short-term tests under expert supervision) or can be used at home (long-term “use test”). Products are applied by the investigator to assess product efficacy under standardized, controlled conditions. The quantity of product applied should be calculated from the estimated daily exposure for the cosmetic [39–41] and the skin surface area [41], in order to obtain the rate of product application (in mg/cm²). In short-term tests, products are usually applied at a rate of 2 mg/cm², the application rate indicated by an ISO standard for the assessment of the sun protection factor [5]. In the long-term use test, products should be used at home by subjects under real-life conditions of use. The way and frequency of use should be agreed with the Sponsor of the study and should be close to the real and normal conditions of use. The compliance of subjects with the treatment should be carefully monitored. Techniques to assess compliance range from very simple ones (e.g. weighing the product at the end of the study period, questionnaires) to very complicated and expensive monitoring systems (e.g. cameras recording product application).
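The relationship between application rate, treated area and amount of product is simple arithmetic, sketched below in Python. The 2 mg/cm² rate is the one cited in the text; the 16 cm² test site, the 1000 mg daily amount and the 500 cm² surface area are hypothetical values chosen only to make the example concrete, and the function names are likewise illustrative.

```python
def amount_to_apply_mg(rate_mg_per_cm2, area_cm2):
    """Amount of product (mg) needed to treat `area_cm2` at the given rate."""
    if rate_mg_per_cm2 <= 0 or area_cm2 <= 0:
        raise ValueError("rate and area must be positive")
    return rate_mg_per_cm2 * area_cm2


def application_rate_mg_per_cm2(amount_mg, area_cm2):
    """Application rate (mg/cm²) from an amount of product and a surface area."""
    if amount_mg <= 0 or area_cm2 <= 0:
        raise ValueError("amount and area must be positive")
    return amount_mg / area_cm2


if __name__ == "__main__":
    # Short-term test: 2 mg/cm² on a hypothetical 4 cm x 4 cm forearm site.
    print(amount_to_apply_mg(2.0, 4.0 * 4.0), "mg per application")
    # Hypothetical daily amount of 1000 mg spread over 500 cm² of skin.
    print(application_rate_mg_per_cm2(1000.0, 500.0), "mg/cm²")
```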

The characteristics of the laboratory in which the cosmetic products are clinically tested have a large impact on the overall quality of the results. The choice of the testing laboratory is therefore crucial for a robust study. Table 4 reports the minimum requirements to be checked during an official audit of the testing laboratory facility(ies). Beyond ISO 9001 certification, particular attention should be paid to the existence of a well-documented list of standard operating procedures. Standardization of the measurement technique is fundamental in order to obtain reproducible and reliable data. Instrument-, environment-, and individual-related variability can affect the reproducibility of the measurement and hence the accuracy of the data obtained. A detailed description of testing laboratory requirements is given by Wunderlich [42].
