A Data-First Approach for Drug Discovery by Recursion
12 May 2020
Data
We are the only company with a massive dataset of biological images generated entirely in-house on our platform and purpose-built for machine learning. A robust and relatable dataset built for machine learning is critical to our ability to derive new algorithms and generate new discoveries. To date, we have imaged more than 27 billion human cells and generated more than 4 petabytes of biological data, making it the world’s largest dataset of its kind.
Design
Elegant, efficient, and effective design is at the heart of everything we do. Our experimental biologists sit side by side with data scientists to design massively parallel screening protocols. We then carefully choose each experimental parameter to extract as much information as possible from our cellular image datasets, elucidate new biology in an unbiased fashion, and continue to build the world’s largest exquisitely curated repository of biological images.
Execute
Armed with state-of-the-art experimental protocols, our high-throughput screening (HTS) team generates hundreds of thousands of cellular images every week, transferring each one to the cloud in real time. Our proprietary software combines computer vision and classical machine learning with neural networks to analyze terabytes of this data every week. At the same time, our custom software and use of cloud computing enable the scalable management and analysis of our image sets.
Our entirely automated approach to experiment execution and data analysis lets us industrialize discovery biology, screening thousands of compounds against hundreds of disease models at a fraction of the cost and in a fraction of the time of other discovery approaches.