Awesome Cytodata – Massive Collection of Resources
A curated list of awesome cytodata resources.
Cytodata refers to a community of researchers and resources involved in the image-based profiling of biological phenotypes.
These biological phenotypes are typically induced by genetic or chemical perturbations and often represent disease states.
Image-based profiling inspects these phenotypes to uncover biological insight, such as the impact of genetic alterations or the mechanism of action of compounds.
This page represents a curated list of software, datasets, landmark publications, and image-based profiling methods.
Our goal is to provide researchers, both new and established, a place to discover and document awesome Cytodata resources.
Datasets
Annotated datasets, including raw images and processed profiles, for image-based profiling of chemical and genetic perturbations.
Raw Images
- Broad Bioimage Benchmark Collection – The Broad Bioimage Benchmark Collection (BBBC) is a collection of freely downloadable microscopy image sets. In addition to the images themselves, each set includes a description of the biological application and some type of “ground truth” (expected results).
- Image Data Resource – Public repository of image datasets from published scientific studies.
- RxRx1 – RxRx1 is a set of 125,514 high-resolution 512×512 6-channel fluorescence microscopy images of human cells under 1,108 genetic perturbations in 51 experimental batches across four cell types. The images were produced by Recursion Pharmaceuticals in their labs in Salt Lake City, Utah. The dataset is intended for studying and benchmarking methods that deal with biological batch effects, as well as machine learning topics such as domain adaptation, transfer learning, and k-shot learning.
- RxRx19 – RxRx19 is the first morphological dataset that demonstrates the rescue of morphological effects of COVID-19.
Chemical Perturbations
- Gustafsdottir et al. 2013 – Cell painting profiles from 1,600 bioactive compounds in U2OS cells (Access from public S3 bucket: s3://cytodata/datasets/Bioactives-BBBC022-Gustafsdottir/profiles/Bioactives-BBBC022-Gustafsdottir/).
- Wawer et al. 2014 – Cell painting profiles from 31,770 compounds in U2OS cells (Click to download).
- Bray et al. 2017 – Cell painting profiles from 30,616 compounds in U2OS cells (Center Driven Research Project, CDRP) (Download from GigaDB | Access from public S3 bucket: s3://cytodata/datasets/CDRPBIO-BBBC036-Bray/profiles_cp/CDRPBIO-BBBC036-Bray/).
Genetic Perturbations
- Singh et al. 2015 – 3,072 cell painting profiles from 41 genes knocked down with RNA interference (RNAi) in U2OS cells (Access from GitHub).
- Rohban et al. 2017 – Cell painting data from 220 overexpressed genes in U2OS cells (Access from public S3 bucket: s3://cytodata/datasets/TA-ORF-BBBC037-Rohban/profiles_cp/TA-ORF-BBBC037-Rohban/).
- Unpublished – Cell painting profiles of 596 overexpressed alleles from 53 genes in A549 cells (Access from public S3 bucket: s3://cytodata/datasets/LUAD-BBBC043-Caicedo/profiles_cp/LUAD-BBBC043-Caicedo/).
- Unpublished – 3,456 cell painting profiles from CRISPR experiments knocking down 59 genes in A549, ES2, and HCC44 cells (Access from GitHub).
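Several of the profile sets above are referenced by an S3 prefix. The following is a minimal sketch of turning those prefixes into anonymous listing commands for the AWS CLI. It assumes the bucket still allows unauthenticated reads (the entries describe it as public) and that the `aws` CLI is installed. The helper names `split_s3_uri` and `aws_ls_command` and the dictionary labels are hypothetical and not part of any Cytodata tooling.

```python
from urllib.parse import urlparse

# S3 prefixes copied from the dataset entries above; the labels are ours.
PROFILE_PREFIXES = {
    "Gustafsdottir et al. 2013": "s3://cytodata/datasets/Bioactives-BBBC022-Gustafsdottir/profiles/Bioactives-BBBC022-Gustafsdottir/",
    "Bray et al. 2017": "s3://cytodata/datasets/CDRPBIO-BBBC036-Bray/profiles_cp/CDRPBIO-BBBC036-Bray/",
    "Rohban et al. 2017": "s3://cytodata/datasets/TA-ORF-BBBC037-Rohban/profiles_cp/TA-ORF-BBBC037-Rohban/",
    "LUAD (unpublished)": "s3://cytodata/datasets/LUAD-BBBC043-Caicedo/profiles_cp/LUAD-BBBC043-Caicedo/",
}


def split_s3_uri(uri):
    """Split an s3:// URI into (bucket, key_prefix)."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {uri!r}")
    return parsed.netloc, parsed.path.lstrip("/")


def aws_ls_command(uri):
    """Build an `aws s3 ls` command that skips request signing."""
    bucket, prefix = split_s3_uri(uri)
    return ["aws", "s3", "ls", "--no-sign-request", f"s3://{bucket}/{prefix}"]


if __name__ == "__main__":
    # Print the commands rather than running them, so nothing touches the network.
    for label, uri in PROFILE_PREFIXES.items():
        print(label, "->", " ".join(aws_ls_command(uri)))
```

Each printed command can be pasted into a shell to list a prefix. Replacing `ls` with `sync` and adding a local target directory would download the files instead.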
Software
Open source software packages for image-based profiling of biological phenotypes.
- Advanced Cell Classifier – A software package for exploration, annotation and classification of cells within large datasets using machine learning.
- CellProfiler – Free, open-source software for measuring and analyzing cell images.
- CellProfiler Analyst – Interactive data exploration, analysis, and classification of large biological image sets.
- Cytominer – Methods for image-based cell profiling.
- EBImage – Image processing toolbox for R.
- HTSvis – A web app for exploratory data analysis and visualization of arrayed high-throughput screens.
Publications
Publications related to image-based profiling.
Reviews
- Image-based profiling for drug discovery: due for a machine-learning upgrade? – 2020 review of applications in image-based profiling from a Carpenter lab/pharma perspective.
- Data-analysis strategies for image-based cell profiling – Introduces the steps required to create high-quality image-based (i.e., morphological) profiles from a collection of microscopy images.
- High-content screening for quantitative cell biology – Describes some recent applications of HCS, ranging from the identification of genes required for specific biological processes to the characterization of genetic interactions.
- Microscopy-based high-content screening – Describes the state of the art for image-based screening experiments, delineates experimental and image-analysis approaches, and discusses challenges and future directions, including leveraging CRISPR/Cas9-mediated genome engineering.
- Applications in image-based profiling of perturbations – Describes applications of image-based profiling including target and MOA identification, lead hopping, library enrichment, gene annotation, and identification of disease-specific phenotypes.
Applications
- Expanding the antibacterial selectivity of polyether ionophore antibiotics through diversity-focused semisynthesis – 2020 paper from the Poulsen lab in which antibiotics are tested for their ability to leave mammalian cells as intact as possible, as measured by the Cell Painting assay.
- Image-based multivariate profiling of drug responses from single cells – A multivariate method for classifying untreated and treated human cancer cells based on ∼300 single-cell phenotypic measurements.
- Discovering metabolic disease gene interactions by correlated effects on cellular morphology – Profiling disease-gene interaction during adipocyte differentiation.
- Phenotypic profiling of the human genome by time-lapse microscopy reveals cell division genes – This study provides an in-depth analysis of cell division phenotypes and makes the entire high-content data set available as a resource to the community.
- Bioactivity screening of environmental chemicals using imaging-based high-throughput phenotypic profiling – Use of image-based profiling to screen the bioactivity of environmental chemicals.
- Repurposing High-Throughput Image Assays Enables Biological Activity Prediction for Drug Discovery – Using image-based profiles to predict the bioactivity of small molecules in other unrelated assays.
- Tales of 1,008 Small Molecules: Phenomic Profiling through Live-cell Imaging in a Panel of Reporter Cell Lines – Demonstrating the effects of polypharmacology in MOA prediction while offering solutions for overcoming it in future image-based profiling studies.
Methods
- Cell Painting, a high-content image-based assay for morphological profiling using multiplexed fluorescent dyes – Protocol describing the design and execution of experiments using Cell Painting.
- Multiplex Cytological Profiling Assay to Measure Diverse Cellular States – Cell Painting assay.
- CIDRE: an illumination-correction method for optical microscopy – Retrospective method for illumination-correction based on energy minimization.
- Retrospective shading correction based on entropy minimization – Method for retrospective shading correction based on entropy minimization.
- Capturing single-cell heterogeneity via data fusion improves image-based profiling – Adds dispersion and covariances to population averages to capture single-cell heterogeneity.
- Minimum redundancy feature selection from microarray gene expression data – Minimum-redundancy, maximum-relevance (mRMR) feature selection framework.
- Learning unsupervised feature representations for single cell microscopy images with paired cell painting – Self-supervised method to learn feature representations of single cells in microscopy images without labelled training data.
- Weakly supervised learning of single-cell feature embeddings – Training CNNs using a weakly supervised approach for feature learning.
- Accurate Prediction of Biological Assays with High-Throughput Microscopy Images and Convolutional Networks – End-to-end learning with CNNs to predict bioactivity of small molecules in unrelated assays using image-based profiles.
- Evaluation of Deep Learning Strategies for Nucleus Segmentation in Fluorescence Images – Comparing several deep learning methods for nuclear segmentation.
- Automating Morphological Profiling with Generic Deep Convolutional Networks – Transfer of activation features of generic CNNs to extract features for image-based profiling.
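The single-cell heterogeneity entry above describes adding dispersion and covariances to population averages. The sketch below illustrates only that general idea: a per-well profile built by concatenating feature means, standard deviations, and pairwise covariances. It is a simplified stand-in and not the data fusion procedure from the publication. The function names and the toy input are hypothetical.

```python
from itertools import combinations
from statistics import mean, stdev


def covariance(xs, ys):
    """Sample covariance of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


def heterogeneity_profile(cells):
    """Summarize single-cell feature vectors as one population profile.

    cells: list of per-cell feature vectors of equal length (at least 2 cells).
    Returns means, then standard deviations, then pairwise covariances.
    """
    if len(cells) < 2:
        raise ValueError("need at least two cells to estimate dispersion")
    features = list(zip(*cells))  # one tuple of values per feature
    means = [mean(f) for f in features]
    spreads = [stdev(f) for f in features]
    covs = [covariance(a, b) for a, b in combinations(features, 2)]
    return means + spreads + covs


if __name__ == "__main__":
    # Toy example: three cells, two features (say, area and intensity).
    toy_cells = [[100.0, 0.20], [120.0, 0.25], [90.0, 0.15]]
    print(heterogeneity_profile(toy_cells))
```

For d features the profile has 2d + d(d-1)/2 entries, so the covariance block grows quadratically. With the hundreds of features typical of image-based profiles, some form of dimensionality reduction on that block would likely be needed in practice.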