Figure from the article. CC-BY.
Using data from the Connectivity Map (Cmap, doi:10.1126/science.1132939) and NCI60, we set out to do just that. My role in this work was to explore the actual structure-activity relationship. The Chemistry Development Kit (doi:10.1186/s13321-017-0220-4) was used to calculate molecular descriptors, and we used various machine learning approaches to explore possible regression models. The bottom line was that we could not correlate the chemical structures with the biological activities. We explored the reason and ascribe it to the high diversity of the chemical structures in the Cmap data set. In fact, the chemicals in that study were selected for their chemical diversity. All the details can be found in this new paper.
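To give a feel for what "cannot correlate" means in practice, here is a minimal sketch of how one can check whether a descriptor predicts an activity. It is not the analysis from the paper: the paper used CDK descriptors and several machine learning methods, while this toy uses a single made-up descriptor, plain least squares, and synthetic random numbers. The function names `fit_line` and `loo_q2` are my own for this illustration. The idea is that a leave-one-out cross-validated Q² close to 1 indicates a usable model, while a Q² around zero or below means the descriptor carries no predictive information.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    return mean_y - slope * mean_x, slope


def loo_q2(xs, ys):
    """Leave-one-out cross-validated Q^2 = 1 - PRESS / SS_total."""
    mean_y = sum(ys) / len(ys)
    ss_total = sum((y - mean_y) ** 2 for y in ys)
    press = 0.0
    for i in range(len(xs)):
        # Fit on all compounds except i, then predict compound i.
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        intercept, slope = fit_line(train_x, train_y)
        press += (ys[i] - (intercept + slope * xs[i])) ** 2
    return 1.0 - press / ss_total


# Synthetic data only: a hypothetical descriptor, one activity that
# depends on it, and one activity that is pure noise.
rng = random.Random(42)
descriptor = [rng.uniform(0, 10) for _ in range(50)]
related = [2.0 * x + rng.gauss(0, 1) for x in descriptor]
unrelated = [rng.gauss(0, 1) for _ in descriptor]

q2_related = loo_q2(descriptor, related)
q2_unrelated = loo_q2(descriptor, unrelated)
print(f"Q2 (activity depends on descriptor): {q2_related:.2f}")
print(f"Q2 (activity unrelated to descriptor): {q2_unrelated:.2f}")
```

A real QSAR study would use many descriptors and a proper external test set, but the reading of the numbers is the same: without a stable, positive cross-validated score there is no model to speak of.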
It's important to note that these findings do not invalidate the QSAR concept. They only show that the compounds were selected very unfortunately for this purpose, making exploration of this idea impossible by design.
However, using the transcriptomics data and a method developed by Juuso Parkkinen, it is possible to find multivariate patterns. In fact, what we saw is more than is presented in this paper, as we have not yet been able to back up the further findings with supporting evidence. This paper, however, presents experimental confirmation that predictions based on this component model, coined the Predictive Toxicogenomics Gene Space, actually make sense. Biological interpretation is presented using a variety of bioinformatics analyses, but a full mechanistic description of the components is yet to be developed. My expectation is that we will be able to link these components to key events in biological responses to exposure to toxicants.