Practical Model Selection for Prospective Virtual Screening

Shengchao Liu, Moayad Alnammi, Spencer S. Ericksen, Andrew F. Voter, Gene E. Ananiev, James L. Keck, F. Michael Hoffmann, Scott A. Wildman, Anthony Gitter*

*Corresponding author for this work

Research output: Contribution to journal, Review article, peer-review

52 Scopus citations

Abstract

Virtual (computational) high-throughput screening provides a strategy for prioritizing compounds for experimental screens, but the choice of virtual screening algorithm depends on the data set and evaluation strategy. We consider a wide range of ligand-based machine learning and docking-based approaches for virtual screening on two protein-protein interactions, PriA-SSB and RMI-FANCM, and present a strategy for choosing which algorithm is best for prospective compound prioritization. Our workflow identifies a random forest as the best algorithm for these targets over more sophisticated neural network-based models. The top 250 predictions from our selected random forest recover 37 of the 54 active compounds from a library of 22,434 new molecules assayed on PriA-SSB. We show that virtual screening methods that perform well on public data sets and synthetic benchmarks, like multi-task neural networks, may not always translate to prospective screening performance on a specific assay of interest.
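
To put the prospective result quoted above in context, the following minimal sketch computes the hit rate, recall, and enrichment over random selection from the four numbers in the abstract (37 actives in the top 250, 54 actives in total, 22,434 molecules). The function name `top_k_summary` and the choice of metrics are illustrative assumptions, not necessarily the evaluation used in the paper.

```python
def top_k_summary(hits_in_top_k, k, total_actives, library_size):
    """Summarize a top-k virtual screening result (hypothetical helper)."""
    hit_rate = hits_in_top_k / k                # fraction of the top k that are active
    recall = hits_in_top_k / total_actives      # fraction of all actives recovered
    base_rate = total_actives / library_size    # hit rate expected from random picks
    return {
        "hit_rate": hit_rate,
        "recall": recall,
        "base_rate": base_rate,
        "expected_random_hits": k * base_rate,
        "enrichment_factor": hit_rate / base_rate,
    }


# Numbers taken from the abstract (PriA-SSB prospective screen).
summary = top_k_summary(hits_in_top_k=37, k=250, total_actives=54, library_size=22434)

for name, value in summary.items():
    print(f"{name}: {value:.4f}")
```

Under these assumptions, the top 250 predictions have a hit rate of 14.8% against a base rate of roughly 0.24%, which is an enrichment of about 61-fold over random selection.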

Original language: English
Pages (from-to): 282-293
Number of pages: 12
Journal: Journal of Chemical Information and Modeling
Volume: 59
Issue number: 1
State: Published - 28 Jan 2019
Externally published: Yes

Bibliographical note

Publisher Copyright:
© 2018 American Chemical Society.

ASJC Scopus subject areas

  • General Chemistry
  • General Chemical Engineering
  • Computer Science Applications
  • Library and Information Sciences
