Interactions Between Weighting Scheme and Similarity Coefficient in Similarity-Based Virtual Screening

John D. Holliday, Peter Willett, Hua Xiang

Source Title: International Journal of Chemoinformatics and Chemical Engineering (IJCCE) 2(2)

DOI: 10.4018/ijcce.2012070103

Abstract

Similarity searching is one of the most common methods for ligand-based virtual screening, and is normally carried out using the Tanimoto coefficient with binary fingerprints. However, a recent study has suggested that it may be less appropriate for use with weighted fingerprints in some circumstances. This paper compares the Tanimoto coefficient with other coefficients, and demonstrates that one of these, the cosine coefficient, exhibits a much greater degree of robustness in the face of variations in the nature of the fragment weighting scheme that is being used.

Introduction

Similarity searching is one of the most common forms of ligand-based virtual screening (e.g., the reviews by Eckert and Bajorath (2007), Geppert et al. (2010), and McGaughey et al. (2007)). Given a known bioactive molecule, such as a hit from a high-throughput screening experiment or a compound from the literature, a similarity search involves matching the known molecule (often called the reference structure) against each of the structures in a database, computing the degree of similarity in each case, and then ranking the database structures in order of decreasing similarity. The similar property principle (Johnson & Maggiora, 1980; Martin et al., 2002) states that molecules that are structurally similar have similar properties, and the top-ranked structures from a similarity search are hence those that are most likely to exhibit the required bioactivity (Sheridan, 2007; Stumpfe & Bajorath, 2011; Willett, 2009).

The effectiveness of similarity searching, i.e., its ability to identify bioactive molecules, depends on the similarity measure that is used to quantify the degree of resemblance between the reference structure and each of the database structures. A similarity measure has three components: the descriptors that are used to represent each of the molecules; the weighting scheme that is used to weight different parts of the representation to reflect their relative degrees of importance; and the similarity coefficient that quantifies the degree of resemblance between two weighted sets of descriptors. Although many types of descriptor have been used in similarity searching, by far the best established is the 2D fingerprint, a binary vector in which bits are set to denote the presence of fragment substructures in a molecule (Willett, 2006, 2009). Binary 2D fingerprints are normally used with the Tanimoto coefficient, a simple association coefficient whose limiting values of zero and unity denote, respectively, two fingerprints having no bits (and hence no substructures) in common and two identical fingerprints. Many other types of coefficient can be used, but comparative experiments have demonstrated the general effectiveness of the Tanimoto coefficient, and this is the basis for similarity searching facilities in most operational chemoinformatics systems (Leach & Gillet, 2007).
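
The search procedure and the binary Tanimoto coefficient described above can be illustrated with a minimal Python sketch. It is not taken from the article: fingerprints are represented as sets of "on" bit positions, the function names, molecule labels, and bit positions are hypothetical, and returning zero for two empty fingerprints is an assumed convention.

```python
def tanimoto_binary(a: set[int], b: set[int]) -> float:
    """Tanimoto coefficient for two binary fingerprints.

    Each fingerprint is given as the set of bit positions that are set.
    The result is 0.0 when no bits are shared and 1.0 when the
    fingerprints are identical.
    """
    common = len(a & b)
    union = len(a) + len(b) - common
    # Assumed convention: two empty fingerprints score 0.0.
    return common / union if union else 0.0


def rank_by_similarity(reference: set[int],
                       database: dict[str, set[int]]) -> list[tuple[str, float]]:
    """Score every database structure against the reference structure
    and return (name, score) pairs in order of decreasing similarity."""
    scored = [(name, tanimoto_binary(reference, fp))
              for name, fp in database.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy data: bit positions carry no chemical meaning here.
reference = {1, 4, 7, 9}
database = {
    "mol_a": {1, 4, 7, 9},   # identical to the reference
    "mol_b": {1, 4, 8},      # shares two bits with the reference
    "mol_c": {2, 3},         # shares no bits with the reference
}

ranking = rank_by_similarity(reference, database)
for name, score in ranking:
    print(f"{name}: {score:.2f}")
```

In this toy database the identical fingerprint is ranked first with a score of unity and the fingerprint with no bits in common is ranked last with a score of zero, matching the limiting values noted above.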

There have been many comparisons of fingerprints and similarity coefficients for similarity searching, e.g., the detailed studies by Bender et al. (2009), Hert et al. (2004), Duan et al. (2010), and Sastry et al. (2010). Despite some limited early work (Willett & Winterman, 1986), there has been less interest in the use of weighted fingerprints, in which the elements of the vector contain not binary values denoting the presence or absence of fragment substructures, but integer or real values denoting the relative importance of the fragments. A fragment with a high weight occurring in both a reference structure and a database structure will then make a greater contribution to the overall degree of inter-molecular similarity than will a fragment in common that has a lesser weight. There are two main sources of frequency information that can be used for fragment weighting: weights based on the number of times that a fragment occurs in an individual molecule; and weights based on the number of times that a fragment occurs in an entire database. Both types of weighting have been studied in recent work by Arif et al. (2009, 2010), who found that the former type of weighting could bring about notable increases in screening effectiveness in some circumstances, but that the latter type was of less general applicability. We hence focus here on the former approach, i.e., on exploiting information on how frequently fragments occur within individual molecules.
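
As an illustration of within-molecule frequency weighting, the following Python sketch builds weighted fingerprints from fragment occurrence counts and compares them with two coefficients. It rests on assumptions not confirmed by this preview: the weight is the raw occurrence count, and the Tanimoto and cosine coefficients are computed in their commonly used non-binary (vector) forms. The fragment labels and function names are hypothetical.

```python
import math
from collections import Counter


def weighted_fingerprint(fragments: list[str]) -> Counter:
    """Weight each fragment by the number of times it occurs in the
    molecule (raw occurrence count; an assumed weighting scheme)."""
    return Counter(fragments)


def dot(a: Counter, b: Counter) -> float:
    """Sum of products of weights over fragments present in both vectors."""
    return sum(w * b[f] for f, w in a.items() if f in b)


def tanimoto_weighted(a: Counter, b: Counter) -> float:
    """Non-binary Tanimoto: a.b / (a.a + b.b - a.b)."""
    ab = dot(a, b)
    denom = dot(a, a) + dot(b, b) - ab
    return ab / denom if denom else 0.0


def cosine(a: Counter, b: Counter) -> float:
    """Cosine coefficient: a.b / sqrt(a.a * b.b)."""
    denom = math.sqrt(dot(a, a) * dot(b, b))
    return dot(a, b) / denom if denom else 0.0


# Hypothetical fragment lists: both molecules contain the same two
# fragments, but the second contains one of them three times.
ref = weighted_fingerprint(["C=O", "c1ccccc1"])
other = weighted_fingerprint(["C=O", "c1ccccc1", "c1ccccc1", "c1ccccc1"])

t_score = tanimoto_weighted(ref, other)
c_score = cosine(ref, other)
print(f"Tanimoto: {t_score:.3f}")
print(f"Cosine:   {c_score:.3f}")
```

With binary fingerprints these two molecules would be indistinguishable, since they contain the same fragments; once occurrence counts are used as weights, the two coefficients assign different scores to the same pair, which is the kind of interaction between weighting scheme and coefficient that the paper sets out to examine.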
