Introduction

Well-characterized, selective small molecules, termed “chemical probes”, are essential tools for target validation during drug development and in basic biological research1. Criteria for small molecule modulators to qualify as chemical probes have been established by chemical biologists and are widely accepted in the community2. These include target-related criteria for potency, selectivity, and proof of target engagement in addition to the suitability of the chemical matter itself1. With these quality criteria in place, chemical probes became important and generally recognized tools that aid the scientific community and accelerate drug discovery. Inspired by this approach, our goal is the standardization of quality criteria within the drug candidate evaluation process. A ligand-protein interaction can be evaluated by either direct or indirect measurements. Direct binding assays such as isothermal titration calorimetry (ITC) and surface plasmon resonance (SPR) are state-of-the-art methods to measure dissociation constants (KD). ITC measurements are label-free, require no immobilization, and allow the determination of the stoichiometry and thermodynamic parameters, whereas SPR enables the determination of the binding rate constants kon and koff. However, these methods require either large amounts of protein (ITC) or immobilized purified protein (SPR); they are therefore excellent tools for final binding validation but not optimal for larger screening and in-cell campaigns. Indirect biochemical and cellular assays often rely on in-solution (or in-cell) displacement assays using fluorescence-labeled molecules, called tracers (sometimes referred to as fluorescent probes, not to be confused with chemical probes themselves or medical radiotracers)3,4,5.
Tracers are composed of (1) a moiety that binds to the protein of interest (POI), such as a small molecule, DNA, RNA, or a peptide, (2) a chemical linker, and (3) a reporter label, typically a fluorescent dye6,7. To prevent the linker from interfering with the binding of the molecule to the POI, it is important to choose the right exit vector, i.e. a solvent-exposed attachment point of the linker to the molecule (Fig. 1a).

Fig. 1: Composition of a tracer (T000001)22 and the principle underlying the tracerDB.

a Schematic representation of a tracer molecule with its three distinct substructures. The POI ligand is the moiety that binds the target protein and thereby brings the label into proximity with the target. The optical reporter (dye) is attached to this POI ligand via a chemical linker. The dye is selected according to the requirements of the assay system to achieve the desired excitation and emission wavelengths. b Underlying principles of tracer and experimental data processing. Information provided through submission is displayed in the upper panels outside the tracerDB framework. These data are parsed and assigned to each entity: the tracer molecule, the protein, and the interaction between the two, as shown in detail in Fig. 2.

Tracers are used in cellular (in cellulo) target engagement assays such as time-resolved Förster resonance energy transfer (TR-FRET)6 or bioluminescence resonance energy transfer (BRET)4 assays, and in biochemical in vitro studies, which can be BRET-based, TR-FRET-based or based on fluorescence polarization (FP)8. In particular, NanoBRET, a method frequently applied in kinase live-cell target engagement assays, critically relies on suitable tracer molecules. This method validates the binding of a small molecule such as an inhibitor to its cognate target in the cell. It is also suitable for assessing cellular selectivity with a single tracer9. Owing to the stringent distance and orientation constraints of the BRET donor, tracers do not have to be specific for the protein of interest. Promiscuous BRET tracers are in fact ideal, as they can survey multiple targets. Using this principle, we successfully enabled 206 validated kinase interactions (as of Feb. 2024) with tracer K10 (T000008).

Results

Because quantifying protein-ligand interactions is so important, a large number of tracers have been reported in the literature. However, scientists face several problems when establishing displacement assays for their respective target: (1) finding established tracers in the literature using search engines is difficult, as much of the required information is buried in the Supplementary Methods; (2) reproducibility of the reported assays is often problematic due to insufficient validation of the tracer or unfavorable assay parameters; (3) the availability of the tracer is often unknown. To address these problems, we created a database for fluorescent tracer molecules named tracerDB. It has been developed and standardized to provide design and application guidance based on strict performance criteria. For each tracer-based assay, the chemical structure or commercial availability is provided, as well as the assay parameters and a reference. tracerDB can be searched by protein of interest or by tracer, enabling fast assessment of the available assay options for a specific target. Within the first 6 months (as of April 2024), 42 tracers targeting 318 different proteins in 476 experimentally validated assays were reviewed and uploaded.

Scientists worldwide can submit their tracer data for review and inclusion in the database. A submission must contain all information (no physical molecules) required to judge the quality and reproducibility of a tracer-based assay. First, general information about the molecular structure (e.g. simplified molecular-input line-entry system (SMILES) specification, fluorophore characteristics, storage conditions and trivial name) is required for the creation of a tracer page (Fig. 1b). In some special cases, tracer structures cannot be disclosed. In such cases, the availability of the tracer must be guaranteed so that all reported assays remain accessible, which would otherwise be ensured by the chemical structure of the tracer. Every submitted structure is checked for structural features associated with “pan assay interference compounds” (PAINS)10, which are reported together with the tracer information. Since the filters applied for the detection of PAINS also detect fluorescent substructures, it is recommended to inspect the highlighted moieties of flagged tracers by opening the PAINS report within the tracer description panel. All target proteins bound by the tracer have to be listed in UniProt11. Experimental data for the tracer titration and compound displacement are part of the validation process and must be uploaded together with information on a recommended concentration, the Z’ value of the assay, and the assay window observed. Here, the assay window describes the fold-change between signal (tracer bound) and noise (tracer only) at the recommended tracer concentration (https://www.tracerdb.org/about/). The experimental data are available for download by the user. To facilitate the upload and review process, data can be submitted via the submission page (https://www.tracerdb.org/submission). On this page, all information can be entered, with essential fields written in bold. A submission is not possible unless all required information is provided. Once all required information has been entered, the data are automatically sent to info@tracerdb.org for final approval and upload, so that users can submit without needing a login. Additionally, tracer IDs can be assigned prior to publication, allowing a direct link to the database (in analogy to PDB). In contrast to an automatically generated data repository, the submission of tracer data is followed by a review process, which makes tracerDB a reviewed and curated database. Thus, every entry has been examined for its agreement with the database’s quality criteria, ensuring consistent quality control.
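The two quality metrics requested at submission can be computed from replicate readings as in the minimal sketch below. It assumes the commonly used Z’ definition, 1 − 3(σ_signal + σ_noise)/|μ_signal − μ_noise|, and takes the assay window as the ratio of the mean tracer-bound signal to the mean tracer-only signal. The function names and the example readings are hypothetical, and the definitions given on the tracerDB “about” page remain authoritative.

```python
from statistics import mean, stdev


def assay_window(bound_signal, tracer_only_signal):
    """Fold-change between the mean tracer-bound and mean tracer-only signal."""
    return mean(bound_signal) / mean(tracer_only_signal)


def z_prime(bound_signal, tracer_only_signal):
    """Z' = 1 - 3 * (sd_signal + sd_noise) / |mean_signal - mean_noise|."""
    separation = abs(mean(bound_signal) - mean(tracer_only_signal))
    spread = 3 * (stdev(bound_signal) + stdev(tracer_only_signal))
    return 1 - spread / separation


# Hypothetical replicate readings (arbitrary units), for illustration only.
bound = [100.0, 102.0, 98.0, 101.0, 99.0]
tracer_only = [20.0, 21.0, 19.0, 20.5, 19.5]

print(f"assay window: {assay_window(bound, tracer_only):.2f}")
print(f"Z': {z_prime(bound, tracer_only):.3f}")
```

Both functions use the sample standard deviation, so at least two replicates per condition are needed.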

The interaction network between tracer molecules and their respective targets can be modeled as a many-to-many relationship in which many tracers can bind a single protein and a single tracer can bind many proteins. As a result, the underlying database structure consists of three entity sets: the tracer, the protein, and their interaction (Fig. 1b). To ensure a user-friendly submission of data and standardize the presentation, all molecular representations and calculations are created and executed on the server side. We chose Django12 as a Python-based web framework together with a MySQL database to enable high-frequency read operations.
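The three entity sets can be illustrated with a small relational sketch. This is not the tracerDB schema: it uses Python's built-in sqlite3 module as a stand-in for Django and MySQL, and all table names, column names and identifiers (TRACER_1, PROTEIN_A, and so on) are hypothetical placeholders chosen for illustration.

```python
import sqlite3

# Hypothetical schema: the interaction table links tracers and proteins,
# which is what makes the relationship many-to-many.
SCHEMA = """
CREATE TABLE tracer (
    tracer_id TEXT PRIMARY KEY
);
CREATE TABLE protein (
    uniprot_id TEXT PRIMARY KEY
);
CREATE TABLE interaction (
    tracer_id  TEXT NOT NULL REFERENCES tracer(tracer_id),
    uniprot_id TEXT NOT NULL REFERENCES protein(uniprot_id),
    assay_type TEXT NOT NULL,
    PRIMARY KEY (tracer_id, uniprot_id, assay_type)
);
"""


def build_example_db():
    """Create an in-memory database filled with placeholder entries."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO tracer VALUES (?)", [("TRACER_1",), ("TRACER_2",)])
    conn.executemany("INSERT INTO protein VALUES (?)", [("PROTEIN_A",), ("PROTEIN_B",)])
    conn.executemany(
        "INSERT INTO interaction VALUES (?, ?, ?)",
        [
            ("TRACER_1", "PROTEIN_A", "FP"),
            ("TRACER_2", "PROTEIN_A", "NanoBRET"),
            ("TRACER_2", "PROTEIN_B", "NanoBRET"),
        ],
    )
    return conn


def tracers_for_protein(conn, uniprot_id):
    """All tracers with at least one registered assay for the given protein."""
    rows = conn.execute(
        "SELECT DISTINCT tracer_id FROM interaction "
        "WHERE uniprot_id = ? ORDER BY tracer_id",
        (uniprot_id,),
    )
    return [row[0] for row in rows]


def proteins_for_tracer(conn, tracer_id):
    """All proteins with at least one registered assay for the given tracer."""
    rows = conn.execute(
        "SELECT DISTINCT uniprot_id FROM interaction "
        "WHERE tracer_id = ? ORDER BY uniprot_id",
        (tracer_id,),
    )
    return [row[0] for row in rows]


conn = build_example_db()
print(tracers_for_protein(conn, "PROTEIN_A"))
print(proteins_for_tracer(conn, "TRACER_2"))
```

Keeping the assay-specific fields on the interaction entity lets one tracer-protein pair carry several assays, for example one FP and one NanoBRET entry.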

In addition to the information on the crowdsourced tracers, we have also included general information on tracer molecules and illustrations of different assay systems on the “about” page (https://www.tracerdb.org/about/). Here, we describe the quality control criteria and how to calculate the respective values. In order to further increase the reproducibility of the described assays, each assay is classified according to its parameters into robust, expert and unsuitable assays with exemplary data for clarification (Fig. 2). These assay levels are represented by a traffic light icon for each registered assay. In addition, we have included a methods section describing the different assays used to collect the submitted data (https://www.tracerdb.org/methods/). This section is supported by an illustration and key references.

Fig. 2: Data input and processing carried out by the webserver.

Input data are depicted on the left, outside the tracerDB framework. Calculations carried out by the database are marked with a processor symbol. First, the database calculates tracer parameters and generates a schematic representation of the tracer molecule, including an automated detection of PAINS elements within the tracer structure. Next, assay parameters and experimental data are uploaded and processed. From the experimental data (.csv file), the number of replicates is extracted and the datapoints are plotted. The data are fitted using the indicated function to yield the tracer KD and displacement IC50/EC50. The recommended tracer concentration is estimated from the tracer KD, but can be changed if better conditions are known. Finally, the target is registered using its UniProt ID, resulting in searchable accession numbers, gene names, and protein names.
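As a rough illustration of the replicate extraction step, the sketch below reads a titration table and counts the replicates. The column layout (first column holds the concentration, every further column holds one replicate), the function name and the example values are assumptions made for illustration; the actual tracerDB upload format is not specified here.

```python
import csv
import io
from statistics import mean


def read_titration(csv_text):
    """Parse a titration table; return (n_replicates, [(conc, [signals])])."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    n_replicates = len(header) - 1  # every column after the first is a replicate
    points = []
    for record in reader:
        if not record:
            continue  # tolerate blank lines
        concentration = float(record[0])
        signals = [float(value) for value in record[1:] if value.strip()]
        points.append((concentration, signals))
    return n_replicates, points


# Hypothetical example data, not taken from the database.
EXAMPLE_CSV = """concentration_nM,rep1,rep2,rep3
1,52.0,50.5,51.5
10,88.0,90.0,89.0
100,140.0,138.0,142.0
"""

n_replicates, points = read_titration(EXAMPLE_CSV)
print(f"{n_replicates} replicates, {len(points)} concentrations")
for concentration, signals in points:
    print(concentration, round(mean(signals), 2))
```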

Discussion

The development and implementation of tracer-based assays is carried out by countless laboratories around the world. For every assay, proper validation and standardization are crucial to ensure assay quality. To support the reproducibility of established assays across different laboratories, tracerDB helps to standardize assays by providing a curated and constantly growing set of experimentally validated tracers with recommended concentrations. Additionally, the tracerDB “about” page (https://www.tracerdb.org/about/) summarizes the most important quality control information to ensure the generation of high-quality data. Further information on validated exit vectors for the development of other bifunctional compounds or indications of suitable protein fusion termini can be extracted from the database, as well.

tracerDB is therefore a resource for drug-screening scientists and the chemical biology community that gathers detailed, reviewed, high-quality information on tracer-based assays and their applications.

Methods

Architecture of the database

RDKit13, a commonly used cheminformatics package for Python, is employed to render SMILES strings as two-dimensional molecular representations, and its substructure search is used to detect and depict PAINS elements. The average molecular weight and the estimated logP value of the compound- and peptide-based tracers are calculated using RDKit’s molecular descriptor methods. To avoid handling complex SMILES of large peptide tracers, the pyPept package14 has been incorporated into this project to allow for flexible declaration of custom amino acids, i.e. fluorophore peptide labels. These artificial building blocks are then included in the string representation of the peptides and stored in the database as BILN15. For the interactive depiction of the three-dimensional structure of protein-based tracers, the NGL viewer was incorporated16,17. To ensure consistency in the depiction and analysis of experimental data uploaded to the webserver, fitting and plotting are executed on the server side. The experimental titration data are plotted via Matplotlib18 and the fitting is conducted through SciPy19 using non-linear least squares optimization. It is assumed that the data from concentration-response experiments exhibit a sigmoidal shape. Hence, the following logistic equation is employed to fit the data:

$$f\left(x\right)=\frac{a}{1+e^{-b\left(x-\mathrm{XC}_{50}\right)}}+c \tag{1}$$

The response of the measurement is a function of the logarithmic concentration \(x\), with the additional parameters \(a\), \(b\), and \(c\), which are utilized to scale and shift \(f\) because the input is not normalized. \(\mathrm{XC}_{50}\) is the parameter determining the log concentration halfway between the plateaus of the sigmoidal curve. Depending on the experimental context, this parameter may be interpreted as \(\mathrm{EC}_{50}\) or \(\mathrm{IC}_{50}\). Protein titrations performed during the development of fluorescence polarization assays are commonly plotted as signal in millipolarization units versus the molar concentration. These saturation curves are estimated using the following hyperbolic model:

$$f\left(x\right)=\frac{B_{\max}\, x}{K_{D}+x}+cx+d \tag{2}$$

where \(B_{\max}\) denotes the extrapolated maximum specific binding to the protein for high ligand concentrations. \(K_{D}\) is the equilibrium dissociation constant, which specifies the concentration \(x\) required for half-maximum binding at equilibrium. The parameter \(c\) is the slope of the linear term that accounts for nonspecific binding, and \(d\) corrects for background signals20.
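Both models can be fitted with SciPy's non-linear least squares routine, as in the following minimal sketch. It is not the tracerDB server code: the helper names, the initial-guess heuristics, the use of log10 concentrations and the synthetic data are all assumptions made for illustration.

```python
import numpy as np
from scipy.optimize import curve_fit


def logistic(x, a, b, xc50, c):
    """Eq. 1: sigmoidal response over the log concentration x."""
    return a / (1.0 + np.exp(-b * (x - xc50))) + c


def saturation(x, b_max, k_d, c, d):
    """Eq. 2: specific binding plus a linear nonspecific term and background."""
    return b_max * x / (k_d + x) + c * x + d


def fit_logistic(log_conc, response):
    """Fit Eq. 1; the starting values are simple heuristics (an assumption)."""
    slope_sign = 1.0 if response[-1] > response[0] else -1.0
    p0 = [response.max() - response.min(), slope_sign,
          np.median(log_conc), response.min()]
    popt, _ = curve_fit(logistic, log_conc, response, p0=p0, maxfev=10000)
    return popt  # a, b, XC50, c


def fit_saturation(conc, signal):
    """Fit Eq. 2; the starting values are simple heuristics (an assumption)."""
    p0 = [signal.max() - signal.min(), np.median(conc), 0.0, signal.min()]
    popt, _ = curve_fit(saturation, conc, signal, p0=p0, maxfev=10000)
    return popt  # B_max, K_D, c, d


# Synthetic data with known parameters plus small noise (not real measurements).
rng = np.random.default_rng(0)

log_conc = np.linspace(-9.0, -5.0, 12)  # log10 of molar concentration
displacement = (logistic(log_conc, 1000.0, -3.0, -7.0, 200.0)
                + rng.normal(0.0, 5.0, log_conc.size))
xc50_fit = fit_logistic(log_conc, displacement)[2]

protein_nM = np.geomspace(1.0, 3000.0, 12)
fp_signal = (saturation(protein_nM, 150.0, 50.0, 0.005, 40.0)
             + rng.normal(0.0, 0.5, protein_nM.size))
kd_fit = fit_saturation(protein_nM, fp_signal)[1]

print(f"fitted log10 XC50: {xc50_fit:.2f}")
print(f"fitted K_D: {kd_fit:.1f} nM")
```

In Eq. 1 the parameter sets (a, b, c) and (−a, −b, c + a) describe the same curve, so only the fitted XC50 is interpreted here; the signs of a and b depend on the initial guess.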

Protein information is automatically retrieved through the UniProt REST API, enabling the search for alternative protein and gene names. The retrieved XML files are processed using Biopython’s UniProt parser21, resulting in standardized and well-annotated protein entries, ultimately leading to more robust search functionality.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.