Pretraining powerful deep learning models requires large, comprehensive training datasets, which are often unavailable for medical imaging. In response, the universal biomedical pretrained (UMedPT) foundational model was developed based on multiple small and medium-sized datasets. This model reduced the amount of data required to learn new target tasks by at least 50%.
A multi-task approach is proposed for pretraining foundational models without large datasets, helping to overcome the data scarcity problem in biomedical imaging.
A recent study proposes a computational method for the design of free-form metamaterial systems. The method simplifies the design process by avoiding the anisotropic materials that conventional methods usually require. It can be applied to the design of both two-dimensional and three-dimensional metamaterials that are subject to multiple physical fields.
A graph neural network using virtual nodes is proposed to predict properties of complex materials that have variable or input-dependent dimensions. The method is used to accurately and quickly predict phonon dispersion relations in complex solids and alloys.
A method leverages protein structural data to predict T-cell receptor–peptide interactions for unseen peptide epitopes, which can be particularly useful for applications in cancer immunotherapy, autoimmunity studies, and vaccine design.
Machine learning has enabled major advances in the field of partial differential equations. This Review discusses some of these efforts and other ongoing challenges and opportunities for development.
While large-scale GPS location datasets have been instrumental to applications in epidemiology, there are still several challenges with these data that should be considered and addressed to make data-driven epidemiology more reliable.
A recent study shows that, by leveraging nonlinear optical processes in disordered media, photonic processors can transform high-dimensional machine-learning data, using nonlinear functions that are otherwise challenging for digital electronic processors to compute.
The parallels between natural language and antibody sequences could serve as a stepping stone to using deep language models for analyzing antibody sequences. This Perspective discusses how issues in antibody language model rule mining could be addressed by linguistically formalizing the antibody language.
Data about the transition states of rare transitions between long-lived states are needed to simulate physical and chemical processes; however, existing computational approaches often gather little information about these states. A machine-learning technique resolves this challenge by exploiting the century-old theory of committor functions.
A two-stage learning algorithm is proposed to directly uncover the symbolic representation of rules for skill acquisition from large-scale training log data.
CASTLE, a deep learning approach, extracts interpretable discrete representations from single-cell chromatin accessibility data, enabling accurate cell type identification, effective data integration, and quantitative insights into gene regulatory mechanisms.
Discovering improved semiconductor materials is essential for optimal device fabrication. In this Perspective, data-driven computational frameworks for semiconductor discovery and device development are discussed, including the challenges and opportunities moving forward.
We present a method to alleviate the re-identification risks associated with sharing haplotype reference panels for imputation. In an anonymized reference panel, one might try to infer the genomes’ phenotypes to re-identify their owners. Our method protects against such an attack by shuffling the reference panel’s genomes while maintaining imputation accuracy.
MISATO, a dataset for structure-based drug discovery, combines quantum mechanics property data and molecular dynamics simulations on ~20,000 protein–ligand structures. It substantially extends the amount of data available to the community and holds potential for advancing work in drug discovery.
Multicellular modeling is increasingly being used to understand biological systems. SimuCell3D is a tool that allows mechanically realistic simulations, using the deformable cell model, to be developed and run.
Cooperation is crucial for human prosperity, and population structure fosters it through pairwise interactions and coordinated behavior in larger groups. A recent study explores the evolution of behavioral strategies in higher-order population structures, including pairwise and multi-way interactions, and reveals that higher-order interactions promote cooperation across networks, especially when they are formed by conjoined communities.
SANGO efficiently removed batch effects between the query and reference single-cell ATAC signals by using the underlying genome sequences, enabling cell type assignment according to the reference data. The method achieved superior performance on diverse datasets and could detect unknown tumor cells, providing valuable functional biological signals.
Approaches are needed to accelerate the discovery of transition metal complexes (TMCs), which is challenging owing to their vast chemical space. A large dataset of diverse ligands is now introduced and leveraged in a multiobjective genetic algorithm that enables the efficient optimization of TMCs in chemical spaces containing billions of them.
While there is a clear opportunity for digital twins to bring value in mechanical and aerospace engineering, they must be considered as an asset in their own right so that their full potential can be realized.