Physical Sciences Data Infrastructure: unlocking insights
In our digital world, scientific research creates vast quantities of data that can be used to generate new findings, enhance predictive power and improve precision. The challenge lies in extracting insights from the multitude of data systems used across chemistry, materials science and related research fields.

Experts at the School of Chemistry and Chemical Engineering are at the forefront of a national initiative that’s tackling this challenge by connecting data sources to accelerate discovery.
Reproducible, accessible scientific data
The Physical Sciences Data Infrastructure (PSDI), a programme funded by the Engineering and Physical Sciences Research Council, aims to help researchers handle data more easily by connecting the different data infrastructures they use.
The University of Southampton is a lead programme partner, working with the Science and Technology Facilities Council and other collaborators. The School of Chemistry’s Simon Coles, Professor of Structural Chemistry, is Principal Investigator for Southampton, and Professor Jeremy Frey, a specialist in physical and computational chemistry, advises on governance and strategic relationships.
The programme’s vision is to unlock the full potential of chemistry and physical sciences data through seamless integration and accessibility. Traditionally, research data has been scattered across siloed databases, with limited interoperability. PSDI seeks to bridge these gaps, enabling researchers to:
- Access reference-quality data from commercial and open sources
- Combine data from different sources for richer analysis and more comprehensive insights
- Share data, software and models, including experimental and simulation data
- Use containerised data and software with close-to-data computation, running analyses where the data is stored for greater efficiency
- Use AI to explore data
- Learn how to make the results of their research open and FAIR (findable, accessible, interoperable, reusable)
Data sources across chemistry and other physical sciences
PSDI currently provides access to more than 20 different databases and repositories of physical sciences data including the BioSimDB (Biomolecular Simulations Database), Cambridge Structural Database, Chemical Availability Search (to help researchers find and compare chemical products from various suppliers), Propersea (Property Prediction), and the Chemotion Repository (chemical data). Researchers can search these data sources using the PSDI Cross Data Search Service. PSDI and its partners also provide a range of other data sources of interest to the physical science community. Accompanying tools, training and guidance are available to help researchers utilise these resources.
One of these resources, Data Revival, is an AI-driven service that converts handwritten lab notebook pages into machine-readable data. It was developed at Southampton by Senior Research Assistant Sam Munday and is now a fledgling spin-out co-founded by Sam and Professor Jeremy Frey.
Ultimately, PSDI aims to equip scientists with practical skills in data stewardship and cutting-edge technologies such as AI and computational tools, accelerating discovery and preparing the next generation of researchers for an increasingly data-driven world.
Related expertise

Professor Simon Coles
Crystallography, structural chemistry and digital chemistry
