The National Data Library (NDL) is a key element of the government’s AI Opportunities Action Plan. At its heart is a plan to identify five high-impact public datasets and make them accessible to government agencies, businesses, and researchers, enabling innovation in healthcare, policy development, and public services.
The questions are: how do you build one, and what exactly are we building?
Precedents for a National Data Library
Opinions differ on how to build an NDL. One school of thought is that it should be a decentralized, federated data platform connecting existing databases and facilitating data exchange, without storing data centrally. Another argues that it should be a centralized data repository.
The UK Biobank is an example of the latter. It holds half a million people’s health data that can be accessed by approved academic, enterprise, charitable and government researchers. Transparency in data usage, strong security measures, and public trust in privacy protections have been key to its success.
Estonia’s X-Road platform is an example of the former. X-Road is the technology underpinning a national digital identity scheme. It facilitates seamless data exchange across government agencies while maintaining decentralized data storage. The system enables Estonians to interact effortlessly with public services, from healthcare to taxation, saving time and improving efficiency.
Lessons from the platform reinforce the need for interoperability, as well as the importance of ensuring that data is secure and cannot be corrupted. Crucially, it is also user-centric and operates on the ‘Only Once Principle’. Estonia’s citizens don’t need to know how it works, just that it does. They provide their data only once, and it is then automatically updated across all relevant systems.
Solid Foundations Are Key to Building a Library
The proposed National Data Library will be the largest data unification project ever undertaken by any country. Regardless of what it contains and how it’s used, the foundations are the same: high quality, trusted data.
Digital transformation efforts in the NHS are a perfect example of the complexity of modern master data management. Our health system contains the largest healthcare dataset in the world, which will make a fitting centerpiece of the NDL’s collection. Although the NHS is referred to in the singular, it is in fact a collection of departments, commissioning and provider organizations, regions and systems.
Complexity increases further because the NHS needs to coordinate care across local authorities and social care, particularly as our aging population puts pressure on hospitals. Patient data is currently siloed across legacy systems and fragmented IT infrastructure, often without common data formats or standards, and is frequently incomplete, outdated or inaccurate. All of this regularly results in duplicate entries in different (siloed) repositories. The goal for the NHS is to create a single view of truth by matching data across these multiple sources.
The Modern Data Management Transformation Challenge
Traditional Master Data Management has an inherent data-quality problem. These models take an age to ingest source system feeds that are often beset by data quality issues. They also rely on data matching, which compares each data string and applies a score across it to create a record-to-record match.
These probabilistic matching engines use algorithms that evaluate and score the matches. This is far from ideal for patient records, which can carry multiple identifying attributes, each with several variations. Variations in personal data, such as name inconsistencies (e.g., Eliza vs Elizabeth), can result in inaccurate matches, making it difficult to create a single, accurate record for each citizen.
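To make the problem concrete, here is a minimal sketch of record-to-record scoring. It is not any vendor’s algorithm: the field names, the weights and the 0.9 threshold are all assumptions chosen for illustration, and the string comparison simply uses Python’s standard `difflib`.

```python
from difflib import SequenceMatcher

# Hypothetical field weights; a real matching engine would tune these.
WEIGHTS = {"first_name": 0.4, "last_name": 0.4, "dob": 0.2}


def similarity(a: str, b: str) -> float:
    """Return a 0..1 similarity between two strings, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def match_score(rec_a: dict, rec_b: dict) -> float:
    """Weighted sum of per-field string similarities."""
    return sum(w * similarity(rec_a[f], rec_b[f]) for f, w in WEIGHTS.items())


def is_match(rec_a: dict, rec_b: dict, threshold: float = 0.9) -> bool:
    """Declare a match only if the score reaches the (assumed) threshold."""
    return match_score(rec_a, rec_b) >= threshold


# Made-up records for the same person held in two different systems.
gp_record = {"first_name": "Eliza", "last_name": "Smith", "dob": "1980-01-02"}
hospital_record = {"first_name": "Elizabeth", "last_name": "Smith", "dob": "1980-01-02"}

score = match_score(gp_record, hospital_record)   # roughly 0.886
matched = is_match(gp_record, hospital_record)    # False with a 0.9 threshold
```

With these assumed settings the two records fall just short of the threshold, so one person ends up with two entries. Lowering the threshold would catch this pair, but at the risk of merging records that belong to different people.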
A more powerful approach to managing patient and citizen data, and one that is essential for unifying data across silos, is Entity Resolution (ER). ER uses a schema-agnostic model, sparing data engineering teams the time and cost of performing preliminary data conversions. ER leverages all available records to create the most accurate possible representation of an individual’s data, minimizing errors and enhancing the reliability of government datasets.
Crucially for public sector datasets that are constantly being added to, ER allows for continual data refresh for all the applications and services built on top of the platform.
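The idea behind ER can be sketched in a few lines. What follows is a deliberately simplified, deterministic illustration rather than a description of any production ER engine: records from different sources are linked whenever they share an identifying value, and links are followed transitively using a union-find structure. The record fields, the choice of identifying keys and all the values are made up.

```python
from collections import defaultdict


def resolve_entities(records, identifying_keys):
    """Group record indices into entities.

    Two records are linked if they share a value for any identifying key;
    links are transitive, so evidence from every record is pooled.
    """
    parent = list(range(len(records)))

    def find(i):
        # Walk to the cluster root, shortening the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def union(i, j):
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[max(ri, rj)] = min(ri, rj)

    first_seen = {}  # (key, normalized value) -> index of first record with it
    for idx, rec in enumerate(records):
        for key in identifying_keys:
            value = rec.get(key)  # records need not share a schema
            if value is None:
                continue
            feature = (key, str(value).strip().lower())
            if feature in first_seen:
                union(idx, first_seen[feature])
            else:
                first_seen[feature] = idx

    clusters = defaultdict(list)
    for idx in range(len(records)):
        clusters[find(idx)].append(idx)
    return sorted(clusters.values())


# Made-up records from three hypothetical source systems.
records = [
    {"source": "gp", "name": "Eliza Smith", "nhs_number": "A1", "phone": "07000 000001"},
    {"source": "hospital", "name": "Elizabeth Smith", "phone": "07000 000001",
     "email": "e.smith@example.org"},
    {"source": "social_care", "name": "E. Smith", "email": "E.Smith@example.org"},
    {"source": "gp", "name": "John Jones", "nhs_number": "B2"},
]

entities = resolve_entities(records, ("nhs_number", "phone", "email"))
# entities == [[0, 1, 2], [3]]
```

Here the GP and social care records share no field values at all, yet they resolve to the same person because the hospital record connects them. Names are never compared, so the Eliza/Elizabeth variation does not matter. In this sketch a refresh means re-running the function as new records arrive; a real ER system would also weigh fuzzy and conflicting evidence rather than relying on exact matches alone.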
Lessons from the NHS for the National Data Library
The Federated Data Platform (FDP), currently being implemented by NHS England, provides a glimpse into how the National Data Library could function. The FDP aggregates local healthcare data to enable faster, more coordinated care at a regional level, reducing inefficiencies caused by fragmented systems.
If expanded, a national-scale data platform could unify health records across the NHS, allowing for a single patient record accessible via the NHS app. This approach recognizes that citizens engage with multiple public services across different regions, requiring a seamless data-sharing framework.
Enhancing Public Services Through Contextual Data Sharing
As mentioned, the government’s approach to the National Data Library will focus on identifying five key public datasets that can deliver the most immediate impact. However, data alone is not valuable without a clear strategy for its application.
One promising example is using the NDL to enable cross-departmental data sharing between the NHS and the Department for Work and Pensions. By linking healthcare and employment data, policymakers could gain deeper insights into the connections between health outcomes and socioeconomic factors. Additionally, integrating benefit eligibility verification within healthcare services could reduce fraud and ensure that resources are allocated to those in genuine need.
Seamless interoperability between government systems will be essential to maximizing the benefits of the NDL, allowing departments to communicate efficiently and reducing the need for manual data processing.
If You Build It, They Will Come
The government’s vision for a National Data Library is ambitious, but its potential to transform public services is unparalleled. While the exact structure of the NDL is yet to be finalized, the journey toward its creation is just as important as the outcome.
This article was produced as part of TechRadarPro’s Expert Insights channel where we feature the best and brightest minds in the technology industry today. The views expressed here are those of the author and are not necessarily those of TechRadarPro or Future plc. If you are interested in contributing find out more here: https://www.techradar.com/news/submit-your-story-to-techradar-pro