This interview is in two parts. Part 1 can be read here.
What kind of master data is associated with clinical trial data?
The pharmaceutical industry has a wide variety of specialized tools for handling clinical trial data, and few of them are designed to use or handle master data. However, the whole clinical trials process is full of master and reference data. If you think about it, the clinical trial, or clinical study, is itself a master data entity. It has a unique identifier and a study title. To operate the trial within each country, you need a licence from the relevant regulatory authority, which itself has an identifier. You also need to identify the products within the clinical trial and the indications (or medical conditions) it is focused upon. Then there are the Study Sites, including the primary investigators and the healthcare organisation (or facility) that recruits patients and conducts the clinical trial. And of course we have human subjects, who are the anonymised patients involved in the trial, and frequently biospecimens that are collected from the patients to support further investigation and research. All of these can be considered master data. There’s an even longer list of reference data – things like Study Phase, Study Type, Country, Route of Administration and Dose Form – all of which need to be standardised to correctly interpret the clinical trial.
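The entity relationships described above can be sketched as a minimal data model. This is an illustrative sketch only – all class names, field names, and identifier formats are hypothetical, not AstraZeneca's actual model:

```python
from dataclasses import dataclass

# Hypothetical sketch of the clinical-trial master data entities described
# above. Each entity carries a unique identifier, and reference data
# (Study Phase, Study Type, Country) appears as standardised fields.

@dataclass(frozen=True)
class ClinicalStudy:
    study_id: str          # unique identifier, e.g. a sponsor study code
    title: str
    phase: str             # reference data: Study Phase
    study_type: str        # reference data: Study Type

@dataclass(frozen=True)
class StudySite:
    site_id: str
    study_id: str          # links the site back to its master study record
    country: str           # reference data: country code
    investigator: str      # primary investigator
    organisation: str      # healthcare organisation conducting the trial

@dataclass(frozen=True)
class Subject:
    subject_id: str        # anonymised patient identifier
    site_id: str

@dataclass(frozen=True)
class Biospecimen:
    specimen_id: str
    subject_id: str        # specimen collected from this subject

# A study with one site, one subject and one specimen, all joined by the
# unique identifiers that master data management provides.
study = ClinicalStudy("D1234C00001", "Example oncology study", "Phase III", "Interventional")
site = StudySite("SITE-001", study.study_id, "GB", "Dr. Example", "Example Hospital")
subject = Subject("SUBJ-0001", site.site_id)
specimen = Biospecimen("SPEC-0001", subject.subject_id)
```

The point of the sketch is the chain of identifiers: a specimen resolves to a subject, a subject to a site, a site to a study – which is exactly the integration that master data makes possible.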
What kind of innovative approaches are you building to counter your challenges with the transformation here?
The main change we are driving is to recognise that this data needs to be separated from the operational systems and integrated via REST APIs. We are already well advanced on this path with our Reference Data Platforms. We combine this with a strong data governance function that ensures data is managed and trustworthy.
The other element of innovation is to combine our delivery of reference and master data with published and internal ontologies. Implementing in this way establishes rich links in our data which we can exploit in Knowledge Graphs/AI and analytics.
How do you see the impact of Covid on the healthcare industry and your work in particular?
Covid has had a big impact on AstraZeneca, ranging from the development of new vaccines and monoclonal antibodies through to implementing Covid testing processes. We’ve even invested time in changing the way we plan and operate clinical trials to work within the pandemic.
As a data leader, what is your view on the impact of technologies like AI on MDM and other data solutions?
I see the solutions that I’m responsible for as providing the foundations for AI by providing clean, trustworthy linked data.
I’m always cautious of the claim, which I often hear, that AI will eventually replace technologies like MDM. The reality is that all technologies will be supported by AI in the future, but the role of master data management in providing unique identity for data will remain, and will always need strategic thinking.
Editor’s note: Agree wholeheartedly. Data mastering is the foundation for AI and analytics.
How do you see the impact of cloud technologies on your day to day work?
Cloud is built into the way we work at AstraZeneca. Our MDM and RDM platforms are both cloud-based.
You have a background in Maths and Physics and are now working on Chemical Substances. What is your learning strategy for evolving into new aspects of your work, and what are some tips and techniques you would like to share with our readers?
A lifelong ambition to learn about science is essential to my role. We’ve not touched on it, but I’ve also found learning about biomolecules, such as proteins and monoclonal antibodies, to be even more fascinating than chemical substances. I would never describe myself as an expert, but I’ve found that I can quickly scan textbooks and websites to learn enough about the concepts to have an effective conversation with scientists about their data. This is absolutely key to master data: the world of science is full of localised and conflicting terminology, and you frequently find several terms used to describe the same thing in different groups, or even within the same process. Terms like “product” are particularly bad and can mean multiple things. This is where I apply my scientific knowledge to probe what the scientist really means.
It’s also important to look at the data itself. Often when you observe the data you spot patterns that weren’t revealed in discussions with the scientists. If you can decipher the pattern, you can understand what is happening and you may be able to propose a significant improvement or simplification.
What is your advice to someone starting on the MDM journey?
- MDM is not about technology – paradoxically, it’s mostly about process. Understanding the data and the process used to create that data is the most critical thing. You should always identify the process that creates the data entity first, and you should always define an end-to-end process for managing the master data.
- You must really understand the data…anyone who talks vaguely about mastering something like product, without defining what that means, will not be successful.
- An additional point is that you should never expect an MDM process to arrive through consensus; most organisations will try to retain their existing processes, and you’ll end up making multiple concessions and failing to generate trustworthy data. As an MDM architect you have to be prepared to define the way forward and argue your case as a simplification.
Thank you so much Colin for these insights. We wish you the very best of luck in your upcoming tasks and for the future. Stay safe!

Posted on July 27th, 2020
Ladies and Gentlemen, it is with great pleasure that we welcome Master Data Maestro Colin Wood, Head of Information Architecture at AstraZeneca. Colin has over 30 years of experience in information technology, with the last 20 years spent in lead Information Architecture roles within large pharmaceutical R&D organisations. He has a degree in Maths and Physics and initially trained as a research scientist at the UK Meteorological Office, before moving into a software consultancy role and then into the pharmaceutical industry.
Colin specialises in defining and implementing master and reference data solutions that provide unique identity for a broad range of data entities – including Substances, Projects, Clinical Studies, Products, Batches, Samples, Targets, Indications, etc.
Thank you for allowing us to interview you Colin. Really appreciate your time. Can you please tell our readers how your work experience has shaped your outlook on Master Data?
These experiences have taught me about the critical role that accurate master data plays in supporting integration of scientific data across systems and within analytic and search applications. I have also recognised the importance of external identifiers and ontologies in providing a richer ability to integrate both internal and external data sources; many of the principles that I have learned have now become embedded within the FAIR (Findable, Accessible, Interoperable, Reusable) principles.
In your current role as the Head of Information Architecture within the AstraZeneca R&D IT organisation, what do you do? What is the business problem you are solving and what impact does it have on the organisation?
I lead a large team of Information Architects who specialise in data architecture, master data management, reference data management and metadata management; as well as providing expert domain knowledge of scientific areas.
We are supporting several ambitious activities within AstraZeneca to transform its science organisation using data, digital and analytics. We are also building a Science Data Foundation, which will be used to integrate scientific data as a series of FAIR data sets.
AstraZeneca and I are also involved with FAIRplus, whose aim is to make life science data available to researchers in an accessible way, with consistent annotations and interoperable formats.
What are your 3 top challenges in your current role?
In my view, the top 3 problems we face are:
1. Delivering foundational components (e.g. for master data and metadata), whilst at the same time delivering new business capabilities. AZ is highly ambitious and, like most organisations, cannot wait for foundational components to fully mature. We often have to make compromises or rely on long-term plans to fully implement the foundational components. The foundational components I focus on are:
A. Data models that define the organisation’s data – these form the basis of the Data Architecture for R&D.
B. Master data and reference data to generate unique identity
C. Data catalogs and metadata management solutions that can be used to manage glossaries and datasets.
The challenge is that everyone wants to jump straight to solutions that exploit data (e.g. AI or visualisation solutions) and is reluctant to wait for, or even sponsor, the delivery of foundational components. However, to deliver these solutions effectively and sustainably, the foundation has to be there; the data has to be organized and trusted. It is a balance between tactical and strategic – delivering value as well as focusing on the long term. Based on my experience, I suggest that Information Architects should spend 50% of their time delivering projects, 25% planning and laying the foundation for the future, and 25% on strategic planning.
2. Working in an IT organisation that has relatively little history of Information Architecture as a function. My team was only founded 18 months ago and some of the processes required to embed Information Architecture are not yet mature; for example, there is no mandate that IT projects must provide a data model that defines the data managed in a solution. The lack of formal process means that we have to lay the groundwork for strategic approaches to Information Architecture whenever we start new initiatives. This may involve educating on the importance of data models, to understand the source data, and the use of master and reference data to support integration of data sources. Like most organisations, Agile features heavily in our development practices. I find that I have to regularly remind teams that Agile does not mean there is no need for architecture; in fact, an early focus on Information Architecture can help Agile projects deliver by ensuring there is an early understanding of existing master and reference data sources.
3. Hiring and Training. The pharmaceutical industry is uniquely complex – no other industry has such a broad range of complex data. To be successful in an Information Architecture role you must have in-depth knowledge of industry data and processes. Acquiring people with that background is always challenging, similarly training people who are new to the industry is time consuming.
You have extensive experience in data management in the healthcare domain. Can you please describe what is meant by chemical registration and how does it relate to an MDM system?
Chemical Registration is potentially the earliest form of master data management in any industry, but is rarely described as a master data solution. Most pharmaceutical companies have been assigning unique identity to molecules or compounds for 50+ years and have been using electronic solutions for most of this period. The need for electronic solutions became prevalent in the 1980s/1990s, when most pharmaceutical companies started to scale up their chemistry programmes, requiring the identification of millions of unique compounds/molecules.
Whilst the technology is unique to chemical registration (and you would never use a commercial MDM vendor), the process is essentially a master data process – with one big difference: the thing you are identifying is characterised by a chemical structure. For example, something like this:
This might look like a graphic, but it’s actually a depiction of a 3-dimensional molecule with atoms and bonds. A compound registration solution is able to understand this and would first perform a uniqueness match, to find any potential matches, before assigning unique identity. Molecules that have a potential match may require review by a data steward before being registered. Some compound registration solutions even have the ability to merge records, if they are found to represent the same thing at a later date. This has all the classic hallmarks of a master data management system and is foundational to pharma R&D processes.
In the last decade, the industry has shifted to increasingly focus on biomolecules, requiring the investment in similar solutions that uniquely identify proteins, nucleic acids and a range of other complex entities. Whilst these are significantly more complex, the master data principles are the same – you draw the biomolecule, submit to a specialist registration solution, which then runs a uniqueness check and assigns a unique identity.
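The registration flow described above – canonicalise the structure, run a uniqueness match, then either return the existing identifier or assign a new one – can be sketched in a few lines. This is a toy illustration, not a real registration system: real solutions canonicalise the chemical structure itself (atoms and bonds), whereas here a normalised string stands in for that step, and the identifier format is invented.

```python
import hashlib
import itertools

class Registry:
    """Toy sketch of a registration uniqueness check."""

    def __init__(self, prefix="SUB"):
        self._by_key = {}                 # canonical key -> assigned identifier
        self._counter = itertools.count(1)
        self._prefix = prefix

    def _canonical_key(self, structure: str) -> str:
        # Placeholder canonicalisation: normalise whitespace and case.
        # A real system would compute a canonical form of the molecule.
        normalised = " ".join(structure.lower().split())
        return hashlib.sha256(normalised.encode()).hexdigest()

    def register(self, structure: str) -> str:
        """Return the existing identifier if a match is found (the
        uniqueness check), otherwise assign and return a new identity."""
        key = self._canonical_key(structure)
        if key in self._by_key:
            return self._by_key[key]      # potential match found
        new_id = f"{self._prefix}-{next(self._counter):07d}"
        self._by_key[key] = new_id
        return new_id

registry = Registry()
first = registry.register("example structure A")
# The same structure submitted in a different textual form matches the
# existing record instead of creating a duplicate identity.
again = registry.register("Example   Structure   A")
assert first == again
```

In a production system the match step would also route ambiguous near-matches to a data steward for review before registration, as described above.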
What are some external sources you use for referring substances or products? How does this increase the complexity of the taxonomy you use internally?
International naming of substances is an essential part of identifying any substance used as an ingredient in a pharmaceutical product. International Nonproprietary Names (INNs), managed by WHO MedNet, are the most widely used; however, there are other nationally focused schemes such as USAN (United States Adopted Name).
A significant part of product and substance identification is now linked to external regulator identity. This includes the FDA SRS (Substance Registration System), EMA XEVMPD and now SPOR. We are also beginning to see external substance registration systems – such as G-SRS – which are assigning unique identity to all substances used as ingredients in pharmaceutical products.
Can you please tell us about your work with Clinical Data?
I have now worked on 2 large-scale clinical transformations in different companies. In both cases I have found that data models are critically important to understand and organise the data; master and reference data are critical to support integration of the data; and data governance, ensuring we make good decisions about data, is truly essential.
The Life Sciences Industry, just like the BFSI industry, has a major history of mergers and acquisitions. How does that impact your data management processes?
This is a fact of life in the pharmaceutical industry, and all pharmaceutical companies carry their entire corporate history within their internal identifiers. This is an ongoing process, as acquisitions of companies or individual products are a constant theme within the industry. An added complication is that identifiers are frequently exposed externally – to regulators and to the general public – hence it is not straightforward to change any of the identifiers. This is made next to impossible to solve because most operational solutions within this industry are set up to expect a single identifier, with no provision for alternatives or easy way to change.
I’ve found that master and reference data solutions can handle this conundrum extremely well. Using these systems we can set up solutions that assign unique, non-changing, identifiers and can code a full range of synonyms allowing ready conversion between legacy and current terms. We can even change preferred terminology, allowing the organisation to change how it reports on a data entity overnight. Embedding use of the master and reference data into operational systems takes time, but addresses this issue and allows us to connect data from across the organisation using common identifiers.
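The synonym pattern described above – a non-changing internal identifier, a full set of synonyms including legacy company codes, and a preferred term that can be switched without touching identity – can be sketched as follows. All names, codes and identifier formats here are made up for illustration:

```python
class MasterRecord:
    """A mastered entity: one stable identifier, many synonyms."""

    def __init__(self, record_id: str, preferred: str, synonyms: set):
        self.record_id = record_id          # unique, never changes
        self.preferred = preferred
        self.synonyms = set(synonyms) | {preferred}

class MasterIndex:
    """Resolves any legacy or current term to the stable identifier."""

    def __init__(self):
        self._records = {}
        self._by_synonym = {}

    def add(self, record: MasterRecord):
        self._records[record.record_id] = record
        for term in record.synonyms:
            self._by_synonym[term.lower()] = record.record_id

    def resolve(self, term: str) -> str:
        """Convert a legacy or current term to the non-changing identifier."""
        return self._by_synonym[term.lower()]

    def set_preferred(self, record_id: str, term: str):
        """Change reporting terminology without changing identity."""
        record = self._records[record_id]
        record.synonyms.add(term)
        self._by_synonym[term.lower()] = record_id
        record.preferred = term

index = MasterIndex()
index.add(MasterRecord("SUB-000123", "Compound X", {"LegacyCo-4567", "CX-001"}))

# A legacy company code still resolves to the same stable identifier...
assert index.resolve("LegacyCo-4567") == "SUB-000123"

# ...and the preferred reporting term can be switched "overnight".
index.set_preferred("SUB-000123", "Examplinib")
assert index.resolve("Examplinib") == "SUB-000123"
```

Adding an external ontology identifier as just another synonym on the record is what opens up the graph-like connections to external sources described below.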
This is very similar to the type of functionality supported within Ontologies and I’ve also found that we can add tremendous richness to the master and reference data by building connections to ontologies. Just storing an external identifier from an ontology as part of your master data opens up the ability to connect to vast numbers of external sources. It also allows the master and reference data to act as a graph, connecting data for use in AI/knowledge graph solutions.
There is a critical dependency on expert data stewardship to ensure that the master or reference data terms are mapped correctly. With scientific data, this can require deep expertise. In the case of substances it is crucial that an expert checks that two different identifiers reference exactly the same substance by checking the molecule or other descriptors; this requires considerable chemical expertise. This means that the connection to a data governance organization is critical for all master and reference data programmes.
Owing to the wealth of information Colin provided, this interview is conducted in 2 parts. We will publish the second part shortly. Watch this space!

Posted on July 19th, 2020
Join us for a Nasscom Product Connect Webinar with our Founder and CEO, Sonal, talking about the challenges with current rule-based systems and how MDM with AI is the answer to your data management woes.
Register at bit.ly/2UvhAz1

Posted on June 23rd, 2020