Ladies and Gentlemen, it is with great pleasure that we welcome Master Data Maestro Colin Wood, Head of Information Architecture at AstraZeneca. Colin has over 30 years of experience in information technology, with the last 20 years spent in lead Information Architecture roles within large pharmaceutical R&D organisations. He has a degree in Maths and Physics and initially trained as a research scientist at the UK Meteorological Office, before moving into a software consultancy role and then into the pharmaceutical industry.
Colin specialises in defining and implementing master and reference data solutions that provide unique identity for a broad range of data entities – including Substances, Projects, Clinical Studies, Products, Batches, Samples, Targets, Indications, etc.
Thank you for allowing us to interview you, Colin. We really appreciate your time. Can you please tell our readers how your work experience has shaped your outlook on Master Data?
These experiences have taught me about the critical role that accurate master data plays in supporting integration of scientific data across systems and within analytic and search applications. I have also recognised the importance of external identifiers and ontologies in providing a richer ability to integrate both internal and external data sources; many of the principles that I have learned have now become embedded within the FAIR (Findable, Accessible, Interoperable, Reusable) principles.
In your current role as the Head of Information Architecture within the AstraZeneca R&D IT organisation, what do you do? What is the business problem you are solving and what impact does it have on the organisation?
I lead a large team of Information Architects who specialise in data architecture, master data management, reference data management and metadata management; as well as providing expert domain knowledge of scientific areas.
We are supporting several ambitious activities within AstraZeneca to transform its science organisation using data, digital and analytics. We are also building a Science Data Foundation which will be used to integrate scientific data as a series of FAIR data sets.
AstraZeneca and I are also involved with FAIRplus, whose aim is to make life science data available to researchers in an accessible way, with consistent annotations and interoperable formats.
What are your 3 top challenges in your current role?
The top 3 challenges we face are:
1. Delivering foundational components (e.g. for master data and metadata), whilst at the same time delivering new business capabilities. AZ is highly ambitious and, like most organisations, cannot wait for foundational components to fully mature. We often have to make compromises or rely on long-term plans to fully implement the foundational components. The foundational components I focus on are:
A. Data models that define the organisation's data – these form the basis of the Data Architecture for R&D.
B. Master data and reference data to generate unique identity
C. Data catalogs and metadata management solutions that can be used to manage glossaries and datasets.
The challenge is that everyone wants to jump straight to solutions that exploit data (e.g. AI or visualisation solutions) and are reluctant to wait for, or even sponsor, the delivery of foundational components. However, to deliver these effectively and sustainably, the foundation has to be there; the data has to be organised and trusted. It is a balance between tactical and strategic, delivering value as well as focusing on the long term. Based on my experience, I suggest that Information Architects should spend 50% of their time delivering projects, 25% planning and laying the foundation for the future, and 25% on strategic planning.
2. Working in an IT organisation that has relatively little history of Information Architecture as a function. My team was only founded 18 months ago and some of the processes required to embed Information Architecture are not yet mature; for example, there is no mandate that IT projects must provide a data model that defines the data managed in a solution. The lack of formal process means that we have to lay the groundwork for strategic approaches to Information Architecture whenever we start new initiatives. This may involve educating teams on the importance of data models for understanding source data, and on the use of master and reference data to support integration of data sources. Like most organisations, Agile features heavily in our development practices. I find that I have to regularly remind teams that Agile does not mean there is no need for architecture; in fact, an early focus on Information Architecture can help Agile projects deliver by ensuring there is an early understanding of existing master and reference data sources.
3. Hiring and training. The pharmaceutical industry is uniquely complex – no other industry has such a broad range of complex data. To be successful in an Information Architecture role you must have in-depth knowledge of industry data and processes. Finding people with that background is always challenging; similarly, training people who are new to the industry is time-consuming.
You have extensive experience in data management in the healthcare domain. Can you please describe what is meant by chemical registration and how does it relate to an MDM system?
Chemical Registration is potentially the earliest form of master data management in any industry, but is rarely described as a master data solution. Most pharmaceutical companies have been assigning unique identity to molecules or compounds for 50+ years and have been using electronic solutions for most of this period. The need for electronic solutions became prevalent in the 1980s and 1990s, when most pharmaceutical companies started to scale up their chemistry programmes, requiring the identification of millions of unique compounds/molecules.
Whilst the technology is unique to chemical registration (and you would never use a commercial MDM vendor), the process is essentially a master data process – with one big difference: the thing you are identifying is characterised by a chemical structure. For example, something like this:
This might look like a graphic, but it’s actually a depiction of a 3-dimensional molecule with atoms and bonds. A compound registration solution is able to understand this and would first perform a uniqueness check, to find any potential matches, before assigning a unique identity. Molecules that have a potential match may require review by a data steward before being registered. Some compound registration solutions even have the ability to merge records if they are found to represent the same thing at a later date. This has all the classic hallmarks of a master data management system and is foundational to pharma R&D processes.
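The check-then-assign flow described above can be sketched in a few lines. This is purely an illustration, not a real registration system: production solutions canonicalise structures with chemistry toolkits such as RDKit, whereas the `canonicalise` stub and the `AZ`-prefixed identifier format below are invented for the example.

```python
# Hypothetical sketch of a compound-registration uniqueness check.
# NOTE: canonicalise() is a stand-in; a real system would compute a
# canonical SMILES or InChI so that equivalent drawings of the same
# molecule produce the same key.

def canonicalise(structure: str) -> str:
    # Toy normalisation: strip whitespace only. Real canonicalisation
    # is chemically aware (tautomers, stereochemistry, aromaticity).
    return "".join(structure.split())

class CompoundRegistry:
    def __init__(self):
        self._by_structure = {}  # canonical structure -> compound ID
        self._next_id = 1

    def register(self, structure: str) -> str:
        """Uniqueness check first; assign a new ID only if no match exists."""
        key = canonicalise(structure)
        if key in self._by_structure:
            # Potential match found: in a real system this could be
            # routed to a data steward for review before registration.
            return self._by_structure[key]
        compound_id = f"AZ{self._next_id:07d}"  # invented ID format
        self._next_id += 1
        self._by_structure[key] = compound_id
        return compound_id

registry = CompoundRegistry()
id_a = registry.register("c1ccccc1")    # a SMILES-like string
id_b = registry.register("C1CCCCC1")    # a different structure
id_c = registry.register(" c1ccccc1 ")  # same structure as the first
```

Here `id_a` and `id_c` resolve to the same identifier because the structures match after normalisation, while `id_b` receives its own identity.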
In the last decade, the industry has shifted to increasingly focus on biomolecules, requiring the investment in similar solutions that uniquely identify proteins, nucleic acids and a range of other complex entities. Whilst these are significantly more complex, the master data principles are the same – you draw the biomolecule, submit to a specialist registration solution, which then runs a uniqueness check and assigns a unique identity.
What are some external sources you use for referring substances or products? How does this increase the complexity of the taxonomy you use internally?
International naming of substances is an essential part of the identification of any substance used as an ingredient in a pharmaceutical product. International Nonproprietary Names (INNs), managed by WHO MedNet, are the most widely used; however, there are other nationally focused schemes such as USAN (United States Adopted Name).
A significant part of product and substance identification is now linked to external regulator identity. This includes FDA SRS (Substance Registration System), EMA XEVMPD and now SPOR. We are also beginning to see external substance registration systems – such as G-SRS – which are assigning unique identity to all substances used as ingredients in pharmaceutical products.
Can you please tell us about your work with Clinical Data?
I have now worked in 2 large scale clinical transformations in different companies. In both cases I have found that data models are critically important to understand and organise the data, master and reference data are critical to support integration of the data and data governance, ensuring we make good decisions about data, is truly essential.
The Life Sciences Industry, just like the BFSI industry, has a major history of mergers and acquisitions. How does that impact your data management processes?
This is a fact of life in the pharmaceutical industry, and all pharmaceutical companies carry the entire history of the company within their internal identifiers. This is an ongoing process, as acquisitions of companies or individual products are a constant theme within the industry. An added complication is that identifiers are frequently exposed externally – to regulators and to the general public; hence it is not straightforward to change any of the identifiers. This is made next to impossible because most operational solutions within this industry are set up to expect a single identifier, with no provision for alternatives and no easy way to change them.
I’ve found that master and reference data solutions can handle this conundrum extremely well. Using these systems we can set up solutions that assign unique, non-changing, identifiers and can code a full range of synonyms allowing ready conversion between legacy and current terms. We can even change preferred terminology, allowing the organisation to change how it reports on a data entity overnight. Embedding use of the master and reference data into operational systems takes time, but addresses this issue and allows us to connect data from across the organisation using common identifiers.
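The approach just described – a stable, non-changing identifier carrying a full set of synonyms, with a preferred term that can be swapped at any time – can be sketched roughly as follows. The class names, the `PRJ` identifier format and the terms are all invented for illustration and do not reflect any specific MDM product or internal system.

```python
# Illustrative sketch of reference data with stable IDs and synonyms.

class ReferenceEntity:
    def __init__(self, stable_id, preferred, synonyms=()):
        self.stable_id = stable_id  # never changes, even after M&A renaming
        self.preferred = preferred  # how the organisation reports on it today
        self.synonyms = set(synonyms) | {preferred}

    def set_preferred(self, term):
        # Switching the preferred term changes reporting without
        # touching the stable identifier or losing the legacy term.
        self.synonyms.add(term)
        self.preferred = term

class ReferenceIndex:
    def __init__(self):
        self._by_term = {}  # any synonym -> stable identifier

    def add(self, entity):
        for term in entity.synonyms:
            self._by_term[term] = entity.stable_id

    def resolve(self, term):
        """Map a legacy or current term back to the stable identifier."""
        return self._by_term.get(term)

index = ReferenceIndex()
project = ReferenceEntity("PRJ-000123", "Project Alpha", ["Legacy-XY-9"])
index.add(project)
```

With this shape, both the legacy term and the current preferred term resolve to the same stable identifier, which is what allows data coded under either name to be connected.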
This is very similar to the type of functionality supported within Ontologies and I’ve also found that we can add tremendous richness to the master and reference data by building connections to ontologies. Just storing an external identifier from an ontology as part of your master data opens up the ability to connect to vast numbers of external sources. It also allows the master and reference data to act as a graph, connecting data for use in AI/knowledge graph solutions.
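Storing an external ontology identifier on a master record is enough to let the record behave as a node in a graph. A minimal sketch, assuming a plain dictionary record: the internal `SUB-0001` identifier and the record layout are invented for the example, and the external identifier values are illustrative only.

```python
# Illustrative master record holding external ontology identifiers,
# so the record can act as a graph node connected to external sources.
# The identifier values below are examples, not verified entries.

master_record = {
    "id": "SUB-0001",                 # invented internal stable identifier
    "preferred_name": "paracetamol",
    "external_ids": {
        "chebi": "CHEBI:46195",       # illustrative ChEBI-style CURIE
        "inn": "INN:paracetamol",     # illustrative INN reference
    },
}

def as_graph_edges(record):
    """Each stored external identifier becomes a 'same_as' edge from the
    internal record to an external node, usable in a knowledge graph."""
    return [(record["id"], "same_as", curie)
            for curie in record["external_ids"].values()]

edges = as_graph_edges(master_record)
```

Each edge links the internal identity to an external one, which is the mechanism that lets master data "act as a graph" for AI and knowledge-graph use cases.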
There is a critical dependency on expert data stewardship to ensure that the master or reference data terms are mapped correctly. With scientific data, this can require deep expertise. In the case of substances it is crucial that an expert verifies that two different identifiers reference exactly the same substance by checking the molecule or other descriptors; this requires considerable chemical expertise. This means that the connection to a data governance organisation is critical for all master and reference data programmes.
Owing to the wealth of information Colin provided, this interview is conducted in 2 parts. We will publish the second part shortly. Watch this space!