Analytics that helps understand, explain, and refine business processes is one of the most important use cases that a health data platform serves. In this piece, we will examine what analytics means and how a health data platform that enables and facilitates effective analytics can be architected.
What is Analytics?
Analytics is the term used for operations, procedures, and processes that aim to find patterns in data, which can then be used to gather insights and draw inferences. The aim of analytics, when used within a business, is to utilize these insights and inferences to improve business processes, develop better products, and target specific opportunities. In the healthcare domain, too, analytics serves the same purpose. Analytics has broadly been divided into four categories: Descriptive, Diagnostic, Predictive, and Prescriptive.
Descriptive Analytics involves displaying, reporting, and visualizing “what” is happening. It is mostly concerned with presenting and elucidating real-time, and batch data that shows events and patterns and slices/dices them along multiple dimensions. A healthcare example would be data and graphs showing that mental-illness-related clinical problems have increased in the population since the start of the COVID-19 pandemic in 2020.
Diagnostic Analytics involves answering the question “why” something is happening. In other words, it tries to find out the reasons or causes for the data patterns observed. Continuing the healthcare example of mental illness increase during the COVID-19 pandemic, diagnostic analytics would try to find out if it is related to lockdowns, social distancing and lack of in-person interactions, job loss due to the economic recession caused by the pandemic, and other reasons.
Predictive Analytics uses past data patterns in order to foretell or forecast what is likely to happen in the future. Predictive Analytics involves the use of statistical and artificial intelligence-based models to make these predictions. For example, a predictive model in healthcare could try and predict which persons are most likely to develop mental illness problems because of the COVID-19 pandemic.
The Importance of a Robust Health Data Platform in Enabling Analytics in a healthcare organization
Data is the lifeblood of analytics. The patterns, inferences, and insights that analytics seeks to glean are dependent on the quality, timeliness, validity, and accuracy of the data that the analytics algorithms process. Thus, if a healthcare organization seeks to become a leader in creating and utilizing analytics effectively, it needs to start with ensuring that its data platform and its data management processes are architected expertly.
Architecting a Robust Healthcare Data Platform
Creating a robust healthcare data platform that can support and enable analytics involves addressing the following:
Big Data Scale - The healthcare data platform should be able to accommodate and process large volumes of different types of data from multiple sources, i.e., the trusted formulation of the Three Vs. of Big Data: Volume, Velocity, and Variety. When choosing a vendor-supplied or open-source product to implement the data platform, various options are available - both on-premise and cloud-based. Both have their pros and cons. Cloud-based platforms are gaining in popularity because of the ability to scale horizontally and the ease of management of resources.
Batch and Streaming Data - Data is now generated by the minute, and many analytics processes work with real-time data streams. The old-school batch loads of data using traditional ETL (Extract-Transform-Load) that take days to process are still required, but a modern health data platform needs to accommodate real-time streams with extremely low latency. The streaming data needs to be made available for analytics in a real-time fashion.
Low Data Latency and ACID Transactions - A health data platform needs to support low data latency while still enabling ACID (Atomic-Consistent-Isolated-Durable) transactions. Many data warehouses + data lake set-ups offered on cloud-enabled data platforms suffer from either not being fully ACID-compliant for their transactions or having high data latency and low reliability. Therefore, when choosing a technological framework, careful thought must be given to these considerations.
Healthcare Data Standards Compliance - Utilizing healthcare industry standards in the data schemas used to store the health data is an extremely enabling strategy for analytics. If the health data is already compliant with widely-adopted standards like FHIR, the data analysts and data scientists do not have to spend a whole lot of time and effort munging the data to make it fit for analytics. This is a huge time-saver and improves the productivity of the analytics experts.
Relational and Non-Relational Data Storage - Relational Data with easy SQL-based access is still prized for analytics. Thus, a good healthcare data platform should ensure that traditional SQL-based access to data stored in relational schemas is available. In addition, though, there is usually a need to support other methods of querying and data storage, including NoSQL and columnar databases.
Low Total Cost of Ownership - From the organizational perspective, a low TCO is as important as functional and performance goals. Many proprietary health data platforms offer great performance and reliability statistics while sticking the bill for the same to the naive customer. Therefore, a big part of the evaluation process involves examining candidates and selecting a platform that is affordable to maintain.
Direct Support for Data Science Processes including Artificial Intelligence/Machine Learning - Advanced analytics (predictive and prescriptive) often requires the use of Machine Learning and Artificial Intelligence algorithms in a contiguous environment. Unfortunately, many legacy data warehousing environments do not offer direct support for AI/ML algorithms to be executed. They offer good support for Business Intelligence and reporting, but AI/ML seems to be an afterthought. Thus, selecting and implementing a robust health data platform that directly empowers data scientists by providing them with a working environment is a critical differentiator.
In summary, a robust health data platform can be specifically engineered to capacitate analytics by paying close attention to components like scalability, direct AI/ML support, cost of ownership, and the ability to accommodate various types of data.