Jump-start Your Health Data Platform Strategy

Aired on: Tue, October 12, 2021

In this webinar, Dr. Siv shares his expert insights on the best healthcare data management strategies for healthcare provider organizations. The session was well attended and ended with an engaging Q&A session. The key highlights of the session include:

  • Creating data platforms attuned to the needs of the future
  • Strategies for managing various healthcare data types
  • Reconciling data from multiple sources
  • Enabling advanced analytics and ML/AI on the data platform

Speaker: Dr. Siv Raman
Chief Product Officer at 314e

With 18 years of Fortune 50 leadership experience in health IT and analytics, he is an industry expert on the healthcare payer and provider space, with a focus on analytics, informatics, big data innovation, and population health management.

Here is the transcript of the webinar

Casey Post 00:01:
All right, we're going to go ahead and get started here. Good morning, everyone, and welcome to our webinar on jump-starting your health data strategy. My name is Casey Post; I'm the Senior Vice President of Sales and Client Services, and I have the privilege of kicking off our webinar today and introducing our speaker. Just a few housekeeping items before we get started: we are recording this webinar, and a link to the recording and slides will be provided afterwards. We will have a Q&A session at the end, so if you have questions as we're going, please drop them in the Q&A section here on Zoom, and we'll make sure to answer those at the end. And if you do want to ask your question live, just use the raise-your-hand icon, and we will unmute you during the Q&A session so you can ask your question live with Dr. Raman. I am going to give you a little bit of background on 314e before we jump into this. 314e was founded in 2014, initially to provide resource staffing and implementation services. We are exclusively focused on the healthcare IT space and headquartered in the San Francisco Bay Area. We have a regional office on the East Coast in Philadelphia, and we have a development and delivery center in Bangalore, where we're delivering services and developing products for our clients. We've always been organically grown, with no external funding, and we hold many industry affiliations and partnerships with professional organizations like CHIME, HIMSS, and HFMA, as well as a lot of EHR vendors like Epic, and we're working with Cerner, Meditech, and others. We also work with a lot of the analytics platforms and cloud platforms like Microsoft, AWS, Snowflake, Tableau, and Qlik. And from the standpoint of KLAS rankings, we're certainly, you know, Best in KLAS for technical services and consistently ranked high for HIT staffing and implementation services. Next slide.
And this is a summary of all the services that we provide. So, again, we offer a full suite of IT services for our healthcare clients: everything from EHR and ERP implementation services on one end to interoperability services. We're providing a lot of data analytics services, as well as technology services like programming and robotic process automation, as well as digital transformation and cloud adoption. And, as we're going to talk about today, we're in the process of developing products and solutions for our customers in addition to providing services. And so with that, I will kick things off here. The healthcare industry generates more data than we know what to do with; if you can leverage that data, you get better insights, enhanced decision-making, and a higher return on investment. We have organized this webinar to help you overcome critical challenges like high data latency, inability to scale, fragmented processes, and failure to deal with varied data sources. Our speaker today is Dr. Siv Raman, the Chief Product Officer of 314e. He is a physician and clinical informatics expert. He has over 18 years of healthcare industry experience and has held senior leadership roles in clinical informatics in various organizations, including Optum. At 314e, he is leading the development of our health data platform product. Dr. Raman is also an author and has written several books on US healthcare and analytics. The key takeaways from today's webinar will be around creating data platforms attuned to the needs of the future, strategies for managing various healthcare data types, reconciling data from multiple sources, and enabling advanced analytics and ML/AI on the data platform. Over to you, Dr. Raman.

Dr. Siv Raman 04:24:
Thank you for the introduction, Casey. So the webinar title is “Jump-start Your Health Data Strategy”, but what I'm going to be doing is walking through some best practices that we recommend when you're developing and implementing a health data platform, because things have changed drastically over the last 10 years in terms of how you go about doing that. And there are new insights that we want to share with those organizations that are in the process of implementing a health data platform or, you know, revitalizing or redoing their platforms, because there are a lot of organizations that implemented platforms for analytics and data management 10 years back, and now they're seeing that their platforms need to be redone or juiced up. So when we talk about healthcare data, you know, there's a whole lot of data that can be included under the domain of healthcare. But the most important types of data that a healthcare data platform will need to manage are the following. There's clinical data from EMRs. There are medical and pharmacy claims. Whether you're a provider organization or a payer organization, nowadays, with provider-payer convergence, value-based reimbursement, and so many other things happening, both types of organizations see both types of data. In addition, there's care management data from your care management systems: you know, utilization review, disease management, and case management. There's omics data, which is proteomics and genomics data; only some organizations have that kind of data, but the volume of that type of data is expected to grow. And then there's social determinants of health and other related data about your patients or members from external sources. Now, there's also non-healthcare data that is linked to this healthcare data that might need to be managed on your health data platform. So that would be data from ERP systems or CRM systems.
And so, since it's about the same population of patients or members, you might want to link it on your platform to the healthcare data that you're managing.

Dr. Siv Raman 07:19:
Now, when we talk about healthcare data platforms, I want to make it clear that I'm restricting myself to OLAP platforms, which means Online Analytical Processing. The other category of systems is OLTP, or Online Transaction Processing, systems, and that would be the core system that you use to manage operations, like your EMR system, your ERP system, or your CRM system. But usually, when you want to do analytics, understand your data, and run advanced algorithms for data science or other use cases, you don't want to do that directly on your OLTP system; you want to have a separate health data platform that enables advanced analytics. And so, when you look at organizations in terms of where they are with their health data platforms, you usually find that they are in one of three stages, right? The first is where they have fragmented data in multiple databases and data stores, not really an enterprise analytics platform. The analytics that this stage supports are rudimentary, because a lot of it is done directly against the OLTP system or in small databases that are essentially extracts from your OLTP system. The second stage is, you know, where you have an enterprise data warehouse and possibly an operational data store. This warehouse is usually relational; it is loaded, you know, daily, weekly, or monthly with data from the OLTP system, and the analytics are usually done from that data. That warehouse might also have non-healthcare data coming in from other systems, and that data is linked to the healthcare data.
The third stage, which is the most advanced as of now, is where you have some sort of big data architecture that supports a data lake in addition to an enterprise data warehouse. So you have a situation where you're able to stream the data, you know, on a real-time basis into your data lake, but then you're also able to move it through ETL processes to your warehouse. So you're able to do analytics on both your data lake, which has more real-time but more unstructured data, and the warehouse, which has your standard schemas, snowflake schemas or star schemas, and cubes for doing analytics. So that's the third and most advanced stage of a health data platform. Now there is a poll I would like you to answer, which will let us know what stage you think your organization is at when it comes to your health data platform. Gaurav, is that poll out? Okay. Great. So please do vote. And… so, we'll process those poll answers later. Okay. So what are the use cases for a robust health data platform? There are many use cases. And usually, the very reason why you might be looking to implement or

Dr. Siv Raman 11:52:
redo your health data platform is because your current setup may not support one or more of these use cases. Now, there are clinical and financial reporting; business intelligence; competitive intelligence; visualization and dashboards; clinical measures; compliance; mandatory reporting; population health management-related analytics, which is becoming an important area; AI and ML; and then, lastly, applications and APIs built on top of the data platform.

Dr. Siv Raman 12:34:
So I'm going to move on to the best practices we recommend, right. Best practice number one, when you are implementing a health data platform, is that, at least for the clinical and the claims data, you should be using industry-standard data models. Otherwise, you will have a situation where you're not able to reconcile different data coming in from multiple sources, and you're having trouble making sense of what a particular field or a particular data item means. For both clinical and claims data, we recommend the FHIR-based models, right? So FHIR, or Fast Healthcare Interoperability Resources, is an HL7 standard that came out a few years back, and for clinical data it is now widely accepted as the de facto standard. And so what you should be doing on your health data platform is making sure that your data is modeled according to the FHIR standard for clinical data. Now, that doesn't necessarily mean you store it in the hierarchical or JSON-based representation that FHIR recommends, because you can always convert it to some sort of relational model that is still based on FHIR and that supports analytics more easily. But your baseline, your base standard, should be the FHIR standards for the various EMR data resources. For medical and pharmacy claims data, again, you have some options. You can use the ExplanationOfBenefit resource in FHIR to model that data. Or there are these all-payer claims databases (APCDs) that almost every US state has set up to receive data from multiple payers, and they publish their specifications on their websites. Most of these specifications are pretty similar. There's also an APCD Common Data Layout that can be used, which kind of reconciles all the different state-level specifications.
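To illustrate the point about keeping FHIR as the base model while storing data relationally, here is a minimal Python sketch that flattens one FHIR Patient resource into a single row. The sample resource is hand-written, and the `flatten_patient` function and its column names are assumptions made for illustration; they are not part of FHIR or of any product mentioned in the webinar.

```python
import json

# A hand-written, minimal FHIR R4 Patient resource (illustrative data only).
patient_json = """
{
  "resourceType": "Patient",
  "id": "pat-001",
  "name": [{"family": "Doe", "given": ["Jane", "Q"]}],
  "gender": "female",
  "birthDate": "1980-04-12",
  "address": [{"city": "Oakland", "state": "CA", "postalCode": "94607"}]
}
"""

def flatten_patient(resource: dict) -> dict:
    """Map one FHIR Patient resource to a flat row for a hypothetical 'patient' table."""
    if resource.get("resourceType") != "Patient":
        raise ValueError("expected a Patient resource")
    # FHIR allows several names and addresses; this sketch keeps only the first of each.
    name = (resource.get("name") or [{}])[0]
    address = (resource.get("address") or [{}])[0]
    return {
        "patient_id": resource.get("id"),
        "family_name": name.get("family"),
        "given_names": " ".join(name.get("given", [])),
        "gender": resource.get("gender"),
        "birth_date": resource.get("birthDate"),
        "city": address.get("city"),
        "state": address.get("state"),
        "postal_code": address.get("postalCode"),
    }

row = flatten_patient(json.loads(patient_json))
print(row)
```

Keeping only the first name and address is a simplification; a fuller relational model would typically put repeating elements in child tables keyed by the patient ID.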
So whether you go in for FHIR or for APCD-based data models for your medical and pharmacy claims is up to you, but we recommend going in for the FHIR one and then extending it with any attributes that are found in the APCD specifications but are not covered in FHIR. Those would be very few, but they might exist in some cases. So the second best practice is to implement an Enterprise Master Patient Index, or Master Entity Index, upfront. So what's an EMPI, as it is called? There is no universal patient ID or member ID in place in the US. So when you're getting data from multiple sources about the same set of patients, you could run into trouble if you're not able to link different types of data, or different streams of data, about the same patient or member, because you don't know whether this is the same individual or a different person, right? And you can't really use the Social Security number or something like that as a proxy, because in many cases, especially when you have a subscriber-versus-member relationship, all of the dependent members under a subscriber might be assigned the same SSN. So there are various situations where what you need to do is implement an EMPI. What an EMPI does is use probabilistic as well as deterministic algorithms to figure out whether patient A in stream X is the same as patient B in stream Y, right? If the algorithm says that these are the same patient, then you can combine the two types of data under the same patient ID; if not, you keep them as separate patients or separate persons.
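The deterministic-plus-probabilistic matching described above can be sketched roughly as follows. This is a toy illustration, not how any particular EMPI product works: the field names, the weights, and the 0.85 threshold are all assumptions, and real EMPIs use far more attributes and carefully tuned scoring.

```python
from difflib import SequenceMatcher

def normalize(value: str) -> str:
    """Lowercase and drop punctuation/whitespace so "O'Brien" and "OBrien" compare equal."""
    return "".join(ch for ch in value.lower() if ch.isalnum())

def deterministic_match(a: dict, b: dict) -> bool:
    """Deterministic rule: every key field must agree exactly after normalization."""
    keys = ("last_name", "first_name", "dob", "zip")
    return all(normalize(a[k]) == normalize(b[k]) for k in keys)

# Hypothetical field weights (they sum to 1.0); chosen only for illustration.
WEIGHTS = {"last_name": 0.35, "first_name": 0.25, "dob": 0.30, "zip": 0.10}

def probabilistic_score(a: dict, b: dict) -> float:
    """Weighted agreement score: fuzzy similarity for names, exact match for DOB and ZIP."""
    score = 0.0
    for field, weight in WEIGHTS.items():
        x, y = normalize(a[field]), normalize(b[field])
        if field in ("last_name", "first_name"):
            score += weight * SequenceMatcher(None, x, y).ratio()
        else:
            score += weight * (1.0 if x == y else 0.0)
    return score

def same_patient(a: dict, b: dict, threshold: float = 0.85) -> bool:
    """Link two records if the exact rule fires or the score clears the threshold."""
    return deterministic_match(a, b) or probabilistic_score(a, b) >= threshold

emr_record = {"last_name": "O'Brien", "first_name": "Katherine", "dob": "1975-03-02", "zip": "19103"}
claim_record = {"last_name": "OBrien", "first_name": "Kathrine", "dob": "1975-03-02", "zip": "19103"}
dependent = {"last_name": "O'Brien", "first_name": "Kevin", "dob": "2009-11-20", "zip": "19103"}

print(same_patient(emr_record, claim_record))  # misspelled first name, same person
print(same_patient(emr_record, dependent))     # same family, different person
```

The dependent record shares a surname and ZIP code with the subscriber but differs on first name and date of birth, which is the kind of case where a shared SSN alone would link the wrong people.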

Dr. Siv Raman 17:33:
Now, there are several commercial as well as open-source offerings for the EMPI. OpenEMPI is an open-source EMPI that you can look at; commercial offerings include Argo Data and IBM's Initiate, and multiple other companies provide an EMPI. But that's the second best practice that we would say you absolutely need to adopt, which is to implement an EMPI early on in your health data platform setup. The third is to manage your ETL and ELT effectively. So what's ETL? ETL is the process of extracting data from the source systems and loading it into a warehouse. E stands for extract, T stands for transform, and L stands for load. Now, traditionally, that was the sequence in which these processes were undertaken, because you would first extract the data from your OLTP system, then transform it into the schema that you want it to have in the warehouse, and then, lastly, load it into your enterprise data warehouse in some sort of star schema or snowflake schema. And that would then support analytics. With the advent of data lakes, ETL changed to ELT. What that meant is that you still extract the data from your source systems, but then you just load it into the data lake first, as is. You don't really do much transformation; you pretty much land the data in your data lake so that it can be used directly for analytics. It's a little bit difficult, but for those who understand the data structure, and with modern tools, you're able to do analytics directly on the data lake. And then you transform it for the various use cases, one of which would be loading into an enterprise warehouse. So ETL kind of changed to ELT because of the advent of data lakes. Now, there are architectures like the Lambda architecture that are used to set up an effective ELT or ETL process. What the Lambda architecture does is, it has a batch layer, a speed layer, and a serving layer.
The batch layer is for the slow-moving and, you know, high-latency data that you load and make available within the warehouse. The speed layer is for the streaming data that comes in near real time. Both of these types of data are then served up from the serving layer, and you use various processing tools like Spark and Kafka to set up your ETL infrastructure and processes. Now, this is where we have a recommendation. Unless you are already down the route of setting up the whole data lake, and you've already got an existing data lake and data warehouse set up, we would say please do consider a lakehouse architecture instead of a traditional data lake plus data warehouse. So what's a lakehouse? It's an architecture that came out just about a year or so back. It combines the best of what a data lake offers with what a traditional enterprise warehouse offers. So you have low latency and support for analytics, just like a data lake. At the same time, you have the relational support, and the traditional warehouse architecture can also be implemented within a lakehouse. There are various implementations of the lakehouse: one is the Databricks Delta Lake platform, and there are also Apache Hudi and Apache Iceberg, which are both open source.
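As a rough picture of the batch, speed, and serving layers in a Lambda architecture, the sketch below keeps everything in memory with plain Python. It is only a conceptual illustration under stated assumptions: the event tuples, `build_batch_view`, `SpeedLayer`, and `serve` are hypothetical names, and a real deployment would use tools such as Spark and Kafka rather than Python lists.

```python
from collections import Counter

# Encounters already loaded by the nightly batch job: (patient_id, service_date).
batch_events = [
    ("pat-001", "2021-10-01"),
    ("pat-001", "2021-10-03"),
    ("pat-002", "2021-10-02"),
]

# Encounters that arrived on the stream after the last batch run.
stream_events = [
    ("pat-002", "2021-10-12"),
    ("pat-003", "2021-10-12"),
]

def build_batch_view(events):
    """Batch layer: recompute encounter counts per patient from the full history."""
    return Counter(patient_id for patient_id, _ in events)

class SpeedLayer:
    """Speed layer: keep an incremental view of events seen since the last batch run."""

    def __init__(self):
        self.view = Counter()

    def ingest(self, event):
        patient_id, _ = event
        self.view[patient_id] += 1

def serve(batch_view, speed_view, patient_id):
    """Serving layer: answer a query by merging the batch view and the real-time view."""
    return batch_view[patient_id] + speed_view[patient_id]

batch_view = build_batch_view(batch_events)
speed = SpeedLayer()
for event in stream_events:
    speed.ingest(event)

for pid in ("pat-001", "pat-002", "pat-003"):
    print(pid, serve(batch_view, speed.view, pid))
```

Having to maintain two code paths (batch and speed) that must agree with each other is part of the complexity that the lakehouse recommendation is meant to reduce.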

Dr. Siv Raman 22:30:
But what a lakehouse offers you is the ability to do what you were able to do with a data lake and a relational warehouse while avoiding the problems with data lakes, for example, you know, no support for ACID transactions. I'll talk about this a little bit more, because at 314e we also have a health data platform that is based on the Databricks lakehouse and is optimized for healthcare. The next best practice is to enable the use of multiple tools, right? When you have the data platform set up, you're trying to address the use cases that I talked about earlier, but at the same time you want to enable the use of multiple tools as well, because there are so many tools that are now used to do analytics and other operations within a data platform. Number one would be standard BI tools: Tableau, Cognos, etc. Then there's reporting and analytics using programming languages and packages like Python, R, SAS, and SPSS, and visualization using libraries as well as languages. And lastly, you want the platform to be able to support AI and ML algorithms and predictive models using Python, Spark, Scala, R, Java, multiple languages. A traditional data warehouse might support, you know, one or two of these use cases, but it will probably not support AI and ML algorithms being run directly in the warehouse. A modern architecture, like a lakehouse or a data lake plus warehouse, will support all these use cases, and that'll allow you to get maximum value out of your platform. And then the last, but certainly not the least, best practice is to enable additional processes, right? So, for example, when you've got clinical, claims, and other types of data coming in on the same patient, you want to make sure that you reconcile it, and this is above and beyond the EMPI, right? The EMPI will link different types of data on the same patient.
But even there, you might have two different streams of data that say different things about what the set of clinical services for that patient was. So you have to have a process for figuring out what the source of truth, or the correct record, is. You need deduplication algorithms: if you get the same set of claims repeatedly on the patient or member, you don't want a situation where you're showing 16 encounters instead of eight because you've got two different streams coming in. You absolutely need to implement Master Data Management, which includes an understanding of the source of truth, and also a single version of the truth when it comes to facts about the patients and members. And lastly, you need some sort of quality control as you're going through your ETL to ensure that bad data doesn't make it into your platform, because that can lead to multiple problems, especially with the veracity of your analytics.
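The deduplication and quality-control steps mentioned above might look something like this minimal Python sketch. The record layout, the choice of key fields, and the validation rules are assumptions for illustration; real pipelines would apply many more checks and a proper source-of-truth policy rather than simply keeping the first copy seen.

```python
from datetime import date

# Fields that together identify one encounter in this sketch (an assumed business key).
REQUIRED = ("patient_id", "service_date", "procedure_code", "provider_id")

def is_valid(record: dict) -> bool:
    """Quality control: all key fields present and service_date a real ISO date."""
    if any(not record.get(field) for field in REQUIRED):
        return False
    try:
        date.fromisoformat(record["service_date"])
    except ValueError:
        return False
    return True

def deduplicate(records: list) -> list:
    """Keep the first record seen for each business key, regardless of source stream."""
    seen = set()
    unique = []
    for record in records:
        key = tuple(record[field] for field in REQUIRED)
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique

def load(records: list):
    """Split incoming records into clean, deduplicated rows and rejected rows."""
    clean = [r for r in records if is_valid(r)]
    rejected = [r for r in records if not is_valid(r)]
    return deduplicate(clean), rejected

emr_stream = [
    {"patient_id": "pat-001", "service_date": "2021-09-01", "procedure_code": "99213", "provider_id": "prov-9", "source": "emr"},
    {"patient_id": "pat-001", "service_date": "2021-09-15", "procedure_code": "99214", "provider_id": "prov-9", "source": "emr"},
]
claims_stream = [
    {"patient_id": "pat-001", "service_date": "2021-09-01", "procedure_code": "99213", "provider_id": "prov-9", "source": "claims"},
    {"patient_id": "pat-001", "service_date": "2021-09-15", "procedure_code": "99214", "provider_id": "prov-9", "source": "claims"},
    {"patient_id": "pat-001", "service_date": "2021-13-40", "procedure_code": "99213", "provider_id": "prov-9", "source": "claims"},
]

encounters, rejected = load(emr_stream + claims_stream)
print(len(encounters), "encounters,", len(rejected), "rejected")
```

Without the deduplication step, the same two visits arriving on both streams would be counted as four encounters, which is the doubling problem described in the talk.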

Dr. Siv Raman 27:12:
Okay, so I'm going to go on to discussing some of the problems that current warehouse and data lake setups have, and why we recommend that organizations that have not already implemented a data warehouse and data lake setup should go in for a lakehouse-type architecture. Firstly, you have the problem of data staleness, right? If you have only the warehouse, your data latency is high. If you're loading the warehouse only weekly, then your data there can be as much as a week old. So the second is reliability. Thirdly, you have a problem even if you have a full warehouse plus a data lake to support the near-real-time data transactions: the total cost of ownership can be high, because you're setting up all of these complicated Lambda architectures, etc., that bring a lot of complexity and require a lot of resources to maintain. Now, you also have limited support for advanced analytics, in your warehouse for sure, and even in a traditional data lake, because of the next point, which is that data lakes as they are architected today do not really guarantee atomicity, consistency, isolation, and durability. So any data there, if you're reporting on it or doing analytics on it, may or may not be completely correct. So these are all the reasons why 314e has the Muspell Health Data Platform, which we license to those healthcare entities that are interested. We also provide health data services. So if you are very happy with your platform as is, but you want some work done, we also provide services in that realm. You don't have to license our Muspell platform as a product; you can also use our health data services for implementing new things on your existing platform. But the Muspell platform itself offers things like SQL compliance, ACID transaction support, and full support for ML and AI. And, as I talked about, the industry-standard data model.
So whenever we set up ETLs on the Muspell data platform, we make sure that the data is based on FHIR and standard data models. We incorporate Master Data Management, quality control, and easy access to multiple tools. And lastly, we support applications and APIs. So if you want to know more about the Muspell health data management and analytics platform, please reach out to us. If you're interested in services related to data, or your data platform as it exists in your organization today, please reach out to us for that as well. So that's the end of my presentation, and I will take questions now.

Casey Post 31:08:
Thank you very much, Dr. Raman. That was a great presentation. The results of that poll were, you know, not surprising: about two-thirds were at fragmented data in multiple databases, and another third were at enterprise data warehouses, with nobody reporting that they had a data lake plus data warehouse. So I think that, you know, the question that came in about what steps an organization needs to take to develop a health data platform is relevant given those results.

Dr. Siv Raman 31:51:
Right. Yeah. And that's the reason why I was kind of holding off on the second poll, which was, “Do you have a data science team?” So maybe we should do that poll as well now. And in the meantime, I will also take questions.

Casey Post 32:09:
And a reminder: if anyone has a question they'd like to ask live, just raise your hand, and we'll unmute you to let you discuss.

Dr. Siv Raman 32:22:
Okay, so I'm seeing one question come in here: what are the steps an organization needs to take to develop a health data platform? Okay. So I think the first thing to do when you're evaluating your strategy for developing a health data platform is to identify your use cases, right? I covered those use cases earlier. Many organizations don't have all of these use cases, but the reason why your health data platform as it exists today will start showing its age is that one of these use cases is difficult to address on it. So the first step is to identify the use case, or the list of use cases, that you're going to support on your health data platform. And then, at that point, you want to start looking at the technical implementations, the technical architectures, or the products that support those use cases. So don't go in technology-first when you are developing a health data platform. Instead, try to understand all the things that you want to do on your health data platform, and then, once you've identified the use cases, look for technology solutions to address them. I'm seeing another question here. Let me just check. Should we go in for a cloud-based platform or an on-premise one? So I think more and more organizations are going in for cloud-based platforms, especially when you're doing a data lake plus a relational data warehouse. The world seems to be moving to the cloud. That doesn't mean that you cannot implement a platform like that on-premise, but it just means you will then have to manage your hardware, your servers, and your Spark clusters on your own. And most organizations feel that it's easier to put that out on AWS, Azure, or Google Cloud and, you know, have that platform be managed on the cloud rather than doing it on-premise. Okay. There's one question here. It says, what will be the scope of healthcare data analytics? Well, the scope of healthcare data analytics is very broad.
If you want to categorize it, I think there are three categories of analytics. The first is what is known as descriptive analytics, right? You just want to know what's happening: you want to slice and dice, you want to show financials, you want to show encounters and clinical measures. These are all descriptive analytics, and they usually don't require more than your standard BI tools or SQL to deliver them. The second category is what is known as predictive analytics, right? You're interested not only in seeing what's happening but also in understanding what is going to happen. That usually involves some sort of AI/ML algorithms on top of your data, in addition to the descriptive analytics. And the last is called prescriptive analytics. So, you know, you're able to predict, but you also want to recommend some action, whether that's a clinical decision support action or some sort of financial action. So a robust health data platform should be able to support all three types of healthcare data analytics.
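The three categories can be made concrete with a toy Python example on made-up monthly encounter counts. Everything here is an assumption for illustration: the numbers are invented, a simple least-squares trend line stands in for the AI/ML models a real platform would use, and the capacity threshold and recommended actions are hypothetical.

```python
# Hypothetical monthly encounter counts for one clinic (illustrative numbers).
monthly_encounters = [120, 130, 140, 150]

def describe(counts):
    """Descriptive: summarize what has already happened."""
    return {"total": sum(counts), "mean": sum(counts) / len(counts)}

def forecast_next(counts):
    """Predictive: fit a least-squares trend line and extrapolate one month ahead."""
    n = len(counts)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(counts) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, counts))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept + slope * n

def recommend(forecast, capacity):
    """Prescriptive: turn the prediction into a suggested action."""
    return "add clinic sessions" if forecast > capacity else "no change"

summary = describe(monthly_encounters)
forecast = forecast_next(monthly_encounters)
action = recommend(forecast, capacity=155)
print(summary, forecast, action)
```

The descriptive step needs nothing beyond aggregation, which is why standard BI tools or SQL are usually enough for it; the predictive and prescriptive steps are where a platform's support for running models next to the data starts to matter.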

Casey Post 37:07:
Agreed. One last question we have here is: what kinds of resources are required to manage a modern health data platform?

Dr. Siv Raman 37:16:
That's a very good question. So an old-school ETL with just a warehouse used to be simpler to manage, right? You would have data architects and data analysts, who would be pretty much able to do all of that. When you move to something more modern, like a lakehouse or a data lake plus warehouse architecture, there are various types of resources that might be required. You have data engineers, who are in charge of managing the ETL and ELT: getting the data, moving it into the warehouse, moving it into the data lake, and making it available, and transforming it, of course. You've got cloud architects, if you're implementing this platform on the cloud. You obviously have your traditional data analysts and BI folks. And you also have data scientists, because if you're trying to get predictive models and other value out of your platform, you need a data science team. One really good book I read on the topic of building analytics teams, which includes data teams, is called “Building Analytics Teams”, by John Thompson, I think. And, you know, that's a good book, because if you're setting up a platform, you also need to know whom to hire to manage it. I think you don't always have to hire; in some cases, you can train existing resources. But a robust, modern healthcare data platform requires a different set of resources and skill sets than a traditional one.

Casey Post 39:28:
Well, I think that's it for our questions. I want to thank you for your time, Dr. Raman, and for this presentation. And I want to thank all of our guests for joining us on this webinar. Again, we'll send out the slides and the recording of the presentation shortly after this. So if you have any questions, or you care to discuss any of your healthcare data analytics initiatives with 314e, we'd be glad to have a conversation with you. Thanks again, and have a great day!