Introduction
One of the most apt aphorisms for data science is the saying ‘your results are as good as your data’. With the upsurge in electronic storage of medical data, one might expect the analysis of that data to be a straightforward task. However, one of the biggest challenges faced by data scientists, especially when dealing with medical data, is gaining insights from databases that are unstructured, disparate and inconsistent. This task becomes especially challenging when dealing with textual data.
With data held in several different silos and formats, one must be able to bring it together, consistently and accurately, into a single comprehensive database from which actionable insights may be gathered.
The Challenge
We were approached by one of the largest tertiary health care centres in the country, which sought to streamline the administrative tasks in its operation theatre in order to deliver better quality of care and reduce mismanagement of resources. The hospital also sought to upgrade its existing legacy system and bring it on par with international standards for hospital database management.
The hospital data set had approximately 1,00,000 surgery entries, of which only approximately 35,000 were unique. This shows the lack of uniformity in format. The hospital wanted to map its surgeries onto an international database with standardised names for surgeries.
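How many entries count as "unique" depends heavily on how the names are compared. The sketch below illustrates the idea with a simple normalisation step (lowercasing, replacing punctuation and collapsing whitespace) applied before counting. The entries and the `normalise_name` helper are hypothetical, invented for illustration; they are not the hospital's data or the process that was actually used.

```python
import re
from collections import Counter


def normalise_name(raw: str) -> str:
    """Lowercase the name and turn every run of non-alphanumerics into one space."""
    lowered = raw.lower()
    spaced = re.sub(r"[^a-z0-9]+", " ", lowered)
    return spaced.strip()


# Hypothetical entries, invented for illustration only.
entries = [
    "Lap. Cholecystectomy",
    "lap cholecystectomy",
    "LAP  CHOLECYSTECTOMY ",
    "Appendicectomy",
    "appendicectomy.",
]

# Unique names when the raw strings are compared as-is.
raw_unique = len(set(entries))

# Unique names after normalisation, with a count of entries per name.
normalised_counts = Counter(normalise_name(e) for e in entries)

print(raw_unique)
print(len(normalised_counts))
print(normalised_counts.most_common())
```

In this toy example the five raw strings reduce to two distinct names once formatting differences are removed. Real data would also need handling for abbreviations and spelling variants, which this step alone does not address.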
CoL Approach
We at CoL understand the responsibility that is inherent in medical data and are unwilling to compromise on the accuracy of our processes. For this reason, we used a two-pronged approach while mining and understanding the data.
After this, we trained our natural language machine to identify the mapping from our structured data to UMLS (the Unified Medical Language System), an international standard for medical terminology. [Still to be done]
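As a rough illustration of what mapping local names onto a standard vocabulary involves, the sketch below uses plain string similarity from Python's standard library. This is a deliberately simple stand-in, not the trained model described above. `STANDARD_TERMS` is a hypothetical list, not real UMLS records, and the `map_to_standard` helper and its `cutoff` value are assumptions made for the example.

```python
from difflib import get_close_matches
from typing import Optional

# Hypothetical stand-in for a standard vocabulary; not real UMLS records.
STANDARD_TERMS = [
    "laparoscopic cholecystectomy",
    "appendectomy",
    "total knee replacement",
]


def map_to_standard(name: str, cutoff: float = 0.8) -> Optional[str]:
    """Return the most similar standard term, or None if none reaches the cutoff."""
    cleaned = name.lower().strip()
    matches = get_close_matches(cleaned, STANDARD_TERMS, n=1, cutoff=cutoff)
    return matches[0] if matches else None


print(map_to_standard("Laproscopic cholecystectomy"))  # misspelt local name
print(map_to_standard("Total knee replacment"))        # misspelt local name
print(map_to_standard("Rhinoplasty"))                  # no similar standard term
```

Returning `None` rather than the nearest weak match leaves uncertain cases for manual review, which seems the safer choice where accuracy matters more than coverage.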
Results
“Data! Data! Data! I can’t make bricks without clay!” -Sherlock Holmes
The basic building block that one needs for analysis is data. Through our efforts in standardising the data, we were able to help the hospital view its own data in the following ways:
Descriptive Analytics and Visualisation: We designed a dashboard for the hospital which helps them understand their data.
Operations Streamlining: We designed an application which helps the hospital organise and plan its surgeries optimising resources and schedules.
Predictive Analytics: We designed an application that predicts the bill that a patient will have to pay based on their surgery, co-morbidities and other patient characteristics.
Conclusion
Medical data is a valuable resource from which several insights can be gleaned. However, before using this data for analysis, it is imperative to build a metathesaurus that can be used to identify synonyms, abbreviations and misspelt words, which may then be rectified to yield highly structured data. This data can then be leveraged to gain actionable insights.
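A minimal sketch of how such a metathesaurus might be applied is given below. Each canonical term lists its known variants (synonyms, abbreviations and misspellings), and free-text names are rewritten token by token. The `METATHESAURUS` entries and the `standardise` helper are hypothetical and invented for illustration; a real metathesaurus would be far larger and curated with clinical input.

```python
import re

# Hypothetical metathesaurus: canonical token -> known variants
# (synonyms, abbreviations, misspellings). Invented for illustration.
METATHESAURUS = {
    "laparoscopic": ["lap", "laproscopic", "laparascopic"],
    "appendectomy": ["appendicectomy", "apendectomy"],
    "bilateral": ["b/l", "bil"],
}

# Invert the mapping to variant -> canonical for direct lookup.
VARIANT_TO_CANONICAL = {
    variant: canonical
    for canonical, variants in METATHESAURUS.items()
    for variant in variants
}


def standardise(text: str) -> str:
    """Replace every known variant token with its canonical form."""
    # Keep "/" inside tokens so abbreviations such as "b/l" survive tokenising.
    tokens = re.findall(r"[a-z0-9/]+", text.lower())
    return " ".join(VARIANT_TO_CANONICAL.get(tok, tok) for tok in tokens)


print(standardise("Lap. Appendicectomy"))
print(standardise("B/L laproscopic hernia repair"))
```

Tokens that are not listed in the metathesaurus pass through unchanged, so the output is only as complete as the variant lists behind it.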