A Machine Learning Approach to Classify Medical Records by Psychiatric Diagnosis
2023 Award: $35,784
The widespread adoption of electronic health records (EHRs) has been accompanied by the enticing promise of using “big data” to improve patient care and clinical research. Many “big data” models rely on discrete data points, but in the field of mental health, our outcomes and diagnostic criteria are typically found in unstructured clinical narratives. In this study, we will train and evaluate machine learning classification models of psychiatric diagnoses using software capable of mining both structured and unstructured data in the EHR, with natural language processing used to incorporate the unstructured clinical text.
Need/Problem: The widespread adoption of electronic health records (EHRs) has been accompanied by the enticing promise of using “big data” to improve patient care through large-scale research studies, population health and quality improvement initiatives, and improved clinical decision support tools. While many data informatics tools work well with discrete data elements, in the field of mental health, our outcomes and diagnostic criteria are typically found in unstructured clinical narratives. These unstructured, data-rich portions of the EHR present a challenge to machine learning (ML) algorithms that depend on discrete data elements. Without tools that can process unstructured text, mental health researchers and clinical informaticians must devote significant time and resources to wading through clinical notes to identify eligible patients for a study or clinical intervention, or to classify cases and controls in a cohort. Similarly, clinical decision tools in the EHR and quality improvement initiatives that are limited to discrete data elements risk missing important data available only in unstructured text.
To overcome this challenge and accurately incorporate mental health outcomes into ML models, we must include unstructured data.
Grant Summary: We will train and evaluate ML classification models of psychiatric diagnoses using software capable of mining structured and unstructured data in the EHR. Our models will incorporate unstructured clinical data using natural language processing (NLP). We will first generate a gold standard by manual chart review, which will then be used to train the models and evaluate their performance.
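As a rough illustration of the approach summarized above, the following minimal Python sketch shows one way structured EHR elements and note text might be merged into a single feature set, and how model output could be scored against chart-review labels. It is not the project's software: the function names (`build_features`, `evaluate`), the feature prefixes, the example diagnosis code, and the toy data are all assumptions made for illustration, and the choice of sensitivity, specificity, and positive predictive value as metrics is likewise assumed rather than stated in the proposal.

```python
import re
from collections import Counter


def build_features(diagnosis_codes, note_text):
    """Merge structured codes and note tokens into one bag of features."""
    features = Counter()
    for code in diagnosis_codes:
        features["code=" + code] += 1      # structured (discrete) EHR element
    for token in re.findall(r"[a-z]+", note_text.lower()):
        features["word=" + token] += 1     # unstructured narrative text
    return features


def evaluate(predicted, gold):
    """Compare binary model labels with chart-review (gold standard) labels."""
    pairs = list(zip(predicted, gold))
    tp = sum(1 for p, g in pairs if p and g)
    fp = sum(1 for p, g in pairs if p and not g)
    fn = sum(1 for p, g in pairs if not p and g)
    tn = sum(1 for p, g in pairs if not p and not g)

    def ratio(num, den):
        # Avoid division by zero when a class is absent from the sample.
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
    }


# Hypothetical toy data, not drawn from any real record.
features = build_features(["F32.9"], "Patient reports low mood and poor sleep.")
scores = evaluate(predicted=[1, 1, 0, 0, 1], gold=[1, 0, 0, 0, 1])
print(features["code=F32.9"], features["word=mood"], scores)
```

In a real pipeline, a classifier would sit between these two steps, and the simple word counts would likely be replaced by a clinical NLP component that handles negation and context.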
Goals and Projected Outcomes: This work will generate a set of trained classification models for psychiatric diagnoses. We will make our results available on the Phenotype Knowledge Base (PheKB; www.phekb.org), a phenotype-sharing repository. We anticipate building upon the work described in this proposal and applying it to broader populations with future collaborators (e.g., investigators reached through the PheKB repository, such as those at Vanderbilt University, or consortia such as PsycheMERGE), with the long-term goal of supporting mental health research efforts and providing efficient and effective care to patients with psychiatric diagnoses.