Echocardiography is a cornerstone imaging technique in the diagnosis of cardiac disease, but manual analysis of these images is time-consuming and prone to inter-observer variability. Automated diagnosis using artificial intelligence (AI) and deep learning offers a promising way to enhance the efficiency and accuracy of cardiac assessments. This article explores EchoCLR, a novel deep learning approach for automated analysis of echocardiogram videos, focusing on the detection of severe aortic stenosis (AS) and left ventricular hypertrophy (LVH). The method leverages self-supervised learning to improve diagnostic performance and reduce the need for extensive labeled data.
Data Curation and Preprocessing for AI-Driven Echocardiogram Analysis
The foundation of any robust automated diagnosis system lies in the quality and preparation of its data. In this study, a large dataset of transthoracic echocardiogram (TTE) studies was compiled from Yale New Haven Hospital, spanning 2016 to 2021. The dataset was deliberately curated into a development set that oversamples severe AS cases, so the model learns effectively from these critical examples, and an evaluation set that reflects real-world prevalence.
Initially, over 12,500 TTE studies were extracted, with approximately 10,000 studies from 2016-2020 for model development and 2,500 studies from 2021 for external validation. To address the rarity of severe AS, the development set was enriched by a factor of 50 for severe AS and 5 for non-severe AS. Crucially, the 2021 dataset was not artificially enriched, providing a temporally distinct and unbiased dataset for evaluating the real-world applicability of the automated diagnosis system.
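Class enrichment of this kind can be implemented by oversampling study indices per class. The sketch below is a minimal illustration of that idea, not the authors' pipeline; the labels and the lookup-table API are assumptions, with the enrichment factors borrowed from the text.

```python
import numpy as np

def enrich_by_class(labels, factors, seed=0):
    """Oversample study indices so each study of class c appears factors[c] times.

    A toy stand-in for development-set enrichment (e.g. 50x for severe AS,
    5x for non-severe AS); classes absent from `factors` are kept once.
    """
    rng = np.random.default_rng(seed)
    enriched = []
    for idx, label in enumerate(labels):
        enriched.extend([idx] * factors.get(label, 1))
    enriched = np.array(enriched)
    rng.shuffle(enriched)  # mix classes before batching
    return enriched

# hypothetical label distribution for 100 studies
labels = ["normal"] * 90 + ["severe_AS"] * 2 + ["non_severe_AS"] * 8
idx = enrich_by_class(labels, {"severe_AS": 50, "non_severe_AS": 5})
```

Note that the held-out 2021 set would bypass this step entirely, preserving natural prevalence for evaluation.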
From the extracted studies, over 447,000 videos were generated. A critical step in preparing this data for AI was deidentification. This was achieved by masking boundary pixels in each video frame, effectively removing protected health information while preserving the diagnostic content of the echocardiograms. The videos were then converted to AVI format for efficient processing.
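Boundary masking for deidentification amounts to zeroing out the frame edges where scanners burn in patient text, leaving the central ultrasound sector untouched. A minimal sketch, assuming a fixed border fraction (the exact margin used in the study is not specified here):

```python
import numpy as np

def mask_boundary(video, border=0.1):
    """Zero out a fixed border of each frame to remove burned-in text.

    video: array of shape (frames, height, width);
    border: fraction of each dimension to mask (illustrative value).
    """
    out = video.copy()
    _, h, w = out.shape
    bh, bw = int(h * border), int(w * border)
    out[:, :bh, :] = 0   # top strip
    out[:, -bh:, :] = 0  # bottom strip
    out[:, :, :bw] = 0   # left strip
    out[:, :, -bw:] = 0  # right strip
    return out

video = np.ones((16, 100, 100), dtype=np.uint8)
masked = mask_boundary(video)
```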
Automated View Classification for Targeted Echocardiogram Analysis
Echocardiogram studies comprise multiple views, each providing a different perspective on the heart. For automated diagnosis to be efficient, it is beneficial to focus on the views relevant to the condition being assessed. This study used parasternal long-axis (PLAX) view echocardiograms, a standard view for assessing aortic valve and left ventricular function.
To automatically isolate PLAX views, a pre-trained TTE view classifier, developed by Zhang et al., was employed. For each of the 447,653 videos, ten frames were randomly selected and processed by the classifier. The probabilities predicted by the classifier across these ten frames were averaged to yield a video-level view prediction. The system was designed to retain videos confidently classified as “PLAX,” ensuring that the subsequent automated diagnosis focused on the most relevant image data.
Following view classification, over 30,000 PLAX videos from over 9,000 studies were identified and prepared for deep learning model development. Videos with low-flow, low-gradient AS were excluded due to complexities in severity assessment, resulting in a refined dataset of 29,978 PLAX videos. Further preprocessing involved binarizing frames and masking pixels outside the main image content to focus the AI’s attention on the essential cardiac structures. Finally, videos were downsampled to 112 × 112 resolution to optimize computational efficiency for deep learning models.
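The masking and downsampling steps can be pictured as follows. This is a hedged sketch: the content mask is approximated by thresholding near-black pixels, and downsampling uses simple mean pooling, assuming the input side length is a multiple of the target size; the study's exact preprocessing may differ.

```python
import numpy as np

def preprocess_frame(frame, size=112, intensity_floor=10):
    """Mask pixels outside the ultrasound sector, then downsample.

    frame: 2D grayscale array; `intensity_floor` crudely separates
    background from content (an illustrative assumption).
    """
    mask = frame > intensity_floor          # binarized content mask
    masked = np.where(mask, frame, 0.0)     # zero out background pixels
    factor = masked.shape[0] // size
    cropped = masked[: size * factor, : size * factor]
    # mean-pool factor x factor blocks down to (size, size)
    pooled = cropped.reshape(size, factor, size, factor).mean(axis=(1, 3))
    return pooled

frame = np.full((224, 224), 100.0)
out = preprocess_frame(frame)
```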
EchoCLR: Self-Supervised Learning for Enhanced Automated Diagnosis
A key innovation in this approach to automated diagnosis is the use of self-supervised learning. Traditional deep learning models require large amounts of labeled data, which can be expensive and time-consuming to obtain in the medical field. Self-supervised learning (SSL) offers a way to leverage unlabeled data to pre-train models, enabling them to learn useful representations from the inherent structure of the data itself.
The researchers developed a novel SSL algorithm called EchoCLR, specifically designed for echocardiogram videos. EchoCLR incorporates two key components: multi-instance contrastive learning and a frame reordering pretext task.
Multi-instance Contrastive Learning: Unlike standard contrastive learning that creates artificial “positive” pairs through image augmentations, EchoCLR leverages the fact that a single echocardiogram exam typically includes multiple videos of the same patient. EchoCLR considers distinct videos from the same patient, acquired during the same exam, as positive pairs. This approach, termed multi-instance SimCLR (MI-SimCLR), allows the model to learn robust representations by contrasting different but related views of the same heart.
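The contrastive objective underlying this idea is the NT-Xent loss from SimCLR; the multi-instance twist is only in how positive pairs are formed (two distinct videos from the same exam rather than two augmentations of one image). Below is a compact numpy sketch of the loss itself, not the authors' implementation:

```python
import numpy as np

def nt_xent(z_a, z_b, temperature=0.5):
    """NT-Xent contrastive loss for paired embeddings of shape (batch, dim).

    In multi-instance SimCLR, z_a[i] and z_b[i] would be embeddings of two
    *different* videos from the same patient's exam.
    """
    z = np.concatenate([z_a, z_b], axis=0)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)   # unit-normalize
    sim = z @ z.T / temperature                        # scaled cosine similarities
    n = len(z_a)
    np.fill_diagonal(sim, -np.inf)                     # exclude self-similarity
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(n)])  # partner indices
    log_prob = sim[np.arange(2 * n), pos] - np.log(np.exp(sim).sum(axis=1))
    return -log_prob.mean()

rng = np.random.default_rng(0)
z1 = rng.normal(size=(4, 8))
loss_random = nt_xent(z1, rng.normal(size=(4, 8)))   # unrelated "pairs"
loss_aligned = nt_xent(z1, z1)                       # perfectly matched pairs
```

As expected, embeddings whose positive pairs agree yield a lower loss than unrelated pairs, which is exactly the signal that pulls same-patient videos together in representation space.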
Frame Reordering Pretext Task: To capture the temporal dynamics inherent in echocardiogram videos, EchoCLR includes a frame reordering task. The frames within each video are randomly shuffled, and the model is trained to predict the correct order. This task encourages the AI to understand the temporal coherence of cardiac motion, which is crucial for accurate automated diagnosis of cardiac conditions.
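Constructing a training example for such a pretext task is straightforward: sample frames in temporal order, shuffle them, and keep the permutation as the prediction target. A minimal sketch (the number of sampled frames is an illustrative choice):

```python
import numpy as np

def make_reordering_example(video, n_frames=4, seed=0):
    """Build one example for a frame-reordering pretext task.

    Samples n_frames frames in temporal order, shuffles them, and returns
    the shuffled clip plus the permutation the model must predict.
    """
    rng = np.random.default_rng(seed)
    starts = np.sort(rng.choice(len(video), size=n_frames, replace=False))
    clip = video[starts]                      # frames in correct temporal order
    perm = rng.permutation(n_frames)          # target permutation label
    return clip[perm], perm

# toy "video": 20 frames whose pixel value encodes the frame index
video = np.arange(20)[:, None, None] * np.ones((1, 4, 4))
shuffled, perm = make_reordering_example(video)
# inverting the permutation recovers the original temporal order
restored = shuffled[np.argsort(perm)]
```

A model that can predict `perm` from `shuffled` has necessarily learned something about the temporal dynamics of the cardiac cycle.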
By combining MI-SimCLR with the frame reordering task, EchoCLR learns powerful representations from unlabeled echocardiogram videos. This pre-training allows the model to be efficiently fine-tuned for specific diagnostic tasks, such as severe AS and LVH detection, even with limited labeled data.
Supervised Fine-tuning and Evaluation of Automated Diagnostic Models
After self-supervised pretraining with EchoCLR, the 3D-ResNet18 encoder was fine-tuned for the specific tasks of LVH and severe AS classification. To demonstrate the effectiveness of EchoCLR, its performance was compared against models initialized with:
- SimCLR: A standard contrastive learning method (without multi-instance learning or frame reordering).
- Kinetics-400: A transfer learning approach using weights pre-trained on a large video dataset of human actions.
- Random Initialization: A baseline model with no pre-training.
To evaluate the data efficiency of each initialization method for automated diagnosis, a training ratio experiment was conducted. Models were fine-tuned using varying percentages of the available labeled data (1%, 5%, 10%, 25%, 50%, and 100%). Performance was assessed using AUROC (Area Under the Receiver Operating Characteristic curve) and AUPR (Area Under the Precision-Recall curve), with a focus on AUPR due to the imbalanced nature of some cardiac disease datasets.
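Both metrics can be computed directly from scores and labels; AUPR is the better headline number under class imbalance because it ignores the (dominant) true negatives. A self-contained numpy sketch using the rank-sum form of AUROC and average precision for AUPR (assuming no tied scores, for simplicity):

```python
import numpy as np

def auroc(y_true, scores):
    """AUROC via the rank-sum (Mann-Whitney U) formulation."""
    order = np.argsort(scores)
    ranks = np.empty(len(scores))
    ranks[order] = np.arange(1, len(scores) + 1)
    n_pos = y_true.sum()
    n_neg = len(y_true) - n_pos
    return (ranks[y_true == 1].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

def aupr(y_true, scores):
    """AUPR as average precision: mean precision at each true positive."""
    order = np.argsort(-scores)
    hits = y_true[order]
    precision = np.cumsum(hits) / np.arange(1, len(hits) + 1)
    return precision[hits == 1].mean()

y = np.array([0, 0, 1, 1, 0, 1])
s = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.9])
```

On a heavily imbalanced test set, a model can post a high AUROC while its AUPR stays modest, which is why the study emphasizes AUPR.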
The results demonstrated that EchoCLR consistently outperformed other initialization methods, especially when labeled data was scarce. This highlights the advantage of self-supervised pretraining and the specific benefits of EchoCLR’s multi-instance learning and temporal coherence approach for automated echocardiogram diagnosis.
Interpretability of AI in Automated Diagnosis: Saliency Map Analysis
To ensure that the automated diagnosis system is learning clinically relevant features and not relying on spurious correlations, interpretability analysis was performed using Grad-CAM. Saliency maps were generated to visualize which regions of the echocardiogram videos were most influential in the model’s predictions of severe AS.
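The core Grad-CAM computation is small once the activations and gradients of the last convolutional block have been extracted (which requires framework-specific hooks, omitted here). A hedged numpy sketch of that weighting step for a video model:

```python
import numpy as np

def grad_cam(activations, gradients):
    """Core Grad-CAM step for one clip and one target class.

    activations, gradients: arrays of shape (channels, time, h, w) from
    the final convolutional block (obtaining them needs model hooks).
    """
    # channel weights: global-average-pooled gradients
    weights = gradients.mean(axis=(1, 2, 3))
    # weighted sum over channels, then ReLU to keep positive evidence
    cam = np.maximum((weights[:, None, None, None] * activations).sum(axis=0), 0)
    if cam.max() > 0:
        cam = cam / cam.max()   # normalize to [0, 1] for overlay on the video
    return cam

acts = np.zeros((2, 1, 4, 4))
acts[0, 0, 1, 1] = 5.0              # strong activation at one location
grads = np.ones((2, 1, 4, 4))       # uniform positive gradients (toy values)
heatmap = grad_cam(acts, grads)
```

Upsampled to the input resolution and overlaid on the PLAX frames, such heatmaps are what revealed the model's focus on the aortic valve region.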
The saliency maps highlighted that the EchoCLR-pretrained model focused on anatomically plausible regions of the heart, particularly the aortic valve area, when diagnosing severe AS. This provides evidence that the AI is indeed learning to recognize clinically meaningful patterns, increasing confidence in the reliability and trustworthiness of the automated diagnosis system.
Conclusion: Advancing Automated Diagnosis in Cardiology with Deep Learning
This research demonstrates the potential of automated diagnosis in cardiology through the development and validation of EchoCLR, a novel self-supervised learning approach for echocardiogram analysis. By leveraging unlabeled data and incorporating temporal coherence into its learning process, EchoCLR enhances the efficiency and accuracy of deep learning models for cardiac disease detection. Its superior performance, particularly in data-limited scenarios, marks a significant step toward practical, scalable applications of AI in echocardiogram analysis, paving the way for improved patient care through reliable automated diagnostic tools. The interpretability analysis further strengthens the clinical relevance of this approach by showing that its predictions align with medical expertise.