
COPD Average Age of Diagnosis: Understanding the Impact of Age on Chronic Obstructive Pulmonary Disease

Introduction

Chronic Obstructive Pulmonary Disease (COPD) stands as a major global health challenge, recognized as a leading cause of both illness and death. While COPD can manifest at various stages of life, symptoms typically become noticeable after the age of 40. The way COPD presents itself clinically is highly variable, differing not only at the initial diagnosis but also throughout the disease’s progression and its overall impact on a patient’s life. This variability underscores the need to fully understand the diverse factors that contribute to the burden of COPD in each individual [1, 2]. Current COPD treatments are largely based on clinical trials involving subjects who represent an ‘average’ COPD patient. However, these trials often do not fully explore the influence of co-existing conditions, which significantly affect how COPD progresses. Furthermore, pivotal clinical trials informing COPD treatment strategies often focus on patients with a mean age around 65 years [3, 4, 5, 6, 7]. This selective approach can limit the applicability of trial outcomes to a broader patient population, especially older individuals who may require tailored treatment approaches due to their unique health profiles.

To gain a comprehensive understanding of COPD, it is essential to consider that a patient’s condition is frequently shaped by other health issues (comorbidities) that may not correlate with the severity indicated by spirometry tests. These comorbidities often become more prevalent and impactful with advancing age [8, 9]. A common observation in COPD patients is that even with similar levels of forced expiratory volume in 1 second (FEV1), there can be significant differences in functional impairment, clinical symptoms, frequency of exacerbations, and overall quality of life. This variation may stem from the heterogeneous nature of COPD, potentially arising from different underlying disease mechanisms and manifesting as diverse clinical phenotypes. Studies across general populations have consistently shown that as COPD patients age, the likelihood of developing multiple chronic comorbidities increases, contributing to a more pronounced decline in their clinical condition [10, 11].

Beyond the increased incidence of comorbidities, age itself is recognized as a significant factor influencing COPD progression. While there are parallels between lung aging (often termed “senile lung”) and COPD, the physiological changes associated with aging should not automatically be classified as pathological conditions requiring intervention. Currently, there is insufficient evidence to definitively state whether lung aging alone is a primary driver for COPD patients needing medical attention or hospitalization due to exacerbations. It may, however, explain certain related pathological changes, particularly emphysema [12]. Regardless of the precise role of aging in COPD development, it is undeniable that as COPD patients get older, their clinical characteristics evolve, necessitating careful identification of these changes as they can significantly affect the course of the disease.

Recent research has increasingly emphasized the importance of considering time, or aging, as a critical element in the interplay between genetics and environmental factors in the development and origin of COPD. The age at which genetic predispositions and environmental exposures interact is crucial, as are prior exposures experienced by the individual or even their parents. A holistic approach that integrates genetic, environmental, and temporal dimensions promises to provide deeper insights into lung function and explain the diverse clinical presentations of COPD [13].

Our central hypothesis is that the clinical profile of COPD is not static but varies with patient age, leading to differences in the overall disease burden. Recognizing this age-related variability is crucial for designing effective clinical studies and developing targeted treatment programs. Such an approach could facilitate more personalized patient management, focusing on treatable traits and potentially alleviating the current burden of COPD.

The application of big data analytics and artificial intelligence in healthcare is revolutionizing our ability to manage and derive meaningful insights from the vast amounts of complex data generated by electronic health records (EHRs). This technology allows for the evaluation of key indicators in clinical processes, minimizing biases typically associated with traditional selection methods. Big data is now a cornerstone tool, transforming our understanding of complex phenomena, especially in epidemiology. Its significance is particularly evident in public health, where the massive collection and analysis of data enable early disease detection, monitoring of epidemiological trends, and the development of more effective preventative strategies.

In epidemiology, big data integration from diverse sources, including EHRs, social media, and wearable devices, is invaluable. Big data analytics also allows for more precise customization of health interventions and policies, tailoring them to meet the specific needs of different populations. Furthermore, the synergy between big data and artificial intelligence (AI) significantly enhances health management capabilities. AI plays a vital role in processing and analyzing the large datasets generated by big data. Advanced algorithms can detect patterns and correlations in real-time, facilitating early disease detection and rapid responses to critical epidemiological situations. Machine learning capabilities of AI further improve predictive models, refining their accuracy as new data is incorporated. This combination of big data and AI is a fundamental asset for enhancing epidemiological resilience and promoting global health in the modern era.

The primary objective of this study is to identify, within the context of routine clinical practice, how the clinical profile of COPD patients differs based on age and how these differences impact the global burden of the disease. By achieving this, we aim to enhance our understanding of gene-environment-time interactions, ultimately leading to improved treatments and prevention strategies for COPD.

Figure 1: Patient Flowchart. Flowchart detailing patient inclusion in the COPD study, highlighting the progression from initial population identification to the final study cohort.

Materials and Methods

This study adopted an observational, retrospective, and non-interventional design, utilizing secondary data extracted from the free text sections of electronic medical records (EMRs). Conducted within the Castilla-La Mancha region of Spain, the study leveraged the Savana Manager v.3.0 tool, which is employed by the regional healthcare administration (SESCAM). This tool enabled the analysis of data extending back to January 1, 2011. The study adhered to the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) guidelines [14]. The study population comprised all patients over 40 years of age diagnosed with COPD and treated between January 1, 2011, and January 24, 2021.

The methodology employed has been previously detailed in several publications [15, 16, 17, 18]. Savana Manager operates as a data extraction system powered by artificial intelligence (specifically, natural language processing, NLP) and big data technologies. This system is designed to extract unstructured clinical information (natural language or free text) from electronic health records (EHRs), transforming it into structured and reusable data for research purposes while ensuring patient anonymity at all times. Furthermore, computational linguistic techniques, combined with the SNOMED CT [19] tool, are used to scientifically detect and validate comprehensive clinical content from EHR data across SESCAM’s specialized care network (including hospitalization, emergency, and outpatient services) and primary care facilities.

Data Management and Protection: The hospital’s IT services are responsible for the initial data processing and anonymization, ensuring that Savana receives only non-identifiable data. To further protect patient privacy, an algorithm is used during data extraction to randomly insert confounding information per patient while only recovering a fraction of individual information. This process results in a completely anonymized patient database, where all study reports contain only aggregated data, and individual patients or physicians cannot be identified. According to the European Data Protection Authority, anonymized clinical records, stripped of personal data, are no longer subject to the General Data Protection Regulation. This study received ethical approval from the Research Ethics Committee (Comité de Ética de la Investigación, or CEIm) of the Guadalajara public healthcare area (ref 1/2023, dated January 17, 2023).

Information Extraction Assessment: For this study, COPD as a variable was identified within the free text using a named-entity recognition approach. Additional layers of negation and temporality detection were applied to refine the accuracy. The machine learning model for negation detection combines a rule-based layer with a binary convolutional neural network. This network, trained on real Spanish EHRs and evaluated against extensive reference standards, classifies each clinical entity as either affirmative or non-affirmative based on its lexical and semantic context. Temporality detection is handled by an NLP module that uses multiple layers to assign dates to clinical entities. The first layer is a named-entity detection engine that identifies any mention of dates within the EHRs’ free text. Subsequently, a relationship model, based on a Bi-LSTM network, determines if a detected date is related to a detected clinical entity. A normalization layer then converts various date formats found in the EHRs’ free text into a standardized representation. The final NLP processing step involves several quality control operations and integrates the outputs from different NLP modules into a finalized database.
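The normalization layer described above can be illustrated with a small rule-based sketch. This is not the Savana implementation, which the text does not detail; the function name `normalize_date`, the two regular expressions, and the choice of ISO 8601 as the standardized representation are all assumptions made for illustration. It assumes day-first numeric dates and Spanish month names, as would be expected in Spanish EHR free text.

```python
import re
from datetime import date
from typing import Optional

# Hypothetical lookup of Spanish month names (illustration only).
SPANISH_MONTHS = {
    "enero": 1, "febrero": 2, "marzo": 3, "abril": 4, "mayo": 5, "junio": 6,
    "julio": 7, "agosto": 8, "septiembre": 9, "setiembre": 9, "octubre": 10,
    "noviembre": 11, "diciembre": 12,
}

# Day-first numeric dates such as 12/03/2015, 3-1-19 or 12.03.2015.
NUMERIC = re.compile(r"^(\d{1,2})[/.-](\d{1,2})[/.-](\d{2}|\d{4})$")
# Textual dates such as "12 de marzo de 2015".
TEXTUAL = re.compile(r"^(\d{1,2}) de ([a-záéíóú]+) (?:de |del )?(\d{4})$")


def normalize_date(mention: str) -> Optional[str]:
    """Convert a detected date mention to ISO 8601, or None if unparseable."""
    text = mention.strip().lower()
    match = NUMERIC.match(text)
    if match:
        day, month, year = (int(group) for group in match.groups())
        if year < 100:
            year += 2000  # assumption: two-digit years refer to 20xx
    else:
        match = TEXTUAL.match(text)
        if not match:
            return None
        day = int(match.group(1))
        month = SPANISH_MONTHS.get(match.group(2), 0)  # 0 is rejected below
        year = int(match.group(3))
    try:
        return date(year, month, day).isoformat()
    except ValueError:  # impossible calendar date, e.g. 31/02/2015
        return None


for mention in ["12/03/2015", "3-1-19", "12 de marzo de 2015", "31/02/2015"]:
    print(mention, "->", normalize_date(mention))
```

A production system would need many more patterns (relative dates, partial dates, abbreviations); the point here is only that heterogeneous mentions collapse to one comparable format before they are linked to clinical entities.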

Following clinical NLP (cNLP) processing, three authors validated the tool’s outputs and the technology’s performance. This validation aimed to confirm the reliability of the EHRead® technology in identifying records mentioning “COPD” and related variables. A set of 560 documents was manually verified to establish a gold standard for reliability. The performance of Savana was evaluated against this gold standard using precision (P), recall (R), and F-score metrics.

Precision, indicating the reliability of retrieved information, was calculated as P = tp/(tp + fp). Recall, measuring the amount of information retrieved, was calculated as R = tp/(tp + fn). The F-score, the harmonic mean of precision and recall, was calculated as F = 2 × precision × recall/(precision + recall), providing an overall measure of information retrieval performance. In these calculations, true positives (tp) were correctly identified records, false negatives (fn) were unidentified records, and false positives (fp) were incorrectly retrieved records.
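As a minimal sketch, the three formulas can be computed directly from the counts. The function name `precision_recall_f` and the example counts are hypothetical and are not taken from the study's validation set.

```python
def precision_recall_f(tp: int, fp: int, fn: int) -> tuple:
    """Return (precision, recall, F-score) from raw counts.

    tp: correctly identified records
    fp: incorrectly retrieved records
    fn: records that should have been retrieved but were not
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score


# Hypothetical counts, for illustration only.
p, r, f = precision_recall_f(tp=90, fp=10, fn=5)
print(f"P={p:.3f} R={r:.3f} F={f:.3f}")  # P=0.900 R=0.947 F=0.923
```

Because the F-score is a harmonic mean, it is pulled toward the lower of the two values, so a system cannot score well by maximizing recall at the expense of precision or vice versa.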

Previous assessments have shown that the values for these metrics exceeded 0.9, confirming the adequacy of the diagnostic approach for identifying the study population. The F-values for different terms analyzed ranged between 0.92 and 0.97.

Statistical Analysis: All variables were analyzed using SPSS software (version 25.0; IBM, Armonk, NY, USA) and OpenEpi (https://www.OpenEpi.com accessed on February 6, 2023). Standard descriptive statistical analyses were employed. Qualitative variables are presented as absolute frequencies and percentages, while quantitative variables are expressed as means, 95% confidence intervals, and standard deviations. For numerical variables, the Student’s t-test for independent measures was used, and the Chi-squared test was used to assess associations and compare proportions between qualitative variables. To determine if variables were related to the selected population, significance was evaluated using a Chi-squared 2 × 2 contingency table, controlling for sex and age biases. Savana presents events in order based on the odds ratio (observed vs. expected frequency). In all tests, p-values less than 0.05 were considered statistically significant.
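The 2 × 2 Chi-squared comparison can be sketched in a few lines. The study itself used SPSS and OpenEpi, so the code below is a stand-in and not the authors' procedure: it computes the unadjusted Pearson statistic without continuity correction and does not control for sex or age. The cell counts are approximations reconstructed here from the rounded depression percentages and group sizes in Table 1, purely for illustration.

```python
import math


def chi2_2x2(a: int, b: int, c: int, d: int) -> tuple:
    """Pearson chi-squared, p-value and odds ratio for the table [[a, b], [c, d]].

    Assumes every row and column total, and b * c, are non-zero.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # With 1 degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    odds_ratio = (a * d) / (b * c)
    return chi2, p_value, odds_ratio


# Approximate counts derived from rounded percentages (an assumption):
# women with depression ~27.2% of 17,138; men with depression ~10.0% of 56,763.
women_yes = round(0.272 * 17_138)
men_yes = round(0.100 * 56_763)
chi2, p_value, odds_ratio = chi2_2x2(
    women_yes, 17_138 - women_yes,
    men_yes, 56_763 - men_yes,
)
print(f"chi2={chi2:.1f}, p<0.05: {p_value < 0.05}, OR={odds_ratio:.2f}")
```

With cohorts of this size, even small differences in proportions reach p < 0.05, which is one reason the odds ratio is worth reading alongside the p-value.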

Results

During the study period from January 1, 2011, to January 14, 2021, a total of 73,901 patients diagnosed with COPD received treatment from Castilla-La Mancha Public Healthcare Services (SESCAM). The average age at COPD diagnosis in this cohort was 73 years (95% CI: 72.9–73.1), with a significant majority, 76.8% (56,763), being male. Table 1 details the primary clinical and demographic characteristics of the study population.

Table 1. Baseline Characteristics of the Study Population.

| Characteristic | Male COPD Population (n = 56,763) | Female COPD Population (n = 17,138) | p-value |
| --- | --- | --- | --- |
| Age, years (95% CI) | 72.9 (72.8–73.0) | 72.3 (72.1–72.5) | |
| Comorbidities (%) | | | |
| Arterial Hypertension | 70.0 | 72.3 | |
| Dyslipidemia | 49.6 | 52.5 | |
| Diabetes | 37.9 | 38.5 | |
| Smoking | 41.7 | 35.9 | |
| Obesity | 23.9 | 32.7 | <0.05 |
| Heart Failure | 37.3 | 48.3 | <0.05 |
| Atrial Fibrillation | 19.4 | 18.4 | |
| Ischemic Cardiopathy | 14.4 | 7.7 | <0.05 |
| Obstructive Sleep Apnea | 13.5 | 10.8 | |
| Depression | 10.0 | 27.2 | <0.05 |
| Hiatal Hernia | 12.7 | 17.3 | <0.05 |

Analyzing patients by sex revealed that obesity, heart failure, depression, and hiatal hernia were significantly more prevalent in women (p < 0.05).

When examining patients across different age ranges, a clear trend of increasing cardiovascular risk factors and associated diseases, particularly cardiovascular diseases, with advancing age was observed (Table 2).

Table 2. Comorbidities in COPD by Age Range.

| Characteristic | >40 without COPD (Mean Age 62.1) | Total COPD Population >40 (Mean Age 73) | COPD 40–49 (Mean Age 45) | COPD 50–59 (Mean Age 54) | COPD 60–69 (Mean Age 64) | COPD 70–79 (Mean Age 74.4) | COPD >80 (Mean Age 85.3) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Sex, male (%) | 51.4 | 76.8 | 66.7 | 71.2 | 79.4 | 82.1 | 77.1 |
| Comorbidities (%) | | | | | | | |
| Arterial Hypertension | 29.9 | 70.5 | 30.8 | 46.2 | 61.1 | 72.7 | 78.7 |
| Dyslipidemia | 21.0 | 50.3 | 27.4 | 39.9 | 47.8 | 51.0 | 47.2 |
| Diabetes | 14.4 | 38.1 | 19.6 | 28.2 | 35.3 | 40.0 | 37.2 |
| Smoking | 11.2 | 40.3 | 70.3 | 68.2 | 52.0 | 32.1 | 17.7 |
| Obesity | 8.2 | 25.9 | 22.8 | 25.5 | 26.2 | 25.7 | 20.4 |
| Heart Failure | 6.9 | 40.1 | 5.1 | 8.0 | 10.6 | 30.9 | 58.7 |
| Atrial Fibrillation | 4.5 | 19.1 | 1.9 | 4.5 | 9.2 | 17.4 | 27.5 |
| Ischemic Cardiopathy | 2.7 | 12.9 | 2.6 | 6.4 | 10.3 | 13.5 | 14.7 |
| Obstructive Sleep Apnea | 2.5 | 12.9 | 14.1 | 16.3 | 15.4 | 11.0 | 7.1 |
| Depression | 6.9 | 14.0 | 16.4 | 15.5 | 12.0 | 11.2 | 11.7 |
| Hiatal Hernia | 5.1 | 13.8 | 9.1 | 10.2 | 10.8 | 12.4 | 14.3 |

Notably, heart failure was highly prevalent in patients over 70, affecting 30.9% of those aged 70–79 and 58.7% of those over 80. Compared to the general population aged 40 and above, COPD patients showed a significantly higher prevalence of cardiovascular risk factors, cardiovascular disease, depression, and hiatal hernia.

Table 3 demonstrates the impact of age on disease burden, as indicated by hospital admissions and mortality rates.

Table 3. Burden of Disease Evaluated by Hospitalizations and Hospital Mortality.

| Age Range (Years) | 40–49 | 50–59 | 60–69 | 70–79 | >80 |
| --- | --- | --- | --- | --- | --- |
| % of total COPD population | 3.2 | 11.9 | 20.8 | 28.0 | 36.1 |
| % COPD patients hospitalized for acute deterioration | 11.9 | 14.7 | 19.9 | 24.2 | 30.7 |
| Number of hospitalizations per hospitalized patient | 2.3 | 2.5 | 2.9 | 3.0 | 3.0 |
| In-hospital death rate (%) | 3.7 | 2.6 | 3.2 | 4.3 | 6.2 |

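The data show that the proportion of patients hospitalized rises steadily with age, from 11.9% at 40–49 years to 30.7% over 80, and that in-hospital mortality is highest in the two oldest groups (4.3% at 70–79 and 6.2% over 80), highlighting the substantial healthcare burden associated with older COPD patients.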
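To see how the rows of Table 3 combine, the following back-of-the-envelope sketch multiplies each age band's share of the cohort by its hospitalization rate and by its admissions per hospitalized patient. The resulting counts are rough estimates derived here from rounded table values and the reported cohort size of 73,901; they are not figures reported by the study, and the names `TABLE_3` and `estimate_burden` are hypothetical.

```python
TOTAL_PATIENTS = 73_901  # cohort size reported in the Results

# Per age band: (% of COPD population, % hospitalized,
#                hospitalizations per hospitalized patient), from Table 3.
TABLE_3 = {
    "40-49": (3.2, 11.9, 2.3),
    "50-59": (11.9, 14.7, 2.5),
    "60-69": (20.8, 19.9, 2.9),
    "70-79": (28.0, 24.2, 3.0),
    ">80": (36.1, 30.7, 3.0),
}


def estimate_burden(total: int, table: dict) -> dict:
    """Estimate patients, hospitalized patients and admissions per age band."""
    rows = {}
    for band, (share_pct, hosp_pct, admissions_per_patient) in table.items():
        patients = total * share_pct / 100
        hospitalized = patients * hosp_pct / 100
        admissions = hospitalized * admissions_per_patient
        rows[band] = (patients, hospitalized, admissions)
    return rows


burden = estimate_burden(TOTAL_PATIENTS, TABLE_3)
total_admissions = sum(adm for _, _, adm in burden.values())

for band, (patients, hospitalized, admissions) in burden.items():
    share = 100 * admissions / total_admissions
    print(f"{band:>6}: ~{patients:,.0f} patients, ~{hospitalized:,.0f} hospitalized, "
          f"~{admissions:,.0f} admissions ({share:.0f}% of estimated total)")
```

Because the three factors all rise with age, their product is more concentrated in the oldest bands than any single row of the table suggests.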

Sex-related differences observed in the general COPD population were consistent across all age ranges (Figure 2).

Figure 2: Comorbidity Differences by Sex and Age. (A-E) illustrate sex-based differences in comorbidity prevalence across various age groups above 40 years, highlighting variations in clinical profiles.

Discussion

The findings from this study reinforce that COPD patients often present with a complex array of conditions, a complexity that is partly attributable to comorbidities and that increases progressively with age. As patients age, shifts in their clinical profile are associated with greater healthcare utilization, particularly hospitalizations, and increased mortality.

COPD should be viewed not as a singular disease entity but as a complex syndrome characterized by functional and structural lung alterations, leading to chronic symptoms. This complexity arises from the interplay of environmental and genetic factors, significantly influenced by the dimension of time, or aging [13]. The ultimate clinical and biological outcomes result from these genetic-environmental interactions and prior accumulated exposures—both in the patient and potentially across generations—with aging serving as a critical axis. A deeper understanding of the role of aging in COPD could pave the way for identifying novel early therapeutic and preventive targets for this disease.

Certain comorbidities, such as cardiovascular disease, especially heart failure, can profoundly impact COPD patients’ clinical symptoms. Agustí et al., using data from the ECLIPSE study, suggested a shared pathogenic mechanism [20]. However, whether this reflects a causal relationship or merely an association due to shared risk factors remains unclear. Nevertheless, our study shows a high prevalence of comorbidities in COPD patients, which can be crucial in the clinical manifestation of COPD, with some comorbidities significantly linked to increased morbidity and mortality [21]. Multiple observational studies have previously reported a higher prevalence of comorbidities in COPD patients compared to the general population, a finding echoed in our study [8, 22]. The key contribution of this study is quantifying the extent of this problem in a real-world clinical setting, free from the selection biases inherent in many prior observational studies, and identifying its magnitude adjusted for age.

Several studies have indicated an inverse correlation between patient health status and the presence of comorbidities, particularly when three or more are present, irrespective of lung function [23, 24, 25]. Furthermore, the risk of exacerbations, hospitalizations, mortality, and the economic burden on healthcare systems are all associated with the number of comorbidities [26, 27, 28].

Some studies have shown that COPD patients in real-world settings have a higher incidence of obesity, depression, obstructive sleep apnea (OSA), and hiatal hernias [29, 30, 31], conditions whose frequency does not necessarily intensify with patient age. However, other comorbidities, particularly cardiovascular diseases, show a marked age-related increase, with heart failure being notably common in COPD patients over 70. The presence of heart failure in older COPD patients can significantly worsen their baseline condition and mimic or exacerbate COPD exacerbations.

Our data, derived from a contemporary, real-world clinical setting, confirm that COPD predominantly affects older adults, with this age group comprising a larger proportion of patients and experiencing more complex disease presentations, leading to a greater demand on healthcare services. The application of big data methods and AI has provided a realistic view of the treatment needs within a defined region, indicating that about 64% of COPD patients in our setting were over 70 (28.0% aged 70–79 and 36.1% over 80; Table 3). This is consistent with the EPISCAN2 study, which recently reported COPD prevalence in Spain [32]. That study found that only 6.03% of COPD patients were 40–49 years old, while 30.08% were 70–79. Importantly, our study also included and highlighted the over-80 age group, which represented 36% of the total COPD population, a demographic often underrepresented in other studies.

A limitation of this study is the potential inclusion of patients whose COPD diagnosis was not always confirmed with post-bronchodilator spirometry, especially relevant in patients over 80. However, the fact that these patients are diagnosed and treated for COPD underscores the significant healthcare needs of this population. As most prior studies do not focus on this older demographic, it is crucial to develop specific management strategies for these age groups, as personalized treatment approaches that consider individual patient characteristics are essential for effective care.

Multimorbidity and comorbidity are aspects of complexity but may not fully capture a patient’s overall health status. Some patients with a single condition may require complex care, while others with multiple diseases might be relatively straightforward to manage. Thus, comorbidity assessment alone cannot replace functional evaluation in diagnosis, prognosis, or therapeutic planning, especially in older populations where the impact of individual diseases is often overshadowed by the cumulative effect of multiple physiological impairments. Despite these considerations, the novel use of a big data approach in this study is noteworthy. We believe it can overcome the historical exclusion of elderly patients from studies on chronic conditions and multimorbidity, despite this group having the highest disease prevalence and healthcare expenditure. They are, in essence, the primary patients of our healthcare systems [33, 34].

Conclusions

The primary conclusion of this study, strengthened by its large-scale, real-world setting, is that COPD is predominantly a disease affecting older individuals who frequently have comorbidities that significantly influence the disease trajectory. A holistic understanding of the patient, beyond just the disease itself, is critical for effective COPD management. Utilizing frailty assessment tools developed by geriatric specialists can enhance our understanding of older COPD patients, enabling the application of tailored interventions. The oldest age groups bear the greatest disease burden, both for the individuals and the public healthcare system. Insights into the role of time in COPD (GETomics [13]) may lead to the identification of new early therapeutic and preventive targets, complementing existing knowledge of genetic and environmental risk factors.

Author Contributions

D.M., J.L.I., J.R., and J.M.R. were responsible for the design, development, data extraction, analysis, and manuscript preparation. J.R., J.C., M.B., and A.P. contributed to data extraction, analysis, and manuscript correction. All authors have reviewed and approved the final manuscript.

Institutional Review Board Statement

This study was conducted in accordance with the Declaration of Helsinki and approved by the Ethics Committee of Guadalajara’s Hospital for studies involving humans.

Informed Consent Statement

Informed consent was not required as the study was anonymous, observational, and retrospective.

Data Availability Statement

Data are available upon request from the corresponding author, subject to restrictions due to belonging to the National Health System of Castilla La Mancha.

Conflicts of Interest

The authors declare no conflicts of interest or financial compensations related to this submission and no links to tobacco manufacturers or the tobacco industry.

Funding Statement

This project was funded by the Chair of Inflammatory Diseases of the Airways, University of Alcalá.

Footnotes

Disclaimer/Publisher’s Note: The statements, opinions, and data in this publication are solely those of the authors and contributors, not of MDPI and/or the editors. MDPI and/or the editors disclaim responsibility for any injury to persons or property resulting from any content or references within.

References

[List of references from the original article would be placed here, maintaining the original numbering and links.]
