Revolutionizing Cancer Detection: Identifying Cancer Before Diagnosis with the Taizhou Longitudinal Study

Early cancer detection is critical for improving patient outcomes, and new approaches are needed to find cancer before it causes symptoms and is diagnosed by conventional methods. The Taizhou Longitudinal Study (TZL), a large prospective cohort study in China, collected blood from participants who were then followed for years. This makes it possible to study the crucial period before a cancer is diagnosed, and to do so non-invasively. This article describes the study’s design and methodology and how its samples were used to develop a blood test that aims to detect cancer before diagnosis.

The TZL, initiated in 2007, aimed to recruit 200,000 participants in Taizhou, Jiangsu province, China, a region known for its high incidence of digestive cancers. It also aimed to follow them for at least 40 years, creating a rich resource for understanding how cancer develops and progresses. Taizhou’s demographics and cancer incidence rates made it well suited to investigating the factors that contribute to cancer, particularly digestive cancers such as esophageal, gastric, and liver cancer. The region’s cancer mortality rate was nearly double the national average in 2010. This underscored the urgent need for strategies that detect cancer early, before it reaches later stages.

The Taizhou Longitudinal Study: A Cohort Designed for Pre-Diagnosis Cancer Research

The TZL baseline survey ran from 2007 to 2016 and enrolled men and women aged 30–75 living in specific districts of Taizhou. Recruitment relied on community leaders, health workers, and government-supported publicity campaigns to invite eligible residents. A dedicated Regional Coordinating Centre (RCC) and trained survey teams managed the baseline survey to ensure high-quality data collection.

At baseline, comprehensive exposure data were gathered through detailed questionnaires, physical measurements, and biological samples, including blood. Participants were then monitored for cancer occurrence through linkage with local cancer registries and health insurance databases. This long-term follow-up records which participants develop cancer and when they are diagnosed. Blood drawn from the same people years earlier can therefore be examined for pre-diagnostic biomarkers. Over 123,000 individuals were recruited, with an average follow-up of 8.1 years, creating a powerful dataset for studying the period before cancer diagnosis.

PanSeer Assay: Developing a Non-Invasive Cancer Detection Tool for Pre-Symptomatic Identification

Using samples collected in the TZL, researchers set out to develop a classification model that detects cancer non-invasively and, crucially, before conventional diagnosis. The result was the PanSeer assay, which looks for cancer signals in blood samples drawn before diagnosis. The study protocol was reviewed and approved by the Human Ethics Committee of Fudan University. All participants provided written informed consent before their inclusion in the TZL.

Sample Selection: Healthy, Pre-Diagnosis, and Post-Diagnosis Groups

The study defined three sample groups to test whether the PanSeer assay can detect cancer before diagnosis.

  • Healthy Samples: Individuals in this group remained cancer-free for at least five years after their initial blood draw, which makes it very unlikely that they had undetected cancer when the sample was collected.
  • Pre-Diagnosis Samples: This critical group comprised individuals who received a confirmed diagnosis of lung, liver, stomach, esophageal, or colorectal cancer within four years after their initial blood draw. These samples represent the “before diagnosis” window, allowing researchers to investigate biomarkers present prior to clinical diagnosis.
  • Post-Diagnosis Samples: Samples in this group came from patients who had already been diagnosed with lung, liver, stomach, esophageal, or colorectal cancer before their blood draw and had not yet received treatment. They provide a comparison point for cancer-related signals after diagnosis.
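The group definitions above amount to a small decision rule on dates. The sketch below encodes one reading of them in Python. The function and field names (`assign_group`, `follow_up_end`, `treated_before_draw`) are hypothetical, and years are approximated as 365.25 days. People whose records fit none of the three definitions, for example a diagnosis 4–5 years after the draw, are simply marked ineligible.

```python
from datetime import date
from typing import Optional

# The five cancer types named in the group definitions.
TARGET_CANCERS = {"lung", "liver", "stomach", "esophageal", "colorectal"}


def years_between(start: date, end: date) -> float:
    """Elapsed time in years, approximating a year as 365.25 days."""
    return (end - start).days / 365.25


def assign_group(blood_draw: date,
                 follow_up_end: date,
                 diagnosis_date: Optional[date] = None,
                 cancer_type: Optional[str] = None,
                 treated_before_draw: bool = False) -> Optional[str]:
    """Return 'healthy', 'pre-diagnosis', 'post-diagnosis', or None if ineligible.

    Hypothetical rule set based on the text: post-diagnosis = target cancer
    diagnosed before the draw and treatment-naive; pre-diagnosis = target cancer
    diagnosed within four years after the draw; healthy = at least five
    cancer-free years after the draw.
    """
    if diagnosis_date is not None and diagnosis_date <= blood_draw:
        if cancer_type in TARGET_CANCERS and not treated_before_draw:
            return "post-diagnosis"
        return None
    if diagnosis_date is not None and years_between(blood_draw, diagnosis_date) <= 4:
        return "pre-diagnosis" if cancer_type in TARGET_CANCERS else None
    # Cancer-free time ends at diagnosis (if any) or at the end of follow-up.
    if diagnosis_date is None:
        cancer_free_until = follow_up_end
    else:
        cancer_free_until = min(diagnosis_date, follow_up_end)
    if years_between(blood_draw, cancer_free_until) >= 5:
        return "healthy"
    return None
```

For example, `assign_group(date(2010, 1, 1), date(2016, 1, 1), date(2012, 6, 1), "liver")` classifies a liver cancer diagnosed about 2.4 years after the draw as a pre-diagnosis sample.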

Strict exclusion criteria were applied to ensure sample quality and data integrity. Samples with incomplete clinical information, insufficient plasma volume, or hemolysis were excluded. In addition, a sample needed at least 200,000 uniquely mapped DNA molecules in its sequencing data to pass quality control.

Statistical Power and Sample Size

The study’s statistical plan was designed to give it sufficient power to validate the PanSeer assay. A sample size of 144 control individuals and 144 case patients was deemed adequate to verify the assay’s expected sensitivity and specificity of 75%, with 90% power at a significance level of 0.05. Including additional cases and controls beyond this minimum further increased the study’s statistical power.
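The sketch below shows how a sample size of this kind can arise. It applies the standard normal-approximation formula for testing whether a single proportion, such as sensitivity, exceeds a null value. The text does not say which null value or test the study used. `p0` is therefore a hypothetical parameter, and the output is not expected to reproduce the figure of 144.

```python
import math
from statistics import NormalDist


def n_single_proportion(p0: float, p1: float, alpha: float = 0.05,
                        power: float = 0.90) -> int:
    """Subjects needed to show a true proportion p1 differs from a null value p0.

    Normal approximation for a two-sided one-sample test of a proportion:
    n = (z_{1-alpha/2} * sqrt(p0(1-p0)) + z_{power} * sqrt(p1(1-p1)))^2 / (p1 - p0)^2
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    numerator = z_alpha * math.sqrt(p0 * (1 - p0)) + z_beta * math.sqrt(p1 * (1 - p1))
    return math.ceil((numerator / (p1 - p0)) ** 2)
```

With the expected value `p1=0.75` held fixed, choosing a null value `p0` closer to 0.75 or asking for higher power both increase the required number of subjects.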

In total, 1156 plasma samples from the TZL cohort and local hospital biobanks were included in the study. Among them were 191 pre-diagnosis samples with 191 matched healthy samples, and 223 post-diagnosis samples with 223 matched healthy samples. Samples were matched on time of collection, sex, age group, and DNA molecule count. This minimized confounding and strengthened the comparisons between groups, particularly when assessing the assay’s performance in the pre-diagnosis setting.
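The text does not describe the matching algorithm, but matching of this kind is often done greedily. The sketch below shows one hypothetical 1:1 scheme. Each case is matched exactly on sex and age group, then to the control with the closest collection year, with ties broken by the closest molecule count. The `Sample` fields and this priority order are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Sample:
    sample_id: str
    sex: str          # e.g. "F" or "M"
    age_group: str    # e.g. "50-59"
    draw_year: int    # year the blood sample was collected
    molecules: int    # uniquely mapped DNA molecules after sequencing


def match_controls(cases: List[Sample], controls: List[Sample]) -> Dict[str, str]:
    """Greedy 1:1 matching of cases to controls (hypothetical scheme).

    Exact match on sex and age group; among those, pick the closest draw year,
    then the closest molecule count. Each control is used at most once.
    """
    available = list(controls)
    pairs: Dict[str, str] = {}
    for case in cases:
        candidates = [c for c in available
                      if c.sex == case.sex and c.age_group == case.age_group]
        if not candidates:
            continue  # unmatched case: record and review separately
        best = min(candidates, key=lambda c: (abs(c.draw_year - case.draw_year),
                                              abs(c.molecules - case.molecules)))
        pairs[case.sample_id] = best.sample_id
        available.remove(best)
    return pairs
```

Because the matching is greedy, the result can depend on the order of `cases`. Optimal-matching methods avoid this at a higher computational cost.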

Samples were randomly divided into training and test sets (approximately 50:50). The training set was used to build the model and fix its parameters. The test set, whose clinical outcomes were concealed during model development, was then used to validate the model independently.
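A random split can be stratified by sample group so that both halves keep similar proportions of healthy, pre-diagnosis, and post-diagnosis samples. The sketch below assumes stratification by group label, which the text does not specify. It uses a fixed seed so the split is reproducible, and `stratified_split` is a hypothetical helper.

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple


def stratified_split(labels: Dict[str, str], test_fraction: float = 0.5,
                     seed: int = 0) -> Tuple[List[str], List[str]]:
    """Randomly split sample IDs into (train, test), separately within each label.

    labels maps sample ID -> group label (e.g. 'healthy', 'pre-diagnosis').
    """
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for sample_id, label in sorted(labels.items()):  # sorted for reproducibility
        by_label[label].append(sample_id)
    train: List[str] = []
    test: List[str] = []
    for label in sorted(by_label):
        ids = by_label[label]
        rng.shuffle(ids)
        n_test = round(len(ids) * test_fraction)
        test.extend(ids[:n_test])
        train.extend(ids[n_test:])
    return train, test
```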

Sample Collection and PanSeer Assay Methodology

Tissue and Plasma Sample Handling

To make sure the PanSeer assay captures cancer-specific signals, tissue and plasma samples were collected and processed under rigorous, standardized protocols. Tissue samples, from both cancer and healthy tissue, were used to identify cancer-specific methylation patterns. Plasma samples came from TZL participants and hospital biobanks. Blood was drawn into EDTA vacutainers and stored at 4 °C, then centrifuged. The plasma was aliquoted and stored long-term at −80 °C or below. This standardized procedure was crucial for maintaining sample integrity and minimizing pre-analytical variability. It mattered most for the pre-diagnosis samples, which were collected years before the participants’ cancers were diagnosed.

Cell-free DNA (cfDNA) was extracted from plasma using a specialized kit, with modifications to improve DNA recovery. cfDNA consists of DNA fragments circulating in the blood that may carry cancer-derived signals before diagnosis, and the extracted cfDNA served as input for the PanSeer assay.

PanSeer Assay: Semi-Targeted Methylation Sequencing

The PanSeer assay uses a semi-targeted PCR approach to measure DNA methylation at 595 genomic regions encompassing 11,787 CpG sites. Methylation is an epigenetic modification that is often altered in cancer. Both tissue DNA and cfDNA underwent bisulfite conversion, which turns unmethylated cytosines into uracil (read as thymine after PCR) while leaving methylated cytosines unchanged. As a result, methylation status can be read out by sequencing. Sequencing libraries were prepared and sequenced on an Illumina NextSeq 500 platform.
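The logic of bisulfite-based methylation calling can be illustrated on a toy sequence. The sketch below is a simplified simulation that assumes complete conversion, a single strand, and reads already aligned to the reference; the helper names are hypothetical. A CpG cytosine that still reads as C is called methylated, and one that reads as T is called unmethylated.

```python
from typing import Dict, List, Set


def cpg_positions(seq: str) -> List[int]:
    """0-based positions of the C in each CpG dinucleotide."""
    s = seq.upper()
    return [i for i in range(len(s) - 1) if s[i:i + 2] == "CG"]


def bisulfite_convert(seq: str, methylated_positions: Set[int]) -> str:
    """Simulate bisulfite conversion plus PCR on one strand.

    Unmethylated C reads as T; C at positions in methylated_positions stays C.
    """
    out = []
    for i, base in enumerate(seq.upper()):
        out.append("T" if base == "C" and i not in methylated_positions else base)
    return "".join(out)


def call_cpg_methylation(reference: str, converted_read: str) -> Dict[int, bool]:
    """For each reference CpG, True if the aligned read still shows C (methylated)."""
    return {i: converted_read[i] == "C" for i in cpg_positions(reference)}
```

For the reference `ACGTTCGACCA` with only the first CpG methylated, `bisulfite_convert` yields `ACGTTTGATTA`. Every non-CpG cytosine becomes T, so only the methylated CpG keeps its C.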

Initial data analysis followed standard bioinformatics steps: read demultiplexing, merging, adapter trimming, UMI processing, and alignment to the human reference genome. Samples with fewer than 200,000 uniquely mapped DNA molecules were excluded as low quality. For each remaining sample, the average methylation fraction (AMF) was calculated for each of the 595 target regions as a quantitative measure of methylation. The resulting AMF matrix (samples by regions) was the input for downstream analysis and cancer classification. The aim was to find methylation patterns that separate healthy individuals from those with cancer, particularly in the pre-diagnosis phase.
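A region-level AMF can be computed by pooling counts across the CpG sites in a region: methylated (C) reads divided by total depth. The sketch below assumes that pooled definition, and its per-site inputs (`methylated`, `depth`, `region_ids`) are hypothetical. Averaging per-site fractions instead would give every site equal weight regardless of its coverage.

```python
import numpy as np


def region_amf(methylated, depth, region_ids, n_regions):
    """Average methylation fraction per target region for one sample.

    methylated[i]: reads showing C (methylated) at CpG site i
    depth[i]:      total reads (C + T) covering CpG site i
    region_ids[i]: index (0..n_regions-1) of the region containing site i
    Regions with zero total depth are returned as NaN.
    """
    region_ids = np.asarray(region_ids)
    c_sum = np.bincount(region_ids, weights=np.asarray(methylated, float), minlength=n_regions)
    d_sum = np.bincount(region_ids, weights=np.asarray(depth, float), minlength=n_regions)
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(d_sum > 0, c_sum / d_sum, np.nan)
```

Stacking one such vector per sample, for example with `np.vstack`, gives a samples-by-regions AMF matrix.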

Figure: Calculation of the average methylation fraction (AMF) in the PanSeer assay, the ratio of cytosine reads to total sequencing depth at CpG sites within each target region.

Development of the Cancer Detection Algorithm

Marker Selection and Cancer Detection Algorithm

To focus the PanSeer assay on cancer-relevant methylation changes, markers were selected using tissue samples. Comparing cancer tissue with healthy tissue identified 477 genomic regions with statistically significant differences in AMF, and these regions were retained for further analysis. The regions, encompassing 10,613 CpG sites, distinguished cancer from healthy tissue. They were further validated against publicly available TCGA data to confirm their relevance across cancer types. This step restricted the assay to methylation sites associated with cancer, improving its potential for accurate detection before diagnosis.
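The text does not name the statistical test used for marker selection. As one hedged possibility, the sketch below runs a two-sided Mann–Whitney U-test per region on tissue AMF values. It keeps regions whose Benjamini–Hochberg adjusted p-value falls below a false discovery rate threshold. The choice of test, the 0.05 threshold, and the function names are assumptions.

```python
import numpy as np
from scipy.stats import mannwhitneyu


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    p = np.asarray(pvalues, dtype=float)
    n = p.size
    order = np.argsort(p)
    ranked = p[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity: each adjusted value is the min over all larger ranks.
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    adjusted = np.empty(n)
    adjusted[order] = np.minimum(ranked, 1.0)
    return adjusted


def select_markers(tumor_amf, normal_amf, fdr=0.05):
    """Indices of regions (columns) whose AMF differs between tumor and normal tissue.

    tumor_amf, normal_amf: arrays of shape (n_samples, n_regions).
    """
    pvals = np.array([
        mannwhitneyu(tumor_amf[:, j], normal_amf[:, j], alternative="two-sided").pvalue
        for j in range(tumor_amf.shape[1])
    ])
    return np.flatnonzero(benjamini_hochberg(pvals) < fdr)
```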

A logistic regression (LR) classifier was built from the training set samples to classify plasma samples as healthy or cancerous. To reduce overfitting, the training set was repeatedly split into model-building and validation sets, and an LR classifier was trained on each model-building set. This resampling produced an ensemble of 1000 LR equations, intended to make the combined classifier more robust and reliable than a single model fit once to the training data.
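The resampling scheme described above can be sketched with scikit-learn, whose `LogisticRegressionCV` class the workflow figure names. In this sketch each ensemble member is fit on a random stratified subset of the training set. `LogisticRegressionCV` then tunes the regularization strength by internal cross-validation. The 80% model-building fraction, the default penalty, and the helper names are assumptions, not the study’s settings.

```python
import numpy as np
from sklearn.linear_model import LogisticRegressionCV
from sklearn.model_selection import train_test_split


def train_lr_ensemble(X, y, n_models=1000, build_fraction=0.8, seed=0):
    """Fit one LogisticRegressionCV model per random model-building split.

    X: AMF matrix (n_samples, n_regions); y: 0 = healthy, 1 = cancer.
    """
    rng = np.random.RandomState(seed)
    models = []
    for _ in range(n_models):
        # The held-out validation part is not used further in this simplified sketch.
        X_build, _, y_build, _ = train_test_split(
            X, y, train_size=build_fraction, stratify=y,
            random_state=rng.randint(0, 2**31 - 1))
        model = LogisticRegressionCV(Cs=10, cv=5, max_iter=1000)
        models.append(model.fit(X_build, y_build))
    return models


def ensemble_score(models, X):
    """Average predicted cancer probability (class 1) across all ensemble members."""
    return np.mean([m.predict_proba(X)[:, 1] for m in models], axis=0)
```

Setting `n_models=1000` mirrors the ensemble size in the text but is slow; a handful of models is enough to try the code.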

The LR score, the estimated probability that a sample is cancerous, was calculated for each sample. A cutoff was determined on the training set to achieve high specificity. For each test set sample, the final LR score was the average of the scores from all 1000 equations. Samples scoring above the cutoff were classified as cancerous and the rest as healthy. Validated on the independent test set, this ensemble classifier detected cancer signals in plasma samples, including those collected before diagnosis.
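The text says only that the cutoff was chosen for high specificity. One simple way to do this is to take an upper quantile of the healthy samples’ training-set scores. The sketch below assumes that approach and a hypothetical 95% specificity target.

```python
import numpy as np


def specificity_cutoff(healthy_scores, target_specificity=0.95):
    """Smallest cutoff such that at least `target_specificity` of healthy training
    scores are at or below it (scores strictly above the cutoff are called cancer)."""
    s = np.sort(np.asarray(healthy_scores, dtype=float))
    # Number of healthy samples that must fall at or below the cutoff;
    # the small epsilon guards against floating-point error in the product.
    k = max(1, int(np.ceil(target_specificity * s.size - 1e-9)))
    return s[k - 1]


def classify(scores, cutoff):
    """Label each averaged ensemble score as 'cancer' or 'healthy'."""
    return np.where(np.asarray(scores, dtype=float) > cutoff, "cancer", "healthy")
```

Because the cutoff is fixed on training data, specificity on the test set can come out somewhat lower than the target.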

Figure: Workflow of the logistic regression (LR) classifier in the PanSeer assay, from AMF matrix input through training set processing, cross-validation, and model building with LogisticRegressionCV to the final LR score.

Statistical Validation and Limit of Detection

Statistical analyses evaluated the PanSeer assay’s performance and how various factors affect it. Accuracy metrics, including sensitivity and specificity, were computed for each sample set. The Kruskal–Wallis H-test and Mann–Whitney U-test were used to test whether covariates influenced model scores.
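The metrics and covariate tests named above are available in standard tools. The sketch below computes sensitivity and specificity from binary labels. It also runs SciPy’s `kruskal` and `mannwhitneyu` on model scores grouped by a covariate. The covariate grouping and the helper names are illustrative assumptions.

```python
import numpy as np
from scipy.stats import kruskal, mannwhitneyu


def sensitivity_specificity(y_true, y_pred):
    """Sensitivity (cancer called cancer) and specificity (healthy called healthy).

    y_true, y_pred: 1/True = cancer, 0/False = healthy.
    """
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    sensitivity = np.mean(y_pred[y_true])
    specificity = np.mean(~y_pred[~y_true])
    return float(sensitivity), float(specificity)


def covariate_tests(scores, covariate):
    """Kruskal-Wallis H-test across covariate levels; Mann-Whitney U if exactly two."""
    scores = np.asarray(scores, dtype=float)
    covariate = np.asarray(covariate)
    groups = [scores[covariate == level] for level in np.unique(covariate)]
    result = {"kruskal_p": float(kruskal(*groups).pvalue)}
    if len(groups) == 2:
        result["mannwhitney_p"] = float(
            mannwhitneyu(groups[0], groups[1], alternative="two-sided").pvalue)
    return result
```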

A limit-of-detection study was conducted using spike-in samples made by mixing cancer cell line DNA into healthy plasma. This analysis measured the assay’s analytical sensitivity, that is, its ability to detect low levels of cancer-derived DNA in plasma. The assay distinguished the spike-in samples from baseline healthy plasma. This suggests it could detect the low levels of tumor DNA expected from early-stage cancers, even before diagnosis.
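A simple linear mixing model, which the text does not state, shows why low tumor fractions are hard to detect. Suppose a fraction f of the DNA molecules comes from the cell line. Each region’s expected AMF then moves from the healthy value toward the cell-line value by f times their difference. The function name and inputs below are hypothetical.

```python
import numpy as np


def mixed_amf(tumor_fractions, amf_cell_line, amf_healthy):
    """Expected per-region AMF of spike-in mixtures under linear mixing (assumption).

    tumor_fractions: shape (n_mixtures,), fraction of DNA from the cancer cell line
    amf_cell_line, amf_healthy: shape (n_regions,), AMF of each pure source
    Returns an array of shape (n_mixtures, n_regions).
    """
    f = np.asarray(tumor_fractions, dtype=float)[:, None]
    return f * np.asarray(amf_cell_line, float) + (1 - f) * np.asarray(amf_healthy, float)
```

Consider a region with AMF 0.8 in the cell line and 0.05 in healthy plasma. At f = 0.001 it shifts by only 0.001 × 0.75 = 0.00075, and a region methylated equally in both sources does not shift at all.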

Conclusion: PanSeer Assay – A Promising Tool for Pre-Diagnosis Cancer Detection

The Taizhou Longitudinal Study provided a valuable resource for developing and validating the PanSeer assay, a non-invasive test with the potential to detect cancer before conventional diagnosis. The assay analyzes methylation patterns in cfDNA from plasma collected before diagnosis. With this approach it showed promising performance in distinguishing healthy individuals from those who would later be diagnosed with cancer. The study highlights the importance of large longitudinal cohorts like TZL for early cancer detection research. PanSeer is a significant step toward pre-diagnosis cancer screening, which could lead to earlier interventions and better outcomes for patients. Further research and clinical validation are needed before the approach can be used in routine clinical practice.
