Cranial Osteopathy Diagnosis: An Evidence-Based Review of Reliability and Efficacy

Introduction

Osteopathy, a healthcare discipline emphasizing manual contact for diagnosis and treatment, was formally recognized by the World Health Organization (WHO) in 2010, and the WHO identified cranial osteopathy as a key osteopathic skill. This recognition came despite persistent questions about the reliability of cranial osteopathy diagnosis and the effectiveness of its treatments. Often referred to as Osteopathic Manipulative Medicine (OMM) in the cranial field, this approach remains a subject of debate within the medical and osteopathic communities.

Cranial osteopathy, or “osteopathy in the cranial field,” was pioneered by William Garner Sutherland in the early 20th century, building upon the foundational principles of osteopathy established by Andrew Taylor Still in 1874. At the heart of cranial osteopathy lies the concept of the “primary respiratory mechanism,” a controversial biological model. This mechanism proposes inherent rhythmic movements within the brain that influence cerebrospinal fluid fluctuations and induce subtle, palpable changes in the dural membranes, cranial bones, and sacrum. Cranial osteopathic practitioners utilize gentle, hands-on manipulation of the skull, aiming to modulate this primary respiratory mechanism.

The adoption and integration of cranial osteopathy into mainstream healthcare remain varied globally. While some countries, like France, restrict the teaching of cranial techniques, the WHO’s benchmarks include it as an important osteopathic competency. This inclusion by the WHO underscores the necessity for evidence-based validation of safety, efficacy, and quality assurance – prerequisites for any healthcare modality seeking broader integration. For cranial osteopathy, this necessitates demonstrating both the reliability of its diagnostic procedures and the clinical efficacy of its therapeutic interventions.

Previous reviews have critically examined the diagnostic reliability and therapeutic efficacy of cranial osteopathy. However, these reviews often have limitations, including non-systematic approaches, incomplete data analysis, or the lack of rigorous bias assessment. A comprehensive and critical evaluation of the scientific literature therefore remains necessary to assess the evidence underpinning cranial osteopathy, particularly the reliability of cranial OMM diagnosis and the effectiveness of related treatments.

This article presents a systematic review and critical evaluation of the existing scientific literature concerning the reliability of diagnostic methods and the clinical efficacy of treatments within cranial osteopathy. This analysis aims to provide a robust, evidence-based perspective on the role and scientific validity of cranial osteopathy in contemporary healthcare.

Methods

This systematic review rigorously examined the scientific literature pertaining to the reliability of cranial osteopathy diagnosis and the clinical efficacy of its techniques.

Literature Sources and Search Strategy

A comprehensive search was conducted in August 2015 across several electronic databases, including MEDLINE, PEDro, OSTMED.DR, and the Cochrane Library. Additionally, Google Scholar and the websites of the Journal of the American Osteopathic Association (JAOA) and the International Journal of Osteopathic Medicine (IJOM) were searched to ensure broad coverage of relevant publications. The search was later updated to include articles published up to June 30, 2016, with no lower limit on publication date.

The search strategy employed specific keyword combinations tailored for reliability and efficacy studies:

  • Reliability Studies: Keywords included [“reliability” OR “agreement” OR “reproducibility”] AND [“cranial” OR “craniosacral” OR “cranium” OR “primary respiratory mechanism”]. To refine searches with excessive results, the terms [“osteopathy” OR “osteopathic”] were added.
  • Efficacy Studies: Keywords included [“cranial manipulation” OR “osteopathy in the cranial field” OR “cranial osteopathy” OR “craniosacral technique”] AND [“medicine” OR “treatment” OR “therapy” OR “technique” OR “manipulation” OR “osteopathy” OR “osteopathic”].

Keywords were applied using database-specific interfaces, utilizing advanced search tools for titles, abstracts, and keywords where available. Date of publication filters were intentionally not applied to capture the entirety of the available literature.
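The keyword combinations above can be expressed as a small query builder. The following is a minimal sketch only: the function names are hypothetical, the bracketed output mirrors the notation used in this article rather than the syntax of any particular database, and it assumes the refining osteopathy terms were combined with AND, which the text does not state explicitly.

```python
def or_group(terms):
    """Join quoted terms with OR inside square brackets."""
    return "[" + " OR ".join(f'"{t}"' for t in terms) + "]"


def build_query(*groups):
    """Combine several OR-groups with AND."""
    return " AND ".join(or_group(g) for g in groups)


# Keyword groups taken from the reliability search described above.
RELIABILITY_TERMS = ["reliability", "agreement", "reproducibility"]
CRANIAL_TERMS = ["cranial", "craniosacral", "cranium",
                 "primary respiratory mechanism"]
# Assumption: these refining terms are ANDed onto the base query.
OSTEOPATHY_TERMS = ["osteopathy", "osteopathic"]

reliability_query = build_query(RELIABILITY_TERMS, CRANIAL_TERMS)
refined_query = build_query(RELIABILITY_TERMS, CRANIAL_TERMS, OSTEOPATHY_TERMS)

print(reliability_query)
print(refined_query)
```

Each database interface has its own field tags and operators, so a string built this way would still need to be adapted before use.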

To further enhance the search, a complementary approach was implemented. This included examining the bibliographies of articles selected for inclusion, reviewing existing systematic reviews on cranial osteopathy, and contacting study authors and professional organizations to identify potentially missed studies.

Eligibility Criteria

The eligibility criteria were designed to ensure the inclusion of studies directly relevant to the research questions, focusing on both diagnostic reliability and treatment efficacy within cranial osteopathy.

Reliability of Diagnosis

Studies were considered eligible if they:

  • Compared diagnostic outcomes from at least two examiners (inter-rater reliability) or from repeated examinations by the same examiner (intra-rater reliability).
  • Involved human subjects, either patients or healthy volunteers.

Efficacy Studies

For efficacy studies, the inclusion criteria were:

  • Randomized-controlled trials (RCTs) or crossover studies.
  • Studies involving patients (excluding studies on healthy subjects).
  • Focused specifically on cranial osteopathy techniques.

Exclusion criteria were applied to refine the selection and maintain the focus of the review:

  • Articles not published in English or French.
  • Studies with non-RCT or non-crossover designs (for efficacy studies).
  • Studies lacking clear specification of cranial osteopathy techniques.
  • Studies evaluating combined treatments without subgroup analysis for cranial osteopathy.
  • Studies using non-human models or simulators.
  • Studies for which full-text versions were not accessible.

No restrictions were imposed based on the type of disease, healthcare setting, or health outcomes investigated in efficacy studies.

Study Selection Process

The study selection was a systematic, multi-stage process designed to ensure rigor and minimize bias. The process consisted of three primary steps:

  1. Title Screening: Initial screening based on titles to remove duplicates and off-topic articles.
  2. Abstract Review: Abstracts of the remaining articles were analyzed against the eligibility criteria. Articles not meeting the criteria based on abstract content were excluded.
  3. Full-Text Assessment: Full-text versions of articles passing the abstract review were obtained and further assessed against all eligibility criteria.

For references identified through the complementary search approach, abstracts were reviewed, and full texts were obtained as necessary to determine eligibility based on the established criteria. This rigorous selection process ensured that only studies directly relevant to the reliability and efficacy of cranial osteopathy were included in the final review.

Data Extraction

Data extraction was standardized to capture pertinent information from each included study. The following data points were systematically extracted:

  • Study Design: Including randomization and blinding procedures.
  • Sample Size and Characteristics: Number of participants, disease status, age, and inclusion criteria.
  • Main Outcomes and Results: Primary and secondary outcomes, and key findings.

For reliability studies, additional data points were extracted to assess methodological quality and examiner expertise:

  • Examiner Details: Number of examiners, professional qualifications (e.g., DO, PT), and level of expertise in cranial osteopathy.
  • Statistical Methods: Specific statistical methods used to assess reliability.

For efficacy studies, further details were collected to understand the interventions and comparators:

  • Primary Outcome: The main outcome measure used to evaluate efficacy.
  • Treatment Description: Precise description of the cranial osteopathy techniques and comparison treatments.

Assessment of Risk of Bias

To ensure the critical appraisal of study quality, a dual-screener approach was used for study selection and risk of bias assessment. Two independent reviewers performed these assessments using standardized forms. Disagreements were resolved through consensus discussions to ensure consistency and objectivity.

Risk of Bias Assessment for Reliability Studies

The risk of bias in reliability studies was assessed using a modified version of the Quality Appraisal tool for Studies of Diagnostic Reliability (QAREL). The original QAREL checklist comprises 11 items, listed here with notes on how each was handled in this review:

  1. Spectrum of Examiners: Representative of intended practitioners. (Applicability, not bias)
  2. Spectrum of Subjects: Representative of target population. (Applicability, not bias)
  3. Examiner Blinding (Other Examiners): Blinded to findings of other examiners.
  4. Examiner Blinding (Prior Findings): Blinded to their own prior findings.
  5. Blinding to Reference Standard: Blinded to reference standard results. (Not applicable in cranial osteopathy)
  6. Blinding to Clinical Information: Blinded to extraneous clinical information.
  7. Blinding to Additional Cues: Blinded to cues outside the test.
  8. Order of Examination: Variation in examination order.
  9. Time Interval Stability: Time interval appropriate for outcome stability. (Not applicable in cranial osteopathy)
  10. Test Application/Interpretation: Appropriate application and interpretation. (Modified for manual therapy context)
  11. Statistical Analysis: Appropriate statistical methods. (Separately analyzed with specific criteria)

Items 1, 2, 5 and 9 of the original QAREL were excluded because they were deemed related to applicability rather than risk of bias in this context, or not applicable to cranial osteopathy. Items 10 and 11 were retained but assessed separately, with more precise criteria (see the statistical analysis interpretation below). This left items 3, 4, 6, 7 and 8 as risk of bias items.

Two additional items were included to address factors specifically relevant to manual therapy reliability:

  1. Examiner Expertise: Level of training and experience of examiners.
  2. Blinding Procedure (Simultaneous Examiners): Adequacy of blinding when examiners assessed subjects simultaneously.

Rating Rules for Reliability Studies

Each of the seven bias items was rated as ‘Low risk’, ‘High risk’, or ‘Unclear risk’ of bias. Criteria for ratings were defined as:

  • Low Risk: Methodologically sound approach minimizing bias.
  • High Risk: Significant methodological flaw likely to introduce bias.
  • Unclear Risk: Insufficient information to assess risk of bias.

For examiner expertise, ‘High risk’ was assigned if examiners were students or untrained, ‘Low risk’ if examiners were experienced graduates, and ‘Unclear risk’ if expertise was not reported.

Overall risk of bias for each study was categorized as:

  • High Risk: At least one item rated as high risk.
  • Major Doubt: More than two items unclear risk, all others low risk.
  • Minor Doubt: Two or fewer items unclear risk, all others low risk.
  • Low Risk: All items rated as low risk.
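The overall rating rule above can be written as a short decision function. This is a sketch under stated assumptions: the function name and the lowercase rating labels are hypothetical, and the input is taken to be one rating per bias item.

```python
VALID_RATINGS = {"low", "high", "unclear"}


def overall_risk_reliability(ratings):
    """Return the overall category for a list of per-item ratings.

    ratings: list of 'low', 'high' or 'unclear' (one entry per bias item).
    """
    unknown = set(ratings) - VALID_RATINGS
    if unknown:
        raise ValueError(f"unknown rating(s): {sorted(unknown)}")
    # Any single high-risk item makes the whole study high risk.
    if "high" in ratings:
        return "High risk"
    unclear = ratings.count("unclear")
    if unclear > 2:
        return "Major doubt"   # more than two unclear items
    if unclear >= 1:
        return "Minor doubt"   # one or two unclear items
    return "Low risk"          # every item rated low


# Hypothetical study: five items low, two unclear.
print(overall_risk_reliability(["low"] * 5 + ["unclear"] * 2))  # Minor doubt
```

Checking for high-risk items first reflects the ordering of the rules: the doubt categories only apply when all remaining items are low risk.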

Statistical Analysis Interpretation for Reliability Studies

Beyond the general bias assessment, the statistical analysis in each reliability study was critically evaluated. Inspired by QAREL but with more precise criteria, the assessment focused on the appropriateness and interpretation of statistical methods.

Reliability or agreement was considered satisfactory only if classified as “excellent” or “almost perfect”:

  • Excellent Reliability: Intraclass Correlation Coefficient (ICC) above 0.75 (Fleiss’ classification).
  • Almost Perfect Agreement: Kappa coefficient (κ) above 0.81 (Landis & Koch classification).

These stringent criteria reflect the need for high statistical rigor given the contested theoretical basis of cranial osteopathy.
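As an illustration, the two cut-offs can be applied mechanically to a reported coefficient. The sketch below assumes a strict "above" comparison, as worded in this section; the function name and the statistic labels are hypothetical.

```python
# Cut-offs as stated above (Fleiss for ICC, Landis & Koch for kappa).
THRESHOLDS = {"icc": 0.75, "kappa": 0.81}


def is_satisfactory(statistic, value):
    """True only if the coefficient lies strictly above its cut-off."""
    return value > THRESHOLDS[statistic]


# Values of the kind reported in reliability studies.
print(is_satisfactory("icc", 0.22))    # False
print(is_satisfactory("kappa", 0.67))  # False
```

A coefficient that falls below the cut-off is treated as unsatisfactory regardless of how it would be labelled on the lower bands of either classification.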

Statistical method appropriateness was assessed based on variable type:

  • ICC: Appropriate for inter-rater reliability of quantitative, ordinal, interval, and ratio variables.
  • Kappa: Useful for inter-rater reliability of nominal (categorical) variables.

Correlation statistics like Spearman or Pearson, percentage agreement, and measures of precision were deemed inappropriate for reliability estimation.
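A small worked example may help show why percentage agreement alone was not accepted. The sketch below computes Cohen's kappa for two raters on a nominal variable; the ratings are invented for illustration and do not come from any included study.

```python
from collections import Counter


def percent_agreement(a, b):
    """Proportion of subjects on which the two raters agree."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohens_kappa(a, b):
    """Cohen's kappa: agreement corrected for chance agreement."""
    n = len(a)
    p_o = percent_agreement(a, b)
    count_a, count_b = Counter(a), Counter(b)
    # Expected agreement if both raters assigned categories independently.
    p_e = sum(count_a[c] * count_b[c] for c in set(a) | set(b)) / (n * n)
    if p_e == 1:
        raise ValueError("kappa is undefined when expected agreement is 1")
    return (p_o - p_e) / (1 - p_e)


# Hypothetical data: 20 subjects, both raters call most of them "restricted".
rater_a = ["restricted"] * 18 + ["free"] * 2
rater_b = ["restricted"] * 16 + ["free"] * 2 + ["restricted"] * 2

print(f"{percent_agreement(rater_a, rater_b):.0%}")  # 80%
print(f"{cohens_kappa(rater_a, rater_b):.2f}")       # -0.11
```

Here the raters agree on 80% of subjects, yet kappa is slightly negative because nearly all of that agreement is what chance alone would produce when both raters favour the same category.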

Risk of Bias Assessment for Efficacy Studies

Risk of bias in efficacy studies was assessed using the Cochrane Risk of Bias tool. This tool evaluates bias across seven domains:

  1. Random Sequence Generation: Adequacy of randomization process.
  2. Allocation Concealment: Protection of allocation sequence before assignment.
  3. Blinding of Participants and Personnel: Blinding of participants and providers.
  4. Blinding of Outcome Assessment: Blinding of outcome assessors.
  5. Incomplete Outcome Data: Handling of missing data.
  6. Selective Reporting: Selective reporting of outcomes.
  7. Other Bias: Any other potential sources of bias.

Rating Rules for Efficacy Studies

Each domain in the Cochrane Risk of Bias tool was rated as ‘Low risk’, ‘High risk’, or ‘Unclear risk’. The 2010 CONSORT checklist was used as a guide to determine these ratings, especially for unclear reporting. For the “Other bias” domain, potential biases specific to clinical trials, such as lack of placebo, compliance bias, etc., were considered.

Given that blinding is inherently challenging in manual therapy studies, a modified overall risk of bias assessment was applied:

  • High Risk: At least one domain (excluding “blinding”) rated as high risk.
  • Major Doubt: Two or more domains (excluding “blinding”) rated as unclear risk, all others low risk.
  • Minor Doubt: One domain (excluding “blinding”) rated as unclear risk, all others low risk.
  • Low Risk: All domains (excluding “blinding”) rated as low risk.

This modified approach acknowledged the inherent difficulties in blinding within manual therapy research while maintaining rigor in assessing other sources of bias.
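The modified rule can be sketched as follows. The domain keys and function name are hypothetical, and the sketch assumes that "blinding" covers both blinding domains of the Cochrane tool (participants and personnel, and outcome assessment).

```python
# Assumption: both blinding domains are left out of the overall rating.
BLINDING_DOMAINS = {
    "blinding_participants_personnel",
    "blinding_outcome_assessment",
}


def overall_risk_efficacy(ratings):
    """ratings: dict mapping domain name to 'low', 'high' or 'unclear'."""
    considered = [r for domain, r in ratings.items()
                  if domain not in BLINDING_DOMAINS]
    if "high" in considered:
        return "High risk"
    unclear = considered.count("unclear")
    if unclear >= 2:
        return "Major doubt"
    if unclear == 1:
        return "Minor doubt"
    return "Low risk"


# Hypothetical trial: blinding rated high, one other domain unclear.
example = {
    "random_sequence_generation": "low",
    "allocation_concealment": "unclear",
    "blinding_participants_personnel": "high",
    "blinding_outcome_assessment": "high",
    "incomplete_outcome_data": "low",
    "selective_reporting": "low",
    "other_bias": "low",
}
print(overall_risk_efficacy(example))  # Minor doubt
```

In this example the high-risk blinding ratings do not affect the overall category, which is exactly the concession the modified approach makes for manual therapy trials.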

Results

Reliability Studies

The standard electronic database search yielded 1280 articles, with eight meeting the inclusion criteria for reliability studies (Fig 1). The complementary search strategy added four more articles, of which only one met the inclusion criteria. Table 1 summarizes the key characteristics and findings of these nine studies.

Fig 1. Flow chart of the study selection process for the systematic review of studies dealing with the reliability of diagnosis in the field of cranial osteopathy.

Table 1. Summary of included studies dealing with the reliability of diagnosis in cranial osteopathy.

| First author | Subjects (number; disease status; age in yrs) | Raters (number; degree(s); expertise) | Study characteristics & parameter(s) | Reliability measure used | Main results |
|---|---|---|---|---|---|
| Upledger [24] | N = 25; not reported; A = 3–5 | N = 4; one DO (founder of CST) and three MDs; one trained by the CST founder and two considered as “skilled examiners” | Inter-examiner: (1) CRI-F; (2) restriction of motion in several areas (19 modalities) | Reliability coefficient (no more information) | inter: (1): missing data; (2): coefficient ranged from 0 to 1 for all modalities and examiners |
| Wirth-Pattullo [25] | N = 12; history of trauma, surgery, or “learning disabilities”; A = 10–62 | N = 3 (X, Y and Z); PT trained in CST; 2–4 yrs | Inter-rater: cranial motion F | ICC (2,1) | inter: X-Y: -0.33; X-Z: -0.60; Y-Z: 0.49; X-Y-Z: -0.2 |
| Norton [26] | N = 9; healthy; A = 22–28 | N = 6; MD-DO; “extensive training and experience in cranial osteopathy” | Intra-rater of: (1) flexion duration of the CR; (2) duration of cranial cycles (seconds) • Inter-rater: CR-F (cpm) | Pearson product-moment correlation coefficient | intra: (1): missing data; (2) • inter: -0.32 to -0.28 |
| Hanten [27] | N = 30; any disease or trauma about the skull or spine; A = 22–54 | N = 2 (X & Y); PT students; 11 months | Intra & inter-rater: CR-F | ICC (1,1) | intra: X: 0.78; Y: 0.83 • inter: 0.22 |
| Rogers [28] | N = 28; healthy; A = 18–48 | N = 2 (X & Y); one PT & one RN trained in CST; 5 & 17 yrs respectively | Intra & inter-rater: CR-F to the head and feet | ICC (2,1) | intra: X: 0.18 for head and 0.30 for feet; Y: 0.27 for head and 0.29 for feet • inter: 0.08 (head) and 0.19 (feet) |
| Vivian [29] | N = 48; not reported; some subjects could have chronic or recurrent pain; A = 7–63 | N = 2; DO; 12 & 15 yrs | Inter-rater of: (1) presence of a partially flexion-restricted motion of the skull; (2) presence of a total flexion-restriction motion of the skull | Cohen’s kappa | inter: (1): -0.02; (2): -0.09 |
| Moran [30] | N = 11; healthy; A = 18–44 | N = 2 (“X” & “Y”); DO; 4.5 & 6.5 yrs | Intra & inter-rater of: CRI-F to the head and/or sacrum | ICC (2,1) | intra: X: 0.65 for sacrum, 0.47 for head; Y: 0.52 for sacrum, 0.73 for head • inter: 0.0 for X and Y to the head and 0.05 for X and Y to the sacrum |
| Sommerfeld [31] | N = 49; healthy; A = 19–61 | N = 2; DO; 7 yrs | Intra & inter-rater of: PRM frequency; PRM flexion-stage duration; ratio of the flexion-stage and the extension-stage duration of the PRM | 95% limits of agreement (visual representation) | intra & inter-examiner agreement could not be described beyond chance agreement |
| Halma [32] | N = 48; 16 asthma, 17 headache, 15 healthy; A = 18–75 | N = 2; MD-DO; 14 & 6 yrs | Intra-rater of: (1) CRI-F; (2) cranial strain patterns; (3) quadrants of restriction with 4 modalities | Cohen’s kappa with 95% confidence intervals | intra: (1): 0.23; (2): 0.67; (3): from 0.33 to 0.52 according to modalities |

Note: “N”: number; “A”: age; “DO”: doctor of osteopathy; “PT”: physical therapist; “RN”: registered nurse; “CST”: craniosacral therapy; “CRI”: cranial rhythmic impulse; “F”: frequency; “CR”: cranial rhythm; “R”: rater; “PRM”: primary respiratory mechanism; “ICC”: intraclass correlation coefficient

Two studies presented results deemed unusable due to critical errors in data presentation or statistical analysis, irrespective of their reliability findings. The study by Upledger et al. [24] was noted to have significant methodological flaws, including selective reporting and miscalculated reliability statistics. Similarly, Sommerfeld et al. [31] did not include a Bland & Altman graph as planned, hindering proper interpretation of agreement.

Critical appraisal of the nine included studies revealed a high risk of bias in eight studies and major doubt in one [32] (Figs 2 and 3). The primary source of high bias was the lack of examiner blinding. Notably, even the study with “major doubt” [32] reported unreliable diagnostic outcomes based on the defined criteria.

Fig 2. Assessment of methodological risk of bias for each of the included reliability studies.

Green indicates a low risk of bias, yellow an unclear risk of bias and red a high risk. Grey indicates non-applicable items. For the overall assessment of bias, purple indicates major doubt as to the overall risk of bias.

Fig 3. Assessment of methodological risk of bias for the reliability studies taken together.

Green indicates a low risk of bias, yellow an unclear risk of bias and red a high risk. Grey indicates non-applicable items. For the overall assessment of bias, purple indicates major doubt as to the overall risk of bias.

Efficacy Studies

The initial database search for efficacy studies identified 556 articles, with 12 meeting the inclusion criteria (Fig 4). The complementary search added 14 articles, with two additional studies meeting inclusion. Table 2 provides a detailed summary of these 14 efficacy studies.

Fig 4. Selection process for studies dealing with the clinical efficacy of techniques and therapeutic strategies used in cranial osteopathy.

Table 2. Description of included studies dealing with the clinical efficacy of techniques used in osteopathy in the cranial field.

| Overall risk of bias | First author | Disease & number of participants | Intervention and comparison | Primary study outcome & result | Other outcomes & results |
|---|---|---|---|---|---|
| Low risk of bias | Elden [42] | Pelvic girdle pain: 123 | EG “standard treatment + craniosacral therapy” / “standard treatment” G | Sick leaves & pain intensity in the morning and the evening. Results show SSD but no MCID for pain intensity in the morning in favour of EG (VAS increased by 7mm in CG and decreased by 0.5mm in EG after treatment) | |
| Low risk of bias | Haller [44] | Non specific neck pain: 54 | EG “craniosacral therapy protocol” / PG “sham of the craniosacral therapy protocol” | Pain intensity. Results show SSD & MCID 3 months after treatment for the pain intensity in favour of EG (-21 and -16.8mm after 8 and 20 weeks, respectively) | 32 criteria (16 outcomes immediately after treatment and at 3 months). 1. Pain on movement (Pain on Movement Questionnaire). 2. Maximum pressure pain sensitivity: point of maximum pain, levator scapulae, trapezius and semispinalis capitis muscles (algometer). 3. Functional disability (Neck Disability Index). 4. Quality of life (through two subscales of the 12-item Short Form Health Survey). 5. Well-being (16-item Questionnaire for Assessing Subjective Physical Well-being). 6. Anxiety and depression (Hospital Anxiety and Depression Scale). 7. Stress perception (Perceived Stress Questionnaire). 8. Pain acceptance (through the subscale “Positive life construction Scale” of the Emotional/Rational Disease Acceptance Questionnaire). 9. Body connection (through two subscales “body awareness” and “body dissociation” of the Scale of Body Connection). 10. Global impression of improvement (Patients’ ratings of their Global Impression of Improvement). • Results show SSD to 7 outcomes after treatment and 5 outcomes at three months in favor of the EG group**. |
| Low risk of bias | Castro-Sànchez [45] | Non specific low back pain: 64 | EG “craniosacral therapy” / CG “Classic massage” | Roland Morris Disability Questionnaire. Results show no SSD | 32 criteria (16 outcomes immediately after treatment and one month later). 1. Low back pain disability (Oswestry Low Back Pain Disability Index). 2. Pain intensity (10-point numeric pain rating scale). 3. Kinesiophobia (Tampa Scale of Kinesiophobia). 4. Hemoglobin oxygen saturation, systolic blood pressure, diastolic blood pressure and hemodynamic (cardiac index) (Electro Interstitial Scanner). 5. Interstitial levels of sodium, serum potassium, chloride, phosphate, ionized or free calcium, magnesium and lactic acid (Electro Interstitial Scanner). 6. Isometric endurance of trunk flexor muscles (McQuade test). 7. Lumbar mobility in flexion (finger-to-floor distance). • Results show SSD to 6 outcomes after treatment and 3 outcomes one month later in favour of CST group**. |
| Major doubt on risk of bias | Hanten [33] | Tension-type headache: 60 | EG “CV-4 technique as described by Upledger and Vredevoogd” / untreated G(1) / resting position G(2) | None | |
| Major doubt on risk of bias | Hayden [34] | Infantile colic: 28 | EG “standard cranial osteopathic techniques” / untreated G | None | 3 criteria immediately after treatment: crying and sleeping daily durations (parent reporting) and duration of parent holding and rocking (parent reporting). Results show SSD for all criteria (daily amount of crying: −1h, sleeping time: −1.17h and holding or rocking time: −0.7h) after treatment in favor of the EG. |
| Major doubt on risk of bias | Nourbakhsh [36] | Lateral epicondylitis: 23 | EG “The OEMT [Oscillating-energy Manual Therapy] (V-spread) was administered based on the standard method described in many osteopathic texts.” / PG | None | 7 criteria (4 outcomes immediately after treatment and 3 outcomes at 6 months). 1. Grip strength (Jamar dynamometer). 2. Functional level (Patient-Specific Functional Scale). 3. Pain intensity (11-point scale). 4. Pain limited activity (11-point scale). Results show differences immediately after treatment for grip strength (PG: -1.9; EG: +12.3), functional level (PG: +4.7; EG: +14.5), pain intensity (PG: -0.5; EG: -3.1) and pain limited activity (PG: -0.1; EG: +3.3) and no SSD at 6 months for functional level, pain intensity and pain limited activity. |
| Major doubt on risk of bias | Sandhouse [37] | Myopia & hyperopia: 29 | EG “The specific OMT technique performed was balanced membranous tension” / PG | None | 12 criteria immediately after treatment. 1. Presence of a cranial dysfunction (manual evaluation). 2. Visual acuity (right and left) (distance visual acuity testing). 3. Accommodation amplitude (right and left) (Donder push-up testing). 4. Stereoscopic visual acuity (local stereoacuity testing). 5. Pupillary size (right and left; under bright light and dim light) (pupillary testing). 6. Ocular deviation (cover test with prism neutralization). 7. Near point of convergence (break point and record point) (distance in cm). Results show SSD for 1 criterion out of 12 between EG and PG: the left pupillary size measured under bright light, with +0.13mm and -0.40mm of difference between after and before intervention for EG and PG, respectively. |
| Major doubt on risk of bias | Castro-Sánchez [38] | Fibromyalgia: 92 | EG “a craniosacral therapy protocol” / PG | None | 75 criteria (25 outcomes immediately after treatment and 2 months and 1 year later). 1. Body composition (extracellular, cellular and lean mass analysed with bioelectrical impedance). 2. Pain at 18 tender point sites (pressure algometer). 3. Heart rate, temporal standard deviation of RR segments (HRV) and root mean square deviation of HRV index (Holter). 4. Clinical global impression of improvement (7-level Likert scale). • Results: number of criteria above 20*. |
| Major doubt on risk of bias | Matarán-Peñarrocha [39] | Fibromyalgia: 104 | EG “a craniosacral therapy protocol” / PG | None | 54 criteria (18 outcomes immediately after treatment, 6 months and 1 year later). 1. Pain intensity (VAS). 2. Quality of life (SF-36: one outcome for each of the 8 questionnaire sections, thus 8 outcomes). 3. Sleep quality (Pittsburgh Sleep Quality Index: one outcome for each of the 6 questionnaire sections, thus 6 outcomes). 4. Depression state (Beck depression inventory). 5. Trait anxiety and state anxiety (2 outcomes) (State Trait Anxiety Inventory). • Results: number of criteria above 20*. |
| Major doubt on risk of bias | Amrovabady [40] | Attention deficit hyperactivity disorder: 24 | EG “Craniosacral therapy” / standard treatment G | None | 10 criteria immediately after treatment [the Conners Parents Rating Scale 48-question version (divided into 5 sub-outcomes) and the Child Symptoms Inventory-4th (divided into 5 sub-outcomes)]. Results show SSD for all results in favor of the EG with, for instance, a total CPRS difference of +0.58 in the standard treatment G vs. +7.5 in the EG. |
| Major doubt on risk of bias | Árnadóttir [41] | Migraine: 20 | Cross-over range on 12 wks with 2 G (“Upledger Craniosacral therapy” vs. no treatment) | HIT-6 questionnaire. Results show SSD immediately after treatment (effect size: 0.48) and 1 mo after treatment (effect size: 0.43). | None |
| Major doubt on risk of bias | Bialoszewski [43] | Non specific low back pain: 55 | EG “Craniosacral therapy” / trigger point therapy G | None | 8 criteria immediately after treatment. 1. Pain severity (VAS). 2. Pain intensity (modified Laitinen questionnaire). 3. Pain frequency (modified Laitinen questionnaire). 4. Analgesic use (modified Laitinen questionnaire). 5. Functional impact of pain (modified Laitinen questionnaire). 6. Lumbosacral mobility (Schober test). 7. Resting tension of the multifidus muscle (right and left) (electromyography). Results show no SSD after treatment for all criteria. |
| High risk of bias | Mehl-Madrona [35] | Chronic asthma: 89 | CST G “standard craniosacral therapy treatments in accordance with the protocol taught at the Upledger Institute in Michigan” / acu G / CST + acu G / PG / waiting list | None | |
| High risk of bias | Raith [46] | Preterm infants: 30 | EG “standardised craniosacral therapy” / standard treatment G | General Movement Assessment. Results show no SSD. | 1 criterion immediately after treatment (General Movement Optimality Score). Results show no SSD. |

Legend. EG: experimental group; CG: control group; G: group; SSD: statistically significant difference; CST: craniosacral therapy; acu: acupuncture; PG: placebo group; VAS: visual analogue scale; MCID: minimal clinically important difference.

*Considering the risk of inflated alpha value and for sake of clarity, the results of the studies that both had not chosen primary study outcomes and had used more than 20 criteria were not reported.

** No detail is given for sake of clarity.

Of the 14 efficacy studies, two were rated as high risk of bias [35,46], nine with major doubt [33,34,36–41,43], and only three as low risk of bias [42,44,45] (Figs 5 and 6). Common sources of bias included the absence of a clear primary outcome, lack of correction for inflated alpha values in multiple outcome measures, unclear or absent blinding methods, and no interpretation of clinical relevance.
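To illustrate why uncorrected multiple outcomes matter, the sketch below estimates the chance of at least one false-positive result across several tests. It assumes the tests are independent and each uses a significance level of 0.05, which is a simplification since outcomes within a trial are usually correlated; the function names are hypothetical.

```python
def familywise_error_rate(n_tests, alpha=0.05):
    """Probability of at least one false positive among n independent tests."""
    return 1 - (1 - alpha) ** n_tests


def bonferroni_alpha(n_tests, alpha=0.05):
    """Per-test significance level under a Bonferroni correction."""
    return alpha / n_tests


# With 20 criteria and no correction, a false positive is more likely than not.
print(f"{familywise_error_rate(20):.2f}")  # 0.64
print(f"{bonferroni_alpha(20):.4f}")       # 0.0025
```

Under these assumptions, a trial reporting 20 criteria without a designated primary outcome has roughly a 64% chance of showing at least one significant difference by chance alone.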

Fig 5. Assessment of methodological risk of bias for each efficacy study included.

Green indicates a low risk of bias, yellow an unclear risk of bias and red a high risk. Grey indicates non-applicable items. For the general assessment of bias, purple shading indicates a major doubt as to the overall risk of bias.

Fig 6. Assessment of methodological risk of bias for the efficacy studies taken together.

Green shading indicates a low risk of bias, yellow an unclear risk of bias and red a high risk. Grey shading indicates non-applicable items. For the general assessment of bias, purple shading indicates a major doubt as to the overall risk of bias.

Discussion

This systematic review aimed to critically evaluate the scientific literature on the diagnostic reliability and therapeutic efficacy of cranial osteopathy. The findings reveal a consistent lack of robust evidence supporting both the reliability of cranial OMM diagnosis and the efficacy of cranial osteopathy treatments.

Diagnostic Reliability in Cranial Osteopathy

Our analysis of nine reliability studies revealed that eight had a high risk of bias, and one had major doubts regarding bias [32]. Notably, even the study with major doubts reported unreliable results. Across inter-rater and intra-rater reliability assessments, no study demonstrated satisfactory reliability for any of the investigated diagnostic parameters within cranial osteopathy. These results strongly suggest that diagnostic procedures in cranial osteopathy lack the consistency and reproducibility expected of reliable clinical tools.

Therapeutic Efficacy of Cranial Osteopathy

Regarding therapeutic efficacy, of the 14 studies reviewed, only three were classified as having a low risk of bias 42,44,45. The remaining studies were either rated as high risk or having major doubts concerning bias. Given the methodological limitations and potential for bias in the majority of studies, we focused our discussion on the three studies with low risk of bias to assess the highest quality evidence available.

The study by Elden et al. 42 investigated craniosacral therapy as an adjunct to standard treatment for pelvic girdle pain in pregnancy. While statistically significant differences were observed for some outcomes, the clinical relevance was questionable, and the study design introduced potential confounding factors, such as differential practitioner contact between groups. These factors make it difficult to attribute observed effects specifically to cranial osteopathy techniques versus non-specific or contextual effects.

Haller et al. 44 compared craniosacral therapy to sham treatment for chronic non-specific neck pain. This study reported statistically and clinically relevant improvements in pain intensity and several secondary outcomes favoring craniosacral therapy. However, limitations, such as multiple outcome measures without alpha correction and potential practitioner effects, warrant cautious interpretation. It remains unclear whether the observed benefits are due to the specific techniques of cranial osteopathy or broader contextual factors.

Castro-Sánchez et al. 45 compared craniosacral therapy to classic massage for low back pain. While some secondary outcomes favored craniosacral therapy, no significant difference was found for the primary outcome of disability. Furthermore, methodological issues, such as unequal treatment duration between groups, limit the strength of conclusions regarding the specific efficacy of craniosacral therapy in this context.

Across these three studies with low risk of bias, alternative interpretations of positive findings, such as non-specific treatment effects or contextual factors, cannot be excluded. This consistent pattern across the highest quality studies suggests that current evidence does not robustly support the specific therapeutic efficacy of cranial osteopathy techniques.

Implications for Research and Practice

The pervasive high risk of bias across studies in cranial osteopathy highlights the need for improved methodological rigor in future research. Many studies suffered from unclear reporting, leading to unclear risk of bias ratings. Researchers should prioritize detailed methodological descriptions, even within journal length constraints, to enhance transparency and facilitate bias assessment. Publishing in journals with fewer length restrictions may also be beneficial.

For reliability studies in cranial osteopathy diagnosis, future research should adopt the modified QAREL criteria used in this review, paying particular attention to examiner expertise and blinding procedures. Simultaneous examiner evaluations, as employed in some included studies 28,30,31, offer a robust methodological approach to minimize information exchange between examiners. Rigorous blinding of examiners and subjects to tactile, visual, auditory, and olfactory cues, as demonstrated by Halma et al. 32, is also critical.

Efficacy studies should adhere to the Cochrane Risk of Bias tool and the CONSORT guidelines to ensure methodological quality. Addressing the inherent challenges of blinding in manual therapy research requires careful consideration of placebo controls and standardization of treatment contexts. Future research should focus on clearly defined primary outcomes, avoid multiple comparisons without appropriate statistical corrections, and prioritize objective outcome measures where feasible. Evaluating placebo credibility and standardizing treatment contexts across groups are essential steps to differentiate specific from non-specific treatment effects.
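The point about multiple comparisons can be illustrated with a short sketch of the Holm-Bonferroni step-down procedure, one of several standard ways to control the family-wise error rate. This is an assumption chosen for illustration, not a method prescribed by the review; the function name `holm_bonferroni` and the p-values are hypothetical.

```python
def holm_bonferroni(p_values, alpha=0.05):
    """Return one boolean per p-value: True if rejected under Holm's step-down procedure."""
    m = len(p_values)
    # Visit hypotheses from the smallest p-value to the largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    rejected = [False] * m
    for rank, idx in enumerate(order):
        # The threshold loosens at each step: alpha/m, alpha/(m-1), ..., alpha.
        if p_values[idx] <= alpha / (m - rank):
            rejected[idx] = True
        else:
            break  # Once one test fails, all larger p-values are retained.
    return rejected


# Hypothetical p-values for five outcome measures in a single trial.
p_values = [0.004, 0.030, 0.020, 0.045, 0.300]

uncorrected = [p < 0.05 for p in p_values]
corrected = holm_bonferroni(p_values)
print("significant without correction:", sum(uncorrected))
print("significant after Holm correction:", sum(corrected))
```

With these invented values, four of the five outcomes fall below 0.05 when tested individually, but only one survives the correction. This is why a trial reporting many uncorrected outcomes can appear more favorable than its data support.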

Conclusion

This systematic review reveals a critical absence of evidence to support the reliability of cranial osteopathy diagnosis and the specific clinical efficacy of cranial osteopathy techniques and therapeutic strategies. The majority of studies are compromised by a high risk of bias, and even the methodologically strongest trials do not provide compelling evidence of specific therapeutic benefits beyond what might be attributed to non-specific or contextual effects. Consistent with previous reviews, our findings underscore the lack of a robust scientific foundation for cranial osteopathy as a diagnostic or therapeutic modality. Currently, evidence is insufficient to support the use of cranial osteopathy in patient care. Further high-quality research that rigorously addresses these methodological limitations is essential to determine the potential role, if any, of cranial osteopathy within evidence-based healthcare.

Acknowledgments

We thank Dr. Alison Foote from the “Publication in English” service of Grenoble-Alpes University Hospital for critically editing the manuscript.

Data Availability

All relevant data are within the paper.

Funding Statement

This study was supported by the French national council of physiotherapists (Conseil National de l’Ordre des Masseurs Kinésithérapeutes, CNOMK). The sponsor had no influence or editorial control over the content of the study.
