
Comparing Methods of Analysing Datasets with Small Clusters: Case Studies Using Four Paediatric Datasets

Overview
Date 2009 Jun 16
PMID 19523085
Citations 9
Abstract

Studies of prematurely born infants contain a relatively large percentage of multiple births, so the resulting data have a hierarchical structure with small clusters of size 1, 2 or 3. Ignoring the clustering may lead to incorrect inferences. The aim of this study was to compare statistical methods which can be used to analyse such data: generalised estimating equations, multilevel models, multiple linear regression and logistic regression. Four datasets which differed in total size and in percentage of multiple births (n = 254, multiple 18%; n = 176, multiple 9%; n = 10 098, multiple 3%; n = 1585, multiple 8%) were analysed. With the continuous outcome, two-level models produced similar results in the larger dataset, while generalised least squares multilevel modelling (ML GLS 'xtreg' in Stata) and maximum likelihood multilevel modelling (ML MLE 'xtmixed' in Stata) produced divergent estimates using the smaller dataset. For the dichotomous outcome, most methods, except generalised least squares multilevel modelling (ML GH 'xtlogit' in Stata), gave similar odds ratios and 95% confidence intervals within datasets. For the continuous outcome, our results suggest using multilevel modelling. We conclude that generalised least squares multilevel modelling (ML GLS 'xtreg' in Stata) and maximum likelihood multilevel modelling (ML MLE 'xtmixed' in Stata) should be used with caution when the dataset is small. Where the outcome is dichotomous and there is a relatively large percentage of non-independent data, it is recommended that the non-independence is accounted for in analyses using logistic regression with adjusted standard errors or multilevel modelling. If, however, the dataset has a small percentage of clusters greater than size 1 (e.g. a population dataset of children where there are few multiples) there appears to be less need to adjust for clustering.
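
To illustrate why ignoring small clusters can distort inference, the following minimal Python sketch compares a naive standard error of a mean with a cluster-adjusted (sandwich) standard error when siblings from the same birth share a cluster. It is not the authors' analysis and does not reproduce any of the Stata commands named in the abstract; the function names, the birth weights and the mother identifiers are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from math import sqrt


def naive_se(values):
    """Standard error of the mean, treating every infant as independent."""
    n = len(values)
    mean = sum(values) / n
    sum_sq = sum((v - mean) ** 2 for v in values)
    return sqrt(sum_sq / (n * (n - 1)))


def cluster_robust_se(values, cluster_ids):
    """Cluster-adjusted (sandwich) standard error of the mean.

    Residuals are summed within each cluster before squaring, so
    correlated siblings are not counted as independent information.
    A G / (G - 1) small-sample correction is applied, where G is the
    number of clusters.
    """
    n = len(values)
    mean = sum(values) / n
    cluster_sums = defaultdict(float)
    for value, cluster in zip(values, cluster_ids):
        cluster_sums[cluster] += value - mean
    g = len(cluster_sums)
    variance = (g / (g - 1)) * sum(s ** 2 for s in cluster_sums.values()) / n ** 2
    return sqrt(variance)


# Hypothetical birth weights in grams: two twin pairs, one set of
# triplets and three singletons (clusters of size 1, 2 or 3).
weights = [1200, 1250, 1400, 1450, 900, 950, 1000, 1500, 1100, 1300]
mothers = ["a", "a", "b", "b", "c", "c", "c", "d", "e", "f"]

print("naive SE:           ", round(naive_se(weights), 1))
print("cluster-adjusted SE:", round(cluster_robust_se(weights, mothers), 1))
```

When every cluster has size 1, the two functions return the same value, which is consistent with the abstract's observation that adjustment matters less when few clusters are larger than size 1. When siblings are positively correlated, the cluster-adjusted standard error is typically the larger of the two.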

Citing Articles

Accounting for Twins and Other Multiple Births in Perinatal Studies of Live Births Conducted Using Healthcare Administration Data.

Brown J, Yland J, Williams P, Huybrechts K, Hernandez-Diaz S Epidemiology. 2025; 36(2):165-173.

PMID: 39887117 PMC: 11790255. DOI: 10.1097/EDE.0000000000001809.


Accounting for Twins and Other Multiple Births in Perinatal Studies Conducted Using Healthcare Administration Data.

Brown J, Yland J J, Williams P, Huybrechts K, Hernandez-Diaz S medRxiv. 2024.

PMID: 38343813 PMC: 10854318. DOI: 10.1101/2024.01.23.24301685.


The effect of missing levels of nesting in multilevel analysis.

Park S, Chung Y Genomics Inform. 2022; 20(3):e34.

PMID: 36239111 PMC: 9576476. DOI: 10.5808/gi.22052.


Correlation between neonatal outcomes of twins depends on the outcome: secondary analysis of twelve randomised controlled trials.

Yelland L, Schuit E, Zamora J, Middleton P, Lim A, Nassar A BJOG. 2018; 125(11):1406-1413.

PMID: 29790271 PMC: 8189665. DOI: 10.1111/1471-0528.15292.


Severe sepsis in women with group B Streptococcus in pregnancy: an exploratory UK national case-control study.

Kalin A, Acosta C, Kurinczuk J, Brocklehurst P, Knight M BMJ Open. 2015; 5(10):e007976.

PMID: 26450426 PMC: 4606445. DOI: 10.1136/bmjopen-2015-007976.