
Exome Sequencing of UK Birth Cohorts

Abstract

Birth cohort studies involve repeated surveys of large numbers of individuals from birth and throughout their lives. They collect information useful for a wide range of life-course research domains, as well as biological samples that can be used to derive data from a growing collection of omic technologies. This rich source of longitudinal data, when combined with genomic data, offers the scientific community valuable insights ranging from population genetics to applications across the social sciences. Here we present quality-controlled whole-exome sequencing data from three UK birth cohorts: the Avon Longitudinal Study of Parents and Children (8,436 children and 3,215 parents), the Millennium Cohort Study (7,667 children and 6,925 parents) and Born in Bradford (8,784 children and 2,875 parents). The overall objective of this coordinated effort is to make the resulting high-quality data widely accessible to the global research community in a timely manner. We describe how the datasets were generated and subjected to quality control at the sample, variant and genotype levels. We then present some preliminary analyses to illustrate the quality of the datasets and probe potential sources of bias. We add measures of ultra-rare variant burden to the variables available to researchers working on these cohorts, and show that the exome-wide burden of deleterious protein-truncating variants (PTV burden) is associated with educational attainment and cognitive test scores. The whole-exome sequence data from these birth cohorts (CRAM and VCF files) are available through the European Genome-phenome Archive, and here we provide guidance for their use.
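The abstract does not spell out how the ultra-rare variant burden is computed. As a rough illustration only, the sketch below counts, for each individual, the protein-truncating variants they carry that are seen at most once in the sequenced cohort, then fits a simple least-squares slope of an outcome on that count. The `Variant` record, the consequence labels in `PTV_CONSEQUENCES`, the singleton cutoff (`max_allele_count=1`) and the function names are hypothetical choices for illustration, not the authors' pipeline. A real analysis would also need variant-level filtering, external allele frequencies and covariates such as sex and ancestry.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List

# Assumed protein-truncating consequence labels (Sequence Ontology terms as
# emitted by annotators such as VEP). Illustrative only: the study's own
# definition of a "deleterious PTV" is not given in the abstract.
PTV_CONSEQUENCES = {
    "stop_gained",
    "frameshift_variant",
    "splice_acceptor_variant",
    "splice_donor_variant",
}


@dataclass
class Variant:
    """Hypothetical pre-annotated variant record (not a real file format)."""
    variant_id: str
    consequence: str            # most severe predicted consequence
    cohort_allele_count: int    # alternate alleles observed across the cohort
    carriers: Dict[str, int] = field(default_factory=dict)  # sample -> alt allele count


def ultra_rare_ptv_burden(variants: Iterable[Variant],
                          samples: List[str],
                          max_allele_count: int = 1) -> Dict[str, int]:
    """Count ultra-rare PTVs carried by each sample.

    'Ultra-rare' is assumed here to mean a cohort allele count of at most
    `max_allele_count` (singletons by default).
    """
    burden = {s: 0 for s in samples}
    for v in variants:
        if v.consequence not in PTV_CONSEQUENCES:
            continue  # not protein-truncating under the assumed label set
        if v.cohort_allele_count > max_allele_count:
            continue  # too common to count as ultra-rare
        for sample_id, alt_alleles in v.carriers.items():
            if sample_id in burden and alt_alleles > 0:
                burden[sample_id] += 1  # one per carried variant, not per allele
    return burden


def ols_slope(x: List[float], y: List[float]) -> float:
    """Slope of a simple least-squares fit of y on x (intercept fitted, no covariates)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx


# Toy example with made-up variants and scores (not cohort data).
variants = [
    Variant("v1", "stop_gained", 1, {"A": 1}),
    Variant("v2", "frameshift_variant", 1, {"B": 1}),
    Variant("v3", "missense_variant", 1, {"A": 1}),      # not a PTV
    Variant("v4", "stop_gained", 40, {"A": 1, "C": 1}),  # too common
    Variant("v5", "splice_donor_variant", 1, {"A": 1}),
]
samples = ["A", "B", "C"]
burden = ultra_rare_ptv_burden(variants, samples)
print(burden)  # {'A': 2, 'B': 1, 'C': 0}

scores = [90.0, 95.0, 100.0]  # hypothetical test scores for A, B, C
print(ols_slope([float(burden[s]) for s in samples], scores))  # -5.0
```

Counting variants rather than alleles treats a homozygous carrier the same as a heterozygous one; whether that matches the study's definition is not stated in the abstract.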

Citation

Koko M, Fabian L, Popov I, Eberhardt R, Zakharov G, Huang Q. Exome sequencing of UK birth cohorts. Wellcome Open Res. 2025; 9:390. PMID: 39839975. PMC: 11747307. DOI: 10.12688/wellcomeopenres.22697.2.
