UK Biobank: Current status and what it means for epidemiology


Health Policy and Technology (2012) 1, 123–126

Available online at www.sciencedirect.com

journal homepage: www.elsevier.com/locate/hlpt

Naomi Allen (a,b,*), Cathie Sudlow (a,c), Paul Downey (a), Tim Peakman (a), John Danesh (d), Paul Elliott (e), John Gallacher (f), Jane Green (g), Paul Matthews (h), Jill Pell (i), Tim Sprosen (j), Rory Collins (a,b), on behalf of UK Biobank (1)

(a) UK Biobank, Adswood, Stockport, UK
(b) Clinical Trial Service Unit and Epidemiological Studies Unit, University of Oxford, UK
(c) Division of Clinical Neurosciences, University of Edinburgh, Edinburgh, UK
(d) Cardiovascular Epidemiology Unit, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK
(e) MRC-HPA Centre for Environment and Health, Imperial College London, UK
(f) Department of Primary Care & Public Health, Neuadd Meirionnydd, Heath Park, Cardiff, UK
(g) Cancer Epidemiology Unit, University of Oxford, UK
(h) Department of Medicine, Division of Brain Sciences, Imperial College London, UK
(i) Institute of Health and Wellbeing, University of Glasgow, UK
(j) School of Public Health, Imperial College London, UK

Available online 3 August 2012

Abstract

UK Biobank is a very large prospective study which aims to provide a resource for the investigation of the genetic, environmental and lifestyle determinants of a wide range of diseases of middle age and later life. Between 2006 and 2010, over 500,000 men and women aged 40 to 69 years were recruited and extensive data on participants’ lifestyles, environment, medical history and physical measures, along with biological samples, were collected. The health of the participants is now being followed long-term, principally through linkage to a wide range of health-related records, with validation and characterisation of health-related outcomes. Further enhancements are also underway to improve phenotype characterisation, including internet-based dietary assessment, biomarker measurements on the baseline blood samples and, in sub-samples of the cohort, physical activity monitoring and proposals for extensive brain and body imaging. UK Biobank is now available for use by all researchers, without exclusive or preferential access, for any health-related research that is in the public interest. The open-access nature of the resource will allow researchers from around the world to conduct research that leads to better strategies for the prevention, diagnosis and treatment of a wide range of life-threatening and disabling conditions.

© 2012 Fellowship of Postgraduate Medicine. Published by Elsevier Ltd. All rights reserved.

(*) Corresponding author at: Clinical Trial Service Unit and Epidemiological Studies Unit, University of Oxford, Roosevelt Drive, Oxford, OX3 7LF, UK. Tel.: +44 1 865 743805; fax: +44 1 865 743985. E-mail address: [email protected] (N. Allen).

(1) Request for reprints: UK Biobank Coordinating Centre, 1 & 2 Spectrum Way, Adswood, Stockport, Cheshire, SK3 0SA, UK. Tel.: +44 1 61 475 5360; fax: +44 1 61 475 5361. E-mail: [email protected].

2211-8837/$ - see front matter © 2012 Fellowship of Postgraduate Medicine. Published by Elsevier Ltd. All rights reserved. http://dx.doi.org/10.1016/j.hlpt.2012.07.003


Introduction

Large cohorts with stored biological samples are crucial for understanding the determinants of complex disease. In a prescient move over a decade ago, the Medical Research Council and Wellcome Trust decided to establish UK Biobank, a large population-based prospective cohort with extensive and reliable measurement of a wide range of exposures, along with rigorous follow-up of health outcomes, to allow detailed investigation of the genetic and environmental determinants of a wide range of diseases of middle and old age [1,2]. This article provides an overview of the rationale behind UK Biobank, its current status and future developments of the cohort.

Setting the standard for modern population-based epidemiology

Understanding the determinants of common life-threatening and disabling diseases is challenging. Such conditions are typically caused by a variety of different exposures which may each have modest effects and interact with each other in complex ways [3,4]. In order to investigate a wide range of exposures, extensive information needs to be collected through questionnaires and physical measures, as well as through storing biological samples that allow many different types of assay to be performed (e.g., genetic, proteomic, metabonomic, biochemical).

Prospective cohorts have a number of advantages for assessing the combined effects of lifestyle, environment, genes and other exposures on a variety of health outcomes [4,5]. In particular, exposures can be assessed before they are affected by disease or its treatment (thereby avoiding recall bias and minimising reverse causation bias). In addition, the prospective nature of the study means that a wide range of conditions can be investigated, including those that are difficult, if not impossible, to study retrospectively (e.g., dementia and rapidly fatal conditions). Moreover, the overall beneficial and adverse effects of a specific exposure on the life-time risks of multiple health outcomes can be considered (e.g., associations of obesity with different causes of death [6]). However, because only a small proportion of the participants will develop any one condition and the effects of different exposures on the development of that condition are likely to be modest, prospective studies need to be large, with many tens of thousands of participants [3].

While prospective studies are crucial for the reliable identification and quantification of risk factors for disease, they require substantial long-term investment and typically collect either a large amount of data on a small number of participants (e.g. the Framingham Heart Study, with a wide range of physical measures on 5000 participants [7]), or a relatively small amount of data on a large number of participants (e.g. the Million Women Study, with questionnaire data on 1.3 million women [8]). Other prospective studies have focused on the assessment of certain types of exposure on specific outcomes (e.g. diet and cancer in the EPIC study of 500,000 people in Europe [9]). Because UK Biobank collected extensive baseline questionnaire data, physical measures, and biological samples on 0.5 million participants, who are now being exhaustively followed up, it is a rich resource for investigating why some people develop particular diseases while others do not.

Recruitment and follow-up of participants

UK Biobank aimed to be as inclusive as reasonably possible, with all people aged 40–69 years who were registered with the National Health Service and living up to about 25 miles from one of the 22 study assessment centres invited to participate. Overall, about 9.2 million invitations were mailed in order to recruit 503,325 participants (i.e. a response rate of 5.47%). Regardless of the participation rate, generalisable associations between baseline characteristics and subsequent health outcomes can be made as long as there are sufficiently large numbers of participants with different levels of the risk factors under investigation. Successful recruitment of 0.5 million participants during 2006–2010 was largely achieved through extensive piloting and establishment of highly efficient, centralised and bespoke processes, with assistance from UK Biobank’s extensive academic collaborative network. Volunteers who attended an assessment centre gave informed consent and completed questionnaires on their lifestyle, environment and medical history, had a wide range of physical measures performed and had samples of blood, urine and saliva collected [10] (Table 1).

Table 1. Data collected at baseline.

Type of measure | Topic area | Details
Questionnaire (touch-screen and verbal interview) | Socio-demographics | Employment status, marital status, education, income, car ownership, ethnicity
Questionnaire | Family history and early life exposures | Family history of major diseases, birth place, birth weight, breastfeeding, maternal smoking, childhood body size
Questionnaire | Psychosocial factors | Mental health, social support
Questionnaire | Environmental factors | Current address, occupation, housing, domestic heating and cooking fuel, means of travel, shift work, mobile phone use
Questionnaire | Lifestyle | Smoking, alcohol consumption, physical activity, diet, sleep
Questionnaire | Health status | Medical history, medications, operations, hearing, sight, sexual and reproductive history
Questionnaire | Hearing threshold | Hearing test
Questionnaire | Cognitive function | Tests for episodic and numeric memory, reaction time, fluid intelligence, prospective memory
Physical measures | Blood pressure, heart rate | Two automated measures one minute apart
Physical measures | Hand grip | Left and right hand grip strength
Physical measures | Anthropometry | Height, weight and bioimpedance, hip/waist circumference
Physical measures | Spirometry | Lung function tests
Physical measures | Bone density | Left and right heel calcaneal ultrasound
Physical measures | Arterial stiffness (a) | Pulse wave velocity
Physical measures | Fitness test (a) | Cycle ergometry with ECG monitoring
Physical measures | Eye examination (a) | Refractive index, intraocular pressure, visual acuity, optical coherence tomography
Biological samples | Blood | Plasma, serum, buffy coat, red cells, DMSO blood, RNA
Biological samples | Urine |
Biological samples | Saliva (a) |

(a) Measures introduced towards the end of recruitment and available for 70,000 to 120,000 participants.

Some lifestyle factors, such as diet and physical activity, are notoriously difficult to measure reliably using questionnaires. For this reason, UK Biobank aims to conduct more detailed phenotyping in subsamples of the cohort in order to calibrate the baseline measures. For example, a series of web-based 24-h dietary questionnaires will supplement the dietary data collected at the assessment clinic [11] and mailed tri-axial accelerometers will supplement the questionnaire data on physical activity (and provide more reliable assessment of other aspects of normal daily living, such as sleep patterns).

Other enhancements being planned include baseline blood measurements on the entire cohort of biomarkers known to be relevant for disease (e.g., lipids for cardiovascular disease), of high diagnostic value (e.g., HbA1c for diabetes) or that characterise phenotypes not otherwise well assessed (e.g. liver and kidney function measures). UK Biobank has also submitted a proposal for funding to perform a range of imaging modalities (e.g. magnetic resonance imaging of brain and body, carotid ultrasound and dual-energy X-ray absorptiometry body scans) in up to 100,000 of the participants. In addition, since measurement error in risk factor levels (due to short-term biological variability or to longer-term within-person fluctuations) may lead to substantial underestimation of the true aetiological associations that exist (i.e. regression dilution bias [12]), a repeat of the baseline assessment visit will be conducted every few years in subsets of 20–25,000 participants.

The value of UK Biobank depends not only on its ability to obtain detailed baseline data and biological samples but also on achieving detailed follow-up of the health of participants, which is made possible through linkage to routine data available from the UK National Health Service (e.g. mortality, cancer registrations, hospital admissions, primary care data, etc.). Information will also be sought directly from participants about conditions that are typically under-reported (e.g., cognitive decline, depression). Many cohort studies have not been in a position to characterise well the wide range of health outcomes identified during follow-up, leading to a loss of statistical power caused by misclassification of cases and/or the grouping together of disparate subtypes. UK Biobank is therefore undertaking substantial efforts to ascertain, confirm and characterise the most common health outcomes, both for prevalent and incident disease. This will involve cross-referencing diagnoses via multiple sources of information, starting with the use of lower-cost electronic sources and followed by the use of more resource-intensive methods for their confirmation and further sub-phenotyping, thereby enabling researchers to focus on particular sub-types of disease.
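The regression dilution bias mentioned in this section [12] can be illustrated with a small simulation. The sketch below is not UK Biobank's analysis method; it is a minimal illustration under stated assumptions: a normally distributed "usual" risk factor level, independent normal measurement error at each visit, a linear outcome, and a repeat measurement in a subsample. All function names, sample sizes and parameter values are hypothetical. It shows how the naive slope is attenuated and how one common correction, dividing by the slope of the repeat measure on the baseline measure (the regression dilution ratio), recovers the underlying association.

```python
import random


def ols_slope(x, y):
    """Least-squares slope of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


def simulate_regression_dilution(n=40_000, n_repeat=10_000,
                                 true_slope=0.5, error_sd=1.0, seed=1):
    """Return (naive slope, regression dilution ratio, corrected slope).

    All parameter values are illustrative assumptions, not study values.
    """
    rng = random.Random(seed)
    # Unobserved long-term ("usual") level of the risk factor.
    usual = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # Outcome depends on the usual level, not on any single measurement.
    outcome = [true_slope * u + rng.gauss(0.0, 1.0) for u in usual]
    # Baseline measurement = usual level + within-person error.
    baseline = [u + rng.gauss(0.0, error_sd) for u in usual]
    # Repeat measurement taken in a subsample, with independent error.
    repeat = [u + rng.gauss(0.0, error_sd) for u in usual[:n_repeat]]

    naive = ols_slope(baseline, outcome)           # attenuated towards zero
    ratio = ols_slope(baseline[:n_repeat], repeat) # regression dilution ratio
    return naive, ratio, naive / ratio


if __name__ == "__main__":
    naive, ratio, corrected = simulate_regression_dilution()
    print(f"naive slope:     {naive:.3f}")
    print(f"dilution ratio:  {ratio:.3f}")
    print(f"corrected slope: {corrected:.3f}")
```

With equal variance for the usual level and the measurement error, as assumed here, the theoretical dilution ratio is 0.5, so the naive slope should be close to half the true slope of 0.5 and the corrected slope close to the true value, up to sampling error.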

Opportunities for the future

Opportunities now exist for research based on prevalent disease (e.g., there are 24,000 participants with self-reported diabetes and 11,000 with breast cancer) and other information recorded at baseline. Over the next few years, large-scale research will be possible on incident cases of some of the more common conditions (e.g. diabetes mellitus, coronary heart disease, chronic obstructive pulmonary disease and breast cancer). Beyond the fifteenth year of follow-up (i.e. after 2020), UK Biobank will become sufficiently mature to allow reliable investigation of an increasingly wide range of conditions (Table 2).
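As a rough illustration of how the cohort matures, the sketch below takes the projected incident case counts from Table 2 and reports the first projection year in which each condition reaches a chosen number of cases. The threshold of 5,000 cases is an arbitrary assumption made for illustration, not a figure from the article, and the function and variable names are hypothetical.

```python
# Projected incident cases, transcribed from Table 2 (rounded estimates).
# The 2022 value for myocardial infarction and coronary death is taken as 28,000.
PROJECTION_YEARS = (2012, 2017, 2022, 2027)
PROJECTED_CASES = {
    "Diabetes": (10_000, 24_000, 40_000, 68_000),
    "Myocardial infarction and coronary death": (7_000, 17_000, 28_000, 47_000),
    "Stroke": (2_000, 5_000, 9_000, 20_000),
    "COPD": (3_000, 8_000, 14_000, 25_000),
    "Breast cancer": (3_000, 6_000, 10_000, 16_000),
    "Colorectal cancer": (1_000, 4_000, 7_000, 14_000),
    "Prostate cancer": (1_000, 4_000, 7_000, 14_000),
    "Lung cancer": (1_000, 2_000, 4_000, 8_000),
    "Hip fracture": (1_000, 3_000, 6_000, 17_000),
    "Rheumatoid arthritis": (1_000, 2_000, 3_000, 5_000),
    "Alzheimer's disease": (1_000, 3_000, 9_000, 30_000),
    "Parkinson's disease": (1_000, 3_000, 6_000, 14_000),
}


def first_year_reaching_threshold(threshold):
    """Map each condition to the first projection year with at least
    `threshold` projected cases, or None if no listed year reaches it."""
    result = {}
    for condition, cases in PROJECTED_CASES.items():
        result[condition] = next(
            (year for year, n in zip(PROJECTION_YEARS, cases) if n >= threshold),
            None,
        )
    return result


if __name__ == "__main__":
    # 5,000 cases is a hypothetical threshold chosen only for illustration.
    for condition, year in first_year_reaching_threshold(5_000).items():
        print(f"{condition}: {year if year is not None else 'not by 2027'}")
```

Because the projections are given only at five-year intervals, the result is a coarse guide; the appropriate number of cases for any given analysis depends on the exposure and effect size under study [3].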

Access to the resource

The UK Biobank resource launched in April 2012 and is now available for use by researchers, without exclusive or preferential access, for any health-related research that is in the public interest (http://www.ukbiobank.ac.uk). In order to encourage extensive use of the resource for health research, all bona fide researchers can apply, including those from the academic, charity, public and commercial sectors, both in the UK and internationally. The online application process enables researchers to select data fields specific to their research proposal and is linked to automated systems for sample retrieval. Robust safeguards are in place to help ensure anonymity and confidentiality of participants’ data and samples [13]. UK Biobank is a registered charitable company and, as such, researchers are only required to pay for access to the resource on a cost-recovery basis for their proposed research, the results of which will be incorporated back into the resource so that others can benefit from their findings. The involvement of UK Biobank in international initiatives aimed at data harmonisation across studies (such as DataSHaPER [14] and BBMRI [15]) will also help to improve accessibility and collaboration with other research studies.

Table 2. Estimated numbers of incident cases of various disease outcomes during follow-up in UK Biobank (a).

Condition | By 2012 | By 2017 | By 2022 | By 2027
Diabetes | 10,000 | 24,000 | 40,000 | 68,000
Myocardial infarction and coronary death | 7000 | 17,000 | 28,000 | 47,000
Stroke | 2000 | 5000 | 9000 | 20,000
COPD | 3000 | 8000 | 14,000 | 25,000
Breast cancer | 3000 | 6000 | 10,000 | 16,000
Colorectal cancer | 1000 | 4000 | 7000 | 14,000
Prostate cancer | 1000 | 4000 | 7000 | 14,000
Lung cancer | 1000 | 2000 | 4000 | 8000
Hip fracture | 1000 | 3000 | 6000 | 17,000
Rheumatoid arthritis | 1000 | 2000 | 3000 | 5000
Alzheimer’s disease | 1000 | 3000 | 9000 | 30,000
Parkinson’s disease | 1000 | 3000 | 6000 | 14,000

(a) Based on UK age- and sex-specific rates with adjustment for potential ‘healthy-cohort effects’ and losses to follow-up [2].

Conclusion

UK Biobank has shown that it is possible to establish a large population-based prospective study with a high quality of data collection, both of participants’ baseline characteristics and their subsequent health outcomes. This has been made possible with an emphasis on highly efficient and centralised processes with close collaboration with the academic community. The open-access nature of the resource will allow researchers from around the world to conduct research that leads to better strategies for the prevention, diagnosis and treatment of a wide range of life-threatening and disabling conditions.

Acknowledgements

UK Biobank is funded by the Medical Research Council, Wellcome Trust, Department of Health, British Heart Foundation, Northwest Regional Development Agency, Scottish Government, and Welsh Assembly Government. The views expressed are those of the authors and not necessarily those of the NHS, the NIHR or the Department of Health.

References

[1] Palmer LJ. UK Biobank: bank on it. Lancet 2007;369:1980–2.
[2] UK Biobank. UK Biobank: rationale, design and development of a large-scale prospective resource, http://www.ukbiobank.ac.uk/resources/.
[3] Burton PR, Hansell AL, Fortier I, et al. Size matters: just how big is BIG?: Quantifying realistic sample size requirements for human genome epidemiology. Int J Epidemiol 2009;38:263–73.
[4] Manolio TA, Bailey-Wilson JE, Collins FS. Genes, environment and the value of prospective cohort studies. Nat Rev Genet 2006;7:812–20.
[5] Grimes DA, Schulz KF. Cohort studies: marching towards outcomes. Lancet 2002;359:341–5.
[6] Whitlock G, Lewington S, Sherliker P, et al. Body-mass index and cause-specific mortality in 900,000 adults: collaborative analyses of 57 prospective studies. Lancet 2009;373:1083–96.
[7] Higgins MW. The Framingham Heart Study: review of epidemiological design and data, limitations and prospects. Prog Clin Biol Res 1984;147:51–64.
[8] The Million Women Study Collaborative Group. The Million Women Study: design and characteristics of the study population. Breast Cancer Res 1999;1:73–80.
[9] Riboli E, Kaaks R. The EPIC Project: rationale and study design. European Prospective Investigation into Cancer and Nutrition. Int J Epidemiol 1997;26(Suppl. 1):S6–14.
[10] Elliott P, Peakman TC. The UK Biobank sample handling and storage protocol for the collection, processing and archiving of human blood and urine. Int J Epidemiol 2008;37:234–44.
[11] Liu B, Young H, Crowe FL, et al. Development and evaluation of the Oxford WebQ, a low-cost, web-based method for assessment of previous 24 h dietary intakes in large-scale prospective studies. Public Health Nutr 2011;14:1998–2005.
[12] Clarke R, Shipley M, Lewington S, et al. Underestimation of risk associations due to regression dilution in long-term follow-up of prospective studies. Am J Epidemiol 1999;150:341–53.
[13] UK Biobank. Access Procedures: application and review procedures for access to the UK Biobank Resource, http://www.ukbiobank.ac.uk/resources/.
[14] Fortier I, Burton PR, Robson PJ, et al. Quality, quantity and harmony: the DataSHaPER approach to integrating data across bioclinical studies. Int J Epidemiol 2010;39:1383–93.
[15] BBMRI. Biobanking and Biomolecular Resources Research Infrastructure, http://www.bbmri.eu/.