How Will Haematologists Use Proteomics?


Blood Reviews (2007) 21, 315–326

www.elsevierhealth.com/journals/blre

REVIEW

Richard D. Unwin *, Anthony D. Whetton ¹

Stem Cell and Leukaemia Proteomics Laboratory, Faculty of Medical and Human Sciences, University of Manchester, Christie Hospital, Kinnaird House, Kinnaird Road, Withington, Manchester, UK M20 4QL

Summary Proteomics technologies are emerging as useful tools for identifying disease biomarkers and for defining and characterising both normal physiological and disease processes. Many changes in cellular protein expression in response to an external stimulus or a mutation can only be characterised at the proteome level. In these cases, protein expression is often controlled by altered rates of translation and/or degradation, making proteomics an important tool in the analysis of biological systems. In the leukaemias, post-translational modification of proteins (e.g. phosphorylation, acetylation) plays a key role in the molecular pathology of the disease, and such modifications can now be detected with novel proteomic methods. In a clinical setting, serum remains a relatively unmined source of information on prognosis and response to therapy; this protein-rich fluid represents an opportunity for proteomics research to benefit haematologists and others. In this review, we discuss the technologies available for the study of the proteome that offer realistic opportunities in haematology. © 2007 Elsevier Ltd. All rights reserved.

KEYWORDS

Biomarker Discovery; Proteomics; Liquid Chromatography; Mass Spectrometry

* Corresponding author. Tel.: +44 0 161 446 8451.
¹ Tel.: +44 0 161 446 8453.

0268-960X/$ - see front matter © 2007 Elsevier Ltd. All rights reserved. doi:10.1016/j.blre.2007.07.002

 Introduction

The term ‘proteome’, referring to the protein product of the genome, was first coined in 1995.1 Since then the number of ‘-omics’ technologies has proliferated, with the suffix used to describe the systematic collection of large-scale datasets on anything from metabolites (metabolomics), through protein interactions (interactomics), to secreted proteins (the secretome). Over the last three decades, major improvements in several fields have enabled the growth of proteomics: more comprehensive sequencing of the genome, better and faster bioinformatics systems, and new and enhanced methods for protein analysis have all helped to develop proteomics into a powerful suite of analytical tools. Such tools are essential, because the proteome is a complex and dynamic entity. Unlike the relatively static genome, the proteome is controlled by an astonishing number of factors: protein synthesis, degradation and modification can each affect protein localisation, binding partners and function.

Over the last decade in systems biology, there has been a growing trend towards the analysis of mRNA expression using microarray technologies to study changes in gene expression.2 Whilst the sensitivity and coverage afforded by these methods are clearly advantageous, a change in the level of a transcript is just one of many ways in which the proteome can be altered. It was noted ten years ago that absolute levels of mRNA do not correlate with protein levels,3 and several studies have shown that neither absolute levels nor perturbations in the transcriptome and proteome upon a given stimulus are necessarily congruent.4 The application of proteomics technologies is therefore vital for gaining insight into complex biological systems, whether comparing proteomes for biomarker identification,5 studying protein-protein interactions6 or identifying protein post-translational modifications.7

A key area for proteomics is biomarker discovery, in which biological samples are screened to identify protein expression that correlates with disease or prognosis. The most common material for these studies is currently serum/plasma, and the analysis of this fluid is commanding focussed attention to overcome the unique challenges it presents, including a large dynamic range, the presence of highly abundant proteins and the instability of constituent proteins.8

Early proteomic analyses in haematology

Perhaps the earliest systematic analysis of the proteome in haematology was achieved using two-dimensional polyacrylamide gel electrophoresis (2D-PAGE). This technique separates a complex mixture of proteins by two independent parameters: first by isoelectric point (pI; the point in a pH gradient at which the net charge on the protein is zero), and then by mass, using standard SDS-polyacrylamide gel electrophoresis in the second dimension. Proteins are then visualised using a stain, and the spot patterns produced by different samples can be compared to identify differences in their proteomes. 2D gels have been used extensively in haematology. The method is highly adaptable to the type of study the investigator wishes to perform, from screening protein expression in whole cells or tissues to the enrichment and identification of particular protein types and families. Below we describe selected studies that illustrate the range of options this approach allows (Fig. 1).

One of the first studies using 2D-PAGE to analyse haematological samples was published in 1988: whole cell lysates from a range of normal peripheral blood lymphocytes, acute lymphoblastic leukaemia (ALL) and acute myeloid leukaemia (AML) samples were analysed using 2D gels.9 A protein designated p18, since characterised as oncoprotein 18 (Op18; Stathmin10), was found to be overexpressed in both types of leukaemia. Stathmin remains of great interest: it is a key cell cycle regulatory protein11 whose expression or phosphorylation appears to be altered in a number of haematological and other tumour types, including childhood acute leukaemias,12 and upon expression of the oncogene PML/RARα.13 This approach has also been used to analyse platelet proteins. In a study by O’Neill et al., 1102–1256 spots could be detected on gels with a wide pH gradient; however, using a novel approach in which several narrower pH ranges were analysed, almost 2300 protein features were seen on 2D gels of these platelets.14 Above all, this study demonstrates the depth of coverage achievable with 2D-PAGE, and how it can be increased to allow identification of a greater proportion of the proteome. An alternative way to increase proteome coverage is to generate organelle fractions, i.e. to purify membranes, nuclei, mitochondria and so on. This allows larger amounts of low copy number proteins to be loaded onto the gels to enable their detection, and it has the added advantage of revealing changes in protein localisation, not just changes in global expression.
Poirier et al. demonstrated that 960 proteins could be visualised by 2D-PAGE separation of membrane fractions from the lymphoma cell line DG75, allowing identification of 49 proteins whose cell membrane expression altered upon treatment with the chemotherapeutic DNA methyltransferase inhibitor AZC.15 Similarly, Bavelloni et al. used nuclear fractionation coupled to 2D-PAGE to visualise differences induced by expression of phospholipase C beta in Friend erythroleukaemia cells.16
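The first-dimension separation described above depends on each protein's pI. As a rough illustration, a theoretical pI can be estimated from sequence alone by finding the pH at which the Henderson-Hasselbalch net charge is zero. The sketch below assumes one common textbook set of pKa values (published scales differ, and real proteins shift further through modification and folding), so it illustrates the principle rather than predicting spot positions.

```python
"""Estimate a theoretical isoelectric point (pI) from an amino acid sequence."""

# Assumed pKa values (one common textbook set; other scales give pI values
# that differ by a few tenths of a pH unit).
PKA_N_TERMINUS = 9.0
PKA_C_TERMINUS = 2.0
PKA_BASIC = {"K": 10.5, "R": 12.5, "H": 6.0}            # positive when protonated
PKA_ACIDIC = {"D": 3.9, "E": 4.1, "C": 8.3, "Y": 10.1}  # negative when deprotonated


def net_charge(sequence: str, ph: float) -> float:
    """Henderson-Hasselbalch estimate of the net charge of a peptide at a given pH."""
    positive = 1.0 / (1.0 + 10 ** (ph - PKA_N_TERMINUS))
    negative = 1.0 / (1.0 + 10 ** (PKA_C_TERMINUS - ph))
    for residue in sequence.upper():
        if residue in PKA_BASIC:
            positive += 1.0 / (1.0 + 10 ** (ph - PKA_BASIC[residue]))
        elif residue in PKA_ACIDIC:
            negative += 1.0 / (1.0 + 10 ** (PKA_ACIDIC[residue] - ph))
    return positive - negative


def isoelectric_point(sequence: str, tolerance: float = 1e-4) -> float:
    """Bisect for the pH at which the net charge is zero.

    Net charge falls monotonically as pH rises, so bisection between
    pH 0 and pH 14 converges on the single zero crossing.
    """
    low, high = 0.0, 14.0
    while high - low > tolerance:
        mid = (low + high) / 2.0
        if net_charge(sequence, mid) > 0:
            low = mid   # still net positive: the pI lies at a higher pH
        else:
            high = mid  # net negative: the pI lies at a lower pH
    return (low + high) / 2.0


if __name__ == "__main__":
    for peptide in ("DDDD", "LHFFMPGFAPLTSR", "KKKK"):
        print(peptide, round(isoelectric_point(peptide), 2))
```

An acidic sequence focuses near the low-pH end of the gradient and a basic one near the high-pH end; spreading a crowded region over a narrower pH range is what increases the number of resolvable spots.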

Figure 1 Options for performing proteome analysis by 2D-PAGE. 2D-PAGE allows proteomes and subproteomes to be analysed in several ways. Samples may be prepared from whole cells or organelle fractions, or by enriching proteins on the basis of their biochemical properties. These proteins may then be analysed over a variety of pH ranges and run on gels either side by side or on the same gel following fluorescent labelling (Difference In-Gel Electrophoresis; DiGE). Finally, proteins can be visualised using a variety of stains, for total protein or for protein modifications, or by autoradiography.

Mass Spectrometry

Mass spectrometry underlies many cutting-edge forms of proteomic analysis. The basic principle behind MS analysis of the proteome is measurement of the mass of proteins and peptides. This is determined as a mass-to-charge ratio (m/z), either by measuring the time taken for an ion to reach a detector (time-of-flight mass analyser; ToF) or by sequential ejection of ions (lowest m/z first) from an electric field in which they have been ‘trapped’ (ion trap). In many cases, MS instrumentation is also used to induce a peptide to fragment, allowing its amino acid sequence to be determined. For proteins separated by 2D-PAGE, spots can be excised from the gel and treated with a protease to obtain smaller peptide fragments by in-gel digestion,17 from which the protein can then be identified. Frequently the peptides are first separated by reverse-phase liquid chromatography (LC), which provides sample clean-up (removal of salts that can interfere with MS), concentration of each peptide (from microlitres into a few nanolitres) and separation, so that the MS deals with only a few peptides at a time. Peptides are introduced to the mass spectrometer sequentially and their masses measured. A specific peptide can then be isolated within the instrument and collided with an inert gas (such as nitrogen or argon), which causes the peptide to fragment along its backbone. The masses of the fragments can be measured and used to infer the amino acid sequence (Fig. 2). For several years, the major role for mass spectrometry in proteomics was in identifying proteins from gel spots from 2D separations, or from 1D gel bands. However, advances in peptide separation, mass spectrometry instrumentation,
and methods for relative and absolute quantification have brought MS to the forefront of cutting-edge proteomics research.

Figure 2 Determination of peptide sequence from tandem mass spectrometry data. (a) The tubulin peptide LHFFMPGFAPLTSR annotated with standard nomenclature for y-ions (red) and b-ions (green) generated by collisionally induced cleavage at the peptide bond between amino acids. (b) and (c) show the same tandem mass spectrum, annotated in two ways. (b) shows how the amino acid sequence can be determined by de novo sequencing, with the mass differences between the ions determining which amino acid is likely to be next in the peptide sequence. (c) shows the masses of the ions, along with the sequence of the peptide fragment that each represents. Note that all labelled ions carry a single charge; the m/z is therefore equivalent to the mass, since the charge (z) = 1.

For the analysis of complex mixtures, such as whole cell or organelle lysates or protein complexes, samples must be fractionated prior to MS analysis. This need is exacerbated by the fact that, for direct MS analysis, proteins must be digested into peptides; this provides analytes of optimum mass for detection, sequencing and identification of post-translational modifications, but at the expense of increased sample complexity. This separation is most commonly performed by liquid chromatography, either ‘off-line’, where fractions are collected and analysed individually, or ‘on-line’, where peptides elute from a column directly into the MS. Such a method is illustrated by Vaughn et al. in their assessment of the proteins secreted by follicular lymphoma-derived cells.18 However, a single mode of chromatography is usually insufficient to resolve the peptides in a complex sample into mixtures simple enough for the MS to analyse every peptide introduced to it at any given time. A multi-dimensional approach is therefore commonly used.19 Peptides are initially separated
by chromatography using one form of media, e.g. on the basis of charge by cation exchange chromatography. Fractions from this separation are then injected onto a reverse-phase column for further separation prior to MS analysis. In theory, other forms of chromatography can be used to further fractionate peptides, with the caveat that the more fractions are produced, the longer the MS analysis will take. These advances allow large-scale cataloguing of the proteins in a mixture, but do not by themselves allow relative quantification of proteins between samples. Given sufficiently high chromatographic reproducibility, differences between two samples run sequentially can be calculated by comparing the signal generated by peptides in each sample.20 However, in most cases chemical labelling is used so that samples can be mixed and analysed (and therefore relatively quantified) simultaneously, with expression differences determined from signal strength. Chemical labels used to distinguish peptides in a mass spectrometry experiment must be chemically identical, so that the same peptide from two (or more) samples behaves identically in terms of chromatographic retention and ionisation efficiency. Therefore, tags containing stable isotopes are covalently bound to proteins/peptides
to provide the necessary mass difference to determine which peptide peaks derived from which sample (Fig. 3a). The first such approach was isotope-coded affinity tagging (ICAT),21 which has been superseded by a second-generation cleavable ICAT (cICAT).22 This technique has been used successfully in a haematology setting by Barnidge et al., who used it to study the differences between CLL subtypes carrying either unmutated or mutated immunoglobulin variable heavy chain genes. From cytosolic and membrane fractions they identified 13 proteins that differed by more than three-fold between disease subtypes, including cytochrome c oxidase polypeptide VIb (COX G), which was shown by western blotting to be higher in 6 mutated cases than in 6 unmutated cases.23 ICAT methods suffer from several drawbacks. Firstly, only cysteine-containing proteins/peptides are analysed, so an estimated 1 in 7 proteins are not covered by this method, and a further set can generate only a single labelled peptide per protein, making multiple measures on the same protein impossible.24 Secondly, the purification step can lead to sample losses. To try to overcome these problems, Ong et al. developed a labelling method using stable isotope-labelled amino acids in the culture medium, which are incorporated into the proteins of growing cells (Stable Isotope Labelling by Amino acids in Cell culture; SILAC).25 This method has several advantages over ICAT. The labelling is more efficient, and more quantifiable (labelled) peptides can be generated by using amino acids more common than cysteine. For example, while cysteine accounts for 3.3% of all proteinaceous amino acids, leucine (the most common amino acid in vertebrate proteins) accounts for 7.6%, theoretically generating 2.3 times more Leu-containing peptides than Cys-containing ones. However, there is no opportunity here for purification of labelled peptides, so the many peptides that do not contain leucine remain unquantified and may also interfere with the MS analysis.

An alternative approach is the use of isobaric tags for relative and absolute quantification (iTRAQ).26 These chemical tags work differently from the ICAT/SILAC strategies: they fragment in a tandem mass spectrometer to generate ions that carry the relative quantification information. Each iTRAQ tag fragments in a specific manner to release a distinctive ‘reporter’ ion of specific mass, and the relative abundance of the reporter ions gives the relative quantification of that peptide (Fig. 3b). Within the tag itself, the different masses of the reporter groups are offset by a ‘balance’ group, ensuring that the overall mass of each tag remains the same. The mass differences between the reporters and balances are encoded using heavy isotopes, ensuring that the same peptide tagged with different tags behaves identically. iTRAQ tags have several advantages over ICAT/SILAC approaches. Firstly, because the tag reacts with free amine groups (on the N-terminus of every peptide and on the side chain of lysine residues), it labels, and therefore relatively quantifies, every peptide in a complex sample. Secondly, as the labelling reaction is not performed in live cells, it is suitable for all types of protein sample, which is not the case with SILAC.

Figure 3 Schematic of two methods of protein relative quantification by mass spectrometry. (a) Stable isotope labelling, either by culturing cells in media containing a ‘heavy’ amino acid or by chemical derivatisation (as shown), allows two samples to be mixed. Upon analysis by mass spectrometry, the same peptide from the two samples appears with a defined mass shift, and the relative peak heights represent the relative amounts of protein in each sample. (b) iTRAQ analysis using isobaric tags to label proteins (up to eight different tags are available; two are shown for clarity). When samples are mixed, the same peptide from the two samples appears at the same mass, increasing sensitivity of detection. Upon fragmentation, the peptide sequence can be identified, and tag-specific ‘reporter’ ions are generated from which relative quantity is determined.
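Both the sequencing in Fig. 2 and the labelling strategies above rest on simple arithmetic over residue masses. The sketch below is a minimal illustration, not a search engine: it digests a protein in silico (assuming trypsin, since the protease is not named above), computes precursor m/z and singly charged b- and y-ion ladders, and shows the de novo step of matching a mass difference between adjacent ions to a residue. It uses standard monoisotopic residue masses and ignores modifications such as cysteine alkylation.

```python
"""Tryptic digestion and b/y fragment-ion ladders for a peptide."""

PROTON = 1.007276   # mass of a proton (Da)
WATER = 18.010565   # monoisotopic mass of H2O (Da)

# Monoisotopic residue masses (Da); cysteine is shown unmodified.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def tryptic_peptides(protein: str) -> list[str]:
    """Cleave after K or R, except when the next residue is P (assumed trypsin rule)."""
    peptides, start = [], 0
    for i, residue in enumerate(protein):
        next_residue = protein[i + 1] if i + 1 < len(protein) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])
    return peptides


def precursor_mz(peptide: str, charge: int = 1) -> float:
    """m/z of the intact peptide carrying `charge` extra protons."""
    neutral = sum(RESIDUE_MASS[aa] for aa in peptide) + WATER
    return (neutral + charge * PROTON) / charge


def fragment_ladders(peptide: str) -> tuple[list[float], list[float]]:
    """Singly charged b-ions (b1..b(n-1)) and y-ions (y1..y(n-1))."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    b_ions, y_ions = [], []
    running = 0.0
    for mass in masses[:-1]:            # N-terminal fragments
        running += mass
        b_ions.append(running + PROTON)
    running = 0.0
    for mass in reversed(masses[1:]):   # C-terminal fragments keep the water
        running += mass
        y_ions.append(running + WATER + PROTON)
    return b_ions, y_ions


def residues_matching(mass_difference: float, tolerance: float = 0.02) -> list[str]:
    """De novo step: which residues could explain the gap between adjacent ions?"""
    return sorted(aa for aa, m in RESIDUE_MASS.items() if abs(m - mass_difference) <= tolerance)


if __name__ == "__main__":
    peptide = "LHFFMPGFAPLTSR"   # tubulin peptide from Fig. 2
    b_ions, y_ions = fragment_ladders(peptide)
    print("[M+2H]2+ m/z:", round(precursor_mz(peptide, 2), 4))
    print("b2 - b1 could be:", residues_matching(b_ions[1] - b_ions[0]))
    print("a gap of 113.084 could be:", residues_matching(113.08406))
```

The last line illustrates a limitation of mass-based sequencing: leucine and isoleucine have identical masses, so a de novo ladder cannot distinguish them without other information.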
iTRAQ tags are currently available in four masses, allowing the relative quantification of four samples (soon to be expanded to eight samples; Pierce et al., unpublished observations), whereas labelling with ICAT is limited to two samples and SILAC has so far been used with at most three. iTRAQ also potentially increases sensitivity, as the signal from the same peptide in all samples is summed (the differently tagged forms all have the same mass) in both MS and MS/MS. This is exemplified by a study by Unwin et al.27 comparing the proteome of primitive haematopoietic Lin−Sca+Kit+ cells with that of a Lin−Sca+Kit− cell population. Using primary cells from bone marrow means that very little material is available; however, the iTRAQ approach allowed relative quantification of almost 1000 proteins from these cells, using just 20 μg of protein per channel (roughly the amount needed to assess one protein by western blot). Since this was primary material, SILAC could not be employed.
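The reporter-ion readout described above can be sketched as follows. This assumes 4-plex reporter ions near m/z 114.11, 115.11, 116.11 and 117.12, a centroided MS/MS peak list and a ±0.05 m/z window; real pipelines also correct for isotopic impurities in each tag and normalise channels across the whole experiment, both omitted here. The peak list in the usage example is invented.

```python
"""Relative quantification from iTRAQ 4-plex reporter ions in one MS/MS spectrum."""

# Approximate reporter-ion m/z values for the 4-plex reagent (assumed).
REPORTER_MZ = {"114": 114.11, "115": 115.11, "116": 116.11, "117": 117.12}


def reporter_intensities(peaks: list[tuple[float, float]],
                         window: float = 0.05) -> dict[str, float]:
    """Sum the intensity of peaks falling within `window` of each reporter m/z."""
    return {
        channel: sum(intensity for mz, intensity in peaks if abs(mz - target) <= window)
        for channel, target in REPORTER_MZ.items()
    }


def relative_abundance(intensities: dict[str, float],
                       reference: str = "114") -> dict[str, float]:
    """Express every channel as a ratio to the reference channel."""
    ref = intensities[reference]
    if ref <= 0:
        raise ValueError(f"no signal in reference channel {reference}")
    return {channel: value / ref for channel, value in intensities.items()}


if __name__ == "__main__":
    # Invented peak list: four reporters plus an unrelated fragment ion.
    spectrum = [(114.11, 1000.0), (115.11, 2000.0), (116.11, 500.0),
                (117.12, 1000.0), (175.12, 8000.0)]
    print(relative_abundance(reporter_intensities(spectrum)))
```

Because all four forms of a peptide co-elute and are fragmented together, each MS/MS spectrum yields one ratio set; protein-level values would then be summarised over that protein's peptides.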

The disadvantage of the iTRAQ system compared with, for example, SILAC is that the amount of protein that can practically be labelled is limited, and that labelling is performed at the peptide level. SILAC allows large amounts of material to be cultured, ready labelled at the protein level. This is invaluable when performing protein fractionations, as cells/protein from different samples are pooled prior to enrichment (for example, organelle fractionation or phosphoprotein enrichment), minimising the effect of ‘noise’ generated during more complex sample preparation protocols.
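In a SILAC experiment, the light and heavy forms of a peptide appear as a pair whose spacing depends on how many labelled residues the peptide contains and on its charge. The sketch below assumes 13C6-leucine as the label (six 13C atoms at 1.003355 Da each, about +6.02 Da per leucine); this label is chosen for illustration only, and other labelled amino acids give other spacings. The light m/z and intensities in the usage lines are illustrative values.

```python
"""Locate the heavy partner of a SILAC peptide and compute a heavy/light ratio."""

C13_MINUS_C12 = 1.003355            # mass difference between 13C and 12C (Da)
LABEL_SHIFT = 6 * C13_MINUS_C12     # assumed 13C6-leucine label, per residue
LABELLED_RESIDUE = "L"


def heavy_partner_mz(peptide: str, light_mz: float, charge: int) -> float:
    """Expected m/z of the heavy form: total label mass divided by the charge."""
    n_labels = peptide.count(LABELLED_RESIDUE)
    return light_mz + n_labels * LABEL_SHIFT / charge


def heavy_to_light(light_intensity: float, heavy_intensity: float) -> float:
    """Relative abundance of the heavy-labelled sample versus the light one."""
    if light_intensity <= 0:
        raise ValueError("light peak has no signal")
    return heavy_intensity / light_intensity


if __name__ == "__main__":
    # LHFFMPGFAPLTSR has two leucines: a 2+ ion shifts by 2 * 6.02 / 2 m/z.
    print(round(heavy_partner_mz("LHFFMPGFAPLTSR", 810.9214, 2), 4))
    # A peptide without leucine has no heavy partner and cannot be quantified.
    print(heavy_partner_mz("GASPK", 400.0, 1))
    print(heavy_to_light(4.0e5, 1.0e6))
```

The second usage line mirrors the limitation noted above: peptides lacking the labelled amino acid show no mass shift and so carry no relative quantification.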

Post-Translational Modifications: Focus on phosphorylation

Protein post-translational modification has an intrinsically important role in processes such as clotting, haematopoiesis and the immune response. As well as identifying changes in protein expression, MS is now also employed to study protein post-translational modification (PTM). Studies of PTMs can be large scale, attempting, for example, to identify changes in global phosphoprotein profiles. This is done using the labelling strategies described above, coupled to a pre-enrichment step to purify (and therefore specifically analyse) the modification of choice. For phosphorylation, this pre-enrichment can take one of several forms, for example enrichment of phosphotyrosyl peptides or proteins by immunoaffinity purification28 or with specific SH2 domains,29 or enrichment of phosphopeptides by immobilised metal affinity chromatography (IMAC) with immobilised metal ions (commonly Fe3+ or Ga3+)30,31 or by affinity chromatography on titanium dioxide.32 Tyrosine phosphorylation is much less common than Ser/Thr phosphorylation, so specific methods must be used if pY modifications are the major focus of a study; these methods generally still require improvement for greater sensitivity and depth of coverage.33 Many of the enrichment methods suffer from non-specific binding, especially of acidic peptides. Stoichiometry is also an issue (for example, if only 10% of a given protein is phosphorylated), so larger amounts of starting material tend to be required for these experiments. Using phosphotyrosine profiling, Goss et al. identified 188 tyrosine phosphorylation sites from BCR/ABL-expressing cells using 2 × 10⁸ cells,28 and Shu et al. identified 193 phosphorylation sites (on Ser, Thr and Tyr) from the B-cell lymphoma cell line WEHI-231, from a total of 5 mg of protein (around 5 × 10⁷ cells).34 These figures demonstrate the challenge presented by global phosphorylation profiling methods. Where sufficient material is available, and the most sensitive mass spectrometry instrumentation is used, it is possible to obtain relative quantification on 6,600 phosphorylation sites from 2,244 proteins.35 However, generating data on this scale requires resources and expertise present in only a handful of laboratories worldwide; developments in instrumentation should make these approaches more accessible. As an alternative to global phosphoprotein profiling, mass spectrometry is also used to search for specific phosphorylation sites. Several methods are currently used, the details of which are well reviewed elsewhere.36 Several good examples of their use exist in haematology, ranging from the identification of tyrosine phosphorylation sites on the leukaemic oncogene BCR/ABL,37 to phosphorylation of RasGRP3 by protein kinase C in B-cell receptor signalling,38 and death-associated protein-related apoptotic kinase-2 (DRAK2) autophosphorylation in primary lymphocytes.39 Notably, most current studies use MS to identify sites of phosphorylation and then generate phosphorylation site-specific antibodies for further, more sensitive quantitative analysis, although mass spectrometric methods for quantifying specific phosphorylation sites are now becoming available.40–42 A further advantage of phosphorylation analyses is that the data can point to new candidate genes for investigation. Mercher et al. used a phosphotyrosine immunoprecipitation protocol to identify phosphorylation events in an acute megakaryoblastic leukaemia (AMKL) cell line.43 This approach identified 346 phosphopeptides, including several from Janus kinase (JAK) family members.
Analysis of these proteins revealed a mutation in JAK2 (JAK2 T875N) which, upon further investigation, was shown to confer growth factor-independent growth on the Ba/F3 cell line, an indication of oncogenic potential. Expression of this mutant protein in a murine bone marrow transplant model led to the development of a myeloproliferative disorder with AMKL characteristics. Although T875 was not itself identified as a phosphorylation site, this example demonstrates that the identification of protein targets by mass spectrometry can open new avenues of research.
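Phosphorylation is detected by MS as a characteristic mass increment. The sketch below uses the monoisotopic mass of HPO3 (+79.96633 Da per phosphate) to predict phosphopeptide m/z and lists the Ser/Thr/Tyr residues that could carry the modification; deciding among them (site localisation) requires the fragment ions, as in Fig. 2. The unmodified mass in the usage line is a placeholder value, not a measurement.

```python
"""Predict phosphopeptide m/z and list candidate phosphorylation sites."""

PROTON = 1.007276          # Da
PHOSPHO_SHIFT = 79.96633   # monoisotopic mass of HPO3 added per phosphate (Da)


def mz_from_neutral(neutral_mass: float, charge: int) -> float:
    """Convert a neutral monoisotopic mass to m/z for an [M+zH]z+ ion."""
    return (neutral_mass + charge * PROTON) / charge


def phosphopeptide_mz(unmodified_mass: float, charge: int, n_phosphates: int = 1) -> float:
    """m/z of the peptide after adding `n_phosphates` phosphate groups."""
    return mz_from_neutral(unmodified_mass + n_phosphates * PHOSPHO_SHIFT, charge)


def candidate_sites(peptide: str, acceptors: str = "STY") -> list[str]:
    """Residues (with 1-based positions) that could carry a phosphate."""
    return [f"{aa}{pos}" for pos, aa in enumerate(peptide, start=1) if aa in acceptors]


if __name__ == "__main__":
    print(candidate_sites("LHFFMPGFAPLTSR"))   # ['T12', 'S13']
    print(round(phosphopeptide_mz(1000.0, 2), 4))
```

Because the shift is divided by the charge, a doubly charged phosphopeptide appears about 40 m/z units above its unmodified form; with two possible acceptors in the same peptide, the precursor mass alone cannot say which one is modified.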

‘Targeted’ Antibody-directed methods

Both 2D-PAGE and mass spectrometry are ‘discovery-led’ techniques, which need no prior knowledge of the proteins that will be identified and quantified. In practice, these techniques are limited in terms of dynamic range: the most abundant proteins in the sample are readily analysed, but the least abundant are often missed. While fractionation approaches go some way towards alleviating this problem, if the aim of an analysis is to look specifically at a certain protein, protein family or pathway, then a more focussed or targeted method is more suitable. The simplest of these is the immunoblot, in which protein immobilised on a membrane support is detected by means of a specific antibody. More complex tools, commonly in an array format, are now being developed for targeted proteome analyses. These arrays can be classified into two types. Antibody arrays (forward-phase arrays) are produced by spotting many antibodies (or other specific capture reagents) onto a glass slide. The slide is incubated with a single whole cell lysate to generate a profile of specific proteins, in much the same way as a cDNA microarray. Labelling protein with fluorescent Cy dyes prior to incubation allows relative quantification of protein binding, even for low molecular weight, low abundance proteins such as cytokines.44 Reverse-phase arrays, on the other hand, are constructed by spotting many samples onto a glass slide and probing with a specific detection reagent such as an antibody. This allows many samples to be processed simultaneously, an important feature for screening patient samples, and has the added advantage of not requiring the sample protein to be labelled. Also, since only nanolitres of sample are required for each spot, and many replicate arrays can be produced in a single run, these arrays offer advantages in sensitivity, which is especially important where clinical material is limited.
A good example of the utility of this approach is given by Tibes et al., who analysed the expression and phosphorylation of several signalling proteins in AML.45 As an alternative to reverse-phase arrays of protein samples, it is also possible to array tissue samples. Here, cylindrical ‘cores’ are punched from paraffin-embedded biopsies and assembled into a new block; cross sections of this block are then cut and mounted onto glass slides, allowing immunohistochemical detection of protein in many samples at once.46 This is especially important in highly heterogeneous tissues, where protein expression in a specific cell type is diagnostic/prognostic. This is demonstrated by Garcia-Cosio et al., whose study of transcription factor expression in Hodgkin lymphoma samples involved assessment of protein levels in Hodgkin/Reed-Sternberg cells and B-cells within many samples simultaneously.47 The use of arrays does, however, carry several disadvantages. Firstly, they are limited to the analysis of proteins for which an antibody is available, so the range of proteins that can be identified as changing (or indeed as potential biomarkers) is immediately limited. Secondly, that antibody has to work well in the array format. This is especially critical in forward-phase arrays, where each antibody has to be shown to be specific for its protein target under the same buffer conditions as every other antibody on the array. However, the big advantage over gel- or MS-based methods is that the arrays can theoretically be supplied ‘out of the box’, allowing large numbers of samples to be analysed without extensive method optimisation requiring specialist skills and experience. Further development of this technology will undoubtedly make it an important tool in the development and detection of diagnostic and prognostic biomarkers, and in tumour classification. Other focussed techniques for the assessment of disease markers are being developed. A key method here is ‘single-cell proteomics’.48 This flow cytometry-based method was first described in studies of signalling pathways in acute myeloid leukaemia cells responding to various stimuli, and allowed clinically relevant clustering of AML cases according to their responses. Approaches such as this therefore allow the design of more informed, mechanism-based therapies. One drawback of this approach for clinical studies lies in obtaining reproducible staining of samples, and in the analysis time required when dealing with potentially hundreds of clinical samples. To address this, Krutzik and Nolan recently introduced a fluorescent cell barcoding approach, whereby individual samples are ‘barcoded’ with fluorescent molecules and can then be mixed and analysed simultaneously.
This allowed the phosphorylation status of STAT3 and ERK to be analysed in 96 samples simultaneously following treatment with potential kinase/phosphatase inhibitors, by treating and barcoding cells in a 96-well plate format, then pooling the cells and performing a single antibody staining and flow cytometry experiment.49 This represents a powerful tool for the targeted proteomic assessment of new drug targets or therapies.
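The barcoding idea can be illustrated with a simple decoder. The scheme below is hypothetical: it assumes three barcoding dyes at 4, 4 and 6 distinguishable intensity levels (4 × 4 × 6 = 96 combinations, one per well) with evenly spaced level centres on a log scale. It assigns each cell to the nearest level for each dye, maps the resulting code back to a well, and takes the per-well median of a phospho-protein signal. It is a sketch of the principle, not Krutzik and Nolan's published gating procedure, and the events are invented.

```python
"""Decode fluorescent cell barcodes back to wells and summarise a phospho signal."""
from itertools import product
from statistics import median

LEVELS_PER_DYE = (4, 4, 6)   # hypothetical scheme: 4 x 4 x 6 = 96 barcodes
# Hypothetical level centres on a log10 intensity scale, evenly spaced from 1.0.
LEVEL_CENTRES = [[1.0 + k for k in range(n)] for n in LEVELS_PER_DYE]

# Well index -> barcode (tuple of level indices), plus the reverse lookup.
BARCODES = list(product(*(range(n) for n in LEVELS_PER_DYE)))
WELL_FOR_BARCODE = {code: well for well, code in enumerate(BARCODES)}


def decode(log_intensities: tuple[float, ...]) -> int:
    """Assign each dye to its nearest level centre and return the well index."""
    code = tuple(
        min(range(len(centres)), key=lambda k: abs(value - centres[k]))
        for value, centres in zip(log_intensities, LEVEL_CENTRES)
    )
    return WELL_FOR_BARCODE[code]


def per_well_median(cells: list[tuple[tuple[float, ...], float]]) -> dict[int, float]:
    """Group (barcode intensities, phospho signal) events by well and take medians."""
    grouped: dict[int, list[float]] = {}
    for barcode_signal, phospho_signal in cells:
        grouped.setdefault(decode(barcode_signal), []).append(phospho_signal)
    return {well: median(values) for well, values in grouped.items()}


if __name__ == "__main__":
    events = [((2.1, 0.9, 5.8), 120.0), ((1.9, 1.1, 6.2), 140.0),
              ((0.8, 4.1, 1.0), 30.0)]
    print(per_well_median(events))
```

Pooling means every well is stained in the same tube with the same antibody, which is the source of the improved staining reproducibility; the cost is that barcode levels must remain well separated for nearest-level decoding to be reliable.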

Clinical proteomics

A new research area has recently come to the fore in which biological fluids are analysed for the discovery of diagnostic and prognostic markers of disease. This area presents specific challenges not apparent in the analysis of model systems. Chief amongst these is patient-to-patient variability in samples, due both to heterogeneity in the population and to sample handling during routine collection in clinics, surgeries and operating theatres. This means that appropriate sample collection and storage protocols must be in place for these experiments, and it also presents a challenge to the experimental systems used for sample analysis. The most obvious way to counter this variation is to increase the number of samples analysed, which makes the choice of analytical method dependent not only on the strength of the method but also on its throughput. Technologies such as 2D gels have been used for the analysis of clinical samples and biomarker identification/validation,9,50,51 these studies analysing 190 primary samples, 30 primary samples (in triplicate, 90 gels in total) and 15 samples respectively. This is a large amount of work for a screening protocol, and it presents challenges in terms of experimental reproducibility and of data analysis when comparing spot profiles from, for example, 90 separate images. Mass spectrometry-based protocols potentially provide a cleaner way of performing these analyses. One MS-based application that does give higher throughput for biomarker detection is surface-enhanced laser desorption/ionisation time-of-flight mass spectrometry (SELDI-ToF MS). In this technology, initially developed by Ciphergen Inc., samples (mainly fluids such as serum, plasma and urine) adhere to a metal ‘chip’ with a chromatographic matrix embedded in its surface; different chips therefore enrich different populations of proteins/peptides and allow analysis of a specific protein fraction. The chip is then analysed in a basic MS instrument that measures the masses of the protein components, and biomarkers can be identified by comparing profiles.
SELDI-ToF MS offers many advantages in terms of throughput: it is possible to analyse up to 80 samples (in duplicate for one binding condition) in a day. SELDI-ToF has been used in the leukaemias by Albitar et al., who identified biomarkers that predict disease recurrence in adult acute lymphoblastic leukaemia.52 However, the technology has several key limitations. The resolution of the instrument means that mainly sample components of <20 kDa are analysed, making it complementary to gel-based approaches. Sample preparation also appears to be of vital importance in obtaining reproducible profiles. In an early demonstration of this methodology, Petricoin et al. published a study using SELDI-ToF to identify novel biomarkers for ovarian cancer.53 However, the data in this paper have not been reproduced elsewhere, and reanalysis of the data has suggested that sample processing, rather than the disease itself, may have been responsible for some of the key protein profile differences.54,55 Since this study, many laboratories have concentrated on methods for sample handling and data analysis to allow the generation of reproducible and reliable profiles.56 In SELDI-ToF MS, protein biomarkers are not immediately identified, so while sample screening is rapid, enabling large numbers of samples to be used, biomarker identification takes more time. It is, of course, feasible that identification of the protein is not vital, since a peak at a certain mass may by itself provide enough diagnostic/prognostic information. However, since SELDI-ToF MS involves a high initial financial outlay, and the method currently presents difficulties in terms of inter-laboratory reproducibility, the logical approach is to try to identify the protein responsible for a biomarker peak so that other robust assays, such as ELISAs, can be developed. In some cases, candidate markers can be identified by searching databases for proteins of similar intact mass to the biomarker, and by confirming changes between samples by western blotting.57 However, this approach often yields several candidates for any particular peak, and biomarkers may frequently be fragments of larger proteins, in which case the approach is not possible. Here, protein fractionation schemes using chromatography based on the retention properties of the SELDI chips themselves can be considered,58 although this is a much more time-consuming and technically challenging route.
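The intact-mass database search described above amounts to a tolerance window around the observed peak. The sketch below assumes singly charged ions (so m/z approximates mass) and a relative tolerance of 0.1%, the accuracy used for the database search reported for the Albitar peaks; at that tolerance a peak near m/z 10,430 admits anything within about ±10.4 Da. The three-entry "database" is invented, with placeholder names, purely to show how a single window can return several candidates.

```python
"""Candidate proteins for a SELDI-ToF peak by intact-mass matching."""


def mass_window(observed_mass: float,
                relative_tolerance: float = 0.001) -> tuple[float, float]:
    """Lower and upper bounds for a relative mass tolerance (0.001 = 0.1%)."""
    delta = observed_mass * relative_tolerance
    return observed_mass - delta, observed_mass + delta


def candidates(observed_mass: float, database: dict[str, float],
               relative_tolerance: float = 0.001) -> list[str]:
    """Names of database entries whose mass falls inside the window."""
    low, high = mass_window(observed_mass, relative_tolerance)
    return sorted(name for name, mass in database.items() if low <= mass <= high)


if __name__ == "__main__":
    # Invented entries: placeholder names and masses, not real proteins.
    toy_database = {"protein_A": 10425.0, "protein_B": 10441.0, "protein_C": 10433.5}
    print(mass_window(10430.0))
    print(candidates(10430.0, toy_database))
```

The window grows in proportion to mass, which is one reason intact-mass matching becomes more ambiguous for larger species, and why truncated fragments of larger proteins escape it entirely.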
What is becoming clear from protein biomarker work using SELDI-ToF MS and other proteomic approaches is that in many cases no single biomarker predicts disease; rather, clinical data correlate better with profiles that monitor the behaviour of several proteins, using decision trees to relate these profiles to clinical state. In the study of serum from ALL patients by Albitar et al., for example, recurrence was correctly predicted in 92% of cases, and non-recurrence in 72% of cases, using a decision tree in which, if the intensity of a peak at m/z 162,503 is low, the patient is predicted not to relapse. Of the remaining cases, in which this peak is high, if a peak at m/z 10,430 is expressed at low levels, the patient is again predicted not to relapse; if this second peak is high, the model predicts a relapse.52 Searching the human protein databases Swiss-PROT and TrEMBL with 0.1% mass accuracy suggests that the larger peak could be one of six proteins and the smaller one of eleven. This demonstrates the importance of being able to identify the proteins responsible for these peaks using, for example, tandem mass spectrometry.

A similar approach to SELDI-ToF, ClinProt, is also available. Here the binding chemistries are bead-based rather than immobilised onto the chip surface. This means that bound and eluted proteins can be analysed using a variety of mass spectrometric approaches, allowing more accurate mass determination and possibly direct protein identification.59

Biomarker analysis can also be performed using some of the mass spectrometry methods described above. iTRAQ is an attractive option for this work because its greater multiplexing capability increases throughput, and because proteins are, by definition, identified as they are quantified. In a recent example, Kristiansson et al. identified several proteins in the plasma of lymphoma-bearing mice as biomarkers of inflammation.60 These and other approaches offer more in-depth proteomic analysis for biomarker discovery.
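The two-level decision tree described above for the Albitar et al. ALL study, and the way its performance figures are read, can be sketched as follows. This is a hypothetical illustration: the intensity thresholds and the toy cohort are invented placeholders, not the published model's cut-offs or data. In this framing, the 92% figure for correctly predicted recurrence corresponds to sensitivity and the 72% figure for correctly predicted non-recurrence to specificity.

```python
"""Sketch of a two-peak decision tree with the same structure as the one
described for the Albitar et al. ALL study. Thresholds and intensities
are hypothetical placeholders, not the published model's values."""
from typing import NamedTuple

# Hypothetical intensity cut-offs (arbitrary units).
PEAK1_THRESHOLD = 1.0  # first decision peak
PEAK2_THRESHOLD = 1.0  # second decision peak (m/z 10,430 in the text)


class Sample(NamedTuple):
    peak1: float    # intensity of the first decision peak
    peak2: float    # intensity of the second decision peak
    relapsed: bool  # observed clinical outcome


def predict_relapse(s: Sample) -> bool:
    """Apply the two-level tree: relapse is predicted only if both peaks are high."""
    if s.peak1 < PEAK1_THRESHOLD:  # first peak low -> predict no relapse
        return False
    if s.peak2 < PEAK2_THRESHOLD:  # second peak low -> predict no relapse
        return False
    return True                    # both peaks high -> predict relapse


def sensitivity_specificity(samples: list[Sample]) -> tuple[float, float]:
    """Sensitivity = relapses correctly predicted / all relapses;
    specificity = non-relapses correctly predicted / all non-relapses.
    Assumes the cohort contains at least one case of each outcome."""
    relapses = [s for s in samples if s.relapsed]
    non_relapses = [s for s in samples if not s.relapsed]
    tp = sum(predict_relapse(s) for s in relapses)
    tn = sum(not predict_relapse(s) for s in non_relapses)
    return tp / len(relapses), tn / len(non_relapses)


if __name__ == "__main__":
    # Invented toy cohort for illustration only.
    toy = [
        Sample(2.0, 2.0, True),
        Sample(2.0, 0.5, True),
        Sample(0.5, 2.0, False),
        Sample(2.0, 2.0, False),
        Sample(0.2, 0.2, False),
    ]
    print(sensitivity_specificity(toy))
```

Because the tree acts on peak intensities rather than on named proteins, it says nothing about which proteins underlie the peaks; that is the gap tandem mass spectrometry is meant to close.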

Conclusion and Future Developments

The development of proteomics technology platforms is proving to be an important step in expanding our knowledge of systems in haematology and many other disciplines. It is becoming clearer that systems and pathways act in concert rather than in isolation, making systems biology approaches increasingly relevant. With expanding evidence that many changes at the protein level are not necessarily driven by alterations in rates of transcription or levels of transcript, systems-level proteomics experiments provide a basis for generating new, systems-based hypotheses. Several platforms for proteomics experiments now exist and are continually improving, but selecting the platform that gives the most comprehensive answers to the questions being asked is not straightforward. For targeted analyses, protein arrays provide a much more sensitive method of assessing the expression of (known) target proteins. This technology is still being developed to increase sensitivity, reproducibility and depth of coverage, but where the investigator knows in advance which proteins need to be quantified in a sample, arrays provide a good alternative to more stochastic screening approaches such as 2D-PAGE or MS. For untargeted approaches, mass spectrometry provides more sensitivity than gel-based methods, yet it is unable to resolve splice variants or differentially modified forms of proteins (unless that modification is being specifically targeted). Both methods still suffer from the large dynamic range of the proteome, in which high-abundance proteins mask signals from those of low abundance (for example, in serum it is thought that there may be up to 10¹² molecules of albumin for every copy of the least abundant secreted protein61). While effective prefractionation strategies can currently go some way towards increasing coverage of the proteome,62 it will clearly be some time before a whole mammalian proteome can be studied in a single experiment of this type. Even given current technological limitations, however, proteomics data provide valuable information on biological and disease systems from which to form new hypotheses and ultimately expand our knowledge of haematology.

Research agenda

• Ongoing improvements in mass spectrometry speed and sensitivity to allow analysis of less abundant proteins from smaller amounts of starting material (e.g. stem cells), and to allow more information to be obtained from a single analysis.
• Methods for identifying and relatively quantifying post-translational modifications and protein splice variants for diagnostic, prognostic and research purposes.
• More specific methods for phosphoprotein isolation, and alternative (and more sensitive) methods of phosphorylation site determination, to enable greater understanding of haematological processes and to develop incisive tools for clinical use.
• Increased coverage of the proteome by (commercially available) antibodies, both in solution and especially in array format, for rapid characterisation of specific cell populations or biological fluids.
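The serum albumin figure quoted above (up to 10¹² albumin molecules per copy of the least abundant secreted protein) lends itself to a quick back-of-envelope calculation of why prefractionation only goes "some way". The Python sketch below is idealised and rests on stated assumptions: the depletion fractions are hypothetical, only albumin is considered (other high-abundance proteins are ignored), and the rare protein is assumed to be fully retained, which real depletion steps do not guarantee.

```python
"""Back-of-envelope sketch of the serum dynamic-range problem.

The 1e12 albumin-to-rarest-protein ratio comes from the text; the
depletion fractions are hypothetical values chosen for illustration.
"""
import math

ALBUMIN_TO_RAREST = 1e12  # albumin molecules per copy of the rarest protein


def orders_of_magnitude(ratio: float) -> float:
    """Number of orders of magnitude spanned by a concentration ratio."""
    return math.log10(ratio)


def ratio_after_depletion(ratio: float, fraction_removed: float) -> float:
    """Albumin-to-rarest ratio after removing a fraction of the albumin,
    assuming (hypothetically) that the rare protein is fully retained."""
    return ratio * (1.0 - fraction_removed)


if __name__ == "__main__":
    print(orders_of_magnitude(ALBUMIN_TO_RAREST))
    for removed in (0.9, 0.99, 0.999):
        after = ratio_after_depletion(ALBUMIN_TO_RAREST, removed)
        print(removed, round(orders_of_magnitude(after), 2))
```

Even removing 99.9% of the albumin in this idealised model leaves a span of nine orders of magnitude, because each tenfold reduction removes only one order of magnitude.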

Conflict of interest statement

The authors declare that there is no conflict of interest in the subject of this manuscript.

References

1. Wasinger VC, Cordwell SJ, Cerpa-Poljak A, et al. Progress with gene-product mapping of the Mollicutes: Mycoplasma genitalium. Electrophoresis 1995;16:1090–4.
2. Duggan DJ, Bittner M, Chen Y, Meltzer P, Trent JM. Expression profiling using cDNA microarrays. Nat Genet 1999;21:10–4.
3. Anderson L, Seilhamer J. A comparison of selected mRNA and protein abundances in human liver. Electrophoresis 1997;18:533–7.
4. Unwin RD, Whetton AD. Systematic proteome and transcriptome analysis of stem cell populations. Cell Cycle 2006;5:1587–91.
5. Rees-Unwin KS, Morgan GJ, Davies FE. Proteomics and the haematologist. Clin Lab Haematol 2004;26:77–86.
6. Vasilescu J, Figeys D. Mapping protein-protein interactions by mass spectrometry. Curr Opin Biotechnol 2006;17:394–9.
7. Jensen ON. Modification-specific proteomics: characterization of post-translational modifications by mass spectrometry. Curr Opin Chem Biol 2004;8:33–41.
8. Barelli S, Crettaz D, Thadikkaran L, Rubin O, Tissot J-D. Plasma/serum proteomics: pre-analytical issues. Expert Rev Proteomics 2007;4:363–70.
9. Hanash SM, Strahler JR, Kuick R, Chu EH, Nichols D. Identification of a polypeptide associated with the malignant phenotype in acute leukemia. J Biol Chem 1988;263:12813–5.
10. Melhem RF, Zhu XX, Hailat N, Strahler JR, Hanash SM. Characterization of the gene for a proliferation-related phosphoprotein (oncoprotein 18) expressed in high amounts in acute leukemia. J Biol Chem 1991;266:17747–53.
11. Rubin CI, Atweh GF. The role of stathmin in the regulation of the cell cycle. J Cell Biochem 2004;93:242–50.
12. Melhem R, Hailat N, Kuick R, Hanash SM. Quantitative analysis of Op18 phosphorylation in childhood acute leukemia. Leukemia 1997;11:1690–5.
13. Zada AA, Geletu MH, Pulikkan JA, et al. Proteomic analysis of acute promyelocytic leukemia: PML-RARalpha leads to decreased phosphorylation of OP18 at serine 63. Proteomics 2006;6:5705–19.
14. O’Neill EE, Brock CJ, von Kriegsheim AF, et al. Towards complete analysis of the platelet proteome. Proteomics 2002;2:288–305.
15. Poirier F, Joubert-Caron R, Labas V, Caron M. Proteomic analysis of a lymphoma-derived cell line (DG75) following treatment with a demethylating drug: modification of membrane-associated proteins. Proteomics 2003;3:1028–36.
16. Bavelloni A, Faenza I, Cioffi G, et al. Proteomic-based analysis of nuclear signaling: PLCbeta1 affects the expression of the splicing factor SRp20 in Friend erythroleukemia cells. Proteomics 2006;6:5725–34.
17. Wilm M, Shevchenko A, Houthaeve T, et al. Femtomole sequencing of proteins from polyacrylamide gels by nanoelectrospray mass spectrometry. Nature 1996;379:466–9.
18. Vaughn CP, Crockett DK, Lin Z, Lim MS, Elenitoba-Johnson KS. Identification of proteins released by follicular lymphoma-derived cells using a mass spectrometry-based approach. Proteomics 2006;6:3223–30.
19. Washburn MP, Wolters D, Yates 3rd JR. Large-scale analysis of the yeast proteome by multidimensional protein identification technology. Nat Biotechnol 2001;19:242–7.
20. Higgs RE, Knierman MD, Gelfanova V, Butler JP, Hale JE. Comprehensive label-free method for the relative quantification of proteins from biological samples. J Proteome Res 2005;4:1442–50.
21. Gygi SP, Rist B, Gerber SA, Turecek F, Gelb MH, Aebersold R. Quantitative analysis of complex protein mixtures using isotope-coded affinity tags. Nat Biotechnol 1999;17:994–9.
22. Hansen KC, Schmitt-Ulms G, Chalkley RJ, Hirsch J, Baldwin AL, Burlingame AL. Mass Spectrometric Analysis of Protein Mixtures at Low Levels Using Cleavable ¹³C-Isotope-coded Affinity Tag and Multidimensional Chromatography. Mol Cell Proteomics 2003;2:299–314.
23. Barnidge DR, Jelinek DF, Muddiman DC, Kay NE. Quantitative protein expression analysis of CLL B cells from mutated and unmutated IgV(H) subgroups using acid-cleavable isotope-coded affinity tag reagents. J Proteome Res 2005;4:1310–7.
24. Vaughn CP, Crockett DK, Lim MS, Elenitoba-Johnson KS. Analytical characteristics of cleavable isotope-coded affinity tag-LC-tandem mass spectrometry for quantitative proteomic studies. J Mol Diagn 2006;8:513–20.
25. Ong SE, Blagoev B, Kratchmarova I, et al. Stable isotope labeling by amino acids in cell culture, SILAC, as a simple and accurate approach to expression proteomics. Mol Cell Proteomics 2002;1:376–86.
26. Ross PL, Huang YN, Marchese JN, et al. Multiplexed protein quantitation in Saccharomyces cerevisiae using amine-reactive isobaric tagging reagents. Mol Cell Proteomics 2004;3:1154–69.
27. Unwin RD, Smith DL, Blinco D, et al. Quantitative proteomics reveals posttranslational control as a regulatory factor in primary hematopoietic stem cells. Blood 2006;107:4687–94.
28. Goss VL, Lee KA, Moritz A, et al. A common phosphotyrosine signature for the Bcr-Abl kinase. Blood 2006;107:4888–97.
29. Dierck K, Machida K, Voigt A, et al. Quantitative multiplexed profiling of cellular signaling networks using phosphotyrosine-specific DNA-tagged SH2 domains. Nat Methods 2006;3:737–44.
30. Stensballe A, Andersen S, Jensen ON. Characterization of phosphoproteins from electrophoretic gels by nanoscale Fe(III) affinity chromatography with off-line mass spectrometry analysis. Proteomics 2001;1:207–22.
31. Posewitz MC, Tempst P. Immobilized gallium(III) affinity chromatography of phosphopeptides. Anal Chem 1999;71:2883–92.
32. Pinkse MW, Uitto PM, Hilhorst MJ, Ooms B, Heck AJ. Selective isolation at the femtomole level of phosphopeptides from proteolytic digests using 2D-NanoLC-ESI-MS/MS and titanium oxide precolumns. Anal Chem 2004;76:3935–43.
33. Reinders J, Sickmann A. State-of-the-art in phosphoproteomics. Proteomics 2005;5:4052–61.
34. Shu H, Chen S, Bi Q, Mumby M, Brekken DL. Identification of Phosphoproteins and Their Phosphorylation Sites in the WEHI-231 B Lymphoma Cell Line. Mol Cell Proteomics 2004;3:279–86.
35. Olsen JV, Blagoev B, Gnad F, et al. Global, in vivo, and site-specific phosphorylation dynamics in signaling networks. Cell 2006;127:635–48.
36. Spickett CM, Pitt AR, Morrice N, Kolch W. Proteomic analysis of phosphorylation, oxidation and nitrosylation in signal transduction. Biochim Biophys Acta 2006;1764:1823–41.
37. Steen H, Fernandez M, Ghaffari S, Pandey A, Mann M. Phosphotyrosine mapping in Bcr/Abl oncoprotein using phosphotyrosine-specific immonium ion scanning. Mol Cell Proteomics 2003;2:138–45.
38. Zheng Y, Liu H, Coughlin J, Zheng J, Li L, Stone JC. Phosphorylation of RasGRP3 on threonine 133 provides a mechanistic link between PKC and Ras signaling systems in B cells. Blood 2005;105:3648–54.
39. Friedrich ML, Cui M, Hernandez JB, et al. Modulation of DRAK2 autophosphorylation by antigen receptor signaling in primary lymphocytes. J Biol Chem 2007;282:4573–84.

40. Mayya V, Rezual K, Wu L, Fong MB, Han DK. Absolute quantification of multisite phosphorylation by selective reaction monitoring mass spectrometry: determination of inhibitory phosphorylation status of cyclin-dependent kinases. Mol Cell Proteomics 2006;5:1146–57.
41. Wolf-Yadlin A, Hautaniemi S, Lauffenburger DA, White FM. Multiple reaction monitoring for robust quantitative proteomic analysis of cellular signaling networks. Proc Natl Acad Sci U S A 2007;104:5860–5.
42. Gerber SA, Kettenbach AN, Rush J, Gygi SP. The absolute quantification strategy: application to phosphorylation profiling of human separase serine 1126. Methods Mol Biol 2007;359:71–86.
43. Mercher T, Wernig G, Moore SA, et al. JAK2T875N is a novel activating mutation that results in myeloproliferative disease with features of megakaryoblastic leukemia in a murine bone marrow transplantation model. Blood 2006;108:2770–9.
44. Ingvarsson J, Lindstedt M, Borrebaeck CAK, Wingren C. One-Step Fractionation of Complex Proteomes Enables Detection of Low Abundant Analytes Using Antibody-Based Microarrays. J Proteome Res 2006;5:170–6.
45. Tibes R, Qiu Y, Lu Y, et al. Reverse phase protein array: validation of a novel proteomic technology and utility for analysis of primary leukemia specimens and hematopoietic stem cells. Mol Cancer Ther 2006;5:2512–21.
46. Fedor HL, De Marzo AM. Practical methods for tissue microarray construction. Methods Mol Med 2005;103:89–101.
47. Garcia-Cosio M, Santon A, Martin P, et al. Analysis of transcription factor OCT.1, OCT.2 and BOB.1 expression using tissue arrays in classical Hodgkin’s lymphoma. Mod Pathol 2004;17:1531–8.
48. Irish JM, Hovland R, Krutzik PO, et al. Single cell profiling of potentiated phospho-protein networks in cancer cells. Cell 2004;118:217–28.
49. Krutzik PO, Nolan GP. Fluorescent cell barcoding in flow cytometry allows high-throughput drug screening and signaling profiling. Nat Methods 2006;3:361–8.
50. Pizzatti L, Sa LA, de Souza JM, Bisch PM, Abdelhay E. Altered protein profile in chronic myeloid leukemia chronic phase identified by a comparative proteomic study. Biochim Biophys Acta 2006;1764:929–42.
51. Cho JW, Kim JJ, Park SG, et al. Identification of B-cell translocation gene 1 as a biomarker for monitoring the remission of acute myeloid leukemia. Proteomics 2004;4:3456–63.
52. Albitar M, Potts SJ, Giles FJ, et al. Proteomic-based prediction of clinical behavior in adult acute lymphoblastic leukemia. Cancer 2006;106:1587–94.
53. Petricoin EF, Ardekani AM, Hitt BA, et al. Use of proteomic patterns in serum to identify ovarian cancer. Lancet 2002;359:572–7.
54. Baggerly KA, Morris JS, Coombes KR. Reproducibility of SELDI-TOF protein patterns in serum: comparing datasets from different experiments. Bioinformatics 2004;20:777–85.
55. Baggerly KA, Morris JS, Edmonson SR, Coombes KR. Signal in noise: evaluating reported reproducibility of serum proteomic tests for ovarian cancer. J Natl Cancer Inst 2005;97:307–9.
56. Poon TC. Opportunities and limitations of SELDI-TOF-MS in biomedical research: practical advices. Expert Rev Proteomics 2007;4:51–65.

57. Lin Z, Jenson SD, Lim MS, Elenitoba-Johnson KS. Application of SELDI-TOF mass spectrometry for the identification of differentially expressed proteins in transformed follicular lymphoma. Mod Pathol 2004;17:670–8.
58. Miguet L, Bogumil R, Decloquement P, et al. Discovery and identification of potential biomarkers in a prospective study of chronic lymphoid malignancies using SELDI-TOF-MS. J Proteome Res 2006;5:2258–69.
59. Cheng A-J, Chen L-C, Chien K-Y, et al. Oral cancer plasma tumor marker identified with bead-based affinity-fractionated proteomic technology. Clin Chem 2005;51:2236–44.

60. Kristiansson MH, Bhat VB, Babu IR, Wishnok JS, Tannenbaum SR. Comparative Time-Dependent Analysis of Potential Inflammation Biomarkers in Lymphoma-Bearing SJL Mice. J Proteome Res 2007;6:1735–44.
61. Corthals GL, Wasinger VC, Hochstrasser DF, Sanchez JC. The dynamic range of protein expression: a challenge for proteomic research. Electrophoresis 2000;21:1104–15.
62. Yates 3rd JR, Gilchrist A, Howell KE, Bergeron JJ. Proteomics of organelles and large cellular structures. Nat Rev Mol Cell Biol 2005;6:702–14.