Resource
Yeast Proteome Dynamics from Single Cell Imaging and Automated Analysis Graphical Abstract
Authors Yolanda T. Chong, Judice L.Y. Koh, ..., Charles Boone, Brenda J. Andrews
Correspondence
[email protected]
In Brief High-content, high-throughput imaging and automated analysis enables quantification of the abundance and localization of the GFP-tagged ORFeome in yeast, including how protein dynamics shift over time and respond to different perturbations.
Highlights d
d
Combined SGA technology, high throughput imaging, and automated analysis using SVM Quantified in vivo abundance and localization of 4,100 GFP fusion proteins
d
Identified proteome changes over time in rapamycin, hydroxyurea, and rpd3D strain
d
Images and abundance and localization data for all screens in CYCLoPs database
Chong et al., 2015, Cell 161, 1413–1424 June 4, 2015 ª2015 Elsevier Inc. http://dx.doi.org/10.1016/j.cell.2015.04.051
Resource Yeast Proteome Dynamics from Single Cell Imaging and Automated Analysis Yolanda T. Chong,1,4,6 Judice L.Y. Koh,1,4,7 Helena Friesen,1 Kaluarachchi Duffy,1,2 Michael J. Cox,1,2 Alan Moses,3 Jason Moffat,1,2,5 Charles Boone,1,2,5 and Brenda J. Andrews1,2,5,* 1The
Donnelly Centre of Molecular Genetics 3Department of Cell and Systems Biology University of Toronto, Toronto, ON M5S3E1, Canada 4Co-first author 5Co-senior author 6Present address: Cellular Pharmacology, Discovery Sciences, Janssen Pharmaceutical Companies, Johnson & Johnson, 30 Turnhoutseweg, Beerse 2340, Belgium 7Present address: Cancer Therapeutics and Stratified Oncology, Genome Institute of Singapore, Agency for Science, Technology and Research (A*STAR), 60 Biopolis Street, #02-01 Genome, Singapore 138672, Singapore *Correspondence:
[email protected] http://dx.doi.org/10.1016/j.cell.2015.04.051 2Department
SUMMARY
Proteomics has proved invaluable in generating large-scale quantitative data; however, the development of systems approaches for examining the proteome in vivo has lagged behind. To evaluate protein abundance and localization on a proteome scale, we exploited the yeast GFP-fusion collection in a pipeline combining automated genetics, high-throughput microscopy, and computational feature analysis. We developed an ensemble of binary classifiers to generate localization data from single-cell measurements and constructed maps of 3,000 proteins connected to 16 localization classes. To survey proteome dynamics in response to different chemical and genetic stimuli, we measure proteome-wide abundance and localization and identified changes over time. We analyzed >20 million cells to identify dynamic proteins that redistribute among multiple localizations in hydroxyurea, rapamycin, and in an rpd3D background. Because our localization and abundance data are quantitative, they provide the opportunity for many types of comparative studies, single cell analyses, modeling, and prediction.
INTRODUCTION The regulation of most biological processes involves changes in protein abundance and localization. However, because of an absence of quantitative data describing global protein localization and abundance, our understanding of proteome responses to perturbations remains rudimentary. Methods for rapid acquisition of in vivo quantitative data about the dynamic proteome are needed to enable the identification of members of protein complexes and coregulated proteins, give global information
about post-translational modifications, protein stability and kinetics, and suggest dependency relationships. One of the most comprehensive reagent sets available for proteome-wide surveys of cell biological phenotypes such as protein localization is the budding yeast open reading frame (ORF)-GFP collection (Huh et al., 2003). Each strain in the ORF-GFP collection carries a unique fusion gene construct in which an ORF is fused to the GFP gene, generating a full-length protein with a COOH-terminus GFP fusion, whose expression is driven by the endogenous ORF promoter. The ORF-GFP collection contains 4,100 strains (of a possible 5,797; Saccharomyces Genome Database http://www.yeastgenome.org/) that gave a GFP signal above background in standard growth conditions and for the majority of proteins, the GFP moiety appears to have little effect on protein function (Huh et al., 2003). The ORF-GFP fusion collection has been analyzed using widefield microscopy and manual inspection to assign 70% of the proteome to subcellular locations (Huh et al., 2003). Additional studies have examined the collection in various conditions and several have quantified abundance of the GFP-tagged proteins (Benanti et al., 2007; Breker et al., 2013; Mazumder et al., 2013), but all have relied on manual assessment of localization, which is non-quantitative and suboptimal since assignments made by different individuals can be subject to biases and often show poor agreement (for review, see Chong et al., 2012). A few recent studies have used automated approaches to analyze protein localization in yeast. Unsupervised clustering of subcellular localization patterns identified colocalized proteins (Handfield et al., 2013; Loo et al., 2014) and a classification approach assessed a mixture of spatial patterns to identify changes in protein localization in a microchemostat array (De´nervaud et al., 2013). However, while these approaches are information-rich, they have not attempted to computationally assign a specific localization for each protein and thus are of limited use to biologists. Here, we describe a high-content screening and machine learning approach to measure protein abundance and localization changes in a systematic and quantitative fashion on a genome scale. Cell 161, 1413–1424, June 4, 2015 ª2015 Elsevier Inc. 1413
RESULTS AND DISCUSSION Automated Assessment of Protein Abundance and Localization Our goal was to create an automated platform to quantify the abundance and localization of the 4,100 visible fusion proteins in the ORF-GFP collection. To do this efficiently, we used yeast synthetic genetic array (SGA) technology (Tong et al., 2001) to introduce a cytosolic red fluorescent protein (RFP), a marker of cell boundaries, into the ORF-GFP array (a list of all strains assayed is shown in Table S1). SGA technology allows introduction of any mutation of interest into the array, thus enabling us to combine high throughput genetics and imaging. Our experimental pipeline also includes high-throughput microscopy, automated image analysis, and pattern classification through machine learning. To develop and validate our screening protocol, we imaged our wild-type collections during early log-phase growth, capturing data for >200 cells on average for each individual strain (3 million cells total, n = 3). Cell boundaries were defined in the red channel and features from both the red and the green channel were extracted for each cell using CellProfiler (Carpenter et al., 2006) (Experimental Procedures; Supplemental Experimental Procedures; CellProfiler pipeline available at http:// cyclops.ccbr.utoronto.ca/DOWNLOAD/Download.html). As an initial benchmark of the quality of our data, we extrapolated protein abundance from the mean GFP intensity (Ig) for each protein in the collection (Table S1A–S1D). Our measurements agreed with other studies that estimated protein abundance in strains from the original ORF-GFP collection using flow cytometry (r 0.9 for 2,172 proteins; Figure S1A) (Newman et al., 2006), western blot analysis of TAP-tagged strains (r 0.64 for 2,595 proteins; Figure S1B) (Ghaemmaghami et al., 2003), and mass spectrometry (r 0.66 for 2,684 proteins; Figure S1C) (Kulak et al., 2014), and GFP intensities across all of our biological replicates were highly correlated (r > 0.98; Figure S1D). These findings validate our image acquisition and feature extraction pipelines for studying global protein abundance at the single cell level. To extract information about protein localization patterns from our images, we adopted a machine learning approach. Individual cells were segmented based on their cytosolic RFP and texture measurements from both the RFP and GFP channels were extracted. These measurements, from a representative subset of strains known to localize to single subcellular compartments (Huh et al., 2003), were used to generate training sets for automated classification of remaining proteins. To discriminate among multiple compartments, we found that we needed an ensemble of 60 binary classifiers, which we developed from training sets that comprised >70,000 cell instances hand-picked from images from one of our wild-type screens (Data S1; Supplemental Experimental Procedures). We were able to computationally distinguish among 16 subcellular compartments (Supplemental Experimental Procedures); examples of cells with an ORF-GFP corresponding to each localization are shown in Figure 1A. The classifiers were used to generate a set of 16 quantitative localization scores (LOC scores) for each protein that reflects the proportion of cells assigned to each of the 16 defined subcellular compartments under a given condition 1414 Cell 161, 1413–1424, June 4, 2015 ª2015 Elsevier Inc.
(Experimental Procedures; Table S2A). We assessed the performance of our classifiers in several ways. First, we compared our computationally derived pattern assignments to visually assigned localization annotations (Huh et al., 2003) and found 94% agreement among the set of 1,097 proteins assigned to a single compartment by both methods (Figure S1E, upper). Second, we assessed localization of proteins assigned to one or more of seven major localization classes covering 91% of the proteome by comparison to the manual assessment of the yeast ORF-GFP collection (Huh et al., 2003) and found that our computational approach achieved >70% in recall and precision (Figures S1E, lower, and S1F). We explored instances in which our classifier ensemble incorrectly assigned a localization. We found that misclassified proteins tended to be lower abundance (median Ig for correctly classified proteins = 1.1 3 10 3, median Ig for incorrectly classified proteins = 6.6 3 10 4; p < 2.2 3 10 16, Wilcoxon rank sum test). Most false positives also reflected mis-assignment of localizations to compartments that look highly similar. For example, the vacuole and nucleus are both large round compartments and our computational approach misclassified a fraction of vacuolar proteins as localizing to the nucleus or to both the nucleus and the vacuole (35% of the proteins assigned to the vacuole in Huh et al., 2003) (Figure S1E, lower). In contrast, our automated imaging approach had high precision for nucleus (70%) with only 0.7% of our nuclear proteins assigned to the vacuole in Huh et al., 2003). Likewise, small punctate compartments are difficult to distinguish computationally and we observed occasional proteins misclassified as cortical patches, spindle pole, peroxisome, or nucleolus. The original visual assignments of the ORF-GFP collection were validated using secondary assays for 700 proteins whose localization was difficult to determine (Huh et al., 2003); our computational approach achieved a relatively high level of precision and recall based on a single experiment using automated analysis (Figures S1E and S1F). To display the results from our genome-wide screens in a way that would facilitate the visualization of functional information about the proteome, LOC scores, and protein abundance information were extrapolated from GFP signals to generate an ‘‘abundance localization map’’ or ALM (Figure 1B; ALMs 1–18 in Data S2). The wild-type ALM consists of 2,834 proteins (nodes) connected to one or more of 16 localization classes (hubs). Because of auto-fluorescence of yeast cells, our approach could not distinguish between cells with protein of very low abundance and a complete absence of the protein (see Experimental Procedures); 646 proteins with abundance below our threshold are not included in the ALM (Table S1) as well as 664 proteins that were below the threshold for significant LOC scores. In addition to localization information, absolute protein abundance measurements are represented in the ALM by node color (Figure 1B). A large cohort of proteins (1,029) localized to both the cytoplasm and the nucleus; in the ALM, these are divided into two groups: proteins whose localization is predominantly cytosolic and proteins whose localization is predominantly nuclear. By representing these data as an ALM, we are able to visualize proteins with a common localization. The expanded views of
A
Bud
Budneck
Budsite
Cell Periphery
Cortical Patches
Cytoplasm
ER
Endosome
Golgi
Mitochondria
Nuclear Periphery
Nucleolus
Nucleus
Peroxisomes
Spindle Pole
Vacuole/Vac Memb
B
Intensity Molecules Log2 (Ig) per cell
YKL075C GSG1
NAT3 VAM7
CEX1
TTI1
DPH2 YKL215C
YMR102C
RTG2
STB6
YIR014W
RTC5 MKS1
SLF1
SGN1
YBP1 TSR1
ACF4 BRR1
GDT1
PRP2 P
YDR239C
DOM34 SPT2
MUK1
PRP45
HDA1 SAL1
RMT2
NGR1
PRP40
YVH1
YGR203W
MSN2
CUE5
ATG16 QNS1
PCL7
ARD1 YOL098C
MUD1
CDC123
PTP3 APN1
HAP5
IPI1
BUL2
PIB1
CWC22
FIS1 RIM20
YAP3
KTR4
TAF8
AIM20
RPP2A
SPP41 CYK3
OCA1
GYL1 GIM5
MTR4
MGE1
PHO2
FKH2
FMN1
YDR266C
CLF1
BER1 KRE5 TRM9 RAV1
OPI10
BRN1
HOS3
TFB3 PDR17
SMC5
TFC3
YGR017W
SET1
SRB7 RPN14
YLR063W MET10
LYS14
SKP1
HAL9
RTR2 SDS22 RAD1
SCJ1
CAB2
SHG1
YPK2
UPF3
DDI1
GRX2 TYE7 SCC2
CKI1
UBP15
APC5
NTC20
Y U2 YJ
PPH22
VTS1
SPT20
HST2 MRP8
KOG1
EAF5
BDP1
UFD4 CTS2
SRB5
PHA2 DOT6
SKI8
PSK2 K
MDJ2
LSM8
GID8
JLP2 P
CAB4
HRK1
SNF5 ARP2
ECM21
HSF1
YTA6 TRM7
IRC24 HUA1
CDD1
MAF1
MED11
ZDS2 EPL1 ADD37
BST1 CRZ1
CDC36
HEM12
HEM13
SIC1
CDC7
TRM44
RTG1
MTC2
ERG28
HSM3
PRP9
HDA3
JHD2
SPT3
YRB2 ECM14
PUF4 DTD1 ECM25
PDR1
YPR022C
ISY1
AAC3 SNF4
SOM1
VID24
OTU2
MSB1
ORC6
HIF1 TAF3
MCD1
SPP382
RPR2 R
ATG18
KIC1
NOB1
PRS4 S TAH1
WHI4
REG1 TFC8
LSM12 RHB1 BUD14
PDC2
CUL3
URE2 LPP1
PIP2 BUD17
SUT1
SRX1
PLB3 SKS1
PRP3
VPS71
PML1
AFT1
DOG2
RTT101 YGR210C
TRM11
CCL1
RNR2
DFR1 YMR259C
PRP22
SUA5
UBX4
SIP1
GIM4 SNP1
NPY1
YNL040W GCD6
NYV1 SDS24 RPN10
DBF20 CSE2
ICY2
YAT2
KAP114
RKM3
OTU1 SNM1
STB4
GCD2
MAK3
SAP155 SRL1
AMD1 ELP2
XKS1
SLH1
SDS3 HFI1
YOL022C
DYN1
ROX1 YBP2
ECM17
YFL034W
PLP2
SIZ1
DCD1 HRR25 NMA1
SLT2
ECM32
YRB30
MPM1
PSK1
SRS2
KCS1
CDC23
GRE2
CNS1
MAK10
PAN3
K S1 KS
SOK1
GDS1 SHE2
YGL046W
PGM3 TFC1
LSM3
JJJ3 J
YJR116W
IWR1
POC4
NPC2
RPH1
YAP7
SPC110
TFB5 SYH1 RPS28B
BEM4
HSP42
GDB1
GLN3
SCP1
SOK2
MSH6
SYF1
XDJ1 PPH3
MET4
RPL16A
RGT1
NPL4
SNT1 RIB4 APM2
PBS2 S
RAD3
SAP185
YDC1
PMD1
PEX19 MMR1
RIM11
MOD5 RGM1
YGR250C
BNA3
RPS29A
CIN2
SDC1
YHC1
DOT5
BUL1
APC1
YPL009C NFT1 MEU1
SPE2 E
RPS0A 0
SET3
HUL5
NAS6 RAV2
RNT1
URK1
PRP21
CRH1
MHP1 MET18
DPB3 TEA1
TRM12
RIO2
NPR1
CSM1
MTQ2
IES4
CUP9
PYK2 SSN2
IMD2
HST3 BDF2 CDC25 RGA2
VPS72
FZF1 YPS7
YBR028C OCA2 CHS7
COX12
YTH1 MRI1
NAT5
Utp4
-9.5
299
Utp5
-9.2
413
Utp8
-9.1
418
Utp9
-9.2
408
Utp10
-8.7
630
Utp15
-9.8
224
Nan1
-9.1
415
Nup82
-9.6
274
Nup159
-9.5
312
Gle1
-10.7
75
HSH49
DEG1
RUB1
SMY2
UBA4
SWC5
SEN1
COX23
SSK2 K
YJR085C
TRL1
PRS3
PRP28
YCG1 PAP2
KIN1
SKN7
HDA2
MED2
SNF11
YHR122W
CCR4 RPS10A
RNA14 YPK9 YNL311C GCN3
CAF40
UBC8 SPT8
PAT1
FSH2
SNT309
THG1 PCL6
GYP5
IMH1
SMC6
DPH1
PGD1
PPH21 TFB2
NMD4
GSF2
UBP3
YML096W
SWP82
HRQ1
NTH1
RTT102
ZRG8
YDR049W COX19
PCS60
ADI1 YNR014W
SNF12
CST6
VHS1
TFB1
NIT3
ULS1
CUS2
PSY SY4
KTR1
VTC3
PPM2
SWI4
RCN2
OCA5
TFB4
SWI3
YLR225C BEM3
ATG2
CMK2
IRE1
RTG3
MAG2
PRP19
KAR4 HIT1
YDR539W
RAD26
TFC6
PFS2
YMR1
CAC2
MMS21
CDC43
GPB1
GCD10
AIR2
DIP2
AFG2
MPH1
CMP2 GFD1 UBC5
UBP8 VID30
CAF16
LTV1 SSD1
HIR2
CWC27
PCF11
ZTA1
PBP1
GIS1
NAS2
UFD2
GBP2
YBR238C MUP3
HER1
SMD1
YAE1
YKL069W
YPT32
DUS3
MOT3
TIP41 MDM20
DSE4
ARG80
RFC5
RXT2
GDA1
BRF1
IES2
GDE1 SMF3
TRI1
ROX3
MPE1
PRP42 MOB1 TAN1
YJL070C
ERG12
NMD5
TAF1
LDB19 RPS18A
YML079W
PSP2 P
AOS1
MVB12
NDL1 PIN4
SRB4 B
RUP1
YFR006W TSL1
MTR10
DOA1
PIN3
SWI6
NTH2
DBF2
MIG2
HBS1
DUG2
ARO7
YOR131C NST1
MBP1
GCN5
HSP31
KEL3
NPL6
MIG1
RAD50
RPN7
EAF3
YGL039W VID27
CET1
FYV6
SSL2
PAP1 HAC1
YKE2 EDC2
LSM1
SUA7 HOS2
PAH1
TMA108
WRS R 1
LEU3
CUP1-2
VHR1
SMB1
HSH155
SCD5 SKY1
PRP11
IKI1
MDH3
RKM4 INO1
TRZ1 HMS2
YLR118C
SIW14 YOR289W
PBP4 P
YLR126C
YOR296W
YKL023W
PRP6 PRP5
DUS1
TOF1
CDC16
ERP5 P
GSH1 POP1
NGL2
RCY1
GIS4 PSP1
GLR1
STR2
NAM7
RTT106 CAB1
SPE4 E
DIG1
CTI6
MOG1
SSK1
BPL1
RFA1
BAS1 FHL1
RFC1
YGR117C
MET7
CTF4
UME6
MCM1
YBR197C
ECM30
EAF6
APE2
PHO23
RAD2
CBF1
IRA2 AHC2 SRP1
FRA2
MPT5
YMR244C-A
NBA1
RDH54
PHO91
PAN5
PDR3
YBL010C
LSM4
MCA1 DPH5
LOT6
MED1
AZF1
ALY1
POL31
RNQ1
TAF6
IOC4
TPM2 YBL104C
RRP1
YJL012C-A
ALG14
GCR1
YJR011C
RFC3 RIX1
CSI2 CWC25
MDM35 CUE3
RAD23
PRP39
YNL143C RRD1 LSG1
YGR151C
RBS1
URM1
YMR196W YMR074C
STB3
TFA1
URA6
SGT1
YPR045C STE7 YNL313C
SER33
YHL039W
EDC1
ABD1 MSH3
SKI3
DOS2
MMS2
TMA20
GRE3
SQS1
DST1
YGR093W
YPR115W
ART5
GCN2 PRB1
UBA2
TRP1
YOR262W
APD1
SGT2
RKM1
SNU71
HIS2
AKL1
POP5
YGR205W
LSB1
TIF4632
YHR009C YNL157W
RAD27
POL12
YLR218C YDR170W-A LSM2
ASF1
DAL81
BUR2 MFT1
JJJ1
MOT2
S E2 SS E
MOT1
EPS P 1
WTM2
MSL5
PFD1
RKI1
GGA1 STE5
DET1
YPL191C
TDH1
IST3
RET1 PAC10
EAP1
TRM3
KSP1
GYP7
FOL1
RPC17
BUD13
CWC24
YMR178W TYR1
NOT3
GAT1
DYS1
PGM2
YGL079W
LSM7
SAS5
XPT1
NGG1
DHH1
SPB4 B RPD3
NRK1
PRS2
ELP3
YLR108C
PTA1
LUC7
Nucleus
YBR242W VPS64 DUG1
MSB3
IKI3
OCA6
UFD1
SAP A 1
MRS6 YPL067C
MAP1
TPS1
MYO2
NSE3
CNA1 YMR315W UTR4
GIR2 UBP7
STE11
PUS7
RAD52
YBR225W
AIM7
CTK3 RTT107
ADD66 TMA22
SMM1
YDR186C
TFC4 LAP2
IML2 POP2
RBG1
NAT1
YLR345W YGR111W
YDR391C
BNI1
KIN28 YIR035C
YGL101W
SNF8
ATG13
PRS5 YOR006C
SNF1
MPP6
GSP2
RIM101 RPS18B
UBC1
SGF11
PRY1
GRX3 MRC1
ACF2 YML081W HYR1
YFR017C
YNL247W
REI1
STR3 CUP1-1
PUF3 BRE5
SCH9
RTC3
GLC8
CPR6
YJL055W
TOP2 HPR1
GLO2
STE12
ROD1
SGF73
YOL134C
SRP68
YOR214C
RPC53
ARP7
TAF9
SMX3 ISW1
PTI1 RSC2
RFA3
IES1
BUR6
LEA1
TFG1
REF2
NOP13
UBP2 LSB5
DSK2
IES3
UGP1
CHZ1
CWP1 RRP43
NOG2
ISW2
HNT2 YNG2
YER079W
MSO1
ABZ1
BUD27
MXR1
GLN4
RMD1 BCK1
MDY2 DAK1
YAR1
HHO1
BRE1
SMC4
GPM3
MED4 POL5 YPR172W
MED7
YBR139W
RAD9
ATG1
YPL225W
LAP4
RPE1
YOL057W
MUQ1
DDR48
PHO81
IFH1
NRD1
BUD31 SFH1
CYC1
HPA3
REH1
HEM2 RBG2
SER3 R NNT1 SES1
PRO3 TPS3
STH1
YLR455W MET13
MDR1
YER163C
SUB1 NMA2
Cytoplasm
VIP1
FEN2
KTI12
ARF2 HTS1
GET3
TRM112
FES1
ADE16
RBK1
GCS1
CDC37
GVP36 TMA7
REE1 HIR3 PRM5
VPS9
YBL028C
RFA2 RFC4
ESS S 1 HPC2
GAL11 YHB1
CTK1 BNA6
TMA17 PCM1
ABF1 DAT1 SFH5
ECM15 NRP1
YIL161W
YLR143W
POP8
YEL014C
CBC2
SNF2
TAE1
YGR269W
ITC1 YER084W
YER152C
TPA1 POL2
YAR009C VPS75
CDC8
PUS1 RVB1
HSL7
YCR090C
SWC4 VID22 MVD1 MAD2 POP3
YHL008C YKR023W
SMX2
TPK2
FSH1
SNU114
PRD1 ROM2
ADA2
RPB8
PPT1
RFC2
SGV1
YCK2 HSN1
KAP120
YKL169C
RPL19B
NCB2
YCS4 FRS1
CGR1
PAN6
SKT5
SFL1 NMD3
SCD6 YDL144C
GCD1
STO1
SFP1
COS6
RPS17A
LTP1
HUG1
YPD1
YER134C PUF2 LTE1
HOM3 SDO1
PRO1
CCA1
GCN1
STF2
YNK1
NPT1
RPT4
POL1
YPT7
ADH6
RPL1A
RSC8
SPT4 T
YDL085C-A
CDC53
RPB9
STI1 CSE1
PTC3 GET4 RNA1
RAI1
RPL23A
PUT3
SIP5
CFT2
RPL12A QRI1
RPL18B IZH1
JIP5 SYC1
TIM17
YPL229W
HAM1
THI7 OCA4
YBL055C
RPL21A
CAM1
LAP3
SSA4 A
RPT6
SMC2
YEL047C
INP51
HTL1
SUR1
ECM29
SAP30 SPO12
SAK1 RPL14A
TRM82
INM1
GDI1
SPT15
REB1
SET2
YNL010W
EFT1
CYS4
WWM1
RAT1 CKA1
YML082W CTR9
NAF1 RBA50 RPS6B RRP12
A 1 AAH NUP116
YDR250C
YSA1
GIM3
MRT4 SMC1 FPR3 R
GRX1 KES1
ATP12
FRE6
SIT1
RAM1
TRP4
THP2
RIB2 YGR126W
RPL34B
CFT1
GUS1
ERG10 BAR1
TPK3
PHO13 SKI2
MRN1
ATP5
HRB1
UBC4
RSF2
RPO31
DSS4 POB3
THI80
TAH11
NCS2
BCP1
NHP6B
RGR1
TOS1
HOS4
PTC2 P E1 PS
ATO3
DUG3
MES1
NUG1
KTI11 RPL5 GSY2 Y CLU1
BUD20 RPL40B
RAS1
PUB1
RPS4B
WAR1 CTR86
ARO1
MDS3
AGA1
YLR179C
AHA1
RPL6A
KEM1
HHF1
PSY SY2
TBF1 ASG1
GAL83
ATG3
IMD3
LHP1 ATG26 TPS2 RPL11B
KAE1
NMA111
NOG1
NCL1 GPX2
RPS19B
SPT7
YOR238W TPD3
THR1
ARB1
TIF1
CCS1
URH1
RPN12
KAP104
LOT5
RPB10
RRP4 P
NAT4 ZWF1
FPR2 R
SSZ1
YLR132C
YCP4 REX2 X RPL17B CWC21
NOC3
TFA2
WTM1
SBP1 MCK1
IDS2
RTT10
RPS9B
YBR096W
HSP12 SOL3
RPL36B
DPS1
TAL1
IOC2
CAR2
YPL108W
TRP3
ERG13
ARP5
SHQ1
SPT16
NBP35
GYP1
ECM1 FAP7
YHR131C RPL33B
RPS10B
RPS30B SBE22 YPT31
PYC1
CPR5
FRA1
YGR219W
PRE8
URA5 WHI5
IDI1 RIB1
TRM8 TIF11
YBL036C
UTR1
YAP1
TIF4631 FUN12
RPL15B
RPO21
FCP1
ATX1
YOR283W
IES5
PMI40
ALA1
NUT2
NOP53
SGF29
HOG1
PDE2
RPS6A 6 RPO41
RPN8
SOH1
RPL21B
YMR226C
GCN20
RIB3
LYS20 YNL190W
YCR018C-A UBI4
ARO10 IMD4
OLA1
HRP1
STE2
SPL2
LYS1 CDC24 PRP8
RPL9A
HTA2 PLP1 FAS2
PBI2 ACE2
TYS1 RPL35B
RPS16A
YRA1
MCM6
YML108W
CYS3 CDC39
RPL19A
RXT3
YPL260W
SCW4 W
YMR291W
BNA1
RLI1 TMA16 RPS24B
RPN1
MCM7
UBP12
RPL40A
SUI2
FCY1
PRT1
YLR363W-A
RPL22B
FPR1
MSB2
YKL102C RPL14B
DRE2
MTD1
GPD2
SSB1
BCY1
YLR301W
NHP6A
HPT1
HTB2
MSN5
CKS1
RDI1
HXK1
RIX7
RPL43B YHR132W-A
YTA7
YKR043C CNB1
SIN4
WHI3 SVF1
YOR051C
DLS1
MCM4
RPS26B
DAS2
CDC28
YNL155W
RPS0B SEM1
TRM5
GIS2 ARG1
LYS21 RAD6
YNL108C
RPS25B YBR090C
SRB6
RPS7B
SNO1
ADE5,7
PGK1
CDC34
HTB1
TUB4 DIS3
YKR096W
RPL37B
YLR257W
BRR2 R
YNL058C
CDC60 APT1
YOR302W
PBY1
RPL36A
YGR001C YNL134C ZUO1
YCR051W
MET22
SDS23 RPG1
UBA1
SAM4 URA2
PGM1 YNL035C
CKB2
SNA3 SSE1 RPL20B
PKH2
RRP17
NUT1
YJL169W
DDP1
RPL42A
TOS9
GRS1
POP6 ADO1
RPL2A
TUB1
RPS24A
ENO1
SCL1
CKB1
SUM1 YLL032C
TIF5
RPN2
YMR122W-A
PRE5
YFL006W PMU1 HOR2
SUP45
ADE17
RAS2 TRP2 GUK1 PGI1 S F2 SY TRM10
YPT52
BMH1
RPN6
TOM1
ACB1
HXK2
RPL34A
DCS1 BUD32
SRO9
PRS1
RPL4B
YIH1
TCI1
RPL24A
MSH2
KAP122
HEM3 THO2 PRE9
ADE8
UBC9 CPR7
SUP35
TIF6
ARO3
PRP38
ARO9 CAF20 RPL20A
YLR287C
RPS21A
RSR1 RPS17B
TIF3
RRP40
RRP45
SYN8
CDC33
YER156C NAM8
VPS74
RPL7B RPL27B
MCM2
MIC17
LYS2
SBA1
SMC3
URA4
YCK1
RPS4A S4 4
YPR1
ADE6 RPS28A
YSH1
TRX1
TIF2
RPN11
SLX9
SOD1 NEW1
TSR2
NPL3
AAP1
VPS21
RPS8B
DPB2 CDC9 VTC1
LIA1
FUN30 NPA3
HIS7
ADE1
TMA19
RPS27A RPS9A S9
RPL38
MED6
URA7 AIM29 RPL11A
IOC3
UBC13
CRM1 SWI1
YKR018C
NAB2
SPP381
HMT1
IPP1
ACS2
ADE2 SET5
RRP6
RPN5
FUS3
PRE10 HSC82
STM1
SFA1
SAM1
VAS1
THS1
PRE6
PFY1 APA1
FAS1
RRP42
ARG4
NIP1 CKA2 YKL071W
HCH1
PRO2
RPO26
HOM6
PYC2 TRX2
EGD1
CSL4
RPN9
GLY1
YDL063C
SEC14
GPM1 RPL41A
RPS29B
APT2 RPL37A
DEF1 DUR1,2
CDC15
YER034W
SER1
ILS1
NOT5 UGA1 CPR1
SER2 R ARO4
INO80
ALB1
RNR4
MFA1
BAT2
RPL6B
TRP5
NIP7
SAM2
TKL1
GPH1
ELF1
KAP123
PDS5
LYS9
CIC1 BDF1
YOR385W PRE4 E
RPP1A
BDH1 MAK21 ARX1
RPL26A EGD2
SPT6 T CDC55 RPL41B
RPB2
HRT2 CPA1 OYE2
YBR085C-A
HSP82
TSA1
NSA2
PPX1
GDH2
RPS22A YDL124W
MBF1
YIL091C YIL108W RPL13B
ADH5
BRX R 1 RPS22B RHO2
RPP1B
ADK1
TMA23
RPS1B
RPL1B
SOL1
RRP5
PXR1
THR4
FOL3 RPB11
SGD1 SML1 TPI1 HOM2
UTP13 ADE12
FPR4 R
MAK5
UTP6
MEF1 MAP2
RPL43A ASN2
TDH2 RPP2B
SIS1 HNT1 CYC8
BFR2 R
HSP104
RPL7A
RPS19A LEU1
SRP40 ENP1
DBP8
URA1 ASN1
PFK2 DBP6
ARO8 DBP7 UTP30 ARC1
YCR016W NCE103
CAR1
ZPR1 PUF6
BMH2 DLD3
RRT14
YCR087C-A
HGH1
RPS21B
HTZ1
RPC19
RRP46 GDH1 PAF1
TMT1
RPL10
LEO1
NOP7 RSA3
RPB4 B BUD21 UTP14
UTP18 SMT3 ARG3 RPL24B
RPS26A
UTP22
RPS23A NOP4
COX17
TRR1
RCL1
SSB2 B SHM2 RSC9 SSF2
ALD6
YGR251W SRM1 RPL26B NSA1 RPS27B URB1 TAF12
REX4 X
NOC2
MAK11
PSA1
UBP10
NOP15
RTF1
TEF4
SSF1
RPS8A S8
EFG1
RPC40
DRS1
RPL3 RRP8 CHD1 HIS4 DBP3 SSA1 CDC73
FYV7
RPC37 ECM16
RPC82
PFK1
RTR1
RPL29 RPL9B
GLK1
MKT1
ESF2 F
RPB3
Budneck
ADE3
RPL12B
RPS25A
EFT2
UTP4
NAN1
NOC4
RSC58 URB2 TOF2 UTP8
UTP15 RRP9 ESF1 TOP1 RRP15 FAL1
UTP5
YGR283C
NOP16 NOP2 YTM1
KAR3
UTP10
MTR3 SPT5 T
UTP9
NOP12
BUB1 TUP1
NOP8 SKI6
MYO1
BNI5
Nucleolus
MLC2
ELM1
BNR1
CDC14
FOB1 MOB2 SIS2
NAP1
RHR2
FAA3
YPL150W
GLN1
CSR1 YLR407W VHS2
SST2 T TEF1
LSB3 ASK1
RPA34 HAA1 MET6
ZDS1 YEH2
NUF2
ADE4
FPK1
GAR1
GND1 PRP43 RTS1
GIP3
AIM44
RPL8A
GIN4 DBP2
RPA12
KIN4
RRP7
SPC105
YPK1 SOF1
MUP1 MCH1 ASK10 AME1
ALR1
MRPL6
NOP10
NKP2
HMO1
SLN1
Spindle Pole
YEF3 VTH1
TEF2 ALY2 OPY1
RRB1 NSG1 BEM2 SIR3 EMP47 PCT1 ITR2 SPO14 GIP4 TRM1
RBL2 SAC7 SIR2 YOR304C-A
MDM1 CTR1
OPI1 CEP3
CAF120
SEC2
AHP1
AVL9 ENT4
ACK1 RGD1
YDL025C
JSN1
TDH3
MSS4
WHI2
BNI4
PXL1
STE20
SSA2 A CDC11 CDC10
CLA4
KIN2 YVC1 SHS1 RCK2 YKR045C
BUD3 TAT1 YLR422W CPA2 AXL2
BUD22 SPC24
PWP1 PWP2 P
RPA43
NSL1
TID3
RPS14B
DAD4
NET1
DAD2
DSN1 RPA49
NOP9
NOP6 MRPL24 CBF2
TSC13 RPA14
SLK19
SPC25 STU2
HCA4
YJL018W
NVJ1
KRI1
NHP2
DAD1 ZIM17
NOP58 NOP56
NSR1
ERG6
CBF5
MDG1
RPA135
PTK2
YLR326W
RPA190
BBP1 MDH2 BUD2
YDR034W-B TGL4
YFR016C
Cell Periphery
Budsite
YKL105C
TAT2
YDL173W
CAB3
PKC1
OSH6
YPP1
BOR1
ERG25
YJL171C BSD2
MRPL38
YOR285W
DAP1
BIG1
ARE1
VPS13 ACC1
RCR1
ANT1
NUD1
CNM67
DAD3
BIM1
SPC97
SPC98 SPC29
Peroxisome
TEM1
MTW1
SPC34
TGL3
PEX3 X
PEX5 X
BIK1
AXL1
DFG5 RHO5
CDC48
SQT1 SCS7
SRO7
UBP1 ZRT2 YLR194C
PIS1
BUD4 KAP95
TYW1
AQR1 SRP14
PIR1
TPO3
GAS3 OPI3 MNR2
SCS3
VCX1
DNF3
FAA1 SWP1 PHO90
CHO2 HLJ1 S P1 SY
ILV2 YLR050C
YDR090C SKG1
ALG13
SAC1 PEX14 APM4
PHS1
EMC6
PEX13 FLC1
AVT7
VPH2
BAP2
ISU2 ATP3
RER1 HXT2 PEX25
APL1
YGL140C YGL010W
SEC66
PEA2
SCP160
YOP1
APL3 VTC2
ACO2 ATP1
TPN1
ITR1 ECM40
SKG6
VMA22 SRP21
DCP1
PDR16 YMR124W PMT1
ATF1 GCV2 PEX10
RTN1
ATP14 SMY1 MRH1
GCV3
LSC2
BEM1
MAE1
ENT2
LYS4
FAR3
HXT6 SRP72 SEC3
PEX1 PEX17
CAB5 YAP1802 GCV1 ACP1
ENT1
AVT3
PMT2
EPT1 BOI2
YOR1
LAS17
AIM45
EFR3 R
TIM44 YGL108C
YAP1801 YEH1
SEC5 MDN1
YDL012C
PEX12
BUD6
ENA5 LCB4
ENA1
PST1
DGK1
PEX6 X
VHT1
ALD4
ISU1
QCR7
USA1 BOI1 VPS55
Bud
FET3
SNQ2 FMP27
RAX2 YNL146W
ATR1 EGT2
FLC3
ENA2 CRP1 GNP1
YER053C-A
YDR348C YBL029C-A A YGR266W
FAR7
PDR12
GPA2
HIP1 YLR413W END3 YMR295C FAR11
TCB3
UIP5 QDR2 FCY2 FUI1
TCB2 TPO1
SFK1
HXT3
MID2
QDR3
RFS1
MEP1
PDR5 TSC10
HOL1
EXO84
NFS1
ERG27 PRX1
SEC6
SEC15
MIR1
CRN1 KEL1
MCX1
PUT2
ILV1 MSS116
SEC10 ALD5
RGA1
DLD2
EXO70
CTP1 MRPL32
ISD11
YGR235C
SEC7 YDR061W
PDB1 MSS51
YGR026W
NIS1
PEX30
YHM2 NDI1
FTR1 ATP2 MDM38 CHS5 LRG1 YTA12
ER
UFE1 ALG2
TIP20
GTB1
NNF2
ESA1
PBN1
PSD1 FAR8 SLP1
MNS1
HNM1
CYT1
GRX5 IDP1
TIM23
Mitochondria
SEY1
MRP13
AIM42
YMR031C TPO4 IST2 DNF2
YGR130C SLM1
LEM3 ARF3
VPS8 SLA2 BSP1
VPS3
YDR210W
PAM16 TUF1 YOR356W COX5A MIS1 ABF2
TOM20
CBK1
SVL3
COY1
MNP1
YLH47
COX7
SOD2 KGD1
ERG7
YHR003C
MDH1
STF1
COX6 MRPL13
NAT2
MDJ1
SEC20
PDA1
MAS2
CAX4 YJR111C SLY41
FPS1 KIP1
YBR016W
ZRG17
VPS16
MSC3
YLR414C
MRPL8 PEX21
PER1
CEM1
YHR162W
TIM50
CCZ1
YSC85 GUP1
RIM1
GEA2
SKG3
ILV6
NUM1
RVS161
GET2 DCP2
ARC15
MYO3
RSN1
YSY6
RGD2 SHO1
FLC2
MYO5
PPZ1
DSL1
PRK1
AIM5
EDE1 YCR043C
TCB1 UBX2
INP52
MRPL25
LEU9
GLO3
NNF1
SED4
ALG5
SDP1
BZZ1 IST1 SHE3
ABP1
R E1 RS
PUT1
COX14
MCM22
TWF1
WSC2
YMR086W
TOM70
ALO1
YMR253C
SSM4
PAM17 YKL027W
GGA2
ERG26 SEC12
ATP15 SNX41 PIN2
CBP3
RMD9
ALG12
ATP19 RUD3
TNA1
SPC3
ARG8
LEU4
UTP23
DJP1 ENT5 SEC39 TIM54
ALG1
ATP7
CAP1 AUR1
COR1 CLC1
SEC11 CMD1
COX15
YDR367W ARG2
YHR045W W PIL1 MSC2 ALE1 SCT1 CUE4
YDR476C SVP26
YGR031W
Cortical Patches Nuclear Endosome Vacuole/Vacuolar Periphery Membrane
MRPL19
FUM1 RBD2 YMR010W
IRC23
YMR134W
ORM2
NEO1
LAS21
SUR2 R Y U3 YJ
RIC1 GRH1
BCS1 CBP6 KEX2 YGR021W
OSH3
OSH2
YIM1
OXA1
SFB3
GPI8 CYB5
YNR040W YGR207C
BCH1
ARC35
ORM1
EMC4
APL2 ATP11 MRPL50
LAA1
AIM37
PAN1
LAC1
KEX1
SYS1
PST2
RSM25
CYC3
EHT1
SPC1
ADP1
TIM21 NFU1
RVS167
APQ12
NHA1
AIM21
NCE102
CBR1
VRP1
NSG2
DNF1
COG8
MHR1 ARG5,6 MSC6
USO1 ATP17
MEP3
YBR287W
SEC24 ERP3
YPR071W SEC31
YNL181W
ERP2 P
YEL043W YDR307W
NCP1 YGR125W PEX29
PNC1
AIM41 QCR2 YNL122C IMG1 YER078C ABC1
MDL2
COG6 BUD7
GTS1
DPL1
MRPS17
YMR148W
SDH2
SHM1 DID4 AIM38 COX8 MRP1
UPS1
YGR102C NAM9
MBA1
AIM17
YPL107W MRPL10 YJR003C
TOM71
YML6
MTM1
COG5
RSM26
MRPL31
MSF1 MNE1 MSP1
MRPL15 YFH1
NCS6
MRP4 MRPL20
SRV RV2
CWH43 LCB2
UPS2
COG3
AIP1
ARC18
SEC16
GPI14
TMN3
MYO4
ARC40
YJL021C
GWT1
TOS7
FYV4
SLA1
SAC6
MSC7
RCR2
GOT1
NHX1
SFB2
YDL119C MRP7 ARH1
YMR31
MRPL35
IFM1
PIM1
BET3
HOC1
NTE1 CPT1
FMP52
YHR192W
AIM13
ERP4 RP
YPR063C
ODC1 FZO1
LIP5 MRP49
SEN54 RSM7
RSM27 YME1
MSY1
RPT3
COX20
SNX4
PGA2
GPT2
GPI13
SLY1
SWA2 GAS1
TOS6
YLR290C
NCA2
MSE1
TRS130
LCB1
EMC2
ISA1
MRPS35
PET127 VPS5
MRPL11
PET100
AIM24
ISA2 ADH3
PET10 SWS2 S
MNN9
AFG3
MTF2 ECM19 PDX1 SSQ1 CDC50 RTC6
MNN10
FMP21
DSS1 AIM34
MAS1
OM45
VPS17
CYM1
PET123
MRPL51
ATG20
YDL121C FAA4
YNR021W PHO86
PGA3
DCW1 KRE27 CCW14
UTR2
ARF1
ABP140 SBH2
YIP3
MRPL1
AIM36 VPS52 RSM23
MMT1
CIT1 MRPL16
QCR10
AIM9
MRPL39
MAD1
DPM1
KRE1
QCR8
ATM1 SGM1
HEM15
MRPL4
PEP5 P
YML007C-A
CHA1
PNT1
PEP8 P
BFR1 SNL1 WBP1 SEC63
YGR150C
MRPL37 COX18
MRPL23 VPS53
MLP2
SEC13
AAC1
ERV41
01.Oct
MRPL27
SLC1
STT3
COQ9 SSH1
ERV29 SLG1
CBP2
SEN15 RSM24
COX4
HOR7
STE24
ANP1 COQ1
EMC1
BCH2 EMP24
AIM31
MSK1
LYS12
PIC2
MRPL28
VPS35
SEC18
ATG21
ISM1
YLR064W
MIP1 OST1
SNA2
CPD1
ERG5 KRE11
ERG3
MRPL36 YME2
MRPL9
FMP41 TIM11
SMI1 TGL5 MIA40
TVP38 FEN1
CCW12 ELO1
COG1
EHD3
MSH1
YDR056C SOP4 SNX3
PRY3
VAC14
SPF1 YMD8 PKR1 VPS41 IFA38 SHP1
VPS60
TCO89
DIP5 DFM1
ERG11
SUR4 R
PMR1
CTR2
MRL1
EAR1 YLR001C
SEC72
ATG27 ICE2
TIP1
BPT1 GTR2 ERV25 SHR3
GLE2
PIB2
GTR1
CWP2
OLE1
VPS68
ALG9 PMP1
RPM2
Golgi
DNM1 CCP1
YPS30 AFG1 SAC3 YOL107W
VPS54
YHL017W
OMS1
TRS31 HEM1
YLR253W
PHB2 YER077C
PHB1 HSE1
IDH1
OSH1 YGR012W
YMC2 STV1
CAF4
KGD2
YKL063C CST26 DOA4
TCA17 MMF1
LOS1
SFT2
NTA1 VAC8
LAG1
YKL061W
IRC22
NUP170
MEX67
NDE1 SCY1
MEH1 APC11
OST4
MRP20 VMA10
TMS1
GPI11
YBT1
ESBP6 P
MRS4
GEA1
COA1
MAM3 FET5
APM3
YET3
VBA4 V BA4 YDR352 W
HMG1
YLL023C
MDV1
YER128W
YOL092W
MFB1
VPS15
NUP188
VMA4
LSC1
FSF1 RIF1 VMA5
PMP2
PTH2
SIR4
SLM4 ERG1
FTH1 SEC22
COT1
MRPL49
PHO88
TRS20 TUB3
VAM6 IMG2 SYG1
BNA4
BTS1 EMP70 GPI17
YML018C
YPT35
MRP51 RTC1 JAC1
PEP3 P BRO1
YCF1 YKL077W DBP5
MAM33 ARL3
VTA1
RSM22
MVP1
YBR241C
RFU1 YJL062W-A VPH1 TRS120 KIP3
TFP1
AAT1
SEC21
SEC28 SNA4
APM1 KRE33 VMA2 DOP1
IDH2 GYP6 MRPL22
VPS45
MRPS5
VPS1 VPS33
TVP15 NUP159
POL3 CHC1
APL5
Log 2 (Intensity Ig)
NUP82
FRQ1
COP1
APS P 3
AKR1 GLE1 NUP60
NUP1
VRG4
NUP53
TRS23 POM34
SEC27 NUP85
NUP133
PGA1
TVP18 SRC1
NUP157 PEP1 ASM4
NUP84 YPR174C
ARL1
NUP57
OAC1 NIC96 MON2
NUP100 SEC26
NUP49 NUP145
MTC1 DRS2
ULP1 HEH2 RET2 NDC1 ESC1 SSP120
APL6
NUP192
INP53 GTT3
HMG2
MLP1
-11
-4
Figure 1. Abundance Localization Map of a Wild-Type Strain (A) Images of single cells, showing typical proteins assigned by our classifiers to each localization. Each strain produces a cytosolic RFP to mark cell boundaries and a different GFP-tagged protein. (B) Left: abundance Localization Map (ALM) illustrating protein abundance and localization information for 2,834 proteins (nodes) connected to one or more of 16 colored hubs that represent distinct cellular locations. Absolute protein abundance measurements are represented on a gray-green-yellow scale for each protein. Right: expanded views of components of the network are shown in blue and red boxes. Micrographs of cells expressing proteins are shown, whose localization is typical of the assigned compartment (nucleolus [blue box] and nuclear periphery [red box]) with protein abundance (in molecules per cell), estimated from the GFP intensity values. See also Figure S1, Tables S1, S2, and S3, and Data S2.
components of the network in Figure 1B illustrate members of known protein complexes that colocalize, with representative micrographs. For example, our automated classifiers correctly
assigned all seven components of the t-UTP complex to the nucleolus (Figure 1B, blue box) and 25 of the 27 components of the Nup82 nuclear pore complex available in the GFP Cell 161, 1413–1424, June 4, 2015 ª2015 Elsevier Inc. 1415
nucleus organization vesicle organization transport protein complex biogenesis vacuole organization cellular homeostasis cellular component morphogenesis cell cycle signaling process conjugation vesicle-mediated transport cellular membrane organization cytokinesis cell budding cytoskeleton organization fungal-type cell wall organization cellular lipid metabolism peroxisome organization translation cofactor metabolism generation of precursor metabolites, energy mitochondrion organization cellular respiration cellular amino acid & derivative metabolism ribosome biogenesis response to chemical stimulus cellular carbohydrate metabolism protein modification cellular aromatic compound metabolism cellular protein catabolism heterocycle metabolism sporulation RNA metabolism chromosome organization DNA metabolism transcription meiosis transposition response to stress chromosome segregation essential non-essential phosphorylated protein non-phosphorylated protein conserved in eukaryotes
A
B
Nucleus
Budneck Ce ll Perip h e ry Bu d site
SpindleP ole
Peroxis ome ER Co rtica l Pa tch e s
V a cu o le / Vacu o la r Me m b ra n e
Mitochondria
N u cle a r Perip h e ry Endos ome Log 2(Intensity Ig) - 11
20
ER Budneck Budsite Bud Cortical Patches Endosome Golgi Cell Periphery Vacuole/Vac Membrane Cytoplasm Nucleus Spindle Pole Peroxisome Nuclear Periphery Nucleolus Mitochondria Single localization Multiple localizations
Nucleolus
Bu d
Golgi
1 Log(P-value)
Cyto p la sm
- 6.5
Figure 2. Thematic Diagram and Functional Enrichment Analysis of Shared Protein Localization (A) A summary network of protein localization based on the data illustrated in Figure 1B is shown. (B) Functional enrichment analysis of proteins assigned to various cell localizations using our automated classifiers. Log (p values) indicating the significance of the functional enrichments are color-coded (key at top of graph). See also Figure S2.
collection (Aitchison and Rout, 2012) to the nuclear periphery (some of which are shown in Figure 1B, red box). Two components of the Nup82 complex, Sec13, and Nup116, appeared to behave differently because they were not assigned to the nuclear periphery; our computational localization assignments for these proteins correspond with published manual annotations (Huh et al., 2003). These examples illustrate the utility of ALMs for providing in vivo validation of protein complexes identified through biochemical or other means (see below for more discussion). For the purposes of visualizing broader trends in our data, we summarized the ALM into a network where each compartment is represented by a node whose size indicates the number of proteins with that localization (Figure 2A). The edges connecting nodes represent dual occupancy of proteins to more than one subcellular compartment and edge thickness indicates the number of proteins that are shared between compartments (Figure 2A). Node and edge colors reflect mean protein abundance (Figure 2A; Data S2). By depicting our ALM data in this manner, we are able to easily visualize the proportion of shared localization patterns between various subcellular compartments, as well as the average abundances of shared proteins. For example, the highest fraction of proteins with multiple localizations were those shared between the nucleus and the cytoplasm (67% of 1,538 colocalized proteins), consistent with the well-characterized shuttling of many proteins involved in transcription, RNA metabolism, and DNA repair (Aitchison and Rout, 2012). We saw strong colocalization of proteins to cortical patches, bud, budsite, and cell periphery (38% of the 264 localization assignments to these four compartments), largely due to the localization of cell polarity proteins associated with the actin cytoskeleton to each of these compartments at different times in the cell cycle. 1416 Cell 161, 1413–1424, June 4, 2015 ª2015 Elsevier Inc.
We saw enrichment of proteins with annotated roles associated with specific organelles in cognate cellular compartments, e.g., proteins with annotated roles in cytoskeletal organization, budding, and cytokinesis were highly enriched in the bud-associated compartments and cortical patches (Figure 2B). Essential proteins were enriched in nucleus, nucleolus, and spindle pole and were more frequently associated with multiple localization classes (Figure 2B) and physical interactions among proteins were enriched significantly within each organelle except for the cytoplasm (Figure S2A). Finally, in agreement with previous studies (Huh et al., 2003), the majority of the localizations were to the cytoplasm (37%) and nucleus (28%; Figures 1B, 2A, and S2B), with most of these proteins assigned to both classes (Figures 2B and S2C). However, even outside of the shared nuclear and cytoplasmic proteins, 18% of the proteome (511 proteins) showed multiple localizations in up to five different compartments (Figures S2C and S2D), emphasizing that the proteome is far more complex than a mutually exclusive organization of subcellular units. Proteins that localize to multiple compartments were enriched in roles that involve regulation, such as cell cycle, signaling, cytokinesis, and budding and were more likely to be phosphoproteins, consistent with a role for phosphorylation in regulating subcellular movement (Figure 2B). Analysis of total GFP fluorescence also provided useful proteome-wide information about protein abundance. For example, the most abundant proteins in the cell were enriched in the cytoplasm (some of which colocalized at the cell periphery) and within the ER, Golgi, vacuole/vacuolar membrane, and nucleolus, which function in part to maintain the integrity of the intracellular environment (Figures 2A and S2E). Indeed, highly abundant proteins were enriched for roles in cell homeostasis (vacuole/vacuolar membrane, p < 5.9 3 10 5)
and transport (cell periphery, p < 4.5 3 10 8; ER, p < 2.0 3 10 6; Golgi, p < 2.7 3 10 5), as well as translation (cytoplasm, p < 3.4 3 10 69) and metabolism (cytoplasm, p < 1.5 3 10 6) (Figures 2A and S2F). Our automated analysis allowed us to distinguish proteins that varied in concentration over three orders of magnitude, from an estimated lower limit of 40 molecules per cell for Are1, an ER protein that was undetectable by western blot as a TAP-tagged protein, to an upper limit of 19,000 molecules per cell for Gln1, glutamine synthase, one of the highest abundance proteins in the TAP data set (Ghaemmaghami et al., 2003) (Figure 1B; Table S1). Our data set can also be mined using more detailed single cell analyses. For example, we compared single cells containing two GFP-tagged proteins that are known to relocalize between nucleus and cytoplasm in a cell cycle-dependent manner, Mcm2 (Yan et al., 1993) and Whi5 (Costanzo et al., 2004), and two proteins that are localized to the nucleus but are degraded in a cell cycle-specific manner, Sic1 (Visintin et al., 1998) and Far1 (Blondel et al., 2000). For each protein, we plotted the fraction of single cells assigned to cytoplasm, nucleus, or whose protein was below our threshold of detection for abundance (Supplemental Experimental Procedures). The Mcm2 and Whi5 proteins in all of the single cells were classified as localizing to either nucleus or cytoplasm (Figure S2G) with a larger fraction of cells containing nuclear Mcm2 than nuclear Whi5, consistent with their cellcycle biology. In contrast, for Sic1 and Far1, most of the cells that had detectable protein were classified as nuclear, while the majority of cells had undetectable protein levels (Figure S2G), suggesting that the protein had been degraded. Thus, processing single cell data offers the potential to reveal complex regulatory mechanisms governing protein localization and abundance. The increased sensitivity afforded by our approach allowed us to reliably assign a quantitative localization for 52 of the 156 proteins previously annotated as ‘‘ambiguous’’ (Table S3). For example, we confidently assigned Itr2, a myo-inositol transporter, Yeh1, a steryl ester hydrolase, and Bor1, a boron efflux carrier (http://www.yeastgenome.org/), to the cell periphery. Orthogonal functional genomic data sets supported our novel assignments of several proteins to subcellular compartments. We assigned the uncharacterized protein Ymr010w to the Golgi, and YMR010W shows strong synthetic lethal interactions with genes involved in transport/retention of proteins in the ER and Golgi (Costanzo et al., 2010; Koh et al., 2010). We also assigned Yir003w (Aim21) to cortical patches, and YIR003W has many genetic interactions with known actin patch genes (Costanzo et al., 2010). Together, these data suggest that our classifiers efficiently capture localization patterns and abundance information for most proteins and that these localizations can reveal important biology and contribute to predictions of protein function. Quantitative Analysis of Proteome Abundance Changes in Response to Environmental and Genetic Perturbations We exploited the ALMs to explore the response of the proteome to either environmental or genetic perturbation in an unbiased and quantitative fashion. We chose two chemical perturbations—rapamycin and hydroxyurea (HU) treatment—both of which compromise yeast proliferation and have been used
extensively as reagents to study nutrient deprivation and DNA replication stress, respectively. Rapamycin is an inhibitor of the TORC1 kinase complex, which controls cell growth by affecting protein synthesis and metabolism (Loewith and Hall, 2011), while HU inhibits DNA replication through deoxyribonucleotide depletion (Alvino et al., 2007). As a proof-of-principle for analysis of a genetic perturbation, we also examined proteome abundance and dynamics in a mutant lacking the catalytic subunit of a lysine deacetylase (KDAC), Rpd3 (Yang and Seto, 2008), which we had previously analyzed in systematic genetic screens (Kaluarachchi Duffy et al., 2012). Rpd3-mediated lysine deacetylation is a dynamic posttranslational modification with a well-defined role in negatively regulating gene expression through modification of histones. Rpd3 also deacetylates many non-histone proteins, regulating properties such as protein stability and protein-protein interactions (Kaluarachchi Duffy et al., 2012; Henriksen et al., 2012). Our quantitative approach is ideal for identifying changes in proteins that are not related to transcriptional changes. Because of the automated nature of our analysis, we were able to examine 15 different screening conditions (three times for HU treatment; nine times for rapamycin treatment; three rpd3D screens) and extract features from a total population of >20 million cells. With our classifiers, we assigned localization patterns for the GFP-fusion proteins expressed in 95% of cells imaged. Our compendium of high-resolution images as well as quantitative information on protein abundance and localization in both wild-type and perturbed cells is available as a searchable web-accessible interface called Collection of Yeast Cells and Localization Patterns (CYCLoPs) (http://cyclops. ccbr.utoronto.ca). Protein abundance information revealed 674 unique changes across the 15 conditional screens (Table S4; proteins with abundance changes using a stringent 2-fold cutoff (dPL); dPL < 1 (red) or dPL >1 (green) shown in Figure 3 and Table S5). Importantly, we detected protein abundance changes consistent with known biology and with other studies. For example, deletion of RPD3 largely resulted in increases in protein abundance (Figure 3), consistent with the well-characterized role of Rpd3 as a repressor of gene expression. We also observed 22 proteins, including several ribosome biogenesis and ribosomal proteins, that were downregulated in rapamycin and upregulated in rpd3D (Figure 3, center right). These findings reflect that inhibition of TORC1 by rapamycin leads to repression of Ribi (ribosome biogenesis) and RP (ribosomal protein) coding genes in a manner dependent on Rpd3 (for review, see Broach, 2012). Inhibition of TORC1 by rapamycin also causes repression of energy-consuming processes (Dechant and Peter, 2008)—our imaging assay showed a time-dependent decrease in abundance of regulators of protein translation by rapamycin (Figures 3, bottom right, and S3A) while levels of proteins involved in vacuolar protein degradation (e.g., Prc1 peptidase) and energy regeneration (e.g., Cox4 subunit of cytochrome c oxidase) were increased (Figures 3, top right, and S3A). Interestingly, there was very little overlap between our HU and rapamycin screens with respect to protein abundance changes (Figure 3). This difference largely appears to reflect activation of the Environmental Stress Response (ESR) (Gasch et al., 2000) following rapamycin treatment and growth inhibition (209/355 Cell 161, 1413–1424, June 4, 2015 ª2015 Elsevier Inc. 1417
Figure 3. Protein Abundance Changes following Perturbations Detected Using Automated Imaging Hierarchical clustergram of GFP proteins showing a >2-fold change in protein level (GFP intensity). dPL values were calculated from treated over untreated data sets and clustered using average linkage with uncentered correlation. Parts of the clustergram showing distinct trends in the data are boxed in yellow. To the right are shown expanded views of the boxed regions, illustrating groups of proteins that have increased abundance in rapamycin (top), increased abundance in rpd3D together with decreased abundance in rapamycin (middle), and decreased abundance in rapamycin (bottom). See also Figure S3 and Tables S4 and S5.
of total changes occurred in proteins encoded by ESR genes; Table S5) but not following HU treatment (only 28/91 total changes occurred in proteins encoded by ESR genes; Table S5) (Dubacq et al., 2006). This difference may be explained by the effect of HU versus rapamycin on growth of the cells: HU treatment activates the intra-S phase checkpoint, causing cycling cells to accumulate in S/G2 (Alvino et al., 2007), whereas 1418 Cell 161, 1413–1424, June 4, 2015 ª2015 Elsevier Inc.
rapamycin causes inhibition of growth (Loewith and Hall, 2011), which is associated with the ESR (Brauer et al., 2008). The changes in protein abundance in cells treated with rapamycin correlated closely with published protein abundance changes assessed by mass spectrometry (Fournier et al., 2010) (Figure S3B, upper). To ask if protein abundance changes correlated with changes in transcript levels, we compared the protein abundance values from our rapamycin experiment with published data sets measuring mRNA levels. We saw that rapamycin caused a delayed relationship between mRNA and protein levels with maximal correlation between mRNA expression at 120 min and protein levels at 620 min of rapamycin treatment (R = 0.7), similar to the findings by mass spectrometry (Fournier et al., 2010) (Figure S3B, lower). To ask if transcript and protein abundance were also correlated following a genetic perturbation, we considered any protein whose abundance changed detectably in the rpd3D strain (no stringent cut-off applied). Of all the proteins that increased in abundance, 79% were associated with genes whose transcript levels increased in the absence of RPD3 (p < 3.3 3 10 12) (Figure S3C) (Fazzio et al., 2001), and this set of proteins was enriched for components of the cellular stress response (Figure S3A), consistent with Rpd3 biology (Alejandro-Osorio et al., 2009). While we see generally good agreement between our protein levels and published mRNA levels, proteins that change in abundance in the absence of changes in the corresponding transcript are of particular interest as candidates for post-translational regulation. For example, of the seven proteins that changed in abundance >5-fold with no
corresponding change in published mRNA levels (Table S4) (Fazzio et al., 2001), three have been found to be hyperacetylated in an rpd3D strain by mass spectrometry (Henriksen et al., 2012): Pbi2, Tsl1, and Hyr1. Acetylation of these proteins may be regulating their abundance, consistent with known roles for acetylation in negatively regulating protein stability in yeast (Robert et al., 2011). Because our data are quantitative, we can identify unexpected changes in the proteome, even with a high background of protein abundance changes resulting from Rpd3 repressing transcription. Flux Networks Map Localization and Abundance Changes in Environmental and Genetic Perturbations As described above, ALMs provide a useful means of displaying proteome-scale information about protein abundance and localization in living cells. We reasoned that comparative analysis of ALMs derived from perturbed cell populations would enable us to examine protein dynamics in an unbiased and quantitative manner. To do this, we used LOC scores derived from wildtype and perturbed ALMs to define a ‘‘z-LOC score.’’ The z-LOC score quantifies changes in protein localization between sub-cellular compartments; a negative z-LOC value denotes the exit of a protein from a particular compartment, while a positive z-LOC value represents movement toward a specific location. For example, in the rpd3D strain, Pct1, an enzyme involved in phospholipid biosynthesis (Howe et al., 2002), showed a negative z-LOC score for the nuclear periphery and a positive z-LOC score for the nucleus, indicating displacement from the nuclear periphery to the nucleus (Figures 4A–4D). To display z-LOC score data in a biologically useful way, we generated ‘‘flux networks,’’ which are directional, quantitative vectors that define the relationship between two ALMs and represent the dynamic localization and abundance changes following a perturbation. In the flux networks, large nodes represent subcellular locations while small nodes are moving proteins, whose abundance relative to unperturbed cells is color-coded. The vectors illustrate the direction of movement of the protein and the thickness of the vector corresponds to the proportion of cells in the population with the indicated change in localization. The flux networks for our rpd3D, HU, and rapamycin experiments are shown in Figures 4C, S4A, and 5A. Some proteins, such as Dig2, a repressor of mating-specific gene expression (Figures 4C and 4D), are computationally scored with a specific localization pattern only in the perturbed situation (Dig2 retains some cytoplasmic localization but is enriched in the nucleus in the rpd3D strain). In this case, the vector on the rpd3D flux network shows movement into a compartment with no specific departure localization (Figure 4C). Interestingly, both Pct1 (see above) and Dig2 are acetylated proteins in vivo and both are hyperacetylated in an rpd3D strain (Henriksen et al., 2012), suggesting that Rpd3-mediated deacetylation may regulate their localization. In general, flux networks provide a quantitative snapshot of protein changes within the cell following environmental or genetic perturbation. A recent screen visually identified localization changes in the GFP collection treated with HU (Tkach et al., 2012), allowing us an opportunity to benchmark our computational method for detecting protein localization changes. Our computational analysis
identified 54 of the 97 proteins for which we could visually identify the changes reported by Tkach et al. in our images (Figure S4A; Table S6A; Supplemental Experimental Procedures). For another 36 of the 97 proteins, our computational approach identified the relevant localization in wild-type and HU but the change was not statistically significant; generally for these proteins a small fraction of the cells showed the localization change. Our computational approach also identified a novel set of 40 significant localization changes that had not been scored in the previous manual screen (Figure S4A; Table S6B). The majority of these (34) were proteins that had a mixed localization pattern in wild-type (for example nucleus and cytoplasm), which translates to more subtle quantitative localization changes in HU that would be challenging to score manually. Thus our computational approach is able to efficiently and quantitatively detect changes in protein localization with high sensitivity, capturing dynamic information that cannot be reasonably scored by assessment of thousands of images by eye. Integrating protein interaction information into flux networks highlights co-translocation of protein complexes or functionally related proteins. For instance, HU treatment resulted in relocalization of several proteins that form P-bodies or cytoplasmic granules, which sequester untranslated mRNAs under certain stresses like glucose deprivation (Teixeira and Parker, 2007; Tkach et al., 2012). We observe nine P-body proteins recruited from a diffuse cytoplasmic pattern to foci (Figures S4A and S4B), which expands upon similar observations (Tkach et al., 2012; De´nervaud et al., 2013). Proteins Change in Localization or Abundance but Usually Not Both Flux networks can also be used to explore the general relationship between protein abundance and localization changes. Following 80 min of HU treatment, 28 localization and 40 abundance changes (vPL > 1) were observed in the yeast proteome (Tables S2 and S4). Remarkably, only one protein, Smf3, an iron transporter, changed both localization and abundance using our criteria. Treatment with HU induces activation of the Aft regulon, which controls iron mobilization (Dubacq et al., 2006), leading to an increase in SMF3 transcript abundance; however, relocalization of Smf3 has not been previously observed. Similar trends were seen after prolonged exposure to HU and in cells perturbed by rapamycin treatment or deletion of RPD3, suggesting that the response of the proteome to genetic or chemical insults involves a change in protein abundance or localization but rarely both (Figure S4C). Our Rapamycin flux networks (Figure 5) also uncover intriguing time-dependent changes in the proteome that recapitulate known and meaningful biology. For example, our flux network displaying changes after 140 min of rapamycin treatment (RAP140) (Figure 5A) revealed translocation of transcription factors Sfp1, which promotes RP gene transcription, and Stp1, which is involved in amino acid sensing, from the nucleus to the cytoplasm. In contrast, Stb3, a repressor of RP gene expression, moved from the cytoplasm to the nucleus, reflecting the need to inhibit protein translation in starvation conditions, which are mimicked by rapamycin treatment (Figure 5A) (Broach, 2012) We also detected subcellular movement of two other Cell 161, 1413–1424, June 4, 2015 ª2015 Elsevier Inc. 1419
A
B
C
D
transcription factors, Gln3 and Gat1, whose translocation to the nucleus is a hallmark of TORC1 inhibition (Figure 5A) (Broach, 2012). Observing cells over time following rapamycin treatment allowed us to track proteins that move transiently between com1420 Cell 161, 1413–1424, June 4, 2015 ª2015 Elsevier Inc.
Figure 4. Automated Assessment of the Proteome in an RPD3 Deletion Mutant (A) Changes in protein localization to the nucleus in an rpd3D strain. The graph illustrates the distribution of z-LOC scores of all GFP-fusion proteins scored using our nuclear classifier in an rpd3D strain relative to wild-type. One outlier, Pct1, with increased nuclear localization is highlighted by the arrow. (B) Pairwise comparison between z-LOC scores for the nucleus versus those for nuclear periphery localization in rpd3D relative to wild-type. Pct1 is highlighted as an example of a protein showing obvious relocalization from the nuclear periphery to a more general nuclear localization in the rpd3D strain. (C) Flux network illustrating abundance and localization changes in the proteome in an rpd3D strain. The wild-type and rpd3D ALMs were used to generate the flux network where large nodes represent cell locations while the small nodes are moving proteins. (D) Representative micrographs showing Pct1GFP and Dig2-GFP localization in wild-type and rpd3D cells. See also Figure S4.
partments. A striking example is Rts3, a PP2A and Sit4 phosphatase-associated protein of unknown function (Gavin et al., 2006; Breitkreutz et al., 2010). Rts3 redistributed from the nucleus to the cytoplasm at early time points (Figure 5B), with a re-concentration in the nucleus after prolonged rapamycin treatment, which excludes it from the flux network in Figure 5A. This type of movement is often associated with signaling molecules that respond to a change in a stimulus and then adapt and return to their stead- state localization, e.g., the Hog1 MAP kinase in response to high osmolarity (Ferrigno et al., 1998). Our flux network also revealed an increase in Rts3-GFP abundance, which we confirmed using a TAPtagged version of Rts3 (Figure 5B). The transient relocalization and the abundance change suggest that Rts3 may play a key role in response to TORC1 inhibition. Indeed, we and others have found that Rts3 is required for cell survival in rapamycin (Figure 5B) (Parsons et al., 2004; Xie et al., 2005; Hood-DeGrenier, 2011). Thus, flux network-based analysis of the proteome over time can highlight unusual proteome dynamics in response to environmental or genetic stress and facilitate attribution of protein function. Several proteins that translocated out of the nucleolus following exposure of cells to rapamycin for 140 min also shared physical
A C
D
B
Figure 5. Proteome Dynamics in Rapamycin-Treated Cells over Time (A) Flux network illustrating protein localization and abundance changes after 140 min of rapamycin treatment. Proteins annotated as having physical interactions (BioGrid) are indicated with red edges. Representative micrographs of cells expressing GFP-fusions of transcription factors whose localization is influenced by rapamycin treatment are shown below: untreated cells (RAP0) and cells treated with rapamycin for 140 min (RAP140). (B) Micrographs showing transient relocalization of Rts3 from largely nuclear in untreated cells to diffusely cytoplasmic within 15 min of rapamycin treatment, with a re-concentration in the nucleus after prolonged rapamycin treatment (120 min). At bottom left is a western blot showing Rts3-TAP, following treatment with rapamycin for times indicated, with Swi6 loading control. Bottom right shows spot dilutions revealing sensitivity of an rts3D strain to rapamycin. (C) Pairwise comparison between z-LOC scores for the nucleus versus those for the nucleolus in cells treated with rapamycin for 60 min. Components of the exosome are highlighted with red arrows while proteins involved in ribosome biogenesis are highlighted with blue arrows. Rapamycin treatment induces exosome complex redistribution from being concentrated in the nucleolus to becoming dispersed throughout the nucleus as early as 60 min. (D) Representative micrographs of untreated and rapamycin-treated cells expressing GFP-fusions to exosome components (left; green images), a Sik1-RFP nucleolar marker (middle; red images) and an overlay of the two images (right).
interactions (Figure 5A, protein interactions indicated by red edges). Rrp43, Rrp42, Rrp4, Ski6, and Dis3 are members of the exosome, a complex involved in processing of rRNA, small nucleolar RNAs (snoRNAs), small nuclear RNAs (snRNAs), tRNAs, and cryptic unstable transcripts (CUTs) (Lykke-Andersen et al., 2009). A re-distribution from the nucleolus to the nucleus was
observed for eight out of ten components of the exosome after a 60 min exposure to rapamycin (Figure 5C). We confirmed this movement of the exosome using colocalization of GFP-tagged exosome components in cells expressing the nucleolar marker Sik1-RFP (Figure 5D). Subcellular movement of the exosome during rapamycin treatment has not been observed previously, Cell 161, 1413–1424, June 4, 2015 ª2015 Elsevier Inc. 1421
and suggests a significant nuclear role for the complex and perhaps the need for specific RNA regulation or processing events during the cellular response to rapamycin. Thus, integrating flux networks with protein interaction data can provide new insight into protein dynamics and point to new compartment-specific functions for protein complexes. PERSPECTIVE Although incredibly powerful, functional genomic approaches that explore gene expression (Kemmeren et al., 2014), protein-protein interactions (Rees et al., 2011) and genetic interactions (Costanzo et al., 2010) fail to yield a spatio-temporal resolution that will be required to understand biological processes as complex dynamic systems. Here, we describe a high-content screening system that allows us to rapidly survey proteome dynamics in living cells. This approach is designed to define proteome flux using a computational image-based method and provide a proof-of-principle for both a technical and a conceptual platform that ought to be easily adapted to other systems. This study is our first attempt at a quantitative exploration of abundance and localization of the proteome. We used a high quality, available resource, the ORF-GFP collection, which includes 71% of the full proteome, with 1,492 strains excluded that failed to yield a GFP signal above background in the original manual assessment of the collection under standard growth conditions (Huh et al. [2003] and Saccharomyces Genome Database http://www.yeastgenome.org/). Our experimental approach can be readily modified to include technical improvements that should enable more comprehensive analysis of the proteome. A number of the ‘‘missing’’ proteins are detectable by western blot when C-terminally tagged with the TAP moiety (344 proteins) (Ghaemmaghami et al., 2003) and peptides corresponding to another 543 proteins have been identified in at least one mass spectrometry study (Nagaraj et al., 2012; Kulak et al., 2014); these proteins are clearly produced under standard conditions. Some proteins may be undetectable due to destabilization caused by the C-terminal tag; for these, insertion of the tag at the N terminus of the protein may produce useful alleles. In general, use of yeast-optimized versions of GFP that outperform the original GFP(S65T) for brightness, photostability, and function as fusion proteins (Lee et al., 2013) is likely to allow visualization of more low abundance proteins. In addition, low abundance proteins may be detectable using a recently developed protein tag known as the SunTag, which can recruit multiple copies of an antibody-GFP fusion protein and permits imaging of single molecules (Tanenbaum et al., 2014). The remaining proteins, which are likely not expressed under standard conditions, will have to be characterized on a case-by-case basis; growth in other conditions, chosen from transcription data or that affect growth of the corresponding mutant, may enable image-based analysis of these proteins. Automated image analysis of the proteome offers biological insight at several different levels. First, our compendium of images offers a qualitative view of the abundance and localization of each protein. In contrast to some other imaging data sets, we provide data for an average of 80 cells per image, allowing 1422 Cell 161, 1413–1424, June 4, 2015 ª2015 Elsevier Inc.
assessment of general trends for each protein, including cellcycle distribution and variability within a population. Second, these data reveal previously unidentified regulatory events and offer many opportunities for generation of particular hypotheses. Third, although our data set contains many interesting individual findings, we predict that this type of data will be of particular value for comparative analyses using quantitative methodologies. Comparison of image compendia from different mutant strains or conditions with relevant proteomic data sets promises to provide insight into regulated proteolysis, post-translational modifications, and the regulation of protein-protein interactions. We also demonstrate that we can use single cell measurements to identify and distinguish two types of changes in cell-cycleregulated proteins and many other types of single-cell analysis should be possible with our data. Finally, quantitative data sets are particularly important for the development of models for pathway activity and cellular function. Just as transcriptome data have revolutionized our understanding of gene regulation, quantitative proteomic data, with the dimension of localization added to more straightforward abundance measures, will change the way researchers understand the cellular consequences of genetic and environmental stress. We suggest that integrating condition-specific quantitative abundance and localization data for the proteome will allow even greater predictive power and represent an essential component underlying modeling approaches. EXPERIMENTAL PROCEDURES Generation of Yeast Strains and Growth Conditions S. cerevisiae strains used in this experiment are listed in Table S1. Using a modified SGA protocol (Tong et al., 2001) RPL39pr-tdTomato and marked alleles were introduced into the arrayed yeast GFP collection (Huh et al., 2003). Haploid MATa strains from SGA were grown and imaged in low fluorescence synthetic medium (Sheff and Thorn, 2004) supplemented with methionine, NAT, and 2% glucose. Image Acquisition, Analysis, and Pattern Classification Large-scale acquisition of fluorescent micrographs was conducted using a high-throughput spinning-disc confocal microscope (Opera, PerkinElmer). For co-localization analysis, images were acquired using a spinning-disc confocal (WaveFX, Quorum Technologies) connected to a DMI 600B fluorescence microscope (Leica Microsystems) and ImagEM charge-coupled device camera (Hamamatsu C9100-13, Hamamatsu Photonics). Image analysis, segmentation, and acquisition of measurements (Figure S5A) were conducted using CellProfiler (Carpenter et al., 2006). Mean GFP intensity was used to represent the protein abundance in the cell and changes > 2-fold compared to wild-type were defined as significant. To generate the training sets, objects from the WT1 screen were visually inspected using CellProfiler Analyst (Jones et al., 2008), to select objects suitable for training different patterns (see Data S1) representing distinct locations and stages of the cell cycle. The resulting handpicked ‘‘designer’’ training sets consisted of over 70,000 objects. Using a supervised machine learning approach, an ensemble of 60 binary classifiers was generated to identify cells in 3 stages of the cell cycle, 16 localization patterns, ghost objects, and dead cells (Figures S5B–S5D). Classifiers were generated using SVM (Platt, 1998) (sequential optimization, PolyKernel with 1 as exponent) and improvement in accuracy was achieved through bootstrap aggregation (Breiman, 1996). An automated classification system was developed, using WEKA (Hall et al., 2009) and in-house statistical programs to generate training models from WT1 and to apply classifiers to all single cell instances across 18 screens.
Defining Changes in Localization For a protein i, the LOC score reflects the proportion of cells that are assigned to a specific localization class j under condition k:
Alvino, G.M., Collingwood, D., Murphy, J.M., Delrow, J., Brewer, B.J., and Raghuraman, M.K. (2007). Replication in hydroxyurea: it’s a matter of time. Mol. Cell. Biol. 27, 6396–6406.
nijk LOCijk = P : nijk
Benanti, J.A., Cheung, S.K., Brady, M.C., and Toczyski, D.P. (2007). A proteomic screen reveals SCFGrr1 targets that regulate the glycolytic-gluconeogenic switch. Nat. Cell Biol. 9, 1184–1191.
j
For each localization a cutoff was determined such that each strain was assigned to that localization if the fraction of cells assigned to that localization passed the cutoff (Table S7). Data Visualization The abundance localization maps (ALMs) and flux networks were generated using Cytoscape (Cline et al., 2007). Two types of node were defined in the ALMs: 16 localization classes as ‘‘hub’’ nodes, and individual proteins as ‘‘protein’’ nodes. For the ALMs in Data S2, node positions were determined by edge-weighted spring-embedded layout, and the visual distance of the edge between the hub and protein node depicts the quantitative localization membership of the protein to the specific organelle (i.e., shorter distance for higher LOC score). For the WT1 ALM shown in Figure 1B, the positions of the localization clusters were determined by edge-weighted spring-embedded layout and protein nodes were manually shifted to more clearly show abundance groupings. The node colors in gray-green-yellowscale are proportional to the Ig value (e.g., darker node for more abundant protein). SUPPLEMENTAL INFORMATION Supplemental Information includes Supplemental Experimental Procedures, five figures, seven tables, and two data sets and can be found with this article online at http://dx.doi.org/10.1016/j.cell.2015.04.051. AUTHOR CONTRIBUTIONS B.J.A., C.B., and J.M. designed and supervised the project. Y.T.C., M.J.C., and S.K.D. carried out and analyzed experiments. J.L.Y.K., Y.T.C., H.F., and A.M. performed large-scale analysis and interpretation. B.J.A., C.B., J.M., J.L.Y.K., and Y.T.C. created figures. B.J.A., C.B., J.M., Y.T.C., J.L.Y.K., and H.F. wrote the manuscript. ACKNOWLEDGMENTS We thank Jeff Liu, David Wade-Farley, Quaid Morris, Lee Zamparo, Harm Van Bakel, Kyle Tsui, and Corey Nislow for technical assistance and advice. We thank Michael Costanzo for helpful discussions and comments on the manuscript. This work was supported by grant MOP-97939 from the Canadian Institutes for Health Research to B.A. and C.B., and with funds from the Ontario Institute for Cancer Research to J.M. Infrastructure for high-content imaging was acquired with funds from the Canadian Foundation for Innovation and the Ministry of Research and Innovation (Ontario) (grant 21745 to B.A. and C.B.). B.A., J.M., and C.B. are Senior Fellows in the Genetic Networks program of the Canadian Institute for Advanced Research.
Blondel, M., Galan, J.M., Chi, Y., Lafourcade, C., Longaretti, C., Deshaies, R.J., and Peter, M. (2000). Nuclear-specific degradation of Far1 is controlled by the localization of the F-box protein Cdc4. EMBO J. 19, 6085–6097. Brauer, M.J., Huttenhower, C., Airoldi, E.M., Rosenstein, R., Matese, J.C., Gresham, D., Boer, V.M., Troyanskaya, O.G., and Botstein, D. (2008). Coordination of growth rate, cell cycle, stress response, and metabolic activity in yeast. Mol. Biol. Cell 19, 352–367. Breiman, L. (1996). Bagging predictors. Mach. Learn. 2, 123–140. Breitkreutz, A., Choi, H., Sharom, J.R., Boucher, L., Neduva, V., Larsen, B., Lin, Z.Y., Breitkreutz, B.J., Stark, C., Liu, G., et al. (2010). A global protein kinase and phosphatase interaction network in yeast. Science 328, 1043–1046. Breker, M., Gymrek, M., and Schuldiner, M. (2013). A novel single-cell screening platform reveals proteome plasticity during yeast stress responses. J. Cell Biol. 200, 839–850. Broach, J.R. (2012). Nutritional control of growth and development in yeast. Genetics 192, 73–105. Carpenter, A.E., Jones, T.R., Lamprecht, M.R., Clarke, C., Kang, I.H., Friman, O., Guertin, D.A., Chang, J.H., Lindquist, R.A., Moffat, J., et al. (2006). CellProfiler: image analysis software for identifying and quantifying cell phenotypes. Genome Biol. 7, R100. Chong, Y.T., Cox, M.J., and Andrews, B. (2012). Proteome-wide screens in Saccharomyces cerevisiae using the yeast GFP collection. Adv. Exp. Med. Biol. 736, 169–178. Cline, M.S., Smoot, M., Cerami, E., Kuchinsky, A., Landys, N., Workman, C., Christmas, R., Avila-Campilo, I., Creech, M., Gross, B., et al. (2007). Integration of biological networks and gene expression data using Cytoscape. Nat. Protoc. 2, 2366–2382. Costanzo, M., Nishikawa, J.L., Tang, X., Millman, J.S., Schub, O., Breitkreuz, K., Dewar, D., Rupes, I., Andrews, B., and Tyers, M. (2004). CDK activity antagonizes Whi5, an inhibitor of G1/S transcription in yeast. Cell 117, 899–913. Costanzo, M., Baryshnikova, A., Bellay, J., Kim, Y., Spear, E.D., Sevier, C.S., Ding, H., Koh, J.L., Toufighi, K., Mostafavi, S., et al. (2010). The genetic landscape of a cell. Science 327, 425–431. Dechant, R., and Peter, M. (2008). Nutrient signals driving cell growth. Curr. Opin. Cell Biol. 20, 678–687. De´nervaud, N., Becker, J., Delgado-Gonzalo, R., Damay, P., Rajkumar, A.S., Unser, M., Shore, D., Naef, F., and Maerkl, S.J. (2013). A chemostat array enables the spatio-temporal analysis of the yeast proteome. Proc. Natl. Acad. Sci. USA 110, 15842–15847. Dubacq, C., Chevalier, A., Courbeyrette, R., Petat, C., Gidrol, X., and Mann, C. (2006). Role of the iron mobilization and oxidative stress regulons in the genomic response of yeast to hydroxyurea. Mol. Genet. Genomics 275, 114–124.
Received: November 7, 2014 Revised: February 5, 2015 Accepted: April 9, 2015 Published: June 4, 2015
Fazzio, T.G., Kooperberg, C., Goldmark, J.P., Neal, C., Basom, R., Delrow, J., and Tsukiyama, T. (2001). Widespread collaboration of Isw2 and Sin3-Rpd3 chromatin remodeling complexes in transcriptional repression. Mol. Cell. Biol. 21, 6450–6460.
REFERENCES
Ferrigno, P., Posas, F., Koepp, D., Saito, H., and Silver, P.A. (1998). Regulated nucleo/cytoplasmic exchange of HOG1 MAPK requires the importin beta homologs NMD5 and XPO1. EMBO J. 17, 5606–5614.
Aitchison, J.D., and Rout, M.P. (2012). The yeast nuclear pore complex and transport through it. Genetics 190, 855–883. Alejandro-Osorio, A.L., Huebert, D.J., Porcaro, D.T., Sonntag, M.E., Nillasithanukroh, S., Will, J.L., and Gasch, A.P. (2009). The histone deacetylase Rpd3p is required for transient changes in genomic expression in response to stress. Genome Biol. 10, R57.
Fournier, M.L., Paulson, A., Pavelka, N., Mosley, A.L., Gaudenz, K., Bradford, W.D., Glynn, E., Li, H., Sardiu, M.E., Fleharty, B., et al. (2010). Delayed correlation of mRNA and protein expression in rapamycin-treated cells and a role for Ggc1 in cellular sensitivity to rapamycin. Mol. Cell. Proteomics 9, 271–284. Gasch, A.P., Spellman, P.T., Kao, C.M., Carmel-Harel, O., Eisen, M.B., Storz, G., Botstein, D., and Brown, P.O. (2000). Genomic expression programs in the
Cell 161, 1413–1424, June 4, 2015 ª2015 Elsevier Inc. 1423
response of yeast cells to environmental changes. Mol. Biol. Cell 11, 4241– 4257.
Lykke-Andersen, S., Brodersen, D.E., and Jensen, T.H. (2009). Origins and activities of the eukaryotic exosome. J. Cell Sci. 122, 1487–1494.
Gavin, A.C., Aloy, P., Grandi, P., Krause, R., Boesche, M., Marzioch, M., Rau, C., Jensen, L.J., Bastuck, S., Du¨mpelfeld, B., et al. (2006). Proteome survey reveals modularity of the yeast cell machinery. Nature 440, 631–636.
Mazumder, A., Tummler, K., Bathe, M., and Samson, L.D. (2013). Single-cell analysis of ribonucleotide reductase transcriptional and translational response to DNA damage. Mol. Cell. Biol. 33, 635–642.
Ghaemmaghami, S., Huh, W.K., Bower, K., Howson, R.W., Belle, A., Dephoure, N., O’Shea, E.K., and Weissman, J.S. (2003). Global analysis of protein expression in yeast. Nature 425, 737–741.
Nagaraj, N., Kulak, N.A., Cox, J., Neuhauser, N., Mayr, K., Hoerning, O., Vorm, O., and Mann, M. (2012). System-wide perturbation analysis with nearly complete coverage of the yeast proteome by single-shot ultra HPLC runs on a bench top Orbitrap. Mol. Cell. Proteomics 11, M111.013722.
Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P., and Witten, I.H. (2009). The WEKA data mining software: an update. SIGKDD Explor. 11, 10–18. Handfield, L.F., Chong, Y.T., Simmons, J., Andrews, B.J., and Moses, A.M. (2013). Unsupervised clustering of subcellular protein expression patterns in high-throughput microscopy images reveals protein complexes and functional relationships between proteins. PLoS Comput. Biol. 9, e1003085. Henriksen, P., Wagner, S.A., Weinert, B.T., Sharma, S., Bacinskaja, G., Rehman, M., Juffer, A.H., Walther, T.C., Lisby, M., and Choudhary, C. (2012). Proteome-wide analysis of lysine acetylation suggests its broad regulatory scope in Saccharomyces cerevisiae. Mol. Cell. Proteomics 11, 1510–1522. Hood-DeGrenier, J.K. (2011). Identification of phosphatase 2A-like Sit4-mediated signalling and ubiquitin-dependent protein sorting as modulators of caffeine sensitivity in S. cerevisiae. Yeast 28, 189–204. Howe, A.G., Zaremberg, V., and McMaster, C.R. (2002). Cessation of growth to prevent cell death due to inhibition of phosphatidylcholine synthesis is impaired at 37 degrees C in Saccharomyces cerevisiae. J. Biol. Chem. 277, 44100–44107. Huh, W.K., Falvo, J.V., Gerke, L.C., Carroll, A.S., Howson, R.W., Weissman, J.S., and O’Shea, E.K. (2003). Global analysis of protein localization in budding yeast. Nature 425, 686–691. Jones, T.R., Kang, I.H., Wheeler, D.B., Lindquist, R.A., Papallo, A., Sabatini, D.M., Golland, P., and Carpenter, A.E. (2008). CellProfiler Analyst: data exploration and analysis software for complex image-based screens. BMC Bioinformatics 9, 482. Kaluarachchi Duffy, S., Friesen, H., Baryshnikova, A., Lambert, J.P., Chong, Y.T., Figeys, D., and Andrews, B. (2012). Exploring the yeast acetylome using functional genomics. Cell 149, 936–948. Kemmeren, P., Sameith, K., van de Pasch, L.A., Benschop, J.J., Lenstra, T.L., Margaritis, T., O’Duibhir, E., Apweiler, E., van Wageningen, S., Ko, C.W., et al. (2014). Large-scale genetic perturbations reveal regulatory networks and an abundance of gene-specific repressors. Cell 157, 740–752. Koh, J.L., Ding, H., Costanzo, M., Baryshnikova, A., Toufighi, K., Bader, G.D., Myers, C.L., Andrews, B.J., and Boone, C. (2010). DRYGIN: a database of quantitative genetic interaction networks in yeast. Nucleic Acids Res. 38, D502–D507. Kulak, N.A., Pichler, G., Paron, I., Nagaraj, N., and Mann, M. (2014). Minimal, encapsulated proteomic-sample processing applied to copy-number estimation in eukaryotic cells. Nat. Methods 11, 319–324. Lee, S., Lim, W.A., and Thorn, K.S. (2013). Improved blue, green, and red fluorescent protein tagging vectors for S. cerevisiae. PLoS ONE 8, e67902. Loewith, R., and Hall, M.N. (2011). Target of rapamycin (TOR) in nutrient signaling and growth control. Genetics 189, 1177–1201. Loo, L.H., Laksameethanasan, D., and Tung, Y.L. (2014). Quantitative protein localization signatures reveal an association between spatial and functional divergences of proteins. PLoS Comput. Biol. 10, e1003504.
1424 Cell 161, 1413–1424, June 4, 2015 ª2015 Elsevier Inc.
Newman, J.R., Ghaemmaghami, S., Ihmels, J., Breslow, D.K., Noble, M., DeRisi, J.L., and Weissman, J.S. (2006). Single-cell proteomic analysis of S. cerevisiae reveals the architecture of biological noise. Nature 441, 840–846. Parsons, A.B., Brost, R.L., Ding, H., Li, Z., Zhang, C., Sheikh, B., Brown, G.W., Kane, P.M., Hughes, T.R., and Boone, C. (2004). Integration of chemical-genetic and genetic interaction data links bioactive compounds to cellular target pathways. Nat. Biotechnol. 22, 62–69. Platt, J. (1998). Fast training of support vector machines using sequential minimal optimization. In Advances in Kernel Methods—Support Vector Learning (MIT Press), pp. 185–208. Rees, J.S., Lowe, N., Armean, I.M., Roote, J., Johnson, G., Drummond, E., Spriggs, H., Ryder, E., Russell, S., St Johnston, D., et al. (2011). In vivo analysis of proteomes and interactomes using Parallel Affinity Capture (iPAC) coupled to mass spectrometry. Mol. Cell. Proteomics 10, M110.002386. Robert, T., Vanoli, F., Chiolo, I., Shubassi, G., Bernstein, K.A., Rothstein, R., Botrugno, O.A., Parazzoli, D., Oldani, A., Minucci, S., and Foiani, M. (2011). HDACs link the DNA damage response, processing of double-strand breaks and autophagy. Nature 471, 74–79. Sheff, M.A., and Thorn, K.S. (2004). Optimized cassettes for fluorescent protein tagging in Saccharomyces cerevisiae. Yeast 21, 661–670. Tanenbaum, M.E., Gilbert, L.A., Qi, L.S., Weissman, J.S., and Vale, R.D. (2014). A protein-tagging system for signal amplification in gene expression and fluorescence imaging. Cell 159, 635–646. Teixeira, D., and Parker, R. (2007). Analysis of P-body assembly in Saccharomyces cerevisiae. Mol. Biol. Cell 18, 2274–2287. Tkach, J.M., Yimit, A., Lee, A.Y., Riffle, M., Costanzo, M., Jaschob, D., Hendry, J.A., Ou, J., Moffat, J., Boone, C., et al. (2012). Dissecting DNA damage response pathways by analysing protein localization and abundance changes during DNA replication stress. Nat. Cell Biol. 14, 966–976. Tong, A.H., Evangelista, M., Parsons, A.B., Xu, H., Bader, G.D., Page´, N., Robinson, M., Raghibizadeh, S., Hogue, C.W., Bussey, H., et al. (2001). Systematic genetic analysis with ordered arrays of yeast deletion mutants. Science 294, 2364–2368. Visintin, R., Craig, K., Hwang, E.S., Prinz, S., Tyers, M., and Amon, A. (1998). The phosphatase Cdc14 triggers mitotic exit by reversal of Cdk-dependent phosphorylation. Mol. Cell 2, 709–718. Xie, M.W., Jin, F., Hwang, H., Hwang, S., Anand, V., Duncan, M.C., and Huang, J. (2005). Insights into TOR function and rapamycin response: chemical genomic profiling by using a high-density cell array method. Proc. Natl. Acad. Sci. USA 102, 7215–7220. Yan, H., Merchant, A.M., and Tye, B.K. (1993). Cell cycle-regulated nuclear localization of MCM2 and MCM3, which are required for the initiation of DNA synthesis at chromosomal replication origins in yeast. Genes Dev. 7, 2149– 2160. Yang, X.J., and Seto, E. (2008). The Rpd3/Hda1 family of lysine deacetylases: from bacteria and yeast to mice and men. Nat. Rev. Mol. Cell Biol. 9, 206–218.