Journal of Dermatological Science 14 (1997) 1-11
Elsevier
Isolation and characterization of genomic clones of human sequences presumably coding for hair cysteine-rich proteins

Nathalie Emonet, Jean-Jacques Michaille*, Danielle Dhouailly

L.E.D.A.C., Institut Albert Bonniot, Faculté de Médecine, Domaine de La Merci, 38706 La Tronche Cedex, France
Received 26 December 1995; revised 11 March 1996; accepted 15 March 1996
Abstract

The major biochemical components of the mammalian hair are the intermediate filaments or keratins and the keratin associated proteins. Keratin associated proteins are classified into two groups (high-cysteine and high glycine-tyrosine-rich polypeptides) according to the content of these amino acids. The cysteine-rich group contains high sulphur (16-24% cysteine) and ultra-high sulphur (>30% cysteine) proteins. We report here the identification of a human sequence presumably coding for a new ultra-high sulphur protein (hUHSp21) and the isolation and characterization of four genomic clones containing six related sequences. We also discuss the possibility that all the genes encoding keratin associated proteins are evolutionarily related. These human clones should provide useful molecular tools for studies of hair differentiation and understanding of the molecular basis of human trichothiodystrophy. Copyright © 1997 Elsevier Science Ireland Ltd.

Keywords: Cysteine-rich gene; HSp; Human genomic clones; Keratin-associated protein; UHSp

* Corresponding author. Tel.: +33 76 54 95 72; fax: +33 76 54 94 25; e-mail: [email protected].

1. Introduction

1.1. Keratin associated proteins, cysteine-rich proteins and genes

The mammalian hair follicle is a specialized epidermal appendage consisting of an array of up to seven different cellular layers. Upon division,
the follicle cells move up the shaft of the follicle and give rise to either the cortex, cuticle, medulla or the inner root sheath. Cortical cells have an internal structure of keratin intermediate filaments embedded in an interfilamentous matrix of proteins that are referred to collectively as keratin associated proteins (KAPs). It has thus been estimated that 50-100 proteins belonging to at least 10 distinct families are expressed in the growing hair [1]. According to their overall amino acid (aa) composition, KAPs are grouped into two multimembered classes (for a review, see [2],
0923-1811/97/$17.00 Copyright © 1997 Elsevier Science Ireland Ltd. All rights reserved. PII S0923-1811(96)00541-5
and references therein): the glycine-tyrosine rich proteins or HGTps, with 60-70% of the composition being composed of these two aa and serine, and the cysteine-rich proteins. The latter are classified into high-sulphur proteins or HSps (16-24% cysteine) and ultra-high sulphur proteins or UHSps (>30% cysteine). Characterization of cDNAs and genes isolated from genomic DNA shows that a hair KAP family often consists of more than 10 homologous genes that have been conserved during evolution, except for trichohyalin, which is produced from a single gene [3,4]. A number of genes or cDNAs coding for the following HSps have been partially or totally sequenced: B2A, B, C and D (i.e., KAP1 family [5]), BIIIB2, 3 and 4 (i.e., KAP3 family [6]), and KAP4.1 [7] in sheep, and HB2A and B in human [8]. Three UHSp genes were sequenced in mice, i.e., one gene encoding an UHSp richer in arginine residues than other known UHSps [9], which will therefore be referred to as Arg-UHSp, and two genes encoding the Ser1- and Ser2-UHSp [10]. One gene (KAP5.1, formerly KRN1, [11]) and two cDNAs encoding the KAP5.4 and 5.5 UHSps [12] have also been sequenced in sheep. In contrast, only one human UHSp gene (KAP5.1) has yet been published [11]. Three genes coding for HGTps have also been reported in sheep, i.e., KAP6.1 [13], and KAP7 and KAP8 [14]. All described KAP genes lack introns and appear to be clustered, as demonstrated by the frequent occurrence of two or more genes on the same genomic fragment. Accordingly, MacKinnon et al. [15] showed that the human KAP5.1 gene is located at 11q13 and that sequences cross-hybridizing with it are only found at 11p15.

1.2. Sulphur-deficient hair in trichothiodystrophy
In humans, trichothiodystrophy (TTD), or sulphur-deficient brittle hair, is a clinical marker for a neuroectodermal symptom complex that usually features mental and physical retardation and may also include nail dystrophy, lamellar ichthyosis, ocular dysplasia, dental caries and decreased fertility. Cysteine-deficient hair is common to all patients. By two-dimensional electrophoresis, it was shown that hair abnormalities such as the striking bright and dark bands seen with polarizing microscopy using crossed polarizers correlate with a decreased synthesis of cysteine-rich proteins of the matrix [16]. In particular, the TTD cysteine-rich proteins have lost the large heterogeneous UHSp group and at least 8 major HSp components [17]. The identification of the human HSp and UHSp genes, whose function is perturbed in TTD patients, should therefore help us to understand the molecular basis of this syndrome and also to study the regulation of the expression of the genes implicated in hairshaft building. With the aim of isolating genes coding for human HSps and UHSps, we screened a genomic DNA library with an oligonucleotide probe equivalent to a cysteine-rich repeat in a sheep wool HSp, using low stringency conditions for post-hybridization washes. In this paper, we report the isolation and characterization of five clones bearing seven human genomic sequences related to this oligonucleotide and thus putatively corresponding to HSp or UHSp genes, and the identification of a sequence presumably encoding a new UHSp. We also discuss the possibility that all the genes coding for KAPs are members of the same superfamily.
2. Materials and methods

2.1. Screening of a human genomic DNA library

The oligonucleotide LW199 (5’-GAGACTGGCAGCACTGGGGTCTGCAGCAGCTGGACACACAGC-3’) was originally designed to isolate the genomic counterpart of a still unpublished cDNA encoding a mouse arginine-rich UHSp, and permitted the isolation of two serine-rich UHSp encoding genes [10]. After 5’ end-labelling using T4 polynucleotide kinase (Boehringer) and γ-32P-ATP (ICN), it was used as a probe to screen 1 × 10^6 clones of a λEMBL3 human genomic DNA library, constructed by Dr. J.-M. Garnier (IGBMC, Illkirch, France). Plating of phages and purification of positive clones were performed according to conventional methods [18]. Hybridization conditions were: pre-hybridization at 42°C in 6 × NET (NET: 150 mM NaCl, 1 mM EDTA, 15 mM Tris (pH 7.5)), 4 × Denhardt (Denhardt: 0.01% Ficoll (Type 400), 0.01% polyvinylpyrrolidone), 2.5 mM EDTA, 0.2% SDS and 100 µg/ml denatured herring DNA; hybridization at 42°C in the same buffer with the addition of 0.5 × 10^6 cpm/ml labelled probe; and washes including 2 × SSC (SSC: 150 mM NaCl, 15 mM sodium citrate (pH 7)), 0.1% SDS at room temperature twice for 15 min, 2 × SSC, 0.1% SDS at 50°C twice for 10 min followed by 1 × SSC, 0.1% SDS once at 50°C for 10 min.

2.2. Restriction mapping and Southern blot analysis
DNAs extracted from the λ clones were purified by standard protocols [18]. After digestion with restriction enzymes (Boehringer), DNAs were fractionated in 0.8 or 1.2% agarose gels and transferred to Nylon membranes (Amersham). The 32P-labelled DNA probe was prepared by random priming using the ‘Ready to Go’ kit of Pharmacia. Hybridization conditions were as described [18]: pre-hybridization at 42°C in 50% formamide, 5 × SSC, 2 × Denhardt, 2.5 mM EDTA, 0.2% SDS and 100 µg/ml denatured herring DNA; hybridization at 42°C in the same buffer with the addition of 1 × 10^6 cpm/ml labelled probe; and washes including 2 × SSC, 0.1% SDS at room temperature twice for 15 min, 2 × SSC, 0.1% SDS at 65°C twice for 10 min followed by 0.1 × SSC, 0.1% SDS once at 65°C for 10 min. The oligonucleotide PM115 (5’-GTTGGAAAGCACAGATCTTCGGGAGCTACC-3’) was designed from the specific 3’-untranslated part of the human KAP5.1 gene [11] and used as a probe in the same conditions as LW199.

2.3. DNA subcloning and sequence analysis
Fragments of interest were isolated by electroelution from the agarose gel and were further purified over Elutip-D columns (Schleicher and Schuell). They were then subcloned in the multiple cloning site of the Bluescript SK+ vector (Stratagene) by standard protocols [18]. Sequence analysis was performed on both strands using the T7-Sequencing™ Kit from Pharmacia, according to the manufacturer’s instructions.
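The hybridization and wash buffers of sections 2.1 and 2.2 are expressed as multiples of the NET and SSC stock definitions. As a minimal sketch of that arithmetic (the function and dictionary names are illustrative and are not part of the original protocol), the final salt concentrations can be worked out as follows. The lower salt concentration of the final 0.1 × SSC wash, combined with the higher temperature, is what makes the Southern blot washes more stringent than the library screening washes.

```python
# 1 x stock definitions as stated in section 2.1 (concentrations in mM).
NET_1X = {"NaCl": 150.0, "EDTA": 1.0, "Tris": 15.0}
SSC_1X = {"NaCl": 150.0, "sodium citrate": 15.0}


def dilute(stock_1x, factor):
    """Return the component concentrations (mM) of a 'factor x' buffer."""
    # Rounding only hides floating-point noise (e.g. 0.1 * 150).
    return {name: round(conc * factor, 3) for name, conc in stock_1x.items()}


if __name__ == "__main__":
    print("6 x NET:", dilute(NET_1X, 6))
    for factor in (2, 1, 0.1):
        print(f"{factor} x SSC:", dilute(SSC_1X, factor))
```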
3. Results

3.1. Isolation of genomic DNA clones
Using the oligonucleotide LW199 as a probe, we screened 1 × 10^6 clones of a λEMBL3 human genomic DNA library. With the aim of isolating genes encoding several different HSps and/or UHSps, post-hybridization washes were conducted at low stringency. It was thus possible to purify 22 λ clones containing human genomic inserts whose sizes ranged from 13.5 to 19 kilobase pairs (kbp). These clones were of five different types according to the intensity of the signal they gave. One clone of each family (i.e., λUHSp3, 4, 9, 18 and 21) was selected for further analysis. The five inserts were mapped (Fig. 1) and the corresponding DNAs tested with the LW199 probe in the same conditions as those used for the screening of the genomic DNA library. Two clones contained two non-contiguous positive fragments: the first one (λUHSp3) contained a 3 kbp-HindIII and a 1.85 kbp-EcoRI/HindIII fragment that gave medium and high-intensity signals, respectively. The second clone (λUHSp4) contained a 2.5 kbp-BamHI and a 3.7 kbp-BamHI/EcoRI fragment that gave low and medium-intensity signals, respectively. In contrast, the three other clones (λUHSp9, 18 and 21) each contained a unique positive fragment (0.75 kbp-HindIII, 5.3 kbp-EcoRI/HindIII and 2.5 kbp-EcoRI/HindIII, respectively), each of which gave a medium-intensity signal. The comparison of the maps of the five clones suggests that they most probably contain five different fragments of the human genome. However, none of these clones contained the known human KAP5.1 gene [11], as suggested by the lack of any signal with the KAP5.1-derived PM115 probe.

3.2. Identification of a human sequence presumably encoding a new UHSp
A 246 base pairs (bp)-PstI fragment derived from the λUHSp21 2.5 kbp-EcoRI/HindIII fragment that hybridized with the LW199 probe was subcloned and sequenced on both strands. Its nucleotide (nt) sequence and the aa sequence
Fig. 1. Restriction maps of the λUHSp3, 4, 9, 18 and 21 clones. After partial SalI digestion, human genomic fragments have been inserted in the SalI cloning sites of the λEMBL3 vector. The open boxes respectively represent the left (λL) and right (λR) arms of the phage. The black boxes give the positions of the shortest restriction fragments detected by the LW199 probe. Their widths are roughly proportional to the intensity of the signal obtained after autoradiography. The maps of the λUHSp4 and 18 have been cut in order to save space. Restriction enzymes are: B, BamHI; E, EcoRI; H, HindIII; M, SmaI; S, SalI.
deduced from one of the four open reading frames are given in Fig. 2A. This aa sequence contains sixteen repeated cysteine-rich pentapeptides. These repeated motifs are of two types according to the presence or the lack of a proline residue (P+ and P- motifs, respectively). Both types of pentapeptides are aligned in Fig. 2B. There are three well conserved aa in each motif: the two cysteines and the proline in the P+ pentapeptides, the two cysteines and the first serine in the P- ones. The third and fifth aa are less conserved in both types of motifs. The third aa is often basic in the P+ repeat, whereas the corresponding P- aa is mostly hydrophobic. In contrast, the fifth aa tends to be hydrophilic in the sixteen repeats. The high content in cysteine (40%) of this deduced aa sequence strongly suggests that it represents a cysteine-rich protein, most probably an UHSp, that will therefore be referred to as human UHSp21 (hUHSp21). The presence of only 7.4% hydrophobic aa and the lack of any tyrosine or aspartic acid in hUHSp21 are in general accordance with the aa composition of the known UHSps (Table 1). The richness of hUHSp21 in proline, glutamine and arginine (11.1, 7.4 and 7.4%, respectively), the presence of glutamic acid (1.2%) and threonine (4.9%), as well as the lack of glycine, suggest that this protein represents an UHSp related to the previously described mouse Arg-UHSp [9]. However, the aa sequence of hUHSp21 does not fit well that of the mouse Arg-UHSp, suggesting that this human polypeptide may be a new member of the arginine-rich UHSp subfamily.
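The classification used above rests on simple amino acid percentages. The following minimal sketch shows how such a composition can be computed and compared with the cysteine ranges quoted in the Introduction (16-24% for HSps, more than 30% for UHSps). The function names and the toy peptide are hypothetical; the toy peptide is not the hUHSp21 sequence, and how to label values that fall outside the two quoted ranges is an assumption made here for illustration.

```python
from collections import Counter


def aa_composition(protein):
    """Return {amino acid: percentage} for a one-letter protein sequence."""
    protein = protein.upper()
    if not protein:
        return {}
    counts = Counter(protein)
    return {aa: 100.0 * n / len(protein) for aa, n in counts.items()}


def classify_by_cysteine(protein):
    """Classify a sequence using the cysteine ranges quoted in the text.

    HSp: 16-24% cysteine; UHSp: more than 30% cysteine.
    Anything outside these two ranges is reported as 'unclassified'
    (an assumption of this sketch, not a rule stated in the paper).
    """
    cys = aa_composition(protein).get("C", 0.0)
    if cys > 30.0:
        return "UHSp"
    if 16.0 <= cys <= 24.0:
        return "HSp"
    return "unclassified"


# Hypothetical toy peptide: two proline-containing and two proline-free
# cysteine-rich pentapeptides (20 residues, 8 of them cysteine).
TOY_PEPTIDE = "CCRPS" * 2 + "CCQSS" * 2

if __name__ == "__main__":
    print(aa_composition(TOY_PEPTIDE))
    print(classify_by_cysteine(TOY_PEPTIDE))
```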
Fig. 2. Partial nt sequence of the putative hUHSp21 gene. (A) Nucleotide sequence of the 246 bp-PstI fragment derived from λUHSp21. The deduced aa sequence is given under the nt sequence in the one-letter code. The PstI cloning sites are indicated by a bar. The proline-containing cysteine-rich motifs are boxed. Undetermined nt and aa are indicated by dots. The nt sequence has been deposited to the EMBL/GenBank Data Libraries under accession number X78336. (B) Comparison of the nt and deduced aa sequences of the 16 repeated motifs contained in the hUHSp21 246 bp-PstI fragment. The number of each motif is indicated at the left of the figure. The respective nt and aa consensus of motifs that contain (P+) and do not contain (P-) a proline residue are given.
3.3. Comparative analysis of the λUHSp3, 4, 9, 18 and 21 genomic clones

The λUHSp3, 4, 9 and 18 clones contain a total of six fragments that were hybridized by the LW199 probe. In order to determine the degree of relationship of these nt sequences between themselves and with the putative hUHSp21 gene, the five clones were digested with either the HindIII, HindIII plus PvuII, or RsaI restriction endonucleases. PvuII was chosen because it specifically recognizes the 5’-CAGCTG-3’ sequence, which often codes for the serine-cysteine dipeptide (AGCTG[C or T]) in the HSp and UHSp genes. Namely, the coding region of the HSp genes contains one (sheep BIIIB2 and BIIIB4), four
Table 1
Amino acid composition of different families of proteins expressed in the growing hair (amino acid percentages)

Family   Peptide                           C
UHSp     Human UHSp21 (partial)            40.7
UHSp     Human KAP5.1 [11]                 35.3
UHSp     Mouse Arg-UHSp [9]                37.4
UHSp     Mouse Ser1-UHSp [10]              40.3
UHSp     Mouse Ser2-UHSp [10]              41.1
UHSp     Sheep KAP5.1 [11]                 30.6
UHSp     Sheep KAP5.4 [12]                 31.9
UHSp     Sheep KAP5.5 (partial, [12])      29.4
HSp      Human HB2A [8]                    25.3
HSp      Human HB2B [8]                    26.1
HSp      Sheep KAP4.1 (partial, [7])       28.5
HSp      Sheep BIIIB2 [6]                  16.2
HSp      Sheep BIIIB4 [6]                  17.2
HSp      Sheep B2A [5]                     23.1
HSp      Sheep B2C [5]                     21.7
HSp      Sheep B2D [5]                     22.5
HGTp     Sheep KAP6.1 [13]                 10.8
HGTp     Sheep KAP7 (partial, [14])        7.1
HGTp     Sheep KAP8 (partial, [14])        6.6
         Mouse hacl-1 [19]                 11.9
Fig. 3. Comparative Southern blot analysis of the λUHSp3, 4, 9, 18 and 21 DNAs. (A) After digestion with the restriction enzymes HindIII (H), HindIII plus PvuII (H + P) or RsaI (R), the DNAs extracted from the λUHSp3, 4, 9, 18 and 21 clones were loaded onto a 1.2% agarose gel. After electrophoresis and Southern blotting, the clones were analyzed with the LW199 probe. The sizes of the fragments are given in kbp at the left (HindIII and HindIII plus PvuII digests) or the right (RsaI digest) of the figure. (B) After washing, the same blot was rehybridized with the labelled hUHSp21 246 bp-PstI fragment. The positions of the fragments previously detected in (A) are indicated by arrowheads. The size of the newly appearing fragment (2.9 kbp) is given. Fragments that were not detected by this second probe and those which give rise to a fainter signal with it than with the LW199 oligonucleotide are respectively indicated by the X or the open circles.
(human HB2A and sheep B2C), five (sheep B2A and B2D), six (human HB2B) or ten (sheep KAP5.4) PvuII sites. Similarly, the coding part of the UHSp genes contains seven (human KAP5.1), nine (mouse Arg-UHSp and sheep KAP5.1 and KAP5.5), ten (sheep KAP5.4), twelve (mouse Ser2-UHSp) or sixteen (mouse Ser1-UHSp) PvuII sites. Accordingly, the 246 bp-PstI fragment derived from the λUHSp21 clone contains seven PvuII sites (Fig. 2A). After Southern blotting, the five DNAs were tested with the LW199 probe (Fig. 3A). The digestion with HindIII generates fragments corresponding to the restriction maps: the two fragments present in the λUHSp4 DNA are both contained in the 17.5 kbp-fragment ending in the right arm of the λ bacteriophage. In the RsaI digest, six fragments displayed differential signals, which should reflect their relative degree of homology with the LW199 probe. Interestingly, the HindIII fragments issuing from the λUHSp3, 4 and 21 clones were cut into small 0.1 kbp-fragments by PvuII, while a 0.2 kbp-fragment, also found in λUHSp4 and 21, and giving rise to a very low-intensity signal, was only detected in the λUHSp18 digest. This suggests that the sequence previously detected by the LW199 probe in the 6.3 kbp-HindIII fragment of the latter clone was almost entirely cut into fragments smaller than 0.1 kbp, which were therefore not retained in the gel.
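The reasoning above relies on counting PvuII recognition sites and on the sizes of the fragments that a digest leaves behind. As a minimal sketch (the helper names are illustrative, and the digest is treated as complete on a linear molecule), the same bookkeeping can be applied to any sequence. The LW199 probe from section 2.1 is used here only as a convenient test sequence; PvuII is assumed to cut its 5’-CAGCTG-3’ site in the middle (CAG^CTG), leaving blunt ends.

```python
PVUII_SITE = "CAGCTG"     # PvuII recognition sequence
PVUII_CUT_OFFSET = 3      # cut between CAG and CTG (blunt ends)

# LW199 oligonucleotide probe as given in section 2.1.
LW199 = "GAGACTGGCAGCACTGGGGTCTGCAGCAGCTGGACACACAGC"


def find_sites(seq, site=PVUII_SITE):
    """Return the 0-based start positions of every occurrence of 'site'."""
    seq, site = seq.upper(), site.upper()
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start)
        start = seq.find(site, start + 1)   # allow overlapping occurrences
    return positions


def digest(seq, site=PVUII_SITE, cut_offset=PVUII_CUT_OFFSET):
    """Return fragment lengths (bp) after a complete digest of linear DNA."""
    cuts = [p + cut_offset for p in find_sites(seq, site)]
    bounds = [0] + cuts + [len(seq)]
    return [end - begin for begin, end in zip(bounds, bounds[1:])]


if __name__ == "__main__":
    print("PvuII sites in LW199:", find_sites(LW199))
    print("Fragment lengths:", digest(LW199))
```

A sequence that carries many closely spaced sites, as described for the cysteine-rich coding regions, would yield a list of short fragments, which is why such regions disappear from the gel after PvuII digestion.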
The presence of multiple PvuII sites in six of the seven fragments detected by the LW199 probe further suggests that they could contain genes coding for one HSp or UHSp. In contrast, the 0.6 kbp-fragment of the λUHSp9 clone did not contain any PvuII site, and should thus represent a sequence less closely related to the six others. After washing, the same blot was rehybridized with the labelled 246 bp-PstI fragment derived from the λUHSp21 DNA (Fig. 3B). As shown in the RsaI digest, the relative intensity of the signal given by the 1.2 and 1.5 kbp-fragments respectively contained in the λUHSp18 and 21 DNAs was almost the same as in panel A, which suggests that the two corresponding sequences are the most closely related. In the same way, the 1.6 kbp-fragment of the λUHSp4 clone still gave a low-intensity signal, indicating that the corresponding sequence is as poorly related to the putative UHSp21 gene as to the LW199 probe. In contrast, the two 1.6 and 2.0 kbp-fragments contained in the λUHSp3 DNA, as well as the λUHSp4 3.3 kbp-fragment, gave a low-intensity signal with this second probe: these three fragments should thus contain nt sequences more closely related to the LW199 probe than to hUHSp21. However, a newly appearing 2.9-kbp RsaI fragment in the λUHSp4 DNA suggests that one of the two genes possibly contained in this clone shares with hUHSp21 some nt sequence that is not related to that of LW199. Finally, the lack of any signal in the three lanes containing the λUHSp9 DNA shows that the LW199-related sequence contained in this clone is only very poorly related to that contained in hUHSp21, which is consistent with its lack of any PvuII site.

4. Discussion

4.1. The five characterized clones contain human sequences presumably encoding different hair cysteine-rich proteins
In this study, we isolated 22 λ clones probably containing 17 different human genomic fragments. Five of these clones were mapped. One of them, λUHSp21, presumably contains a gene (hUHSp21) which could encode a new UHSp related to the previously described mouse Arg-UHSp [9], although we cannot exclude at this time that it could represent a pseudogene. The λUHSp3, 4, 9 and 18 clones probably also contain HSp and/or UHSp encoding genes, because: (i) they contain six fragments hybridized by the LW199 probe; (ii) all these fragments, except that contained in λUHSp9, are cut into small pieces by PvuII and are hybridized by the 246 bp-PstI fragment derived from the putative hUHSp21 gene; and (iii) two of these clones (λUHSp3 and 4) contain two non-contiguous fragments responsive to the LW199 probe. The presence of two genes in the same genomic fragment was also described in sheep (i.e., genes encoding B2A and B2C [5], and genes encoding the KAP5.1 and another predicted UHSp [11]), and in mouse (i.e., genes coding for the Ser1- and Ser2-UHSp [10]). The differential response obtained with the LW199 probe could arise from the number of LW199 oligonucleotides able to remain hybridized at low stringency (for example, seven in the 246 bp-PstI fragment of hUHSp21) and/or from their different degrees of nt homology with this probe. However, the differential response also observed with the λUHSp21 246 bp-PstI probe strongly suggests that several different cysteine-rich human genes are likely to be contained in these clones.
4.2. KAP genes appear to be the members of the same superfamily
From several features such as similar aa compositions, nt sequence homologies in the 5’-untranslated regions and the presence of conserved motifs putatively acting as regulatory elements in the promoter regions, it has been previously suggested that the HSp and UHSp genes are members of several families that evolved by successive duplications and subsequent divergence by genetic drift [2,5,6,11,12]. However, certain members of the different families show similar percentages in some of their aa (Table 1). For example, hUHSp21 and mouse Arg-UHSp are similar to human HB2A and B and sheep B2A, C and D in
Table 2
Consensus amino acids found in the two types of pentapeptide repeats contained in the different proteins expressed in the growing hair

Family   Peptide                           P+ repeats (number)   P- repeats (number)
UHSp     Human UHSp21 (partial)            9                     7
UHSp     Human KAP5.1 [11]                 10                    19
UHSp     Mouse Arg-UHSp [9]                15                    13
UHSp     Mouse Ser1-UHSp [10]              17                    27
UHSp     Mouse Ser2-UHSp [10]              14                    27
UHSp     Sheep KAP5.1 [11]                 8                     26
UHSp     Sheep KAP5.4 [12]                 10                    25
UHSp     Sheep KAP5.5 (partial, [12])      9                     25
HSp      Human HB2A [8]                    13                    20
HSp      Human HB2B [8]                    11                    22
HSp      Sheep KAP4.1 (partial, [7])       14                    22
HSp      Sheep BIIIB2 [6]                  10                    8
HSp      Sheep BIIIB4 [6]                  11                    6
HSp      Sheep B2A [5]                     15                    18
HSp      Sheep B2C [5]                     12                    17
HSp      Sheep B2D [5]                     16                    19
HGTp     Sheep KAP6.1 [13]                 x                     16
HGTp     Sheep KAP7 (partial, [14])        5                     9
HGTp     Sheep KAP8 (partial, [14])        5                     8
         Mouse hacl-1 [19]                 12                    20

Symbols: x, absent repeat type; -, amino acid homologies with UHSp21; ?, no defined consensus amino acid. When different subtypes of motifs are present in the same protein, the consensus amino acids at each position are given according to the decreasing frequency of their occurrence in the P+ or P- motifs.
that they are all rich in proline, glutamine and threonine and poor in lysine and serine. They also resemble sheep KAP4.1 in their lack of aspartic acid and their richness in arginine. In the same way, the richness of the hUHSp21 and mouse Arg-UHSp in serine and threonine is close to that of the HGTps, and their content in glutamine, arginine, serine and glutamic acid resembles that of the peptide encoded by the mouse hacl-1 gene [19]. In contrast, the UHSps of the KAP5 family are very rich in glycine but poor in glutamine, lysine and glutamic acid, as are the HGTps. On the other hand, the HGTp content in proline, arginine and lysine resembles that of the HSp B2 family and hacl-1, whereas their paucity in acidic aa corresponds to that of the UHSps. These observations suggest either a common origin or a convergent evolution. It also appears, however, that all these proteins contain a variable number of P+ and P- pentapeptides. The consensus aa found at the five positions are given in Table 2. Strikingly, the HSps and UHSps share the same types of P+ motifs, containing either a basic or a hydrophobic aa at the third position and a hydroxylated or hydrophobic aa at the fifth position. P+ motifs with a glutamine at the third position are found in mouse Arg-UHSp, HSps of the B2 family, and in the mouse hacl-1. Although the P- motifs are more variable in sequence, the UHSps, except the KAP5 family, contain P- repeats with a glutamine, an isoleucine or an arginine at the third position and a hydroxylated aa at the fifth position, as do the HSps of the B2 family and the hacl-1. In contrast, the UHSps of the KAP5 family and the HGTps have a similar richness in glycine and share P- repeats with a glycine at any of the five positions. Thus, proteins with a similar composition in some aa also share similar pentapeptide repeats.
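Counting P+ and P- pentapeptides in a protein sequence can be sketched with a short pattern scan. The pattern below is a deliberate simplification based on the conserved positions described in section 3.2 for hUHSp21 (two cysteines, a variable residue, then proline for P+ or serine for P-, then a variable residue); it is an assumption for illustration and would miss the more divergent repeats, such as the glycine-containing P- motifs of the KAP5 family and the HGTps. The function name and the example peptide are hypothetical.

```python
import re

# Simplified motif (assumption): C C x [P or S] x, read left to right.
PENTAPEPTIDE = re.compile(r"CC.[PS].")


def scan_pentapeptides(protein):
    """Return non-overlapping (motif, type) pairs found in a protein sequence.

    The type is 'P+' when the fourth residue is proline and 'P-' when it is
    serine, following the simplified pattern above.
    """
    hits = []
    for match in PENTAPEPTIDE.finditer(protein.upper()):
        motif = match.group()
        hits.append((motif, "P+" if motif[3] == "P" else "P-"))
    return hits


if __name__ == "__main__":
    # Hypothetical peptide used only to exercise the scan.
    for motif, kind in scan_pentapeptides("CCRPSCCQSSAACCKPT"):
        print(motif, kind)
```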
The HSp, UHSp and HGTp encoding genes share two other features: (i) they all lack introns [20], as does the hacl-1 gene [19], and (ii) their promoters contain some similar sequences putatively acting as regulatory elements [20]. However, a similar 18 bp-motif found in the 5’-untranslated region of the genes encoding the HSps of the B2 and BIII families and the HGTps [5,6,14] is absent in the genes coding for the UHSps. Altogether, these observations strongly suggest that these different gene families constitute a superfamily which includes hacl-1 and which arose from successive duplications and subsequent divergence from an ancestral gene actually coding for the two types of five aa repeats. Because very homologous repeats are coded by all these genes, although at different frequencies, some selective pressure, probably resulting from the need to maintain the chemical and physical properties of the hairs, is thus likely to have limited the overall divergence resulting from genetic drift.

4.3. Concluding remarks
The results presented here should provide a set of new probes specific for human cysteine-rich genes. These probes will allow the analysis of the expression of these genes during the normal development of the hair follicle in humans. They will also allow a better definition of the evolutionary relationship between these sequences. Furthermore, a comparative study of the expression of these genes in patients suffering from TTD should help to identify the cysteine-rich genes whose expression is modified in homozygous children and lead to a better understanding of this human disorder.

Acknowledgements
We are particularly grateful to Dr. J.-M. Garnier of the team of Prof. P. Chambon (IGBMC, Illkirch, France) for the gift of the human genomic DNA library, and to Dr. G. Vogeli (The Upjohn Company, Kalamazoo, USA) for the gift of the LW199 oligonucleotide. This work was supported by the ‘Association Française contre les Myopathies’ and the ‘Fondation pour la Recherche Médicale’.

References

[1] Powell BC, Nesci A, Rogers GE: Regulation of keratin gene expression in hair follicle differentiation. Ann NY Acad Sci 642: 1-20, 1991.
[2] Marshall RC, Orwin DFG, Gillespie JM: Structure and biochemistry of mammalian hard keratin. Electron Microsc Rev 4: 47-83, 1991.
[3] Rogers GE, Fietz MJ, Fratini A: Trichohyalin and matrix proteins. Ann NY Acad Sci 642: 64-81, 1991.
[4] Fietz MJ, McLaughlan CJ, Campbell MT, Rogers GE: Analysis of the sheep trichohyalin gene: potential structural and calcium-binding roles of trichohyalin in the hair follicle. J Cell Biol 121: 855-865, 1993.
[5] Powell BC, Sleigh MJ, Ward KA, Rogers GE: Mammalian keratin gene families: organisation of genes coding for the B2 high-sulphur proteins of sheep wool. Nucl Acids Res 11: 5327-5346, 1983.
[6] Frenkel MJ, Powell BC, Ward KA, Sleigh MJ, Rogers GE: The keratin BIIIB gene family: isolation of cDNA clones and structure of a gene and a related pseudogene. Genomics 4: 182-191, 1989.
[7] Fratini A, Powell BC, Hynd PI, Keough RA, Rogers GE: Dietary cysteine regulates the levels of mRNAs encoding a family of cysteine-rich proteins of wool. J Invest Dermatol 102: 178-185, 1994.
[8] Zhumbaeva BD, Gening LV, Gazarian KG: Cloning and structural characteristics of human hair keratin genes rich in sulfur. Mol Biol (Mosk) 26: 813-820, 1992.
[9] McNab AR, Wood L, Theriault N, Gierman T, Vogeli G: An ultra-high sulfur keratin gene is expressed specifically during hair growth. J Invest Dermatol 92: 263-266, 1989.
[10] Wood L, Mills M, Hatzenbuhler N, Vogeli G: Serine-rich ultra high sulfur protein gene expression in murine hair and skin during the hair cycle. J Biol Chem 265: 21375-21380, 1990.
[11] MacKinnon PJ, Powell BC, Rogers GE: Structure and expression of genes for a class of cysteine-rich proteins of the cuticle layers of differentiating wool and hair follicles. J Cell Biol 111: 2587-2600, 1990.
[12] Jenkins BJ, Powell BC: Differential expression of genes encoding a cysteine-rich keratin family in the hair cuticle. J Invest Dermatol 103: 310-317, 1994.
[13] Fratini A, Powell BC, Rogers GE: Sequence, expression and evolutionary conservation of a gene encoding a glycine/tyrosine-rich keratin-associated protein of hair. J Biol Chem 268: 4511-4518, 1993.
[14] Kuczek ES, Rogers GE: Sheep wool glycine + tyrosine-rich keratin genes: a family of low sequence homology. Eur J Biochem 166: 79-85, 1987.
[15] MacKinnon PJ, Powell BC, Rogers GE, Baker EG, MacKinnon RN, Hyland VJ, Callen DF, Sutherland GR: An ultrahigh-sulphur keratin gene of the human hair cuticle is located at 11q13 and cross-hybridizes with sequences at 11p15. Mammalian Genome 1: 53-56, 1991.
[16] Price VH, Odom RB, Ward WH, Jones FT: Trichothiodystrophy: Sulfur-deficient brittle hair as a marker for a neurectodermal symptom complex. Arch Dermatol 116: 1375-1384, 1980.
[17] Gillespie JM, Marshall RC: A comparison of the proteins of normal and trichothiodystrophic human hair. J Invest Dermatol 80: 195-202, 1983.
[18] Sambrook J, Fritsch EF, Maniatis T: Molecular Cloning: A Laboratory Manual, 2nd ed. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, New York, 1989.
[19] Huh N, Kashiwagi M, Konishi C, Hashimoto Y, Kohno Y, Nomura S, Kuroki T: Isolation and characterization of a novel hair-follicle-specific gene, hacl-1. J Invest Dermatol 102: 716-720, 1994.
[20] Rogers GE, Powell BC: Organization and expression of hair follicle genes. J Invest Dermatol (suppl) 101: 50S-55S, 1993.