20.6 The startpoint for RNA polymerase II

Key terms

TATA box is a conserved A·T-rich octamer found about 25 bp before the startpoint of most eukaryotic RNA polymerase II transcription units; may be involved in positioning the enzyme for correct initiation.

Key Concepts

· RNA polymerase II promoters have a short conserved sequence Py2CAPy5 (the initiator, Inr) at the startpoint.

· Most RNA polymerase II promoters have an A·T-rich octamer called the TATA box ~25 bp upstream of the startpoint.

RNA polymerase II cannot initiate transcription itself, but is absolutely dependent on auxiliary transcription factors. The enzyme together with these factors constitutes the basal transcription apparatus that is needed to transcribe any promoter. Our starting point for considering promoter organization is therefore to define a "generic" promoter, the shortest sequence at which RNA polymerase II can initiate transcription, and to characterize the enzyme subunits and transcription factors that are needed to recognize it.

A generic promoter can in principle be expressed in any cell. The accessory proteins that are required for polymerase II to initiate at such a promoter define the general transcription factors involved in the mechanics of binding to DNA and initiating transcription. The general factors are described as TFIIX, where "X" is a letter that identifies the individual factor. A generic promoter functions at only a low efficiency; activators are required for a proper level of function. The activators are not described systematically, but have casual names reflecting their histories of identification.

We may expect any sequence components involved in the binding of RNA polymerase and general transcription factors to be conserved at most or all promoters. As with bacterial promoters, when promoters for RNA polymerase II are compared, homologies in the regions near the startpoint are restricted to rather short sequences. These elements correspond with the sequences implicated in promoter function by mutation. Figure 20.14 shows the minimal sequence for a pol II promoter, which has two sequence elements.

At the startpoint, there is no extensive homology of sequence, but there is a tendency for the first base of mRNA to be A, flanked on either side by pyrimidines. (This description is also valid for the CAT start sequence of bacterial promoters.) This region is called the initiator (Inr), and may be described in the general form Py2CAPy5. The Inr is contained between positions –3 and +5. A promoter consisting only of the Inr has the simplest possible form recognizable by RNA polymerase II.
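The general form Py2CAPy5 can be read as a simple pattern: two pyrimidines, then CA, then five pyrimidines. The following is a minimal sketch of that reading, assuming the input is the coding strand written in DNA letters. The function name `find_inr` and the example sequence are hypothetical and made up for illustration; real initiators match the consensus only loosely, so a strict match like this one would miss many of them.

```python
import re

# Py2CAPy5: two pyrimidines (C or T), then CA, then five pyrimidines.
# The lookahead lets overlapping candidate sites be reported as well.
INR_PATTERN = re.compile(r"(?=([CT]{2}CA[CT]{5}))")


def find_inr(seq):
    """Return (index_of_A, matched_text) for each strict Py2CAPy5 match.

    index_of_A is the 0-based position of the A, the candidate startpoint.
    """
    hits = []
    for m in INR_PATTERN.finditer(seq.upper()):
        hits.append((m.start(1) + 3, m.group(1)))
    return hits


# Hypothetical sequence, invented for this example.
example = "GGGCGTCCATTCTCGGA"
print(find_inr(example))  # [(8, 'TCCATTCTC')]
```

The A reported at index 8 is the base that would be taken as the first base of the mRNA in this toy sequence.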

Most promoters have a sequence called the TATA box, usually located ~25 bp upstream of the startpoint. It constitutes the only upstream promoter element that has a relatively fixed location with respect to the startpoint. The core sequence is TATAA, usually followed by three more A·T base pairs. The TATA box tends to be surrounded by G·C-rich sequences, which could be a factor in its function. It is almost identical with the –10 sequence found in bacterial promoters; in fact, it could pass for one except for the difference in its location at –25 instead of –10.
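The description of the TATA box (a TATAA core followed by three more A·T base pairs, at a roughly fixed distance from the startpoint) can be sketched as a pattern search that reports positions in promoter coordinates. This is only an illustration under stated assumptions: the names `find_tata` and `promoter_position` and the example sequence are hypothetical, and the sketch assumes the usual convention that the startpoint is +1, the preceding base is –1, and there is no position 0.

```python
import re

# Core TATAA followed by three more A.T base pairs (coding strand).
TATA_PATTERN = re.compile(r"(?=(TATAA[AT]{3}))")


def promoter_position(index, start_index):
    """Convert a 0-based index to promoter coordinates (+1 = startpoint, no 0)."""
    offset = index - start_index
    return offset + 1 if offset >= 0 else offset


def find_tata(seq, start_index):
    """Return the promoter coordinate of the first base of each match."""
    return [
        promoter_position(m.start(1), start_index)
        for m in TATA_PATTERN.finditer(seq.upper())
    ]


# Hypothetical promoter: a TATA box, G.C-rich filler, then the startpoint A.
example = "GGCC" + "TATAAATA" + "GC" * 11 + "ACTT"
start_index = 34  # 0-based index of the startpoint in this toy sequence
print(find_tata(example, start_index))  # [-30]
```

In this toy sequence the 8 bp element begins at –30 and ends at –23, so it lies around the ~25 bp upstream position described in the text.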

Single base substitutions in the TATA box act as strong down mutations. Some mutations reverse the orientation of an A·T pair, so base composition alone is not sufficient for its function. So the TATA box comprises an element whose behavior is analogous to our concept of the bacterial promoter: a short, well-defined sequence just upstream of the startpoint, which is necessary for transcription. The minority of promoters that do not contain a TATA element are called TATA-less promoters.
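The point that base composition alone is not sufficient can be made concrete with a small sketch. Reversing the orientation of one A·T pair changes the sequence read on the coding strand but leaves the A·T content untouched. The sequence and the function names below are hypothetical, and the sketch does not claim that this particular substitution is one of the characterized down mutations.

```python
def at_fraction(seq):
    """Fraction of positions that are A or T."""
    seq = seq.upper()
    return sum(base in "AT" for base in seq) / len(seq)


def flip_at_pair(seq, index):
    """Reverse the orientation of the A.T pair at a 0-based index (A <-> T)."""
    swap = {"A": "T", "T": "A"}
    base = seq[index]
    if base not in swap:
        raise ValueError("position is not an A.T pair")
    return seq[:index] + swap[base] + seq[index + 1:]


wild_type = "TATAAATA"               # hypothetical TATA box
mutant = flip_at_pair(wild_type, 2)  # TATAAATA -> TAAAAATA
print(mutant, at_fraction(wild_type) == at_fraction(mutant))
```

Both sequences are 100% A·T, yet only the first contains the TATAA core, which is the sense in which the element is sequence-specific rather than merely A·T-rich.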


Figure 20.14
The minimal pol II promoter has a TATA box ~25 bp upstream of the Inr. The TATA box has the consensus sequence of TATAA. The Inr has pyrimidines (Y) surrounding the CA at the startpoint. The sequence shows the coding strand.

20.7 TBP is a universal factor

Key Concepts

· TBP is a component of the positioning factor that is required for each type of RNA polymerase to bind its promoter.

· The factor for RNA polymerase II is TFIID, which consists of TBP and 11 TAFs, with a total mass of ~800 kD.

The first step in complex formation at a promoter containing a TATA box is binding of the factor TFIID to a region that extends upstream from the TATA sequence. TFIID contains two types of component. Recognition of the TATA box is conferred by the TATA-binding protein (TBP), a small protein of ~30 kD. The other subunits are called TAFs (for TBP-associated factors). Some TAFs are stoichiometric with TBP; others are present in lesser amounts. TFIIDs containing different TAFs could recognize different promoters. Some (substoichiometric) TAFs are tissue-specific. The total mass of TFIID typically is ~800 kD, containing TBP and 11 TAFs, varying in mass from 30-250 kD. The TAFs in TFIID are named in the form TAFII00, where "00" gives the molecular mass of the subunit. The TAFIIs are not confined exclusively to TFIID; certain TAFIIs are found also in protein complexes that act to modify the structure of chromatin prior to transcription (see 21 Regulation of transcription) (651, 653, 657; for review see 225, 1709).

Positioning factors that consist of TBP associated with a set of TAFs are responsible for identifying all classes of promoters. TFIIIB (for pol III promoters) and SL1 (for pol I promoters) may both be viewed as consisting of TBP associated with a particular group of proteins that substitute for the TAFs that are found in TFIID (for review see 1709). TBP is the key component, and is incorporated at each type of promoter by a different mechanism. In the case of promoters for RNA polymerase II, the key feature in positioning is the fixed distance of the TATA box from the startpoint.

Figure 20.15 shows that the positioning factor recognizes the promoter in a different way in each case. At promoters for RNA polymerase III, TFIIIB binds adjacent to TFIIIC. At promoters for RNA polymerase I, SL1 binds in conjunction with UBF. TFIID is solely responsible for recognizing promoters for RNA polymerase II. At a promoter that has a TATA element, TBP binds specifically to DNA, but at other promoters it may be incorporated by association with other proteins that bind to DNA. Whatever its means of entry into the initiation complex, it has the common purpose of interaction with the RNA polymerase.

Any individual molecule of TBP itself is not necessarily available for all promoters, but may in effect be sequestered by its associated proteins to be used continuously by a specific class of promoter. TBP must have the capacity to interact appropriately with the variety of factors and/or polymerases that are employed at each type of promoter.

TFIID is ubiquitous, but not unique. All multicellular eukaryotes also express an alternative complex, which has TLF (TBP-like factor) instead of TBP (1708). A TLF is typically ~60% similar to TBP. It probably initiates complex formation by the usual set of TFII factors. However, TLF does not bind to the TATA box, and we do not yet know how it works. Drosophila also has a third factor, TRF1, which behaves in the same way as TBP and binds its own set of TAFs, to form a complex that functions as an alternative to TFIID at a specific set of promoters (1707).

This section updated 4-30-2001

Reviews

225:

Burley, S. K. and Roeder, R. G. (1996). Biochemistry and structural biology of TFIID. Ann. Rev. Biochem. 65, 769-799.

1708:

Berk, A. J. (2000). TBP-like factors come into focus. Cell 103, 5-8.

1709:

Lee, T. I. and Young, R. A. (1998). Regulation of gene expression by TBP-associated proteins. Genes Dev. 12, 1398-1408.

Research

651:

Martinez, E. et al. (1994). TATA-binding protein-associated factors in TFIID function through the initiator to direct basal transcription from a TATA-less class II promoter. EMBO J. 13, 3115-3126.

653:

Verrijzer, C. P. et al. (1995). Binding of TAFs to core elements directs promoter selectivity by RNA polymerase II. Cell 81, 1115-1125.

657:

Horikoshi, M. et al. (1988). Transcription factor ATF interacts with a TATA factor to facilitate establishment of a preinitiation complex. Cell 54, 1033-1042.


Figure 20.15
RNA polymerases are positioned at all promoters by a factor that contains TBP.

 

20.8 TBP binds DNA in an unusual way

Key Concepts

· TBP binds to the TATA box in the minor groove of DNA.

· It forms a saddle around the DNA and bends it by ~80°.

· Some of the TAFs resemble histones and may form a structure resembling a histone octamer.

TBP has the unusual property of binding to DNA in the minor groove. (Virtually all known DNA-binding proteins bind in the major groove.) The crystal structure of TBP suggests a detailed model for its binding to DNA. Figure 20.16 shows that it surrounds one face of DNA, forming a "saddle" around the double helix. In effect, the inner surface of TBP binds to DNA, and the larger outer surface is available to extend contacts to other proteins. The DNA-binding site consists of sequences that are conserved between species, while the variable N-terminal tail is exposed to interact with other proteins (647, 648, 649).

Binding of TBP may be inconsistent with the presence of nucleosomes. Because nucleosomes form preferentially by placing A·T-rich sequences with the minor grooves facing inward, they could prevent binding of TBP. This may explain why the presence of nucleosomes prevents initiation of transcription.

TBP not only sits in the minor groove, but also bends the DNA by ~80°, as illustrated in Figure 20.17. The TATA box bends towards the major groove, widening the minor groove. The distortion is restricted to the 8 bp of the TATA box; at each end of the sequence, the minor groove has its usual width of ~5 Å, but at the center of the sequence the minor groove is >9 Å. This is a deformation of the structure, but does not actually separate the strands of DNA, because base pairing is maintained.

This structure has several functional implications. By changing the spatial organization of DNA on either side of the TATA box, it allows the transcription factors and RNA polymerase to form a closer association than would be possible on linear DNA. The bending at the TATA box corresponds to unwinding of about 1/3 of a turn of DNA, and is compensated by a positive writhe. We do not know yet how this relates to the initiation of strand separation.

The presence of TBP in the minor groove, combined with other proteins binding in the major groove, creates a high density of protein-DNA contacts in this region. Binding of purified TBP to DNA in vitro protects ~1 turn of the double helix at the TATA box, typically extending from –37 to –25; but binding of the TFIID complex in the initiation reaction regularly protects the region from –45 to –10, and also extends farther upstream beyond the startpoint. TBP is the only general transcription factor that makes sequence-specific contacts with DNA.
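The footprint boundaries quoted above can be turned into lengths with a little arithmetic. The sketch below is illustrative only: the function name is hypothetical, and the helical repeat of 10.5 bp per turn is an assumed value for B-form DNA that is not stated in the text.

```python
BP_PER_TURN = 10.5  # assumed B-DNA helical repeat; not taken from the text


def footprint_length(start, end):
    """Inclusive length in bp between two promoter coordinates (no position 0)."""
    if start == 0 or end == 0:
        raise ValueError("promoter coordinates skip 0")
    if start > end:
        raise ValueError("start must not lie downstream of end")
    length = end - start + 1
    # An interval that crosses the startpoint would otherwise count a
    # nonexistent position 0.
    if start < 0 < end:
        length -= 1
    return length


tbp = footprint_length(-37, -25)    # 13 bp
tfiid = footprint_length(-45, -10)  # 36 bp
print(tbp, round(tbp / BP_PER_TURN, 1))      # 13 1.2
print(tfiid, round(tfiid / BP_PER_TURN, 1))  # 36 3.4
```

Under that assumption, purified TBP covers a little over one turn of the helix, while the TFIID footprint from –45 to –10 covers between three and four turns.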

Within TFIID as a free protein complex, the factor TAFII230 binds to TBP, where it occupies the concave DNA-binding surface. In fact, the structure of the binding site, which lies in the N-terminal domain of TAFII230, mimics the surface of the minor groove in DNA. This molecular mimicry allows TAFII230 to control the ability of TBP to bind to DNA; the N-terminal domain of TAFII230 must be displaced from the DNA-binding surface of TBP in order for TFIID to bind to DNA (654).

Some TAFs resemble histones; in particular TAFII42 and TAFII62 appear to be (distant) homologs of histones H3 and H4, and they form a heterodimer using the same motif (the histone fold) that histones use for the interaction. Together with other TAFs, they may form the basis for a structure resembling a histone octamer which is involved in the nonsequence-specific interactions of TFIID with DNA. Histone folds are also used in pairwise interactions between other TAFIIs.

Research

647:

Nikolov, D. B. et al. (1992). Crystal structure of TFIID TATA-box binding protein. Nature 360, 40-46.

648:

Kim, Y. et al. (1993). Crystal structure of a yeast TBP/TATA box complex. Nature 365, 512-520.

649:

Kim, J. L., Nikolov, D. B., and Burley, S. K. (1993). Cocrystal structure of TBP recognizing the minor groove of a TATA element. Nature 365, 520-527.

654:

Liu, D. et al. (1998). Solution structure of a TBP-TAFII230 complex: protein mimicry of the minor groove surface of the TATA box unwound by TBP. Cell 94, 573-583.


Figure 20.16
A view in cross-section shows that TBP surrounds DNA from the side of the narrow groove. TBP consists of two related (40% identical) conserved domains, which are shown in light and dark blue. The N-terminal region varies extensively and is shown in green. The two strands of the DNA double helix are in light and dark grey. Photograph kindly provided by Stephen Burley.


Figure 20.17
The cocrystal structure of TBP with DNA from -40 to the startpoint shows a bend at the TATA box that widens the narrow groove where TBP binds. Photograph provided by Stephen Burley.

20.9 The basal apparatus assembles at the promoter

Key Concepts

· Binding of TFIID to the TATA box is the first step in initiation.

· Other transcription factors bind to the complex in a defined order, extending the length of the protected region on DNA.

· When RNA polymerase II binds to the complex, it initiates transcription.

Initiation requires the transcription factors to act in a defined order to build a complex that is joined by RNA polymerase. The series of events can be followed by the increasing size of the protein complex associated with DNA. Footprinting of the DNA regions protected by each complex suggests the model summarized in Figure 20.18. As each TFII factor joins the complex, an increasing length of DNA is covered. RNA polymerase is incorporated at a late stage (644; for review see 223, 226).

Commitment to a promoter is initiated when TFIID binds the TATA box. When TFIIA joins the complex, TFIID becomes able to protect a region extending farther upstream. TFIIA may activate TBP by relieving the repression that is caused by the TAFII230.

Addition of TFIIB gives some partial protection of the region of the template strand in the vicinity of the startpoint, from –10 to +10. This suggests that TFIIB is bound downstream of the TATA box, perhaps loosely associated with DNA and asymmetrically oriented with regard to the two DNA strands. The crystal structure shown in Figure 20.19 confirms this model. TFIIB binds adjacent to TBP, extending contacts along one face of DNA. It may provide the surface that is in turn recognized by RNA polymerase. (In archaea, the homologue of TFIIB actually makes sequence-specific contacts with the promoter. (652))

The factor TFIIF consists of two subunits. The larger subunit (RAP74) has an ATP-dependent DNA helicase activity that could be involved in melting the DNA at initiation. The smaller subunit (RAP30) has some homology to the regions of bacterial sigma factor that contact the core polymerase; it binds tightly to RNA polymerase II. TFIIF may bring RNA polymerase II to the assembling transcription complex and provide the means by which it binds. The complex of TBP and TAFs may interact with the CTD tail of RNA polymerase, and interaction with TFIIB may also be important when TFIIF/polymerase joins the complex.

Polymerase binding extends the sites that are protected downstream to +15 on the template strand and +20 on the nontemplate strand. The enzyme extends the full length of the complex, since additional protection is seen at the upstream boundary.
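The ordered assembly described in the preceding paragraphs can be sketched as a list in which a component may join only when everything before it is already bound. This is a deliberately simplified model: the strictly linear order, the treatment of TFIIF and polymerase as a single arriving unit, and the function names are assumptions of the sketch, not statements about the actual mechanism.

```python
# Order of association as described in the text; a strictly linear order
# is a simplifying assumption of this sketch.
ASSEMBLY_ORDER = ["TFIID", "TFIIA", "TFIIB", "TFIIF/RNA polymerase II"]


def can_join(factor, bound):
    """True if every earlier component in ASSEMBLY_ORDER is already bound."""
    if factor not in ASSEMBLY_ORDER or factor in bound:
        return False
    required = ASSEMBLY_ORDER[: ASSEMBLY_ORDER.index(factor)]
    return all(f in bound for f in required)


def assemble(arrivals):
    """Add factors in arrival order, skipping any that arrive too early."""
    bound = []
    for factor in arrivals:
        if can_join(factor, bound):
            bound.append(factor)
    return bound


# TFIIB arriving before TFIID is ignored until its predecessors are in place.
print(assemble(["TFIIB", "TFIID", "TFIIA", "TFIIB", "TFIIF/RNA polymerase II"]))
```

The example shows the one property the model is meant to capture: commitment begins with TFIID, and nothing later in the list can join a promoter that lacks it.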

What happens at TATA-less promoters? The same general transcription factors, including TFIID, are needed. The Inr provides the positioning element; TFIID binds to it via an ability of one or more of the TAFs to recognize the Inr directly. The function of TBP at these promoters is more like that at promoters for RNA polymerase I and at internal promoters for RNA polymerase III.

Many of the general factors consist of multiple subunits, so the total number of polypeptides involved in the basal apparatus is rather large. There are probably ~20 polypeptides with a total mass of ~500 kD. Remember that RNA polymerase II itself has ~10 subunits with a mass of ~500 kD, so we see that initiation involves the assembly of an extremely large complex.

Assembly of the RNA polymerase II initiation complex provides an interesting contrast with prokaryotic transcription. Bacterial RNA polymerase is essentially a coherent aggregate with intrinsic ability to bind DNA; the sigma factor, needed for initiation but not for elongation, becomes part of the enzyme before DNA is bound, although it is later released. But RNA polymerase II can bind to the promoter only after separate transcription factors have bound. The factors play a role analogous to that of bacterial sigma factor—to allow the basic polymerase to recognize DNA specifically at promoter sequences—but have evolved more independence. Indeed, the factors are primarily responsible for the specificity of promoter recognition. The process of assembling the transcription complex reminds us of ribosome subunit assembly, in which ribosomal proteins must bind to rRNA (or to other proteins in the complex) in a certain order. Only some of the factors participate in protein-DNA contacts (and only TBP makes sequence-specific contacts); thus protein-protein interactions are important in the assembly of the complex.

The sequences in the vicinity of the startpoint comprise a "core" promoter at which the basal transcription apparatus is assembled. When a TATA box is present, it determines the location of the startpoint. Its deletion causes the site of initiation to become erratic, although any overall reduction in transcription is relatively small. Indeed, some TATA-less promoters lack unique startpoints; initiation occurs instead at any one of a cluster of startpoints. The TATA box aligns the RNA polymerase (via the interaction with TFIID and other factors) so that it initiates at the proper site. This explains why its location is fixed with respect to the startpoint. Binding of TBP to TATA is the predominant feature in recognition of the promoter, but two large TAFs (TAFII250 and TAFII150) also contact DNA in the vicinity of the startpoint and influence the efficiency of the reaction.

Although assembly can take place just at the core promoter in vitro, this reaction is not sufficient for transcription in vivo, where interactions with activators that recognize the more upstream elements are required. The activators interact with the basal apparatus at various stages during its assembly (see 20.20 Activators interact with the basal apparatus).

Reviews

223:

Zawel, L. and Reinberg, D. (1993). Initiation of transcription by RNA polymerase II: a multi-step process. Prog. Nucleic Acid Res. Mol. Biol. 44, 67-108.

226:

Nikolov, D. B. and Burley, S. K. (1997). RNA polymerase II transcription initiation: a structural view. Proc. Nat. Acad. Sci. USA 94, 15-22.

Research

644:

Buratowski, S., Hahn, S., Guarente, L., and Sharp, P. A. (1989). Five intermediate complexes in transcription initiation by RNA polymerase II. Cell 56, 549-561.

652:

Nikolov, D. B. et al. (1995). Crystal structure of a TFIIB-TBP-TATA-element ternary complex. Nature 377, 119-128.


Figure 20.18
An initiation complex assembles at promoters for RNA polymerase II by an ordered sequence of association with transcription factors.


Figure 20.19
Two views of the ternary complex of TFIIB-TBP-DNA show that TFIIB binds along the bent face of DNA. The two strands of DNA are green and yellow, TBP is blue, and TFIIB is red and purple. Photograph kindly provided by Stephen Burley.

20.10 Initiation is followed by promoter clearance

Key Concepts

· TFIIE and TFIIH are required to melt DNA to allow polymerase movement.

· Phosphorylation of the CTD may be required for elongation to begin.

· Further phosphorylation of the CTD is required at some promoters to end abortive initiation.

· The CTD may coordinate processing of RNA with transcription.

Most of the transcription factors are required solely to bind RNA polymerase to the promoter, but some act at a later stage. Binding of TFIIE causes the boundary of the region protected downstream to be extended by another turn of the double helix, to +30. Two further factors, TFIIH and TFIIJ, join the complex after TFIIE. They do not change the pattern of binding to DNA. TFIIH has several activities, including an ATPase, a helicase, and a kinase activity that can phosphorylate the CTD tail of RNA polymerase II; it is also involved in repair of damage to DNA (see 20.11 A connection between transcription and repair) (650).

The initiation reaction, as defined by formation of the first phosphodiester bond, occurs once RNA polymerase has bound. Figure 20.20 proposes a model in which phosphorylation of the tail is needed to release RNA polymerase II from the transcription factors so that it can make the transition to the elongating form. Most of the transcription factors are released from the promoter at this stage.

On a linear template, ATP hydrolysis, TFIIE, and the helicase activity of TFIIH (provided by the XPB subunit) are required for polymerase movement. This requirement is bypassed with a supercoiled template. This suggests that TFIIE and TFIIH are required to melt DNA to allow polymerase movement to begin (946). TFIIH is an exceptional factor that may play a role also in elongation.

RNA polymerase II stutters at some genes when it starts transcription. (The result is not dissimilar to the abortive initiation of bacterial RNA polymerase discussed in 9.10 Sigma factor controls binding to DNA, although the mechanism is different.) At many genes, RNA polymerase II terminates after a short distance. The short RNA product is degraded rapidly. To extend elongation into the gene, a kinase called P-TEFb is required (for review see 948). This kinase is a member of the cdk family that controls the cell cycle (see 27 Cell cycle and growth regulation). P-TEFb acts on the CTD, to phosphorylate it further. We do not yet understand why this effect is required at some promoters but not others or how it is regulated.
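The two rounds of CTD phosphorylation discussed in this section can be summarized as a small transition table. This is a sketch of the proposed model only, and the text itself is tentative about it. The state labels and the function name are hypothetical, and the rule that P-TEFb acts only on a CTD already phosphorylated by TFIIH is an assumption drawn from the phrase "phosphorylate it further".

```python
# Transitions as proposed in the text; each key is (current state, kinase).
CTD_TRANSITIONS = {
    ("unphosphorylated", "TFIIH"): "phosphorylated",
    ("phosphorylated", "P-TEFb"): "further phosphorylated",
}


def apply_kinases(kinases, state="unphosphorylated"):
    """Follow the CTD state through a series of kinase actions.

    A kinase with no transition defined for the current state leaves the
    state unchanged (an assumption of this sketch).
    """
    for kinase in kinases:
        state = CTD_TRANSITIONS.get((state, kinase), state)
    return state


print(apply_kinases(["TFIIH"]))            # phosphorylated
print(apply_kinases(["TFIIH", "P-TEFb"]))  # further phosphorylated
```

In the model, the first state change corresponds to release from the transcription factors at the promoter, and the second to the end of abortive initiation at those promoters that require P-TEFb.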

The CTD may coordinate processing of RNA with transcription. The capping enzyme (guanylyl transferase), which adds the G residue to the 5′ end of newly synthesized mRNA, binds to the phosphorylated CTD: this may be important in enabling it to modify the 5′ end as soon as it is synthesized. Some splicing factors bind to the CTD and so do some components of the cleavage/polyadenylation apparatus, suggesting that it may be a general focus for connecting other processes with transcription.

The general process of initiation is similar to that catalyzed by bacterial RNA polymerase. Binding of RNA polymerase generates a closed complex, which is converted at a later stage to an open complex in which the DNA strands have been separated. In the bacterial reaction, formation of the open complex completes the necessary structural change to DNA; a difference in the eukaryotic reaction is that further unwinding of the template is needed after this stage.

This section updated 4-30-2000


Reviews

948:

Price, D. H. (2000). P-TEFb, a cyclin dependent kinase controlling elongation by RNA polymerase II. Mol. Cell Biol. 20, 2629-2634.

Research

650:

Goodrich, J. A. and Tjian, R. (1994). Transcription factors IIE and IIH and ATP hydrolysis direct promoter clearance by RNA polymerase II. Cell 77, 145-156.

946:

Holstege, F. C., van der Vliet, P. C., and Timmers, H. T. (1996). Opening of an RNA polymerase II promoter occurs in two distinct steps and requires the basal transcription factors IIE and IIH. EMBO J. 15, 1666-1677.


Figure 20.20
Phosphorylation of the CTD by the kinase activity of TFIIH may be needed to release RNA polymerase to start transcription.

20.11 A connection between transcription and repair

Key Concepts

· Transcribed genes are preferentially repaired when DNA damage occurs.

· TFIIH provides the link to a complex of repair enzymes.

· Mutations in the XPD component of TFIIH cause three types of human diseases.

In both bacteria and eukaryotes, there is a direct link from RNA polymerase to the activation of repair. The basic phenomenon was first observed because transcribed genes are preferentially repaired. Then it was discovered that it is only the template strand of DNA that is the target—the nontemplate strand is repaired at the same rate as bulk DNA.

In bacteria, the repair activity is provided by the uvr excision-repair system (see 14 Recombination and repair). Preferential repair is abolished by mutations in the gene mfd, whose product provides the link from RNA polymerase to the Uvr enzymes (for review see 224).

Figure 20.21 shows a model for the link between transcription and repair. When RNA polymerase encounters DNA damage in the template strand, it stalls because it cannot use the damaged sequences as a template to direct complementary base pairing. This explains the specificity of the effect for the template strand (damage in the nontemplate strand does not impede progress of the RNA polymerase).

The Mfd protein has two roles. First, it displaces the ternary complex of RNA polymerase from DNA. Second, it causes the UvrABC enzyme to bind to the damaged DNA. This leads to repair of DNA by the excision-repair mechanism (see Figure 14.28). After the DNA has been repaired, the next RNA polymerase to traverse the gene is able to produce a normal transcript (661).
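The strand specificity of the bacterial model can be written as a short decision rule: only a lesion in the template strand of a gene that is being transcribed stalls the polymerase and so triggers coupling. The function name and the return labels below are hypothetical, and the sketch ignores everything about repair except this one distinction.

```python
def repair_route(damaged_strand, transcribed):
    """Classify how a lesion is repaired under the model in the text.

    damaged_strand: "template" or "nontemplate".
    transcribed: whether RNA polymerase is traversing the gene.
    """
    if damaged_strand not in ("template", "nontemplate"):
        raise ValueError("damaged_strand must be 'template' or 'nontemplate'")
    if transcribed and damaged_strand == "template":
        # Polymerase stalls; Mfd displaces it and brings UvrABC to the lesion.
        return "preferential (Mfd-coupled excision repair)"
    # Polymerase is not impeded, so there is no coupling signal.
    return "bulk rate"


print(repair_route("template", True))
print(repair_route("nontemplate", True))
```

The second call returns the bulk rate even though the gene is transcribed, which matches the observation that the nontemplate strand gains no advantage from transcription.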

A similar mechanism, although relying on different components, is used in eukaryotes. The template strand of a transcribed gene is preferentially repaired following UV-induced damage. The general transcription factor TFIIH is involved. TFIIH is found in alternative forms, which consist of a core associated with other subunits.

TFIIH has a common function in both initiating transcription and repairing damage. Its helicase activity melts DNA, both to create the initial transcription bubble and to open the duplex at a damaged site. Its other functions differ between transcription and repair, as provided by the appropriate form of the complex.

Figure 20.22 shows that the basic factor involved in transcription consists of a core (of 5 subunits) associated with other subunits that have a kinase activity; this complex also includes a repair subunit. The kinase catalytic subunit that phosphorylates the CTD of RNA polymerase belongs to a group of kinases that are involved in cell cycle control (see 27 Cell cycle and growth regulation). It is possible that this connection influences transcription in response to the stage of the cell cycle.

The alternative complex consists of the core associated with a large group of proteins that are coded by repair genes. These include a subunit (XPC) that recognizes damaged DNA, which provides the coupling function that enables a template strand to be preferentially repaired when RNA polymerase becomes stalled at damaged DNA. Other proteins associated with the complex include endonucleases (XPG, XPF, ERCC1). Homologous proteins are found in the complexes in yeast (where they are often identified by rad mutations that are defective in repair) and in man (where they are identified by mutations that cause diseases resulting from deficiencies in repairing damaged DNA) (662, 663). Subunits with the name XP are coded by genes in which mutations cause the disease xeroderma pigmentosum (see 14.22 Eukaryotic repair systems). The basic model for repair is animated in Figure 14.37.

The kinase complex and the repair complex can associate and dissociate reversibly from the core TFIIH. This suggests a model in which the first form of TFIIH is required for initiation, but may be replaced by the other form (perhaps in response to encountering DNA damage). TFIIH dissociates from RNA polymerase at an early stage of elongation (after transcription of ~50 bp); its reassociation at a site of damaged DNA may require additional coupling components.

The repair function may require modification or degradation of RNA polymerase. The large subunit of RNA polymerase is degraded when the enzyme stalls at sites of UV damage. We do not yet understand the connection between the transcription/repair apparatus as such and the degradation of RNA polymerase. It is possible that removal of the polymerase is necessary once it has become stalled (664).

This degradation of RNA polymerase is deficient in cells from patients with Cockayne’s syndrome (a repair disorder). Cockayne’s syndrome is caused by mutations in either of two genes (CSA and CSB), both of whose products appear to be part of or bound to TFIIH. Cockayne’s syndrome is also occasionally caused by mutations in XPD.

Another disease that can be caused by mutations in XPD is trichothiodystrophy, which has little in common with XP or Cockayne’s (it involves mental retardation and is marked by changes in the structure of hair). All of this marks XPD as a pleiotropic protein, in which different mutations can affect different functions. In fact, XPD is required for the stability of the TFIIH complex during transcription, but the helicase activity as such is not needed. Mutations that prevent XPD from stabilizing the complex cause the severe disease of trichothiodystrophy. The helicase activity is required for the repair function. Mutations that affect the helicase activity cause the repair deficiency that results in XP or Cockayne’s syndrome (for review see 1641).

This section updated 4-30-2001


Reviews

224:

Selby, C. P. and Sancar, A. (1994). Mechanisms of transcription-repair coupling and mutation frequency decline. Microbiol. Rev. 58, 317-329.

Research

661:

Selby, C. P. and Sancar, A. (1993). Molecular mechanism of transcription-repair coupling. Science 260, 53-58.

662:

Schaeffer, L. et al. (1993). DNA repair helicase: a component of BTF2 (TFIIH) basic transcription factor. Science 260, 58-63.

663:

Svejstrup, J. Q. et al. (1995). Different forms of TFIIH for transcription and DNA repair: holo-TFIIH and a nucleotide excision repairosome. Cell 80, 21-28.

664:

Bregman, D. et al. (1996). UV-induced ubiquitination of RNA polymerase II: a novel modification deficient in Cockayne syndrome cells. Proc. Nat. Acad. Sci. USA 93, 11586-11590.

1641:

Lehmann, A. R. (2001). The xeroderma pigmentosum group D (XPD) gene: one gene, two functions, three diseases. Genes Dev. 15, 15-23.


Figure 20.21
Mfd recognizes a stalled RNA polymerase and directs DNA repair to the damaged template strand.


Figure 20.22
The TFIIH core may associate with a kinase at initiation and associate with a repair complex when damaged DNA is encountered.

 

20.12 Short sequence elements bind activators

Key terms

CAAT box is part of a conserved sequence located upstream of the startpoints of eukaryotic transcription units; it is recognized by a large group of transcription factors.

Key Concepts

· Short conserved sequence elements are dispersed in the region preceding the startpoint.

· The upstream elements increase the frequency of initiation.

· The factors that bind to them to stimulate transcription are called activators.

A promoter for RNA polymerase II consists of two types of region. The startpoint itself is identified by the Inr and/or by the TATA box close by. In conjunction with the general transcription factors, RNA polymerase II forms an initiation complex surrounding the startpoint, as we have just described. The efficiency and specificity with which a promoter is recognized, however, depend upon short sequences, farther upstream, which are recognized by a different group of factors, usually called activators. Usually the target sequences are ~100 bp upstream of the startpoint, but sometimes they are more distant. Binding of activators at these sites may influence the formation of the initiation complex at (probably) any one of several stages.

An analysis of a typical promoter is summarized in Figure 20.23. Individual base substitutions were introduced at almost every position in the 100 bp upstream of the β-globin startpoint. The striking result is that most mutations do not affect the ability of the promoter to initiate transcription. Down mutations occur in three locations, corresponding to three short discrete elements. The two upstream elements have a greater effect on the level of transcription than the element closest to the startpoint. Up mutations occur in only one of the elements. We conclude that the three short sequences centered at –30, –75, and –90 constitute the promoter. Each of them corresponds to the consensus sequence for a common type of promoter element.

The TATA box (centered at –30) is the least effective component of the promoter as measured by the reduction in transcription that is caused by mutations. But although initiation is not prevented when a TATA box is mutated, the startpoint varies from its usual precise location. This confirms the role of the TATA box as a crucial positioning component of the core promoter.

The basal elements and the elements upstream of them have different types of functions. The basal elements (the TATA box and Inr) primarily determine the location of the startpoint, but can sponsor initiation only at a rather low level. They identify the location at which the general transcription factors assemble to form the basal complex. The sequence elements farther upstream influence the frequency of initiation, most likely by acting directly on the general transcription factors to enhance the efficiency of assembly into an initiation complex (see later).

The sequence at –75 is the CAAT box. Named for its consensus sequence, it was one of the first common elements to be described. It is often located close to –80, but it can function at distances that vary considerably from the startpoint. It functions in either orientation. Susceptibility to mutations suggests that the CAAT box plays a strong role in determining the efficiency of the promoter, but does not influence its specificity.

The GC box at –90 contains the sequence GGGCGG. Often multiple copies are present in the promoter, and they occur in either orientation. It too is a relatively common promoter component.
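The statement that an element can occur in either orientation can be made concrete with a short scan of both strands. The following is a minimal sketch, not a method taken from the text: the function names and the toy sequence are hypothetical, and only the GC box core GGGCGG comes from the paragraph above. An inverted copy of an element appears on the top strand as the reverse complement of its consensus.

```python
# Minimal sketch: locate a short promoter element in either orientation.
# Function names and the toy sequence are illustrative assumptions.

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A/C/G/T only)."""
    return seq.translate(COMPLEMENT)[::-1]

def find_element(seq, motif):
    """Return sorted (0-based offset, orientation) pairs for every exact
    match of motif, or of its reverse complement, in seq."""
    seq = seq.upper()
    patterns = {motif: "forward"}
    inverted = reverse_complement(motif)
    if inverted != motif:  # avoid double-counting a palindromic motif
        patterns[inverted] = "inverted"
    hits = []
    for pattern, orientation in patterns.items():
        start = seq.find(pattern)
        while start != -1:
            hits.append((start, orientation))
            start = seq.find(pattern, start + 1)  # allow overlapping matches
    return sorted(hits)

GC_BOX = "GGGCGG"  # core sequence quoted in the text

# Hypothetical toy sequence with one forward and one inverted GC box.
toy = "ATGGGCGGTTACCGCCCAT"
print(find_element(toy, GC_BOX))
```

Exact string matching is used here only for simplicity; real elements tolerate some deviation from the consensus, so a practical search would normally score partial matches instead.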


Figure 20.23
Saturation mutagenesis of the upstream region of the β-globin promoter identifies three short regions (centered at –30, –75, and –90) that are needed to initiate transcription. These correspond to the TATA, CAAT, and GC boxes.

 

20.13 Promoter construction is flexible but context can be important

Key Concepts

· No individual upstream element is essential for promoter function, although one or more elements must be present for efficient initiation.

· Some elements are recognized by multiple factors, and the factor that is used at any particular promoter may be determined by the context of the other factors that are bound.

Promoters are organized on a principle of "mix and match." A variety of elements can contribute to promoter function, but none is essential for all promoters. Some examples are summarized in Figure 20.24. Four types of element are found altogether in these promoters: TATA, GC boxes, CAAT boxes, and the octamer (an 8 bp element). The elements found in any individual promoter differ in number, location, and orientation. No element is common to all of the promoters. Although the promoter conveys directional information (transcription proceeds only in the downstream direction), the GC and CAAT boxes seem to be able to function in either orientation. This implies that the elements function solely as DNA-binding sites to bring transcription factors into the vicinity of the startpoint; the structure of a factor must be flexible enough to allow it to make protein-protein contacts with the basal apparatus irrespective of the way in which its DNA-binding domain is oriented and its exact distance from the startpoint.

Activators that are more or less ubiquitous are assumed to be available to any promoter that has a copy of the element that they recognize. Common elements that they recognize include the CAAT box, GC box, and the octamer. All promoters probably require one or more of these elements in order to function efficiently. An activator typically has a consensus sequence of <10 bp, but actually covers a length of ~20 bp of DNA. Given the sizes of the activators, and the length of DNA each covers, we expect that the various proteins will together cover the entire region upstream of the startpoint in which the elements reside.
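The coverage argument is simple interval arithmetic, which can be sketched as follows. This is an illustration under stated assumptions rather than a measurement: the ~20 bp footprint comes from the paragraph above, and the centers –75 and –90 are borrowed from the globin promoter example (the CAAT and GC boxes). Treating each footprint as symmetric about the element's center is an assumption, and the function names are hypothetical.

```python
# Minimal sketch: how much upstream DNA is covered by bound activators,
# assuming each covers ~20 bp centered on its element (an assumption).

def footprints(centers, width=20):
    """Half-open intervals (start, end) of `width` bp centered on each element."""
    half = width // 2
    return sorted((c - half, c + half) for c in centers)

def merge(intervals):
    """Merge overlapping or touching intervals into contiguous covered blocks."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged

def covered_bp(intervals):
    """Total number of base pairs covered by at least one footprint."""
    return sum(end - start for start, end in merge(intervals))

# Element centers relative to the startpoint, from the globin example.
sites = footprints([-90, -75])
print(merge(sites), covered_bp(sites))
```

With these assumed values the two footprints run together into a single contiguous block, which is the sense in which the bound proteins are expected to cover the whole region containing the elements.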

Most usually a particular consensus sequence is recognized by a corresponding activator (or by a member of a family of factors). However, sometimes a particular promoter sequence can be recognized by one of several activators. A ubiquitous activator, Oct-1, binds to the octamer to activate the histone H2B (and presumably also other) genes. Oct-1 is the only octamer-binding factor in nonlymphoid cells. But in lymphoid cells, a different activator, Oct-2, binds to the octamer to activate the immunoglobulin κ light gene. So Oct-2 is a tissue-specific activator, while Oct-1 is ubiquitous. The exact details of recognition are not so important as the fact that a variety of activators recognize CAAT boxes.

The use of the same octamer in the ubiquitously expressed H2B gene and the lymphoid-specific immunoglobulin genes poses a paradox. Why does the ubiquitous Oct-1 fail to activate the immunoglobulin genes in nonlymphoid tissues? The context must be important: Oct-2 rather than Oct-1 may be needed to interact with other proteins that bind at the promoter. These results mean that we cannot predict whether a gene will be activated by a particular activator simply on the basis of the presence of particular elements in its promoter.

An important caveat in considering transcription in vitro is that the template exists as an accessible DNA molecule. In vivo it is organized into nucleosomes, which suggests that its recognition by RNA polymerase is subject to different constraints. This may influence the geometry of the interactions of activators with DNA, with one another, and with RNA polymerase. To investigate the formation of an active transcription complex in natural circumstances, we really need to use a template consisting of DNA assembled into chromatin rather than free DNA.


Figure 20.24
Promoters contain different combinations of TATA boxes, CAAT boxes, GC boxes, and other elements.

20.14 Enhancers contain bidirectional elements that assist initiation

Key terms

Enhancer element is a cis-acting sequence that increases the utilization of (some) eukaryotic promoters, and can function in either orientation and in any location (upstream or downstream) relative to the promoter.

Key Concepts

· An enhancer activates the nearest promoter to it, and can be any distance either upstream or downstream of the promoter.

· A UAS (upstream activator sequence) in yeast behaves like an enhancer but works only upstream of the promoter.

· Similar sequence elements are found in enhancers and promoters.

· Enhancers form complexes of activators that interact directly or indirectly with the promoter.

We have considered the promoter so far as an isolated region responsible for binding RNA polymerase. But eukaryotic promoters do not necessarily function alone. In at least some cases, the activity of a promoter is enormously increased by the presence of an enhancer, which consists of another group of elements, but located at a variable distance from those regarded as comprising part of the promoter itself (665; for review see 219).

The concept that the enhancer is distinct from the promoter reflects two characteristics. The position of the enhancer relative to the promoter need not be fixed, but can vary substantially. Figure 20.25 shows that it can be either upstream or downstream. And it can function in either orientation (that is, it can be inverted) relative to the promoter. Manipulations of DNA show that an enhancer can stimulate any promoter placed in its vicinity. In natural genomes, enhancers can be located within genes (that is, just downstream of the promoter) or tens of kilobases away in either direction.

For operational purposes, it is sometimes useful to define the promoter as a sequence or sequences of DNA that must be in a (relatively) fixed location with regard to the startpoint. By this definition, the TATA box and other upstream elements are included, but the enhancer is excluded. This is, however, a working definition rather than a rigid classification.

Elements analogous to enhancers, called upstream activator sequences (UAS), are found in yeast. They can function in either orientation, at variable distances upstream of the promoter, but cannot function when located downstream. They have a regulatory role: in several cases the UAS is bound by the regulatory protein(s) that activates the genes downstream.

Reconstruction experiments in which the enhancer sequence is removed from the DNA and then is inserted elsewhere show that normal transcription can be sustained so long as it is present anywhere on the DNA molecule. If a β-globin gene is placed on a DNA molecule that contains an enhancer, its transcription is increased in vivo more than 200-fold, even when the enhancer is several kb upstream or downstream of the startpoint, in either orientation. We have yet to discover at what distance the enhancer fails to work.


Reviews

219:

Muller, M. M., Gerster, T., and Schaffner, W. (1988). Enhancer sequences and the regulation of gene transcription. Eur. J. Biochem. 176, 485-495.

Research

665:

Banerji, J., Rusconi, S., and Schaffner, W. (1981). Expression of a β-globin gene is enhanced by remote SV40 DNA sequences. Cell 27, 299-308.


Figure 20.25
An enhancer can activate a promoter from upstream or downstream locations, and its sequence can be inverted relative to the promoter.