Turning genes into people is more complicated than we thought

Meta

First, I would like to welcome you to Newton’s Concussion. I’m really excited to start this blog, and I hope that everyone will join in, comment, and have a fun and interesting discussion. My posts will be relatively short and sweet and will attempt to reduce some complicated science to a level that a broad audience can read. I will try to keep the number of science-y terms to a minimum, and I will clearly define any that are needed to understand the material. My reviews will concentrate on each article’s basic elements and important findings and will rarely get into the specific details of the research. I’ll always link to the source material so that those with access can read the article if they wish (I’d post the actual article, but I’m pretty sure that violates copyright). So, without further ado, here is my first review.

Introduction

Badis et al. published an article in the June 26, 2009 issue of Science entitled “Diversity and Complexity in DNA Recognition by Transcription Factors.” This paper is particularly interesting because it sheds some light on the complexity of gene transcription, a process that has been poorly understood to date. Naturally, it turns out that our understanding of the process was vastly oversimplified.

Background

First, allow me to provide some background information for those of you who aren’t biologists or don’t remember freshman biology. DNA, as most of you know, is the basic genetic material in a cell. It contains the code for everything that makes you who and what you are. Different cell types (e.g., skin cells, brain cells, etc.) all have the same complement of DNA. What makes the cells different from each other is the set of genes that are active or dormant in that cell type. There are many ways in which a gene can be “turned off” in a cell. One way is to wrap the section of DNA that contains a specific gene into a tight little ball around proteins called histones so that it can’t be accessed, effectively turning the gene off. There are also more intermediate methods of regulating genes. Rather than being turned completely on or off, genes can be more or less active. For example, different subtypes of cells may have a similar set of active genes, but perform slightly different functions by controlling how much of the product of each gene is being made at any one time. This fine-tuning is called the regulation of gene expression, and much of that job is done by transcription factors, which are the topic of this article.

For any protein to be made within a cell, several steps have to happen. First, the DNA has to be transcribed into an RNA sequence, which then has to be exported from the nucleus to the cytoplasm of the cell (for those who don’t remember the cell structure of eukaryotes, read here). The RNA is then translated into a string of amino acids called a peptide, which is then folded into a functional protein, such as an enzyme, that performs a particular job within the cell. This is a grossly simplified explanation, but it will suffice for this discussion.
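
For the programmers in the audience, the two steps above (transcription, then translation) can be sketched in a few lines of Python. This is only a toy illustration: the DNA sequence is made up, the function names are my own, and the codon table holds just five of the 64 real entries.

```python
# Toy illustration of transcription followed by translation.
# Only five codons are listed here; the real genetic code has 64.
CODON_TABLE = {
    "AUG": "Met",  # start codon
    "GCU": "Ala",
    "UGG": "Trp",
    "AAA": "Lys",
    "UAA": None,   # stop codon
}

def transcribe(coding_strand: str) -> str:
    """Return the mRNA for a DNA coding strand (T is replaced by U)."""
    return coding_strand.upper().replace("T", "U")

def translate(mrna: str) -> list[str]:
    """Read the mRNA three letters at a time until a stop codon."""
    peptide = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid is None:  # stop codon reached
            break
        peptide.append(amino_acid)
    return peptide

# A made-up five-codon "gene", written as the coding strand.
mrna = transcribe("ATGGCTTGGAAATAA")
peptide = translate(mrna)
print(mrna)
print("-".join(peptide))
```

The sketch skips everything that makes the real process interesting (export from the nucleus, folding, and the regulation discussed next), but it shows the basic flow of information from DNA to RNA to peptide.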

In order for any cell to react to changing environments, conditions, or functions, it must be able to control how much and what kind of protein is being made. Through a very complex set of chemical signals, triggered by sensor proteins (receptors) on the cell surface or within the cell, transcription factors are either turned on (activated) or turned off (repressed). Transcription factors are generally DNA-binding proteins, many of which lie dormant in the cytoplasm of a cell. Upon activation, they are transported into the nucleus, where they seek out the sequence of DNA that they recognize and attach themselves (bind) to it. Once bound to their target, transcription factors recruit the protein complexes that transcribe DNA into RNA (see the description of transcription above) to the gene that they are bound to, and the process of transcription begins. A few minutes later (generally), a new functional protein has been created.

Now, it is important to note that genes are made up of several different kinds of sequences: promoters, exons, and introns are the most basic sections. Exons contain the information that will become the active protein. Introns do not contain coding information, but can function in a regulatory fashion (until recently, introns were considered “junk” DNA that served no purpose and was left over from ancient ancestors that no longer exist, but that notion continues to be challenged by new research). Finally, promoters are short DNA sequences that sit at the very beginning of the gene. It is these promoter sequences that transcription factors bind to.

Review

Despite the obvious importance of transcription factors, current technology has limited the number of them for which all of the bound sequences have been characterized. Most of these data have been obtained using chromatin immunoprecipitation followed by microarray analysis (ChIP-chip). In this paper, Badis et al. set out to vastly increase our understanding of transcription factor binding sites using a relatively new microarray platform that they invented called the universal protein binding microarray (PBM). This technology affixes all possible DNA sequences of a given length k (called k-mers, where k is the number of nucleotides in the sequence) to the surface of a glass slide. The position of each sequence on the slide is known. When a purified DNA-binding protein carrying a visible label is incubated with the slide, the sequences that the protein binds to light up under a scanner and can then be analyzed using sophisticated software.
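
To give a feel for what “all possible DNA sequences” of a given length means, here is a minimal Python sketch that lists every k-mer for a small k. The function name is my own, and this is not how the actual arrays were designed; it only shows how quickly the number of sequences grows (four choices at each of k positions, so 4 to the power k).

```python
from itertools import product

def all_kmers(k: int) -> list[str]:
    """Return every DNA sequence of length k, in alphabetical order."""
    return ["".join(letters) for letters in product("ACGT", repeat=k)]

kmers = all_kmers(3)
print(len(kmers))           # number of distinct 3-mers
print(kmers[:4])            # the first few sequences

# The count grows as 4**k, which is why array space fills up fast.
for k in (4, 6, 8, 10):
    print(k, 4 ** k)
```

Even at modest lengths the numbers get large, which is part of why fitting every sequence onto a single glass slide is a real engineering achievement.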

Without getting into the hardcore details of the various classes of transcription factors and their potential binding sites (you can read the paper for that), Badis et al. did discover something really interesting. Transcription factors are classified into groups based on similar known or predicted DNA binding motifs (sequences that they’ll bind to). What they found in this paper is that, while the known high-affinity sequences (sequences that the proteins bind strongly to) are shared between all of the members of a transcription factor class, the individual proteins preferentially bind to alternate, low-affinity sequences (see Fig. 2b below).

[Figure: Fig. 2b from Badis et al., Science]

The above figure is a scatter plot showing the binding affinities of the transcription factors Irf4 and Irf5. Each dot on the graph represents one DNA sequence, positioned according to how strongly Irf4 and Irf5 each bind to it. The highlighted dots correspond to the high-affinity binding sequences (red), which are common to the two transcription factors, and to two sets of lower-affinity DNA sequences, shown in blue and yellow. The strongest binding is represented by the upper-right corner of the graph, and the weakest binding by the lower-left corner. Dots that fall above the imaginary diagonal line between those corners are bound more strongly by the factor on the y-axis (Irf5) than by the factor on the x-axis (Irf4), and the opposite is true for those below that line.

As you can see, there are several very high-affinity (red) sequences that both transcription factors can bind to. However, the graph also shows that there is an even greater number of low-affinity binding sequences for each of the proteins, and that the two transcription factors don’t share any of those sequences (blue, yellow). This finding indicates that the transcription factors bind more often to the low-affinity binding sites, and suggests that members of the same class of transcription factors may not bind to the same genes at all under normal conditions. This pattern of low-affinity preference was found to be prevalent throughout the 104 transcription factors that were analyzed in this study.
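
The idea of shared versus factor-specific sequences can be sketched in Python as follows. Everything here is an assumption made for illustration: the scores are invented, “factor A” and “factor B” are placeholders rather than the real proteins, and the cutoffs are arbitrary numbers, not values from the paper.

```python
# Hypothetical binding scores (arbitrary units) for two related factors.
# Each entry is (score for factor A, score for factor B).
scores = {
    "seq1": (0.95, 0.93),  # strongly bound by both
    "seq2": (0.60, 0.20),  # moderately bound by A only
    "seq3": (0.15, 0.55),  # moderately bound by B only
    "seq4": (0.05, 0.08),  # bound by neither
}

def classify(a: float, b: float, bound: float = 0.4, margin: float = 0.2) -> str:
    """Label a sequence by which factor(s) bind it.

    `bound` is an arbitrary cutoff for counting as bound at all, and
    `margin` is how close the two scores must be to count as shared.
    """
    if a < bound and b < bound:
        return "unbound"
    if abs(a - b) <= margin:
        return "shared"
    return "prefers A" if a > b else "prefers B"

labels = {name: classify(a, b) for name, (a, b) in scores.items()}
for name, label in labels.items():
    print(name, label)
```

In this made-up data set, the one shared sequence is also the strongest binder, while the weaker sequences split cleanly between the two factors, which is the same qualitative pattern described for the figure.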

Impressions

This article caught my attention for a lot of reasons, the main one being that this analysis represents a significant improvement in our understanding of how transcription factors work and of the diversity of genes that a single transcription factor can influence. The number of targets for each factor is enormous, and the fact that the low-affinity binding sites seem to be the preferential targets is very counter-intuitive in the field of molecular biology. It is usually assumed that the intended binding partner for a molecule is the one that binds most strongly to it, not the other way around, as appears to be the case in this study.

Having a clear understanding of how transcription factors work has huge implications for things like drug development in every field that I can imagine. Many newer drugs target transcription factors in an effort to fundamentally alter the function of a cell and thereby reduce the disease or condition being treated. Many of these drugs have likely been developed with high-affinity targets in mind, which are likely not the main targets of the transcription factors being altered. And, even if a drug succeeds in changing the transcription of the genes that it is aimed at, the large number of low-affinity, preferential binding sites indicates that suppressing that transcription factor is likely to have many more secondary effects than originally anticipated, leading to a large number of drug side-effects. This is a real downside for a field that has been working to make drugs more specific, with fewer side-effects. The good news is that Martha Bulyk at Harvard University (a lead scientist on this project and a corresponding author on the paper) is placing all of these binding sequences into a database called UniPROBE for public use, and it looks like her group continues to gather information on more and more transcription factors.

This paper is a great example of people creating new technologies to help solve problems that are fundamental to our understanding of how life on earth works. I thought this was a great bit of work that will be very useful to a lot of people.
