Re: The Lamb Center 3220 Old Lee Highway
Posted by: hott topix
Date: November 27, 2011 11:31AM
2.2 Information
As revealed in the history of molecular biology, the language of information is used ubiquitously by molecular biologists. Genes as linear DNA sequences of bases are said to carry “information” for the production of proteins. During protein synthesis, the information is “transcribed” from DNA to messenger RNA and then “translated” from RNA to protein. During DNA replication, and subsequent inheritance, it is often said that what is passed from one generation to the next is the “information” in the genes, namely the linear ordering of bases along complementary DNA strands. Historians of biology have tracked the entrenchment of information-talk in molecular biology, and philosophers of biology have questioned whether a definition of “information” can be provided that adequately captures its usage in the field.
According to the historian Lily Kay, “Up until around 1950 molecular biologists…described genetic mechanisms without ever using the term information” (Kay 2000, 328). “Information” replaced earlier talk of biological “specificity.” Watson and Crick's second paper of 1953 (1953b), which discussed the genetical implications of their recently discovered double-helical structure of DNA (1953a), announced: “…it therefore seems likely that the precise sequence of the bases is the code which carries the genetical information…” (Watson and Crick 1953b, 244, emphasis added).
In 1958, Francis Crick used and characterized the concept of information in the context of stating what he called the central dogma of molecular biology. Crick characterized the central dogma as follows:
This states that once ‘information’ has passed into protein it cannot get out again. In more detail, the transfer of information from nucleic acid to nucleic acid, or from nucleic acid to protein may be possible, but transfer from protein to protein, or from protein to nucleic acid is impossible. Information means here the precise determination of sequence, either of bases in the nucleic acid or of amino acid residues in the protein. (Crick 1958, 152–153, emphasis in original)
Note that, as characterized by Crick, information was not static in the way that, say, coded words on a page were static. Instead, Crick's characterization of information was dynamic; that is, it required a mechanism operating to carry out a task, i.e., “the precise determination of sequence” (Darden 2006a, 2006b). Crick also distinguished three different kinds of transfer or flow in the mechanism of protein synthesis: flow of information, flow of matter, and flow of energy.
As a molecular biologist, Crick explicitly focused his attention on flow of information, and not on flow of matter or energy. He discussed biochemical work dealing with matter and energy flow. Again we see one of the primary differences between molecular biology and biochemistry: molecular biology was concerned with genetic information and its role in protein synthesis. Crick emphasized that the nucleic acid sequences determined amino acid sequences and not vice versa. In 1958, it was still an open question how protein synthesis and nucleic acid synthesis operated. Crick's statement about the direction of information flow denied that amino acid sequence could determine the sequence of nucleic acid bases: the flow was one way, from genetic information to protein, but not back.
The central dogma did not go unchallenged. In 1970, an anonymous article in Nature, entitled “Central Dogma Reversed,” discussed the implications of the newly discovered enzyme, reverse transcriptase (Baltimore 1970; Temin and Mizutani 1970). In some viruses, whose genetic material was RNA, this enzyme was found to copy RNA into DNA, which was then inserted into the host genome. This reversal, the article claimed, challenged the “cardinal tenet of molecular biology” that “the flow or transcription of genetic information from DNA to messenger RNA and then its translation to protein is strictly one way” (Anonymous 1970, 1198). Crick (1970) responded that his statement of the central dogma had not been challenged by this finding. The principal problem to which the central dogma was addressed was the finding of “general rules for information transfer from one polymer [a long chain molecule] to another” (Crick 1970, 561). Crick pointed out that the denial of flow of information from proteins back to nucleic acid still held, even if the nucleic acid RNA could be complementarily copied to the nucleic acid DNA. A narrow scope mechanism schema, with a reverse arrow from RNA to DNA, was added to the more widely found DNA to RNA one, qualifying Watson's diagrammatic representation (discussed in Darden 1995).
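Crick's reply can be illustrated with a small sketch. The Python below is only an illustration under stated assumptions: the names `POLYMER_CLASS` and `violates_central_dogma`, and the labels in `transfers`, are introduced here and do not come from Crick's papers. It encodes the 1958 statement as a rule about classes of polymer and shows that copying RNA into DNA does not fall under what the rule forbids.

```python
# Hypothetical sketch: Crick's (1958) central dogma as a rule over polymer classes.
# All names are illustrative assumptions, not taken from the source texts.

POLYMER_CLASS = {
    "DNA": "nucleic acid",
    "RNA": "nucleic acid",
    "protein": "protein",
}


def violates_central_dogma(source: str, target: str) -> bool:
    """Return True if a transfer of sequence information starts from protein.

    Crick's statement rules out protein -> protein and protein -> nucleic acid;
    transfers that start from a nucleic acid are left open ("may be possible").
    """
    if source not in POLYMER_CLASS or target not in POLYMER_CLASS:
        raise ValueError("unknown polymer")
    return POLYMER_CLASS[source] == "protein"


transfers = {
    "replication": ("DNA", "DNA"),
    "transcription": ("DNA", "RNA"),
    "translation": ("RNA", "protein"),
    "reverse transcription": ("RNA", "DNA"),
    "protein to nucleic acid (hypothetical)": ("protein", "RNA"),
}

for name, (source, target) in transfers.items():
    verdict = "ruled out" if violates_central_dogma(source, target) else "not ruled out"
    print(f"{name}: {source} -> {target} is {verdict}")
```

On this reading, reverse transcription is a nucleic acid to nucleic acid transfer, so it conflicts only with the stronger "strictly one way" formulation quoted from the Nature article, not with Crick's own statement.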
In addition to its use in seminal papers, “information” can be found throughout textbooks in the field. Information is said to be “transferred” from DNA to RNA templates to proteins during protein synthesis (e.g., Watson 1965, 297). “Nucleic Acids Convey Genetic Information” is a chapter title in Watson et al.'s (1988, Ch. 3) textbook, Molecular Biology of the Gene. Again: “The translation of genetic information from the 4-letter alphabet of polynucleotides into the 20-letter alphabet of proteins is a complex process” (Alberts et al. 2002, 8).
It is important not to confuse the genetic code and genetic information. The genetic code refers to the relation between three bases of DNA, called a “codon,” and one amino acid. Tables available in molecular biology textbooks show the relation between 64 codons and 20 amino acids. For example, CAC codes for histidine. (For the table of the genetic code see, e.g., Watson et al. 1988, frontispiece, or in the entry on biological information.) Only a few exceptions to these coding relations have been found, in a few anomalous cases (see the list in a small table in Alberts et al. 2002, 814). In contrast, genetic information refers to the linear sequence of codons along the DNA, which (in the simplest case) is transcribed into messenger RNA, which is then translated to order the amino acids of a protein linearly. Many exceptions to the colinearity hypothesis have been found (as discussed in Section 1.2 above), such as in split and overlapping genes.
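The distinction can be made concrete with a short Python sketch. It is an illustration only: the table is the standard genetic code written in the conventional T, C, A, G ordering with one-letter amino acid symbols, while the function `translate` and the two toy sequences are assumptions invented for this example. The code is the fixed lookup table; the information is the particular order of codons in a given sequence.

```python
from collections import Counter
from itertools import product

# Standard genetic code in the conventional T, C, A, G ordering
# (one-letter amino acid symbols; "*" marks a stop codon).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# The genetic CODE: a fixed lookup from each of the 64 codons to an amino acid.
GENETIC_CODE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(coding_dna: str) -> str:
    """Read a coding-strand DNA sequence codon by codon until a stop codon."""
    protein = []
    for i in range(0, len(coding_dna) - 2, 3):
        aa = GENETIC_CODE[coding_dna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


# Genetic INFORMATION: the particular linear order of codons in a sequence.
# Both toy sequences are invented; they use the same codons in a different order.
seq_a = "ATGCACGGCTAA"
seq_b = "ATGGGCCACTAA"

codons_per_amino_acid = Counter(aa for aa in GENETIC_CODE.values() if aa != "*")
print(len(GENETIC_CODE), "codons;", len(codons_per_amino_acid), "amino acids")
print("CAC ->", GENETIC_CODE["CAC"])  # H is the one-letter symbol for histidine
print(translate(seq_a), translate(seq_b))
```

The same table serves both sequences, yet the two products differ, because what differs between them is the order of codons rather than the coding relation. The per-amino-acid counts also show that the mapping from 64 codons to 20 amino acids is many-to-one.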
The information concept (with its associated concepts of code, transcription, translation, reading frame, etc.) has indisputably played a major role in the history of molecular biology. The question for philosophers of biology is whether an analysis of the concept of information can capture this role. The usage of “information” in the mathematical theory of communication is too impoverished to capture the molecular biological usage (for critiques see Sarkar 1996b, 1996c; Sterelny and Griffiths 1999, 101–104), and the usage in cognitive neuroscience, with its talk of “representations” (e.g., Crick 1988, 154–155), may be said to be too rich. The coded sequences in the DNA are more than just a signal with some number of bits that may or may not be accurately transmitted, yet they are not said to have within them a representation of the structure of the protein. (The way in which the linear order of amino acids in a protein determines its three-dimensional structure is still an unsolved problem; however, even if these rules were known, it is doubtful that molecular biology would use the language of “representation.”) No definition of “information” as it is used in molecular biology has yet received wide support among philosophers of biology.
Stephen Downes distinguishes three positions on the relation between information and the natural world:
(1) Information is present in DNA and other nucleotide sequences. Other cellular mechanisms contain no information.
(2) Information is present in DNA, in other nucleotide sequences and other cellular mechanisms, for example cytoplasmic or extra-cellular proteins; and in many other media, for example, the embryonic environment or components of an organism's wider environment.
(3) DNA and other nucleotide sequences do not contain information, nor do any other cellular mechanisms. (Downes 2006)
These options may be read either ontologically or heuristically. A heuristic reading of (1), for instance, views the talk of information in molecular biology as useful in providing a way of talking and in guiding research (Downes 2006). And so the heuristic benefit of the information concept can be defended without making any commitment to the ontological status (Sarkar 2000). Indeed, one might argue that a vague and open-ended use of information is valuable for heuristic purposes, especially during early discovery phases in the development of a field (for a similar discussion of the gene-concept, see Section 2.3).
Philosophers' discussions of the concept of information in biology have not addressed its heuristic usage in discovery contexts but have instead focused on its ontological reading. Three different philosophical accounts of information serve as exemplars of Downes' three categories. The goal is to see examples of differing positions on the issue of (a) whether DNA carries information from the perspective of its use in molecular biological mechanisms, such as DNA replication, transcription of messenger RNA, and protein synthesis, or, more broadly, (b) whether genes carry information for producing a phenotypic trait.
Take Downes' third category first. Kenneth Waters argues that information is a useful term in rhetorical contexts, such as seeking funding for DNA sequencing by claiming that DNA carries information. However, from an ontological perspective, Waters claims that explication of DNA's causal role has no need for the concept of information. Genes, he argues, should not be viewed as “immaterial units of information” (Waters 2000, 541). As discussed in Section 2.3 below, Waters' focus is on stretches of DNA whose causal roles are as actual specific difference makers in genetic mechanisms. On the unique causal role played by DNA sequences, as opposed to, say, different enzymes for synthesizing RNA, Waters says: “DNA is a specific difference maker in the sense that different changes in the sequence of nucleotides in DNA would change the linear sequence in RNA molecules in many different and very specific ways. RNA polymerase does not have this specificity. Intervening on RNA polymerase might slow down or stop synthesis of a broad class of RNA molecules, but it is not the case that many different kinds of interventions on RNA polymerase would change the linear sequence in RNA molecules in many different and very specific ways. This shows that DNA is a causally specific potential difference maker” (Waters 2007, Section 8). Talk of information is not needed; causal role function talk is sufficient. (For more on Waters' view see his entry on molecular genetics; for others who make similar points, see Sustar 2007; Weber 2005; 2006.)
Eva Jablonka (2002) is an example of Downes' second category. She argues that information is ubiquitous. She defines information as follows: a source becomes an informational input when an interpreting receiver can react to the form of the source (and variations in this form) in a functional manner. She claims a broad applicability of this definition. The definition, she says, accommodates information stemming from environmental cues as well as from evolved signals, and calls for a comparison between information-transmission in different types of inheritance systems: the genetic, the epigenetic, the behavioral, and the cultural-symbolic. Although her goal is to find a very general definition, the focus here will be on how well her definition applies to DNA as a source of information. She stresses the importance of organization in the source, and the order of bases is certainly crucial to the information carried by DNA. She also notes that variations in the source lead to variations in the form of the response. Her example is from the very broad perspective of molecular developmental biology: “variations in DNA lead to variations in development.” One may add that in limiting discussion to DNA replication and protein synthesis, the same is true: variations in DNA base sequences (may) produce variations in the products produced (ignoring degeneracy of the genetic code in which some different codons still code for the same amino acid, a point that Jablonka does not make). She stresses, as did Crick, that what flows is information, neither matter nor energy. However, Jablonka emphasizes the evolution of the “interpreting system of the receiver.” Presumably, for the molecular biology case, this is the machinery for translating messenger RNA into the linear order of amino acids in the protein, if the protein is to be considered a receiver.
Although the evolution of the “interpreting system” (including ribosomes and transfer RNAs) was required for information in the DNA to be read, that is not typically the focus for understanding information in DNA. Nor does the protein produced during the reading of the coded sequence seem to be appropriately called the “receiver” of the information. (The term “receiver” applies much better to her case of the ape that interprets the dark sky as information about an approaching storm, which raises questions about the evolution of such ability in the ape.) On this view, as she explicitly claims, genes have no theoretically privileged informational status (Jablonka 2002, 583).
Downes' first category applies specifically to the usage of information in molecular biology: information is present in DNA and other nucleotide sequences but not in other cellular mechanisms. With a bit of a stretch, Ulrich Stegmann (2005) provides an example with his analysis of template-directed synthesis. Stegmann does explicitly allow that components other than nucleotide sequences might contain what he calls instructional information. However, his only example is a thought experiment involving enzymes linearly ordered along a membrane; nothing of the sort is known to actually exist or even seems very likely to exist. Furthermore, his analysis, with this caveat, explicates in a precise way the concept of information that has played such an important role in molecular biology, namely Crick's (1958) “the precise determination of sequence, either of bases in the nucleic acid or of amino acid residues in the protein.” Stegmann calls this the sequentialization view. Stegmann's instructional account of genetic information requires that the component carrying the information satisfy the following conditions: an advance specification of the kind and order of steps that yield a certain outcome if the steps are carried out. On his account, DNA qualifies as an instructional information carrier for replication, transcription and translation. The sequence of bases provides the order. The hydrogen bonding between specific bases and the genetic code provide the specific kinds of steps. Stegmann clearly distinguishes sequentialization from coding. DNA carries information for the sequence of bases during DNA replication and during transcription of messenger RNA, namely for the order of the nucleic acid bases. The genetic code, in contrast, provides the kind of relation between a codon (three such bases) and a specific amino acid during the translation of mRNA during protein synthesis. 
The mechanisms of replication, transcription and translation yield certain outcomes: a copy of the DNA helix, an mRNA, and (in bacteria, with no splicing and editing of mRNA) a linear order of amino acids. The requirement of advance specification explicates the idea that DNA stores information that might not be in current use and aids in distinguishing those mechanisms with a flow of information from those with none, such as the Krebs cycle. Because DNA carries information about an outcome, his instructional account qualifies as an intentional (something is about something else) view. (Stegmann (2009) argues that Sarkar's (2005) semiotic account does not adequately account for such intentionality.) Also, because DNA carries information for a specific outcome, an error can occur as the mechanism operates to produce that outcome; hence Stegmann's account allows for errors and error-correcting mechanisms (such as proofreading mechanisms that correct DNA mutations).
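A minimal Python sketch may help to picture the sequentialization idea. It is not Stegmann's formalism: the names `synthesize` and `mismatches`, the toy template, and the simplification of ignoring strand direction are all assumptions made for illustration. The pairing tables stand in for base pairing and supply the kind of each step, while the template supplies the order in advance.

```python
# Hypothetical sketch of template-directed synthesis; names are illustrative.
# Pairing rules supply the KIND of each step; the template supplies the ORDER.
# Simplification: strand direction (5' to 3') is ignored.
DNA_PAIRING = {"A": "T", "T": "A", "C": "G", "G": "C"}
RNA_PAIRING = {"A": "U", "T": "A", "C": "G", "G": "C"}


def synthesize(template: str, pairing: dict) -> str:
    """Build a new strand one base at a time, in the order the template dictates."""
    return "".join(pairing[base] for base in template)


def mismatches(template: str, product_strand: str, pairing: dict) -> list:
    """Positions where the product departs from what the template specifies."""
    return [
        i
        for i, (t, p) in enumerate(zip(template, product_strand))
        if pairing[t] != p
    ]


template = "TACGTGCCG"  # invented template strand
new_dna_strand = synthesize(template, DNA_PAIRING)  # replication
mrna = synthesize(template, RNA_PAIRING)            # transcription

# Simulate a copying error at position 4 of the new DNA strand.
faulty = new_dna_strand[:4] + "T" + new_dna_strand[5:]

print(new_dna_strand, mrna)
print("error positions:", mismatches(template, faulty, DNA_PAIRING))
```

Because the template specifies the outcome in advance, a departure from it can be identified as an error, which is the feature of the account that makes room for error-correcting mechanisms. No comparable template appears in a purely metabolic pathway.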
Stegmann's instructional account of genetic information seems to be the best so far proposed to capture Crick's (1958) usage that has played a role in molecular biology's claim that DNA sequences carry genetic information. He explicitly notes that his analysis of genetic information applies only to DNA's sequentialization role, not to the issue of whether DNA carries information for phenotypic traits, which involve numerous other causal factors in the mechanisms that produce them. (For a similar view of the role of the informational framework in foregrounding sequence properties, see Godfrey-Smith 2007.)
Philosophical work continues, first, to find an adequate characterization of “information” as it is used in molecular biology; second, to distinguish mechanisms in which information is said to be transferred (such as DNA replication and protein synthesis) from those in which it is not (such as many metabolic reactions); and third, to answer the question of whether something appropriately called “information” is to be found in molecules and mechanisms.
For more on information, see the entry on biological information.
2.3 Gene
The question of whether classical genetics could be (or already has been) reduced to molecular biology (to be taken up below) motivated philosophers to consider the connectibility of the term they shared: the gene. Investigations of reduction and scientific change raised the question of how the concept of the gene evolved over time, figuring prominently in C. Kenneth Waters' (1990, 1994, 2007, see entry on molecular genetics), Philip Kitcher's (1982, 1984) and Raphael Falk's (1986) work. Over time, however, philosophical discussions of the gene concept took on a life of their own, as philosophers raised questions independent of the reduction debate: What is a gene? And, is there anything causally distinct about DNA?
Falk (1986) explicitly asked philosophers and historians of biology, “What is a Gene?” Falk drew on Kenneth MacCorquodale and Paul E. Meehl's distinction between quantities that can be obtained by manipulating values of empirical variables without hypothesizing the existence of unobserved entities or processes (dubbed “intervening variables”) and concepts which assert the existence of entities and the occurrence of events not reducible to the observable (dubbed “hypothetical constructs”) (MacCorquodale and Meehl 1948). Employing this distinction, Falk claimed that the gene began as an intervening variable but morphed into a hypothetical construct with Morgan's chromosomal theory of inheritance and then with molecular biology, when the gene became equated with a sequence of DNA.
Discoveries such as overlapping genes, split genes, and alternative splicing (discussed in Section 1.2) made it clear that simply equating a gene with an uninterrupted stretch of DNA would no longer capture the complicated molecular-developmental details of mechanisms such as gene expression (Downes 2004). In light of the enormous complexity found in the process of moving from a stretch of DNA to a protein product, Falk's (1986) question persists: What is a gene? Two general trends have emerged in the philosophical literature to answer this question and to accommodate the molecular-developmental phenomena: first, distinguish multiple gene concepts to capture the complex structural and functional features separately, or second, rethink a unified gene concept to incorporate such complexity.
A paradigmatic example of the first line came from Lenny Moss's distinction between Gene-P and Gene-D (Moss 2001, 2002). Gene-P embraced an instrumental preformationism (providing the “P”); it was defined by its relationship to a phenotype. In contrast, Gene-D referred to a developmental resource (providing the “D”); it was defined by its molecular sequence. An example will help to distinguish the two: Cystic fibrosis is one of the most common genetic diseases affecting populations of Western European descent. The disease results from an abnormality in cellular membrane proteins that function to transport chloride between cells and the extracellular fluid (for an overview of this research, see Collins 1992). Individuals receive two copies of the cystic fibrosis transmembrane conductance regulator (CFTR) gene, one from each parent. If an individual receives two mutated copies of this gene, then they will lack the resources necessary to transport chloride, and an imbalance in extracellular chloride will result, generating the tell-tale mucus that coats victims' cells, potentially generating deadly infections. When one talked about the gene for cystic fibrosis, the Gene-P concept was being utilized; the concept referred to the ability to track the transmission of this gene from generation to generation as an instrumental predictor of cystic fibrosis, without being contingent on knowing the causal pathway between the particular sequence of DNA and the ultimate phenotypic disease. The Gene-D concept, in contrast, referred instead to just one developmental resource (i.e., the molecular sequence) involved in the complex development of the disease, which interacted with a host of other such resources (proteins, RNA, a variety of enzymes, etc.); Gene-D was indeterminate with regard to the ultimate phenotypic disease.
Moreover, in cases of other diseases where there are different disease alleles at the same locus, a Gene-D perspective would treat these alleles as individual genes, while a Gene-P perspective treats them collectively as “the gene for” the disease. (For another example of a gene-concept divider, see Keller's distinction between the gene as a structural entity and the gene as a functional entity (Keller 2000, 70–72).)
A second philosophical approach for conceptualizing the gene involved rethinking a single, unified gene concept that captured the molecular-developmental complexities. For example, Eva Neumann-Held (Neumann-Held 1999, 2001; Griffiths and Neumann-Held 1999) claimed that a “process molecular gene concept” (PMG) embraced the complicated developmental intricacies. On her unified view, the term “gene” referred to “the recurring process that leads to the temporally and spatially regulated expression of a particular polypeptide product” (Neumann-Held 1999). Returning to the case of cystic fibrosis, a PMG for an individual without the disease referred to one of a variety of transmembrane ion-channel templates along with all the epigenetic factors involved in the generation of the normal polypeptide product. And so cystic fibrosis arose when a particular stretch of the DNA sequence was missing from this process. (For another example of a gene-concept unifier, see Falk's discussion of the gene as a DNA sequence that corresponded to a single norm of reaction for various molecular products based on varying epigenetic conditions (Falk 2001).)
Philosophers and historians of biology have not yet reached a consensus in answer to Falk's (1986) question: what is the gene? This fact has elicited a range of reactions. Rheinberger (2000) agreed that the gene concept was fuzzy but welcomed the imprecision; the gene was fruitful as an object of research in flux because the concept also remained operationally in flux (see Rheinberger and Mueller-Wille's entry on gene). Likewise, Weber (2005) has described the gene as having a “floating reference” that has allowed the concept to evolve over time. Paul Griffiths, meanwhile, in a review of the volume in which Rheinberger's essay appears, deemed the gene concept “Lost” but offered a “Reward to Finder” (Griffiths 2002; Beurton, Falk, and Rheinberger 2000). In fact, Griffiths and Karola Stotz are currently leading a philosophical search party of sorts. The “Representing Genes” project includes a group of philosophers and historians of biology who are attempting to operationalize some of the various philosophical claims about the gene concept discussed above and then test those claims (Griffiths and Stotz 2004, 2006; Stotz, Griffiths, and Knight 2004). The Representing Genes project can be monitored at its website: Representing Genes: Testing Competing Philosophical Analyses of the Gene Concept in Contemporary Molecular Biology.
Relatedly, philosophers have also debated the causal distinctiveness of DNA. Consider again the case of cystic fibrosis. A stretch of DNA on chromosome 7 is involved in the process of gene expression, which generates (or fails to generate) the functional product that transports chloride. But obviously that final product results from that stretch of DNA as well as all the other developmental resources involved in gene expression, be it in the expression of the functional protein or the dysfunctional one. Thus, a number of authors have argued for a causal parity thesis, wherein all developmental resources involved in the generation of a phenotype such as cystic fibrosis are treated as being on par (Griffiths and Knight 1998; Robert 2004); Stotz (2006), in particular, has pointed to the complications of post-genomics to make this point (see Section 1.4 on post-genomics).
Waters (2007, see also his entry on molecular genetics), in reply, has argued that there is something causally distinctive about DNA. Causes are often conceived of as being difference makers, in that a variable (i.e., an entity or activity in a mechanism) can be deemed causal when a change in the value of that variable would counterfactually have led to a different outcome (see the entry on scientific explanation). According to Waters, there are a number of potential difference makers in the mechanisms involved in developing or not developing cystic fibrosis; that is, an individual with two normal copies of the CFTR gene could still display signs of cystic fibrosis if a manipulation was done to the individual's RNA polymerase (the protein responsible for transcribing DNA to RNA), thereby undermining the functional reading of the stretch of DNA. So RNA polymerase is a difference maker in the development or lack of development of cystic fibrosis, but only a potential difference maker, since variation in RNA polymerase is not commonly identified as playing a role in the development or lack of development of cystic fibrosis in natural populations. The stretch of DNA on chromosome 7, however, is an actual difference maker. That is, there are actual differences in natural human populations on this stretch of DNA, which lead to actual differences in developing or not developing cystic fibrosis; the functional stretch of DNA is 230,000 base pairs long and generates a functional protein that is 1,480 amino acids long, but the most common mutation involves a deletion of three nucleotide bases in the stretch of DNA leading to a missing amino acid at the 508th position along the amino acid chain. DNA is causally distinctive, according to Waters, because it is an actual difference maker. Advocates of the parity thesis are thus challenged to identify the other resources (in addition to DNA) that are actual difference makers.
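The figures quoted for the CFTR case can be checked with a little arithmetic, and the effect of a three-base deletion can be sketched with a toy sequence. The Python below is illustrative only: the helper names `codons` and `delete` and the toy sequence are invented, and the deletion is assumed to line up exactly with one codon, which is a simplification made here.

```python
# Figures quoted in the text.
GENE_LENGTH_BP = 230_000
PROTEIN_LENGTH_AA = 1_480
BASES_PER_CODON = 3

coding_bases = PROTEIN_LENGTH_AA * BASES_PER_CODON
coding_fraction = coding_bases / GENE_LENGTH_BP
print(f"{coding_bases} coding bases, {coding_fraction:.1%} of the stretch")


def codons(seq: str) -> list:
    """Split a sequence into complete, consecutive three-base codons."""
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]


def delete(seq: str, start: int, length: int) -> str:
    """Remove `length` bases beginning at zero-based index `start`."""
    return seq[:start] + seq[start + length:]


# Invented toy sequence; the deletion is assumed to align with one codon.
toy = "ATGGCTTTTGGAGTC"
in_frame = delete(toy, 6, 3)    # drop the whole third codon
frameshift = delete(toy, 6, 1)  # drop a single base instead

print(codons(toy))
print(codons(in_frame))    # one codon missing, the rest unchanged
print(codons(frameshift))  # every downstream codon is altered
```

On these figures only a small fraction of the 230,000 base pairs is needed to specify 1,480 amino acids. The toy comparison shows why removing three bases can leave a protein that is missing a single amino acid while the rest of the chain is read as before, whereas removing one base would shift the reading frame for everything downstream.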
From the mechanistic perspective, Waters' concept of an actual difference maker points to a segment of DNA that actually plays a role in a gene expression mechanism that makes a difference to a phenotypic outcome. That DNA segment might be a coding region, making a difference in the coding of the information for determining the amino acid sequence in a protein, or it might be a regulatory sequence, making a difference in whether a coding region is copied to mRNA or not. The key is to understand how explanations of variation in outcomes are provided by molecular biology in the form of elucidated causal mechanisms that contain actual difference makers. Tabery (2009) refers to these as “difference mechanisms”.
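The contrast between potential and actual difference makers can be sketched with a toy model. Everything in the Python below is an assumption made for illustration and is not drawn from Waters (2007): the outcome rule `develops_cf`, the variable names, and the three-person population are invented. A variable counts as a potential difference maker if some intervention on it would change an outcome, and as an actual difference maker only if variation that exists in the population goes with differing outcomes.

```python
# Hypothetical toy model; the rule, names, and population are assumptions
# made for illustration and are not drawn from Waters (2007).

def develops_cf(mutant_cftr_copies: int, polymerase_works: bool) -> bool:
    """Toy outcome rule: disease with two mutant copies or a disabled polymerase."""
    return mutant_cftr_copies == 2 or not polymerase_works


POSSIBLE_VALUES = {
    "mutant_cftr_copies": [0, 1, 2],
    "polymerase_works": [True, False],
}

# Invented population: CFTR varies between individuals, polymerase does not.
population = [
    {"mutant_cftr_copies": 0, "polymerase_works": True},
    {"mutant_cftr_copies": 1, "polymerase_works": True},
    {"mutant_cftr_copies": 2, "polymerase_works": True},
]


def is_potential_difference_maker(variable: str) -> bool:
    """True if some intervention on `variable` would change someone's outcome."""
    for person in population:
        for value in POSSIBLE_VALUES[variable]:
            if develops_cf(**{**person, variable: value}) != develops_cf(**person):
                return True
    return False


def is_actual_difference_maker(variable: str) -> bool:
    """True if existing variation in `variable` goes with differing outcomes."""
    for a in population:
        for b in population:
            others_equal = all(a[k] == b[k] for k in a if k != variable)
            differs = a[variable] != b[variable]
            if others_equal and differs and develops_cf(**a) != develops_cf(**b):
                return True
    return False


for name in POSSIBLE_VALUES:
    print(name, is_potential_difference_maker(name), is_actual_difference_maker(name))
```

In this toy population both variables are potential difference makers, but only the CFTR variable is an actual one, because the polymerase never varies. Whether real populations resemble the toy one is an empirical matter, which is where the challenge to advocates of the parity thesis lies.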