Single nucleotide polymorphisms (SNPs) are point mutations in which one nucleotide is substituted for another at a particular locus. SNPs are the most common type of sequence difference between alleles and may be used as genetic markers. In the April issue of Current Opinion in Plant Biology (vol. 5:94-100), Antoni Rafalski (DuPont Crop Genetics, Newark, Delaware) reviews the current state of knowledge on SNPs and their application to crop genetics. The author focuses on methodologies for SNP discovery, applications of SNPs in plant breeding and genetics, and the implications of SNP research for association studies in higher plants.
Throughout the review, the author defines several relevant terms, the first of which is linkage disequilibrium (LD). Different SNP markers may or may not segregate at random, depending on the degree of linkage between them. Non-random association of SNP alleles at different loci is referred to as linkage disequilibrium. A group of alleles at closely linked loci, referred to as a haplotype, can be analyzed for the presence of linkage disequilibrium or linkage equilibrium; SNPs are one type of genetic marker used for this kind of analysis. SNP patterns observed in haplotypes are also useful for associating a particular phenotype with a specific genotype in association studies.
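The definition of LD above can be made concrete with the standard pairwise measures D, D' and r². The following is a minimal sketch, not a method taken from the review: it assumes two biallelic loci, phased haplotype counts, and a hypothetical labelling in which "A"/"a" are the alleles at the first locus and "B"/"b" those at the second.

```python
def linkage_disequilibrium(hap_counts):
    """Return (D, D', r^2) for two biallelic loci.

    hap_counts maps the haplotypes "AB", "Ab", "aB", "ab" to observed
    counts (missing haplotypes count as zero). The labels are illustrative.
    """
    haps = ("AB", "Ab", "aB", "ab")
    total = sum(hap_counts.get(h, 0) for h in haps)
    freq = {h: hap_counts.get(h, 0) / total for h in haps}

    p_a = freq["AB"] + freq["Ab"]  # frequency of allele A at locus 1
    p_b = freq["AB"] + freq["aB"]  # frequency of allele B at locus 2

    # D: deviation of the observed AB frequency from independence
    d = freq["AB"] - p_a * p_b

    # r^2: squared correlation between alleles at the two loci
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else 0.0

    # D': D scaled by its maximum possible magnitude given allele frequencies
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    return d, d_prime, r2


# Complete LD: only two of the four possible haplotypes are observed.
print(linkage_disequilibrium({"AB": 50, "ab": 50}))
# Linkage equilibrium: all four haplotypes are equally frequent.
print(linkage_disequilibrium({"AB": 25, "Ab": 25, "aB": 25, "ab": 25}))
```

Values of r² near 1 indicate that one SNP predicts the other almost perfectly, while values near 0 correspond to linkage equilibrium.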
The author begins by describing various technologies that are applied to SNP discovery. These include re-sequencing of PCR amplification products (amplicons) and discovery of "electronic" SNPs (eSNPs) in shotgun genomic libraries or expressed sequence tag (EST) libraries. Sequencing of DNA fragments is the most direct way to identify SNPs. The method consists of selecting 400-700 bp DNA fragments from genes of interest or ESTs and sequencing them across a population of highly diverse but inbred individuals. The author points out that EST sequences have proved useful for SNP discovery in maize. Re-sequencing of a set of 502 EST-derived loci from eight elite maize inbreds, covering 400-500 bp per locus, revealed one SNP per 48 base pairs (bp) in the 3′ untranslated regions (UTRs) and one SNP per 130 bp in coding regions. Two hundred and fifteen insertion/deletion (indel) polymorphisms of at least one bp in size were also detected. In soybean, SNP frequency was found to be 1.64 SNPs per kb in coding regions and 4.85 SNPs per kb in non-coding regions. Thirty-three percent of 3′ UTRs were found to contain an SNP in this study.
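The maize figures above are given as base pairs per SNP and the soybean figures as SNPs per kb, which makes them awkward to compare at a glance. The short sketch below only converts between the two units using the numbers quoted in the text; the function names are illustrative, and the two studies differ in sampling, so the comparison is indicative rather than exact.

```python
def snps_per_kb(bp_per_snp):
    """Convert 'one SNP every N bp' into SNPs per 1,000 bp."""
    return 1000.0 / bp_per_snp


def bp_per_snp(per_kb):
    """Convert SNPs per 1,000 bp into average spacing in bp."""
    return 1000.0 / per_kb


# Rates as quoted in the text, expressed in SNPs per kb.
rates = {
    "maize 3' UTR": snps_per_kb(48),
    "maize coding": snps_per_kb(130),
    "soybean non-coding": 4.85,
    "soybean coding": 1.64,
}

for region, rate in rates.items():
    spacing = bp_per_snp(rate)
    print(f"{region}: {rate:.2f} SNPs/kb (about one SNP per {spacing:.0f} bp)")
```

On these figures, maize carries roughly 21 SNPs per kb in 3′ UTRs and about 7.7 per kb in coding regions, compared with one SNP per roughly 206 bp (non-coding) and 610 bp (coding) in soybean.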
Difficulties are encountered in determining SNP haplotypes when inbred or homozygous individuals are not available. Direct sequencing of PCR products does not allow one to determine the phase of adjacent SNP alleles in heterozygotes. Software applications, such as PolyPhred, have been developed for computational derivation of correct haplotypes.
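The phase problem can be illustrated by listing every haplotype pair that is consistent with one unphased diploid genotype. This is a toy enumeration under the assumption of biallelic SNP sites, not a description of how any particular software infers haplotypes; the function name and input format are hypothetical.

```python
from itertools import product


def possible_haplotype_pairs(genotype):
    """List unordered haplotype pairs consistent with an unphased genotype.

    genotype is a sequence of (allele1, allele2) tuples, one per SNP site,
    with no information about which allele sits on which chromosome.
    """
    pairs = set()
    # For each site, choose which of the two alleles goes on haplotype 1;
    # the other allele necessarily goes on haplotype 2.
    for choice in product((0, 1), repeat=len(genotype)):
        hap1 = "".join(site[c] for site, c in zip(genotype, choice))
        hap2 = "".join(site[1 - c] for site, c in zip(genotype, choice))
        # Sort so that (hap1, hap2) and (hap2, hap1) count as the same pair.
        pairs.add(tuple(sorted((hap1, hap2))))
    return sorted(pairs)


# Inbred (homozygous) individual: the haplotype is unambiguous.
print(possible_haplotype_pairs([("A", "A"), ("C", "C")]))
# Heterozygous at two sites: two phasings are equally compatible.
print(possible_haplotype_pairs([("A", "G"), ("C", "T")]))
```

With k heterozygous sites there are 2^(k-1) distinct phasings, which is why inbred lines make haplotype determination straightforward and heterozygotes require statistical or experimental phasing.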
The author points out that in some species, such as rice, rates of polymorphism are low in comparison to the high SNP frequency found in maize. Thus, for some species, pre-screening of amplicons may be necessary to determine whether sufficient polymorphism exists to justify further screening for SNPs. Denaturing high-performance liquid chromatography (dHPLC), single-strand conformational polymorphism (SSCP), or various chemical or enzymatic cleavage methods may be used for pre-screening.
The author notes that, when performing computational screening of genomic libraries to detect SNPs, one must ensure that the libraries have been constructed from a diverse set of individuals and that sufficient redundancy exists. Using this approach, it was possible to identify one million SNPs in humans from genomic sequencing data derived from several individuals. In Arabidopsis, Celera Genomics has conducted shotgun genomic sequencing of the Landsberg ecotype. Comparison of these data with the complete sequence of the Columbia ecotype allowed identification of 37,000 SNPs and 18,500 indels. The author also describes the use of maize EST collections for SNP detection.
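The idea behind computational ("electronic") SNP detection can be sketched as a comparison of two sequences that have already been aligned. The example below is a deliberately simplified illustration with made-up sequences; it assumes a pre-computed pairwise alignment in which "-" marks a gap, and it ignores base quality and read redundancy, both of which real pipelines must take into account.

```python
def compare_aligned(ref, alt):
    """Report SNPs and count indel events between two aligned sequences.

    ref and alt must have equal length, with '-' marking alignment gaps.
    Returns (snps, indel_events), where snps is a list of
    (position, ref_base, alt_base) tuples using 0-based alignment columns.
    """
    if len(ref) != len(alt):
        raise ValueError("sequences must be aligned to the same length")

    snps = []
    indel_events = 0
    in_gap = False
    for pos, (r, a) in enumerate(zip(ref, alt)):
        if r == "-" or a == "-":
            # A run of consecutive gap columns is counted as one indel event.
            if not in_gap:
                indel_events += 1
            in_gap = True
        else:
            in_gap = False
            if r != a:
                snps.append((pos, r, a))
    return snps, indel_events


# Hypothetical aligned fragments from two accessions.
reference = "ACGTACGT--AC"
accession = "ACCTAC-TGGAC"
print(compare_aligned(reference, accession))
```

In practice a candidate SNP would be accepted only if it is supported by several independent reads, which is why the text stresses sufficient redundancy in the libraries.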
Many assays for SNP genotyping are commercially available, but none has yet emerged as the dominant method for this application. A high-throughput allele-specific hybridization assay for SNP scoring has been developed in a commercial setting for use in marker-assisted breeding of soybean.
The author discusses the advantages and disadvantages of using SNPs and indels as genetic markers compared to other types of markers. SNPs are easier to work with than simple sequence repeats (SSR) in association studies because SSR alleles of identical size but different evolutionary origins may exist. SNP markers may lend themselves more readily to high-throughput approaches than SSR markers. However, because SNPs are biallelic and expected heterozygosity is low compared to SSRs, SNPs are most useful when several SNP loci are closely positioned and allow haplotype definition. Such groups of SNPs have been called "haplotype tags", and have been identified in a study examining flanking sequences of maize microsatellites.
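The point about heterozygosity can be illustrated with the usual formula for expected heterozygosity, H = 1 - Σ p_i², where p_i is the frequency of allele i. The sketch below uses hypothetical allele frequencies chosen only for illustration; it shows why a single biallelic SNP cannot exceed H = 0.5, whereas a multi-allelic SSR, or a haplotype built from several linked SNPs, can be more informative.

```python
def expected_heterozygosity(allele_freqs):
    """Expected heterozygosity H = 1 - sum(p_i^2).

    allele_freqs may be frequencies or raw counts; values are normalised
    so that they sum to 1 before the calculation.
    """
    total = sum(allele_freqs)
    return 1.0 - sum((f / total) ** 2 for f in allele_freqs)


# Hypothetical frequencies, for illustration only.
single_snp = expected_heterozygosity([0.5, 0.5])        # best case for one SNP
skewed_snp = expected_heterozygosity([0.9, 0.1])        # a rarer minor allele
ssr_locus = expected_heterozygosity([0.2] * 5)          # SSR with five alleles
snp_haplotype = expected_heterozygosity([0.25] * 4)     # four SNP haplotypes

print(f"single SNP (50/50):       {single_snp:.2f}")
print(f"single SNP (90/10):       {skewed_snp:.2f}")
print(f"SSR, five equal alleles:  {ssr_locus:.2f}")
print(f"four equal SNP haplotypes: {snp_haplotype:.2f}")
```

Treating a cluster of closely linked SNPs as one multi-allelic haplotype marker recovers much of the information content that a single biallelic site lacks.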
SNPs may also be used to genetically map ESTs in highly polymorphic species such as maize. This involves SNP identification in the 3′ UTR of a gene of interest by re-sequencing using parents of a mapping population, followed by using a SNP-scoring procedure to map genotypes of the progeny. The author successfully used this method in maize for scoring the progeny of a cross between the mapping lines B73 and Mo17.
Another application of SNPs is the integration of physical and genetic maps. To do this, BAC (bacterial artificial chromosome) end sequences are screened for absence of repetitive elements and used to identify SNPs polymorphic between the mapping parents. The SNPs are then mapped genetically. This approach has been used in maize, in which 20% of BAC end sequences consist of nonrepetitive sequences.
The availability of inbred lines and a high frequency of polymorphisms allow definitive identification of haplotypes. The degree of linkage disequilibrium (LD), which is determined in part by population history (such as the presence of population bottlenecks) and by recombination frequency in a particular genomic region, determines which association mapping method will be best. The author describes two types of association mapping: the candidate gene approach and the whole-genome scanning approach. The candidate gene approach is used when LD declines rapidly around a causative gene, so that a high density of markers is required to find one associated with the trait. Whole-genome scans are used when LD declines slowly around a causative gene, so that relatively few markers are required to locate it.
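The trade-off between LD decay and marker density can be made concrete with a back-of-the-envelope calculation: if LD extends over a given distance, a genome scan needs roughly one marker per such interval. The sketch below is only an order-of-magnitude illustration; the genome size and LD extents are hypothetical placeholders, not values from the review, and the calculation ignores variation in recombination rate along the genome.

```python
import math


def markers_for_genome_scan(genome_bp, ld_extent_bp):
    """Rough number of evenly spaced markers needed so that every position
    lies within one LD interval of a marker (one marker per LD interval)."""
    return math.ceil(genome_bp / ld_extent_bp)


GENOME_BP = 2_500_000_000  # hypothetical genome size, for illustration only

# Hypothetical LD extents: rapid decay versus slow decay.
for label, ld_bp in [("LD decays within ~1 kb", 1_000),
                     ("LD extends ~100 kb", 100_000)]:
    n = markers_for_genome_scan(GENOME_BP, ld_bp)
    print(f"{label}: about {n:,} markers")
```

Under these assumed numbers, rapid LD decay would call for millions of markers genome-wide, which makes focusing on candidate genes the practical choice, whereas slow decay brings a whole-genome scan within reach of tens of thousands of markers.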
LD increases as a result of inbreeding and population bottlenecks. The author points out that several crop species were subjected to bottlenecks during domestication and, not surprisingly, show extensive LD. For example, in soybean, four-fifths of the diversity was provided by seven to ten plant introductions when the crop was imported from Asia. Reflecting this, 80% of the SNPs found in the full set of 22 soybean genotypes were also found in three mapping parental lines. A similar situation exists in sugarcane.
Studies of LD decay rate in maize by different groups have reached differing conclusions. One group of investigators observed higher LD decay rates in some maize lines. Others arrived at different conclusions regarding both the overall rate of LD decay and how that rate varies between loci. The author notes that recombination frequency differs by at least 100-fold between genome regions. Because of reduced recombination rates, LD is more extensive in centromeric regions than in genic regions.
The author remarks on the application of association studies to plant populations, noting that pre-existing populations may be used for such studies. Both candidate gene approaches and whole-genome scans may be used in plants, with the most efficient approach determined by the species and population under investigation. The author suggests that outbreeding populations may be used for association studies, provided that an appropriate level of LD exists, and remarks that methods have been developed to correct for the effects of population structure in association studies.
In conclusion, the author discusses possible directions for future research using SNPs. The author states that SNPs are an inexhaustible source of polymorphic markers for use in high-resolution genetic mapping of traits, which could be extremely helpful to breeders seeking to improve crops further.