CYANOBACTERIA: THEN AND NOW
The current data set includes nearly complete but largely unannotated sequences of six cyanobacterial genomes: Prochlorococcus marinus MED4, P. marinus MIT 9313, Synechocystis PCC 6803, Nostoc punctiforme ATCC 29133, Anabaena PCC 7120 and Synechococcus WH8102. They encompass three major groups of cyanobacteria. In two cases there is a sequence from a sister group, but no close relative of PCC 6803 has been fully sequenced. Fortunately, a partial genomic characterization of PCC 7002 is underway. The PCC 7002 16S rRNA sequence shares 91.7% sequence identity with that of PCC 6803. The advantage of having such sister groups is that, when their genomes are compared, functional regions (coding and regulatory) are readily alignable, whereas non-functional regions such as spacers typically are not. This allows reliable identification of true coding sequences and regulatory elements. Comparison of these genomes has allowed us to identify core genes and pathways that are shared by essentially all cyanobacteria but are not present in bacteria in general. We have also identified conserved genomic arrangements that are indicative of important regulatory elements. It will also be important to identify features that are unique to subgroups of the cyanobacteria in order to understand how and when geologically significant properties, such as the ability to fix nitrogen and to produce limestone, came into existence. Ultimately, as the genomic data become more complete, it will be possible to infer the likely biological properties of cyanobacteria at various points in the past and to correlate these findings with the geological record.
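
The core-gene comparison described above can be illustrated with a minimal sketch. It assumes that each genome has already been reduced to a set of gene-family identifiers, which is not a step described in this abstract. The function name core_families, the genome labels and the family names (fam_A and so on) are hypothetical placeholders, not real data or the method actually used.

```python
def core_families(cyano_genomes, outgroup_genomes):
    """Return families found in every cyanobacterial genome and in no outgroup.

    Both arguments map a genome label to a set of gene-family identifiers.
    This is a strict version: a family must occur in all cyanobacterial
    genomes, not merely in "essentially all" of them.
    """
    if not cyano_genomes:
        return set()
    # Families shared by every cyanobacterial genome.
    shared = set.intersection(*(set(f) for f in cyano_genomes.values()))
    # Families seen in at least one non-cyanobacterial genome.
    in_outgroups = set().union(*(set(f) for f in outgroup_genomes.values()))
    return shared - in_outgroups


# Toy input with placeholder labels (hypothetical, for illustration only).
cyano = {
    "genome_1": {"fam_A", "fam_B", "fam_C"},
    "genome_2": {"fam_A", "fam_B", "fam_D"},
    "genome_3": {"fam_A", "fam_B"},
}
outgroup = {
    "other_1": {"fam_B", "fam_E"},
}

core = core_families(cyano, outgroup)
print(sorted(core))
```

Relaxing the requirement from "every genome" to "essentially all" would mean counting how many genomes contain each family and applying a threshold, which this sketch does not attempt.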