Assessment of Species-specific Codon Usage by Principal Component Analysis

Shigehiko Kanaya
Yoshihiro Kudo

Department of Electrical and Information Engineering
Faculty of Engineering Yamagata University Yonezawa,
Yamagata 992 Japan


To examine systematically how the preferential usage of synonymous codons differs among species, principal component analysis is applied to a matrix of relative frequencies of synonymous codons. The first two principal components (PC1 and PC2) account for 66% and 8% of the variance, respectively. From the projection onto these two components, the following conclusions can be drawn: (1) A preference for A or U at the third position of synonymous codons contributes negatively to PC1, whereas a preference for G or C contributes positively; vertebrates and chloroplasts each cluster in a narrow region, at positive PC1 and at the most negative PC1, respectively. (2) PC2 distinguishes prokaryotes from eukaryotes: eukaryotes prefer the dinucleotides GA, AG, CU, and CA at the second and third codon positions, whereas prokaryotes prefer CG, GC, and AA.
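
The analysis described above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that a relative frequency is a codon count divided by the total count of its synonymous family, uses only three amino-acid families, and runs on invented counts for four hypothetical species. The names `FAMILIES`, `relative_frequencies`, `pca`, and `toy_counts` are illustrative, and the PCA is done by a plain singular value decomposition of the column-centered matrix.

```python
import numpy as np

# Illustrative subset of synonymous codon families (standard genetic code).
FAMILIES = {
    "Phe": ["UUU", "UUC"],
    "Lys": ["AAA", "AAG"],
    "Gly": ["GGU", "GGC", "GGA", "GGG"],
}
CODONS = [codon for family in FAMILIES.values() for codon in family]


def relative_frequencies(counts):
    """Divide each codon count by the total count of its synonymous family."""
    freqs = []
    for family in FAMILIES.values():
        total = sum(counts[codon] for codon in family)
        freqs.extend(counts[codon] / total for codon in family)
    return np.array(freqs)


def pca(X):
    """PCA via SVD of the column-centered data matrix (rows = species)."""
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = s**2 / np.sum(s**2)  # fraction of variance per component
    scores = U * s                   # species coordinates on each PC
    return scores, Vt, explained


# Invented codon counts for hypothetical species (assumption, not real data).
toy_counts = {
    "GC-rich 1": dict(UUU=10, UUC=30, AAA=8, AAG=32, GGU=5, GGC=25, GGA=5, GGG=15),
    "GC-rich 2": dict(UUU=12, UUC=28, AAA=10, AAG=30, GGU=6, GGC=22, GGA=6, GGG=16),
    "AU-rich 1": dict(UUU=33, UUC=7, AAA=35, AAG=5, GGU=20, GGC=4, GGA=22, GGG=4),
    "AU-rich 2": dict(UUU=30, UUC=10, AAA=31, AAG=9, GGU=18, GGC=6, GGA=20, GGG=6),
}

X = np.array([relative_frequencies(c) for c in toy_counts.values()])
scores, loadings, explained = pca(X)

print("variance fractions of PC1 and PC2:", explained[:2])
for name, row in zip(toy_counts, scores):
    print(name, "PC1 =", row[0], "PC2 =", row[1])
for codon, weight in zip(CODONS, loadings[0]):
    print(codon, "PC1 loading =", weight)
```

The sign of a principal component is arbitrary in this sketch, so only the relative positions of species and the opposing signs of the A/U-ending and G/C-ending codon loadings are meaningful.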