Rice Genotype Search and Summary
Locus and SNP/InDel: Annotation Venn:
Chr    Length (bp) a         MBK V3 Loci b   SNP Num. c          InDel Num. c
       Nip        R498       Nip     R498    Nip       R498      Nip      R498
Chr1   43270923   44361539   6709    7151    1450225   1829472   247624   334698
Chr2   35937250   37764328   5407    5852    1171449   1075614   198183   209809
Chr3   36413819   39691490   5585    6013    1053768   1014419   182558   197559
Chr4   35502694   35849732   4746    4990    1509485   1197724   211999   205300
Chr5   29958434   31237231   4047    4433    1014356   887154    153959   160098
Chr6   31248787   32465040   4269    4604    1264776   1066204   190419   194435
Chr7   29697621   30277827   4001    4157    1224875   1012588   182227   181995
Chr8   28443022   29952003   3745    3890    1279867   1087538   182879   191435
Chr9   23012720   24760661   3000    3169    971971    846582    142944   147656
Chr10  23207287   25582588   3115    3353    1084044   924897    149655   154320
Chr11  29021106   31778392   3681    3950    1494008   1306939   217630   228054
Chr12  27531856   26601357   3417    3412    1332107   1028976   190727   182179
Total  373245519  390322188  51722   54974   14850931  13278107  2250804  2387538
a Oryza sativa reference genomes:
(1) Nip (Nipponbare), japonica subspecies, Os-Nipponbare-Reference-IRGSP-1.0
(2) R498 (ShuHui498), indica subspecies, Os-R498-1.0
b Both the Nip and R498 genomes were annotated with an evidence-based gene prediction method; the resulting annotation is named MBK V3 (version 3) in our MBKBase. For details, see doi:10.1101/gr.088997.108.
c A total of 4460 WGS samples were used to call SNPs and InDels, mapped to the Nip and R498 reference genomes separately. Only SNPs with AF >= 0.01 and InDels with AF >= 0.005 were counted in the statistics.
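The allele-frequency cutoffs in footnote c can be illustrated with a minimal Python sketch. The Variant record, its field names, and the way AF is computed here (alternate allele count divided by called allele count) are assumptions made for illustration; the page does not describe the actual variant-calling pipeline.

```python
from dataclasses import dataclass

# Cutoffs quoted in footnote c.
MIN_AF = {"SNP": 0.01, "InDel": 0.005}


@dataclass(frozen=True)
class Variant:
    """Hypothetical variant record; field names are illustrative only."""
    chrom: str
    pos: int
    ref: str
    alt: str
    alt_count: int    # alternate alleles observed across samples
    total_count: int  # alleles with a call across samples


def variant_type(v: Variant) -> str:
    # Single-base substitutions are SNPs; any length change is an InDel.
    return "SNP" if len(v.ref) == 1 and len(v.alt) == 1 else "InDel"


def allele_frequency(v: Variant) -> float:
    # Assumed definition: alternate allele count over called alleles.
    return v.alt_count / v.total_count if v.total_count else 0.0


def passes_af_filter(v: Variant) -> bool:
    return allele_frequency(v) >= MIN_AF[variant_type(v)]


# 4460 diploid samples give at most 8920 called alleles per site.
snp = Variant("Chr1", 1000, "A", "G", alt_count=50, total_count=8920)
indel = Variant("Chr1", 2000, "A", "AT", alt_count=50, total_count=8920)
print(passes_af_filter(snp), passes_af_filter(indel))
```

With the same allele count, the InDel is kept and the SNP is dropped, because the InDel cutoff is half the SNP cutoff.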
For the Nip reference genome, there are three sources of gene annotation: MSU Release 7, RAP-DB V1, and MBK V3.
The Venn diagram shows how the gene locus sets of the three annotation sources overlap. 89.3% of MSU loci and 89.7% of RAP loci coincide with the MBK V3 annotation.
The numbers of loci unique to MSU, RAP and MBK are 18356, 5988 and 12682, respectively.
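The set relations behind such a Venn diagram can be sketched in Python as follows. This sketch assumes that coincident loci from different sources have already been assigned a shared key (for example by coordinate overlap), and that "unique" means present in only one source; the function name and keys are hypothetical, and the toy data are not the real annotation sets.

```python
def venn_summary(msu: set, rap: set, mbk: set) -> dict:
    """Count source-specific loci and the share of MSU/RAP loci found in MBK."""
    def pct(part: set, whole: set) -> float:
        return round(100 * len(part) / len(whole), 2) if whole else 0.0

    return {
        "unique": {
            "MSU": len(msu - rap - mbk),
            "RAP": len(rap - msu - mbk),
            "MBK": len(mbk - msu - rap),
        },
        "pct_in_mbk": {
            "MSU": pct(msu & mbk, msu),
            "RAP": pct(rap & mbk, rap),
        },
    }


# Toy keys standing in for loci that were matched across sources.
msu = {"k1", "k2", "k3", "k4"}
rap = {"k2", "k3", "k5"}
mbk = {"k3", "k4", "k6", "k7"}
summary = venn_summary(msu, rap, mbk)
print(summary)
```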
Nip and R498 were each annotated with the evidence-based gene prediction method. The homologous locus sequences of the two varieties were aligned and grouped into unified loci. About 29192 homologous unified loci were identified in both genomes.
Locus Model for Genotyping: Locus Statistics:

One locus may have more than one gene model, and loci from different sources (Source ID) may map to different regions of the reference genome. To obtain a uniform standard for genotyping, overlapping loci were merged and assigned a unified locus ID (Locus ID) that maps to a unique region of the reference genome. The homologous genome sequences of different varieties were aligned and grouped into one unified locus.
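Merging overlapping loci into unified loci can be pictured as a standard interval merge. The sketch below is only an illustration under stated assumptions: coordinates are 1-based and inclusive, strand is ignored, any overlap triggers a merge, and the "UL000001" ID format and the src_* source IDs are hypothetical. It does not cover the cross-variety alignment step.

```python
from typing import NamedTuple


class Locus(NamedTuple):
    chrom: str
    start: int      # 1-based, inclusive (assumed)
    end: int        # inclusive (assumed)
    source_id: str  # ID from the original annotation source


def unify_loci(loci: list[Locus]) -> list[dict]:
    """Merge overlapping loci on the same chromosome into unified loci."""
    unified: list[dict] = []
    for locus in sorted(loci, key=lambda l: (l.chrom, l.start, l.end)):
        last = unified[-1] if unified else None
        if last and last["chrom"] == locus.chrom and locus.start <= last["end"]:
            # Overlaps the current unified locus: extend it and record the source.
            last["end"] = max(last["end"], locus.end)
            last["source_ids"].append(locus.source_id)
        else:
            unified.append({
                "chrom": locus.chrom,
                "start": locus.start,
                "end": locus.end,
                "source_ids": [locus.source_id],
            })
    for i, u in enumerate(unified, start=1):
        u["locus_id"] = f"UL{i:06d}"  # hypothetical ID format
    return unified


loci = [
    Locus("Chr1", 100, 500, "src_A"),
    Locus("Chr1", 400, 900, "src_B"),
    Locus("Chr1", 2000, 2500, "src_C"),
    Locus("Chr2", 100, 500, "src_D"),
]
unified = unify_loci(loci)
for u in unified:
    print(u["locus_id"], u["chrom"], u["start"], u["end"], u["source_ids"])
```

Each unified locus keeps the list of source IDs it absorbed, so a Source ID can still be traced back to its Locus ID.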

There are 109121 unified loci in total across the Nip and R498 genomes, with an average length of 3 kbp. 97% of loci are shorter than 10 kbp, and 28% are shorter than 1 kbp.

Locus Genotyping and Show: Genotype Statistics:

In our rice genotype database, a locus genotype is an absolute measure of the base composition of a group of WGS samples (germplasms) in a unified locus region. A genotype (GT) is either identical to the reference genomic sequence or differs from it slightly; the differences include SNPs and InDels, but not larger structural variations. An uppercase base means the variation at that position is homozygous, a lowercase base means it is heterozygous, and '-' means missing reads. Because most cultivated rice germplasms are homozygous, a genotype in our database is equivalent to the concepts of haplotype and allele.
For each potentially variant site, SNPs and InDels with allele frequency > 0.15% were retained for building genotypes. In the Ref row, the character 'N' represents multiple consecutive bases, meaning that the variation at this position in the corresponding sample is a deletion; in a genotype row, 'N' represents an insertion of bases.
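The display convention described above (uppercase for homozygous, lowercase for heterozygous, '-' for missing reads) can be sketched for SNP sites as follows. The function names and the input layout are assumptions, as is the choice to show a heterozygous site as the lowercase non-reference base; the 'N' convention for InDels is not covered here.

```python
from typing import Optional

Call = Optional[tuple[str, str]]  # two alleles of a diploid call, or None if no reads


def encode_site(ref: str, call: Call) -> str:
    """Encode one SNP site of one sample as a single display character."""
    if call is None:
        return "-"            # missing reads
    a, b = call
    if a == b:
        return a.upper()      # homozygous: uppercase base
    # Heterozygous: show the non-reference allele in lowercase (assumed convention).
    alt = b if a == ref else a
    return alt.lower()


def encode_genotype(ref_bases: str, calls: list[Call]) -> str:
    """Build the genotype string of one sample over the variant sites of a locus."""
    if len(ref_bases) != len(calls):
        raise ValueError("one call is required per variant site")
    return "".join(encode_site(r, c) for r, c in zip(ref_bases, calls))


# Three variant sites with reference bases A, C, G.
sample_calls = [("A", "A"), ("C", "T"), None]
print(encode_genotype("ACG", sample_calls))
```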

Based on the unified locus model, a population of 4460 WGS rice samples was used to call genotypes, and genotypes with a frequency greater than 0.22% (sample number >= 10) were retained for statistics. For each unified locus, its genotype number can represent the number of alleles of the corresponding gene model; the number of alleles of a gene is related to the length of the model and its degree of variation.
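The retention rule for one locus can be sketched in a few lines of Python. The function name and the input mapping (sample name to genotype string) are hypothetical, and the sketch does not decide how samples with missing data are treated, since the page does not say. The 0.22% figure follows from 10 / 4460.

```python
from collections import Counter

TOTAL_SAMPLES = 4460
MIN_SAMPLES = 10

# 10 of 4460 samples is about 0.22%, matching the quoted frequency cutoff.
min_frequency_pct = round(100 * MIN_SAMPLES / TOTAL_SAMPLES, 2)


def retained_genotypes(sample_genotypes: dict[str, str],
                       min_samples: int = MIN_SAMPLES) -> dict[str, int]:
    """Count genotypes at one locus and keep those carried by enough samples."""
    counts = Counter(sample_genotypes.values())
    return {gt: n for gt, n in counts.items() if n >= min_samples}


# Toy locus: 12 samples carry one genotype, 9 carry another.
toy = {f"s{i}": "ACG" for i in range(12)}
toy.update({f"t{i}": "ATG" for i in range(9)})
kept = retained_genotypes(toy)
print(min_frequency_pct, kept, len(kept))
```

The number of keys in the returned mapping is the genotype number of the locus, which the text treats as its allele number.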