Rice Genotype Search and Summary
Locus and SNP/InDel: Annotation Venn:
Chr     Length (bp) a           MBK V3 Loci b   SNP Num. c              InDel Num. c
        Nip         R498        Nip     R498    Nip         R498        Nip       R498
Chr1    43270923    44361539    6709    7151    1450225     1829472     247624    334698
Chr2    35937250    37764328    5407    5852    1171449     1075614     198183    209809
Chr3    36413819    39691490    5585    6013    1053768     1014419     182558    197559
Chr4    35502694    35849732    4746    4990    1509485     1197724     211999    205300
Chr5    29958434    31237231    4047    4433    1014356     887154      153959    160098
Chr6    31248787    32465040    4269    4604    1264776     1066204     190419    194435
Chr7    29697621    30277827    4001    4157    1224875     1012588     182227    181995
Chr8    28443022    29952003    3745    3890    1279867     1087538     182879    191435
Chr9    23012720    24760661    3000    3169    971971      846582      142944    147656
Chr10   23207287    25582588    3115    3353    1084044     924897      149655    154320
Chr11   29021106    31778392    3681    3950    1494008     1306939     217630    228054
Chr12   27531856    26601357    3417    3412    1332107     1028976     190727    182179
Total   373245519   390322188   51722   54974   14850931    13278107    2250804   2387538
a Oryza sativa reference genome:
(1) Nip (Nipponbare) Japonica subspecies, Os-Nipponbare-Reference-IRGSP-1.0
(2) R498 (ShuHui498) Indica subspecies, Os-R498-1.0
b Both Nip and R498 genes were annotated using the Gramene-pipeline evidence-based gene prediction method (doi:10.1101/gr.088997.108). These annotations are named MBK V3 (version 3) in MBKBase.
c The WGS samples were mapped to the Nip and R498 reference genomes to call SNPs and InDels. Only SNPs with MAF >= 0.01 and InDels with MAF >= 0.005 are summarized.
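The MAF cutoffs in footnote c can be illustrated with a minimal Python sketch. It is not the pipeline used by MBKBase. The function names, the flat list-of-alleles input, and the choice of the second most common allele as the minor allele at multi-allelic sites are all assumptions made for illustration.

```python
from collections import Counter

# Thresholds quoted in footnote c of the table.
MAF_THRESHOLDS = {"SNP": 0.01, "InDel": 0.005}


def minor_allele_frequency(alleles):
    """Return the minor allele frequency of one variant site.

    `alleles` is a list of observed alleles, one per chromosome copy.
    Missing calls (None) are ignored. For multi-allelic sites this sketch
    takes the second most common allele as the minor allele (an assumption).
    """
    counts = Counter(a for a in alleles if a is not None)
    total = sum(counts.values())
    if total == 0 or len(counts) < 2:
        return 0.0  # no calls, or a monomorphic site
    second_most_common = counts.most_common(2)[1][1]
    return second_most_common / total


def passes_filter(variant_type, alleles):
    """Apply the SNP or InDel threshold to one site."""
    return minor_allele_frequency(alleles) >= MAF_THRESHOLDS[variant_type]
```

For example, a SNP whose minor allele appears in 2 of 100 called alleles is kept, while one appearing in 1 of 1000 is dropped.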

The Nip gene annotations:

For the Nip reference genome, there are three sources of gene annotation: MSU Release 7, RAP-DB V1, and MBK V3.
The Venn diagram compares the three annotations. 89.3% of MSU loci and 89.7% of RAP loci overlap with the MBK V3 annotation.
The numbers of loci unique to MSU, RAP and MBK are 13768, 5053 and 10258, respectively.

Virtual loci and multiple alleles in Nip and R498:

Nip and R498 were each annotated using the evidence-based gene prediction method. The gene models at collinear positions between the two genomes (29192) were identified. A virtual locus is defined to hold either the multiple gene models at a collinear position, or a single gene model if it is found in only one genome with no collinear gene model in the other genomes.
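The virtual locus definition can be sketched as a simple grouping step. This is an illustrative sketch only: the gene identifiers, the `build_virtual_loci` function, and the assumption that collinear gene models come as one-to-one (Nip, R498) pairs are hypothetical, and the collinearity detection itself is not shown.

```python
def build_virtual_loci(nip_genes, r498_genes, collinear_pairs):
    """Group gene models from two genomes into virtual loci.

    collinear_pairs: iterable of (nip_gene, r498_gene) tuples, assumed
    one-to-one. Each pair becomes one locus holding both gene models;
    every unpaired gene model becomes a locus of its own.
    """
    loci = []
    paired = set()
    for nip_gene, r498_gene in collinear_pairs:
        loci.append({"Nip": [nip_gene], "R498": [r498_gene]})
        paired.add(("Nip", nip_gene))
        paired.add(("R498", r498_gene))
    for gene in nip_genes:
        if ("Nip", gene) not in paired:
            loci.append({"Nip": [gene], "R498": []})
    for gene in r498_genes:
        if ("R498", gene) not in paired:
            loci.append({"Nip": [], "R498": [gene]})
    return loci


# Hypothetical identifiers: one collinear pair plus one gene unique to each genome.
example = build_virtual_loci(
    nip_genes=["nip_g1", "nip_g2"],
    r498_genes=["r498_g1", "r498_g3"],
    collinear_pairs=[("nip_g1", "r498_g1")],
)
```

In this toy input the result holds three virtual loci: one with a gene model from each genome and two single-genome loci.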
Locus Model for Genotyping: Locus Statistics:

When a locus contains more than one gene model, different annotations of the same genome can place different boundaries on the same gene. Overlapping annotations were also integrated under the same Locus ID.
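One way to picture the integration of overlapping annotations is an interval merge. The sketch below is an assumption about how such a step could look, not the documented MBKBase procedure: it ignores strand, uses inclusive coordinates, and the record layout and annotation identifiers are hypothetical.

```python
def merge_overlapping(annotations):
    """Merge overlapping annotations on the same chromosome into loci.

    annotations: list of (chrom, start, end, annotation_id) tuples with
    inclusive coordinates. Strand is ignored in this sketch.
    Returns one dict per merged locus.
    """
    loci = []
    for chrom, start, end, annotation_id in sorted(annotations):
        last = loci[-1] if loci else None
        if last and last["chrom"] == chrom and start <= last["end"]:
            # Overlaps the current locus: extend it and record the member.
            last["end"] = max(last["end"], end)
            last["members"].append(annotation_id)
        else:
            loci.append(
                {"chrom": chrom, "start": start, "end": end,
                 "members": [annotation_id]}
            )
    return loci


# Hypothetical records from three annotation sources.
merged = merge_overlapping([
    ("Chr1", 100, 500, "msu_a"),
    ("Chr1", 450, 900, "rap_a"),
    ("Chr1", 2000, 2500, "mbk_b"),
    ("Chr2", 100, 500, "msu_c"),
])
```

Here the two overlapping Chr1 records collapse into one locus spanning 100 to 900, while the other two records stay separate.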

In total, there are 95,325 loci across the Nip and R498 genomes, with an average length of 3 kb. 97% of locus sequences are shorter than 10 kb, and 28% are shorter than 1 kb.

Locus Genotyping and Show: Genotype Statistics:

A genotype (GT) is either identical to the reference sequence (REF) or differs from it; the differences (ALT) include both SNPs and InDels. A locus GT is the group of ALTs located within the region of a gene locus. All GTs are shown in a table whose first row gives the reference genome position and whose second row gives the reference base. Genotypes are encoded as GT1 to GT5; a capital base means homozygous, a lowercase base means heterozygous, and '-' means missing. In the table, the character 'N' represents multiple consecutive bases, indicating a deletion or insertion. Most cultivated rice is homozygous, so the genotype here is equivalent to the concept of a haplotype or allele.
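The symbol conventions of the genotype table can be read programmatically. The following is a minimal sketch under stated assumptions: the `describe_call` helper and its output fields are hypothetical, and a lowercase 'n' is treated as a heterozygous multi-base call, which the page does not state explicitly.

```python
def describe_call(symbol):
    """Interpret one cell of the genotype table.

    Capital base  -> homozygous
    Lowercase     -> heterozygous
    '-'           -> missing
    'N' (or 'n')  -> multiple consecutive bases (insertion or deletion)
    """
    if symbol == "-":
        return {"state": "missing", "allele": None, "multi_base": False}
    state = "homozygous" if symbol.isupper() else "heterozygous"
    allele = symbol.upper()
    return {"state": state, "allele": allele, "multi_base": allele == "N"}


def describe_row(calls):
    """Interpret a whole genotype row, given as a string of symbols."""
    return [describe_call(symbol) for symbol in calls]
```

For instance, `describe_row("Ag-N")` yields a homozygous A, a heterozygous G, a missing call, and a homozygous multi-base variant.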

A total of 5280 WGS samples were used to call locus genotypes (alleles), and genotypes found in at least 10 samples (sample number >= 10) were summarized. For each locus, the number of genotypes can represent the number of alleles in the population.
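The sample-number cutoff can be sketched as a counting step per locus. This is an illustration, not the MBKBase implementation: the function name, the mapping from sample ID to a genotype string, and the sample names are hypothetical.

```python
from collections import Counter

MIN_SAMPLES = 10  # cutoff quoted in the text (sample number >= 10)


def summarize_locus_genotypes(sample_genotypes, min_samples=MIN_SAMPLES):
    """Count genotypes at one locus and keep the well-supported ones.

    sample_genotypes: dict mapping sample ID -> genotype string, where the
    string concatenates the calls at the variant sites of the locus.
    Returns {genotype: sample count} for genotypes seen in at least
    `min_samples` samples, most frequent first.
    """
    counts = Counter(sample_genotypes.values())
    return {gt: n for gt, n in counts.most_common() if n >= min_samples}


# Hypothetical locus: three distinct genotypes, one of them rare.
samples = {}
samples.update({f"s{i}": "AGT" for i in range(12)})
samples.update({f"t{i}": "ACT" for i in range(10)})
samples.update({f"u{i}": "AGN" for i in range(3)})
summary = summarize_locus_genotypes(samples)
```

With this toy input, two genotypes pass the cutoff, so the locus would be reported as having two alleles in the population.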