
Input file formats

Contents

SNP genotype files
VCF file (Next Generation Sequencing genotypes)

SNP genotype files

Genotype files must be tabular, with the samples as columns and the SNPs as rows. They may also be zipped or gzipped.
Since there appears to be an unlimited array of different formats for genotype files, we specify here those that can be imported into AutozygosityMapper without any further manipulation.
In every file, lines starting with the number sign (#) will be ignored. In each line, the SNP ID (Affymetrix ID or, with Illumina files, dbSNP ID) must be directly followed by the genotypes. The genotypes must be written in one of the following ways:

Affymetrix (example [Chip: Mapping50K_Hind240])

SNP ID	Sample01	Sample02	Sample03	Sample04	Sample08	Sample09
SNP_A-1513509	BB	BB	AB	BB	AB	BB
SNP_A-1518411	BB	BB	BB	BB	BB	BB
SNP_A-1511066	AB	NoCall	AA	AA	AA	AA
SNP_A-1517367	AA	AB	AB	AA	AA	AB

Instead of AA/AB/BB/NoCall, the 'number format' (0, 1, 2, -1) can also be used.

The following columns will be ignored and do not have to be removed from the file:

dbsnp rs id
tsc id
chromosome
physical position
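
The rules above can be checked before uploading with a short script. The following is only a sketch under stated assumptions, not AutozygosityMapper's own importer: the function names are hypothetical, the file is assumed to be tab-separated with a header row, and the pairing 0=AA, 1=AB, 2=BB, -1=NoCall is inferred from the order in which the two notations are listed.

```python
import gzip

# Assumed correspondence between the two notations, inferred from the order
# in which they are listed above (not confirmed by the source).
NUMBER_TO_CALL = {"0": "AA", "1": "AB", "2": "BB", "-1": "NoCall"}
VALID_CALLS = set(NUMBER_TO_CALL.values())
# Annotation columns that are documented as being ignored on import.
IGNORED_COLUMNS = {"dbsnp rs id", "tsc id", "chromosome", "physical position"}


def open_text(path):
    """Open a plain or gzipped text file for reading (hypothetical helper)."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def read_affymetrix(handle):
    """Return (sample_names, {snp_id: [calls]}) from a tab-separated table."""
    samples, keep, genotypes = None, None, {}
    for line in handle:
        if line.startswith("#") or not line.strip():
            continue  # comment lines and blank lines are skipped
        fields = line.rstrip("\r\n").split("\t")
        if samples is None:
            # Header row: column 0 is the SNP ID; drop the annotation columns.
            keep = [i for i, name in enumerate(fields)
                    if i > 0 and name.strip().lower() not in IGNORED_COLUMNS]
            samples = [fields[i] for i in keep]
            continue
        # Translate number format to letter format; letters pass through.
        calls = [NUMBER_TO_CALL.get(fields[i], fields[i]) for i in keep]
        bad = [c for c in calls if c not in VALID_CALLS]
        if bad:
            raise ValueError(f"{fields[0]}: unexpected genotype(s) {bad}")
        genotypes[fields[0]] = calls
    return samples, genotypes
```

Calling read_affymetrix(open_text(path)) on a genotype file either returns the parsed table or raises a ValueError naming the first SNP with an unexpected genotype.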

Illumina

DBSNP*	Sample01	Sample02	Sample03	Sample05	Sample06
rs10000010	3	0	3	2	1
rs10000023	3	3	2	1	2
rs10000030	3	3	0	2	3
rs1000007	0	3	1	0	0
rs10000092	3	0	1	3	0
rs10000121	1	1	1	2	2

Instead of 1/2/3/0, the character format (AA, AB, BB, --) can also be used.
Additionally, real genotypes are allowed. Please note that this will drastically reduce the upload speed.
*) Because dbSNP IDs are largely specific to humans, the column 'SNP NAME' is used instead for other species.
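
To illustrate how the numeric and character notations relate, here is a minimal sketch that converts one Illumina-style data row to character format. The function name is hypothetical, and the pairing 1=AA, 2=AB, 3=BB, 0=-- is an assumption taken from the order in which the two notations are listed above. Nucleotide ('real') genotypes are not handled here.

```python
# Assumed pairing of the numeric and character notations, taken from the
# order in which the two lists are given above (1/2/3/0 and AA/AB/BB/--).
ILLUMINA_CODE_TO_CALL = {"1": "AA", "2": "AB", "3": "BB", "0": "--"}
ILLUMINA_CALL_TO_CODE = {call: code for code, call in ILLUMINA_CODE_TO_CALL.items()}


def illumina_row_to_calls(line):
    """Split one tab-separated data row into (snp_id, calls as characters)."""
    snp_id, *values = line.rstrip("\r\n").split("\t")
    calls = []
    for value in values:
        if value in ILLUMINA_CODE_TO_CALL:
            calls.append(ILLUMINA_CODE_TO_CALL[value])  # numeric code
        elif value in ILLUMINA_CALL_TO_CODE:
            calls.append(value)  # already in character format
        else:
            raise ValueError(f"{snp_id}: unexpected genotype {value!r}")
    return snp_id, calls
```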

VCF file (Next Generation Sequencing genotypes)

The VCF file must have the following columns:

#CHROM POS    ID  REF  ALT   QUAL  FILTER  INFO  FORMAT  Sample1  Sample2  Sample3 (...) 
chr1   14930  .   A    G     .     .       .     GT:DP   1/1:31   0/1:30   0/0:23

The content of the columns 'ID', 'QUAL', 'FILTER', and 'INFO' is ignored. The FORMAT column is used to determine which part of each sample's entry is the genotype and which is the coverage. Please note that the DP flag must be included in the FORMAT string (not only in INFO!), unless you set the minimum coverage value in the upload interface to 0. Without the DP flag in FORMAT it is impossible to exclude genotypes with low coverage, because the DP information in INFO aggregates the coverage over all samples!
The file must be sorted by chromosome.
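
A quick way to check the ordering before uploading is sketched below. It assumes that 'sorted by chromosome' means all records of one chromosome form a single contiguous block; whether a particular chromosome order is also required is not stated here, so the check does not test for it. The function name is hypothetical.

```python
def chromosomes_are_grouped(lines):
    """Return True if every chromosome occupies one contiguous block of rows."""
    finished, current = set(), None
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # header, meta-information and blank lines
        chrom = line.split()[0]
        if chrom != current:
            if chrom in finished:
                return False  # chromosome reappears after another one
            if current is not None:
                finished.add(current)
            current = chrom
    return True
```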

Sites at which the genotype is uncertain (two alt alleles) are skipped.
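
The handling of FORMAT, coverage and multi-allelic sites described above can be sketched as follows. This is an illustration under assumptions, not the tool's actual code: the function name is hypothetical, min_coverage=10 is an arbitrary example value, and 'two alt alleles' is interpreted as a comma-separated ALT column.

```python
def parse_vcf_line(line, min_coverage=10):
    """Return (chrom, pos, [genotype or None per sample]), or None if skipped.

    min_coverage=10 is an arbitrary illustrative default, not the tool's.
    """
    fields = line.rstrip("\r\n").split("\t")
    chrom, pos, _id, _ref, alt = fields[:5]
    if "," in alt:
        return None  # more than one ALT allele: the site is skipped
    keys = fields[8].split(":")  # FORMAT column, e.g. GT:DP
    gt_index = keys.index("GT")
    dp_index = keys.index("DP") if "DP" in keys else None
    if dp_index is None and min_coverage > 0:
        raise ValueError("DP missing from FORMAT; cannot filter by coverage")
    genotypes = []
    for sample in fields[9:]:
        parts = sample.split(":")
        gt = parts[gt_index]
        if dp_index is not None and min_coverage > 0:
            depth = parts[dp_index] if dp_index < len(parts) else "."
            if not depth.isdigit() or int(depth) < min_coverage:
                gt = None  # coverage too low or unknown
        genotypes.append(gt)
    return chrom, int(pos), genotypes
```

With min_coverage=0 the DP entry is not needed, which mirrors the exception for the upload interface described above.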

Here is a sample file.
(Cases: Sample1, Sample2; controls Sample3, Sample4 - should yield a hit on chr6.)

You can generate such a file from your aligned NGS data with SAMtools like this:

# reference genome: /path/to/genome.fa
# output file: filename.vcf
# note: this is the command syntax of SAMtools/bcftools 0.1.x; later releases
# replaced 'bcftools view -c -g' with 'bcftools call' (see the manuals)

# all BAM files in the same directory
samtools mpileup -D -gf /path/to/genome.fa *.bam | bcftools view -c -g - > filename.vcf
# BAM files in different directories
samtools mpileup -D -gf /path/to/genome.fa /path/to/bam1.bam /path/to/bam2.bam | bcftools view -c -g - > filename.vcf

GATK offers a similar option.
Please read the manuals of SAMtools / bcftools to find the appropriate settings for your data.