What are the basic steps and tools required to perform a genome-wide association study?

  1. Perform basic statistics and filtering of variants and samples using PLINK 1.9
  2. Perform genetic ancestry analyses using EIGENSOFT
  3. Perform phasing and imputation with SHAPEIT and IMPUTE2
  4. Perform association analyses with PLINK 1.9
  5. Perform annotation with VEP, CADD and GTEx
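
To make steps 1 and 4 a bit more concrete, here is a minimal sketch in plain Python of what the variant filters and a basic case/control test compute. It is not PLINK's implementation. The function names, the genotype coding (0/1/2 copies of the alternate allele, `None` for missing), the toy data and the thresholds (call rate of at least 0.98, minor allele frequency of at least 0.01) are all assumptions for illustration, and the allelic chi-square test stands in for whatever association model you would actually run.

```python
from math import erfc, sqrt

# Assumed coding: each genotype is the count of the alternate allele
# (0, 1 or 2), and None marks a missing call.


def call_rate(genotypes):
    """Fraction of samples with a non-missing genotype at this variant."""
    called = [g for g in genotypes if g is not None]
    return len(called) / len(genotypes)


def minor_allele_frequency(genotypes):
    """Frequency of the rarer allele among the called genotypes."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return 0.0
    alt_freq = sum(called) / (2 * len(called))
    return min(alt_freq, 1 - alt_freq)


def passes_qc(genotypes, min_call_rate=0.98, min_maf=0.01):
    """Variant-level filter; both thresholds are illustrative assumptions."""
    return (call_rate(genotypes) >= min_call_rate
            and minor_allele_frequency(genotypes) >= min_maf)


def allele_counts(genotypes):
    """Return (alternate, reference) allele counts, ignoring missing calls."""
    called = [g for g in genotypes if g is not None]
    alt = sum(called)
    return alt, 2 * len(called) - alt


def allelic_chi_square(case_genotypes, control_genotypes):
    """1-df chi-square test on the 2x2 table of allele counts by status."""
    a, b = allele_counts(case_genotypes)
    c, d = allele_counts(control_genotypes)
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0, 1.0  # monomorphic variant or an empty group
    chi2 = n * (a * d - b * c) ** 2 / denom
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = erfc(sqrt(chi2 / 2))
    return chi2, p_value


if __name__ == "__main__":
    cases = [1] * 10      # toy data: every case is heterozygous
    controls = [0] * 10   # toy data: every control is homozygous reference
    if passes_qc(cases + controls):
        chi2, p = allelic_chi_square(cases, controls)
        print(f"chi2={chi2:.2f}, p={p:.2g}")
```

In a real study you would let PLINK apply these filters and run a regression model with ancestry covariates (the output of step 2) rather than an unadjusted allelic test.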

Does anyone have any other favourite tools and analyses to add to the list?

This recent Nature Reviews Genetics article on the pros and cons of GWAS may also be relevant to this topic.

How about the Michigan Imputation Server? Could I use it in place of step 3, phasing and imputation with SHAPEIT and IMPUTE2?

This is a great suggestion! Read more about the Michigan Imputation Server here.

I like your list of basic steps. Could you also share the basic steps, resources and tools needed for whole-genome or exome sequencing analysis?

The GATK Best Practices Workflows provide a great overview of whole-genome and exome sequencing analyses.