Last updated: 2022-02-02

Checks: 7 0

Knit directory: snRNA_eqtl/

This reproducible R Markdown analysis was created with workflowr (version 1.6.2). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.

R Markdown file: up-to-date

Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.

Environment: empty

Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproduciblity it’s best to always run the code in an empty environment.

Seed: set.seed(20211124)

The command set.seed(20211124) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.

Session information: recorded

Great job! Recording the operating system, R version, and package versions is critical for reproducibility.

Cache: none

Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.

File paths: relative

Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.

Repository version: cb10986

Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.

The results in this page were generated with repository version cb10986. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.

Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:


Ignored files:
    Ignored:    .Rproj.user/
    Ignored:    data/magma_interaction_eqtl/
    Ignored:    data/magma_specific_genes/
    Ignored:    data_sensitive/
    Ignored:    output/figures/

Untracked files:
    Untracked:  analysis/._index.Rmd
    Untracked:  analysis/additional/
    Untracked:  analysis/alternatives/
    Untracked:  analysis/clean_analysis.Rmd
    Untracked:  analysis/log_run_slurm_out
    Untracked:  analysis/log_run_slurm_stderr
    Untracked:  analysis/portfolio/
    Untracked:  analysis/revision.Rmd
    Untracked:  analysis/run_slurm.Rscript
    Untracked:  analysis/run_slurm.sbatch
    Untracked:  data/dice/
    Untracked:  data/epigenome_enrichment/
    Untracked:  data/gencode/
    Untracked:  data/gnomad_loeuf/
    Untracked:  data/gtex/
    Untracked:  data/gwas/
    Untracked:  data/gwas_epigenome_overlap/
    Untracked:  data/metabrain/
    Untracked:  data/umap/
    Untracked:  output/coloc/
    Untracked:  output/eqtl/
    Untracked:  output_almost_final/
    Untracked:  output_tmp_QTLtools/
    Untracked:  output_tmp_fastQTL_sample_ambient_removed/

Unstaged changes:
    Modified:   .gitignore
    Modified:   analysis/_site.yml
    Modified:   analysis/plot_figures.Rmd
    Deleted:    output/README.md

Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.

These are the previous versions of the repository in which changes were made to the R Markdown (analysis/04_coloc.Rmd) and HTML (docs/04_coloc.html) files. If you’ve configured a remote Git repository (see ?wflow_git_remote), click on the hyperlinks in the table below to view the files as they were in that past version.

File	Version	Author	Date	Message
Rmd	cb10986	Julien Bryois	2022-02-02	coloc
html	91b1ccc	Julien Bryois	2022-02-02	Build site.
html	437b480	Julien Bryois	2022-02-02	Build site.
html	7eee105	Julien Bryois	2022-02-02	Build site.
Rmd	a2a25b6	Julien Bryois	2022-02-02	coloc
html	b48d8b4	Julien Bryois	2022-02-02	Build site.
Rmd	1be9439	Julien Bryois	2022-02-02	coloc
Rmd	d7d8d07	Julien Bryois	2022-02-02	coloc
html	3bdf530	Julien Bryois	2022-01-28	Build site.
Rmd	caf8cdd	Julien Bryois	2022-01-28	coloc
html	0ea1f33	Julien Bryois	2022-01-28	Build site.
Rmd	fbcbb80	Julien Bryois	2022-01-28	coloc
html	70a2464	Julien Bryois	2022-01-28	Build site.
html	177cc78	Julien Bryois	2022-01-28	Build site.
Rmd	6baadda	Julien Bryois	2022-01-28	coloc
html	764e618	Julien Bryois	2022-01-28	Build site.
Rmd	60a7f4f	Julien Bryois	2022-01-28	coloc
html	f4ad6a7	Julien Bryois	2022-01-25	Build site.
Rmd	29a59d5	Julien Bryois	2022-01-25	coloc
html	4358075	Julien Bryois	2022-01-25	Build site.
Rmd	3e7907f	Julien Bryois	2022-01-25	coloc
html	9e2f9bf	Julien Bryois	2022-01-25	Build site.
Rmd	e9917b0	Julien Bryois	2022-01-25	coloc
html	e0e499a	Julien Bryois	2022-01-25	Build site.
Rmd	81a42b2	Julien Bryois	2022-01-25	coloc
html	8281c18	Julien Bryois	2022-01-07	Build site.
Rmd	22fe629	Julien Bryois	2022-01-07	Coloc AD
html	5878783	Julien Bryois	2022-01-06	Build site.
Rmd	3c17a76	Julien Bryois	2022-01-06	Coloc AD
html	b4cdba1	Julien Bryois	2022-01-06	Build site.
Rmd	0209d6a	Julien Bryois	2022-01-06	Coloc AD
html	820efe7	Julien Bryois	2022-01-06	Build site.
Rmd	4c5ae3d	Julien Bryois	2022-01-06	Coloc AD
html	770527b	Julien Bryois	2021-12-17	Build site.
html	0be678f	Julien Bryois	2021-12-17	Build site.
Rmd	4ad463e	Julien Bryois	2021-12-17	Coloc PD
Rmd	0bad9f7	Julien Bryois	2021-12-17	Coloc PD
html	94e7384	Julien Bryois	2021-12-17	Build site.
Rmd	b51fccc	Julien Bryois	2021-12-17	Coloc PD
html	889d05a	Julien Bryois	2021-12-16	Build site.
Rmd	8ef85a1	Julien Bryois	2021-12-16	Coloc PD
html	a4a1ca2	Julien Bryois	2021-12-16	Build site.
Rmd	4198c1d	Julien Bryois	2021-12-16	Coloc PD
html	77dd2a4	Julien Bryois	2021-12-16	Build site.
Rmd	e0f0f51	Julien Bryois	2021-12-16	Coloc MS
html	8ddfc74	Julien Bryois	2021-12-16	Build site.
Rmd	4eef4d4	Julien Bryois	2021-12-16	Coloc MS
html	b5b88a0	Julien Bryois	2021-12-16	Build site.
Rmd	7df50cb	Julien Bryois	2021-12-16	Coloc MS
html	9408e85	Julien Bryois	2021-12-16	Build site.
Rmd	752513d	Julien Bryois	2021-12-16	Coloc MS

Libraries

library(tidyverse)
library(data.table)
library(coloc)
library(parallel)
library(GenomicRanges)
library(rtracklayer)

selected_trait <- 'pd'

Process GWAS

AD Schwartzentruber

if(selected_trait=='ad'){
sumstats <- vroom::vroom('data/gwas/ad/GCST90012877_buildGRCh37.tsv') %>% 
  mutate(chr=paste0('chr',chromosome)) %>% 
  dplyr::rename(bp_b37=base_pair_location) %>% 
  dplyr::select(variant_id,p_value,chr,bp_b37,effect_allele,other_allele,beta,se=standard_error) %>% 
  filter(!is.na(p_value))

  if(file.exists('data/gwas/ad/loci_LDlinkR.r2.0.1.EUR.txt')) {
    loci <- read_tsv('data/gwas/ad/loci_LDlinkR.r2.0.1.EUR.txt')
  }
}

Get loci coordinates

if(!file.exists('data/gwas/ad/loci_LDlinkR.r2.0.1.EUR.txt')){
  #Supplementary table file from Schwartzentruber et al.
  top_snps <- readxl::read_xlsx('data/gwas/ad/41588_2020_776_MOESM3_ESM.xlsx',sheet=2) %>% 
    dplyr::select(Chr,SNP,`Lead SNP pos`) %>% dplyr::rename(pos=`Lead SNP pos`) %>% 
    mutate(bp_37=paste0('chr',Chr,':',pos)) %>% dplyr::select(-Chr) %>% 
    mutate(SNP=gsub('_.+','',SNP))

  #Add novel loci only observed in replication 
  #+ discovery (not used for their downstream analysis)
  additional_snps <- readxl::read_xlsx('data/gwas/ad/41588_2020_776_MOESM3_ESM.xlsx',sheet=4,skip=43) %>%
    dplyr::select(1,2,3) %>% 
    setNames(c('chr','pos','SNP')) %>% 
    filter(!is.na(SNP)) %>% 
    mutate(bp_37=paste0('chr',chr,':',pos)) %>% 
    dplyr::select(SNP,pos,bp_37)

  top_snps <- rbind(top_snps,additional_snps)
}

Get 1KG LD for the top GWAS SNPs at each loci

if(!file.exists('data/gwas/ad/loci_LDlinkR.r2.0.1.EUR.txt')){
proxy_snps <- lapply(top_snps$SNP,LDlinkR::LDproxy,pop = "EUR", r2d = "r2", 
                     token = '72edb9cc22c9', file = FALSE)
saveRDS(proxy_snps,'data/gwas/ad/loci_LDlinkR.allSNPs.EUR.rds')
}

Define each locus as the most extreme coordinates of SNPs in LD (r2>=0.1) with the index GWAS SNP.

if(!file.exists('data/gwas/ad/loci_LDlinkR.r2.0.1.EUR.txt')){
  loci <- lapply(proxy_snps,function(x){
    if(nrow(x)>1){
      x <- filter(x,R2>=0.1) 
      chr <- x$Coord %>% gsub(':.+','',.) %>% unique()
      pos <- x$Coord %>% gsub('chr[0-9]{1,2}:','',.) %>% as.numeric()
      locus_name <- paste0(chr,':',min(pos),'_',max(pos))
      top_snp_df <- filter(x,Distance==0) 
      top_snp <- top_snp_df %>% pull(RS_Number)
      top_snp_pos <- top_snp_df %>% pull(Coord)
      locus <- tibble(GWAS_snp=top_snp,
                    GWAS_snp_pos=top_snp_pos,
                    locus_name=locus_name,
                    chrom=chr,
                    start=min(pos),
                    end=max(pos))
      return(locus)
   }
    else{
      return(NULL)
    }
  }) %>% 
    bind_rows() %>% 
  write_tsv(.,'data/gwas/ad/loci_LDlinkR.r2.0.1.EUR.txt')
}

Get closest gene

if(!file.exists('data/gwas/ad/closest.protein.coding.bed')){
  sumstats_min <- sumstats %>% dplyr::select(chr,variant_id,bp_b37,beta)

  gwas_snp <- tibble(GWAS_SNP_pos=unique(loci$GWAS_snp_pos)) %>% 
    separate(GWAS_SNP_pos,into=c('chr','bp_b37'),sep=':') %>% 
    mutate(end=bp_b37) %>% 
    mutate(bp_b37=as.numeric(bp_b37)) %>% 
    left_join(.,sumstats_min,by=c('chr','bp_b37')) %>% 
    arrange(chr,bp_b37) %>% 
    write_tsv(.,'data/gwas/ad/ad_GWAS_index_snps.v2.bed',col_names = FALSE)
}

if(!file.exists('data/gwas/ad/closest.protein.coding.bed')){
  gtf_b37 <- rtracklayer::import('data/gencode/gencode.v39lift37.annotation.gtf.gz') %>% 
    as.data.frame() %>% 
    as_tibble()
}

if(!file.exists('data/gencode/gencode.v39lift37.annotation.protein_coding.1_22.bed')){
  protein_coding <- filter(gtf_b37,type=='gene',gene_type=='protein_coding') %>% 
    mutate(gene_label=paste0(gene_name,'_',gsub('\\..+','',gene_id))) %>% 
    dplyr::select(seqnames,start,end,gene_label) %>% 
    filter(seqnames %in% paste0('chr',1:22)) %>% 
    mutate(seqnames=as.character(seqnames)) %>% 
    arrange(seqnames,start) %>% 
    write_tsv('data/gencode/gencode.v39lift37.annotation.protein_coding.1_22.bed',col_names = FALSE)
}

source ~/.bashrc
cd data/gwas/ad
ln -s ../../gencode/gencode.v39lift37.annotation.protein_coding.1_22.bed .

ml bedtools/2.25.0-goolf-1.7.20
bedtools closest -d -wa -a ad_GWAS_index_snps.v2.bed -b gencode.v39lift37.annotation.protein_coding.1_22.bed > closest.protein.coding.bed

PD meta-analysis

if(selected_trait=='pd'){
sumstats <- vroom::vroom('data/gwas/pd/ParkinsonMeta_Nalls2014_2019.tbl') %>% 
  dplyr::select(variant_id=MarkerName,p_value=`P-value`,chr,bp_b37=bp,effect_allele=Allele1,other_allele=Allele2,beta=Effect,se=StdErr) %>%
  mutate(effect_allele=toupper(effect_allele),
         other_allele=toupper(other_allele)) %>% 
  unique() #some lines are duplicated in the

  if(file.exists('data/gwas/pd/loci_LDlinkR.r2.0.1.EUR.txt')) {
    loci <- read_tsv('data/gwas/pd/loci_LDlinkR.r2.0.1.EUR.txt')
  }
}

Get loci coordinates

if(!file.exists('data/gwas/pd/loci_LDlinkR.r2.0.1.EUR.txt')){
  top_snps <- readxl::read_xlsx('data/gwas/pd/Table S2. Detailed summary statistics on all nominated risk variants, known and novel_.xlsx') %>%
    filter(!is.na(`Locus Number`),`Locus Number`!='NA')
}

if(!file.exists('data/gwas/pd/loci_LDlinkR.r2.0.1.EUR.txt')){
  proxy_snps <- lapply(top_snps$SNP,LDlinkR::LDproxy,pop = "EUR", r2d = "r2", token = '72edb9cc22c9', file = FALSE)
  saveRDS(proxy_snps,'data/gwas/pd/loci_LDlinkR.allSNPs.EUR.rds')
}

if(!file.exists('data/gwas/pd/loci_LDlinkR.r2.0.1.EUR.txt')){
  loci <- lapply(proxy_snps,function(x){
    if(nrow(x)>1){
      x <- filter(x,R2>=0.1) 
      chr <- x$Coord %>% gsub(':.+','',.) %>% unique()
      pos <- x$Coord %>% gsub('chr[0-9]{1,2}:','',.) %>% as.numeric()
      locus_name <- paste0(chr,':',min(pos),'_',max(pos))
      top_snp_df <- filter(x,Distance==0) 
      top_snp <- top_snp_df %>% pull(RS_Number)
      top_snp_pos <- top_snp_df %>% pull(Coord)
      locus <- tibble(GWAS_snp=top_snp,GWAS_snp_pos=top_snp_pos,locus_name=locus_name,chrom=chr,start=min(pos),end=max(pos))
      return(locus)
    }
    else{
      return(NULL)
    }
  }) %>% 
    bind_rows()
  write_tsv(loci,'data/gwas/pd/loci_LDlinkR.r2.0.1.EUR.txt')
}

Get closest gene

if(!file.exists('data/gwas/pd/closest.protein.coding.bed')){
  sumstats_min <- sumstats %>% dplyr::select(chr,variant_id,bp_b37,beta)
}

if(!file.exists('data/gwas/pd/closest.protein.coding.bed')){
  gwas_snp <- tibble(GWAS_SNP_pos=unique(loci$GWAS_snp_pos)) %>% 
    separate(GWAS_SNP_pos,into=c('chr','bp_b37'),sep=':') %>% 
    mutate(end=bp_b37) %>% 
    mutate(bp_b37=as.numeric(bp_b37)) %>% 
    left_join(.,sumstats_min,by=c('chr','bp_b37')) %>% 
    arrange(chr,bp_b37) %>% 
    write_tsv(.,'data/gwas/pd/pd_GWAS_index_snps.v2.bed',col_names = FALSE)
}

source ~/.bashrc

cd data/gwas/pd
ln -s ../../gencode/gencode.v39lift37.annotation.protein_coding.1_22.bed .

ml bedtools/2.25.0-goolf-1.7.20
bedtools closest -d -wa -a pd_GWAS_index_snps.v2.bed -b gencode.v39lift37.annotation.protein_coding.1_22.bed > closest.protein.coding.bed

MS GWAS

if(selected_trait=='ms'){
sumstats <- vroom::vroom('data/gwas/ms/discovery_metav3.0.meta') %>% 
  mutate(CHR=paste0('chr',CHR)) %>% 
  filter(!is.na(P)) %>% 
  mutate(beta=log(OR),
         se=abs(beta/qnorm(P/2))) %>% 
  dplyr::select(variant_id=SNP,p_value=P,chr=CHR,bp_b37=BP,effect_allele=A1,other_allele=A2,beta,se) %>% 
  filter(beta!=0) #Some SNPs have OR=1, so se estimate = 0, leading to issues in coloc, we exclude these here


  if(file.exists('data/gwas/ms/loci_LDlinkR.r2.0.1.EUR.txt')) {
    loci <- read_tsv('data/gwas/ms/loci_LDlinkR.r2.0.1.EUR.txt')
  }
}

Get loci coordinates

if(!file.exists('data/gwas/ms/loci_LDlinkR.r2.0.1.EUR.txt')){
  loci <- readxl::read_xlsx('data/gwas/ms/aav7188_Patsopoulos_Tables S1-S10.xlsx',sheet=8,skip=3)
}

if(!file.exists('data/gwas/ms/loci_LDlinkR.r2.0.1.EUR.txt')){
  proxy_snps <- lapply(loci$SNP,LDlinkR::LDproxy,pop = "EUR", r2d = "r2", token = '72edb9cc22c9', file = FALSE)
  saveRDS(proxy_snps,'data/gwas/ms/loci_LDlinkR.allSNPs.EUR.rds')
}

if(!file.exists('data/gwas/ms/loci_LDlinkR.r2.0.1.EUR.txt')){
  loci <- lapply(proxy_snps,function(x){
    if(nrow(x)>1){
      x <- filter(x,R2>=0.1) 
      chr <- x$Coord %>% gsub(':.+','',.) %>% unique()
      pos <- x$Coord %>% gsub('chr[0-9]{1,2}:','',.) %>% as.numeric()
      locus_name <- paste0(chr,':',min(pos),'_',max(pos))
      top_snp_df <- filter(x,Distance==0) 
      top_snp <- top_snp_df %>% pull(RS_Number)
      top_snp_pos <- top_snp_df %>% pull(Coord)
      locus <- tibble(GWAS_snp=top_snp,GWAS_snp_pos=top_snp_pos,locus_name=locus_name,chrom=chr,start=min(pos),end=max(pos))
      return(locus)
    }
    else{
      return(NULL)
    }
  }) %>% bind_rows()
  write_tsv(loci,'data/gwas/ms/loci_LDlinkR.r2.0.1.EUR.txt')
}

Get closest gene

if(!file.exists('data/gwas/ms/closest.protein.coding.bed')){
  sumstats_min <- sumstats %>% dplyr::select(chr,variant_id,bp_b37,beta)
}

if(!file.exists('data/gwas/ms/closest.protein.coding.bed')){
  gwas_snp <- tibble(GWAS_SNP_pos=unique(loci$GWAS_snp_pos)) %>% 
    separate(GWAS_SNP_pos,into=c('chr','bp_b37'),sep=':') %>% 
    mutate(end=bp_b37) %>% 
    mutate(bp_b37=as.numeric(bp_b37)) %>% 
    left_join(.,sumstats_min,by=c('chr','bp_b37')) %>% 
    arrange(chr,bp_b37) %>% 
    write_tsv(.,'data/gwas/ms/ms_GWAS_index_snps.v2.bed',col_names = FALSE)
}

source ~/.bashrc

cd data/gwas/ms
ln -s ../../gencode/gencode.v39lift37.annotation.protein_coding.1_22.bed .

ml bedtools/2.25.0-goolf-1.7.20
bedtools closest -d -wa -a ms_GWAS_index_snps.v2.bed -b gencode.v39lift37.annotation.protein_coding.1_22.bed > closest.protein.coding.bed

SCZ GWAS

if(selected_trait=='scz'){
sumstats <- vroom::vroom('data/gwas/scz/PGC3_SCZ_wave3_public.v2.tsv.gz') %>%
  mutate(CHR=paste0('chr',CHR)) %>% 
  filter(!is.na(P)) %>% 
  mutate(beta=log(OR),
         se=SE) %>% 
  dplyr::select(variant_id=SNP,p_value=P,chr=CHR,bp_b37=BP,effect_allele=A1,other_allele=A2,beta,se)


  if(file.exists('data/gwas/scz/loci_LDlinkR.r2.0.1.EUR.txt')) {
    loci <- read_tsv('data/gwas/scz/loci_LDlinkR.r2.0.1.EUR.txt')
  }
}

Get loci coordinates

if(!file.exists('data/gwas/scz/loci_LDlinkR.r2.0.1.EUR.txt')){
  loci <- readxl::read_xlsx('data/gwas/scz/Supplementary Table 2 - Replication index.xlsx',sheet=2) %>% filter(`P-comb`<5e-8) %>%  
    mutate(locus=paste0('chr',CHR,':',left,'-',right)) %>% dplyr::rename(locus_name=locus) %>% mutate(start=BP,end=BP) %>%  
    mutate(GWAS_snp=SNP,GWAS_snp_pos=paste0('chr',CHR,':',BP))
}

if(!file.exists('data/gwas/scz/loci_LDlinkR.r2.0.1.EUR.txt')){
  proxy_snps <- lapply(loci$GWAS_snp_pos,LDlinkR::LDproxy,pop = "EUR", r2d = "r2", token = '72edb9cc22c9', file = FALSE)
  saveRDS(proxy_snps,'data/gwas/scz/loci_LDlinkR.allSNPs.EUR.rds')
}

if(!file.exists('data/gwas/scz/loci_LDlinkR.r2.0.1.EUR.txt')){
  loci <- lapply(proxy_snps,function(x){
    if(nrow(x)>1){
      x <- filter(x,R2>=0.1) 
      chr <- x$Coord %>% gsub(':.+','',.) %>% unique()
      pos <- x$Coord %>% gsub('chr[0-9]{1,2}:','',.) %>% as.numeric()
      locus_name <- paste0(chr,':',min(pos),'_',max(pos))
      top_snp_df <- filter(x,Distance==0) 
      top_snp <- top_snp_df %>% pull(RS_Number)
      top_snp_pos <- top_snp_df %>% pull(Coord)
      locus <- tibble(GWAS_snp=top_snp,GWAS_snp_pos=top_snp_pos,locus_name=locus_name,chrom=chr,start=min(pos),end=max(pos))
      return(locus)
    }
    else{
      return(NULL)
    }
  }) %>% bind_rows()
  write_tsv(loci,'data/gwas/scz/loci_LDlinkR.r2.0.1.EUR.txt')
}

Get closest gene

if(!file.exists('data/gwas/scz/closest.protein.coding.bed')){
  sumstats_min <- sumstats %>% dplyr::select(chr,variant_id,bp_b37,beta)
}

if(!file.exists('data/gwas/scz/closest.protein.coding.bed')){
  gwas_snp <- tibble(GWAS_SNP_pos=unique(loci$GWAS_snp_pos)) %>% 
    separate(GWAS_SNP_pos,into=c('chr','bp_b37'),sep=':') %>% 
    mutate(end=bp_b37) %>% 
    mutate(bp_b37=as.numeric(bp_b37)) %>% 
    left_join(.,sumstats_min,by=c('chr','bp_b37')) %>% 
    arrange(chr,bp_b37) %>% 
    write_tsv(.,'data/gwas/scz/scz_GWAS_index_snps.v2.bed',col_names = FALSE)
}

source ~/.bashrc

cd data/gwas/scz
ln -s ../../gencode/gencode.v39lift37.annotation.protein_coding.1_22.bed .

ml bedtools/2.25.0-goolf-1.7.20
bedtools closest -d -wa -a scz_GWAS_index_snps.v2.bed -b gencode.v39lift37.annotation.protein_coding.1_22.bed > closest.protein.coding.bed

Add closest gene to loci

closest <- read_tsv(paste0('data/gwas/',selected_trait,'/closest.protein.coding.bed'),col_names = FALSE) %>% 
  setNames(c('chr_snp','start_snp','end_snp','GWAS_snp','beta','chr_gene','start_gene','end_gene','gene','distance')) %>% 
  mutate(GWAS_snp_pos=paste0(chr_snp,':',start_snp)) %>% 
  dplyr::select(GWAS_snp,GWAS_snp_pos,gene,beta,distance) %>% 
  separate(gene,into=c('symbol','ensembl'),sep='_') %>% 
  add_count(GWAS_snp) %>% 
  group_by(GWAS_snp) %>% 
  mutate(locus_name_gene=ifelse(n==1,symbol,paste0(symbol,collapse=' - '))) %>% 
  ungroup() %>% 
  dplyr::select(GWAS_snp_pos,locus_name_gene,beta_top_GWAS=beta) %>% 
  unique() %>% arrange(-abs(beta_top_GWAS))
loci <- left_join(loci,closest,by='GWAS_snp_pos')

GET MAF for each SNP

This section is run only once

source ~/.bashrc

ml PLINK/1.90-goolf-1.7.20
cd data_sensitive/genotypes/processed
plink --bfile ../combined_7 --freq

Get SNP position in hg19 - hg38

Use: Match eQTL SNPs with GWAS SNPs

snp_pos <- data.table::fread('data_sensitive/genotypes/processed/snp_pos_hg38_hg19.mappings.txt',data.table = FALSE) %>% as_tibble()

Code below is run only once

zcat data_sensitive/genotypes/processed/combined_final.vcf.gz | cut -f 1,2,3 | grep -v '^#' > data_sensitive/genotypes/processed/snp_pos_hg38.txt

snp_pos <- data.table::fread('data_sensitive/genotypes/processed/snp_pos_hg38.txt',data.table = F) %>% mutate(end=V2) %>% 
  setNames(c('chr','start','SNP_id','end'))
snp_pos_gr <- makeGRangesFromDataFrame(snp_pos,keep.extra.columns=TRUE)

path = system.file(package="liftOver", "extdata", "hg38ToHg19.over.chain")
ch = import.chain(path)
ch

seqlevelsStyle(snp_pos_gr) = "UCSC"  # necessary
cur19 = liftOver(snp_pos_gr, ch)
cur19 = unlist(cur19)
genome(cur19) = "hg19"

Number of SNPs not lifted over: 33

length(snp_pos_gr)-length(cur19)

cur19 <- as.data.frame(cur19) %>% 
  as_tibble() %>% 
  mutate(SNP2=paste0(seqnames,':',start)) %>% 
  select(SNP_id,seqnames,start,end,SNP2)

snp_pos <- dplyr::select(snp_pos,chr,start,SNP_id) %>% 
  left_join(.,cur19,by='SNP_id') %>% 
  setNames(c('chr_hg38','start_hg38','SNP','chr_hg19','start_hg19','end_hg19','SNP_id_hg19')) %>% 
  mutate(SNP_id_hg38=paste0('chr',chr_hg38,':',start_hg38)) %>% 
  dplyr::select(SNP,chr_hg19,start_hg19,SNP_id_hg19,chr_hg38,start_hg38,SNP_id_hg38) %>% 
  as_tibble()

maf <- data.table::fread('data_sensitive/genotypes/processed/plink.frq',data.table=FALSE) %>% dplyr::select(SNP,A1,A2,MAF) %>% as_tibble()

snp_pos <- inner_join(snp_pos,maf,by='SNP')

write_tsv(snp_pos,'data_sensitive/genotypes/processed/snp_pos_hg38_hg19.mappings.txt')

Run Coloc for each locus

Set location of nominal fastQTL result files:

eqtl_path <- 'data_sensitive/eqtl/PC70_nominal/'

ex <- paste0(eqtl_path,'Excitatory.neurons.')
inh <- paste0(eqtl_path,'Inhibitory.neurons.')
oli <- paste0(eqtl_path,'Oligodendrocytes.')
opc <- paste0(eqtl_path,'OPCs...COPs.')
micro <- paste0(eqtl_path,'Microglia.')
astro <- paste0(eqtl_path,'Astrocytes.')
endo <- paste0(eqtl_path,'Endothelial.cells.')
peri <- paste0(eqtl_path,'Pericytes.')

Functions

# eqtl_ex <- prepare_eqtl(micro,chrom_locus,sumstats_locus)
prepare_eqtl <- function(eqtl_path,chrom_locus,sumstats_locus){
  file <- paste0(eqtl_path,gsub('chr','',chrom_locus),'.gz')
  eqtl <- data.table::fread(file,header = FALSE,data.table=FALSE,nThread = 1) %>% 
    dplyr::select(gene=V1,SNP=V2,p_eqtl=V4,beta_eqtl=V5) %>% # if using fastQTL output
    #dplyr::select(gene=V1,SNP=V2,p_eqtl=V3,beta_eqtl=V4,se_eqtl=V5) %>% # if using QTLtools output
    mutate(se_eqtl=abs(beta_eqtl/qnorm(p_eqtl/2))) %>% #compute standard error from pvalue and beta
    inner_join(.,snp_pos,by='SNP') %>% 
  filter(SNP_id_hg19%in%sumstats_locus$SNP_id_hg19) %>% 
  add_count(gene) %>% 
  filter(n>10) #Only keep genes with at least 10 SNPs
}

#  ex_coloc <- run_coloc(eqtl_ex,'Excitatory neurons',sumstats_locus)
run_coloc <- function(tissue_sumstats,tissue_name,sumstats_locus){
  
  if(nrow(tissue_sumstats)==0){
    return (NULL)
  }
  
  out <- lapply(unique(tissue_sumstats$gene),function(x){
    message(x)
    tissue_sumstats_gene <- filter(tissue_sumstats,gene==x)
    sumstats_locus_gene <- sumstats_locus %>% inner_join(.,tissue_sumstats_gene,by='SNP_id_hg19')
    
    if (nrow(sumstats_locus_gene)>0){
  
     coloc_res_pval <- coloc.abf(
       dataset1=list(beta=sumstats_locus_gene$beta,
                     varbeta=sumstats_locus_gene$se^2,
                     type="cc"),
       dataset2=list(beta=sumstats_locus_gene$beta_eqtl,
                     varbeta=sumstats_locus_gene$se_eqtl^2,
                     sdY=1,
                     type="quant"))$summary %>% 
       as.data.frame()
    colnames(coloc_res_pval) <- x
    
    #Get direction of effect for all SNPs at the locus
    sumstats_locus_gene <- sumstats_locus_gene %>% 
      mutate(direction=case_when(
      (effect_allele==A1 & other_allele==A2)  ~ sign(beta*beta_eqtl),
      (effect_allele==A2 & other_allele==A1)  ~ -sign(beta*beta_eqtl),
    TRUE ~ 0))
   
   #Get Proportion of positive direction
   direction_prop <- sumstats_locus_gene %>% 
     summarise(prop_pos_direction=sum(direction==1)/n()) %>% 
     setNames(x) %>% 
     as.data.frame()
   rownames(direction_prop) <- 'prop_pos_direction'
   
   direction_sign <- sumstats_locus_gene %>% 
     #Take SNP with strongest evidence of an effect on gene expression
     filter(p_eqtl==min(p_eqtl)) %>% 
     slice_min(n=1,p_value,with_ties=FALSE) %>% 
     summarise(direction=direction,
               beta_gwas=case_when(
                    (effect_allele==A1 & other_allele==A2)  ~ beta,
                    (effect_allele==A2 & other_allele==A1)  ~ -beta
                    ),
              beta_eqtl=beta_eqtl,
               beta_smr=case_when(
                    direction== 1  ~ abs(beta)/abs(beta_eqtl),
                    direction== -1  ~ -(abs(beta)/abs(beta_eqtl))
                 ),
               ) %>% 
     t() %>% 
     as.data.frame()
   colnames(direction_sign) <- x
   
    #Add direction of effect to coloc results
    coloc_res_pval <- rbind(coloc_res_pval,direction_sign,direction_prop)

    return(coloc_res_pval)
    }
    else{
      return (NULL)
    }
    
})
  
  out_pvalue <- out %>% 
    bind_cols() %>% 
    t() %>% 
    as.data.frame() %>% 
    rownames_to_column('gene') %>% 
    as_tibble() %>% 
    arrange(-PP.H4.abf) %>% 
    mutate(tissue=tissue_name)
  
  return(out_pvalue)
}

Run

coloc_results_all <- mclapply(1:nrow(loci),function(i){

  #Get coordinates from the GWAS locus
  chrom_locus <- loci$chrom[i]
  start <- loci$start[i] %>% as.numeric()
  end <- loci$end[i] %>% as.numeric()
  
  closest_gene_locus <- loci$locus_name_gene[i]
  beta_top_GWAS_locus <- loci$beta_top_GWAS[i]
  GWAS_snp_name <- loci$GWAS_snp[i]
  GWAS_snp_pos_name <- loci$GWAS_snp_pos[i]

  #Keep GWAS sumstats of SNPs in the locus
  sumstats_locus <- filter(sumstats,chr==chrom_locus) %>% 
    filter(bp_b37>=start & bp_b37<=end) %>% 
    dplyr::mutate(SNP_id_hg19=paste0(chr,':',bp_b37))

  #Preparing eQTL files - keeping SNPs from the eQTL data that math SNPs in the GWAS locus
  eqtl_ex <- prepare_eqtl(ex,chrom_locus,sumstats_locus)
  eqtl_inh <- prepare_eqtl(inh,chrom_locus,sumstats_locus)
  eqtl_oli <- prepare_eqtl(oli,chrom_locus,sumstats_locus)
  eqtl_opc <- prepare_eqtl(opc,chrom_locus,sumstats_locus)
  eqtl_micro <- prepare_eqtl(micro,chrom_locus,sumstats_locus)
  eqtl_astro <- prepare_eqtl(astro,chrom_locus,sumstats_locus)
  eqtl_endo <- prepare_eqtl(endo,chrom_locus,sumstats_locus)
  eqtl_peri <- prepare_eqtl(peri,chrom_locus,sumstats_locus)

  #Running colocalization analysis
  ex_coloc <- run_coloc(eqtl_ex,'Excitatory neurons',sumstats_locus)
  inh_coloc <- run_coloc(eqtl_inh,'Inhibitory neurons',sumstats_locus)
  oli_coloc <- run_coloc(eqtl_oli,'Oligodendrocytes',sumstats_locus)
  opc_coloc <- run_coloc(eqtl_opc,'OPCs / COPs',sumstats_locus)
  micro_coloc <- run_coloc(eqtl_micro,'Microglia',sumstats_locus)
  astro_coloc <- run_coloc(eqtl_astro,'Astrocytes',sumstats_locus)
  endo_coloc <- run_coloc(eqtl_endo,'Endothelial cells',sumstats_locus)
  peri_coloc <- run_coloc(eqtl_peri,'Pericytes',sumstats_locus)
  
  pval_eqtl <- rbind(ex_coloc,inh_coloc,oli_coloc,
                     opc_coloc,micro_coloc,astro_coloc,endo_coloc,peri_coloc)
  
  if(!is.null(pval_eqtl)){
    pval_eqtl <- pval_eqtl %>% 
      mutate(type='eQTL',
           locus=loci$locus_name[i],
           coloc_method='beta') %>% 
      mutate(closest_gene=closest_gene_locus,
             beta_top_GWAS=beta_top_GWAS_locus,
             GWAS_snp=GWAS_snp_name,
             GWAS_snp_pos=GWAS_snp_pos_name)
  }
  results <- pval_eqtl 
  if(!is.null(results)){
    results <- results %>% 
      arrange(-PP.H4.abf) %>% 
      dplyr::select(locus,closest_gene,GWAS_snp,GWAS_snp_pos,beta_top_GWAS,everything())
  }
  
  return(results)
},mc.cores = 36,mc.preschedule = FALSE)

coloc_results_all <- coloc_results_all %>% 
  bind_rows() %>% 
  arrange(-PP.H4.abf) %>% 
  separate(gene,into=c('symbol','ensembl'),sep='_')

Add LOEUF

loeuf <- read_tsv('data/gnomad_loeuf/supplementary_dataset_11_full_constraint_metrics.tsv') %>% 
  filter(canonical==TRUE) %>% 
  dplyr::select(gene,gene_id,transcript,oe_lof_upper,p) %>% 
  mutate(oe_lof_upper_bin = ntile(oe_lof_upper, 10)) %>% 
  dplyr::rename(ensembl=gene_id,symbol=gene) %>% 
  dplyr::select(ensembl,oe_lof_upper_bin)

coloc_results_all <- left_join(coloc_results_all,loeuf,by='ensembl')

Write

dir.create('output/coloc',showWarnings = FALSE)
write_tsv(coloc_results_all,paste0('output/coloc/coloc.',selected_trait,'.txt'))

sessionInfo()

R version 4.0.1 (2020-06-06)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS/LAPACK: /pstore/apps/OpenBLAS/0.3.1-GCC-7.3.0-2.30/lib/libopenblasp-r0.3.1.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets 
[8] methods   base     

other attached packages:
 [1] rtracklayer_1.48.0   GenomicRanges_1.40.0 GenomeInfoDb_1.24.0 
 [4] IRanges_2.22.2       S4Vectors_0.26.1     BiocGenerics_0.34.0 
 [7] coloc_5.1.0          data.table_1.12.8    forcats_0.5.0       
[10] stringr_1.4.0        dplyr_1.0.0          purrr_0.3.4         
[13] readr_1.3.1          tidyr_1.1.0          tibble_3.0.1        
[16] ggplot2_3.3.3        tidyverse_1.3.0     

loaded via a namespace (and not attached):
 [1] nlme_3.1-148                matrixStats_0.56.0         
 [3] bitops_1.0-6                fs_1.4.1                   
 [5] bit64_0.9-7                 lubridate_1.7.9            
 [7] httr_1.4.1                  rprojroot_1.3-2            
 [9] tools_4.0.1                 backports_1.1.7            
[11] R6_2.4.1                    irlba_2.3.3                
[13] DBI_1.1.0                   colorspace_1.4-1           
[15] withr_2.2.0                 tidyselect_1.1.0           
[17] gridExtra_2.3               bit_1.1-15.2               
[19] compiler_4.0.1              git2r_0.27.1               
[21] Biobase_2.48.0              cli_2.0.2                  
[23] rvest_0.3.5                 xml2_1.3.2                 
[25] DelayedArray_0.14.0         scales_1.1.1               
[27] mixsqp_0.3-43               Rsamtools_2.4.0            
[29] digest_0.6.25               rmarkdown_2.2              
[31] XVector_0.28.0              pkgconfig_2.0.3            
[33] htmltools_0.5.1.1           dbplyr_1.4.4               
[35] rlang_0.4.10                readxl_1.3.1               
[37] susieR_0.11.42              rstudioapi_0.11            
[39] generics_0.0.2              jsonlite_1.6.1             
[41] vroom_1.3.2                 BiocParallel_1.22.0        
[43] RCurl_1.98-1.2              magrittr_1.5               
[45] GenomeInfoDbData_1.2.3      Matrix_1.2-18              
[47] Rcpp_1.0.6                  munsell_0.5.0              
[49] fansi_0.4.1                 viridis_0.5.1              
[51] lifecycle_0.2.0             stringi_1.4.6              
[53] whisker_0.4                 yaml_2.2.1                 
[55] SummarizedExperiment_1.18.1 zlibbioc_1.34.0            
[57] plyr_1.8.6                  grid_4.0.1                 
[59] blob_1.2.1                  promises_1.1.1             
[61] crayon_1.3.4                lattice_0.20-41            
[63] Biostrings_2.56.0           haven_2.3.1                
[65] hms_0.5.3                   knitr_1.28                 
[67] pillar_1.4.4                reprex_0.3.0               
[69] XML_3.99-0.3                glue_1.4.1                 
[71] evaluate_0.14               modelr_0.1.8               
[73] vctrs_0.3.1                 httpuv_1.5.4               
[75] cellranger_1.1.0            gtable_0.3.0               
[77] reshape_0.8.8               assertthat_0.2.1           
[79] xfun_0.14                   broom_0.5.6                
[81] later_1.1.0.1               viridisLite_0.3.0          
[83] GenomicAlignments_1.24.0    workflowr_1.6.2            
[85] ellipsis_0.3.1

Colocalization analysis

Julien Bryois

February 02, 2022

Libraries

Process GWAS

AD Schwartzentruber

Get loci coordinates

Get closest gene

PD meta-analysis

Get loci coordinates

Get closest gene

MS GWAS

Get loci coordinates

Get closest gene

SCZ GWAS

Get loci coordinates

Get closest gene

Add closest gene to loci

GET MAF for each SNP

Get SNP position in hg19 - hg38

Run Coloc for each locus

Functions

Run

Add LOEUF

Write