Last updated: 2022-02-02

Checks: 7 0

Knit directory: snRNA_eqtl/

This reproducible R Markdown analysis was created with workflowr (version 1.6.2). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.


Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.

Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproduciblity it’s best to always run the code in an empty environment.

The command set.seed(20211124) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.

Great job! Recording the operating system, R version, and package versions is critical for reproducibility.

Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.

Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.

Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.

The results in this page were generated with repository version cb10986. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.

Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:


Ignored files:
    Ignored:    .Rproj.user/
    Ignored:    data/magma_interaction_eqtl/
    Ignored:    data/magma_specific_genes/
    Ignored:    data_sensitive/
    Ignored:    output/figures/

Untracked files:
    Untracked:  analysis/._index.Rmd
    Untracked:  analysis/additional/
    Untracked:  analysis/alternatives/
    Untracked:  analysis/clean_analysis.Rmd
    Untracked:  analysis/log_run_slurm_out
    Untracked:  analysis/log_run_slurm_stderr
    Untracked:  analysis/portfolio/
    Untracked:  analysis/revision.Rmd
    Untracked:  analysis/run_slurm.Rscript
    Untracked:  analysis/run_slurm.sbatch
    Untracked:  data/dice/
    Untracked:  data/epigenome_enrichment/
    Untracked:  data/gencode/
    Untracked:  data/gnomad_loeuf/
    Untracked:  data/gtex/
    Untracked:  data/gwas/
    Untracked:  data/gwas_epigenome_overlap/
    Untracked:  data/metabrain/
    Untracked:  data/umap/
    Untracked:  output/coloc/
    Untracked:  output/eqtl/
    Untracked:  output_almost_final/
    Untracked:  output_tmp_QTLtools/
    Untracked:  output_tmp_fastQTL_sample_ambient_removed/

Unstaged changes:
    Modified:   .gitignore
    Modified:   analysis/_site.yml
    Modified:   analysis/plot_figures.Rmd
    Deleted:    output/README.md

Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.


These are the previous versions of the repository in which changes were made to the R Markdown (analysis/04_coloc.Rmd) and HTML (docs/04_coloc.html) files. If you’ve configured a remote Git repository (see ?wflow_git_remote), click on the hyperlinks in the table below to view the files as they were in that past version.

File Version Author Date Message
Rmd cb10986 Julien Bryois 2022-02-02 coloc
html 91b1ccc Julien Bryois 2022-02-02 Build site.
html 437b480 Julien Bryois 2022-02-02 Build site.
html 7eee105 Julien Bryois 2022-02-02 Build site.
Rmd a2a25b6 Julien Bryois 2022-02-02 coloc
html b48d8b4 Julien Bryois 2022-02-02 Build site.
Rmd 1be9439 Julien Bryois 2022-02-02 coloc
Rmd d7d8d07 Julien Bryois 2022-02-02 coloc
html 3bdf530 Julien Bryois 2022-01-28 Build site.
Rmd caf8cdd Julien Bryois 2022-01-28 coloc
html 0ea1f33 Julien Bryois 2022-01-28 Build site.
Rmd fbcbb80 Julien Bryois 2022-01-28 coloc
html 70a2464 Julien Bryois 2022-01-28 Build site.
html 177cc78 Julien Bryois 2022-01-28 Build site.
Rmd 6baadda Julien Bryois 2022-01-28 coloc
html 764e618 Julien Bryois 2022-01-28 Build site.
Rmd 60a7f4f Julien Bryois 2022-01-28 coloc
html f4ad6a7 Julien Bryois 2022-01-25 Build site.
Rmd 29a59d5 Julien Bryois 2022-01-25 coloc
html 4358075 Julien Bryois 2022-01-25 Build site.
Rmd 3e7907f Julien Bryois 2022-01-25 coloc
html 9e2f9bf Julien Bryois 2022-01-25 Build site.
Rmd e9917b0 Julien Bryois 2022-01-25 coloc
html e0e499a Julien Bryois 2022-01-25 Build site.
Rmd 81a42b2 Julien Bryois 2022-01-25 coloc
html 8281c18 Julien Bryois 2022-01-07 Build site.
Rmd 22fe629 Julien Bryois 2022-01-07 Coloc AD
html 5878783 Julien Bryois 2022-01-06 Build site.
Rmd 3c17a76 Julien Bryois 2022-01-06 Coloc AD
html b4cdba1 Julien Bryois 2022-01-06 Build site.
Rmd 0209d6a Julien Bryois 2022-01-06 Coloc AD
html 820efe7 Julien Bryois 2022-01-06 Build site.
Rmd 4c5ae3d Julien Bryois 2022-01-06 Coloc AD
html 770527b Julien Bryois 2021-12-17 Build site.
html 0be678f Julien Bryois 2021-12-17 Build site.
Rmd 4ad463e Julien Bryois 2021-12-17 Coloc PD
Rmd 0bad9f7 Julien Bryois 2021-12-17 Coloc PD
html 94e7384 Julien Bryois 2021-12-17 Build site.
Rmd b51fccc Julien Bryois 2021-12-17 Coloc PD
html 889d05a Julien Bryois 2021-12-16 Build site.
Rmd 8ef85a1 Julien Bryois 2021-12-16 Coloc PD
html a4a1ca2 Julien Bryois 2021-12-16 Build site.
Rmd 4198c1d Julien Bryois 2021-12-16 Coloc PD
html 77dd2a4 Julien Bryois 2021-12-16 Build site.
Rmd e0f0f51 Julien Bryois 2021-12-16 Coloc MS
html 8ddfc74 Julien Bryois 2021-12-16 Build site.
Rmd 4eef4d4 Julien Bryois 2021-12-16 Coloc MS
html b5b88a0 Julien Bryois 2021-12-16 Build site.
Rmd 7df50cb Julien Bryois 2021-12-16 Coloc MS
html 9408e85 Julien Bryois 2021-12-16 Build site.
Rmd 752513d Julien Bryois 2021-12-16 Coloc MS

Libraries

library(tidyverse)
library(data.table)
library(coloc)
library(parallel)
library(GenomicRanges)
library(rtracklayer)
selected_trait <- 'pd'

Process GWAS

AD Schwartzentruber

if(selected_trait=='ad'){
sumstats <- vroom::vroom('data/gwas/ad/GCST90012877_buildGRCh37.tsv') %>% 
  mutate(chr=paste0('chr',chromosome)) %>% 
  dplyr::rename(bp_b37=base_pair_location) %>% 
  dplyr::select(variant_id,p_value,chr,bp_b37,effect_allele,other_allele,beta,se=standard_error) %>% 
  filter(!is.na(p_value))

  if(file.exists('data/gwas/ad/loci_LDlinkR.r2.0.1.EUR.txt')) {
    loci <- read_tsv('data/gwas/ad/loci_LDlinkR.r2.0.1.EUR.txt')
  }
}

Get loci coordinates

if(!file.exists('data/gwas/ad/loci_LDlinkR.r2.0.1.EUR.txt')){
  #Supplementary table file from Schwartzentruber et al.
  top_snps <- readxl::read_xlsx('data/gwas/ad/41588_2020_776_MOESM3_ESM.xlsx',sheet=2) %>% 
    dplyr::select(Chr,SNP,`Lead SNP pos`) %>% dplyr::rename(pos=`Lead SNP pos`) %>% 
    mutate(bp_37=paste0('chr',Chr,':',pos)) %>% dplyr::select(-Chr) %>% 
    mutate(SNP=gsub('_.+','',SNP))

  #Add novel loci only observed in replication 
  #+ discovery (not used for their downstream analysis)
  additional_snps <- readxl::read_xlsx('data/gwas/ad/41588_2020_776_MOESM3_ESM.xlsx',sheet=4,skip=43) %>%
    dplyr::select(1,2,3) %>% 
    setNames(c('chr','pos','SNP')) %>% 
    filter(!is.na(SNP)) %>% 
    mutate(bp_37=paste0('chr',chr,':',pos)) %>% 
    dplyr::select(SNP,pos,bp_37)

  top_snps <- rbind(top_snps,additional_snps)
}

Get 1KG LD for the top GWAS SNPs at each loci

if(!file.exists('data/gwas/ad/loci_LDlinkR.r2.0.1.EUR.txt')){
proxy_snps <- lapply(top_snps$SNP,LDlinkR::LDproxy,pop = "EUR", r2d = "r2", 
                     token = '72edb9cc22c9', file = FALSE)
saveRDS(proxy_snps,'data/gwas/ad/loci_LDlinkR.allSNPs.EUR.rds')
}

Define each locus as the most extreme coordinates of SNPs in LD (r2>=0.1) with the index GWAS SNP.

if(!file.exists('data/gwas/ad/loci_LDlinkR.r2.0.1.EUR.txt')){
  loci <- lapply(proxy_snps,function(x){
    if(nrow(x)>1){
      x <- filter(x,R2>=0.1) 
      chr <- x$Coord %>% gsub(':.+','',.) %>% unique()
      pos <- x$Coord %>% gsub('chr[0-9]{1,2}:','',.) %>% as.numeric()
      locus_name <- paste0(chr,':',min(pos),'_',max(pos))
      top_snp_df <- filter(x,Distance==0) 
      top_snp <- top_snp_df %>% pull(RS_Number)
      top_snp_pos <- top_snp_df %>% pull(Coord)
      locus <- tibble(GWAS_snp=top_snp,
                    GWAS_snp_pos=top_snp_pos,
                    locus_name=locus_name,
                    chrom=chr,
                    start=min(pos),
                    end=max(pos))
      return(locus)
   }
    else{
      return(NULL)
    }
  }) %>% 
    bind_rows() %>% 
  write_tsv(.,'data/gwas/ad/loci_LDlinkR.r2.0.1.EUR.txt')
}

Get closest gene

if(!file.exists('data/gwas/ad/closest.protein.coding.bed')){
  sumstats_min <- sumstats %>% dplyr::select(chr,variant_id,bp_b37,beta)

  gwas_snp <- tibble(GWAS_SNP_pos=unique(loci$GWAS_snp_pos)) %>% 
    separate(GWAS_SNP_pos,into=c('chr','bp_b37'),sep=':') %>% 
    mutate(end=bp_b37) %>% 
    mutate(bp_b37=as.numeric(bp_b37)) %>% 
    left_join(.,sumstats_min,by=c('chr','bp_b37')) %>% 
    arrange(chr,bp_b37) %>% 
    write_tsv(.,'data/gwas/ad/ad_GWAS_index_snps.v2.bed',col_names = FALSE)
}
if(!file.exists('data/gwas/ad/closest.protein.coding.bed')){
  gtf_b37 <- rtracklayer::import('data/gencode/gencode.v39lift37.annotation.gtf.gz') %>% 
    as.data.frame() %>% 
    as_tibble()
}
if(!file.exists('data/gencode/gencode.v39lift37.annotation.protein_coding.1_22.bed')){
  protein_coding <- filter(gtf_b37,type=='gene',gene_type=='protein_coding') %>% 
    mutate(gene_label=paste0(gene_name,'_',gsub('\\..+','',gene_id))) %>% 
    dplyr::select(seqnames,start,end,gene_label) %>% 
    filter(seqnames %in% paste0('chr',1:22)) %>% 
    mutate(seqnames=as.character(seqnames)) %>% 
    arrange(seqnames,start) %>% 
    write_tsv('data/gencode/gencode.v39lift37.annotation.protein_coding.1_22.bed',col_names = FALSE)
}
source ~/.bashrc
cd data/gwas/ad
ln -s ../../gencode/gencode.v39lift37.annotation.protein_coding.1_22.bed .

ml bedtools/2.25.0-goolf-1.7.20
bedtools closest -d -wa -a ad_GWAS_index_snps.v2.bed -b gencode.v39lift37.annotation.protein_coding.1_22.bed > closest.protein.coding.bed

PD meta-analysis

if(selected_trait=='pd'){
sumstats <- vroom::vroom('data/gwas/pd/ParkinsonMeta_Nalls2014_2019.tbl') %>% 
  dplyr::select(variant_id=MarkerName,p_value=`P-value`,chr,bp_b37=bp,effect_allele=Allele1,other_allele=Allele2,beta=Effect,se=StdErr) %>%
  mutate(effect_allele=toupper(effect_allele),
         other_allele=toupper(other_allele)) %>% 
  unique() #some lines are duplicated in the

  if(file.exists('data/gwas/pd/loci_LDlinkR.r2.0.1.EUR.txt')) {
    loci <- read_tsv('data/gwas/pd/loci_LDlinkR.r2.0.1.EUR.txt')
  }
}

Get loci coordinates

if(!file.exists('data/gwas/pd/loci_LDlinkR.r2.0.1.EUR.txt')){
  top_snps <- readxl::read_xlsx('data/gwas/pd/Table S2. Detailed summary statistics on all nominated risk variants, known and novel_.xlsx') %>%
    filter(!is.na(`Locus Number`),`Locus Number`!='NA')
}
if(!file.exists('data/gwas/pd/loci_LDlinkR.r2.0.1.EUR.txt')){
  proxy_snps <- lapply(top_snps$SNP,LDlinkR::LDproxy,pop = "EUR", r2d = "r2", token = '72edb9cc22c9', file = FALSE)
  saveRDS(proxy_snps,'data/gwas/pd/loci_LDlinkR.allSNPs.EUR.rds')
}
if(!file.exists('data/gwas/pd/loci_LDlinkR.r2.0.1.EUR.txt')){
  loci <- lapply(proxy_snps,function(x){
    if(nrow(x)>1){
      x <- filter(x,R2>=0.1) 
      chr <- x$Coord %>% gsub(':.+','',.) %>% unique()
      pos <- x$Coord %>% gsub('chr[0-9]{1,2}:','',.) %>% as.numeric()
      locus_name <- paste0(chr,':',min(pos),'_',max(pos))
      top_snp_df <- filter(x,Distance==0) 
      top_snp <- top_snp_df %>% pull(RS_Number)
      top_snp_pos <- top_snp_df %>% pull(Coord)
      locus <- tibble(GWAS_snp=top_snp,GWAS_snp_pos=top_snp_pos,locus_name=locus_name,chrom=chr,start=min(pos),end=max(pos))
      return(locus)
    }
    else{
      return(NULL)
    }
  }) %>% 
    bind_rows()
  write_tsv(loci,'data/gwas/pd/loci_LDlinkR.r2.0.1.EUR.txt')
}

Get closest gene

if(!file.exists('data/gwas/pd/closest.protein.coding.bed')){
  sumstats_min <- sumstats %>% dplyr::select(chr,variant_id,bp_b37,beta)
}
if(!file.exists('data/gwas/pd/closest.protein.coding.bed')){
  gwas_snp <- tibble(GWAS_SNP_pos=unique(loci$GWAS_snp_pos)) %>% 
    separate(GWAS_SNP_pos,into=c('chr','bp_b37'),sep=':') %>% 
    mutate(end=bp_b37) %>% 
    mutate(bp_b37=as.numeric(bp_b37)) %>% 
    left_join(.,sumstats_min,by=c('chr','bp_b37')) %>% 
    arrange(chr,bp_b37) %>% 
    write_tsv(.,'data/gwas/pd/pd_GWAS_index_snps.v2.bed',col_names = FALSE)
}
source ~/.bashrc

cd data/gwas/pd
ln -s ../../gencode/gencode.v39lift37.annotation.protein_coding.1_22.bed .

ml bedtools/2.25.0-goolf-1.7.20
bedtools closest -d -wa -a pd_GWAS_index_snps.v2.bed -b gencode.v39lift37.annotation.protein_coding.1_22.bed > closest.protein.coding.bed

MS GWAS

if(selected_trait=='ms'){
sumstats <- vroom::vroom('data/gwas/ms/discovery_metav3.0.meta') %>% 
  mutate(CHR=paste0('chr',CHR)) %>% 
  filter(!is.na(P)) %>% 
  mutate(beta=log(OR),
         se=abs(beta/qnorm(P/2))) %>% 
  dplyr::select(variant_id=SNP,p_value=P,chr=CHR,bp_b37=BP,effect_allele=A1,other_allele=A2,beta,se) %>% 
  filter(beta!=0) #Some SNPs have OR=1, so se estimate = 0, leading to issues in coloc, we exclude these here


  if(file.exists('data/gwas/ms/loci_LDlinkR.r2.0.1.EUR.txt')) {
    loci <- read_tsv('data/gwas/ms/loci_LDlinkR.r2.0.1.EUR.txt')
  }
}

Get loci coordinates

if(!file.exists('data/gwas/ms/loci_LDlinkR.r2.0.1.EUR.txt')){
  loci <- readxl::read_xlsx('data/gwas/ms/aav7188_Patsopoulos_Tables S1-S10.xlsx',sheet=8,skip=3)
}
if(!file.exists('data/gwas/ms/loci_LDlinkR.r2.0.1.EUR.txt')){
  proxy_snps <- lapply(loci$SNP,LDlinkR::LDproxy,pop = "EUR", r2d = "r2", token = '72edb9cc22c9', file = FALSE)
  saveRDS(proxy_snps,'data/gwas/ms/loci_LDlinkR.allSNPs.EUR.rds')
}
if(!file.exists('data/gwas/ms/loci_LDlinkR.r2.0.1.EUR.txt')){
  loci <- lapply(proxy_snps,function(x){
    if(nrow(x)>1){
      x <- filter(x,R2>=0.1) 
      chr <- x$Coord %>% gsub(':.+','',.) %>% unique()
      pos <- x$Coord %>% gsub('chr[0-9]{1,2}:','',.) %>% as.numeric()
      locus_name <- paste0(chr,':',min(pos),'_',max(pos))
      top_snp_df <- filter(x,Distance==0) 
      top_snp <- top_snp_df %>% pull(RS_Number)
      top_snp_pos <- top_snp_df %>% pull(Coord)
      locus <- tibble(GWAS_snp=top_snp,GWAS_snp_pos=top_snp_pos,locus_name=locus_name,chrom=chr,start=min(pos),end=max(pos))
      return(locus)
    }
    else{
      return(NULL)
    }
  }) %>% bind_rows()
  write_tsv(loci,'data/gwas/ms/loci_LDlinkR.r2.0.1.EUR.txt')
}

Get closest gene

if(!file.exists('data/gwas/ms/closest.protein.coding.bed')){
  sumstats_min <- sumstats %>% dplyr::select(chr,variant_id,bp_b37,beta)
}
if(!file.exists('data/gwas/ms/closest.protein.coding.bed')){
  gwas_snp <- tibble(GWAS_SNP_pos=unique(loci$GWAS_snp_pos)) %>% 
    separate(GWAS_SNP_pos,into=c('chr','bp_b37'),sep=':') %>% 
    mutate(end=bp_b37) %>% 
    mutate(bp_b37=as.numeric(bp_b37)) %>% 
    left_join(.,sumstats_min,by=c('chr','bp_b37')) %>% 
    arrange(chr,bp_b37) %>% 
    write_tsv(.,'data/gwas/ms/ms_GWAS_index_snps.v2.bed',col_names = FALSE)
}
source ~/.bashrc

cd data/gwas/ms
ln -s ../../gencode/gencode.v39lift37.annotation.protein_coding.1_22.bed .

ml bedtools/2.25.0-goolf-1.7.20
bedtools closest -d -wa -a ms_GWAS_index_snps.v2.bed -b gencode.v39lift37.annotation.protein_coding.1_22.bed > closest.protein.coding.bed

SCZ GWAS

if(selected_trait=='scz'){
sumstats <- vroom::vroom('data/gwas/scz/PGC3_SCZ_wave3_public.v2.tsv.gz') %>%
  mutate(CHR=paste0('chr',CHR)) %>% 
  filter(!is.na(P)) %>% 
  mutate(beta=log(OR),
         se=SE) %>% 
  dplyr::select(variant_id=SNP,p_value=P,chr=CHR,bp_b37=BP,effect_allele=A1,other_allele=A2,beta,se)


  if(file.exists('data/gwas/scz/loci_LDlinkR.r2.0.1.EUR.txt')) {
    loci <- read_tsv('data/gwas/scz/loci_LDlinkR.r2.0.1.EUR.txt')
  }
}

Get loci coordinates

if(!file.exists('data/gwas/scz/loci_LDlinkR.r2.0.1.EUR.txt')){
  loci <- readxl::read_xlsx('data/gwas/scz/Supplementary Table 2 - Replication index.xlsx',sheet=2) %>% filter(`P-comb`<5e-8) %>%  
    mutate(locus=paste0('chr',CHR,':',left,'-',right)) %>% dplyr::rename(locus_name=locus) %>% mutate(start=BP,end=BP) %>%  
    mutate(GWAS_snp=SNP,GWAS_snp_pos=paste0('chr',CHR,':',BP))
}
if(!file.exists('data/gwas/scz/loci_LDlinkR.r2.0.1.EUR.txt')){
  proxy_snps <- lapply(loci$GWAS_snp_pos,LDlinkR::LDproxy,pop = "EUR", r2d = "r2", token = '72edb9cc22c9', file = FALSE)
  saveRDS(proxy_snps,'data/gwas/scz/loci_LDlinkR.allSNPs.EUR.rds')
}
if(!file.exists('data/gwas/scz/loci_LDlinkR.r2.0.1.EUR.txt')){
  loci <- lapply(proxy_snps,function(x){
    if(nrow(x)>1){
      x <- filter(x,R2>=0.1) 
      chr <- x$Coord %>% gsub(':.+','',.) %>% unique()
      pos <- x$Coord %>% gsub('chr[0-9]{1,2}:','',.) %>% as.numeric()
      locus_name <- paste0(chr,':',min(pos),'_',max(pos))
      top_snp_df <- filter(x,Distance==0) 
      top_snp <- top_snp_df %>% pull(RS_Number)
      top_snp_pos <- top_snp_df %>% pull(Coord)
      locus <- tibble(GWAS_snp=top_snp,GWAS_snp_pos=top_snp_pos,locus_name=locus_name,chrom=chr,start=min(pos),end=max(pos))
      return(locus)
    }
    else{
      return(NULL)
    }
  }) %>% bind_rows()
  write_tsv(loci,'data/gwas/scz/loci_LDlinkR.r2.0.1.EUR.txt')
}

Get closest gene

if(!file.exists('data/gwas/scz/closest.protein.coding.bed')){
  sumstats_min <- sumstats %>% dplyr::select(chr,variant_id,bp_b37,beta)
}
if(!file.exists('data/gwas/scz/closest.protein.coding.bed')){
  gwas_snp <- tibble(GWAS_SNP_pos=unique(loci$GWAS_snp_pos)) %>% 
    separate(GWAS_SNP_pos,into=c('chr','bp_b37'),sep=':') %>% 
    mutate(end=bp_b37) %>% 
    mutate(bp_b37=as.numeric(bp_b37)) %>% 
    left_join(.,sumstats_min,by=c('chr','bp_b37')) %>% 
    arrange(chr,bp_b37) %>% 
    write_tsv(.,'data/gwas/scz/scz_GWAS_index_snps.v2.bed',col_names = FALSE)
}
source ~/.bashrc

cd data/gwas/scz
ln -s ../../gencode/gencode.v39lift37.annotation.protein_coding.1_22.bed .

ml bedtools/2.25.0-goolf-1.7.20
bedtools closest -d -wa -a scz_GWAS_index_snps.v2.bed -b gencode.v39lift37.annotation.protein_coding.1_22.bed > closest.protein.coding.bed

Add closest gene to loci

closest <- read_tsv(paste0('data/gwas/',selected_trait,'/closest.protein.coding.bed'),col_names = FALSE) %>% 
  setNames(c('chr_snp','start_snp','end_snp','GWAS_snp','beta','chr_gene','start_gene','end_gene','gene','distance')) %>% 
  mutate(GWAS_snp_pos=paste0(chr_snp,':',start_snp)) %>% 
  dplyr::select(GWAS_snp,GWAS_snp_pos,gene,beta,distance) %>% 
  separate(gene,into=c('symbol','ensembl'),sep='_') %>% 
  add_count(GWAS_snp) %>% 
  group_by(GWAS_snp) %>% 
  mutate(locus_name_gene=ifelse(n==1,symbol,paste0(symbol,collapse=' - '))) %>% 
  ungroup() %>% 
  dplyr::select(GWAS_snp_pos,locus_name_gene,beta_top_GWAS=beta) %>% 
  unique() %>% arrange(-abs(beta_top_GWAS))
loci <- left_join(loci,closest,by='GWAS_snp_pos')

GET MAF for each SNP

This section is run only once

source ~/.bashrc

ml PLINK/1.90-goolf-1.7.20
cd data_sensitive/genotypes/processed
plink --bfile ../combined_7 --freq

Get SNP position in hg19 - hg38

Use: Match eQTL SNPs with GWAS SNPs

snp_pos <- data.table::fread('data_sensitive/genotypes/processed/snp_pos_hg38_hg19.mappings.txt',data.table = FALSE) %>% as_tibble()

Code below is run only once

zcat data_sensitive/genotypes/processed/combined_final.vcf.gz | cut -f 1,2,3 | grep -v '^#' > data_sensitive/genotypes/processed/snp_pos_hg38.txt 
snp_pos <- data.table::fread('data_sensitive/genotypes/processed/snp_pos_hg38.txt',data.table = F) %>% mutate(end=V2) %>% 
  setNames(c('chr','start','SNP_id','end'))
snp_pos_gr <- makeGRangesFromDataFrame(snp_pos,keep.extra.columns=TRUE)
path = system.file(package="liftOver", "extdata", "hg38ToHg19.over.chain")
ch = import.chain(path)
ch
seqlevelsStyle(snp_pos_gr) = "UCSC"  # necessary
cur19 = liftOver(snp_pos_gr, ch)
cur19 = unlist(cur19)
genome(cur19) = "hg19"

Number of SNPs not lifted over: 33

length(snp_pos_gr)-length(cur19)
cur19 <- as.data.frame(cur19) %>% 
  as_tibble() %>% 
  mutate(SNP2=paste0(seqnames,':',start)) %>% 
  select(SNP_id,seqnames,start,end,SNP2)
snp_pos <- dplyr::select(snp_pos,chr,start,SNP_id) %>% 
  left_join(.,cur19,by='SNP_id') %>% 
  setNames(c('chr_hg38','start_hg38','SNP','chr_hg19','start_hg19','end_hg19','SNP_id_hg19')) %>% 
  mutate(SNP_id_hg38=paste0('chr',chr_hg38,':',start_hg38)) %>% 
  dplyr::select(SNP,chr_hg19,start_hg19,SNP_id_hg19,chr_hg38,start_hg38,SNP_id_hg38) %>% 
  as_tibble()
maf <- data.table::fread('data_sensitive/genotypes/processed/plink.frq',data.table=FALSE) %>% dplyr::select(SNP,A1,A2,MAF) %>% as_tibble()
snp_pos <- inner_join(snp_pos,maf,by='SNP')
write_tsv(snp_pos,'data_sensitive/genotypes/processed/snp_pos_hg38_hg19.mappings.txt')

Run Coloc for each locus

Set location of nominal fastQTL result files:

eqtl_path <- 'data_sensitive/eqtl/PC70_nominal/'
ex <- paste0(eqtl_path,'Excitatory.neurons.')
inh <- paste0(eqtl_path,'Inhibitory.neurons.')
oli <- paste0(eqtl_path,'Oligodendrocytes.')
opc <- paste0(eqtl_path,'OPCs...COPs.')
micro <- paste0(eqtl_path,'Microglia.')
astro <- paste0(eqtl_path,'Astrocytes.')
endo <- paste0(eqtl_path,'Endothelial.cells.')
peri <- paste0(eqtl_path,'Pericytes.')

Functions

# eqtl_ex <- prepare_eqtl(micro,chrom_locus,sumstats_locus)
prepare_eqtl <- function(eqtl_path,chrom_locus,sumstats_locus){
  file <- paste0(eqtl_path,gsub('chr','',chrom_locus),'.gz')
  eqtl <- data.table::fread(file,header = FALSE,data.table=FALSE,nThread = 1) %>% 
    dplyr::select(gene=V1,SNP=V2,p_eqtl=V4,beta_eqtl=V5) %>% # if using fastQTL output
    #dplyr::select(gene=V1,SNP=V2,p_eqtl=V3,beta_eqtl=V4,se_eqtl=V5) %>% # if using QTLtools output
    mutate(se_eqtl=abs(beta_eqtl/qnorm(p_eqtl/2))) %>% #compute standard error from pvalue and beta
    inner_join(.,snp_pos,by='SNP') %>% 
  filter(SNP_id_hg19%in%sumstats_locus$SNP_id_hg19) %>% 
  add_count(gene) %>% 
  filter(n>10) #Only keep genes with at least 10 SNPs
}
#  ex_coloc <- run_coloc(eqtl_ex,'Excitatory neurons',sumstats_locus)
run_coloc <- function(tissue_sumstats,tissue_name,sumstats_locus){
  
  if(nrow(tissue_sumstats)==0){
    return (NULL)
  }
  
  out <- lapply(unique(tissue_sumstats$gene),function(x){
    message(x)
    tissue_sumstats_gene <- filter(tissue_sumstats,gene==x)
    sumstats_locus_gene <- sumstats_locus %>% inner_join(.,tissue_sumstats_gene,by='SNP_id_hg19')
    
    if (nrow(sumstats_locus_gene)>0){
  
     coloc_res_pval <- coloc.abf(
       dataset1=list(beta=sumstats_locus_gene$beta,
                     varbeta=sumstats_locus_gene$se^2,
                     type="cc"),
       dataset2=list(beta=sumstats_locus_gene$beta_eqtl,
                     varbeta=sumstats_locus_gene$se_eqtl^2,
                     sdY=1,
                     type="quant"))$summary %>% 
       as.data.frame()
    colnames(coloc_res_pval) <- x
    
    #Get direction of effect for all SNPs at the locus
    sumstats_locus_gene <- sumstats_locus_gene %>% 
      mutate(direction=case_when(
      (effect_allele==A1 & other_allele==A2)  ~ sign(beta*beta_eqtl),
      (effect_allele==A2 & other_allele==A1)  ~ -sign(beta*beta_eqtl),
    TRUE ~ 0))
   
   #Get Proportion of positive direction
   direction_prop <- sumstats_locus_gene %>% 
     summarise(prop_pos_direction=sum(direction==1)/n()) %>% 
     setNames(x) %>% 
     as.data.frame()
   rownames(direction_prop) <- 'prop_pos_direction'
   
   direction_sign <- sumstats_locus_gene %>% 
     #Take SNP with strongest evidence of an effect on gene expression
     filter(p_eqtl==min(p_eqtl)) %>% 
     slice_min(n=1,p_value,with_ties=FALSE) %>% 
     summarise(direction=direction,
               beta_gwas=case_when(
                    (effect_allele==A1 & other_allele==A2)  ~ beta,
                    (effect_allele==A2 & other_allele==A1)  ~ -beta
                    ),
              beta_eqtl=beta_eqtl,
               beta_smr=case_when(
                    direction== 1  ~ abs(beta)/abs(beta_eqtl),
                    direction== -1  ~ -(abs(beta)/abs(beta_eqtl))
                 ),
               ) %>% 
     t() %>% 
     as.data.frame()
   colnames(direction_sign) <- x
   
    #Add direction of effect to coloc results
    coloc_res_pval <- rbind(coloc_res_pval,direction_sign,direction_prop)

    return(coloc_res_pval)
    }
    else{
      return (NULL)
    }
    
})
  
  out_pvalue <- out %>% 
    bind_cols() %>% 
    t() %>% 
    as.data.frame() %>% 
    rownames_to_column('gene') %>% 
    as_tibble() %>% 
    arrange(-PP.H4.abf) %>% 
    mutate(tissue=tissue_name)
  
  return(out_pvalue)
}

Run

coloc_results_all <- mclapply(1:nrow(loci),function(i){

  #Get coordinates from the GWAS locus
  chrom_locus <- loci$chrom[i]
  start <- loci$start[i] %>% as.numeric()
  end <- loci$end[i] %>% as.numeric()
  
  closest_gene_locus <- loci$locus_name_gene[i]
  beta_top_GWAS_locus <- loci$beta_top_GWAS[i]
  GWAS_snp_name <- loci$GWAS_snp[i]
  GWAS_snp_pos_name <- loci$GWAS_snp_pos[i]

  #Keep GWAS sumstats of SNPs in the locus
  sumstats_locus <- filter(sumstats,chr==chrom_locus) %>% 
    filter(bp_b37>=start & bp_b37<=end) %>% 
    dplyr::mutate(SNP_id_hg19=paste0(chr,':',bp_b37))

  #Preparing eQTL files - keeping SNPs from the eQTL data that math SNPs in the GWAS locus
  eqtl_ex <- prepare_eqtl(ex,chrom_locus,sumstats_locus)
  eqtl_inh <- prepare_eqtl(inh,chrom_locus,sumstats_locus)
  eqtl_oli <- prepare_eqtl(oli,chrom_locus,sumstats_locus)
  eqtl_opc <- prepare_eqtl(opc,chrom_locus,sumstats_locus)
  eqtl_micro <- prepare_eqtl(micro,chrom_locus,sumstats_locus)
  eqtl_astro <- prepare_eqtl(astro,chrom_locus,sumstats_locus)
  eqtl_endo <- prepare_eqtl(endo,chrom_locus,sumstats_locus)
  eqtl_peri <- prepare_eqtl(peri,chrom_locus,sumstats_locus)

  #Running colocalization analysis
  ex_coloc <- run_coloc(eqtl_ex,'Excitatory neurons',sumstats_locus)
  inh_coloc <- run_coloc(eqtl_inh,'Inhibitory neurons',sumstats_locus)
  oli_coloc <- run_coloc(eqtl_oli,'Oligodendrocytes',sumstats_locus)
  opc_coloc <- run_coloc(eqtl_opc,'OPCs / COPs',sumstats_locus)
  micro_coloc <- run_coloc(eqtl_micro,'Microglia',sumstats_locus)
  astro_coloc <- run_coloc(eqtl_astro,'Astrocytes',sumstats_locus)
  endo_coloc <- run_coloc(eqtl_endo,'Endothelial cells',sumstats_locus)
  peri_coloc <- run_coloc(eqtl_peri,'Pericytes',sumstats_locus)
  
  pval_eqtl <- rbind(ex_coloc,inh_coloc,oli_coloc,
                     opc_coloc,micro_coloc,astro_coloc,endo_coloc,peri_coloc)
  
  if(!is.null(pval_eqtl)){
    pval_eqtl <- pval_eqtl %>% 
      mutate(type='eQTL',
           locus=loci$locus_name[i],
           coloc_method='beta') %>% 
      mutate(closest_gene=closest_gene_locus,
             beta_top_GWAS=beta_top_GWAS_locus,
             GWAS_snp=GWAS_snp_name,
             GWAS_snp_pos=GWAS_snp_pos_name)
  }
  results <- pval_eqtl 
  if(!is.null(results)){
    results <- results %>% 
      arrange(-PP.H4.abf) %>% 
      dplyr::select(locus,closest_gene,GWAS_snp,GWAS_snp_pos,beta_top_GWAS,everything())
  }
  
  return(results)
},mc.cores = 36,mc.preschedule = FALSE)
coloc_results_all <- coloc_results_all %>% 
  bind_rows() %>% 
  arrange(-PP.H4.abf) %>% 
  separate(gene,into=c('symbol','ensembl'),sep='_')

Add LOEUF

loeuf <- read_tsv('data/gnomad_loeuf/supplementary_dataset_11_full_constraint_metrics.tsv') %>% 
  filter(canonical==TRUE) %>% 
  dplyr::select(gene,gene_id,transcript,oe_lof_upper,p) %>% 
  mutate(oe_lof_upper_bin = ntile(oe_lof_upper, 10)) %>% 
  dplyr::rename(ensembl=gene_id,symbol=gene) %>% 
  dplyr::select(ensembl,oe_lof_upper_bin)
coloc_results_all <- left_join(coloc_results_all,loeuf,by='ensembl')

Write

dir.create('output/coloc',showWarnings = FALSE)
write_tsv(coloc_results_all,paste0('output/coloc/coloc.',selected_trait,'.txt'))

sessionInfo()
R version 4.0.1 (2020-06-06)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS/LAPACK: /pstore/apps/OpenBLAS/0.3.1-GCC-7.3.0-2.30/lib/libopenblasp-r0.3.1.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets 
[8] methods   base     

other attached packages:
 [1] rtracklayer_1.48.0   GenomicRanges_1.40.0 GenomeInfoDb_1.24.0 
 [4] IRanges_2.22.2       S4Vectors_0.26.1     BiocGenerics_0.34.0 
 [7] coloc_5.1.0          data.table_1.12.8    forcats_0.5.0       
[10] stringr_1.4.0        dplyr_1.0.0          purrr_0.3.4         
[13] readr_1.3.1          tidyr_1.1.0          tibble_3.0.1        
[16] ggplot2_3.3.3        tidyverse_1.3.0     

loaded via a namespace (and not attached):
 [1] nlme_3.1-148                matrixStats_0.56.0         
 [3] bitops_1.0-6                fs_1.4.1                   
 [5] bit64_0.9-7                 lubridate_1.7.9            
 [7] httr_1.4.1                  rprojroot_1.3-2            
 [9] tools_4.0.1                 backports_1.1.7            
[11] R6_2.4.1                    irlba_2.3.3                
[13] DBI_1.1.0                   colorspace_1.4-1           
[15] withr_2.2.0                 tidyselect_1.1.0           
[17] gridExtra_2.3               bit_1.1-15.2               
[19] compiler_4.0.1              git2r_0.27.1               
[21] Biobase_2.48.0              cli_2.0.2                  
[23] rvest_0.3.5                 xml2_1.3.2                 
[25] DelayedArray_0.14.0         scales_1.1.1               
[27] mixsqp_0.3-43               Rsamtools_2.4.0            
[29] digest_0.6.25               rmarkdown_2.2              
[31] XVector_0.28.0              pkgconfig_2.0.3            
[33] htmltools_0.5.1.1           dbplyr_1.4.4               
[35] rlang_0.4.10                readxl_1.3.1               
[37] susieR_0.11.42              rstudioapi_0.11            
[39] generics_0.0.2              jsonlite_1.6.1             
[41] vroom_1.3.2                 BiocParallel_1.22.0        
[43] RCurl_1.98-1.2              magrittr_1.5               
[45] GenomeInfoDbData_1.2.3      Matrix_1.2-18              
[47] Rcpp_1.0.6                  munsell_0.5.0              
[49] fansi_0.4.1                 viridis_0.5.1              
[51] lifecycle_0.2.0             stringi_1.4.6              
[53] whisker_0.4                 yaml_2.2.1                 
[55] SummarizedExperiment_1.18.1 zlibbioc_1.34.0            
[57] plyr_1.8.6                  grid_4.0.1                 
[59] blob_1.2.1                  promises_1.1.1             
[61] crayon_1.3.4                lattice_0.20-41            
[63] Biostrings_2.56.0           haven_2.3.1                
[65] hms_0.5.3                   knitr_1.28                 
[67] pillar_1.4.4                reprex_0.3.0               
[69] XML_3.99-0.3                glue_1.4.1                 
[71] evaluate_0.14               modelr_0.1.8               
[73] vctrs_0.3.1                 httpuv_1.5.4               
[75] cellranger_1.1.0            gtable_0.3.0               
[77] reshape_0.8.8               assertthat_0.2.1           
[79] xfun_0.14                   broom_0.5.6                
[81] later_1.1.0.1               viridisLite_0.3.0          
[83] GenomicAlignments_1.24.0    workflowr_1.6.2            
[85] ellipsis_0.3.1