Last updated: 2022-02-02
Checks: 7 0
Knit directory: snRNA_eqtl/
This reproducible R Markdown analysis was created with workflowr (version 1.6.2). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.
Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.
Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproduciblity it’s best to always run the code in an empty environment.
The command set.seed(20211124)
was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.
Great job! Recording the operating system, R version, and package versions is critical for reproducibility.
Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.
Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.
Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.
The results in this page were generated with repository version d7f2b88. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.
Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish
or wflow_git_commit
). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:
Ignored files:
Ignored: .Rproj.user/
Ignored: data/magma_interaction_eqtl/
Ignored: data/magma_specific_genes/
Ignored: data_sensitive/
Ignored: output/figures/
Untracked files:
Untracked: analysis/._index.Rmd
Untracked: analysis/additional/
Untracked: analysis/alternatives/
Untracked: analysis/clean_analysis.Rmd
Untracked: analysis/log_run_slurm_out
Untracked: analysis/log_run_slurm_stderr
Untracked: analysis/portfolio/
Untracked: analysis/revision.Rmd
Untracked: analysis/run_slurm.Rscript
Untracked: analysis/run_slurm.sbatch
Untracked: data/dice/
Untracked: data/epigenome_enrichment/
Untracked: data/gencode/
Untracked: data/gnomad_loeuf/
Untracked: data/gtex/
Untracked: data/gwas/
Untracked: data/gwas_epigenome_overlap/
Untracked: data/metabrain/
Untracked: data/umap/
Untracked: output/eqtl/
Untracked: output_almost_final/
Untracked: output_tmp_QTLtools/
Untracked: output_tmp_fastQTL_sample_ambient_removed/
Unstaged changes:
Modified: .gitignore
Modified: analysis/04_coloc.Rmd
Modified: analysis/_site.yml
Modified: analysis/plot_figures.Rmd
Deleted: output/README.md
Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.
These are the previous versions of the repository in which changes were made to the R Markdown (analysis/02_run_eQTL.Rmd
) and HTML (docs/02_run_eQTL.html
) files. If you’ve configured a remote Git repository (see ?wflow_git_remote
), click on the hyperlinks in the table below to view the files as they were in that past version.
File | Version | Author | Date | Message |
---|---|---|---|---|
Rmd | d7f2b88 | Julien Bryois | 2022-02-02 | eQTL pipeline |
html | fa2d582 | Julien Bryois | 2022-01-28 | Build site. |
Rmd | fe925c3 | Julien Bryois | 2022-01-28 | run eQTL with pseudo-bulk |
html | c0af206 | Julien Bryois | 2022-01-25 | Build site. |
Rmd | 2f8e71b | Julien Bryois | 2022-01-25 | run eQTL |
html | d2896af | Julien Bryois | 2021-12-01 | Build site. |
Rmd | b1d314e | Julien Bryois | 2021-12-01 | eqtl_code_added, figures added |
Use fastQTL for permutation pass and nominal pass. QTL tools for independent pass
library(tidyverse)
library(qvalue)
library(parallel)
Set path for eqtl data
Sys.setenv(eqtl_path='data_sensitive/eqtl')
cd $eqtl_path
source ~/.bashrc
ml foss/2018b
ml Rmath/3.6.1-GCCcore-7.3.0
ml GSL/2.5-GCCcore-7.3.0
ml Eigen/3.3.7
ml Boost/1.67.0-foss-2018b
for f in PC*
do
cd $f
for bed in *.bed.gz
do
cell=`echo $bed | sed 's/.bed.gz//'`
echo $cell
fastQTL --vcf combined_final.vcf.gz --bed $bed --permute 1000 --out $cell.quantile.txt.gz --normal --cov $cell.cov.txt.gz --commands 22 $cell.commands.22.tmp
sed 's/^ /sbatch --wrap="/; s/$/"/' $cell.commands.22.tmp > $cell.commands.22.txt
rm $cell.commands.22.tmp
done
cd ..
done
cd $eqtl_path
source ~/.bashrc
ml foss/2018b
ml Rmath/3.6.1-GCCcore-7.3.0
ml GSL/2.5-GCCcore-7.3.0
ml Eigen/3.3.7
ml Boost/1.67.0-foss-2018b
for d in PC*
do
cd $d
rm *quantile.txt.gz
for f in *.commands.22.txt
do
sh $f
done
cd ..
done
Check that all eQTL jobs ran successfully
cd $eqtl_path
for d in PC{10,20,30,40,50,60,70,80,90,100,110,120}
do
cd $d
echo $d
grep 'Running time' slurm-* | wc -l
cd ..
done
PC10
176
PC20
176
PC30
176
PC40
176
PC50
176
PC60
176
PC70
176
PC80
176
PC90
176
PC100
176
PC110
176
PC120
176
All good (176)
fastqtl_out_files <- list.files('data_sensitive/eqtl/',pattern='.gz.[0-9]',full.names = T,recursive = TRUE)
d <- tibble(files=fastqtl_out_files,
cell_type=basename(files) %>%
gsub('.quantile.txt.gz..+','',.) %>%
gsub('\\.',' ',.) %>%
gsub('OPCs COPs','OPCs / COPs',.)) %>%
mutate(PCs=dirname(files) %>% basename(.)) %>%
mutate(file_content=map(files,read.table,header=F)) %>%
unnest(file_content) %>%
filter(!is.na(V11)) %>%
select(-files) %>%
setNames(c("cell_type","PCs","pid", "nvar", "shape1", "shape2", "dummy", "sid", "dist", "npval", "slope","ppval", "bpval")) %>%
group_by(cell_type,PCs) %>%
mutate(adj_p=qvalue(bpval)$qvalues) %>%
ungroup() %>%
arrange(cell_type,bpval)
Write eQTL results with different number of expression principal components
dir.create('output/eqtl/',showWarnings = FALSE,recursive = TRUE)
write_tsv(d,'output/eqtl/eqtl.allPCs.txt')
Get number of PCs maximising the number of eQTL discoveries
selected_pcs <- d %>%
filter(adj_p<0.05) %>%
dplyr::count(PCs) %>% filter(n==max(n))
Write eQTL results with selected number of PCs
d %>%
filter(PCs==selected_pcs$PCs) %>%
dplyr::select(-PCs) %>%
write_tsv(.,paste0('output/eqtl/eqtl.',selected_pcs$PCs,'.txt'))
Removing tmp files
cd $eqtl_path
for d in PC*
do
cd $d
rm *quantile*
cd ..
done
We now run a nominal pass to get pvalues for all SNPs (necessary for colocalization analysis).
We will use 70 PCs as this maximised the number of eQTL discoveries using permutations.
source ~/.bashrc
cd $eqtl_path
ml foss/2018b
ml Rmath/3.6.1-GCCcore-7.3.0
ml GSL/2.5-GCCcore-7.3.0
ml Eigen/3.3.7
ml Boost/1.67.0-foss-2018b
mkdir -p 'PC70_nominal'
cd PC70_nominal
ln -s ../PC70/*.bed.gz .
ln -s ../PC70/*.bed.gz.tbi .
ln -s ../PC70/*.cov.txt.gz .
ln -s ../PC70/combined_final.vcf.gz .
ln -s ../PC70/combined_final.vcf.gz.tbi .
for bed in *.bed.gz
do
cell=`echo $bed | sed 's/.bed.gz//'`
fastQTL --vcf combined_final.vcf.gz --bed $bed --out $cell.quantile.txt.gz --normal --cov $cell.cov.txt.gz --commands 22 $cell.commands.22.tmp
sed 's/^ /sbatch --wrap="/; s/$/"/' $cell.commands.22.tmp > $cell.commands.22.txt
sh $cell.commands.22.txt
rm $cell.commands.22.tmp
done
cd $eqtl_path'/PC70_nominal'
grep 'Running time' slurm-* | wc -l
176
176 files successfully completed. All good!
Now let’s gzip the association files and give them a nice name.
cd $eqtl_path'/PC70_nominal'
rm *quantile.txt.gz
rm *.bed.gz
rm *.bed.gz.tbi
rm *cov.txt.gz
rm *.commands.22.txt
rm *.vcf.gz
rm *.vcf.gz.tbi
for f in *.quantile.txt.gz.*
do
gzip $f
x=`echo $f | sed 's/:.\+$/.gz/' | sed 's/.quantile.txt.gz//'`
mv $f.gz $x
done
We will use QTL tools to find independent cis-eQTLs.
bed_files_fastqtl <- list.files('data_sensitive/eqtl',pattern='.bed.gz$',full.names = T)
read_fastQTL_write_QTLtools <- function(i){
output_file_name <- gsub('bed.gz','qtltools.bed',bed_files_fastqtl[i]) %>%
gsub('data_sensitive/eqtl/','data_sensitive/eqtl/PC70_indep/',.)
fasqtl <- read_tsv(bed_files_fastqtl[i]) %>%
mutate(pid='.',strand='+') %>%
dplyr::select(`#Chr`:ID,pid,strand,everything())
dir.create('data_sensitive/eqtl/PC70_indep',recursive = TRUE,showWarnings = FALSE)
write_tsv(fasqtl,output_file_name)
}
mclapply(1:length(bed_files_fastqtl),read_fastQTL_write_QTLtools,mc.cores = 8)
mkdir -p $eqtl_path'/PC70_indep'
cd $eqtl_path'/PC70_indep'
ml tabix/0.2.6-goolf-1.7.20
for f in *.qtltools.bed
do
bgzip $f && tabix -p bed $f.gz
done
cd $eqtl_path'/PC70_indep'
source ~/.bashrc
ml foss/2018b
ml Rmath/3.6.1-GCCcore-7.3.0
ml GSL/2.5-GCCcore-7.3.0
ml Eigen/3.3.7
ml Boost/1.67.0-foss-2018b
ml HTSlib/1.9-GCCcore-7.3.0
ml zlib/1.2.11-GCCcore-7.3.0
ml lzma/4.32.7-GCCcore-7.3.0
ml cURL/7.60.0-GCCcore-7.3.0
for bed in *qtltools.bed.gz
do
cell=`echo $bed | sed 's/.qtltools.bed.gz//'`
for i in {0..22}
do
echo 'sbatch --mem=4000MB --wrap="QTLtools cis --vcf combined_final.vcf.gz --bed '$bed' --cov '$cell'.cov.txt.gz --permute 1000 --normal --chunk '$i' 22 --out '$cell'.conditional_permute_'$i'_22"' >> $cell.commands.cond.permutation.22.txt
done
done
cd $eqtl_path'/PC70_indep'
source ~/.bashrc
ml foss/2018b
ml Rmath/3.6.1-GCCcore-7.3.0
ml GSL/2.5-GCCcore-7.3.0
ml Eigen/3.3.7
ml Boost/1.67.0-foss-2018b
ml HTSlib/1.9-GCCcore-7.3.0
ml zlib/1.2.11-GCCcore-7.3.0
ml lzma/4.32.7-GCCcore-7.3.0
ml cURL/7.60.0-GCCcore-7.3.0
ln -s ../PC70/combined_final.vcf.gz .
ln -s ../PC70/combined_final.vcf.gz.tbi .
ln -s ../PC70/*.cov.txt.gz .
for f in *.commands.cond.permutation.22.txt
do
sh $f
done
cd $eqtl_path'/PC70_indep'
grep 'Running time' slurm-* | wc -l
All good!
Gather all output
cd $eqtl_path'/PC70_indep'
for bed in *.qtltools.bed.gz
do
cell=`echo $bed | sed 's/.qtltools.bed.gz//'`
cat $cell.conditional_permute_* >> $cell.conditional.permute.txt
done
rm *.conditional_permute_*
cd $eqtl_path'/PC70_indep'
ml R/4.0.1-foss-2018b
wget https://raw.githubusercontent.com/qtltools/qtltools/master/scripts/qtltools_runFDR_cis.R .
for f in *.conditional.permute.txt
do
cell=`echo $f | sed 's/.conditional.permute.txt//'`
Rscript qtltools_runFDR_cis.R $f 0.05 $cell.permutations_all
done
Read bed files and only retain significant genes.
bed_files <- list.files('data_sensitive/eqtl/PC70_indep',pattern='.qtltools.bed.gz$',full.names = T)
read_fastQTL_write_QTLtools_conditional <- function(i){
output_file_name <- gsub('qtltools.bed.gz','qtltools.conditional.bed',bed_files[i])
significant_hits <- read_delim(gsub('qtltools.bed.gz','permutations_all.significant.txt',bed_files[i]),delim = ' ',col_names=FALSE)
fasqtl <- read_tsv(bed_files[i]) %>%
filter(ID%in%significant_hits$X1)
threshold <- read_delim(gsub('qtltools.bed.gz','permutations_all.thresholds.txt',bed_files[i]),
delim = ' ',col_names=FALSE) %>%
filter(X1%in%significant_hits$X1)
write_tsv(fasqtl,output_file_name)
write_delim(threshold,gsub('qtltools.bed.gz','permutations_all.thresholds.txt',bed_files[i]),col_names=FALSE,delim = ' ')
}
mclapply(1:length(bed_files),read_fastQTL_write_QTLtools_conditional,mc.cores=8)
cd $eqtl_path'/PC70_indep'
ml tabix/0.2.6-goolf-1.7.20
for f in *.qtltools.conditional.bed
do
bgzip $f && tabix -p bed $f.gz
done
cd $eqtl_path'/PC70_indep'
source ~/.bashrc
ml foss/2018b
ml Rmath/3.6.1-GCCcore-7.3.0
ml GSL/2.5-GCCcore-7.3.0
ml Eigen/3.3.7
ml Boost/1.67.0-foss-2018b
ml HTSlib/1.9-GCCcore-7.3.0
ml zlib/1.2.11-GCCcore-7.3.0
ml lzma/4.32.7-GCCcore-7.3.0
ml cURL/7.60.0-GCCcore-7.3.0
rm slurm*
for bed in *.qtltools.conditional.bed.gz
do
cell=`echo $bed | sed 's/.qtltools.conditional.bed.gz//'`
echo 'sbatch --mem=8000MB --wrap="QTLtools cis --vcf combined_final.vcf.gz --bed '$bed' --cov '$cell'.cov.txt.gz --mapping '$cell'.permutations_all.thresholds.txt --normal --out '$cell'.conditional.txt"' > $cell.commands.cond.txt
done
for f in *.commands.cond.txt
do
sh $f
done
cd $eqtl_path'/PC70_indep'
grep 'Running time' slurm-* | wc -l
8
8 cell types successfully ran. All good!
Get conditional results
conditional_files <- list.files('data_sensitive/eqtl/PC70_indep',pattern='*conditional.txt',full.names = T)
header <- c('phe_id','phe_chr','phe_from','phe_to','phe_strd','n_var_in_cis','dist_phe_var','var_id', 'var_chr','var_from','var_to','rank','fwd_pval','fwd_r_squared','fwd_slope','fwd_best_hit','fwd_sig', 'bwd_pval','bwd_r_squared','bwd_slope','bwd_best_hit','bwd_sig')
d <- tibble(files=conditional_files,
cell_type=gsub('.conditional.txt','',basename(conditional_files)) %>%
gsub('\\.',' ',.) %>%
gsub('OPCs COPs','OPCs / COPs',.),
file_content=map(files,read_delim,delim=' ',col_names=FALSE)
) %>%
dplyr::select(-files) %>%
unnest(file_content) %>%
setNames(c('cell_type',header)) %>%
filter(bwd_best_hit==1)
write_tsv(d,'output/eqtl/eqtl.70PCs.indep.txt')
We will use CaVEMan for fine-mapping
d <- read_tsv('output/eqtl/eqtl.PC70.txt') %>%
mutate(cell_type=make.names(cell_type)) %>%
filter(adj_p<0.05)
bed_files_fastqtl <- list.files('data_sensitive/eqtl',pattern='.bed.gz$',full.names = T)
read_fastQTL_write_caveman <- function(i){
output_file_name <- gsub('bed.gz','caveman.bed',bed_files_fastqtl[i]) %>% gsub('eqtl','eqtl/PC70_caveman',.)
cell_type_file <- basename(bed_files_fastqtl[i]) %>% gsub('.bed.gz','',.)
genes_to_keep <- filter(d,cell_type==cell_type_file) %>% pull(pid)
fasqtl <- read_tsv(bed_files_fastqtl[i]) %>%
filter(ID%in%genes_to_keep)
dir.create('data_sensitive/eqtl/PC70_caveman',showWarnings = FALSE)
write_tsv(fasqtl,output_file_name)
}
mclapply(1:length(bed_files_fastqtl),read_fastQTL_write_caveman,mc.cores=8)
write_tsv(d[,'sid',drop=FALSE],'data_sensitive/eqtl/PC70_caveman/sig_eQTL_snps.txt',col_names = FALSE)
cd $eqtl_path'/PC70_caveman'
ln -s ../PC70/combined_final.vcf.gz .
ln -s ../PC70/combined_final.vcf.gz.tbi .
zgrep -Fw -f sig_eQTL_snps.txt combined_final.vcf.gz | grep -v '#' | cut -f 1,2,3,4,5 > sig_eQTL_snps_alleles.txt
genotypes_selected <- read_tsv('data_sensitive/eqtl/PC70_caveman/sig_eQTL_snps_alleles.txt',col_names = FALSE) %>%
setNames(c('chr','pos','sid','A1','A2'))
cell_types <- unique(d$cell_type)
get_eqtl_cov <- function(i){
eqtl_cell_type <- filter(d,cell_type==cell_types[i]) %>%
dplyr::select(pid,sid) %>%
left_join(.,genotypes_selected,by='sid') %>%
dplyr::select(pid,chr,pos,A1,A2)
write_tsv(eqtl_cell_type,paste0('data_sensitive/eqtl/PC70_caveman/',make.names(cell_types[i]),'.eqtl.list'),col_names = FALSE)
}
lapply(1:length(cell_types),get_eqtl_cov)
Now get covariates
cov_files <- list.files('data_sensitive/eqtl/PC70',pattern = 'cov.txt.gz',full.names = TRUE)
read_cov_write <- function(i){
cov <- read_tsv(cov_files[i])
out <- gsub('.txt.gz','.caveman.txt',cov_files[i]) %>% gsub('PC70','PC70_caveman',.)
study <- cov[4,][,-1] %>% t() %>% as.data.frame()
study_model_matrix <- model.matrix(~0+V1,data=study) %>% as.data.frame() %>% t() %>%
as.data.frame() %>% rownames_to_column('id') %>% mutate(id=gsub('V1','',id))
disease <- cov[5,][,-1] %>% t() %>% as.data.frame()
disease_model_matrix <- model.matrix(~0+V1,data=disease) %>% as.data.frame() %>% t() %>%
as.data.frame() %>% rownames_to_column('id') %>% mutate(id=gsub('V1','',id))
cov_out <- rbind(cov[c(1:3),],study_model_matrix,disease_model_matrix,cov[c(6:nrow(cov)),])
cov_out <- cov_out[,-1]
write_tsv(cov_out,out)
}
lapply(1:length(cov_files),read_cov_write)
cd $eqtl_path'/PC70_caveman'
source ~/.bashrc
ml tabix/0.2.6-goolf-1.7.20
for f in *.caveman.bed
do
cell=`echo $f | sed 's/.caveman.bed//'`
CaVEMaN --single-signal --eqtl "$cell".eqtl.list --bed "$f" --vcf combined_final.vcf.gz \
--out "$cell".caveman.corrected.expression.bed --cov "$cell".cov.caveman.txt
done
cd $eqtl_path'/PC70_caveman'
source ~/.bashrc
mkdir -p results
ml tabix/0.2.6-goolf-1.7.20
for f in *.caveman.corrected.expression.bed
do
cell=`echo $f | sed 's/.caveman.corrected.expression.bed//'`
n_eqtl=`wc -l $f |cut -d ' ' -f 1`
echo $n_eqtl
n_jobs=$(($n_eqtl/50))
n_jobs=$(($n_jobs+1))
for i in $( eval echo {1..$n_jobs} )
do
sbatch --wrap="CaVEMaN --bed '$f' --vcf combined_final.vcf.gz --genes 50 \
--job-number '$i' --normal --out results/'$cell'_'$i' --verbose"
done
done
cd $eqtl_path'/PC70_caveman'/results
awk 'FNR>1||NR==1' Astrocytes_* > Astrocytes.all
awk 'FNR>1||NR==1' Endothelial.cells_* > Endothelial.cells.all
awk 'FNR>1||NR==1' Excitatory.neurons_* > Excitatory.neurons.all
awk 'FNR>1||NR==1' Inhibitory.neurons_* > Inhibitory.neurons.all
awk 'FNR>1||NR==1' Microglia_* > Microglia.all
awk 'FNR>1||NR==1' Oligodendrocytes_* > Oligodendrocytes.all
awk 'FNR>1||NR==1' OPCs...COPs_* > OPCs...COPs.all
awk 'FNR>1||NR==1' Pericytes_* > Pericytes.all
cd $eqtl_path'/PC70_caveman'/results
source ~/.bashrc
ml tabix/0.2.6-goolf-1.7.20
for f in *.all
do
CaVEMaN --best $f --out $f.best --verbose
done
rm *_*
files <- list.files('data_sensitive/eqtl/PC70_caveman/results',pattern='.best',full.names = TRUE)
d <- tibble(files) %>% mutate(file_content=map(files,read_tsv)) %>%
unnest(file_content) %>%
mutate(cell_type=basename(files) %>% gsub('\\.all.best','',.)) %>%
dplyr::select(cell_type,everything()) %>%
dplyr::select(-files) %>%
mutate(cell_type=gsub('\\.',' ',cell_type)) %>%
mutate(cell_type=gsub('OPCs COPs','OPCs / COPs',cell_type)) %>%
arrange(-Probability)
write_tsv(d,'output/eqtl/eqtl.70PCs.caveman.txt')
Sys.setenv(eqtl_path='data_sensitive/eqtl_pb')
cd $eqtl_path
source ~/.bashrc
ml foss/2018b
ml Rmath/3.6.1-GCCcore-7.3.0
ml GSL/2.5-GCCcore-7.3.0
ml Eigen/3.3.7
ml Boost/1.67.0-foss-2018b
for f in PC*
do
cd $f
for bed in *.bed.gz
do
cell=`echo $bed | sed 's/.bed.gz//'`
echo $cell
fastQTL --vcf combined_final.vcf.gz --bed $bed --permute 1000 --out $cell.quantile.txt.gz --normal --cov $cell.cov.txt.gz --commands 22 $cell.commands.22.tmp
sed 's/^ /sbatch --wrap="/; s/$/"/' $cell.commands.22.tmp > $cell.commands.22.txt
rm $cell.commands.22.tmp
done
cd ..
done
cd $eqtl_path
source ~/.bashrc
ml foss/2018b
ml Rmath/3.6.1-GCCcore-7.3.0
ml GSL/2.5-GCCcore-7.3.0
ml Eigen/3.3.7
ml Boost/1.67.0-foss-2018b
for d in PC*
do
cd $d
rm *quantile.txt.gz
for f in *.commands.22.txt
do
sh $f
done
cd ..
done
Check that all eQTL jobs ran successfully
cd $eqtl_path
for d in PC{10,20,30,40,50,60,70,80,90,100,110,120}
do
cd $d
echo $d
grep 'Running time' slurm-* | wc -l
cd ..
done
PC10
22
PC20
22
PC30
22
PC40
22
PC50
22
PC60
22
PC70
22
PC80
22
PC90
22
PC100
22
PC110
22
PC120
22
All done (22)
fastqtl_out_files <- list.files('data_sensitive/eqtl_pb/',pattern='.gz.[0-9]',full.names = T,recursive = TRUE)
d <- tibble(files=fastqtl_out_files,
cell_type=basename(files) %>%
gsub('.quantile.txt.gz..+','',.)) %>%
mutate(PCs=dirname(files) %>% basename(.)) %>%
mutate(file_content=map(files,read.table,header=F)) %>%
unnest(file_content) %>%
filter(!is.na(V11)) %>%
dplyr::select(-files) %>%
setNames(c("cell_type","PCs","pid", "nvar", "shape1", "shape2", "dummy", "sid", "dist", "npval", "slope","ppval", "bpval")) %>%
group_by(cell_type,PCs) %>%
mutate(adj_p=qvalue(bpval)$qvalues) %>%
ungroup() %>%
arrange(cell_type,bpval)
Write eQTL results with different number of expression principal components
dir.create('output/eqtl/',showWarnings = FALSE)
write_tsv(d,'output/eqtl/eqtl.pb.allPCs.txt')
Get number of PCs maximising the number of eQTL discoveries
selected_pcs <- d %>%
filter(adj_p<0.05) %>%
dplyr::count(PCs) %>% filter(n==max(n))
Write eQTL results with selected number of PCs
d %>%
filter(PCs==selected_pcs$PCs) %>%
dplyr::select(-PCs) %>%
write_tsv(.,paste0('output/eqtl/eqtl.pb.',selected_pcs$PCs,'.txt'))
Removing tmp files
cd $eqtl_path
for d in PC*
do
cd $d
rm *quantile*
cd ..
done
We now run a nominal pass to get pvalues for all SNPs.
We will use 70 PCs as this maximised the number of eQTL discoveries using permutations.
source ~/.bashrc
cd $eqtl_path
ml foss/2018b
ml Rmath/3.6.1-GCCcore-7.3.0
ml GSL/2.5-GCCcore-7.3.0
ml Eigen/3.3.7
ml Boost/1.67.0-foss-2018b
mkdir -p 'PC70_nominal'
cd PC70_nominal
ln -s ../PC70/*.bed.gz .
ln -s ../PC70/*.bed.gz.tbi .
ln -s ../PC70/*.cov.txt.gz .
ln -s ../PC70/combined_final.vcf.gz .
ln -s ../PC70/combined_final.vcf.gz.tbi .
for bed in *.bed.gz
do
cell=`echo $bed | sed 's/.bed.gz//'`
fastQTL --vcf combined_final.vcf.gz --bed $bed --out $cell.quantile.txt.gz --normal --cov $cell.cov.txt.gz --commands 22 $cell.commands.22.tmp
sed 's/^ /sbatch --wrap="/; s/$/"/' $cell.commands.22.tmp > $cell.commands.22.txt
sh $cell.commands.22.txt
rm $cell.commands.22.tmp
done
cd $eqtl_path'/PC70_nominal'
grep 'Running time' slurm-* | wc -l
22
22 files successfully completed. All good!
Now let’s gzip the association files and give them a nice name.
cd $eqtl_path'/PC70_nominal'
rm *quantile.txt.gz
rm *.bed.gz
rm *.bed.gz.tbi
rm *cov.txt.gz
rm *.commands.22.txt
rm *.vcf.gz
rm *.vcf.gz.tbi
for f in *.quantile.txt.gz.*
do
gzip $f
x=`echo $f | sed 's/:.\+$/.gz/' | sed 's/.quantile.txt.gz//'`
mv $f.gz $x
done
sessionInfo()
R version 4.0.1 (2020-06-06)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)
Matrix products: default
BLAS/LAPACK: /pstore/apps/OpenBLAS/0.3.1-GCC-7.3.0-2.30/lib/libopenblasp-r0.3.1.so
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] parallel stats graphics grDevices utils datasets methods
[8] base
other attached packages:
[1] qvalue_2.20.0 forcats_0.5.0 stringr_1.4.0 dplyr_1.0.0
[5] purrr_0.3.4 readr_1.3.1 tidyr_1.1.0 tibble_3.0.1
[9] ggplot2_3.3.3 tidyverse_1.3.0 workflowr_1.6.2
loaded via a namespace (and not attached):
[1] tidyselect_1.1.0 xfun_0.14 reshape2_1.4.4 splines_4.0.1
[5] haven_2.3.1 lattice_0.20-41 colorspace_1.4-1 vctrs_0.3.1
[9] generics_0.0.2 htmltools_0.5.1.1 yaml_2.2.1 blob_1.2.1
[13] rlang_0.4.10 later_1.1.0.1 pillar_1.4.4 withr_2.2.0
[17] glue_1.4.1 DBI_1.1.0 dbplyr_1.4.4 modelr_0.1.8
[21] readxl_1.3.1 plyr_1.8.6 lifecycle_0.2.0 munsell_0.5.0
[25] gtable_0.3.0 cellranger_1.1.0 rvest_0.3.5 evaluate_0.14
[29] knitr_1.28 httpuv_1.5.4 fansi_0.4.1 broom_0.5.6
[33] Rcpp_1.0.6 promises_1.1.1 backports_1.1.7 scales_1.1.1
[37] jsonlite_1.6.1 fs_1.4.1 hms_0.5.3 digest_0.6.25
[41] stringi_1.4.6 grid_4.0.1 rprojroot_1.3-2 cli_2.0.2
[45] tools_4.0.1 magrittr_1.5 crayon_1.3.4 whisker_0.4
[49] pkgconfig_2.0.3 ellipsis_0.3.1 xml2_1.3.2 reprex_0.3.0
[53] lubridate_1.7.9 rstudioapi_0.11 assertthat_0.2.1 rmarkdown_2.2
[57] httr_1.4.1 R6_2.4.1 nlme_3.1-148 git2r_0.27.1
[61] compiler_4.0.1