Commit 351ceb48 by righetti2

Update README.md

parent 0dfa7f0a
@@ -4,7 +4,7 @@ This project contains the codes and datasets used in the paper 'Genetic dietary
Codes:
- read2CNV.slurm: Slurm script that takes as input the reads of a single ancient genome, maps them to the GRCh38/hg38 reference genome and, after the filtering described in detail in the supplementary materials and methods of the paper, returns a copy number estimate for each gene.
- CNV_Analyses.slum: Slurm script that takes the estimates of each ancient genome as input and combines them into a single dataset with one row per gene (clusters of recent paralogues are collapsed into a single row) and one column per ancient genome.
- HPA_Info_Extracter.R: this executable R script extracts information from the two Human Protein Atlas expression datasets available at https://www.proteinatlas.org/about/download: RNA (https://www.proteinatlas.org/download/rna_tissue_consensus.tsv.zip) and protein (https://www.proteinatlas.org/download/normal_tissue.tsv.zip). For each gene, it identifies the 3 organs (or tissue groups) with the highest expression in the RNA dataset and in the protein dataset. In case of ties, which are frequent given the discrete expression categories of the protein dataset, more than 3 organs are kept. The results can then be cross-referenced with a list of genes of interest, in our case the genes with CNVs at the population level. Before running the script, replace the file paths with your own.
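How read2CNV.slurm turns filtered alignments into per-gene copy numbers is left to the paper's supplementary methods. A common read-depth approach, shown here only as an assumption and not as the pipeline's actual method, scales each gene's mean depth by a genome-wide baseline: CN ≈ ploidy × (gene depth / median gene depth). The helper `gene_copy_numbers` and its input format (per-position depths per gene after filtering) are hypothetical, and the GC and mappability corrections that low-coverage ancient DNA usually needs are omitted.

```python
from statistics import mean, median


def gene_copy_numbers(gene_depths, ploidy=2):
    """Estimate copy number per gene from read depth (hypothetical helper).

    gene_depths: {gene: [per-position read depth after filtering]}
    Each gene's mean depth is divided by the median of all genes' mean depths,
    assuming most genes sit at the diploid baseline, then scaled by ploidy.
    """
    mean_depths = {gene: mean(depths) for gene, depths in gene_depths.items()}
    baseline = median(mean_depths.values())
    if baseline <= 0:
        raise ValueError("baseline depth must be positive")
    return {gene: ploidy * depth / baseline for gene, depth in mean_depths.items()}


if __name__ == "__main__":
    # Made-up depths: GENE3 has roughly twice the baseline depth.
    depths = {
        "GENE1": [10, 10, 10],
        "GENE2": [9, 11, 10],
        "GENE3": [20, 20, 20],
        "GENE4": [5, 5, 5],
        "GENE5": [10, 10, 10],
    }
    print(gene_copy_numbers(depths))
    # {'GENE1': 2.0, 'GENE2': 2.0, 'GENE3': 4.0, 'GENE4': 1.0, 'GENE5': 2.0}
```

Using the median rather than the mean as the baseline keeps a few strongly amplified or deleted genes from shifting every other estimate.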
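The tie-aware top-3 selection performed by HPA_Info_Extracter.R can be sketched in Python as follows. This is an illustrative reimplementation, not the script itself: it assumes the HPA files have already been read into `(gene, tissue, value)` tuples, encodes the protein "Level" categories ordinally with a hypothetical mapping `PROTEIN_LEVELS`, keeps the maximum when a tissue appears in several rows (for instance one per cell type), and skips any grouping of tissues into organs. A tissue is kept when its value reaches the k-th highest value, so ties at the cut-off return more than k tissues.

```python
from collections import defaultdict

# Hypothetical ordinal encoding of the HPA protein "Level" categories.
PROTEIN_LEVELS = {"Not detected": 0, "Low": 1, "Medium": 2, "High": 3}


def top_tissues(records, k=3):
    """Return {gene: [tissues]} with the k highest values per gene, keeping ties.

    records: iterable of (gene, tissue, value) tuples with numeric values.
    """
    by_gene = defaultdict(dict)
    for gene, tissue, value in records:
        # Keep the highest value when a tissue occurs in several rows.
        current = by_gene[gene].get(tissue)
        if current is None or value > current:
            by_gene[gene][tissue] = value

    result = {}
    for gene, tissues in by_gene.items():
        # Sort by decreasing value, then by tissue name for a stable order.
        ranked = sorted(tissues.items(), key=lambda item: (-item[1], item[0]))
        cutoff = ranked[min(k, len(ranked)) - 1][1]
        # Every tissue at or above the k-th value is kept, so ties can exceed k.
        result[gene] = [tissue for tissue, value in ranked if value >= cutoff]
    return result


if __name__ == "__main__":
    # Made-up protein-level records for a hypothetical gene.
    protein = [
        ("GENE_X", "pancreas", "High"),
        ("GENE_X", "stomach", "Medium"),
        ("GENE_X", "duodenum", "Medium"),
        ("GENE_X", "liver", "Medium"),
        ("GENE_X", "brain", "Not detected"),
    ]
    encoded = [(g, t, PROTEIN_LEVELS[level]) for g, t, level in protein]
    top = top_tissues(encoded)
    genes_of_interest = ["GENE_X", "GENE_Y"]
    # Cross-reference: keep only genes of interest present in the HPA data.
    print({g: top[g] for g in genes_of_interest if g in top})
    # {'GENE_X': ['pancreas', 'duodenum', 'liver', 'stomach']}
```

The same function handles the RNA dataset directly, since its expression values are already numeric; there ties are rare and the result is usually exactly 3 tissues.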
Datasets:
- CNV_in_Ancient_Genomes.tsv: copy number estimates for selected diet-related genes in all analysed ancient genomes.
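The merging step described for CNV_Analyses.slum (one row per gene or paralogue cluster, one column per ancient genome) can be sketched as below. The input dictionaries, the `clusters` mapping of recent paralogues to a shared row label, and the choice to sum member estimates within a cluster are all assumptions for illustration; the paper's supplementary methods define the actual grouping. Missing estimates are written as `NA`.

```python
from collections import defaultdict


def merge_estimates(per_genome, clusters):
    """Combine per-genome copy-number estimates into a gene-by-genome table.

    per_genome: {genome_id: {gene: copy_number}}
    clusters:   {gene: cluster_label} for recent paralogues (hypothetical input);
                genes not listed get their own row.
    Returns (row_labels, genome_ids, matrix); matrix[i][j] is None when genome j
    has no estimate for any member of row i.
    """
    genome_ids = sorted(per_genome)
    table = defaultdict(dict)
    for genome in genome_ids:
        for gene, cn in per_genome[genome].items():
            row = clusters.get(gene, gene)
            # Assumption: a cluster's value is the sum of its members' estimates.
            table[row][genome] = table[row].get(genome, 0.0) + cn
    row_labels = sorted(table)
    matrix = [[table[row].get(g) for g in genome_ids] for row in row_labels]
    return row_labels, genome_ids, matrix


def to_tsv(row_labels, genome_ids, matrix):
    """Render the table as tab-separated text with a header row."""
    lines = ["\t".join(["gene"] + genome_ids)]
    for label, values in zip(row_labels, matrix):
        cells = ["NA" if v is None else f"{v:g}" for v in values]
        lines.append("\t".join([label] + cells))
    return "\n".join(lines)


if __name__ == "__main__":
    # Made-up estimates; GENE1A and GENE1B stand in for a recent paralogue pair.
    per_genome = {
        "genomeA": {"GENE1A": 3.0, "GENE1B": 2.0, "GENE2": 2.0},
        "genomeB": {"GENE1A": 2.0, "GENE1B": 2.0},
    }
    clusters = {"GENE1A": "GENE1A/GENE1B", "GENE1B": "GENE1A/GENE1B"}
    print(to_tsv(*merge_estimates(per_genome, clusters)))
```

Summing reflects the fact that reads from near-identical paralogues cannot be assigned reliably to one copy, so only the cluster total is well defined; a different aggregation would only change the line marked as an assumption.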