This project contains the code and datasets used in the paper 'Genetic dietary adaptation in Neandertals, Denisovans and Sapiens revealed by gene copy number variation'.
Codes:
- read2CNV.slurm: Slurm batch script that takes the reads of a single ancient genome as input, maps them to the GRCh38/hg38 reference and, after the filtering steps described in detail in the supplementary materials and methods of the paper, returns a copy number estimate for each gene.
- CNV_Analyses.slum: Slurm batch script that takes the per-genome estimates as input and combines them into a single dataset with one row per gene (clusters of recent paralogues are grouped into a single row) and one column per ancient genome.
- HPA_Info_Extracter.R: R script that extracts information from the two Human Protein Atlas expression datasets available at https://www.proteinatlas.org/about/download: RNA (https://www.proteinatlas.org/download/rna_tissue_consensus.tsv.zip) and protein (https://www.proteinatlas.org/download/normal_tissue.tsv.zip). For each gene, it identifies the 3 organs (or tissue groups) with the highest expression in the RNA dataset and in the protein dataset. When ties occur, which is frequent given the discrete categories of the protein dataset, more than 3 organs are kept. These data can then be cross-referenced with a list of genes of interest, in our case the list of genes with CNVs at population level. Before running the script, change the reference paths to your own.
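The "top 3 organs, more if tied" rule described for HPA_Info_Extracter.R can be illustrated with a minimal Python sketch. This is not the R script itself: the function name `top_tissues`, the `(gene, tissue, level)` record layout, the numeric coding of expression levels and the toy values are all assumptions made for illustration.

```python
from collections import defaultdict


def top_tissues(records, n=3):
    """Return, per gene, the n highest-expressed tissues, keeping ties.

    `records` is an iterable of (gene, tissue, level) tuples, where `level`
    is numeric (hypothetical layout; e.g. protein categories coded as integers).
    """
    by_gene = defaultdict(list)
    for gene, tissue, level in records:
        by_gene[gene].append((tissue, level))

    result = {}
    for gene, entries in by_gene.items():
        # Sort by expression level, highest first (stable for equal levels).
        ranked = sorted(entries, key=lambda e: e[1], reverse=True)
        if len(ranked) > n:
            # Level of the n-th entry; anything tied with it is also kept.
            cutoff = ranked[n - 1][1]
            ranked = [e for e in ranked if e[1] >= cutoff]
        result[gene] = ranked
    return result


# Toy input with placeholder names and made-up levels (not real HPA values).
toy = [
    ("GENE_A", "liver", 3),
    ("GENE_A", "stomach", 3),
    ("GENE_A", "colon", 2),
    ("GENE_A", "pancreas", 2),
    ("GENE_A", "skin", 1),
]
print(top_tissues(toy)["GENE_A"])
```

In this toy case four tissues are returned instead of three, because the third and fourth entries share the same level.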
Datasets:
- CNV_in_Ancient_Genomes.tsv: copy number estimates for selected diet-related genes in all analysed ancient genomes.
- CNV_in_Modern_Genomes.tsv: copy number estimates for selected diet-related genes in all analysed modern genomes.
- HPA_in_Digestive_related_Tissues.tsv: digestion-related tissues with the highest expression for selected diet-related genes.
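As a rough guide to working with the CNV tables, the sketch below reads a tab-separated file into a nested dictionary. It assumes a layout with one header row, gene (or gene cluster) names in the first column and one column per genome; check the actual files before relying on this. The function name `read_cnv_table` and the placeholder names in the example are hypothetical.

```python
import csv
import io


def read_cnv_table(handle):
    """Parse a TSV into {gene: {genome: copy_number}}.

    Assumed layout: header row, first column = gene or gene cluster,
    remaining columns = one genome each. Non-numeric cells become None.
    """
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    genomes = header[1:]
    table = {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        values = []
        for cell in row[1:]:
            try:
                values.append(float(cell))
            except ValueError:
                values.append(None)  # e.g. missing estimates
        table[row[0]] = dict(zip(genomes, values))
    return table


# Toy content with placeholder names; replace with open("CNV_in_Ancient_Genomes.tsv").
example = io.StringIO("gene\tGENOME_1\tGENOME_2\nGENE_A\t2.0\t3.5\nGENE_B\tNA\t2.1\n")
cnv = read_cnv_table(example)
print(cnv["GENE_A"]["GENOME_2"])
```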