Commit 2a870a14 authored by Hijazi, Hussein
Add readme file

Title: The scalability of phylogenomic methods for inferring phylogenetic networks: a performance study utilizing multi-locus datasets and phylogenetic networks with a single reticulation
Authors: Hussein A. Hejase & Natalie VandePol & Gregory A. Bonito & Patrick P. Edger & Kevin J. Liu
LICENSE: All data and scripts are distributed under the terms of the GNU General Public License as published by the Free Software Foundation. You may redistribute or modify them under the terms of the GNU General Public License, either version 3 of the License or any later version.
This file describes the simulated and empirical data and the scripts used to run the analysis:
Simulation Study
————————————————
1- Generate model trees using r8s.
2- Remove branch lengths of model trees using remove-bl.R
3- simulate.R, simulate-ret.R, random_network.R
These R scripts take as input the model trees simulated by r8s and generate the ms command for the model network.
4- Run the following command to parse the true gene trees:
sh parse_gene_trees.sh <num species> <height> <migration_rate> <theta> <numRep>
5- Run the sequence evolution program using the following script:
sh run_seq_gen.sh <num species> <height> <migration_rate> <theta> <numRep>
run_seq_gen.sh is a bash script that simulates DNA sequence evolution with seq-gen from a set of gene trees generated by ms. Here \<theta\> is 0.08. The output of seq-gen is stored in the folder seqgen_\<theta\>. In seqgen_\<theta\>, the file seq_\<height of model phylogeny\>_\<number of taxa\>_\<replicate #\>.txt contains the sequence alignment for each marker.
6- run_parse.sh
A bash script that parses the sequence alignments generated by seq-gen and uses them as input to FastTree to infer a gene tree for each DNA sequence alignment. To run it, use the following command: sh run_parse.sh \<num species\> \<height of model phylogeny\> \<migration_rate\> \<theta\> \<number of replicates\>
7- Run the following script to get the gene trees without the outgroup:
Rscript get_inferred_gene_trees.R
8- Run the following script to get gene trees with the outgroup:
Rscript get_inferred_gene_trees_with_outgroup.R
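Steps 4 to 8 above can be chained in a small driver script. The following is a minimal sketch and not part of the original repository: the names `run`, `run_simulation_steps`, and `DRY_RUN` are hypothetical, and the example values (21 species, height 5, migration rate 5, theta 0.08, 20 replicates) are simply the ones listed in the FastNet argument list. With `DRY_RUN=1` the commands are only printed, which makes it easy to check the order and arguments before anything is executed.

```shell
#!/bin/sh
# Hypothetical driver for steps 4-8 of the simulation study.
# DRY_RUN=1 (the default here) only prints each command; DRY_RUN=0 executes it.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

run_simulation_steps() {
    # Arguments: <num species> <height> <migration_rate> <theta> <numRep>
    if [ "$#" -ne 5 ]; then
        echo "usage: run_simulation_steps <num species> <height> <migration_rate> <theta> <numRep>" >&2
        return 1
    fi
    # Stop at the first step that fails.
    run sh parse_gene_trees.sh "$@" || return 1                  # step 4
    run sh run_seq_gen.sh "$@" || return 1                       # step 5
    run sh run_parse.sh "$@" || return 1                         # step 6
    run Rscript get_inferred_gene_trees.R || return 1            # step 7
    run Rscript get_inferred_gene_trees_with_outgroup.R || return 1   # step 8
}

run_simulation_steps 21 5 5 0.08 20
```

Setting `DRY_RUN=0` in the environment before running the script would execute the same commands, assuming the scripts are in the current directory.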
FastNet
_______
Arguments:
path=\< current path \>
taxa=21
height=5
migration=5
theta=0.08
numRep=20
subproblem_size=5
ret=1 or 2 or 3
genetrees=1000
sample_size=1
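The argument list above can be written as shell variable assignments so that the commands in the following steps can be pasted as they are. This is only a sketch: using the current directory for `path` is an assumption, and `check_ret` is a hypothetical helper added here to catch a value of `ret` outside the documented set.

```shell
#!/bin/sh
# Example values copied from the argument list above.
path=$(pwd)          # assumption: the analysis is run from the current directory
taxa=21
height=5
migration=5
theta=0.08
numRep=20
subproblem_size=5
ret=1                # documented values: 1, 2, or 3
genetrees=1000
sample_size=1

# Hypothetical helper: reject values of ret other than 1, 2, or 3.
check_ret() {
    case "$1" in
        1|2|3) return 0 ;;
        *) echo "ret must be 1, 2, or 3 (got '$1')" >&2; return 1 ;;
    esac
}

check_ret "$ret" || exit 1
echo "FastNet settings: taxa=$taxa height=$height migration=$migration theta=$theta numRep=$numRep ret=$ret"
```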
1. Run ASTRAL to get guide tree:
sh run_ASTRAL.sh $path $taxa $height $migration $theta $numRep
2. Root ASTRAL tree:
Rscript root_ASTRAL_tree.R $path $taxa $height $migration $theta $numRep
3. Decompose disjoint subproblems:
Rscript generate_subproblems.R $path $taxa $height $migration $theta $numRep $subproblem_size
4. Create NEXUS files to run MLE:
sh create_nex.sh $path $taxa $ret $genetrees
5. Create datasets; for each dataset, sample 1 taxon from each subproblem:
Rscript get_samples.R $path $taxa $height $migration $theta $numRep $sample_size
sh run_candidate.sh $path $taxa $ret $genetrees cand
6. Create datasets for all possible combinations of disjoint subproblems:
Rscript combine_subproblems.R $path $taxa $height $migration $theta $numRep
sh run_candidate.sh $path $taxa $ret $genetrees comb
7. Run the inference procedure on an HPCC cluster
8. Parse network and MLE scores for subproblems and candidates:
for i in $(seq 0 $ret);
do
    sh parse_network_subproblems.sh $path/$taxa/genetrees $i
    sh parse_network_candidates.sh $path/$taxa/genetrees $sample_size $i $numRep
    sh parse_network_combine.sh $path/$taxa/genetrees $i $numRep
    Rscript select_network_Lscore_candidate.R $path/$taxa/genetrees $numRep $sample_size $i
    Rscript select_network_Lscore_subproblems.R $path/$taxa/genetrees $numRep $sample_size $i $taxa $height $migration $theta
    Rscript select_network_Lscore_combine.R $path/$taxa/genetrees $numRep $i
done
9. Run merge.R to merge subproblems
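The loop in step 8 runs for every index from 0 to $ret inclusive, so with ret=1 each parsing and selection script is called twice. The sketch below is a variant of that loop that stops at the first failing command and can print the commands instead of running them. It is an illustration under stated assumptions, not the repository's own script: `DRY_RUN`, `run_step`, and `parse_all` are hypothetical names, and `path=.` is only a placeholder.

```shell
#!/bin/sh
# Sketch only: a fail-fast, dry-run-capable variant of the step 8 loop.
# DRY_RUN, run_step and parse_all are hypothetical names, not part of the repository.
DRY_RUN=${DRY_RUN:-1}

# Example values from the argument list; path=. is a placeholder.
path=.
taxa=21
height=5
migration=5
theta=0.08
numRep=20
ret=1
sample_size=1

run_step() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"      # print the command instead of running it
    else
        "$@"
    fi
}

parse_all() {
    gt="$path/$taxa/genetrees"
    # seq 0 "$ret" yields 0, 1, ..., ret, so the body runs ret + 1 times.
    for i in $(seq 0 "$ret"); do
        run_step sh parse_network_subproblems.sh "$gt" "$i" || return 1
        run_step sh parse_network_candidates.sh "$gt" "$sample_size" "$i" "$numRep" || return 1
        run_step sh parse_network_combine.sh "$gt" "$i" "$numRep" || return 1
        run_step Rscript select_network_Lscore_candidate.R "$gt" "$numRep" "$sample_size" "$i" || return 1
        run_step Rscript select_network_Lscore_subproblems.R "$gt" "$numRep" "$sample_size" "$i" "$taxa" "$height" "$migration" "$theta" || return 1
        run_step Rscript select_network_Lscore_combine.R "$gt" "$numRep" "$i" || return 1
    done
}

parse_all
```

Returning on the first failure avoids running the selection scripts on incomplete parse output, which the plain loop would do silently.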