Systematics.hpp
Track genotypes, species, clades, or lineages of organisms in a world.
- Todo:
Technically, we don’t need to keep the ancestors in a set in order to track a lineage… If we delete all of their descendants they should automaticaly be deleted.
We should provide an option to back up systematics data to a file so that it doesn’t all need to be kept in memory, especially if we’re only doing post-analysis.
This inheritance system makes adding new systematics-related data tracking kind of a pain. Over time, this will probably become a maintainability problem. We can probably make the whole inheritance thing go away through judicious use of signals.
This does not currently handle situations where organisms change locations during their lifetimes gracefully.
-
template<typename ORG_INFO, typename DATA_STRUCT = datastruct::no_data>
class Taxon - #include <Systematics.hpp>
A Taxon represents a type of organism in a phylogeny.
Genotypes are the most commonly used Taxon; in general taxa can be anything from a shared genome sequence, a phenotypic trait, or a even a position in the world (if you want to track an evolutionary pathway)
- Param ORG_INFO:
The information type associated with an organism, used to categorize it.
Public Types
-
using data_t = DATA_STRUCT
Public Functions
-
inline size_t GetID() const
Get a unique ID for this taxon; IDs are assigned sequentially, so newer taxa have higher IDs.
-
inline void NullifyParent()
-
inline size_t GetNumOrgs() const
Get the number of living organisms currently associated with this Taxon.
-
inline size_t GetTotOrgs() const
Get the total number of organisms that have ever lived associated with this Taxon.
-
inline size_t GetNumOff() const
Get the number of taxa that were produced by organisms from this Taxon.
-
inline size_t GetDepth() const
Get the number of taxanomic steps since the ancestral organism was injected into the World.
-
inline double GetOriginationTime() const
-
inline void SetOriginationTime(double time)
-
inline double GetDestructionTime() const
-
inline void SetDestructionTime(double time)
-
inline void AddOrg()
Add a new organism to this Taxon.
-
inline void AddTotalOffspring()
Recursively increment total offspring count for this and all ancestors.
-
inline int GetTotalOffspring()
Get total number of offspring directly or indirectly descending from this taxon.
-
inline bool RemoveOrg()
Remove an organism from this Taxon (after it dies). Removals must return true if the taxon needs to continue; false if it should deactivate.
-
inline bool RemoveOffspring(Ptr<this_t> offspring_tax)
Remove and offspring taxa after its entire sub-tree has died out (pruning)
-
inline void RemoveTotalOffspring()
Reduce the total count of extant offspring and recursively do so for all ancestors (gets called on a taxon’s parent when that taxon goes extinct)
Protected Attributes
-
size_t id
ID for this Taxon (Unique within this Systematics)
-
size_t num_orgs
How many organisms currently exist of this group?
-
size_t tot_orgs
How many organisms have ever existed of this group?
-
size_t num_offspring
How many direct offspring groups exist from this one.
-
size_t total_offspring
How many total extant offspring taxa exist from this one (i.e. including indirect)
-
size_t depth
How deep in tree is this node? (Root is 0)
-
double origination_time
When did this taxon first appear in the population?
-
double destruction_time
When did this taxon leave the population?
-
DATA_STRUCT data
A struct for storing additional information about this taxon.
-
template<typename ORG>
class SystematicsBase - #include <Systematics.hpp>
A base class for Systematics, maintaining information common to all systematics managers and providing virtual functions.
Subclassed by Systematics< ORG, ORG_INFO, DATA_STRUCT >
Public Types
-
using data_node_t = DataNode<double, data::Current, data::Info, data::Range, data::Stats, data::Pull>
-
using data_ptr_t = Ptr<data_node_t>
Public Functions
-
inline SystematicsBase(bool _active = true, bool _anc = true, bool _all = false, bool _pos = true)
-
inline virtual ~SystematicsBase()
-
inline bool GetTrackSynchronous() const
Are we tracking a synchronous population?
-
inline bool GetStoreActive() const
Are we storing all taxa that are still alive in the population?
-
inline bool GetStoreAncestors() const
Are we storing all taxa that are the ancestors of living organisms in the population?
-
inline bool GetStoreOutside() const
Are we storing all taxa that have died out, as have all of their descendants.
-
inline bool GetArchive() const
Are we storing any taxa types that have died out?
-
inline bool GetStorePosition() const
Are we storing the positions of taxa?
-
inline size_t GetTotalOrgs() const
How many living organisms are currently being tracked?
-
inline size_t GetNumRoots() const
How many independent trees are being tracked?
-
inline size_t GetNextID() const
What ID will the next taxon have?
-
inline double GetAveDepth() const
What is the average phylogenetic depth of organisms in the population?
-
inline void SetTrackSynchronous(bool new_val)
Are we tracking organisms evolving in synchronous generations?
-
inline void SetStoreActive(bool new_val)
Are we storing all taxa that are still alive in the population?
-
inline void SetStoreAncestors(bool new_val)
Are we storing all taxa that are the ancestors of living organisms in the population?
-
inline void SetStoreOutside(bool new_val)
Are we storing all taxa that have died out, as have all of their descendants.
-
inline void SetArchive(bool new_val)
Are we storing any taxa types that have died out?
-
inline void SetStorePosition(bool new_val)
Are we storing the location of taxa?
-
inline data_ptr_t AddDataNode(const std::string &name)
-
inline data_ptr_t AddDataNode(std::function<vector<double>()> pull_set_fun, const std::string &name)
-
inline data_ptr_t AddDataNode(std::function<double()> pull_fun, const std::string &name)
-
inline data_ptr_t GetDataNode(const std::string &name)
-
virtual data_ptr_t AddEvolutionaryDistinctivenessDataNode(const std::string &name = "evolutionary_distinctiveness") = 0
-
virtual data_ptr_t AddPairwiseDistanceDataNode(const std::string &name = "pairwise_distance") = 0
-
virtual data_ptr_t AddPhylogeneticDiversityDataNode(const std::string &name = "phylogenetic_diversity") = 0
-
virtual data_ptr_t AddDeleteriousStepDataNode(const std::string &name = "deleterious_steps") = 0
-
virtual data_ptr_t AddVolatilityDataNode(const std::string &name = "volatility") = 0
-
virtual data_ptr_t AddUniqueTaxaDataNode(const std::string &name = "unique_taxa") = 0
-
virtual data_ptr_t AddMutationCountDataNode(const std::string &name = "mutation_count", const std::string &mutation = "substitution") = 0
-
virtual size_t GetNumActive() const = 0
-
virtual size_t GetNumAncestors() const = 0
-
virtual size_t GetNumOutside() const = 0
-
virtual size_t GetTreeSize() const = 0
-
virtual size_t GetNumTaxa() const = 0
-
virtual int GetMaxDepth() = 0
-
virtual int GetPhylogeneticDiversity() const = 0
-
virtual double GetMeanPairwiseDistance(bool branch_only) const = 0
-
virtual double GetSumPairwiseDistance(bool branch_only) const = 0
-
virtual double GetVariancePairwiseDistance(bool branch_only) const = 0
-
virtual int SackinIndex() const = 0
-
virtual double CollessLikeIndex() const = 0
-
virtual int GetMRCADepth() const = 0
-
virtual void AddOrg(ORG &&org, WorldPosition pos, int update) = 0
-
virtual void AddOrg(ORG &org, WorldPosition pos, int update) = 0
-
virtual bool RemoveOrg(WorldPosition pos, int time = -1) = 0
-
virtual void RemoveOrgAfterRepro(WorldPosition pos, int time = -1) = 0
-
virtual double CalcDiversity() const = 0
-
virtual void Update() = 0
-
virtual void SetNextParent(int pos) = 0
-
virtual void SetNextParent(WorldPosition &pos) = 0
Protected Attributes
-
bool store_active
Store all of the currently active taxa?
-
bool store_ancestors
Store all of the direct ancestors from living taxa?
-
bool store_outside
Store taxa that are extinct with no living descendants?
-
bool archive
Set to true if we are supposed to do any archiving of extinct taxa.
-
bool store_position
Keep a vector mapping positions to pointers.
-
bool track_synchronous
Does this systematics manager need to keep track of current and next positions?
-
size_t org_count
How many organisms are currently active?
-
size_t total_depth
Sum of taxa depths for calculating average.
-
size_t num_roots
How many distint injected ancestors are currently in population?
-
int max_depth
Depth of deepest taxon. -1 means needs to be recalculated.
-
size_t next_id
What ID value should the next new taxon have?
-
size_t curr_update
-
using data_node_t = DataNode<double, data::Current, data::Info, data::Range, data::Stats, data::Pull>
-
template<typename ORG, typename ORG_INFO, typename DATA_STRUCT = datastruct::no_data>
class Systematics : public SystematicsBase<ORG> - #include <Systematics.hpp>
A tool to track phylogenetic relationships among organisms. The systematics class tracks the relationships among all organisms based on the INFO_TYPE provided. If an offspring has the same value for INFO_TYPE as its parent, it is grouped into the same taxon. Otherwise a new Taxon is created and the old one is used as its parent in the phylogeny. If the provided INFO_TYPE is the organsism’s genome, a traditional phylogeny is formed, with genotypes. If the organism’s behavior/task set is used, then organisms are grouped by phenotypes. If the organsims’s position is used, the evolutionary path through space is tracked. Any other aspect of organisms can be tracked this way as well.
Public Types
-
using taxon_t = Taxon<ORG_INFO, DATA_STRUCT>
Public Functions
-
void RemoveOffspring(Ptr<taxon_t> offspring, Ptr<taxon_t> taxon)
Called when an offspring taxa has been deleted.
-
void MarkExtinct(Ptr<taxon_t> taxon, int time = -1)
Called when there are no more living members of a taxon. There may be descendants.
-
inline Systematics(fun_calc_info_t calc_taxon, bool store_active = true, bool store_ancestors = true, bool store_all = false, bool store_pos = true)
Contructor for Systematics; controls what information should be stored.
- Parameters:
store_active – Should living organisms’ taxa be tracked? (typically yes!)
store_ancestors – Should ancestral organisms’ taxa be maintained? (yes for lineages!)
store_all – Should all dead taxa be maintained? (typically no; it gets BIG!)
store_pos – Should the systematics tracker keep track of organism positions? (probably yes - only turn this off if you know what you’re doing)
-
Systematics(const Systematics&) = delete
-
Systematics(Systematics&&) = default
-
inline ~Systematics()
-
inline virtual void Update()
-
inline void SetCalcInfoFun(fun_calc_info_t f)
-
inline virtual size_t GetNumActive() const
How many taxa are still active in the population?
-
inline virtual size_t GetNumAncestors() const
How many taxa are ancestors of living organisms (but have died out themselves)?
-
inline virtual size_t GetNumOutside() const
How many taxa are stored that have died out, as have their descendents?
-
inline virtual size_t GetTreeSize() const
How many taxa are in the current phylogeny?
-
inline virtual size_t GetNumTaxa() const
How many taxa are stored in total?
-
inline virtual int GetMaxDepth()
-
inline virtual void SetNextParent(WorldPosition &pos)
-
inline virtual void SetNextParent(int pos)
-
inline SignalKey OnPrune(std::function<void(Ptr<taxon_t>)> &fun)
Privide a function for Systematics to call each time a taxon is about to be pruned. Trigger: Taxon is about to be killed Argument: Pointer to taxon
-
inline virtual data_ptr_t AddEvolutionaryDistinctivenessDataNode(const std::string &name = "evolutionary_distinctiveness")
-
inline virtual data_ptr_t AddPairwiseDistanceDataNode(const std::string &name = "pairwise_distance")
-
inline virtual data_ptr_t AddPhylogeneticDiversityDataNode(const std::string &name = "phylogenetic_diversity")
-
inline virtual data_ptr_t AddDeleteriousStepDataNode(const std::string &name = "deleterious_steps")
-
inline data_ptr_t AddDeleteriousStepDataNodeImpl(bool decoy, const std::string &name = "deleterious_steps")
-
template<typename T = int>
inline data_ptr_t AddDeleteriousStepDataNodeImpl(typename std::enable_if<DATA_STRUCT::has_fitness_t::value, T>::type decoy, const std::string &name = "deleterious_steps")
-
inline virtual data_ptr_t AddVolatilityDataNode(const std::string &name = "volatility")
-
inline data_ptr_t AddVolatilityDataNodeImpl(bool decoy, const std::string &name = "volatility")
-
template<typename T = int>
inline data_ptr_t AddVolatilityDataNodeImpl(typename std::enable_if<DATA_STRUCT::has_phen_t::value, T>::type decoy, const std::string &name = "volatility")
-
inline virtual data_ptr_t AddUniqueTaxaDataNode(const std::string &name = "unique_taxa")
-
inline data_ptr_t AddUniqueTaxaDataNodeImpl(bool decoy, const std::string &name = "unique_taxa")
-
template<typename T = int>
inline data_ptr_t AddUniqueTaxaDataNodeImpl(typename std::enable_if<DATA_STRUCT::has_phen_t::value, T>::type decoy, const std::string &name = "unique_taxa")
-
inline virtual data_ptr_t AddMutationCountDataNode(const std::string &name = "mutation_count", const std::string &mutation = "substitution")
-
inline data_ptr_t AddMutationCountDataNodeImpl(bool decoy, const std::string &name = "mutation_count", const std::string &mutation = "substitution")
-
template<typename T = int>
inline data_ptr_t AddMutationCountDataNodeImpl(typename std::enable_if<DATA_STRUCT::has_mutations_t::value, T>::type decoy, const std::string &name = "mutation_count", const std::string &mutation = "substitution")
-
inline void AddSnapshotFun(const std::function<std::string(const taxon_t&)> &fun, const std::string &key, const std::string &desc = "")
Add a new snapshot function. When a snapshot of the systematics is taken, in addition to the default set of functions, all user-added snapshot functions are run. Functions take a reference to a taxon as input and return the string to be dumped in the file at the given key.
-
inline bool IsTaxonAt(int id)
-
inline virtual int GetPhylogeneticDiversity() const
From (Faith 1992, reviewed in Winters et al., 2013), phylogenetic diversity is the sum of edges in the minimal spanning tree connected the taxa you’re calculating diversity of.
This calculates phylogenetic diversity for all extant taxa in the tree, assuming all edges from parent to child have a length of one. Possible extensions to this function that might be useful in the future include:
Pass it a set of taxon_t pointers and have it calculate PD for just those taxa
Enable calculation of branch lengths by amount of time that elapsed between origination of parent and origination of offspring
Enable a paleontology compatibility mode where only branching points are calculated
-
inline double GetTaxonDistinctiveness(Ptr<taxon_t> tax) const
This is a metric of how distinct
(From Vane-Wright et al., 1991; reviewed in Winter et al., 2013)
- Parameters:
tax – is from the rest of the population.
-
inline double GetEvolutionaryDistinctiveness(Ptr<taxon_t> tax, double time) const
This metric (from Isaac, 2007; reviewed in Winter et al., 2013) measures how distinct
To quantify length of evolutionary history, this method needs
Assumes the tree is all connected. Will return -1 if this assumption isn’t met.- Parameters:
tax – is from the rest of the population, weighted for the amount of unique evolutionary history that it represents.
time – the current time, in whatever units time is being measured in when taxa are added to the systematics manager. Note that passing a time in the past will produce inacurate results (since we don’t know what the state of the tree was at that time).
-
inline virtual double GetMeanPairwiseDistance(bool branch_only = false) const
Calculates mean pairwise distance between extant taxa (Webb and Losos, 2000). This measurement is also called Average Taxonomic Diversity (Warwick and Clark, 1998) (for demonstration of equivalence see Tucker et al, 2016). This measurement tells you about the amount of distinctness in the community as a whole.
This measurement assumes that the tree is fully connected. Will return -1 if this is not the case.
- Parameters:
branch_only – only counts distance in terms of nodes that represent a branch between two extant taxa (potentially useful for comparison to biological data, where non-branching nodes generally cannot be inferred).
-
inline virtual double GetSumPairwiseDistance(bool branch_only = false) const
Calculates summed pairwise distance between extant taxa. Tucker et al 2017 points out that this is a measure of phylogenetic richness.
This measurement assumes that the tree is fully connected. Will return -1 if this is not the case.
- Parameters:
branch_only – only counts distance in terms of nodes that represent a branch between two extant taxa (potentially useful for comparison to biological data, where non-branching nodes generally cannot be inferred).
-
inline virtual double GetVariancePairwiseDistance(bool branch_only = false) const
Calculates variance of pairwise distance between extant taxa. Tucker et al 2017 points out that this is a measure of phylogenetic regularity.
This measurement assumes that the tree is fully connected. Will return -1 if this is not the case.
- Parameters:
branch_only – only counts distance in terms of nodes that represent a branch between two extant taxa (potentially useful for comparison to biological data, where non-branching nodes generally cannot be inferred).
-
inline virtual vector<double> GetPairwiseDistances(bool branch_only = false) const
Calculates a vector of all pairwise distances between extant taxa.
This method assumes that the tree is fully connected. Will return -1 if this is not the case.
- Parameters:
branch_only – only counts distance in terms of nodes that represent a branch between two extant taxa (potentially useful for comparison to biological data, where non-branching nodes generally cannot be inferred).
-
inline std::set<Ptr<taxon_t>> GetCanopyExtantRoots(int time_point = 0) const
Returns a vector containing all taxa from
- Parameters:
time_point – that were
-
inline int GetDistanceToRoot(Ptr<taxon_t> tax) const
Counts the total number of ancestors between
- Parameters:
tax – and MRCA, if there is one. If there is no common ancestor, distance to the root of this tree is calculated instead.
-
inline int GetBranchesToRoot(Ptr<taxon_t> tax) const
Counts the number of branching points leading to multiple extant taxa between
- Parameters:
tax – and the most-recent common ancestor (or the root of its subtree, if no MRCA exists). This is useful because a lot of stats for phylogenies are designed for phylogenies reconstructed from extant taxa. These phylogenies generally only contain branching points, rather than every ancestor along the way to the current taxon.
-
inline virtual int SackinIndex() const
Calculate Sackin Index of this tree (Sackin, 1972; reviewed in Shao, 1990). Measures tree balance
-
inline CollessStruct RecursiveCollessStep(Ptr<taxon_t> curr) const
-
inline virtual double CollessLikeIndex() const
Calculate Colless Index of this tree (Colless, 1982; reviewed in Shao, 1990). Measures tree balance. The standard Colless index only works for bifurcating trees, so this will be a Colless-like Index, as suggested in “Sound Colless-like balance indices for multifurcating trees” (Mir, 2018, PLoS One)
-
inline void RemoveBefore(int ud)
-
Ptr<taxon_t> GetMRCA() const
Request a pointer to the Most-Recent Common Ancestor for the population.
-
virtual int GetMRCADepth() const
Request the depth of the Most-Recent Common Ancestor; return -1 for none.
-
virtual void AddOrg(ORG &&org, WorldPosition pos, int update = -1)
Add information about a new organism, including its stored info and parent’s taxon; If you would like the systematics manager to track taxon age, you can also supply the update at which the taxon is being added. return a pointer for the associated taxon.
-
virtual void AddOrg(ORG &org, WorldPosition pos, int update = -1)
-
virtual bool RemoveOrg(WorldPosition pos, int time = -1)
Remove an instance of an organism; track when it’s gone.
-
virtual void RemoveOrgAfterRepro(WorldPosition pos, int time = -1)
-
Ptr<taxon_t> Parent(Ptr<taxon_t> taxon) const
Remove org from next population (for use with synchronous generations)
Climb up a lineage…
-
virtual void PrintStatus(std::ostream &os = std::cout) const
Print details about the Systematics manager.
-
void Snapshot(const std::string &file_path) const
Take a snapshot of current state of taxon phylogeny. WARNING: Current, this function assumes one parent taxon per-taxon.
-
virtual double CalcDiversity() const
Calculate the genetic diversity of the population.
-
size_t GetNumActive() const = 0
-
size_t GetNumAncestors() const = 0
-
size_t GetNumOutside() const = 0
-
size_t GetTreeSize() const = 0
-
size_t GetNumTaxa() const = 0
-
int GetPhylogeneticDiversity() const = 0
-
double GetMeanPairwiseDistance(bool branch_only) const = 0
-
double GetSumPairwiseDistance(bool branch_only) const = 0
-
double GetVariancePairwiseDistance(bool branch_only) const = 0
-
vector<double> GetPairwiseDistances(bool branch_only) const = 0
-
int GetMRCADepth() const = 0
-
void AddOrg(ORG &&org, WorldPosition pos, int update) = 0
-
void AddOrg(ORG &org, WorldPosition pos, int update) = 0
-
bool RemoveOrg(WorldPosition pos, int time = -1) = 0
-
void RemoveOrgAfterRepro(WorldPosition pos, int time = -1) = 0
-
void PrintStatus(std::ostream &os) const = 0
-
double CalcDiversity() const = 0
-
void Update() = 0
-
void SetNextParent(int pos) = 0
-
void SetNextParent(WorldPosition &pos) = 0
-
inline data_ptr_t GetDataNode(const std::string &name)
-
inline data_ptr_t AddDataNode(const std::string &name)
-
inline data_ptr_t AddDataNode(std::function<vector<double>()> pull_set_fun, const std::string &name)
-
inline data_ptr_t AddDataNode(std::function<double()> pull_fun, const std::string &name)
-
data_ptr_t AddEvolutionaryDistinctivenessDataNode(const std::string &name = "evolutionary_distinctiveness") = 0
-
data_ptr_t AddPairwiseDistanceDataNode(const std::string &name = "pairwise_distance") = 0
-
data_ptr_t AddPhylogeneticDiversityDataNode(const std::string &name = "phylogenetic_diversity") = 0
-
data_ptr_t AddDeleteriousStepDataNode(const std::string &name = "deleterious_steps") = 0
-
data_ptr_t AddVolatilityDataNode(const std::string &name = "volatility") = 0
-
data_ptr_t AddUniqueTaxaDataNode(const std::string &name = "unique_taxa") = 0
-
data_ptr_t AddMutationCountDataNode(const std::string &name = "mutation_count", const std::string &mutation = "substitution") = 0
-
int GetMaxDepth() = 0
Public Members
-
vector<SnapshotInfo> user_snapshot_funs
-
int removal_time = -1
-
int removal_pos = -1
Private Types
-
using parent_t = SystematicsBase<ORG>
Private Members
-
fun_calc_info_t calc_info_fun
-
bool store_active
Store all of the currently active taxa?
-
bool store_ancestors
Store all of the direct ancestors from living taxa?
-
bool store_outside
Store taxa that are extinct with no living descendants?
-
bool archive
Set to true if we are supposed to do any archiving of extinct taxa.
-
bool store_position
Keep a vector mapping positions to pointers.
-
bool track_synchronous
Does this systematics manager need to keep track of current and next positions?
-
size_t org_count
How many organisms are currently active?
-
size_t total_depth
Sum of taxa depths for calculating average.
-
size_t num_roots
How many distint injected ancestors are currently in population?
-
size_t next_id
What ID value should the next new taxon have?
-
size_t curr_update
-
int max_depth
Depth of deepest taxon. -1 means needs to be recalculated.
-
struct CollessStruct
- #include <Systematics.hpp>
-
struct SnapshotInfo
- #include <Systematics.hpp>
Public Functions
-
inline SnapshotInfo(const snapshot_fun_t &_fun, const std::string &_key, const std::string &_desc = "")
-
inline SnapshotInfo(const snapshot_fun_t &_fun, const std::string &_key, const std::string &_desc = "")
-
using taxon_t = Taxon<ORG_INFO, DATA_STRUCT>
-
namespace datastruct
The systematics manager allows an optional second template type that can store additional data about each taxon in the phylogeny. Here are some structs containing common pieces of additional data to track. Note: You are responsible for filling these in! Adding the template just gives you a place to store your data.
-
struct fitness
- #include <Systematics.hpp>
The default - an empty struct.
Public Types
-
template<typename PHEN_TYPE>
struct mut_landscape_info - #include <Systematics.hpp>
Public Types
-
struct no_data
- #include <Systematics.hpp>
-
struct fitness