"Topological Data Analysis i(TDA) s a relatively new field with many useful applications. Essentially, TDA borrows tools from topology in order to study data. That is, TDA seeks to determine whether a particular dataset has shape and what the shape of the dataset implies. It can be used independently or applied to other machine learning techniques. This particular projects aims at applying TDA to various machine learning methods in order to demonstrate its benefits. To do so, this project first presents the general theory of TDA as well as introducing the available TDA software. Then, several notebooks are illustrate how TDA can be applied to machine learning."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"----\n",
"# Statement of Need\n",
"\n",
"The purpose of this project is to introduce data scientists to Topological Data Analysis (TDA) through various examples. That is, this project aims to demonstrate applications of TDA to those who do not have a background in mathematics."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"----\n",
"# Installation instructions\n",
"\n",
"The environment.yml file constains all of the required dependices. To install the required moduels, run the following command in the terminal:\n",
"\n",
"`make init`"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"----\n",
"# Unit Tests\n",
"\n",
"Unit testing is done on the following two functions: `lens_1d` and `uniform_sampling`. These tests check to verify that the function takes in correct inputs.\n",
"\n",
"To run a unit test, simply write `make test` in the terminal."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"---\n",
"# Methodology\n",
"\n",
"I was able to meet the majority of my initial goals. I created different notebooks that used TDA in different ways. Additionally, I created a background notebook that gave a basic introduction to TDA. While I was able to meet my general goals, I think I could have approach this project differently. For instance, Ripser and Mapper are the two most used libraries from the scikit-tda package. I think I should have introduced these libraries more thoroughly. Both of these libraries have many applications which my project did not highlight. For instance, Ripser can be used for nonlinear time series analysis, feature selection, classifying, etc. While it is impossible to demonstrate all of the possible applications for Ripser, I could have discussed them more and linked some articles - though the vast majority of the papers I’ve come across focus on the mathematical side of TDA rather than the application.\n",
"\n",
"Mapper was incredibly challenging to use. I spent the vast majority of the semester trying to understand and apply it. I was able to do basic classification using Mapper, although I am still unsure what the best method is in order to improve the model. I also do not know how to get some sort of accuracy score. I am only able to look at the output graph and determine whether the data separated well. I have yet to figure out how topologists determine the overall accuracy of their classification. Additionally, I was not able to fully complete the prediction notebook. I know how to use Mapper in the sense that I was able to separate the data visually. The next task is to learn how to extract the data. After doing so, I can easily build a new predictive model. In general, I need to experiment more with Mapper in order to better understand how to implement it."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"---\n",
"# Concluding Remarks\n",
"\n",
"I have a much deeper understanding of TDA because of this project. Before this project, I thought Ripser was only used for persistence diagrams. That is, I thought you can only do an exploratory analysis with Ripser. I was unaware that Ripser is used in time series analysis or classification. As for Mapper, I had no idea how to work with it in general. I was not formally introduced to Mapper - everything I currently know comes from articles/papers I’ve read. My background in Mapper is still limited but I feel more confident using it and experimenting with the parameters. \n",
"\n",
"For future work, I would like to expand my work with Mapper. I would like to create more wrapper functions that simplify the steps that go into building a graph. Additionally, I would like to learn how to color nodes so that I can gain more insight. For instance, I would like to learn how to color nodes based on proportions for y1 to y2. My current understanding of Mapper is shallow so I would like to expand it and add detail application to my project.Additionally, I would like to create more notebooks that do not use toy datasets. Toy datasets are easy to work with but do not add enough complexity. To full demonstrate the applications of Mapper, I need to work with more challenging data."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"----\n",
"# References\n",
"\n",
"Individual notebooks have their own reference section. However, all notebooks use the scikit-tda package. \n",
"\n",
"Saul, Nathaniel and Tralie, Chris. (2019). Scikit-TDA: Topological Data Analysis for Python. Zenodo. http://doi.org/10.5281/zenodo.2533369"