From 8e8bb9c70244dd08986d59908be6eaf58d910cb3 Mon Sep 17 00:00:00 2001
From: shawk masboob <masboob.shawk@gmail.com>
Date: Sat, 14 Mar 2020 00:23:49 -0400
Subject: [PATCH] adding ML report

---
 Reports/0313-REPORT-ML.ipynb | 68 ++++++++++++++++++++++++++++++++++++
 1 file changed, 68 insertions(+)
 create mode 100644 Reports/0313-REPORT-ML.ipynb

diff --git a/Reports/0313-REPORT-ML.ipynb b/Reports/0313-REPORT-ML.ipynb
new file mode 100644
index 0000000..223ba47
--- /dev/null
+++ b/Reports/0313-REPORT-ML.ipynb
@@ -0,0 +1,68 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# <center>Using Machine Learning (ML) in TDA</center>\n",
+    "\n",
+    "<center>by Shawk Masboob</center>"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "This project aims to combine Topological Data Analysis (TDA) with several machine learning methods in order to demonstrate the potential benefits of TDA within data science.\n",
+    "\n",
+    "Traditional clustering methods can be “enhanced” by using TDA. Clustering is concerned with distance, whereas TDA also uses other relationships to group data, such as the number of holes contained within the data [2]. MAPPER begins by passing the data through a filter (lens) function, covering the range of that function with overlapping intervals, and clustering the data points that fall within each interval. The user can choose any clustering method and distance metric, for example hierarchical clustering with the Euclidean metric. MAPPER then turns each cluster into a node of a graph. Because the intervals overlap, some points belong to more than one node, and an edge is drawn between any two nodes that share a member [2]. The MAPPER visualization also reports statistics for each node, which provides information that goes beyond traditional clustering. TDA is also often used for classification, for example to classify animals, body parts, or the presence of cancer.\n",
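+    "\n",
+    "The MAPPER construction described above can be sketched from scratch in a few lines. This is a minimal illustration under stated assumptions, not the KeplerMapper API: the functions `single_linkage_clusters` and `mapper_graph` are hypothetical, the lens is assumed to be one-dimensional, and each interval is clustered with single-linkage hierarchical clustering cut at a Euclidean distance `eps`. The usage example samples points from a circle and uses the x-coordinate as the lens. Because the circle has one hole, the resulting graph should close into a single loop."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import numpy as np\n",
+    "\n",
+    "\n",
+    "def single_linkage_clusters(points, eps):\n",
+    "    # Single-linkage (hierarchical) clustering cut at height eps: the clusters are the\n",
+    "    # connected components of the graph linking points at Euclidean distance <= eps.\n",
+    "    n = len(points)\n",
+    "    dists = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)\n",
+    "    labels = np.full(n, -1)\n",
+    "    current = 0\n",
+    "    for seed in range(n):\n",
+    "        if labels[seed] != -1:\n",
+    "            continue\n",
+    "        labels[seed] = current\n",
+    "        stack = [seed]\n",
+    "        while stack:  # depth-first search over the eps-neighborhood graph\n",
+    "            i = stack.pop()\n",
+    "            for j in np.nonzero(dists[i] <= eps)[0]:\n",
+    "                if labels[j] == -1:\n",
+    "                    labels[j] = current\n",
+    "                    stack.append(j)\n",
+    "        current += 1\n",
+    "    return labels\n",
+    "\n",
+    "\n",
+    "def mapper_graph(X, lens, n_intervals=5, overlap=0.3, eps=0.5):\n",
+    "    # Cover the range of the 1-D lens with n_intervals equal-width intervals;\n",
+    "    # consecutive intervals overlap by the fraction `overlap` of their width.\n",
+    "    lo, hi = lens.min(), lens.max()\n",
+    "    width = (hi - lo) / ((n_intervals - 1) * (1 - overlap) + 1)\n",
+    "    step = width * (1 - overlap)\n",
+    "    nodes = []\n",
+    "    for k in range(n_intervals):\n",
+    "        start = lo + k * step\n",
+    "        end = hi if k == n_intervals - 1 else start + width\n",
+    "        members = np.nonzero((lens >= start) & (lens <= end))[0]\n",
+    "        if len(members) == 0:\n",
+    "            continue\n",
+    "        # Cluster only the points whose lens value falls in this interval;\n",
+    "        # each resulting cluster becomes one node of the graph.\n",
+    "        labels = single_linkage_clusters(X[members], eps)\n",
+    "        for c in np.unique(labels):\n",
+    "            nodes.append(frozenset(members[labels == c].tolist()))\n",
+    "    # Overlapping intervals let a point sit in several nodes; two nodes are\n",
+    "    # joined by an edge whenever they share at least one member.\n",
+    "    edges = {(a, b) for a in range(len(nodes)) for b in range(a + 1, len(nodes))\n",
+    "             if nodes[a] & nodes[b]}\n",
+    "    return nodes, edges\n",
+    "\n",
+    "\n",
+    "# Usage: 100 points on a circle, with the x-coordinate as the lens.\n",
+    "theta = np.linspace(0, 2 * np.pi, 100, endpoint=False)\n",
+    "X = np.column_stack([np.cos(theta), np.sin(theta)])\n",
+    "nodes, edges = mapper_graph(X, X[:, 0], n_intervals=5, overlap=0.3, eps=0.2)\n",
+    "print(len(nodes), len(edges))  # 8 8: the eight nodes form a single loop"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [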
+    "\n",
+    "TDA can also be used for prediction or, more specifically, for feature selection. Suppose one is building a regression model. An important intermediate step is feature selection. Many machine learning techniques (e.g. random forests) and statistical techniques (e.g. stepwise regression) can be used for this step, but one can also consider TDA. In this project (within the TDA_Prediction Python notebook), TDA is used to find the most prominent features for the multiple linear regression model. KeplerMapper, a Python TDA library that implements MAPPER, builds graphs whose nodes are groups of data points, and each node can be summarized by its own statistics. For instance, a study used TDA to determine how much people are willing to pay for air quality improvements [1]. The researchers generated eleven nodes using MAPPER. The size of each node reflects the number of observations in it, and its color represents “the mean value of all entries in that node with respect to the chosen variable” [1].\n",
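+    "\n",
+    "The node statistics described above are simple to compute once the members of each node are known. The sketch below assumes a hypothetical MAPPER output (`nodes`, a list of row indices per node) and a hypothetical data table (`data`, with made-up variables `income` and `noise`). For a chosen variable, it computes each node's size and the node's mean value, which is the quantity [1] uses to color the nodes. The function `rank_by_node_spread` is also hypothetical. It ranks variables by how much their node means vary across the graph, relative to each variable's overall spread. It only illustrates one way node statistics could inform feature selection and is not the method used in the TDA_Prediction notebook."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import numpy as np\n",
+    "\n",
+    "# Hypothetical MAPPER output: each node is the list of row indices it contains.\n",
+    "nodes = [[0, 1, 2], [2, 3, 4], [4, 5, 6, 7], [7, 8, 9]]\n",
+    "\n",
+    "# Hypothetical data table: one array per variable, one entry per observation.\n",
+    "data = {\n",
+    "    'income': np.array([20, 22, 25, 30, 35, 40, 48, 55, 60, 65], dtype=float),\n",
+    "    'noise': np.array([5, 1, 4, 2, 5, 1, 3, 2, 4, 3], dtype=float),\n",
+    "}\n",
+    "\n",
+    "\n",
+    "def node_summary(nodes, values):\n",
+    "    # Node size = number of observations; node color = mean of the chosen variable.\n",
+    "    sizes = np.array([len(members) for members in nodes])\n",
+    "    means = np.array([values[members].mean() for members in nodes])\n",
+    "    return sizes, means\n",
+    "\n",
+    "\n",
+    "def rank_by_node_spread(nodes, data):\n",
+    "    # Hypothetical heuristic: score each variable by the spread of its node means\n",
+    "    # relative to its overall spread, then sort variables from highest to lowest score.\n",
+    "    scores = {}\n",
+    "    for name, values in data.items():\n",
+    "        _, means = node_summary(nodes, values)\n",
+    "        scores[name] = means.std() / values.std()\n",
+    "    return sorted(scores, key=scores.get, reverse=True)\n",
+    "\n",
+    "\n",
+    "sizes, means = node_summary(nodes, data['income'])\n",
+    "print(sizes)  # [3 3 4 3]\n",
+    "print(means)\n",
+    "print(rank_by_node_spread(nodes, data))  # ['income', 'noise']"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [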
+    "\n",
+    "TDA is in itself an unsupervised machine learning tool. One of its most popular techniques is persistent homology. The notebook titled “TDA_Voting” uses the Python persistent homology package Ripser to analyze county-level voting results from recent presidential elections. The aim of this notebook is to determine whether there is a natural pattern in voting habits and whether this pattern dissolved during the 2016 presidential election. To perform this analysis, the birth-death (persistence) diagram provided by Ripser is used to spot persistent features. In this example, a persistent feature is a loop of counties that “behaved” similarly. The radius used in the current analysis is extremely small, so nothing interesting appeared; revealing more features requires increasing the radius. However, TDA is computationally expensive, so a large radius might take hours or even days to compute."
+   ]
+  },
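+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Ripser computes persistence diagrams for connected components (dimension 0) and for loops (dimension 1). With Ripser installed, a call such as `ripser(X, maxdim=1, thresh=r)['dgms']` returns one birth-death diagram per dimension. Here `thresh` is the largest pairwise distance included in the filtration, and it plays the role of the radius discussed above. The effect of this threshold can be illustrated without Ripser. The hypothetical function `h0_persistence` below is a from-scratch sketch restricted to dimension 0, using union-find over pairwise Euclidean distances; it is not part of Ripser. With a small threshold, no edges enter the filtration, so no components merge and the diagram carries no information. This is the same limitation the text describes for a small radius."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import numpy as np\n",
+    "\n",
+    "\n",
+    "def h0_persistence(X, thresh=np.inf):\n",
+    "    # 0-dimensional persistence (connected components) of the Vietoris-Rips\n",
+    "    # filtration, truncated at the scale `thresh`. Every point is born at 0.\n",
+    "    n = len(X)\n",
+    "    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)\n",
+    "    edges = sorted((dists[i, j], i, j) for i in range(n) for j in range(i + 1, n)\n",
+    "                   if dists[i, j] <= thresh)\n",
+    "    parent = list(range(n))\n",
+    "\n",
+    "    def find(i):\n",
+    "        # Union-find root lookup with path halving.\n",
+    "        while parent[i] != i:\n",
+    "            parent[i] = parent[parent[i]]\n",
+    "            i = parent[i]\n",
+    "        return i\n",
+    "\n",
+    "    deaths = []\n",
+    "    for length, i, j in edges:\n",
+    "        ri, rj = find(i), find(j)\n",
+    "        if ri != rj:  # this edge merges two components, so one of them dies\n",
+    "            parent[ri] = rj\n",
+    "            deaths.append(float(length))\n",
+    "    # Components that never merge below `thresh` are reported as never dying.\n",
+    "    deaths += [np.inf] * (n - len(deaths))\n",
+    "    return np.array([[0.0, d] for d in deaths])\n",
+    "\n",
+    "\n",
+    "# Usage: two well-separated pairs of points.\n",
+    "X = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]])\n",
+    "print(h0_persistence(X))              # deaths at 1, 1, 10 and inf\n",
+    "print(h0_persistence(X, thresh=0.5))  # scale too small: every death is inf"
+   ]
+  },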
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "---\n",
+    "# References\n",
+    "\n",
+    "[1] Allen, Dylan. *Topological Data Analysis: Giving Data Shape*. Carroll College, 13 May 2017, scholars.carroll.edu/cgi/viewcontent.cgi?article=1000&context=mathengcompsci_theses.\n",
+    "\n",
+    "[2] *KeplerMapper 1.2.0 Documentation*, a scikit-tda project.\n",
+    "https://kepler-mapper.scikit-tda.org/theory.html"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": []
+  }
+ ],
+ "metadata": {
+  "anaconda-cloud": {},
+  "kernelspec": {
+   "display_name": "Python 3",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.7.3"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 1
+}
-- 
GitLab