"ggplot2 is a library specifically for creating graphics. To use it, you need to specify a dataset, provide an aesthetic map, then add layers (the type of graphs you would like), scales, faceting specifications, and coordinate systems. For example, a ggplot could look like:"
"where mpg is the dataset, aes() are the aesthetics,geom_point() is the layer, and facet_wrap() are the faceting specifications.\n",
"\n",
"Like all other tidyverse libraries, ggplot comes installed with tidyverse, but still must be loaded with the ```library(ggplot2)``` command. For more information about ggplot2 can be found on it's website [here](https://ggplot2.tidyverse.org)."
]
},
{
"cell_type": "markdown",
"id": "61f8ec16",
"metadata": {},
"source": [
"## dplyr\n",
"\n",
"dplyr is a library that establishes a common grammar for data manipulation across all of the tidyverse packages. dplyr also helps R communicate with other computational backends like SQL or cloud databases. It adds the following commands \n",
"* mutate() adds a new variable that is a function of existing variables\n",
"* select() picks variables based on their names\n",
"* filter() picks data points based on their values\n",
"* summarise() reduces multiple variables into a single summary\n",
"* arrange() changes the way rows are ordered\n",
"* group_by() allows for any of the above operations to be done by group\n",
"\n",
"\n",
"Like all other tidyverse libraries, ggplot comes installed with tidyverse, but still must be loaded with the ```library(dplyr)``` command. For more information about ggplot2 can be found on it's website [here](https://dplyr.tidyverse.org)."
]
},
{
"cell_type": "markdown",
"id": "7bf43715",
"metadata": {},
"source": [
"## tidyr\n",
"\n",
"tidyr helps people create tidy data, which is defined as data where every column is variable, every row is an observation, and every cell is a single value. Tidy data is helpful because all of the tidyverse is based on tidy data, meaning if you standardize your data as tidy you can use it with any tidyverse library.\n",
"\n",
"There are five main categories of tidyr functions:\n",
"\n",
"* Pivoting, which converts between long and wide forms of data (e.g converts categories into one row)\n",
"* Rectangling, which turns nested lists (often from .JSON files) into tibbles more easily used by the tidyverse\n",
"* Nesting, which converts grouped data to a form each group becomes a single row with a nested data frame, and unnesting which does the opposite\n",
"* Splitting and combining columns containing characters. ```seperate()``` and ```extract()``` pull single character columns into multiple columns, and ```unite()``` combines multiple columns into a single column\n",
"* Make implicit missing values explicit with ```complete()``` and explicit missing values implicit with ```drop_na()```. Replaces missing values with ```fill()```, or ```replace_na()```.\n",
"\n",
"Like all other tidyverse libraries, ggplot comes installed with tidyverse, but still must be loaded with the ```library(tidyr)``` command. For more information about ggplot2 can be found on it's website [here](https://tidyr.tidyverse.org)."
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "0b7d2969",
"metadata": {},
"source": [
"## readr\n",
"Fast way to read rectangular data from delimited files. It's main purpose is to parse a wide variety of data types.\n",
"For more information on readr visit: https://readr.tidyverse.org/\n",
"\n",
"## purrr\n",
"purrr is a package that offers a family of functions that can be used to iterate of lists. For instance, the map() function is used to replace 'for' loops, which can make the code easier to read and make the code more efficient.\n",
"\n",
"For more information on purrr visit: https://purrr.tidyverse.org/\n",
"\n",
"## tibble\n",
"Tibbles are dataframes that are lazy and complain more. This will make the user to confront problems earlier leading to a cleaner dataset and more expressive code.\n",
"\n",
"For more information on Tibble visit: https://tibble.tidyverse.org/ \n",
"\n",
"## stringr\n",
"provides functions that makes working with strings easier. Stringr uses string manipulation functions to simplify your workflow with strings.\n",
"\n",
"For more information on stringr visit: https://stringr.tidyverse.org/\n",
"\n",
"## forcats\n",
"Package provides a suite of tools that are able to solve common problems with factor, which includes the changes the order of levels.\n",
"\n",
"For more information on forcats visit: https://forcats.tidyverse.org/"
RStudio is a platform used by data scientists, statisticians, and researchers for developing and executing code in the R programming language. It is an integrated development environment (IDE) that provides an easy-to-use interface for data analysis, visualization, and publication tasks. Features such as code completion and debugging tools make it convenient to write, run, and debug code. The platform also supports interactive sessions, workspace management, and the sharing of reports and dashboards. RStudio is available in open-source and commercial versions and runs on Windows, macOS, and Linux.
%% Cell type:markdown id:c93cec34 tags:
### Installing R
%% Cell type:markdown id:3b0d0ff7 tags:
To set up and install RStudio, visit https://posit.co/download/rstudio-desktop/. The page provides installation instructions and download options for Windows, macOS, and Linux, along with many other resources.
%% Cell type:markdown id:c4ab6021 tags:
### Instructions for using RStudio
%% Cell type:markdown id:eced54ea tags:
To start, create a new project by clicking on File, then New Project. Give your project a name, select the type of project you want to create, and choose the directory where you want to save your project.
Next, set your working directory. This is the directory where RStudio will look for files and where it will save output. You can set your working directory by clicking on Session, then Set Working Directory, then Choose Directory, and selecting the directory you want.
To write R code, create a new file by clicking on File, then New File, and then R Script or R Markdown. This will open a blank R script or R Markdown file where you can start writing your code.
When you're ready to run your code, select the lines you want to run and click the Run button near the top of the tab, or use the keyboard shortcut Ctrl+Enter on Windows or Command+Enter on Mac. RStudio will execute the selected code and show the output in the Console window. To save your R script, click on File, then Save, or use the keyboard shortcut Ctrl+S on Windows or Command+S on Mac.
You can install packages in RStudio with the install.packages() function and load them with the library() function. To install a package, run install.packages("package_name"); to load it, run library(package_name). To import data, you can use the read.csv() function for CSV files, the read_excel() function from the readxl package for Excel files, or other functions depending on the file type.
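As a minimal sketch of the install, load, and import steps described above, the following reads a small CSV file with base R. The temporary file and its column names are made up purely for illustration, and the install call is left commented out so the sketch runs without network access.

```r
# Install a package once per machine; readxl provides read_excel().
# install.packages("readxl")

# Create a small CSV file in a temporary location (illustrative data only).
csv_path <- tempfile(fileext = ".csv")
writeLines(c("name,score", "a,1", "b,2", "c,3"), csv_path)

# read.csv() ships with base R, so no library() call is needed for it.
scores <- read.csv(csv_path)
print(scores)
```

In a real project you would replace `csv_path` with the path to your own file, relative to the working directory you set earlier.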
%% Cell type:markdown id:074d410e tags:
## Introduction to Tidyverse
%% Cell type:markdown id:8b5b1fc9 tags:
Tidyverse is a collection of open-source packages used in data analysis. Tidyverse was created to simplify tasks such as data cleaning, wrangling, and modeling. Some of the more popular packages within tidyverse include ggplot2 for data visualization, dplyr for data manipulation, and tidyr for cleaning data.
%% Cell type:markdown id:f1b51854 tags:
### Installing Tidyverse
%% Cell type:markdown id:4f469b34 tags:
To install Tidyverse, run the following command in an R script or R Markdown document: ```install.packages("tidyverse")```
%% Cell type:code id:64edd4c5 tags:
``` python
# Install the tidyverse
# install.packages("tidyverse")
```
%% Cell type:markdown id:6d010f73 tags:
### ggplot2
ggplot2 is a library specifically for creating graphics. To use it, you need to specify a dataset, provide an aesthetic map, then add layers (the type of graphs you would like), scales, faceting specifications, and coordinate systems. For example, a ggplot could look like:
%% Cell type:code id:4c3d57bb tags:
``` python
# library(ggplot2)
# ggplot(mpg, aes(x=displ,y=hwy,color = class)) +
# geom_point() +
# facet_wrap(.~year)
```
%% Cell type:markdown id:9b701014 tags:
where mpg is the dataset, aes() defines the aesthetic mapping, geom_point() is the layer, and facet_wrap() is the faceting specification.
Like all other tidyverse libraries, ggplot2 is installed with tidyverse, but it still must be loaded with the ```library(ggplot2)``` command. More information about ggplot2 can be found on its website [here](https://ggplot2.tidyverse.org).
%% Cell type:markdown id:61f8ec16 tags:
## dplyr
dplyr is a library that establishes a common grammar for data manipulation across all of the tidyverse packages. dplyr also helps R communicate with other computational backends like SQL or cloud databases. It provides the following functions:
* mutate() adds a new variable that is a function of existing variables
* select() picks variables based on their names
* filter() picks rows based on their values
* summarise() reduces multiple values into a single summary
* arrange() changes the way rows are ordered
* group_by() allows for any of the above operations to be done by group
Like all other tidyverse libraries, dplyr is installed with tidyverse, but it still must be loaded with the ```library(dplyr)``` command. More information about dplyr can be found on its website [here](https://dplyr.tidyverse.org).
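The verbs listed above are usually chained with the pipe operator. The sketch below is one possible illustration using the mtcars data frame that ships with R; the choice of columns, the filter threshold, and the unit conversion are assumptions made only for the example.

```r
library(dplyr)

# mtcars is a data frame that ships with base R.
by_cyl <- mtcars %>%
  filter(hp > 100) %>%                      # keep rows with more than 100 horsepower
  select(mpg, cyl, wt) %>%                  # keep three columns by name
  mutate(wt_kg = wt * 1000 * 0.4536) %>%    # wt is in 1000 lbs; approximate kg
  group_by(cyl) %>%                         # later verbs operate per cylinder count
  summarise(
    mean_mpg = mean(mpg),                   # one summary value per group
    mean_wt_kg = mean(wt_kg),
    n = n()                                 # number of rows in each group
  ) %>%
  arrange(desc(mean_mpg))                   # highest mean mpg first

print(by_cyl)
```

Each step takes a data frame and returns a data frame, which is what lets the verbs compose in any order that makes sense for the question.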
%% Cell type:markdown id:7bf43715 tags:
## tidyr
tidyr helps people create tidy data, which is defined as data where every column is a variable, every row is an observation, and every cell is a single value. Tidy data is helpful because all of the tidyverse is built around it: once your data is tidy, you can use it with any tidyverse library.
There are five main categories of tidyr functions:
* Pivoting, which converts between long and wide forms of data (e.g., ```pivot_wider()``` spreads the categories in one column across several columns, and ```pivot_longer()``` does the reverse)
* Rectangling, which turns nested lists (often from JSON files) into tibbles that are easier to use with the tidyverse
* Nesting, which converts grouped data to a form where each group becomes a single row with a nested data frame, and unnesting, which does the opposite
* Splitting and combining character columns. ```separate()``` and ```extract()``` split a single character column into multiple columns, and ```unite()``` combines multiple columns into a single column
* Handling missing values. ```complete()``` makes implicit missing values explicit, ```drop_na()``` makes explicit missing values implicit, and ```fill()``` or ```replace_na()``` replaces missing values.
Like all other tidyverse libraries, tidyr is installed with tidyverse, but it still must be loaded with the ```library(tidyr)``` command. More information about tidyr can be found on its website [here](https://tidyr.tidyverse.org).
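A few of the function categories above can be sketched on a tiny made-up table. The data, column names, and separator below are hypothetical and chosen only to keep the example short.

```r
library(tidyr)

# A small wide table: one row per student, one column per exam (made-up data).
wide <- data.frame(
  student = c("a", "b"),
  exam1 = c(90, 80),
  exam2 = c(85, NA)
)

# pivot_longer(): one row per student-exam pair.
long <- pivot_longer(wide, cols = c(exam1, exam2),
                     names_to = "exam", values_to = "score")

# drop_na(): remove rows whose score is missing.
long_complete <- drop_na(long, score)

# pivot_wider(): back to one column per exam.
wide_again <- pivot_wider(long, names_from = exam, values_from = score)

# separate() and unite(): split one character column in two, then rejoin it.
dates <- data.frame(date = c("2023-01", "2023-02"))
parts <- separate(dates, date, into = c("year", "month"), sep = "-")
joined <- unite(parts, "date", year, month, sep = "-")
```

The long form is usually the one other tidyverse packages expect, for example when mapping a category column to color in ggplot2.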
%% Cell type:markdown id:0b7d2969 tags:
## readr
readr provides a fast way to read rectangular data from delimited files. Its main purpose is to parse a wide variety of data types.
* For comma-separated files: ```read_csv()```
* For tab-separated files: ```read_tsv()```
* For whitespace-separated files: ```read_table()```
* For fixed-width files: ```read_fwf()```
For more information on readr visit: https://readr.tidyverse.org/
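As a small sketch of how readr parses column types, the example below writes a made-up CSV file to a temporary location and reads it back. The file contents are hypothetical, and the `show_col_types` argument assumes readr 2.0 or later.

```r
library(readr)

# Write a small CSV file to a temporary location (made-up data).
csv_path <- tempfile(fileext = ".csv")
writeLines(
  c("id,name,joined", "1,Ann,2023-01-15", "2,Bob,2023-02-20"),
  csv_path
)

# read_csv() guesses a type for each column;
# show_col_types = FALSE silences the column specification message.
people <- read_csv(csv_path, show_col_types = FALSE)
print(people)
```

Here the three columns are read as a number, a character string, and a date without any type being declared by hand.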
## purrr
purrr is a package that offers a family of functions for iterating over lists. For instance, the map() function can replace many 'for' loops, which can make the code shorter and easier to read.
For more information on purrr visit: https://purrr.tidyverse.org/
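To make the comparison with 'for' loops concrete, here is a minimal sketch that computes the same result both ways. The input list is made up for illustration.

```r
library(purrr)

values <- list(c(1, 2, 3), c(4, 5, 6), c(7, 8, 9))

# A for loop that collects the sum of each element.
sums_loop <- numeric(length(values))
for (i in seq_along(values)) {
  sums_loop[i] <- sum(values[[i]])
}

# The same computation with map_dbl(), which returns a numeric vector.
sums_map <- map_dbl(values, sum)

# map() always returns a list, one element per input.
doubled <- map(values, ~ .x * 2)
```

The `map_dbl()` version states what is computed for each element without the bookkeeping of an index and a pre-allocated result.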
## tibble
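Tibbles are data frames that are lazy and complain more: they do less automatically (for example, they do not change variable names or types) and raise problems sooner (for example, when a variable does not exist). This makes the user confront problems earlier, leading to a cleaner dataset and more expressive code.
For more information on tibble visit: https://tibble.tidyverse.org/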
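The difference from a regular data frame can be sketched with a short comparison. The column name below is made up to show two behaviors: name handling and partial matching.

```r
library(tibble)

df <- data.frame(`my var` = 1:3)   # data.frame() renames the column to my.var
tb <- tibble(`my var` = 1:3)       # tibble() keeps the name as written

names(df)
names(tb)

# Partial matching: a data frame silently matches df$my to my.var,
# while a tibble returns NULL with a warning for an unknown column.
df$my
suppressWarnings(tb$my)
```

The warning is suppressed here only to keep the output short; in normal use it is the signal that a column name was mistyped.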
## stringr
stringr provides a consistent set of string manipulation functions that make working with strings easier.
For more information on stringr visit: https://stringr.tidyverse.org/
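A few commonly used stringr functions are sketched below on a made-up character vector. All of them take the string as their first argument and start with the `str_` prefix.

```r
library(stringr)

fruits <- c("apple", "banana", "cherry")

lengths <- str_length(fruits)             # number of characters in each string
upper <- str_to_upper(fruits)             # convert to upper case
has_an <- str_detect(fruits, "an")        # does each string contain "an"?
replaced <- str_replace(fruits, "a", "A") # replace the first "a" in each string
prefix <- str_sub(fruits, 1, 3)           # first three characters
```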
## forcats
forcats provides a suite of tools that solve common problems with factors, such as changing the order of levels.
For more information on forcats visit: https://forcats.tidyverse.org/
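Reordering levels, the example mentioned above, can be sketched as follows. The factor values are made up for illustration.

```r
library(forcats)

sizes <- factor(c("small", "large", "medium", "small", "small", "large"))
default_levels <- levels(sizes)     # alphabetical by default

# fct_relevel(): put the levels in a meaningful order by hand.
ordered_sizes <- fct_relevel(sizes, "small", "medium", "large")

# fct_infreq(): order levels by how often they occur (most frequent first).
by_freq <- fct_infreq(sizes)

# fct_rev(): reverse the current level order.
reversed <- fct_rev(ordered_sizes)
```

Level order matters mostly for presentation, for example the order of bars or legend entries in a ggplot2 chart.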