Recall that a vector is a one-dimensional array with a single data type (either character or numeric). We can perform several different transforms on a vector: multiplying each value by a scalar, creating a new vector by multiplying one vector by another, etc. We can also transform the contents of a vector by applying a transform to each element. If
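The transforms described above can be sketched in a few lines of R. The vectors `v` and `w` and the choice of `sqrt` are hypothetical and chosen only for illustration; they do not come from the original post.

```r
# Hypothetical example vectors (assumptions, not from the original text)
v <- c(1, 2, 3)
w <- c(10, 20, 30)

# Multiply each value by a scalar
scaled <- v * 2

# Create a new vector by multiplying one vector by another (element-wise)
product <- v * w

# Apply a transform to each element; sqrt is just an example function
roots <- sapply(w, sqrt)

print(scaled)
print(product)
print(roots)
```

Most arithmetic in R is already vectorized, so `sqrt(w)` gives the same result as the `sapply()` call here; `sapply()` is mainly useful when the function only handles one element at a time.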

## Data Types Used in R

The workhorse data types of R are the vector and the data frame. Recall that (almost) everything in R is an object and a vector. Numbers and strings are one-element vectors (that is, for a number n and a string s, length(n) == length(s) is true). Vectors can be numeric (c(1, 2, 3)) or character (c("WoW", "Good", "Bad")) or mixed (c(1, "two", 3)). Mixed vectors are always considered to
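One way to explore these claims is to build each kind of vector and ask R how it stores it. This is a minimal sketch; the variable names `n`, `s`, `nums`, `words`, and `mixed` are assumptions introduced for illustration.

```r
# A single number and a single string are both length-1 vectors
n <- 42
s <- "hello"
same_length <- length(n) == length(s)
print(same_length)

# The three example vectors from the text
nums  <- c(1, 2, 3)
words <- c("WoW", "Good", "Bad")
mixed <- c(1, "two", 3)

# Inspect the class R assigns to each vector
classes <- sapply(list(nums = nums, words = words, mixed = mixed), class)
print(classes)
```

Running `str(mixed)` as well shows the stored elements, which makes any type coercion easy to spot.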

## Getting Data into (and Out of) R

How do we input data into R? The first method, and sometimes the simplest, is to type the data in! This is a good method for small data sets. You can also read raw data from a data file using read.table(). There are several helper functions for reading delimited data as well as fixed-length fields; the scan() function permits reading
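The sketch below walks through these input routes with base R only. The data values, column names, and the temporary file are hypothetical and stand in for a real data file.

```r
# 1. Type the data in directly (hypothetical values)
scores <- data.frame(name = c("a", "b", "c"), score = c(90, 85, 77))

# 2. Write the data out to a tab-delimited file, then read it back
path <- tempfile(fileext = ".txt")
write.table(scores, file = path, sep = "\t", row.names = FALSE, quote = FALSE)
loaded <- read.table(path, header = TRUE, sep = "\t", stringsAsFactors = FALSE)
print(loaded)

# 3. scan() reads a flat sequence of values; here from an in-memory string
values <- scan(text = "1 2 3 4", quiet = TRUE)
print(values)

# Remove the temporary file
unlink(path)
```

For comma-separated files, `read.csv()` is a wrapper around `read.table()` with suitable defaults, and `read.fwf()` handles fixed-width fields.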

## Using the RStudio Graphical User Interface

R comes in multiple flavors. The heart of the software is a command-line interface (CLI) that is very similar to the Bash shell in Linux or the interactive versions of scripting languages like Ruby or Python. The Windows version of R supports multiple GUIs. The default GUI is started by simply invoking the R program either via the command line

## Review of Basic Data Analytics Methods Using R

R is a big, complicated, messy, powerful, extensible framework for computing and graphing statistics. It was written as a free version of the S language, and its widespread availability and use have resulted in several vendors supplying R interfaces to their products. There are five things that you should remember about R. Doing so will help you in thinking about how to work

## Data Analytics Lifecycle Phase 6: Operationalize

In this phase, you will need to assess the benefits of the work that’s been done, and set up a pilot so you can deploy the work in a controlled way before broadening the work to a full enterprise or ecosystem of users. In Phase 4, you scored the model in the sandbox, and Phase 6 represents the first time that

## Data Analytics Lifecycle Phase 5: Communicate Results

Phase 5: Results & Key Findings. Now that you’ve run the model, you need to go back and compare the outcomes to your criteria for success and failure. Consider how best to articulate the findings and outcomes to the various team members and stakeholders. Make sure to consider and include caveats, assumptions, and any limitations of results. Remember that many

## Data Analytics Lifecycle Phase 4: Model Building

In this phase, the model is fit on the training data and evaluated (scored) against the test data. Generally, this work takes place in the sandbox, not in the live production environment. The phases of Model Planning and Model Building overlap quite a bit, and in practice one can iterate back and forth between the two phases for a while
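As a rough illustration of fitting on training data and scoring against test data, the sketch below uses R's built-in `mtcars` data set. The data set, the 70/30 split, the linear model, and the RMSE metric are all assumptions made for the example; the original post does not prescribe any of them.

```r
# Fix the random seed so the split is reproducible
set.seed(42)

# Split the built-in mtcars data into training (about 70%) and test sets
n <- nrow(mtcars)
train_idx <- sample(n, size = round(0.7 * n))
train <- mtcars[train_idx, ]
test  <- mtcars[-train_idx, ]

# Fit the model on the training data only
fit <- lm(mpg ~ wt + hp, data = train)

# Score the model against the held-out test data
pred <- predict(fit, newdata = test)
rmse <- sqrt(mean((test$mpg - pred)^2))
print(rmse)
```

Keeping the test rows out of the call to `lm()` is what makes the score an honest estimate of how the model behaves on unseen data.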

## Data Analytics Lifecycle Phase 3: Model Planning

Phase 3 represents the last step of preparations before executing the analytical models and, as such, requires you to be thorough in planning the analytical work and experiments in the next phase. This is the time to refer back to the hypotheses you developed in Phase 1, when you first began getting acquainted with the data and your understanding of

## Data Analytics Lifecycle Phase 2: Data Preparation

Of all the phases, Data Preparation is generally the most iterative and time-intensive. In this step, you will need to define a space where you can explore the data without interfering with live production databases. For instance, you may need to work with a company’s financial data, but cannot interact with the production version of