Data analysis skills are in increasing demand in all areas of business, research, and academia. This course will teach you to use the R programming language with the RStudio development environment to process data, perform statistical analysis, and create stunning graphical visualizations.
The explosion of available data in recent years has resulted in the development of tools specifically designed for disciplined, large-scale data analysis. The R programming language has received widespread attention as an alternative to spreadsheets that allows for controlled, repeatable data processing for such work. The RStudio development environment simplifies the use of R and includes additional functionality to produce web-based reports and presentations. This course will teach you to use R with RStudio to process data, perform statistical analysis, and create stunning graphical visualizations.
Casimir Saternos has been developing software for the past decade. He has written a number of articles that have appeared on the Oracle Technology Network and collaborated on several projects for PeepCode screencasts (now Pluralsight).
The R Environment
This module is titled The R Environment, intended in a broad sense to include R in its technical environment as well as how it came to be over time. In it, we will provide a basic introduction to R and RStudio. We will talk about what R is, why you would want to use it, and the fundamentals of how it works. We will compare R to similar software and discuss what differentiates it from them. We will consider who uses R and look at how to set up R and work with it. Next, we will look at R as a calculator. This is the standard way that R is introduced, and it lets you get familiar with the patterns of interacting with R. We will then take a brief look at how to download and install packages. We will cover specific packages later in the course, but so much functionality is available through R packages that it is important to understand at the outset a few basic steps to include them in your projects. Data can be represented in R in a number of ways. We will discuss these and show you how to see how your data is being represented. We will also look at some of the features traditionally associated with learning a programming language. Don't fret if you are not a programmer. There's a huge amount possible with R that does not require in-depth programming knowledge.
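To give a feel for the topics listed above, here is a minimal sketch of R used as a calculator, followed by the usual package steps and a look at how data is represented. The variable names are made up for illustration, and ggplot2 is used only as an example package name; the install step is commented out because it needs network access.

```r
# R as a calculator: expressions typed at the console are evaluated immediately.
total <- 2 + 3 * 4   # multiplication binds tighter than addition
ratio <- 10 / 4      # division always yields a numeric result
root  <- sqrt(16)    # built-in math functions are available without any setup

# Downloading and loading a package (commented out: requires network access).
# install.packages("ggplot2")
# library(ggplot2)

# Seeing how data is represented.
scores <- c(90, 85.5, 72)     # a numeric vector
labels <- c("a", "b", "c")    # a character vector
print(class(scores))
print(class(labels))
str(mtcars)                   # compact summary of a built-in data frame
```

Running each line in the RStudio console, one at a time, is a reasonable way to get used to the prompt-and-response pattern of working with R.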
Data Import
Hi, this is Casimir Saternos with more on getting started with RStudio. The previous module had examples featuring data included in R packages, but most R users have their own data sets that they would like to use. This module will survey topics related to importing data into R and RStudio. Simple data imports are trivial using RStudio, but others require a bit of initial analysis and planning depending on what formats are in use and where the data is located. We will run through a few basic questions that you can ask yourself when considering a data import. We will then look at RStudio import processing, which makes it incredibly easy to load certain types of data. It, of course, relies on R language import functionality, and the R language also allows data to be created or imported in many other ways. The data frame, if you recall, is a two-dimensional data structure. Unlike a matrix, it supports a different data type in each column. Data is often in this structure at some point in a data analyst's workflow, and a few basic operations should be mastered for use when analyzing and processing recently imported data. This is part of the wider subject of data transformation and techniques for reshaping data. Data manipulation involves converting data types, cleaning up data, and other related operations that immediately follow, or are done in conjunction with, the data import.
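As a rough illustration of the import-then-clean-up workflow described above, the sketch below reads a small CSV into a data frame, inspects it, and converts one column's type. The data is invented for the example and is supplied inline through the `text` argument of `read.csv` so that the snippet is self-contained; with a real file you would pass its path instead.

```r
# Hypothetical CSV content; a real import would read from a file path or URL.
csv_text <- "name,age,joined
Ann,34,2014-01-15
Bob,28,2013-11-02
Cy,41,2014-03-30"

# Import into a data frame, keeping text columns as character rather than factor.
people <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# Each column has its own type, unlike a matrix.
str(people)

# Typical post-import manipulation: convert a text column to a proper Date.
people$joined <- as.Date(people$joined)

# A basic data frame operation: keep only the rows that match a condition.
over_30 <- people[people$age > 30, ]
print(over_30)
```

Calling `str()` right after an import is a quick way to check that each column arrived with the type you expected before doing any analysis.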
Plots
You have already seen some of the possibilities for graphically representing data with R. You can develop interesting graphics that can be exported to image formats including PNG or JPEG and document formats like PDF or PostScript. As with so many things in R, there is more than one way to do it. Each type of plot or chart can be created in different ways. This module starts by describing the R graphics systems and their relationships to the most popular graphics packages in use. Standard plots are created using R's base graphics. These were later expanded upon by the lattice package, which is an R implementation of William Cleveland's trellis graphics. We will then discuss Hadley Wickham's ggplot2 package. Finally, we will mention packages that suggest future directions of R graphics that involve web-based presentation and interactive visualizations.
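To make the "more than one way to do it" point concrete, here is a small sketch that draws the same relationship with base graphics and with lattice, using the built-in mtcars data set. It assumes the lattice package is installed, which is normally the case because it ships with standard R distributions; the titles and axis labels are illustrative choices.

```r
# Base graphics: a single function call draws directly on the current device.
plot(mtcars$wt, mtcars$mpg,
     xlab = "Weight (1000 lbs)",
     ylab = "Miles per gallon",
     main = "Base graphics scatter plot")

# lattice: a formula interface that returns a plot object, which is then printed.
library(lattice)
panel_plot <- xyplot(mpg ~ wt | factor(cyl), data = mtcars,
                     xlab = "Weight (1000 lbs)",
                     ylab = "Miles per gallon",
                     main = "lattice: one panel per cylinder count")
print(panel_plot)
```

The lattice version splits the data into one panel per value of `cyl`, which is the trellis idea of conditioning a plot on another variable.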
Selected R Packages
Welcome back to Getting Started with RStudio. In this module we will take a look at a few of the many packages available from CRAN, the Comprehensive R Archive Network. We will start by reviewing packages we've already introduced in earlier modules. For years, activities involving data were practically equated with relational database technology and Structured Query Language, or SQL. The sqldf package allows experienced SQL users to leverage these skills as they work with data frames in R. We mentioned earlier that you don't need to know SQL to take this course, and to that end we will give you a quick overview of the language as well. Geospatial data, represented as points of latitude and longitude, is not easily interpreted in raw form. Plotting points on maps is far clearer in most cases and is easily accomplished in R using packages including maps, ggmap, and mapproj. Special packages related to space suggest there might be special packages related to time. As a matter of fact, there are a number of such packages. A familiar use of time series data is the financial data that is reported regularly by the news media. The quantmod package was created to analyze stock performance and devise trading strategies. We will take a quick look at it as a package that deals specifically with time series data. Finally, we'll look at several interactive graphing packages. Not all packages available for R are on CRAN. The ggvis package, mentioned previously, is hosted on GitHub. Another interactive web-oriented package hosted on GitHub is called rCharts.
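As a hedged sketch of what querying a data frame with SQL can look like, the example below summarizes the built-in mtcars data with sqldf and then computes the same averages in base R for comparison. It assumes sqldf has been installed from CRAN; the query itself is only an illustration, and the sqldf part is skipped when the package is not available.

```r
# Base R version: average miles per gallon for each cylinder count.
base_avg <- aggregate(mpg ~ cyl, data = mtcars, FUN = mean)
print(base_avg)

# SQL version via sqldf, run only if the package is installed.
if (requireNamespace("sqldf", quietly = TRUE)) {
  library(sqldf)
  # The data frame name is used as if it were a database table.
  sql_avg <- sqldf("SELECT cyl, AVG(mpg) AS avg_mpg
                    FROM mtcars
                    GROUP BY cyl
                    ORDER BY cyl")
  print(sql_avg)
}
```

Both approaches produce one row per cylinder count, so users comfortable with SQL can check their queries against the base R result.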
Data Export and Presentation
From the beginning of this course we've seen how data can be imported, summarized, and presented with R through RStudio. In this final chapter we will take a more systematic look at how to get data out of R in a form that meets your needs. We've already seen how R functions can be called to create charts and plots using base graphics, lattice, or ggplot2. These functions, when called, immediately display an image on the screen. It is pretty common to iterate and modify the function calls a few times or transform the data a few times before a final result is obtained. At this point you can export an image to one of a number of standard image formats, and of course raw or summarized data can be exported to CSV or other formats. Your choice of format is often based on whether the data is simply being read by a person or consumed by other software. Text data and images are simple data exports that are generally based on a single dataset and incorporated into some more complex target project that includes other data, images, and text. Two options readily available in RStudio to support these types of complex projects are R documents and presentations. Each of these can include multiple images and text and even incorporate executable R code, which can be wonderful for expressing data and calculations in a reproducible and unambiguous manner. And having covered the application of R from importing through exporting and presenting data, we will wind up this course with a few final remarks.
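The two simple exports mentioned above, summarized data to CSV and a plot to a file, can be sketched as follows. The file names are hypothetical and the output goes to a temporary directory so that nothing in your project is overwritten. The plot is written with the `pdf()` device; `png()` follows the same open, draw, close pattern when an image format is preferred.

```r
# Write output to a temporary directory (an illustrative choice).
out_dir <- tempdir()

# Export summarized data to CSV for use by people or other software.
summary_df <- aggregate(mpg ~ cyl, data = mtcars, FUN = mean)
csv_path <- file.path(out_dir, "mpg_by_cyl.csv")
write.csv(summary_df, csv_path, row.names = FALSE)

# Export a plot: open a file device, draw, then close the device.
pdf_path <- file.path(out_dir, "mpg_vs_wt.pdf")
pdf(pdf_path, width = 6, height = 4)
plot(mtcars$wt, mtcars$mpg,
     xlab = "Weight (1000 lbs)",
     ylab = "Miles per gallon")
dev.off()

# Read the CSV back in to confirm the round trip.
check <- read.csv(csv_path)
print(check)
```

Until `dev.off()` is called the file is not complete, so forgetting it is a common reason for an exported plot that will not open.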