Getting started with R.
R is a flexible and powerful open-source language with extensive statistical and graphing capabilities. Its syntax is simple and intuitive. The large and fast-growing community around R has contributed greatly to its value, both as a programming language and as a data analysis environment.
1) Install R (from CRAN) and RStudio (IDE)
2) Install packages:
install.packages("packagename")
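As a minimal sketch, installing a package and then loading it into the current session could look like this. dplyr is used only as an example package, and the CRAN mirror URL is an assumption (any CRAN mirror works). A package is installed once per machine but must be loaded with library() in every session that uses it.

```r
# Install a package from CRAN only if it is not already installed.
# dplyr is just an example; package names are quoted and contain no spaces.
if (!requireNamespace("dplyr", quietly = TRUE)) {
  install.packages("dplyr", repos = "https://cloud.r-project.org")
}

# Load (attach) the package for this session
library(dplyr)

# "package:dplyr" now appears in the list of attached packages
search()
```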
3) RStudio Overview
RStudio is the most popular R code editor (IDE). It runs on Windows, macOS, and Linux and interfaces with R on each of these platforms. Its window is divided into four panes:
- Script pane – to write and save your R scripts
- Console pane – where code is executed
- Environment/History pane – displays the variables and functions created in the current session, and the history of commands run so far
- Helper pane – contains tabs to install and view packages, view plots, browse files in the working directory, and read help pages
4) The Workspace
The workspace is your current R working environment and includes any user-defined objects (vectors, matrices, data frames, lists, functions).
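A short sketch of inspecting and tidying the workspace with base R functions; the object names x, greeting and square are hypothetical examples.

```r
# Create a few user-defined objects in the workspace
x <- c(1, 2, 3)
greeting <- "hello"
square <- function(n) n^2

# List the objects currently in the workspace
ls()

# Remove an object you no longer need, then list again
rm(greeting)
ls()

# Show the working directory, where R reads and writes files by default
getwd()
```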
5) Entering Commands
R is a command-line driven program. The user enters commands at the prompt (> by default), and each command is executed one at a time.
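For example, typing the following lines at the > prompt one at a time, R evaluates each command and prints its result (an assignment prints nothing):

```r
2 + 3        # [1] 5
sqrt(16)     # [1] 4
x <- 1:5     # assignment: no output
mean(x)      # [1] 3  (uses the object created on the previous line)
```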
6) Data Types in R
A vector is the basic data structure in R: an ordered set of elements, such as the values of a single variable. A factor is a categorical variable. An array is a table with k dimensions; a matrix is the particular case of an array with k = 2. Note that the elements of a vector, an array, or a matrix all have the same mode. A data frame is a table composed of one or more vectors and/or factors, all of the same length but possibly of different modes.
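The sketch below creates one object of each type described above, using made-up values; the names heights, sex, m, a and df are illustrative only.

```r
# Vector: elements of a single mode
heights <- c(1.62, 1.75, 1.80)
mode(heights)                         # "numeric"

# Factor: categorical variable with a fixed set of levels
sex <- factor(c("F", "M", "M"))
levels(sex)                           # "F" "M"

# Matrix: a 2-dimensional array; all elements share one mode
m <- matrix(1:6, nrow = 2)
dim(m)                                # 2 3

# Array: k dimensions (here k = 3)
a <- array(1:24, dim = c(2, 3, 4))
dim(a)                                # 2 3 4

# Data frame: columns of equal length but possibly different modes
df <- data.frame(height = heights, sex = sex)
str(df)
```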
7) Variable assignment (<- or =)
variable <- 10
Extracting elements: the [ operator extracts a subset of elements from vectors, lists, or data frames, while [[ and $ extract a single element (for example, one list component or one data frame column).
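A small sketch contrasting the three extraction operators; the objects scores, info and df are hypothetical examples.

```r
# Assign with <- (the usual style) or =
scores <- c(math = 90, art = 75, music = 82)
info = list(name = "Ada", scores = scores)

# [ returns a subset of the same type
scores[1:2]            # named numeric vector with math and art
info["name"]           # still a list, of length 1

# [[ and $ extract a single element
info[["name"]]         # the character string "Ada"
info$scores["art"]     # 75 (keeps its name)

# Data frames: [ selects rows/columns, $ or [[ extracts one column
df <- data.frame(x = 1:3, y = c("a", "b", "c"))
df[df$x > 1, ]         # rows 2 and 3
df$y                   # the y column as a vector
```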
8) Getting Help
Once R is installed, there is a comprehensive built-in help system. At the program’s command prompt you can use any of the following:
help("data.frame")
?data.frame
?getwd
?"$"
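When you do not know the exact function name, the help system can also be searched. A brief sketch using base R search helpers (the search terms are only examples):

```r
# Search all installed help pages for a phrase
help.search("linear model")
??regression           # shorthand for help.search("regression")

# List functions whose names contain a string
apropos("mean")

# Run the examples shipped with a function's help page
example(mean)
```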
9) Books
Important Packages
To load data
RMySQL, RPostgreSQL, RSQLite – to read in data from a database.
XLConnect, xlsx – to read and write Microsoft Excel files from R.
foreign – to read a SAS/SPSS data set into R
R can handle plain text files – no package required. Just use the functions read.csv, read.table, and read.fwf.
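A minimal sketch of reading a plain text file with base R. To keep it self-contained, it first writes a small hypothetical CSV file to a temporary location; in practice you would pass the path to your own file.

```r
# Write a tiny CSV file to a temporary path (example data only)
path <- tempfile(fileext = ".csv")
writeLines(c("name,age", "Ann,34", "Bob,29"), path)

# read.csv assumes a header row and comma-separated values
people <- read.csv(path)
str(people)

# read.table is the general form; header and sep must be given explicitly
people2 <- read.table(path, header = TRUE, sep = ",")
all.equal(people, people2)
```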
To manipulate data
dplyr – the go-to package for fast data manipulation.
tidyr – Tools for changing the layout of your data sets.
stringr – Easy-to-learn tools for regular expressions and character strings.
lubridate – Tools that make working with dates and times easier.
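As an illustration of a typical dplyr pipeline, assuming dplyr is installed, the sketch below filters, adds a column, and summarises the built-in mtcars data set. The kpl column is a hypothetical derived variable (1 mpg is roughly 0.425 km per litre).

```r
library(dplyr)

# Keep 4-cylinder cars, add an approximate km-per-litre column,
# then count cars and average kpl for each number of gears
result <- mtcars %>%
  filter(cyl == 4) %>%
  mutate(kpl = mpg * 0.425) %>%
  group_by(gear) %>%
  summarise(n = n(), mean_kpl = mean(kpl))

result
```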
To visualize data
ggplot2 – R’s famous package for making beautiful graphics.
ggvis – Interactive, web-based graphics built with the grammar of graphics.
rgl – Interactive 3D visualizations with R
googleVis – Lets you use Google Chart Tools to visualize data in R.
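A minimal ggplot2 sketch, assuming ggplot2 is installed: a scatter plot of the built-in mtcars data built up layer by layer in the grammar-of-graphics style.

```r
library(ggplot2)

# Map weight to x, fuel economy to y, and number of cylinders to colour
p <- ggplot(mtcars, aes(x = wt, y = mpg, colour = factor(cyl))) +
  geom_point() +
  labs(x = "Weight (1000 lbs)", y = "Miles per gallon", colour = "Cylinders")

# Printing the object draws it (in RStudio, in the Plots tab)
print(p)
```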
To model data
car – car’s Anova() function is popular for making type II and type III ANOVA tables.
mgcv – Generalized Additive Models
lme4/nlme – Linear and Non-linear mixed effects models
randomForest – Random forest methods from machine learning
multcomp – Tools for multiple comparison testing
vcd – Visualization tools and tests for categorical data
glmnet – Lasso and elastic-net regression methods with cross-validation
survival – Tools for survival analysis
caret – Tools for training regression and classification models
To report results
shiny – Easily make interactive web apps with R.
R Markdown – The perfect workflow for reproducible reporting.
For Spatial data
sp, maptools – Tools for loading and using spatial data including shapefiles.
maps – Easy to use map polygons for plots.
ggmap – Download street maps straight from Google maps and use them as a background in your ggplots.
For Time Series and Financial data
zoo – Provides the most popular format for saving time series objects in R.
xts – Very flexible tools for manipulating time series data sets.
quantmod – Tools for downloading financial data, plotting common charts, and doing technical analysis.
To write high performance R code
Rcpp – Write R functions that call C++ code for lightning fast speed.
data.table – An alternative way to organize data sets for very, very fast operations.
parallel – Use parallel processing in R to speed up your code or to crunch large data sets.
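A small sketch of the parallel package, which ships with R. It uses a socket cluster (makeCluster plus parLapply), which works on Windows as well as macOS and Linux. The function slow_square and the choice of two workers are hypothetical, picked only to illustrate the pattern.

```r
library(parallel)

# A deliberately slow function, so the parallel speed-up is visible
slow_square <- function(i) {
  Sys.sleep(0.1)
  i^2
}

# Start two worker processes, apply the function across them, then shut down
cl <- makeCluster(2)
results <- parLapply(cl, 1:8, slow_square)
stopCluster(cl)

unlist(results)   # 1 4 9 16 25 36 49 64
```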
To work with the web
XML – Read and create XML documents with R
jsonlite – Read and create JSON data with R
httr – A set of useful tools for working with http connections
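For example, assuming jsonlite is installed, a data frame can be converted to JSON and parsed back again. The data values here are made up for illustration.

```r
library(jsonlite)

# Convert a data frame to JSON (by default, one JSON object per row)
df <- data.frame(name = c("Ann", "Bob"), age = c(34, 29))
json <- toJSON(df)
json

# Parse the JSON text back into a data frame
back <- fromJSON(json)
str(back)
```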