R is a programming language for statistical computing and data visualization. It has been adopted in the fields of data mining, bioinformatics, and data analysis.[8]

The core R language is augmented by a large number of extension packages, containing reusable code, documentation, and sample data.

R software is open-source and free software. It is licensed by the GNU Project and available under the GNU General Public License.[3] It is written primarily in C, Fortran, and R itself. Precompiled executables are provided for various operating systems.

As an interpreted language, R has a native command line interface. Moreover, multiple third-party graphical user interfaces are available, such as RStudio—an integrated development environment—and Jupyter—a notebook interface.

History

Ross Ihaka, co-originator of R

R was started by professors Ross Ihaka and Robert Gentleman as a programming language to teach introductory statistics at the University of Auckland.[9] The language was inspired by the S programming language, with most S programs able to run unaltered in R.[6] The language was also inspired by Scheme's lexical scoping, allowing for local variables.[1]

The name of the language, R, comes from being both an S language successor as well as the shared first letter of the authors, Ross and Robert.[10] In August 1993, Ihaka and Gentleman posted a binary of R on StatLib — a data archive website. At the same time, they announced the posting on the s-news mailing list.[11] On December 5, 1997, R became a GNU project when version 0.60 was released.[12] On February 29, 2000, the first official 1.0 version was released.[13]

Packages

refer to caption
Violin plot created from the R visualization package ggplot2

R packages are collections of functions, documentation, and data that expand R.[14] For example, packages add report features such as RMarkdown, knitr and Sweave. Easy package installation and use have contributed to the language's adoption in data science.[15]

The Comprehensive R Archive Network (CRAN) was founded in 1997 by Kurt Hornik and Fritz Leisch to host R's source code, executable files, documentation, and user-created packages.[16] Its name and scope mimic the Comprehensive TeX Archive Network and the Comprehensive Perl Archive Network.[16] CRAN originally had three mirrors and 12 contributed packages.[17] As of December 2022, it has 103 mirrors[18] and 18,976 contributed packages.[19] Packages are also available on repositories R-Forge, Omegahat, and GitHub.

The Task Views on the CRAN website lists packages in fields such as finance, genetics, high-performance computing, machine learning, medical imaging, meta-analysis, social sciences, and spatial statistics.

The Bioconductor project provides packages for genomic data analysis, complementary DNA, microarray, and high-throughput sequencing methods.

Packages add the capability to implement various statistical techniques such as linear, generalized linear and nonlinear modeling, classical statistical tests, spatial analysis, time-series analysis, and clustering.

The tidyverse package is organized to have a common interface around accessing and processing data contained in the data frame data structure, a two-dimensional table of rows and columns. Each function in the package is designed to couple together all the other functions in the package.[20]

Installing a package occurs only once. To install tidyverse:[20]

> install.packages( "tidyverse" )

To instantiate the functions, data, and documentation of a package, execute the library() function. To instantiate tidyverse:[a]

> library( tidyverse )

Interfaces

R comes installed with a command line console. Available for installation are various integrated development environments (IDE). IDEs for R include R.app (OSX/macOS only), Rattle GUI, R Commander, RKWard, RStudio, and Tinn-R.

General purpose IDEs that support R include Eclipse via the StatET plugin and Visual Studio via R Tools for Visual Studio.

Editors that support R include Emacs, Vim via the Nvim-R plugin, Kate, LyX via Sweave, WinEdt (website), and Jupyter (website).

Scripting languages that support R include Python (website), Perl (website), Ruby (source code), F# (website), and Julia (source code).

General purpose programming languages that support R include Java via the Rserve socket server, and .NET C# (website).

Statistical frameworks which use R in the background include Jamovi and JASP.

Community

The R Core Team was founded in 1997 to maintain the R source code. The R Foundation for Statistical Computing was founded in April 2003 to provide financial support. The R Consortium is a Linux Foundation project to develop R infrastructure.

The R Journal is an open access, academic journal which features short to medium-length articles on the use and development of R. It includes articles on packages, programming tips, CRAN news, and foundation news.

The R community hosts many conferences and in-person meetups. These groups include:

  • UseR!: an annual international R user conference (website)
  • Directions in Statistical Computing (DSC) (website)
  • R-Ladies: an organization to promote gender diversity in the R community (website)
  • SatRdays: R-focused conferences held on Saturdays (website)
  • R Conference (website)
  • posit::conf (formerly known as rstudio::conf) (website)

Implementations

The main R implementation is written primarily in C, Fortran, and R itself. Other implementations include:

Microsoft R Open (MRO) was a R implementation. As of 30 June 2021, Microsoft started to phase out MRO in favor of the CRAN distribution.[23]

Commercial support

Although R is an open-source project, some companies provide commercial support:

  • Revolution Analytics provides commercial support for Revolution R.
  • Oracle provides commercial support for the Big Data Appliance, which integrates R into its other products.
  • IBM provides commercial support for in-Hadoop execution of R.

Examples

Basic syntax

The following examples illustrate the basic syntax of the language and use of the command-line interface. (An expanded list of standard language features can be found in the R manual, "An Introduction to R".[24])

In R, the generally preferred assignment operator is an arrow made from two characters <-, although = can be used in some cases.[25]

> x <- 1:6 # Create a numeric vector in the current environment
> y <- x^2 # Create vector based on the values in x.
> print(y) # Print the vector’s contents.
[1]  1  4  9 16 25 36

> z <- x + y # Create a new vector that is the sum of x and y
> z # Return the contents of z to the current environment.
[1]  2  6 12 20 30 42

> z_matrix <- matrix(z, nrow=3) # Create a new matrix that turns the vector z into a 3x2 matrix object
> z_matrix 
     [,1] [,2]
[1,]    2   20
[2,]    6   30
[3,]   12   42

> 2*t(z_matrix)-2 # Transpose the matrix, multiply every element by 2, subtract 2 from each element in the matrix, and return the results to the terminal.
     [,1] [,2] [,3]
[1,]    2   10   22
[2,]   38   58   82

> new_df <- data.frame(t(z_matrix), row.names=c('A','B')) # Create a new data.frame object that contains the data from a transposed z_matrix, with row names 'A' and 'B'
> names(new_df) <- c('X','Y','Z') # Set the column names of new_df as X, Y, and Z.
> print(new_df)  # Print the current results.
   X  Y  Z
A  2  6 12
B 20 30 42

> new_df$Z # Output the Z column
[1] 12 42

> new_df$Z==new_df['Z'] && new_df[3]==new_df$Z # The data.frame column Z can be accessed using $Z, ['Z'], or [3] syntax and the values are the same. 
[1] TRUE

> attributes(new_df) # Print attributes information about the new_df object
$names
[1] "X" "Y" "Z"

$row.names
[1] "A" "B"

$class
[1] "data.frame"

> attributes(new_df)$row.names <- c('one','two') # Access and then change the row.names attribute; can also be done using rownames()
> new_df
     X  Y  Z
one  2  6 12
two 20 30 42

Structure of a function

One of R's strengths is the ease of creating new functions.[26] Objects in the function body remain local to the function, and any data type may be returned.

Create a function:

# The input parameters are x and y.
# The function returns a linear combination of x and y.
f <- function(x, y) {
  z <- 3 * x + 4 * y

  # this return() statement is optional
  return(z)
}

Usage output:

> f(1, 2)
[1] 11

> f(c(1,2,3), c(5,3,4))
[1] 23 18 25

> f(1:3, 4)
[1] 19 22 25

Native pipe operator

In R version 4.1.0, a native pipe operator, |>, was introduced.[27] This operator allows users to chain functions together one after another, instead of a nested function call.

> nrow(subset(mtcars, cyl == 4)) # Nested without the pipe character
[1] 11

> mtcars |> subset(cyl == 4) |> nrow() # Using the pipe character
[1] 11

Another alternative to nested functions, in contrast to using the pipe character, is using intermediate objects. However, some argue that using the pipe operator will produce code that is easier to read.[20]

> mtcars_subset_rows <- subset(mtcars, cyl == 4)
> num_mtcars_subset <- nrow(mtcars_subset_rows)
> print(num_mtcars_subset)
[1] 11

Modeling and plotting

Diagnostic plots from plotting “model” (q.v. “plot.lm()” function). Notice the mathematical notation allowed in labels (lower left plot).

The R language has built-in support for data modeling and graphics. The following example shows how R can generate and plot a linear model with residuals.

# Create x and y values
x <- 1:6
y <- x^2

# Linear regression model y = A + B * x
model <- lm(y ~ x)

# Display an in-depth summary of the model
summary(model)

# Create a 2 by 2 layout for figures
par(mfrow = c(2, 2))

# Output diagnostic plots of the model
plot(model)

Output:

Residuals:
      1       2       3       4       5       6       7       8      9      10
 3.3333 -0.6667 -2.6667 -2.6667 -0.6667  3.3333

Coefficients:
            Estimate Std. Error t value Pr(>|t|)   
(Intercept)  -9.3333     2.8441  -3.282 0.030453 * 
x             7.0000     0.7303   9.585 0.000662 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 3.055 on 4 degrees of freedom
Multiple R-squared:  0.9583, Adjusted R-squared:  0.9478
F-statistic: 91.88 on 1 and 4 DF,  p-value: 0.000662

Mandelbrot set

"Mandelbrot.gif" graphic created in R

This Mandelbrot set example highlights the use of complex numbers. It models the first 20 iterations of the equation z = z2 + c, where c represents different complex constants.

Install the package that provides the write.gif() function beforehand:

install.packages("caTools")

R Source code:

library(caTools)

jet.colors <-
    colorRampPalette(
        c("green", "pink", "#007FFF", "cyan", "#7FFF7F",
          "white", "#FF7F00", "red", "#7F0000"))

dx <- 1500 # define width
dy <- 1400 # define height

C  <-
    complex(
            real = rep(seq(-2.2, 1.0, length.out = dx), each = dy),
            imag = rep(seq(-1.2, 1.2, length.out = dy), times = dx)
            )

# reshape as matrix of complex numbers
C <- matrix(C, dy, dx)

# initialize output 3D array
X <- array(0, c(dy, dx, 20))

Z <- 0

# loop with 20 iterations
for (k in 1:20) {

  # the central difference equation
  Z <- Z^2 + C

  # capture the results
  X[, , k] <- exp(-abs(Z))
}

write.gif(
    X,
    "Mandelbrot.gif",
    col = jet.colors,
    delay = 100)

See also

Further reading

  • Wickham, Hadley; Çetinkaya-Rundel, Mine; Grolemund, Garrett (2023). R for data science: import, tidy, transform, visualize, and model data (2nd ed.). Beijing Boston Farnham Sebastopol Tokyo: O'Reilly. ISBN 978-1-4920-9740-2.
  • Gagolewski, Marek (2024). Deep R Programming. doi:10.5281/ZENODO.7490464. ISBN 978-0-6455719-2-9.

External links

Portal

Notes

  1. ^ This displays to standard error a listing of all the packages that tidyverse depends upon. It may also display two errors showing conflict. The errors may be ignored.

References

  1. ^ a b c Morandat, Frances; Hill, Brandon; Osvald, Leo; Vitek, Jan (11 June 2012). "Evaluating the design of the R language: objects and functions for data analysis". European Conference on Object-Oriented Programming. 2012: 104–131. doi:10.1007/978-3-642-31057-7_6. Retrieved 17 May 2016 – via SpringerLink.
  2. ^ Peter Dalgaard (29 February 2024). "R 4.3.3 is released". Retrieved 1 March 2024.
  3. ^ a b "R - Free Software Directory". directory.fsf.org. Retrieved 26 January 2024.
  4. ^ "R scripts". mercury.webster.edu. Retrieved 17 July 2021.
  5. ^ "R Data Format Family (.rdata, .rda)". Loc.gov. 9 June 2017. Retrieved 17 July 2021.
  6. ^ a b Hornik, Kurt; The R Core Team (12 April 2022). "R FAQ". The Comprehensive R Archive Network. 3.3 What are the differences between R and S?. Archived from the original on 28 December 2022. Retrieved 27 December 2022.
  7. ^ "Introduction". The Julia Manual. Archived from the original on 20 June 2018. Retrieved 5 August 2018.
  8. ^ Giorgi, Federico M.; Ceraolo, Carmine; Mercatelli, Daniele (27 April 2022). "The R Language: An Engine for Bioinformatics and Data Science". Life. 12 (5): 648. Bibcode:2022Life...12..648G. doi:10.3390/life12050648. PMC 9148156. PMID 35629316.
  9. ^ Ihaka, Ross. "The R Project: A Brief History and Thoughts About the Future" (PDF). p. 12. Archived (PDF) from the original on 28 December 2022. Retrieved 27 December 2022. We set a goal of developing enough of a language to teach introductory statistics courses at Auckland.
  10. ^ Hornik, Kurt; The R Core Team (12 April 2022). "R FAQ". The Comprehensive R Archive Network. 2.13 What is the R Foundation?. Archived from the original on 28 December 2022. Retrieved 28 December 2022.
  11. ^ Ihaka, Ross. "R: Past and Future History" (PDF). p. 4. Archived (PDF) from the original on 28 December 2022. Retrieved 28 December 2022.
  12. ^ Ihaka, Ross (5 December 1997). "New R Version for Unix". stat.ethz.ch. Archived from the original on 12 February 2023. Retrieved 12 February 2023.
  13. ^ Ihaka, Ross. "The R Project: A Brief History and Thoughts About the Future" (PDF). p. 18. Archived (PDF) from the original on 28 December 2022. Retrieved 27 December 2022.
  14. ^ Wickham, Hadley; Cetinkaya-Rundel, Mine; Grolemund, Garrett (2023). R for Data Science, Second Edition. O'Reilly. p. xvii. ISBN 978-1-492-09740-2.
  15. ^ Chambers, John M. (2020). "S, R, and Data Science". The R Journal. 12 (1): 462–476. doi:10.32614/RJ-2020-028. ISSN 2073-4859. The R language and related software play a major role in computing for data science. ... R packages provide tools for a wide range of purposes and users.
  16. ^ a b Hornik, Kurt (2012). "The Comprehensive R Archive Network". WIREs Computational Statistics. 4 (4): 394–398. doi:10.1002/wics.1212. ISSN 1939-5108. S2CID 62231320.
  17. ^ Kurt Hornik (23 April 1997). "Announce: CRAN". r-help. Wikidata Q101068595..
  18. ^ "The Status of CRAN Mirrors". cran.r-project.org. Retrieved 30 December 2022.
  19. ^ "CRAN - Contributed Packages". cran.r-project.org. Retrieved 29 December 2022.
  20. ^ a b c Wickham, Hadley; Cetinkaya-Rundel, Mine; Grolemund, Garrett (2023). R for Data Science, Second Edition. O'Reilly. ISBN 978-1-492-09740-2.
  21. ^ Talbot, Justin; DeVito, Zachary; Hanrahan, Pat (1 January 2012). "Riposte: A trace-driven compiler and parallel VM for vector code in R". Proceedings of the 21st international conference on Parallel architectures and compilation techniques. ACM. pp. 43–52. doi:10.1145/2370816.2370825. ISBN 9781450311823. S2CID 1989369.
  22. ^ Jackson, Joab (16 May 2013). TIBCO offers free R to the enterprise. PC World. Retrieved 20 July 2015.
  23. ^ "Looking to the future for R in Azure SQL and SQL Server". 30 June 2021. Retrieved 7 November 2021.
  24. ^ "An Introduction to R. Notes on R: A Programming Environment for Data Analysis and Graphics" (PDF). Retrieved 3 January 2021.
  25. ^ R Development Core Team. "Assignments with the = Operator". Retrieved 11 September 2018.
  26. ^ Kabacoff, Robert (2012). "Quick-R: User-Defined Functions". statmethods.net. Retrieved 28 September 2018.
  27. ^ "R: R News". cran.r-project.org. Retrieved 14 March 2024.