Explore essential concepts of R programming for data science, from data types and basic syntax to core functions and data visualization principles. Challenge your understanding of foundational R skills needed for effective data analysis and statistical computing.
Which of the following is the correct way to assign the value 10 to a variable called ‘count’ in R?
Explanation: The standard assignment operator in R is the left arrow '<-', so 'count <- 10' assigns the value properly. 'count =: 10' is a syntax error, and 'count := 10' fails in base R because no ':=' function is defined there. '10 -> count' is valid but rarely used, since it reverses the usual right-to-left direction of assignment.
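As a quick illustration of the operators discussed above, the following sketch assigns the same value in three ways. The variable names 'count_right' and 'count_eq' are made up for the example, and the '=' form is an addition that the explanation does not cover.

```r
# Leftward assignment: the conventional form
count <- 10

# Rightward assignment is valid but rarely used
10 -> count_right

# '=' also assigns at the top level (not covered in the explanation above)
count_eq = 10

print(count)                          # [1] 10
print(identical(count, count_right))  # [1] TRUE
```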
What is the class of the following R object: x <- c(TRUE, FALSE, TRUE)?
Explanation: The call c(TRUE, FALSE, TRUE) creates a logical vector, which holds boolean values. 'numeric' would be for numbers, 'character' is used for text, and 'factor' is for categorical variables. Only 'logical' matches the type in this context.
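The class of each kind of vector can be checked directly with 'class()'. This is a minimal sketch; the comparison vectors are invented for illustration.

```r
x <- c(TRUE, FALSE, TRUE)

class(x)   # "logical"
sum(x)     # 2, because TRUE counts as 1 and FALSE as 0

# The other classes named in the explanation, for comparison
class(c(1.5, 2))             # "numeric"
class(c("a", "b"))           # "character"
class(factor(c("a", "b")))   # "factor"
```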
If you want to create a sequence of numbers from 1 to 5 in R, which expression is correct?
Explanation: The expression '1:5' produces a sequence of integers from 1 to 5. '1..5' is not valid R syntax, '[1:5]' resembles subsetting but not sequence generation, and 'seq{1,5}' uses incorrect brackets and syntax for the function.
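A short sketch of the colon operator next to the 'seq()' function, which takes its arguments in parentheses. The 'by' example is an addition for illustration.

```r
s <- 1:5
s          # 1 2 3 4 5
class(s)   # "integer"

# seq() with parentheses gives the same values
s2 <- seq(1, 5)

# seq() also accepts a step size, which ':' does not
odds <- seq(1, 5, by = 2)   # 1 3 5
```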
Suppose you have survey responses stored as: c('yes', 'no', 'yes'). Which R data type should you use to represent these as categorical values?
Explanation: Factors are used to represent categorical data such as survey responses. 'Numeric' applies to numbers, 'logical' is for TRUE/FALSE values, and 'matrix' is a two-dimensional structure rather than a type for categorical variables.
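The survey responses from the question can be converted with 'factor()'. This sketch shows the resulting levels and counts; the variable names are illustrative.

```r
responses <- c("yes", "no", "yes")
f <- factor(responses)

levels(f)       # "no" "yes" (levels are sorted alphabetically by default)
table(f)        # counts per category: no = 1, yes = 2
as.integer(f)   # 2 1 2, the underlying integer codes
```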
What is the primary characteristic that distinguishes a data frame from a matrix in R?
Explanation: Data frames can store a different data type in each column, while a matrix requires all of its elements to share one type. A data frame is not limited to numeric values. The ability to hold lists is not the distinguishing feature, and both structures are rectangular, so every column must have the same length.
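The difference is easy to see by building both structures from mixed inputs. In this sketch the column names and values are invented, and 'stringsAsFactors = FALSE' is set explicitly so the result does not depend on the R version.

```r
# A data frame keeps each column's own type
df <- data.frame(
  id = 1:3,
  name = c("a", "b", "c"),
  passed = c(TRUE, FALSE, TRUE),
  stringsAsFactors = FALSE
)
sapply(df, class)   # id: "integer", name: "character", passed: "logical"

# A matrix coerces everything to one common type
m <- cbind(1:3, c("a", "b", "c"))
typeof(m)   # "character"
m[1, 1]     # "1", the number has become text
```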
To see the first six rows of a data frame named 'df' in R, which function would you use?
Explanation: The 'head' function returns the first part of a data object, by default the first six rows. 'start(df)' does not do this: 'start' in the stats package reports the start time of a time series. 'first(df)' is not a base R function. 'view(df)' may open an interactive viewer in some environments but does not print rows to the console.
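A minimal sketch of 'head()' on a small made-up data frame, including the optional second argument that changes how many rows are returned.

```r
# Example data: ten rows, invented for illustration
df <- data.frame(n = 1:10, square = (1:10)^2)

first_six <- head(df)        # default: the first 6 rows
first_three <- head(df, 3)   # ask for a different number of rows

nrow(first_six)     # 6
nrow(first_three)   # 3
```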
How would you select only the second column from a data frame 'data' using R?
Explanation: The correct way to select the second column is with 'data[,2]'. 'data[2,]' selects the second row. 'data$2' is invalid syntax because the dollar sign must be followed by a column name, not a position. 'data[2]' also picks the second column but returns it as a one-column data frame rather than as a vector.
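The indexing forms compared above behave as follows on a small example. The data frame and its column names 'a' and 'b' are hypothetical, and the '[[ ]]' form is an addition not covered in the explanation.

```r
data <- data.frame(a = 1:3, b = c(10, 20, 30))

col_vec <- data[, 2]    # second column as a plain vector: 10 20 30
col_df  <- data[2]      # second column, still a one-column data frame
row_two <- data[2, ]    # second row, as a one-row data frame

# Selecting by name with '$' or by position with '[[ ]]' also gives a vector
identical(col_vec, data$b)      # TRUE
identical(col_vec, data[[2]])   # TRUE
```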
Which R function is used to import a CSV file into a data frame?
Explanation: The 'read.csv()' function reads comma-separated files into data frames. 'csv.read()', 'import.csv()', and 'readfile()' are not built-in R functions. Only 'read.csv()' matches the standard data import syntax.
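To try 'read.csv()' without an existing file, the sketch below first writes a tiny CSV to a temporary location. The file contents and the names 'path' and 'scores' are made up for the example.

```r
# Write a small CSV file to a temporary path (illustrative data)
path <- tempfile(fileext = ".csv")
writeLines(c("name,score", "Ana,90", "Ben,85"), path)

# Read it back; the first line is treated as the header by default
scores <- read.csv(path)
str(scores)

# Clean up the temporary file
unlink(path)
```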
Which function would you use in R to create a basic scatter plot of variables x and y?
Explanation: 'plot(x, y)' creates a scatter plot of two variables in base R. 'graph', 'map', and 'chart' are not base R graphics functions or their syntax is incorrect. Only 'plot' directly produces the intended visualization.
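A minimal scatter plot sketch with invented data. The 'pdf()' and 'dev.off()' calls are only there so the example also runs as a script, assuming the plot should be saved to a temporary file; in an interactive session 'plot(x, y)' alone draws to the screen.

```r
x <- 1:10
y <- x^2

# Send the plot to a temporary PDF file (an assumption for non-interactive use)
out_file <- tempfile(fileext = ".pdf")
pdf(out_file)

# Base R scatter plot; the labels and title are optional extras
plot(x, y, xlab = "x", ylab = "y", main = "Scatter plot of y against x")

# Close the device so the file is written
dev.off()
```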
Which symbol does R use to denote missing values in a vector or data frame?
Explanation: 'NA' is the standard symbol for missing values in R data structures. 'NULL' represents the absence of an object, while 'NaN' denotes 'Not a Number' from invalid calculations. 'BLANK' is not a standard missing value indicator in R.
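The three values contrasted above can be told apart in a few lines. This is a sketch with an invented vector 'v'.

```r
v <- c(4, NA, 7)

is.na(v)                # FALSE TRUE FALSE
mean(v)                 # NA, because the missing value propagates
mean(v, na.rm = TRUE)   # 5.5, computed after dropping the NA

# NULL is an empty object, not a missing element
length(NULL)            # 0

# NaN comes from undefined arithmetic such as 0 / 0
is.nan(0 / 0)           # TRUE
```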