Posts

Assignment #10: Building Your Own R Package

The plot_summary() function automatically generates visualizations for a dataset. It begins by loading the ggplot2 package, which is used for creating graphs. The function then separates the dataset into numeric and categorical variables using sapply() with is.numeric and is.factor. This allows the function to treat each type of data appropriately. For numeric variables, a loop creates histograms that display the distribution of values. For categorical variables, another loop creates bar charts that show the frequency of each category. Each plot is printed using print(p), which is needed because ggplot objects created inside a loop are not displayed automatically; printing makes the graphs appear in the RStudio Plots pane right away. The interactive_chart() function creates an interactive scatter plot of two selected variables. It loads both ggplot2 and plotly: ggplot2 builds the initial plot and plotly adds interactivity. The function takes a dataset and two column names as inputs,...
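The excerpt above describes plot_summary() without showing it, so here is a minimal sketch of what such a function could look like. The argument name `data`, the bin count, the plot titles, and the returned list of plots are assumptions made for illustration, not the package's actual code.

```r
library(ggplot2)

# Hypothetical reconstruction of plot_summary(); details are assumptions.
plot_summary <- function(data) {
  # Flag which columns are numeric and which are factors
  is_num <- sapply(data, is.numeric)
  is_cat <- sapply(data, is.factor)
  plots <- list()

  # One histogram per numeric column
  for (col in names(data)[is_num]) {
    p <- ggplot(data, aes(x = .data[[col]])) +
      geom_histogram(bins = 30) +
      labs(title = paste("Distribution of", col), x = col)
    print(p)  # explicit print is required inside a loop
    plots[[col]] <- p
  }

  # One bar chart per factor column
  for (col in names(data)[is_cat]) {
    p <- ggplot(data, aes(x = .data[[col]])) +
      geom_bar() +
      labs(title = paste("Frequency of", col), x = col)
    print(p)
    plots[[col]] <- p
  }

  # Return the plots invisibly so they can be reused (an added convenience)
  invisible(plots)
}

# Example with the built-in iris data: four numeric columns, one factor
plots <- plot_summary(iris)
```

Returning the plots invisibly keeps the console clean while still letting a caller save or modify individual graphs.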

Module 9: Visualization in R - Base Graphics, Lattice, and ggplot2

Base R graphics provide a simple and direct way to create visualizations. The plot() function creates a scatter plot showing the relationship between weight and fuel efficiency. The hist() function displays the distribution of horsepower. These functions are straightforward but require manual adjustments for styling. Lattice graphics are useful for creating grouped or conditioned plots. The xyplot() function creates one scatter plot per cylinder group, allowing for easy comparison. The bwplot() function shows how horsepower varies across cylinder groups. Lattice uses formulas, making it efficient for grouped data visualization. The ggplot2 system uses a layered approach based on the grammar of graphics. The ggplot() function builds plots layer by layer. geom_point() adds data points, while geom_smooth() adds a regression line. Faceting allows us to split the histogram by cylinder groups. This system is highly customizable and produces professional-q...
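The three systems described above can be compared side by side in a short sketch. It assumes the built-in mtcars dataset (the variables named in the excerpt match it, but the excerpt does not name the dataset), and the axis labels, bin count, and `method = "lm"` setting are illustrative choices.

```r
library(lattice)
library(ggplot2)

# Base R: scatter plot of weight vs. fuel efficiency, histogram of horsepower
plot(mtcars$wt, mtcars$mpg,
     xlab = "Weight (1000 lbs)", ylab = "Miles per gallon")
hist(mtcars$hp, xlab = "Horsepower", main = "Distribution of horsepower")

# Lattice: formula interface, one panel per cylinder group
lat_scatter <- xyplot(mpg ~ wt | factor(cyl), data = mtcars)
lat_box <- bwplot(hp ~ factor(cyl), data = mtcars)
print(lat_scatter)
print(lat_box)

# ggplot2: layers added with +, faceting splits the histogram by cylinders
gg_scatter <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)
gg_hist <- ggplot(mtcars, aes(x = hp)) +
  geom_histogram(bins = 10) +
  facet_wrap(~ cyl)
print(gg_scatter)
print(gg_hist)
```

Note that geom_smooth() only draws a straight regression line when `method = "lm"` is given; its default is a smoothed curve.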

Module 8 Input/Output, String Manipulation and plyr package

First, the required package plyr is installed and loaded. This package allows grouped operations on datasets. The dataset is then imported using read.table() with file.choose(), which opens a prompt to select the file from the computer. Next, the ddply() function groups the dataset by the Sex variable and calculates the mean of the Grade column for each group. A new variable called Grade.Average is created containing the calculated means. Finally, the results are written to a text file using write.table() so they can be saved outside of R. The line y <- ddply(x, "Sex", transform, Grade.Average = mean(Grade)) calculates the average grade grouped by the Sex category. The ddply() function splits the dataset by the values in the Sex column and calculates the mean of the Grade column for each group. Because transform is used, every original row is kept; the result is stored in a new variable called y, with a new column called Grade.Average holding the mean of that row's group. The command write.table(y,...
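To see what the ddply() line does without choosing a file, the sketch below runs it on a small made-up data frame. The rows, the `Name` column, the output path, and the write.table() arguments are all assumptions for illustration; only the ddply() call itself comes from the post.

```r
library(plyr)

# Hypothetical stand-in for the file selected with file.choose()
x <- data.frame(
  Name  = c("A", "B", "C", "D"),
  Sex   = c("F", "M", "F", "M"),
  Grade = c(90, 80, 70, 60)
)

# Split by Sex, then add the group mean to every row of that group
y <- ddply(x, "Sex", transform, Grade.Average = mean(Grade))
print(y)

# Save the result outside R (a temporary file is used here as an assumption)
out_file <- tempfile(fileext = ".txt")
write.table(y, out_file, sep = ",", row.names = FALSE)
```

Using transform keeps all four rows; swapping it for summarise would instead return one row per group.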

Module 7 R Object: S3 vs. S4 Assignment

I used the built-in dataset mtcars to answer all the questions. I used the head() function to display the first six rows so we can see the structure of the data. This helps us understand what type of object we are working with before applying the functions. The summary() function is a generic function that detects that mtcars is a data frame. Its output provides descriptive statistics (min, max, mean, quartiles) for each variable. I got a "figure margins too large" error because the plot window was too small. I had to use the par() function to reset the plot margins to their default size. The plot() function creates a graph of miles per gallon values. The graph shows the results after using the plot() function. I used class(mtcars) to make sure the object is a "data.frame" and the isS4() function to check whether the object is S4. Since isS4() returned FALSE, mtcars is not an S4 object; data frames use the S3 system. Next, I use ...
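The checks described above can be reproduced with a few lines of base R. This is a sketch of the steps as the excerpt describes them; the exact par() call the author used is not shown, so resetting `mar` to R's default values is an assumption.

```r
# Inspect the data and let the generic summary() dispatch on the data frame
head(mtcars)
summary(mtcars)

# Reset margins to R's defaults; one common fix for "figure margins too large"
old_par <- par(mar = c(5.1, 4.1, 4.1, 2.1))
plot(mtcars$mpg, ylab = "Miles per gallon")
par(old_par)

# Check which object system mtcars belongs to
obj_class <- class(mtcars)
is_s4 <- isS4(mtcars)
print(obj_class)
print(is_s4)
```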

Module 6 Assignment

First, two small matrices are created and used to practice matrix addition and subtraction. These operations combine values based on their position in the matrix. Addition adds corresponding elements, while subtraction finds the difference between elements in the same row and column positions. This demonstrates how matrix arithmetic works element by element. Since the matrices are created with the matrix() function, R fills them by column. Adding the numbers in corresponding positions gives the value in each spot of the result. Matrix addition adds corresponding positions: 2 + 5 = 7, 1 + 4 = 5, 0 + 2 = 2, and 3 + (-1) = 2. Subtraction works the same way, only this time we subtract the values in each position. Matrix subtraction subtracts corresponding elements: 2 - 5 = -3, 1 - 4 = -3, 0 - 2 = -2, and 3 - (-1) = 4. Next, the assignment uses the diag() function to build a diagonal matrix...
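The arithmetic above can be checked directly in R. The sketch below assumes the sums are listed row by row, which places the entries as 2, 1 in the first row of A and 0, 3 in the second (and 5, 4 and 2, -1 for B); that layout is inferred from the listed sums and is not shown in the excerpt.

```r
# matrix() fills by column, so c(2, 0, 1, 3) gives rows (2, 1) and (0, 3)
A <- matrix(c(2, 0, 1, 3), ncol = 2)
B <- matrix(c(5, 2, 4, -1), ncol = 2)

# Element-by-element addition and subtraction
sum_AB <- A + B
diff_AB <- A - B

print(sum_AB)
print(diff_AB)
```

Reading each result row by row gives 7, 5, 2, 2 for the sum and -3, -3, -2, 4 for the difference, matching the values listed above.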

Module 5

This creates a 10 x 10 matrix in R. Matrix A contains the numbers 1 to 100, and matrix B contains the numbers 1 to 1,000. The matrix() function arranges the numbers into rows and columns so they can be used in matrix calculations. Specifying nrow = 10 tells R to place the numbers into 10 rows. Since there are 100 values total, R fills 10 columns automatically. That is how you get the 10 x 10 square matrix. 1:1000 creates a sequence from 1 to 1000. The matrix() function arranges these numbers into 10 rows, and since there are 1000 numbers total, R creates 100 columns (1000 / 10 = 100). So, B becomes a 10 x 100 matrix, which is rectangular rather than square. This matters because only square matrices can have determinants and inverses. The rm() function is used to remove existing objects from memory, ensuring that no old variables interfere with your results. 1:100 created numbers from 1 to 100. These numbers were arranged into 10 rows, producing a 10 x 10 matrix for A. 1:1000 c...
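A short sketch makes the shapes described above concrete. The helper `is_square()` and the tryCatch() wrapper are hypothetical additions for illustration; the two matrix() calls follow the excerpt.

```r
# 100 values in 10 rows gives 10 columns; 1000 values gives 100 columns
A <- matrix(1:100, nrow = 10)
B <- matrix(1:1000, nrow = 10)

print(dim(A))
print(dim(B))

# Hypothetical helper: a matrix is square when rows equal columns
is_square <- function(m) nrow(m) == ncol(m)

# det() only works on square matrices, so calling it on B raises an error;
# tryCatch() captures the message instead of stopping the script
b_det <- tryCatch(det(B), error = function(e) conditionMessage(e))
print(b_det)
```

Because matrix() fills by column, the first column of A holds 1 to 10 and the second starts at 11.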

Module 4 Assignment

I entered each variable from the module's data as a named vector. The boxplot shows that patients who received a high final emergency decision usually had higher blood pressure values than those who received low care. The high-care group also showed a wider range of blood pressure levels, including extreme values. This shows a clear difference between patients who received low care and those who received high care. Patients in the low-care group had lower and more tightly grouped blood pressure measurements, which indicates that their blood pressure stays within a lower, more stable range and makes a low emergency care decision more likely. The histogram shows that most patients have blood pressure below 120, but there are a number of high outliers above 150, including one extreme case above 200. These higher blood pressure values appear to influence the doctors' final decision to be flagged for high emer...