Posts

R Final project package: Introducing muMotif

Welcome! Greetings and welcome to the write-up of the muMotif package. GitHub link to muMotif: https://github.com/Ant-nguyen/RmuMotif This was the final project for our R programming course. The goal of this package is to provide some tools that help one look up motifs in DNA sequences. Motifs can be thought of as patterns or hidden messages that stand out in DNA. They can be useful for narrowing down regions of interest in large DNA sequences. Much of the logic and principles of motif search were inspired by: https://www.bioinformaticsalgorithms.org/ Throughout this blog I hope to explain the concepts and thought processes behind the package. If you find some of these concepts interesting, or if my explanation is lacking, I highly recommend the link above. Let us begin! Quick Tour: muMotif is a package of functions that help one find motifs in strings of DNA. The package is also set up to allow for quick visualization of some of the results. There are two broad s...
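
To make "patterns that stand out" a bit more concrete, here is a minimal sketch that counts every k-mer (substring of length k) in a DNA string and reports the most frequent ones. The helper name `count_kmers` and the example sequence are assumptions for illustration only; this is not muMotif's own code or API.

```r
# Hypothetical helper, not part of muMotif: count every k-mer in a DNA string.
count_kmers <- function(dna, k) {
  n <- nchar(dna)
  if (k < 1 || k > n) stop("k must be between 1 and nchar(dna)")
  # Start position of every window of width k.
  starts <- seq_len(n - k + 1)
  kmers <- substring(dna, starts, starts + k - 1)
  table(kmers)
}

# Made-up example sequence for illustration.
counts <- count_kmers("ACGTTGCATGTCGCATGATGCATGAGAGCT", 4)

# The k-mers that occur most often.
top <- names(counts)[as.vector(counts) == max(counts)]
top
```

Counting overlapping windows like this is only one simple way to surface repeated patterns; a real motif search usually also has to tolerate mismatches.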

Module 12: R Markdown

This week we will be working with R Markdown and exploring some of its functions and formatting. Our assignment this week is simply to make an R Markdown file and play around with some of the features, so that we can use R Markdown in the future to present R input and output neatly. GitHub link: https://github.com/Ant-nguyen/Intro_r_2021/blob/main/Module%2012.Rmd Let us begin! First we have to make a .Rmd file. This is very simple in RStudio, where all we have to do is create an R Markdown file, much like when creating a new script or project. From here we are prompted for what we want to name this file and the desired output type. R Markdown can produce many different document types, including HTML, PDF, Word documents and many more. Here I decided to simply use HTML. One thing to note about creating an R Markdown file with RStudio is that it automatically includes a little write-up about what R Markdown is and some examples of its features. I removed this f...

Module 11: Debugging and defensive programming.

This week we learned about debugging and ways we can tackle bugs in our code. Our assignment this week is to debug the following function: tukey_multiple <- function(x) {    outliers <- array(TRUE,dim=dim(x))    for (j in 1:ncol(x))     {     outliers[,j] <- outliers[,j] && tukey.outlier(x[,j])     } outlier.vec <- vector(length=nrow(x))     for (i in 1:nrow(x))     { outlier.vec[i] <- all(outliers[i,]) } return(outlier.vec) } There are deliberate bugs included in this code, and our goal is to use our debugging tools to find what possible issues may lie in the function. GitHub link: https://github.com/Ant-nguyen/Intro_r_2021/blob/main/Module%2011.R Let us begin! First things first, we should look at our function and get an idea of what it does. This is where it gets tricky. There are no comments in the code (# comments) or inherent restrictions on input that give u...
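
One thing worth keeping in mind when reading a loop like the one above is that R has two "and" operators that behave differently on vectors. The sketch below uses made-up logical vectors (the names `flags_a` and `flags_b` are illustrative) and does not claim to reproduce the assignment's data or its intended answer.

```r
# Two made-up logical vectors, e.g. outlier flags for the same three rows.
flags_a <- c(TRUE, FALSE, TRUE)
flags_b <- c(TRUE, TRUE, FALSE)

# `&` is vectorized: it combines the vectors element by element.
both <- flags_a & flags_b

# `&&` is meant for single TRUE/FALSE values, such as an if() condition.
single <- flags_a[1] && flags_b[1]

both
single
```

From R 4.3.0 onward, giving `&&` an operand longer than one element is an error, which makes this kind of mix-up easier to spot.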

Module 10: Building your own R package

This week we begin to dive into the procedures needed to create an official R package. Our assignment this week was to create a package in R and create a DESCRIPTION file. GitHub link: https://github.com/Ant-nguyen/Intro_r_2021/blob/main/DESCRIPTION Let us begin! First things first, we will be heavily reliant on two main packages: 1. devtools 2. roxygen2 devtools will provide us with most of the functions needed to create a package, and roxygen2 is a great package that streamlines much of the documentation needed in R packages. We simply have to install the packages: >install.packages("devtools") >library("devtools") >install.packages("roxygen2") >library("roxygen2") Unfortunately for me there was an error with my "backports" package that prevented me from installing and using devtools correctly. Upon searching online I found a way to reinstall the backports package and install devtools again. This fixed my issue: ...

Module 9 : Visualization

This week we will be exploring some of the different visualization options available in R. We will mainly be comparing the built-in R plot, the lattice package and the ggplot2 package. We will discuss the different syntax and abilities of each method. Our assignment was to find a dataset from the following link: https://vincentarelbundock.github.io/Rdatasets/datasets.html and, using the three visualization methods listed above, create some visualizations with the data. GitHub link: https://github.com/Ant-nguyen/Intro_r_2021/blob/main/Module%209.R Let us begin! The dataset I decided to use was the Wong dataset. https://vincentarelbundock.github.io/Rdatasets/doc/carData/Wong.html This dataset looks at the IQ of patients after being in a coma. I downloaded the csv file into the same directory where R is installed and ran the following line of code to input the data into R: > wong<- read.table("Wong.CSV",header =...

Module 8 : I/O, string manipulation and plyr package

This week we will be exploring the input and output (I/O) capabilities of R, as well as the usefulness of the tools in the plyr package. For this assignment we are given three different tasks: 1. We must import a text file that contains a data set of the Name, Age, Sex and Grade of 20 students and then, using the plyr package, organize the data based on sex and include a new column with the grade average of their sex. 2. We need to take our data and output it in CSV (Comma Separated Value) format. 3. We will take the original data set and extract the data for only the students that have the character "i" in their name, and then output the results to a file in CSV format. GitHub link: https://github.com/Ant-nguyen/Intro_r_2021/blob/main/Module%208.R Let us begin! First we need to input the text file into R. To do this I used the read.table() function. This is a specific function for files that are already in a table-like format. I used the arguments header = TRUE and sep =...
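
The three tasks above can be sketched roughly as follows. This is only an illustration under stated assumptions: the four students are made-up stand-ins for the real file, the column name `Grade.Average` is hypothetical, and base R's `ave()` is swapped in for plyr so the sketch runs without extra packages.

```r
# Made-up stand-in for the students file (the real data is not shown here).
students <- data.frame(
  Name  = c("Raul", "Booker", "Lauri", "Leonie"),
  Age   = c(25, 18, 21, 21),
  Sex   = c("Male", "Male", "Female", "Female"),
  Grade = c(80, 83, 90, 91)
)

# Task 1: add each student's group mean grade by sex.
# ave() (default FUN = mean) is a base R stand-in for plyr here.
students$Grade.Average <- ave(students$Grade, students$Sex)

# Task 3: keep only students whose name contains a lowercase "i".
i_students <- students[grepl("i", students$Name), ]

# Tasks 2 and 3: write the result to a CSV file (a temporary file here),
# then read it back to check what was written.
out_file <- tempfile(fileext = ".csv")
write.csv(i_students, out_file, row.names = FALSE)
read.csv(out_file)
```

Note that `grepl("i", ...)` is case sensitive, so a name that only contains a capital "I" would not match unless `ignore.case = TRUE` is added.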

Module 7: Object Oriented Programming

This week we will be exploring object-oriented programming (OOP). GitHub link: https://github.com/Ant-nguyen/Intro_r_2021/blob/main/Module7.R Our assignment this week is to obtain any type of data and then try to determine what generic functions can be assigned to the data set, then see how we can utilize S3 and S4 class structures and OOP paradigms with the data. This week I decided to challenge myself a bit and use a slightly unconventional type of data, a sequence of genomic data. The goal is to try to utilize OOP methods when tackling a sequence of DNA. For those who are unfamiliar with DNA transcription, I will be covering some of the basics throughout, but the following video is a simple-to-understand explanation of some of the concepts I will be dealing with: Link to DNA data: https://www.ncbi.nlm.nih.gov/nuccore/JX262162 Raw genomic sequence: https://www.ncbi.nlm.nih.gov/nuccore/JX262162.1?report=fasta Let us begin! First let me begin by explaining w...
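
As a rough idea of what attaching generic functions to a DNA sequence can look like, here is a minimal S3 sketch. The class name `dna_seq`, the constructor `new_dna_seq` and the short example sequence are all assumptions for illustration; they are not taken from the post's own code, and S4 is not covered here.

```r
# Hypothetical S3 class for a DNA sequence; the name "dna_seq" is illustrative.
new_dna_seq <- function(x) {
  x <- toupper(x)
  if (grepl("[^ACGT]", x)) stop("sequence may only contain A, C, G, T")
  structure(list(bases = x), class = "dna_seq")
}

# Method for the generic print(): dispatched for objects of class dna_seq.
print.dna_seq <- function(x, ...) {
  cat("DNA sequence of length", nchar(x$bases), ":", x$bases, "\n")
  invisible(x)
}

# Method for the generic summary(): counts of each base in the sequence.
summary.dna_seq <- function(object, ...) {
  table(strsplit(object$bases, "")[[1]])
}

# Made-up example sequence.
s <- new_dna_seq("atggcgta")
print(s)
base_counts <- summary(s)
base_counts
```

Because `print()` and `summary()` are generics, the same calls keep working on other objects while picking up the DNA-specific behaviour for `dna_seq`.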