Module 11: Debugging and defensive programming.

This week we learned about debugging and ways we can tackle bugs in our code. Our assignment this week is to debug the following function:
tukey_multiple <- function(x) {
   outliers <- array(TRUE,dim=dim(x))
   for (j in 1:ncol(x))
    {
    outliers[,j] <- outliers[,j] && tukey.outlier(x[,j])
    }
outlier.vec <- vector(length=nrow(x))
    for (i in 1:nrow(x))
    { outlier.vec[i] <- all(outliers[i,]) } return(outlier.vec) }

There are deliberate bugs included in this code and our goal is to use our debugging tools to find what possible issues may lie in the function.



Let us begin!

First things first: we should look at our function and get an idea of what it does. This is where it gets tricky. There are no comments in the code (# comments) and no restrictions on the input that would give us an idea of what the function does. Based on the name of the function and some of its variable names, it looks like it is intended to look at outliers and return a vector of Boolean values. Using this, let's see if we can find some bugs in the code.

As we learned in the presentation this week, the first step to debugging is realizing we have a bug! The first thing that comes to mind is to just run the code and see what we get.
Entering the lines of code into the R console, we get:

Error: unexpected symbol in:
"    for (i in 1:nrow(x))
    { outlier.vec[i] <- all(outliers[i,]) } return"


An error! It seems that we have a syntax error in the code. R does not expect the return statement on the same line as the closing brace of the for loop, with nothing separating the two. We can easily fix this by making sure the return statement has its own line.
>tukey_multiple <- function(x) {
  outliers <- array(TRUE,dim=dim(x))
  for (j in 1:ncol(x))
  {
    outliers[,j] <- outliers[,j] && tukey.outlier(x[,j])
  }
  outlier.vec <- vector(length=nrow(x))
  for (i in 1:nrow(x))
  { outlier.vec[i] <- all(outliers[i,]) } 
  return(outlier.vec) }  # Own line outside of the for loop
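
The parse error can also be reproduced in isolation. The sketch below is only an illustration and not part of the assignment: the helper parses() and the two code strings are made up for the example. It relies on parse(text = ...), which checks syntax without running anything.

```r
# Hypothetical helper: TRUE if the code string is valid R syntax, FALSE otherwise
parses <- function(code) {
  tryCatch({
    parse(text = code)
    TRUE
  }, error = function(e) FALSE)
}

# return() directly after a closing brace on the same line, as in the assignment
bad  <- "f <- function(x) { for (i in 1:2) { x <- x + i } return(x) }"
# Same code with a semicolon separating the loop from return()
good <- "f <- function(x) { for (i in 1:2) { x <- x + i }; return(x) }"

bad_ok  <- parses(bad)   # FALSE: unexpected symbol
good_ok <- parses(good)  # TRUE
```

A newline or a semicolon both work as separators; giving return() its own line is simply the more readable choice.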

Now the function code is recognized by R!
But just because R recognizes our code doesn't mean it actually works...
Here we come across the main barrier in debugging code that we aren't familiar with: we can only infer what the function's input is, without really knowing what it is intended to be.
I assume that the input must be some sort of data array, since the first line of the function makes a placeholder array that matches the dimensions of our input:
>tukey_multiple <- function(x) {
  outliers <- array(TRUE,dim=dim(x))
  for (j in 1:ncol(x))
 ...}

Also, based on the name of the variable (outliers) and the name of the function (tukey), I assume it has something to do with statistical outliers.
Okay, so from this it is most likely that we are dealing with several columns of statistical data and we want to find outliers.
I decided to look up some reading on the Tukey multiple test, and based on what I read, the function doesn't really seem to be doing any of the statistical calculation I expected. I'm stumped!
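
Much of this guesswork would go away if the function stated its input requirements up front. Here is a minimal sketch of what that could look like, assuming (and it is only an assumption) that the input is meant to be a numeric matrix or two-dimensional array. The name check_matrix_input is made up for this example.

```r
# Hypothetical input check; the requirements below are assumptions,
# not something the original assignment states
check_matrix_input <- function(x) {
  if (!is.numeric(x)) {
    stop("x must be numeric")
  }
  if (length(dim(x)) != 2) {
    stop("x must have two dimensions (rows and columns)")
  }
  invisible(TRUE)
}

ok_matrix <- check_matrix_input(matrix(1:6, nrow = 2))       # passes
bad_vector <- try(check_matrix_input(1:5), silent = TRUE)    # fails: no dimensions
```

Calling a check like this at the top of a function makes a wrong input fail early, with a readable message, instead of somewhere deep inside a loop.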

I decided the best thing to do now is to not worry about what the function is trying to do, since that is not the goal of this assignment, and to focus more on what bugs may show up when we try to run the function. So I run debug() on tukey_multiple. This marks the function for debugging, so when we call it the debug browser pops up and lets us see what each step of the function is doing.

I run tukey_multiple() with a test input I created:
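
As a rough sketch of that workflow, here is the same sequence on a small stand-in function. col_sums is made up for this example; debug(), isdebugged() and undebug() are base R. The call that would open the browser is commented out so the snippet also runs outside an interactive session.

```r
# Hypothetical stand-in function, used only to illustrate the workflow
col_sums <- function(x) {
  totals <- numeric(ncol(x))
  for (j in seq_len(ncol(x))) {
    totals[j] <- sum(x[, j])
  }
  totals
}

debug(col_sums)                  # flag the function for step-through debugging
flagged <- isdebugged(col_sums)  # TRUE while the flag is set

# In an interactive session, the next call opens the browser:
#   col_sums(matrix(1:6, nrow = 2))
# Inside the browser: n = run the next line, c = continue, Q = quit

undebug(col_sums)                # remove the flag when finished
```

Forgetting undebug() means the browser opens on every later call, so it is worth making it a habit.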
test <- array(data = c(1,2,3,4,5,10,15,25,30,15,20,25,40,20,100),dim = c(5,3))
     [,1] [,2] [,3]
[1,]    1   10   20
[2,]    2   15   25
[3,]    3   25   40
[4,]    4   30   20
[5,]    5   15  100

As the debug browser opens, I go through each line until I reach one crucial error!
Error in tukey.outlier(x[, j]) : could not find function "tukey.outlier"
This makes sense, since we don't have a tukey.outlier() function...
To solve this issue, I created a simple function that returns TRUE or FALSE depending on whether there is an outlier in a collection of numbers, using the Tukey outlier method (https://en.wikipedia.org/wiki/Outlier):
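tukey.outlier <- function(x) {
  # Quartiles: quant[2] is the 25th percentile, quant[4] the 75th
  quant <- quantile(x)
  iqr <- quant[4] - quant[2]
  # TRUE if any value lies beyond the 1.5 * IQR fences
  return(any(x < quant[2] - 1.5 * iqr) || any(x > quant[4] + 1.5 * iqr))
}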

Now I run the function again to see whether the debugger hits any errors. None!
This does not necessarily mean the code is bug free, but based on my limited knowledge of the intention of this function, at least we managed to overcome all the errors we know of.
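
One more candidate bug is worth a look: the loop combines a whole column with && instead of &. && is meant for single TRUE/FALSE values, and as far as I know recent versions of R (4.3.0 and later) raise an error when it is given a longer vector, while older versions silently used only the first element. The sketch below is my guess at a fix, not a confirmed solution. It assumes the function is meant to flag rows that are outliers in every column, and the names tukey.outlier.each, tukey_multiple_vec and extreme are made up for the example.

```r
# Hypothetical elementwise helper: one TRUE/FALSE per value, not one per column
tukey.outlier.each <- function(x) {
  q <- quantile(x, c(0.25, 0.75), names = FALSE)
  iqr <- q[2] - q[1]
  x < q[1] - 1.5 * iqr | x > q[2] + 1.5 * iqr
}

# Hypothetical variant of tukey_multiple that uses the vectorized `&`
tukey_multiple_vec <- function(x) {
  outliers <- array(TRUE, dim = dim(x))
  for (j in seq_len(ncol(x))) {
    outliers[, j] <- outliers[, j] & tukey.outlier.each(x[, j])
  }
  apply(outliers, 1, all)  # TRUE only for rows flagged in every column
}

# The test array used earlier in this post
test <- array(data = c(1,2,3,4,5,10,15,25,30,15,20,25,40,20,100), dim = c(5,3))
result_test <- tukey_multiple_vec(test)

# Made-up input where the last row is extreme in every column
extreme <- cbind(c(1, 2, 3, 4, 50), c(10, 15, 25, 30, 300), c(20, 25, 40, 20, 100))
result_extreme <- tukey_multiple_vec(extreme)
```

With the test array, only the 100 in the third column falls outside the fences, so no row is flagged in every column and the result is all FALSE. With the made-up extreme input, only the fifth row comes back TRUE.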

Conclusion

In this assignment I learned quite a bit about debugging and also about presentation. I truly see how important notes, comments, and communication are when it comes to presenting code. Unfortunately, I was unable to really debug the code against a desired output, since I wasn't sure what output was expected (let alone what input). Still, hands-on use of the debug browser was very informative: the step-by-step system lets you see exactly what is going on at each moment in your code, so you can pinpoint where errors may lie. I came across a lot of hurdles and confusion, but I think overall I gained a better understanding of how to be a better coder!

- Anthony 
