Programming structure assignment (Doctors diagnosis and BP)

In this week's assignment we are given mock data on patients' frequency of hospital visits (Freq), their blood pressure (BP), and three different doctors' ratings of each patient's condition. The first doctor is a general doctor who simply rates the patient as "bad" or "good"; the other two are external doctors who rate the patient's condition based on a decision regarding immediate care ("low" or "high"). We are told to give these ratings a numerical representation of either 1 or 0 (bad = 1, good = 0; low = 0, high = 1), which is the coding used in the vectors below.
The variables are: Freq, BP, First, Second, Final.
1.  "0.6","103","bad","low","low"
2.  "0.3","87","bad","low","high"
3.  "0.4","32","bad","high","low"
4.  "0.4","42","bad","high","high"
5.  "0.2","59","good","low","low"
6.  "0.6","109","good","low","high"
7.  "0.3","78","good","high","low"
8.  "0.4","205","good","high","high"
9.  "0.9","135","NA","high","high"
10. "0.2","176","bad","high","high"

Our main goals this week are to make a side-by-side boxplot and a histogram of the data presented to us.
We are also expected to discuss whatever results we are able to get from the data.

Let us begin: 

First we need to convert each variable to a vector:

>Freq <- c(0.6,0.3,0.4,0.4,0.2,0.6,0.3,0.4,0.9,0.2)
>BP <- c(103,87,32,42,59,109,78,205,135,176)
>First <- c(1,1,1,1,0,0,0,0,NA,1)
>Second <- c(0,0,1,1,0,0,1,1,1,1)
>Final <- c(0,1,0,1,0,1,0,1,1,1)
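As a side note, the 0/1 vectors above do not have to be typed by hand. Here is a minimal sketch, assuming the labels are first stored as character vectors (the names firstLab, secondLab and finalLab are my own, not part of the assignment) and assuming the coding bad = 1, good = 0, low = 0, high = 1 that the First, Second and Final vectors above follow:

```r
# Hypothetical label vectors, copied from the data listing above.
# The "NA" entry for patient 9 is stored as a real missing value.
firstLab  <- c("bad","bad","bad","bad","good","good","good","good",NA,"bad")
secondLab <- c("low","low","high","high","low","low","high","high","high","high")
finalLab  <- c("low","high","low","high","low","high","low","high","high","high")

# ifelse() keeps NA as NA, so the missing rating survives the conversion
First  <- ifelse(firstLab == "bad", 1, 0)
Second <- ifelse(secondLab == "high", 1, 0)
Final  <- ifelse(finalLab == "high", 1, 0)
```

This gives the same First, Second and Final vectors as the ones typed in above.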


From here I decided to use what we learned last session and organize all of our data in a data frame:


>docdf <- data.frame(Freq,BP,First,Second,Final,stringsAsFactors = FALSE)

Now that we have our data organized we can decide how we want to analyze it.
I decided to focus on how the patients' frequency of hospital visits and blood pressure relate to how all three doctors rated the patient's condition.
What I will do is take the three doctors' ratings (1 or 0), add them up for each patient, and use the majority rating. A sum greater than 1 (at least two of the three doctors rated 1) represents a patient the doctors were, overall, concerned about; a sum of 1 or less means the doctors were not critically concerned.
Here is the function I came up with to represent these data as a side-by-side boxplot:

# This function creates a boxplot of either Freq or BP, grouped by the MDs' overall rating.
# df is the data frame; colm is the column number (1 = Freq, 2 = BP).
> plotBox <- function(df,colm){
+ if (colm !=1 && colm!=2){return("Please pick either colm 1 or 2")} # A check to make sure users enter either 1 or 2 for colm
+ docs <- vector()   # sum of the three ratings for each patient
+ zeros <- vector()  # Freq/BP values of patients with at most one rating of 1
+ ones <- vector()   # Freq/BP values of patients with two or more ratings of 1
+ for(i in 1:nrow(df)){
+   docs <- c(docs,(sum(df[i,3:5],na.rm = TRUE)))
+   if(docs[i] >1){ones <- c(ones,df[i,colm])} else{zeros <- c(zeros,df[i,colm])}
+ }
+ if(colm ==1){return(
+   boxplot(ones,zeros,
+   main= "Boxplot of frequency values based on overall MDs' rating",
+   names= c("Concerned","Unconcerned"),
+   ylab ="Frequency of hospital visits in a 12 month period"))
+ }
+ else if (colm==2){return(
+   boxplot(ones,zeros,
+   main= "Boxplot of BP values based on overall MDs' rating",
+   names= c("Concerned","Unconcerned"),
+   ylab ="BP Values"))
+ }
+ }

First in our function we state the arguments that will be used. To keep things modular we have the df argument for the data frame and colm represents either Freq or BP, 1 or 2 respectively. 

I added an instant return message if the user entered a value not 1 or 2.
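
One design note on this check: returning a string means the caller gets back an ordinary value rather than an error. A possible alternative, shown here only as a sketch with a hypothetical helper name checkColm, is to raise an error with stop():

```r
# checkColm is a hypothetical helper, not part of the original function
checkColm <- function(colm){
  if (!(colm %in% c(1, 2))) {
    stop("Please pick either colm 1 or 2")
  }
  invisible(TRUE)
}

checkColm(1)  # a valid column number passes silently

# An invalid column number raises an error; tryCatch() captures its message
msg <- tryCatch(checkColm(3), error = function(e) conditionMessage(e))
print(msg)
```

Either approach works for a small script like this one; stop() mainly helps when the function is called from other code that should not carry on with a bad argument.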

(Note: originally I had two different functions, one for the Freq boxplot and one for the BP boxplot. Since the code in both was largely a repeat except for a few variables, I decided to combine the two to reduce the size of the overall code, and instead used an extra argument to switch between Freq and BP.)

Next I have three empty vector variables.
docs will hold the sum of the three doctors' ratings for each patient, filled in by the for loop (more about the loop below).
zeros and ones use that sum: the patient's BP/Freq is added to ones if at least two doctors rated 1, and to zeros otherwise. This is explained further in the for loop section next.

In our for loop, we go through each row of the data frame. Since every column of a data frame has the same number of rows, the loop visits each patient once, determines the overall consensus of the doctors, and assigns the BP/Freq to the respective category. The first line of the for loop counts up all the 1s among the doctors. Note that we needed to add na.rm = TRUE to the sum function to ignore the NA in the data for the First doctor.

+ docs<- c(docs,(sum(df[i,3:5],na.rm = TRUE)))

The next line looks at the sum and determines if that value belongs in zeros or ones

if(docs[i] >1){ones <- c(ones,df[i,colm])} else{zeros <- c(zeros,df[i,colm])}
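
For comparison, the same grouping can be done without a loop. This is only a sketch of an alternative, not the function used in this post; it rebuilds the data frame so it runs on its own, and the names total and concerned are my own:

```r
Freq   <- c(0.6,0.3,0.4,0.4,0.2,0.6,0.3,0.4,0.9,0.2)
BP     <- c(103,87,32,42,59,109,78,205,135,176)
First  <- c(1,1,1,1,0,0,0,0,NA,1)
Second <- c(0,0,1,1,0,0,1,1,1,1)
Final  <- c(0,1,0,1,0,1,0,1,1,1)
docdf  <- data.frame(Freq,BP,First,Second,Final,stringsAsFactors = FALSE)

# rowSums() adds the three ratings per patient; na.rm = TRUE skips the missing rating
total <- rowSums(docdf[, 3:5], na.rm = TRUE)

# TRUE where at least two doctors rated 1
concerned <- total > 1

# Split BP (column 2) into the two groups, as the loop does when colm is 2
ones  <- docdf$BP[concerned]
zeros <- docdf$BP[!concerned]
```

With ten rows the loop is perfectly fine; the vectorized form is just shorter to read.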

Finally, the last part of the function checks whether we are focusing on the patients' frequency of hospital visits or their BP, and returns a labeled boxplot accordingly.

Here are the plots:

From our line:
>plotBox(docdf,1)

and our line:
>plotBox(docdf,2)



For the histogram I simply ran the hist function on our Freq vector to get a visual idea of how often patients are coming to the hospital.
hist(Freq)
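
If you want the histogram labeled like the boxplots, hist() accepts the same main and xlab arguments. A small sketch (the title and axis label are my own wording):

```r
Freq <- c(0.6,0.3,0.4,0.4,0.2,0.6,0.3,0.4,0.9,0.2)

# hist() draws the plot and also returns the bin breaks and counts
h <- hist(Freq,
          main = "Histogram of hospital visit frequency",
          xlab = "Freq")

# Number of patients falling in each bin
print(h$counts)
```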


Graphs and figures are fine, but I decided it would be nice to get some hard numbers on BP and Freq based on the doctors' overall consensus. So I came up with a function that returns the mean of either the blood pressure or the frequency of hospital visits, for either the patients the doctors were concerned about (1) or the patients they were not concerned about (0).

# This function is similar to plotBox, but only returns a mean value.
# df is the data frame, colm is the patient characteristic we are interested in (1 = Freq, 2 = BP),
# and oz is either 1 or 0, depending on which overall MD rating the user is interested in.
>meanofMD <- function(df,colm,oz) {
+  if (colm !=1 && colm!=2){return("Please enter either 1 or 2 for colm")}
+  if (oz != 1 && oz !=0 ){return("Please enter either 1 or 0 for oz")}
+  docs <-vector() 
+  zeros = vector() 
+  ones = vector()
+  for(i in 1:nrow(df)){
+    docs<- c(docs,(sum(df[i,3:5],na.rm = TRUE)))
+    if(docs[i] >1){ones <- c(ones,df[i,colm])} else{zeros <- c(zeros,df[i,colm])}}
+  if (oz == 0){return(mean(zeros))}
+  else{return(mean(ones))}
+}  

The main loop and logic of this function are very much the same as in the plotBox function we made earlier.
The arguments are mostly the same as plotBox's, but I added one more argument, oz, that is either 1 or 0; this returns the mean for either the patients the doctors were concerned about (1) or the patients they were not concerned about (0).

The main difference from plotBox is in the last two lines: instead of returning a boxplot we return a mean:
+ if (oz == 0){return(mean(zeros))}
+ else{return(mean(ones))}

I ran the four different possible means with the following lines of code:
>meanofMD(docdf,2,1) #mean of BP of patients MD found concerning
>meanofMD(docdf,2,0) #mean of BP of patients MD were not concerned
>meanofMD(docdf,1,1) #mean of Freq of patients MD found concerning
>meanofMD(docdf,1,0) #mean of Freq of patients MD were not concerned

Results:
> meanofMD(docdf,2,1)
[1] 112.8333
> meanofMD(docdf,2,0)
[1] 87.25
> meanofMD(docdf,1,1)
[1] 0.4333333
> meanofMD(docdf,1,0)
[1] 0.425
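
As a cross-check, the same four means can be computed in one step with tapply(). Again this is a sketch rather than the method used above; the data frame is rebuilt so the block runs on its own, and group is a name I made up:

```r
Freq   <- c(0.6,0.3,0.4,0.4,0.2,0.6,0.3,0.4,0.9,0.2)
BP     <- c(103,87,32,42,59,109,78,205,135,176)
First  <- c(1,1,1,1,0,0,0,0,NA,1)
Second <- c(0,0,1,1,0,0,1,1,1,1)
Final  <- c(0,1,0,1,0,1,0,1,1,1)
docdf  <- data.frame(Freq,BP,First,Second,Final,stringsAsFactors = FALSE)

# Label each patient by the majority rating (two or more ratings of 1 = "Concerned")
group <- ifelse(rowSums(docdf[, 3:5], na.rm = TRUE) > 1, "Concerned", "Unconcerned")

# tapply() applies mean() to BP and Freq within each group
bpMeans   <- tapply(docdf$BP, group, mean)
freqMeans <- tapply(docdf$Freq, group, mean)

print(bpMeans)
print(freqMeans)
```

These agree with the values returned by meanofMD above.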

Here we can see a noticeable difference in mean BP between the patients the doctors were concerned about (about 112.8) and the patients they were not too concerned about (87.25).
The frequency of hospital visits, on the other hand, barely differs between the two groups (about 0.43 versus 0.425) and may not play much of a role in whether a doctor feels the patient is in need of critical care.

This is just one simple take on the data given. Hope you enjoy the post.
Please feel free to comment about any concerns you may have.

-Anthony 
