{"id":4344,"date":"2019-01-07T21:12:23","date_gmt":"2019-01-07T19:12:23","guid":{"rendered":"https:\/\/engel-wolf.com\/?p=4344"},"modified":"2019-10-26T23:11:51","modified_gmt":"2019-10-26T21:11:51","slug":"tutorial-an-app-in-r-shiny-visualizing-biopsy-data%e2%80%8a-%e2%80%8ain-a-pharmaceutical-company","status":"publish","type":"post","link":"https:\/\/engel-wolf.com\/?p=4344","title":{"rendered":"Tutorial: An app in R shiny visualizing biopsy data\u200a\u2014\u200ain a pharmaceutical company"},"content":{"rendered":"\n<h3 class=\"wp-block-heading\" id=\"e0fa\">Learn how to build a shiny app for the visualization of clustering results. The app helps to better identify patient data samples, e.g. during a clinical&nbsp;study.<\/h3>\n\n\n\n<p>This tutorial is a joint effort. It was presented by&nbsp;<a href=\"https:\/\/github.com\/olaf-menzer\" rel=\"noreferrer noopener\" target=\"_blank\">Olaf Menzer<\/a>&nbsp;in a workshop at the&nbsp;<a href=\"https:\/\/odsc.com\/training\/portfolio\/visual-elements-of-data-science\" rel=\"noreferrer noopener\" target=\"_blank\">ODSC West Conference in San Francisco in 2018<\/a>.&nbsp;<a href=\"https:\/\/github.com\/zappingseb\" rel=\"noreferrer noopener\" target=\"_blank\">Sebastian Wolf<\/a>&nbsp;co-implemented this application as an expert in bio-pharmaceutical web applications.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"98d8\">What is it&nbsp;about?<\/h3>\n\n\n\n<p>The story behind this app comes from a real application inside departments evaluating clinical studies in diagnostics.<\/p>\n\n\n\n<p>Due to fast recruiting for clinical studies, the patient cohorts seemed to be inhomogeneous. A researcher and a clinical study statistician therefore wanted to find out which parameters reveal patients that do not seem to fit their expected class. Maybe there was a mistake in the labeling of a patient\u2019s disease status? 
Maybe one measurement or two measurements can be used to easily find such patients?<\/p>\n\n\n\n<p>The example data used here is real data from a 1990s study, known as the biopsy data set,&nbsp;<a href=\"https:\/\/archive.ics.uci.edu\/ml\/datasets\/breast+cancer+wisconsin+%28original%29\" rel=\"noreferrer noopener\" target=\"_blank\">also hosted on the UCI ML data repository<\/a>. The app built in this tutorial is preliminary and was built specifically for the tutorial. Pieces of it were applied in real-world biostatistical applications.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"0673\">Starting point<\/h3>\n\n\n\n<p>Please start by forking the tutorial at:&nbsp;<a href=\"https:\/\/github.com\/zappingseb\/biopharma-app\" rel=\"noreferrer noopener\" target=\"_blank\">https:\/\/github.com\/zappingseb\/biopharma-app<\/a><\/p>\n\n\n\n<p class=\"wrap\">and afterwards run the installation of dependencies in your R session:<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: r; title: ; notranslate\" title=\"\">\ninstall.packages(c(&quot;ape&quot;,&quot;dplyr&quot;, &quot;magrittr&quot;, &quot;igraph&quot;, &quot;viridisLite&quot;, &quot;MASS&quot;, &quot;shiny&quot;))\n<\/pre><\/div>\n\n\n<p>Once all the packages are installed, you have one file to work on:&nbsp;<code>app.R<\/code>. This&nbsp;<code>app.R<\/code>&nbsp;allows you to build a&nbsp;<a href=\"https:\/\/shiny.rstudio.com\/\" rel=\"noreferrer noopener\" target=\"_blank\">shiny<\/a>&nbsp;application.<\/p>\n\n\n\n<p>We will use this file to insert the right visualizations and the right table for the researcher to fulfill the task named above. 
To check what the app will finally look like, you can already run&nbsp;<code>runApp()<\/code>&nbsp;in the console and your browser will open the app.<\/p>\n\n\n\n<p>The app already contains:<\/p>\n\n\n\n<p>A&nbsp;<code>sidebarPanel<\/code>&nbsp;that has all the inputs we need<\/p>\n\n\n\n<ul><li>Slider for the # of patients<\/li><li>Slider for the # of desired clusters\/groups<\/li><li>Empty input to choose measurements (to be built by you)<\/li><li>Dropdown field for the clustering method<\/li><\/ul>\n\n\n\n<p>A&nbsp;<code>server<\/code>&nbsp;function that will provide<\/p>\n\n\n\n<ol><li>The empty input to choose measurements<\/li><li>A Heatmap to see the outcome of clustering<\/li><li>A Phylogenetic tree plot to see the outcome of clustering<\/li><li>A table to see the outcome of the clustering<\/li><\/ol>\n\n\n\n<p>It also already provides the input data sets&nbsp;<code>biopsy<\/code>&nbsp;and&nbsp;<code>biopsy_numeric()<\/code>; the latter is called with parentheses because&nbsp;<code>biopsy_numeric<\/code>&nbsp;is a&nbsp;<a href=\"https:\/\/shiny.rstudio.com\/tutorial\/written-tutorial\/lesson6\/\" rel=\"noreferrer noopener\" target=\"_blank\">reactive<\/a>.<\/p>\n\n\n\n<p>In this tutorial we will go through steps 1\u20134 to complete the app.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"41f4\">The input data&nbsp;set<\/h3>\n\n\n\n<p>The data were obtained from the University of Wisconsin Hospitals, Madison, from Dr. William H. Wolberg.&nbsp;<a href=\"http:\/\/rexa.info\/paper\/781d2581b297dad058cf6f1be2a009144b5306fb\" rel=\"noreferrer noopener\" target=\"_blank\">He assessed biopsies of breast tumours for 699 patients<\/a>&nbsp;up to 15 July 1992; each of nine attributes has been scored on a scale of 1 to 10, and the outcome is also known. There are 699 rows and 11 columns.<\/p>\n\n\n\n<p>The data set is available as the variable&nbsp;<code>biopsy<\/code>&nbsp;inside the app. 
Columns 2 to 10 are stored in the reactive&nbsp;<code>biopsy_numeric()<\/code>, which is filtered by the&nbsp;<code>input$patients<\/code>&nbsp;input so that it uses between 1 and 100 patients rather than all 699.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"8332\">1) Construction of a&nbsp;<code>selectInput<\/code><\/h3>\n\n\n\n<p>The&nbsp;<code>selectInput<\/code>&nbsp;lets the user work with just the variables they want instead of all 9 measured variables. This helps to find the measurements that are necessary to classify patients. What is a shiny selectInput? We can look at its description with<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: plain; title: ; notranslate\" title=\"\">\n?shiny::selectInput\n<\/pre><\/div>\n\n\n<p>and see<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/cdn-images-1.medium.com\/max\/800\/0*jH0mGpgdpTmNSSXA.jpg\" alt=\"\"\/><\/figure>\n\n\n\n<p>We now need to set&nbsp;<code>choices<\/code>&nbsp;to the names of columns 2 to 10 of the biopsy data set. The&nbsp;<code>selected<\/code>&nbsp;argument will be the same. We shall allow multiple inputs, so&nbsp;<code>multiple<\/code>&nbsp;will be set to&nbsp;<code>TRUE<\/code>. Additionally, we set the&nbsp;<code>inputId<\/code>&nbsp;to &#8220;vars&#8221;. 
So we can replace the part&nbsp;<code>output$variables<\/code>&nbsp;inside the&nbsp;<code>app.R<\/code>&nbsp;file with this:<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: r; title: ; notranslate\" title=\"\">\noutput$variables &lt;- renderUI({\n  selectInput(inputId=&quot;vars&quot;,\n              label = &quot;Variables to use&quot;,\n              choices = names(biopsy)&#91;2:10],\n              multiple = TRUE,\n              selected = names(biopsy)&#91;2:10]\n   )\n})\n<\/pre><\/div>\n\n\n<p>And you\u2019re done.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"2ee3\">2) A Heatmap to see the outcome of clustering<\/h3>\n\n\n\n<p>The basic&nbsp;<code><a href=\"https:\/\/stat.ethz.ch\/R-manual\/R-patched\/library\/stats\/html\/heatmap.html\" rel=\"noreferrer noopener\" target=\"_blank\">heatmap<\/a><\/code>&nbsp;function allows you to draw a heat map. In this case we would like to change a few things. We would like to change the clustering method inside the&nbsp;<code>hclust<\/code>&nbsp;function to a method defined by the user. We can grab the user defined method by using&nbsp;<code>input$method<\/code>&nbsp;as we already defined this input field as a drop down menu. We have to overwrite the default hclust method with our method by:<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: r; title: ; notranslate\" title=\"\">\nmy_hclust &lt;- function(...){\n  hclust(method=my_method,...)\n}    \nmy_method &lt;&lt;- input$method\n<\/pre><\/div>\n\n\n<p>Be aware that you define a global variable&nbsp;<code>my_method<\/code>&nbsp;here, which suffices within the scope of this tutorial. However, please keep in mind that global variables can be problematic in many other contexts and do your own research what best fits your application.<\/p>\n\n\n\n<p>Now for the heatmap call we basically need to change a few inputs. 
Please see the result:<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: r; title: ; notranslate\" title=\"\">\nheatmap(\n  x = t(as.matrix(biopsy_numeric())), Rowv=NA,\n  hclustfun=my_hclust,\n  labCol =biopsy$&quot;disease status&quot;,\n  col=viridis(15)\n)\n<\/pre><\/div>\n\n\n<p>We need to transpose the biopsy_numeric matrix with&nbsp;<code>t()<\/code>, as we would like to have the patients in columns. As we only cluster in one dimension (the patients), we can switch off the row dendrogram and the row reordering by setting&nbsp;<code>Rowv<\/code>&nbsp;to&nbsp;<code>NA<\/code>. The&nbsp;<code>hclustfun<\/code>&nbsp;is overwritten by our function&nbsp;<code>my_hclust<\/code>.<\/p>\n\n\n\n<p>For coloring of the plot we use the&nbsp;<code>viridis<\/code>&nbsp;palette as it is a color-blind-friendly palette. The column labels shall show the disease status instead of the patient IDs. You can see the names of all columns we defined in the file&nbsp;<code>R\/utils.R<\/code>. There you see that the last column of&nbsp;<code>biopsy<\/code>&nbsp;is called &#8220;disease status&#8221;. This will be used to label each patient. Now we have:<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: r; title: ; notranslate\" title=\"\">\noutput$plot1 &lt;- renderPlot({\n  my_hclust &lt;- function(...){\n    hclust(method=my_method,...)\n  }\n  my_method &lt;&lt;- input$method\n  heatmap(x = t(as.matrix(biopsy_numeric())), Rowv=NA, hclustfun=my_hclust,\n          labCol =biopsy$&quot;disease status&quot;,\n          col=viridis(15)\n  )\n})\n<\/pre><\/div>\n\n\n<p>Part 2 is done.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"24c3\">3) Plot a phylogenetic tree<\/h3>\n\n\n\n<p>To allow plotting a phylogenetic tree we provided you with a function called&nbsp;<code>phyltree<\/code>. You can read the whole code of the function inside&nbsp;<code>R\/utils.R<\/code>. 
This function takes as inputs<\/p>\n\n\n\n<ul><li>A numeric matrix &gt;&nbsp;<code>biopsy_numeric()<\/code>&nbsp;CHECK<\/li><li>The clustering method &gt;&nbsp;<code>input$method<\/code>&nbsp;CHECK<\/li><li>The number of clusters &gt;&nbsp;<code>input$nc<\/code>&nbsp;CHECK<\/li><li>A color function &gt;&nbsp;<code>viridis<\/code>&nbsp;CHECK<\/li><\/ul>\n\n\n\n<p>You can read why&nbsp;<code>()<\/code>&nbsp;is needed after&nbsp;<code>biopsy_numeric<\/code>&nbsp;<a href=\"https:\/\/shiny.rstudio.com\/tutorial\/written-tutorial\/lesson6\/\" rel=\"noreferrer noopener\" target=\"_blank\">here<\/a>.<\/p>\n\n\n\n<p>The hard part is now the labels. The&nbsp;<code>biopsy_numeric<\/code>&nbsp;data set is filtered by the number of patients, so we have to filter the labels, too. For this we use<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: r; title: ; notranslate\" title=\"\">\nlabels = biopsy %&gt;% dplyr::select(&quot;disease status&quot;) %&gt;%\n                filter(row_number() &lt;= input$patients) %&gt;%\n                mutate_all(as.character) %&gt;%\n                pull(&quot;disease status&quot;)\n<\/pre><\/div>\n\n\n<p>This is a pipeline written in a functional style with the R package&nbsp;<code>dplyr<\/code>. The&nbsp;<code>select<\/code>&nbsp;function keeps just the &#8220;disease status&#8221; column. The&nbsp;<code>filter<\/code>&nbsp;function keeps only the first&nbsp;<code>input$patients<\/code>&nbsp;rows. 
The&nbsp;<code>mutate_all<\/code>&nbsp;function applies the&nbsp;<code>as.character<\/code>&nbsp;function to all columns, and finally we export the labels as a vector by using&nbsp;<code>pull<\/code>.<\/p>\n\n\n\n<ul><li>Labels for the tree nodes &gt;&nbsp;<code>biopsy %&gt;%&nbsp;...<\/code>&nbsp;CHECK<\/li><\/ul>\n\n\n\n<p>The final result looks like this:<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: r; title: ; notranslate\" title=\"\">\noutput$plot2 &lt;- renderPlot({\n    phyltree( x = biopsy_numeric(),\n              method = input$method,\n              nc = input$nc,\n              color_func = &quot;viridis&quot;,\n              labels = biopsy %&gt;%\n                 dplyr::select(&quot;disease status&quot;) %&gt;%\n                 filter(row_number() &lt;= input$patients) %&gt;%\n                 mutate_all(as.character) %&gt;%\n                 pull(&quot;disease status&quot;)\n    )\n})\n<\/pre><\/div>\n\n\n<h3 class=\"wp-block-heading\" id=\"e285\">4) Create a table from clustering results<\/h3>\n\n\n\n<p>Now we would also like to see which cluster each patient was assigned to. To do so, we perform the clustering and tree cutting ourselves:<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: r; title: ; notranslate\" title=\"\">\nclust &lt;- hclust(dist(biopsy_numeric()), method = input$method)\ncluster_assigment &lt;- cutree(clust, k = input$nc) # cluster assignment\n<\/pre><\/div>\n\n\n<p>The vector&nbsp;<code>cluster_assigment<\/code>&nbsp;now holds the cluster number of each patient, such as&nbsp;<code>c(1,2,1,1,1,2,2,1,...)<\/code>. 
This information becomes helpful when we combine it with the patient ID and the disease status that was recorded in the patient\u2019s forms.<\/p>\n\n\n\n<p>We do this with R\u2019s&nbsp;<code>cbind<\/code>&nbsp;function:<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: r; title: ; notranslate\" title=\"\">\nout_table &lt;- cbind(\n  cluster_assigment,\n  biopsy %&gt;%\n    filter(row_number() &lt;= length(cluster_assigment)) %&gt;%\n    dplyr::select(c(1,11))\n) # cbind\n<\/pre><\/div>\n\n\n<p>Now we sort this table by&nbsp;<code>cluster_assigment<\/code>&nbsp;to see more quickly which patients landed in the wrong cluster.<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: r; title: ; notranslate\" title=\"\">\nout_table %&gt;% arrange(cluster_assigment)\n<\/pre><\/div>\n\n\n<p>The final code:<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: r; title: ; notranslate\" title=\"\">\noutput$cluster_table &lt;- renderTable({\n  # --------- perform clustering ----------------\n  # Clustering via Hierarchical Clustering\n  clust &lt;- hclust(dist(biopsy_numeric()), method = input$method)\n  cluster_assigment &lt;- cutree(clust, k = input$nc) # cluster assignment\n\n  # Create a table with the clusters, Patient IDs and Disease status\n  out_table &lt;- cbind(\n    cluster_assigment,\n    biopsy %&gt;%\n      filter(row_number() &lt;= length(cluster_assigment)) %&gt;%\n      dplyr::select(c(1,11))\n  ) # cbind\n\n  # Order by cluster_assigment\n  out_table %&gt;%\n    arrange(cluster_assigment)\n})\n<\/pre><\/div>\n\n\n<p>Done.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"a485\">What to do&nbsp;now?<\/h3>\n\n\n\n<p>Now you can run the&nbsp;<code>runApp()<\/code>&nbsp;function.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/cdn-images-1.medium.com\/max\/800\/0*hFbSaqo3RErXjxUr.jpg\" 
alt=\"\"\/><\/figure>\n\n\n\n<p>If you choose 100 patients, 2 clusters, \u201cward.D2\u201d clustering and all variables, you will quickly see that the patients<\/p>\n\n\n\n<ul><li>1002945<\/li><li>1016277<\/li><li>1018099<\/li><li>1096800<\/li><\/ul>\n\n\n\n<p>can be identified as the patients that were clustered incorrectly. Now you can search for problems in the clustering or look at the sheets of those patients. By changing the labeling, e.g. using patient IDs inside the&nbsp;<code>phyltree<\/code>&nbsp;function call, you can even check which other patients have measurements close to those of these patients.&nbsp;<strong>Explore and play!<\/strong><\/p>\n\n\n\n<p>This article was also posted in the <a href=\"https:\/\/medium.com\/datadriveninvestor\/tutorial-an-app-in-r-shiny-visualizing-biopsy-data-in-a-pharmaceutical-company-f15f06395f3e?source=rss-dbc9f652035a------2\">Data Driven Investor<\/a>.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Learn how to build a shiny app for the visualization of clustering results. The app helps to better identify patient data samples, e.g. during a clinical&nbsp;study. 
This tutorial is a [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":4348,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"ngg_post_thumbnail":0,"footnotes":""},"categories":[1],"tags":[386,387,384,385,381,388],"_links":{"self":[{"href":"https:\/\/engel-wolf.com\/index.php?rest_route=\/wp\/v2\/posts\/4344"}],"collection":[{"href":"https:\/\/engel-wolf.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/engel-wolf.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/engel-wolf.com\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/engel-wolf.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=4344"}],"version-history":[{"count":5,"href":"https:\/\/engel-wolf.com\/index.php?rest_route=\/wp\/v2\/posts\/4344\/revisions"}],"predecessor-version":[{"id":4355,"href":"https:\/\/engel-wolf.com\/index.php?rest_route=\/wp\/v2\/posts\/4344\/revisions\/4355"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/engel-wolf.com\/index.php?rest_route=\/wp\/v2\/media\/4348"}],"wp:attachment":[{"href":"https:\/\/engel-wolf.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=4344"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/engel-wolf.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=4344"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/engel-wolf.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=4344"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}