{"id":4385,"date":"2018-12-09T16:56:36","date_gmt":"2018-12-09T14:56:36","guid":{"rendered":"https:\/\/engel-wolf.com\/?p=4385"},"modified":"2019-10-27T17:06:31","modified_gmt":"2019-10-27T15:06:31","slug":"interesting-packages-taken-from-r-pharma","status":"publish","type":"post","link":"https:\/\/engel-wolf.com\/?p=4385","title":{"rendered":"Interesting packages taken from R\/Pharma"},"content":{"rendered":"\n<p>A few months ago I attended the R\/Pharma conference in Cambridge, MA.<\/p>\n\n\n\n<p>As a takeaway, I thought about my own projects and how I could improve them using solutions others had presented. In&nbsp;<em>R<\/em>, such solutions mostly come as&nbsp;<em>R packages<\/em>. I\u2019m an R Shiny programmer in a regulated environment, so the solutions I picked are mainly helpful if you provide a) Shiny apps, b) statistical packages, or c) verified solutions. Let\u2019s look at the R packages I did not know before and now find really useful:<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"d1e7\">Packrat<\/h3>\n\n\n\n<p>What most of my co-workers produce are reports; tons of statistical reports, actually. Because we work in a regulated environment, all reports are double-checked: you program a report, and someone else independently programs it, too. You do not want to waste time chasing a differing number just because a mathematical package was updated in between. There is a really nice solution for that.<\/p>\n\n\n\n<p>Packrat allows you to store all packages you use for a certain session or project. The main guide for packrat can be found in&nbsp;<a href=\"https:\/\/rstudio.github.io\/packrat\/commands.html\" rel=\"noreferrer noopener\" target=\"_blank\">RStudio\u2019s packrat documentation<\/a>.<\/p>\n\n\n\n<p>Packrat not only stores all packages, but also all project files. It is integrated into RStudio\u2019s user interface. 
This allows you to share projects with co-workers really fast.<\/p>\n\n\n\n<p>The main drawback I see is the need for a server where you store all these packages. RStudio\u2019s new&nbsp;<a href=\"https:\/\/www.rstudio.com\/products\/package-manager\/\" rel=\"noreferrer noopener\" target=\"_blank\">Package Manager<\/a>&nbsp;should solve this. Another disadvantage is the incompatibility with some packages: I could not use the BH package with packrat under R 3.4.2 and had to find a workaround.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"19c8\">Diffdf<\/h3>\n\n\n\n<p>I have to admit that on some days I wasted nearly 30% of my time comparing data.frames. It is an important task when testing statistical outcomes or any calculations done with a statistical application you built. In pharmaceutical and diagnostic applications, one of the most relevant aspects is the validity of data. This applies to the data we use in clinical studies and in quality assurance, but also to data we simply get from a co-worker on a daily basis. For me, this task has not only been hard to do, but even harder to document.<\/p>\n\n\n\n<p>The&nbsp;<a href=\"https:\/\/cran.r-project.org\/web\/packages\/diffdf\/index.html\" rel=\"noreferrer noopener\" target=\"_blank\">diffdf<\/a>&nbsp;package by&nbsp;<a href=\"https:\/\/github.com\/kieranjmartin\" rel=\"noreferrer noopener\" target=\"_blank\">Kieran Martin<\/a>&nbsp;really solved that problem for me. 
It provides not only a neat interface, but also well-arranged output.<\/p>\n\n\n\n<p>The basic diffdf example looks like this:<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: r; title: ; notranslate\" title=\"\">\nlibrary(diffdf)\n# Copy iris and introduce deliberate differences\niris2 &lt;- iris\n# Change one value in each of the first three numeric columns\nfor (i in 1:3) iris2&#91;i,i] &lt;- i^2\n# Add a column that does not exist in the base data\niris2$new_var &lt;- &quot;hello&quot;\n# Change the class of the Species column\nclass(iris2$Species) &lt;- &quot;some class&quot;\n# Compare the base data (iris) with the compare data (iris2)\ndiffdf(iris, iris2)\n<\/pre><\/div>\n\n\n<p>As you can see, one column is newly introduced, three values are changed in three different numeric columns, and the class of one column is changed. Each of these three kinds of changes is reported in its own section of the output. Additionally, things that did not change are mentioned as well, which can be really helpful in case you do not check for full equality of the data frames you are comparing.<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: plain; title: ; notranslate\" title=\"\">\nDifferences found between the objects!\n\nA summary is given below.\n\nThere are columns in BASE and COMPARE with different classes !!\nAll rows are shown in table below\n\n  ==================================\n   VARIABLE  CLASS.BASE  CLASS.COMP \n  ----------------------------------\n   Species     factor    some class \n  ----------------------------------\n\nThere are columns in COMPARE that are not in BASE !!\nAll rows are shown in table below\n\n  =========\n   COLUMNS \n  ---------\n   new_var \n  ---------\n\nNot all Values Compared Equal\nAll rows are shown in table below\n\n  =================================\n     Variable    No of Differences \n  ---------------------------------\n   Sepal.Length          1         \n   Sepal.Width           1         \n   Petal.Length          1         \n  ---------------------------------\n\n\nAll rows are shown in table below\n\n  ============================================\n     VARIABLE    ..ROWNUMBER..  
BASE  COMPARE \n  --------------------------------------------\n   Sepal.Length        1        5.1      1    \n  --------------------------------------------\n\n\nAll rows are shown in table below\n\n  ===========================================\n    VARIABLE    ..ROWNUMBER..  BASE  COMPARE \n  -------------------------------------------\n   Sepal.Width        2         3       4    \n  -------------------------------------------\n\n\nAll rows are shown in table below\n\n  ============================================\n     VARIABLE    ..ROWNUMBER..  BASE  COMPARE \n  --------------------------------------------\n   Petal.Length        3        1.3      9    \n  --------------------------------------------\n<\/pre><\/div>\n\n\n<p>The output is easy to read and covers all the information you need for the task at hand: comparing two data frames. What I really like is the quick feedback on how many differences were observed. If you have a lot of differences, e.g. because you added +1 to every value of a column, you can immediately see this in the summary.<\/p>\n\n\n\n<p>Additionally, the detail view, which shows not only the differing values but also their position in the table, is a huge advantage. When analyzing large patient cohorts, a single difference may show up in measurement 99,880, and you do not want to scroll through a table of \u201cmatches\u201d until you find it. This detail view is what sets diffdf apart from other packages.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"3319\">Archivist<\/h3>\n\n\n\n<p>The archivist package is designed to improve the management of data analysis results. 
Key functionalities of this package include:<\/p>\n\n\n\n<p>(i) management of local and remote repositories which contain R objects and their meta-data (objects\u2019 properties and relations between them);<\/p>\n\n\n\n<p>(ii) archiving R objects to repositories;<\/p>\n\n\n\n<p>(iii) sharing and retrieving objects (and their pedigree) by their unique hooks;<\/p>\n\n\n\n<p>(iv) searching for objects with specific properties or relations to other objects;<\/p>\n\n\n\n<p>(v) verification of an object\u2019s identity and the context of its creation.<\/p>\n\n\n\n<p>This can be really important for reproducible data analytics. In pharmaceutical projects you often have to reproduce cases after a really long time. The archivist package allows you to store models, data sets and whole R objects, which can also be functions or expressions, in files. You can keep these files in long-term data storage, and even after 10 years, using packrat + archivist, you\u2019ll be able to reproduce your study.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\" id=\"a030\">Example for tasks (iii) and (iv): restore&nbsp;models<\/h4>\n\n\n\n<p>This example searches the remote&nbsp;<code>pbiecek\/graphGallery<\/code>&nbsp;repository on GitHub for all stored objects of class&nbsp;<code>lm<\/code>, restores them, and sorts them by their BIC.<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: r; title: ; notranslate\" title=\"\">\nlibrary(archivist)\n# Search the GitHub repository for all stored lm objects and load them\nmodels &lt;- asearch(&quot;pbiecek\/graphGallery&quot;, patterns = &quot;class:lm&quot;)\n# Compute the BIC of every restored model\nmodelsBIC &lt;- sapply(models, BIC)\n# Sort the models from lowest (best) to highest BIC\nsort(modelsBIC)\n<\/pre><\/div>\n\n\n<h4 class=\"wp-block-heading\" id=\"e7f0\">Example for task (i): store objects&nbsp;locally<\/h4>\n\n\n\n<p>A data.frame comes with my repository at&nbsp;<a href=\"https:\/\/github.com\/zappingseb\/RPharma2018packages#archivist\" rel=\"noreferrer noopener\" target=\"_blank\">https:\/\/github.com\/zappingseb\/RPharma2018packages<\/a>&nbsp;in the&nbsp;<code>arepo<\/code>&nbsp;folder. 
Your task is to create a new data.frame, store it in the&nbsp;<code>arepo_new<\/code>&nbsp;folder and add it to the restored data.frame. If everything works out, the sum of the two data.frames equals 2 at position (1,1).<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: r; title: ; notranslate\" title=\"\">\nlibrary(archivist)\n\n# Create a new local repository and make it the default one\nrepo &lt;- &quot;arepo_new&quot;\ncreateLocalRepo(repoDir = repo, default = TRUE)\n\n# Create a small data.frame and archive it in the new default repository\ndf &lt;- data.frame(x=c(1,2),y=c(2,3))\nsaveToRepo(df)\n\n# Switch the default repository to the arepo folder from my GitHub repository\nsetLocalRepo(&quot;arepo&quot;)\n# Restore the stored data.frame by its md5 hash\ndf2 &lt;- loadFromLocalRepo(&quot;4a7369a8c51cb1e7efda0b46dad8195e&quot;,value = TRUE)\n\n# Add both data.frames; if everything worked, position (1,1) equals 2\ndf_test &lt;- df + df2\n\nprint(df_test&#91;1,1]==2)\n<\/pre><\/div>\n\n\n<p>As this task shows, my old data.frame is stored not only as a data.frame, but also under a distinct and reproducible md5 hash. This makes it incredibly easy to find it again years later and to demonstrate that it is exactly the piece you needed.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"1ebc\">logR<\/h3>\n\n\n\n<p>The&nbsp;<a href=\"https:\/\/github.com\/jangorecki\/logR\" rel=\"noreferrer noopener\" target=\"_blank\">logR<\/a>&nbsp;package can be used to log the steps of your analysis. If your analysis has many steps and you need to know how long each one takes, what its status was (error, warning) and what the exact call was, you can use logR to record everything that was done. To do so, logR connects to a PostgreSQL database and logs all steps of your analysis there. I highly recommend logR if you are not sure whether your analysis will run through a second time. logR checks each of your steps, so any failure is recorded. If your next step only runs because some environment variable happened to be set, you can see this, too. 
Here is the basic example for logR from the&nbsp;<a href=\"https:\/\/github.com\/jangorecki\" rel=\"noreferrer noopener\" target=\"_blank\">author<\/a>:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">library(logR)<br><br># setup connection, default to env vars: `POSTGRES_DB`, etc.<br># if you have docker then: docker run --rm -p 127.0.0.1:5432:5432 -e POSTGRES_PASSWORD=postgres --name pg-logr postgres:9.5<br>logR_connect()<br># [1] TRUE<br><br># create logr table<br>logR_schema()<br><br># make some logging and calls<br><br>logR(1+2) # OK<br>#[1] 3<br>logR(log(-1)) # warning<br>#[1] NaN<br>f = function() stop(\"an error\")<br>logR(r &lt;- f()) # stop<br>#NULL<br>g = function(n) data.frame(a=sample(letters, n, TRUE))<br>logR(df &lt;- g(4)) # out rows<br>#  a<br>#1 u<br>#2 c<br>#3 w<br>#4 p<br><br># try CTRL+C \/ 'stop' button to interrupt<br>logR(Sys.sleep(15))<br><br># wrapper to: dbReadTable(conn = getOption(\"logR.conn\"), name = \"logr\")<br>logR_dump()<br>#   logr_id              logr_start          expr    status alert                logr_end      timing in_rows out_rows  mail message cond_call  cond_message<br>#1:       1 2016-02-08 16:35:00.148         1 + 2   success FALSE 2016-02-08 16:35:00.157 0.000049163      NA       NA FALSE      NA        NA            NA<br>#2:       2 2016-02-08 16:35:00.164       log(-1)   warning  TRUE 2016-02-08 16:35:00.171 0.000170801      NA       NA FALSE      NA   log(-1) NaNs produced<br>#3:       3 2016-02-08 16:35:00.180      r &lt;- f()     error  TRUE 2016-02-08 16:35:00.187 0.000136896      NA       NA FALSE      NA       f()      an error<br>#4:       4 2016-02-08 16:35:00.197    df &lt;- g(4)   success FALSE 2016-02-08 16:35:00.213 0.000696145      NA        4 FALSE      NA        NA            NA<br>#5:       5 2016-02-08 16:35:00.223 Sys.sleep(15) interrupt  TRUE 2016-02-08 16:35:05.434 5.202319000      NA       NA FALSE      NA        NA            NA<\/pre>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"a490\"><a 
href=\"https:\/\/github.com\/zappingseb\/RPharma2018packages#rinno\" rel=\"noreferrer noopener\" target=\"_blank\">RInno<\/a>\u200a\u2014\u200aShiny Apps as Windows Applications<\/h3>\n\n\n\n<p>We often build Shiny apps that need local PC settings to work well. For example, we built a Shiny app that accesses a MySQL database using the Active Directory login of the user. To get the Active Directory credentials without a login window, we simply ran the Shiny app locally. As not all users in the department knew how to start R and call&nbsp;<code>runApp()<\/code>, RInno sounds like a great solution to me.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/cdn-images-1.medium.com\/max\/800\/1*dwoDG0P8ESvNXyblczyWIA.png\" alt=\"\"\/><\/figure>\n\n\n\n<p>RInno packs your Shiny app into an&nbsp;<code>.exe<\/code>&nbsp;file that your users can run directly on their PC. This also allows them to use fancy ggplot features on locally stored Excel files, which can be really important if the data is under security protection and cannot be uploaded to a server. 
The&nbsp;<a href=\"https:\/\/github.com\/ficonsulting\/RInno\" rel=\"noreferrer noopener\" target=\"_blank\">tutorial<\/a>&nbsp;provided by the developers helps a lot in understanding the approach and how to apply it.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"b14a\">Yardstick<\/h3>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/cdn-images-1.medium.com\/max\/600\/0*Gx2bExzFYDD7W5aO\" alt=\"\"\/><figcaption>Photo by&nbsp;<a href=\"https:\/\/unsplash.com\/@sernarial?utm_source=medium&amp;utm_medium=referral\" rel=\"noreferrer noopener\" target=\"_blank\">patricia serna<\/a>&nbsp;on&nbsp;<a href=\"https:\/\/unsplash.com?utm_source=medium&amp;utm_medium=referral\" rel=\"noreferrer noopener\" target=\"_blank\">Unsplash<\/a><\/figcaption><\/figure>\n\n\n\n<p>Yardstick is a package containing a wide range of measures you might need to evaluate the predictive power of a statistical model. I came across this package in one of the first sessions. Whenever we thought about a proper way to measure the difference between our model and the data, we discussed a lot of different approaches. Of course it\u2019s simple to write&nbsp;<code>sqrt(mean((x - y)^2))<\/code>, but it looks way better in yardstick:&nbsp;<code>solubility_test %&gt;% rmse(solubility, prediction)<\/code>. With yardstick you know where your data is coming from, and you can easily exchange the function&nbsp;<code>rmse<\/code>&nbsp;for another metric, while in the hand-written version you need to re-code the whole calculation. Yardstick will save our team a lot of discussions in the future. I\u2019m happy it came out.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>A few months ago I attended the R\/Pharma conference in Cambridge, MA. As a takeaway, I thought about my own projects and how I could improve them using solutions others had presented. 
[&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":4386,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"ngg_post_thumbnail":0,"footnotes":""},"categories":[1],"tags":[405,408,406,387,384,407,404,381],"_links":{"self":[{"href":"https:\/\/engel-wolf.com\/index.php?rest_route=\/wp\/v2\/posts\/4385"}],"collection":[{"href":"https:\/\/engel-wolf.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/engel-wolf.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/engel-wolf.com\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/engel-wolf.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=4385"}],"version-history":[{"count":1,"href":"https:\/\/engel-wolf.com\/index.php?rest_route=\/wp\/v2\/posts\/4385\/revisions"}],"predecessor-version":[{"id":4387,"href":"https:\/\/engel-wolf.com\/index.php?rest_route=\/wp\/v2\/posts\/4385\/revisions\/4387"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/engel-wolf.com\/index.php?rest_route=\/wp\/v2\/media\/4386"}],"wp:attachment":[{"href":"https:\/\/engel-wolf.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=4385"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/engel-wolf.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=4385"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/engel-wolf.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=4385"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}