{"id":4322,"date":"2018-09-25T22:29:49","date_gmt":"2018-09-25T20:29:49","guid":{"rendered":"https:\/\/engel-wolf.com\/?p=4322"},"modified":"2019-10-26T23:11:51","modified_gmt":"2019-10-26T21:11:51","slug":"how-to-get-people-r-ready-in-an-hour%e2%80%8a-%e2%80%8athors-hammer","status":"publish","type":"post","link":"https:\/\/engel-wolf.com\/?p=4322","title":{"rendered":"How to get people R ready in an hour\u200a\u2014\u200aThor&#8217;s Hammer"},"content":{"rendered":"\n<h3 class=\"wp-block-heading\" id=\"ab30\">Use Thor\u2019s hammer to get data scientists ready to work faster than you ever thought possible.<\/h3>\n\n\n\n<p>Welcome to your new office! Let\u2019s take a look at your computer, with your supervisor next to you saying:<\/p>\n\n\n\n<blockquote class=\"wp-block-quote\"><p>Here is our working folder. Click through it and you\u2019ll find out what we are doing. There is a list of software tools you\u2019ll need to work with. Please install them. In case of any questions, ask Jamie.<\/p><\/blockquote>\n\n\n\n<p>Does this sound familiar to you? So what is so special about this if you are talking to people who use&nbsp;<em>R<\/em>? Actually, nothing. But the speech might sound more like this:<\/p>\n\n\n\n<blockquote class=\"wp-block-quote\"><p>Here is our working folder. Folder A contains some really useful scripts; we call them hammers. Folder B contains some packages we started; we call them scissors. You\u2019ll find some more packages we started on our GitHub; these are the saws. We also have a package server, so better install stuff from there. You can choose the R version you want to work with. Jamie has the most recent one, ask him how he does it. It would be great if you could use RStudio and get some common packages in it; maybe Jamie has a list of his favorite packages. Your most important project is in our GitHub. So please look through the folder structure. In case of any questions, come to me, ask Jamie or check the wiki.<\/p><\/blockquote>\n\n\n\n<p>Yeah, cool. 
These guys have a wiki. At least I can look something up. OK, they have three different places for R packages and nobody knows which environment to work with, but I can handle this. Let\u2019s go to Jamie and start.<\/p>\n\n\n\n<hr class=\"wp-block-separator\"\/>\n\n\n\n<p>When I saw systems like that for the first time, it really drove me crazy. Today\u2019s biostatisticians, data scientists and software developers cost you more than 150 USD\/hour. So you basically waste at least two days if your onboarding setup is bad. The structure you just heard about does not seem bad, but it is. Add it up: 30 minutes to get Jamie&#8217;s R environment, 3 hours to install it, 5\u20136 hours to look through package folders, 5 hours or more to read the wiki and 2 hours to get access to GitHub. That is at least 15.5 hours, or a minimum of 2,325 USD at 150 USD\/hour, and you are still missing some system dependencies. For this amount you can get a better computer or have free coffee for the whole year. So what can you do about it?<\/p>\n\n\n\n<ol><li>A pre-configured IDE installation: saves 2 hours<\/li><li>A standard R environment: saves 3 hours<\/li><li>A list of tools + a nice tutorial on installation: saves 5\u20136 hours<\/li><li>A fixed and standardized folder structure OR standardized project names: saves your life<\/li><li>A collection of vignettes: saves at least 5 hours<\/li><\/ol>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"7f0a\">IDE installation<\/h3>\n\n\n\n<p>The&nbsp;<strong>integrated development environment<\/strong>&nbsp;(<strong>IDE<\/strong>) is where the people in your office work. It contains the connection to the source code control system, the code editor, the console to run code, basically everything. You do not want anybody to waste time getting this set up. 
So please have an install script that<\/p>\n\n\n\n<ul><li>installs the IDE<\/li><li>installs the extensions needed for your source code control system, including all links to it<\/li><li>installs all system components needed to work with this IDE (which you hopefully know from your older projects)<\/li><li>installs a list of bookmarks to your important folders<\/li><li>sets up everything, or at least the extension repository, in a pre-defined folder. You can use a folder like:<\/li><\/ul>\n\n\n\n<pre class=\"wp-block-preformatted\">C:\\company_tools\\IDEs\\ourIDE<\/pre>\n\n\n\n<p>This IDE installation script will set up everybody in your department with the same IDE installed in the same place. So you won\u2019t hear questions like \u201c<em>How do I access GitHub?<\/em>\u201d&nbsp;and the new people won\u2019t have to call you because the IDE says \u201c<em>It\u2019s not possible to access version control without the following system components: XXX, XXX, XXX<\/em>\u201d.<\/p>\n\n\n\n<p>In case the IDE crashes on the co-worker\u2019s first day, you know where to look. This allows you to check for missing plugins and missing entries in the system&nbsp;<code>PATH<\/code>. You may think this only saves you 5 minutes, but each of the requests I mentioned takes 5 minutes. Three simple questions, and 15 minutes and 75 USD (0.25 hours * 150 USD\/hour * 2 people) are gone.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"a25b\">A standard R environment<\/h3>\n\n\n\n<p>People working with R know how hard it is to share code with a co-worker. They have different package versions installed, they have the R environment in a different folder on their hard drive, they may even have a different R version. 
All these troubles will occur if you do not use&nbsp;<a href=\"http:\/\/rstudio.github.io\/packrat\/\" rel=\"noreferrer noopener\" target=\"_blank\"><em>packrat<\/em><\/a>&nbsp;or&nbsp;<a href=\"https:\/\/www.rstudio.com\/products\/connect\/\" rel=\"noreferrer noopener\" target=\"_blank\"><em>RStudio Connect<\/em><\/a>&nbsp;or&nbsp;<a href=\"https:\/\/rcloud.social\/index.html\" rel=\"noreferrer noopener\" target=\"_blank\"><em>RCloud<\/em><\/a>&nbsp;for each of your projects. And they will definitely still occur from time to time, even if you do. But should this happen on a co-worker\u2019s first day? Of course not.<\/p>\n\n\n\n<p>So please give them a&nbsp;<strong>pre-defined R version<\/strong>&nbsp;with a bunch of packages pre-installed. I recommend having at least 50, maybe 100, of the packages you use on a daily basis installed. Decide on one R version that every co-worker needs to have installed with&nbsp;<strong>~100 packages<\/strong>. Have an install script that installs it in a pre-defined folder. Something like:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">C:\\company_tools\\R\\R-Versions\\R-3.4.0-company<\/pre>\n\n\n\n<p>Please store the whole install script, together with pre-compiled versions of the packages to be installed, in a company repository. The install process should not take forever, and not everything needs to be recompiled. Everyone in your office should at least have the same OS, which allows you to store everything&nbsp;<strong>pre-compiled<\/strong>. This makes the whole R installation a copy&amp;paste process and therefore really fast.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"4f20\"><strong>A list of tools +&nbsp;tutorial<\/strong><\/h3>\n\n\n\n<p>In many R developer teams, people will want you to have a bunch of tools installed. 
For example, in my group we use:<\/p>\n\n\n\n<ul><li>MiKTeX<\/li><li>ImageMagick<\/li><li>Ghostscript<\/li><li>LibreOffice<\/li><li>Pandoc<\/li><li>Java&gt;1.8<\/li><li>git<\/li><li>qpdf<\/li><\/ul>\n\n\n\n<p>So if it is clear that sooner or later your co-worker will need these tools, please provide them with an installation script that&nbsp;<strong>downloads<\/strong>&nbsp;the most recent version, or a version you defined, of each tool and&nbsp;<strong>installs<\/strong>&nbsp;it.<\/p>\n\n\n\n<p>Sometimes people will need&nbsp;<strong>admin rights<\/strong>&nbsp;to install all of these. So instead of requesting them for every single installation, they request them once and install all the stuff you told them to have.<\/p>\n\n\n\n<p>If an install script is not suitable for you, write an entry in your wiki or a package vignette. It should contain all the steps needed to get the tools ready to work.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"0d91\">A fixed and standardized folder structure&nbsp;<em>OR standardized project&nbsp;names<\/em><\/h3>\n\n\n\n<p>In R there are a lot of different working styles when it comes to folders and projects. The two major working styles I have observed are:<\/p>\n\n\n\n<ol><li>people working on GitHub and having RStudio projects<\/li><li>people working on shared drives\/TFS\/SVN and having sub-folder structures<\/li><\/ol>\n\n\n\n<p>1. For people who work with project names, it is important to be able to find a project, even if it was started years ago. Additionally, you do not want to cause conflicts with your co-workers because two projects have the same name. Moreover, a project should not contain all of the code developed in your department. Maybe it needs just 50 lines of code. So please define the following:<\/p>\n\n\n\n<ul><li>What is the naming convention for your projects, e.g. username_task_month_year?<\/li><li>What is the general size of your project, e.g. 
one R package, one script file, one script file + one data folder?<\/li><li>Who is allowed to work on one project?<\/li><li>Where do you list all projects, e.g. a wiki, a specific website, a ticket management system?<\/li><\/ul>\n\n\n\n<p>and the&nbsp;<strong>most important part<\/strong>:&nbsp;<strong>WRITE IT DOWN<\/strong>&nbsp;and make it available to everybody.&nbsp;If you have decided on these 4 parts, write it down, make it a working policy and kick people\u2019s asses if they do not follow the rules. Otherwise you\u2019ll end up in chaos.<\/p>\n\n\n\n<p>It will really help the new person on their first day, who knows they have to find one of Jamie\u2019s projects that deals with clustering, as it might be called:<\/p>\n\n\n\n<p><em>jamie123_clustering_patient_data_january_2017<\/em><\/p>\n\n\n\n<p>2. For people who like folders and folder structures, I guess it is a bit easier. You will want to be able to find things after years, too. If you want to standardize the development of R packages, research projects, test collections and report projects, each of a kind should look the same. Somebody who has developed one R package in your company should be able to look into a second one and understand it in minutes. So please define the following:<\/p>\n\n\n\n<ul><li>What should a project folder be named, e.g. typeOfProject_name_month?<\/li><li>Which sub-folders and files are needed for an R package, e.g. a change log folder, a releases folder, a test folder, a README.md file?<\/li><li>Do you need separate folders for running projects vs packages? 
Should they be stored in different places?<\/li><li>Are there any folders that need to exist for storing extensions, plugins or libraries?<\/li><\/ul>\n\n\n\n<p>and the&nbsp;<strong>most important part<\/strong>:&nbsp;<strong>WRITE IT DOWN<\/strong>&nbsp;and make it available to everybody.&nbsp;If you have decided on these 4 parts, write it down, make it a working policy and kick people\u2019s asses if they do not follow the rules. Otherwise you\u2019ll end up in chaos.<\/p>\n\n\n\n<p>I\u2019m sorry for repeating myself, but I really needed to make my point.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"a6b4\">Your collection of vignettes<\/h3>\n\n\n\n<p>This is a bit&nbsp;<em>R<\/em>&nbsp;specific. But instead of writing&nbsp;<a href=\"https:\/\/www.atlassian.com\/software\/confluence\" rel=\"noreferrer noopener\" target=\"_blank\">Confluence<\/a>&nbsp;or&nbsp;<a href=\"https:\/\/www.mediawiki.org\/wiki\/MediaWiki\" rel=\"noreferrer noopener\" target=\"_blank\">Wiki<\/a>&nbsp;entries, I really like the idea of using vignettes +&nbsp;<a href=\"https:\/\/github.com\/r-lib\/pkgdown\" rel=\"noreferrer noopener\" target=\"_blank\"><em>pkgdown<\/em><\/a>.&nbsp;For any development project, wikis are a nice tool for looking things up. But inside a wiki your code does not run. If you write vignettes to show your co-worker how certain things have to be done, you can check your own work: the code you write to document your installation scripts, your standard R environment or even your folder structure has to work. Every single line of code can be executed inside the vignette.<\/p>\n\n\n\n<p>Additionally, vignettes are well known to&nbsp;<em>R<\/em>&nbsp;developers, and they know they can access them via&nbsp;<code>vignette()<\/code>. 
Moreover, the&nbsp;<a href=\"https:\/\/github.com\/r-lib\/pkgdown\" rel=\"noreferrer noopener\" target=\"_blank\"><em>pkgdown<\/em><\/a>&nbsp;package allows you to put all of this information on a website. This makes it a wiki again.<\/p>\n\n\n\n<p>I also recommend writing such vignettes about your standard way of doing things: \u201cHow to build a package\u201d, \u201cHow to document a function call in R code\u201d, \u201cHow to generate a&nbsp;<a href=\"https:\/\/yihui.name\/knitr\/\" rel=\"noreferrer noopener\" target=\"_blank\"><em>knitr<\/em><\/a>&nbsp;report with the right design\u201d, and so on. If you do all this in a nice and comprehensive way, your co-worker won\u2019t need to talk to Jamie; they\u2019ll read your wiki. Instead of two people being tied up, you\u2019ll just need one.<\/p>\n\n\n\n<p>Of course the new guy should still have a coffee with Jamie to know what\u2019s up&nbsp;\ud83d\ude09<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"47a9\">Something left for&nbsp;pros<\/h3>\n\n\n\n<p>If you really want to make it easy for your co-workers, write an R package that contains all this. It can contain the install scripts, the vignettes, functions for basic processes, a function that builds default folder structures, maybe even functions that open tickets for newbies at the global IT service desk. 
Put it all in a package and call it:&nbsp;<strong>thorshammer<\/strong><\/p>\n\n\n\n<p>This package will be the most important tool for your co-worker\u2019s first hour!<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"30cd\">Final words<\/h3>\n\n\n\n<figure class=\"wp-block-image\"><img loading=\"lazy\" decoding=\"async\" width=\"600\" height=\"337\" src=\"https:\/\/engel-wolf.com\/wp-content\/uploads\/0_XNY3rsOSBJR_u5nX.jpg\" alt=\"\" class=\"wp-image-4325\" srcset=\"https:\/\/engel-wolf.com\/wp-content\/uploads\/0_XNY3rsOSBJR_u5nX.jpg 600w, https:\/\/engel-wolf.com\/wp-content\/uploads\/0_XNY3rsOSBJR_u5nX-300x169.jpg 300w, https:\/\/engel-wolf.com\/wp-content\/uploads\/0_XNY3rsOSBJR_u5nX-500x281.jpg 500w\" sizes=\"(max-width: 600px) 100vw, 600px\" \/><figcaption> \u201chandheld tool lot\u201d by&nbsp;<a rel=\"noreferrer noopener\" href=\"https:\/\/unsplash.com\/@carlevarino?utm_source=medium&amp;utm_medium=referral\" target=\"_blank\">Cesar Carlevarino Aragon<\/a>&nbsp;on&nbsp;<a rel=\"noreferrer noopener\" href=\"https:\/\/unsplash.com?utm_source=medium&amp;utm_medium=referral\" target=\"_blank\">Unsplash<\/a><\/figcaption><\/figure>\n\n\n\n<p>I guess that if you follow my instructions, the whole process of setting up a co-worker takes an hour or maybe less. You could start the day like this:<\/p>\n\n\n\n<p><em>Hey, welcome. This is your PC. Please get admin rights, then download the thorshammer package. It will be the one tool you need today. First run the&nbsp;<\/em><code><em>welcome<\/em><\/code><em>&nbsp;bash script. While the script runs, you can read in the wiki how it sets up your folder structure and the IDE and how you can work with our version control. We have some basic tutorials in thorshammer on how we do things, so please use it and see how we nail things down. 
After you nailed down a few planks, please have a coffee with Jamie to tell you what\u2019s up next.<\/em><\/p>\n\n\n\n<p><strong>Enjoy your coffee.<\/strong><\/p>\n\n\n\n<hr class=\"wp-block-separator\"\/>\n\n\n\n<blockquote class=\"wp-block-quote\"><p>This article was previously published in the<a href=\"https:\/\/medium.com\/datadriveninvestor\/how-to-get-people-r-ready-in-an-hour-thors-hammer-d8c853abaf0b\"> Data Driven Investor<\/a><\/p><\/blockquote>\n","protected":false},"excerpt":{"rendered":"<p>Use Thor\u2019s hammer to get data scientists ready to work faster than you ever thought. Welcome to your new office! Let\u2019s take a look into your computer with your supervisor [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":4324,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"ngg_post_thumbnail":0,"footnotes":""},"categories":[1],"tags":[381],"_links":{"self":[{"href":"https:\/\/engel-wolf.com\/index.php?rest_route=\/wp\/v2\/posts\/4322"}],"collection":[{"href":"https:\/\/engel-wolf.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/engel-wolf.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/engel-wolf.com\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/engel-wolf.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=4322"}],"version-history":[{"count":3,"href":"https:\/\/engel-wolf.com\/index.php?rest_route=\/wp\/v2\/posts\/4322\/revisions"}],"predecessor-version":[{"id":4341,"href":"https:\/\/engel-wolf.com\/index.php?rest_route=\/wp\/v2\/posts\/4322\/revisions\/4341"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/engel-wolf.com\/index.php?rest_route=\/wp\/v2\/media\/4324"}],"wp:attachment":[{"href":"https:\/\/engel-wolf.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=4322"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/engel-wolf.c
om\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=4322"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/engel-wolf.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=4322"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}