{"id":4411,"date":"2019-10-27T17:45:30","date_gmt":"2019-10-27T15:45:30","guid":{"rendered":"https:\/\/engel-wolf.com\/?p=4411"},"modified":"2019-10-27T17:45:32","modified_gmt":"2019-10-27T15:45:32","slug":"climbing-mt-whitney-with-web-browser-automation-and-r","status":"publish","type":"post","link":"https:\/\/engel-wolf.com\/?p=4411","title":{"rendered":"Climbing Mt. Whitney with web browser automation and R"},"content":{"rendered":"\n<h3 class=\"wp-block-heading\" id=\"28a5\">Mount Whitney is the tallest mountain in the contiguous United States and you need a permit to climb it. These permits are limited. But from time to time, somebody returns their permit, and it shows up again on the permit website recreation.gov. I wanted to get one of those and will tell you&nbsp;how.<\/h3>\n\n\n\n<p>A friend of mine had a time window of two weeks to get a permit for Mt. Whitney. I did not really know about this mountain until he came up with the trip. Mt. Whitney is located in California and rises 14,505 ft (4,421 m) above sea level. Because a lot of people want to go there every year, the USDA Forest Service decided to limit the number of permits to hike the mountain. To get a permit, you simply go to this&nbsp;<a href=\"https:\/\/www.recreation.gov\/permits\/233260\" rel=\"noreferrer noopener\" target=\"_blank\">website<\/a> and check if your date is available for your number of hikers.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/cdn-images-1.medium.com\/max\/600\/1*CUzmB6bO9i9SB8lqtc9D8Q.png\" alt=\"\"\/><figcaption>Mt. Whitney permit website (2019 Oct&nbsp;3rd)<\/figcaption><\/figure>\n\n\n\n<p>You will notice pretty fast that if you are not an early bird, all permits for your desired date are gone. Now you have three choices. One, give up and don\u2019t hike the mountain. Two, check the website yourself every day to see if new or returned permits are available. Three, put a bot or browser automation to work that checks the permits for you. 
My friend asked me to set up the third option. As I have some experience with RSelenium (which I recently presented at&nbsp;<a href=\"https:\/\/earlconf.com\/#speakers\" rel=\"noreferrer noopener\" target=\"_blank\">EARLconf<\/a>), I wanted to try this approach.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"a076\">The knowledge you need to follow this&nbsp;tutorial<\/h3>\n\n\n\n<ul><li><a href=\"https:\/\/docs.docker.com\/engine\/reference\/commandline\/build\/\" rel=\"noreferrer noopener\" target=\"_blank\">How to start a docker container<\/a><\/li><li><a href=\"https:\/\/docs.ropensci.org\/RSelenium\/articles\/docker.html\" rel=\"noreferrer noopener\" target=\"_blank\">Some basics in RSelenium<\/a><\/li><li><a href=\"https:\/\/developers.google.com\/web\/tools\/chrome-devtools\" rel=\"noreferrer noopener\" target=\"_blank\">How to use Chrome or Firefox Developer Tools<\/a><\/li><\/ul>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"a50f\">Getting started with RSelenium and&nbsp;docker<\/h3>\n\n\n\n<p>First, I wanted to run my&nbsp;<em>bot<\/em> in the cloud. I also wanted a reproducible environment. So I decided to follow the&nbsp;<a href=\"https:\/\/github.com\/ropensci\/RSelenium\/blob\/master\/vignettes\/docker.Rmd\" rel=\"noreferrer noopener\" target=\"_blank\">vignette approach<\/a> of RSelenium, which means running Selenium in a docker container. My first task was therefore to spin up two docker containers: the first one runs Selenium, the second one runs R and Python to access it.<\/p>\n\n\n\n<p>Spinning up the Selenium container is simple:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">docker run -d -p 4445:4444 --name seleniumcontainer --net mynet selenium\/standalone-chrome<\/pre>\n\n\n\n<p>I used a shared docker network called&nbsp;<code>mynet<\/code>, which has to exist beforehand (<code>docker network create mynet<\/code>). 
This allows the two docker containers to find each other in the network, even by their container names.<\/p>\n\n\n\n<p>The second docker container is built from three files:<\/p>\n\n\n\n<ol><li><code>run_tests.R<\/code>&nbsp;to execute my RSelenium calls<\/li><li><code>sendmail.py<\/code>&nbsp;to send emails from Python<\/li><li><code>Dockerfile<\/code>&nbsp;to build the docker image<\/li><\/ol>\n\n\n\n<p>The Dockerfile looks like this:<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: bash; title: ; notranslate\" title=\"\">\n# R (tidyverse) image with RSelenium, Python mail tooling and cron\nFROM rocker\/tidyverse:3.6.0\nMAINTAINER zappingseb &quot;sebastian@engel-wolf.com&quot;\n\nRUN R -e &quot;install.packages(c(&#039;RSelenium&#039;), repos=&#039;https:\/\/cran.rstudio.com\/&#039;) &quot;\n\nRUN apt-get update -qq \\\n  &amp;&amp; apt-get install -y \\\n  python-pip \\\n  vim\n\nRUN pip install pytest-shutil\nRUN pip install --upgrade numpy secure-smtplib email\n\nCOPY run_tests.R \/tmp\/run_tests.R\nCOPY sendmail.py \/tmp\/sendmail.py\n\nRUN apt-get update &amp;&amp; apt-get -y install cron\nRUN echo &quot;0 *\/12 * * * root Rscript \/tmp\/run_tests.R&quot; &gt;&gt; \/etc\/crontab\nRUN service cron start\n<\/pre><\/div>\n\n\n<p>I used the&nbsp;<code>rocker\/tidyverse<\/code>&nbsp;docker image as a base and installed the RSelenium package. Additionally, I installed the Python packages&nbsp;<code>secure-smtplib<\/code>&nbsp;and&nbsp;<code>email<\/code>. I also added a cron job that runs the web crawler every twelve hours:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">RUN apt-get update &amp;&amp; apt-get -y install cron<br>RUN echo \"0 *\/12 * * * root Rscript \/tmp\/run_tests.R\" &gt;&gt; \/etc\/crontab<br>RUN service cron start<\/pre>\n\n\n\n<p>Keep in mind that&nbsp;<code>RUN service cron start<\/code>&nbsp;only starts cron while the image is being built. The daemon is not running in containers started from the image, so cron has to be started again inside the running container, for example with&nbsp;<code>service cron start<\/code>.<\/p>\n\n\n\n<p>Before spinning up the docker container, I still needed the&nbsp;<code>sendmail.py<\/code>&nbsp;and&nbsp;<code>run_tests.R<\/code>&nbsp;files. 
Let\u2019s create them.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"ee51\">Using RSelenium to crawl&nbsp;permits<\/h3>\n\n\n\n<p>To use RSelenium you first need to connect to the Selenium server, which runs in the other docker container. To connect to it, load the package and create a remote driver:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">library(RSelenium)<br>remDr &lt;- remoteDriver(remoteServerAddr = \"seleniumcontainer\", browserName = \"chrome\")<\/pre>\n\n\n\n<p>The name&nbsp;<code>seleniumcontainer<\/code>&nbsp;is resolved automatically as long as both containers run inside&nbsp;<code>mynet<\/code>. Because the connection stays inside the network, the driver uses the container port 4444 (the default of&nbsp;<code>remoteDriver<\/code>) rather than the published host port 4445. Two steps lead to the Mt. Whitney permit website, opening a browser and navigating to it:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">remDr$open()<br>remDr$navigate(\"https:\/\/www.recreation.gov\/permits\/233260\/\")<\/pre>\n\n\n\n<h4 class=\"wp-block-heading\" id=\"c44a\">Work with the permit&nbsp;form<\/h4>\n\n\n\n<p>The much harder part is finding the elements to click on. First, I noticed that I needed to select the option \u201cAll Routes\u201d, the third entry of the dropdown menu:<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/cdn-images-1.medium.com\/max\/800\/1*2a0OGb9qickeQ_aXuyI7zg.png\" alt=\"\"\/><figcaption>Mt. Whitney dropdown menu HTML&nbsp;code<\/figcaption><\/figure>\n\n\n\n<p>The dropdown can be accessed by its&nbsp;<code>id<\/code>, which is&nbsp;<code>division-selection<\/code>. Clicking on the element with this&nbsp;<code>id<\/code>&nbsp;opens the dropdown. Once it is open, you need to click on the third&nbsp;<code>option<\/code>&nbsp;element on the page. 
These four lines of RSelenium code do exactly that:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\"># open the route dropdown via its id<br>el_1 &lt;- remDr$findElements(\"id\", \"division-selection\")<br>el_1[[1]]$clickElement()<br># click the third option element (All Routes)<br>el_2 &lt;- remDr$findElements(\"css selector\", \"option\")<br>el_2[[3]]$clickElement()<\/pre>\n\n\n\n<p>As you can see,&nbsp;<code>findElements<\/code>&nbsp;returns a list of&nbsp;<em>webElements<\/em> matching the given locator.&nbsp;<code>clickElement<\/code>&nbsp;is a method of such a&nbsp;<em>webElement<\/em> and simply clicks it.<\/p>\n\n\n\n<p>This was the easiest of the browser automation steps. The much harder part is entering the number of hikers. The safest way to change it is not only to type into the text field but also to set its value with JavaScript. The field with the&nbsp;<code>id<\/code>&nbsp;<code>number-input-<\/code>&nbsp;is used for this.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/cdn-images-1.medium.com\/max\/800\/1*JoLQD_-t414hjRLC3IYj5g.png\" alt=\"\"\/><figcaption>Mt. Whitney numeric&nbsp;input<\/figcaption><\/figure>\n\n\n\n<p>To change the value, I used the following code:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">el_3 &lt;- remDr$findElements(\"id\", \"number-input-\")<br># run a JavaScript snippet that sets the field value; arguments[0] is el_3[[1]]<br>remDr$executeScript(\"arguments[0].setAttribute('value','1');\", list(el_3[[1]]))<br># clear the field and type 1 participant<br>el_3[[1]]$clearElement()<br>el_3[[1]]$sendKeysToElement(list(\"1\"))<\/pre>\n\n\n\n<p>You can clearly see that I wanted a single permit for the mountain. 
The JavaScript snippet runs on the webElement itself:&nbsp;<code>executeScript<\/code>&nbsp;passes&nbsp;<code>el_3[[1]]<\/code>&nbsp;to the script as&nbsp;<code>arguments[0]<\/code>. With RSelenium I prefer finding elements with the&nbsp;<code>remDr$findElements<\/code>&nbsp;method and then taking the first element when I am sure there is only a single match. The methods&nbsp;<code>clearElement<\/code>&nbsp;and&nbsp;<code>sendKeysToElement<\/code>&nbsp;remove the old value and enter the new one. The API of&nbsp;<code>sendKeysToElement<\/code>&nbsp;is a bit unusual, as it requires a list of keys instead of a string. Once you are used to it, though, it is easy to work with.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\" id=\"bb0f\">Interact with the permit&nbsp;calendar<\/h4>\n\n\n\n<p>After these steps, the permit calendar becomes active. I wanted a permit in October 2019, so I needed to click on \u201cNEXT\u201d until October showed up.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/cdn-images-1.medium.com\/max\/800\/1*FQoSyX94WlxiGBZXuzrAiw.png\" alt=\"\"\/><figcaption>Mt. Whitney next&nbsp;button<\/figcaption><\/figure>\n\n\n\n<p>I built a loop for this task using&nbsp;<code>while<\/code>:<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: r; title: ; notranslate\" title=\"\">\n# Get the initial month shown\nmonth_elem &lt;- remDr$findElements(&quot;css selector&quot;, &quot;.CalendarMonth_caption strong&quot;)\nmonth &lt;- month_elem&#91;&#91;1]]$getElementText()\n# Click NEXT until the October calendar is shown\nwhile(!grepl(&quot;October&quot;, month)) {\n  el_4 &lt;- remDr$findElements(&quot;css selector&quot;,\n    &quot;.sarsa-day-picker-range-controller-month-navigation-button.right&quot;)\n  el_4&#91;&#91;1]]$clickElement()\n  Sys.sleep(1)\n  # The old caption stays on the page, so the new month is the second caption\n  month_elem &lt;- remDr$findElements(&quot;css selector&quot;,\n    &quot;.CalendarMonth_caption&quot;)\n  month &lt;- month_elem&#91;&#91;2]]$getElementText()\n}\n<\/pre><\/div>\n\n\n<p>The element 
containing the month is a&nbsp;<code>&lt;strong&gt;<\/code>&nbsp;tag inside an element with&nbsp;<code>class=\"CalendarMonth_caption\"<\/code>, which I accessed with a CSS selector. After clicking the next button, which has a specific CSS class, a new month caption shows up. It took me a while to find out that the old caption does not disappear, so the second caption element has to be checked for the month name. That is why the loop overwrites the&nbsp;<code>month<\/code>&nbsp;variable with the text of the newly shown caption.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\" id=\"83cb\">Derive the first available date from the&nbsp;calendar<\/h4>\n\n\n\n<p>Finding an available day as a human is simple. Just look at the calendar and search for blue boxes with an A inside:<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/cdn-images-1.medium.com\/max\/800\/1*WnPWGRtgU1uKGwJz1quywQ.png\" alt=\"\"\/><figcaption>Calendar with available day at Mt.&nbsp;Whitney<\/figcaption><\/figure>\n\n\n\n<p>For a computer, it is not that easy. In my case, I had just one question to answer: what is the first date in October 2019 to climb Mt. Whitney?<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: r; title: ; notranslate\" title=\"\">\n# All calendar days marked as available\nday_elem &lt;- remDr$findElements(&quot;css selector&quot;, &quot;.rec-available-day&quot;)\nif (length(day_elem) &lt; 1) {\n  earliest_day &lt;- &quot;NODAY&quot;\n} else {\n  # Keep the text before the first line break\n  earliest_day &lt;- strsplit(\n    day_elem&#91;&#91;1]]$getElementText()&#91;&#91;1]],\n    split = &quot;\\n&quot;)&#91;&#91;1]]&#91;1]\n}\n<\/pre><\/div>\n\n\n<p>To answer it, I searched for all entries with the class&nbsp;<code>rec-available-day<\/code>. If there was at least one, I took the text of the first one and kept everything before the first line break, which is the day of the month. 
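<\/p>

<p>The post does not show the content of&nbsp;<code>sendmail.py<\/code>. A minimal sketch of such a script could look like the following; it is not the author's version. It assumes Python 3.6 or later, an SMTP server that accepts SSL connections, and hypothetical environment variables (<code>SMTP_HOST<\/code>, <code>SMTP_PORT<\/code>, <code>SMTP_USER<\/code>, <code>SMTP_PASSWORD<\/code>, <code>MAIL_TO<\/code>) for its settings. It uses only the standard library instead of&nbsp;<code>secure-smtplib<\/code>&nbsp;and mails the content of&nbsp;<code>\/tmp\/output.txt<\/code>, the file written by the R code in the next step.<\/p>

```python
import os
import smtplib
from email.message import EmailMessage

# Hypothetical configuration: the post does not show how sendmail.py is set up,
# so this sketch reads the SMTP settings from environment variables.
SMTP_HOST = os.environ.get('SMTP_HOST', 'smtp.example.com')
SMTP_PORT = int(os.environ.get('SMTP_PORT', '465'))
SMTP_USER = os.environ.get('SMTP_USER', '')
SMTP_PASSWORD = os.environ.get('SMTP_PASSWORD', '')
MAIL_TO = os.environ.get('MAIL_TO', '')
REPORT_PATH = '/tmp/output.txt'  # file written by run_tests.R


def build_message(body, sender, recipient):
    # Wrap the report text in a plain-text email
    msg = EmailMessage()
    msg['Subject'] = 'Mt. Whitney permit check'
    msg['From'] = sender
    msg['To'] = recipient
    msg.set_content(body)
    return msg


def send(msg):
    # Connect over SSL (usually port 465), log in and send the message
    with smtplib.SMTP_SSL(SMTP_HOST, SMTP_PORT) as server:
        server.login(SMTP_USER, SMTP_PASSWORD)
        server.send_message(msg)


if __name__ == '__main__':
    # Only send when a recipient is configured and the report exists
    if MAIL_TO and os.path.exists(REPORT_PATH):
        with open(REPORT_PATH) as report:
            send(build_message(report.read(), SMTP_USER, MAIL_TO))
```

<p>Keeping&nbsp;<code>build_message<\/code>&nbsp;separate from&nbsp;<code>send<\/code>&nbsp;makes it possible to check the email content without contacting a mail server, and the script does nothing unless&nbsp;<code>MAIL_TO<\/code>&nbsp;is set and the report file exists.<\/p>

<p>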
Finally, the result is written to a file and sent by email:<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: r; title: ; notranslate\" title=\"\">\n# Write the result to \/tmp\/output.txt\nfileConn &lt;- file(&quot;\/tmp\/output.txt&quot;)\nwriteLines(paste0(&quot;The earliest day for Mt. Whitney in &quot;, month&#91;&#91;1]], &quot; is: &quot;, earliest_day, &quot;th of October 2019.\\n\\n-------------------\\n&quot;), fileConn)\nclose(fileConn)\n# Write an email from output.txt with python\nsystem(&quot;python \/tmp\/sendmail.py&quot;)\n<\/pre><\/div>\n\n\n<h3 class=\"wp-block-heading\" id=\"567b\">Activating the docker container<\/h3>\n\n\n\n<p>Once the script was finished, I put all files into my&nbsp;<a href=\"https:\/\/github.com\/zappingseb\/mtwhitney\" rel=\"noreferrer noopener\" target=\"_blank\">GitHub repository (zappingseb\/mtwhitney)<\/a>. Together they are set up to send me an email every 12 hours. I then went to my docker server, cloned the repository, and built the image by running:<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: plain; title: ; notranslate\" title=\"\">\ndocker build -t mtwhitney -f Dockerfile .\n<\/pre><\/div>\n\n\n<p>To test the script once, I started a container from the image inside&nbsp;<code>mynet<\/code>&nbsp;and opened a shell in it:<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: plain; title: ; notranslate\" title=\"\">\ndocker run -it --net mynet --name mtwhitney mtwhitney \/bin\/bash\n<\/pre><\/div>\n\n\n<p>Inside the container, I ran the script with:<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: plain; title: ; notranslate\" title=\"\">\nsudo Rscript \/tmp\/run_tests.R\n<\/pre><\/div>\n\n\n<p>I got an email. 
After receiving the email, I was sure the script worked, and I detached from the container using&nbsp;<code>Ctrl + p<\/code>&nbsp;followed by&nbsp;<code>Ctrl + q<\/code>.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"952f\">Learnings<\/h3>\n\n\n\n<p>Scripting this piece really got me an email with a free slot and a permit to climb Mt. Whitney:<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/cdn-images-1.medium.com\/max\/800\/1*EQw4FDV3nfqjYupvJaSArA.png\" alt=\"\"\/><figcaption>Email from the browser automation<\/figcaption><\/figure>\n\n\n\n<p>Browser automation can be helpful for use cases like this one. It got me a permit. I did not overengineer it by scraping the website every few seconds or checking for specific dates; it was more of a fun project. You can think of a lot of ways to make it better. For example, it could send an email only if a date is available. But I wanted to get the log every 12 hours to see if something went wrong.<\/p>\n\n\n\n<p>While the scraper was running, the website was updated once, so I received an error message from my scraper and adapted the script to the version presented in this blog post. The script may no longer work if you try to scrape the Mt. Whitney pages tomorrow, as Recreation.gov might have changed the website already.<\/p>\n\n\n\n<p>I use browser tests to make my R Shiny apps safer. I work in a regulated environment, where these tests save me a lot of time. Without tools such as&nbsp;<a href=\"https:\/\/github.com\/ropensci\/RSelenium\" rel=\"noreferrer noopener\" target=\"_blank\">RSelenium<\/a> or&nbsp;<a href=\"https:\/\/blog.rstudio.com\/2018\/10\/18\/shinytest-automated-testing-for-shiny-apps\/\" rel=\"noreferrer noopener\" target=\"_blank\">shinytest<\/a>, I would spend that time click-testing. 
Try it out and enjoy your browser doing the job for you.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Mount Whitney is the tallest mountain in the contiguous United States and you need a permit to climb it. These permits are limited. But from time to time, somebody will [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":4412,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"ngg_post_thumbnail":0,"footnotes":""},"categories":[1],"tags":[384,427,428,426,423,381],"_links":{"self":[{"href":"https:\/\/engel-wolf.com\/index.php?rest_route=\/wp\/v2\/posts\/4411"}],"collection":[{"href":"https:\/\/engel-wolf.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/engel-wolf.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/engel-wolf.com\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/engel-wolf.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=4411"}],"version-history":[{"count":1,"href":"https:\/\/engel-wolf.com\/index.php?rest_route=\/wp\/v2\/posts\/4411\/revisions"}],"predecessor-version":[{"id":4413,"href":"https:\/\/engel-wolf.com\/index.php?rest_route=\/wp\/v2\/posts\/4411\/revisions\/4413"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/engel-wolf.com\/index.php?rest_route=\/wp\/v2\/media\/4412"}],"wp:attachment":[{"href":"https:\/\/engel-wolf.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=4411"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/engel-wolf.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=4411"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/engel-wolf.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=4411"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}