Shiny and the Ops long journey
last update: 2020/07/05
Introduction
I am a (Dev)Ops working on an HTC and bioinformatics platform. R (perhaps along with Python) is our users' favorite programming language.
Let's be clear about that: shiny is not designed for production. That being said, R was not designed for production either.
I remember some time ago, when I used R with environment modules on our cluster, I had some issues with Rscript (...): the PATH to the R binary seemed to be hardcoded within the Rscript binary during the install... We still advise our users to use R CMD BATCH instead of Rscript, even if our R versions are now managed with both environment modules and singularity...
However, R does a great job at data processing, statistics, and bioinformatics, and more recently with packages like rmarkdown and knitr. For performance reasons, the most time-consuming packages are written in C++ and then brought back into R thanks to the Rcpp package. Rmpi or multithreaded R programs, thanks to the snow or parallel packages, are also a long way to go for a basic user, or even for a system administrator on a cluster.
Before shiny, some people used cgi-bin scripts or rserve to call R scripts remotely from a web server.
So, shiny emerged and web applications came into the hands of bioinformaticians and statisticians (who usually use R as their standard programming language). That was not a great move for webmasters, Ops, or even web designers and frontend developers (as the web interface is always the same).
When shiny was launched by the RStudio team, it had the same problem as R: it was not designed for production. Like R, it is single-threaded by default. Nevertheless, some low-level packages, like httpuv (used by shiny) or promises, which allows asynchronous calls with the help of future, or httr (which provides an API to R), alleviate this issue considerably. They helped us to produce more production-compliant web applications.
That being said, this is not sufficient to deploy a shiny application in a real production environment (even if a basic shiny application runs nicely as long as you have fewer than ~5 clients (!)).
Shiny needs stickiness and a web server that can handle both WebSockets (which are quite new in the web world) and standard HTTP requests.
Over the last few years, I have seen many attempts to remove the R/shiny locks.
The shiny basic web application
The RStudio team wrote a short how-to on configuring nginx or Apache for an R Shiny application.
This works for a demo application, but do not expect it to work as is under heavy traffic.
We are using it locally, on our MBB platform, with apache on a shiny server, in order to provide a basic demo shiny server.
Apache basic configuration for shiny application under http://shiny.domain.tld/myapp URL
RewriteEngine On
############ MyApp ################
RewriteCond %{SERVER_NAME} =shiny.domain.tld
RewriteRule ^/myapp/ https://%{SERVER_NAME}%{REQUEST_URI} [END,QSA,R=permanent]
RewriteCond %{HTTP:Upgrade} =websocket
RewriteRule /myapp/(.*) ws://IP.IP.IP.IP:3838/$1 [P,L]
RewriteCond %{HTTP:Upgrade} !=websocket
RewriteRule /myapp/(.*) http://IP.IP.IP.IP:3838/$1 [P,L]
ProxyPass /myapp/ http://IP.IP.IP.IP:3838/
ProxyPassReverse /myapp/ http://IP.IP.IP.IP:3838/
########## End MyApp ##############
RedirectMatch permanent ^/myapp$ /myapp/
ProxyRequests Off
You should also look carefully at the timeout and keepalive settings (timeouts are in seconds). Of course, that is also true for nginx.
Timeout 300
KeepAlive On
MaxKeepAliveRequests 100
KeepAliveTimeout 5
You can serve many applications this way, just by changing the default shiny port for each one and proxying it the same way with apache. You then get nice URLs, instead of one port per shiny application and all the firewall issues that come with it.
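Writing one such block per application by hand quickly gets repetitive. As a minimal sketch, assuming a list of `name:port` pairs and the same IP.IP.IP.IP placeholder as above, a small bash function can print the per-application rules. The function name `gen_shiny_vhost_block`, the application names and the ports are made up for illustration; the https redirect is left out because it depends on your virtual host.

```shell
#!/usr/bin/env bash
# Sketch only: print the apache proxy rules for one shiny application.
# gen_shiny_vhost_block, the app names and the ports are hypothetical.
gen_shiny_vhost_block() {
  local app="$1" port="$2" backend="${3:-IP.IP.IP.IP}"
  cat <<EOF
############ ${app} ################
RewriteCond %{HTTP:Upgrade} =websocket
RewriteRule /${app}/(.*) ws://${backend}:${port}/\$1 [P,L]
RewriteCond %{HTTP:Upgrade} !=websocket
RewriteRule /${app}/(.*) http://${backend}:${port}/\$1 [P,L]
ProxyPass /${app}/ http://${backend}:${port}/
ProxyPassReverse /${app}/ http://${backend}:${port}/
RedirectMatch permanent ^/${app}\$ /${app}/
########## End ${app} ##############
EOF
}

# One "name:port" pair per application.
for pair in myapp:3838 otherapp:3839; do
  gen_shiny_vhost_block "${pair%%:*}" "${pair##*:}"
done
```

The output can be redirected to a file that the virtual host includes, so that adding an application only means adding one pair to the list.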
A more complicated configuration to serve more clients
Ok, we have a working shiny server. That is not great, but hey, it works!
Now, you have several choices:
- shinyproxy,
- shiny-server,
- the hard way.
We have tested all these solutions, staying with the free editions. If you don't want to get your hands dirty with complicated open source stacks, you should stop right now and look into the enterprise editions of shiny-server or shinyproxy.
shiny-server works great for a single-server application (physical or VM, it does not matter). If you need something that scales well, with many time-consuming R applications, then you should prefer shinyproxy. Note that https is not available in the free edition of shiny-server, although its configuration looks like a regular minimal nginx configuration (see below).
However, you can use both solutions in their free editions with quite nice results. Yet, with increasing service level agreements (SLA) and the production constraints of today's real web applications, that won't be enough.
A free shiny-server example
A basic shiny-server configuration
run_as shiny;
server {
  listen 8001;
  ## https not working : shiny pro only...
  #ssl /etc/shiny-server/server.key /etc/shiny-server/server.cert;
  ## neither nginx default conf.
  #ssl_certificate /etc/shiny-server/server.cert;
  #ssl_certificate_key /etc/shiny-server/server.key;
  # Define a location at the base URL
  location / {
    # Host the directory of Shiny Apps stored in this directory
    site_dir /opt/shiny;
    # Log all Shiny output to files in this directory
    log_dir /var/log/shiny-server;
    # When a user visits the base URL rather than a particular application,
    # an index of the applications available in this directory will be shown.
    directory_index off;
  }
}
A free shinyproxy
Shinyproxy is great for serving shiny applications with LDAP authentication. We are using it here with the help of a docker swarm cluster made up of 6 machines.
I won't go into too much detail right now, but we are using a basic docker swarm, with portainer, a docker registry and a NAS server with NFS + a local dedicated network + a dedicated PHP web page in order to access this service.
Shinyproxy is well documented. However, it is written in Java, it is quite opaque, and it is not really fast, especially with containers. Moreover, even if it looks open source, the developers are mainly concerned with solving the problems they care about (e.g. they seem to prefer diving into k8s issues rather than swarm ones).
Multiple apache or nginx proxy with loadbalancing
Apache
Here comes the complicated stuff. Let's say you are on apache and want to serve at least 5 parallel R sessions. You will need to have 5 apache threads dedicated to shiny.
The apache configuration here is quite a bit longer:
Header add Set-Cookie "ROUTEID=.%{BALANCER_WORKER_ROUTE}e; path=/" env=BALANCER_ROUTE_CHANGED
<Proxy "balancer://myhttpcluster">
BalancerMember "http://IP.IP.IP.IP:9090" route=1
BalancerMember "http://IP.IP.IP.IP:9091" route=2
BalancerMember "http://IP.IP.IP.IP:9092" route=3
BalancerMember "http://IP.IP.IP.IP:9093" route=4
BalancerMember "http://IP.IP.IP.IP:9094" route=5
</Proxy>
<Proxy "balancer://mywscluster">
BalancerMember ws://IP.IP.IP.IP:9090 route=1
BalancerMember ws://IP.IP.IP.IP:9091 route=2
BalancerMember ws://IP.IP.IP.IP:9092 route=3
BalancerMember ws://IP.IP.IP.IP:9093 route=4
BalancerMember ws://IP.IP.IP.IP:9094 route=5
</Proxy>
ProxyPass "/myapp/" "balancer://myhttpcluster" stickysession=ROUTEID
ProxyPassReverse "/myapp/" "balancer://myhttpcluster"
############### myapp with load balancing ################
RewriteCond %{SERVER_NAME} =www.domain.tld
RewriteRule ^/myapp/ https://%{SERVER_NAME}%{REQUEST_URI} [END,QSA,R=permanent]
RewriteCond %{HTTP:Upgrade} =websocket
RewriteRule /myapp/(.*) balancer://mywscluster/$1 [P,L]
RewriteCond %{HTTP:Upgrade} !=websocket
RewriteRule /myapp/(.*) balancer://myhttpcluster/$1 [P,L]
As you can see, the multiple proxying threads for both WebSockets and HTTP requests lengthen this configuration considerably. Now imagine doing this for 20 threads, and 20 shiny applications!!
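Rather than typing those member lists by hand, they can be generated. The following is a minimal sketch, not what we ran in production: `gen_balancer` and its arguments are hypothetical names, and it assumes that all workers live on one host with consecutive ports, as in the configuration above.

```shell
#!/usr/bin/env bash
# Sketch only: print a <Proxy balancer://...> block with one member per port.
# gen_balancer and its arguments are hypothetical names for illustration.
gen_balancer() {
  local name="$1" scheme="$2" host="$3" first_port="$4" count="$5"
  local i
  printf '<Proxy "balancer://%s">\n' "$name"
  for ((i = 0; i < count; i++)); do
    # route numbers start at 1, as in the hand-written configuration
    printf 'BalancerMember "%s://%s:%d" route=%d\n' \
      "$scheme" "$host" "$((first_port + i))" "$((i + 1))"
  done
  printf '</Proxy>\n'
}

# 20 workers on ports 9090 to 9109, once for HTTP and once for WebSockets.
gen_balancer myhttpcluster http IP.IP.IP.IP 9090 20
gen_balancer mywscluster ws IP.IP.IP.IP 9090 20
```

Both clusters are generated from the same port range so that route N always points to the same R process for HTTP and for WebSockets, which is what the ROUTEID cookie relies on.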
Apache allows you to use lb (aka loadbalancer) with a list of servers in a txt file or even a script. My colleague tried all those solutions. We had three dedicated machines configured for this purpose.
Apache and `lb` scripts
Apache configuration:
#RewriteMap lb "rnd:/var/www/serverlist.txt"
#RewriteMap lb "prg:/var/www/get_a_working_server.sh"
RewriteMap lb "prg:/var/www/get_a_working_shiny_server.sh"
serverlist.txt
servers www.domain.tld:8080|IP.IP.IP.IP:4123|IP.IP.IP.IP:4124|IP2.IP2.IP2.IP2:PORT
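With the `rnd:` map type, apache looks up the key (here `servers`) and returns one of the `|`-separated values at random. To sanity-check such a file outside of apache, the lookup can be emulated in bash. This is only a sketch of the idea: `pick_from_rnd_map` is a hypothetical helper, not something apache provides.

```shell
#!/usr/bin/env bash
# Sketch only: emulate what a "rnd:" map lookup does with a line such as
#   servers hostA:8080|hostB:4123|hostC:4124
# pick_from_rnd_map is a hypothetical helper, not part of apache.
pick_from_rnd_map() {
  local file="$1" key="$2" k values
  local -a choices
  while read -r k values; do
    if [ "$k" = "$key" ]; then
      # split the value list on "|"
      IFS='|' read -r -a choices <<< "$values"
      if [ "${#choices[@]}" -eq 0 ]; then
        return 1
      fi
      # pick one entry at random
      echo "${choices[RANDOM % ${#choices[@]}]}"
      return 0
    fi
  done < "$file"
  return 1   # key not found
}
```

For example, `pick_from_rnd_map /var/www/serverlist.txt servers` prints one of the listed backends on each call.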
We extensively used netcat in the next scripts to check remote ports:
get_a_working_server.sh:
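#!/bin/bash
# RewriteMap "prg:" helper: for each lookup key read on stdin, print one
# randomly chosen backend whose port is currently open.
while read -r request
do
  # Probe each host and keep the ports that nc reports as open.
  list=($(nc -znv IP.IP.IP.IP 8080-8090 2>&1 | grep open | cut -f3 -d' ' | awk '{print "www.domain.tld:"$1}') )
  list+=($(nc -znv IP1.IP1.IP1.IP1 4123-4140 2>&1 | grep open | cut -f3 -d' ' | awk '{print "www2.domain.tld:"$1}') )
  list+=($(nc -znv IP2.IP2.IP2.IP2 4123-4140 2>&1 | grep open | cut -f3 -d' ' | awk '{print "www1.domain.tld:"$1}') )
  RANGE=${#list[*]}
  if [ "$RANGE" -eq 0 ]; then
    # No backend is reachable: answer NULL so that the lookup fails cleanly
    # instead of dividing by zero below.
    echo NULL
    continue
  fi
  number=$RANDOM
  let "number %= $RANGE"
  server=${list[$number]}
  echo "$server"
done < /dev/stdin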
Finally, this is what we used for some time, get_a_working_shiny_server.sh:
#!/bin/bash
while read request
do
list=()
#list=($(nc -zv 127.0.0.1 4123-4140 2>&1 | grep succeeded | cut -f3,4 -d' ' --output-delimiter ":") )
#list+=($(nc -zv www1.domain.tld 4123-4140 2>&1 | grep succeeded | cut -f3,4 -d' ' --output-delimiter ":") )
#list+=($(nc -zv www2.domain.tld 4123-4140 2>&1 | grep succeeded | cut -f3,4 -d' ' --output-delimiter ":") )
for host in www1.domain.tld www2.domain.tld
#for host in www2.domain.tld
#for host in www1.domain.tld
do
for app in `seq 1 10`
do
list+=($host:8080/myapp$app)
done
done
RANGE=${#list[*]}
number=$RANDOM
let "number %= $RANGE"
server=${list[$number]}
echo $server
done < /dev/stdin
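A `prg:` map program has to answer every lookup with exactly one line, otherwise apache and the script get out of step. Before plugging such a script into apache, it can be checked by hand. Here is a minimal sketch, where `check_map_program` is a hypothetical helper and the keys it sends (`key0`, `key1`, ...) are arbitrary:

```shell
#!/usr/bin/env bash
# Sketch only: feed N lookup keys to a map program and count its answers.
# check_map_program is a hypothetical helper; usage:
#   check_map_program N command [args...]
check_map_program() {
  local n="$1" i answers
  shift
  answers="$(for ((i = 0; i < n; i++)); do echo "key$i"; done | "$@" | grep -c '')"
  # succeed only when there is exactly one answer line per key
  [ "$answers" -eq "$n" ]
}

# Example:
#   check_map_program 5 /var/www/get_a_working_shiny_server.sh && echo ok
```

This only checks the number of answers, not whether the returned backends are actually reachable.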
This solution is nice, but you can have https or firewall issues due to multiple servers and ports.
That being said, you may be able to solve this with another frontend/proxy layer, like HAproxy (e.g. see this interesting message here on stackoverflow).
Nginx
Here, we used one big host server. We use ip_hash to load balance sessions, as it keeps sessions sticky based on the client IP address. For a true nginx load balancer, we would need Nginx Plus.
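To get a feeling for what ip_hash implies: according to the nginx documentation, for IPv4 it uses the first three octets of the client address as the hashing key. The sketch below illustrates that idea only. The hash is a toy one, not nginx's real hash, and `backend_for_ip` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Sketch only: illustrate the idea behind ip_hash, NOT nginx's real hash.
# For IPv4, nginx hashes the first three octets of the client address, so
# all clients of one /24 network land on the same backend.
# backend_for_ip and its toy hash are hypothetical.
backend_for_ip() {
  local ip="$1" n_backends="$2"
  local o1 o2 o3 _rest
  IFS=. read -r o1 o2 o3 _rest <<< "$ip"
  # toy hash over the first three octets only
  echo $(( (o1 * 65536 + o2 * 256 + o3) % n_backends ))
}

backend_for_ip 192.168.10.7 30
backend_for_ip 192.168.10.200 30   # same /24, same backend index
backend_for_ip 192.168.11.7 30
```

A practical consequence is that users sitting behind the same NAT or proxy all end up on the same R process, so the load is only spread evenly when clients come from many different networks.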
nginx configuration with a single shiny app served many times
nginx.conf:
user shiny;
worker_processes auto;
pid /run/nginx.pid;
include /etc/nginx/modules-enabled/*.conf;
events {
worker_connections 768;
}
http {
sendfile on;
tcp_nopush on;
tcp_nodelay on;
keepalive_timeout 300;
types_hash_max_size 2048;
include /etc/nginx/mime.types;
default_type application/octet-stream;
ssl_protocols TLSv1 TLSv1.1 TLSv1.2; # Dropping SSLv3, ref: POODLE
ssl_prefer_server_ciphers on;
access_log /var/log/nginx/access.log;
error_log /var/log/nginx/error.log;
gzip on;
map $http_upgrade $connection_upgrade {
default upgrade;
'' close;
}
upstream myapp {
ip_hash;
server IP.IP.IP.IP:8001;
server IP.IP.IP.IP:8002;
server IP.IP.IP.IP:8003;
server IP.IP.IP.IP:8004;
server IP.IP.IP.IP:8005;
server IP.IP.IP.IP:8006;
server IP.IP.IP.IP:8007;
server IP.IP.IP.IP:8008;
server IP.IP.IP.IP:8009;
server IP.IP.IP.IP:8010;
server IP.IP.IP.IP:8011;
server IP.IP.IP.IP:8012;
server IP.IP.IP.IP:8013;
server IP.IP.IP.IP:8014;
server IP.IP.IP.IP:8015;
server IP.IP.IP.IP:8016;
server IP.IP.IP.IP:8017;
server IP.IP.IP.IP:8018;
server IP.IP.IP.IP:8019;
server IP.IP.IP.IP:8020;
server IP.IP.IP.IP:8021;
server IP.IP.IP.IP:8022;
server IP.IP.IP.IP:8023;
server IP.IP.IP.IP:8024;
server IP.IP.IP.IP:8025;
server IP.IP.IP.IP:8026;
server IP.IP.IP.IP:8027;
server IP.IP.IP.IP:8028;
server IP.IP.IP.IP:8029;
server IP.IP.IP.IP:8030;
keepalive 300;
}
include /etc/nginx/conf.d/*.conf;
include /etc/nginx/sites-enabled/*;
}
Site configuration:
server {
listen 80 default_server;
listen [::]:80 default_server;
listen 443 ssl default_server;
listen [::]:443 ssl default_server;
ssl_certificate /etc/letsencrypt/live/www.domain.tld/fullchain.pem; # managed by Certbot
ssl_certificate_key /etc/letsencrypt/live/www.domain.tld/privkey.pem; # managed by Certbot
root /opt/shiny;
index index.html index.htm index.nginx-debian.html;
server_name www.domain.tld myapp.domain.tld;
location / {
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_http_version 1.1;
proxy_set_header Upgrade $http_upgrade;
proxy_set_header Connection $connection_upgrade;
proxy_buffering off;
client_max_body_size 10m;
client_body_buffer_size 128k;
proxy_connect_timeout 300;
proxy_send_timeout 300;
proxy_read_timeout 300;
proxy_buffer_size 4k;
proxy_buffers 4 32k;
proxy_busy_buffers_size 64k;
proxy_temp_file_write_size 64k;
proxy_pass http://myapp;
proxy_redirect / $scheme://$host/;
}
}
Then, you will need to create all 30 shiny servers (ports 8001 to 8030) in /opt/shiny. You can have the same content served 30 times using symbolic links. Then, start them with a basic launch script like this:
#!/usr/bin/env bash
# Start 30 copies of the app as the shiny user, one per port (8001 to 8030).
for i in {1..30}
do
  port=$((8000 + i))
  # Append both stdout and stderr of each R process to its own log file.
  /sbin/runuser -s /bin/bash -l shiny -c ". /usr/local/shiny/.Renviron && LANG=en_US.UTF-8 /usr/bin/Rscript -e \"shiny::runApp('/opt/shiny/myapp_$i', port=$port, host='IP.IP.IP.IP')\" >> /usr/local/shiny/myapp_${i}.log 2>&1 &"
done
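The symbolic links mentioned above can be created in one go. This is a minimal sketch, assuming that the real application lives in a single directory and that the copies are named myapp_1 ... myapp_N as in the launch script; `link_app_copies` and the /opt/shiny/myapp source path are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: expose one application directory under N names with symlinks.
# link_app_copies and the directory layout are assumptions for illustration.
link_app_copies() {
  local src="$1" dest_dir="$2" count="$3"
  local i
  for ((i = 1; i <= count; i++)); do
    # -s: symbolic, -f: replace an existing link, -n: do not follow it
    ln -sfn "$src" "$dest_dir/myapp_$i"
  done
}

# Example: link_app_copies /opt/shiny/myapp /opt/shiny 30
```

Since every copy points to the same directory, updating the application once updates all 30 instances after they are restarted.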
That solution is quite nice, but it will be hit by a major issue: the timeout needed to release the lock on each of the nginx proxy threads.
Also keep in mind that your web user (here, shiny) needs full permissions on your DocumentRoot (root in nginx) and on all the /opt/shiny subdirectories, especially if you serve static content from them (*.css, *.js, data, pictures...).
Singularity instance for simple shiny application
I also experimented with running shiny through a singularity instance + nginx proxying. It works fine for a single instance, but I am not sure it would scale well.
Conversion from a dockerfile is quite easy. The main advantage is being able to launch it as a standard user (no root / sudo).
Please note that I still don't know whether it could scale extensively, as we used it only for old applications, but I thought it was a nice solution for a single shiny application.
Traefik came into the game
I have been looking at Traefik for a while. Traefik is a modern web load balancer, especially designed for containers, that can handle both WebSockets and http requests. The other usual load balancers or proxying solutions, like the ones we saw previously (apache, nginx) or even HAproxy, do not perform as smoothly as traefik in our case; that is to say with WebSockets + http(s) requests + sticky sessions + containers.
After reading about someone who built a similar solution, I was definitely sure that it was possible. I also commented on that article as QuiPasseParLà, explaining my point of view about this. I wanted to remove shinyproxy, as it is not intended to deliver fast, small web applications (without intensive workloads).
Traefik can be really complex to configure. So here is my solution, referenced in this thread.
The downside of this configuration is the lack of an authentication feature (except with TraefikEE), and the fact that it has to run on one host. As I said before, for really complex applications with heavy load, we are using shinyproxy.
Nevertheless, here are some articles on how to work around those issues with traefik:
- Authentication:
- Traefik and Let's Encrypt + domain wildcard certificates:
Or, maybe, you can use another ID solution provider, within a dedicated container...