Running background tasks with Flask and RQ

I have written several web apps, but it took me a while to understand how to run a long task and get the result back (without blocking the server). Of course, you should use a task queue like Celery or RQ. It's easy to find examples of how to send a task to a queue and... forget about it. But how do you get the result?

I found a great blog post from Miguel Grinberg: Using Celery With Flask. It explains how to use Ajax to poll the server for status updates. And I finally got it! As Miguel's post already covers Celery in detail, I wanted to investigate RQ (Redis Queue), a simple library for queueing jobs.

As a side note, Miguel's blog is really great. I learned Flask by following The Flask Mega-Tutorial. If you are starting with Flask, I highly recommend it, as well as the Flask book.

We'll make a simple app with a form to run some actions.

First version: send a post to the server and wait for the response

Let's start with some boilerplate code. This is gonna be a very simple example, but I'll organize it the way I usually do for a real application, using Blueprints, an application factory and some extensions (Flask-Bootstrap, Flask-Script and Flask-WTF):

├── Dockerfile
├── LICENSE
├── README.rst
├── app
│   ├── __init__.py
│   ├── extensions.py
│   ├── factory.py
│   ├── main
│   │   ├── __init__.py
│   │   ├── forms.py
│   │   └── views.py
│   ├── settings.py
│   ├── static
│   │   └── css
│   │       └── main.css
│   ├── tasks.py
│   └── templates
│       ├── base.html
│       └── index.html
├── docker-compose.yml
├── environment.yml
├── manage.py
└── uwsgi.py

I define all the extensions I use in app/extensions.py, my application factory in app/factory.py and my default settings in app/settings.py. Nothing strange in there. You can refer to the GitHub repository.

Here is our main app/main/views.py:

from flask import Blueprint, render_template, url_for, flash, redirect
from .. import tasks
from .forms import TaskForm

bp = Blueprint('main', __name__)


@bp.route('/', methods=['GET', 'POST'])
def index():
    form = TaskForm()
    if form.validate_on_submit():
        task = form.task.data
        try:
            result = tasks.run(task)
        except Exception as e:
            flash('Task failed: {}'.format(e), 'danger')
        else:
            flash(result, 'success')
        return redirect(url_for('main.index'))
    return render_template('index.html', form=form)

As said previously, we create a form. On submit, we run the task and send the response back.

The form is defined in app/main/forms.py:

from flask import current_app
from flask_wtf import Form
from wtforms import SelectField


class TaskForm(Form):
    task = SelectField('Task')

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.task.choices = [(task, task) for task in current_app.config['TASKS']]

In app/tasks.py, we have our run function to start a dummy task:

import random
import time
from flask import current_app


def run(task):
    if 'error' in task:
        time.sleep(0.5)
        1 / 0
    if task.startswith('Short'):
        seconds = 1
    else:
        seconds = random.randint(1, current_app.config['MAX_TIME_TO_WAIT'])
    time.sleep(seconds)
    return '{} performed in {} second(s)'.format(task, seconds)

In app/templates/base.html, we define a fixed-top navbar and a container to show flash messages and our main code. Note that we take advantage of Flask-Bootstrap.

{%- extends "bootstrap/base.html" %}
{% import "bootstrap/utils.html" as utils %}

{% block head %}
  <meta charset="utf-8">
  <meta http-equiv="X-UA-Compatible" content="IE=edge">
  {{super()}}
{% endblock %}

{% block styles %}
  {{super()}}
  <link href="{{ url_for('static', filename='css/main.css') }}" rel="stylesheet">
{% endblock %}

{% block title %}My App{% endblock %}

{% block navbar %}
  <!-- Fixed navbar -->
  <div class="navbar navbar-default navbar-fixed-top" role="navigation">
    <div class="container">
      <div class="navbar-header">
        <button type="button" class="navbar-toggle" data-toggle="collapse" data-target=".navbar-collapse">
          <span class="sr-only">Toggle navigation</span>
          <span class="icon-bar"></span>
          <span class="icon-bar"></span>
          <span class="icon-bar"></span>
        </button>
        <!--img class="navbar-brand" src="../../static/logo.png"-->
        <a class="navbar-brand" href="{{ url_for('main.index') }}">My App</a>
      </div>
    </div>
  </div>
{% endblock %}

{% block content %}
  <div class="container" id="mainContent">
    {{utils.flashed_messages(container=False, dismissible=True)}}
    {% block main %}{% endblock %}
  </div>
{% endblock %}

The html code for our view is in app/templates/index.html:

{%- extends "base.html" %}
{% import "bootstrap/wtf.html" as wtf %}

{% block main %}
      <div class="panel panel-default">
        <!-- Default panel contents -->
        <div class="panel-heading">Select task to run</div>
        <div class="panel-body">
          <div class="col-md-3">
            <form class="form" id="taskForm" method="POST">
              {{ form.hidden_tag() }}
              {{ wtf.form_field(form.task) }}
              <div class="form-group">
                <button type="submit" class="btn btn-default" id="submit">Run</button>
              </div>
            </form>
          </div>
        </div>
      </div>
{% endblock %}

Let's run this first example. We could just create a virtual environment using virtualenv or conda. As we'll soon need Redis, let's go straight for Docker:

$ git clone https://github.com/beenje/flask-rq-example.git
$ cd flask-rq-example
$ git checkout faa61009dbe3bafe49aae473f0fa19ab05a3ab90
$ docker-compose build
$ docker-compose up

Go to http://localhost:5000. You should see the following window:

/images/flask-rq-example.png

Choose a task and press Run. See how the UI is stuck while waiting for the server? Not very nice... Let's improve that a little by using some JavaScript.

Second version: use Ajax to submit the form

Let's write some JavaScript. Here is app/static/js/main.js:

$(document).ready(function() {

  // flash an alert
  // remove previous alerts by default
  // set clean to false to keep old alerts
  function flash_alert(message, category, clean) {
    if (typeof(clean) === "undefined") clean = true;
    if(clean) {
      remove_alerts();
    }
    var htmlString = '<div class="alert alert-' + category + ' alert-dismissible" role="alert">'
    htmlString += '<button type="button" class="close" data-dismiss="alert" aria-label="Close">'
    htmlString += '<span aria-hidden="true">&times;</span></button>' + message + '</div>'
    $(htmlString).prependTo("#mainContent").hide().slideDown();
  }

  function remove_alerts() {
    $(".alert").slideUp("normal", function() {
      $(this).remove();
    });
  }

  // submit form
  $("#submit").on('click', function() {
    flash_alert("Running " + $("#task").val() + "...", "info");
    $.ajax({
      url: $SCRIPT_ROOT + "/_run_task",
      data: $("#taskForm").serialize(),
      method: "POST",
      dataType: "json",
      success: function(data) {
        flash_alert(data.result, "success");
      },
      error: function(jqXHR, textStatus, errorThrown) {
        flash_alert(JSON.parse(jqXHR.responseText).message, "danger");
      }
    });
  });

});

To include this file in our html, we add the following block to app/templates/base.html:

{% block scripts %}
  {{super()}}
  <script type=text/javascript>
    $SCRIPT_ROOT = {{ request.script_root|tojson|safe }};
  </script>
  {% block app_scripts %}{% endblock %}
{% endblock %}

And here is a diff for our app/templates/index.html:

               {{ form.hidden_tag() }}
               {{ wtf.form_field(form.task) }}
               <div class="form-group">
-                <button type="submit" class="btn btn-default" id="submit">Run</button>
+                <button type="button" class="btn btn-default" id="submit">Run</button>
               </div>
             </form>
           </div>
         </div>
       </div>
 {% endblock %}
+
+{% block app_scripts %}
+  <script src="{{ url_for('static', filename='js/main.js') }}"></script>
+{% endblock %}

We change the button type from submit to button so that it doesn't send a POST when clicked. We send an Ajax query to $SCRIPT_ROOT/_run_task instead.

This is our new app/main/views.py:

from flask import Blueprint, render_template, request, jsonify
from .. import tasks
from .forms import TaskForm

bp = Blueprint('main', __name__)


@bp.route('/_run_task', methods=['POST'])
def run_task():
    task = request.form.get('task')
    try:
        result = tasks.run(task)
    except Exception as e:
        return jsonify({'message': 'Task failed: {}'.format(e)}), 500
    return jsonify({'result': result})


@bp.route('/')
def index():
    form = TaskForm()
    return render_template('index.html', form=form)

Let's run this new example:

$ git checkout c1ccfe8b3a39079ab80f813b5733b324c8b65c6f
$ docker rm flaskrqexample_web
$ docker-compose up

This time we immediately get some feedback when clicking on Run. There is no reload. That's better, but the server is still busy during the processing. If you try to open a new page, you won't get any answer until the task is done...

To avoid blocking the server, we'll use a task queue.
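Before bringing in RQ, here is a rough mental model of what a task queue does, using only the Python standard library. This is a sketch, not how RQ is implemented: RQ keeps the jobs in Redis and runs them in a separate worker process, while this toy version uses an in-memory queue and a thread. The names enqueue, worker, slow_task and results are made up for the illustration.

```python
import queue
import threading
import time
import uuid

jobs = queue.Queue()   # pending jobs (RQ keeps these in Redis)
results = {}           # job id -> result, filled in by the worker


def enqueue(func, *args):
    """Store the call for later and return immediately with a job id."""
    job_id = uuid.uuid4().hex
    jobs.put((job_id, func, args))
    return job_id


def worker():
    """Run queued jobs one at a time, like a single worker would."""
    while True:
        job_id, func, args = jobs.get()
        results[job_id] = func(*args)
        jobs.task_done()


def slow_task(name):
    time.sleep(0.2)  # pretend to work
    return "{} done".format(name)


threading.Thread(target=worker, daemon=True).start()

start = time.monotonic()
job_id = enqueue(slow_task, "Short task")
enqueue_time = time.monotonic() - start  # enqueueing does not wait for the task

jobs.join()  # only this demo waits here; a web view would return right away
print(results[job_id])
```

The point is that enqueue returns a job id without waiting for slow_task, which is exactly what we want from the view function.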

Third version: set up RQ

As its name indicates, RQ (Redis Queue) is backed by Redis. It is designed to have a low barrier to entry. What do we need to integrate RQ in our Flask web app?

Let's first add some variables in app/settings.py:

# The Redis database to use
REDIS_URL = 'redis://redis:6379/0'
# The queues to listen on
QUEUES = ['default']

To execute a background job, we need a worker. RQ comes with the rq worker command to start a worker. To integrate it better with our Flask app, we are going to write a simple Flask-Script command. We add the following to our manage.py:

import redis
from rq import Connection, Worker

@manager.command
def runworker():
    redis_url = app.config['REDIS_URL']
    redis_connection = redis.from_url(redis_url)
    with Connection(redis_connection):
        worker = Worker(app.config['QUEUES'])
        worker.work()

The Manager runs the command inside a Flask test context, meaning we can access the app config from within the worker. This is nice because both our web application and workers (and thus the jobs run on the worker) have access to the same configuration variables. No separate config file. No discrepancy. Everything is in app/settings.py and can be overridden by LOCAL_SETTINGS.

To put a job in a queue, you just create an RQ Queue and enqueue the job. One way to do that is to pass the connection when creating the Queue. This is a bit tedious. RQ has the notion of a connection context. We take advantage of that and register functions to push the connection before a request and pop it after the request (app/main/views.py):

import redis
from flask import Blueprint, render_template, request, jsonify, current_app, g
from rq import push_connection, pop_connection, Queue


def get_redis_connection():
    redis_connection = getattr(g, '_redis_connection', None)
    if redis_connection is None:
        redis_url = current_app.config['REDIS_URL']
        redis_connection = g._redis_connection = redis.from_url(redis_url)
    return redis_connection


@bp.before_request
def push_rq_connection():
    push_connection(get_redis_connection())


@bp.teardown_request
def pop_rq_connection(exception=None):
    pop_connection()

This makes it easy to create a Queue in a request or application context.

The get_redis_connection function gets the Redis connection and stores it in the flask.g object. This is the same as what is explained for SQLite here.
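The caching pattern itself can be tried without Flask or Redis. In this small sketch, a SimpleNamespace stands in for flask.g and the hypothetical connect function stands in for redis.from_url; both are assumptions made only so the example runs on its own.

```python
from types import SimpleNamespace

g = SimpleNamespace()  # stand-in for flask.g (which is reset for each request)
created = []           # records every time a "connection" is really created


def connect(url):
    # hypothetical stand-in for redis.from_url
    created.append(url)
    return object()


def get_connection(url="redis://redis:6379/0"):
    # same shape as get_redis_connection: reuse the cached object if present
    conn = getattr(g, "_redis_connection", None)
    if conn is None:
        conn = g._redis_connection = connect(url)
    return conn


first = get_connection()
second = get_connection()
print(first is second, len(created))
```

Calling the function twice returns the same object and only creates one connection.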

With that in place, it's easy to enqueue a job. Here are the changes to the run_task function:

 @bp.route('/_run_task', methods=['POST'])
 def run_task():
     task = request.form.get('task')
-    try:
-        result = tasks.run(task)
-    except Exception as e:
-        return jsonify({'message': 'Task failed: {}'.format(e)}), 500
-    return jsonify({'result': result})
+    q = Queue()
+    job = q.enqueue(tasks.run, task)
+    return jsonify({'job_id': job.get_id()})

We enqueue our task and just return the job id for now.

Docker and docker-compose are now gonna come in handy to start everything (Redis, our web app and a worker). We just have to add the following to our docker-compose.yml file:

 - "5000:5000"
 volumes:
 - .:/app
+    depends_on:
+    - redis
+  worker:
+    image: flaskrqexample
+    container_name: flaskrqexample_worker
+    environment:
+      LOCAL_SETTINGS: /app/settings.cfg
+    command: python manage.py runworker
+    volumes:
+    - .:/app
+    depends_on:
+    - redis
+  redis:
+    image: redis:3.2

Don't forget to add redis and rq to your environment.yml file!

   - dominate==2.2.1
   - flask-bootstrap==3.3.6.0
   - flask-script==2.0.5
+  - redis==2.10.5
+  - rq==0.6.0
   - visitor==0.1.3

Rebuild the docker image and start the app:

$ git checkout 437e710df3df0dd4b153f20027f5f00270b2e1a3
$ docker rm flaskrqexample_web
$ docker-compose build
$ docker-compose up

OK, nice, we started a job in the background! This is fine to run a task and forget about it (like sending an e-mail). But how do we get the result back?

Fourth version: poll job status and get the result

This is the part I had been missing for some time. But, as often, it's not difficult once you have seen it. When launching the job, we return a URL to check the status of the job. The trick is to periodically call back the same function until the job has finished or failed.

On the server side, the job_status endpoint uses the job_id to retrieve the job and to get its status and result.

@bp.route('/status/<job_id>')
def job_status(job_id):
    q = Queue()
    job = q.fetch_job(job_id)
    if job is None:
        response = {'status': 'unknown'}
    else:
        response = {
            'status': job.get_status(),
            'result': job.result,
        }
        if job.is_failed:
            response['message'] = job.exc_info.strip().split('\n')[-1]
    return jsonify(response)


@bp.route('/_run_task', methods=['POST'])
def run_task():
    task = request.form.get('task')
    q = Queue()
    job = q.enqueue(tasks.run, task)
    return jsonify({}), 202, {'Location': url_for('main.job_status', job_id=job.get_id())}

The run_task function returns an empty response with the 202 status code. We use the Location response-header field to pass the job_status URL to the client.
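A word on the message field in job_status: it keeps only the last line of the traceback. Here is a small standard-library sketch of that trick, assuming exc_info holds the formatted traceback as text (which is how the view above treats it). The run function below is a cut-down copy of the dummy task, just for the illustration.

```python
import traceback


def run(task):
    # same trick as the dummy task in app/tasks.py
    if 'error' in task:
        1 / 0
    return '{} performed'.format(task)


try:
    run('Task with error')
except Exception:
    exc_info = traceback.format_exc()  # full multi-line traceback as text

# keep only the last line: exception type and message
message = exc_info.strip().split('\n')[-1]
print(message)
```

This prints "ZeroDivisionError: division by zero", which is short enough to show in an alert.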

On the client side, we retrieve the URL from the header and call the new check_job_status function.

@@ -28,8 +53,11 @@ $(document).ready(function() {
       data: $("#taskForm").serialize(),
       method: "POST",
       dataType: "json",
-      success: function(data) {
-        flash_alert("Job " + data.job_id + " started...", "info", false);
+      success: function(data, status, request) {
+        $("#submit").attr("disabled", "disabled");
+        flash_alert("Running " + task + "...", "info");
+        var status_url = request.getResponseHeader('Location');
+        check_job_status(status_url);
       },
       error: function(jqXHR, textStatus, errorThrown) {
         flash_alert("Failed to start " + task, "danger");

We use setTimeout to call back the same function until the job is done (finished or failed).

function check_job_status(status_url) {
  $.getJSON(status_url, function(data) {
    console.log(data);
    switch (data.status) {
      case "unknown":
          flash_alert("Unknown job id", "danger");
          $("#submit").removeAttr("disabled");
          break;
      case "finished":
          flash_alert(data.result, "success");
          $("#submit").removeAttr("disabled");
          break;
      case "failed":
          flash_alert("Job failed: " + data.message, "danger");
          $("#submit").removeAttr("disabled");
          break;
      default:
        // queued/started/deferred
        setTimeout(function() {
          check_job_status(status_url);
        }, 500);
    }
  });
}
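The same polling loop can be sketched in Python, which is handy if you want to call the status endpoint from a script instead of a browser. This is only an illustration: get_status is a stand-in for a GET on the job_status URL, and the timeout is my addition (the JavaScript above polls forever).

```python
import time


def poll_until_done(get_status, interval=0.5, timeout=30.0):
    """Call get_status() until the job is finished, failed or unknown."""
    deadline = time.monotonic() + timeout
    while True:
        data = get_status()
        if data["status"] in ("finished", "failed", "unknown"):
            return data
        if time.monotonic() >= deadline:
            raise TimeoutError("job still {}".format(data["status"]))
        # queued/started/deferred: wait a bit and ask again
        time.sleep(interval)


# Fake responses standing in for the /status/<job_id> endpoint
responses = iter([
    {"status": "queued", "result": None},
    {"status": "started", "result": None},
    {"status": "finished", "result": "Short task performed in 1 second(s)"},
])

final = poll_until_done(lambda: next(responses), interval=0.01)
print(final["status"], "-", final["result"])
```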

Let's check out this commit and run our app again:

$ git checkout da8360aefb222afc17417a518ac25029566071d6
$ docker rm flaskrqexample_web
$ docker rm flaskrqexample_worker
$ docker-compose up

Try submitting some tasks. This time you can open another window and the server will answer even when a task is running :-) You can open a console in your browser to see the polling and the response from the job_status function. Note that we only have one worker, so if you start a second task, it will be enqueued and run only when the first one is done.

Conclusion

Using RQ with Flask isn't that difficult. So there is no need to block the server to get the result of a long task. There are a few more things to say, but this post is getting a bit long, so I'll keep that for another time.

Thanks again to Miguel Grinberg and all his posts about Flask!

Installing OpenVPN on a Raspberry Pi with Ansible

I have to confess that I initially decided to install a VPN, not to secure my connection when using a free Wireless Access Point in an airport or hotel, but to watch Netflix :-)

I had a VPS in France where I installed sniproxy to access Netflix. Not that I find the French catalogue so great, but as a French guy living in Sweden, it was a good way for my kids to watch some French programs. But Netflix started to block VPS providers...

I have a brother in France who has fiber-optic Internet access. That was a good opportunity to set up a private VPN, so I bought him a Raspberry Pi.

There are many resources on the web about OpenVPN. A paper worth mentioning is: SOHO Remote Access VPN. Easy as Pie, Raspberry Pi... It's from the end of 2013 and describes Easy-RSA 2.0 (that used to be installed with OpenVPN), but it's still an interesting read.

Anyway, most resources describe all the commands to run. I don't really like installing software by running a bunch of commands. Probably due to my professional experience, I like things to be reproducible. That's why I love to automate things. I wrote a lot of shell scripts over the years. About two years ago, I discovered Ansible and it quickly became my favorite tool to deploy software.

So let's write a small Ansible playbook to install OpenVPN on a Raspberry Pi.

First, the firewall configuration. I like to use ufw, which is quite easy to set up:

- name: install dependencies
  apt: name=ufw state=present update_cache=yes cache_valid_time=3600

- name: update ufw default forward policy
  lineinfile: dest=/etc/default/ufw regexp=^DEFAULT_FORWARD_POLICY line=DEFAULT_FORWARD_POLICY="ACCEPT"
  notify: reload ufw

- name: enable ufw ip forward
  lineinfile: dest=/etc/ufw/sysctl.conf regexp=^net/ipv4/ip_forward line=net/ipv4/ip_forward=1
  notify: reload ufw

- name: add NAT rules to ufw
  blockinfile:
    dest: /etc/ufw/before.rules
    insertbefore: BOF
    block: |
      # Nat table
      *nat
      :POSTROUTING ACCEPT [0:0]

      # Nat rules
      -F
      -A POSTROUTING -s 10.8.0.0/24 -o eth0 -j SNAT --to-source {{ansible_eth0.ipv4.address}}

      # don't delete the 'COMMIT' line or these nat rules won't be processed
      COMMIT
  notify: reload ufw

- name: allow ssh
  ufw: rule=limit port=ssh proto=tcp

- name: allow openvpn
  ufw: rule=allow port={{openvpn_port}} proto={{openvpn_protocol}}

- name: enable ufw
  ufw: logging=on state=enabled

This enables IP forwarding, adds the required NAT rules and allows ssh and openvpn.

The rest of the playbook installs OpenVPN and generates all the keys automatically, except the Diffie-Hellman one that should be generated locally. This is just because it takes forever on the Pi :-)

- name: install openvpn
  apt: name=openvpn state=present

- name: create /etc/openvpn
  file: path=/etc/openvpn state=directory mode=0755 owner=root group=root

- name: create /etc/openvpn/keys
  file: path=/etc/openvpn/keys state=directory mode=0700 owner=root group=root

- name: create clientside and serverside directories
  file: path="{{item}}" state=directory mode=0755
  with_items:
      - "{{clientside}}/keys"
      - "{{serverside}}"
  become: true
  become_user: "{{user}}"

- name: create openvpn base client.conf
  template: src=client.conf.j2 dest={{clientside}}/client.conf owner=root group=root mode=0644

- name: download EasyRSA
  get_url: url={{easyrsa_url}} dest=/home/{{user}}/openvpn
  become: true
  become_user: "{{user}}"

- name: create scripts
  template: src={{item}}.j2 dest=/home/{{user}}/openvpn/{{item}} owner=root group=root mode=0755
  with_items:
    - create_serverside
    - create_clientside
  tags: client

- name: run serverside script
  command: ./create_serverside
  args:
    chdir: /home/{{user}}/openvpn
    creates: "{{easyrsa_server}}/ta.key"
  become: true
  become_user: "{{user}}"

- name: run clientside script
  command: ./create_clientside {{item}}
  args:
    chdir: /home/{{user}}/openvpn
    creates: "{{clientside}}/files/{{item}}.ovpn"
  become: true
  become_user: "{{user}}"
  with_items: "{{openvpn_clients}}"
  tags: client

- name: install all server keys
  command: install -o root -g root -m 600 {{item.name}} /etc/openvpn/keys/
  args:
    chdir: "{{item.path}}"
    creates: /etc/openvpn/keys/{{item.name}}
  with_items:
    - { name: 'ca.crt', path: "{{easyrsa_server}}/pki" }
    - { name: '{{ansible_hostname}}.crt', path: "{{easyrsa_server}}/pki/issued" }
    - { name: '{{ansible_hostname}}.key', path: "{{easyrsa_server}}/pki/private" }
    - { name: 'ta.key', path: "{{easyrsa_server}}" }

- name: copy Diffie-Hellman key
  copy: src="{{openvpn_dh}}" dest=/etc/openvpn/keys/dh.pem owner=root group=root mode=0600

- name: create openvpn server.conf
  template: src=server.conf.j2 dest=/etc/openvpn/server.conf owner=root group=root mode=0644
  notify: restart openvpn

- name: start openvpn
  service: name=openvpn state=started

The create_clientside script generates all the required client keys and creates an ovpn file that includes them. It makes it very easy to install on any device: just one file to drop.

One thing I stumbled upon is the ns-cert-type server option that I initially used in the server configuration. This prevented the client from connecting. As explained here, this option is a deprecated "Netscape" cert attribute. It's not enabled by default with Easy-RSA 3.

Fortunately, the mentioned howto and the Easy-RSA github page are good references for Easy-RSA 3.

One important thing to note is that I create all the keys with no password. That's obviously not the most secure or recommended way. Anyone accessing the CA could sign new requests. But it can be stored offline on a USB stick. I actually think that for my use case it's not even worth keeping the CA. Sure, it means I can't easily add a new client or revoke a certificate. But with the playbook, it's super easy to throw away all the keys and regenerate everything. That forces me to replace the configuration on every client, but with 2 or 3 clients, this is not a problem.

For sure, don't leave all the generated keys on the Pi! After copying the clients' ovpn files, remove the /home/pi/openvpn directory (save it somewhere safe if you want to add new clients or revoke a certificate without regenerating everything).

The full playbook can be found on github. The README includes some quick instructions.

I now have a private VPN in France and one at home that I can use to securely access my NAS from anywhere!

uWSGI, send_file and Python 3.5

I have a Flask app that returns an in-memory bytes buffer (io.BytesIO) using Flask's send_file function.

The app is deployed using uWSGI behind Nginx. This was working fine with Python 3.4.

When I updated Python to 3.5, I got the following exception when trying to download a file:

io.UnsupportedOperation: fileno

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/webapps/bowser/miniconda3/envs/bowser/lib/python3.5/site-packages/flask/app.py", line 1817, in wsgi_app
    response = self.full_dispatch_request()
  File "/webapps/bowser/miniconda3/envs/bowser/lib/python3.5/site-packages/flask/app.py", line 1477, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/webapps/bowser/miniconda3/envs/bowser/lib/python3.5/site-packages/flask/app.py", line 1381, in handle_user_exception
    reraise(exc_type, exc_value, tb)
  File "/webapps/bowser/miniconda3/envs/bowser/lib/python3.5/site-packages/flask/_compat.py", line 33, in reraise
    raise value
  File "/webapps/bowser/miniconda3/envs/bowser/lib/python3.5/site-packages/flask/app.py", line 1475, in full_dispatch_request
    rv = self.dispatch_request()
  File "/webapps/bowser/miniconda3/envs/bowser/lib/python3.5/site-packages/flask/app.py", line 1461, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "/webapps/bowser/miniconda3/envs/bowser/lib/python3.5/site-packages/flask_login.py", line 758, in decorated_view
    return func(*args, **kwargs)
  File "/webapps/bowser/miniconda3/envs/bowser/lib/python3.5/site-packages/flask_security/decorators.py", line 194, in decorated_view
    return fn(*args, **kwargs)
  File "/webapps/bowser/bowser/app/bext/views.py", line 116, in download
    as_attachment=True)
  File "/webapps/bowser/miniconda3/envs/bowser/lib/python3.5/site-packages/flask/helpers.py", line 523, in send_file
    data = wrap_file(request.environ, file)
  File "/webapps/bowser/miniconda3/envs/bowser/lib/python3.5/site-packages/werkzeug/wsgi.py", line 726, in wrap_file
    return environ.get('wsgi.file_wrapper', FileWrapper)(file, buffer_size)
SystemError: <built-in function uwsgi_sendfile> returned a result with an error set
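The first exception in this traceback is easy to reproduce without uWSGI or Flask. An in-memory buffer has no underlying file descriptor, so asking for one fails. This is only a minimal illustration of that first error, not of what uWSGI does internally.

```python
import io

buffer = io.BytesIO(b"in-memory file content")

try:
    buffer.fileno()  # a real file would return its file descriptor here
    error = None
except io.UnsupportedOperation as exc:
    error = exc

print(type(error).__name__, error)
```

A file wrapper that relies on fileno() therefore can't work with such a buffer.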

I quickly found the following post with the same exception, but no answer... A little more googling brought me to this github issue: In python3, uwsgi fails to respond a stream from BytesIO object

As described, you should run uwsgi with the --wsgi-disable-file-wrapper flag to avoid this problem. As with all command line options, you can add the following entry in your uwsgi.ini file:

wsgi-disable-file-wrapper = true

Note that uWSGI 2.0.12 or later is required.

When searching in uWSGI documentation, I only found one match in uWSGI 2.0.12 release notes.

A problem/option that should be better documented. Probably a pull request to open :-)

UPDATE (2016-07-13): pull request merged

GitLab CI and conda

I set up GitLab to host several projects at work and I have been quite pleased with it. I read that setting up GitLab CI for test and deployment was easy, so I decided to try it to automatically run the test suite and build the Sphinx documentation.

I found the official documentation to be quite good for setting up a runner, so I won't go into details here. I chose the Docker executor.

Here is my first .gitlab-ci.yml test:

image: python:3.4

before_script:
  - pip install -r requirements.txt

tests:
  stage: test
  script:
    - python -m unittest discover -v

Success, it works! Nice. But... 8 minutes 33 seconds build time for a test suite that runs in less than 1 second... that's a bit long.

Let's try using some caching to avoid having to download all the pip requirements every time. After googling, I found this post explaining that the cache path must be inside the build directory:

image: python:3.4

before_script:
  - export PIP_CACHE_DIR="pip-cache"
  - pip install -r requirements.txt

cache:
  paths:
    - pip-cache

tests:
  stage: test
  script:
    - python -m unittest discover -v

With the pip cache, the build time went down to about 6 minutes. A bit better, but far from acceptable.

Of course I knew the problem was not the download, but the installation of the pip requirements. I use pandas which explains why it takes a while to compile.

So how do you install pandas easily? With conda of course! There are even some nice docker images created by Continuum Analytics ready to be used.

So let's try again:

image: continuumio/miniconda3:latest

before_script:
  - conda env create -f environment.yml
  - source activate koopa

tests:
  stage: test
  script:
    - python -m unittest discover -v

Build time: 2 minutes 55 seconds. Nice, but we need some cache to avoid downloading all the packages every time. The first problem is that the cache path has to be in the build directory. Conda packages are saved in /opt/conda/pkgs by default. A solution is to replace that directory with a link to a local directory. It works, but the problem is that GitLab makes a compressed archive to save and restore the cache, which takes quite some time in this case...

How to get a fast cache? Let's use a docker volume! I modified my /etc/gitlab-runner/config.toml to add two volumes:

[runners.docker]
  tls_verify = false
  image = "continuumio/miniconda3:latest"
  privileged = false
  disable_cache = false
  volumes = ["/cache", "/opt/cache/conda/pkgs:/opt/conda/pkgs:rw", "/opt/cache/pip:/opt/cache/pip:rw"]

One volume for conda packages and one for pip. My new .gitlab-ci.yml:

image: continuumio/miniconda3:latest

before_script:
  - export PIP_CACHE_DIR="/opt/cache/pip"
  - conda env create -f environment.yml
  - source activate koopa

tests:
  stage: test
  script:
    - python -m unittest discover -v

The build time is about 10 seconds!

Just a few days after my tests, GitLab announced GitLab Container Registry. I had already thought about building my own docker image and this new feature would make it even easier than before. But I would have to remember to update my image if I change my requirements, which I don't have to think about with the current solution.

Switching from git-bigfile to git-lfs

In 2012, I was looking for a way to store big files in git. git-annex was already around, but I found it a bit too complex for my use case. I discovered git-media from Scott Chacon and it looked like what I was looking for. It was in Ruby which made it not super easy to install on some machines at work. I thought it was a good exercise to port it to Python. That's how git-bigfile was born. It was simple and was doing the job.

Last year, I was thinking about giving it some love: port it to Python 3, add some unit tests... That's about when I switched from Gogs to GitLab and read that GitLab was about to support git-lfs.

Being developed by GitHub and with GitLab support, git-lfs was an obvious option to replace git-bigfile.

Here is how to switch a project using git-bigfile to git-lfs:

  1. Make a list of all files tracked by git-bigfile:

    $ git bigfile status | awk '/pushed/ {print $NF}' > /tmp/list
    
  2. Edit .gitattributes to replace the filter. Replace filter=bigfile -crlf with filter=lfs diff=lfs merge=lfs -text:

    $ cat .gitattributes
    *.tar.bz2 filter=lfs diff=lfs merge=lfs -text
    *.iso filter=lfs diff=lfs merge=lfs -text
    *.img filter=lfs diff=lfs merge=lfs -text
    
  3. Remove all big files from the staging area and add them back with git-lfs:

    $ git rm --cached $(cat /tmp/list)
    $ git add .
    $ git commit -m "Switch to git-lfs"
    
  4. Check that the files were added using git-lfs. You should see something like this:

    $ git show HEAD
    diff --git a/CentOS_6.4/images/install.img b/CentOS_6.4/images/install.img
    index 227ea55..a9cc6a8 100644
    --- a/CentOS_6.4/images/install.img
    +++ b/CentOS_6.4/images/install.img
    @@ -1 +1,3 @@
    -5d243948497ceb9f07b033da62498e52269f4b83
    +version https://git-lfs.github.com/spec/v1
    +oid sha256:6fcaac620b82e38e2092a6353ca766a3b01fba7f3fd6a0397c57e979aa293db0
    +size 133255168
    
  5. Remove git-bigfile cache directory:

    $ rm -rf .git/bigfile
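
If you have many files to check in step 4, a small script can tell a git-lfs pointer from an old git-bigfile stub. This is a rough sketch based only on the three fields visible in the diff above (version, oid, size); the function names are mine and it is not a full implementation of the pointer spec.

```python
def parse_lfs_pointer(text):
    """Parse the 'key value' lines of a git-lfs pointer file into a dict."""
    fields = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        key, _, value = line.partition(' ')
        fields[key] = value
    return fields


def is_lfs_pointer(text):
    """Check the three fields shown in the diff: version, oid and size."""
    fields = parse_lfs_pointer(text)
    return (
        fields.get('version') == 'https://git-lfs.github.com/spec/v1'
        and fields.get('oid', '').startswith('sha256:')
        and fields.get('size', '').isdigit()
    )


pointer = (
    'version https://git-lfs.github.com/spec/v1\n'
    'oid sha256:6fcaac620b82e38e2092a6353ca766a3b01fba7f3fd6a0397c57e979aa293db0\n'
    'size 133255168\n'
)
old_bigfile_stub = '5d243948497ceb9f07b033da62498e52269f4b83\n'

print(is_lfs_pointer(pointer))           # the new content is a pointer
print(is_lfs_pointer(old_bigfile_stub))  # the old sha1 stub is not
```

The text to check is what git stores for the file, for example the output of git show HEAD:path/to/file.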
    

Note: to push files larger than 2.1GB to your GitLab server, wait for this fix. Hopefully it will be in 8.4.3.

crontab and date

The other day, I wanted to add a script to the crontab and to redirect the output to a file including the current date. Easy. I have used the date command many times in bash scripts like this:

current_date=$(date +"%Y%m%dT%H%M")

So I added the following to my crontab:

0 1 * * * /usr/local/bin/foo > /tmp/foo.$(date +%Y%m%dT%H%M).log 2>&1

And... it didn't work...

I quickly identified that the script was working properly when run from the crontab (it's common for a script to work from the prompt but fail from the crontab because of an incorrect PATH). The problem was the redirection but I couldn't see why.

I googled a bit but didn't find anything...

I finally looked at the man pages:

$  man 5 crontab

     ...
     The  ``sixth''  field  (the  rest of the line) specifies the command to be run.  The entire command portion of the line, up to a
     newline or % character...

Here it was of course! % is a special character. It needs to be escaped:

0 1 * * * /usr/local/bin/foo > /tmp/foo.$(date +\%Y\%m\%dT\%H\%M).log 2>&1
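
To see what went wrong, here is a simplified Python model of the rule from the man page: the command stops at the first unescaped %, and whatever follows is sent to the command as standard input, with the remaining % signs turned into newlines. The functions cron_split and cron_escape are my own helpers for the illustration, not part of cron.

```python
import re


def cron_split(command_field):
    """Return (command, stdin) following the % rule from crontab(5)."""
    # split on % signs that are not preceded by a backslash
    parts = re.split(r'(?<!\\)%', command_field)
    command = parts[0].replace('\\%', '%')
    if len(parts) == 1:
        return command, None
    stdin = '\n'.join(parts[1:]).replace('\\%', '%')
    return command, stdin


def cron_escape(command):
    """Escape every % so the whole command reaches the shell."""
    return command.replace('%', '\\%')


broken = '/usr/local/bin/foo > /tmp/foo.$(date +%Y%m%dT%H%M).log 2>&1'

command, stdin = cron_split(broken)
print(command)  # truncated right after "date +"

fixed = cron_escape(broken)
print(fixed)
print(cron_split(fixed) == (broken, None))
```

With the unescaped line, the shell only ever sees the command up to "date +", so the redirection can't work.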

Lesson to remember: check the man pages before googling!

Compile and install Kodi on iPad without jailbreak

With iOS 9 and Xcode 7 it's finally possible to compile and deploy apps on your iPhone/iPad with a free Apple developer account (no paid membership required).

I compiled XBMC/Kodi many times on my Mac but had never signed an app with Xcode before and it took me some time to get it right. So here are my notes:

First, thanks to memphiz for the iOS9 support!

I compiled from his ios9_workaround branch, but it has been merged to master since:

$ git clone https://github.com/xbmc/xbmc.git Kodi
$ cd Kodi
$ git remote add memphiz https://github.com/Memphiz/xbmc.git
$ git fetch memphiz
$ git checkout -b ios9_workaround memphiz/ios9_workaround

Follow the instructions from the README.ios file:

$ git submodule update --init addons/skin.re-touched
$ cd tools/depends
$ ./bootstrap
$ ./configure --host=arm-apple-darwin
$ make -j4
$ make -j4 -C target/binary-addons
$ cd ../..
$ make -j4 -C tools/depends/target/xbmc
$ make clean
$ make -j4 xcode_depends

Start Xcode and open the Kodi project. Open the Preferences, and add your Apple ID if not already done:

/images/add_account.png

Select the Kodi-iOS target:

/images/kodi_ios_target.png

Change the bundle identifier to something unique and click on Fix Issue to create a provisioning profile.

/images/bundle_identifier.png

Connect your device to your Mac and select it:

/images/device.png

Click on Run to compile and install Kodi on your device!