The OASIS website had not been updated for a while, so I decided to take a shot at making it more up to date. This blog post describes the pipeline I have put in place to update the website automatically. It is the first end-to-end 'continuous deployment' project I have completed.
Among the user-visible changes:
- an invitation to circle the OASIS G+ page, which is now the official channel for small OASIS updates.
- an invitation to fork the project on GitHub, which is now the official repository for OASIS.
- links to the documentation for the bleeding-edge version of the OASIS manual and API.
The OASIS website repository is also on GitHub. Feel free to fork it and send me a pull request if you spot any mistakes.
The website still relies heavily on Markdown processed by pandoc, but there are some new technical features:
- no more index.php: a templating system now points to the latest version.
- a Jenkins job generates the website daily or after any update to OASIS itself.
Since I have started using Python quite a lot, I decided to use it for this project. It has a lot of nice libraries that helped me build the website tooling quickly (and it gives me plenty of ideas for equivalent tools in OCaml).
The daily generation: Jenkins
I have a Jenkins instance running, so I decided to use it to build the website once a day with updated documentation and links. This Jenkins instance also monitors changes to the OASIS source code, so I can do something even more precise: regenerate the website after every change to OASIS.
I also use the Jenkins instance to generate a documentation tarball for the OASIS manual and API. This makes it possible to display the latest manual and API documentation on the website, so I can quickly browse it and spot errors early.
Another good point about Jenkins is that it can store SSH credentials. So I created a build user with its own SSH key on the OCaml Forge, and I use it to publish the website at the end of the build.
Right now Jenkins does the following:
- trigger a build of the OASIS website:
  - every day (cron)
  - when a push to the OASIS website repository is detected
  - when an OASIS build succeeds
- get the documentation artifact from the latest successful build of OASIS
- build the website
- publish it
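The post does not say how the job fetches the documentation artifact. One option is the Copy Artifact plugin. Another is Jenkins's permalink URL for a job's last successful build, sketched below. The Jenkins host, job name and artifact path are hypothetical placeholders.

```python
import urllib.parse
import urllib.request

# Hypothetical Jenkins host; replace with the real instance.
JENKINS_URL = "https://jenkins.example.org"


def last_successful_artifact_url(job, artifact_path, base=JENKINS_URL):
    """Build the permalink to an artifact of the job's last successful build."""
    return "{}/job/{}/lastSuccessfulBuild/artifact/{}".format(
        base.rstrip("/"),
        urllib.parse.quote(job, safe=""),
        # quote() keeps "/" by default, so nested artifact paths stay intact.
        urllib.parse.quote(artifact_path),
    )


def fetch_artifact(job, artifact_path, dest, base=JENKINS_URL):
    """Download the artifact to `dest` (this performs a real HTTP request)."""
    url = last_successful_artifact_url(job, artifact_path, base)
    urllib.request.urlretrieve(url, dest)
    return dest
```

For instance, a hypothetical `fetch_artifact("oasis", "dist/oasis-doc-dev.tar.gz", "doc-dev.tar.gz")` always pulls the documentation of the newest green build. A failed OASIS build therefore never replaces the published docs.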
Data gathering
To build the website, I need some data:
- documentation tarballs containing the API (HTML from ocamldoc) and the manual (Markdown)
- the list of published OASIS versions
- links to each tarball (documentation and source)
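The post does not show its data model. One way to hold this data is one record per version, sketched here as a hypothetical `Release` class. The templates need the latest version, so versions must be compared numerically rather than as strings. This sketch assumes plain dotted numeric versions.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Release:
    """One published OASIS version and the tarballs attached to it (hypothetical model)."""
    version: str
    source_url: str
    doc_url: Optional[str] = None  # a release may lack a documentation tarball


def version_key(version: str) -> tuple:
    """Turn "0.3.1" into (0, 3, 1) so versions compare numerically.

    Assumes plain dotted numeric versions; suffixes such as "~rc1" are not handled.
    """
    return tuple(int(part) for part in version.split("."))


def newest_first(releases: List[Release]) -> List[Release]:
    """Sort releases from newest to oldest version."""
    return sorted(releases, key=lambda r: version_key(r.version), reverse=True)
```

With numeric keys, "0.10.0" correctly sorts before "0.9.1". A plain string comparison would get that wrong. `newest_first(releases)[0]` is then the version the index page should point to.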
The OCaml Forge has a nice SOAP API, but you need to be logged in to use it. This is unfortunate, because I only want public data. The only way I found to gather it was to scrape the OCaml Forge.
Python has a very nice scraping library for that: beautifulsoup.
I use beautifulsoup to parse the HTML of the Files tab of the OASIS project and extract all the relevant information. I use curl to download the documentation tarballs, both for released versions and for the latest development version.
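Here is a minimal sketch of the scraping step. It assumes the Files page links to tarballs named oasis-X.Y.Z.tar.gz and oasis-doc-X.Y.Z.tar.gz. That naming scheme is hypothetical, since the post does not show the real page. The sketch keys on link file names rather than on the page layout, which makes it less fragile if the forge's HTML changes.

```python
import re

from bs4 import BeautifulSoup

# Hypothetical naming scheme: oasis-X.Y.Z.tar.gz (sources) and
# oasis-doc-X.Y.Z.tar.gz (documentation).
TARBALL_RE = re.compile(r"^oasis-(doc-)?(\d+(?:\.\d+)*)\.tar\.gz$")


def scrape_tarballs(html):
    """Map each version found on the page to its source/doc tarball links."""
    soup = BeautifulSoup(html, "html.parser")
    releases = {}
    for link in soup.find_all("a", href=True):
        href = link["href"]
        filename = href.rsplit("/", 1)[-1]
        match = TARBALL_RE.match(filename)
        if match is None:
            continue  # navigation links, news, etc.
        kind = "doc" if match.group(1) else "source"
        releases.setdefault(match.group(2), {})[kind] = href
    return releases
```

The resulting dict gives, per version, the URLs that curl can then download.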
Template
Python also has a very nice templating library: mako.
I feed the gathered data to mako and process every .tmpl file in the repository to create the matching files.
Among the files I have turned into templates:
- index.php has become index.mkd.tmpl: it used to be a hackish PHP script scraping the RSS feed of updates; it is now a clean template.
- robots.txt.tmpl (explained in the next section)
- documentation.mkd.tmpl, to list all versions of the documentation.
Fix documentation and indexing
One problem with providing access to all versions of the documentation is that people can end up reading an old version. To prevent that, I use two techniques:
- prevent search engines from indexing old versions.
- warn readers that they are reading an old version.
To keep search engines away from these files, I created a robots.txt that lists the URLs of all old documentation. This should be enough to stop search engines from indexing the wrong pages.
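The post generates this file from robots.txt.tmpl with mako. For brevity, this sketch builds the same kind of content directly in Python. The /doc/<version>/ layout is hypothetical.

```python
def robots_txt(versions, latest, prefix="/doc"):
    """Build a robots.txt asking crawlers to skip every non-latest version.

    `prefix` and the <prefix>/<version>/ layout are hypothetical.
    """
    lines = ["User-agent: *"]
    for version in versions:
        if version != latest:
            # Disallow matches by path prefix, so one line covers a whole version tree.
            lines.append("Disallow: {}/{}/".format(prefix.rstrip("/"), version))
    return "\n".join(lines) + "\n"
```

Regenerating the file on every build means that a new release automatically moves the previous version into the disallowed list.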
To warn readers that they are looking at an outdated version, I added a "you are not viewing the latest version" box. This part was tricky, but beautifulsoup v4 provides a nice API to edit HTML in place: I just had to find the right CSS selector for the position where I wanted to insert the warning box.
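The sketch below shows the kind of in-place edit beautifulsoup v4 allows:

- `select_one()` finds the insertion point from a CSS selector.
- `new_tag()` builds the box.
- `insert()` places the box there.

The selector, CSS class, wording and latest-version URL are hypothetical; the post does not show the real ones.

```python
from bs4 import BeautifulSoup


def add_outdated_warning(html, latest_url, selector="body"):
    """Insert a warning box as the first child of the element matching `selector`."""
    soup = BeautifulSoup(html, "html.parser")
    target = soup.select_one(selector)
    if target is None:
        return html  # selector matched nothing: leave the page unchanged

    box = soup.new_tag("div")
    box["class"] = "outdated-warning"  # hypothetical CSS class
    box.append("You are not viewing the latest version. ")
    link = soup.new_tag("a", href=latest_url)
    link.string = "Go to the latest documentation."
    box.append(link)

    target.insert(0, box)
    return str(soup)
```

Applied to every HTML file of an old version, this leaves the ocamldoc output untouched apart from the added box.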
Publish
The ultimate goal of the project is 'continuous deployment': rather than picking which version to deploy and doing it by hand, I let Jenkins deploy every version.
Deploying the website used to be a simple rsync, but for this project I decided to use a fancier method. I spent a few hours choosing the best framework for automatic deployment. There are two main frameworks around: capistrano (Ruby) and fabric (Python).
Fabric is written in Python, so I picked it as a good fit for the project. Fabric's biggest feature is being an SSH wrapper.
The fabric script is quite simple: to understand it, you just need to know that local() runs a command on the local machine and run() runs a command on the target host.
The fabfile.py script does the following:
- create a local tarball from the html/ directory of the OASIS website
- upload the tarball to ssh.ocamlcore.org
- uncompress it to replace the htdocs/ directory of the OASIS project
- remove the oldest tarballs, keeping a few versions so that we can roll back.
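Here is a minimal sketch of such a fabfile, under several assumptions:

- It uses Fabric 2's Connection methods (c.local(), c.put(), c.run()), where the original script presumably used Fabric 1's local() and run() functions.
- The remote directory is hypothetical.
- The website-<timestamp>.tar.gz naming is hypothetical.
- The number of kept tarballs is hypothetical.

```python
import time

# Hypothetical remote project directory on ssh.ocamlcore.org; the real path is not given.
REMOTE_DIR = "/path/to/oasis"
KEEP = 3  # tarballs kept on the server for rollback (assumed value)


def tarballs_to_prune(names, keep=KEEP):
    """Return the website tarballs to delete, keeping the `keep` newest ones.

    Names embed a YYYYmmddHHMMSS timestamp, so lexical order is chronological.
    """
    tarballs = sorted(
        n for n in names if n.startswith("website-") and n.endswith(".tar.gz")
    )
    return tarballs[:-keep] if keep > 0 else tarballs


def deploy(c, keep=KEEP):
    """Deploy html/ through `c`, a Fabric 2 Connection (or any object with the same methods)."""
    tarball = "website-{}.tar.gz".format(time.strftime("%Y%m%d%H%M%S"))
    remote_tarball = "{}/{}".format(REMOTE_DIR, tarball)

    # 1. Create a local tarball from the website's html/ directory.
    c.local("tar czf {} -C html .".format(tarball))
    # 2. Upload it to the server.
    c.put(tarball, remote=remote_tarball)
    # 3. Unpack into a fresh directory, then swap it in place of htdocs/.
    c.run(
        "cd {d} && rm -rf htdocs.new && mkdir htdocs.new"
        " && tar xzf {t} -C htdocs.new"
        " && rm -rf htdocs && mv htdocs.new htdocs".format(d=REMOTE_DIR, t=tarball)
    )
    # 4. Remove the oldest tarballs, keeping a few for rollback.
    listing = c.run("ls {}".format(REMOTE_DIR), hide=True).stdout.split()
    for old in tarballs_to_prune(listing, keep):
        c.run("rm -f {}/{}".format(REMOTE_DIR, old))
```

To use the sketch with Fabric 2, decorate deploy with @task (`from fabric import task`). Then run `fab -H ssh.ocamlcore.org deploy`. Because deploy only calls c.local(), c.put() and c.run(), the step order can also be checked against a fake connection without touching the server.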
With this new process, the website is updated automatically within 3 minutes of a successful OASIS build.