How to Create Your Data Science Blog with Pelican and Jupyter Notebooks

Writing articles and tutorials is a great way to learn new things in depth while building a portfolio. This tutorial walks you through the first steps of starting a data science blog with Pelican and Jupyter Notebooks.

Data science thrives on communication: sharing findings, presenting reports, and discussing results and new insights. A blog is a great way to take part in this discussion. David Robinson wrote a great article that explains much better than I can how a blog can help you.

Installation and Quickstart

When this tutorial was written, Pelican ran best with Python 2.7 and also supported Python 3.3+. Recent Pelican releases require Python 3. To install Pelican, run:

pip install pelican

For more details, have a look at the installation guide. After the installation, you are ready to create a Pelican project. First, create a folder for your project, then run the following command inside that folder:

pelican-quickstart

This should prompt you to add general information for your website.
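To give you an idea of what the quickstart leaves behind, here is a rough sketch of a pelicanconf.py. Every value is a placeholder, and the exact file depends on your Pelican version and on the answers you give, so treat it as an illustration rather than the generated output.

```python
# pelicanconf.py (sketch; all values below are placeholders)
AUTHOR = 'Jon Doe'
SITENAME = 'My Data Science Blog'
SITEURL = ''               # left empty while developing locally

PATH = 'content'           # folder that holds your articles and pages
TIMEZONE = 'Europe/Paris'
DEFAULT_LANG = 'en'

DEFAULT_PAGINATION = 10    # number of articles per index page
```

The settings added in the rest of this tutorial all go into this same file.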

Installing Themes

Many existing themes work with Pelican right away. Pelican Themes has a whole list of great themes to choose from. To install a theme, create a folder called themes inside the project folder and clone a theme from GitHub into it. Here is how you would do that for the attila theme:

mkdir themes
git clone https://github.com/arulrajnet/attila themes/attila

Alternatively, if you don’t want to modify the theme, you can add it as a git submodule:

mkdir themes
git submodule add https://github.com/arulrajnet/attila themes/attila

Now you just need to add the theme to the configuration file pelicanconf.py. Specify the folder of the theme by adding the line:

THEME = 'themes/attila'

If you want to create a theme yourself, have a look at this guide.
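If Pelican complains about the theme, a quick check of the folder can help. The sketch below assumes the theme follows the usual layout of a templates folder and a static folder. The function name missing_theme_parts is made up for this example and is not part of Pelican.

```python
from pathlib import Path


def missing_theme_parts(theme_dir):
    """Return the expected sub-folders that are missing from theme_dir."""
    theme = Path(theme_dir)
    # Assumption: a theme ships a "templates" and a "static" folder.
    expected = ["templates", "static"]
    return [name for name in expected if not (theme / name).is_dir()]
```

Calling missing_theme_parts('themes/attila') from the project folder returns an empty list when both folders are present. With the submodule approach, a non-empty result may mean the submodule has not been initialized yet.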

Writing an Article

Before we get to Jupyter, let’s look at how to write an article. For this, we can use Markdown files with additional metadata. Here is what such an article looks like:

Title: This is the Best Tool so far
Date: 2019-06-21 12:00
Category: tools
Tags: tool, software, best
Slug: best-tool
Authors: Jon Doe
Summary: This is the most important tool out there

# Header

You have to learn about this new tool
...

Here is a quick rundown of the most common fields:

  • Title: title of the post
  • Slug: path at which the post will be accessed on the server
  • Date: publication date
  • Category: category for the post
  • Tags: comma-separated list of tags to use for the post
  • Authors: author or authors of the post
  • Summary: summary of your post

This file needs to be saved in the content folder. For more information on how to write content, have a look at the documentation. To generate the HTML from your Markdown files you can run:

pelican content -o output

This will generate the whole website in the output folder. To view the website, you can launch Pelican’s web server with:

pelican --listen

Now you can open the website at localhost:8000. If you want to use a different port, pass it with the --port option, for example pelican --listen --port 8080.
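If you write often, typing the metadata block by hand gets tedious. Here is a minimal sketch that builds the header shown earlier in this section from a few arguments. The helper names slugify and article_header are hypothetical, and the slug rule (lower-case words joined by hyphens) is an assumption, not Pelican's own slug logic.

```python
import re
from datetime import datetime


def slugify(title):
    """Lower-case the title and join its alphanumeric runs with hyphens."""
    return "-".join(re.findall(r"[a-z0-9]+", title.lower()))


def article_header(title, category, tags, author, summary, date=None):
    """Build the metadata block that goes at the top of a Markdown article."""
    date = date or datetime.now()
    lines = [
        f"Title: {title}",
        f"Date: {date:%Y-%m-%d %H:%M}",
        f"Category: {category}",
        f"Tags: {', '.join(tags)}",   # tags are separated by commas
        f"Slug: {slugify(title)}",
        f"Authors: {author}",
        f"Summary: {summary}",
    ]
    # A blank line separates the metadata from the article body.
    return "\n".join(lines) + "\n\n"
```

You could write the returned string, followed by your text, to a file in the content folder and then run the generate command from above.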

Installing Pelican Plugins

Pelican can be extended with plugins for almost any kind of need. You can find most plugins in the Pelican Plugins repository. To add the plugins to your project, you can add the repository as a git submodule in the following way:

git submodule add https://github.com/getpelican/pelican-plugins pelican-plugins

Many of the plugins are themselves submodules of this repository, which means that you will initially find many empty folders. To fetch their contents as well, update the submodules recursively with:

git submodule update --init --recursive

To update all submodules to the latest commit on origin, run:

git submodule update --remote --merge --recursive

Now you can include plugins by adding them to your pelicanconf.py. Here is how you would add the sitemap plugin:

# Path to the folder containing the plugins
PLUGIN_PATHS = ['pelican-plugins']
# Enabled plugins
PLUGINS = ['sitemap']

Additionally, for the sitemap plugin, you would need to add some configuration:

SITEMAP = {
    'format': 'xml',
    'priorities': {
        'articles': 1,
        'indexes': 0.5,
        'pages': 0.5,
    },
    'changefreqs': {
        'articles': 'always',
        'indexes': 'hourly',
        'pages': 'monthly'
    }
}

This generates a sitemap at the URL BASEURL/sitemap.xml, where BASEURL is localhost:8000 when testing locally. For more information about Pelican plugins, have a look at the documentation.
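To check that the sitemap contains the pages you expect, you can list its URLs with a few lines of Python. This is a small sketch that assumes the generated file uses the standard sitemaps.org 0.9 namespace. The function name sitemap_urls is made up for this example.

```python
import xml.etree.ElementTree as ET

# Assumption: the sitemap uses the standard sitemaps.org 0.9 namespace.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(xml_data):
    """Return every <loc> value found in a sitemap document."""
    root = ET.fromstring(xml_data)
    return [loc.text.strip() for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)]
```

After generating the site, you could pass it the contents of output/sitemap.xml, for example read with Path('output/sitemap.xml').read_bytes() from pathlib.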

Using Jupyter Notebooks to Write Articles

You have now seen how to use Pelican, but you may still be wondering how to use Jupyter Notebooks efficiently with it. Another great plugin, pelican-ipynb, has you covered. You will even find this plugin in your pelican-plugins/ folder. To use it, add pelican-ipynb.markup to the list of plugins:

PLUGINS = ['sitemap', 'pelican-ipynb.markup']

Next, add ipynb to the markup extensions and ignore the .ipynb_checkpoints/ folder:

MARKUP = ('md', 'ipynb')

IGNORE_FILES = [".ipynb_checkpoints"]  

Now you can use Jupyter Notebooks instead of Markdown files. There are several ways to add metadata to your notebooks. One option is an additional metadata file that has the same file name as the notebook and the same format as in the Markdown file, but with the .nbdata extension. Another option is to add the metadata in a cell in the notebook. Before you can do that, you need to enable this mode by adding IPYNB_USE_METACELL = True to your configuration. Then you can add the metadata in the first notebook cell, in Markdown mode, like this:

- title: This is the Best Tool so far
- date: 2019-06-21 12:00
- category: tools
- tags: tool, software, best
- slug: best-tool
- authors: Jon Doe
- summary: This is the most important tool out there

Finally, the last way to add metadata is to modify the metadata tag in the raw notebook file directly. Jupyter Notebooks are stored as JSON documents that contain a "metadata" tag, where you can add the metadata. It is also possible to edit the metadata from within Jupyter Notebook. If you are using JupyterLab, you can edit the metadata of your notebooks with the jupyterlab-nbmetadata plugin. Make sure to have a look at the documentation for more details on how to provide the notebook with metadata.
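Because a notebook is just JSON, you can also set the metadata tag with a short script instead of editing the file by hand. The sketch below merges a dictionary into the top-level "metadata" object. The function name set_notebook_metadata is hypothetical, and the key names the plugin expects are an assumption here, so check the plugin documentation for the exact spelling.

```python
import json


def set_notebook_metadata(notebook_json, fields):
    """Return notebook JSON text with `fields` merged into its metadata tag."""
    notebook = json.loads(notebook_json)
    # Existing entries such as the kernel information are kept as they are.
    notebook.setdefault("metadata", {}).update(fields)
    return json.dumps(notebook, indent=1, ensure_ascii=False) + "\n"
```

You would read the .ipynb file as text, pass it through the function together with something like {"title": "This is the Best Tool so far", "slug": "best-tool"}, and write the result back to the same file.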

Hosting Your Blog on GitHub with GitHub Pages

You can host your website directly on GitHub with GitHub Pages. Before we get started, create a repository on GitHub named username.github.io, replacing username with your GitHub username. After you have set up your website in the previous steps, you can add this repository as a submodule:

git submodule add https://github.com/username/username.github.io.git output

Additionally, you will need to add ignore = all to the .gitmodules file in the [submodule "output"] section:

[submodule "output"]
    path = output
    url = https://github.com/username/username.github.io.git
    ignore = all

Finally, in the publishconf.py configuration file, you need to tell Pelican not to delete the output directory (which would remove the submodule’s .git entry) and to use the correct absolute URL. You can do this by modifying the file with:

DELETE_OUTPUT_DIRECTORY = False

SITEURL = 'https://username.github.io'
RELATIVE_URLS = False

Now you can generate the website with the publish settings by running pelican content -o output -s publishconf.py. After you commit and push the contents of the output folder, you should find your website at username.github.io, ready to be filled with content. It is also possible to use GitLab Pages, and the steps are much the same.
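Since publishing always comes down to the same few commands, you might want to wrap them in a small script. The following is a sketch under a few assumptions: the output submodule is set up as described above, a plain git push goes to the right remote and branch, and the helper names publish_commands and publish are made up for this example.

```python
import subprocess


def publish_commands(message, output_dir="output"):
    """Return the commands that build the site and push the output submodule."""
    return [
        # Build the site with the publish settings.
        ["pelican", "content", "-o", output_dir, "-s", "publishconf.py"],
        # Stage, commit and push everything inside the output folder.
        ["git", "-C", output_dir, "add", "--all"],
        ["git", "-C", output_dir, "commit", "-m", message],
        ["git", "-C", output_dir, "push"],
    ]


def publish(message):
    """Run each command in order and stop at the first failure."""
    for command in publish_commands(message):
        subprocess.run(command, check=True)
```

Calling publish('Add new article') from the project folder would then rebuild and upload the site. Note that the commit step fails when nothing has changed, which stops the script before pushing.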

Conclusion

You have seen how to create your data science blog using Pelican and Jupyter Notebooks. This is a powerful way to share your projects and articles with others, and a helpful way to learn new topics in more depth. There is also another plugin, ipynb2pelican, which I haven’t tested but which can also integrate Jupyter Notebooks with Pelican.

If anything is unclear or if you have further suggestions, feel free to add them in the comments below.

Image from Wikimedia Commons