Here you can find a selection of articles that are either important for me or were well received by the community. For more articles have a look at my blog
03 Mar 2021
Querying big data on Hadoop can be challenging to get running, but alternatively, many solutions are using S3 object stores which you can access and query with Presto or Trino. In this guide you will see how to install, configure, and run Presto or Trino on Debian or Ubuntu with the S3 object store of your choice and the Hive standalone metastore.
17 Oct 2020
Presto is an open source distibruted query engine built for Big Data enabling high performance SQL access to a large variety of data sources including HDFS, PostgreSQL, MySQL, Cassandra, MongoDB, Elasticsearch and Kafka among others.
17 Aug 2020
Google Analytics is a powerful analytics tool found in an astonishing number of websites. In this tutorial, we will take a look at how to access the Google Analytics API (v4) with Python and Pandas. Additionally, we will take a look at the various ways to analyze your tracking data and create custom reports.
12 Feb 2019
Jupyter Notebook is a powerful tool, but how can you use it in all its glory on a server? In this tutorial you will see how to set up Jupyter notebook on a server like Digital Ocean, AWS or most other hosting provider available. Additionally, you will see how to use Jupyter notebooks over SSH tunneling or SSL with with Let’s Encrypt.
23 Jan 2019
Say you have an external hard drive with layers upon layers of cryptically named folders and intricate mazes of directories (like here, or here). How can you make sense of this mess? Python offers various tools in the Python standard library to deal with your file system and the folderstats module can be of additional help to gain insights into your file system.
01 Aug 2018
In this article, we will be going through building queries for Wikidata with Python and SPARQL by taking a look where mayors in Europe are born. This tutorial is building up the knowledge to collect the data responsible for this interactive visualization from the header image which was done with deck.gl.
05 Jun 2018
There are many ways to compare countries and cities and many measurements to choose from. We can see how they perform economically, or how their demographics differ, but what if we take a look at data available in OpenStreetMap? In this article, we explore just that with the help of a procedure called t-SNE.
15 May 2018
OpenStreetMap (OSM) is a massive collaborative map of the world, built and maintained mostly by volunteers. On the other hand, there exist various indicators to measure economic growth, prosperity, and produce of a country. What if we use OpenStreetMap to predict those economic indicators?
04 Mar 2018
Have you ever wondered where most Biergarten in Germany are or how many banks are hidden in Switzerland? OpenStreetMap is a great open source map of the world which can give us some insight into these and similar questions. There is a lot of data hidden in this data set, full of useful labels and geographic information, but how do we get our hands on the data?
06 Dec 2017
There are times being stuck with a load of images that need to be cropped, resized or converted, but doing this by hand in an image editor is tedious work. One tool I commonly use in these desperate situations is ImageMagick, which is a powerful tool when automating raster and vector image processing. Here I’ll introduce a few common commands I had to look up multiple times.
30 Jun 2017
This article explores an efficient way on how to create tubes, ribbons and moving camera orientations based on parametric curves with the help of moving coordinate frames.
02 Mar 2017
This article is showing a geometric and intuitive explanation of the covariance matrix and the way it describes the shape of a data set. We will describe the geometric relationship of the covariance matrix with the use of linear transformations and eigen decomposition.