03 Mar 2021
Querying big data on Hadoop can be challenging to set up. As an alternative, many solutions now use S3 object stores, which you can access and query with Presto or Trino. In this guide you will see how to install, configure, and run Presto or Trino on Debian or Ubuntu with the S3 object store of your choice and the Hive standalone metastore.
10 Nov 2020
In this article you will see how to easily manage Jupyter Notebook and JupyterLab by using the Systemd tooling. This is useful when you want to have an instance running locally or on your server that you can manage and monitor.
08 Nov 2020
Systemd is an init system in Linux used for system initialization and service management. It is useful for managing and monitoring services. In this cheatsheet you will find a collection of common commands used with the command line tools systemctl and journalctl.
17 Oct 2020
Presto is an open source distributed query engine built for Big Data, enabling high-performance SQL access to a large variety of data sources, including HDFS, PostgreSQL, MySQL, Cassandra, MongoDB, Elasticsearch and Kafka, among others.
27 Sep 2020
In this short article we will look at how to use PyTorch with the Iris data set. We will create and train a neural network with Linear layers, and we will employ a Softmax activation function and the Adam optimizer.
17 Aug 2020
Google Analytics is a powerful analytics tool found on an astonishing number of websites. In this tutorial, we will take a look at how to access the Google Analytics API (v4) with Python and Pandas. Additionally, we will look at the various ways to analyze your tracking data and create custom reports.
20 Dec 2019
Apache Airflow is a powerful workflow management system that you can use to automate and manage complex Extract Transform Load (ETL) pipelines. In this tutorial you will see how to integrate Airflow with the systemd system and service manager, which is available on most Linux systems, to help you monitor Airflow and restart it on failure.
26 Jul 2019
Writing articles and tutorials is a great way to learn new things in depth while building a portfolio. In this tutorial, you will find the first steps that you will need to start your data science blog with Pelican and Jupyter Notebooks.
22 Apr 2019
Python is a wonderful language for scripting and automating workflows, and it comes packed with useful tools out of the box in the Python Standard Library. A common task, especially for a sysadmin, is to execute shell commands. But what would usually end up in a bash or batch file can also be done in Python. You’ll learn here how to do just that with the os and subprocess modules.
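As a small taste of the idea, here is a minimal sketch of running an external command with the subprocess module. The helper name `run_command` is hypothetical and not taken from the article, and the current Python interpreter stands in for a shell command so the example behaves the same on any platform.

```python
import subprocess
import sys


def run_command(args):
    """Run a command without a shell and return (exit code, stdout)."""
    # Passing a list of arguments avoids shell quoting issues.
    result = subprocess.run(args, capture_output=True, text=True, check=False)
    return result.returncode, result.stdout.strip()


# Assumption: the running interpreter is used as a portable stand-in
# for whatever command you would normally put in a bash or batch file.
code, output = run_command(
    [sys.executable, "-c", "print('hello from a subprocess')"]
)
print(code, output)
```

Setting `check=False` leaves it to the caller to inspect the exit code; `check=True` would raise `subprocess.CalledProcessError` on a non-zero exit instead.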
12 Feb 2019
Jupyter Notebook is a powerful tool, but how can you use it in all its glory on a server? In this tutorial you will see how to set up Jupyter Notebook on a server hosted with Digital Ocean, AWS, or most other hosting providers. Additionally, you will see how to use Jupyter notebooks over SSH tunneling or over SSL with Let’s Encrypt.