22 Nov 2022
JSON is a popular format for storing structured data. Python has a built-in module called json for working with JSON data. In this article, we will see how to use the json module to serialize and deserialize data in Python.
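As a quick taste of what serializing and deserializing looks like, here is a minimal sketch using only the standard-library json module. The `record` dictionary is a made-up sample for illustration, not data from the article.

```python
import json

# Hypothetical sample record, used only for illustration.
record = {"name": "example", "tags": ["a", "b"], "active": True, "score": None}

# Serialize: Python object -> JSON string.
text = json.dumps(record, indent=2, sort_keys=True)

# Deserialize: JSON string -> Python object.
restored = json.loads(text)

print(text)
print(restored == record)
```

Note that Python's True and None become true and null in the JSON text and are converted back on loading.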
21 Nov 2022
Git stashing is a way to temporarily save changes that you do not want to commit yet. This is useful if you need to switch branches, but do not want to commit your changes first.
10 Nov 2022
Prometheus is a powerful open-source monitoring system that can be used to collect and track a variety of metrics for your applications. In this guide, we will cover how to get Prometheus up and running with systemd on an Ubuntu or Debian server.
10 Apr 2022
When working with large amounts of data, a common approach is to store the data in S3 buckets. Instead of dumping the data as CSV files or plain text files, a good option is to use Apache Parquet. In this short guide you’ll see how to read and write Parquet files on S3 using Python, Pandas and PyArrow.
11 Jan 2022
When writing programs, there is often a large set of configuration values and credentials that should not be hard-coded in the program. Keeping them outside the code also makes the program easier to customize and more widely applicable. There are various ways to handle configuration and credentials, and here you will see a few of the most common ones in Python.
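One common way to keep settings out of the code is to read them from environment variables. The sketch below is only an illustration of that idea: the function name `load_config` and the variable names `APP_HOST`, `APP_PORT` and `APP_API_KEY` are hypothetical and not taken from the article.

```python
import os


def load_config(environ=os.environ):
    """Read settings from a mapping of environment variables.

    Non-secret values get defaults; the credential has none,
    so it is None when the variable is not set.
    """
    return {
        "host": environ.get("APP_HOST", "localhost"),
        "port": int(environ.get("APP_PORT", "8080")),
        "api_key": environ.get("APP_API_KEY"),
    }


# A plain dict stands in for the real environment here so the example
# runs the same way everywhere.
config = load_config({"APP_PORT": "9000", "APP_API_KEY": "secret"})
print(config["host"], config["port"])
```

Passing the mapping as a parameter keeps the function easy to test; in a real program you would call `load_config()` without arguments.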
20 Dec 2021
tqdm is a fast, user-friendly and extensible progress bar for Python and shell programs. Here you’ll find a collection of useful commands for quick reference.
06 Nov 2021
Often you want to delete the output of a Jupyter notebook before committing it to a repository, but in most cases you still want to keep the notebook output for yourself. In this short guide you will see how to delete the notebook output automatically when committing notebooks to a repository while keeping the outputs locally.
03 Apr 2021
This is a quick example of how to process a large data set in chunks with Pandas when it would otherwise not fit into memory. In this short example you will see how to apply this to CSV files with pandas.read_csv.
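The idea can be sketched as follows, assuming pandas is installed. The tiny in-memory CSV and the column name `value` are made up for illustration; in practice you would pass a file path and a much larger chunksize.

```python
import io

import pandas as pd

# Hypothetical stand-in for a large CSV file: one column, ten rows.
csv_data = io.StringIO("value\n" + "\n".join(str(i) for i in range(10)))

total = 0
rows = 0

# With chunksize set, read_csv returns an iterator of DataFrames
# instead of loading the whole file at once.
for chunk in pd.read_csv(csv_data, chunksize=4):
    total += chunk["value"].sum()
    rows += len(chunk)

print(rows, total)
```

Only one chunk is held in memory at a time, so the aggregate is built up incrementally.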
03 Mar 2021
Getting big data queries running on Hadoop can be challenging. As an alternative, many setups keep their data in S3 object stores, which you can access and query with Presto or Trino. In this guide you will see how to install, configure, and run Presto or Trino on Debian or Ubuntu with the S3 object store of your choice and the Hive standalone metastore.
10 Nov 2020
In this article you will see how to easily manage Jupyter Notebook and JupyterLab using the systemd tooling. This is useful when you want to have an instance running locally or on your server that you can manage and monitor.