Object Serialization with JSON and compressed JSON in Python

JSON is a popular data format for storing data in a structured way. Python has a built-in module called json that can be used to work with JSON data. In this article, we will see how to use the json module to serialize and deserialize data in Python.
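As a rough illustration of the idea, the sketch below round-trips a small dictionary through plain JSON and through gzip-compressed JSON, using only the standard library. The `record` data is made up for the example, and gzip is an assumption here: the article may use a different compression scheme.

```python
import gzip
import json

# Hypothetical sample data, used only for illustration.
record = {"name": "example", "values": [1, 2, 3], "nested": {"ok": True}}

# Plain JSON: serialize to a string and parse it back.
text = json.dumps(record)
restored = json.loads(text)

# Compressed JSON: encode to bytes, gzip them, then reverse the steps.
compressed = gzip.compress(text.encode("utf-8"))
restored_from_gzip = json.loads(gzip.decompress(compressed).decode("utf-8"))
```

Both `restored` and `restored_from_gzip` should compare equal to the original `record`, since every value in it has a direct JSON representation.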

How to Save Temporary Changes in Git Using Git Stash

Git stashing is a way to temporarily save changes that you do not want to commit yet. This is useful if you need to switch branches, but do not want to commit your changes first.

Running Prometheus with Systemd

Prometheus is a powerful open-source monitoring system that can be used to collect and track a variety of metrics for your applications. In this guide, we will cover how to get Prometheus up and running with systemd on an Ubuntu or Debian server.

Reading and Writing Parquet Files on S3 with Pandas and PyArrow

When working with large amounts of data, a common approach is to store the data in S3 buckets. Instead of dumping the data as CSV files or plain text files, a good option is to use Apache Parquet. In this short guide you’ll see how to read and write Parquet files on S3 using Python, Pandas and PyArrow.

Working with Credentials and Configurations in Python

When writing programs, there is often a large set of configuration values and credentials that should not be hard-coded in the program. Keeping them outside the code also makes the program easier to customize and more generally applicable. There are various ways to handle configuration and credentials, and here you will see a few of the most common ways to do that with Python.
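One possible approach, sketched here under assumptions, is to read non-secret settings from an INI-style configuration and secrets from environment variables. The section name `database`, the keys, the variable `APP_DB_PASSWORD`, and the helper `load_settings` are all hypothetical names chosen for this example, not taken from the article.

```python
import configparser
import os

# Hypothetical INI content; in practice this would usually live in a file
# that is kept out of version control.
INI_TEXT = """
[database]
host = localhost
port = 5432
"""


def load_settings(ini_text, environ):
    """Combine INI settings with a secret taken from the environment."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    return {
        "host": parser.get("database", "host"),
        "port": parser.getint("database", "port"),
        # The password is None when the (hypothetical) variable is not set.
        "password": environ.get("APP_DB_PASSWORD"),
    }


settings = load_settings(INI_TEXT, os.environ)
```

Passing the environment in as an argument keeps the function easy to test, for example `load_settings(INI_TEXT, {"APP_DB_PASSWORD": "s3cret"})`.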

tqdm Cheat Sheet

tqdm is a fast, user-friendly and extensible progress bar for Python and shell programs. Here you’ll find a collection of useful commands for quick reference.

Remove Jupyter Notebook Output from Terminal and when using Git

Often you want to delete the output of a Jupyter notebook before committing it to a repository, while still keeping the output for yourself. In this short guide you will see how to delete the notebook output automatically when committing notebooks to a repository while keeping the outputs locally.

Reading and Writing Pandas DataFrames in Chunks

This is a quick example of how to process a large data set in chunks with Pandas when it would otherwise not fit into memory. In this short example you will see how to apply this to CSV files with pandas.read_csv.

Querying S3 Object Stores with Presto or Trino

Getting big-data queries running on Hadoop can be challenging. As an alternative, many setups store their data in S3 object stores, which you can access and query with Presto or Trino. In this guide you will see how to install, configure, and run Presto or Trino on Debian or Ubuntu with the S3 object store of your choice and the Hive standalone metastore.

Manage Jupyter Notebook and JupyterLab with Systemd

In this article you will see how to easily manage Jupyter Notebook and JupyterLab using the systemd tooling. This is useful when you want to have an instance running locally or on your server that you can manage and monitor.