Querying big data on Hadoop can be challenging to get running, but alternatively, many solutions are using S3 object stores which you can access and query with Presto or Trino. In this guide you will see how to install, configure, and run Presto or Trino on Debian or Ubuntu with the S3 object store of your choice and the Hive standalone metastore.
In this article you will see how to easily manage Jupyter Notebook and JupyterLab by using the Systemd tooling. This is useful when you want to have an instance running local or on your server that you can manage and monitor.
Systemd is an init system in Linux used for system intialization and service management. It is fairly useful to manage and monitor services. In this cheatsheet you will find a collection of common commands used with the command line tools
Presto is an open source distibruted query engine built for Big Data enabling high performance SQL access to a large variety of data sources including HDFS, PostgreSQL, MySQL, Cassandra, MongoDB, Elasticsearch and Kafka among others.
In this short article we will have a look on how to use PyTorch with the Iris data set. We will create and train a neural network with Linear layers and we will employ a Softmax activation function and the Adam optimizer.
Google Analytics is a powerful analytics tool found in an astonishing number of websites. In this tutorial, we will take a look at how to access the Google Analytics API (v4) with Python and Pandas. Additionally, we will take a look at the various ways to analyze your tracking data and create custom reports.
Apache Airflow is a powerfull workflow management system which you can use to automate and manage complex Extract Transform Load (ETL) pipelines. In this tutorial you will see how to integrate Airflow with the systemd system and service manager which is available on most Linux systems to help you with monitoring and restarting Airflow on failure.
Writing articles and tutorials are a great way to learn new things in depth while building a portfolio. In this tutorial, you will find the first steps that you will need to start your data science blog with Pelican and Jupyter Notebooks.
Python is a wonderful language for scripting and automating workflows and it is packed with useful tools out of the box with the Python Standard Library. A common thing to do, especially for a sysadmin, is to execute shell commands. But what usually will end up in a bash or batch file, can be also done in Python. You’ll learn here how to do just that with the os and subprocess modules.