Apache Superset

Introduction

Are you looking for a tool to visualize, explore, and analyze data, one that can be adapted to your organization's architecture without costing a fortune?
Apache Superset is the leading open-source BI platform for creating dashboards to analyze data.

Superset Architecture

Apache Superset involves the following components:
1. Web server (can run multiple instances)
2. Metadata database
3. Cache layer
4. Message queue for async queries
5. Results backend

The web server is a Flask Python app that uses the SQLAlchemy ORM to connect to virtually any SQL database. We can configure which data warehouse(s) to connect to, and we can choose the results backend in which to store the results of long-running queries.
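As a sketch of how this wiring looks in practice, the fragment below writes a minimal `superset_config.py`. The keys `SQLALCHEMY_DATABASE_URI`, `CACHE_CONFIG`, and `RESULTS_BACKEND` are real Superset settings, but the Postgres/Redis URLs and file paths are placeholder assumptions for illustration:

```shell
# Write a minimal superset_config.py (Superset reads it from your PYTHONPATH).
# The connection URLs below are placeholders -- point them at your own services.
cat > /tmp/superset_config.py <<'EOF'
# Metadata database: where Superset stores charts, dashboards, and connections
SQLALCHEMY_DATABASE_URI = "postgresql://superset:superset@localhost:5432/superset"

# Cache layer (Redis assumed here)
CACHE_CONFIG = {
    "CACHE_TYPE": "RedisCache",
    "CACHE_REDIS_URL": "redis://localhost:6379/0",
}

# Results backend: where results of long-running (async) queries land
from cachelib.file import FileSystemCache
RESULTS_BACKEND = FileSystemCache("/tmp/superset_results", default_timeout=86400)
EOF
```

With a file like this on the PYTHONPATH, the web server instances, the workers, and the cache all read the same configuration.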

Superset Scalable Architecture
Figure: Apache Superset architecture for a scalable, open-source analytics stack

Hardware Requirements

We recommend assigning at least 8GB of RAM to the virtual machine as well as provisioning a hard drive of at least 40GB so that there will be enough space for both the OS and all of the required dependencies.

Docker Desktop Based Setup

Docker Desktop, with its recently added support for Windows Subsystem for Linux (WSL 2), is a good option for setting up Superset on Windows.

Control and Permissions over Charts and Dashboards

Superset protects datasets through its security model: owners define which roles and users are granted access to each chart, dashboard, and dataset. In that way, data access can be managed effectively and compromise avoided.

Database Support

Another notable feature is broad database support: Superset can connect to several SQL databases, including Oracle, MySQL, MS SQL Server, Sybase, and Postgres, among others.
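Each of these engines needs its own Python driver installed alongside Superset. The helper below is purely illustrative (`driver_for` is a hypothetical function, not part of Superset); the package names are the drivers commonly paired with each engine in the Superset docs:

```shell
# Hypothetical helper mapping a database engine to the pip package for its driver
driver_for() {
  case "$1" in
    postgres) echo "psycopg2-binary" ;;  # PostgreSQL
    mysql)    echo "mysqlclient"     ;;  # MySQL
    mssql)    echo "pymssql"         ;;  # MS SQL Server / Sybase
    oracle)   echo "cx_Oracle"       ;;  # Oracle
    *)        echo "unknown"         ;;
  esac
}

driver_for postgres   # prints the package name to pass to `pip install`
```

Whichever engine you use, install its driver into the same environment as Superset itself.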

Superset Setup Method 1

If you are familiar with Docker, getting started with Superset is easy.

It is a two-step process.

Step 1: Clone the Superset Repo

The first step is to clone the Superset repository, as follows:

$ git clone https://github.com/apache/superset.git

Step 2: Start Superset in Non-Dev Mode with PostgreSQL

$ docker-compose -f docker-compose-non-dev.yml up

Configuring Docker Compose

The following is for users who want to configure how Superset starts up in Docker Compose.

You can configure the Docker Compose settings for dev and non-dev mode with docker/.env and docker/.env-non-dev respectively. These environment files set the environment for most containers in the Docker Compose setup, and some variables affect multiple containers and others only single ones.

One important variable is SUPERSET_LOAD_EXAMPLES which determines whether the superset_init container will load example data and visualizations into the database and Superset. These examples are quite helpful for most people, but probably unnecessary for experienced users. The loading process can sometimes take a few minutes and a good amount of CPU, so you may want to disable it on a resource-constrained device.
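For example, on a resource-constrained machine you can switch the flag off before bringing the stack up. The file to edit in the repo is docker/.env-non-dev; a temp path is used below so the sketch is safe to run anywhere:

```shell
# Disable example loading; in the repo this line goes in docker/.env-non-dev
ENV_FILE=/tmp/env-non-dev-demo            # real path: docker/.env-non-dev
echo "SUPERSET_LOAD_EXAMPLES=no" >> "$ENV_FILE"
grep SUPERSET_LOAD_EXAMPLES "$ENV_FILE"   # confirm the flag is set
```

On the next `docker-compose up`, the superset_init container will skip the example load.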

Login to Superset

Your local Superset instance also includes a Postgres server to store your data, and it comes pre-loaded with some example datasets that ship with Superset. You can now access Superset via your web browser at http://localhost:8088. Note that many browsers now default to HTTPS; if yours is one of them, make sure it uses HTTP.

Log in with the default username and password:

username: admin

password: admin

Superset Setup Method 2: Installing Python Packages


Step 1: Installing the Operating System Dependencies

Superset stores information about all of its database connections in the metadata database and uses the cryptography Python library to encrypt connection passwords. This library has operating-system-level dependencies, so you will need to install a few OS packages before running Superset.

On Ubuntu and Debian, install the necessary OS dependencies with the command below:

$ sudo apt-get install build-essential libssl-dev libffi-dev python-dev python-pip libsasl2-dev libldap2-dev

On Ubuntu 20.04, the command to use is:

$ sudo apt-get install build-essential libssl-dev libffi-dev python3-dev python3-pip libsasl2-dev libldap2-dev

Step 2: Installing the Python virtualenv

We encourage you to install Superset inside a Python virtual environment. Python 3 ships with the venv module built in; if virtualenv is not installed, you can install it through your OS package manager or with the pip command below:

$ pip install virtualenv

You can then create and activate the virtual environment with the commands below:

# virtualenv is shipped in Python 3.6+ as venv instead of pyvenv.

$ python3 -m venv venv
$ . venv/bin/activate

Once the virtual environment is activated, every Python program you run uses it. You can leave it at any time by typing "deactivate".
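A quick way to see this scoping in action (the /tmp path below is just for demonstration):

```shell
# Programs resolve to the venv's interpreter only while it is active
python3 -m venv /tmp/demo-venv
. /tmp/demo-venv/bin/activate
command -v python3    # now points inside /tmp/demo-venv
deactivate
command -v python3    # back to the system interpreter
```

Activation simply prepends the venv's bin directory to your PATH; deactivation removes it again.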

Step 3: Installing and Launching Superset

After installing the operating system dependencies and setting up the Python virtual environment, the next step is to install and initialize Superset. Follow the steps below:

# Install superset

$ pip install apache-superset
 
# Initialize the database

$ superset db upgrade
 
# Create an admin user (you will be prompted to set a username, first and last name before setting a password)

$ export FLASK_APP=superset
$ superset fab create-admin
 
# Load some data to play with

$ superset load_examples
 
# Create default roles and permissions

$ superset init
 
# To start a development web server on port 8088, use -p to bind to another port

$ superset run -p 8088 --with-threads --reload --debugger

Note

The development web server (superset run or flask run) is not intended for production use.
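For production, Superset's docs run the app under a proper WSGI server such as gunicorn instead. The command below is a sketch: the worker count, gevent worker class, and timeout are assumed starting points rather than tuned values, and it is stored in a variable and printed so it can be reviewed before running on a real host:

```shell
# Sketch of a production launch under gunicorn (install gunicorn and gevent first).
# Flags here are illustrative defaults -- tune workers/timeout for your hardware.
GUNICORN_CMD='gunicorn -w 4 -k gevent --timeout 120 -b 0.0.0.0:8088 "superset.app:create_app()"'
echo "$GUNICORN_CMD"
```

In a real deployment you would also put this behind a reverse proxy and manage the process with your init system.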

By Hassan Amin

Dr. Syed Hassan Amin holds a Ph.D. in Computer Science from Imperial College London, United Kingdom, and an MS in Computer System Engineering from GIKI, Pakistan. His Ph.D. work focused on image processing, computer vision, and machine learning. He has done research and development in many areas, including Urdu and local-language optical character recognition, retail analytics, affiliate marketing, fraud prediction, 3D reconstruction of faces from 2D images, and retinal image analysis.