Introduction
Are you looking for a tool to visualize, explore, and analyze data that can be modified and adapted to your organization's architecture without costing a fortune?
Apache Superset is an open-source business intelligence (BI) platform for exploring data and creating dashboards.
Superset Architecture
Apache Superset consists of the following components:
1. Web server (can run multiple instances)
2. Metadata database
3. Cache layer
4. Message queue for async queries
5. Results backend
The web server is a Python Flask application that uses SQLAlchemy to connect to any database that has a SQLAlchemy dialect and Python driver. We can configure which data warehouse(s) Superset connects to, and we can choose the results backend in which the results of long-running asynchronous queries are stored.
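As a rough illustration of how these components are wired together, the sketch below shows a minimal `superset_config.py`. The setting names `SQLALCHEMY_DATABASE_URI`, `CACHE_CONFIG`, and `CELERY_CONFIG` follow Superset's configuration conventions, but the hostnames, credentials, and Redis database numbers are placeholder assumptions. A real results backend is usually a cachelib object, so it is left as a comment here to keep the file runnable without extra packages.

```python
# superset_config.py: a minimal sketch, not a production configuration.
# Hostnames, ports, credentials and Redis DB numbers are placeholder assumptions.

# 2. Metadata database: stores users, charts, dashboards and connection definitions.
SQLALCHEMY_DATABASE_URI = "postgresql://superset:superset@db:5432/superset"

# 3. Cache layer: Flask-Caching settings used for chart and query caching.
CACHE_CONFIG = {
    "CACHE_TYPE": "RedisCache",
    "CACHE_DEFAULT_TIMEOUT": 300,  # seconds
    "CACHE_KEY_PREFIX": "superset_",
    "CACHE_REDIS_URL": "redis://redis:6379/1",
}


# 4. Message queue for async queries: Celery workers pick up SQL Lab jobs.
class CeleryConfig:
    broker_url = "redis://redis:6379/0"
    imports = ("superset.sql_lab",)
    result_backend = "redis://redis:6379/0"


CELERY_CONFIG = CeleryConfig

# 5. Results backend: normally a cachelib object, for example
#    RESULTS_BACKEND = RedisCache(host="redis", port=6379, key_prefix="superset_results")
#    which requires `from cachelib.redis import RedisCache`.
```

Superset picks this file up when it is on the Python path or when the `SUPERSET_CONFIG_PATH` environment variable points at it.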
Hardware Requirements
We recommend assigning at least 8 GB of RAM to the virtual machine and provisioning a hard drive of at least 40 GB, so that there is enough space for both the OS and all of the required dependencies.
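A quick way to check a machine against these recommendations is a small preflight script such as the sketch below. The helper names are hypothetical, and `os.sysconf` is only available on Linux and other Unix-like systems. A VM assigned exactly 8 GB may report slightly less usable memory, so treat a near miss as a warning rather than a hard failure.

```python
import os
import shutil

GB = 1024 ** 3
MIN_RAM_GB = 8    # recommended minimum memory
MIN_DISK_GB = 40  # recommended minimum drive size


def meets_requirements(ram_bytes: int, disk_bytes: int) -> bool:
    """Return True if RAM and total disk size meet the recommended minimums."""
    return ram_bytes >= MIN_RAM_GB * GB and disk_bytes >= MIN_DISK_GB * GB


def total_ram_bytes() -> int:
    """Total physical memory via os.sysconf (Linux and most Unix-like systems)."""
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")


if __name__ == "__main__":
    ram = total_ram_bytes()
    disk = shutil.disk_usage("/").total
    print(f"RAM: {ram / GB:.1f} GB, disk: {disk / GB:.1f} GB")
    print("OK" if meets_requirements(ram, disk) else "Below the recommended minimums")
```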
Docker Desktop Based Setup
Docker Desktop, with its support for Windows Subsystem for Linux 2 (WSL 2), is a good option for setting up Superset on Windows.
Control and Permissions over Charts and Dashboards
Superset protects datasets through role-based access control: administrators define which roles, and therefore which users, can view or edit each chart, dashboard, and dataset. This lets teams share data without exposing it to users who should not see it.
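To make the idea concrete, here is a deliberately simplified toy model of role-based access. It is not Superset's API: Superset implements permissions through Flask-AppBuilder roles and permission/view pairs. The role names Admin, Alpha, and Gamma mirror Superset's built-in roles, while the `ROLE_DATASETS` table, the `sales_analyst` role, and the `can_view` helper are hypothetical.

```python
from typing import Dict, Optional, Set

# Hypothetical mapping: None means "all datasets"; a set lists explicit grants.
ROLE_DATASETS: Dict[str, Optional[Set[str]]] = {
    "Admin": None,   # full access, including managing other users' permissions
    "Alpha": None,   # access to all data sources, but cannot grant access
    "Gamma": set(),  # no data access until datasets are granted via extra roles
    "sales_analyst": {"sales", "customers"},  # hypothetical custom role
}


def can_view(user_roles: Set[str], dataset: str) -> bool:
    """Return True if any of the user's roles grants access to the dataset."""
    for role in user_roles:
        allowed = ROLE_DATASETS.get(role, set())  # unknown roles grant nothing
        if allowed is None or dataset in allowed:
            return True
    return False
```

A Gamma user combined with a custom role sees only the datasets that role grants, which is the pattern Superset's documentation describes for restricting data access.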
Database Support
Superset also supports a wide range of SQL databases, including Oracle, MySQL, Microsoft SQL Server, Sybase, and PostgreSQL, among others.
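Superset connects to each of these databases through a SQLAlchemy URI. The hypothetical helper below assembles such a URI and percent-encodes the credentials, since characters like `@` or `/` in a password would otherwise break parsing. The driver prefixes in `EXAMPLE_DIALECTS` are common choices, but each one still requires its matching Python driver package to be installed.

```python
from urllib.parse import quote_plus

# Common SQLAlchemy dialect prefixes (assumed choices; each needs its own driver).
EXAMPLE_DIALECTS = {
    "PostgreSQL": "postgresql",     # psycopg2
    "MySQL": "mysql",               # mysqlclient
    "SQL Server": "mssql+pymssql",  # pymssql
    "Oracle": "oracle",             # cx_Oracle
}


def build_sqlalchemy_uri(dialect: str, user: str, password: str,
                         host: str, port: int, database: str) -> str:
    """Assemble a SQLAlchemy URI, escaping the user name and password."""
    return (f"{dialect}://{quote_plus(user)}:{quote_plus(password)}"
            f"@{host}:{port}/{database}")
```

The resulting string is what you would paste into the SQLAlchemy URI field when adding a database connection in the Superset UI.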
Method 1: Superset Setup with Docker Compose
If you are familiar with Docker, getting started with Superset is easy.
It is a two-step process.
Step 1: Clone the Superset Repository
The first step is to clone the Superset repository from GitHub and change into its directory:
$ git clone https://github.com/apache/superset.git
$ cd superset
Step 2: Start Superset in Non-Dev Mode with PostgreSQL
$ docker-compose -f docker-compose-non-dev.yml up
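The first start can take several minutes while images are pulled and the database is initialized. A small polling loop like the sketch below can wait until the web server answers. It assumes Superset's `/health` endpoint on port 8088, and the `wait_until_healthy` helper is hypothetical; the HTTP call is passed in as a function so the logic can be tested without a running server.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_status(url: str) -> int:
    """Return the HTTP status code for url, or 0 if the server is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            return resp.status
    except urllib.error.HTTPError as err:  # server answered with an error code
        return err.code
    except (urllib.error.URLError, OSError):  # connection refused, timeout, etc.
        return 0


def wait_until_healthy(url: str = "http://localhost:8088/health",
                       attempts: int = 60, delay: float = 5.0,
                       fetch: Callable[[str], int] = http_status) -> bool:
    """Poll url until it returns HTTP 200 or the attempts run out."""
    for _ in range(attempts):
        if fetch(url) == 200:
            return True
        time.sleep(delay)
    return False
```

Run `wait_until_healthy()` from a second terminal while docker-compose is starting; it returns True once Superset is ready to accept logins.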
Configuring Docker Compose
The following is for users who want to configure how Superset starts up in Docker Compose.
You can configure the Docker Compose settings for dev and non-dev mode with docker/.env and docker/.env-non-dev respectively. These environment files set the environment for most containers in the Docker Compose setup; some variables affect multiple containers, while others affect only a single one.
One important variable is SUPERSET_LOAD_EXAMPLES, which determines whether the superset_init container loads example data and visualizations into the database and Superset. These examples are helpful for most people but probably unnecessary for experienced users. Loading them can take a few minutes and a good amount of CPU, so you may want to disable it on a resource-constrained device.
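The environment files are plain KEY=value lines. As a sketch, the hypothetical helpers below parse that format and report whether example loading is on. They assume that the value `yes` enables loading, which matches the shipped files at the time of writing but may differ between Superset versions.

```python
from typing import Dict


def parse_env(text: str) -> Dict[str, str]:
    """Parse KEY=value lines, skipping blanks and # comments (no quote handling)."""
    env: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip()
    return env


def examples_enabled(env: Dict[str, str]) -> bool:
    """Assumes the init container loads examples only when the value is 'yes'."""
    return env.get("SUPERSET_LOAD_EXAMPLES", "").lower() == "yes"
```

For example, passing the contents of docker/.env-non-dev to `parse_env` and then to `examples_enabled` tells you whether the next `docker-compose up` will load the examples.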
Log In to Superset
Your local Superset instance also includes a PostgreSQL server that stores its data and, unless you disabled SUPERSET_LOAD_EXAMPLES, comes preloaded with the example datasets that ship with Superset. You can now access Superset in your web browser at http://localhost:8088. Note that many browsers now default to HTTPS; if yours does, make sure the address uses HTTP.
Log in with the default username and password:
username: admin
password: admin
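Besides the web UI, the same credentials work against Superset's REST API. The sketch below builds, but does not send, a request to the `/api/v1/security/login` endpoint, whose JSON response contains an `access_token` for later API calls. The `build_login_request` helper name is hypothetical.

```python
import json
import urllib.request


def build_login_request(base_url: str, username: str,
                        password: str) -> urllib.request.Request:
    """Build a POST to Superset's login endpoint using database authentication."""
    payload = {
        "username": username,
        "password": password,
        "provider": "db",  # built-in database authentication
        "refresh": True,   # also ask for a refresh token
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/v1/security/login",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

With the server running, `json.load(urllib.request.urlopen(req))["access_token"]` returns the token to send as an `Authorization: Bearer <token>` header.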
Method 2: Installing Superset from Python Packages
Step 1: Installing the Operating System Dependencies
Superset stores information about all database connections in its metadata database and uses the cryptography Python library to encrypt the connection passwords. Because this library has operating system dependencies, you need to install some OS packages before installing Superset.
On Debian and older Ubuntu releases, install the required OS dependencies with:
$ sudo apt-get install build-essential libssl-dev libffi-dev python-dev python-pip libsasl2-dev libldap2-dev
On Ubuntu 20.04, use the Python 3 packages instead:
$ sudo apt-get install build-essential libssl-dev libffi-dev python3-dev python3-pip libsasl2-dev libldap2-dev
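The password encryption mentioned above is keyed by the `SECRET_KEY` setting in `superset_config.py`, so a strong random key should be set before the metadata database is created; changing it later requires re-encrypting the stored secrets. The sketch below generates a key with Python's standard `secrets` module, mirroring the `openssl rand -base64 42` command suggested in Superset's documentation. The `generate_secret_key` name is hypothetical.

```python
import base64
import secrets


def generate_secret_key(num_bytes: int = 42) -> str:
    """Return num_bytes of cryptographically secure randomness, base64-encoded."""
    return base64.b64encode(secrets.token_bytes(num_bytes)).decode("ascii")


if __name__ == "__main__":
    # Paste the printed value into superset_config.py as SECRET_KEY = "..."
    print(generate_secret_key())
```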
Step 2: Installing the Python virtualenv
We recommend installing Superset inside a Python virtual environment. Python 3 includes the venv module for this; if you prefer the standalone virtualenv package and it is not installed, you can install it with your OS package manager or with pip:
$ pip install virtualenv
You can then create and activate a virtual environment with the commands below:
# In Python 3.6+, use the built-in venv module (python3 -m venv) instead of the deprecated pyvenv script.
$ python3 -m venv venv
$ . venv/bin/activate
Once the virtual environment is activated, every Python command you run in that shell uses it. To stop using the virtual environment, type “deactivate”.
Step 3: Installing and Launching Superset
With the OS dependencies and the virtual environment in place, install and initialize Superset:
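If you are unsure whether the virtual environment is active, Python itself can tell you: inside a venv, `sys.prefix` points at the environment while `sys.base_prefix` still points at the base installation. The small check below relies on that documented behavior; the `in_virtualenv` name is hypothetical.

```python
import sys


def in_virtualenv(prefix: str = sys.prefix, base_prefix: str = sys.base_prefix) -> bool:
    """True when the interpreter runs inside a venv (the two prefixes differ)."""
    return prefix != base_prefix


if __name__ == "__main__":
    print("Inside a virtual environment" if in_virtualenv()
          else "Not in a virtual environment; run: . venv/bin/activate")
```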
# Install superset
$ pip install apache-superset
# Initialize the database
$ superset db upgrade
# Create an admin user (you will be prompted for a username, first name, last name, and email before setting a password)
$ export FLASK_APP=superset
$ superset fab create-admin
# Load some data to play with
$ superset load_examples
# Create default roles and permissions
$ superset init
# To start a development web server on port 8088, use -p to bind to another port
$ superset run -p 8088 --with-threads --reload --debugger
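If port 8088 is already taken, `superset run` cannot bind to it. The hypothetical helper below looks for the first free port starting at 8088 by trying to bind a socket to it, and the result can then be passed to `-p`. Another process could still claim the port between this check and the server start, so treat the result as a hint.

```python
import socket


def is_port_free(port: int, host: str = "127.0.0.1") -> bool:
    """Try to bind the port; success means no other process currently holds it."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return False
        return True


def find_free_port(start: int = 8088, limit: int = 100) -> int:
    """Return the first free port in [start, start + limit)."""
    for port in range(start, start + limit):
        if is_port_free(port):
            return port
    raise RuntimeError(f"no free port between {start} and {start + limit - 1}")
```

For example, `print(find_free_port())` prints the port to use in `superset run -p <port> --with-threads --reload --debugger`.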
Note
The development web server (superset run or flask run) is not intended for production use.
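For production, Superset's documentation recommends a WSGI server such as Gunicorn. Gunicorn configuration files are plain Python, so a starting point could look like the sketch below. It assumes a Gunicorn version that accepts the `wsgi_app` setting and the `module:factory()` form, and the port, timeout, and worker class are assumptions to tune for your deployment. The worker count uses Gunicorn's common `(2 x CPU cores) + 1` rule of thumb.

```python
# gunicorn.conf.py: a sketch; start the server with `gunicorn -c gunicorn.conf.py`.
import multiprocessing

# Superset's application factory, called by Gunicorn at startup.
wsgi_app = "superset.app:create_app()"

bind = "0.0.0.0:8088"                          # listen on all interfaces
workers = multiprocessing.cpu_count() * 2 + 1  # rule of thumb from the Gunicorn docs
worker_class = "gevent"                        # async workers; requires gevent installed
worker_connections = 1000                      # only used by async worker classes
timeout = 120                                  # seconds; allows slow dashboard queries
```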