Greenplum Programming Getting Started Hands On TutorialGreenplum Programming Getting Started Hands On Tutorial
Spread the love

Introduction

This tutorial assumes that you have downloaded and setup Greenplum VM in virtualbox on your local machine. After setting up Greenplum data warehouse, it guides you through common commands and steps that you will need to perform on Greenplum database.

Here’s link for more details https://www.2ndquadrant.com/en/blog/greenplum-vmware-virtualbox/.

https://greenplum.org/download/
Login as GP Admin

$ gpadmin

Default password for Greenplum 5 VM is pivotal.

Start the Greenplum Database
$./start_all.sh

To Check the Current Status of the Greenplum Database Instance

Run the gpstate command:

$ gpstate

The command displays the status of the master and segment processes. If the

Greenplum system is not running, the command displays an error message.

To Shut Down the Greenplum Database Instance

1. Run the gpstop command:

$ gpstop

Displays parameters for the master and segment processes that are to be stopped.

2. Enter y when prompted to stop the Greenplum instance.

To Start the Greenplum Database Instance

1. Run the gpstart command:

$ gpstart

The command displays parameters for the master and segment processes that are to be started.

2. Enter y when prompted to continue starting up the instance.

When newly installed, a Greenplum Database instance has three databases:

• The template1 database is the default template used to create new databases. If you have objects that should exist in every database managed by your Greenplum Database instance, you can create them in the template1 database.

• The postgreSQL and template0 databases are used internally and should not be modified or dropped. If you have modified the template1 database you can use the template0 database to create a new database without your modifications.

Connect to a Database with psql

The psql command is an interactive, command-line client used to access a Greenplum database.

Since there is no gpadmin database by default, you must at least

specify the database name on the psql command line.

To connect to a database with default connection parameters:

$ psql template1

To specify connection parameters on the command line:

$ psql -h localhost -p 5432 -U gpadmin template1

To set connection parameters in the environment:

$ export PGPORT=5432

$ export PGHOST=localhost

$ export PGDATABASE=template1

$ psql

 

PSQL Meta Commands

In addition to SQL statements, you can enter psql meta-commands, which begin with a backslash (\).

Here are some common psql meta-commands:

• Enter \g instead of a semicolon to terminate a SQL statement

• Enter \e to edit the buffer in an external editor (vi by default)

• Enter \p to display the contents of the query buffer

• Enter \r to reset the query buffer, abandoning what you have entered

• Enter \l to list databases

• Enter \d to list tables, views, and sequences

• Enter \q to exit psql

• Enter \h to display help for SQL statements

• Enter \? to display help for psql

When using the psql command line, you may list all schema with command \dn

Putty Based Login for Greenplum VM

Setup ssh connection with localhost using putty for accessing greenplum localhost. This would greatly ease development in local environment.

 

 

List Tables in a Given Schema

SELECT table_name FROM information_schema.tables

WHERE table_schema = ‘public’

ORDER BY table_name;

Describing Table Structure
select column_name, data_type, character_maximum_length

from INFORMATION_SCHEMA.COLUMNS where table_name =’d_airlines’;

Creating Analytics

Get average delay, standard deviation, and number of flights for USAir and Delta airlines.

SELECT carrier, AVG(arrdelayminutes),

STDDEV(arrdelayminutes), COUNT(*)

FROM faa.otp_c

WHERE carrier = ‘US’ OR carrier = ‘DL’

GROUP BY carrier;

 

To Show Tables Using SQL

SELECT * FROM pg_catalog.pg_tables;

Getting Tables for your desired schema

SELECT * FROM pg_catalog.pg_tables where schemaname=’faa’;

Getting Fixed Number of Records from Table
select * from faa.otp_r limit 10;

Comparing Table Sizes for Row vs Column Orientation

=# SELECT pg_size_pretty(pg_total_relation_size(‘faa.otp_r’));

pg_size_pretty

—————-

383 MB

(1 row)

=# SELECT pg_size_pretty(pg_relation_size(‘faa.otp_c’));

pg_size_pretty

—————-

0 bytes

(1 row)

=# SELECT

pg_size_pretty(pg_total_relation_size(‘faa.otp_c’));

pg_size_pretty

—————-

256 kB

(1 row)

 

Compression Example

CREATE TABLE foobar (fid INT, ft TEXT) WITH (compresstype = quicklz, orientation=column, appendonly=true) DISTRIBUTED by (fid);

Greenplum Installation

https://greenplum.org/install-greenplum-oss-on-ubuntu/

By Hassan Amin

Dr. Syed Hassan Amin has done Ph.D. in Computer Science from Imperial College London, United Kingdom and MS in Computer System Engineering from GIKI, Pakistan. During PhD, he has worked on Image Processing, Computer Vision, and Machine Learning. He has done research and development in many areas including Urdu and local language Optical Character Recognition, Retail Analysis, Affiliate Marketing, Fraud Prediction, 3D reconstruction of face images from 2D images, and Retinal Image analysis in addition to other areas.