Understanding and Developing Data Strategy and Monetization
A data strategy is a plan that outlines how an organization will collect, store, manage, and use its data. It is important because it can help organizations to improve decision-making,…
Covering Data Science, Business Intelligence, Technology Industry News Updates and Trends
Introduction: It is very important to have a local development and testing environment for developing data pipelines. Having a local development environment means you can try things and innovate with…
The Anaconda distribution of Python is the best option for problem solvers who want to use Python. Anaconda is free (although the download is large, which can take time) and can…
Quick reference for essential Git commands.
Introduction: To be successful as a data analyst, you need to learn fundamental data analysis techniques and data-oriented programming languages, and have a strong background in math. Here are the most…
What is Feature Engineering? Feature engineering is an important step in the data science pipeline. One of the most important questions facing data scientists is how to choose which…
In this code example, we will apply a convolutional neural network (CNN) to textual data. More specifically, we will use the structure of CNNs to classify text. Unlike images, which are…
Not many data scientists realize that it is possible and much more beneficial to build data science models directly on top of a data warehouse. Apache MADlib is an open-source…
Solving data science problems requires a systematic way of thinking. Here are some of the key concepts and steps to apply when solving data science problems.
Building great data science models is key to success in business. Understanding the difference between data science and business intelligence is key to using these to succeed in your business…
Spark is a fast, easy-to-use, and flexible data processing framework. It has an advanced execution engine supporting acyclic data flow and in-memory computing. Spark can run on Hadoop, standalone or…
Greenplum Architecture: The main reason for adopting a massively parallel processing (MPP) data warehouse (DWH) solution is its MPP architectural principles. These principles aim at removing the main drawbacks of traditional DWH, and make…
Database programming in Python using the SQLAlchemy library, which provides a uniform interface for connecting to different databases.
Data Science Gym is a great book that will get you started with mastering data science through a systematic set of practical exercises, and relevant background concepts.
Introduction: There’s a lot of demand for companies and professionals who are experts in big data, because we are generating all kinds of data at a greater pace than ever. Not…