Introduction to Basic Visualizations in Python

This post answers a basic question: which chart should you use, and when?

The type of graph you choose matters. It shapes the story your data tells and helps ensure that the associations you show are not misinterpreted.

Plot Types

Bar Plots

Usage: When comparing the same variables across categories or datasets.

Do not use: With more than three categories of variables, or when visualizing continuous data.

import matplotlib.pyplot as plt

numbers = [500, 800, 900, 1000, 1400, 1600]
widths  = [1.0, 1.0, 1.0, 1.0, 1.0, 1.0]   # width of each bar (1.0 = bars touch)
colors  = ['b', 'b', 'b', 'b', 'r', 'b']   # highlight the 2020 bar in red

fig, ax = plt.subplots()

ax.bar(range(6), numbers, width=widths, color=colors, align='center')

# put a year label under each bar
ax.set_xticks(range(6))
ax.set_xticklabels(('2016', '2017', '2018', '2019', '2020', '2021'))

ax.set_ylabel('Billions')
ax.set_title('GDP Prediction')

plt.show()
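The usage note above mentions comparing the same variables across datasets. One common way to do that, sketched here as an illustration rather than the author's own code, is a grouped bar chart: each series is shifted half a bar width left or right of its tick. The second series reuses the `numbers2` values from the line-plot section, and pairing them with these years is purely illustrative.

```python
import numpy as np
import matplotlib.pyplot as plt

years = ['2016', '2017', '2018', '2019', '2020', '2021']
numbers = [500, 800, 900, 1000, 1400, 1600]
numbers2 = [200, 600, 900, 1900, 1200, 1800]  # illustrative second series

x = np.arange(len(years))  # one slot per year: 0, 1, ..., 5
width = 0.4                # each bar fills 40% of a slot, leaving a gap between years

fig, ax = plt.subplots()
# shift the two series half a bar width to the left and right of each tick
bars1 = ax.bar(x - width / 2, numbers, width, label='numbers')
bars2 = ax.bar(x + width / 2, numbers2, width, label='numbers2')

ax.set_xticks(x)
ax.set_xticklabels(years)
ax.set_ylabel('Billions')
ax.legend()
plt.show()
```

Keeping the total width of a group (2 × 0.4) below 1.0 leaves visible space between years, so the pairs read as groups.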

Line Plots

Line plots are among the most common chart types.

Usage: When tracking and comparing several variables over time, analyzing trends and variation, or predicting future values.

Do not use: To get a general overview of your data, or to analyze individual components or sections.

plt.plot(range(6), numbers)
plt.show()

Drawing multiple lines and plots

numbers2 = [200, 600, 900, 1900, 1200, 1800]
plt.plot(range(6), numbers)
plt.plot(range(6), numbers2)
plt.show()
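Because line plots are meant for tracking variables over time, it can help to plot against the actual time values instead of `range(6)`. A minimal sketch, assuming the six points correspond to the years 2016 to 2021 from the bar chart (for `numbers2` that pairing is only illustrative):

```python
import matplotlib.pyplot as plt

years = [2016, 2017, 2018, 2019, 2020, 2021]   # assumed time values for the six points
numbers = [500, 800, 900, 1000, 1400, 1600]
numbers2 = [200, 600, 900, 1900, 1200, 1800]

fig, ax = plt.subplots()
# plot each series against the years rather than against positions 0..5
line1, = ax.plot(years, numbers, label='numbers')
line2, = ax.plot(years, numbers2, label='numbers2')

ax.set_xticks(years)   # one tick per year, so no fractional years such as 2016.5 appear
ax.set_xlabel('Year')
ax.legend()
plt.show()
```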

Setting the Axis Limits and Ticks

# create a figure and an Axes object to configure
fig, ax = plt.subplots()

# change the x and y axis limits (extending them beyond the data range)
ax.set_xlim([0, 11])
ax.set_ylim([0, 2100])

# choose where the tick marks go on each axis
ax.set_xticks([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
ax.set_yticks([200, 400, 600, 800, 1000, 1200, 1400, 1600, 1800, 2000])
ax.plot(range(6), numbers)
ax.plot(range(6), numbers2)
plt.show()
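The individual setter calls above can also be passed to `Axes.set` in a single call, which forwards each keyword to the matching `set_*` method (`xlim=` to `set_xlim`, `yticks=` to `set_yticks`, and so on). A sketch using limits chosen to fit the data exactly:

```python
import matplotlib.pyplot as plt

numbers = [500, 800, 900, 1000, 1400, 1600]
numbers2 = [200, 600, 900, 1900, 1200, 1800]

fig, ax = plt.subplots()
ax.plot(range(6), numbers)
ax.plot(range(6), numbers2)

# one call instead of four separate set_* calls
ax.set(xlim=(0, 5),
       ylim=(0, 2000),
       xticks=[0, 1, 2, 3, 4, 5],
       yticks=[0, 400, 800, 1200, 1600, 2000])
plt.show()
```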

Add Grids

# create a figure and an Axes object to configure
fig, ax = plt.subplots()

# change the x and y axis limits (extending them beyond the data range)
ax.set_xlim([0, 11])
ax.set_ylim([0, 2100])

# choose where the tick marks go on each axis
ax.set_xticks([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
ax.set_yticks([200, 400, 600, 800, 1000, 1200, 1400, 1600, 1800, 2000])

# add grid lines at the tick positions
ax.grid()
# plot
ax.plot(range(6), numbers)
ax.plot(range(6), numbers2)
plt.show()
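`ax.grid()` also accepts styling keywords. A small sketch (the dashed style and 0.5 transparency are arbitrary choices) that draws only horizontal grid lines, the ones used to read values off the y axis:

```python
import matplotlib.pyplot as plt

numbers = [500, 800, 900, 1000, 1400, 1600]
numbers2 = [200, 600, 900, 1900, 1200, 1800]

fig, ax = plt.subplots()
ax.plot(range(6), numbers)
ax.plot(range(6), numbers2)

# passing keyword arguments turns the grid on; axis='y' limits it to horizontal lines
ax.grid(axis='y', linestyle='--', alpha=0.5)
plt.show()
```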

Change line appearance

# '-' Solid Line
# '--' Dashed Line
# '-.' Dash Dot Line
# ':' Dotted Line

#plot
plt.plot(range(6), numbers, '--')
plt.plot(range(6), numbers2, ':')
plt.show()
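The format strings above are shorthand for the `linestyle` keyword. A sketch of the keyword form, which also accepts long names such as `'dashed'` and can be combined with a separate `linewidth` (the width of 2 is arbitrary):

```python
import matplotlib.pyplot as plt

numbers = [500, 800, 900, 1000, 1400, 1600]
numbers2 = [200, 600, 900, 1900, 1200, 1800]

fig, ax = plt.subplots()
# 'dashed' is the long name for '--'; linewidth is in points
dashed, = ax.plot(range(6), numbers, linestyle='dashed', linewidth=2)
# ':' works as a keyword value too
dotted, = ax.plot(range(6), numbers2, linestyle=':')
plt.show()
```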

Use Colors

plt.plot(range(6), numbers, 'r')
plt.plot(range(6), numbers2, 'b')
plt.show()
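Single-letter codes like `'r'` and `'b'` cover only a handful of colors. The `color` keyword also accepts full color names and hex strings; in this sketch `matplotlib.colors.to_hex` shows that the different spellings resolve to concrete RGB values:

```python
import matplotlib.pyplot as plt
from matplotlib.colors import to_hex

numbers = [500, 800, 900, 1000, 1400, 1600]
numbers2 = [200, 600, 900, 1900, 1200, 1800]

fig, ax = plt.subplots()
# a full color name and a hex string instead of single-letter codes
red_line, = ax.plot(range(6), numbers, color='red')
hex_line, = ax.plot(range(6), numbers2, color='#1f77b4')
plt.show()

# 'red' and 'r' both resolve to the same RGB value
print(to_hex('red'), to_hex('r'))
```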

Adding Markers

Marker options are listed here: https://matplotlib.org/api/markers_api.html

plt.plot(range(6), numbers,  'o--')
plt.plot(range(6), numbers2, 'v:' )
plt.show()
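Markers can also be set with keywords, which makes it easy to change their size or to drop the connecting line entirely. A sketch with an arbitrary marker size of 10:

```python
import matplotlib.pyplot as plt

numbers = [500, 800, 900, 1000, 1400, 1600]
numbers2 = [200, 600, 900, 1900, 1200, 1800]

fig, ax = plt.subplots()
# keyword form of 'o--', with larger circle markers
line1, = ax.plot(range(6), numbers, marker='o', linestyle='--', markersize=10)
# linestyle='none' draws only the triangle markers, with no connecting line
points, = ax.plot(range(6), numbers2, marker='v', linestyle='none')
plt.show()
```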

Change the color of lines and markers

plt.plot(range(6), numbers,  'ro--')
plt.plot(range(6), numbers2, 'bv:' )
plt.show()
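In `'ro--'` the color applies to both the line and its markers. To color them separately, `markerfacecolor` and `markeredgecolor` can be passed alongside a format string without a color; the specific colors in this sketch are illustrative:

```python
import matplotlib.pyplot as plt
from matplotlib.colors import to_hex

numbers = [500, 800, 900, 1000, 1400, 1600]

fig, ax = plt.subplots()
# blue dashed line with hollow-looking markers: white fill, red outline
line, = ax.plot(range(6), numbers, 'o--', color='b',
                markerfacecolor='white', markeredgecolor='r')
plt.show()

print(to_hex(line.get_color()), to_hex(line.get_markerfacecolor()),
      to_hex(line.get_markeredgecolor()))
```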

Add Labels

# labels 
plt.xlabel('X axis label')
plt.ylabel('Y axis label')

plt.plot(range(6), numbers,  'ro--')
plt.plot(range(6), numbers2, 'bv:' )
plt.show()
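The same labels can be set through the object-oriented interface on an `Axes`, which also accepts text properties such as `fontsize`. A sketch, where the title text is a placeholder:

```python
import matplotlib.pyplot as plt

numbers = [500, 800, 900, 1000, 1400, 1600]
numbers2 = [200, 600, 900, 1900, 1200, 1800]

fig, ax = plt.subplots()
ax.plot(range(6), numbers, 'ro--')
ax.plot(range(6), numbers2, 'bv:')

# Axes methods equivalent to plt.xlabel, plt.ylabel and plt.title
ax.set_xlabel('X axis label')
ax.set_ylabel('Y axis label', fontsize=12)
ax.set_title('Two series')   # placeholder title
plt.show()
```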

Annotating the Chart

# Annotating: the first argument is the text, xy is the (x, y) data point where it is placed
plt.annotate('Make a point', xy=(0, 500))
plt.annotate('Make another point', xy=(5, 1500))
# plot
plt.plot(range(6), numbers,  'ro--')
plt.plot(range(6), numbers2, 'bv:' )
plt.show()
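`annotate` can also draw an arrow from the text to the point it describes: `xy` is the point, `xytext` is where the label sits, and `arrowprops` styles the arrow. This sketch points at the highest value in `numbers2` (1900 at x = 3); the label text and text position are illustrative:

```python
import matplotlib.pyplot as plt

numbers = [500, 800, 900, 1000, 1400, 1600]
numbers2 = [200, 600, 900, 1900, 1200, 1800]

fig, ax = plt.subplots()
ax.plot(range(6), numbers, 'ro--')
ax.plot(range(6), numbers2, 'bv:')

# the arrow runs from the text at xytext to the data point at xy
note = ax.annotate('Peak of numbers2', xy=(3, 1900), xytext=(0.5, 1700),
                   arrowprops=dict(arrowstyle='->'))
plt.show()
```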

Create a Legend

plt.plot(numbers, label="test1")
plt.plot(numbers2, label="test2")
# Place the legend just outside the right edge of the axes, aligned with its top
plt.legend(bbox_to_anchor=(1.05, 1), loc='upper left', borderaxespad=0.)

plt.show()
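If there is room inside the plot, the legend can stay within the axes instead. A sketch using `loc='best'`, which lets matplotlib pick the position that overlaps the plotted data least, plus an illustrative legend title:

```python
import matplotlib.pyplot as plt

numbers = [500, 800, 900, 1000, 1400, 1600]
numbers2 = [200, 600, 900, 1900, 1200, 1800]

fig, ax = plt.subplots()
ax.plot(numbers, label="test1")
ax.plot(numbers2, label="test2")
# keep the legend inside the axes, in the least crowded spot
leg = ax.legend(loc='best', title='Series')
plt.show()
```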

Scatter Plots

Usage: When analyzing individual points, looking for outliers or fluctuations, or getting a general overview of variables.

Do not use: When you need precision, or for one-dimensional, non-numeric, or categorical data.

import numpy as np
import matplotlib.pyplot as plt

# creating arrays
# np.random.rand draws samples uniformly from [0, 1)
x1 = 5 * np.random.rand(40)
x2 = 5 * np.random.rand(40) + 25
x3 = 25 * np.random.rand(20)

# combining the three groups into a single array
x = np.concatenate((x1, x2, x3))

y1 = 5 * np.random.rand(40)
y2 = 5 * np.random.rand(40) + 25
y3 = 25 * np.random.rand(20)
y = np.concatenate((y1, y2, y3))

# s is the size of each data point
# marker is the shape of each data point
# c is the color
plt.scatter(x, y, s=100, marker='^', c='r')
plt.show()
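The `c` parameter also accepts one number per point, which matplotlib maps through a colormap, and `alpha` makes overlapping points easier to tell apart. A sketch with seeded random data (the seed, colormap, and sizes are arbitrary choices):

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)   # seeded so the plot is reproducible
x = 30 * rng.random(100)
y = 30 * rng.random(100)

fig, ax = plt.subplots()
# color each point by its own y value; alpha=0.7 makes points semi-transparent
sc = ax.scatter(x, y, s=60, c=y, cmap='viridis', alpha=0.7)
# the colorbar shows which color corresponds to which value
fig.colorbar(sc, ax=ax, label='y value')
plt.show()
```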

Scatterplots are especially important for data science because they can reveal patterns that are not obvious in other views. Groupings stand out with relative ease, and coloring the points by group helps the viewer see which group each point belongs to.

import numpy as np
import matplotlib.pyplot as plt

x1 = 5 * np.random.rand(50)
x2 = 5 * np.random.rand(50) + 25
x3 = 30 * np.random.rand(25)
x = np.concatenate((x1, x2, x3))

y1 = 5 * np.random.rand(50)
y2 = 5 * np.random.rand(50) + 25
y3 = 30 * np.random.rand(25)
y = np.concatenate((y1, y2, y3))

# one color per group: 50 blue, 50 green and 25 red points
color_array = ['b'] * 50 + ['g'] * 50 + ['r'] * 25

plt.scatter(x, y, s=[50], marker='D', c=color_array)
plt.show()
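An alternative to a single `color_array` is to call `scatter` once per group with a `label`, so a legend tells the viewer which group each point belongs to. A sketch with the same three-cluster layout; the group names and the seed are hypothetical:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)   # seeded so the clusters are reproducible
# hypothetical group names mapped to (x, y) arrays, same layout as above
groups = {
    'group 1': (5 * rng.random(50), 5 * rng.random(50)),
    'group 2': (5 * rng.random(50) + 25, 5 * rng.random(50) + 25),
    'group 3': (30 * rng.random(25), 30 * rng.random(25)),
}

fig, ax = plt.subplots()
# one scatter call per group, each with its own color and legend label
for (name, (gx, gy)), color in zip(groups.items(), ['b', 'g', 'r']):
    ax.scatter(gx, gy, s=50, marker='D', c=color, label=name)
ax.legend()
plt.show()
```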

Showing correlations

In some cases, you need to know the general direction your data takes when looking at a scatterplot. In that case, add a trendline to the plot. The example below fits the trendline with least-squares regression.

import numpy as np
import matplotlib.pyplot as plt

x1 = 15 * np.random.rand(50)
x2 = 15 * np.random.rand(50) + 15
x3 = 30 * np.random.rand(30)
x = np.concatenate((x1, x2, x3))

y1 = 15 * np.random.rand(50)
y2 = 15 * np.random.rand(50) + 15
y3 = 30 * np.random.rand(30)
y = np.concatenate((y1, y2, y3))

color_array = ['b'] * 50 + ['g'] * 50 + ['r'] * 30

plt.scatter(x, y, s=90, marker='*', c=color_array)

# np.polyfit(x, y, 1) fits a least-squares polynomial; the third argument (1) is the
# degree, so the fit is a straight line and z holds [slope, intercept].
# np.poly1d(z) turns those coefficients into a callable polynomial, so p(x) gives
# the y values of the trendline.
z = np.polyfit(x, y, 1)
p = np.poly1d(z)
# plot the trendline as a solid red line, with x sorted so it is drawn left to right
xs = np.sort(x)
plt.plot(xs, p(xs), 'r-')

plt.show()
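To check what `polyfit` computes, its degree-1 result can be compared with the closed-form least-squares formulas, slope = cov(x, y) / var(x) and intercept = mean(y) - slope · mean(x). `np.corrcoef` then summarizes the strength of the correlation as a single number. A sketch on hypothetical data with a built-in upward trend (the seed and noise level are assumptions):

```python
import numpy as np

rng = np.random.default_rng(2)
x = 30 * rng.random(130)
y = x + 5 * rng.standard_normal(130)   # hypothetical data: upward trend plus noise

# degree-1 polyfit returns the least-squares [slope, intercept]
slope, intercept = np.polyfit(x, y, 1)

# the same line from the closed-form least-squares formulas
dx, dy = x - x.mean(), y - y.mean()
slope_cf = np.sum(dx * dy) / np.sum(dx ** 2)
intercept_cf = y.mean() - slope_cf * x.mean()

# Pearson correlation: +1 is a perfect positive line, 0 no linear relation, -1 perfect negative
r = np.corrcoef(x, y)[0, 1]
print(f"slope={slope:.3f}, intercept={intercept:.3f}, r={r:.3f}")
```

A value of `r` close to 1 indicates that the upward trendline describes the points well; a value near 0 means the fitted line says little about the data.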

HenryBernreuter.com v1.0
