NumPy Basics: Arrays and Vectorized Computation¶

NumPy, short for Numerical Python, is the fundamental package required for high-performance scientific computing and data analysis.

Here is what it provides:

1- ndarray, a fast and space-efficient multidimensional array.
2- Standard mathematical functions for fast operations on entire arrays of data without having to write loops.
3- Tools for reading and writing array data to disk and working with memory-mapped files.
4- Linear algebra, random number generation, and Fourier transform capabilities.
5- Tools for integrating code written in C, C++, and Fortran.

The NumPy ndarray: A Multidimensional Array Object¶

One of the key features of NumPy is its N-dimensional array object, or ndarray, which is a fast, flexible container for large data sets in Python.

Creating an Array¶

# need to import the numpy library
import numpy as np

# one dimensional array
data1 = [6, 7.5, 8, 0, 1]
arr1 = np.array(data1)
arr1

array([6. , 7.5, 8. , 0. , 1. ])

# two dimensional array
data2 = [[1, 2, 3, 4], [5, 6, 7, 8]]
arr2 = np.array(data2)
arr2

array([[1, 2, 3, 4],
       [5, 6, 7, 8]])

# dimension of the array
arr2.ndim

2

# shape of the array
#type(arr2.shape)
arr2.shape

(2, 4)

# data type of the array
arr1.dtype

dtype('float64')

# size of the array
arr2.size

8

# number of rows
len(arr2)
#arr2

2

# number of columns
# refer to this after reading about slicing
len(arr2[0,:])

4

# create a one-dimensional array of all zeros
np.zeros(10)

array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])

# create a one-dimensional array of all ones
np.ones(5)

array([1., 1., 1., 1., 1.])

# create a two-dimensional array of all zeros
np.zeros((3,5))

array([[0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.]])

# similar to range, but creates a one-dimensional array
np.arange(10)

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

arr2

array([[1, 2, 3, 4],
       [5, 6, 7, 8]])

# create an array of ones with the same shape and dtype as arr2
arr3 = np.ones_like(arr2)
arr3

array([[1, 1, 1, 1],
       [1, 1, 1, 1]])

# create an array of zeros with the same shape and dtype as arr2
arr4 = np.zeros_like(arr2)

# create an uninitialized array (memory is allocated but not cleared, so the values may be garbage)
arr5 = np.empty((3, 4))
arr5

array([[0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.]])

# create an uninitialized array with the same shape and dtype as arr2
arr6 = np.empty_like(arr2)
arr6

array([[0, 0, 0, 0],
       [0, 0, 0, 0]])

# create n x n identity matrix
arr7 = np.identity(5)
arr7

array([[1., 0., 0., 0., 0.],
       [0., 1., 0., 0., 0.],
       [0., 0., 1., 0., 0.],
       [0., 0., 0., 1., 0.],
       [0., 0., 0., 0., 1.]])

# np.eye also creates an n x n identity matrix
arr8 = np.eye(3)
arr8

array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]])

Data Types for ndarrays¶

arr1 = np.array([1,2,3])
arr1.dtype

dtype('int32')

arr2 = np.array([1, 2, 3], dtype=np.int32)
arr2.dtype

dtype('int32')

array types:¶

int8, uint8
int16, uint16
int32, uint32
int64, uint64
float16
float32
float64
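float128 (platform-dependent; not available on every system)
complex64, complex128
complex256 (platform-dependent; not available on every system)
bool
object
string_ (bytes_ in NumPy 2.0 and later)
unicode_ (str_ in NumPy 2.0 and later)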
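To make the practical effect of these types a little more concrete, here is a minimal sketch that stores the same three values with two different dtypes and compares their memory footprint. The variable names small and large are only for illustration.

```python
import numpy as np

# The same three values stored with different dtypes
small = np.array([1, 2, 3], dtype=np.int8)
large = np.array([1, 2, 3], dtype=np.float64)

# itemsize is the number of bytes per element, nbytes the total for the array
print(small.itemsize, small.nbytes)   # 1 3
print(large.itemsize, large.nbytes)   # 8 24
```

Smaller types save memory but hold a narrower range of values, so the right choice depends on the data.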

arr = np.array([1, 2, 3])
arr.dtype

dtype('int32')

float_arr = arr.astype(np.float64)
float_arr

array([1., 2., 3.])

arr = np.array([3.7, -1.2, -2.6, 0.5, 12.9, 10.1])
arr.astype(np.int32)

array([ 3, -1, -2,  0, 12, 10])

# you can drop the dtype argument and get the same result
# (np.bytes_ is the current name of np.string_, which NumPy 2.0 removed)
numeric_strings = np.array(['1.2', '3.4', '5.6'], dtype=np.bytes_)
numeric_strings.astype(np.float64)

array([1.2, 3.4, 5.6])

Operation between Arrays and Scalars¶

arr = np.array([[1, 2, 3], [4, 5, 6]])
arr

array([[1, 2, 3],
       [4, 5, 6]])

arr * arr

array([[ 1,  4,  9],
       [16, 25, 36]])

arr + arr

array([[ 2,  4,  6],
       [ 8, 10, 12]])

arr - arr

array([[0, 0, 0],
       [0, 0, 0]])

1.0 / arr

array([[1.        , 0.5       , 0.33333333],
       [0.25      , 0.2       , 0.16666667]])

arr ** 2

array([[ 1,  4,  9],
       [16, 25, 36]], dtype=int32)

Basic Indexing and Slicing¶

arr = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])
arr

array([1, 2, 3, 4, 5, 6, 7, 8, 9])

arr[5]

6

arr[5:8]

array([6, 7, 8])

arr[5:8] = 12
arr

array([ 1,  2,  3,  4,  5, 12, 12, 12,  9])

# IMPORTANT: slices are views of the original array, so any change to a view also changes the original
arr_slice = arr[5:8]
arr_slice[1] = 1000
arr

array([   1,    2,    3,    4,    5,   12, 1000,   12,    9])

arr_slice[:] = 64
arr

array([ 1,  2,  3,  4,  5, 64, 64, 64,  9])

# this is how you create a new array rather than a view of the original array
arr_new = np.array(arr[5:8])
arr[6] = 200
# no side effect on arr_new
arr_new

array([64, 64, 64])

# or you can use
arr_new = arr[5:8].copy()
arr_new

array([ 64, 200,  64])
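If it is unclear whether a given array is a view or a copy, one way to check is np.shares_memory together with the base attribute. The sketch below uses a fresh array so it does not depend on the cells above.

```python
import numpy as np

arr = np.arange(10)
view = arr[5:8]          # basic slice: a view onto arr's memory
copy = arr[5:8].copy()   # explicit copy: independent memory

print(np.shares_memory(arr, view))  # True
print(np.shares_memory(arr, copy))  # False
print(view.base is arr)             # True: the view points back at arr
```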

# some examples for higher dimensional arrays
arr2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
arr2d[2]

array([7, 8, 9])

arr2d[2][2]

9

# or, equivalently
arr2d[2, 2]

9

# examples for 3D arrays
arr3d = np.array([[[1, 2, 3], [3, 4, 5]], [[6, 7, 8], [9, 10 , 11]]])
arr3d

array([[[ 1,  2,  3],
        [ 3,  4,  5]],

       [[ 6,  7,  8],
        [ 9, 10, 11]]])

# each index you supply steps one bracket level deeper into the array
# so a single index on this 2 x 2 x 3 array returns a 2 x 3 array
arr3d[0]

array([[1, 2, 3],
       [3, 4, 5]])

arr3d[0][1]

array([3, 4, 5])

arr3d[0][1][2]

5

# or you can type
arr3d[0, 1, 2]

5

# some more operations
# again, you need copy() so that you get an independent array instead of a view
old_values = arr3d[0].copy()
arr3d[0] = 42
arr3d

array([[[42, 42, 42],
        [42, 42, 42]],

       [[ 6,  7,  8],
        [ 9, 10, 11]]])

arr3d[0] = old_values
arr3d

array([[[ 1,  2,  3],
        [ 3,  4,  5]],

       [[ 6,  7,  8],
        [ 9, 10, 11]]])

Indexing with Slices¶

arr2d

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

arr2d[:2]

array([[1, 2, 3],
       [4, 5, 6]])

arr2d[:2, 1:]

array([[2, 3],
       [5, 6]])

arr2d[1, :2]

array([4, 5])

arr2d[2, :1]

array([7])

arr2d[:, :1]

array([[1],
       [4],
       [7]])

arr2d[:2, 1:] = 1000
arr2d

array([[   1, 1000, 1000],
       [   4, 1000, 1000],
       [   7,    8,    9]])

Boolean Indexing¶

names = np.array(['Bob', 'Joe', 'Will', 'Bob', 'Will', 'Joe', 'Joe'])
# random samples from the standard normal distribution (mean 0, standard deviation 1)
data = np.random.randn(7, 4)
data

array([[ 1.37879732, -0.65073545,  1.18939086, -0.56486144],
       [-1.11065484,  0.83257863, -0.37990433, -0.75005754],
       [ 0.26552364,  0.38933028,  0.19881786, -0.28672083],
       [-1.25284176,  2.4892188 , -1.13202316,  1.78486746],
       [-1.67654273,  0.62826899, -0.75880761, -0.52511059],
       [ 1.23605999, -1.04208615,  0.72049124,  0.11816467],
       [-0.94977664, -0.26821673, -0.70636882, -1.22471888]])

names.shape

(7,)

data
#names

array([[ 1.37879732, -0.65073545,  1.18939086, -0.56486144],
       [-1.11065484,  0.83257863, -0.37990433, -0.75005754],
       [ 0.26552364,  0.38933028,  0.19881786, -0.28672083],
       [-1.25284176,  2.4892188 , -1.13202316,  1.78486746],
       [-1.67654273,  0.62826899, -0.75880761, -0.52511059],
       [ 1.23605999, -1.04208615,  0.72049124,  0.11816467],
       [-0.94977664, -0.26821673, -0.70636882, -1.22471888]])

names == 'Bob'

array([ True, False, False,  True, False, False, False])

# the boolean array is matched against the rows; only rows marked True are selected
data[names == 'Bob', 2:]

array([[ 1.18939086, -0.56486144],
       [-1.13202316,  1.78486746]])

data[names == 'Bob', 3]

array([-0.56486144,  1.78486746])

# To select everything but Bob
names != 'Bob'

array([False,  True,  True, False,  True,  True,  True])

# or you can use ~
data[~(names == 'Bob')]

array([[-1.11065484,  0.83257863, -0.37990433, -0.75005754],
       [ 0.26552364,  0.38933028,  0.19881786, -0.28672083],
       [-1.67654273,  0.62826899, -0.75880761, -0.52511059],
       [ 1.23605999, -1.04208615,  0.72049124,  0.11816467],
       [-0.94977664, -0.26821673, -0.70636882, -1.22471888]])

Note: Selecting data from an array by boolean indexing always creates a copy of the data¶
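A small sketch of what this note means in practice, using a shortened names array and integer data (both made up for this illustration): modifying the selected rows leaves the original untouched.

```python
import numpy as np

names = np.array(['Bob', 'Joe', 'Will', 'Bob'])
data = np.arange(8).reshape(4, 2)

selected = data[names == 'Bob']   # boolean indexing returns a new array
selected[:] = -1                  # modify the selection only

print(np.shares_memory(data, selected))  # False
print(data[0])                           # [0 1], unchanged
```

Assigning through a mask directly, as in data[names == 'Bob'] = -1, is different: that form writes into the original array in place.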

# you can use & and | for boolean expressions
mask = (names == 'Bob') | (names == 'Will')
mask

array([ True, False,  True,  True,  True, False, False])

data[mask]

array([[ 1.37879732, -0.65073545,  1.18939086, -0.56486144],
       [ 0.26552364,  0.38933028,  0.19881786, -0.28672083],
       [-1.25284176,  2.4892188 , -1.13202316,  1.78486746],
       [-1.67654273,  0.62826899, -0.75880761, -0.52511059]])

Note: the Python keywords and/or do not work with boolean arrays; use & and | instead¶
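The following sketch shows the failure this note refers to. It uses a shortened names array, and the variable error_message is introduced only to capture the exception text.

```python
import numpy as np

names = np.array(['Bob', 'Joe', 'Will'])

try:
    # `or` asks for the truth value of a whole array, which is ambiguous
    mask = (names == 'Bob') or (names == 'Will')
except ValueError as exc:
    error_message = str(exc)
    print(error_message)

# The element-wise operator works as intended
mask = (names == 'Bob') | (names == 'Will')
print(mask)  # [ True False  True]
```

The parentheses around each comparison matter, because | and & bind more tightly than ==.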

data

array([[ 1.37879732, -0.65073545,  1.18939086, -0.56486144],
       [-1.11065484,  0.83257863, -0.37990433, -0.75005754],
       [ 0.26552364,  0.38933028,  0.19881786, -0.28672083],
       [-1.25284176,  2.4892188 , -1.13202316,  1.78486746],
       [-1.67654273,  0.62826899, -0.75880761, -0.52511059],
       [ 1.23605999, -1.04208615,  0.72049124,  0.11816467],
       [-0.94977664, -0.26821673, -0.70636882, -1.22471888]])

# setting all negative values in the array data to zero
data[data < 0] = 0
data

array([[1.37879732, 0.        , 1.18939086, 0.        ],
       [0.        , 0.83257863, 0.        , 0.        ],
       [0.26552364, 0.38933028, 0.19881786, 0.        ],
       [0.        , 2.4892188 , 0.        , 1.78486746],
       [0.        , 0.62826899, 0.        , 0.        ],
       [1.23605999, 0.        , 0.72049124, 0.11816467],
       [0.        , 0.        , 0.        , 0.        ]])

Fancy Indexing¶

arr = np.zeros((8, 4))
for i in range(len(arr)):
    arr[i] = i
arr

array([[0., 0., 0., 0.],
       [1., 1., 1., 1.],
       [2., 2., 2., 2.],
       [3., 3., 3., 3.],
       [4., 4., 4., 4.],
       [5., 5., 5., 5.],
       [6., 6., 6., 6.],
       [7., 7., 7., 7.]])

# fancy indexing
# selects whole rows in the order given by the list
arr[[4, 3, 0, 6]]

array([[4., 4., 4., 4.],
       [3., 3., 3., 3.],
       [0., 0., 0., 0.],
       [6., 6., 6., 6.]])

# negative indices select rows counting from the end (-1 is the last row)
arr[[-3, -5, -7]]

array([[5., 5., 5., 5.],
       [3., 3., 3., 3.],
       [1., 1., 1., 1.]])

# reshape being introduced here
arr = np.arange(32).reshape((8, 4))
arr

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15],
       [16, 17, 18, 19],
       [20, 21, 22, 23],
       [24, 25, 26, 27],
       [28, 29, 30, 31]])

arr

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15],
       [16, 17, 18, 19],
       [20, 21, 22, 23],
       [24, 25, 26, 27],
       [28, 29, 30, 31]])

# another fancy indexing
# pairs up row and column indices: selects elements (1, 0), (5, 3), (7, 1) and (2, 2)
arr[[1, 5, 7, 2], [0, 3, 1, 2]]

array([ 4, 23, 29, 10])

Note: Fancy indexing, unlike slicing, always copies the data into a new array¶
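Passing two index lists selects individual elements, not a rectangular block. If the full block of rows and columns is what you want, one option (not shown in the cells above) is np.ix_. The sketch below contrasts the two and confirms that the result is a copy.

```python
import numpy as np

arr = np.arange(32).reshape((8, 4))

# Pairs up indices: elements (1, 0), (5, 3), (7, 1), (2, 2)
pairs = arr[[1, 5, 7, 2], [0, 3, 1, 2]]

# np.ix_ builds an open mesh, selecting every listed row at every listed column
block = arr[np.ix_([1, 5, 7, 2], [0, 3, 1, 2])]

print(pairs.shape, block.shape)       # (4,) (4, 4)
print(np.shares_memory(arr, block))   # False: fancy indexing copied the data
```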

Transposing Arrays and Swapping Axes¶

arr = np.arange(15).reshape((3, 5))
arr

array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14]])

# the transpose is a view of the array, not a copy
arr.T

array([[ 0,  5, 10],
       [ 1,  6, 11],
       [ 2,  7, 12],
       [ 3,  8, 13],
       [ 4,  9, 14]])

arr = np.array([[1, 2], [3, 4]])
arr

array([[1, 2],
       [3, 4]])

# matrix multiplication
arr.dot(arr)

array([[ 7, 10],
       [15, 22]])

# or you can type
np.dot(arr, arr)

array([[ 7, 10],
       [15, 22]])

arr = np.random.randn(6, 3)
np.dot(arr.T, arr)

array([[ 6.01986433, -1.83225009,  1.13710425],
       [-1.83225009,  3.26559867, -1.68900134],
       [ 1.13710425, -1.68900134,  3.46863104]])

arr = np.array([[1, 2, 3], [4, 5, 6]])
arr

array([[1, 2, 3],
       [4, 5, 6]])

# transpose permutes the axes. Axes are numbered 0, 1, ... up to the number of dimensions minus one
# the following swaps the rows and columns
arr.transpose(1,0)

array([[1, 4],
       [2, 5],
       [3, 6]])

arr = np.arange(16).reshape((2, 2, 4))
arr

array([[[ 0,  1,  2,  3],
        [ 4,  5,  6,  7]],

       [[ 8,  9, 10, 11],
        [12, 13, 14, 15]]])

# the following keeps the last axis in place and swaps the first axis with the second
# to see what happens, write an element as A[i, j, k]: k stays where it is while i and j trade places
arr.transpose(1, 0, 2)

array([[[ 0,  1,  2,  3],
        [ 8,  9, 10, 11]],

       [[ 4,  5,  6,  7],
        [12, 13, 14, 15]]])
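The index bookkeeping can be checked directly. This sketch picks one arbitrary element (the values of i, j and k are chosen only for illustration) and confirms where it ends up after the transpose.

```python
import numpy as np

arr = np.arange(16).reshape((2, 2, 4))
swapped = arr.transpose(1, 0, 2)

# Element (i, j, k) of the original sits at (j, i, k) after the transpose
i, j, k = 1, 0, 3
print(arr[i, j, k], swapped[j, i, k])  # 11 11

# Swapping just the first two axes gives the same result
print(np.array_equal(swapped, arr.swapaxes(0, 1)))  # True
```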

# swapaxes works like transpose but takes a pair of axes to swap
arr

array([[[ 0,  1,  2,  3],
        [ 4,  5,  6,  7]],

       [[ 8,  9, 10, 11],
        [12, 13, 14, 15]]])

arr.swapaxes(1,2)

array([[[ 0,  4],
        [ 1,  5],
        [ 2,  6],
        [ 3,  7]],

       [[ 8, 12],
        [ 9, 13],
        [10, 14],
        [11, 15]]])

Universal Functions: Fast Element-wise Array Functions¶

A universal function, or ufunc, is a function that performs elementwise operations on data in ndarrays. You can think of them as fast vectorized wrappers for simple functions that take one or more scalar values and produce one or more scalar results.
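As a quick illustration of that description, the sketch below checks that np.sqrt and np.add are ufunc objects and looks at how many inputs each one takes through the nin attribute. The sample values are arbitrary.

```python
import numpy as np

arr = np.array([1.0, 4.0, 9.0])

print(isinstance(np.sqrt, np.ufunc))  # True
print(np.sqrt.nin, np.add.nin)        # 1 2  (unary vs binary)

roots = np.sqrt(arr)     # applied to every element
sums = np.add(arr, 10)   # the scalar is combined with every element
print(roots)             # [1. 2. 3.]
print(sums)              # [11. 14. 19.]
```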

arr = np.arange(10)
# unary universal function of sqrt
np.sqrt(arr)

array([0.        , 1.        , 1.41421356, 1.73205081, 2.        ,
       2.23606798, 2.44948974, 2.64575131, 2.82842712, 3.        ])

# unary universal function of exponent
np.exp(arr)

array([1.00000000e+00, 2.71828183e+00, 7.38905610e+00, 2.00855369e+01,
       5.45981500e+01, 1.48413159e+02, 4.03428793e+02, 1.09663316e+03,
       2.98095799e+03, 8.10308393e+03])

x = np.random.randn(8)
y = np.random.randn(8)
x

array([-0.03736112, -0.60744449, -0.33559193, -0.51274497, -0.61488508,
       -0.16981259, -0.28471117, -0.46250942])

y

array([-0.19247363,  2.00505277,  0.23527698, -0.90768257, -0.16235317,
       -0.08872467, -0.40754121, -0.24138504])

# binary universal function of maximum (compares element by element in order)
np.maximum(x, y)

array([-0.03736112,  2.00505277,  0.23527698, -0.51274497, -0.16235317,
       -0.08872467, -0.28471117, -0.24138504])

arr = np.random.randn(8)
# modf returns a tuple of two arrays: the fractional parts and the integral parts of the numbers
np.modf(arr)

(array([ 0.85664649,  0.59811592, -0.93778945, -0.23931799, -0.64456316,
        -0.38911459,  0.46648843,  0.62282601]),
 array([ 0.,  0., -0., -1., -0., -0.,  0.,  0.]))

Some unary ufuncs (Please refer to NumPy documentation for the explanation of each)¶

abs, fabs
sqrt
square
exp
log, log10, log2, log1p
sign
ceil
floor
rint
modf
isnan
isfinite, isinf
cos, cosh, sin, sinh
tan, tanh
arccos,arccosh, arcsin
arcsinh, arctan, arctanh
logical_not

Some binary ufuncs (Please refer to NumPy documentation for the explanation of each)¶

add
subtract
multiply
divide, floor_divide
power
maximum, fmax
minimum, fmin
mod
copysign
greater, greater_equal
less, less_equal, equal
not_equal
logical_and
logical_or
logical_xor

Data Processing Using Arrays¶

Using NumPy arrays enables you to express many kinds of data processing tasks as concise array expressions that might otherwise require writing loops. This practice of replacing explicit loops with array expressions is commonly referred to as vectorization. In general, vectorized array operations will often be one or two (or more) orders of magnitude faster than their pure Python equivalents.
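A minimal sketch of what replacing a loop with an array expression looks like. The computation itself (twice the square root) is arbitrary and chosen only for illustration; both versions produce the same numbers.

```python
import math
import numpy as np

values = np.arange(1, 6, dtype=np.float64)

# Explicit loop: one Python-level operation per element
looped = np.array([math.sqrt(v) * 2 for v in values])

# Vectorized: a single array expression, no explicit loop
vectorized = np.sqrt(values) * 2

print(np.allclose(looped, vectorized))  # True
```

On five elements the speed difference is negligible; the gap tends to show up as the arrays grow.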

# let's say you want to evaluate the function sqrt(x^2 + y^2) across a regular grid of values.
# np.meshgrid takes two 1D arrays and produces two 2D arrays; the following example shows how
points = np.arange(0, 10, 2)
points

array([0, 2, 4, 6, 8])

xs, ys = np.meshgrid(points, points)
xs

array([[0, 2, 4, 6, 8],
       [0, 2, 4, 6, 8],
       [0, 2, 4, 6, 8],
       [0, 2, 4, 6, 8],
       [0, 2, 4, 6, 8]])

ys

array([[0, 0, 0, 0, 0],
       [2, 2, 2, 2, 2],
       [4, 4, 4, 4, 4],
       [6, 6, 6, 6, 6],
       [8, 8, 8, 8, 8]])

z = np.sqrt(xs ** 2 + ys ** 2)
z

array([[ 0.        ,  2.        ,  4.        ,  6.        ,  8.        ],
       [ 2.        ,  2.82842712,  4.47213595,  6.32455532,  8.24621125],
       [ 4.        ,  4.47213595,  5.65685425,  7.21110255,  8.94427191],
       [ 6.        ,  6.32455532,  7.21110255,  8.48528137, 10.        ],
       [ 8.        ,  8.24621125,  8.94427191, 10.        , 11.3137085 ]])
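As an aside, the same grid can be computed without materializing xs and ys by relying on broadcasting. This is an alternative to the meshgrid approach above, not something the original cells use.

```python
import numpy as np

points = np.arange(0, 10, 2)
xs, ys = np.meshgrid(points, points)
z_mesh = np.sqrt(xs ** 2 + ys ** 2)

# A row of shape (1, 5) combined with a column of shape (5, 1) broadcasts to (5, 5)
z_broadcast = np.sqrt(points[np.newaxis, :] ** 2 + points[:, np.newaxis] ** 2)

print(z_broadcast.shape)                 # (5, 5)
print(np.allclose(z_mesh, z_broadcast))  # True
```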

Expressing Conditional Logic as Array Operations¶

The numpy.where function is a vectorized version of the ternary expression x if condition else y.

xarr = np.array([1.1, 1.2, 1.3, 1.4, 1.5])
yarr = np.array([2.1, 2.2, 2.3, 2.4, 2.5])

cond = np.array([True, False, True, True, False])
# zip() is a built-in Python function that returns an iterator aggregating elements from each of the iterables
result = [(x if c else y) for x, y, c in zip(xarr, yarr, cond)]
result

[1.1, 2.2, 1.3, 1.4, 2.5]

This has multiple problems.
First, it will not be very fast for large arrays, because all the work is done in pure Python.
Second, it will not work with multidimensional arrays.
With np.where you can write:

result = np.where(cond, xarr, yarr)
result

array([1.1, 2.2, 1.3, 1.4, 2.5])

# The second and third arguments of np.where do not need to be arrays; one or both can be scalars.
arr = np.random.randn(4,4)
arr

array([[ 1.05846636, -2.18934139,  0.69033616, -0.42188738],
       [ 1.6349166 , -0.79310744,  0.37484735, -1.69703955],
       [ 1.06596908, -0.43937802,  0.53081635, -1.62868329],
       [ 0.38555556,  0.10910263, -0.94933816, -0.98044428]])

# we want to replace all positive values with 2 and all negative values with -2
np.where(arr > 0, 2, -2)

array([[ 2, -2,  2, -2],
       [ 2, -2,  2, -2],
       [ 2, -2,  2, -2],
       [ 2,  2, -2, -2]])

# or setting only positive values to 2
np.where(arr > 0, 2, arr)

array([[ 2.        , -2.18934139,  2.        , -0.42188738],
       [ 2.        , -0.79310744,  2.        , -1.69703955],
       [ 2.        , -0.43937802,  2.        , -1.62868329],
       [ 2.        ,  2.        , -0.94933816, -0.98044428]])

'''
Consider the following example, where we have two boolean arrays, cond1 and cond2, and wish to assign
a different value for each of the 4 possible pairs of boolean values.
Pure Python:
'''
cond1 = np.array([True, True, False, False])
cond2 = np.array([True, False, True, False])

result = []
for i in range(len(cond1)):
    if cond1[i] and cond2[i]:
        result.append(0)
    elif cond1[i]:
        result.append(1)
    elif cond2[i]:
        result.append(2)
    else:
        result.append(3)
result

[0, 1, 2, 3]

# smart way of using np.where
np.where(cond1 & cond2, 0, np.where(cond1, 1, np.where(cond2, 2, 3)))

array([0, 1, 2, 3])

# in arithmetic, True is treated as 1 and False as 0
# so we can rewrite the previous code as:
result = 1 * (cond1 & ~cond2) + 2 * (~cond1 & cond2) + 3 * (~cond1 & ~cond2)
result

array([0, 1, 2, 3])
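When the number of cases grows, nested np.where calls become hard to read. One alternative, offered here as a sketch and not as the approach the cells above take, is np.select, which accepts a list of conditions and a matching list of values.

```python
import numpy as np

cond1 = np.array([True, True, False, False])
cond2 = np.array([True, False, True, False])

# Conditions are checked in order; the first one that matches wins
result = np.select([cond1 & cond2, cond1, cond2], [0, 1, 2], default=3)
print(result)  # [0 1 2 3]
```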

Mathematical and Statistical Methods¶

arr = np.random.randn(5, 4)
arr

array([[-0.18252157, -0.63149364,  0.66817973,  0.65735378],
       [ 0.76601508,  0.2255208 ,  2.22401099,  0.04788006],
       [-0.42680459,  0.53485304,  0.37781218, -0.01701542],
       [ 0.61577001, -1.44680833, -1.02141823, -0.76900976],
       [ 0.87232208,  1.19893413, -1.31132677, -0.49392435]])

arr.mean()

0.09441646134989254

arr.sum()

1.8883292269978509

arr.std()

0.8831429955637358

arr = np.array([[1, 2, 3],[4, 5, 6], [7, 8, 9]])
arr

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

# mean along an axis: for a two-dimensional array, axis 0 gives one mean per column and axis 1 gives one mean per row
arr.mean(0)

array([4., 5., 6.])

arr.mean(axis = 1)

array([2., 5., 8.])

arr.sum(axis = 0)

array([12, 15, 18])

# cumulative sum down each column (a running total that starts from 0)
arr.cumsum(axis = 0)

array([[ 1,  2,  3],
       [ 5,  7,  9],
       [12, 15, 18]], dtype=int32)

Basic array statistical methods¶

# cumulative product along each row (a running product that starts from 1)
arr.cumprod(axis = 1)

array([[  1,   2,   6],
       [  4,  20, 120],
       [  7,  56, 504]], dtype=int32)

sum
mean
std, var
min, max
argmin, argmax (Indices of minimum and maximum elements, respectively. By default, the index is for the flattened array)
cumsum
cumprod

arr.min(axis = 0)
#arr

array([1, 2, 3])

# max index for flattened array
arr.argmax()

8

Methods for Boolean Arrays¶

Boolean values are coerced to 1 (True) and 0 (False), so sum can be used to count the True values in a boolean array.

arr = np.random.randn(10)
arr

array([ 0.16555383,  0.59795282,  0.00255714, -0.18289366, -0.59485528,
        1.55983561, -0.83679902,  0.50554529,  0.48152866,  1.74469548])

arr > 0

array([ True,  True,  True, False, False,  True, False,  True,  True,
        True])

(arr > 0).sum()

7

# any() returns True if at least one element is True
# (the array is named bools so it does not shadow Python's built-in bool)
bools = np.array([False, False, True, False])
bools.any()

True

# all() returns True only if every element is True
bools.all()

False

Sorting¶

arr = np.random.randn(10)
arr

array([-1.25449998,  0.65584279, -0.44780096,  0.2871527 , -0.18042682,
        0.78583569,  0.49849835, -0.12863463, -0.23158023, -0.54440885])

arr.sort()
arr

array([-1.25449998, -0.54440885, -0.44780096, -0.23158023, -0.18042682,
       -0.12863463,  0.2871527 ,  0.49849835,  0.65584279,  0.78583569])

arr = np.random.randn(3, 4)
arr

array([[ 0.48463237, -0.08756859, -0.33046087,  0.85640524],
       [ 1.01822106, -0.39207202, -0.24327351, -0.49075468],
       [-2.21382404,  1.75957395, -0.47186965,  2.27107916]])

arr.sort(axis = 0)
arr

array([[-2.21382404, -0.39207202, -0.47186965, -0.49075468],
       [ 0.48463237, -0.08756859, -0.33046087,  0.85640524],
       [ 1.01822106,  1.75957395, -0.24327351,  2.27107916]])

arr.sort(axis = 1)
arr

array([[-2.21382404, -0.49075468, -0.47186965, -0.39207202],
       [-0.33046087, -0.08756859,  0.48463237,  0.85640524],
       [-0.24327351,  1.01822106,  1.75957395,  2.27107916]])

# approximating the 5% quantile by sorting and picking the element 5% of the way in
large_array = np.random.randn(1000)
large_array.sort()
large_array[int(0.05 * len(large_array))]

-1.7346778251928088
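NumPy also has a dedicated np.quantile function, which can be compared with the sort-and-index approach. The sketch below does so on freshly generated data; the seed is arbitrary and only there to make the run repeatable. The two results are close but not identical, because np.quantile interpolates between neighboring values by default.

```python
import numpy as np

np.random.seed(0)   # arbitrary seed, only to make this sketch repeatable
large_array = np.random.randn(1000)

sorted_values = np.sort(large_array)   # np.sort returns a sorted copy
by_sorting = sorted_values[int(0.05 * len(large_array))]
by_quantile = np.quantile(large_array, 0.05)

print(by_sorting, by_quantile)
```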

Unique and Other Set Logic¶

names = np.array(['Bob', 'Joe', 'Will', 'Bob', 'Will', 'Joe', 'Joe'])
np.unique(names)

array(['Bob', 'Joe', 'Will'], dtype='<U4')

ints = np.array([3, 3, 3, 2, 2, 1, 1, 4, 4])
np.unique(ints)

array([1, 2, 3, 4])

# putting in set to remove the duplicates
sorted(set(names))
#names.sort()
#names

['Bob', 'Joe', 'Will']

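# compute a boolean array indicating whether each element of values is contained in the second array
# (np.isin is the current replacement for np.in1d, which newer NumPy releases deprecate)
values = np.array([6, 0, 0, 3, 2, 5, 6])
np.isin(values, [2, 3, 6])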

array([ True, False, False,  True,  True, False,  True])

# compute the sorted union of the elements
np.union1d(values, [200, 100])

array([  0,   2,   3,   5,   6, 100, 200])

# compute the sorted, common elements
np.intersect1d(values, [3, 2])

array([2, 3])

# set difference, elements in the first array that are not in the second one
np.setdiff1d(values, [0, 6, 10])

array([2, 3, 5])

# set symmetric difference, elements that are in either of the arrays but not both
np.setxor1d(values, [0, 6, 10])

array([ 2,  3,  5, 10])

Linear Algebra¶

# example of matrix multiplication
x = np.array([[1, 2, 3], [4, 5, 6]])
y = np.array([[6, 23], [-1, 7], [8, 9]])
np.dot(x, y)

array([[ 28,  64],
       [ 67, 181]])

# the same
x.dot(y)

array([[ 28,  64],
       [ 67, 181]])

# numpy.linalg has a standard set of matrix decompositions and operations such as inverse and determinant
from numpy.linalg import inv, qr
X = np.random.randn(5,5)
# T for transpose
mat = X.T.dot(X)
# inverse of the matrix
inv(mat)

array([[ 1.19602384, -0.43339575, -0.7840773 , -1.04325319, -1.35922149],
       [-0.43339575,  0.42768034,  0.43260469,  0.47742356,  0.45555962],
       [-0.7840773 ,  0.43260469,  0.94646622,  0.86541588,  1.11106882],
       [-1.04325319,  0.47742356,  0.86541588,  1.23113086,  1.27966059],
       [-1.35922149,  0.45555962,  1.11106882,  1.27966059,  2.33246229]])

# the product should be the identity matrix, up to floating-point rounding error
mat.dot(inv(mat))

array([[ 1.00000000e+00,  1.64524375e-16, -7.08356149e-17,
         4.25098208e-16, -3.40210593e-16],
       [-2.61326446e-16,  1.00000000e+00,  4.36098464e-16,
         2.24063507e-16,  2.99000431e-16],
       [ 2.66672903e-16, -9.98307792e-17,  1.00000000e+00,
        -7.68950680e-16, -5.31347236e-16],
       [ 5.36413813e-16,  1.93103132e-16,  1.28670399e-17,
         1.00000000e+00,  4.09378604e-17],
       [ 1.00034662e-16,  5.19666045e-17, -8.55730682e-17,
         5.24496102e-16,  1.00000000e+00]])
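Because of rounding, the off-diagonal entries above are tiny numbers and not exact zeros. A common way to compare against the identity matrix is np.allclose, which tests equality within a tolerance. The sketch below uses a small hand-picked invertible matrix in place of the random one, so that it is repeatable.

```python
import numpy as np
from numpy.linalg import inv

# A small invertible matrix chosen for illustration (not the random mat above)
mat = np.array([[4.0, 1.0, 0.0],
                [1.0, 3.0, 1.0],
                [0.0, 1.0, 2.0]])

product = mat.dot(inv(mat))

# Equal to the identity within floating-point tolerance
print(np.allclose(product, np.eye(3)))  # True
```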

Commonly-used numpy.linalg functions¶

diag (return the diagonal elements of a square matrix)
dot (matrix multiplication)
trace (sum of the main diagonal elements)
det (determinant)
eig (eigenvalues and eigenvectors of a square matrix)
inv (inverse of a square matrix)
qr (QR decomposition)
svd (singular value decomposition)
solve (solve Ax = b for x, where A is a square matrix)
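A short sketch of three of these functions on a small system. The matrix A and vector b are made up for illustration: solve finds x, det gives the determinant, and eig returns eigenvalues w and eigenvectors v (as columns) satisfying A v = v diag(w).

```python
import numpy as np
from numpy.linalg import det, eig, solve

A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([9.0, 8.0])

x = solve(A, b)                   # solves A x = b; here x is approximately [2, 3]
print(np.allclose(A.dot(x), b))   # True

print(np.isclose(det(A), 5.0))    # True: 3 * 2 - 1 * 1

w, v = eig(A)                     # eigenvalues and eigenvectors
# Each column v[:, j] satisfies A v[:, j] = w[j] * v[:, j]
print(np.allclose(A.dot(v), v * w))  # True
```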

mat.trace()

19.43134780284728

Random Number Generation¶

# np.random supplements Python's built-in random module with functions for efficiently generating whole arrays of samples
# for example a 4 by 4 array of samples from standard normal distribution
samples = np.random.normal(size=(4,4))
samples

array([[ 0.18101741, -0.0609608 , -1.19150952,  0.2243375 ],
       [ 1.38680011,  2.52575841,  0.22857496, -1.70951622],
       [-0.41593192, -0.55278893,  0.22820945, -0.3270344 ],
       [-1.76895857,  1.68049515, -0.64992314, -0.25926266]])

List vs Array in Python¶

Arrays and lists are both used in Python to store data, but they do not serve exactly the same purpose. Both can hold numbers, strings, and other values, and both can be indexed and iterated over, but the similarities do not go much further. The main difference between a list and an array is the set of operations you can perform on them: arithmetic on an array is applied element by element, while a list does not support element-wise arithmetic.

Another difference is that all elements of an array share the same data type, whereas the elements of a list can have different data types.
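A brief sketch of both differences, using made-up values: the same * operator repeats a list but multiplies an array element by element, and an array built from mixed numbers converts them to one common type.

```python
import numpy as np

numbers_list = [1, 2, 3]
numbers_array = np.array([1, 2, 3])

print(numbers_list * 2)    # [1, 2, 3, 1, 2, 3]  (repetition)
print(numbers_array * 2)   # [2 4 6]             (element-wise arithmetic)

mixed = np.array([1, 2.5, 3])   # values are converted to one common dtype
print(mixed.dtype)              # float64
```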

Some of numpy.random functions¶

seed     (seed the random number generator)
permutation     (return a random permutation)
shuffle     (randomly permute a sequence in place)
rand     (draw samples from a uniform distribution)
randint     (draw random integers from a given low-to-high range)
randn     (draw samples from a normal distribution with mean 0 and standard deviation 1)
binomial     (draw samples from a binomial distribution)
normal     (draw samples from normal (Gaussian) distribution)
beta     (draw samples from beta distribution)
chisquare     (draw samples from a chi-square distribution)
gamma     (draw samples from gamma distribution)
uniform     (draw samples from a uniform [0, 1) distribution)
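A short sketch using a few of the functions listed above. The seed value 42 is arbitrary. Re-seeding with the same value replays the same sequence, which is handy for reproducible examples.

```python
import numpy as np

np.random.seed(42)               # arbitrary seed chosen for this sketch
first = np.random.randn(3)

np.random.seed(42)               # re-seeding replays the same sequence
second = np.random.randn(3)

print(np.array_equal(first, second))      # True

perm = np.random.permutation(5)           # a random ordering of 0..4
dice = np.random.randint(1, 7, size=10)   # integers from 1 to 6 (the upper bound is exclusive)
print(perm, dice)
```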

HenryBernreuter.com v1.0
