NumPy in One Post

NumPy Basics: Arrays and Vectorized Computation

NumPy, short for Numerical Python, is the fundamental package required for high-performance scientific computing and data analysis.

Here is what it provides:

1- ndarray, a fast and space-efficient multidimensional array.
2- Standard mathematical functions for fast operations on entire arrays of data without having to write loops.
3- Tools for reading/writing array data to disk and working with memory-mapped files.
4- Linear algebra, random number generation and Fourier transform capabilities.
5- Tools for integrating code written in C/C++ and Fortran.
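Item 3 is not demonstrated later in this post, so here is a minimal sketch of it. It assumes a writable temporary directory and uses a hypothetical file name; it round-trips an array through NumPy's binary .npy format with np.save / np.load, then reopens the same file as a read-only memory map through the mmap_mode argument.

```python
import os
import tempfile

import numpy as np

data = np.arange(12, dtype=np.float64).reshape((3, 4))

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "example.npy")  # hypothetical file name
    np.save(path, data)                          # write the array in .npy format

    loaded = np.load(path)                       # read the whole array into memory
    mapped = np.load(path, mmap_mode="r")        # memory-map the file instead
    mapped_sum = float(mapped.sum())             # values are read from disk on access
    print(np.array_equal(loaded, data), mapped.shape, mapped_sum)
    del mapped  # release the memory map before the directory is removed
```

Memory mapping is useful when an array is too large to load at once: only the parts you touch are read from disk.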

The NumPy ndarray: A Multidimensional Array Object

One of the key features of NumPy is its N-dimensional array object, or ndarray, which is a fast, flexible container for large data sets in Python.

Creating an Array

In [2]:
# need to import the numpy library
import numpy as np
In [3]:
# one dimensional array
data1 = [6, 7.5, 8, 0, 1]
arr1 = np.array(data1)
arr1
Out[3]:
array([6. , 7.5, 8. , 0. , 1. ])
In [4]:
# two dimensional array
data2 = [[1, 2, 3, 4], [5, 6, 7, 8]]
arr2 = np.array(data2)
arr2
Out[4]:
array([[1, 2, 3, 4],
       [5, 6, 7, 8]])
In [5]:
# dimension of the array
arr2.ndim
Out[5]:
2
In [6]:
# shape of the array
arr2.shape
Out[6]:
(2, 4)
In [7]:
# data type of the array
arr1.dtype
Out[7]:
dtype('float64')
In [8]:
# size of the array
arr2.size
Out[8]:
8
In [9]:
# number of rows
len(arr2)
Out[9]:
2
In [10]:
# number of columns
# refer to this after reading about slicing
len(arr2[0,:])
Out[10]:
4
In [11]:
# create a one-dimensional array of all zeros
np.zeros(10)
Out[11]:
array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])
In [12]:
# create a one-dimensional array of all ones
np.ones(5)
Out[12]:
array([1., 1., 1., 1., 1.])
In [13]:
# create a two-dimensional array of all zeros (the shape is passed as a tuple)
np.zeros((3,5))
Out[13]:
array([[0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.]])
In [14]:
# like Python's built-in range, but returns a one-dimensional array
np.arange(10)
Out[14]:
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
In [15]:
arr2
Out[15]:
array([[1, 2, 3, 4],
       [5, 6, 7, 8]])
In [16]:
# create an array with the same shape (and dtype) as arr2, filled with ones
arr3 = np.ones_like(arr2)
arr3
Out[16]:
array([[1, 1, 1, 1],
       [1, 1, 1, 1]])
In [17]:
# create an array with the same shape (and dtype) as arr2, filled with zeros
arr4 = np.zeros_like(arr2)
In [18]:
# create an "empty" array: memory is allocated but not initialized, so the values may be arbitrary garbage
arr5 = np.empty((3, 4))
arr5
Out[18]:
array([[0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.]])
In [19]:
# create an uninitialized array with the same shape (and dtype) as arr2
arr6 = np.empty_like(arr2)
arr6
Out[19]:
array([[0, 0, 0, 0],
       [0, 0, 0, 0]])
In [20]:
# create an n x n identity matrix
arr7 = np.identity(5)
arr7
Out[20]:
array([[1., 0., 0., 0., 0.],
       [0., 1., 0., 0., 0.],
       [0., 0., 1., 0., 0.],
       [0., 0., 0., 1., 0.],
       [0., 0., 0., 0., 1.]])
In [21]:
# np.eye also creates an identity matrix; unlike np.identity, it also accepts a column count and a diagonal offset
arr8 = np.eye(3)
arr8
Out[21]:
array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]])
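np.eye(N, M, k) takes an optional number of columns M and a diagonal offset k, which np.identity does not. This short sketch compares the two and builds a rectangular matrix with ones on the first superdiagonal:

```python
import numpy as np

square = np.eye(3)              # same as np.identity(3)
rect = np.eye(3, 4)             # 3 rows, 4 columns, ones on the main diagonal
shifted = np.eye(3, 4, k=1)     # ones on the diagonal just above the main one
print(rect)
print(shifted)
```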

Data Types for ndarrays

In [22]:
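# the default integer dtype is platform-dependent: int64 on most Linux/macOS builds, int32 on Windows before NumPy 2.0
arr1 = np.array([1,2,3])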
arr1.dtype
Out[22]:
dtype('int32')
In [23]:
arr2 = np.array([1, 2, 3], dtype=np.int32)
arr2.dtype
Out[23]:
dtype('int32')

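NumPy data types:

int8, uint8
int16, uint16
int32, uint32
int64, uint64
float16
float32
float64
float128 (platform-dependent)
complex64, complex128
complex256 (platform-dependent)
bool
object
bytes_ (fixed-length byte strings; formerly also called string_)
str_ (fixed-length Unicode strings; formerly also called unicode_)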
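The dtype fixes how many bytes each element occupies and which values it can hold. This small sketch inspects that with the itemsize and nbytes attributes and with np.iinfo / np.finfo:

```python
import numpy as np

small = np.array([1, 2, 3], dtype=np.int8)
big = np.array([1, 2, 3], dtype=np.int64)

print(small.itemsize, big.itemsize)                   # bytes per element: 1 and 8
print(small.nbytes, big.nbytes)                       # total bytes: 3 and 24
print(np.iinfo(np.int8).min, np.iinfo(np.int8).max)   # -128 127
print(np.finfo(np.float32).eps)                       # machine epsilon of float32
```

Picking a smaller dtype saves memory, at the cost of a narrower range or lower precision.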

In [24]:
arr = np.array([1, 2, 3])
arr.dtype
Out[24]:
dtype('int32')
In [25]:
float_arr = arr.astype(np.float64)
float_arr
Out[25]:
array([1., 2., 3.])
In [26]:
arr = np.array([3.7, -1.2, -2.6, 0.5, 12.9, 10.1])
# casting floating point to integer truncates toward zero (the decimal part is dropped)
arr.astype(np.int32)
Out[26]:
array([ 3, -1, -2,  0, 12, 10])
In [27]:
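# np.bytes_ (called np.string_ before NumPy 2.0) gives byte strings; dropping the dtype gives Unicode strings, and astype produces the same result either way
numeric_strings = np.array(['1.2', '3.4', '5.6'], dtype=np.bytes_)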
numeric_strings.astype(np.float64)
Out[27]:
array([1.2, 3.4, 5.6])
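Two details of astype are worth seeing in code: it returns a new array by default, even when the dtype does not change, and it raises ValueError when a string cannot be parsed as a number. A minimal sketch:

```python
import numpy as np

ints = np.array([1, 2, 3])
same = ints.astype(ints.dtype)   # astype copies by default, even for the same dtype
same[0] = 100
print(ints[0])                   # 1: the original is unchanged

bad_strings = np.array(["1.2", "not a number"])
try:
    bad_strings.astype(np.float64)
    conversion_failed = False
except ValueError:
    conversion_failed = True     # unparseable strings raise ValueError
print(conversion_failed)         # True
```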

Operations between Arrays and Scalars

In [28]:
arr = np.array([[1, 2, 3], [4, 5, 6]])
arr
Out[28]:
array([[1, 2, 3],
       [4, 5, 6]])
In [29]:
arr * arr
Out[29]:
array([[ 1,  4,  9],
       [16, 25, 36]])
In [30]:
arr + arr
Out[30]:
array([[ 2,  4,  6],
       [ 8, 10, 12]])
In [31]:
arr - arr
Out[31]:
array([[0, 0, 0],
       [0, 0, 0]])
In [32]:
1.0 / arr
Out[32]:
array([[1.        , 0.5       , 0.33333333],
       [0.25      , 0.2       , 0.16666667]])
In [33]:
arr ** 2
Out[33]:
array([[ 1,  4,  9],
       [16, 25, 36]], dtype=int32)
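The cells above combine arrays of the same shape element by element. A scalar is propagated to every element, comparisons are element-wise too, and arrays whose shapes cannot be matched up raise ValueError (the rules for which shapes can be combined are called broadcasting). A short sketch:

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6]])

print(arr * 10)    # the scalar multiplies every element
print(arr > 3)     # element-wise comparison gives a boolean array

other = np.ones((3, 2))   # shape (3, 2) does not line up with (2, 3)
try:
    arr + other
    shapes_compatible = True
except ValueError:
    shapes_compatible = False
print(shapes_compatible)  # False
```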

Basic Indexing and Slicing

In [34]:
arr = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])
arr
Out[34]:
array([1, 2, 3, 4, 5, 6, 7, 8, 9])
In [35]:
arr[5]
Out[35]:
6
In [36]:
arr[5:8]
Out[36]:
array([6, 7, 8])
In [37]:
arr[5:8] = 12
arr
Out[37]:
array([ 1,  2,  3,  4,  5, 12, 12, 12,  9])
In [38]:
# IMPORTANT: slices are views of the original array, so changes to the view affect the original
arr_slice = arr[5:8]
arr_slice[1] = 1000
arr
Out[38]:
array([   1,    2,    3,    4,    5,   12, 1000,   12,    9])
In [39]:
arr_slice[:] = 64
arr
Out[39]:
array([ 1,  2,  3,  4,  5, 64, 64, 64,  9])
In [40]:
# wrapping the slice in np.array creates a new array (a copy), not a view of the original
arr_new = np.array(arr[5:8])
arr[6] = 200
# no side effect on arr_new
arr_new
Out[40]:
array([64, 64, 64])
In [41]:
# or you can use
arr_new = arr[5:8].copy()
arr_new
Out[41]:
array([ 64, 200,  64])
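To check whether an array is a view or a copy, you can look at its base attribute or ask np.shares_memory. This sketch applies both checks to a slice and to an explicit copy:

```python
import numpy as np

arr = np.arange(10)
view = arr[2:5]           # basic slice: a view onto arr's memory
copy = arr[2:5].copy()    # explicit copy: owns its own memory

print(view.base is arr)              # True: the view refers back to arr
print(copy.base is None)             # True: the copy owns its data
print(np.shares_memory(arr, view))   # True
print(np.shares_memory(arr, copy))   # False
```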
In [42]:
# some examples for higher dimensional arrays
arr2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
arr2d[2]
Out[42]:
array([7, 8, 9])
In [43]:
arr2d[2][2]
Out[43]:
9
In [44]:
# or, equivalently
arr2d[2, 2]
Out[44]:
9
In [45]:
# examples for 3D arrays
arr3d = np.array([[[1, 2, 3], [3, 4, 5]], [[6, 7, 8], [9, 10, 11]]])
arr3d
Out[45]:
array([[[ 1,  2,  3],
        [ 3,  4,  5]],

       [[ 6,  7,  8],
        [ 9, 10, 11]]])
In [46]:
# each index you supply peels off one level of brackets (one dimension)
# so this returns a 2 x 3 array
arr3d[0]
Out[46]:
array([[1, 2, 3],
       [3, 4, 5]])
In [47]:
arr3d[0][1]
Out[47]:
array([3, 4, 5])
In [48]:
arr3d[0][1][2]
Out[48]:
5
In [49]:
# or, equivalently
arr3d[0, 1, 2]
Out[49]:
5
In [50]:
# some more operations
# again, call copy() so you get a copy rather than a view
old_values = arr3d[0].copy()
arr3d[0] = 42
arr3d
Out[50]:
array([[[42, 42, 42],
        [42, 42, 42]],

       [[ 6,  7,  8],
        [ 9, 10, 11]]])
In [51]:
arr3d[0] = old_values
arr3d
Out[51]:
array([[[ 1,  2,  3],
        [ 3,  4,  5]],

       [[ 6,  7,  8],
        [ 9, 10, 11]]])

Indexing with Slices

In [52]:
arr2d
Out[52]:
array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])
In [53]:
arr2d[:2]
Out[53]:
array([[1, 2, 3],
       [4, 5, 6]])
In [54]:
arr2d[:2, 1:]
Out[54]:
array([[2, 3],
       [5, 6]])
In [55]:
arr2d[1, :2]
Out[55]:
array([4, 5])
In [56]:
arr2d[2, :1]
Out[56]:
array([7])
In [57]:
arr2d[:, :1]
Out[57]:
array([[1],
       [4],
       [7]])
In [58]:
arr2d[:2, 1:] = 1000
arr2d
Out[58]:
array([[   1, 1000, 1000],
       [   4, 1000, 1000],
       [   7,    8,    9]])

Boolean Indexing

In [59]:
names = np.array(['Bob', 'Joe', 'Will', 'Bob', 'Will', 'Joe', 'Joe'])
# 7 x 4 array of samples from the standard normal distribution (mean 0, standard deviation 1)
data = np.random.randn(7, 4)
data
Out[59]:
array([[ 1.37879732, -0.65073545,  1.18939086, -0.56486144],
       [-1.11065484,  0.83257863, -0.37990433, -0.75005754],
       [ 0.26552364,  0.38933028,  0.19881786, -0.28672083],
       [-1.25284176,  2.4892188 , -1.13202316,  1.78486746],
       [-1.67654273,  0.62826899, -0.75880761, -0.52511059],
       [ 1.23605999, -1.04208615,  0.72049124,  0.11816467],
       [-0.94977664, -0.26821673, -0.70636882, -1.22471888]])
In [60]:
names.shape
Out[60]:
(7,)
In [61]:
data
Out[61]:
array([[ 1.37879732, -0.65073545,  1.18939086, -0.56486144],
       [-1.11065484,  0.83257863, -0.37990433, -0.75005754],
       [ 0.26552364,  0.38933028,  0.19881786, -0.28672083],
       [-1.25284176,  2.4892188 , -1.13202316,  1.78486746],
       [-1.67654273,  0.62826899, -0.75880761, -0.52511059],
       [ 1.23605999, -1.04208615,  0.72049124,  0.11816467],
       [-0.94977664, -0.26821673, -0.70636882, -1.22471888]])
In [62]:
names == 'Bob'
Out[62]:
array([ True, False, False,  True, False, False, False])
In [63]:
# keeps the rows where the mask above is True (rows 0 and 3) and, from those, columns 2 onward
data[names == 'Bob', 2:]
Out[63]:
array([[ 1.18939086, -0.56486144],
       [-1.13202316,  1.78486746]])
In [64]:
data[names == 'Bob', 3]
Out[64]:
array([-0.56486144,  1.78486746])
In [65]:
# To select everything but Bob
names != 'Bob'
Out[65]:
array([False,  True,  True, False,  True,  True,  True])
In [66]:
# or you can use ~
data[~(names == 'Bob')]
Out[66]:
array([[-1.11065484,  0.83257863, -0.37990433, -0.75005754],
       [ 0.26552364,  0.38933028,  0.19881786, -0.28672083],
       [-1.67654273,  0.62826899, -0.75880761, -0.52511059],
       [ 1.23605999, -1.04208615,  0.72049124,  0.11816467],
       [-0.94977664, -0.26821673, -0.70636882, -1.22471888]])

Note: Selecting data from an array by boolean indexing always creates a copy of the data.
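The note above can be checked directly: changing the result of a boolean selection leaves the source untouched, while assigning through the mask writes into the source. A small sketch with a hypothetical 3 x 4 array:

```python
import numpy as np

data = np.arange(12).reshape((3, 4))
mask = np.array([True, False, True])   # select rows 0 and 2

selected = data[mask]      # boolean indexing returns a new array (a copy)
selected[:] = -1           # changing the copy...
unchanged_value = int(data[0, 0])
print(unchanged_value)     # ...leaves data untouched: 0

data[mask] = -1            # assigning through the mask modifies data in place
print(data)
```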

In [67]:
# you can use & and | for boolean expressions
mask = (names == 'Bob') | (names == 'Will')
mask
Out[67]:
array([ True, False,  True,  True,  True, False, False])
In [68]:
data[mask]
Out[68]:
array([[ 1.37879732, -0.65073545,  1.18939086, -0.56486144],
       [ 0.26552364,  0.38933028,  0.19881786, -0.28672083],
       [-1.25284176,  2.4892188 , -1.13202316,  1.78486746],
       [-1.67654273,  0.62826899, -0.75880761, -0.52511059]])

Note: The Python keywords and and or do not work with boolean arrays; use & and | instead, with each comparison wrapped in parentheses.
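Here is what happens if you try the keywords anyway: Python's or needs a single True/False value, and NumPy refuses to reduce a multi-element array to one, raising ValueError. The element-wise | works; the parentheses are required because | binds more tightly than ==.

```python
import numpy as np

names = np.array(["Bob", "Joe", "Will", "Bob"])

try:
    (names == "Bob") or (names == "Will")
    keyword_worked = True
except ValueError:
    keyword_worked = False   # "The truth value of an array ... is ambiguous"

mask = (names == "Bob") | (names == "Will")
print(keyword_worked, mask)  # False [ True False  True  True]
```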

In [69]:
data
Out[69]:
array([[ 1.37879732, -0.65073545,  1.18939086, -0.56486144],
       [-1.11065484,  0.83257863, -0.37990433, -0.75005754],
       [ 0.26552364,  0.38933028,  0.19881786, -0.28672083],
       [-1.25284176,  2.4892188 , -1.13202316,  1.78486746],
       [-1.67654273,  0.62826899, -0.75880761, -0.52511059],
       [ 1.23605999, -1.04208615,  0.72049124,  0.11816467],
       [-0.94977664, -0.26821673, -0.70636882, -1.22471888]])
In [70]:
# set all negative values in the array data to zero
data[data < 0] = 0
data
Out[70]:
array([[1.37879732, 0.        , 1.18939086, 0.        ],
       [0.        , 0.83257863, 0.        , 0.        ],
       [0.26552364, 0.38933028, 0.19881786, 0.        ],
       [0.        , 2.4892188 , 0.        , 1.78486746],
       [0.        , 0.62826899, 0.        , 0.        ],
       [1.23605999, 0.        , 0.72049124, 0.11816467],
       [0.        , 0.        , 0.        , 0.        ]])

Fancy Indexing

In [71]:
arr = np.zeros((8, 4))
for i in range(len(arr)):
    arr[i] = i
arr
Out[71]:
array([[0., 0., 0., 0.],
       [1., 1., 1., 1.],
       [2., 2., 2., 2.],
       [3., 3., 3., 3.],
       [4., 4., 4., 4.],
       [5., 5., 5., 5.],
       [6., 6., 6., 6.],
       [7., 7., 7., 7.]])
In [72]:
# fancy indexing
# passing a list of indices selects the complete rows at those indices, in the given order
arr[[4, 3, 0, 6]]
Out[72]:
array([[4., 4., 4., 4.],
       [3., 3., 3., 3.],
       [0., 0., 0., 0.],
       [6., 6., 6., 6.]])
In [73]:
# negative indices count from the end: -1 is the last row
arr[[-3, -5, -7]]
Out[73]:
array([[5., 5., 5., 5.],
       [3., 3., 3., 3.],
       [1., 1., 1., 1.]])
In [74]:
# reshape is introduced here: it returns the same data arranged in a new shape
arr = np.arange(32).reshape((8, 4))
arr
Out[74]:
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15],
       [16, 17, 18, 19],
       [20, 21, 22, 23],
       [24, 25, 26, 27],
       [28, 29, 30, 31]])
In [75]:
arr
Out[75]:
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15],
       [16, 17, 18, 19],
       [20, 21, 22, 23],
       [24, 25, 26, 27],
       [28, 29, 30, 31]])
In [76]:
# another fancy indexing example
# the (row, column) pairs are taken in order: elements (1, 0), (5, 3), (7, 1) and (2, 2)
arr[[1, 5, 7, 2], [0, 3, 1, 2]]
Out[76]:
array([ 4, 23, 29, 10])

Note: Fancy indexing, unlike slicing, always copies the data into a new array.
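Passing two index lists pairs them up element by element, so arr[[1, 5, 7, 2], [0, 3, 1, 2]] returns four values, not a block. To get the rectangular region formed by those rows and columns, np.ix_ can be used. This sketch shows both and confirms that the result is a copy:

```python
import numpy as np

arr = np.arange(32).reshape((8, 4))
rows = [1, 5, 7, 2]
cols = [0, 3, 1, 2]

pairs = arr[rows, cols]            # elements (1,0), (5,3), (7,1), (2,2)
block = arr[np.ix_(rows, cols)]    # 4 x 4: every listed row x every listed column
print(pairs)
print(block)

block[0, 0] = -1                   # fancy indexing copied the data...
print(arr[1, 0])                   # ...so arr still holds 4
```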

Transposing Arrays and Swapping Axes

In [77]:
arr = np.arange(15).reshape((3, 5))
arr
Out[77]:
array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14]])
In [78]:
# arr.T returns the transpose as a view of the array (no data is copied)
arr.T
Out[78]:
array([[ 0,  5, 10],
       [ 1,  6, 11],
       [ 2,  7, 12],
       [ 3,  8, 13],
       [ 4,  9, 14]])
In [79]:
arr = np.array([[1, 2], [3, 4]])
arr
Out[79]:
array([[1, 2],
       [3, 4]])
In [80]:
# matrix multiplication
arr.dot(arr)
Out[80]:
array([[ 7, 10],
       [15, 22]])
In [81]:
# or, equivalently, use the function form
np.dot(arr, arr)
Out[81]:
array([[ 7, 10],
       [15, 22]])
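Since Python 3.5 the @ operator also performs matrix multiplication on NumPy arrays, matching dot for 2D inputs. Note that * is element-wise, not a matrix product:

```python
import numpy as np

arr = np.array([[1, 2], [3, 4]])

print(arr @ arr)   # matrix product, same as arr.dot(arr) for 2D arrays
print(arr * arr)   # element-wise product: [[1, 4], [9, 16]]
```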
In [82]:
arr = np.random.randn(6, 3)
np.dot(arr.T, arr)
Out[82]:
array([[ 6.01986433, -1.83225009,  1.13710425],
       [-1.83225009,  3.26559867, -1.68900134],
       [ 1.13710425, -1.68900134,  3.46863104]])
In [83]:
arr = np.array([[1, 2, 3], [4, 5, 6]])
arr
Out[83]:
array([[1, 2, 3],
       [4, 5, 6]])
In [84]:
# transpose permutes the axes; axes are numbered 0, 1, ... up to the number of dimensions minus one
# (1, 0) swaps the rows and the columns
arr.transpose(1,0)
Out[84]:
array([[1, 4],
       [2, 5],
       [3, 6]])
In [85]:
arr = np.arange(16).reshape((2, 2, 4))
arr
Out[85]:
array([[[ 0,  1,  2,  3],
        [ 4,  5,  6,  7]],

       [[ 8,  9, 10, 11],
        [12, 13, 14, 15]]])
In [86]:
# (1, 0, 2) swaps the first two axes and keeps the last one in place
# to see what happens, think of element A[i, j, k] moving to position [j, i, k]
arr.transpose(1, 0, 2)
Out[86]:
array([[[ 0,  1,  2,  3],
        [ 8,  9, 10, 11]],

       [[ 4,  5,  6,  7],
        [12, 13, 14, 15]]])
In [87]:
# swapaxes is similar to transpose, but takes just the pair of axes to swap
arr
Out[87]:
array([[[ 0,  1,  2,  3],
        [ 4,  5,  6,  7]],

       [[ 8,  9, 10, 11],
        [12, 13, 14, 15]]])
In [88]:
arr.swapaxes(1,2)
Out[88]:
array([[[ 0,  4],
        [ 1,  5],
        [ 2,  6],
        [ 3,  7]],

       [[ 8, 12],
        [ 9, 13],
        [10, 14],
        [11, 15]]])
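A quick way to verify the index mapping described in the comments above is to compare shapes and individual elements. This sketch also checks that both transpose and swapaxes return views:

```python
import numpy as np

arr = np.arange(16).reshape((2, 2, 4))
t = arr.transpose(1, 0, 2)
s = arr.swapaxes(1, 2)

print(t.shape, s.shape)            # (2, 2, 4) (2, 4, 2)

# transpose(1, 0, 2) moves element [i, j, k] to position [j, i, k]
i, j, k = 0, 1, 3
print(arr[i, j, k] == t[j, i, k])  # True
# swapaxes(1, 2) moves element [i, j, k] to position [i, k, j]
print(arr[i, j, k] == s[i, k, j])  # True

# both results are views, so no data is copied
print(np.shares_memory(arr, t), np.shares_memory(arr, s))
```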

Universal Functions: Fast Element-wise Array Functions

A universal function, or ufunc, is a function that performs elementwise operations on data in ndarrays. You can think of them as fast vectorized wrappers for simple functions that take one or more scalar values and produce one or more scalar results.

In [89]:
arr = np.arange(10)
# unary universal function of sqrt
np.sqrt(arr)
Out[89]:
array([0.        , 1.        , 1.41421356, 1.73205081, 2.        ,
       2.23606798, 2.44948974, 2.64575131, 2.82842712, 3.        ])
In [90]:
# unary universal function: exponential (e raised to each element)
np.exp(arr)
Out[90]:
array([1.00000000e+00, 2.71828183e+00, 7.38905610e+00, 2.00855369e+01,
       5.45981500e+01, 1.48413159e+02, 4.03428793e+02, 1.09663316e+03,
       2.98095799e+03, 8.10308393e+03])
In [91]:
x = np.random.randn(8)
y = np.random.randn(8)
x
Out[91]:
array([-0.03736112, -0.60744449, -0.33559193, -0.51274497, -0.61488508,
       -0.16981259, -0.28471117, -0.46250942])
In [92]:
y
Out[92]:
array([-0.19247363,  2.00505277,  0.23527698, -0.90768257, -0.16235317,
       -0.08872467, -0.40754121, -0.24138504])
In [93]:
# binary universal function of maximum (compares element by element in order)
np.maximum(x, y)
Out[93]:
array([-0.03736112,  2.00505277,  0.23527698, -0.51274497, -0.16235317,
       -0.08872467, -0.28471117, -0.24138504])
In [94]:
arr = np.random.randn(8)
# modf returns a tuple of two arrays: the fractional parts and the integral parts
np.modf(arr)
Out[94]:
(array([ 0.85664649,  0.59811592, -0.93778945, -0.23931799, -0.64456316,
        -0.38911459,  0.46648843,  0.62282601]),
 array([ 0.,  0., -0., -1., -0., -0.,  0.,  0.]))

Some unary ufuncs (please refer to the NumPy documentation for an explanation of each)

abs, fabs
sqrt
square
exp
log, log10, log2, log1p
sign
ceil
floor
rint
modf
isnan
isfinite, isinf
cos, cosh, sin, sinh
tan, tanh
arccos, arccosh, arcsin
arcsinh, arctan, arctanh
logical_not

Some binary ufuncs (please refer to the NumPy documentation for an explanation of each)

add
subtract
multiply
divide, floor_divide
power
maximum, fmax
minimum, fmin
mod
copysign
greater, greater_equal
less, less_equal, equal
not_equal
logical_and
logical_or
logical_xor

Data Processing Using Arrays

Using NumPy arrays enables you to express many kinds of data processing tasks as concise array expressions that might otherwise require writing loops. This practice of replacing explicit loops with array expressions is commonly referred to as vectorization. In general, vectorized array operations will often be one or two (or more) orders of magnitude faster than their pure Python equivalents.
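A minimal sketch of vectorization, using two hypothetical helper functions: the same sum of squares written as an explicit Python loop and as one array expression. Timings depend on the machine, so the sketch only checks that both agree; the timing run under the main guard lets you see the gap yourself.

```python
import timeit

import numpy as np

values = np.arange(100_000, dtype=np.float64)

def loop_sum_of_squares(a):
    # explicit Python loop: one interpreted iteration per element
    total = 0.0
    for v in a:
        total += v * v
    return total

def vectorized_sum_of_squares(a):
    # the same computation as a single array expression
    return float(np.sum(a * a))

if __name__ == "__main__":
    print(loop_sum_of_squares(values) == vectorized_sum_of_squares(values))
    loop_time = timeit.timeit(lambda: loop_sum_of_squares(values), number=3)
    vec_time = timeit.timeit(lambda: vectorized_sum_of_squares(values), number=3)
    print(f"loop: {loop_time:.4f}s  vectorized: {vec_time:.4f}s")
```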

In [95]:
# let's say you want to evaluate the function sqrt(x^2 + y^2) across a regular grid of values
# np.meshgrid takes two 1D arrays and produces two 2D coordinate matrices; see the following example
points = np.arange(0, 10, 2)
points
Out[95]:
array([0, 2, 4, 6, 8])
In [96]:
xs, ys = np.meshgrid(points, points)
xs
Out[96]:
array([[0, 2, 4, 6, 8],
       [0, 2, 4, 6, 8],
       [0, 2, 4, 6, 8],
       [0, 2, 4, 6, 8],
       [0, 2, 4, 6, 8]])
In [97]:
ys
Out[97]:
array([[0, 0, 0, 0, 0],
       [2, 2, 2, 2, 2],
       [4, 4, 4, 4, 4],
       [6, 6, 6, 6, 6],
       [8, 8, 8, 8, 8]])
In [98]:
z = np.sqrt(xs ** 2 + ys ** 2)
z
Out[98]:
array([[ 0.        ,  2.        ,  4.        ,  6.        ,  8.        ],
       [ 2.        ,  2.82842712,  4.47213595,  6.32455532,  8.24621125],
       [ 4.        ,  4.47213595,  5.65685425,  7.21110255,  8.94427191],
       [ 6.        ,  6.32455532,  7.21110255,  8.48528137, 10.        ],
       [ 8.        ,  8.24621125,  8.94427191, 10.        , 11.3137085 ]])
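np.meshgrid builds the full xs and ys matrices in memory. The same grid of values can also be computed by broadcasting a row vector against a column vector, or with np.hypot, which computes sqrt(x**2 + y**2) directly. This sketch checks that all three agree:

```python
import numpy as np

points = np.arange(0, 10, 2)
xs, ys = np.meshgrid(points, points)
z_mesh = np.sqrt(xs ** 2 + ys ** 2)

# a (1, 5) row combined with a (5, 1) column broadcasts to a (5, 5) grid
row = points[np.newaxis, :]   # plays the role of xs
col = points[:, np.newaxis]   # plays the role of ys
z_broadcast = np.sqrt(row ** 2 + col ** 2)

print(np.allclose(z_mesh, z_broadcast))        # True
print(np.allclose(z_mesh, np.hypot(xs, ys)))   # True
```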

Expressing Conditional Logic as Array Operations

The numpy.where function is a vectorized version of the ternary expression x if condition else y

In [99]:
xarr = np.array([1.1, 1.2, 1.3, 1.4, 1.5])
yarr = np.array([2.1, 2.2, 2.3, 2.4, 2.5])

cond = np.array([True, False, True, True, False])
# zip() is a built-in Python function that returns an iterator aggregating elements from each of the iterables
result = [(x if c else y) for x, y, c in zip(xarr, yarr, cond)]
result
Out[99]:
[1.1, 2.2, 1.3, 1.4, 2.5]

This has multiple problems:
First, it will not be very fast for large arrays, since all the work is done in pure Python.
Second, it will not work with multidimensional arrays.
With np.where you can write:

In [100]:
result = np.where(cond, xarr, yarr)
result
Out[100]:
array([1.1, 2.2, 1.3, 1.4, 2.5])
In [101]:
# the second and third arguments to np.where don't have to be arrays; one or both can be scalars
arr = np.random.randn(4,4)
arr
Out[101]:
array([[ 1.05846636, -2.18934139,  0.69033616, -0.42188738],
       [ 1.6349166 , -0.79310744,  0.37484735, -1.69703955],
       [ 1.06596908, -0.43937802,  0.53081635, -1.62868329],
       [ 0.38555556,  0.10910263, -0.94933816, -0.98044428]])
In [102]:
# replace all positive values with 2 and all other values (negative or zero) with -2
np.where(arr > 0, 2, -2)
Out[102]:
array([[ 2, -2,  2, -2],
       [ 2, -2,  2, -2],
       [ 2, -2,  2, -2],
       [ 2,  2, -2, -2]])
In [103]:
# or setting only positive values to 2
np.where(arr > 0, 2, arr)
Out[103]:
array([[ 2.        , -2.18934139,  2.        , -0.42188738],
       [ 2.        , -0.79310744,  2.        , -1.69703955],
       [ 2.        , -0.43937802,  2.        , -1.62868329],
       [ 2.        ,  2.        , -0.94933816, -0.98044428]])
In [104]:
'''
Consider the following example, where we have two boolean arrays, cond1 and cond2, and wish to assign
a different value for each of the 4 possible pairs of boolean values.
Pure Python:
'''
cond1 = np.array([True, True, False, False])
cond2 = np.array([True, False, True, False])

result = []
for i in range(len(cond1)):
    if cond1[i] and cond2[i]:
        result.append(0)
    elif cond1[i]:
        result.append(1)
    elif cond2[i]:
        result.append(2)
    else:
        result.append(3)
result       
Out[104]:
[0, 1, 2, 3]
In [105]:
# the same logic with nested np.where calls
np.where(cond1 & cond2, 0, np.where(cond1, 1, np.where(cond2, 2, 3)))
Out[105]:
array([0, 1, 2, 3])
In [106]:
# booleans behave as 0 (False) and 1 (True) in arithmetic,
# so we can rewrite the previous code as:
result = 1 * (cond1 & ~cond2) + 2 * (~cond1 & cond2) + 3 * (~cond1 & ~cond2)
result
Out[106]:
array([0, 1, 2, 3])
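For more than two outcomes, np.select can be easier to read than nested np.where calls: it checks a list of conditions in order and uses the value paired with the first one that is True, falling back to default. A sketch of the same four-way assignment:

```python
import numpy as np

cond1 = np.array([True, True, False, False])
cond2 = np.array([True, False, True, False])

conditions = [cond1 & cond2, cond1, cond2]   # checked in this order
choices = [0, 1, 2]                          # value for each condition
result = np.select(conditions, choices, default=3)
print(result)   # [0 1 2 3]
```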

Mathematical and Statistical Methods

In [107]:
arr = np.random.randn(5, 4)
arr
Out[107]:
array([[-0.18252157, -0.63149364,  0.66817973,  0.65735378],
       [ 0.76601508,  0.2255208 ,  2.22401099,  0.04788006],
       [-0.42680459,  0.53485304,  0.37781218, -0.01701542],
       [ 0.61577001, -1.44680833, -1.02141823, -0.76900976],
       [ 0.87232208,  1.19893413, -1.31132677, -0.49392435]])
In [108]:
arr.mean()
Out[108]:
0.09441646134989254
In [109]:
arr.sum()
Out[109]:
1.8883292269978509
In [110]:
arr.std()
Out[110]:
0.8831429955637358
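By default arr.std() (and arr.var()) divide by n, giving the population statistic. Passing ddof=1 divides by n - 1 instead, which gives the sample standard deviation. A small sketch with hand-checkable numbers:

```python
import numpy as np

data = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])  # mean 5, squared deviations sum to 32

population_std = data.std()      # sqrt(32 / 8), ddof=0 is the default
sample_std = data.std(ddof=1)    # sqrt(32 / 7), about 2.138
print(population_std)            # 2.0
print(sample_std)
```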
In [111]:
arr = np.array([[1, 2, 3],[4, 5, 6], [7, 8, 9]])
arr
Out[111]:
array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])
In [112]:
# axis=0 aggregates down the rows (one result per column); axis=1 aggregates across the columns (one result per row)
arr.mean(0)
Out[112]:
array([4., 5., 6.])
In [113]:
arr.mean(axis = 1)
Out[113]:
array([2., 5., 8.])
In [114]:
arr.sum(axis = 0)
Out[114]:
array([12, 15, 18])
In [115]:
# cumulative sum down each column (axis=0): each row holds the running total up to and including that row
arr.cumsum(axis = 0)
Out[115]:
array([[ 1,  2,  3],
       [ 5,  7,  9],
       [12, 15, 18]], dtype=int32)

In [116]:
# cumulative product across each row (axis=1)
arr.cumprod(axis = 1)
Out[116]:
array([[  1,   2,   6],
       [  4,  20, 120],
       [  7,  56, 504]], dtype=int32)

Basic array statistical methods

sum
mean
std, var
min, max
argmin, argmax (indices of the minimum and maximum elements, respectively; by default, the index is into the flattened array)
cumsum
cumprod

In [117]:
arr.min(axis = 0)
Out[117]:
array([1, 2, 3])
In [118]:
# index of the maximum element in the flattened array
arr.argmax()
Out[118]:
8
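The flat index returned by argmax can be converted back into a (row, column) position with np.unravel_index. Passing axis instead gives one index per column or per row:

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

flat_index = arr.argmax()                            # 8: position in the flattened array
row, col = np.unravel_index(flat_index, arr.shape)
print(flat_index, row, col)                          # 8 2 2

print(arr.argmax(axis=0))   # row of the max in each column: [2 2 2]
print(arr.argmax(axis=1))   # column of the max in each row: [2 2 2]
```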

Methods for Boolean Arrays

Boolean values are coerced to 1 (True) and 0 (False) in arithmetic methods, so sum counts the True values.

In [119]:
arr = np.random.randn(10)
arr
Out[119]:
array([ 0.16555383,  0.59795282,  0.00255714, -0.18289366, -0.59485528,
        1.55983561, -0.83679902,  0.50554529,  0.48152866,  1.74469548])
In [120]:
arr > 0
Out[120]:
array([ True,  True,  True, False, False,  True, False,  True,  True,
        True])
In [121]:
(arr > 0).sum()
Out[121]:
7
In [122]:
# any() returns True if at least one element is True
# (the array is named bools to avoid shadowing Python's built-in bool)
bools = np.array([False, False, True, False])
bools.any()
Out[122]:
True
In [123]:
# all() returns True only if every element is True
bools.all()
Out[123]:
False
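Because True counts as 1, a few other idioms follow from the coercion described above: np.count_nonzero gives the same count as sum, and mean gives the fraction of True values. A short sketch with fixed numbers:

```python
import numpy as np

values = np.array([0.5, -1.2, 3.0, -0.7, 2.2])
positive = values > 0

print(positive.sum())               # 3: number of True values
print(np.count_nonzero(positive))   # 3: same count
print(positive.mean())              # 0.6: fraction of True values
```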

Sorting

In [124]:
arr = np.random.randn(10)
arr
Out[124]:
array([-1.25449998,  0.65584279, -0.44780096,  0.2871527 , -0.18042682,
        0.78583569,  0.49849835, -0.12863463, -0.23158023, -0.54440885])
In [125]:
# the ndarray.sort() method sorts in place
arr.sort()
arr
Out[125]:
array([-1.25449998, -0.54440885, -0.44780096, -0.23158023, -0.18042682,
       -0.12863463,  0.2871527 ,  0.49849835,  0.65584279,  0.78583569])
In [126]:
arr = np.random.randn(3, 4)
arr
Out[126]:
array([[ 0.48463237, -0.08756859, -0.33046087,  0.85640524],
       [ 1.01822106, -0.39207202, -0.24327351, -0.49075468],
       [-2.21382404,  1.75957395, -0.47186965,  2.27107916]])
In [127]:
arr.sort(axis = 0)
arr
Out[127]:
array([[-2.21382404, -0.39207202, -0.47186965, -0.49075468],
       [ 0.48463237, -0.08756859, -0.33046087,  0.85640524],
       [ 1.01822106,  1.75957395, -0.24327351,  2.27107916]])
In [128]:
arr.sort(axis = 1)
arr
Out[128]:
array([[-2.21382404, -0.49075468, -0.47186965, -0.39207202],
       [-0.33046087, -0.08756859,  0.48463237,  0.85640524],
       [-0.24327351,  1.01822106,  1.75957395,  2.27107916]])
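The ndarray.sort() method used above modifies the array in place and returns None. The function np.sort returns a sorted copy instead, and np.argsort returns the indices that would sort the array. A sketch:

```python
import numpy as np

arr = np.array([3, 1, 2])

sorted_copy = np.sort(arr)   # new sorted array; arr is unchanged
print(arr, sorted_copy)      # [3 1 2] [1 2 3]

order = np.argsort(arr)      # indices that would sort arr
print(order)                 # [1 2 0]

arr.sort()                   # in place, returns None
print(arr)                   # [1 2 3]
print(sorted_copy[::-1])     # descending order by reversing: [3 2 1]
```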
In [129]:
# finding the 5% quantile by indexing into the sorted array
large_array = np.random.randn(1000)
large_array.sort()
large_array[int(0.05 * len(large_array))]
Out[129]:
-1.7346778251928088
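NumPy also provides np.quantile (and np.percentile), assuming NumPy 1.15 or later. By default it interpolates linearly between neighbouring values, so it can differ slightly from the index-based approach above. A sketch on deterministic data:

```python
import numpy as np

values = np.arange(1000, dtype=np.float64)    # already sorted: 0, 1, ..., 999

manual = values[int(0.05 * len(values))]     # the index-based approach used above
interpolated = np.quantile(values, 0.05)     # linear interpolation by default
print(manual, interpolated)                  # 50.0 and about 49.95
print(np.isclose(np.percentile(values, 5), interpolated))   # True: same quantile in percent
```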

Unique and Other Set Logic

In [130]:
names = np.array(['Bob', 'Joe', 'Will', 'Bob', 'Will', 'Joe', 'Joe'])
np.unique(names)
Out[130]:
array(['Bob', 'Joe', 'Will'], dtype='<U4')
In [131]:
ints = np.array([3, 3, 3, 2, 2, 1, 1, 4, 4])
np.unique(ints)
Out[131]:
array([1, 2, 3, 4])
In [132]:
# pure Python alternative: a set removes the duplicates and sorted() orders them
sorted(set(names))
Out[132]:
['Bob', 'Joe', 'Will']
In [133]:
# compute a boolean array indicating whether each element of values is contained in [2, 3, 6]
# (np.isin supersedes np.in1d, which is deprecated as of NumPy 2.0)
values = np.array([6, 0, 0, 3, 2, 5, 6])
np.isin(values, [2, 3, 6])
Out[133]:
array([ True, False, False,  True,  True, False,  True])
In [134]:
# compute the sorted union of elements
np.union1d(values, [200, 100])
Out[134]:
array([  0,   2,   3,   5,   6, 100, 200])
In [135]:
# compute the sorted, common elements
np.intersect1d(values, [3, 2])
Out[135]:
array([2, 3])
In [136]:
# set difference: elements in the first array but not in the second
np.setdiff1d(values, [0, 6, 10])
Out[136]:
array([2, 3, 5])
In [137]:
# set symmetric difference, elements that are in either of the arrays but not both
np.setxor1d(values, [0, 6, 10])
Out[137]:
array([ 2,  3,  5, 10])
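np.unique can also report how often each unique value occurs through return_counts=True, which is handy for quick frequency tables:

```python
import numpy as np

names = np.array(["Bob", "Joe", "Will", "Bob", "Will", "Joe", "Joe"])
uniques, counts = np.unique(names, return_counts=True)
print(uniques)   # ['Bob' 'Joe' 'Will']
print(counts)    # [2 3 2]
```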

Linear Algebra

In [142]:
# example of matrix multiplication
x = np.array([[1, 2, 3], [4, 5, 6]])
y = np.array([[6, 23], [-1, 7], [8, 9]])
np.dot(x, y)
Out[142]:
array([[ 28,  64],
       [ 67, 181]])
In [143]:
# the same
x.dot(y)
Out[143]:
array([[ 28,  64],
       [ 67, 181]])
In [144]:
# numpy.linalg has a standard set of matrix decompositions, plus functions such as inverse and determinant
from numpy.linalg import inv, qr
X = np.random.randn(5,5)
# T for transpose
mat = X.T.dot(X)
# inverse of the matrix
inv(mat)
Out[144]:
array([[ 1.19602384, -0.43339575, -0.7840773 , -1.04325319, -1.35922149],
       [-0.43339575,  0.42768034,  0.43260469,  0.47742356,  0.45555962],
       [-0.7840773 ,  0.43260469,  0.94646622,  0.86541588,  1.11106882],
       [-1.04325319,  0.47742356,  0.86541588,  1.23113086,  1.27966059],
       [-1.35922149,  0.45555962,  1.11106882,  1.27966059,  2.33246229]])
In [145]:
# this should give (approximately) the identity matrix; the tiny off-diagonal values are floating-point round-off
mat.dot(inv(mat))
Out[145]:
array([[ 1.00000000e+00,  1.64524375e-16, -7.08356149e-17,
         4.25098208e-16, -3.40210593e-16],
       [-2.61326446e-16,  1.00000000e+00,  4.36098464e-16,
         2.24063507e-16,  2.99000431e-16],
       [ 2.66672903e-16, -9.98307792e-17,  1.00000000e+00,
        -7.68950680e-16, -5.31347236e-16],
       [ 5.36413813e-16,  1.93103132e-16,  1.28670399e-17,
         1.00000000e+00,  4.09378604e-17],
       [ 1.00034662e-16,  5.19666045e-17, -8.55730682e-17,
         5.24496102e-16,  1.00000000e+00]])

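Commonly used linear algebra functions (diag, dot and trace are top-level numpy functions; the rest live in numpy.linalg)

diag (return the diagonal of a matrix as a 1D array, or turn a 1D array into a diagonal matrix)
dot (matrix multiplication)
trace (sum of the main diagonal)
det (determinant)
eig (eigenvalues and eigenvectors of a square matrix, A v = λ v)
inv (inverse of a square matrix)
qr (QR decomposition)
svd (singular value decomposition)
solve (solve Ax = b for x, where A is a square matrix)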

In [146]:
mat.trace()
Out[146]:
19.43134780284728
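To solve a linear system Ax = b, np.linalg.solve is generally preferred over computing inv(A) and multiplying, since it avoids forming the inverse explicitly. A small sketch with a hand-checkable 2 x 2 system (3x + y = 9, x + 2y = 8):

```python
import numpy as np

A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([9.0, 8.0])

x = np.linalg.solve(A, b)      # solves A @ x = b
print(x)                       # [2. 3.]
print(np.allclose(A @ x, b))   # True
print(np.linalg.det(A))        # 5.0, up to floating-point round-off
```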

Random Number Generation

In [147]:
# np.random supplements Python's built-in random module with functions for efficiently generating whole arrays of samples from many probability distributions
# for example, a 4 by 4 array of samples from the standard normal distribution
samples = np.random.normal(size=(4,4))
samples
Out[147]:
array([[ 0.18101741, -0.0609608 , -1.19150952,  0.2243375 ],
       [ 1.38680011,  2.52575841,  0.22857496, -1.70951622],
       [-0.41593192, -0.55278893,  0.22820945, -0.3270344 ],
       [-1.76895857,  1.68049515, -0.64992314, -0.25926266]])
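The cells in this post use the legacy np.random functions, so their output changes on every run. Assuming NumPy 1.17 or later, np.random.default_rng creates a seeded Generator whose draws are reproducible. A minimal sketch:

```python
import numpy as np

rng = np.random.default_rng(seed=42)     # seeded Generator: reproducible draws
samples = rng.standard_normal((4, 4))    # 4 x 4 standard normal samples

same_rng = np.random.default_rng(seed=42)
print(np.array_equal(samples, same_rng.standard_normal((4, 4))))   # True

ints = rng.integers(0, 10, size=5)       # integers in [0, 10)
print(samples.shape, ints.min() >= 0, ints.max() < 10)
```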

Some numpy.random functions

seed     (seed the random number generator)
permutation     (return a random permutation of a sequence, or a permuted range)
shuffle     (randomly permute a sequence in place)
rand     (draw samples from a uniform distribution over [0, 1))
randint     (draw random integers from a given low-to-high range)
randn     (draw samples from a normal distribution with mean 0 and standard deviation 1)
binomial     (draw samples from a binomial distribution)
normal     (draw samples from a normal (Gaussian) distribution)
beta     (draw samples from a beta distribution)
chisquare     (draw samples from a chi-square distribution)
gamma     (draw samples from a gamma distribution)
uniform     (draw samples from a uniform [low, high) distribution, [0, 1) by default)

List vs Array in Python

Arrays and lists are both used in Python to store data, but they don't serve exactly the same purposes. Both can store any kind of data (real numbers, strings, etc.), and both can be indexed and iterated over, but the similarities don't go much further. The main difference is the operations you can perform on them: arithmetic on a NumPy array is applied element-wise, whereas operators such as + and * on a list concatenate and repeat it.

Another difference is that all elements of a NumPy array share a single data type (dtype), whereas list elements can have different data types.
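A short sketch of the list vs. array differences: the same operators concatenate or repeat lists but compute element-wise on arrays, and mixed numeric input to np.array is upcast to one common dtype.

```python
import numpy as np

py_list = [1, 2, 3]
arr = np.array([1, 2, 3])

print(py_list + py_list)   # [1, 2, 3, 1, 2, 3]: + concatenates lists
print(arr + arr)           # [2 4 6]: + adds element-wise
print(py_list * 2)         # [1, 2, 3, 1, 2, 3]: * repeats lists
print(arr * 2)             # [2 4 6]: * multiplies element-wise

mixed = np.array([1, 2.5, 3])   # ints and floats are upcast to a single dtype
print(mixed.dtype)              # float64
```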