NumPy and pandas Cheat Sheet

Python
Author

Quasar

Published

March 6, 2025

np.arange(start,stop,step)

np.arange(start, stop, step) returns evenly spaced values in the half-open interval [start, stop): start is included, stop is excluded.

import numpy as np

np.arange(0.0, 1.1, 0.1)
array([0. , 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1. ])
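
With a non-integer step, the number of elements comes from a floating-point division, so the endpoint can occasionally be included or excluded unexpectedly. A minimal sketch, assuming the goal is a grid that contains both endpoints: np.linspace takes the number of samples instead of a step.

```python
import numpy as np

# np.linspace takes the number of samples rather than a step
# and includes the stop value by default (endpoint=True).
grid = np.linspace(0.0, 1.0, num=11)

print(grid.shape)          # (11,)
print(grid[0], grid[-1])   # 0.0 1.0
```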

np.zeros(shape)

Returns a new array of the given shape filled with zeros. The default dtype is float64.

np.zeros(shape=(3,3))
array([[0., 0., 0.],
       [0., 0., 0.],
       [0., 0., 0.]])

np.zeros_like(array)

x = np.array([
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9,10,11,12]
])

np.zeros_like(x)
array([[0, 0, 0, 0],
       [0, 0, 0, 0],
       [0, 0, 0, 0]])
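
The output above has no decimal points because np.zeros_like copies the dtype of its argument as well as its shape. A short sketch of the difference; the variable names are illustrative only.

```python
import numpy as np

x = np.array([[1, 2], [3, 4]])            # integer array

z_default = np.zeros(x.shape)             # np.zeros defaults to float64
z_like = np.zeros_like(x)                 # same shape and dtype as x
z_float = np.zeros_like(x, dtype=float)   # the dtype can be overridden

print(z_default.dtype, z_like.dtype == x.dtype, z_float.dtype)
```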

np.ones(shape)

import numpy as np

# The matrix of all ones of size 3 x 3
np.ones(shape=(3,3))
array([[1., 1., 1.],
       [1., 1., 1.],
       [1., 1., 1.]])

np.eye(N_rows,M_cols)

import numpy as np

# Identity matrix of size 3 x 3
np.eye(3,3)
array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]])
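
np.eye is not limited to square identity matrices. The sketch below shows a rectangular result and the k argument, which shifts the diagonal of ones; the variable names are illustrative.

```python
import numpy as np

rect = np.eye(2, 3)            # 2 x 3, ones on the main diagonal
upper = np.eye(3, k=1)         # 3 x 3, ones on the first superdiagonal
ident = np.eye(3, dtype=int)   # integer identity matrix

print(rect)
print(upper)
```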

np.any(array_like, axis, keepdims)

Tests whether any array element along a given axis evaluates to True.

import numpy as np

np.any([[True, False], [True, True]])
np.True_
np.any([[True, False], [True, True]], axis=0)
array([ True,  True])
np.any([[True, False], [True, False]], axis=0)
array([ True, False])
np.any([[True, False], [True, False]], axis=1)
array([ True,  True])
np.any([[True, False], [True, False]], axis=1, keepdims=True)
array([[ True],
       [ True]])

np.all(array_like, axis, keepdims)

Tests whether all array elements along a given axis evaluate to True.

import numpy as np

np.all([[True, False], [True, True]])
np.False_
np.all([[True, False], [True, True]], axis=0)
array([ True, False])
np.all([[True, False], [True, False]], axis=1)
array([False, False])
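
In practice np.any and np.all are usually applied to the boolean result of a comparison rather than to literal booleans. A small sketch with made-up price data (the values are hypothetical):

```python
import numpy as np

# Hypothetical data: two rows of prices, one value missing
prices = np.array([
    [101.25, 103.00, 102.50],
    [100.80, np.nan, 100.10],
])

has_nan = np.any(np.isnan(prices))                  # is there a NaN anywhere?
rows_with_nan = np.any(np.isnan(prices), axis=1)    # one answer per row
all_above_100 = np.all(prices > 100, axis=1)        # NaN > 100 is False

print(has_nan)          # True
print(rows_with_nan)    # [False  True]
print(all_above_100)    # [ True False]
```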

np.tile(array, reps)

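Constructs a new array by repeating the whole input array the number of times given by reps. If reps is a tuple, each entry is the repetition count along the corresponding axis.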

import numpy as np

a = np.array([0, 1, 2])
np.tile(a, 2)
array([0, 1, 2, 0, 1, 2])
np.tile(a, (2, 2))
array([[0, 1, 2, 0, 1, 2],
       [0, 1, 2, 0, 1, 2]])

np.repeat(array, repeats, axis)

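Repeats each element of an array repeats times, so that the copies of an element sit next to each other. With axis given, whole rows or columns are repeated along that axis; without it, the input is flattened first.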

np.repeat(3,4)
array([3, 3, 3, 3])
x = np.array([
    [1, 2],
    [3, 4],
    [5, 6]
])

np.repeat(x, repeats=2,axis=0)
array([[1, 2],
       [1, 2],
       [3, 4],
       [3, 4],
       [5, 6],
       [5, 6]])
np.repeat(x, repeats = 2, axis=1)
array([[1, 1, 2, 2],
       [3, 3, 4, 4],
       [5, 5, 6, 6]])
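
np.tile and np.repeat are easy to confuse. A side-by-side sketch on the same input: np.tile repeats the array as a block, while np.repeat repeats element by element and also accepts one count per element.

```python
import numpy as np

a = np.array([0, 1, 2])

tiled = np.tile(a, 2)                  # whole array, twice
repeated = np.repeat(a, 2)             # each element, twice
per_element = np.repeat(a, [1, 2, 3])  # a separate count for each element

print(tiled)         # [0 1 2 0 1 2]
print(repeated)      # [0 0 1 1 2 2]
print(per_element)   # [0 1 1 2 2 2]
```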

Broadcasting

The term broadcasting describes how NumPy treats arrays with different shapes during arithmetic operations. Subject to certain constraints, the smaller array is broadcast across the larger array, so that they have compatible shapes. Broadcasting provides a means of vectorizing array operations so that looping occurs in C, instead of Python.

For example, let \(\mathbf{x}=[x_0, x_1, \ldots, x_{n-1}]\) be a column vector and let \(k\) be a scalar.

The scalar multiplication \(\mathbf{y} = k \mathbf{x}\) multiplies each element \(x_0, x_1, x_2, \ldots, x_{n-1}\) by \(k\).

We can think of the scalar \(k\) as being stretched during the arithmetic operation into a vector with the same length as \(\mathbf{x}\). The stretching analogy is only conceptual. NumPy is smart enough to use the original scalar value without actually making copies.
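
The scalar case can be sketched as follows, together with a two-array case. NumPy compares shapes from the trailing dimension backwards; two dimensions are compatible when they are equal or when one of them is 1. The array names here are illustrative.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
k = 2.0
y = k * x                            # the scalar is broadcast over x

col = np.arange(3).reshape(3, 1)     # shape (3, 1)
row = np.arange(4).reshape(1, 4)     # shape (1, 4)
table = col + row                    # shape (3, 4), table[i, j] == i + j

print(y)                                           # [2. 4. 6.]
print(table.shape)                                 # (3, 4)
# np.broadcast_shapes gives the result shape without building the arrays
print(np.broadcast_shapes(col.shape, row.shape))   # (3, 4)
```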

np.where(condition, x, y)

Returns an array whose elements are taken from x where condition is True and from y where it is False. condition, x and y are broadcast against each other.

import numpy as np

x = np.arange(10)
x > 5   # this returns a filter mask - an array of booleans
array([False, False, False, False, False, False,  True,  True,  True,
        True])
x[x > 5]
array([6, 7, 8, 9])
np.where(x > 5, x**2, x)
array([ 0,  1,  2,  3,  4,  5, 36, 49, 64, 81])
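
Two further uses, sketched under the same setup: called with only a condition, np.where returns the indices where it holds, and x or y may be scalars. Note that both branches are computed for every element before the selection is made.

```python
import numpy as np

x = np.arange(10)

idx = np.where(x > 5)             # tuple with one index array per dimension
capped = np.where(x > 5, 5, x)    # the scalar 5 is broadcast

print(idx[0])    # [6 7 8 9]
print(capped)    # [0 1 2 3 4 5 5 5 5 5]
```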

pandas.DataFrame(data,columns)

A pandas.DataFrame represents a two-dimensional, size-mutable, potentially heterogeneous table of data with labeled rows and columns.

data can be an ndarray, an iterable, a dict, or another DataFrame.

import pandas as pd
from datetime import date
data = {
    'Date' : [ date(2025,1,31), date(2025,2,1)],
    'Close price' : [ 101.25, 103.00 ]
}

df = pd.DataFrame(data)
df
         Date  Close price
0  2025-01-31       101.25
1  2025-02-01       103.00
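
The columns argument in the heading is not used in the example above. A brief sketch of two common uses, reusing the same sample data: naming the columns of row-oriented input, and selecting and ordering the keys of a dict. The names quotes and reordered are illustrative.

```python
import pandas as pd
from datetime import date

# Row-oriented input: columns supplies the column labels
rows = [
    (date(2025, 1, 31), 101.25),
    (date(2025, 2, 1), 103.00),
]
quotes = pd.DataFrame(rows, columns=['Date', 'Close price'])

# Dict input: columns selects keys and fixes their order
reordered = pd.DataFrame(
    {'Date': [date(2025, 1, 31)], 'Close price': [101.25]},
    columns=['Close price', 'Date'],
)

print(quotes)
print(list(reordered.columns))   # ['Close price', 'Date']
```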

Indexing a DataFrame

# Access a single value for a row/column label pair
df.at[1, 'Close price']
np.float64(103.0)
df.at[1, 'Close price'] = 102.50
# Accessing a group of rows and columns by label(s) or boolean array
df.loc[0]
Date           2025-01-31
Close price        101.25
Name: 0, dtype: object
df = pd.DataFrame({
    'A' : [1, 2, 3, 4, 5, 6],
    'B' : [7, 8, 9, 10, 11, 12],
    'C' : [13, 14, 15, 16, 17, 18]
})
df
   A   B   C
0  1   7  13
1  2   8  14
2  3   9  15
3  4  10  16
4  5  11  17
5  6  12  18
# Accessing a group of rows and columns by label(s) or boolean array
df.loc[0]
A     1
B     7
C    13
Name: 0, dtype: int64
# Integer location based indexing
df.iloc[1:3,1]
1    8
2    9
Name: B, dtype: int64
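
One difference between the two indexers is easy to miss: a label slice with .loc includes its stop label, while a positional slice with .iloc excludes its stop position. A sketch on the same frame:

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3, 4, 5, 6],
    'B': [7, 8, 9, 10, 11, 12],
    'C': [13, 14, 15, 16, 17, 18],
})

by_label = df.loc[1:3, ['A', 'B']]    # labels 1, 2 and 3 (stop included)
by_position = df.iloc[1:3, 0:2]       # positions 1 and 2 (stop excluded)
by_mask = df.loc[df['A'] > 4, 'C']    # boolean mask plus a column label

print(by_label.shape)       # (3, 2)
print(by_position.shape)    # (2, 2)
print(by_mask.tolist())     # [17, 18]
```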

Filtering data

# This produces a filter mask
df['B'] >= 10
0    False
1    False
2    False
3     True
4     True
5     True
Name: B, dtype: bool
df[df['B'] >= 10]
   A   B   C
3  4  10  16
4  5  11  17
5  6  12  18
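
Masks can be combined with & (and), | (or) and ~ (not). Each comparison needs its own parentheses because these operators bind more tightly than >= or ==. A sketch on the same frame:

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3, 4, 5, 6],
    'B': [7, 8, 9, 10, 11, 12],
    'C': [13, 14, 15, 16, 17, 18],
})

both = df[(df['B'] >= 10) & (df['A'] < 6)]      # rows 3 and 4
either = df[(df['A'] == 1) | (df['C'] == 18)]   # rows 0 and 5
negated = df[~(df['B'] >= 10)]                  # rows 0, 1 and 2
members = df[df['A'].isin([2, 4])]              # rows 1 and 3

print(both)
```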

Data transformation

# Square column B row by row (axis=1 passes each row to the function)
df['B'] = df.apply(lambda row: row['B'] ** 2, axis=1)
df
   A    B   C
0  1   49  13
1  2   64  14
2  3   81  15
3  4  100  16
4  5  121  17
5  6  144  18
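
For simple arithmetic like this, the row-wise apply can be replaced by a vectorized expression on the column, which gives the same values and avoids calling a Python function once per row. A sketch, assuming the original unsquared frame; the column name B_squared is made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3, 4, 5, 6],
    'B': [7, 8, 9, 10, 11, 12],
    'C': [13, 14, 15, 16, 17, 18],
})

via_apply = df.apply(lambda row: row['B'] ** 2, axis=1)   # one call per row
vectorized = df['B'] ** 2                                 # whole column at once

# assign returns a new frame and leaves df unchanged
with_square = df.assign(B_squared=df['B'] ** 2)

print(vectorized.tolist())   # [49, 64, 81, 100, 121, 144]
```

apply with axis=1 remains useful when the function needs several columns of a row and has no vectorized equivalent.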