API Documentation¶

This is the documentation of hmmpy. It consists of three general classes for estimation and one class for sampling. They are structured as follows:

hmmpy.base.BaseHiddenMarkov: This class holds all the base methods shared across estimation methods. This includes methods related to an HMM that is already fitted (such as sampling), initialization methods etc.
hmmpy.mle.MLEHMM: Methods related to maximum likelihood estimation using the expectation-maximization (Baum-Welch) algorithm.
hmmpy.jump.JumpHMM: Methods related to jump estimation.
hmmpy.sampler.SampleHMM: Methods for sampling from an HMM with user-specified parameters.

hmmpy.base

class hmmpy.base.BaseHiddenMarkov(n_states: int = 2, init: str = 'random', max_iter: int = 100, tol: float = 1e-06, epochs: int = 1, random_state: int = 42)¶

Parent class for Hidden Markov methods with Gaussian distributions.

Contains methods related to: 1. Initializing HMM parameters using uniform distributions 2. Operations that require a fitted HMM, e.g. prediction, decoding and sampling.

To fit an HMM refer to the respective child classes MLEHMM and JumpHMM.

Parameters

n_states (int, default=2) – Number of hidden states
max_iter (int, default=100) – Maximum number of iterations to perform during expectation-maximization
tol (float, default=1e-6) – Criterion for early stopping
epochs (int, default=1) – Number of independent runs through the fit method. Each run uses new initial parameters, and the epoch with the highest likelihood is chosen.
random_state (int, default = 42) – Set seed. Used to create reproducible results.
init (str, default='random') –
- Set to 'kmeans++' for k-means++ initialization (only supported for JumpHMM).
- Set to 'random' for random initialization.
- Set to 'deterministic' for deterministic initialization.

is_fitted¶

Whether the model has been successfully fitted or not.

Type: bool

mu¶

Fitted means for each state

Type: ndarray of shape (n_states,)

std¶

Fitted std for each state

Type: ndarray of shape (n_states,)

tpm¶

Transition probability matrix between states

Type: ndarray of shape (n_states, n_states)

start_proba¶

Initial state distribution

Type: ndarray of shape (n_states,)

stationary_dist¶

Stationary distribution - requires model to be fitted.

Type: ndarray of shape (n_states,)

_get_stationary_dist(tpm)¶

Outputs the stationary distribution of the fitted model.

The stationary distribution is the left eigenvector of the transition probability matrix associated with its largest eigenvalue. Because the TPM is row-stochastic (non-negative entries, each row summing to 1), its largest eigenvalue is 1, and the corresponding eigenvector can be chosen real and non-negative.

It is computed by taking the eigenvector corresponding to the largest eigenvalue and scaling it so that its entries sum to 1.

Parameters: tpm (ndarray of shape (n_states, n_states)) – Transition probability matrix
Returns: stationary_dist
Return type: ndarray of shape (n_states,)
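
The eigenvector computation described above can be sketched with NumPy as follows. This is a minimal illustration, not the hmmpy implementation; the function name stationary_distribution and the example matrix are assumptions, and the TPM is assumed to be row-stochastic.

```python
import numpy as np

def stationary_distribution(tpm: np.ndarray) -> np.ndarray:
    """Stationary distribution of a row-stochastic transition matrix."""
    # Left eigenvectors of tpm are right eigenvectors of its transpose.
    eigvals, eigvecs = np.linalg.eig(tpm.T)
    # Select the eigenvector belonging to the largest eigenvalue (which is 1).
    idx = np.argmax(eigvals.real)
    vec = eigvecs[:, idx].real
    # Scale so the entries sum to 1; this also removes any sign ambiguity.
    return vec / vec.sum()

# Hypothetical two-state transition matrix.
tpm = np.array([[0.9, 0.1],
                [0.2, 0.8]])
pi = stationary_distribution(tpm)
```

For this example the result satisfies pi @ tpm == pi up to floating-point error, which is the defining property of a stationary distribution.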

_init_params(X=None, diag_uniform_dist=(0.95, 0.99), output_hmm_params=True)¶

Initializes HMM parameters randomly using uniform distributions.

Parameters: diag_uniform_dist (1D-array) – The lower and upper bounds of the uniform distribution from which initial values are sampled.

_log_backward_proba()¶

Compute the log of backward probabilities $\log\beta_t$.

Backward probabilities are the conditional probabilities of observing the future observations $x_{t+1},\dots,x_T$ given the current state, i.e. $P(x_{t+1},\dots,x_T \mid s_t = i)$.

Returns: log_betas – Array containing the log of backward probabilities $\log\beta_t$ at each time step.
Return type: ndarray of shape (n_samples, n_states)
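
A sketch of the backward recursion in log space is shown below. It is not the hmmpy implementation: the documented method takes no arguments and presumably reads the model state from the instance, whereas this standalone function (log_backward, a hypothetical name) takes the log emission densities and the TPM explicitly.

```python
import numpy as np

def log_backward(log_emis: np.ndarray, tpm: np.ndarray) -> np.ndarray:
    """Log backward probabilities for log emission densities of shape (T, N)."""
    n_samples, n_states = log_emis.shape
    log_tpm = np.log(tpm)
    # log beta_T(i) = log 1 = 0 for every state i.
    log_betas = np.zeros((n_samples, n_states))
    for t in range(n_samples - 2, -1, -1):
        # beta_t(i) = sum_j tpm[i, j] * p(x_{t+1} | j) * beta_{t+1}(j)
        terms = log_tpm + log_emis[t + 1] + log_betas[t + 1]
        log_betas[t] = np.logaddexp.reduce(terms, axis=1)
    return log_betas

# Hypothetical two-state model with three observations.
tpm = np.array([[0.9, 0.1], [0.2, 0.8]])
emis = np.array([[0.8, 0.1], [0.7, 0.2], [0.1, 0.9]])  # p(x_t | s_t = j)
log_betas = log_backward(np.log(emis), tpm)
```

Working in log space avoids the numerical underflow that the plain recursion suffers from on long sequences.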

_log_forward_proba()¶

Compute log forward probabilities $\log\alpha_t$.

Forward probabilities are the joint probability $P(s_t=i , x_1,...x_t)$.

Returns

llk (float) – Log-likelihood of the data under the given HMM parameters
log_alphas (ndarray of shape (n_samples, n_states)) – Array containing the log of forward probabilities $\log\alpha_t$ at each time step.
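
The forward recursion can be sketched in log space as follows. As with any standalone example, this is an illustration under assumptions rather than the hmmpy code: log_forward is a hypothetical name, and the emission densities, TPM and initial distribution are passed explicitly instead of being read from a fitted model.

```python
import numpy as np

def log_forward(log_emis, tpm, start_proba):
    """Return (llk, log_alphas) for log emission densities of shape (T, N)."""
    n_samples, n_states = log_emis.shape
    log_tpm = np.log(tpm)
    log_alphas = np.empty((n_samples, n_states))
    log_alphas[0] = np.log(start_proba) + log_emis[0]
    for t in range(1, n_samples):
        # alpha_t(j) = p(x_t | j) * sum_i alpha_{t-1}(i) * tpm[i, j]
        prev = log_alphas[t - 1][:, None] + log_tpm
        log_alphas[t] = np.logaddexp.reduce(prev, axis=0) + log_emis[t]
    # The log-likelihood is the log-sum-exp of the final forward probabilities.
    llk = np.logaddexp.reduce(log_alphas[-1])
    return llk, log_alphas

# Hypothetical two-state model with three observations.
tpm = np.array([[0.9, 0.1], [0.2, 0.8]])
start = np.array([0.5, 0.5])
emis = np.array([[0.8, 0.1], [0.7, 0.2], [0.1, 0.9]])  # p(x_t | s_t = j)
llk, log_alphas = log_forward(np.log(emis), tpm, start)
```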

bac_score(X, y_true, verbose=False)¶

Computes the balanced accuracy score when the true state sequence is known.

Parameters

X (ndarray of shape (n_samples,)) – Time series of data
y_true (ndarray of shape (n_samples,)) – True states.
verbose (bool, default=False) – Verbose output.

Returns

bac – Balanced accuracy score

Return type

float
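
Balanced accuracy is the mean of the per-state recalls, which makes it robust to states that occur with very different frequencies. The sketch below computes it directly from a true and a predicted state sequence; the function name and the example sequences are assumptions, and how hmmpy obtains the predicted sequence from X is not shown here.

```python
import numpy as np

def balanced_accuracy(y_true, y_pred) -> float:
    """Mean recall over the states that appear in y_true."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    # Recall for state c: share of time steps truly in c that were predicted as c.
    recalls = [np.mean(y_pred[y_true == c] == c) for c in np.unique(y_true)]
    return float(np.mean(recalls))

# Hypothetical true and decoded state sequences.
y_true = [0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 1, 1, 0]
bac = balanced_accuracy(y_true, y_pred)  # recalls are 3/4 and 1/2
```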

decode(X)¶

Output the most likely sequence of states given an observation sequence, using the Viterbi algorithm.

Parameters: X (ndarray of shape (n_samples,)) – Time series of data
Returns: state_preds – Predicted sequence of states with the same length as the input time series.
Return type: ndarray of shape (n_samples,)
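
For readers unfamiliar with Viterbi decoding, a compact log-space sketch is given below. It is an illustration only: the function viterbi and all parameter values are hypothetical, and the Gaussian emission densities are computed inline rather than through the class.

```python
import numpy as np

def viterbi(log_emis, tpm, start_proba):
    """Most likely state path for log emission densities of shape (T, N)."""
    n_samples, n_states = log_emis.shape
    log_tpm = np.log(tpm)
    delta = np.empty((n_samples, n_states))         # best log score ending in each state
    psi = np.zeros((n_samples, n_states), dtype=int)  # back-pointers
    delta[0] = np.log(start_proba) + log_emis[0]
    for t in range(1, n_samples):
        scores = delta[t - 1][:, None] + log_tpm    # scores[i, j]: come from i, go to j
        psi[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) + log_emis[t]
    # Backtrack from the best final state.
    states = np.empty(n_samples, dtype=int)
    states[-1] = delta[-1].argmax()
    for t in range(n_samples - 2, -1, -1):
        states[t] = psi[t + 1, states[t + 1]]
    return states

# Hypothetical model: two well-separated Gaussian states with sticky transitions.
mu, std = np.array([-1.0, 1.0]), np.array([0.5, 0.5])
tpm = np.array([[0.9, 0.1], [0.1, 0.9]])
X = np.array([-1.1, -0.9, -1.0, 1.2, 0.8, 1.0])
log_emis = -0.5 * np.log(2 * np.pi) - np.log(std) - 0.5 * ((X[:, None] - mu) / std) ** 2
state_preds = viterbi(log_emis, tpm, np.array([0.5, 0.5]))
```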

emission_probs(X)¶

Computes the emission probabilities of an observation sequence X under each state's distribution. Returns T × N matrices, where T = n_samples and N = n_states.

Parameters

X (ndarray of shape (n_samples,)) – Time series of data

Returns

probs (ndarray of shape (n_samples, n_states)) – Probability for sampling from a particular state distribution
log_probs (ndarray of shape (n_samples, n_states)) – Log probability for sampling from a particular state distribution
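
Since the state distributions are Gaussian, the T × N matrices can be sketched with plain NumPy broadcasting. The function below is a hypothetical standalone version that takes mu and std as arguments; the class method presumably uses the fitted attributes instead.

```python
import numpy as np

def gaussian_emission_probs(X, mu, std):
    """Gaussian density of every observation under every state, shape (T, N)."""
    X = np.asarray(X, dtype=float)[:, None]      # (T, 1)
    mu = np.asarray(mu, dtype=float)[None, :]    # (1, N)
    std = np.asarray(std, dtype=float)[None, :]  # (1, N)
    # Log of the normal density, evaluated for all (t, state) pairs at once.
    log_probs = -0.5 * np.log(2 * np.pi) - np.log(std) - 0.5 * ((X - mu) / std) ** 2
    return np.exp(log_probs), log_probs

# Hypothetical parameters for a two-state model.
probs, log_probs = gaussian_emission_probs([0.0, 1.0, -2.0], mu=[0.0, 1.0], std=[1.0, 2.0])
```

Computing the log density first and exponentiating afterwards keeps log_probs finite even when the density itself underflows to zero.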

fit(X, get_hmm_params=True, sort_state_seq=True, verbose=False, feature_set='feature_set_2')¶

Fit model to data. Defined in the respective child classes.

fit_predict(X, n_preds=15, verbose=False)¶

Fit model, then decode states and make n predictions.

Wraps .fit(), .decode() and .predict_proba() into one method.

Parameters

X (ndarray of shape (n_samples,)) – Data to be fitted.
n_preds (int, default=15) – Number of time steps to look forward from current time

Returns

state_preds (ndarray of shape (n_samples,)) – Predicted sequence of states with the same length as the input time series.
posteriors (ndarray of shape (n_preds, n_samples)) – posterior distribution between states at terminal time step T.

predict_proba(n_preds=15)¶

Compute the probability $P(S_{T+h} = i \mid X^T = x^T)$, i.e. the probability of being in state i at the future time step T+h given a specific observation sequence up until time T.

Parameters: n_preds (int, default=15) – Number of time steps to look forward from current time
Returns: state_preds – The probability of being in state i at time T+h
Return type: ndarray of shape (n_states, n_preds)
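
The h-step-ahead state probabilities follow from propagating the posterior at time T through the TPM h times. The sketch below assumes that this posterior is already available (for example from rolling_posteriors); the function name and inputs are hypothetical, and the output layout mirrors the documented (n_states, n_preds) shape.

```python
import numpy as np

def predict_state_proba(posterior_T, tpm, n_preds=15):
    """P(S_{T+h} = i | x_1..x_T) for h = 1..n_preds, shape (n_states, n_preds)."""
    tpm = np.asarray(tpm, dtype=float)
    current = np.asarray(posterior_T, dtype=float)
    preds = np.empty((tpm.shape[0], n_preds))
    for h in range(n_preds):
        # One more step ahead: multiply the state distribution by the TPM.
        current = current @ tpm
        preds[:, h] = current
    return preds

# Hypothetical posterior at time T and transition matrix.
tpm = np.array([[0.9, 0.1], [0.2, 0.8]])
preds = predict_state_proba([1.0, 0.0], tpm, n_preds=15)
```

As h grows, the columns approach the stationary distribution of the TPM, so long-horizon forecasts carry little information about the current state.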

rolling_posteriors(X)¶

Compute the posterior probability of being in state i at time T.

Function should be used as part of rolling estimation when one is interested only in the final smoothing probability of the current window sample.

Parameters: X (ndarray of shape (n_samples,)) – Time series of data
Returns: posterior – posterior distribution between states at terminal time step T.
Return type: ndarray of shape (n_states,)

sample(n_samples, n_sequences=1, hmm_params=None)¶

Sample states from a fitted Hidden Markov Model.

See also hmmpy.sampler.SampleHMM for full class supporting more sampling methods.

Parameters

n_samples (int) – Number of samples to generate
n_sequences (int, default=1) – Number of independent sequences to sample, e.g. if n_samples=100 and n_sequences=3 then 3 different sequences of length 100 are sampled
hmm_params (dict, default=None) – HMM parameters to sample from. If None and the model is fitted, the fitted parameters are used. To set parameters manually, create a dict with ‘mu’, ‘std’, ‘tpm’ and ‘stationary distribution’ as keys and ndarrays as values.

Returns

samples (ndarray of shape (n_samples,)) – Outputs the generated samples
sample_states (ndarray of shape (n_samples, n_sequences)) – Outputs sampled states
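
Sampling from a Gaussian HMM amounts to simulating the Markov chain of states and then drawing each observation from the Gaussian of the state it is in. The sketch below shows this for a single sequence; the function name, the seed handling and the choice of initial distribution are assumptions rather than the hmmpy implementation.

```python
import numpy as np

def sample_gaussian_hmm(n_samples, mu, std, tpm, start_proba, seed=42):
    """Draw one sequence of observations and hidden states from a Gaussian HMM."""
    rng = np.random.default_rng(seed)
    mu, std, tpm = np.asarray(mu, float), np.asarray(std, float), np.asarray(tpm, float)
    n_states = len(mu)
    states = np.empty(n_samples, dtype=int)
    # The first state comes from the initial distribution, later ones from the TPM row.
    states[0] = rng.choice(n_states, p=start_proba)
    for t in range(1, n_samples):
        states[t] = rng.choice(n_states, p=tpm[states[t - 1]])
    # Each observation is Gaussian with the mean and std of its state.
    samples = rng.normal(loc=mu[states], scale=std[states])
    return samples, states

# Hypothetical parameters: a calm state and a volatile state.
samples, sample_states = sample_gaussian_hmm(
    200, mu=[0.001, -0.002], std=[0.01, 0.03],
    tpm=[[0.98, 0.02], [0.05, 0.95]], start_proba=[0.7, 0.3],
)
```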

hmmpy.mle¶

class hmmpy.mle.MLEHMM(n_states: int = 2, init: str = 'random', max_iter: int = 100, tol: float = 1e-06, epochs: int = 10, random_state: int = 42)¶

Class for training HMMs using the EM (Baum-Welch) algorithm.

Parameters

n_states (int, default=2) – Number of hidden states
max_iter (int, default=100) – Maximum number of iterations to perform during expectation-maximization
tol (float, default=1e-6) – Criterion for early stopping
epochs (int, default=10) – Number of independent runs through the fit method. Each run uses new initial parameters, and the epoch with the highest likelihood is chosen.
random_state (int, default = 42) – Set seed. Used to create reproducible results.
init (str, default='random') –
- Set to 'kmeans++' for k-means++ initialization (only supported for JumpHMM).
- Set to 'random' for random initialization.
- Set to 'deterministic' for deterministic initialization.

is_fitted¶

Whether the model has been successfully fitted or not.

Type: bool

mu¶

Fitted means for each state

Type: ndarray of shape (n_states,)

std¶

Fitted std for each state

Type: ndarray of shape (n_states,)

tpm¶

Matrix of transition probabilities between states

Type: ndarray of shape (n_states, n_states)

start_proba¶

Initial state occupation distribution

Type: ndarray of shape (n_states,)

stationary_dist¶

Stationary distribution - requires model to be fitted.

Type: ndarray of shape (n_states,)

compute_log_posteriors(log_alphas, log_betas)¶

Expectation of being in state j at time t given observations, $P(s_t = j | x_1,...,x_T)$.

Parameters

log_alphas (ndarray of shape (n_samples, n_states)) – Array containing the log of forward probabilities $\log\alpha_t$ at each time step.
log_betas (ndarray of shape (n_samples, n_states)) – Array containing the log of backward probabilities $\log\beta_t$ at each time step.

Returns

log_gamma – Expectation of being in state j at time t given observations, $P(s_t = j | x_1,...,x_T)$

Return type

ndarray of shape (n_samples, n_states)
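
In log space the posterior is the sum of the log forward and log backward probabilities minus the log-likelihood. The sketch below illustrates this on a tiny hypothetical model; the helper name log_posteriors and the unscaled forward and backward passes used to build the inputs are assumptions made to keep the example self-contained.

```python
import numpy as np

def log_posteriors(log_alphas, log_betas):
    """log gamma_t(j) = log alpha_t(j) + log beta_t(j) - log-likelihood."""
    llk = np.logaddexp.reduce(log_alphas[-1])  # log P(x_1, ..., x_T)
    return log_alphas + log_betas - llk

# Hypothetical two-state model with three observations.
tpm = np.array([[0.9, 0.1], [0.2, 0.8]])
start = np.array([0.5, 0.5])
emis = np.array([[0.8, 0.1], [0.7, 0.2], [0.1, 0.9]])  # p(x_t | s_t = j)

# Plain (unscaled) forward and backward passes, adequate for short sequences.
alphas = np.empty_like(emis)
betas = np.ones_like(emis)
alphas[0] = start * emis[0]
for t in range(1, len(emis)):
    alphas[t] = (alphas[t - 1] @ tpm) * emis[t]
for t in range(len(emis) - 2, -1, -1):
    betas[t] = tpm @ (emis[t + 1] * betas[t + 1])

log_gamma = log_posteriors(np.log(alphas), np.log(betas))
gamma = np.exp(log_gamma)  # each row is a distribution over states
```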

compute_log_xi(log_alphas, log_betas)¶

Expected number of transitions from state i to j, $P(s_{t-1} = i, s_t = j \mid x_1,\dots,x_T)$

Parameters

log_alphas (ndarray of shape (n_samples, n_states)) – Array containing the log of forward probabilities $\log\alpha_t$ at each time step.
log_betas (ndarray of shape (n_samples, n_states)) – Array containing the log of backward probabilities $\log\beta_t$ at each time step.

Returns

log_xi – Expected number of transitions from state i to j, $P(s_{t-1} = i, s_t = j \mid x_1,\dots,x_T)$

Return type

ndarray of shape (n_samples, n_states)
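
The pairwise transition posteriors used in the Baum-Welch M-step can be sketched as below. This is an illustration under assumptions: the standalone function log_xi_sketch takes the log emission densities and TPM as explicit arguments (the documented method only takes log_alphas and log_betas), and it returns one n_states × n_states matrix per transition, which may differ from the layout hmmpy uses.

```python
import numpy as np

def log_xi_sketch(log_alphas, log_betas, log_emis, tpm):
    """log P(s_{t-1} = i, s_t = j | x_1..x_T), shape (T - 1, N, N)."""
    llk = np.logaddexp.reduce(log_alphas[-1])
    # alpha_{t-1}(i) * tpm[i, j] * p(x_t | j) * beta_t(j) / likelihood, in log space
    return (log_alphas[:-1, :, None]
            + np.log(tpm)[None, :, :]
            + (log_emis[1:] + log_betas[1:])[:, None, :]
            - llk)

# Hypothetical two-state model with three observations.
tpm = np.array([[0.9, 0.1], [0.2, 0.8]])
start = np.array([0.5, 0.5])
emis = np.array([[0.8, 0.1], [0.7, 0.2], [0.1, 0.9]])

# Plain (unscaled) forward and backward passes to build the inputs.
alphas = np.empty_like(emis)
betas = np.ones_like(emis)
alphas[0] = start * emis[0]
for t in range(1, len(emis)):
    alphas[t] = (alphas[t - 1] @ tpm) * emis[t]
for t in range(len(emis) - 2, -1, -1):
    betas[t] = tpm @ (emis[t + 1] * betas[t + 1])

xi = np.exp(log_xi_sketch(np.log(alphas), np.log(betas), np.log(emis), tpm))
```

Summing xi over the destination state j recovers the state posterior at the earlier time step, which is a convenient consistency check.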

fit(X: numpy.ndarray, sort_state_seq=True, verbose=False)¶

Perform the full EM-algorithm.

Alternates between the E-step and the M-step until the convergence criterion (tol) is met or max_iter is reached.

Parameters

X (ndarray of shape (n_samples,)) – Time series of data
sort_state_seq (bool, default=True) – Sort predicted states according to their variance with the low-variance state at first index position.
verbose (bool, default=False) – If True, prints extra information while fitting.

Returns

Derives the optimal model parameters.

hmmpy.jump¶

class hmmpy.jump.JumpHMM(n_states: int = 2, jump_penalty: float = 16, window_len: tuple = (6, 14), init: str = 'kmeans++', max_iter: int = 30, tol: int = 1e-06, epochs: int = 10, random_state: int = 42)¶

Class for training HMMs using jump estimation.

Parameters

n_states (int, default=2) – Number of hidden states
jump_penalty (float, default=16) – Penalty applied to transitions (jumps) between states during jump estimation
window_len (tuple, default=(6, 14)) – Number and size of rolling window lengths used when constructing features
max_iter (int, default=30) – Maximum number of iterations to perform during fitting
tol (float, default=1e-6) – Criterion for early stopping
epochs (int, default=10) – Number of independent runs through the fit method. Each run uses new initial parameters, and the epoch with the highest likelihood is chosen.
random_state (int, default = 42) – Set seed. Used to create reproducible results.
init (str, default='kmeans++') –
- Set to 'kmeans++' for k-means++ initialization (only supported for JumpHMM).
- Set to 'random' for random initialization.
- Set to 'deterministic' for deterministic initialization.

is_fitted¶

Whether the model has been successfully fitted or not.

Type: bool

mu¶

Fitted means for each state

Type: ndarray of shape (n_states,)

std¶

Fitted std for each state

Type: ndarray of shape (n_states,)

tpm¶

Matrix of transition probabilities between states

Type: ndarray of shape (n_states, n_states)

start_proba¶

Initial state occupation distribution

Type: ndarray of shape (n_states,)

stationary_dist¶

Stationary distribution - requires model to be fitted.

Type: ndarray of shape (n_states,)

_check_state_sort(X, state_sequence)¶

Checks whether the low-variance state is the first state.

Otherwise sorts state predictions accordingly.

Parameters

X (ndarray of shape (n_samples,)) – Data to be fitted.
state_sequence (ndarray of shape (n_samples,)) – Predicted state sequence.

Returns

state_sequence – Predicted state sequence sorted in correct order.

Return type

ndarray of shape (n_samples,)

_fit(X, Z, verbose=False)¶

Container for the loop where fitting happens.

_fit_state_seq(Z, theta)¶

Fit a state sequence based on current theta estimates.

Used in jump model fitting. Uses a dynamic programming technique very similar to the Viterbi algorithm.

Parameters

Z (ndarray of shape (n_samples, n_features)) – Set of standardized time series features
theta (ndarray of shape (n_features, n_states)) – Jump model parameters, i.e. the state (cluster) centers.

Returns

state_seq (ndarray of shape (n_samples,)) – Current estimate of state sequence
objective_score (float) – Objective score under the model
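
The dynamic program can be sketched as follows, assuming the usual jump model objective: the sum of squared distances to the assigned state centers plus a fixed penalty for every change of state. The function name, the explicit jump_penalty argument (the documented method presumably reads it from the instance) and the inclusion of the penalty in the returned score are all assumptions.

```python
import numpy as np

def fit_state_seq(Z, theta, jump_penalty):
    """Return (state_seq, objective) minimizing loss plus jump penalties."""
    # losses[t, j] = squared distance between z_t and the center of state j.
    losses = ((Z[:, :, None] - theta[None, :, :]) ** 2).sum(axis=1)
    n_samples, n_states = losses.shape
    penalty = jump_penalty * (1 - np.eye(n_states))  # cost of moving from i to j
    cost = np.empty((n_samples, n_states))
    back = np.zeros((n_samples, n_states), dtype=int)
    cost[0] = losses[0]
    for t in range(1, n_samples):
        total = cost[t - 1][:, None] + penalty       # total[i, j]
        back[t] = total.argmin(axis=0)
        cost[t] = total.min(axis=0) + losses[t]
    # Backtrack from the cheapest final state.
    state_seq = np.empty(n_samples, dtype=int)
    state_seq[-1] = cost[-1].argmin()
    for t in range(n_samples - 2, -1, -1):
        state_seq[t] = back[t + 1, state_seq[t + 1]]
    return state_seq, float(cost[-1].min())

# Hypothetical single-feature data with one outlier at index 2.
Z = np.array([[0.0], [0.2], [3.0], [0.1], [5.0], [5.1], [4.9]])
theta = np.array([[0.0, 5.0]])  # shape (n_features, n_states)
state_seq, objective_score = fit_state_seq(Z, theta, jump_penalty=16)
no_penalty_seq, _ = fit_state_seq(Z, theta, jump_penalty=0)
```

With the penalty in place the outlier stays in state 0, whereas with a penalty of zero each point is simply assigned to its nearest center. This is how the penalty controls the persistence of the estimated states.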

_fit_theta(Z, state_seq)¶

Fit theta, i.e. minimize the squared $\ell_2$ norm in each latent state.

The analytical solution in state $j$ is $\theta_j = \frac{1}{N_j} \sum_{t : s_t = j} z_t$, where $N_j$ is the number of time steps assigned to state $j$.

Parameters

Z (ndarray of shape (n_samples, n_features)) – Set of standardized time series features
state_seq (ndarray of shape (n_samples,)) – Current estimate of hidden state sequence

Returns

theta – State (cluster) centers.

Return type

ndarray of shape (n_features, n_states)
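
The analytical solution is simply the per-state mean of the feature vectors, as in the following sketch. The function name, the explicit n_states argument and the handling of empty states (their column is left at zero) are assumptions.

```python
import numpy as np

def fit_theta(Z, state_seq, n_states):
    """Per-state feature means, returned with shape (n_features, n_states)."""
    theta = np.zeros((Z.shape[1], n_states))
    for j in range(n_states):
        mask = state_seq == j
        if mask.any():  # leave the column at zero if state j is never visited
            theta[:, j] = Z[mask].mean(axis=0)
    return theta

# Hypothetical features for four time steps and two states.
Z = np.array([[0.0, 0.0], [2.0, 2.0], [10.0, 10.0], [12.0, 14.0]])
state_seq = np.array([0, 0, 1, 1])
theta = fit_theta(Z, state_seq, n_states=2)
```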

_init_params(Z, X=None, diag_uniform_dist=(0.95, 0.99), output_hmm_params=False)¶

Initializes HMM parameters, either randomly using uniform distributions or with kmeans++.

The method is set via init when instantiating the class.

Parameters: diag_uniform_dist (1D-array) – The lower and upper bounds of the uniform distribution from which initial values are sampled.

_l2_norm_squared(z, theta)¶

Compute the squared $\ell_2$ norm at each time step.

Squared $\ell_2$ norm is computed as $||z_t - \theta_{s_t} ||_2^2$.

Parameters

z (ndarray of shape (n_samples, n_features)) – Data to be fitted
theta (ndarray of shape (n_features, n_states)) – jump model parameters

Returns

norms – Squared $\ell_2$ norms for all conditional states.

Return type

ndarray of shape (n_samples, n_states)
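
The squared distances for all time steps and all candidate states can be computed in one broadcasted expression. The sketch below uses a hypothetical function name and small hand-made inputs with the documented shapes.

```python
import numpy as np

def l2_norm_squared(Z, theta):
    """Squared Euclidean distance from every z_t to every state center."""
    # Z: (n_samples, n_features), theta: (n_features, n_states)
    diff = Z[:, :, None] - theta[None, :, :]  # (n_samples, n_features, n_states)
    return (diff ** 2).sum(axis=1)            # (n_samples, n_states)

# Hypothetical data: state 0 is centered at (0, 0), state 1 at (3, 4).
Z = np.array([[0.0, 0.0], [3.0, 4.0]])
theta = np.array([[0.0, 3.0],
                  [0.0, 4.0]])
norms = l2_norm_squared(Z, theta)
```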

construct_features(X: numpy.ndarray, window_len: tuple, feature_set='feature_set_3')¶

Construct a number of standardized time series features from data X.

Parameters

X (ndarray of shape (n_samples,)) – Time series of data
window_len (tuple, default=(6,14)) – Number and size of rolling window lengths used when constructing features.
feature_set (str, default='feature_set_3') –
- 'feature_set_1' - basic features such as mean, std, min/max etc.
- 'feature_set_2' - technical features such as Bollinger bands
- 'feature_set_3' - feature set 1 and feature set 2 combined

Returns

Z – Standardized time series features in a 2-D array.

Return type

ndarray of shape (n_samples-window_len, n_features)

fit(X, get_hmm_params=True, sort_state_seq=True, verbose=False, feature_set='feature_set_3')¶

Fit jump model

Iterates between fitting cluster centers ($\theta_{s_t}$) and fitting state sequences until the convergence criterion is met.

Parameters

X (ndarray of shape (n_samples,)) – Time series of data
get_hmm_params (bool, default=True) – If True, outputs HMM parameters once converged, i.e. mu, std, tpm and stationary_dist
sort_state_seq (bool, default=True) – Sort predicted states according to their variance with the low-variance state at first index position.
verbose (bool, default=False) – If True, prints extra information while fitting.

Returns

Derives the optimal model parameters.

get_params_from_seq(X, state_sequence)¶

Stores and outputs the model parameters based on the input sequence.

Parameters

X (ndarray of shape (n_samples,)) – Time series of data
state_sequence (ndarray of shape (n_samples,)) – State sequence for a given observation sequence

Returns

HMM parameters

hmmpy.sampler¶

class hmmpy.sampler.SampleHMM(n_states=2, frequency='daily', hmm_params=None, random_state=42)¶

Class to handle sampling from an HMM with user-specified parameters.

Parameters

n_states (int, default=2) – Number of hidden states
hmm_params (dict, default=None) – HMM parameters to sample from. To set parameters, create a dict with ‘mu’, ‘std’ and ‘tpm’ as keys and their values as lists or numpy arrays.
random_state (int, default = 42) – Seed used to make the output reproducible.

mu¶

Means to sample from

Type: ndarray of shape (n_states,)

std¶

Standard deviations to sample from

Type: ndarray of shape (n_states,)

tpm¶

Transition probability matrix between states

Type: ndarray of shape (n_states, n_states)

sample(n_samples, n_sequences=1)¶

Sample states from a fitted Hidden Markov Model.

Parameters

n_samples (int) – Number of samples to generate
n_sequences (int, default=1) – Number of independent sequences to sample from, e.g. if n_samples=100 and n_sequences=3 then 3 different sequences of length 100 are sampled

Returns

samples (ndarray of shape (n_samples, n_sequences)) – Outputs the generated samples of size n_samples
sample_states (ndarray of shape (n_samples, n_sequences)) – Outputs sampled states

sample_t(n_samples, n_sequences=1, dof=5)¶

Sample states from a fitted Hidden Markov Model with conditional t-distributions.

Parameters

n_samples (int) – Number of samples to generate
n_sequences (int, default=1) – Number of independent sequences to sample from, e.g. if n_samples=100 and n_sequences=3 then 3 different sequences of length 100 are sampled
dof (int, default=5) – degrees of freedom in the conditional t-distributions.

Returns

samples (ndarray of shape (n_samples, n_sequences)) – Outputs the generated samples of size n_samples
sample_states (ndarray of shape (n_samples, n_sequences)) – Outputs sampled states
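
Sampling with conditional t-distributions differs from the Gaussian case only in the noise term. The sketch below is an illustration under assumptions: the function name and the explicit start_proba argument are hypothetical, and the t-noise is rescaled to unit variance so that std keeps its meaning, which may or may not match what hmmpy does. It requires dof > 2.

```python
import numpy as np

def sample_t_hmm(n_samples, n_sequences, mu, std, tpm, start_proba, dof=5, seed=42):
    """Sample observations and states, both of shape (n_samples, n_sequences)."""
    rng = np.random.default_rng(seed)
    mu, std, tpm = np.asarray(mu, float), np.asarray(std, float), np.asarray(tpm, float)
    n_states = len(mu)
    states = np.empty((n_samples, n_sequences), dtype=int)
    for k in range(n_sequences):
        # Simulate each hidden state chain independently.
        states[0, k] = rng.choice(n_states, p=start_proba)
        for t in range(1, n_samples):
            states[t, k] = rng.choice(n_states, p=tpm[states[t - 1, k]])
    # Student-t noise, rescaled to unit variance (assumption, needs dof > 2).
    noise = rng.standard_t(dof, size=states.shape) * np.sqrt((dof - 2) / dof)
    samples = mu[states] + std[states] * noise
    return samples, states

# Hypothetical parameters for a two-state model.
samples, sample_states = sample_t_hmm(
    100, 3, mu=[0.001, -0.002], std=[0.01, 0.03],
    tpm=[[0.98, 0.02], [0.05, 0.95]], start_proba=[0.5, 0.5], dof=5,
)
```

Lower values of dof produce heavier tails, so extreme observations become more frequent than under the Gaussian sampler.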