API Documentation

This is the documentation of hmmpy. It consists of three general classes for estimation and one class used for sampling. They are structured as follows:

  • hmmpy.base.BaseHiddenMarkov: This class holds all the base methods shared across estimation methods. This includes methods related to an HMM that is already fitted (such as sampling), initialization methods etc.

  • hmmpy.mle.MLEHMM: Methods related to maximum likelihood estimation using the Expectation-Maximization (Baum-Welch) algorithm.

  • hmmpy.jump.JumpHMM: Methods related to jump estimation.

  • hmmpy.sampler.SampleHMM: Methods for sampling from an HMM with user-specified parameters.

hmmpy.base

class hmmpy.base.BaseHiddenMarkov(n_states: int = 2, init: str = 'random', max_iter: int = 100, tol: float = 1e-06, epochs: int = 1, random_state: int = 42)

Parent class for Hidden Markov methods with Gaussian distributions.

Contains methods related to:

  1. Initializing HMM parameters using uniform distributions

  2. Operations that require a fitted HMM, e.g. prediction, decoding and sampling

To fit an HMM refer to the respective child classes MLEHMM and JumpHMM.

Parameters
  • n_states (int, default=2) – Number of hidden states

  • max_iter (int, default=100) – Maximum number of iterations to perform during expectation-maximization

  • tol (float, default=1e-6) – Criterion for early stopping

  • epochs (int, default=1) – Number of independent runs through the fit method. Uses new initial parameters each time and chooses the epoch with the highest likelihood.

  • random_state (int, default = 42) – Set seed. Used to create reproducible results.

  • init (str) –

    • Set to ‘kmeans++’ to use that init method - only supported for JumpHMM

    • Set to ‘random’ for random initialization.

    • Set to “deterministic” for deterministic init.

is_fitted

Whether the model has been successfully fitted or not.

Type

bool

mu

Fitted means for each state

Type

ndarray of shape (n_states,)

std

Fitted std for each state

Type

ndarray of shape (n_states,)

tpm

Transition probability matrix between states

Type

ndarray of shape (n_states, n_states)

start_proba

Initial state distribution

Type

ndarray of shape (n_states,)

stationary_dist

Stationary distribution - requires model to be fitted.

Type

ndarray of shape (n_states,)

_get_stationary_dist(tpm)

Outputs the stationary distribution of the fitted model.

The stationary distribution is the left eigenvector of the transition probability matrix associated with its largest eigenvalue. Since every row of the TPM is a probability distribution (entries between 0 and 1, each row summing to 1), the largest eigenvalue is 1, and the corresponding eigenvector can be chosen with real, non-negative entries.

Computed by taking the eigenvector corresponding to the largest eigenvalue and scaling it to sum to 1.

Parameters

tpm (ndarray of shape (n_states, n_states)) – Transition probability matrix

Returns

stationary_dist

Return type

ndarray of shape (n_states,)
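As a rough illustration of what the stationary distribution looks like, the sketch below approximates it for a small row-stochastic matrix. It deliberately uses power iteration on plain Python lists rather than the eigen-decomposition described above, and the name `stationary_dist_power` is hypothetical, not part of hmmpy.

```python
def stationary_dist_power(tpm, n_iter=10_000, tol=1e-12):
    """Approximate the stationary distribution of a row-stochastic matrix.

    Hypothetical helper: repeatedly applies pi <- pi @ tpm (power iteration)
    instead of solving the eigenproblem directly.
    """
    n = len(tpm)
    pi = [1.0 / n] * n  # start from the uniform distribution
    for _ in range(n_iter):
        new_pi = [sum(pi[i] * tpm[i][j] for i in range(n)) for j in range(n)]
        total = sum(new_pi)
        new_pi = [p / total for p in new_pi]  # rescale so the entries sum to 1
        if max(abs(a - b) for a, b in zip(pi, new_pi)) < tol:
            return new_pi
        pi = new_pi
    return pi


# Toy two-state transition matrix (values are made up).
tpm = [[0.95, 0.05],
       [0.10, 0.90]]
pi = stationary_dist_power(tpm)
```

For a two-state chain the result can be checked by hand: with off-diagonal entries a = 0.05 and b = 0.10, the stationary distribution is (b / (a + b), a / (a + b)).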

_init_params(X=None, diag_uniform_dist=(0.95, 0.99), output_hmm_params=True)

Initializes HMM parameters randomly using uniform distributions.

Parameters

diag_uniform_dist (1D-array) – The lower and upper bounds of uniform distribution to sample init from.

_log_backward_proba()

Compute the log of backward probabilities \(\log\beta_t\).

Backward probabilities are the conditional probabilities of observing the future observations \(x_{t+1},...,x_{T}\) given the current state, i.e. \(P(x_{t+1},...,x_{T} | s_t = i)\).

Returns

log_betas – Array containing the log of backward probabilities \(\log\beta_t\) at each time step.

Return type

ndarray of shape (n_samples, n_states)
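A minimal sketch of the backward recursion in log space, assuming the log emission probabilities and log transition matrix are available as nested lists. The function names and the explicit arguments are hypothetical; the hmmpy method takes no arguments and reads these quantities from the model.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) for a list of floats."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


def log_backward(log_emission, log_tpm):
    """Return log_betas as a nested list of shape (n_samples, n_states)."""
    n_samples, n_states = len(log_emission), len(log_tpm)
    # Initialization: beta_T(i) = 1 for every state, so log beta_T(i) = 0.
    log_betas = [[0.0] * n_states for _ in range(n_samples)]
    for t in range(n_samples - 2, -1, -1):
        for i in range(n_states):
            log_betas[t][i] = logsumexp([
                log_tpm[i][j] + log_emission[t + 1][j] + log_betas[t + 1][j]
                for j in range(n_states)
            ])
    return log_betas


# Toy two-state example with three observations (values are made up).
emission = [[0.9, 0.2], [0.8, 0.3], [0.1, 0.7]]
tpm = [[0.9, 0.1], [0.2, 0.8]]
log_emission = [[math.log(p) for p in row] for row in emission]
log_tpm = [[math.log(p) for p in row] for row in tpm]
log_betas = log_backward(log_emission, log_tpm)
```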

_log_forward_proba()

Compute log forward probabilities \(\log\alpha_t\).

Forward probabilities are the joint probability \(P(s_t=i , x_1,...x_t)\).

Returns

  • llk (float) – log-likelihood of given HMM parameters

  • log_alphas (ndarray of shape (n_samples, n_states)) – Array containing the log of forward probabilities \(\log\alpha_t\) at each time step.
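The forward recursion can be sketched in the same way. This is an illustrative version on nested lists with hypothetical names and explicit arguments, not hmmpy's implementation; it returns the log-likelihood together with the log forward probabilities, mirroring the two return values listed above.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) for a list of floats."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


def log_forward(log_emission, log_tpm, log_start):
    """Return (llk, log_alphas); log_emission[t][j] = log p(x_t | s_t = j)."""
    n_states = len(log_start)
    # Initialization: alpha_1(j) = start_proba[j] * p(x_1 | s_1 = j).
    log_alphas = [[log_start[j] + log_emission[0][j] for j in range(n_states)]]
    for t in range(1, len(log_emission)):
        prev = log_alphas[-1]
        log_alphas.append([
            logsumexp([prev[i] + log_tpm[i][j] for i in range(n_states)])
            + log_emission[t][j]
            for j in range(n_states)
        ])
    llk = logsumexp(log_alphas[-1])  # log P(x_1, ..., x_T)
    return llk, log_alphas


# Toy two-state example with three observations (values are made up).
emission = [[0.9, 0.2], [0.8, 0.3], [0.1, 0.7]]
tpm = [[0.9, 0.1], [0.2, 0.8]]
start = [0.5, 0.5]
llk, log_alphas = log_forward(
    [[math.log(p) for p in row] for row in emission],
    [[math.log(p) for p in row] for row in tpm],
    [math.log(p) for p in start],
)
```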

bac_score(X, y_true, verbose=False)

Computes balanced accuracy score when true state sequence is known

Parameters
  • X (ndarray of shape (n_samples,)) – Time series of data

  • y_true (ndarray of shape (n_samples,)) – True states.

  • verbose (bool) – Verbose output.

Returns

bac – Balanced accuracy score

Return type

float
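Balanced accuracy is commonly defined as the mean of the per-class recall. The sketch below computes it under that definition on two label sequences; `balanced_accuracy` is a hypothetical helper, and whether hmmpy computes the score this way or delegates to another library is an assumption.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall (hypothetical helper, not hmmpy's code)."""
    recalls = []
    for c in sorted(set(y_true)):
        idx = [k for k, y in enumerate(y_true) if y == c]
        hits = sum(1 for k in idx if y_pred[k] == c)
        recalls.append(hits / len(idx))  # recall of class c
    return sum(recalls) / len(recalls)


# Toy example: state 0 is recovered 3 times out of 4, state 1 once out of 2.
y_true = [0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 1, 1, 0]
bac = balanced_accuracy(y_true, y_pred)
```

Unlike plain accuracy, this score weights each state equally, which matters when one regime is much rarer than the other.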

decode(X)

Output the most likely sequence of states given an observation sequence using the Viterbi algorithm.

Parameters

X (ndarray of shape (n_samples,)) – Time series of data

Returns

state_preds – Predicted sequence of states with the same length as the input time series.

Return type

ndarray of shape (n_samples,)
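For readers unfamiliar with Viterbi decoding, here is a minimal log-space sketch on nested lists. The function name and the explicit arguments are hypothetical; the hmmpy method only takes X and uses the fitted parameters.

```python
import math


def viterbi(log_emission, log_tpm, log_start):
    """Return the most likely state path as a list of state indices."""
    n_samples, n_states = len(log_emission), len(log_start)
    # delta[j]: log-probability of the best path that ends in state j.
    delta = [log_start[j] + log_emission[0][j] for j in range(n_states)]
    backptr = []
    for t in range(1, n_samples):
        new_delta, ptr = [], []
        for j in range(n_states):
            best_i = max(range(n_states), key=lambda i: delta[i] + log_tpm[i][j])
            ptr.append(best_i)
            new_delta.append(delta[best_i] + log_tpm[best_i][j] + log_emission[t][j])
        delta = new_delta
        backptr.append(ptr)
    # Backtrack from the best final state.
    state = max(range(n_states), key=lambda j: delta[j])
    path = [state]
    for ptr in reversed(backptr):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path


# Toy two-state example with three observations (values are made up).
emission = [[0.9, 0.2], [0.8, 0.3], [0.05, 0.9]]
tpm = [[0.9, 0.1], [0.2, 0.8]]
start = [0.5, 0.5]
path = viterbi(
    [[math.log(p) for p in row] for row in emission],
    [[math.log(p) for p in row] for row in tpm],
    [math.log(p) for p in start],
)
```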

emission_probs(X)

Computes the probability distribution p(x) given an observation sequence X. The calculation returns a T x N matrix, where T = n_samples and N = n_states.

Parameters

X (ndarray of shape (n_samples,)) – Time series of data

Returns

  • probs (ndarray of shape (n_samples, n_states)) – Probability for sampling from a particular state distribution

  • log_probs (ndarray of shape (n_samples, n_states)) – Log probability for sampling from a particular state distribution
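Since the base class assumes Gaussian state distributions, the T x N matrix can be sketched as one Gaussian density per observation and state. The helper below is hypothetical and works on plain lists; it is not hmmpy's code.

```python
import math


def gaussian_emission_probs(X, mu, std):
    """Return (probs, log_probs), each a nested list of shape (T, N)."""
    probs, log_probs = [], []
    for x in X:
        # Log-density of N(mu_j, std_j^2) evaluated at x, for every state j.
        log_row = [
            -0.5 * math.log(2 * math.pi) - math.log(s) - 0.5 * ((x - m) / s) ** 2
            for m, s in zip(mu, std)
        ]
        log_probs.append(log_row)
        probs.append([math.exp(v) for v in log_row])
    return probs, log_probs


# Three observations and two states (values are made up).
X = [0.0, 1.0, -2.0]
mu, std = [0.0, 1.0], [1.0, 2.0]
probs, log_probs = gaussian_emission_probs(X, mu, std)
```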

fit(X, get_hmm_params=True, sort_state_seq=True, verbose=False, feature_set='feature_set_2')

Fit model to data. Defined in the respective child classes.

fit_predict(X, n_preds=15, verbose=False)

Fit model, then decode states and make n predictions.

Wraps .fit(), .decode() and .predict_proba() into one method.

Parameters
  • X (ndarray of shape (n_samples,)) – Data to be fitted.

  • n_preds (int, default=15) – Number of time steps to look forward from current time

Returns

  • state_preds (ndarray of shape (n_samples,)) – Predicted sequence of states with the same length as the input time series.

  • posteriors (ndarray of shape (n_preds, n_samples)) – posterior distribution between states at terminal time step T.

predict_proba(n_preds=15)

Compute the probability \(P(S_{T+h} = i | X^T = x^T)\), i.e. the probability of being in state i at h time steps ahead, given the observation sequence up until time T.

Parameters

n_preds (int, default=15) – Number of time steps to look forward from current time

Returns

state_preds – Output the probability of being in state i at time T+h

Return type

ndarray of shape (n_states, n_preds)
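One way to picture this forecast: take the state distribution at the final time step T and multiply it by the transition matrix once per step ahead. The sketch below does this on plain lists; the function name is hypothetical, and the assumption that the first column corresponds to h = 1 is mine, not taken from hmmpy.

```python
def predict_state_probs(posterior_T, tpm, n_preds):
    """Propagate P(S_T = i | x_1..x_T) forward for h = 1..n_preds steps."""
    n_states = len(tpm)
    current = list(posterior_T)
    preds = [[0.0] * n_preds for _ in range(n_states)]  # shape (n_states, n_preds)
    for h in range(n_preds):
        # One step ahead: current <- current @ tpm
        current = [
            sum(current[i] * tpm[i][j] for i in range(n_states))
            for j in range(n_states)
        ]
        for j in range(n_states):
            preds[j][h] = current[j]
    return preds


# Start in state 0 with certainty and look three steps ahead (toy values).
tpm = [[0.95, 0.05], [0.10, 0.90]]
preds = predict_state_probs([1.0, 0.0], tpm, n_preds=3)
```

As h grows, each column approaches the stationary distribution of the chain.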

rolling_posteriors(X)

Compute the posterior probability of being in state i at time T.

Function should be used as part of rolling estimation when one is interested only in the final smoothing probability of the current window sample.

Parameters

X (ndarray of shape (n_samples,)) – Time series of data

Returns

posterior – posterior distribution between states at terminal time step T.

Return type

ndarray of shape (n_states,)

sample(n_samples, n_sequences=1, hmm_params=None)

Sample states from a fitted Hidden Markov Model.

See also hmmpy.sampler.SampleHMM for full class supporting more sampling methods.

Parameters
  • n_samples (int) – Number of samples to generate

  • n_sequences (int, default=1) – Number of independent sequences to sample from, e.g. if n_samples=100 and n_sequences=3 then 3 different sequences of length 100 are sampled

  • hmm_params (dict, default=None) – HMM model parameters to sample from. If None and the model is fitted, the fitted parameters are used. To set the parameters manually, create a dict with ‘mu’, ‘std’, ‘tpm’ and ‘stationary distribution’ as keys and ndarrays as values.

Returns

  • samples (ndarray of shape (n_samples,)) – Outputs the generated samples

  • sample_states (ndarray of shape (n_samples, n_sequences)) – Outputs sampled states
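The sampling procedure can be sketched for a single sequence as follows: draw an initial state, then alternate between drawing an observation from the current state's Gaussian and drawing the next state from the matching row of the transition matrix. The function and its arguments are hypothetical and use Python's `random` module rather than whatever generator hmmpy uses.

```python
import random


def sample_gaussian_hmm(n_samples, mu, std, tpm, start_proba, seed=42):
    """Return (samples, sample_states) for one sequence of length n_samples."""
    rng = random.Random(seed)  # seeded for reproducibility
    states = list(range(len(mu)))
    state = rng.choices(states, weights=start_proba)[0]
    samples, sample_states = [], []
    for _ in range(n_samples):
        sample_states.append(state)
        samples.append(rng.gauss(mu[state], std[state]))
        # Move to the next state according to the current row of the TPM.
        state = rng.choices(states, weights=tpm[state])[0]
    return samples, sample_states


# Toy parameters: a calm state and a volatile state (values are made up).
params = dict(
    mu=[0.001, -0.002],
    std=[0.01, 0.03],
    tpm=[[0.95, 0.05], [0.10, 0.90]],
    start_proba=[0.5, 0.5],
)
samples, sample_states = sample_gaussian_hmm(100, **params)
```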

hmmpy.mle

class hmmpy.mle.MLEHMM(n_states: int = 2, init: str = 'random', max_iter: int = 100, tol: float = 1e-06, epochs: int = 10, random_state: int = 42)

Class for training HMMs using the EM (Baum-Welch) algorithm.

Parameters
  • n_states (int, default=2) – Number of hidden states

  • max_iter (int, default=100) – Maximum number of iterations to perform during expectation-maximization

  • tol (float, default=1e-6) – Criterion for early stopping

  • epochs (int, default=10) – Number of independent runs through the fit method. Uses new initial parameters each time and chooses the epoch with the highest likelihood.

  • random_state (int, default = 42) – Set seed. Used to create reproducible results.

  • init (str) –

    • Set to ‘kmeans++’ to use that init method - only supported for JumpHMM

    • Set to ‘random’ for random initialization.

    • Set to “deterministic” for deterministic init.

is_fitted

Whether the model has been successfully fitted or not.

Type

bool

mu

Fitted means for each state

Type

ndarray of shape (n_states,)

std

Fitted std for each state

Type

ndarray of shape (n_states,)

tpm

Matrix of transition probabilities between states

Type

ndarray of shape (n_states, n_states)

start_proba

Initial state occupation distribution

Type

ndarray of shape (n_states,)

stationary_dist

Stationary distribution - requires model to be fitted.

Type

ndarray of shape (n_states,)

compute_log_posteriors(log_alphas, log_betas)

Expectation of being in state j at time t given observations, \(P(s_t = j | x_1,...,x_T)\).

Parameters
  • log_alphas (ndarray of shape (n_samples, n_states)) – Array containing the log of forward probabilities \(\log\alpha_t\) at each time step.

  • log_betas (ndarray of shape (n_samples, n_states)) – Array containing the log of backward probabilities \(\log\beta_t\) at each time step.

Returns

log_gamma – Expectation of being in state j at time t given observations, \(P(s_t = j | x_1,...,x_T)\)

Return type

ndarray of shape (n_samples, n_states)
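In log space the posterior follows from \(\log\gamma_t(j) = \log\alpha_t(j) + \log\beta_t(j) - \log L\), where \(L\) is the likelihood of the whole sequence. A small sketch on nested lists, with hypothetical function names and hand-computed toy inputs:

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) for a list of floats."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def log_posteriors(log_alphas, log_betas):
    """Return log_gamma as a nested list of shape (n_samples, n_states)."""
    log_gamma = []
    for la, lb in zip(log_alphas, log_betas):
        joint = [a + b for a, b in zip(la, lb)]
        norm = logsumexp(joint)  # equals the log-likelihood at every t
        log_gamma.append([v - norm for v in joint])
    return log_gamma


# Forward and backward probabilities of a toy two-step, two-state model
# (computed by hand; the values are illustrative only).
alphas = [[0.45, 0.10], [0.34, 0.0375]]
betas = [[0.75, 0.40], [1.0, 1.0]]
log_gamma = log_posteriors(
    [[math.log(v) for v in row] for row in alphas],
    [[math.log(v) for v in row] for row in betas],
)
```

Each row of the exponentiated result sums to one, since it is a distribution over states at that time step.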

compute_log_xi(log_alphas, log_betas)

Expected number of transitions from state i to j, \(P(s_{t-1} = i, s_t = j | x_1,...,x_T)\)

Parameters
  • log_alphas (ndarray of shape (n_samples, n_states)) – Array containing the log of forward probabilities \(\log\alpha_t\) at each time step.

  • log_betas (ndarray of shape (n_samples, n_states)) – Array containing the log of backward probabilities \(\log\beta_t\) at each time step.

Returns

log_xi – Expected number of transitions from state i to j, \(P(s_{t-1} = i, s_t = j | x_1,...,x_T)\)

Return type

ndarray of shape (n_samples, n_states)
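The pairwise posterior combines the forward probability at t-1, the transition and emission probabilities, and the backward probability at t. The sketch below is illustrative only: the function name, the extra `log_tpm` and `log_emission` arguments, and the (n_samples - 1, n_states, n_states) output layout are assumptions rather than hmmpy's actual interface.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) for a list of floats."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def log_xi(log_alphas, log_betas, log_tpm, log_emission):
    """Return out[t-1][i][j] = log P(s_{t-1} = i, s_t = j | x_1..x_T)."""
    n_samples, n_states = len(log_alphas), len(log_tpm)
    llk = logsumexp(log_alphas[-1])  # log-likelihood of the full sequence
    out = []
    for t in range(1, n_samples):
        out.append([
            [
                log_alphas[t - 1][i] + log_tpm[i][j]
                + log_emission[t][j] + log_betas[t][j] - llk
                for j in range(n_states)
            ]
            for i in range(n_states)
        ])
    return out


# Toy two-step, two-state model (hand-computed, illustrative values).
def to_log(rows):
    return [[math.log(v) for v in row] for row in rows]

xi = log_xi(
    to_log([[0.45, 0.10], [0.34, 0.0375]]),  # alphas
    to_log([[0.75, 0.40], [1.0, 1.0]]),      # betas
    to_log([[0.9, 0.1], [0.2, 0.8]]),        # tpm
    to_log([[0.9, 0.2], [0.8, 0.3]]),        # emission probabilities
)
```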

fit(X: numpy.ndarray, sort_state_seq=True, verbose=False)

Perform the full EM-algorithm.

Alternates between the E-step and the M-step iteratively to find the optimal model parameters.

Parameters
  • X (ndarray of shape (n_samples,)) – Time series of data

  • sort_state_seq (bool, default=True) – Sort predicted states according to their variance with the low-variance state at first index position.

  • verbose (bool) – Set to True for extra information regarding the function.

Returns

Return type

Derives the optimal model parameters

hmmpy.jump

class hmmpy.jump.JumpHMM(n_states: int = 2, jump_penalty: float = 16, window_len: tuple = (6, 14), init: str = 'kmeans++', max_iter: int = 30, tol: int = 1e-06, epochs: int = 10, random_state: int = 42)

Class for training HMMs using jump estimation.

Parameters
  • n_states (int, default=2) – Number of hidden states

  • max_iter (int, default=30) – Maximum number of iterations to perform during fitting

  • tol (float, default=1e-6) – Criterion for early stopping

  • epochs (int, default=10) – Number of independent runs through the fit method. Uses new initial parameters each time and chooses the epoch with the highest likelihood.

  • random_state (int, default = 42) – Set seed. Used to create reproducible results.

  • init (str) –

    • Set to ‘kmeans++’ to use that init method - only supported for JumpHMM

    • Set to ‘random’ for random initialization.

    • Set to “deterministic” for deterministic init.

is_fitted

Whether the model has been successfully fitted or not.

Type

bool

mu

Fitted means for each state

Type

ndarray of shape (n_states,)

std

Fitted std for each state

Type

ndarray of shape (n_states,)

tpm

Matrix of transition probabilities between states

Type

ndarray of shape (n_states, n_states)

start_proba

Initial state occupation distribution

Type

ndarray of shape (n_states,)

stationary_dist

Stationary distribution - requires model to be fitted.

Type

ndarray of shape (n_states,)

_check_state_sort(X, state_sequence)

Checks whether the low-variance state is the first state.

Otherwise sorts state predictions accordingly.

Parameters
  • X (ndarray of shape (n_samples,)) – Data to be fitted.

  • state_sequence (ndarray of shape (n_samples,)) – Predicted state sequence.

Returns

state_sequence – Predicted state sequence sorted in correct order.

Return type

ndarray of shape (n_samples,)
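A possible way to implement such a check is to compute the variance of the observations in each state and relabel the states in order of increasing variance. The helper below is a hypothetical sketch on plain lists, not hmmpy's implementation, and it assumes every state has at least one observation.

```python
import statistics


def sort_states_by_variance(X, state_sequence):
    """Relabel states so that state 0 is the lowest-variance state."""
    labels = sorted(set(state_sequence))
    variances = {
        j: statistics.pvariance([x for x, s in zip(X, state_sequence) if s == j])
        for j in labels
    }
    order = sorted(labels, key=lambda j: variances[j])
    relabel = {old: new for new, old in enumerate(order)}
    return [relabel[s] for s in state_sequence]


# State 0 is the volatile one here, so the labels get swapped (toy values).
X = [5.0, -6.0, 0.1, -0.1, 0.2]
state_sequence = [0, 0, 1, 1, 1]
sorted_seq = sort_states_by_variance(X, state_sequence)
```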

_fit(X, Z, verbose=False)

Container for the loop where fitting happens

_fit_state_seq(Z, theta)

Fit a state sequence based on current theta estimates.

Used in jump model fitting. Uses a dynamic programming technique very similar to the Viterbi algorithm.

Parameters
  • Z (ndarray of shape (n_samples, n_features)) – Set of standardized time series features

  • theta (ndarray of shape (n_features, n_states)) – jump model parameters. Distances from state (cluster) centers.

Returns

  • state_seq (ndarray of shape (n_samples,)) – Current estimate of state sequence

  • objective_score (float) – Objective score under the model
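The dynamic program can be sketched as follows, assuming the objective is the sum of per-step squared distances plus a fixed penalty for every change of state. The function name is hypothetical, and for brevity it takes the precomputed distance matrix instead of Z and theta; hmmpy's recursion may differ in direction or detail.

```python
def fit_state_seq(losses, jump_penalty):
    """Return (state_seq, objective_score).

    losses[t][j] is the squared l2 distance between z_t and the center of
    state j. Hypothetical sketch, not hmmpy's code.
    """
    n_samples, n_states = len(losses), len(losses[0])
    # values[t][j]: lowest cost of any state path that ends in state j at t.
    values = [list(losses[0])]
    for t in range(1, n_samples):
        prev = values[-1]
        values.append([
            losses[t][j] + min(
                prev[i] + (jump_penalty if i != j else 0.0)
                for i in range(n_states)
            )
            for j in range(n_states)
        ])
    # Backtrack from the cheapest final state.
    state = min(range(n_states), key=lambda j: values[-1][j])
    objective_score = values[-1][state]
    state_seq = [state]
    for t in range(n_samples - 1, 0, -1):
        state = min(
            range(n_states),
            key=lambda i: values[t - 1][i] + (jump_penalty if i != state else 0.0),
        )
        state_seq.append(state)
    state_seq.reverse()
    return state_seq, objective_score


# Toy distances for five time steps and two states (values are made up).
losses = [[0.1, 2.0], [0.2, 1.5], [1.8, 0.1], [2.0, 0.2], [0.3, 0.25]]
state_seq, objective_score = fit_state_seq(losses, jump_penalty=1.0)
```

In this toy example the last observation is slightly closer to state 1, and the penalty also discourages switching back to state 0 for a single step; a larger penalty produces more persistent state sequences.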

_fit_theta(Z, state_seq)

Fit theta, i.e. minimize the squared \(\ell_2\) norm in each latent state.

Analytical solution in state \(j\) is: \(\theta_j = \frac{1}{N_j} \sum_{t : s_t = j} z_t\), where \(N_j\) is the number of time steps assigned to state \(j\).

Parameters
  • Z (ndarray of shape (n_samples, n_features)) – Set of standardized time series features

  • state_seq (ndarray of shape (n_samples,)) – Current estimate of hidden state sequence

Returns

theta – Distances from state (cluster) centers.

Return type

ndarray of shape (n_features, n_states)
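The analytical solution is simply the per-state mean of the feature vectors. A minimal sketch on nested lists; the function name, the `n_states` argument and the handling of empty states are assumptions made for illustration.

```python
def fit_theta(Z, state_seq, n_states):
    """Return theta with shape (n_features, n_states): per-state feature means."""
    n_features = len(Z[0])
    sums = [[0.0] * n_features for _ in range(n_states)]
    counts = [0] * n_states
    for z, s in zip(Z, state_seq):
        counts[s] += 1
        for k in range(n_features):
            sums[s][k] += z[k]
    # Assumption: a state with no assigned observations gets a zero center.
    return [
        [sums[j][k] / counts[j] if counts[j] else 0.0 for j in range(n_states)]
        for k in range(n_features)
    ]


# Three observations with two features each (values are made up).
Z = [[1.0, 2.0], [3.0, 4.0], [10.0, 20.0]]
theta = fit_theta(Z, state_seq=[0, 0, 1], n_states=2)
```

Column j of the result holds the center of state j, matching the (n_features, n_states) layout documented above.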

_init_params(Z, X=None, diag_uniform_dist=(0.95, 0.99), output_hmm_params=False)

Initializes HMM parameters randomly using uniform distributions or kmeans++.

The method is chosen via the init argument when instantiating the class.

Parameters

diag_uniform_dist (1D-array) – The lower and upper bounds of uniform distribution to sample init from.

_l2_norm_squared(z, theta)

Compute the squared \(\ell_2\) norm at each time step.

Squared \(\ell_2\) norm is computed as \(||z_t - \theta_{s_t} ||_2^2\).

Parameters
  • z (ndarray of shape (n_samples, n_features)) – Data to be fitted

  • theta (ndarray of shape (n_features, n_states)) – jump model parameters

Returns

norms – Squared \(\ell_2\) norms for all conditional states.

Return type

ndarray of shape (n_samples, n_states)
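A short sketch of the distance computation on nested lists, using the (n_features, n_states) layout of theta documented above. The function is illustrative and not taken from hmmpy.

```python
def l2_norm_squared(z, theta):
    """Return norms[t][j] = ||z_t - theta_j||_2^2 as a nested list."""
    n_features, n_states = len(theta), len(theta[0])
    return [
        [
            sum((row[k] - theta[k][j]) ** 2 for k in range(n_features))
            for j in range(n_states)
        ]
        for row in z
    ]


# Two observations, two features, two states (values are made up).
z = [[1.0, 2.0], [3.0, 4.0]]
theta = [[1.0, 3.0],
         [2.0, 0.0]]  # column j holds the center of state j
norms = l2_norm_squared(z, theta)
```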

construct_features(X: numpy.ndarray, window_len: tuple, feature_set='feature_set_3')

Construct a number of standardized time series features from data X.

Parameters
  • X (ndarray of shape (n_samples,)) – Time series of data

  • window_len (tuple, default=(6,14)) – Number and size of rolling window lengths used when constructing features.

  • feature_set (str) –

    • ‘feature_set_1’ - basic features such as mean, std, min/max etc.

    • ‘feature_set_2’ - technical features such as Bollinger bands

    • ‘feature_set_3’ - feature set 1 and feature set 2 combined

Returns

Z – Standardized time series features in a 2-D array.

Return type

ndarray of shape (n_samples-window_len, n_features)

fit(X, get_hmm_params=True, sort_state_seq=True, verbose=False, feature_set='feature_set_3')

Fit jump model

Iterates through fitting cluster centers (\(\theta_{s_t}\)) and fitting state sequences until the convergence criterion is met.

Parameters
  • X (ndarray of shape (n_samples,)) – Time series of data

  • get_hmm_params (bool, default=True) – If True, outputs the HMM parameters once converged, i.e. mu, std, tpm and stationary_dist

  • sort_state_seq (bool, default=True) – Sort predicted states according to their variance with the low-variance state at first index position.

  • verbose (bool) – Set to True for extra information regarding the function.

Returns

Return type

Derives the optimal model parameters

get_params_from_seq(X, state_sequence)

Stores and outputs the model parameters based on the input sequence.

Parameters
  • X (ndarray of shape (n_samples,)) – Time series of data

  • state_sequence (ndarray of shape (n_samples,)) – State sequence for a given observation sequence

Returns

Return type

Hmm parameters
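To illustrate how HMM parameters can be read off a state sequence, the sketch below computes per-state means and standard deviations and estimates the transition matrix from transition counts. The function name, the use of the population standard deviation, and the uniform fallback row are assumptions; hmmpy's own estimator may differ.

```python
import statistics


def params_from_seq(X, state_sequence, n_states):
    """Return (mu, std, tpm) estimated from data and a state sequence.

    Hypothetical helper; assumes every state occurs at least once.
    """
    mu, std = [], []
    for j in range(n_states):
        xs = [x for x, s in zip(X, state_sequence) if s == j]
        mu.append(statistics.fmean(xs))
        std.append(statistics.pstdev(xs))  # population std is an assumption
    # Count transitions between consecutive time steps.
    counts = [[0] * n_states for _ in range(n_states)]
    for a, b in zip(state_sequence[:-1], state_sequence[1:]):
        counts[a][b] += 1
    tpm = []
    for row in counts:
        total = sum(row)
        # Assumption: a state with no outgoing transitions gets a uniform row.
        tpm.append([c / total if total else 1.0 / n_states for c in row])
    return mu, std, tpm


# Five observations with a known state sequence (values are made up).
X = [1.0, 3.0, 10.0, 14.0, 2.0]
state_sequence = [0, 0, 1, 1, 0]
mu, std, tpm = params_from_seq(X, state_sequence, n_states=2)
```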

hmmpy.sampler

class hmmpy.sampler.SampleHMM(n_states=2, frequency='daily', hmm_params=None, random_state=42)

Class to handle sampling from an HMM with user-specified parameters.

Parameters
  • n_states (int, default=2) – Number of hidden states

  • hmm_params (dict, default=None) – hmm model parameters to sample from. To set params, create a dict with ‘mu’, ‘std’ and ‘tpm’ as keys and their values in lists or numpy arrays.

  • random_state (int, default = 42) – Parameter set to recreate output

mu

means to sample from

Type

ndarray of shape (n_states,)

std

STDs to sample from

Type

ndarray of shape (n_states,)

tpm

Transition probability matrix between states

Type

ndarray of shape (n_states, n_states)

sample(n_samples, n_sequences=1)

Sample states from a fitted Hidden Markov Model.

Parameters
  • n_samples (int) – Number of samples to generate

  • n_sequences (int, default=1) – Number of independent sequences to sample from, e.g. if n_samples=100 and n_sequences=3 then 3 different sequences of length 100 are sampled

Returns

  • samples (ndarray of shape (n_samples, n_sequences)) – Outputs the generated samples of size n_samples

  • sample_states (ndarray of shape (n_samples, n_sequences)) – Outputs sampled states

sample_t(n_samples, n_sequences=1, dof=5)

Sample states from a fitted Hidden Markov Model with conditional t-distributions.

Parameters
  • n_samples (int) – Number of samples to generate

  • n_sequences (int, default=1) – Number of independent sequences to sample from, e.g. if n_samples=100 and n_sequences=3 then 3 different sequences of length 100 are sampled

  • dof (int, default=5) – degrees of freedom in the conditional t-distributions.

Returns

  • samples (ndarray of shape (n_samples, n_sequences)) – Outputs the generated samples of size n_samples

  • sample_states (ndarray of shape (n_samples, n_sequences)) – Outputs sampled states