API Documentation
This is the documentation of hmmpy. It consists of three general classes for estimation and one class for sampling, structured as follows:
hmmpy.base.BaseHiddenMarkov: Holds the base methods shared across estimation methods, including initialization methods and methods that require an already fitted HMM (such as sampling).
hmmpy.mle.MLEHMM: Methods related to maximum likelihood estimation using the Expectation-Maximization (Baum-Welch) algorithm.
hmmpy.jump.JumpHMM: Methods related to jump estimation.
hmmpy.sampler.SampleHMM: Sampling from an HMM with user-specified parameters.
hmmpy.base
- class hmmpy.base.BaseHiddenMarkov(n_states: int = 2, init: str = 'random', max_iter: int = 100, tol: float = 1e-06, epochs: int = 1, random_state: int = 42)
Parent class for Hidden Markov methods with Gaussian distributions.
Contains methods related to: 1. initializing HMM parameters using uniform distributions, and 2. methods requiring a fitted HMM, e.g. prediction, decoding, sampling etc.
To fit an HMM refer to the respective child classes MLEHMM and JumpHMM.
- Parameters
n_states (int, default=2) – Number of hidden states
max_iter (int, default=100) – Maximum number of iterations to perform during expectation-maximization
tol (float, default=1e-6) – Criterion for early stopping
epochs (int, default=1) – Number of independent runs through the fit method. New initial parameters are used each time, and the epoch with the highest likelihood is chosen.
random_state (int, default=42) – Random seed, used to create reproducible results.
init (str, default='random') –
Set to ‘kmeans++’ to use k-means++ initialization (only supported for JumpHMM).
Set to ‘random’ for random initialization.
Set to ‘deterministic’ for deterministic initialization.
- is_fitted
Whether the model has been successfully fitted or not.
- Type
bool
- mu
Fitted means for each state
- Type
ndarray of shape (n_states,)
- std
Fitted std for each state
- Type
ndarray of shape (n_states,)
- tpm
Transition probability matrix between states
- Type
ndarray of shape (n_states, n_states)
- start_proba
Initial state distribution
- Type
ndarray of shape (n_states,)
- stationary_dist
Stationary distribution - requires model to be fitted.
- Type
ndarray of shape (n_states,)
- _get_stationary_dist(tpm)
Outputs the stationary distribution of the fitted model.
The stationary distribution corresponds to the left eigenvector of the transition probability matrix (equivalently, an eigenvector of its transpose) associated with the largest eigenvalue. Because each row of the TPM sums to 1, the largest eigenvalue is exactly 1 and no eigenvalue exceeds 1 in absolute value; the associated eigenvector can be taken to be real, although other eigenvalues may be complex.
Computed by taking the eigenvector corresponding to the largest eigenvalue and scaling it to sum to 1.
- Parameters
tpm (ndarray of shape (n_states, n_states)) – Transition probability matrix
- Returns
stationary_dist
- Return type
ndarray of shape (n_states,)
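A minimal NumPy sketch of this computation, not hmmpy's own code. The name `stationary_distribution` is hypothetical, and it assumes a row-stochastic TPM (`tpm[i, j]` is the probability of moving from state i to state j, so each row sums to 1). It selects the eigenvalue closest to 1 rather than the largest in modulus, which also behaves sensibly for periodic chains where -1 is an eigenvalue as well.

```python
import numpy as np


def stationary_distribution(tpm: np.ndarray) -> np.ndarray:
    """Stationary distribution pi of a row-stochastic transition matrix.

    pi satisfies pi @ tpm = pi, so it is an eigenvector of tpm.T with
    eigenvalue 1. Hypothetical helper, not part of hmmpy.
    """
    eigvals, eigvecs = np.linalg.eig(tpm.T)
    # Eigenvalue 1 is the largest eigenvalue of a stochastic matrix
    idx = np.argmin(np.abs(eigvals - 1.0))
    vec = np.real(eigvecs[:, idx])
    # Rescale to sum to 1 (this also removes an arbitrary sign flip)
    return vec / vec.sum()


tpm = np.array([[0.9, 0.1],
                [0.2, 0.8]])
pi = stationary_distribution(tpm)
# pi is approximately [2/3, 1/3]
```

Using `tpm.T` is the key step: `np.linalg.eig` returns right eigenvectors, while the stationary distribution is a left eigenvector of the TPM.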
- _init_params(X=None, diag_uniform_dist=(0.95, 0.99), output_hmm_params=True)
Initializes HMM parameters randomly using uniform distributions.
- Parameters
diag_uniform_dist (1D-array, default=(0.95, 0.99)) – Lower and upper bounds of the uniform distribution from which the diagonal entries of the initial transition probability matrix are sampled.
- _log_backward_proba()
Compute the log of backward probabilities \(\log\beta_t\).
Backward probabilities are the conditional probabilities of observing the future observations \(x_{t+1},...,x_{T}\) given the current state, i.e. \(\beta_t(i) = P(x_{t+1},...,x_{T} | s_t = i)\).
- Returns
log_betas – Array containing the log of backward probabilities \(\log\beta_t\) at each time step.
- Return type
ndarray of shape (n_samples, n_states)
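A minimal NumPy sketch of the backward recursion in log space, written as a standalone function. The name `log_backward` and its arguments are hypothetical; hmmpy's method takes no arguments and presumably reads the fitted parameters from the instance. It assumes `tpm[i, j]` is the probability of moving from state i to state j and that `log_emissions[t, j]` holds \(\log p(x_t | s_t = j)\). The recursion starts from \(\beta_T(i) = 1\) and applies \(\beta_t(i) = \sum_j \text{tpm}_{ij}\, p(x_{t+1} | s_{t+1} = j)\, \beta_{t+1}(j)\), with `np.logaddexp.reduce` acting as a numerically stable log-sum-exp.

```python
import numpy as np


def log_backward(log_emissions: np.ndarray, tpm: np.ndarray) -> np.ndarray:
    """Log backward probabilities, shape (n_samples, n_states).

    log_emissions[t, j] = log p(x_t | s_t = j)   (assumed input)
    tpm[i, j] = P(s_t = j | s_{t-1} = i)         (assumed convention)
    """
    n_samples, n_states = log_emissions.shape
    log_tpm = np.log(tpm)
    # beta_T(i) = 1 for every state, i.e. log beta_T = 0
    log_betas = np.zeros((n_samples, n_states))
    for t in range(n_samples - 2, -1, -1):
        # terms[i, j] = log tpm[i, j] + log p(x_{t+1} | j) + log beta_{t+1}(j)
        terms = log_tpm + log_emissions[t + 1] + log_betas[t + 1]
        # Log-sum-exp over the next state j
        log_betas[t] = np.logaddexp.reduce(terms, axis=1)
    return log_betas
```

As a sanity check, \(\sum_i P(s_1 = i)\, p(x_1 | s_1 = i)\, \beta_1(i)\) should equal the likelihood of the whole sequence.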
- _log_forward_proba()
Compute log forward probabilities \(\log\alpha_t\).
Forward probabilities are the joint probabilities \(\alpha_t(i) = P(s_t = i, x_1,...,x_t)\).
- Returns
llk (float) – Log-likelihood of the given HMM parameters
log_alphas (ndarray of shape (n_samples, n_states)) – Array containing the log of forward probabilities \(\log\alpha_t\) at each time step.
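A minimal NumPy sketch of the forward recursion in log space, returning the same pair as documented (`llk`, `log_alphas`). The function name and explicit arguments are hypothetical, and the conventions are assumptions: `tpm[i, j]` is the probability of moving from i to j, and `log_emissions[t, j]` is \(\log p(x_t | s_t = j)\). The log-likelihood falls out at the end as \(\log \sum_j \alpha_T(j)\).

```python
import numpy as np


def log_forward(log_emissions, tpm, start_proba):
    """Return (llk, log_alphas) for the forward recursion in log space.

    log_emissions[t, j] = log p(x_t | s_t = j)   (assumed input)
    tpm[i, j] = P(s_t = j | s_{t-1} = i)         (assumed convention)
    """
    n_samples, n_states = log_emissions.shape
    log_tpm = np.log(tpm)
    log_alphas = np.empty((n_samples, n_states))
    # alpha_1(j) = P(s_1 = j) * p(x_1 | s_1 = j)
    log_alphas[0] = np.log(start_proba) + log_emissions[0]
    for t in range(1, n_samples):
        # terms[i, j] = log alpha_{t-1}(i) + log tpm[i, j]
        terms = log_alphas[t - 1][:, None] + log_tpm
        # alpha_t(j) = p(x_t | j) * sum_i alpha_{t-1}(i) * tpm[i, j]
        log_alphas[t] = log_emissions[t] + np.logaddexp.reduce(terms, axis=0)
    # Likelihood P(x_1, ..., x_T) = sum_j alpha_T(j)
    llk = np.logaddexp.reduce(log_alphas[-1])
    return llk, log_alphas
```

Staying in log space avoids the underflow that plain products of probabilities hit on long time series.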
- bac_score(X, y_true, verbose=False)
Computes the balanced accuracy score when the true state sequence is known.
- Parameters
X (ndarray of shape (n_samples,)) – Time series of data
y_true (ndarray of shape (n_samples,)) – True states.
verbose (bool) – Verbose output.
- Returns
bac – Balanced accuracy score
- Return type
float
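Balanced accuracy is the average of the per-state recall (the share of time steps in each true state that are predicted correctly), so a rarely visited state weighs as much as a dominant one. A minimal sketch, assuming the states have already been decoded and use the same labelling as `y_true`; `balanced_accuracy` is a hypothetical name, and scikit-learn's `balanced_accuracy_score` computes the same quantity.

```python
import numpy as np


def balanced_accuracy(y_true, y_pred) -> float:
    """Mean per-state recall. Hypothetical helper, not hmmpy's bac_score."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    recalls = []
    for state in np.unique(y_true):
        mask = y_true == state
        # Share of time steps truly in `state` that were predicted as `state`
        recalls.append(np.mean(y_pred[mask] == state))
    return float(np.mean(recalls))


y_true = np.array([0, 0, 0, 0, 1, 1])
y_pred = np.array([0, 0, 0, 1, 1, 0])
print(balanced_accuracy(y_true, y_pred))  # 0.625 (recalls 0.75 and 0.5)
```

Plain accuracy on the same example would be 4/6, which hides that only half of the time steps in the rarer state 1 were recovered.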
- decode(X)
Output the most likely sequence of states given an observation sequence, using the Viterbi algorithm.
- Parameters
X (ndarray of shape (n_samples,)) – Time series of data
- Returns
state_preds – Predicted sequence of states with the same length as the input time series.
- Return type
ndarray of shape (n_samples,)
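A minimal log-space Viterbi sketch under assumed conventions (`tpm[i, j]` is the probability of moving from i to j; `log_emissions[t, j]` is \(\log p(x_t | s_t = j)\)). The function name and arguments are hypothetical; hmmpy's decode(X) takes only the data. Each step keeps, for every state, the score of the best path ending there plus a back-pointer to its predecessor, then backtracks from the best final state.

```python
import numpy as np


def viterbi(log_emissions, tpm, start_proba):
    """Most likely state path argmax_s P(s_1..s_T | x_1..x_T)."""
    n_samples, n_states = log_emissions.shape
    log_tpm = np.log(tpm)
    # delta[j]: log prob of the best path ending in state j at the current step
    delta = np.log(start_proba) + log_emissions[0]
    backptr = np.zeros((n_samples, n_states), dtype=int)
    for t in range(1, n_samples):
        scores = delta[:, None] + log_tpm        # scores[i, j]: best path to i, then i -> j
        backptr[t] = np.argmax(scores, axis=0)   # best predecessor of each state j
        delta = scores.max(axis=0) + log_emissions[t]
    # Backtrack from the best final state
    states = np.empty(n_samples, dtype=int)
    states[-1] = np.argmax(delta)
    for t in range(n_samples - 1, 0, -1):
        states[t - 1] = backptr[t, states[t]]
    return states
```

Unlike the forward recursion, which sums over predecessors, Viterbi takes the maximum, so the result is a single path rather than a distribution.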
- emission_probs(X)
Computes the state-conditional emission probabilities (densities) \(p(x_t | s_t = i)\) for an observation sequence X. Returns T x N arrays, where T = n_samples and N = n_states.
- Parameters
X (ndarray of shape (n_samples,)) – Time series of data
- Returns
probs (ndarray of shape (n_samples, n_states)) – Density of each observation under each state's distribution
log_probs (ndarray of shape (n_samples, n_states)) – Log density of each observation under each state's distribution
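For the univariate Gaussian states used in this class, the emission density of state i is \(\mathcal{N}(x_t; \mu_i, \sigma_i^2)\). The sketch below computes both arrays with broadcasting, evaluating the log density first; `gaussian_emission_probs` is a hypothetical name, and it takes `mu` and `std` explicitly rather than reading fitted attributes.

```python
import numpy as np


def gaussian_emission_probs(X, mu, std):
    """Return (probs, log_probs), each of shape (n_samples, n_states)."""
    X = np.asarray(X, dtype=float)[:, None]   # (T, 1) broadcasts against (N,)
    mu = np.asarray(mu, dtype=float)
    std = np.asarray(std, dtype=float)
    # log N(x; mu, std^2) = -0.5 * log(2 pi std^2) - (x - mu)^2 / (2 std^2)
    log_probs = -0.5 * np.log(2 * np.pi * std**2) - (X - mu) ** 2 / (2 * std**2)
    return np.exp(log_probs), log_probs


probs, log_probs = gaussian_emission_probs([0.0, 0.5, -1.0], mu=[0.0, 1.0], std=[1.0, 2.0])
print(probs.shape)  # (3, 2)
```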
- fit(X, get_hmm_params=True, sort_state_seq=True, verbose=False, feature_set='feature_set_2')
Fit the model to data. Implemented in the respective child classes MLEHMM and JumpHMM.
- fit_predict(X, n_preds=15, verbose=False)
Fit the model, then decode the states and predict state probabilities n_preds steps ahead.
Wraps .fit(), .decode() and .predict_proba() into one method.
- Parameters
X (ndarray of shape (n_samples,)) – Data to be fitted.
n_preds (int, default=15) – Number of time steps to look forward from the current time
- Returns
state_preds (ndarray of shape (n_samples,)) – Predicted sequence of states with the same length as the input time series.
posteriors (ndarray of shape (n_states, n_preds)) – Predicted state probabilities for each of the n_preds future time steps, as returned by .predict_proba().
- predict_proba(n_preds=15)
Compute the probability \(P(s_{T+h} = i | x_1,...,x_T)\), i.e. the probability of being in state i at future time step T+h given the observation sequence up until time T.
- Parameters
n_preds (int, default=15) – Number of time steps to look forward from the current time
- Returns
state_preds – Probability of being in each state i at time T+h, for h = 1,...,n_preds
- Return type
ndarray of shape (n_states, n_preds)
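Given the posterior over states at the terminal time T (for example the output of rolling_posteriors), the h-step-ahead probabilities follow from the Chapman-Kolmogorov equations: \(P(s_{T+h} = j | x_1,...,x_T) = \sum_i P(s_T = i | x_1,...,x_T)\,(\Gamma^h)_{ij}\), where \(\Gamma\) denotes the TPM. The sketch assumes a row-stochastic `tpm` and returns the documented (n_states, n_preds) layout; `predict_state_proba` is a hypothetical name.

```python
import numpy as np


def predict_state_proba(posterior_T, tpm, n_preds=15):
    """Column h-1 holds P(s_{T+h} = i | x_1..x_T) for h = 1..n_preds."""
    current = np.asarray(posterior_T, dtype=float)
    probs = np.empty((current.size, n_preds))
    for h in range(n_preds):
        # One step of the chain: P(s_{t+1}) = P(s_t) @ tpm
        current = current @ tpm
        probs[:, h] = current
    return probs


tpm = np.array([[0.9, 0.1],
                [0.2, 0.8]])
probs = predict_state_proba([1.0, 0.0], tpm, n_preds=3)
# probs[:, 0] is [0.9, 0.1] and probs[:, 1] is [0.83, 0.17];
# as h grows the columns approach the stationary distribution [2/3, 1/3]
```

Propagating one step at a time avoids forming each matrix power \(\Gamma^h\) separately.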
- rolling_posteriors(X)
Compute the posterior probability of being in state i at time T.
Intended for rolling estimation, when only the final smoothing probability of the current window is of interest.
- Parameters
X (ndarray of shape (n_samples,)) – Time series of data
- Returns
posterior – Posterior distribution over states at terminal time step T.
- Return type
ndarray of shape (n_states,)
- sample(n_samples, n_sequences=1, hmm_params=None)
Sample observations and hidden states from a fitted Hidden Markov Model.
See also hmmpy.sampler.SampleHMM for a full class supporting more sampling methods.
- Parameters
n_samples (int) – Number of samples to generate
n_sequences (int, default=1) – Number of independent sequences to sample, e.g. if n_samples=100 and n_sequences=3 then 3 different sequences of length 100 are sampled
hmm_params (dict, default=None) – HMM parameters to sample from. If None and the model is fitted, the fitted parameters are used. To set params manually, create a dict with ‘mu’, ‘std’, ‘tpm’ and ‘stationary distribution’ as keys and ndarrays as values.
- Returns
samples (ndarray of shape (n_samples,)) – Outputs the generated samples
sample_states (ndarray of shape (n_samples, n_sequences)) – Outputs sampled states
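A minimal sketch of sampling from a Gaussian HMM: draw a Markov chain of states from a start distribution and the TPM, then draw each observation from the Gaussian of its state. The function and its arguments are hypothetical; passing the stationary distribution as `start_proba` mirrors the ‘stationary distribution’ entry of hmm_params, but how hmmpy uses that entry is an assumption. Here both outputs are always shaped (n_samples, n_sequences).

```python
import numpy as np


def sample_gaussian_hmm(n_samples, mu, std, tpm, start_proba,
                        n_sequences=1, random_state=42):
    """Return (samples, states), both of shape (n_samples, n_sequences)."""
    rng = np.random.default_rng(random_state)
    mu = np.asarray(mu, dtype=float)
    std = np.asarray(std, dtype=float)
    tpm = np.asarray(tpm, dtype=float)
    n_states = mu.size
    states = np.empty((n_samples, n_sequences), dtype=int)
    for k in range(n_sequences):
        # Initial state drawn from the start (e.g. stationary) distribution
        states[0, k] = rng.choice(n_states, p=start_proba)
        for t in range(1, n_samples):
            # Next state drawn from the TPM row of the current state
            states[t, k] = rng.choice(n_states, p=tpm[states[t - 1, k]])
    # Each observation is Gaussian with its state's mean and std
    samples = rng.normal(mu[states], std[states])
    return samples, states


samples, states = sample_gaussian_hmm(
    100, mu=[0.1, -0.05], std=[0.5, 1.5],
    tpm=[[0.98, 0.02], [0.05, 0.95]],
    start_proba=[5 / 7, 2 / 7],   # stationary distribution of this TPM
    n_sequences=3)
```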
hmmpy.mle
- class hmmpy.mle.MLEHMM(n_states: int = 2, init: str = 'random', max_iter: int = 100, tol: float = 1e-06, epochs: int = 10, random_state: int = 42)
Class for training HMMs using the EM (Baum-Welch) algorithm.
- Parameters
n_states (int, default=2) – Number of hidden states
max_iter (int, default=100) – Maximum number of iterations to perform during expectation-maximization
tol (float, default=1e-6) – Criterion for early stopping
epochs (int, default=10) – Number of independent runs through the fit method. New initial parameters are used each time, and the epoch with the highest likelihood is chosen.
random_state (int, default=42) – Random seed, used to create reproducible results.
init (str, default='random') –
Set to ‘kmeans++’ to use k-means++ initialization (only supported for JumpHMM).
Set to ‘random’ for random initialization.
Set to ‘deterministic’ for deterministic initialization.
- is_fitted
Whether the model has been successfully fitted or not.
- Type
bool
- mu
Fitted means for each state
- Type
ndarray of shape (n_states,)
- std
Fitted std for each state
- Type
ndarray of shape (n_states,)
- tpm
Matrix of transition probabilities between states
- Type
ndarray of shape (n_states, n_states)
- start_proba
Initial state occupation distribution
- Type
ndarray of shape (n_states,)
- stationary_dist
Stationary distribution - requires model to be fitted.
- Type
ndarray of shape (n_states,)
- compute_log_posteriors(log_alphas, log_betas)
Log posterior probability of being in state j at time t given all observations, \(\log P(s_t = j | x_1,...,x_T)\).
- Parameters
log_alphas (ndarray of shape (n_samples, n_states)) – Array containing the log of forward probabilities \(\log\alpha_t\) at each time step.
log_betas (ndarray of shape (n_samples, n_states)) – Array containing the log of backward probabilities \(\log\beta_t\) at each time step.
- Returns
log_gamma – Log posterior probability of being in state j at time t given all observations, \(\log\gamma_t(j) = \log P(s_t = j | x_1,...,x_T)\)
- Return type
ndarray of shape (n_samples, n_states)
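Since \(\alpha_t(j)\beta_t(j) = P(s_t = j, x_1,...,x_T)\), the posterior is obtained by normalizing \(\log\alpha_t + \log\beta_t\) at each time step, and the normalizing constant is the log-likelihood. A minimal NumPy sketch (`log_posteriors` is a hypothetical name; the toy inputs are illustrative values, not output of hmmpy):

```python
import numpy as np


def log_posteriors(log_alphas, log_betas):
    """log gamma_t(j) = log alpha_t(j) + log beta_t(j) - log P(x_1..x_T)."""
    log_joint = log_alphas + log_betas
    # Normalize each row; the log-sum-exp over states is the log-likelihood
    return log_joint - np.logaddexp.reduce(log_joint, axis=1, keepdims=True)


# Toy values chosen so that sum_j alpha_t(j) beta_t(j) = 0.25 at every t
log_alphas = np.log([[0.2, 0.3], [0.05, 0.2]])
log_betas = np.log([[0.5, 0.5], [1.0, 1.0]])
gamma = np.exp(log_posteriors(log_alphas, log_betas))
# gamma is approximately [[0.4, 0.6], [0.2, 0.8]]
```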
- compute_log_xi(log_alphas, log_betas)
Log of the joint posterior probability of a transition from state i to state j, \(\xi_t(i,j) = P(s_{t-1} = i, s_t = j | x_1,...,x_T)\). Summed over t, these give the expected number of transitions from i to j.
- Parameters
log_alphas (ndarray of shape (n_samples, n_states)) – Array containing the log of forward probabilities \(\log\alpha_t\) at each time step.
log_betas (ndarray of shape (n_samples, n_states)) – Array containing the log of backward probabilities \(\log\beta_t\) at each time step.
- Returns
log_xi – Log of \(\xi_t(i,j) = P(s_{t-1} = i, s_t = j | x_1,...,x_T)\), the posterior probability of a transition from state i to state j
- Return type
ndarray of shape (n_samples, n_states)
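For reference, \(\xi_t(i,j) = \alpha_{t-1}(i)\,\text{tpm}_{ij}\,p(x_t | s_t = j)\,\beta_t(j) / P(x_1,...,x_T)\). The sketch below keeps one matrix per transition, with shape (n_samples - 1, n_states, n_states), and sums over time only when the expected counts are needed. It also takes the emissions and TPM explicitly, whereas the documented method receives only log_alphas and log_betas; all names are hypothetical and the array layout may differ from hmmpy's.

```python
import numpy as np


def log_xi(log_alphas, log_betas, log_emissions, tpm):
    """log xi_t(i, j) for t = 2..T, shape (n_samples - 1, n_states, n_states)."""
    log_lik = np.logaddexp.reduce(log_alphas[-1])                # log P(x_1..x_T)
    return (log_alphas[:-1, :, None]                             # log alpha_{t-1}(i)
            + np.log(tpm)[None, :, :]                            # log tpm[i, j]
            + (log_emissions[1:] + log_betas[1:])[:, None, :]    # log p(x_t | j) beta_t(j)
            - log_lik)


def expected_transitions(log_xi_values):
    """Expected number of i -> j transitions: sum of xi_t(i, j) over t."""
    return np.exp(log_xi_values).sum(axis=0)
```

Two identities make useful checks: each \(\xi_t\) sums to 1 over (i, j), and summing \(\xi_t(i,j)\) over j recovers \(\gamma_{t-1}(i)\).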
- fit(X: numpy.ndarray, sort_state_seq=True, verbose=False)
Perform the full EM algorithm.
Alternates between the E-step and the M-step until convergence (see tol) or until max_iter iterations are reached, to find the maximum likelihood model parameters.
- Parameters
X (ndarray of shape (n_samples,)) – Time series of data
sort_state_seq (bool, default=True) – Sort predicted states according to their variance, with the low-variance state at the first index position.
verbose (bool, default=False) – If True, print extra information while fitting.
- Returns
Derives the optimal model parameters, which are stored in the fitted attributes (mu, std, tpm, start_proba, stationary_dist).
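For a univariate Gaussian HMM the M-step has closed-form updates given the E-step quantities \(\gamma_t(j)\) and \(\xi_t(i,j)\): the start distribution is \(\gamma_1\), each TPM row is the expected number of transitions out of state i divided by the expected number of visits to i, and the means and standard deviations are \(\gamma\)-weighted moments. This is a sketch of the textbook Baum-Welch updates, not necessarily hmmpy's exact code; `m_step` and the xi layout (n_samples - 1, n_states, n_states) are assumptions.

```python
import numpy as np


def m_step(X, gamma, xi):
    """One Baum-Welch M-step for a univariate Gaussian HMM.

    gamma[t, j]  = P(s_t = j | x_1..x_T),              shape (T, N)
    xi[t, i, j]  = P(s_t = i, s_{t+1} = j | x_1..x_T), shape (T - 1, N, N)
    """
    X = np.asarray(X, dtype=float)
    start_proba = gamma[0]
    # Expected i -> j transitions / expected visits to i before the last step
    tpm = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
    weights = gamma.sum(axis=0)
    mu = (gamma * X[:, None]).sum(axis=0) / weights
    std = np.sqrt((gamma * (X[:, None] - mu) ** 2).sum(axis=0) / weights)
    return start_proba, tpm, mu, std


# With hard (0/1) posteriors the updates reduce to per-state counts and moments
X = np.array([1.0, 3.0, 10.0, 14.0])
gamma = np.eye(2)[[0, 0, 1, 1]]
xi = np.einsum("ti,tj->tij", gamma[:-1], gamma[1:])
start_proba, tpm, mu, std = m_step(X, gamma, xi)
# mu is [2, 12], std is [1, 2], tpm is [[0.5, 0.5], [0, 1]]
```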
hmmpy.jump
- class hmmpy.jump.JumpHMM(n_states: int = 2, jump_penalty: float = 16, window_len: tuple = (6, 14), init: str = 'kmeans++', max_iter: int = 30, tol: int = 1e-06, epochs: int = 10, random_state: int = 42)
Class for training HMMs using jump estimation.
- Parameters
n_states (int, default=2) – Number of hidden states
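jump_penalty (float, default=16) – Penalty added to the objective each time the state sequence changes state; larger values give more persistent state sequences.
window_len (tuple, default=(6, 14)) – Number and size of rolling window lengths used when constructing features.
max_iter (int, default=30) – Maximum number of iterations of the alternating fitting loop (state centers and state sequence)
tol (float, default=1e-6) – Criterion for early stopping
epochs (int, default=10) – Number of independent runs through the fit method. New initial parameters are used each time, and the epoch with the highest likelihood is chosen.
random_state (int, default=42) – Random seed, used to create reproducible results.
init (str, default='kmeans++') –
Set to ‘kmeans++’ to use k-means++ initialization (only supported for JumpHMM).
Set to ‘random’ for random initialization.
Set to ‘deterministic’ for deterministic initialization.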
- is_fitted
Whether the model has been successfully fitted or not.
- Type
bool
- mu
Fitted means for each state
- Type
ndarray of shape (n_states,)
- std
Fitted std for each state
- Type
ndarray of shape (n_states,)
- tpm
Matrix of transition probabilities between states
- Type
ndarray of shape (n_states, n_states)
- start_proba
Initial state occupation distribution
- Type
ndarray of shape (n_states,)
- stationary_dist
Stationary distribution - requires model to be fitted.
- Type
ndarray of shape (n_states,)
- _check_state_sort(X, state_sequence)
Checks whether the low-variance state is the first state.
Otherwise sorts the state predictions accordingly.
- Parameters
X (ndarray of shape (n_samples,)) – Data to be fitted.
state_sequence (ndarray of shape (n_samples,)) – Predicted state sequence.
- Returns
state_sequence – Predicted state sequence sorted in correct order.
- Return type
ndarray of shape (n_samples,)
- _fit(X, Z, verbose=False)
Container for the loop in which the fitting happens.
- _fit_state_seq(Z, theta)
Fit a state sequence based on the current theta estimates.
Used in jump model fitting. Uses a dynamic programming technique very similar to the Viterbi algorithm.
- Parameters
Z (ndarray of shape (n_samples, n_features)) – Set of standardized time series features
theta (ndarray of shape (n_features, n_states)) – Jump model parameters, i.e. the state (cluster) centers.
- Returns
state_seq (ndarray of shape (n_samples,)) – Current estimate of state sequence
objective_score (float) – Objective score under the model
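Under the usual jump model objective, assumed here, the state sequence minimizes \(\sum_t ||z_t - \theta_{s_t}||_2^2 + \lambda \sum_{t>1} \mathbb{1}[s_{t-1} \neq s_t]\), where \(\lambda\) is the jump penalty. The dynamic program below mirrors Viterbi, with minimum cost in place of maximum probability. `fit_state_seq` and its `jump_penalty` argument are hypothetical stand-ins for the method and the class's jump_penalty setting; theta uses the documented (n_features, n_states) layout.

```python
import numpy as np


def fit_state_seq(Z, theta, jump_penalty=16.0):
    """Return (state_seq, objective) minimizing squared loss + jump penalty."""
    # loss[t, j] = ||z_t - theta_j||^2, shape (n_samples, n_states)
    loss = ((Z[:, :, None] - theta[None, :, :]) ** 2).sum(axis=1)
    n_samples, n_states = loss.shape
    # Cost of moving from state i to state j: lambda if i != j, else 0
    penalty = jump_penalty * (1.0 - np.eye(n_states))
    value = loss[0].copy()   # value[j]: best cost of a path ending in j
    backptr = np.zeros((n_samples, n_states), dtype=int)
    for t in range(1, n_samples):
        cand = value[:, None] + penalty          # cand[i, j]: in i at t-1, j at t
        backptr[t] = np.argmin(cand, axis=0)
        value = loss[t] + cand.min(axis=0)
    # Backtrack from the cheapest final state
    state_seq = np.empty(n_samples, dtype=int)
    state_seq[-1] = np.argmin(value)
    for t in range(n_samples - 1, 0, -1):
        state_seq[t - 1] = backptr[t, state_seq[t]]
    return state_seq, float(value.min())


Z = np.array([[0.0], [0.0], [0.0], [5.0], [0.0], [0.0]])
theta = np.array([[0.0, 5.0]])   # one feature, two state centers
seq, obj = fit_state_seq(Z, theta, jump_penalty=16.0)
# all zeros: the single outlier costs 25, less than two jumps (32)
seq0, obj0 = fit_state_seq(Z, theta, jump_penalty=0.0)
# without a penalty the outlier gets its own state: [0, 0, 0, 1, 0, 0]
```

The example shows the role of the penalty: a larger \(\lambda\) trades a worse fit at isolated points for fewer state switches.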
- _fit_theta(Z, state_seq)
Fit theta, i.e. minimize the squared \(\ell_2\) norm within each latent state.
The analytical solution in state \(j\) is \(\theta_j = \frac{1}{N_j} \sum_{t: s_t = j} z_t\), where \(N_j\) is the number of time steps assigned to state \(j\).
- Parameters
Z (ndarray of shape (n_samples, n_features)) – Set of standardized time series features
state_seq (ndarray of shape (n_samples,)) – Current estimate of hidden state sequence
- Returns
theta – State (cluster) centers.
- Return type
ndarray of shape (n_features, n_states)
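The closed-form update is a per-state mean of the feature vectors. A minimal sketch using the documented (n_features, n_states) layout; `fit_theta` and its `theta_prev` argument are hypothetical, and keeping the previous center for a state with no assigned time steps is an assumption (hmmpy may handle empty states differently).

```python
import numpy as np


def fit_theta(Z, state_seq, theta_prev):
    """theta_j = mean of z_t over the time steps with s_t = j."""
    theta = np.array(theta_prev, dtype=float, copy=True)
    for j in range(theta.shape[1]):
        members = Z[state_seq == j]
        # Assumption: an empty state keeps its previous center
        if len(members) > 0:
            theta[:, j] = members.mean(axis=0)
    return theta


Z = np.array([[1.0, 2.0], [3.0, 4.0], [10.0, 10.0]])
state_seq = np.array([0, 0, 1])
theta = fit_theta(Z, state_seq, theta_prev=np.zeros((2, 3)))
# theta[:, 0] is [2, 3], theta[:, 1] is [10, 10], theta[:, 2] stays [0, 0]
```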
- _init_params(Z, X=None, diag_uniform_dist=(0.95, 0.99), output_hmm_params=False)
Initializes HMM parameters randomly using uniform distributions, or using kmeans++.
The method is selected by the init argument when instantiating the class.
- Parameters
diag_uniform_dist (1D-array, default=(0.95, 0.99)) – Lower and upper bounds of the uniform distribution from which the diagonal entries of the initial transition probability matrix are sampled.
- _l2_norm_squared(z, theta)
Compute the squared \(\ell_2\) norm at each time step.
Squared \(\ell_2\) norm is computed as \(||z_t - \theta_{s_t} ||_2^2\).
- Parameters
z (ndarray of shape (n_samples, n_features)) – Data to be fitted
theta (ndarray of shape (n_features, n_states)) – Jump model parameters (state centers)
- Returns
norms – Squared \(\ell_2\) norms \(||z_t - \theta_j||_2^2\) for every time step t and every candidate state j.
- Return type
ndarray of shape (n_samples, n_states)
- construct_features(X: numpy.ndarray, window_len: tuple, feature_set='feature_set_3')
Construct a number of standardized time series features from data X.
- Parameters
X (ndarray of shape (n_samples,)) – Time series of data
window_len (tuple, default=(6,14)) – Number and size of rolling window lengths used when constructing features.
feature_set (str, default='feature_set_3') –
‘feature_set_1’ - basic features such as mean, std, min/max etc.
‘feature_set_2’ - technical features such as Bollinger bands
‘feature_set_3’ - feature set 1 and feature set 2 combined
- Returns
Z – Standardized time series features in a 2-D array.
- Return type
ndarray of shape (n_samples-window_len, n_features)
- fit(X, get_hmm_params=True, sort_state_seq=True, verbose=False, feature_set='feature_set_3')
Fit the jump model.
Alternates between fitting the cluster centers \(\theta\) and fitting the state sequence until the convergence criterion is met.
- Parameters
X (ndarray of shape (n_samples,)) – Time series of data
get_hmm_params (bool, default=True) – If True, compute the HMM parameters (mu, std, tpm and stationary_dist) once converged.
sort_state_seq (bool, default=True) – Sort predicted states according to their variance, with the low-variance state at the first index position.
verbose (bool, default=False) – If True, print extra information while fitting.
- Returns
Derives the optimal model parameters.
- get_params_from_seq(X, state_sequence)
Stores and outputs the model parameters based on the input state sequence.
- Parameters
X (ndarray of shape (n_samples,)) – Time series of data
state_sequence (ndarray of shape (n_samples,)) – State sequence for a given observation sequence
- Returns
HMM parameters estimated from the state sequence.
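Given a decoded state sequence, HMM parameters can be estimated directly: per-state sample moments of X for mu and std, and normalized transition counts for the TPM. A minimal sketch; the function name, the use of the population standard deviation (ddof=0), and the requirement that each state is visited at least once before the final time step are assumptions rather than documented hmmpy behavior.

```python
import numpy as np


def params_from_seq(X, state_seq, n_states):
    """Estimate (mu, std, tpm) from a hard state assignment."""
    X = np.asarray(X, dtype=float)
    state_seq = np.asarray(state_seq)
    # Per-state sample moments (population std, ddof=0)
    mu = np.array([X[state_seq == j].mean() for j in range(n_states)])
    std = np.array([X[state_seq == j].std() for j in range(n_states)])
    # counts[i, j] = number of observed i -> j transitions
    counts = np.zeros((n_states, n_states))
    np.add.at(counts, (state_seq[:-1], state_seq[1:]), 1)
    # Normalize each row into transition probabilities
    tpm = counts / counts.sum(axis=1, keepdims=True)
    return mu, std, tpm


mu, std, tpm = params_from_seq([1.0, 3.0, 10.0, 14.0], [0, 0, 1, 1], n_states=2)
# mu is [2, 12], std is [1, 2], tpm is [[0.5, 0.5], [0, 1]]
```

`np.add.at` is used instead of `counts[i, j] += 1` with index arrays because it accumulates repeated index pairs correctly.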
hmmpy.sampler
- class hmmpy.sampler.SampleHMM(n_states=2, frequency='daily', hmm_params=None, random_state=42)
Class for sampling from an HMM with user-specified parameters.
- Parameters
n_states (int, default=2) – Number of hidden states
hmm_params (dict, default=None) – HMM parameters to sample from. To set params, create a dict with ‘mu’, ‘std’ and ‘tpm’ as keys and their values as lists or numpy arrays.
random_state (int, default=42) – Random seed, used to create reproducible output.
- mu
Means to sample from
- Type
ndarray of shape (n_states,)
- std
Standard deviations to sample from
- Type
ndarray of shape (n_states,)
- tpm
Transition probability matrix between states
- Type
ndarray of shape (n_states, n_states)
- sample(n_samples, n_sequences=1)
Sample observations and hidden states from the HMM defined by the user parameters.
- Parameters
n_samples (int) – Number of samples to generate
n_sequences (int, default=1) – Number of independent sequences to sample, e.g. if n_samples=100 and n_sequences=3 then 3 different sequences of length 100 are sampled
- Returns
samples (ndarray of shape (n_samples, n_sequences)) – Generated samples
sample_states (ndarray of shape (n_samples, n_sequences)) – Outputs sampled states
- sample_t(n_samples, n_sequences=1, dof=5)
Sample observations and hidden states from the HMM, using conditional Student's t-distributions instead of Gaussians.
- Parameters
n_samples (int) – Number of samples to generate
n_sequences (int, default=1) – Number of independent sequences to sample, e.g. if n_samples=100 and n_sequences=3 then 3 different sequences of length 100 are sampled
dof (int, default=5) – Degrees of freedom in the conditional t-distributions.
- Returns
samples (ndarray of shape (n_samples, n_sequences)) – Generated samples
sample_states (ndarray of shape (n_samples, n_sequences)) – Outputs sampled states
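One common way to obtain conditional t-distributions is a location-scale t: \(x_t = \mu_{s_t} + \sigma_{s_t} T_t\) with \(T_t \sim t_{\text{dof}}\). The sketch below applies this to an already sampled state path. Whether hmmpy scales by std in exactly this way is an assumption: this scaling gives variance \(\sigma^2\,\text{dof}/(\text{dof}-2)\) rather than \(\sigma^2\), so an implementation may instead rescale by \(\sqrt{(\text{dof}-2)/\text{dof}}\). `t_emissions` is a hypothetical name.

```python
import numpy as np


def t_emissions(states, mu, std, dof=5, random_state=42):
    """Draw x_t = mu[s_t] + std[s_t] * T_t with T_t ~ Student-t(dof)."""
    rng = np.random.default_rng(random_state)
    states = np.asarray(states)
    mu = np.asarray(mu, dtype=float)
    std = np.asarray(std, dtype=float)
    # Location-scale t: heavier tails than a Gaussian with the same std
    return mu[states] + std[states] * rng.standard_t(dof, size=states.shape)


states = np.array([[0, 0], [0, 1], [1, 1]])   # (n_samples, n_sequences)
samples = t_emissions(states, mu=[0.1, -0.05], std=[0.5, 1.5], dof=5)
```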