Regression Metrics
Functional API
SeqMetrics also provides a functional API for all the performance metrics.
- SeqMetrics.acc(true, predicted, treat_arrays: bool = True, **treat_arrays_kws) → float
Anomaly correlation coefficient. See Langland et al., 2012; Miyakoda et al., 1972 and Murphy et al., 1989.
\[ACC = \frac{\sum_{i=1}^{N} \left( (\text{predicted}_i - \overline{\text{predicted}})(\text{true}_i - \overline{\text{true}}) \right)}{(N-1) \cdot \sigma_{\text{true}} \cdot \sigma_{\text{predicted}}}\]
- Parameters:
true – true/observed/actual/target values. It must be a numpy array, or pandas series/DataFrame or a list.
predicted – simulated values
treat_arrays – process the true and predicted arrays using the maybe_treat_arrays function
treat_arrays_kws – Additional keyword arguments to be passed to the SeqMetrics.utils.treat_arrays() function.
Examples
>>> import numpy as np
>>> from SeqMetrics import acc
>>> t = np.random.random(10)
>>> p = np.random.random(10)
>>> acc(t, p)
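The ACC formula can be traced numerically. Below is a minimal sketch that assumes σ is the sample standard deviation (ddof=1). The helper `acc_sketch` is hypothetical and not part of SeqMetrics. Under that assumption the (N-1) factor cancels against the 1/(N-1) inside both sample variances, so the formula as written reduces to the Pearson correlation of the two series.

```python
import numpy as np

def acc_sketch(true, predicted):
    """ACC as written in the formula above (hypothetical helper, not the library code)."""
    t = np.asarray(true, dtype=float)
    p = np.asarray(predicted, dtype=float)
    n = t.size
    # anomalies are taken with respect to each series' own mean
    num = np.sum((p - p.mean()) * (t - t.mean()))
    # assumption: sigma is the sample standard deviation (ddof=1)
    den = (n - 1) * t.std(ddof=1) * p.std(ddof=1)
    return num / den

rng = np.random.default_rng(0)
t = rng.random(10)
p = rng.random(10)
print(acc_sketch(t, p), np.corrcoef(t, p)[0, 1])  # the two values agree
```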
- SeqMetrics.adjusted_r2(true, predicted, treat_arrays: bool = True, **treat_arrays_kws) → float
Adjusted R squared, also known as the Ezekiel estimate.
\[\text{Adjusted } R^2 = 1 - \left( \frac{(1 - R^2) \cdot (n - 1)}{n - k - 1} \right)\]
where n is the number of observations and k = 1 is the number of predictors.
- Parameters:
true – true/observed/actual/target values. It must be a numpy array, or pandas series/DataFrame or a list.
predicted – simulated values
treat_arrays – process the true and predicted arrays using the maybe_treat_arrays function
Examples
>>> import numpy as np
>>> from SeqMetrics import adjusted_r2
>>> t = np.random.random(10)
>>> p = np.random.random(10)
>>> adjusted_r2(t, p)
- SeqMetrics.agreement_index(true, predicted, treat_arrays: bool = True, **treat_arrays_kws) → float
Agreement Index (d) developed by Willmott, 1981.
It detects additive and proportional differences in the observed and simulated means and variances (Moriasi et al., 2015). Because the differences are squared, it is overly sensitive to extreme values. It can also be used as a substitute for R² to identify the degree to which model predictions are error-free.
\[d = 1 - \frac{\sum_{i=1}^{N}(e_{i} - s_{i})^2}{\sum_{i=1}^{N}(\left | s_{i} - \bar{e} \right | + \left | e_{i} - \bar{e} \right |)^2}\]
where e denotes the true (observed) values, s the predicted (simulated) values and \(\bar{e}\) the mean of the true values.
- Parameters:
true – true/observed/actual/target values. It must be a numpy array, or pandas series/DataFrame or a list.
predicted – simulated values
treat_arrays – process the true and predicted arrays using the maybe_treat_arrays function
treat_arrays_kws – Additional keyword arguments to be passed to the SeqMetrics.utils.treat_arrays() function.
Examples
>>> import numpy as np
>>> from SeqMetrics import agreement_index
>>> t = np.random.random(10)
>>> p = np.random.random(10)
>>> agreement_index(t, p)
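Willmott's d can be written directly in NumPy. This minimal sketch uses a hypothetical helper name and omits the array treatment that treat_arrays performs, so it is not the SeqMetrics implementation. In the toy data, a constant offset of 1 lowers d from 1 to about 0.84 (4 / 25 subtracted from 1).

```python
import numpy as np

def agreement_index_sketch(true, predicted):
    """Willmott's index of agreement d, following the formula above (hypothetical helper)."""
    e = np.asarray(true, dtype=float)       # observed values
    s = np.asarray(predicted, dtype=float)  # simulated values
    e_bar = e.mean()
    num = np.sum((e - s) ** 2)
    den = np.sum((np.abs(s - e_bar) + np.abs(e - e_bar)) ** 2)
    return 1.0 - num / den

t = np.array([1.0, 2.0, 3.0, 4.0])
print(agreement_index_sketch(t, t))        # perfect agreement gives 1
print(agreement_index_sketch(t, t + 1.0))  # a constant offset lowers d to about 0.84
```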
- SeqMetrics.aitchison(true, predicted, treat_arrays: bool = True, center='mean', **treat_arrays_kws) → float
Aitchison distance as used in Zhang et al., 2020.
\[d_{\text{Aitchison}} = \sqrt{\sum_{i=1}^{n} \left( \log(\text{true}_i) - \text{center}(\log(\text{true})) - \left(\log(\text{predicted}_i) - \text{center}(\log(\text{predicted}))\right) \right)^2}\]
- Parameters:
true – true/observed/actual/target values. It must be a numpy array, or pandas series/DataFrame or a list.
predicted – simulated values
treat_arrays – process the true and predicted arrays using the maybe_treat_arrays function
center – statistic used to center the log-transformed values, i.e. center(·) in the formula above; by default 'mean'.
treat_arrays_kws – Additional keyword arguments to be passed to the SeqMetrics.utils.treat_arrays() function.
Examples
>>> import numpy as np
>>> from SeqMetrics import aitchison
>>> t = np.random.random(10)
>>> p = np.random.random(10)
>>> aitchison(t, p)
- SeqMetrics.aic(true, predicted, treat_arrays: bool = True, p: int = 1, **treat_arrays_kws) → float
Akaike Information Criterion. It estimates the relative quality of a model for a given input. By comparing the AIC of different models, we can identify the model that best explains the data. It penalizes models with more parameters, which discourages overfitting and excessive model complexity. When comparing multiple models, the one with the lowest value is generally preferred. When the sample size is small, AIC can be biased. Modified from this source.
\[AIC = n \cdot \ln\left(\frac{\sum_{i=1}^{n} (\text{true}_i - \text{predicted}_i)^2}{n}\right) + 2p\]
- Parameters:
true – true/observed/actual/target values. It must be a numpy array, or pandas series/DataFrame or a list.
predicted – simulated values
treat_arrays – process the true and predicted arrays using the maybe_treat_arrays function
p (int) – number of parameters in the model
treat_arrays_kws – Additional keyword arguments to be passed to the SeqMetrics.utils.treat_arrays() function.
Examples
>>> import numpy as np
>>> from SeqMetrics import aic
>>> t = np.random.random(10)
>>> p = np.random.random(10)
>>> aic(t, p)
- SeqMetrics.amemiya_pred_criterion(true, predicted, treat_arrays: bool = True, **treat_arrays_kws) → float
Amemiya’s Prediction Criterion.
\[\text{APC} = \left( \frac{n + k}{n - k} \right) \left( \frac{1}{n} \sum_{i=1}^{n} (\text{true}_i - \text{predicted}_i)^2 \right)\]
where n is the number of observations and k is the number of model parameters.
- Parameters:
true – true/observed/actual/target values. It must be a numpy array, or pandas series/DataFrame or a list.
predicted – simulated values
treat_arrays – process the true and predicted arrays using the maybe_treat_arrays function
Examples
>>> import numpy as np
>>> from SeqMetrics import amemiya_pred_criterion
>>> t = np.random.random(10)
>>> p = np.random.random(10)
>>> amemiya_pred_criterion(t, p)
- SeqMetrics.amemiya_adj_r2(true, predicted, treat_arrays: bool = True, **treat_arrays_kws) → float
Amemiya’s adjusted R squared.
\[R^2_{\text{adj, Amemiya}} = 1 - \left( \frac{(1 - R^2) \cdot (n + k)}{n - k - 1} \right)\]
- Parameters:
true – true/observed/actual/target values. It must be a numpy array, or pandas series/DataFrame or a list.
predicted – simulated values
treat_arrays – process the true and predicted arrays using the maybe_treat_arrays function
treat_arrays_kws – Additional keyword arguments to be passed to the SeqMetrics.utils.treat_arrays() function.
Examples
>>> import numpy as np
>>> from SeqMetrics import amemiya_adj_r2
>>> t = np.random.random(10)
>>> p = np.random.random(10)
>>> amemiya_adj_r2(t, p)
- SeqMetrics.bias(true, predicted, treat_arrays: bool = True, **treat_arrays_kws) → float
Bias as given by Gupta et al., 1998 in Table 1. It is also called the mean error.
\[Bias=\frac{1}{N}\sum_{i=1}^{N}(True_{i}-Predicted_{i})\]
- Parameters:
true – true/observed/actual/measured/target values. It must be a numpy array, or pandas series/DataFrame or a list.
predicted – simulated values
treat_arrays – process the true and predicted arrays using the maybe_treat_arrays function
treat_arrays_kws – Additional keyword arguments to be passed to the SeqMetrics.utils.treat_arrays() function.
Examples
>>> import numpy as np
>>> from SeqMetrics import bias
>>> t = np.random.random(10)
>>> p = np.random.random(10)
>>> bias(t, p)
...
>>> bias([1.1, 2.2, 3.3], [11.1, 12.2, 13.3])
-10.0
- SeqMetrics.bic(true, predicted, treat_arrays: bool = True, p=1, **treat_arrays_kws) → float
Bayesian Information Criterion
Minimising the BIC is intended to give the best model. The BIC penalises each parameter by ln(n) rather than the 2 used by the AIC, and ln(n) > 2 once n ≥ 8. For such sample sizes the model chosen by the BIC is therefore either the same as the one chosen by the AIC, or one with fewer terms. Modified after RegscorePy.
\[BIC = n \cdot \ln\left(\frac{\text{SSE}}{n}\right) + p \cdot \ln(n)\]
where SSE is the sum of squared errors between true and predicted values.
- Parameters:
true – true/observed/actual/target values. It must be a numpy array, or pandas series/DataFrame or a list.
predicted – simulated values
treat_arrays – process the true and predicted arrays using the maybe_treat_arrays function
p – number of parameters in the model
treat_arrays_kws – Additional keyword arguments to be passed to the SeqMetrics.utils.treat_arrays() function.
Examples
>>> import numpy as np
>>> from SeqMetrics import bic
>>> t = np.random.random(10)
>>> p = np.random.random(10)
>>> bic(t, p)
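Here is a concrete look at the penalty comparison. The goodness-of-fit term n·ln(SSE/n) is the same in both criteria, so one extra parameter raises the AIC by exactly 2 and the BIC by ln(n). The minimal sketch below copies the formulas from the aic and bic entries into hypothetical helpers (not the SeqMetrics code). With n = 50, the BIC step is ln(50), about 3.91.

```python
import numpy as np

def aic_sketch(true, predicted, p=1):
    """AIC as in the aic formula (hypothetical helper)."""
    t = np.asarray(true, dtype=float)
    y = np.asarray(predicted, dtype=float)
    n = t.size
    sse = np.sum((t - y) ** 2)
    return n * np.log(sse / n) + 2 * p

def bic_sketch(true, predicted, p=1):
    """BIC as in the bic formula (hypothetical helper)."""
    t = np.asarray(true, dtype=float)
    y = np.asarray(predicted, dtype=float)
    n = t.size
    sse = np.sum((t - y) ** 2)
    return n * np.log(sse / n) + p * np.log(n)

rng = np.random.default_rng(42)
t = rng.random(50)
y = t + rng.normal(scale=0.1, size=50)

# Cost of one extra parameter: 2 for AIC, ln(50) for BIC
aic_step = aic_sketch(t, y, p=2) - aic_sketch(t, y, p=1)
bic_step = bic_sketch(t, y, p=2) - bic_sketch(t, y, p=1)
print(aic_step, bic_step)
```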
- SeqMetrics.brier_score(true, predicted, treat_arrays: bool = True, **treat_arrays_kws) → float
Adapted from SkillMetrics. This function calculates the Brier score (BS), which measures the mean squared error of probability forecasts for a dichotomous (two-category) event, such as the occurrence or non-occurrence of precipitation. The score is calculated using the formula:
\[BS = \frac{1}{N}\sum_{n=1}^{N}(f_n - o_n)^2\]
where f are the forecast probabilities (predicted), o are the observed outcomes (true, 0 or 1), and N is the total number of values in f and o. Note that f and o must have the same number of values, and those values must lie in the range [0, 1].
- Returns:
BS : Brier score
- Return type:
float
References
D. S. Wilks, 1995: Statistical Methods in the Atmospheric Sciences. Cambridge Press. 547 pp
- Parameters:
true – true/observed/actual/target values. It must be a numpy array, or pandas series/DataFrame or a list.
predicted – simulated values
treat_arrays – process the true and predicted arrays using the maybe_treat_arrays function
treat_arrays_kws – Additional keyword arguments to be passed to the SeqMetrics.utils.treat_arrays() function.
Examples
>>> import numpy as np
>>> from SeqMetrics import brier_score
>>> t = np.random.random(10)
>>> p = np.random.random(10)
>>> brier_score(t, p)
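The example above passes random, non-binary values as `true`. The definition, however, expects 0/1 outcomes. Below is a minimal sketch with binary outcomes, using a hypothetical helper. The input checks mirror the stated requirements and are an assumption, not necessarily what SeqMetrics enforces.

```python
import numpy as np

def brier_score_sketch(true, predicted):
    """Mean squared difference between forecast probabilities and 0/1 outcomes (hypothetical helper)."""
    o = np.asarray(true, dtype=float)       # observed outcomes, 0 or 1
    f = np.asarray(predicted, dtype=float)  # forecast probabilities in [0, 1]
    if o.shape != f.shape:
        raise ValueError("true and predicted must have the same number of values")
    if np.any((f < 0) | (f > 1)) or np.any((o < 0) | (o > 1)):
        raise ValueError("values must lie in [0, 1]")
    return np.mean((f - o) ** 2)

outcomes = np.array([1, 0, 0])         # rain, no rain, no rain
forecasts = np.array([0.9, 0.1, 0.8])  # forecast probability of rain
print(brier_score_sketch(outcomes, forecasts))  # (0.01 + 0.01 + 0.64) / 3 = 0.22
```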
- SeqMetrics.centered_rms_dev(true, predicted, treat_arrays: bool = True, **treat_arrays_kws) → float
Modified after SkillMetrics. Calculates the centered root-mean-square (RMS) difference between true and predicted values using the formula:
\[CRMSD = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left( (p_i - \text{mean}(p)) - (r_i - \text{mean}(r)) \right)^2}\]
where p are the predicted values, r are the true values, and N is the total number of values in p and r.
Output: CRMSDIFF : centered root-mean-square (RMS) difference E', i.e. the square root of (E')^2.
- Parameters:
true – true/observed/actual/target values. It must be a numpy array, or pandas series/DataFrame or a list.
predicted – simulated values
treat_arrays – process the true and predicted arrays using the maybe_treat_arrays function
treat_arrays_kws – Additional keyword arguments to be passed to the SeqMetrics.utils.treat_arrays() function.
Examples
>>> import numpy as np
>>> from SeqMetrics import centered_rms_dev
>>> t = np.random.random(10)
>>> p = np.random.random(10)
>>> centered_rms_dev(t, p)
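Centering removes the mean offset. As a result, the squared centered difference equals the squared RMS difference minus the squared difference of the means, which is the relation behind Taylor diagrams. The minimal NumPy sketch below checks this on synthetic data. The helper name is hypothetical.

```python
import numpy as np

def centered_rms_dev_sketch(true, predicted):
    """Centered RMS difference E' (hypothetical helper)."""
    r = np.asarray(true, dtype=float)
    p = np.asarray(predicted, dtype=float)
    # remove each series' mean before comparing
    return np.sqrt(np.mean(((p - p.mean()) - (r - r.mean())) ** 2))

rng = np.random.default_rng(1)
r = rng.random(20)
p = r + 0.5 + rng.normal(scale=0.05, size=20)  # constant offset plus small noise

crmsd = centered_rms_dev_sketch(r, p)
rmsd = np.sqrt(np.mean((p - r) ** 2))
offset = p.mean() - r.mean()
# E'^2 = RMSD^2 - offset^2: centering strips exactly the mean offset
print(crmsd ** 2, rmsd ** 2 - offset ** 2)
```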
- SeqMetrics.coeff_of_persistence(true, predicted, lag: int = 1, treat_arrays: bool = True, **treat_arrays_kws) → float
Coefficient of Persistence as introduced by Kitanidis and Bras. It varies from -inf to 1; higher values are better.
- Parameters:
true – True/observed/actual/target values. It must be a numpy array, pandas series/DataFrame, or a list.
predicted – Predicted values, same format as ‘true’.
lag – The lag for the baseline
treat_arrays – process the true and predicted arrays using the maybe_treat_arrays function
See https://rdrr.io/cran/hydroGOF/man/cp.html.
Examples
>>> import numpy as np
>>> from SeqMetrics import coeff_of_persistence
>>> t = np.random.random(100)
>>> p = np.random.random(100)
>>> coeff_of_persistence(t, p)
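This entry gives no formula. Following the hydroGOF definition linked above, CP compares the model's squared errors with those of a persistence baseline. That baseline predicts each value from the observation `lag` steps earlier:

CP = 1 - Σ(t_i - p_i)² / Σ(t_i - t_{i-lag})², summed over i > lag.

The sketch below generalizes hydroGOF's lag-1 definition to an arbitrary lag. That generalization is an assumption, not the SeqMetrics code, and the helper name is hypothetical.

```python
import numpy as np

def coeff_of_persistence_sketch(true, predicted, lag=1):
    """Coefficient of persistence against a lag-step persistence baseline (hypothetical helper)."""
    t = np.asarray(true, dtype=float)
    p = np.asarray(predicted, dtype=float)
    model_err = np.sum((t[lag:] - p[lag:]) ** 2)      # squared errors of the model
    baseline_err = np.sum((t[lag:] - t[:-lag]) ** 2)  # squared errors of "next value equals previous"
    return 1.0 - model_err / baseline_err

rng = np.random.default_rng(3)
t = rng.random(100)
persistence = np.concatenate([t[:1], t[:-1]])  # predicts t[i-1] at step i
print(coeff_of_persistence_sketch(t, t))            # 1.0: perfect model
print(coeff_of_persistence_sketch(t, persistence))  # 0.0: no better than persistence
```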
- SeqMetrics.corr_coeff(true, predicted, treat_arrays: bool = True, **treat_arrays_kws) → float
Pearson correlation coefficient as proposed by Pearson, 1895. It measures the linear correlation between the true and predicted arrays and is sensitive to outliers. The following equation is taken from Jiang et al., 2022.
\[r = \frac{\sum ^n _{i=1}(predicted_i - \bar{predicted})(true_i - \bar{true})}{\sqrt{\sum ^n _{i=1}(predicted_i - \bar{predicted})^2} \sqrt{\sum ^n _{i=1}(true_i - \bar{true})^2}}\]
where n is the length of the true/predicted arrays.
- Parameters:
true – true/observed/actual/target values. It must be a numpy array, or pandas series/DataFrame or a list.
predicted – simulated values
treat_arrays – process the true and predicted arrays using the maybe_treat_arrays function
treat_arrays_kws – Additional keyword arguments to be passed to the SeqMetrics.utils.treat_arrays() function.
Examples
>>> import numpy as np
>>> from SeqMetrics import corr_coeff
>>> t = np.random.random(10)
>>> p = np.random.random(10)
>>> corr_coeff(t, p)
- SeqMetrics.covariance(true, predicted, treat_arrays: bool = True, **treat_arrays_kws) → float
Covariance as defined in Eq. 3 at MathWorld. A positive covariance means that the true and predicted values tend to increase or decrease together.
\[Covariance = \frac{1}{N} \sum_{i=1}^{N}((true_{i} - \bar{true}) \cdot (predicted_{i} - \bar{predicted}))\]
The bar represents the mean of the array.
- Parameters:
true – true/observed/actual/target values. It must be a numpy array, or pandas series/DataFrame or a list.
predicted – simulated values
treat_arrays – process the true and predicted arrays using the maybe_treat_arrays function
treat_arrays_kws – Additional keyword arguments to be passed to the SeqMetrics.utils.treat_arrays() function.
Examples
>>> import numpy as np
>>> from SeqMetrics import covariance
>>> t = np.random.random(10)
>>> p = np.random.random(10)
>>> covariance(t, p)
- SeqMetrics.concordance_corr_coef(true, predicted, treat_arrays: bool = True, **treat_arrays_kws) → float
Concordance Correlation Coefficient (CCC) taken from this paper.
\[CCC = \frac{2 \rho \sigma_{true} \sigma_{predicted}}{\sigma_{true}^2 + \sigma_{predicted}^2 + (\bar{true} - \bar{predicted})^2}\]
where \(\rho\) is the Pearson correlation coefficient between true and predicted, \(\sigma\) are the standard deviations and the bars denote means.
- Parameters:
true – true/observed/actual/target values. It must be a numpy array, or pandas series/DataFrame or a list.
predicted – simulated values
treat_arrays – process the true and predicted arrays using the maybe_treat_arrays function
Examples
>>> import numpy as np
>>> from SeqMetrics import concordance_corr_coef
>>> t = np.random.random(10)
>>> p = np.random.random(10)
>>> concordance_corr_coef(t, p)
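Unlike Pearson's r, CCC also penalizes systematic offsets and scale differences. The minimal sketch below assumes σ are population standard deviations (ddof=0) and uses a hypothetical helper name. Shifting a series by a constant leaves r at 1 but drops CCC to 2·1·2 / (2 + 2 + 1) = 0.8 in this toy case.

```python
import numpy as np

def ccc_sketch(true, predicted):
    """Lin's concordance correlation coefficient from the formula above (hypothetical helper)."""
    t = np.asarray(true, dtype=float)
    p = np.asarray(predicted, dtype=float)
    rho = np.corrcoef(t, p)[0, 1]  # Pearson correlation
    s_t, s_p = t.std(), p.std()    # assumption: population standard deviations (ddof=0)
    return 2 * rho * s_t * s_p / (s_t ** 2 + s_p ** 2 + (t.mean() - p.mean()) ** 2)

t = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
shifted = t + 1.0
print(np.corrcoef(t, shifted)[0, 1])  # Pearson r is 1: the shift is invisible to it
print(ccc_sketch(t, shifted))         # about 0.8: the shift breaks agreement
```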
- SeqMetrics.cosine_similarity(true, predicted, treat_arrays: bool = True, **treat_arrays_kws) → float
Cosine similarity. It is a judgment of orientation and not magnitude: two vectors with the same orientation have a cosine similarity of 1, two vectors oriented at 90° relative to each other have a similarity of 0, and two vectors diametrically opposed have a similarity of -1, independent of their magnitude.
\[\text{Cosine Similarity} = \frac{\sum_{i=1}^{n} \text{true}_i \cdot \text{predicted}_i}{\sqrt{\sum_{i=1}^{n} (\text{true}_i)^2} \cdot \sqrt{\sum_{i=1}^{n} (\text{predicted}_i)^2}}\]
References
https://scikit-learn.org/stable/modules/generated/sklearn.metrics.pairwise.cosine_similarity.html
- Parameters:
true – true/observed/actual/target values. It must be a numpy array, or pandas series/DataFrame or a list.
predicted – simulated values
treat_arrays – process the true and predicted arrays using the maybe_treat_arrays function
Examples
>>> import numpy as np
>>> from SeqMetrics import cosine_similarity
>>> t = np.random.random(10)
>>> p = np.random.random(10)
>>> cosine_similarity(t, p)
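- SeqMetrics.critical_success_index(true, predicted, treat_arrays: bool = True, threshold=0.5, **treat_arrays_kws) → float
Critical Success Index (CSI), also known as the threat score.
\[CSI = \frac{TP}{TP + FN + FP}\]
where TP (hits), FN (misses) and FP (false alarms) are counted after binarizing both arrays with threshold.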
- Parameters:
true – True/observed/actual/target values. It should be a binary array (0s and 1s), or a continuous array where values are binarized using a threshold.
predicted – Predicted values, same format as ‘true’.
treat_arrays – process the true and predicted arrays using the maybe_treat_arrays function
threshold – Threshold for binarizing continuous values (if applicable).
Examples
>>> import numpy as np
>>> from SeqMetrics import critical_success_index
>>> t = np.array([0.4, 0.1, 0.1, 0.3, 0.7, 0.1])
>>> p = np.array([0.8, 0.11, 0.5, 0.1, 0.1, 0.1])
>>> critical_success_index(t, p)
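CSI ignores correct negatives: only hits, misses and false alarms enter the ratio. The minimal sketch below counts them after binarizing. Whether a value equal to `threshold` counts as an event is an assumption here (`>=`), and the helper name is hypothetical.

```python
import numpy as np

def csi_sketch(true, predicted, threshold=0.5):
    """Critical success index from binarized arrays (hypothetical helper)."""
    # assumption: a value counts as an event when it is >= threshold
    t = np.asarray(true, dtype=float) >= threshold
    p = np.asarray(predicted, dtype=float) >= threshold
    tp = np.sum(t & p)   # hits
    fn = np.sum(t & ~p)  # misses
    fp = np.sum(~t & p)  # false alarms
    return tp / (tp + fn + fp)

t = np.array([1, 1, 0, 0, 1])
p = np.array([1, 0, 1, 0, 1])
print(csi_sketch(t, p))  # 2 hits, 1 miss, 1 false alarm -> 2 / 4 = 0.5
```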
- SeqMetrics.cronbach_alpha(true, predicted, treat_arrays: bool = True, **treat_arrays_kws) → float
It is a measure of the internal consistency of data, following Cheung and Yip, 2005. See the UCLA and Stack Overflow pages for more info.
\[\alpha = \frac{N}{N - 1} \left(1 - \frac{\sum_{i=1}^{N} \sigma^2_{i}}{\sigma^2_{\text{total}}}\right)\]
- Parameters:
true – true/observed/actual/target values. It must be a numpy array, or pandas series/DataFrame or a list.
predicted – simulated values
treat_arrays – process the true and predicted arrays using the maybe_treat_arrays function
treat_arrays_kws – Additional keyword arguments to be passed to the SeqMetrics.utils.treat_arrays() function.
Examples
>>> import numpy as np
>>> from SeqMetrics import cronbach_alpha
>>> t = np.random.random(10)
>>> p = np.random.random(10)
>>> cronbach_alpha(t, p)
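- SeqMetrics.decomposed_mse(true, predicted, treat_arrays: bool = True, **treat_arrays_kws) → float
Decomposed MSE developed by Kobayashi and Salam (2000), Equation 24.
\[dMSE = \left(\frac{1}{N}\sum_{i=1}^{N}(e_{i}-s_{i})\right)^2 + SDSD + LCS\]
\[SDSD = (\sigma(e) - \sigma(s))^2\]
\[LCS = 2 \sigma(e) \sigma(s) \cdot \left(1 - \frac{\sum ^n _{i=1}(e_i - \bar{e})(s_i - \bar{s})}{\sqrt{\sum ^n _{i=1}(e_i - \bar{e})^2} \sqrt{\sum ^n _{i=1}(s_i - \bar{s})^2}}\right)\]
where e denotes the true values and s the predicted values. The first term is the squared bias, SDSD is the squared difference between the standard deviations, and LCS is the lack of positive correlation weighted by the standard deviations.
- Parameters: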
true – true/observed/actual/target values. It must be a numpy array, or pandas series/DataFrame or a list.
predicted – simulated values
treat_arrays – process the true and predicted arrays using the maybe_treat_arrays function
Examples
>>> import numpy as np
>>> from SeqMetrics import decomposed_mse
>>> t = np.random.random(10)
>>> p = np.random.random(10)
>>> decomposed_mse(t, p)
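With population standard deviations, the three components add up exactly to the mean squared error. The reason: MSE = (mean difference)² + variance of the differences, and that variance expands to (σ_e - σ_s)² + 2σ_eσ_s(1 - r). The minimal sketch below checks this numerically. The helper name is hypothetical, and ddof=0 is assumed.

```python
import numpy as np

def decomposed_mse_sketch(true, predicted):
    """Return (SB, SDSD, LCS) following Kobayashi and Salam (2000) (hypothetical helper)."""
    e = np.asarray(true, dtype=float)
    s = np.asarray(predicted, dtype=float)
    sb = np.mean(e - s) ** 2         # squared bias
    sd_e, sd_s = e.std(), s.std()    # assumption: population std (ddof=0)
    r = np.corrcoef(e, s)[0, 1]
    sdsd = (sd_e - sd_s) ** 2        # squared difference of standard deviations
    lcs = 2 * sd_e * sd_s * (1 - r)  # lack of positive correlation
    return sb, sdsd, lcs

rng = np.random.default_rng(7)
t = rng.random(30)
p = 0.8 * t + 0.1 + rng.normal(scale=0.05, size=30)
sb, sdsd, lcs = decomposed_mse_sketch(t, p)
mse = np.mean((t - p) ** 2)
print(sb + sdsd + lcs, mse)  # the three components add up to the MSE
```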
- SeqMetrics.euclid_distance(true, predicted, treat_arrays: bool = True, **treat_arrays_kws) → float
Euclidean distance taken from Elementary Differential Geometry by Barrett O’Neill.
\[D = \sqrt{\sum_{i=1}^{n} (\text{true}_i - \text{predicted}_i)^2}\]
- Parameters:
true – true/observed/actual/measured/target values. It must be a numpy array, or pandas series/DataFrame or a list.
predicted – simulated values
treat_arrays – process the true and predicted arrays using the maybe_treat_arrays function
Examples
>>> import numpy as np
>>> from SeqMetrics import euclid_distance
>>> t = np.random.random(10)
>>> p = np.random.random(10)
>>> euclid_distance(t, p)
- SeqMetrics.exp_var_score(true, predicted, treat_arrays: bool = True, weights=None, **treat_arrays_kws) → float | None
Explained variance score. The best value is 1; lower values are less accurate.
\[\text{EVS} = 1 - \frac{\sum_{i=1}^{n} w_i \left( (true_i - predicted_i) - \frac{\sum_{j=1}^{n} w_j (true_j - predicted_j)}{\sum_{j=1}^{n} w_j} \right)^2}{\sum_{i=1}^{n} w_i (true_i - \frac{\sum_{j=1}^{n} w_j true_j}{\sum_{j=1}^{n} w_j})^2}\]
- Parameters:
true – true/observed/actual/target values. It must be a numpy array, or pandas series/DataFrame or a list.
predicted – simulated values
treat_arrays – process the true and predicted arrays using the maybe_treat_arrays function
weights – optional sample weights w_i used in the formula above; by default None.
Examples
>>> import numpy as np
>>> from SeqMetrics import exp_var_score
>>> t = np.random.random(10)
>>> p = np.random.random(10)
>>> exp_var_score(t, p)
- SeqMetrics.expanded_uncertainty(true, predicted, treat_arrays: bool = True, cov_fact=1.96, **treat_arrays_kws) → float
By default, it calculates the uncertainty with a 95% confidence interval: 1.96 is the coverage factor corresponding to a 95% confidence level. This indicator is used to show more information about the model deviation. The formula follows Behar et al., 2015 and Gueymard et al., 2014.
\[U = \text{cov\_fact} \times \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} \left( \left(\text{true}_i - \text{predicted}_i\right) - \overline{\left(\text{true} - \text{predicted}\right)} \right)^2 + \frac{1}{n} \sum_{i=1}^{n} \left(\text{true}_i - \text{predicted}_i\right)^2}\]
- Parameters:
true – true/observed/actual/target values. It must be a numpy array, or pandas series/DataFrame or a list.
predicted – simulated values
treat_arrays – process the true and predicted arrays using the maybe_treat_arrays function
cov_fact – coverage factor; the default 1.96 corresponds to a 95% confidence level.
Examples
>>> import numpy as np
>>> from SeqMetrics import expanded_uncertainty
>>> t = np.random.random(10)
>>> p = np.random.random(10)
>>> expanded_uncertainty(t, p)
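The formula is the coverage factor times sqrt(SD² + RMSE²) of the errors. SD uses the 1/(n-1) factor and RMSE the 1/n factor. Below is a minimal sketch with a hypothetical helper name. When every prediction is off by the same amount, SD vanishes and U reduces to cov_fact times that offset.

```python
import numpy as np

def expanded_uncertainty_sketch(true, predicted, cov_fact=1.96):
    """U = cov_fact * sqrt(SD^2 + RMSE^2) of the errors (hypothetical helper)."""
    d = np.asarray(true, dtype=float) - np.asarray(predicted, dtype=float)
    sd2 = np.var(d, ddof=1)  # sample variance of the errors, 1/(n-1) as in the formula
    rmse2 = np.mean(d ** 2)  # mean squared error, 1/n as in the formula
    return cov_fact * np.sqrt(sd2 + rmse2)

t = np.arange(5, dtype=float)
p = t - 2.0  # every prediction is off by exactly 2
print(expanded_uncertainty_sketch(t, p))  # SD = 0, RMSE = 2 -> 1.96 * 2 = 3.92
```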
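- SeqMetrics.fdc_fhv(true, predicted, treat_arrays: bool = True, h: float = 0.02, **treat_arrays_kws) → float
Peak flow bias of the flow duration curve (Yilmaz et al., 2008) as used in Kratzert et al., 2019. The code is modified after Kratzert et al. (2018).
\[FHV = \frac{\sum_{i=1}^{k} (predicted_i - true_i)}{\sum_{i=1}^{k} true_i} \times 100\]
where the sums run over the k largest values of each array (each sorted independently), i.e. the top h fraction of the flows.
- Parameters:
true – true/observed/actual/target values. It must be a numpy array, or pandas series/DataFrame or a list.
predicted – simulated values
treat_arrays – process the true and predicted arrays using the maybe_treat_arrays function
h – fraction of the highest flows treated as peak flows, by default 0.02
treat_arrays_kws – Additional keyword arguments to be passed to the SeqMetrics.utils.treat_arrays() function.
- Returns:
Bias of the peak flows
Examples
>>> import numpy as np
>>> from SeqMetrics import fdc_fhv
>>> t = np.random.random(10)
>>> p = np.random.random(10)
>>> fdc_fhv(t, p)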
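The two arrays are sorted independently. FHV therefore compares the upper ends of the two flow duration curves rather than matched time steps. Below is a minimal sketch with a hypothetical helper name, not the SeqMetrics code. With the default h = 0.02, the top fraction of a 10-value array rounds to zero values. This sketch raises an error in that case; how the library handles it is not covered here.

```python
import numpy as np

def fdc_fhv_sketch(true, predicted, h=0.02):
    """Percent bias of the top-h fraction of the flow duration curve (hypothetical helper)."""
    obs = np.sort(np.asarray(true, dtype=float))[::-1]  # descending order: flow duration curve
    sim = np.sort(np.asarray(predicted, dtype=float))[::-1]
    k = int(np.round(h * obs.size))  # number of peak flows considered
    if k == 0:
        raise ValueError("h * len(true) rounds to 0; use more data or a larger h")
    obs_peak, sim_peak = obs[:k], sim[:k]
    return np.sum(sim_peak - obs_peak) / np.sum(obs_peak) * 100

t = np.arange(1, 101, dtype=float)
p = 1.1 * t  # every flow overestimated by 10 %
print(fdc_fhv_sketch(t, p, h=0.1))  # about 10 (percent)
```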
- SeqMetrics.fdc_flv(true, predicted, treat_arrays: bool = True, low_flow: float = 0.3, **treat_arrays_kws) → float
Bias of the low flows (by default the bottom 30% of the flows), as used in Kratzert et al., 2019.
\[\text{FLV} = -1 \times \frac{\sum (\log(\text{predicted}) - \min(\log(\text{predicted}))) - \sum (\log(\text{true}) - \min(\log(\text{true})))}{\sum (\log(\text{true}) - \min(\log(\text{true}))) + 1 \times 10^{-6}}\]
- Parameters:
low_flow (float, optional) – Upper limit of the flow duration curve. E.g. 0.3 means the bottom 30% of the flows are considered as low flows, by default 0.3
true – true/observed/actual/target values. It must be a numpy array, or pandas series/DataFrame or a list.
predicted – simulated values
treat_arrays – process the true and predicted arrays using the maybe_treat_arrays function
- Return type:
float
Examples
>>> import numpy as np
>>> from SeqMetrics import fdc_flv
>>> t = np.random.random(10)
>>> p = np.random.random(10)
>>> fdc_flv(t, p)
- SeqMetrics.gmae(true, predicted, treat_arrays: bool = True, **treat_arrays_kws) → float
Geometric Mean Absolute Error.
\[GMAE = \left( \prod_{i=1}^{n} \left| \text{true}_i - \text{predicted}_i \right| \right)^{\frac{1}{n}}\]
- Parameters:
true – true/observed/actual/target values. It must be a numpy array, or pandas series/DataFrame or a list.
predicted – simulated values
treat_arrays – process the true and predicted arrays using the maybe_treat_arrays function
treat_arrays_kws – Additional keyword arguments to be passed to the SeqMetrics.utils.treat_arrays() function.
Examples
>>> import numpy as np
>>> from SeqMetrics import gmae
>>> t = np.random.random(10)
>>> p = np.random.random(10)
>>> gmae(t, p)
- SeqMetrics.gmean_diff(true, predicted, treat_arrays: bool = True, **treat_arrays_kws) → float
Geometric mean difference: the geometric means of the true and predicted arrays are computed first, and then their difference is taken.
\[\text{gmean\_diff} = \left( \prod_{i=1}^{n} \text{true}_i \right)^{\frac{1}{n}} - \left( \prod_{i=1}^{n} \text{predicted}_i \right)^{\frac{1}{n}}\]
- Parameters:
true – true/observed/actual/target values. It must be a numpy array, or pandas series/DataFrame or a list.
predicted – simulated values
treat_arrays – process the true and predicted arrays using the maybe_treat_arrays function
Examples
>>> import numpy as np
>>> from SeqMetrics import gmean_diff
>>> t = np.random.random(10)
>>> p = np.random.random(10)
>>> gmean_diff(t, p)
- SeqMetrics.gmrae(true, predicted, treat_arrays: bool = True, benchmark: ndarray | None = None, **treat_arrays_kws) → float
Geometric Mean Relative Absolute Error
\[GMRAE = \left( \prod_{i=1}^{n} \frac{|true_i - predicted_i|}{|true_i - benchmark_i|} \right)^{\frac{1}{n}}\]
- Parameters:
true – true/observed/actual/target values. It must be a numpy array, or pandas series/DataFrame or a list.
predicted – simulated values
treat_arrays – process the true and predicted arrays using the maybe_treat_arrays function
benchmark – optional array of benchmark predictions (benchmark_i in the formula above); by default None.
Examples
>>> import numpy as np
>>> from SeqMetrics import gmrae
>>> t = np.random.random(10)
>>> p = np.random.random(10)
>>> gmrae(t, p)
- SeqMetrics.inrse(true, predicted, treat_arrays: bool = True, **treat_arrays_kws) → float
Integral Normalized Root Squared Error
\[IN\text{-}RSE = \sqrt{\frac{\sum_{i=1}^{n} (\text{true}_i - \text{predicted}_i)^2}{\sum_{i=1}^{n} (\text{true}_i - \overline{\text{true}})^2}}\]
- Parameters:
true – true/observed/actual/target values. It must be a numpy array, or pandas series/DataFrame or a list.
predicted – simulated values
treat_arrays – process the true and predicted arrays using the maybe_treat_arrays function
treat_arrays_kws – Additional keyword arguments to be passed to the SeqMetrics.utils.treat_arrays() function.
Examples
>>> import numpy as np
>>> from SeqMetrics import inrse
>>> t = np.random.random(10)
>>> p = np.random.random(10)
>>> inrse(t, p)
- SeqMetrics.irmse(true, predicted, treat_arrays: bool = True, **treat_arrays_kws) → float
Inertial RMSE: the RMSE divided by the standard deviation of the gradient (first differences) of true.
\[\text{IRMSE} = \frac{\sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( \text{true}_i - \text{predicted}_i \right)^2}}{\sqrt{\frac{1}{n-2} \sum_{i=1}^{n-1} \left( (\text{true}_{i+1} - \text{true}_i) - \overline{(\text{true}_{i+1} - \text{true}_i)} \right)^2}}\]
- Parameters:
true – true/observed/actual/target values. It must be a numpy array, or pandas series/DataFrame or a list.
predicted – simulated values
treat_arrays – process the true and predicted arrays using the maybe_treat_arrays function
treat_arrays_kws – Additional keyword arguments to be passed to the SeqMetrics.utils.treat_arrays() function.
Examples
>>> import numpy as np
>>> from SeqMetrics import irmse
>>> t = np.random.random(10)
>>> p = np.random.random(10)
>>> irmse(t, p)
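Normalizing by the variability of step-to-step changes makes the score comparable across series with different dynamics. The sketch below uses a hypothetical helper name and follows the formula directly. There are n - 1 differences, so a sample standard deviation with ddof=1 supplies the 1/(n - 2) factor.

```python
import numpy as np

def irmse_sketch(true, predicted):
    """RMSE divided by the sample std of the first differences of true (hypothetical helper)."""
    t = np.asarray(true, dtype=float)
    p = np.asarray(predicted, dtype=float)
    rmse = np.sqrt(np.mean((t - p) ** 2))
    # n - 1 differences with ddof=1 gives the 1/(n-2) factor in the formula
    grad_sd = np.std(np.diff(t), ddof=1)
    return rmse / grad_sd

t = np.array([0.0, 1.0, 3.0, 6.0])  # differences 1, 2, 3 -> sample std 1
p = t + 1.0                         # RMSE 1
print(irmse_sketch(t, p))           # 1.0
```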
- SeqMetrics.JS(true, predicted, treat_arrays: bool = True, **treat_arrays_kws) → float
Jensen-Shannon divergence.
\[JS(P \parallel Q) = \frac{1}{2} \sum_{i} \left( P(i) \log_2 \left( \frac{2P(i)}{P(i) + Q(i)} \right) + Q(i) \log_2 \left( \frac{2Q(i)}{P(i) + Q(i)} \right) \right)\]
- Parameters:
true – true/observed/actual/target values. It must be a numpy array, or pandas series/DataFrame or a list.
predicted – simulated values
treat_arrays – process the true and predicted arrays using the maybe_treat_arrays function
Examples
>>> import numpy as np
>>> from SeqMetrics import JS
>>> t = np.random.random(10)
>>> p = np.random.random(10)
>>> JS(t, p)
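The formula treats P and Q as probability distributions, but the random arrays in the example above are not normalized. The minimal sketch below normalizes both inputs and treats 0·log 0 as 0. Normalizing is an assumption that may differ from the library, and the helper name is hypothetical. With base-2 logarithms the divergence lies between 0 and 1.

```python
import numpy as np

def js_sketch(true, predicted):
    """Jensen-Shannon divergence in bits (hypothetical helper)."""
    P = np.asarray(true, dtype=float)
    Q = np.asarray(predicted, dtype=float)
    # assumption: inputs are normalized so that each sums to 1
    P = P / P.sum()
    Q = Q / Q.sum()
    M = 0.5 * (P + Q)

    def kl_bits(a, b):
        mask = a > 0  # terms with a_i = 0 contribute 0 by convention
        return np.sum(a[mask] * np.log2(a[mask] / b[mask]))

    # P*log2(2P/(P+Q)) equals P*log2(P/M), so this matches the formula above
    return 0.5 * (kl_bits(P, M) + kl_bits(Q, M))

print(js_sketch([0.5, 0.5], [0.5, 0.5]))  # 0.0: identical distributions
print(js_sketch([1.0, 0.0], [0.0, 1.0]))  # 1.0: disjoint supports, the maximum in base 2
```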
- SeqMetrics.kge(true, predicted, treat_arrays: bool = True, return_all: bool = False, **treat_arrays_kws) → float | ndarray
Kling-Gupta Efficiency following Gupta et al. 2009. This metric considers correlation (r), variability (\(\alpha\)) and the mean difference/error, also called bias (\(\beta\)). KGE values range from -infinity to 1; the higher, the better. KGE values above -0.41 mean that the simulated/predicted values (by the model) are better than the mean of the observed data (Knoben et al., 2019).
\[\text{KGE} = 1 - \sqrt{(r - 1)^2 + (\alpha - 1)^2 + (\beta - 1)^2}\]
\[\alpha = \frac{\sigma_{\text{predicted}}}{\sigma_{\text{true}}}\]
\[\beta = \frac{\mu_{\text{predicted}}}{\mu_{\text{true}}}\]
Please note that bias (\(\beta\)) is not the same as the SeqMetrics.bias() method. The term \(\sqrt{(r - 1)^2 + (\alpha - 1)^2 + (\beta - 1)^2}\) is also called the Euclidean distance (ED) from the ideal point (1, 1, 1), which means KGE can also be defined as:
\[\text{KGE} = 1 - ED\]
In these equations, r is the Pearson correlation coefficient between true and predicted values:
\[r = \frac{\sum_{i=1}^{N} ( \text{true}_i - \bar{\text{true}} ) ( \text{predicted}_i - \bar{\text{predicted}} )}{\sqrt{\sum_{i=1}^{N} ( \text{true}_i - \bar{\text{true}} )^2} \sqrt{\sum_{i=1}^{N} ( \text{predicted}_i - \bar{\text{predicted}} )^2}}\]
- Parameters:
true – true/observed/actual/target values. It must be a numpy array, or pandas series/DataFrame or a list.
predicted – simulated values
treat_arrays – process the true and predicted arrays using the SeqMetrics.utils.treat_arrays() function
return_all – If True, also return the components r, \(\alpha\) and \(\beta\) (see Returns). Otherwise, only kge is returned.
treat_arrays_kws – Additional keyword arguments to be passed to the SeqMetrics.utils.treat_arrays() function.
- Returns:
If return_all is True, it returns a numpy array of shape (4, ) containing kge, correlation (r), variability (\(\alpha\)) and bias (\(\beta\)). Otherwise, it returns the kge score.
- Return type:
float | ndarray
Examples
>>> import numpy as np
>>> from SeqMetrics import kge
>>> t = np.random.random(10)
>>> p = np.random.random(10)
>>> kge(t, p)
>>> kge_score, r, alpha, beta = kge(t, p, return_all=True)
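Here is a minimal NumPy sketch of the three components and the resulting score. The helper name is hypothetical; the sketch is not the SeqMetrics implementation and skips array treatment. Doubling the true series keeps r = 1 but doubles both α and β, giving KGE = 1 - sqrt(2).

```python
import numpy as np

def kge_sketch(true, predicted):
    """Return (kge, r, alpha, beta) following Gupta et al. (2009) (hypothetical helper)."""
    t = np.asarray(true, dtype=float)
    p = np.asarray(predicted, dtype=float)
    r = np.corrcoef(t, p)[0, 1]  # linear correlation
    alpha = p.std() / t.std()    # variability ratio (ddof cancels in the ratio)
    beta = p.mean() / t.mean()   # bias ratio
    ed = np.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)
    return 1 - ed, r, alpha, beta

t = np.array([1.0, 2.0, 3.0, 4.0])
print(kge_sketch(t, t)[0])       # perfect simulation: 1 up to rounding
score, r, alpha, beta = kge_sketch(t, 2 * t)
print(r, alpha, beta, score)     # r ≈ 1, alpha = 2, beta = 2, KGE ≈ -0.414
```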
- SeqMetrics.kge_bound(true, predicted, treat_arrays: bool = True, **treat_arrays_kws) → float
Mathevet et al. 2006 proposed a bounded version of NSE, because the original NSE lacks a lower bound and thus has a skewed distribution when calculated for a large number of basins. To avoid this skew and make the statistic vary between -1 and +1, they proposed a bounded version of NSE. The same concept is applied here to KGE. According to the authors, this bounded version of the statistic is less optimistic for positive values.
\[\text{KGE}_{\text{bound}} = \frac{\text{KGE}}{2 - \text{KGE}}\]
- Parameters:
true – true/observed/actual/target values. It must be a numpy array, or pandas series/DataFrame or a list.
predicted – simulated values
treat_arrays – process the true and predicted arrays using the maybe_treat_arrays function
treat_arrays_kws – Additional keyword arguments to be passed to the SeqMetrics.utils.treat_arrays() function.
Examples
>>> import numpy as np
>>> from SeqMetrics import kge_bound
>>> t = np.random.random(10)
>>> p = np.random.random(10)
>>> kge_bound(t, p)
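The transformation KGE / (2 - KGE) keeps 1 at 1 and 0 at 0. As KGE tends to -infinity, the ratio tends to -1, which is where the lower bound comes from. The transformation also compresses positive values (0.5 becomes 1/3); this is the "less optimistic" effect described above. Here is a short sketch with a hypothetical helper name:

```python
import numpy as np

def bound(score):
    """Map a score in (-inf, 1] to (-1, 1] via score / (2 - score) (hypothetical helper)."""
    score = np.asarray(score, dtype=float)
    return score / (2.0 - score)

kge_values = np.array([1.0, 0.5, 0.0, -0.41, -10.0, -1e6])
print(bound(kge_values))
# 1 -> 1, 0.5 -> 1/3, 0 -> 0; very negative scores approach -1 but never reach it
```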
- SeqMetrics.kge_mod(true, predicted, treat_arrays: bool = True, return_all=False, **treat_arrays_kws) → float | ndarray
Modified Kling-Gupta Efficiency after Kling et al. 2012. Like the original KGE, its values range from -infinity to 1; the higher, the better.
This version of KGE was introduced to avoid cross-correlation between bias and variability, which occurs when the precipitation data is biased. This is done by calculating the variability (\(\alpha\)) as \({CV}_s/{CV}_o\) instead of \({\sigma}_s/{\sigma}_o\), where CV is the coefficient of variation, defined as the ratio of the standard deviation to the mean (\({\sigma}/{\mu}\)).
\[\text{KGE}' = 1 - \sqrt{(r - 1)^2 + (\alpha - 1)^2 + (\beta - 1)^2}\]
- Parameters:
true – true/observed/actual/target values. It must be a numpy.array, pandas.DataFrame, pandas.Series, a python list, or any object which has a __len__ method.
predicted – simulated values
treat_arrays – process the true and predicted arrays using the maybe_treat_arrays function
return_all – If True, also return the components of the score (see Returns). Otherwise, only kge is returned; by default False.
treat_arrays_kws – Additional keyword arguments to be passed to the SeqMetrics.utils.treat_arrays() function.
- Returns:
If return_all is True, it returns a numpy array of shape (4, ) containing kge, \(\gamma\), \(\alpha\) and \(\beta\). Otherwise, it returns kge.
- Return type:
float | ndarray
Examples
>>> import numpy as np
>>> from SeqMetrics import kge_mod
>>> t = np.random.random(10)
>>> p = np.random.random(10)
>>> kge_mod(t, p)
- SeqMetrics.kge_np(true, predicted, treat_arrays: bool = True, return_all=False, **treat_arrays_kws) → float | ndarray
Non-parametric Kling-Gupta Efficiency after Pool et al. 2018.
This differs from the original KGE by using non-parametric versions of its components, i.e. the variability \(\alpha\) and the correlation cc. The variability (\(\alpha\)) is made non-parametric by using the FDCs of the true and predicted values. The FDCs are normalized to remove the volume information. It also differs from the original KGE by using Spearman’s rank correlation instead of Pearson’s correlation coefficient.
\[\text{KGE}_{\text{np}} = 1 - \sqrt{(cc - 1)^2 + (\alpha - 1)^2 + (\beta - 1)^2}\]
\[cc = \rho(\text{true}, \text{predicted})\]
\[\alpha = 1 - 0.5 \sum_{i=1}^{n} \left| \frac{\text{sorted(predicted}_i\text{)}}{\text{mean(predicted)} \cdot n} - \frac{\text{sorted(true}_i\text{)}}{\text{mean(true)} \cdot n} \right|\]
\[\beta = \frac{\text{mean(predicted)}}{\text{mean(true)}}\]
where \(\rho\) is Spearman’s rank correlation coefficient.
- Parameters:
true – true/observed/actual/measured/target values. It must be a numpy array, or pandas series/DataFrame or a list.
predicted – simulated values
treat_arrays – process the true and predicted arrays using the maybe_treat_arrays function
return_all – If True, also return the components of the score (see Returns). Otherwise, only kge is returned; by default False.
treat_arrays_kws – Additional keyword arguments to be passed to the SeqMetrics.utils.treat_arrays() function.
- Returns:
If return_all is True, it returns a numpy array of shape (4, ) containing kge, \(cc\), \(\alpha\) and \(\beta\). Otherwise, it returns kge.
- Return type:
float | ndarray
Examples
>>> import numpy as np
>>> from SeqMetrics import kge_np
>>> t = np.random.random(10)
>>> p = np.random.random(10)
>>> kge_np(t, p)
- SeqMetrics.kgenp_bound(true, predicted, treat_arrays: bool = True, **treat_arrays_kws)
Bounded version of the non-parametric Kling-Gupta Efficiency.
\[\text{KGE}_{\text{np,bound}} = \frac{\text{KGE}_{\text{np}}}{2 - \text{KGE}_{\text{np}}}\]
where \(\text{KGE}_{\text{np}} = 1 - \sqrt{\left(\rho(t, p) - 1\right)^2 + \left(1 - 0.5 \sum_{i=1}^{n} \left| \frac{\text{sorted}(p_i)}{\text{mean}(p) \cdot n} - \frac{\text{sorted}(t_i)}{\text{mean}(t) \cdot n} \right| - 1\right)^2 + \left(\frac{\text{mean}(p)}{\text{mean}(t)} - 1\right)^2}\) is the non-parametric KGE of SeqMetrics.kge_np(), with t = true and p = predicted.
- Parameters:
true – true/observed/actual/target values. It must be a numpy array, or pandas series/DataFrame or a list.
predicted – simulated values
treat_arrays – process the true and predicted arrays using maybe_treat_arrays function
Examples
>>> import numpy as np >>> from SeqMetrics import kgenp_bound >>> t = np.random.random(10) >>> p = np.random.random(10) >>> kgenp_bound(t, p)
- SeqMetrics.kl_sym(true, predicted, treat_arrays: bool = True, **treat_arrays_kws) float | None[source]
Symmetric Kullback-Leibler divergence.
\[\text{KL}_{\text{sym}}(P || Q) = \frac{1}{2} \sum_{i=1}^{n} \left( P_i - Q_i \right) \left( \log_2 \frac{P_i}{Q_i} \right)\]- Parameters:
true – true/observed/actual/target values. It must be a numpy array, or pandas series/DataFrame or a list.
predicted – simulated values
treat_arrays – process the true and predicted arrays using maybe_treat_arrays function
Examples
>>> import numpy as np >>> from SeqMetrics import kl_sym >>> t = np.random.random(10) >>> p = np.random.random(10) >>> kl_sym(t, p)
- SeqMetrics.kl_divergence(true, predicted, treat_arrays: bool = True, **treat_arrays_kws) float[source]
Kullback-Leibler divergence.
\[D_{KL}(P \parallel Q) = \sum_{i} P(i) \log \left( \frac{P(i)}{Q(i)} \right)\]
- Parameters:
true – True/observed/actual/target probability distribution. It must be a numpy array, pandas series/DataFrame, or a list.
predicted – Predicted probability distribution, same format as ‘true’.
treat_arrays – process the true and predicted arrays using maybe_treat_arrays function
Examples
>>> import numpy as np >>> from SeqMetrics import kl_divergence >>> t = np.array([0.1, 0.2, 0.3, 0.2, 0.2]) >>> p = np.array([0.2, 0.2, 0.2, 0.2, 0.2]) >>> divergence = kl_divergence(t, p)
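For illustration, the divergence formula can be sketched directly in NumPy. This assumes both inputs are already valid probability distributions and uses the natural logarithm (the formula above does not fix the base); kl_divergence_sketch is a hypothetical helper, not the library function.

```python
import numpy as np


def kl_divergence_sketch(true, predicted):
    """D_KL(P || Q) with P = true and Q = predicted, natural logarithm."""
    p = np.asarray(true, dtype=float)
    q = np.asarray(predicted, dtype=float)
    # Terms with P(i) == 0 contribute nothing (0 * log 0 is taken as 0)
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))


t = np.array([0.1, 0.2, 0.3, 0.2, 0.2])
q = np.array([0.2, 0.2, 0.2, 0.2, 0.2])
print(kl_divergence_sketch(t, q))  # ≈ 0.0523
```

Swapping the arguments gives a different value, which is why a symmetric variant such as kl_sym exists.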
- SeqMetrics.lm_index(true, predicted, treat_arrays: bool = True, obs_bar_p=None, **treat_arrays_kws) float[source]
Legates-McCabe Efficiency Index. It is less sensitive to outliers in the data; larger values indicate better performance.
\[a_i = |\text{predicted}_i - \text{true}_i|\]\[b_i = \begin{cases} |\text{true}_i - \text{obs\_bar\_p}| & \text{if obs\_bar\_p is provided} \\ |\text{true}_i - \overline{\text{true}}| & \text{otherwise} \end{cases}\]\[\text{LM Index} = 1 - \frac{\sum_{i=1}^{n} a_i}{\sum_{i=1}^{n} b_i}\]- Parameters:
true – true/observed/actual/target values. It must be a numpy array, or pandas series/DataFrame or a list.
predicted – simulated values
treat_arrays – process the true and predicted arrays using maybe_treat_arrays function
obs_bar_p (float, optional) – seasonal or other selected average. If None, the mean of the observed (true) array is used.
Examples
>>> import numpy as np >>> from SeqMetrics import lm_index >>> t = np.random.random(10) >>> p = np.random.random(10) >>> lm_index(t, p)
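A short NumPy sketch of the LM index shows how obs_bar_p replaces the mean of true as the reference value. It assumes 1-D numeric arrays and skips treat_arrays; lm_index_sketch is a hypothetical name, not the SeqMetrics code.

```python
import numpy as np


def lm_index_sketch(true, predicted, obs_bar_p=None):
    true = np.asarray(true, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    # Reference value: a user-supplied (e.g. seasonal) average, else the mean of true
    ref = true.mean() if obs_bar_p is None else obs_bar_p
    a = np.abs(predicted - true)
    b = np.abs(true - ref)
    return float(1 - a.sum() / b.sum())


t = [1.0, 2.0, 3.0]
p = [2.0, 2.0, 2.0]
print(lm_index_sketch(t, p))               # predicting the mean scores 0
print(lm_index_sketch(t, p, obs_bar_p=0))  # a different reference changes the score
```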
- SeqMetrics.log_prob(true, predicted, treat_arrays: bool = True, **treat_arrays_kws) float[source]
Logarithmic probability distribution
\[\text{log\_prob} = \frac{1}{N} \sum_{i=1}^{N} \left( -\frac{\left( \frac{\text{true}_i - \text{predicted}_i}{\text{scale}} \right)^2}{2} - \log(\sqrt{2\pi}) \right)\]- Parameters:
true – true/observed/actual/target values. It must be a numpy array, or pandas series/DataFrame or a list.
predicted – simulated values
treat_arrays – process the true and predicted arrays using maybe_treat_arrays function
Examples
>>> import numpy as np >>> from SeqMetrics import log_prob >>> t = np.random.random(10) >>> p = np.random.random(10) >>> log_prob(t, p)
- SeqMetrics.log_nse(true, predicted, treat_arrays: bool = True, epsilon: float = 0.0, log_base: str = 'e', **treat_arrays_kws) float[source]
Log-transformed Nash-Sutcliffe Efficiency.
It is especially useful for capturing prediction performance for the lowest flows due to the logarithmic transform.
\[\text{NSE}_{\log} = 1 - \frac{\sum_{i=1}^{N}\left(\log(\text{true}_i) - \log(\text{predicted}_i)\right)^2}{\sum_{i=1}^{N}\left(\log(\text{true}_i) - \log(\overline{\text{true}})\right)^2}\]- Parameters:
true – true/observed/actual/target values. It must be a numpy array, or pandas series/DataFrame or a list.
predicted – simulated values
treat_arrays – process the true and predicted arrays using maybe_treat_arrays function
epsilon – A small value to be added to true and predicted values to avoid log(0)
log_base – base of the logarithm; defaults to 'e' (natural logarithm).
treat_arrays_kws – Additional keyword arguments to be passed to
SeqMetrics.utils.treat_arrays()function.
References
Pushpalatha, R.; Perrin, C.; le Moine, N. and Andréassian V. (2012). “A review of efficiency criteria suitable for evaluating low-flow simulations”. Journal of Hydrology. 420-421, 171-182. doi:10.1016/j.jhydrol.2011.11.055
https://doi.org/10.1029/2012WR012005
Examples
>>> import numpy as np >>> from SeqMetrics import log_nse >>> t = np.random.random(10) >>> p = np.random.random(10) >>> log_nse(t, p)
- SeqMetrics.log_cosh_error(true, predicted, treat_arrays: bool = True, **treat_arrays_kws) float[source]
Log-cosh error.
\[\text{Log-Cosh Error} = \frac{1}{n} \sum_{i=1}^{n} \log \left( \cosh(\text{predicted}_i - \text{true}_i) \right)\]
- Parameters:
true – True/observed/actual/target values. It must be a numpy array, pandas series/DataFrame, or a list.
predicted – Predicted values, same format as ‘true’.
treat_arrays – process the true and predicted arrays using maybe_treat_arrays function
Examples
>>> import numpy as np >>> from SeqMetrics import log_cosh_error >>> t = np.array([1, 2, 3, 4, 5]) >>> p = np.array([1.1, 1.9, 3.1, 4.2, 4.8]) >>> error = log_cosh_error(t, p)
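A NumPy sketch of the log-cosh formula can avoid overflow by rewriting \(\log(\cosh x)\) as \(\operatorname{logaddexp}(x, -x) - \log 2\). This is a numerical choice made for the sketch, not necessarily what SeqMetrics does; log_cosh_error_sketch is a hypothetical name.

```python
import numpy as np


def log_cosh_error_sketch(true, predicted):
    diff = np.asarray(predicted, dtype=float) - np.asarray(true, dtype=float)
    # log(cosh(x)) = log((e^x + e^-x) / 2) = logaddexp(x, -x) - log(2);
    # unlike np.log(np.cosh(x)), this stays finite for large |x|
    return float(np.mean(np.logaddexp(diff, -diff) - np.log(2.0)))


t = np.array([1, 2, 3, 4, 5])
p = np.array([1.1, 1.9, 3.1, 4.2, 4.8])
print(log_cosh_error_sketch(t, p))
```

For small errors log-cosh behaves like half the squared error, and for large errors it grows roughly linearly, like the absolute error.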
- SeqMetrics.legates_coeff_eff(true, predicted, treat_arrays: bool = True, **treat_arrays_kws) float[source]
Legates Coefficient of Efficiency. Its maximum value is 1, which indicates a perfect match; it becomes negative when the summed absolute errors exceed the summed absolute deviations of true from its mean. It is less sensitive to extreme values than agreement_index and the coefficient of determination because it uses the absolute difference instead of the squared difference. See Equation 23 in Dodo et al., 2022.
\[LCE = 1 - \frac{\sum_{i=1}^{n} |true_i - predicted_i|}{\sum_{i=1}^{n} |true_i - \bar{true}|}\]- Parameters:
true – true/observed/actual/target values. It must be a numpy array, or pandas series/DataFrame or a list.
predicted – simulated values
treat_arrays – process the true and predicted arrays using maybe_treat_arrays function
treat_arrays_kws – Additional keyword arguments to be passed to
SeqMetrics.utils.treat_arrays()function.
Examples
>>> import numpy as np >>> from SeqMetrics import legates_coeff_eff >>> t = np.random.random(10) >>> p = np.random.random(10) >>> legates_coeff_eff(t, p)
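A minimal NumPy sketch of the LCE formula, assuming 1-D numeric arrays and skipping treat_arrays; legates_coeff_eff_sketch is a hypothetical name, not the library implementation. The usage picks values whose absolute errors are much larger than the spread of true around its mean.

```python
import numpy as np


def legates_coeff_eff_sketch(true, predicted):
    true = np.asarray(true, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    num = np.sum(np.abs(true - predicted))       # total absolute error
    den = np.sum(np.abs(true - true.mean()))     # total absolute deviation of true
    return float(1 - num / den)


print(legates_coeff_eff_sketch([0.0, 1.0], [0.0, 1.0]))  # 1.0
print(legates_coeff_eff_sketch([0.0, 1.0], [5.0, 5.0]))  # -8.0
```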
- SeqMetrics.maape(true, predicted, treat_arrays: bool = True, **treat_arrays_kws) float[source]
Mean Arctangent Absolute Percentage Error. Note: the result is NOT multiplied by 100.
\[MAAPE = \frac{1}{n} \sum_{i=1}^{n} \arctan \left( \frac{| \text{true}_i - \text{predicted}_i |}{| \text{true}_i | + \epsilon} \right)\]- Parameters:
true – true/observed/actual/target values. It must be a numpy array, or pandas series/DataFrame or a list.
predicted – simulated values
treat_arrays – process the true and predicted arrays using maybe_treat_arrays function
Examples
>>> import numpy as np >>> from SeqMetrics import maape >>> t = np.random.random(10) >>> p = np.random.random(10) >>> maape(t, p)
- SeqMetrics.mae(true, predicted, treat_arrays: bool = True, **treat_arrays_kws) float[source]
Mean Absolute Error. It is less sensitive to outliers as compared to mse/rmse.
\[\text{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| \text{true}_i - \text{predicted}_i \right|\]- Parameters:
true – true/observed/actual/target values. It must be a numpy array, or pandas series/DataFrame or a list.
predicted – simulated values
treat_arrays – process the true and predicted arrays using maybe_treat_arrays function
treat_arrays_kws – Additional keyword arguments to be passed to
SeqMetrics.utils.treat_arrays()function.
Examples
>>> import numpy as np >>> from SeqMetrics import mae >>> t = np.random.random(10) >>> p = np.random.random(10) >>> mae(t, p)
- SeqMetrics.mape(true, predicted, treat_arrays: bool = True, **treat_arrays_kws) float[source]
Mean Absolute Percentage Error. MAPE is often used when the quantity to predict is known to remain well above zero. It is useful when the relative size of the error is significant in evaluating the accuracy of a prediction, and it has the advantages of scale independence and interpretability. However, it has the significant disadvantage of producing infinite or undefined values for zero or close-to-zero true values.
\[MAPE = \frac{1}{n} \sum_{i=1}^{n} \left| \frac{true_i - predicted_i}{true_i} \right| \times 100\]
- Parameters:
true – true/observed/actual/target values. It must be a numpy array, or pandas series/DataFrame or a list.
predicted – simulated values
treat_arrays – process the true and predicted arrays using maybe_treat_arrays function
treat_arrays_kws – Additional keyword arguments to be passed to
SeqMetrics.utils.treat_arrays()function.
Examples
>>> import numpy as np >>> from SeqMetrics import mape >>> t = np.random.random(10) >>> p = np.random.random(10) >>> mape(t, p)
- SeqMetrics.mase(true, predicted, treat_arrays: bool = True, seasonality: int = 1, **treat_arrays_kws)[source]
Mean Absolute Scaled Error following Hyndman et al., 2006. It is the ratio of the MAE of the model to the MAE of a naive benchmark forecast, where the naive forecast is the true series shifted by seasonality.
\[\text{MASE} = \frac{\frac{1}{n} \sum_{i=1}^{n} \left| \text{true}_i - \text{predicted}_i \right|}{\frac{1}{n-s} \sum_{i=s+1}^{n} \left| \text{true}_i - \text{true}_{i-s} \right|}\]- Parameters:
true – true/observed/actual/target values. It must be a numpy array, or pandas series/DataFrame or a list.
predicted – simulated values
treat_arrays – process the true and predicted arrays using maybe_treat_arrays function
seasonality – seasonal period \(s\) by which the naive forecast is shifted (default 1).
treat_arrays_kws – Additional keyword arguments to be passed to
SeqMetrics.utils.treat_arrays()function.
Examples
>>> import numpy as np >>> from SeqMetrics import mase >>> t = np.random.random(10) >>> p = np.random.random(10) >>> mase(t, p)
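The MASE formula can be sketched in NumPy as follows, assuming 1-D arrays and seasonality >= 1; mase_sketch is a hypothetical name, and it skips the treat_arrays step of the library function.

```python
import numpy as np


def mase_sketch(true, predicted, seasonality=1):
    true = np.asarray(true, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    s = seasonality  # must be >= 1 for the slicing below
    mae_model = np.mean(np.abs(true - predicted))
    # Naive seasonal forecast: the true value observed s steps earlier
    mae_naive = np.mean(np.abs(true[s:] - true[:-s]))
    return float(mae_model / mae_naive)


t = [1.0, 2.0, 3.0, 4.0, 5.0]
p = [1.0, 2.0, 3.0, 4.0, 6.0]
print(mase_sketch(t, p))                 # model MAE 0.2 / naive MAE 1.0
print(mase_sketch(t, p, seasonality=2))  # naive MAE grows to 2.0
```

Values below 1 mean the model beats the naive forecast on average.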
- SeqMetrics.mare(true, predicted, treat_arrays: bool = True, **treat_arrays_kws) float[source]
Mean Absolute Relative Error. When expressed as a percentage, it is also known as MAPE.
\[\text{MARE} = \frac{1}{n} \sum_{i=1}^{n} \left| \frac{\text{true}_i - \text{predicted}_i}{\text{true}_i} \right|\]- Parameters:
true – true/observed/actual/target values. It must be a numpy array, or pandas series/DataFrame or a list.
predicted – simulated values
treat_arrays – process the true and predicted arrays using maybe_treat_arrays function
treat_arrays_kws – Additional keyword arguments to be passed to
SeqMetrics.utils.treat_arrays()function.
Examples
>>> import numpy as np >>> from SeqMetrics import mare >>> t = np.random.random(10) >>> p = np.random.random(10) >>> mare(t, p)
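The relationship between MARE and MAPE can be shown with two small NumPy helpers that follow the formulas in this section. Both names are hypothetical, and they skip treat_arrays and any handling of zeros in true.

```python
import numpy as np


def mare_sketch(true, predicted):
    true = np.asarray(true, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    # Division by true: undefined when true contains zeros
    return float(np.mean(np.abs((true - predicted) / true)))


def mape_sketch(true, predicted):
    # MAPE is MARE expressed as a percentage
    return 100.0 * mare_sketch(true, predicted)


t = [1.0, 2.0, 4.0]
p = [2.0, 2.0, 3.0]
print(mare_sketch(t, p), mape_sketch(t, p))
```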
- SeqMetrics.max_error(true, predicted, treat_arrays: bool = True, **treat_arrays_kws) float[source]
Maximum absolute error. In scikit-learn, the equation contains the absolute value, but the name of the metric does not include “absolute”.
\[\text{Max Error} = \max_{i=1}^n \left| \text{true}_i - \text{predicted}_i \right|\]- Parameters:
true – true/observed/actual/target values. It must be a numpy array, or pandas series/DataFrame or a list.
predicted – simulated values
treat_arrays – process the true and predicted arrays using maybe_treat_arrays function
Examples
>>> import numpy as np >>> from SeqMetrics import max_error >>> t = np.random.random(10) >>> p = np.random.random(10) >>> max_error(t, p)
- SeqMetrics.mape_for_peaks(true, predicted, treat_arrays: bool = True, **treat_arrays_kws) float[source]
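Mean Absolute Percentage Error for peaks, which are found using scipy.signal.find_peaks.
\[\text{MAPE}_\text{peak} = \frac{1}{P}\sum_{p=1}^{P} \left|\frac{Q_{s,p} - Q_{o,p}}{Q_{o,p}}\right| \times 100\]where \(P\) is the number of peaks, and \(Q_{o,p}\) and \(Q_{s,p}\) are the true and predicted values at the \(p\)-th peak.- Parameters: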
true – True/observed/actual/target values. It must be a numpy array, pandas series/DataFrame, or a list.
predicted – Predicted values, same format as ‘true’.
treat_arrays – process the true and predicted arrays using maybe_treat_arrays function
References: https://github.com/neuralhydrology/neuralhydrology/blob/master/neuralhydrology/evaluation/metrics.py#L707
Examples
>>> import numpy as np >>> from SeqMetrics import mape_for_peaks >>> t = np.random.random(100) >>> p = np.random.random(100) >>> score = mape_for_peaks(t, p)
- SeqMetrics.manhattan_distance(true, predicted, treat_arrays: bool = True, **treat_arrays_kws) float[source]
Manhattan distance, also known as cityblock distance or taxicab norm.
See Blanco-Mallo et al., 2023 and Alexei Botchkarev 2019 on the use of distances in performance measures.
\[D_{\text{manhattan}} = \sum_{i=1}^{n} \left| \text{true}_i - \text{predicted}_i \right|\]- Parameters:
true – True/observed/actual/target values. It must be a numpy array, pandas series/DataFrame, or a list.
predicted – Predicted values, same format as ‘true’.
treat_arrays – process the true and predicted arrays using maybe_treat_arrays function
Examples
>>> import numpy as np >>> from SeqMetrics import manhattan_distance >>> t = np.random.random(100) >>> p = np.random.random(100) >>> manhattan_distance(t, p)
- SeqMetrics.mapd(true, predicted, treat_arrays: bool = True, **treat_arrays_kws) float[source]
Mean absolute percentage deviation
\[MAPD = \frac{\sum_{i=1}^{n} \left| predicted_i - true_i \right|}{\sum_{i=1}^{n} \left| true_i \right|}\]- Parameters:
true – true/observed/actual/target values. It must be a numpy array, or pandas series/DataFrame or a list.
predicted – simulated values
treat_arrays – process the true and predicted arrays using maybe_treat_arrays function
Examples
>>> import numpy as np >>> from SeqMetrics import mapd >>> t = np.random.random(10) >>> p = np.random.random(10) >>> mapd(t, p)
- SeqMetrics.mbrae(true, predicted, treat_arrays: bool = True, benchmark: ndarray | None = None, **treat_arrays_kws) float[source]
Mean Bounded Relative Absolute Error
\[MBRAE = \frac{1}{n} \sum_{i=1}^{n} \frac{| \text{true}_i - \text{predicted}_i |}{| \text{true}_i - \text{predicted}_i | + | \text{true}_i - \text{benchmark}_i |}\]- Parameters:
true – true/observed/actual/target values. It must be a numpy array, or pandas series/DataFrame or a list.
predicted – simulated values
treat_arrays – process the true and predicted arrays using maybe_treat_arrays function
benchmark – optional benchmark (reference) predictions used to scale the errors.
Examples
>>> import numpy as np >>> from SeqMetrics import mbrae >>> t = np.random.random(10) >>> p = np.random.random(10) >>> mbrae(t, p)
- SeqMetrics.mb_r(true, predicted, treat_arrays: bool = True, **treat_arrays_kws) float[source]
Mielke-Berry R.
\[R = 1 - \frac{n^2 \cdot \frac{1}{n} \sum_{i=1}^{n} \left| \text{predicted}_i - \text{true}_i \right|}{\sum_{i=1}^{n} \sum_{j=1}^{n} \left| \text{predicted}_j - \text{true}_i \right|}\]
- Parameters:
true – true/observed/actual/target values. It must be a numpy array, or pandas series/DataFrame or a list.
predicted – simulated values
treat_arrays – process the true and predicted arrays using maybe_treat_arrays function
Examples
>>> import numpy as np >>> from SeqMetrics import mb_r >>> t = np.random.random(10) >>> p = np.random.random(10) >>> mb_r(t, p)
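The double sum in the denominator can be computed with NumPy broadcasting, as in this sketch of the formula above. It assumes 1-D arrays, skips treat_arrays, and uses O(n²) memory for the pairwise matrix; mb_r_sketch is a hypothetical name.

```python
import numpy as np


def mb_r_sketch(true, predicted):
    true = np.asarray(true, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    n = true.size
    mae = np.mean(np.abs(predicted - true))
    # pairwise[i, j] = |predicted_j - true_i| for all pairs (n x n matrix)
    pairwise = np.abs(predicted[np.newaxis, :] - true[:, np.newaxis])
    return float(1 - n ** 2 * mae / pairwise.sum())


print(mb_r_sketch([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))  # perfect match
print(mb_r_sketch([1.0, 2.0, 3.0], [2.0, 2.0, 2.0]))  # constant prediction
```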
- SeqMetrics.msle(true, predicted, treat_arrays=True, weights=None, **treat_arrays_kws) float[source]
Mean Squared Logarithmic Error.
\[\text{MSLE} = \frac{\sum_{i=1}^{n} w_i \left( \log(1 + \text{true}_i) - \log(1 + \text{predicted}_i) \right)^2}{\sum_{i=1}^{n} w_i}\]
- Parameters:
true – true/observed/actual/target values. It must be a numpy array, or pandas series/DataFrame or a list.
predicted – simulated values
treat_arrays – process the true and predicted arrays using maybe_treat_arrays function
weights – optional per-sample weights \(w_i\); if None, all samples are weighted equally.
treat_arrays_kws – Additional keyword arguments to be passed to
SeqMetrics.utils.treat_arrays()function.
Examples
>>> import numpy as np >>> from SeqMetrics import msle >>> t = np.random.random(10) >>> p = np.random.random(10) >>> msle(t, p)
- SeqMetrics.med_seq_error(true, predicted, treat_arrays: bool = True, **treat_arrays_kws) float[source]
Median Squared Error. It is the same as MSE, but takes the median instead of the mean, which reduces the impact of outliers.
\[\text{MedSE} = \text{median} \left( (\text{predicted}_i - \text{true}_i)^2 \right)\]- Parameters:
true – true/observed/actual/target values. It must be a numpy array, or pandas series/DataFrame or a list.
predicted – simulated values
treat_arrays – process the true and predicted arrays using maybe_treat_arrays function
Examples
>>> import numpy as np >>> from SeqMetrics import med_seq_error >>> t = np.random.random(10) >>> p = np.random.random(10) >>> med_seq_error(t, p)
- SeqMetrics.mda(true, predicted, treat_arrays: bool = True, **treat_arrays_kws) float[source]
Mean Directional Accuracy
\[\text{MDA} = \frac{1}{n-1} \sum_{i=1}^{n-1} \mathbf{1}\left[ \operatorname{sign}(\text{true}_{i+1} - \text{true}_i) = \operatorname{sign}(\text{predicted}_{i+1} - \text{predicted}_i) \right]\]where \(\mathbf{1}[\cdot]\) is 1 if the condition holds and 0 otherwise.
- Parameters:
true – true/observed/actual/target values. It must be a numpy array, or pandas series/DataFrame or a list.
predicted – simulated values
treat_arrays – process the true and predicted arrays using maybe_treat_arrays function
Examples
>>> import numpy as np >>> from SeqMetrics import mda >>> t = np.random.random(10) >>> p = np.random.random(10) >>> mda(t, p)
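MDA only compares the direction of consecutive changes, which a short NumPy sketch makes visible. It assumes 1-D arrays and skips treat_arrays; mda_sketch is a hypothetical name.

```python
import numpy as np


def mda_sketch(true, predicted):
    true = np.asarray(true, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    # True where both series move in the same direction between steps i and i+1
    same_direction = np.sign(np.diff(true)) == np.sign(np.diff(predicted))
    return float(np.mean(same_direction))


t = [1.0, 2.0, 3.0, 2.0]
p = [1.0, 3.0, 4.0, 5.0]
print(mda_sketch(t, p))  # the last step goes down in t but up in p
```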
- SeqMetrics.mde(true, predicted, treat_arrays: bool = True, **treat_arrays_kws) float[source]
Median Error.
\[MDE = \text{median}(\text{predicted}_i - \text{true}_i)\]
- Parameters:
true – true/observed/actual/target values. It must be a numpy array, or pandas series/DataFrame or a list.
predicted – simulated values
treat_arrays – process the true and predicted arrays using maybe_treat_arrays function
Examples
>>> import numpy as np >>> from SeqMetrics import mde >>> t = np.random.random(10) >>> p = np.random.random(10) >>> mde(t, p)
- SeqMetrics.mdape(true, predicted, treat_arrays: bool = True, **treat_arrays_kws) float[source]
Median Absolute Percentage Error. The value is multiplied by 100.
\[\text{MdAPE} = 100 \times \text{Median} \left( \left\{ \frac{|\text{true}_i - \text{predicted}_i|}{|\text{true}_i|} \right\}_{i=1}^n \right)\]- Parameters:
true – true/observed/actual/target values. It must be a numpy array, or pandas series/DataFrame or a list.
predicted – simulated values
treat_arrays – process the true and predicted arrays using maybe_treat_arrays function
Examples
>>> import numpy as np >>> from SeqMetrics import mdape >>> t = np.random.random(10) >>> p = np.random.random(10) >>> mdape(t, p)
- SeqMetrics.mdrae(true, predicted, treat_arrays: bool = True, benchmark: ndarray | None = None, **treat_arrays_kws) float[source]
Median Relative Absolute Error.
\[MdRAE = \text{median} \left( \left| \frac{true_i - predicted_i}{true_i - benchmark_i} \right| \right)\]- Parameters:
true – true/observed/actual/target values. It must be a numpy array, or pandas series/DataFrame or a list.
predicted – simulated values
treat_arrays – process the true and predicted arrays using maybe_treat_arrays function
benchmark – optional benchmark (reference) predictions used to scale the errors.
Examples
>>> import numpy as np >>> from SeqMetrics import mdrae >>> t = np.random.random(10) >>> p = np.random.random(10) >>> mdrae(t, p)
- SeqMetrics.me(true, predicted, treat_arrays: bool = True, **treat_arrays_kws)[source]
Mean error or bias.
\[ME = \frac{1}{n} \sum_{i=1}^{n} (\text{true}_i - \text{predicted}_i)\]- Parameters:
true – true/observed/actual/target values. It must be a numpy array, or pandas series/DataFrame or a list.
predicted – simulated values
treat_arrays – process the true and predicted arrays using maybe_treat_arrays function
Examples
>>> import numpy as np >>> from SeqMetrics import me >>> t = np.random.random(10) >>> p = np.random.random(10) >>> me(t, p)
- SeqMetrics.mean_bias_error(true, predicted, treat_arrays: bool = True, **treat_arrays_kws) float[source]
Mean Bias Error. It represents the overall bias or systematic error, i.e. the average interpolation bias (average over- or underestimation) [1][2]. With the definition below (true minus predicted), a positive value indicates that the model underestimates the target on average and a negative value indicates overestimation; values closest to zero are desirable. The drawback of this metric is that it does not reflect performance correctly when the model both overestimates and underestimates, since the over- and underestimations cancel each other.
\[\text{MBE} = \frac{1}{N} \sum_{i=1}^{N} (true_i - predicted_i)\]
- Parameters:
true – true/observed/actual/target values. It must be a numpy array, or pandas series/DataFrame or a list.
predicted – simulated values
treat_arrays – process the true and predicted arrays using maybe_treat_arrays function
Examples
>>> import numpy as np >>> from SeqMetrics import mean_bias_error >>> t = np.random.random(10) >>> p = np.random.random(10) >>> mean_bias_error(t, p)
- SeqMetrics.mean_var(true, predicted, treat_arrays: bool = True, **treat_arrays_kws) float[source]
Mean variance, adapted from HydroErr.
\[\text{mean\_var} = \text{Var} \left( \log(1 + \text{true}) - \log(1 + \text{predicted}) \right)\]- Parameters:
true – true/observed/actual/target values. It must be a numpy array, or pandas series/DataFrame or a list.
predicted – simulated values
treat_arrays – process the true and predicted arrays using maybe_treat_arrays function
Examples
>>> import numpy as np >>> from SeqMetrics import mean_var >>> t = np.random.random(10) >>> p = np.random.random(10) >>> mean_var(t, p)
- SeqMetrics.mean_poisson_deviance(true, predicted, treat_arrays: bool = True, weights=None, **treat_arrays_kws) float[source]
Mean Poisson deviance.
\[\text{MPD} = \frac{1}{n} \sum_{i=1}^{n} 2 \left( \text{true}_i \log \left( \frac{\text{true}_i}{\text{predicted}_i} \right) - (\text{true}_i - \text{predicted}_i) \right)\]References
https://scikit-learn.org/stable/modules/generated/sklearn.metrics.mean_poisson_deviance.html
- Parameters:
true – true/observed/actual/target values. It must be a numpy array, or pandas series/DataFrame or a list.
predicted – simulated values
treat_arrays – process the true and predicted arrays using maybe_treat_arrays function
weights – optional per-sample weights; if None, all samples are weighted equally.
Examples
>>> import numpy as np >>> from SeqMetrics import mean_poisson_deviance >>> t = np.random.random(10) >>> p = np.random.random(10) >>> mean_poisson_deviance(t, p)
- SeqMetrics.mean_gamma_deviance(true, predicted, treat_arrays: bool = True, weights=None, **treat_arrays_kws) float[source]
Mean Gamma deviance.
\[\text{Mean Gamma Deviance (Weighted)} = \frac{1}{\sum_{i=1}^{n} w_i} \sum_{i=1}^{n} w_i \frac{2}{\text{true}_i} \left( \text{predicted}_i - \text{true}_i - \text{true}_i \ln \left( \frac{\text{predicted}_i}{\text{true}_i} \right) \right)\]
- Parameters:
true – true/observed/actual/target values. It must be a numpy array, or pandas series/DataFrame or a list.
predicted – simulated values
treat_arrays – process the true and predicted arrays using maybe_treat_arrays function
weights – optional per-sample weights \(w_i\); if None, all samples are weighted equally.
Examples
>>> import numpy as np >>> from SeqMetrics import mean_gamma_deviance >>> t = np.random.random(10) >>> p = np.random.random(10) >>> mean_gamma_deviance(t, p)
- SeqMetrics.median_abs_error(true, predicted, treat_arrays: bool = True, **treat_arrays_kws) float[source]
Median absolute error.
\[\text{MedAE} = \text{median} \left( \left| \text{true}_i - \text{predicted}_i \right| \right)\]References
https://scikit-learn.org/stable/modules/generated/sklearn.metrics.median_absolute_error.html
- Parameters:
true – true/observed/actual/target values. It must be a numpy array, or pandas series/DataFrame or a list.
predicted – simulated values
treat_arrays – process the true and predicted arrays using maybe_treat_arrays function
Examples
>>> import numpy as np >>> from SeqMetrics import median_abs_error >>> t = np.random.random(10) >>> p = np.random.random(10) >>> median_abs_error(t, p)
- SeqMetrics.mle(true, predicted, treat_arrays=True, **treat_arrays_kws) float[source]
Mean Log Error.
\[\text{MLE} = \frac{1}{n} \sum_{i=1}^{n} \left( \log(1 + \text{predicted}_i) - \log(1 + \text{true}_i) \right)\]
- Parameters:
true – true/observed/actual/target values. It must be a numpy array, or pandas series/DataFrame or a list.
predicted – simulated values
treat_arrays – process the true and predicted arrays using maybe_treat_arrays function
Examples
>>> import numpy as np >>> from SeqMetrics import mle >>> t = np.random.random(10) >>> p = np.random.random(10) >>> mle(t, p)
- SeqMetrics.mod_agreement_index(true, predicted, treat_arrays: bool = True, j: int = 1, **treat_arrays_kws) float[source]
Modified index of agreement. It varies between 0 and 1, where 1 indicates a perfect match between the observed and predicted values.
\[MAI = 1 - \frac{\sum_{i=1}^{n} \left| \text{predicted}_i - \text{true}_i \right|^j}{\sum_{i=1}^{n} \left( \left| \text{predicted}_i - \overline{\text{true}} \right| + \left| \text{true}_i - \overline{\text{true}} \right| \right)^j}\]- Parameters:
true – true/observed/actual/target values. It must be a numpy array, or pandas series/DataFrame or a list.
predicted – simulated values
treat_arrays – process the true and predicted arrays using maybe_treat_arrays function
j (int, default 1) – when j is 2, this is same as agreement_index. Higher j means more impact of outliers.
Examples
>>> import numpy as np >>> from SeqMetrics import mod_agreement_index >>> t = np.random.random(10) >>> p = np.random.random(10) >>> mod_agreement_index(t, p)
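A NumPy sketch of the formula above shows how the exponent j changes the weighting of large deviations. It assumes 1-D arrays and skips treat_arrays; mod_agreement_index_sketch is a hypothetical name.

```python
import numpy as np


def mod_agreement_index_sketch(true, predicted, j=1):
    true = np.asarray(true, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    mean_true = true.mean()
    num = np.sum(np.abs(predicted - true) ** j)
    den = np.sum((np.abs(predicted - mean_true) + np.abs(true - mean_true)) ** j)
    return float(1 - num / den)


t = [1.0, 2.0, 3.0]
p = [1.0, 2.0, 4.0]
print(mod_agreement_index_sketch(t, p, j=1))
print(mod_agreement_index_sketch(t, p, j=2))  # j=2 corresponds to agreement_index
```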
- SeqMetrics.mpe(true, predicted, treat_arrays: bool = True, **treat_arrays_kws) float[source]
Mean Percentage Error. The value is multiplied by 100 to reflect percentage.
\[MPE = \frac{1}{n} \sum_{i=1}^{n} \left( \frac{true_i - predicted_i}{true_i} \right) \times 100\]- Parameters:
true – true/observed/actual/target values. It must be a numpy array, or pandas series/DataFrame or a list.
predicted – simulated values
treat_arrays – process the true and predicted arrays using maybe_treat_arrays function
Examples
>>> import numpy as np >>> from SeqMetrics import mpe >>> t = np.random.random(10) >>> p = np.random.random(10) >>> mpe(t, p)
- SeqMetrics.mrae(true, predicted, treat_arrays: bool = True, benchmark: ndarray | None = None, **treat_arrays_kws)[source]
Mean Relative Absolute Error.
\[MRAE = \frac{1}{n} \sum_{i=1}^{n} \left| \frac{\text{true}_i - \text{predicted}_i}{\text{true}_i - \text{benchmark}_i} \right|\]
- Parameters:
true – true/observed/actual/target values. It must be a numpy array, or pandas series/DataFrame or a list.
predicted – simulated values
treat_arrays – process the true and predicted arrays using maybe_treat_arrays function
benchmark – optional benchmark (reference) predictions used to scale the errors.
Examples
>>> import numpy as np >>> from SeqMetrics import mrae >>> t = np.random.random(10) >>> p = np.random.random(10) >>> mrae(t, p)
- SeqMetrics.mse(true, predicted, treat_arrays: bool = True, weights=None, **treat_arrays_kws) float[source]
Mean Squared Error.
\[MSE = \frac{\sum_{i=1}^{N} w_i (true_i - predicted_i)^2}{\sum_{i=1}^{N} w_i}\]
- Parameters:
true – true/observed/actual/target values. It must be a numpy array, or pandas series/DataFrame or a list.
predicted – simulated values
treat_arrays – process the true and predicted arrays using maybe_treat_arrays function
weights – optional per-sample weights \(w_i\); if None, all samples are weighted equally.
Examples
>>> import numpy as np >>> from SeqMetrics import mse >>> t = np.random.random(10) >>> p = np.random.random(10) >>> mse(t, p)
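The weighted MSE formula can be sketched in a few lines of NumPy, assuming 1-D arrays and treating weights=None as equal weights. mse_sketch is a hypothetical name and skips treat_arrays.

```python
import numpy as np


def mse_sketch(true, predicted, weights=None):
    true = np.asarray(true, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    # Equal weights reduce the formula to the ordinary mean of squared errors
    w = np.ones_like(true) if weights is None else np.asarray(weights, dtype=float)
    return float(np.sum(w * (true - predicted) ** 2) / np.sum(w))


t = [1.0, 2.0, 3.0]
p = [1.0, 2.0, 5.0]
print(mse_sketch(t, p))                     # unweighted
print(mse_sketch(t, p, weights=[1, 1, 0]))  # the only error gets weight 0
```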
- SeqMetrics.minkowski_distance(true, predicted, order=1, treat_arrays: bool = True, **treat_arrays_kws) float[source]
Minkowski distance of order \(p\), where \(p\) is given by order.
\[D_{Minkowski} = \left( \sum_{i=1}^{n} \left| \text{true}_i - \text{predicted}_i \right|^p \right)^{\frac{1}{p}}\]
- Parameters:
true – True/observed/actual/target values. It must be a numpy array, pandas series/DataFrame, or a list.
predicted – Predicted values, same format as ‘true’.
order – The order of the norm of the difference. order=2 is equivalent to the Euclidean distance, order=1 is the Manhattan distance.
treat_arrays – process the true and predicted arrays using maybe_treat_arrays function
Examples
>>> import numpy as np >>> from SeqMetrics import minkowski_distance >>> t = np.array([1, 2, 3, 4, 5]) >>> p = np.array([1.1, 1.9, 3.1, 4.2, 4.8]) >>> order = 2 # Euclidean distance >>> distance = minkowski_distance(t, p, order)
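A NumPy sketch of the Minkowski formula confirms that order=1 and order=2 reduce to the Manhattan and Euclidean distances. It assumes 1-D arrays and order > 0; minkowski_distance_sketch is a hypothetical name.

```python
import numpy as np


def minkowski_distance_sketch(true, predicted, order=1):
    diff = np.abs(np.asarray(true, dtype=float) - np.asarray(predicted, dtype=float))
    return float(np.sum(diff ** order) ** (1.0 / order))


t = [0.0, 0.0]
p = [3.0, 4.0]
print(minkowski_distance_sketch(t, p, order=1))  # Manhattan: 3 + 4
print(minkowski_distance_sketch(t, p, order=2))  # Euclidean: sqrt(9 + 16)
```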
- SeqMetrics.mre(true, predicted, benchmark: ndarray | None = None, treat_arrays: bool = True, **treat_arrays_kws) float[source]
Mean Relative Error.
\[\text{MRE} = \frac{1}{n} \sum_{i=1}^{n} \left| \frac{\text{true}_i - \text{predicted}_i}{\text{true}_i} \right|\]
- Parameters:
true – True/observed/actual/target values. It must be a numpy array, pandas series/DataFrame, or a list.
predicted – Predicted values, same format as ‘true’.
benchmark – optional benchmark (reference) predictions.
treat_arrays – process the true and predicted arrays using maybe_treat_arrays function
Examples
>>> import numpy as np >>> from SeqMetrics import mre >>> t = np.array([1, 2, 3, 4, 5]) >>> p = np.array([1.1, 1.9, 3.1, 4.2, 4.8]) >>> score = mre(t, p)
- SeqMetrics.nse(true, predicted, treat_arrays: bool = True, **treat_arrays_kws) float[source]
Nash-Sutcliffe Efficiency.
The Nash-Sutcliffe efficiency (NSE) is a normalized statistic that determines the relative magnitude of the residual variance compared to the measured data variance. It indicates how well the model simulates trends in the output response of concern. However, it cannot help identify model bias, nor can it be used to identify differences in the timing and magnitude of peak flows or the shape of recession curves; in other words, it cannot be used for single-event simulations. It is sensitive to extreme values due to the squared differences (Moriasi et al., 2015). To make it less sensitive to outliers, Krause et al., 2005 proposed log and relative NSE.
\[\text{NSE} = 1 - \frac{\sum_{i=1}^{N} (\text{predicted}_i - \text{true}_i)^2}{\sum_{i=1}^{N} (\text{true}_i - \overline{\text{true}})^2}\]where \(\overline{\text{true}}\) is the mean of the true array.
- Parameters:
true – true/observed/actual/target values. It must be a numpy array, or pandas series/DataFrame or a list.
predicted – simulated values
treat_arrays – process the true and predicted arrays using maybe_treat_arrays function
Examples
>>> import numpy as np >>> from SeqMetrics import nse >>> t = np.random.random(10) >>> p = np.random.random(10) >>> nse(t, p)
- SeqMetrics.nse_alpha(true, predicted, treat_arrays: bool = True, **treat_arrays_kws) float[source]
Alpha decomposition of the NSE, see Gupta et al. 2009, as used in Kratzert et al., 2019.
\[\text{NSE}_{\text{alpha}} = \frac{\sigma_{\text{predicted}}}{\sigma_{\text{true}}}\]- Returns:
Alpha decomposition of the NSE
- Return type:
float
- Parameters:
true – true/observed/actual/target values. It must be a numpy array, or pandas series/DataFrame or a list.
predicted – simulated values
treat_arrays – process the true and predicted arrays using maybe_treat_arrays function
Examples
>>> import numpy as np >>> from SeqMetrics import nse_alpha >>> t = np.random.random(10) >>> p = np.random.random(10) >>> nse_alpha(t, p)
- SeqMetrics.nse_beta(true, predicted, treat_arrays: bool = True, **treat_arrays_kws) float[source]
Beta decomposition of NSE. See Gupta et al. 2009, as used in Kratzert et al., 2019.
\[\text{NSE}_{\text{beta}} = \frac{\mu_{\text{predicted}} - \mu_{\text{true}}}{\sigma_{\text{true}}}\]- Parameters:
true – true/observed/actual/target values. It must be a numpy array, or pandas series/DataFrame or a list.
predicted – simulated values
treat_arrays – process the true and predicted arrays using maybe_treat_arrays function
- Returns:
Beta decomposition of the NSE
- Return type:
float
Examples
>>> import numpy as np >>> from SeqMetrics import nse_beta >>> t = np.random.random(10) >>> p = np.random.random(10) >>> nse_beta(t, p)
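The alpha and beta decompositions can be sketched as two NumPy one-liners. The sketch assumes the population standard deviation (np.std with its default ddof=0), which the formulas above do not specify; both function names are hypothetical.

```python
import numpy as np


def nse_alpha_sketch(true, predicted):
    # Ratio of standard deviations (ddof=0 assumed)
    return float(np.std(predicted) / np.std(true))


def nse_beta_sketch(true, predicted):
    # Difference of means scaled by the standard deviation of true (ddof=0 assumed)
    return float((np.mean(predicted) - np.mean(true)) / np.std(true))


t = np.array([1.0, 2.0, 3.0])
print(nse_alpha_sketch(t, 2 * t))  # doubling the predictions doubles their spread
print(nse_beta_sketch(t, t + 1))   # a constant offset only affects beta
```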
- SeqMetrics.nse_mod(true, predicted, treat_arrays: bool = True, j=1, **treat_arrays_kws) float[source]
Modified Nash-Sutcliffe Efficiency following Krause et al., 2005. With j=1 it gives less weight to outliers; with j>1 it gives more weight to outliers.
\[\text{NSE}_{\text{mod}} = 1 - \frac{\sum_{i=1}^{N} \left| \text{predicted}_i - \text{true}_i \right|^j}{\sum_{i=1}^{N} \left| \text{true}_i - \overline{\text{true}} \right|^j}\]- Parameters:
true – true/observed/actual/target values. It must be a numpy array, or pandas series/DataFrame or a list.
predicted – simulated values
treat_arrays – process the true and predicted arrays using maybe_treat_arrays function
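j – exponent applied to the absolute differences in the numerator and denominator of the formula above (default 1).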
Examples
>>> import numpy as np
>>> from SeqMetrics import nse_mod
>>> t = np.random.random(10)
>>> p = np.random.random(10)
>>> nse_mod(t, p)
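To see how j changes the weighting, the NSE_mod formula can be written as a short NumPy sketch. It illustrates the formula only and is not the library's source; `nse_mod_sketch` is a hypothetical name. Note that j=2 reduces to the ordinary squared-error NSE.

```python
import numpy as np


def nse_mod_sketch(true, predicted, j=1):
    """Hypothetical helper implementing NSE_mod with exponent j."""
    true = np.asarray(true, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    numerator = np.sum(np.abs(predicted - true) ** j)
    denominator = np.sum(np.abs(true - true.mean()) ** j)
    return float(1.0 - numerator / denominator)


t = np.array([1.0, 2.0, 3.0, 4.0])
p = np.array([1.0, 2.0, 3.0, 5.0])  # a single error of size 1

print(nse_mod_sketch(t, p, j=1))  # 1 - 1/4 = 0.75
print(nse_mod_sketch(t, p, j=2))  # 1 - 1/5 = 0.8, the classic NSE
```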
- SeqMetrics.nse_rel(true, predicted, treat_arrays: bool = True, **treat_arrays_kws) float[source]
Relative Nash-Sutcliffe Efficiency.
\[\text{NSE}_{\text{rel}} = 1 - \frac{\sum_{i=1}^{N} \left( \frac{|\text{predicted}_i - \text{true}_i|}{\text{true}_i} \right)^2}{\sum_{i=1}^{N} \left( \frac{|\text{true}_i - \overline{\text{true}}|}{\overline{\text{true}}} \right)^2}\]
- Parameters:
true – true/observed/actual/target values. It must be a numpy array, or pandas series/DataFrame or a list.
predicted – simulated values
treat_arrays – process the true and predicted arrays using maybe_treat_arrays function
Examples
>>> import numpy as np
>>> from SeqMetrics import nse_rel
>>> t = np.random.random(10)
>>> p = np.random.random(10)
>>> nse_rel(t, p)
- SeqMetrics.nse_bound(true, predicted, treat_arrays: bool = True, **treat_arrays_kws) float[source]
Bounded version of the Nash-Sutcliffe Efficiency (NSE). Because NSE is at most 1, the bounded version lies in (-1, 1].
\[\text{NSE}_{\text{bound}} = \frac{\text{NSE}}{2 - \text{NSE}}\]
- Parameters:
true – true/observed/actual/target values. It must be a numpy array, or pandas series/DataFrame or a list.
predicted – simulated values
treat_arrays – process the true and predicted arrays using maybe_treat_arrays function
Examples
>>> import numpy as np
>>> from SeqMetrics import nse_bound
>>> t = np.random.random(10)
>>> p = np.random.random(10)
>>> nse_bound(t, p)
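Since NSE is at most 1, the denominator 2 - NSE is at least 1, so the transformation maps NSE from (-inf, 1] onto (-1, 1]. A minimal sketch using a hypothetical `nse_bound_sketch` helper and the standard squared-error NSE (not SeqMetrics' own code):

```python
import numpy as np


def nse_sketch(true, predicted):
    """Standard Nash-Sutcliffe Efficiency: 1 - SSE / total sum of squares."""
    true = np.asarray(true, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return 1.0 - np.sum((true - predicted) ** 2) / np.sum((true - true.mean()) ** 2)


def nse_bound_sketch(true, predicted):
    """Hypothetical helper: NSE / (2 - NSE), which stays within (-1, 1]."""
    nse = nse_sketch(true, predicted)
    return float(nse / (2.0 - nse))


t = np.array([1.0, 2.0, 3.0, 4.0])
print(nse_bound_sketch(t, np.array([1.0, 2.0, 3.0, 5.0])))      # NSE = 0.8 -> about 0.667
print(nse_bound_sketch(t, np.array([40.0, -40.0, 40.0, -40.0])))  # very poor fit, close to -1
```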
- SeqMetrics.nrmse(true, predicted, treat_arrays: bool = True, **treat_arrays_kws) float[source]
Normalized Root Mean Squared Error
\[NRMSE = \frac{\sqrt{\frac{1}{N} \sum_{i=1}^{N} (\text{true}_i - \text{predicted}_i)^2}}{\max(\text{true}) - \min(\text{true})}\]
- Parameters:
true – true/observed/actual/target values. It must be a numpy array, or pandas series/DataFrame or a list.
predicted – simulated values
treat_arrays – process the true and predicted arrays using maybe_treat_arrays function
treat_arrays_kws – Additional keyword arguments to be passed to
SeqMetrics.utils.treat_arrays()function.
Examples
>>> import numpy as np
>>> from SeqMetrics import nrmse
>>> t = np.random.random(10)
>>> p = np.random.random(10)
>>> nrmse(t, p)
- SeqMetrics.norm_euclid_distance(true, predicted, treat_arrays: bool = True, **treat_arrays_kws) float[source]
Normalized Euclidean distance.
\[D_{norm} = \sqrt{\sum_{i=1}^{n} \left( \frac{\text{true}_i}{\bar{\text{true}}} - \frac{\text{predicted}_i}{\bar{\text{predicted}}} \right)^2}\]
- Parameters:
true – true/observed/actual/target values. It must be a numpy array, or pandas series/DataFrame or a list.
predicted – simulated values
treat_arrays – process the true and predicted arrays using maybe_treat_arrays function
Examples
>>> import numpy as np
>>> from SeqMetrics import norm_euclid_distance
>>> t = np.random.random(10)
>>> p = np.random.random(10)
>>> norm_euclid_distance(t, p)
- SeqMetrics.nrmse_range(true, predicted, treat_arrays: bool = True, **treat_arrays_kws) float[source]
Range Normalized Root Mean Squared Error after Pontius et al., 2008
RMSE normalized by the range (max - min) of the true values. This allows comparison between datasets with different scales. Because the range depends on the extreme values, it is more sensitive to outliers than the inter-percentile normalization of nrmse_ipercentile.
\[\text{NRMSE} = \frac{\sqrt{\frac{1}{n} \sum_{i=1}^{n} (\text{predicted}_i - \text{true}_i)^2}}{\max(\text{true}) - \min(\text{true})}\]
- Parameters:
true – true/observed/actual/target values. It must be a numpy array, or pandas series/DataFrame or a list.
predicted – simulated values
treat_arrays – process the true and predicted arrays using maybe_treat_arrays function
Examples
>>> import numpy as np
>>> from SeqMetrics import nrmse_range
>>> t = np.random.random(10)
>>> p = np.random.random(10)
>>> nrmse_range(t, p)
- SeqMetrics.nrmse_ipercentile(true, predicted, treat_arrays: bool = True, q1=25, q2=75, **treat_arrays_kws) float[source]
RMSE normalized by the inter-percentile range of the true values. It is the least sensitive to outliers. Reference: Pontius et al., 2008.
\[\text{NRMSE}_{\text{IP}} = \frac{\sqrt{\frac{1}{n} \sum_{i=1}^{n} (\text{true}_i - \text{predicted}_i)^2}}{Q_{q2} - Q_{q1}}\]
- Parameters:
true – true/observed/actual/target values. It must be a numpy array, or pandas series/DataFrame or a list.
predicted – simulated values
treat_arrays – process the true and predicted arrays using maybe_treat_arrays function
q1 – lower percentile; any integer between 1 and 99 (default 25).
q2 – upper percentile; any integer between 2 and 100 (default 75). It should be greater than q1.
Examples
>>> import numpy as np
>>> from SeqMetrics import nrmse_ipercentile
>>> t = np.random.random(10)
>>> p = np.random.random(10)
>>> nrmse_ipercentile(t, p)
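The difference between the range normalization of nrmse_range and the inter-percentile normalization of nrmse_ipercentile is easiest to see on data with one extreme value. The sketch below is a minimal illustration, not SeqMetrics' code; the helper names are hypothetical, and it assumes NumPy's default linear interpolation in `np.percentile`, which may differ from the percentile method the library uses.

```python
import numpy as np


def _rmse(true, predicted):
    return float(np.sqrt(np.mean((true - predicted) ** 2)))


def nrmse_range_sketch(true, predicted):
    """Hypothetical helper: RMSE / (max(true) - min(true))."""
    true = np.asarray(true, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return _rmse(true, predicted) / float(np.max(true) - np.min(true))


def nrmse_ipercentile_sketch(true, predicted, q1=25, q2=75):
    """Hypothetical helper: RMSE / (Q_q2 - Q_q1) of the true values."""
    true = np.asarray(true, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    low, high = np.percentile(true, [q1, q2])  # linear interpolation (NumPy default)
    return _rmse(true, predicted) / float(high - low)


t = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
t_outlier = np.array([1.0, 2.0, 3.0, 4.0, 100.0])

# Every prediction is off by exactly 1, so the RMSE is 1 in both cases.
for obs in (t, t_outlier):
    pred = obs + 1.0
    print(nrmse_range_sketch(obs, pred), nrmse_ipercentile_sketch(obs, pred))
# The range-normalized value drops from 0.25 to about 0.0101 once the outlier
# appears, while the inter-percentile version stays at 0.5.
```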
- SeqMetrics.nrmse_mean(true, predicted, treat_arrays: bool = True, **treat_arrays_kws) float[source]
Mean Normalized RMSE
RMSE normalized by the mean of the true values. This allows comparison between datasets with different scales.
\[NRMSE_{mean} = \frac{\sqrt{\frac{1}{n} \sum_{i=1}^{n} (\text{true}_i - \text{predicted}_i)^2}}{\bar{\text{true}}}\]
Reference: Pontius et al., 2008
- Parameters:
true – true/observed/actual/target values. It must be a numpy array, or pandas series/DataFrame or a list.
predicted – simulated values
treat_arrays – process the true and predicted arrays using maybe_treat_arrays function
Examples
>>> import numpy as np
>>> from SeqMetrics import nrmse_mean
>>> t = np.random.random(10)
>>> p = np.random.random(10)
>>> nrmse_mean(t, p)
- SeqMetrics.norm_ae(true, predicted, treat_arrays: bool = True, **treat_arrays_kws) float[source]
Normalized absolute error.
\[norm\_ae = \sqrt{\frac{\sum_{i=1}^{n} (error_i - MAE)^2}{n - 1}}\]
- Parameters:
true – true/observed/actual/target values. It must be a numpy array, or pandas series/DataFrame or a list.
predicted – simulated values
treat_arrays – process the true and predicted arrays using maybe_treat_arrays function
Examples
>>> import numpy as np
>>> from SeqMetrics import norm_ae
>>> t = np.random.random(10)
>>> p = np.random.random(10)
>>> norm_ae(t, p)
- SeqMetrics.norm_nse(true, predicted, treat_arrays: bool = True, **treat_arrays_kws) float[source]
Normalized Nash-Sutcliffe Efficiency. It ranges from 0 to 1. A value of 1 indicates a perfect fit.
- Parameters:
true – true/observed/actual/target values. It must be a numpy array, or pandas series/DataFrame or a list.
predicted – simulated values
treat_arrays – process the true and predicted arrays using maybe_treat_arrays function
- SeqMetrics.norm_ape(true, predicted, treat_arrays: bool = True, **treat_arrays_kws) float[source]
Normalized Absolute Percentage Error
\[\text{norm\_APE} = \sqrt{ \frac{1}{n-1} \sum_{i=1}^{n} \left( \left| \frac{\text{true}_i - \text{predicted}_i}{\text{true}_i} \right| - \frac{1}{n} \sum_{j=1}^{n} \left| \frac{\text{true}_j - \text{predicted}_j}{\text{true}_j} \right| \right)^2 }\]
- Parameters:
true – true/observed/actual/target values. It must be a numpy array, or pandas series/DataFrame or a list.
predicted – simulated values
treat_arrays – process the true and predicted arrays using maybe_treat_arrays function
Examples
>>> import numpy as np
>>> from SeqMetrics import norm_ape
>>> t = np.random.random(10)
>>> p = np.random.random(10)
>>> norm_ape(t, p)
- SeqMetrics.pbias(true, predicted, treat_arrays: bool = True, **treat_arrays_kws) float[source]
Percent bias measures how well the model simulates the average magnitude of the output response of interest. Its sign indicates over- or under-prediction: with the formula below, positive values indicate underestimation and negative values overestimation on average. It cannot be used (1) for single-event simulations to identify differences in the timing and magnitude of peak flows or the shape of recession curves, nor (2) to determine how well the model simulates residual variations and/or trends in the output response of interest. It can give a misleading rating of model performance if the model overpredicts as much as it underpredicts, in which case percent bias will be close to zero even though the model simulation is poor.
\[PBIAS = 100 \times \frac{\sum_{i=1}^{N} (\text{true}_i - \text{predicted}_i)}{\sum_{i=1}^{N} \text{true}_i}\]
- Parameters:
true – true/observed/actual/target values. It must be a numpy array, or pandas series/DataFrame or a list.
predicted – simulated values
treat_arrays – process the true and predicted arrays using maybe_treat_arrays function
treat_arrays_kws – Additional keyword arguments to be passed to
SeqMetrics.utils.treat_arrays()function.
Examples
>>> import numpy as np
>>> from SeqMetrics import pbias
>>> t = np.random.random(10)
>>> p = np.random.random(10)
>>> pbias(t, p)
- SeqMetrics.rae(true, predicted, treat_arrays: bool = True, **treat_arrays_kws) float[source]
Relative Absolute Error (aka Approximation Error)
\[\text{RAE} = \frac{\sum_{i=1}^{n} \left| \text{true}_i - \text{predicted}_i \right|}{\sum_{i=1}^{n} \left| \text{true}_i - \overline{\text{true}} \right|}\]
- Parameters:
true – true/observed/actual/target values. It must be a numpy array, or pandas series/DataFrame or a list.
predicted – simulated values
treat_arrays – process the true and predicted arrays using maybe_treat_arrays function
Examples
>>> import numpy as np
>>> from SeqMetrics import rae
>>> t = np.random.random(10)
>>> p = np.random.random(10)
>>> rae(t, p)
- SeqMetrics.ref_agreement_index(true, predicted, treat_arrays: bool = True, **treat_arrays_kws) float[source]
Refined Index of Agreement after Willmott et al., 2012. It varies from -1 to 1; larger is better.
\[a = \sum_{i=1}^{n} \left| \text{predicted}_i - \text{true}_i \right|\]
\[b = 2 \sum_{i=1}^{n} \left| \text{true}_i - \overline{\text{true}} \right|\]
\[d_{\text{ref}} = \begin{cases} 1 - \frac{a}{b} & \text{if } a \leq b \\ \frac{b}{a} - 1 & \text{if } a > b \end{cases}\]
- Parameters:
true – true/observed/actual/target values. It must be a numpy array, or pandas series/DataFrame or a list.
predicted – simulated values
treat_arrays – process the true and predicted arrays using maybe_treat_arrays function
Examples
>>> import numpy as np
>>> from SeqMetrics import ref_agreement_index
>>> t = np.random.random(10)
>>> p = np.random.random(10)
>>> ref_agreement_index(t, p)
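The piecewise definition of the refined index is what keeps it within [-1, 1]. A minimal NumPy sketch of the two branches follows; the helper name is hypothetical and this illustrates the formula rather than SeqMetrics' implementation.

```python
import numpy as np


def ref_agreement_index_sketch(true, predicted):
    """Hypothetical helper for Willmott's refined index of agreement d_ref."""
    true = np.asarray(true, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    a = np.sum(np.abs(predicted - true))          # total absolute error
    b = 2.0 * np.sum(np.abs(true - true.mean()))  # twice the total absolute deviation of true
    if a <= b:
        return float(1.0 - a / b)  # first branch: between 0 and 1
    return float(b / a - 1.0)      # second branch: between -1 and 0


t = np.array([1.0, 2.0, 3.0, 4.0])
print(ref_agreement_index_sketch(t, np.array([1.0, 2.0, 3.0, 5.0])))  # a=1, b=8 -> 0.875
print(ref_agreement_index_sketch(t, np.full(4, 10.0)))               # a=30, b=8 -> about -0.733
```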
- SeqMetrics.rel_agreement_index(true, predicted, treat_arrays: bool = True, **treat_arrays_kws) float[source]
Relative index of agreement. It ranges from 0 to 1; larger is better.
\[\text{rel\_agreement\_index} = 1 - \frac{\sum_{i=1}^{n} \left( \frac{\text{predicted}_i - \text{true}_i}{\text{true}_i} \right)^2}{\sum_{i=1}^{n} \left( \frac{|\text{predicted}_i - \bar{\text{true}}| + |\text{true}_i - \bar{\text{true}}|}{\bar{\text{true}}} \right)^2}\]
- Parameters:
true – true/observed/actual/target values. It must be a numpy array, or pandas series/DataFrame or a list.
predicted – simulated values
treat_arrays – process the true and predicted arrays using maybe_treat_arrays function
Examples
>>> import numpy as np
>>> from SeqMetrics import rel_agreement_index
>>> t = np.random.random(10)
>>> p = np.random.random(10)
>>> rel_agreement_index(t, p)
- SeqMetrics.relative_rmse(true, predicted, treat_arrays: bool = True, **treat_arrays_kws) float[source]
Relative Root Mean Squared Error. It normalizes the RMSE by the mean of the true values.
\[RRMSE = \frac{\sqrt{\frac{1}{N}\sum_{i=1}^{N}(\text{true}_i - \text{predicted}_i)^2}}{\overline{\text{true}}}\]
- Parameters:
true – true/observed/actual/target values. It must be a numpy array, or pandas series/DataFrame or a list.
predicted – simulated values
treat_arrays – process the true and predicted arrays using maybe_treat_arrays function
Examples
>>> import numpy as np
>>> from SeqMetrics import relative_rmse
>>> t = np.random.random(10)
>>> p = np.random.random(10)
>>> relative_rmse(t, p)
- SeqMetrics.rmse(true, predicted, treat_arrays: bool = True, weights=None, **treat_arrays_kws) float[source]
Root Mean Squared Error, optionally weighted per sample.
\[\text{RMSE} = \sqrt{\frac{\sum_{i=1}^{n} w_i (\text{true}_i - \text{predicted}_i)^2}{\sum_{i=1}^{n} w_i}}\]
- Parameters:
true – true/observed/actual/target values. It must be a numpy array, or pandas series/DataFrame or a list.
predicted – simulated values
treat_arrays – process the true and predicted arrays using maybe_treat_arrays function
weights – optional per-sample weights w_i used in the formula above.
treat_arrays_kws – Additional keyword arguments to be passed to
SeqMetrics.utils.treat_arrays()function.
Examples
>>> import numpy as np
>>> from SeqMetrics import rmse
>>> t = np.random.random(10)
>>> p = np.random.random(10)
>>> rmse(t, p)
- SeqMetrics.rmsle(true, predicted, treat_arrays: bool = True, **treat_arrays_kws) float[source]
Root mean square log error.
This error is less sensitive to outliers than RMSE. Compared to RMSE, RMSLE considers only the relative error between predicted and actual values, and the scale of the error is removed by the log transformation. Furthermore, RMSLE penalizes underestimation more than overestimation. This is especially useful in studies where underestimation of the target variable is not acceptable but overestimation can be tolerated.
\[RMSLE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( \log(1 + \text{predicted}_i) - \log(1 + \text{true}_i) \right)^2}\]
References
https://scikit-learn.org/stable/modules/generated/sklearn.metrics.root_mean_squared_log_error.html
- Parameters:
true – true/observed/actual/target values. It must be a numpy array, or pandas series/DataFrame or a list.
predicted – simulated values
treat_arrays – process the true and predicted arrays using maybe_treat_arrays function
treat_arrays_kws – Additional keyword arguments to be passed to
SeqMetrics.utils.treat_arrays()function.
Examples
>>> import numpy as np
>>> from SeqMetrics import rmsle
>>> t = np.random.random(10)
>>> p = np.random.random(10)
>>> rmsle(t, p)
- SeqMetrics.rmdspe(true, predicted, treat_arrays: bool = True, **treat_arrays_kws) float[source]
Root Median Squared Percentage Error. The value is multiplied by 100 to reflect percentage.
\[\text{RMDSPE} = \sqrt{\text{median}\left(\left(\frac{\text{true}_i - \text{predicted}_i}{\text{true}_i} \times 100\right)^2\right)}\]
- Parameters:
true – true/observed/actual/target values. It must be a numpy array, or pandas series/DataFrame or a list.
predicted – simulated values
treat_arrays – process the true and predicted arrays using maybe_treat_arrays function
Examples
>>> import numpy as np
>>> from SeqMetrics import rmdspe
>>> t = np.random.random(10)
>>> p = np.random.random(10)
>>> rmdspe(t, p)
- SeqMetrics.r2(true, predicted, treat_arrays: bool = True, **treat_arrays_kws) float[source]
R2 is a statistical measure of how well the regression line approximates the actual data. It quantifies the percentage of variation in the response that the 'model' explains, where the 'model' is anything from which the predicted array was obtained. It is also called the coefficient of determination or the square of the Pearson correlation coefficient. It is more heavily affected by outliers than the Pearson correlation r.
\[R^2 = \left( \frac{\sum_{i=1}^{N} \left( \frac{true_i - \bar{true}}{\sigma_{true}} \cdot \frac{predicted_i - \bar{predicted}}{\sigma_{predicted}} \right)}{N - 1} \right)^2\]
where the bar above predicted and true indicates the mean of the array.
- Parameters:
true – true/observed/actual/target values. It must be a numpy array, or pandas series/DataFrame or a list.
predicted – simulated values
treat_arrays – process the true and predicted arrays using maybe_treat_arrays function
Examples
>>> import numpy as np
>>> from SeqMetrics import r2
>>> t = np.random.random(10)
>>> p = np.random.random(10)
>>> r2(t, p)
- SeqMetrics.r2_score(true, predicted, treat_arrays: bool = True, weights=None, **treat_arrays_kws)[source]
This is not a symmetric function. Unlike most other scores, the R^2 score may be negative (it need not actually be the square of a quantity R). This metric is not well-defined for single samples and returns a NaN value if n_samples is less than two.
\[\text{R2}_{\text{score}} = 1 - \frac{\sum_{i=1}^{n} w_i (\text{true}_i - \text{predicted}_i)^2}{\sum_{i=1}^{n} w_i (\text{true}_i - \bar{\text{true}})^2}\]
- Parameters:
true – true/observed/actual/target values. It must be a numpy array, or pandas series/DataFrame or a list.
predicted – simulated values
treat_arrays – process the true and predicted arrays using maybe_treat_arrays function
weights – optional per-sample weights w_i used in the formula above.
Examples
>>> import numpy as np
>>> from SeqMetrics import r2_score
>>> t = np.random.random(10)
>>> p = np.random.random(10)
>>> r2_score(t, p)
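The weighted R2 score formula can be sketched in a few lines of NumPy. This is a hypothetical helper, not the library's code, and one assumption is worth flagging: it takes the mean of the true values as the weighted mean (the convention scikit-learn's r2_score uses), because the formula does not say whether the mean is weighted. The example also shows that the score can be negative.

```python
import numpy as np


def r2_score_sketch(true, predicted, weights=None):
    """Hypothetical helper: weighted coefficient of determination."""
    true = np.asarray(true, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    w = np.ones_like(true) if weights is None else np.asarray(weights, dtype=float)
    true_mean = np.average(true, weights=w)  # assumption: weighted mean of true
    ss_res = np.sum(w * (true - predicted) ** 2)
    ss_tot = np.sum(w * (true - true_mean) ** 2)
    return float(1.0 - ss_res / ss_tot)


t = np.array([1.0, 2.0, 3.0, 4.0])
print(r2_score_sketch(t, np.array([1.0, 2.0, 3.0, 5.0])))  # 0.8
print(r2_score_sketch(t, t[::-1]))                         # -3.0: worse than predicting the mean
print(r2_score_sketch(t, np.array([1.0, 2.0, 3.0, 9.0]), weights=[1, 1, 1, 0]))  # 1.0: last sample ignored
```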
- SeqMetrics.rse(true, predicted, treat_arrays: bool = True, **treat_arrays_kws) float[source]
Relative Squared Error
\[\text{RSE} = \frac{\sum_{i=1}^{n} (\text{true}_i - \text{predicted}_i)^2}{\sum_{i=1}^{n} (\text{true}_i - \bar{\text{true}})^2}\]
- Parameters:
true – true/observed/actual/target values. It must be a numpy array, or pandas series/DataFrame or a list.
predicted – simulated values
treat_arrays – process the true and predicted arrays using maybe_treat_arrays function
Examples
>>> import numpy as np
>>> from SeqMetrics import rse
>>> t = np.random.random(10)
>>> p = np.random.random(10)
>>> rse(t, p)
- SeqMetrics.rrse(true, predicted, treat_arrays: bool = True, **treat_arrays_kws) float[source]
Root Relative Squared Error.
\[RRSE = \sqrt{\frac{\sum_{i=1}^{n} (\text{true}_i - \text{predicted}_i)^2}{\sum_{i=1}^{n} (\text{true}_i - \bar{\text{true}})^2}}\]
- Parameters:
true – true/observed/actual/target values. It must be a numpy array, or pandas series/DataFrame or a list.
predicted – simulated values
treat_arrays – process the true and predicted arrays using maybe_treat_arrays function
Examples
>>> import numpy as np
>>> from SeqMetrics import rrse
>>> t = np.random.random(10)
>>> p = np.random.random(10)
>>> rrse(t, p)
- SeqMetrics.rmspe(true, predicted, treat_arrays: bool = True, **treat_arrays_kws) float[source]
Root Mean Square Percentage Error.
\[RMSPE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left(PE_i\right)^2} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left(\frac{\text{true}_i - \text{predicted}_i}{\text{true}_i}\right)^2}\]
- Parameters:
true – true/observed/actual/target values. It must be a numpy array, or pandas series/DataFrame or a list.
predicted – simulated values
treat_arrays – process the true and predicted arrays using maybe_treat_arrays function
Examples
>>> import numpy as np
>>> from SeqMetrics import rmspe
>>> t = np.random.random(10)
>>> p = np.random.random(10)
>>> rmspe(t, p)
- SeqMetrics.rsr(true, predicted, treat_arrays: bool = True, **treat_arrays_kws) float[source]
It is the RMSE normalized by the standard deviation of the true values, following Moriasi et al., 2007.
It incorporates the benefits of error index statistics and includes a scaling/normalization factor, so that the resulting statistic and reported values can apply to various constituents. It ranges from 0 to infinity, with 0-0.5 indicating very good model performance and 0.5-0.8 indicating good model performance.
To reproduce the results of this implementation, compute the standard deviation with np.std(true, ddof=1).
\[\text{RSR} = \frac{\sqrt{\frac{1}{n} \sum_{i=1}^{n} (\text{true}_i - \text{predicted}_i)^2}}{\sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (\text{true}_i - \bar{\text{true}})^2}}\]
- Parameters:
true – true/observed/actual/target values. It must be a numpy array, or pandas series/DataFrame or a list.
predicted – simulated values
treat_arrays – process the true and predicted arrays using maybe_treat_arrays function
Examples
>>> import numpy as np
>>> from SeqMetrics import rsr
>>> t = np.random.random(10)
>>> p = np.random.random(10)
>>> rsr(t, p)
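A minimal sketch of the RSR formula, with the ddof=1 standard deviation in the denominator as described for rsr; `rsr_sketch` is a hypothetical helper for illustration, not the library's implementation.

```python
import numpy as np


def rsr_sketch(true, predicted):
    """Hypothetical helper: RMSE divided by the sample std (ddof=1) of true."""
    true = np.asarray(true, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    rmse = np.sqrt(np.mean((true - predicted) ** 2))  # 1/n inside the root
    return float(rmse / np.std(true, ddof=1))          # 1/(n-1) inside the root


t = np.array([1.0, 2.0, 3.0, 4.0])
p = np.array([1.0, 2.0, 3.0, 5.0])
print(rsr_sketch(t, p))                # 0.5 / sqrt(5/3), about 0.387
print(rsr_sketch(t, np.full(4, 2.5)))  # mean forecast: sqrt(3/4), about 0.866
```

Because the numerator uses 1/n and the denominator 1/(n - 1), predicting the mean of the true values gives sqrt((n - 1)/n) rather than exactly 1.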
- SeqMetrics.rmsse(true, predicted, treat_arrays: bool = True, seasonality: int = 1, **treat_arrays_kws) float[source]
Root Mean Squared Scaled Error after Muhaimin et al., 2021 and Zhou T, 2023. It is also considered similar to MASE.
\[\text{RMSSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( \frac{\left| \text{true}_i - \text{predicted}_i \right|}{\frac{1}{n-s} \sum_{j=s+1}^{n} \left| \text{true}_j - \text{true}_{j-s} \right|} \right)^2}\]
- Parameters:
true – true/observed/actual/target values. It must be a numpy array, or pandas series/DataFrame or a list.
predicted – simulated values
treat_arrays – process the true and predicted arrays using maybe_treat_arrays function
seasonality – seasonal period s used in the scaling term of the formula above (default 1).
Examples
>>> import numpy as np
>>> from SeqMetrics import rmsse
>>> t = np.random.random(10)
>>> p = np.random.random(10)
>>> rmsse(t, p)
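The scaling term in the RMSSE formula is the mean absolute change of the true series over one seasonal period s. The following NumPy sketch follows that formula literally; the helper name is hypothetical and it is not the library's implementation.

```python
import numpy as np


def rmsse_sketch(true, predicted, seasonality=1):
    """Hypothetical helper following the RMSSE formula as written."""
    true = np.asarray(true, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    s = int(seasonality)
    if not 1 <= s < len(true):
        raise ValueError("seasonality must be between 1 and len(true) - 1")
    # (1 / (n - s)) * sum_{j=s+1}^{n} |true_j - true_{j-s}|: seasonal naive change
    scale = np.mean(np.abs(true[s:] - true[:-s]))
    scaled_errors = np.abs(true - predicted) / scale
    return float(np.sqrt(np.mean(scaled_errors ** 2)))


t = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
p = t + 0.5  # constant error of 0.5
print(rmsse_sketch(t, p))                 # scale = 1 -> 0.5
print(rmsse_sketch(t, p, seasonality=2))  # scale = 2 -> 0.25
```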
- SeqMetrics.sga(true, predicted, treat_arrays: bool = True, **treat_arrays_kws) float[source]
Spectral gradient angle. Because it is an arccosine, it ranges from 0 to π; closer to 0 is better.
\[\text{SGA} = \arccos \left( \frac{\sum_{i=1}^{n-1} \left( (true_{i+1} - true_i) \cdot (predicted_{i+1} - predicted_i) \right)}{\sqrt{\sum_{i=1}^{n-1} (true_{i+1} - true_i)^2} \times \sqrt{\sum_{i=1}^{n-1} (predicted_{i+1} - predicted_i)^2}} \right)\]
- Parameters:
true – true/observed/actual/target values. It must be a numpy array, or pandas series/DataFrame or a list.
predicted – simulated values
treat_arrays – process the true and predicted arrays using maybe_treat_arrays function
Examples
>>> import numpy as np
>>> from SeqMetrics import sga
>>> t = np.random.random(10)
>>> p = np.random.random(10)
>>> sga(t, p)
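The spectral gradient angle compares the first differences (gradients) of the two series rather than their values, so it rewards predictions that rise and fall together with the observations. A minimal sketch of the SGA formula with a hypothetical helper name; the cosine is clipped to [-1, 1] to guard against floating-point round-off before `np.arccos`.

```python
import numpy as np


def sga_sketch(true, predicted):
    """Hypothetical helper: angle between the gradient vectors of true and predicted."""
    dt = np.diff(np.asarray(true, dtype=float))       # true_{i+1} - true_i
    dp = np.diff(np.asarray(predicted, dtype=float))  # predicted_{i+1} - predicted_i
    cos = np.dot(dt, dp) / (np.linalg.norm(dt) * np.linalg.norm(dp))
    return float(np.arccos(np.clip(cos, -1.0, 1.0)))


t = np.array([1.0, 2.0, 4.0, 7.0])
print(sga_sketch(t, 3.0 * t + 10.0))  # about 0: same gradient shape despite scale and offset
print(sga_sketch(t, -t))              # about pi: gradients point in opposite directions
```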
- SeqMetrics.sse(true, predicted, treat_arrays: bool = True, **treat_arrays_kws) float[source]
Sum of squared errors (model vs actual). It is a measure of how far the model's predictions are from the observed values. A value of 0 indicates that all predictions are exact; any positive value indicates errors.
This is also called the residual sum of squares (RSS) or sum of squared residuals, as per tutorialspoint.
\[\text{SSE} = \sum_{i=1}^{n} (true_i - predicted_i)^2\]
- Parameters:
true – true/observed/actual/target values. It must be a numpy array, or pandas series/DataFrame or a list.
predicted – simulated values
treat_arrays – process the true and predicted arrays using maybe_treat_arrays function
treat_arrays_kws – Additional keyword arguments to be passed to
SeqMetrics.utils.treat_arrays()function.
Examples
>>> import numpy as np
>>> from SeqMetrics import sse
>>> t = np.random.random(10)
>>> p = np.random.random(10)
>>> sse(t, p)
- SeqMetrics.sa(true, predicted, treat_arrays: bool = True, **treat_arrays_kws) float[source]
Spectral angle (Keshava N, 2004). It is the arccosine of the normalized dot product (the cosine similarity) of the true and predicted arrays, so it ranges from 0 to π; closer to 0 is better. It measures the angle between the two vectors in hyperspace, indicating how well the shapes of the two arrays match regardless of their magnitude.
\[SA = \arccos \left( \frac{\sum_{i=1}^{n} (\text{true}_i \cdot \text{predicted}_i)}{\sqrt{\sum_{i=1}^{n} (\text{true}_i)^2} \cdot \sqrt{\sum_{i=1}^{n} (\text{predicted}_i)^2}} \right)\]
- Parameters:
true – true/observed/actual/target values. It must be a numpy array, or pandas series/DataFrame or a list.
predicted – simulated values
treat_arrays – process the true and predicted arrays using maybe_treat_arrays function
Examples
>>> import numpy as np
>>> from SeqMetrics import sa
>>> t = np.random.random(10)
>>> p = np.random.random(10)
>>> sa(t, p)
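Since the spectral angle depends only on the direction of the two vectors, rescaling the predictions leaves it unchanged. A minimal NumPy sketch of the SA formula (hypothetical helper name, with the cosine clipped to [-1, 1] to absorb round-off):

```python
import numpy as np


def sa_sketch(true, predicted):
    """Hypothetical helper: angle between true and predicted viewed as vectors."""
    true = np.asarray(true, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    cos = np.dot(true, predicted) / (np.linalg.norm(true) * np.linalg.norm(predicted))
    return float(np.arccos(np.clip(cos, -1.0, 1.0)))


t = np.array([1.0, 2.0, 3.0])
print(sa_sketch(t, 5.0 * t))              # about 0: same shape, different magnitude
print(sa_sketch([1.0, 0.0], [0.0, 1.0]))  # pi / 2: orthogonal vectors
```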
- SeqMetrics.sc(true, predicted, treat_arrays: bool = True, **treat_arrays_kws) float[source]
Spectral correlation after Robila and Gershman, 2005. It ranges from 0 to π; closer to 0 is better. It measures the angle between the two mean-centered vectors in hyperspace and highlights how well the shapes of the two series match.
\[sc = \arccos \left( \frac{ \sum_{i=1}^{n} (t_i - \bar{t}) \cdot (p_i - \bar{p}) }{ \sqrt{\sum_{i=1}^{n} (t_i - \bar{t})^2} \cdot \sqrt{\sum_{i=1}^{n} (p_i - \bar{p})^2} } \right)\]
- Parameters:
true – true/observed/actual/target values. It must be a numpy array, or pandas series/DataFrame or a list.
predicted – simulated values
treat_arrays – process the true and predicted arrays using maybe_treat_arrays function
Examples
>>> import numpy as np
>>> from SeqMetrics import sc
>>> t = np.random.random(10)
>>> p = np.random.random(10)
>>> sc(t, p)
- SeqMetrics.smape(true, predicted, treat_arrays: bool = True, **treat_arrays_kws) float[source]
Symmetric Mean Absolute Percentage Error. Adapted from this.
\[SMAPE = \frac{100}{n} \sum_{i=1}^{n} \frac{2 \left| \text{predicted}_i - \text{true}_i \right|}{\left| \text{true}_i \right| + \left| \text{predicted}_i \right|}\]
References: Goodwin and Lawton, 1999: https://doi.org/10.1016/S0169-2070(99)00007-2; Flores et al., 1986: https://doi.org/10.1016/0305-0483(86)90013-7
- Parameters:
true – true/observed/actual/target values. It must be a numpy array, or pandas series/DataFrame or a list.
predicted – simulated values
treat_arrays – process the true and predicted arrays using maybe_treat_arrays function
Examples
>>> import numpy as np
>>> from SeqMetrics import smape
>>> t = np.random.random(10)
>>> p = np.random.random(10)
>>> smape(t, p)
- SeqMetrics.smdape(true, predicted, treat_arrays: bool = True, **treat_arrays_kws) float[source]
Symmetric Median Absolute Percentage Error. Note: the result is NOT multiplied by 100.
\[\text{smdape} = \text{median} \left( \frac{2 \cdot | \text{predicted} - \text{true} |}{| \text{true} | + | \text{predicted} | + \epsilon} \right)\]
where ε is a small constant that prevents division by zero.
- Parameters:
true – true/observed/actual/target values. It must be a numpy array, or pandas series/DataFrame or a list.
predicted – simulated values
treat_arrays – process the true and predicted arrays using maybe_treat_arrays function
Examples
>>> import numpy as np
>>> from SeqMetrics import smdape
>>> t = np.random.random(10)
>>> p = np.random.random(10)
>>> smdape(t, p)
- SeqMetrics.sid(true, predicted, treat_arrays: bool = True, **treat_arrays_kws) float[source]
Spectral Information Divergence. It is non-negative; closer to 0 is better.
\[\text{SID} = \left( \frac{\text{t}}{\text{mean(t)}} - \frac{\text{p}}{\text{mean(p)}} \right) \cdot \left( \log_{10}(\text{t}) - \log_{10}(\text{mean(t)}) - \log_{10}(\text{p}) + \log_{10}(\text{mean(p)}) \right)\]
- Parameters:
true – true/observed/actual/target values. It must be a numpy array, or pandas series/DataFrame or a list.
predicted – simulated values
treat_arrays – process the true and predicted arrays using maybe_treat_arrays function
Examples
>>> import numpy as np
>>> from SeqMetrics import sid
>>> t = np.random.random(10)
>>> p = np.random.random(10)
>>> sid(t, p)
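A minimal sketch of SID, assuming the product in its formula is summed over all elements (a dot product), which is needed for the function to return a single float. Each summand then has the form (x - y)(log x - log y) with x = t/mean(t) and y = p/mean(p); since the logarithm is increasing, every summand is non-negative. The helper name is hypothetical, and the sketch requires strictly positive inputs.

```python
import numpy as np


def sid_sketch(true, predicted):
    """Hypothetical helper: spectral information divergence (positive inputs only)."""
    true = np.asarray(true, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    first = true / true.mean() - predicted / predicted.mean()
    second = (np.log10(true) - np.log10(true.mean())
              - np.log10(predicted) + np.log10(predicted.mean()))
    return float(np.dot(first, second))  # assumption: summed over all elements


t = np.array([1.0, 2.0, 3.0, 4.0])
print(sid_sketch(t, 2.0 * t))  # about 0: identical shape after normalizing by the mean
print(sid_sketch(t, t[::-1]))  # positive: the shapes disagree
```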
- SeqMetrics.skill_score_murphy(true, predicted, treat_arrays: bool = True, **treat_arrays_kws) float[source]
Skill score after Murphy, 1988, adapted from SkillMetrics. It calculates the non-dimensional skill score (SS) between two variables using the definition of Murphy (1988):
\[SS = 1 - \frac{RMSE^2}{SDEV^2}\]
where SDEV is the standard deviation of the true values:
\[SDEV^2 = \frac{1}{N-1} \sum_{n=1}^{N} \left( r_n - \bar{r} \right)^2\]
Here p denotes the predicted values, r the reference values, and N the total number of values in p and r; p and r must have the same number of values. A positive skill score can be interpreted as the percentage of improvement of the new model forecast in comparison to the reference. A negative skill score denotes that the forecast of interest is worse than the reference forecast. Consequently, a value of zero denotes that both forecasts perform equally [MLAir, 2020].
- Parameters:
true – true/observed/actual/target values. It must be a numpy array, or pandas series/DataFrame or a list.
predicted – simulated values
treat_arrays – process the true and predicted arrays using maybe_treat_arrays function
- Return type:
float
Examples
>>> import numpy as np
>>> from SeqMetrics import skill_score_murphy
>>> t = np.random.random(10)
>>> p = np.random.random(10)
>>> skill_score_murphy(t, p)
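The two skill-score formulas combine into a short sketch. This is a hypothetical helper, not SkillMetrics' or SeqMetrics' code. One consequence of the formulas as written is worth seeing in numbers: RMSE uses 1/N while SDEV uses 1/(N - 1), so a forecast that always predicts the mean of the true values scores 1/N rather than exactly 0.

```python
import numpy as np


def skill_score_murphy_sketch(true, predicted):
    """Hypothetical helper: SS = 1 - RMSE^2 / SDEV^2."""
    true = np.asarray(true, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    rmse_sq = np.mean((predicted - true) ** 2)  # RMSE^2 with 1/N
    sdev_sq = np.var(true, ddof=1)              # SDEV^2 with 1/(N - 1)
    return float(1.0 - rmse_sq / sdev_sq)


t = np.array([1.0, 2.0, 3.0, 4.0])
print(skill_score_murphy_sketch(t, t))                # 1.0: perfect forecast
print(skill_score_murphy_sketch(t, np.full(4, 2.5)))  # about 0.25 = 1/N for a mean forecast
```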
- SeqMetrics.std_ratio(true, predicted, treat_arrays: bool = True, std_kwargs: dict | None = None, **treat_arrays_kws) float[source]
Ratio of the standard deviation of the predicted values to that of the true values. Also known as the standard ratio, it varies from 0.0 to infinity, with 1.0 being the perfect value.
\[\text{std\_ratio} = \frac{\sigma_{\text{predicted}}}{\sigma_{\text{true}}}\]
- Parameters:
true – true/observed/actual/target values. It must be a numpy array, or pandas series/DataFrame or a list.
predicted – simulated values
treat_arrays – process the true and predicted arrays using maybe_treat_arrays function
Examples
>>> import numpy as np
>>> from SeqMetrics import std_ratio
>>> t = np.random.random(10)
>>> p = np.random.random(10)
>>> std_ratio(t, p)
- SeqMetrics.spearmann_corr(true, predicted, treat_arrays: bool = True, **treat_arrays_kws) float[source]
Spearman rank correlation coefficient.
\[r = \frac{\sum_{i=1}^{n} \left( R_{t,i} - \overline{R_t} \right) \left( R_{p,i} - \overline{R_p} \right)}{\sqrt{ \sum_{i=1}^{n} \left( R_{t,i} - \overline{R_t} \right)^2 \sum_{i=1}^{n} \left( R_{p,i} - \overline{R_p} \right)^2 }}\]
where R_{t,i} and R_{p,i} are the ranks of true_i and predicted_i, and \overline{R_t}, \overline{R_p} are the mean ranks.
- Parameters:
true – true/observed/actual/target values. It must be a numpy array, or pandas series/DataFrame or a list.
predicted – simulated values
treat_arrays – process the true and predicted arrays using maybe_treat_arrays function
treat_arrays_kws – Additional keyword arguments to be passed to
SeqMetrics.utils.treat_arrays()function.
Examples
>>> import numpy as np
>>> from SeqMetrics import spearmann_corr
>>> t = np.random.random(10)
>>> p = np.random.random(10)
>>> spearmann_corr(t, p)
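Spearman's coefficient is Pearson's correlation applied to ranks. A minimal NumPy-only sketch, assuming tied values receive the average of their ranks (the usual convention); the helper names are hypothetical and this is not the library's implementation.

```python
import numpy as np


def average_ranks(x):
    """Hypothetical helper: 1-based ranks, with ties given their mean rank."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, kind="mergesort")
    ranks = np.empty(len(x), dtype=float)
    ranks[order] = np.arange(1, len(x) + 1)
    # Replace the ranks of each group of equal values by the group's mean rank.
    _, inverse = np.unique(x, return_inverse=True)
    inverse = inverse.ravel()
    sums = np.bincount(inverse, weights=ranks)
    counts = np.bincount(inverse)
    return sums[inverse] / counts[inverse]


def spearman_sketch(true, predicted):
    """Hypothetical helper: Pearson correlation of the average ranks."""
    rt = average_ranks(true)
    rp = average_ranks(predicted)
    rt -= rt.mean()
    rp -= rp.mean()
    return float(np.sum(rt * rp) / np.sqrt(np.sum(rt ** 2) * np.sum(rp ** 2)))


t = np.array([1.0, 2.0, 3.0, 4.0])
print(average_ranks([10, 20, 20, 30]))  # ranks 1, 2.5, 2.5, 4
print(spearman_sketch(t, t ** 3))       # 1.0: any monotonic increasing relation
print(spearman_sketch(t, -t))           # -1.0
```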
- SeqMetrics.tweedie_deviance_score(true, predicted, power=0, treat_arrays: bool = True, **treat_arrays_kws) float[source]
Tweedie deviance score. The formula depends on power:
power=0 (Normal):
\[D(\text{true}, \text{predicted}) = \frac{1}{n} \sum_{i=1}^{n} (\text{true}_i - \text{predicted}_i)^2\]
power=1 (Poisson), where (true_i = 0) equals 1 when true_i is zero and 0 otherwise, keeping the logarithm finite:
\[D(\text{true}, \text{predicted}) = 2 \sum_{i=1}^{n} \left( \text{true}_i \log\left(\frac{\text{true}_i + (\text{true}_i = 0)}{\text{predicted}_i}\right) - \text{true}_i + \text{predicted}_i \right)\]
power=2 (Gamma):
\[D(\text{true}, \text{predicted}) = 2 \sum_{i=1}^{n} \left( \frac{\text{true}_i}{\text{predicted}_i} - \log\left(\frac{\text{true}_i}{\text{predicted}_i}\right) - 1 \right)\]
power=3 (Inverse Gaussian):
\[D(\text{true}, \text{predicted}) = 2 \sum_{i=1}^{n} \left( \frac{(\text{true}_i - \text{predicted}_i)^2}{\text{true}_i^2 \text{predicted}_i} \right)\]
- Parameters:
true – True/observed/actual/target values. It must be a numpy array, pandas series/DataFrame, or a list.
predicted – Predicted values, same format as ‘true’.
power – The power determines the underlying target distribution. power=0 for Normal, power=1 for Poisson, power=2 for Gamma, and power=3 for Inverse Gaussian.
treat_arrays – process the true and predicted arrays using maybe_treat_arrays function
Examples
>>> import numpy as np
>>> from SeqMetrics import tweedie_deviance_score
>>> t = np.array([1, 2, 3, 4, 5])
>>> p = np.array([1.1, 1.9, 3.1, 4.2, 4.8])
>>> power = 2  # Gamma distribution
>>> score = tweedie_deviance_score(t, p, power)
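The power=0, 1 and 2 cases of the Tweedie deviance translate directly into NumPy. This sketch (hypothetical helper, not the library's code) mirrors those formulas as written, including that the Normal case is averaged while the Poisson and Gamma cases are summed; other powers are deliberately left unimplemented here.

```python
import numpy as np


def tweedie_deviance_sketch(true, predicted, power=0):
    """Hypothetical helper covering the power=0, 1 and 2 formulas."""
    t = np.asarray(true, dtype=float)
    p = np.asarray(predicted, dtype=float)
    if power == 0:  # Normal: mean squared error
        return float(np.mean((t - p) ** 2))
    if power == 1:  # Poisson: (t == 0) adds 1 where t is 0 so the log stays finite
        return float(2 * np.sum(t * np.log((t + (t == 0)) / p) - t + p))
    if power == 2:  # Gamma: requires strictly positive t and p
        return float(2 * np.sum(t / p - np.log(t / p) - 1))
    raise NotImplementedError("this sketch only covers power 0, 1 and 2")


t = np.array([1.0, 2.0, 3.0])
for power in (0, 1, 2):
    print(power, tweedie_deviance_sketch(t, t, power))  # perfect predictions give 0
print(tweedie_deviance_sketch([0.0, 1.0], [1.0, 1.0], power=1))  # zero count handled: 2.0
```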
- SeqMetrics.umbrae(true, predicted, treat_arrays: bool = True, benchmark: ndarray | None = None, **treat_arrays_kws)[source]
Unscaled Mean Bounded Relative Absolute Error
\[UMBRAE = \frac{\frac{1}{n} \sum_{i=1}^{n} \frac{|t_i - p_i|}{|t_i - p_i| + |t_i - b_i|}}{1 - \frac{1}{n} \sum_{i=1}^{n} \frac{|t_i - p_i|}{|t_i - p_i| + |t_i - b_i|}}\]
where t, p and b denote the true, predicted and benchmark values.
- Parameters:
true – true/observed/actual/target values. It must be a numpy array, or pandas series/DataFrame or a list.
predicted – simulated values
treat_arrays – process the true and predicted arrays using maybe_treat_arrays function
benchmark – benchmark (reference) predictions b_i used in the formula above.
Examples
>>> import numpy as np
>>> from SeqMetrics import umbrae
>>> t = np.random.random(10)
>>> p = np.random.random(10)
>>> umbrae(t, p)
- SeqMetrics.variability_ratio(true, predicted, treat_arrays: bool = True, **treat_arrays_kws) float[source]
Variability Ratio. It compares the coefficient of variation (standard deviation divided by mean) of the predicted values with that of the true values, measuring how well the predictions reproduce the relative variability of the true values. It equals 1 when the two coefficients of variation are identical.
\[VR = 1 - \left| \frac{\frac{\sigma_{\text{predicted}}}{\mu_{\text{predicted}}}}{\frac{\sigma_{\text{true}}}{\mu_{\text{true}}}} - 1 \right|\]
- Parameters:
true – true/observed/actual/target values. It must be a numpy array, or pandas series/DataFrame or a list.
predicted – simulated/predicted values
treat_arrays – process the true and predicted arrays using maybe_treat_arrays function
Examples
>>> import numpy as np
>>> from SeqMetrics import variability_ratio
>>> t = np.random.random(10)
>>> p = np.random.random(10)
>>> variability_ratio(t, p)
- SeqMetrics.ve(true, predicted, treat_arrays: bool = True, **treat_arrays_kws) float[source]
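Volumetric efficiency. Its optimal value is 1 (perfect agreement) and larger values are better; it becomes negative when the sum of absolute errors exceeds the sum of the true values.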
\[VE = 1 - \frac{\sum_{i=1}^{n} \left| \text{predicted}_i - \text{true}_i \right|}{\sum_{i=1}^{n} \text{true}_i}\]- Parameters:
true – true/observed/actual/target values. It must be a numpy array, or pandas series/DataFrame or a list.
predicted – simulated values
treat_arrays – process the true and predicted arrays using maybe_treat_arrays function
Examples
>>> import numpy as np >>> from SeqMetrics import ve >>> t = np.random.random(10) >>> p = np.random.random(10) >>> ve(t, p)
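The following NumPy sketch of the volumetric efficiency formula illustrates its range. The helper name `ve_sketch` is hypothetical and this is not the library's code.

```python
import numpy as np


def ve_sketch(true, predicted):
    """Hypothetical helper: 1 - sum|p - t| / sum(t)."""
    t = np.asarray(true, dtype=float)
    p = np.asarray(predicted, dtype=float)
    return float(1.0 - np.sum(np.abs(p - t)) / np.sum(t))


t = np.array([2.0, 4.0, 6.0, 8.0])   # total volume = 20
print(ve_sketch(t, t))         # 1.0: perfect agreement
print(ve_sketch(t, t + 1.0))   # 1 - 4/20 = 0.8
print(ve_sketch(t, 3.0 * t))   # 1 - 40/20 = -1.0: errors exceed the total volume
```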
- SeqMetrics.volume_error(true, predicted, treat_arrays: bool = True, **treat_arrays_kws) float[source]
Returns the Volume Error (Ve) after Reynolds, 2017. It is an indicator of the agreement between the averages of the simulated and observed runoff (i.e. long-term water balance).
\[\text{Volume Error} = \frac{\sum_{i=1}^{n} (\text{predicted}_i - \text{true}_i)}{\sum_{i=1}^{n} \text{predicted}_i}\]- Parameters:
true – true/observed/actual/target values. It must be a numpy array, or pandas series/DataFrame or a list.
predicted – simulated values
treat_arrays – process the true and predicted arrays using maybe_treat_arrays function
Examples
>>> import numpy as np >>> from SeqMetrics import volume_error >>> t = np.random.random(10) >>> p = np.random.random(10) >>> volume_error(t, p)
- SeqMetrics.wape(true, predicted, treat_arrays: bool = True, **treat_arrays_kws) float[source]
Weighted absolute percentage error. The lower, the better.
It is a variation of MAPE that is more suitable for intermittent and low-volume data.
\[\text{WAPE} = \frac{\sum_{i=1}^{n} \left| \text{true}_i - \text{predicted}_i \right|}{\sum_{i=1}^{n} \text{true}_i}\]- Parameters:
true – true/observed/actual/target values. It must be a numpy array, or pandas series/DataFrame or a list.
predicted – simulated values
treat_arrays – process the true and predicted arrays using maybe_treat_arrays function
Examples
>>> import numpy as np >>> from SeqMetrics import wape >>> t = np.random.random(10) >>> p = np.random.random(10) >>> wape(t, p)
- SeqMetrics.watt_m(true, predicted, treat_arrays: bool = True, **treat_arrays_kws) float[source]
Watterson's M.
\[M = \frac{2}{\pi} \cdot \arcsin \left( 1 - \frac{\frac{1}{n} \sum_{i=1}^{n} ( \text{true}_i - \text{predicted}_i )^2}{\sigma_{\text{true}}^2 + \sigma_{\text{predicted}}^2 + (\mu_{\text{predicted}} - \mu_{\text{true}})^2} \right)\]
- Parameters:
true – true/observed/actual/target values. It must be a numpy array, or pandas series/DataFrame or a list.
predicted – simulated values
treat_arrays – process the true and predicted arrays using maybe_treat_arrays function
Examples
>>> import numpy as np >>> from SeqMetrics import watt_m >>> t = np.random.random(10) >>> p = np.random.random(10) >>> watt_m(t, p)
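A minimal sketch of the M formula above follows. It assumes population variances (ddof=0), which is not confirmed by the source. The helper `watt_m_sketch` is hypothetical. A perfect prediction gives 1, and a constant prediction equal to the mean of true gives 0.

```python
import numpy as np


def watt_m_sketch(true, predicted):
    """Hypothetical helper: Watterson's M with population variances (ddof=0)."""
    t = np.asarray(true, dtype=float)
    p = np.asarray(predicted, dtype=float)
    mse = np.mean((t - p) ** 2)
    denom = np.var(t) + np.var(p) + (p.mean() - t.mean()) ** 2
    return float((2.0 / np.pi) * np.arcsin(1.0 - mse / denom))


t = np.array([1.0, 2.0, 3.0, 4.0])
print(watt_m_sketch(t, t))                          # 1.0: perfect prediction
print(watt_m_sketch(t, np.full_like(t, t.mean())))  # 0.0: predicting the mean
```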
- SeqMetrics.wmape(true, predicted, treat_arrays: bool = True, **treat_arrays_kws) float[source]
Weighted Mean Absolute Percent Error.
\[\text{WMAPE} = \frac{\sum_{i=1}^{n} \left| \text{true}_i - \text{predicted}_i \right|}{\sum_{i=1}^{n} \text{true}_i}\]- Parameters:
true – true/observed/actual/target values. It must be a numpy array, or pandas series/DataFrame or a list.
predicted – simulated values
treat_arrays – process the true and predicted arrays using maybe_treat_arrays function
Examples
>>> import numpy as np >>> from SeqMetrics import wmape >>> t = np.random.random(10) >>> p = np.random.random(10) >>> wmape(t, p)
- SeqMetrics.calculate_hydro_metrics(true, predicted, treat_arrays: bool = True, **treat_arrays_kws) dict[source]
- Calculates the following performance metrics related to hydrology.
fdc_flv
fdc_fhv
kge
kge_np
kge_mod
kge_bound
kgeprime_bound
kgenp_bound
nse
nse_alpha
nse_beta
nse_mod
nse_bound
r2
mape
nrmse
corr_coeff
rmse
mae
mse
mpe
mase
r2_score
- Returns:
Dictionary with all metrics
- Return type:
dict
- Parameters:
true – true/observed/actual/target values. It must be a numpy array, or pandas series/DataFrame or a list.
predicted – simulated values
treat_arrays – process the true and predicted arrays using maybe_treat_arrays function
Examples
>>> import numpy as np >>> from SeqMetrics import calculate_hydro_metrics >>> t = np.random.random(10) >>> p = np.random.random(10) >>> calculate_hydro_metrics(t, p)
Class-Based API
- class SeqMetrics.RegressionMetrics(*args, **kwargs)[source]
Bases: Metrics
Calculates more than 100 regression performance metrics related to sequence data.
Example
>>> import numpy as np >>> from SeqMetrics import RegressionMetrics >>> t = np.random.random(10) >>> p = np.random.random(10) >>> errors = RegressionMetrics(t,p) >>> all_errors = errors.calculate_all()
- __init__(*args, **kwargs)[source]
Initializes Metrics. args and kwargs go to the parent class SeqMetrics.Metrics.
- JS() float[source]
Jensen-Shannon divergence
\[JS(P \parallel Q) = \frac{1}{2} \sum_{i} \left( P(i) \log_2 \left( \frac{2P(i)}{P(i) + Q(i)} \right) + Q(i) \log_2 \left( \frac{2Q(i)}{P(i) + Q(i)} \right) \right)\]Examples
>>> import numpy as np >>> from SeqMetrics import RegressionMetrics >>> t = np.random.random(10) >>> p = np.random.random(10) >>> metrics= RegressionMetrics(t, p) >>> metrics.JS()
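The JS formula above equals ½[KL(P‖M) + KL(Q‖M)] with the mixture M = (P + Q)/2. The sketch below computes it that way in bits (log base 2). It assumes the inputs should first be rescaled to sum to 1, and it treats 0·log 0 as 0. Both choices are assumptions, and the helper names are hypothetical.

```python
import numpy as np


def _kl_base2(x, m):
    """Sum of x * log2(x / m), treating 0 * log(0) as 0."""
    mask = x > 0
    return float(np.sum(x[mask] * np.log2(x[mask] / m[mask])))


def js_sketch(p, q):
    """Hypothetical helper: Jensen-Shannon divergence in bits."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p, q = p / p.sum(), q / q.sum()   # assumption: rescale inputs to sum to 1
    m = 0.5 * (p + q)                 # mixture distribution M
    return 0.5 * (_kl_base2(p, m) + _kl_base2(q, m))


print(js_sketch([1, 0], [0, 1]))          # 1.0: disjoint distributions (upper bound)
print(js_sketch([0.2, 0.8], [0.2, 0.8]))  # 0.0: identical distributions
```

Unlike the Kullback-Leibler divergence, this quantity is symmetric in P and Q and stays finite when one distribution has zeros.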
- acc() float[source]
Anomaly correlation coefficient. See Langland et al., 2012; Miyakoda et al., 1972 and Murphy et al., 1989.
\[ACC = \frac{\sum_{i=1}^{N} \left( (\text{predicted}_i - \overline{\text{predicted}})(\text{true}_i - \overline{\text{true}}) \right)}{(N-1) \cdot \sigma_{\text{true}} \cdot \sigma_{\text{predicted}}}\]Examples
>>> import numpy as np >>> from SeqMetrics import RegressionMetrics >>> t = np.random.random(10) >>> p = np.random.random(10) >>> metrics= RegressionMetrics(t, p) >>> metrics.acc()
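If the σ terms in the ACC formula are sample standard deviations (ddof=1), the (N−1) factor cancels and ACC reduces to the Pearson correlation of the two series. The sketch below checks this against `np.corrcoef`. The ddof choice is an assumption about the formula, not a statement about the library, and `acc_sketch` is a hypothetical helper.

```python
import numpy as np


def acc_sketch(true, predicted):
    """Hypothetical helper: ACC with sample standard deviations (ddof=1)."""
    t = np.asarray(true, dtype=float)
    p = np.asarray(predicted, dtype=float)
    n = t.size
    num = np.sum((p - p.mean()) * (t - t.mean()))   # sum of anomaly products
    return float(num / ((n - 1) * np.std(t, ddof=1) * np.std(p, ddof=1)))


rng = np.random.default_rng(0)
t = rng.random(10)
p = rng.random(10)
print(acc_sketch(t, p), np.corrcoef(t, p)[0, 1])  # the two values agree
```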
- adjusted_r2() float[source]
Adjusted R squared, also known as the Ezekiel estimate (https://www.glmj.org/archives/MLRV_2007_33_1.pdf).
\[\text{Adjusted } R^2 = 1 - \left( \frac{(1 - R^2) \cdot (n - 1)}{n - k - 1} \right)\]where n = number of observations and k = 1.
Examples
>>> import numpy as np >>> from SeqMetrics import RegressionMetrics >>> t = np.random.random(10) >>> p = np.random.random(10) >>> metrics= RegressionMetrics(t, p) >>> metrics.adjusted_r2()
- agreement_index() float[source]
Agreement Index (d) developed by Willmott, 1981.
It detects additive and proportional differences in the observed and simulated means and variances (Moriasi et al., 2015, https://web.ics.purdue.edu/~mgitau/pdf/Moriasi%20et%20al%202015.pdf). It is overly sensitive to extreme values due to the squared differences. It can also be used as a substitute for R2 to identify the degree to which model predictions are error-free.
\[d = 1 - \frac{\sum_{i=1}^{N}(e_{i} - s_{i})^2}{\sum_{i=1}^{N}(\left | s_{i} - \bar{e} \right | + \left | e_{i} - \bar{e} \right |)^2}\]where e denotes the true (observed) values, s the predicted (simulated) values and \(\bar{e}\) the mean of the true values.
Examples
>>> import numpy as np >>> from SeqMetrics import RegressionMetrics >>> t = np.random.random(10) >>> p = np.random.random(10) >>> metrics= RegressionMetrics(t, p) >>> metrics.agreement_index()
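Here is a direct NumPy transcription of Willmott's d as written above, with e = true and s = predicted. The helper `agreement_index_sketch` is hypothetical. The second call shows that a constant prediction equal to the observed mean scores 0.

```python
import numpy as np


def agreement_index_sketch(true, predicted):
    """Hypothetical helper: Willmott's index of agreement d."""
    e = np.asarray(true, dtype=float)
    s = np.asarray(predicted, dtype=float)
    e_bar = e.mean()
    num = np.sum((e - s) ** 2)                                  # squared errors
    den = np.sum((np.abs(s - e_bar) + np.abs(e - e_bar)) ** 2)  # potential error
    return float(1.0 - num / den)


t = np.array([1.0, 2.0, 3.0, 4.0])
print(agreement_index_sketch(t, t))                          # 1.0: perfect agreement
print(agreement_index_sketch(t, np.full_like(t, t.mean())))  # 0.0: predicting the mean
```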
- aic(p=1) float[source]
Akaike Information Criterion. It estimates the relative quality of a model for the given data. By comparing the AIC of different models, we can identify the model that best explains the data. It penalizes models with more parameters, thereby discouraging overfitting and excessive model complexity. When comparing multiple models, the one with the lowest value is generally preferred. When the sample size is small, AIC can be biased. Modified from this source.
\[AIC = n \cdot \ln\left(\frac{\sum_{i=1}^{n} (\text{true}_i - \text{predicted}_i)^2}{n}\right) + 2p\]Examples
>>> import numpy as np >>> from SeqMetrics import RegressionMetrics >>> t = np.random.random(10) >>> p = np.random.random(10) >>> metrics= RegressionMetrics(t, p) >>> metrics.aic()
- aitchison(center='mean') float[source]
Aitchison distance as used in Zhang et al., 2020.
\[d_{\text{Aitchison}} = \sqrt{\sum_{i=1}^{n} \left( \log(\text{true}_i) - \text{center}(\log(\text{true})) - \left(\log(\text{predicted}_i) - \text{center}(\log(\text{predicted}))\right) \right)^2}\]Examples
>>> import numpy as np >>> from SeqMetrics import RegressionMetrics >>> t = np.random.random(10) >>> p = np.random.random(10) >>> metrics= RegressionMetrics(t, p) >>> metrics.aitchison()
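With `center='mean'`, the Aitchison distance is the Euclidean distance between the centred log-ratio (clr) transforms of the two series. The sketch below follows the formula above for strictly positive inputs. The helper name is hypothetical, and other `center` options are not covered. The first example shows the scale invariance that makes this distance suitable for compositional data.

```python
import numpy as np


def aitchison_sketch(true, predicted):
    """Hypothetical helper: Aitchison distance with center='mean' (values must be > 0)."""
    lt = np.log(np.asarray(true, dtype=float))
    lp = np.log(np.asarray(predicted, dtype=float))
    clr_t = lt - lt.mean()   # centred log-ratio transform of true
    clr_p = lp - lp.mean()   # centred log-ratio transform of predicted
    return float(np.sqrt(np.sum((clr_t - clr_p) ** 2)))


t = np.array([0.2, 0.5, 0.3])
print(aitchison_sketch(t, 5 * t))                 # ~0: rescaling keeps the composition
print(aitchison_sketch([1.0, 1.0], [1.0, np.e]))  # sqrt(0.5), about 0.707
```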
- amemiya_adj_r2() float[source]
Amemiya’s Adjusted R-squared
\[R^2_{\text{adj, Amemiya}} = 1 - \left( \frac{(1 - R^2) \cdot (n + k)}{n - k - 1} \right)\]Examples
>>> import numpy as np >>> from SeqMetrics import RegressionMetrics >>> t = np.random.random(10) >>> p = np.random.random(10) >>> metrics= RegressionMetrics(t, p) >>> metrics.amemiya_adj_r2()
- amemiya_pred_criterion() float[source]
Amemiya’s Prediction Criterion
\[\text{APC} = \left( \frac{n + k}{n - k} \right) \left( \frac{1}{n} \sum_{i=1}^{n} (\text{true}_i - \text{predicted}_i)^2 \right)\]Examples
>>> import numpy as np >>> from SeqMetrics import RegressionMetrics >>> t = np.random.random(10) >>> p = np.random.random(10) >>> metrics= RegressionMetrics(t, p) >>> metrics.amemiya_pred_criterion()
- bias() float[source]
Bias as given by Gupta et al., 1998. It is also called mean error.
\[Bias=\frac{1}{N}\sum_{i=1}^{N}(e_{i}-s_{i})\]Examples
>>> import numpy as np >>> from SeqMetrics import RegressionMetrics >>> t = np.random.random(10) >>> p = np.random.random(10) >>> metrics= RegressionMetrics(t, p) >>> metrics.bias()
- bic(p=1) float[source]
Bayesian Information Criterion
Minimising the BIC is intended to give the best model. The model chosen by the BIC is either the same as that chosen by the AIC, or one with fewer terms. This is because the BIC penalises the number of parameters more heavily than the AIC. Modified after RegscorePy.
\[BIC = n \cdot \ln\left(\frac{\text{SSE}}{n}\right) + p \cdot \ln(n)\]Examples
>>> import numpy as np >>> from SeqMetrics import RegressionMetrics >>> t = np.random.random(10) >>> p = np.random.random(10) >>> metrics= RegressionMetrics(t, p) >>> metrics.bic()
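The AIC and BIC formulas above share the goodness-of-fit term n·ln(SSE/n) and differ only in the penalty: 2p for AIC and p·ln(n) for BIC. BIC therefore penalizes parameters more heavily whenever ln(n) > 2, i.e. n ≥ 8. The helpers below are hypothetical. They name the number of parameters `n_params` to avoid a clash with the predicted array.

```python
import numpy as np


def aic_sketch(true, pred, n_params=1):
    """Hypothetical helper: AIC = n * ln(SSE / n) + 2 * n_params."""
    t = np.asarray(true, dtype=float)
    y = np.asarray(pred, dtype=float)
    n = t.size
    sse = np.sum((t - y) ** 2)
    return float(n * np.log(sse / n) + 2 * n_params)


def bic_sketch(true, pred, n_params=1):
    """Hypothetical helper: BIC = n * ln(SSE / n) + n_params * ln(n)."""
    t = np.asarray(true, dtype=float)
    y = np.asarray(pred, dtype=float)
    n = t.size
    sse = np.sum((t - y) ** 2)
    return float(n * np.log(sse / n) + n_params * np.log(n))


rng = np.random.default_rng(1)
t = rng.random(10)
y = t + 0.1 * rng.random(10)
# Only the penalty differs: 3 * (ln(10) - 2), about 0.908
print(bic_sketch(t, y, 3) - aic_sketch(t, y, 3))
```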
- brier_score() float[source]
Adopted from SkillMetrics. This function calculates the Brier score (BS), which is a measure of the mean-square error of probability forecasts for a dichotomous (two-category) event, such as the occurrence/non-occurrence of precipitation. The score is calculated using the formula:
\[BS = \frac{1}{N} \sum_{n=1}^{N} (f_n - o_n)^2\]where f are the forecast probabilities, o are the observed outcomes (0 or 1), and N is the total number of values in f and o. Note that f and o must have the same number of values, and those values must be in the range [0,1].
- Returns:
BS : Brier score
- Return type:
float
Examples
>>> import numpy as np >>> from SeqMetrics import RegressionMetrics >>> t = np.random.random(10) >>> p = np.random.random(10) >>> metrics= RegressionMetrics(t, p) >>> metrics.brier_score()
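The following minimal sketch of the Brier score includes the input checks described above: equal length and values in [0, 1]. The helper name `brier_sketch` is hypothetical.

```python
import numpy as np


def brier_sketch(forecast, observed):
    """Hypothetical helper: mean squared difference of probabilities and outcomes."""
    f = np.asarray(forecast, dtype=float)
    o = np.asarray(observed, dtype=float)
    if f.shape != o.shape:
        raise ValueError("f and o must have the same number of values")
    if np.any((f < 0) | (f > 1)) or np.any((o < 0) | (o > 1)):
        raise ValueError("values must lie in the range [0, 1]")
    return float(np.mean((f - o) ** 2))


f = [0.9, 0.1, 0.8, 0.3]   # forecast probabilities of the event
o = [1, 0, 1, 0]           # observed occurrence (1) or non-occurrence (0)
print(brier_sketch(f, o))  # (0.01 + 0.01 + 0.04 + 0.09) / 4 = 0.0375
```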
- calculate_hydro_metrics()[source]
- Calculates the following performance metrics related to hydrology.
fdc_flv
fdc_fhv
kge
kge_np
kge_mod
kge_bound
kgeprime_bound
kgenp_bound
nse
nse_alpha
nse_beta
nse_mod
nse_bound
r2
mape
nrmse
corr_coeff
rmse
mae
mse
mpe
mase
r2_score
Examples
>>> import numpy as np >>> from SeqMetrics import RegressionMetrics >>> t = np.random.random(10) >>> p = np.random.random(10) >>> metrics= RegressionMetrics(t, p) >>> metrics.calculate_hydro_metrics()
- centered_rms_dev() float[source]
Modified after SkillMetrics. Calculates the centered root-mean-square (RMS) difference between true and predicted, where p is the predicted values, r is the true values, and N is the total number of values in p and r:
\[CRMSD = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left( (p_i - \text{mean}(p)) - (r_i - \text{mean}(r)) \right)^2}\]Output: CRMSDIFF : centered root-mean-square (RMS) difference E'
Examples
>>> import numpy as np >>> from SeqMetrics import RegressionMetrics >>> t = np.random.random(10) >>> p = np.random.random(10) >>> metrics= RegressionMetrics(t, p) >>> metrics.centered_rms_dev()
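The sketch below computes CRMSD directly from the formula. It then checks the law-of-cosines relation used in Taylor diagrams, CRMSD² = σ_p² + σ_r² − 2σ_pσ_r·R, which holds exactly with population standard deviations. `crmsd_sketch` is a hypothetical helper. It also shows that a constant offset does not affect CRMSD.

```python
import numpy as np


def crmsd_sketch(true, predicted):
    """Hypothetical helper: centered RMS difference (population moments)."""
    r_ = np.asarray(true, dtype=float)
    p_ = np.asarray(predicted, dtype=float)
    anom_diff = (p_ - p_.mean()) - (r_ - r_.mean())   # remove each mean first
    return float(np.sqrt(np.mean(anom_diff ** 2)))


rng = np.random.default_rng(42)
t = rng.random(50)
p = t + 0.3 * rng.random(50)

crmsd = crmsd_sketch(t, p)
s_t, s_p = np.std(t), np.std(p)   # population standard deviations
r = np.corrcoef(t, p)[0, 1]       # Pearson correlation
via_taylor = np.sqrt(s_t ** 2 + s_p ** 2 - 2 * s_t * s_p * r)
print(crmsd, via_taylor)          # the two values agree
print(crmsd_sketch(t, t + 5.0))   # (numerically) 0: the bias is removed
```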
- concordance_corr_coef() float[source]
Concordance Correlation Coefficient (CCC) taken from this paper.
\[CCC = \frac{2 \rho \sigma_{true} \sigma_{predicted}}{\sigma_{true}^2 + \sigma_{predicted}^2 + (\bar{true} - \bar{predicted})^2}\]Examples
>>> import numpy as np >>> from SeqMetrics import RegressionMetrics >>> t = np.random.random(10) >>> p = np.random.random(10) >>> metrics= RegressionMetrics(t, p) >>> metrics.concordance_corr_coef()
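The sketch below transcribes the CCC formula, writing ρσ_tσ_p as the covariance. It assumes population moments (ddof=0), and `ccc_sketch` is a hypothetical helper. It illustrates that, unlike Pearson's r, CCC penalizes a constant offset between the series.

```python
import numpy as np


def ccc_sketch(true, predicted):
    """Hypothetical helper: concordance correlation coefficient (ddof=0)."""
    t = np.asarray(true, dtype=float)
    p = np.asarray(predicted, dtype=float)
    cov = np.mean((t - t.mean()) * (p - p.mean()))   # equals rho * sigma_t * sigma_p
    return float(2 * cov / (np.var(t) + np.var(p) + (t.mean() - p.mean()) ** 2))


t = np.array([1.0, 2.0, 3.0, 4.0])
print(np.corrcoef(t, t + 1)[0, 1])  # Pearson r is (numerically) 1 despite the offset
print(ccc_sketch(t, t + 1))         # 2.5 / 3.5, about 0.714
```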
- corr_coeff() float[source]
Pearson correlation coefficient as proposed by Pearson, 1895. It measures the linear correlation between the true and predicted arrays and is sensitive to outliers. The following equation is taken from Jiang et al., 2022.
\[r = \frac{\sum ^n _{i=1}(predicted_i - \bar{predicted})(true_i - \bar{true})}{\sqrt{\sum ^n _{i=1}(predicted_i - \bar{predicted})^2} \sqrt{\sum ^n _{i=1}(true_i - \bar{true})^2}}\]where n is the length of the true/predicted arrays.
Examples
>>> import numpy as np >>> from SeqMetrics import RegressionMetrics >>> t = np.random.random(10) >>> p = np.random.random(10) >>> metrics= RegressionMetrics(t, p) >>> metrics.corr_coeff()
- cosine_similarity() float[source]
Cosine similarity. It is a judgment of orientation and not magnitude: two vectors with the same orientation have a cosine similarity of 1, two vectors oriented at 90° relative to each other have a similarity of 0, and two vectors diametrically opposed have a similarity of -1, independent of their magnitude.
\[\text{Cosine Similarity} = \frac{\sum_{i=1}^{n} \text{true}_i \cdot \text{predicted}_i}{\sqrt{\sum_{i=1}^{n} (\text{true}_i)^2} \cdot \sqrt{\sum_{i=1}^{n} (\text{predicted}_i)^2}}\]References
https://scikit-learn.org/stable/modules/generated/sklearn.metrics.pairwise.cosine_similarity.html
Examples
>>> import numpy as np >>> from SeqMetrics import RegressionMetrics >>> t = np.random.random(10) >>> p = np.random.random(10) >>> metrics= RegressionMetrics(t, p) >>> metrics.cosine_similarity()
- covariance() float[source]
Covariance as defined in Eq. 3 at MathWorld. A positive covariance means that the true and predicted values tend to increase or decrease together.
\[Covariance = \frac{1}{N} \sum_{i=1}^{N}((true_{i} - \bar{true}) * (predicted_{i} - \bar{predicted}))\]The bar represents the mean of the array.
Examples
>>> import numpy as np >>> from SeqMetrics import RegressionMetrics >>> t = np.random.random(10) >>> p = np.random.random(10) >>> metrics= RegressionMetrics(t, p) >>> metrics.covariance()
- critical_success_index(threshold=0.5) float[source]
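Critical Success Index (CSI), also known as the threat score, where TP (hits), FN (misses) and FP (false alarms) are counts of binary events.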
\[CSI = \frac{TP}{TP + FN + FP}\]
Examples
>>> import numpy as np >>> from SeqMetrics import RegressionMetrics >>> t = np.array([0, 1, 1, 0, 0, 1]) >>> p = np.array([0, 1, 0, 1, 1, 1]) >>> metrics= RegressionMetrics(t, p) >>> metrics.critical_success_index()
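The sketch below shows how CSI could be computed from the two arrays. It assumes that a value counts as an event when it is greater than or equal to `threshold`. That binarization rule is an assumption and may not match how SeqMetrics uses its `threshold` parameter. `csi_sketch` is a hypothetical helper applied to the arrays from the example above.

```python
import numpy as np


def csi_sketch(true, predicted, threshold=0.5):
    """Hypothetical helper: CSI = TP / (TP + FN + FP)."""
    obs = np.asarray(true) >= threshold        # assumption: event if value >= threshold
    fcst = np.asarray(predicted) >= threshold
    tp = np.sum(obs & fcst)      # hits
    fn = np.sum(obs & ~fcst)     # misses
    fp = np.sum(~obs & fcst)     # false alarms
    return float(tp / (tp + fn + fp))


t = np.array([0, 1, 1, 0, 0, 1])
p = np.array([0, 1, 0, 1, 1, 1])
print(csi_sketch(t, p))  # 2 hits, 1 miss, 2 false alarms: 2 / 5 = 0.4
```

Correct negatives (events absent in both arrays) do not enter the score, which is why CSI is often used for rare events.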
- cronbach_alpha() float[source]
It is a measure of the internal consistency of data following Cheung and Yip, 2005 https://doi.org/10.1016/B0-12-369398-5/00396-0. See the UCLA and Stack Overflow pages for more information.
\[alpha = \frac{N}{N - 1} \left(1 - \frac{\sum_{i=1}^{N} \sigma^2_{i}}{\sigma^2_{\text{total}}}\right)\]Examples
>>> import numpy as np >>> from SeqMetrics import RegressionMetrics >>> t = np.random.random(10) >>> p = np.random.random(10) >>> metrics= RegressionMetrics(t, p) >>> metrics.cronbach_alpha()
- decomposed_mse() float[source]
Decomposed MSE developed by Kobayashi and Salam (2000)
\[dMSE = (\frac{1}{N}\sum_{i=1}^{N}(e_{i}-s_{i}))^2 + SDSD + LCS\]\[SDSD = (\sigma(e) - \sigma(s))^2\]\[LCS = 2 \sigma(e) \sigma(s) * (1 - \frac{\sum ^n _{i=1}(e_i - \bar{e})(s_i - \bar{s})} {\sqrt{\sum ^n _{i=1}(e_i - \bar{e})^2} \sqrt{\sum ^n _{i=1}(s_i - \bar{s})^2}})\]Examples
>>> import numpy as np >>> from SeqMetrics import RegressionMetrics >>> t = np.random.random(10) >>> p = np.random.random(10) >>> metrics= RegressionMetrics(t, p) >>> metrics.decomposed_mse()
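The three terms of the decomposition above add up exactly to the MSE when the σ terms are population standard deviations (ddof=0) and r is the Pearson correlation. The sketch below verifies this numerically. `decomposed_mse_sketch` is a hypothetical helper that returns the terms separately, which is an illustrative choice rather than the library's return value.

```python
import numpy as np


def decomposed_mse_sketch(true, predicted):
    """Hypothetical helper returning the (SB, SDSD, LCS) terms."""
    e = np.asarray(true, dtype=float)
    s = np.asarray(predicted, dtype=float)
    sb = np.mean(e - s) ** 2             # squared bias
    sd_e, sd_s = np.std(e), np.std(s)    # population std (ddof=0)
    r = np.corrcoef(e, s)[0, 1]          # Pearson correlation
    sdsd = (sd_e - sd_s) ** 2            # difference in spread
    lcs = 2 * sd_e * sd_s * (1 - r)      # lack of correlation
    return sb, sdsd, lcs


rng = np.random.default_rng(7)
t = rng.random(20)
p = 0.8 * t + 0.2 * rng.random(20)
sb, sdsd, lcs = decomposed_mse_sketch(t, p)
mse = np.mean((t - p) ** 2)
print(sb + sdsd + lcs, mse)  # the three terms add up to the MSE
```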
- euclid_distance() float[source]
Euclidean distance taken from this book (https://doi.org/10.1016/B978-0-12-088735-4.50006-7).
\[D = \sqrt{\sum_{i=1}^{n} (\text{true}_i - \text{predicted}_i)^2}\]References: Kennard et al., 2010
Examples
>>> import numpy as np >>> from SeqMetrics import RegressionMetrics >>> t = np.random.random(10) >>> p = np.random.random(10) >>> metrics= RegressionMetrics(t, p) >>> metrics.euclid_distance()
- exp_var_score(weights=None) float | None[source]
Explained variance score. The best value is 1; lower values are less accurate.
\[\text{EVS} = 1 - \frac{\sum_{i=1}^{n} w_i \left( (true_i - predicted_i) - \frac{\sum_{j=1}^{n} w_j (true_j - predicted_j)}{\sum_{j=1}^{n} w_j} \right)^2}{\sum_{i=1}^{n} w_i (true_i - \frac{\sum_{j=1}^{n} w_j true_j}{\sum_{j=1}^{n} w_j})^2}\]Examples
>>> import numpy as np >>> from SeqMetrics import RegressionMetrics >>> t = np.random.random(10) >>> p = np.random.random(10) >>> metrics= RegressionMetrics(t, p) >>> metrics.exp_var_score()
- expanded_uncertainty(cov_fact=1.96) float[source]
By default, it calculates the uncertainty at the 95% confidence level; 1.96 is the coverage factor corresponding to that level. This indicator provides more information about the model deviation. The formula follows Behar et al., 2015 and Gueymard et al., 2014.
\[U = \text{cov\_fact} \times \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} \left( \left(\text{true}_i - \text{predicted}_i\right) - \overline{\left(\text{true} - \text{predicted}\right)} \right)^2 + \frac{1}{n} \sum_{i=1}^{n} \left(\text{true}_i - \text{predicted}_i\right)^2}\]Examples
>>> import numpy as np >>> from SeqMetrics import RegressionMetrics >>> t = np.random.random(10) >>> p = np.random.random(10) >>> metrics= RegressionMetrics(t, p) >>> metrics.expanded_uncertainty()
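The expanded uncertainty formula combines the sample standard deviation of the errors (1/(n−1) term) with the RMSE (1/n term), scaled by the coverage factor. Here is a minimal sketch with a hypothetical helper name.

```python
import numpy as np


def expanded_uncertainty_sketch(true, predicted, cov_fact=1.96):
    """Hypothetical helper: cov_fact * sqrt(SD_err**2 + RMSE**2)."""
    err = np.asarray(true, dtype=float) - np.asarray(predicted, dtype=float)
    sd = np.std(err, ddof=1)            # 1/(n-1) term in the formula
    rmse = np.sqrt(np.mean(err ** 2))   # 1/n term in the formula
    return float(cov_fact * np.sqrt(sd ** 2 + rmse ** 2))


t = np.array([1.0, 2.0, 3.0, 4.0])
# A constant error of 0.5 has zero spread, so U = 1.96 * 0.5
print(expanded_uncertainty_sketch(t, t - 0.5))  # 0.98
```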
- fdc_fhv(h: float = 0.02) float[source]
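Peak flow bias of the flow duration curve (Yilmaz et al., 2008) as used in Kratzert et al., 2019. The code is modified from the Kratzert et al., 2018 code.
\[FHV = \frac{\sum_{i=1}^{k} (predicted_i - true_i)}{\sum_{i=1}^{k} true_i} \times 100\]where the sums run over the k highest flows, i.e. the top fraction h of the flow duration curve.
Examples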
>>> import numpy as np >>> from SeqMetrics import RegressionMetrics >>> t = np.random.random(10) >>> p = np.random.random(10) >>> metrics= RegressionMetrics(t, p) >>> metrics.fdc_fhv()
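The FHV formula can be sketched as follows. Both series are sorted independently in descending order (flow duration curves, not paired in time), and the top k = round(h·n) values are kept. The rounding rule and the error raised when no value is selected are assumptions of this sketch, and `fdc_fhv_sketch` is hypothetical.

```python
import numpy as np


def fdc_fhv_sketch(true, predicted, h=0.02):
    """Hypothetical helper: percent bias of the top-h segment of the FDCs."""
    obs = np.sort(np.asarray(true, dtype=float))[::-1]       # descending FDC of true
    sim = np.sort(np.asarray(predicted, dtype=float))[::-1]  # descending FDC of predicted
    k = int(np.round(h * obs.size))                          # assumption: round(h * n)
    if k < 1:
        raise ValueError("h * len(true) must select at least one value")
    obs, sim = obs[:k], sim[:k]
    return float(np.sum(sim - obs) / np.sum(obs) * 100)


t = np.arange(1, 101, dtype=float)
print(fdc_fhv_sketch(t, 1.1 * t))  # top 2 flows overestimated by 10 %: 10.0
```

With only 10 values and the default h=0.02, round(0.2) is 0, so this sketch would select no flows. A longer series or a larger h is needed.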
- fdc_flv(low_flow: float = 0.3) float[source]
Bias of the bottom 30 % low flows (the fraction given by low_flow) of the flow duration curve as used in Kratzert et al., 2019.
\[\text{FLV} = -1 \times \frac{\sum (\log(\text{predicted}) - \min(\log(\text{predicted}))) - \sum (\log(\text{true}) - \min(\log(\text{true})))}{\sum (\log(\text{true}) - \min(\log(\text{true}))) + 1 \times 10^{-6}}\]Examples
>>> import numpy as np >>> from SeqMetrics import RegressionMetrics >>> t = np.random.random(10) >>> p = np.random.random(10) >>> metrics= RegressionMetrics(t, p) >>> metrics.fdc_flv()
- gmae() float[source]
Geometric Mean Absolute Error.
\[GMAE = \left( \prod_{i=1}^{n} \left| \text{true}_i - \text{predicted}_i \right| \right)^{\frac{1}{n}}\]
Examples
>>> import numpy as np >>> from SeqMetrics import RegressionMetrics >>> t = np.random.random(10) >>> p = np.random.random(10) >>> metrics= RegressionMetrics(t, p) >>> metrics.gmae()
- gmean_diff() float[source]
Geometric mean difference. The geometric mean is first calculated for each of the two samples, and then their difference is taken.
\[\text{gmean\_diff} = \left( \prod_{i=1}^{n} \text{true}_i \right)^{\frac{1}{n}} - \left( \prod_{i=1}^{n} \text{predicted}_i \right)^{\frac{1}{n}}\]Examples
>>> import numpy as np >>> from SeqMetrics import RegressionMetrics >>> t = np.random.random(10) >>> p = np.random.random(10) >>> metrics= RegressionMetrics(t, p) >>> metrics.gmean_diff()
- gmrae(benchmark: ndarray | None = None) float[source]
Geometric Mean Relative Absolute Error
\[GMRAE = \left( \prod_{i=1}^{n} \frac{|true_i - predicted_i|}{|true_i - benchmark_i|} \right)^{\frac{1}{n}}\]Examples
>>> import numpy as np >>> from SeqMetrics import RegressionMetrics >>> t = np.random.random(10) >>> p = np.random.random(10) >>> metrics= RegressionMetrics(t, p) >>> metrics.gmrae()
- inrse() float[source]
Integral Normalized Root Squared Error
\[IN\text{-}RSE = \sqrt{\frac{\sum_{i=1}^{n} (\text{true}_i - \text{predicted}_i)^2}{\sum_{i=1}^{n} (\text{true}_i - \overline{\text{true}})^2}}\]Examples
>>> import numpy as np >>> from SeqMetrics import RegressionMetrics >>> t = np.random.random(10) >>> p = np.random.random(10) >>> metrics= RegressionMetrics(t, p) >>> metrics.inrse()
- irmse() float[source]
Inertial RMSE. RMSE divided by standard deviation of the gradient of true.
\[\text{IRMSE} = \frac{\sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( \text{true}_i - \text{predicted}_i \right)^2}}{\sqrt{\frac{1}{n-2} \sum_{i=1}^{n-1} \left( (\text{true}_{i+1} - \text{true}_i) - \overline{(\text{true}_{i+1} - \text{true}_i)} \right)^2}}\]Examples
>>> import numpy as np >>> from SeqMetrics import RegressionMetrics >>> t = np.random.random(10) >>> p = np.random.random(10) >>> metrics= RegressionMetrics(t, p) >>> metrics.irmse()
- kendall_tau(return_p=False) float | tuple[source]
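Kendall’s tau, as used in Probst et al., 2019.
\[\tau = \frac{C - D}{\sqrt{(C + D + T_{\text{true}})(C + D + T_{\text{predicted}})}}\]where C and D are the numbers of concordant and discordant pairs, and \(T_{\text{true}}\) and \(T_{\text{predicted}}\) are the numbers of pairs tied only in true and only in predicted, respectively.
Examples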
>>> import numpy as np >>> from SeqMetrics import RegressionMetrics >>> t = np.random.random(10) >>> p = np.random.random(10) >>> metrics= RegressionMetrics(t, p) >>> metrics.kendall_tau()
- kge(return_all: bool = False) float | ndarray[source]
Kling-Gupta Efficiency following Gupta et al. 2009. This error considers correlation (r), variability (\(\alpha\)) and the mean difference/error, which is also called bias (\(\beta\)). KGE values vary from -infinity to 1, and higher is better. KGE values above -0.41 (i.e. \(1 - \sqrt{2}\)) mean that the simulated/predicted values (by the model) are better than the mean of the observed data (Knoben et al., 2019).
\[\text{KGE} = 1 - \sqrt{(r - 1)^2 + (\alpha - 1)^2 + (\beta - 1)^2}\]\[\alpha = \frac{\sigma_{\text{predicted}}}{\sigma_{\text{true}}}\]\[\beta = \frac{\mu_{\text{predicted}}}{\mu_{\text{true}}}\]Please note that bias (\(\beta\)) is not the same as the
SeqMetrics.bias() method. The term \(\sqrt{(r - 1)^2 + (\alpha - 1)^2 + (\beta - 1)^2}\) is also called the euclidean distance (ED), which means KGE can also be defined as below
\[\text{KGE} = 1 - ED\]where the correlation coefficient r is calculated as:
\[r = \frac{\sum_{i=1}^{N} ( \text{true}_i - \bar{\text{true}} ) ( \text{predicted}_i - \bar{\text{predicted}} )}{\sqrt{\sum_{i=1}^{N} ( \text{true}_i - \bar{\text{true}} )^2} \sqrt{\sum_{i=1}^{N} ( \text{predicted}_i - \bar{\text{predicted}} )^2}}\]Output
If return_all is True, it returns a numpy array of shape (4, ) containing kge, correlation (r), variability (\(\alpha\)) and bias (\(\beta\)). Otherwise, it returns the kge score.
Examples
>>> import numpy as np >>> from SeqMetrics import RegressionMetrics >>> t = np.random.random(10) >>> p = np.random.random(10) >>> metrics= RegressionMetrics(t, p) >>> metrics.kge() >>> kge, corr, var, bias = metrics.kge(return_all=True)
- kge_bound() float[source]
Mathevet et al. 2006 proposed a bounded version of NSE because the original NSE lacks a lower bound and thus has a skewed distribution when calculated for a large number of basins. To avoid this skewed distribution and make the statistic vary between -1 and +1, they proposed a bounded version of NSE. The same concept is applied here to KGE. As per the authors, this bounded version of the statistic makes it less optimistic for positive values.
\[\text{KGE}_{\text{bound}} = \frac{\text{KGE}}{2 - \text{KGE}}\]Examples
>>> import numpy as np >>> from SeqMetrics import RegressionMetrics >>> t = np.random.random(10) >>> p = np.random.random(10) >>> metrics= RegressionMetrics(t, p) >>> metrics.kge_bound()
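Here is a sketch of the KGE components and of the bounding transform KGE / (2 − KGE), following the formulas above. The helper names are hypothetical. α is a ratio of standard deviations, so the ddof choice cancels out. The bounding maps (−∞, 1] onto (−1, 1].

```python
import numpy as np


def kge_sketch(true, predicted):
    """Hypothetical helper returning (kge, r, alpha, beta)."""
    t = np.asarray(true, dtype=float)
    p = np.asarray(predicted, dtype=float)
    r = np.corrcoef(t, p)[0, 1]       # linear correlation
    alpha = np.std(p) / np.std(t)     # variability ratio (ddof cancels out)
    beta = np.mean(p) / np.mean(t)    # bias ratio
    kge = 1 - np.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)
    return kge, r, alpha, beta


def kge_bound_sketch(kge):
    """Map KGE from (-inf, 1] onto (-1, 1]."""
    return kge / (2 - kge)


t = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
print(kge_sketch(t, t)[0])        # 1.0 for a perfect prediction
print(kge_sketch(t, 1.2 * t)[0])  # r = 1, alpha = beta = 1.2: 1 - sqrt(0.08)
print(kge_bound_sketch(-1e6))     # a very poor KGE stays just above -1
```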
- kge_mod(return_all: bool = False)[source]
Modified Kling-Gupta Efficiency after Kling et al. 2012. Similar to the original KGE, its values vary from -infinity to 1, and higher is better.
This version of KGE was introduced to avoid cross-correlation between bias and variability, which happens when the precipitation data is biased. This is done by calculating the variability (\(\alpha\)) as \({CV}_s/{CV}_o\) instead of \({\sigma}_s/{\sigma}_o\), where CV is the coefficient of variation, defined as the ratio of the standard deviation to the mean (\({\sigma}/{\mu}\)).
\[\text{KGE}' = 1 - \sqrt{(r - 1)^2 + (\alpha - 1)^2 + (\beta - 1)^2}\]Output
If return_all is True, it returns a numpy array of shape (4, ) containing kge, \(\gamma\), \(\alpha\) and \(\beta\). Otherwise, it returns kge.
Examples
>>> import numpy as np >>> from SeqMetrics import RegressionMetrics >>> t = np.random.random(10) >>> p = np.random.random(10) >>> metrics= RegressionMetrics(t, p) >>> metrics.kge_mod()
- kge_np(return_all: bool = False) float | ndarray[source]
Non-parametric Kling-Gupta Efficiency after Pool et al. 2018.
This differs from the original KGE by using non-parametric components, i.e. the variability \(\alpha\) and the correlation cc. The variability (\(\alpha\)) is computed non-parametrically from the flow duration curves (FDCs) of the true and predicted values, which are normalized to remove the volume information. It also uses Spearman’s rank correlation instead of Pearson’s correlation coefficient.
\[cc = \rho(\text{true}, \text{predicted})\]\[\alpha = 1 - 0.5 \sum_{i=1}^{n} \left| \frac{\text{sorted(predicted}_i\text{)}}{\text{mean(predicted)} \cdot n} - \frac{\text{sorted(true}_i\text{)}}{\text{mean(true)} \cdot n} \right|\]\[\beta = \frac{\text{mean(predicted)}}{\text{mean(true)}}\]\[\text{KGE}_{\text{np}} = 1 - \sqrt{(cc - 1)^2 + (\alpha - 1)^2 + (\beta - 1)^2}\]Examples
>>> import numpy as np >>> from SeqMetrics import RegressionMetrics >>> t = np.random.random(10) >>> p = np.random.random(10) >>> metrics= RegressionMetrics(t, p) >>> metrics.kge_np()
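The non-parametric KGE formula above can be sketched as follows. Spearman's ρ is computed as the Pearson correlation of ranks, and this simple ranking assumes no ties. The helper names are hypothetical. The second example shows that the normalized FDCs ignore volume, so a doubled series is penalized only through β.

```python
import numpy as np


def _ranks(x):
    """0-based ranks; this simple version assumes there are no ties."""
    return np.argsort(np.argsort(x))


def kge_np_sketch(true, predicted):
    """Hypothetical helper: non-parametric KGE."""
    t = np.asarray(true, dtype=float)
    p = np.asarray(predicted, dtype=float)
    n = t.size
    cc = np.corrcoef(_ranks(t), _ranks(p))[0, 1]   # Spearman's rho
    fdc_t = np.sort(t) / (t.mean() * n)             # normalised FDC of true
    fdc_p = np.sort(p) / (p.mean() * n)             # normalised FDC of predicted
    alpha = 1 - 0.5 * np.sum(np.abs(fdc_p - fdc_t))
    beta = p.mean() / t.mean()
    return float(1 - np.sqrt((cc - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2))


t = np.array([0.5, 1.0, 2.0, 3.0, 5.0])
print(kge_np_sketch(t, t))      # 1.0 for a perfect prediction
print(kge_np_sketch(t, 2 * t))  # cc = alpha = 1, beta = 2: (numerically) 0.0
```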
- kgenp_bound()[source]
Bounded Version of the Non-Parametric Kling-Gupta Efficiency
\[KGE_{np_{bound}} = \frac{1 - \sqrt{\left(\rho(t, p) - 1\right)^2 + \left(1 - 0.5 \sum_{i=1}^{n} \left| \frac{\text{sorted}(p_i)}{\text{mean}(p) \cdot n} - \frac{\text{sorted}(t_i)}{\text{mean}(t) \cdot n} \right| - 1\right)^2 + \left(\frac{\text{mean}(p)}{\text{mean}(t)} - 1\right)^2}}{2 - \left(1 - \sqrt{\left(\rho(t, p) - 1\right)^2 + \left(1 - 0.5 \sum_{i=1}^{n} \left| \frac{\text{sorted}(p_i)}{\text{mean}(p) \cdot n} - \frac{\text{sorted}(t_i)}{\text{mean}(t) \cdot n} \right| - 1\right)^2 + \left(\frac{\text{mean}(p)}{\text{mean}(t)} - 1\right)^2}\right)}\]Examples
>>> import numpy as np >>> from SeqMetrics import RegressionMetrics >>> t = np.random.random(10) >>> p = np.random.random(10) >>> metrics= RegressionMetrics(t, p) >>> metrics.kgenp_bound()
- kgeprime_bound() float[source]
Bounded Version of the Modified Kling-Gupta Efficiency
\[KGE'_{\text{bounded}} = \frac{1 - \sqrt{(r - 1)^2 + (\gamma - 1)^2 + (\beta - 1)^2}}{2 - (1 - \sqrt{(r - 1)^2 + (\gamma - 1)^2 + (\beta - 1)^2})}\]Examples
>>> import numpy as np >>> from SeqMetrics import RegressionMetrics >>> t = np.random.random(10) >>> p = np.random.random(10) >>> metrics= RegressionMetrics(t, p) >>> metrics.kgeprime_bound()
- kl_divergence() float[source]
Kullback-Leibler divergence
\[D_{KL}(P \parallel Q) = \sum_{x\in\mathcal{X}} P(x) \log\left(\frac{P(x)}{Q(x)}\right)\]Examples
>>> import numpy as np >>> from SeqMetrics import RegressionMetrics >>> t = np.array([0.1, 0.2, 0.3, 0.2, 0.2]) >>> p = np.array([0.2, 0.2, 0.2, 0.2, 0.2]) >>> metrics= RegressionMetrics(t, p) >>> divergence = metrics.kl_divergence()
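Below is a sketch of the KL formula using the natural logarithm (result in nats), applied to the arrays from the example above. The log base used by SeqMetrics is not stated here, so the base is an assumption. Terms with P(x) = 0 are treated as 0, and `kl_divergence_sketch` is a hypothetical helper.

```python
import numpy as np


def kl_divergence_sketch(p, q):
    """Hypothetical helper: D_KL(P || Q) with the natural logarithm."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    mask = p > 0   # terms with P(x) = 0 contribute 0 by convention
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))


t = np.array([0.1, 0.2, 0.3, 0.2, 0.2])
p = np.array([0.2, 0.2, 0.2, 0.2, 0.2])
print(kl_divergence_sketch(t, p))  # about 0.0523
print(kl_divergence_sketch(p, t))  # about 0.0575: KL is not symmetric
```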
- kl_sym() float | None[source]
Symmetric Kullback-Leibler divergence
\[\text{KL}_{\text{sym}}(P || Q) = \frac{1}{2} \sum_{i=1}^{n} \left( P_i - Q_i \right) \left( \log_2 \frac{P_i}{Q_i} \right)\]Examples
>>> import numpy as np >>> from SeqMetrics import RegressionMetrics >>> t = np.random.random(10) >>> p = np.random.random(10) >>> metrics= RegressionMetrics(t, p) >>> metrics.kl_sym()
- legates_coeff_eff(power=0) float[source]
Legates Coefficient of Efficiency. Its maximum value is 1 (perfect agreement), and it becomes negative when the absolute errors exceed the absolute deviations of true from its mean. It is not as sensitive to extreme values as agreement_index and the coefficient of determination because it uses the absolute value of the difference instead of the squared difference. See Equation 23 in Dodo et al., 2022
\[LCE = 1 - \frac{\sum_{i=1}^{n} |true_i - predicted_i|}{\sum_{i=1}^{n} |true_i - \bar{true}|}\]Examples
>>> import numpy as np >>> from SeqMetrics import RegressionMetrics >>> t = np.array([1, 2, 3, 4, 5]) >>> p = np.array([1.1, 1.9, 3.1, 4.2, 4.8]) >>> metrics= RegressionMetrics(t, p) >>> score = metrics.legates_coeff_eff()
- lm_index(obs_bar_p=None) float[source]
Legates-McCabe Efficiency Index. It is less sensitive to outliers in the data. The larger, the better.
\[a_i = |predicted_i - true_i|\]\[b_i = \begin{cases} |true_i - \text{obs\_bar\_p}| & \text{if obs\_bar\_p is provided} \\ |true_i - \bar{true}| & \text{otherwise} \end{cases}\]\[\text{LM Index} = 1 - \frac{\sum_{i=1}^{n} a_i}{\sum_{i=1}^{n} b_i}\]Examples
>>> import numpy as np >>> from SeqMetrics import RegressionMetrics >>> t = np.random.random(10) >>> p = np.random.random(10) >>> metrics= RegressionMetrics(t, p) >>> metrics.lm_index()
- log_cosh_error() float[source]
Log-cosh error.
\[\text{Log-Cosh Error} = \frac{1}{n} \sum_{i=1}^{n} \log \left( \cosh(\text{predicted}_i - \text{true}_i) \right)\]
Examples
>>> import numpy as np >>> from SeqMetrics import RegressionMetrics >>> t = np.array([1, 2, 3, 4, 5]) >>> p = np.array([1.1, 1.9, 3.1, 4.2, 4.8]) >>> metrics= RegressionMetrics(t, p) >>> error = metrics.log_cosh_error()
- log_nse(epsilon: float = 0.0, log_base: str = 'e') float[source]
Log-transformed Nash-Sutcliffe Efficiency. It is especially useful for capturing prediction performance for the lowest flows due to the logarithmic transform.
\[NSE_{\log} = 1 - \frac{\sum_{i=1}^{N}(\log(e_{i}) - \log(s_{i}))^2}{\sum_{i=1}^{N}(\log(e_{i}) - \overline{\log(e)})^2}\]where e denotes the true values, s the predicted values and \(\overline{\log(e)}\) the mean of the log-transformed true values.
Examples
>>> import numpy as np >>> from SeqMetrics import RegressionMetrics >>> t = np.random.random(10) >>> p = np.random.random(10) >>> metrics= RegressionMetrics(t, p) >>> metrics.log_nse()
- log_prob() float[source]
Logarithmic probability distribution
\[\text{log\_prob} = \frac{1}{N} \sum_{i=1}^{N} \left( -\frac{\left( \frac{\text{true}_i - \text{predicted}_i}{\text{scale}} \right)^2}{2} - \log(\sqrt{2\pi}) \right)\]Examples
>>> import numpy as np >>> from SeqMetrics import RegressionMetrics >>> t = np.random.random(10) >>> p = np.random.random(10) >>> metrics= RegressionMetrics(t, p) >>> metrics.log_prob()
- maape() float[source]
Mean Arctangent Absolute Percentage Error. Note: the result is NOT multiplied by 100.
\[MAAPE = \frac{1}{n} \sum_{i=1}^{n} \arctan \left( \frac{| \text{true}_i - \text{predicted}_i |}{| \text{true}_i | + \epsilon} \right)\]Examples
>>> import numpy as np >>> from SeqMetrics import RegressionMetrics >>> t = np.random.random(10) >>> p = np.random.random(10) >>> metrics= RegressionMetrics(t, p) >>> metrics.maape()
- mae() float[source]
Mean Absolute Error. It is less sensitive to outliers as compared to mse/rmse.
\[\text{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| \text{true}_i - \text{predicted}_i \right|\]Examples
>>> import numpy as np >>> from SeqMetrics import RegressionMetrics >>> t = np.random.random(10) >>> p = np.random.random(10) >>> metrics= RegressionMetrics(t, p) >>> metrics.mae()
- manhattan_distance() float[source]
Manhattan distance, also known as cityblock distance or taxicab norm.
\[D_{\text{manhattan}} = \sum_{i=1}^{n} \left| \text{true}_i - \text{predicted}_i \right|\]See Blanco-Mallo et al., 2023, Cha et al., 2007 and Alexei Botchkarev, 2019 on the use of distances in performance measures.
Examples
>>> import numpy as np >>> from SeqMetrics import RegressionMetrics >>> t = np.array([1, 2, 3, 4, 5]) >>> p = np.array([1.1, 1.9, 3.1, 4.2, 4.8]) >>> metrics= RegressionMetrics(t, p) >>> score = metrics.manhattan_distance()
- mapd() float[source]
Mean absolute percentage deviation
\[MAPD = \frac{\sum_{i=1}^{n} \left| predicted_i - true_i \right|}{\sum_{i=1}^{n} \left| true_i \right|}\]Examples
>>> import numpy as np >>> from SeqMetrics import RegressionMetrics >>> t = np.random.random(10) >>> p = np.random.random(10) >>> metrics= RegressionMetrics(t, p) >>> metrics.mapd()
- mape() float[source]
Mean Absolute Percentage Error. The MAPE is often used when the quantity to predict is known to remain way above zero. It is useful when the size or size of a prediction variable is significant in evaluating the accuracy of a prediction. It has advantages of scale-independency and interpretability. However, it has the significant disadvantage that it produces infinite or undefined values for zero or close-to-zero actual values.
\[MAPE = \frac{1}{n} \sum_{i=1}^{n} \left| \frac{true_i - predicted_i}{true_i} \right| \times 100\]
Examples
>>> import numpy as np >>> from SeqMetrics import RegressionMetrics >>> t = np.random.random(10) >>> p = np.random.random(10) >>> metrics= RegressionMetrics(t, p) >>> metrics.mape()
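A minimal NumPy sketch of the MAPE formula above (the helper `mape_sketch` is hypothetical, not the library's code) makes the near-zero problem concrete: the same absolute error of 0.1 yields a small MAPE when all true values are large, and a huge one when a single true value is close to zero.

```python
import numpy as np

def mape_sketch(true, predicted):
    """MAPE in percent: 100 * mean(|(true - predicted) / true|)."""
    true, predicted = np.asarray(true, float), np.asarray(predicted, float)
    return float(np.mean(np.abs((true - predicted) / true)) * 100)

offset = 0.1                                  # identical absolute error for every sample
far_from_zero = np.array([10.0, 20.0, 30.0])
near_zero = np.array([0.001, 20.0, 30.0])     # one true value close to zero

print(mape_sketch(far_from_zero, far_from_zero + offset))  # well below 1 %
print(mape_sketch(near_zero, near_zero + offset))          # thousands of %, dominated by 0.001
```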
- mape_for_peaks() float[source]
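Mean Absolute Percentage Error for peaks, which are found using scipy.signal.find_peaks.
\[\text{MAPE}_\text{peak} = \frac{1}{P}\sum_{p=1}^{P} \left| \frac{Q_{s,p} - Q_{o,p}}{Q_{o,p}} \right| \times 100\]where P is the number of peaks, and Q_{o,p} and Q_{s,p} are the observed (true) and simulated (predicted) values at peak p.
Examples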
>>> import numpy as np >>> from SeqMetrics import RegressionMetrics >>> t = np.random.random(10) >>> p = np.random.random(10) >>> metrics= RegressionMetrics(t, p) >>> metrics.mape_for_peaks()
- mare() float[source]
Mean Absolute Relative Error. When expressed as a percentage, it is also known as MAPE.
\[\text{MARE} = \frac{1}{n} \sum_{i=1}^{n} \left| \frac{\text{true}_i - \text{predicted}_i}{\text{true}_i} \right|\]Examples
>>> import numpy as np >>> from SeqMetrics import RegressionMetrics >>> t = np.random.random(10) >>> p = np.random.random(10) >>> metrics= RegressionMetrics(t, p) >>> metrics.mare()
- mase(seasonality: int = 1)[source]
Mean Absolute Scaled Error following Hyndman et al., 2006. The baseline (benchmark) is computed with naive forecasting, i.e. the true values shifted by seasonality (modified after this). MASE is the ratio of the MAE of the model to the MAE of the naive forecast.
\[\text{MASE} = \frac{\frac{1}{n} \sum_{i=1}^{n} \left| \text{true}_i - \text{predicted}_i \right|}{\frac{1}{n-s} \sum_{i=s+1}^{n} \left| \text{true}_i - \text{true}_{i-s} \right|}\]where s is the seasonality.
Examples
>>> import numpy as np >>> from SeqMetrics import RegressionMetrics >>> t = np.random.random(10) >>> p = np.random.random(10) >>> metrics= RegressionMetrics(t, p) >>> metrics.mase()
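A minimal NumPy sketch of the MASE formula, assuming the naive benchmark is simply the true series lagged by `seasonality` as described above. `mase_sketch` is a hypothetical helper; SeqMetrics' own implementation may handle edge cases differently (e.g. a constant series, where the naive MAE is zero).

```python
import numpy as np

def mase_sketch(true, predicted, seasonality=1):
    """MAE of the model divided by MAE of the naive forecast true[i - seasonality]."""
    true, predicted = np.asarray(true, float), np.asarray(predicted, float)
    model_mae = np.mean(np.abs(true - predicted))
    naive_mae = np.mean(np.abs(true[seasonality:] - true[:-seasonality]))
    return float(model_mae / naive_mae)

print(mase_sketch([1, 2, 3, 4, 5], [1.5, 2.5, 3.5, 4.5, 5.5]))             # 0.5
print(mase_sketch([1, 3, 2, 4, 3, 5], [2, 4, 3, 5, 4, 6], seasonality=2))  # 1.0
```

A value below 1 means the model beats the naive forecast on average; a value above 1 means it does worse.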
- max_error() float[source]
Maximum absolute error. In scikit-learn, the equation uses the absolute value, but the metric's name (max_error) omits “absolute”.
\[\text{Max Error} = \max_{1 \leq i \leq n} \left| \text{true}_i - \text{predicted}_i \right|\]
Examples
>>> import numpy as np >>> from SeqMetrics import RegressionMetrics >>> t = np.random.random(10) >>> p = np.random.random(10) >>> metrics= RegressionMetrics(t, p) >>> metrics.max_error()
- mb_r() float[source]
Mielke-Berry R value. Berry and Mielke, 1988.
\[R = 1 - \frac{n^2 \cdot \frac{1}{n} \sum_{i=1}^{n} \left| \text{predicted}_i - \text{true}_i \right|}{\sum_{i=1}^{n} \sum_{j=1}^{n} \left| \text{predicted}_j - \text{true}_i \right|}\]Examples
>>> import numpy as np >>> from SeqMetrics import RegressionMetrics >>> t = np.random.random(10) >>> p = np.random.random(10) >>> metrics= RegressionMetrics(t, p) >>> metrics.mb_r()
- mbrae(benchmark: ndarray | None = None) float[source]
Mean Bounded Relative Absolute Error
\[MBRAE = \frac{1}{n} \sum_{i=1}^{n} \frac{| \text{true}_i - \text{predicted}_i |}{| \text{true}_i - \text{predicted}_i | + | \text{true}_i - \text{benchmark}_i |}\]
Examples
>>> import numpy as np >>> from SeqMetrics import RegressionMetrics >>> t = np.random.random(10) >>> p = np.random.random(10) >>> metrics= RegressionMetrics(t, p) >>> metrics.mbrae()
- mda() float[source]
Mean Directional Accuracy modified after
\[\text{MDA} = \frac{1}{n-1} \sum_{i=1}^{n-1} \mathbf{1}\left[ \operatorname{sign}( \text{true}_{i+1} - \text{true}_i) = \operatorname{sign}( \text{predicted}_{i+1} - \text{predicted}_i) \right]\]where \(\mathbf{1}[\cdot]\) is 1 if the condition holds and 0 otherwise.
Examples
>>> import numpy as np >>> from SeqMetrics import RegressionMetrics >>> t = np.random.random(10) >>> p = np.random.random(10) >>> metrics= RegressionMetrics(t, p) >>> metrics.mda()
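The formula above can be sketched in NumPy as follows. `mda_sketch` is a hypothetical helper, not SeqMetrics' code: it compares the signs of consecutive differences and averages the matches over the n-1 steps.

```python
import numpy as np

def mda_sketch(true, predicted):
    """Fraction of steps where true and predicted move in the same direction."""
    true, predicted = np.asarray(true, float), np.asarray(predicted, float)
    same_direction = np.sign(np.diff(true)) == np.sign(np.diff(predicted))
    return float(np.mean(same_direction))

true = np.array([1.0, 2.0, 1.5, 3.0, 2.0])       # up, down, up, down
predicted = np.array([1.2, 2.5, 2.6, 2.8, 1.0])  # up, up,   up, down
print(mda_sketch(true, predicted))               # 3 of 4 steps agree -> 0.75
```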
- mdape() float[source]
Median Absolute Percentage Error. The value is multiplied by 100.
\[\text{MdAPE} = 100 \times \text{Median} \left( \left\{ \frac{|\text{true}_i - \text{predicted}_i|}{|\text{true}_i|} \right\}_{i=1}^n \right)\]Examples
>>> import numpy as np >>> from SeqMetrics import RegressionMetrics >>> t = np.random.random(10) >>> p = np.random.random(10) >>> metrics= RegressionMetrics(t, p) >>> metrics.mdape()
- mde() float[source]
Median Error.
\[MDE = \text{median}(\text{predicted}_i - \text{true}_i)\]
Examples
>>> import numpy as np >>> from SeqMetrics import RegressionMetrics >>> t = np.random.random(10) >>> p = np.random.random(10) >>> metrics= RegressionMetrics(t, p) >>> metrics.mde()
- mdrae(benchmark: ndarray | None = None) float[source]
Median Relative Absolute Error.
\[MdRAE = \text{median} \left( \left| \frac{true_i - predicted_i}{true_i - benchmark_i} \right| \right)\]Examples
>>> import numpy as np >>> from SeqMetrics import RegressionMetrics >>> t = np.random.random(10) >>> p = np.random.random(10) >>> metrics= RegressionMetrics(t, p) >>> metrics.mdrae()
- me()[source]
Mean error or bias
\[ME = \frac{1}{n} \sum_{i=1}^{n} (\text{true}_i - \text{predicted}_i)\]Examples
>>> import numpy as np >>> from SeqMetrics import RegressionMetrics >>> t = np.random.random(10) >>> p = np.random.random(10) >>> metrics= RegressionMetrics(t, p) >>> metrics.me()
- mean_bias_error() float[source]
Mean Bias Error. It represents the overall bias or systematic error, i.e. the average over- or underestimation [1][2]. It expresses the tendency of the model to underestimate or overestimate the target variable, and values closest to zero are desirable. With the formula below (true minus predicted), a positive value indicates underestimation and a negative value indicates overestimation. The drawback of this measure is that it does not reflect the actual performance when the model overestimates and underestimates at the same time, since overestimated and underestimated values cancel each other.
\[\text{MBE} = \frac{1}{N} \sum_{i=1}^{N} (true_i - predicted_i)\]
Examples
>>> import numpy as np >>> from SeqMetrics import RegressionMetrics >>> t = np.random.random(10) >>> p = np.random.random(10) >>> metrics= RegressionMetrics(t, p) >>> metrics.mean_bias_error()
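Because the sign of the bias depends on the order of subtraction, here is a minimal sketch that follows the formula as written (true minus predicted). `mbe_sketch` is a hypothetical helper, not the library's implementation. The last call shows the drawback mentioned above: over- and underestimation cancel out.

```python
import numpy as np

def mbe_sketch(true, predicted):
    """Mean of (true - predicted)."""
    true, predicted = np.asarray(true, float), np.asarray(predicted, float)
    return float(np.mean(true - predicted))

true = np.array([2.0, 4.0, 6.0])
print(mbe_sketch(true, true - 1.0))                   # underestimation -> 1.0
print(mbe_sketch(true, true + 1.0))                   # overestimation  -> -1.0
print(mbe_sketch(true, np.array([1.0, 5.0, 6.0])))    # errors cancel   -> 0.0
```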
- mean_gamma_deviance(weights=None) float[source]
Mean Gamma Deviance, optionally weighted.
\[\text{Mean Gamma Deviance (Weighted)} = \frac{1}{\sum_{i=1}^{n} w_i} \sum_{i=1}^{n} w_i \cdot 2 \left( \ln \left( \frac{\text{predicted}_i}{\text{true}_i} \right) + \frac{\text{true}_i}{\text{predicted}_i} - 1 \right)\]
Examples
>>> import numpy as np >>> from SeqMetrics import RegressionMetrics >>> t = np.random.random(10) >>> p = np.random.random(10) >>> metrics= RegressionMetrics(t, p) >>> metrics.mean_gamma_deviance()
- mean_poisson_deviance(weights=None) float[source]
Mean Poisson Deviance.
\[\text{MPD} = \frac{1}{n} \sum_{i=1}^{n} 2 \left( \text{true}_i \log \left( \frac{\text{true}_i}{\text{predicted}_i} \right) - (\text{true}_i - \text{predicted}_i) \right)\]
Examples
>>> import numpy as np >>> from SeqMetrics import RegressionMetrics >>> t = np.random.random(10) >>> p = np.random.random(10) >>> metrics= RegressionMetrics(t, p) >>> metrics.mean_poisson_deviance()
- mean_var() float[source]
Mean variance, adopted from HydroErr
\[\text{mean\_var} = \text{Var} \left( \log(1 + \text{true}) - \log(1 + \text{predicted}) \right)\]
Examples
>>> import numpy as np >>> from SeqMetrics import RegressionMetrics >>> t = np.random.random(10) >>> p = np.random.random(10) >>> metrics= RegressionMetrics(t, p) >>> metrics.mean_var()
- med_seq_error() float[source]
Median Squared Error. Same as MSE, but it takes the median, which reduces the impact of outliers.
\[\text{MedSE} = \text{median} \left( (\text{predicted}_i - \text{true}_i)^2 \right)\]Examples
>>> import numpy as np >>> from SeqMetrics import RegressionMetrics >>> t = np.random.random(10) >>> p = np.random.random(10) >>> metrics = RegressionMetrics(t, p) >>> metrics.med_seq_error()
- median_abs_error() float[source]
Median absolute error.
\[\text{MedAE} = \text{median} \left( \left| \text{true}_i - \text{predicted}_i \right| \right)\]
References
https://scikit-learn.org/stable/modules/generated/sklearn.metrics.median_absolute_error.html
Examples
>>> import numpy as np >>> from SeqMetrics import RegressionMetrics >>> t = np.random.random(10) >>> p = np.random.random(10) >>> metrics= RegressionMetrics(t, p) >>> metrics.median_abs_error()
- minkowski_distance(order=1) float[source]
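Minkowski distance of order p, where p is given by the order parameter (order=1 gives the Manhattan distance and order=2 the Euclidean distance).
\[D_{Minkowski} = \left( \sum_{i=1}^{n} \left| \text{true}_i - \text{predicted}_i \right|^p \right)^{\frac{1}{p}}\]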
Examples
>>> import numpy as np >>> from SeqMetrics import RegressionMetrics >>> t = np.array([1, 2, 3, 4, 5]) >>> p = np.array([1.1, 1.9, 3.1, 4.2, 4.8]) >>> metrics= RegressionMetrics(t, p) >>> distance = metrics.minkowski_distance()
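A minimal NumPy sketch of the Minkowski distance, using the same arrays as the example above. `minkowski_sketch` is a hypothetical helper, not SeqMetrics' implementation. With `order=1` it reduces to the Manhattan distance and with `order=2` to the Euclidean distance.

```python
import numpy as np

def minkowski_sketch(true, predicted, order=1):
    """(sum |true - predicted|**order) ** (1 / order)."""
    diff = np.abs(np.asarray(true, float) - np.asarray(predicted, float))
    return float(np.sum(diff ** order) ** (1.0 / order))

t = np.array([1, 2, 3, 4, 5])
p = np.array([1.1, 1.9, 3.1, 4.2, 4.8])
manhattan = minkowski_sketch(t, p, order=1)  # sum of absolute differences, about 0.7
euclidean = minkowski_sketch(t, p, order=2)  # sqrt(0.11), about 0.332
print(manhattan, euclidean)
```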
- mle() float[source]
Mean Log Error.
\[\text{MLE} = \frac{1}{n} \sum_{i=1}^{n} \left( \log(1 + \text{predicted}_i) - \log(1 + \text{true}_i) \right)\]
Examples
>>> import numpy as np >>> from SeqMetrics import RegressionMetrics >>> t = np.random.random(10) >>> p = np.random.random(10) >>> metrics = RegressionMetrics(t, p) >>> metrics.mle()
- mod_agreement_index(j: int = 1) float[source]
Modified index of agreement. It varies between 0 and 1, where 1 indicates a perfect match between the observed and predicted values.
\[MAI = 1 - \frac{\sum_{i=1}^{n} \left| \text{predicted}_i - \text{true}_i \right|^j}{\sum_{i=1}^{n} \left( \left| \text{predicted}_i - \overline{\text{true}} \right| + \left| \text{true}_i - \overline{\text{true}} \right| \right)^j}\]- Parameters:
j (int, default 1) – when j is 2, this is same as agreement_index. Higher j means more impact of outliers.
Examples
>>> import numpy as np >>> from SeqMetrics import RegressionMetrics >>> t = np.random.random(10) >>> p = np.random.random(10) >>> metrics = RegressionMetrics(t, p) >>> metrics.mod_agreement_index()
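A minimal NumPy sketch of the formula above. `mod_agreement_index_sketch` is a hypothetical helper, not SeqMetrics' code. It shows the two reference points of the index: a perfect prediction scores 1, and a constant prediction equal to the mean of the true values scores 0 for any j, because the numerator and denominator then coincide.

```python
import numpy as np

def mod_agreement_index_sketch(true, predicted, j=1):
    """1 - sum|p - t|**j / sum(|p - mean(t)| + |t - mean(t)|)**j."""
    true, predicted = np.asarray(true, float), np.asarray(predicted, float)
    mean_true = true.mean()
    numerator = np.sum(np.abs(predicted - true) ** j)
    denominator = np.sum((np.abs(predicted - mean_true) + np.abs(true - mean_true)) ** j)
    return float(1 - numerator / denominator)

true = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
print(mod_agreement_index_sketch(true, true))                          # perfect match -> 1.0
print(mod_agreement_index_sketch(true, np.full(5, true.mean())))       # mean predictor -> 0.0
print(mod_agreement_index_sketch(true, np.full(5, true.mean()), j=2))  # also 0.0
```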
- mpe() float[source]
Mean Percentage Error. The value is multiplied by 100 to express it as a percentage.
\[MPE = \frac{1}{n} \sum_{i=1}^{n} \left( \frac{true_i - predicted_i}{true_i} \right) \times 100\]Examples
>>> import numpy as np >>> from SeqMetrics import RegressionMetrics >>> t = np.random.random(10) >>> p = np.random.random(10) >>> metrics= RegressionMetrics(t, p) >>> metrics.mpe()
- mrae(benchmark: ndarray | None = None)[source]
Mean Relative Absolute Error.
\[MRAE = \frac{1}{n} \sum_{i=1}^{n} \left| \frac{\text{true}_i - \text{predicted}_i}{\text{true}_i - \text{benchmark}_i} \right|\]
Examples
>>> import numpy as np >>> from SeqMetrics import RegressionMetrics >>> t = np.random.random(10) >>> p = np.random.random(10) >>> metrics= RegressionMetrics(t, p) >>> metrics.mrae()
- mre(benchmark: ndarray | None = None)[source]
Mean Relative Error.
\[\text{MRE} = \frac{1}{n} \sum_{i=1}^{n} \left| \frac{\text{true}_i - \text{predicted}_i}{\text{true}_i} \right|\]
Examples
>>> import numpy as np >>> from SeqMetrics import RegressionMetrics >>> t = np.random.random(10) >>> p = np.random.random(10) >>> metrics= RegressionMetrics(t, p) >>> metrics.mre()
- mse() float[source]
Mean Squared Error.
\[MSE = \frac{\sum_{i=1}^{N} w_i (true_i - predicted_i)^2}{\sum_{i=1}^{N} w_i}\]
Examples
>>> import numpy as np >>> from SeqMetrics import RegressionMetrics >>> t = np.random.random(10) >>> p = np.random.random(10) >>> metrics= RegressionMetrics(t, p) >>> metrics.mse()
- msle(weights=None) float[source]
Mean Squared Logarithmic Error, optionally weighted.
\[\text{MSLE} = \frac{\sum_{i=1}^{n} w_i \left( \log(1 + \text{predicted}_i) - \log(1 + \text{true}_i) \right)^2}{\sum_{i=1}^{n} w_i}\]
Examples
>>> import numpy as np >>> from SeqMetrics import RegressionMetrics >>> t = np.random.random(10) >>> p = np.random.random(10) >>> metrics= RegressionMetrics(t, p) >>> metrics.msle()
- norm_ae() float[source]
Normalized Absolute Error.
\[norm\_ae = \sqrt{\frac{\sum_{i=1}^{n} (error_i - MAE)^2}{n - 1}}\]where error_i = true_i - predicted_i and MAE is the mean absolute error.
Examples
>>> import numpy as np >>> from SeqMetrics import RegressionMetrics >>> t = np.random.random(10) >>> p = np.random.random(10) >>> metrics= RegressionMetrics(t, p) >>> metrics.norm_ae()
- norm_ape() float[source]
Normalized Absolute Percentage Error
\[\text{norm\_APE} = \sqrt{ \frac{1}{n-1} \sum_{i=1}^{n} \left( \left| \frac{\text{true}_i - \text{predicted}_i}{\text{true}_i} \right| - \frac{1}{n} \sum_{j=1}^{n} \left| \frac{\text{true}_j - \text{predicted}_j}{\text{true}_j} \right| \right)^2 }\]
Examples
>>> import numpy as np >>> from SeqMetrics import RegressionMetrics >>> t = np.random.random(10) >>> p = np.random.random(10) >>> metrics= RegressionMetrics(t, p) >>> metrics.norm_ape()
- norm_euclid_distance() float[source]
Normalized Euclidean Distance.
\[D_{norm} = \sqrt{\sum_{i=1}^{n} \left( \frac{\text{true}_i}{\bar{\text{true}}} - \frac{\text{predicted}_i}{\bar{\text{predicted}}} \right)^2}\]
Examples
>>> import numpy as np >>> from SeqMetrics import RegressionMetrics >>> t = np.random.random(10) >>> p = np.random.random(10) >>> metrics= RegressionMetrics(t, p) >>> metrics.norm_euclid_distance()
- nrmse() float[source]
Normalized Root Mean Squared Error
\[NRMSE = \frac{\sqrt{\frac{1}{N} \sum_{i=1}^{N} (\text{true}_i - \text{predicted}_i)^2}}{\max(\text{true}) - \min(\text{true})}\]
Examples
>>> import numpy as np >>> from SeqMetrics import RegressionMetrics >>> t = np.random.random(10) >>> p = np.random.random(10) >>> metrics= RegressionMetrics(t, p) >>> metrics.nrmse()
- nrmse_ipercentile(q1=25, q2=75) float[source]
RMSE normalized by the inter-percentile range of the true values. Among the NRMSE variants, this is the least sensitive to outliers. q1 can be any integer between 1 and 99 and q2 any integer between 2 and 100; q2 must be greater than q1. Reference: Pontius et al., 2008.
\[\text{NRMSE}_{\text{IP}} = \frac{\sqrt{\frac{1}{n} \sum_{i=1}^{n} (\text{true}_i - \text{predicted}_i)^2}}{Q_{q2} - Q_{q1}}\]where Q_q is the q-th percentile of the true values.
Examples
>>> import numpy as np >>> from SeqMetrics import RegressionMetrics >>> t = np.random.random(10) >>> p = np.random.random(10) >>> metrics= RegressionMetrics(t, p) >>> metrics.nrmse_ipercentile()
- nrmse_mean() float[source]
Mean Normalized RMSE: RMSE normalized by the mean of the true values. This allows comparison between datasets with different scales.
Reference: Pontius et al., 2008
\[NRMSE_{mean} = \frac{\sqrt{\frac{1}{n} \sum_{i=1}^{n} (\text{true}_i - \text{predicted}_i)^2}}{\bar{\text{true}}}\]Examples
>>> import numpy as np >>> from SeqMetrics import RegressionMetrics >>> t = np.random.random(10) >>> p = np.random.random(10) >>> metrics= RegressionMetrics(t, p) >>> metrics.nrmse_mean()
- nrmse_range() float[source]
Range Normalized Root Mean Squared Error after Pontius et al., 2008.
RMSE normalized by the range of the true values. This allows comparison between datasets with different scales. It is more sensitive to outliers.
\[\text{NRMSE} = \frac{\sqrt{\frac{1}{n} \sum_{i=1}^{n} (\text{predicted}_i - \text{true}_i)^2}}{\max(\text{true}) - \min(\text{true})}\]
Examples
>>> import numpy as np >>> from SeqMetrics import RegressionMetrics >>> t = np.random.random(10) >>> p = np.random.random(10) >>> metrics= RegressionMetrics(t, p) >>> metrics.nrmse_range()
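To compare the two normalizations, here is a minimal NumPy sketch of range-normalized and inter-percentile-normalized RMSE. The helpers are hypothetical, and SeqMetrics may compute percentiles with a different interpolation rule than `np.percentile`'s default. A single outlier in the true values inflates the range but, in this example, leaves the 25th to 75th percentile range unchanged.

```python
import numpy as np

def _rmse(true, predicted):
    """Root mean squared error of two equal-length arrays."""
    diff = np.asarray(true, float) - np.asarray(predicted, float)
    return float(np.sqrt(np.mean(diff ** 2)))

def nrmse_range_sketch(true, predicted):
    """RMSE divided by max(true) - min(true)."""
    true = np.asarray(true, float)
    return _rmse(true, predicted) / float(true.max() - true.min())

def nrmse_ipercentile_sketch(true, predicted, q1=25, q2=75):
    """RMSE divided by the q2-th minus the q1-th percentile of true."""
    true = np.asarray(true, float)
    low, high = np.percentile(true, [q1, q2])
    return _rmse(true, predicted) / float(high - low)

true = np.arange(1.0, 11.0)          # 1, 2, ..., 10
with_outlier = true.copy()
with_outlier[-1] = 100.0             # one extreme true value

for series in (true, with_outlier):
    predicted = series + 1.0         # every prediction off by 1, so RMSE = 1
    print(nrmse_range_sketch(series, predicted), nrmse_ipercentile_sketch(series, predicted))
```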
- nse() float[source]
Nash-Sutcliffe Efficiency.
The Nash-Sutcliffe efficiency (NSE) is a normalized statistic that determines the relative magnitude of the residual variance compared to the measured data variance. It indicates how well the model simulates trends for the output response of concern. However, it cannot help identify model bias, and it cannot be used to identify differences in timing and magnitude of peak flows or the shape of recession curves; in other words, it cannot be used for single-event simulations. It is sensitive to extreme values due to the squared differences (Moriasi et al., 2015). To make it less sensitive to outliers, Krause et al. (2005) proposed log and relative NSE.
\[\text{NSE} = 1 - \frac{\sum_{i=1}^{N} (predicted_i - true_i)^2}{\sum_{i=1}^{N} (true_i - \bar{true})^2}\]where the bar above true indicates the mean of the true array.
References
- Moriasi, D. N., Gitau, M. W., Pai, N., & Daggupati, P. (2015). Hydrologic and water quality models: Performance measures and evaluation criteria. Transactions of the ASABE, 58(6), 1763-1785.
- Krause, P., Boyle, D., & Bäse, F. (2005). Comparison of different efficiency criteria for hydrological model assessment. Adv. Geosci., 5, 89-97. https://dx.doi.org/10.5194/adgeo-5-89-2005.
Examples
>>> import numpy as np >>> from SeqMetrics import RegressionMetrics >>> t = np.random.random(10) >>> p = np.random.random(10) >>> metrics= RegressionMetrics(t, p) >>> metrics.nse()
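A minimal NumPy sketch of the NSE formula (the helper `nse_sketch` is hypothetical, not SeqMetrics' implementation). It illustrates the usual interpretation: 1 for a perfect model, 0 for a model no better than the mean of the observations, and negative values for a model worse than that mean.

```python
import numpy as np

def nse_sketch(true, predicted):
    """1 - sum((p - t)**2) / sum((t - mean(t))**2)."""
    true, predicted = np.asarray(true, float), np.asarray(predicted, float)
    sse = np.sum((predicted - true) ** 2)
    sst = np.sum((true - true.mean()) ** 2)
    return float(1 - sse / sst)

true = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
print(nse_sketch(true, true))                              # perfect model -> 1.0
print(nse_sketch(true, np.full_like(true, true.mean())))   # mean as predictor -> 0.0
print(nse_sketch(true, true[::-1]))                        # worse than the mean -> -3.0
```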
- nse_alpha() float[source]
Alpha decomposition of the NSE; see Gupta et al., 2009, as used in Kratzert et al., 2019.
\[\text{NSE}_{\text{alpha}} = \frac{\sigma_{\text{predicted}}}{\sigma_{\text{true}}}\]Examples
>>> import numpy as np >>> from SeqMetrics import RegressionMetrics >>> t = np.random.random(10) >>> p = np.random.random(10) >>> metrics= RegressionMetrics(t, p) >>> metrics.nse_alpha()
- nse_beta() float[source]
Beta decomposition of the NSE; see Gupta et al., 2009, as used in Kratzert et al., 2019.
\[\text{NSE}_{\text{beta}} = \frac{\mu_{\text{predicted}} - \mu_{\text{true}}}{\sigma_{\text{true}}}\]Examples
>>> import numpy as np >>> from SeqMetrics import RegressionMetrics >>> t = np.random.random(10) >>> p = np.random.random(10) >>> metrics= RegressionMetrics(t, p) >>> metrics.nse_beta()
- nse_bound() float[source]
Bounded Version of the Nash-Sutcliffe Efficiency (nse)
\[\text{NSE}_{\text{bound}} = \frac{\text{NSE}}{2 - \text{NSE}}\]Examples
>>> import numpy as np >>> from SeqMetrics import RegressionMetrics >>> t = np.random.random(10) >>> p = np.random.random(10) >>> metrics= RegressionMetrics(t, p) >>> metrics.nse_bound()
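The bounded form rescales NSE from (-inf, 1] to (-1, 1]. A short sketch (the helper `nse_bound_sketch` is hypothetical) shows the mapping for a few NSE values:

```python
def nse_bound_sketch(nse):
    """Map NSE from (-inf, 1] to (-1, 1] via NSE / (2 - NSE)."""
    return nse / (2.0 - nse)

for value in (1.0, 0.0, -3.0, -1000.0):
    print(value, nse_bound_sketch(value))   # 1.0, 0.0, -0.6, about -0.998
```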
- nse_mod(j=1) float[source]
Modified Nash-Sutcliffe Efficiency. It gives less weight to outliers when j=1 and more weight to outliers when j>1. Reference: Krause et al., 2005.
\[\text{NSE}_{\text{mod}} = 1 - \frac{\sum_{i=1}^{N} \left| \text{predicted}_i - \text{true}_i \right|^j}{\sum_{i=1}^{N} \left| \text{true}_i - \bar{\text{true}} \right|^j}\]
Examples
>>> import numpy as np >>> from SeqMetrics import RegressionMetrics >>> t = np.random.random(10) >>> p = np.random.random(10) >>> metrics= RegressionMetrics(t, p) >>> metrics.nse_mod()
- nse_rel() float[source]
Relative Nash-Sutcliffe Efficiency.
\[\text{NSE}_{\text{rel}} = 1 - \frac{\sum_{i=1}^{N} \left( \frac{|\text{predicted}_i - \text{true}_i|}{\text{true}_i} \right)^2}{\sum_{i=1}^{N} \left( \frac{|\text{true}_i - \overline{\text{true}}|}{\overline{\text{true}}} \right)^2}\]Examples
>>> import numpy as np >>> from SeqMetrics import RegressionMetrics >>> t = np.random.random(10) >>> p = np.random.random(10) >>> metrics= RegressionMetrics(t, p) >>> metrics.nse_rel()
- pbias() float[source]
Percent bias determines how well the model simulates the average magnitudes for the output response of interest. It can also determine over and under-prediction. It cannot be used (1) for single-event simulations to identify differences in timing and magnitude of peak flows and the shape of recession curves nor (2) to determine how well the model simulates residual variations and/or trends for the output response of interest. It can give a deceiving rating of model performance if the model overpredicts as much as it underpredicts, in which case percent bias will be close to zero even though the model simulation is poor.
\[PBIAS = 100 \times \frac{\sum_{i=1}^{N} (\text{true}_i - \text{predicted}_i)}{\sum_{i=1}^{N} \text{true}_i}\]Examples
>>> import numpy as np >>> from SeqMetrics import RegressionMetrics >>> t = np.random.random(10) >>> p = np.random.random(10) >>> metrics= RegressionMetrics(t, p) >>> metrics.pbias()
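A minimal NumPy sketch of the PBIAS formula above (the helper `pbias_sketch` is hypothetical, not the library's code). With this sign convention a positive value indicates underestimation; the second call illustrates the deceiving case described above, where over- and underprediction cancel and PBIAS is zero.

```python
import numpy as np

def pbias_sketch(true, predicted):
    """100 * sum(true - predicted) / sum(true)."""
    true, predicted = np.asarray(true, float), np.asarray(predicted, float)
    return float(100 * np.sum(true - predicted) / np.sum(true))

true = np.array([10.0, 20.0, 30.0, 40.0])
print(pbias_sketch(true, 0.9 * true))                          # 10 % underestimation -> about 10
print(pbias_sketch(true, np.array([20.0, 10.0, 40.0, 30.0])))  # errors cancel -> 0.0
```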
- r2() float[source]
R2 is a statistical measure of how well the regression line approximates the actual data. It quantifies the percentage of variation in the response that the ‘model’ explains. The ‘model’ here is anything from which the predicted array was obtained. It is also called the coefficient of determination or the square of the Pearson correlation coefficient. It is more heavily affected by outliers than the Pearson correlation r.
\[R^2 = \left( \frac{\sum_{i=1}^{N} \left( \frac{true_i - \bar{true}}{\sigma_{true}} \cdot \frac{predicted_i - \bar{predicted}}{\sigma_{predicted}} \right)}{N - 1} \right)^2\]where the bar above predicted and true indicates the mean of the array.
Examples
>>> import numpy as np >>> from SeqMetrics import RegressionMetrics >>> t = np.random.random(10) >>> p = np.random.random(10) >>> metrics= RegressionMetrics(t, p) >>> r_square= metrics.r2() >>> r_square
- r2_score(weights=None)[source]
This is not a symmetric function. Unlike most other scores, R^2 score may be negative (it need not actually be the square of a quantity R). This metric is not well-defined for single samples and will return a NaN value if n_samples is less than two.
\[\text{R2}_{\text{score}} = 1 - \frac{\sum_{i=1}^{n} w_i (\text{true}_i - \text{predicted}_i)^2}{\sum_{i=1}^{n} w_i (\text{true}_i - \bar{\text{true}})^2}\]Examples
>>> import numpy as np >>> from SeqMetrics import RegressionMetrics >>> t = np.random.random(10) >>> p = np.random.random(10) >>> metrics= RegressionMetrics(t, p) >>> metrics.r2_score()
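The difference between r2 (squared Pearson correlation) and r2_score (1 - SSE/SST) is easiest to see with a biased but perfectly correlated prediction. The helpers below are hypothetical NumPy sketches, not SeqMetrics' implementations, and r2_score is shown unweighted.

```python
import numpy as np

def r2_pearson_sketch(true, predicted):
    """Square of the Pearson correlation coefficient."""
    return float(np.corrcoef(true, predicted)[0, 1] ** 2)

def r2_score_sketch(true, predicted):
    """Unweighted coefficient of determination: 1 - SSE / SST."""
    true, predicted = np.asarray(true, float), np.asarray(predicted, float)
    sse = np.sum((true - predicted) ** 2)
    sst = np.sum((true - true.mean()) ** 2)
    return float(1 - sse / sst)

true = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
biased = 2 * true + 10                   # perfectly correlated, but wrong scale and offset
print(r2_pearson_sketch(true, biased))   # about 1.0
print(r2_score_sketch(true, biased))     # -84.5
```

r2 only rewards linear association, while r2_score also penalizes the offset and scale error, which is why it can become negative.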
- rae() float[source]
Relative Absolute Error (aka Approximation Error)
\[\text{RAE} = \frac{\sum_{i=1}^{n} \left| \text{true}_i - \text{predicted}_i \right|}{\sum_{i=1}^{n} \left| \text{true}_i - \overline{\text{true}} \right|}\]Examples
>>> import numpy as np >>> from SeqMetrics import RegressionMetrics >>> t = np.random.random(10) >>> p = np.random.random(10) >>> metrics= RegressionMetrics(t, p) >>> metrics.rae()
- ref_agreement_index() float[source]
Refined Index of Agreement after Willmott et al., 2012. It varies from -1 to 1; larger is better.
\[a = \sum_{i=1}^{n} \left| \text{predicted}_i - \text{true}_i \right|\]\[b = 2 \sum_{i=1}^{n} \left| \text{true}_i - \overline{\text{true}} \right|\]\[d_{\text{ref}} = \begin{cases} 1 - \frac{a}{b} & \text{if } a \leq b \\ \frac{b}{a} - 1 & \text{if } a > b \end{cases}\]
Examples
>>> import numpy as np >>> from SeqMetrics import RegressionMetrics >>> t = np.random.random(10) >>> p = np.random.random(10) >>> metrics= RegressionMetrics(t, p) >>> metrics.ref_agreement_index()
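A minimal NumPy sketch of the piecewise definition above (the helper `ref_agreement_index_sketch` is hypothetical, not the library's implementation), with a few reference values.

```python
import numpy as np

def ref_agreement_index_sketch(true, predicted):
    """Refined index of agreement: 1 - a/b if a <= b, else b/a - 1."""
    true, predicted = np.asarray(true, float), np.asarray(predicted, float)
    a = np.sum(np.abs(predicted - true))
    b = 2 * np.sum(np.abs(true - true.mean()))
    return float(1 - a / b) if a <= b else float(b / a - 1)

true = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
print(ref_agreement_index_sketch(true, true))             # perfect -> 1.0
print(ref_agreement_index_sketch(true, np.full(5, 3.0)))  # mean of true -> 0.5
print(ref_agreement_index_sketch(true, true + 10))        # large errors -> about -0.76
```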
- rel_agreement_index() float[source]
Relative index of agreement. It varies from 0 to 1; larger is better.
\[\text{rel\_agreement\_index} = 1 - \frac{\sum_{i=1}^{n} \left( \frac{\text{predicted}_i - \text{true}_i}{\text{true}_i} \right)^2}{\sum_{i=1}^{n} \left( \frac{|\text{predicted}_i - \bar{\text{true}}| + |\text{true}_i - \bar{\text{true}}|}{\bar{\text{true}}} \right)^2}\]
Examples
>>> import numpy as np >>> from SeqMetrics import RegressionMetrics >>> t = np.random.random(10) >>> p = np.random.random(10) >>> metrics= RegressionMetrics(t, p) >>> metrics.rel_agreement_index()
- relative_rmse() float[source]
Relative Root Mean Squared Error
\[RRMSE = \frac{\sqrt{\frac{1}{N}\sum_{i=1}^{N}(\text{true}_i - \text{predicted}_i)^2}}{\overline{\text{true}}}\]
Examples
>>> import numpy as np >>> from SeqMetrics import RegressionMetrics >>> t = np.random.random(10) >>> p = np.random.random(10) >>> metrics= RegressionMetrics(t, p) >>> metrics.relative_rmse()
- rmdspe() float[source]
Root Median Squared Percentage Error. The value is multiplied by 100 to reflect percentage.
\[\text{RMDSPE} = \sqrt{\text{median}\left(\left(\frac{\text{true}_i - \text{predicted}_i}{\text{true}_i} \times 100\right)^2\right)}\]Examples
>>> import numpy as np >>> from SeqMetrics import RegressionMetrics >>> t = np.random.random(10) >>> p = np.random.random(10) >>> metrics= RegressionMetrics(t, p) >>> metrics.rmdspe()
- rmse(weights=None) float[source]
Root Mean Squared Error, optionally weighted.
\[\text{RMSE} = \sqrt{\frac{\sum_{i=1}^{n} w_i (\text{true}_i - \text{predicted}_i)^2}{\sum_{i=1}^{n} w_i}}\]
Examples
>>> import numpy as np >>> from SeqMetrics import RegressionMetrics >>> t = np.random.random(10) >>> p = np.random.random(10) >>> metrics= RegressionMetrics(t, p) >>> metrics.rmse()
- rmsle() float[source]
Root Mean Squared Logarithmic Error.
This error is less sensitive to outliers than RMSE. Compared to RMSE, RMSLE considers only the relative error between predicted and actual values, and the scale of the error is nullified by the log-transformation. Furthermore, RMSLE penalizes underestimation more than overestimation. This is especially useful in studies where underestimation of the target variable is not acceptable but overestimation can be tolerated.
\[RMSLE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( \log(1 + \text{predicted}_i) - \log(1 + \text{true}_i) \right)^2}\]Examples
>>> import numpy as np >>> from SeqMetrics import RegressionMetrics >>> t = np.random.random(10) >>> p = np.random.random(10) >>> metrics= RegressionMetrics(t, p) >>> metrics.rmsle()
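A minimal NumPy sketch of the RMSLE formula (the helper `rmsle_sketch` is hypothetical, not SeqMetrics' code), showing that an underestimate is penalized more than an overestimate of the same absolute size. `np.log1p(x)` computes log(1 + x).

```python
import numpy as np

def rmsle_sketch(true, predicted):
    """sqrt(mean((log(1 + predicted) - log(1 + true))**2)); values must exceed -1."""
    true, predicted = np.asarray(true, float), np.asarray(predicted, float)
    return float(np.sqrt(np.mean((np.log1p(predicted) - np.log1p(true)) ** 2)))

true = np.array([100.0])
under = rmsle_sketch(true, true - 50.0)  # predict 50
over = rmsle_sketch(true, true + 50.0)   # predict 150
print(under, over)                       # under > over, same absolute error
```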
- rmspe() float[source]
Root Mean Square Percentage Error.
\[RMSPE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left(PE_i\right)^2} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left(\frac{\text{true}_i - \text{predicted}_i}{\text{true}_i}\right)^2}\]Examples
>>> import numpy as np >>> from SeqMetrics import RegressionMetrics >>> t = np.random.random(10) >>> p = np.random.random(10) >>> metrics= RegressionMetrics(t, p) >>> metrics.rmspe()
- rmsse() float[source]
Root Mean Squared Scaled Error after Muhaimin et al., 2021 and Zhou T, 2023. It is also considered similar to MASE.
\[\text{RMSSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( \frac{\left| \text{true}_i - \text{predicted}_i \right|}{\frac{1}{n-s} \sum_{j=s+1}^{n} \left| \text{true}_j - \text{true}_{j-s} \right|} \right)^2}\]Examples
>>> import numpy as np >>> from SeqMetrics import RegressionMetrics >>> t = np.random.random(10) >>> p = np.random.random(10) >>> metrics= RegressionMetrics(t, p) >>> metrics.rmsse()
- rrse() float[source]
Root Relative Squared Error.
\[RRSE = \sqrt{\frac{\sum_{i=1}^{n} (\text{true}_i - \text{predicted}_i)^2}{\sum_{i=1}^{n} (\text{true}_i - \bar{\text{true}})^2}}\]
Examples
>>> import numpy as np >>> from SeqMetrics import RegressionMetrics >>> t = np.random.random(10) >>> p = np.random.random(10) >>> metrics= RegressionMetrics(t, p) >>> metrics.rrse()
- rse() float[source]
Relative Squared Error
\[\text{RSE} = \frac{\sum_{i=1}^{n} (\text{true}_i - \text{predicted}_i)^2}{\sum_{i=1}^{n} (\text{true}_i - \bar{\text{true}})^2}\]Examples
>>> import numpy as np >>> from SeqMetrics import RegressionMetrics >>> t = np.random.random(10) >>> p = np.random.random(10) >>> metrics= RegressionMetrics(t, p) >>> metrics.rse()
- rsr() float[source]
RMSE-observations standard deviation ratio: RMSE normalized by the standard deviation of the true values, following Moriasi et al., 2007.
It incorporates the benefits of error index statistics and includes a scaling/normalization factor, so that the resulting statistic and reported values can apply to various constituents. It ranges from 0 to infinity, with 0-0.5 indicating very good model performance, 0.5-0.8 indicating good model performance.
The standard deviation is calculated using np.std(true, ddof=1), i.e. the sample standard deviation with n-1 in the denominator, as in the formula below.
\[\text{RSR} = \frac{\sqrt{\frac{1}{n} \sum_{i=1}^{n} (\text{true}_i - \text{predicted}_i)^2}}{\sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (\text{true}_i - \bar{\text{true}})^2}}\]Examples
>>> import numpy as np >>> from SeqMetrics import RegressionMetrics >>> t = np.random.random(10) >>> p = np.random.random(10) >>> metrics= RegressionMetrics(t, p) >>> metrics.rsr()
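A minimal sketch of RSR as defined above, using the sample standard deviation (`ddof=1`). `rsr_sketch` is a hypothetical helper, not the library's implementation; the example value falls in the 0 to 0.5 band described above as very good.

```python
import numpy as np

def rsr_sketch(true, predicted):
    """RMSE divided by the sample standard deviation (ddof=1) of true."""
    true, predicted = np.asarray(true, float), np.asarray(predicted, float)
    rmse = np.sqrt(np.mean((true - predicted) ** 2))
    return float(rmse / np.std(true, ddof=1))

true = np.array([2.0, 4.0, 6.0, 8.0])
predicted = true + 1.0               # RMSE = 1
print(rsr_sketch(true, predicted))   # 1 / sqrt(20/3), about 0.387
```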
- sa() float[source]
Spectral angle after Keshava N, 2004. It is the arccosine of the normalized dot product of the true and predicted arrays. It varies from 0 to π, and values closer to 0 are better. It measures the angle between two vectors in hyperspace, indicating how well the shapes of the two arrays match rather than their magnitudes.
\[SA = \arccos \left( \frac{\sum_{i=1}^{n} (\text{true}_i \cdot \text{predicted}_i)}{\sqrt{\sum_{i=1}^{n} (\text{true}_i)^2} \cdot \sqrt{\sum_{i=1}^{n} (\text{predicted}_i)^2}} \right)\]Examples
>>> import numpy as np >>> from SeqMetrics import RegressionMetrics >>> t = np.random.random(10) >>> p = np.random.random(10) >>> metrics= RegressionMetrics(t, p) >>> metrics.sa()
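A minimal NumPy sketch of the spectral angle (the helper `spectral_angle_sketch` is hypothetical). Since `np.arccos` returns values in [0, pi], the angle is about 0 for arrays with the same shape regardless of magnitude and about pi for arrays pointing in opposite directions. The `np.clip` call is a defensive addition that may not be present in SeqMetrics.

```python
import numpy as np

def spectral_angle_sketch(true, predicted):
    """Angle in radians between the vectors: arccos of their cosine similarity."""
    true, predicted = np.asarray(true, float), np.asarray(predicted, float)
    cos = np.dot(true, predicted) / (np.linalg.norm(true) * np.linalg.norm(predicted))
    # clip guards against rounding pushing the cosine slightly outside [-1, 1]
    return float(np.arccos(np.clip(cos, -1.0, 1.0)))

true = np.array([1.0, 2.0, 3.0])
print(spectral_angle_sketch(true, 10 * true))  # same shape, larger magnitude -> about 0
print(spectral_angle_sketch(true, -true))      # opposite direction -> about pi
```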
- sc() float[source]
Spectral correlation after Robila and Gershman, 2005. It varies from 0 to π, and values closer to 0 are better. It measures the angle between the two mean-centered vectors in hyperspace and highlights how well the shapes of the two series match.
\[sc = \arccos \left( \frac{ \sum_{i=1}^{n} (t_i - \bar{t}) \cdot (p_i - \bar{p}) }{ \sqrt{\sum_{i=1}^{n} (t_i - \bar{t})^2} \cdot \sqrt{\sum_{i=1}^{n} (p_i - \bar{p})^2} } \right)\]Examples
>>> import numpy as np >>> from SeqMetrics import RegressionMetrics >>> t = np.random.random(10) >>> p = np.random.random(10) >>> metrics= RegressionMetrics(t, p) >>> metrics.sc()
- sga() float[source]
Spectral gradient angle: the spectral angle between the first differences (gradients) of the true and predicted arrays. It varies from 0 to π; closer to 0 is better.
\[\text{SGA} = \arccos \left( \frac{\sum_{i=1}^{n-1} \left( (true_{i+1} - true_i) \cdot (predicted_{i+1} - predicted_i) \right)}{\sqrt{\sum_{i=1}^{n-1} (true_{i+1} - true_i)^2} \times \sqrt{\sum_{i=1}^{n-1} (predicted_{i+1} - predicted_i)^2}} \right)\]Examples
>>> import numpy as np >>> from SeqMetrics import RegressionMetrics >>> t = np.random.random(10) >>> p = np.random.random(10) >>> metrics= RegressionMetrics(t, p) >>> metrics.sga()
- sid() float[source]
Spectral Information Divergence. It is non-negative, and values closer to 0 are better.
\[\text{SID} = \left( \frac{\text{t}}{\text{mean(t)}} - \frac{\text{p}}{\text{mean(p)}} \right) \cdot \left( \log_{10}(\text{t}) - \log_{10}(\text{mean(t)}) - \log_{10}(\text{p}) + \log_{10}(\text{mean(p)}) \right)\]Examples
>>> import numpy as np >>> from SeqMetrics import RegressionMetrics >>> t = np.random.random(10) >>> p = np.random.random(10) >>> metrics= RegressionMetrics(t, p) >>> metrics.sid()
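A minimal NumPy sketch of the SID formula, assuming the `·` in the formula denotes a dot product over all samples (the helper `sid_sketch` is hypothetical, not the library's code). Every term has the form (a - b)(log a - log b) with a, b > 0, which is never negative, so the sum is non-negative and 0 when the mean-normalized arrays coincide. Both arrays must be strictly positive for the logarithms.

```python
import numpy as np

def sid_sketch(true, predicted):
    """Spectral information divergence; both arrays must be strictly positive."""
    true, predicted = np.asarray(true, float), np.asarray(predicted, float)
    first = true / true.mean() - predicted / predicted.mean()
    second = (np.log10(true) - np.log10(true.mean())) - (np.log10(predicted) - np.log10(predicted.mean()))
    return float(np.dot(first, second))  # sum over samples of (a - b) * (log a - log b)

rng = np.random.default_rng(0)
t = rng.uniform(0.1, 1.0, 50)
p = rng.uniform(0.1, 1.0, 50)
print(sid_sketch(t, p))       # non-negative
print(sid_sketch(t, 3 * t))   # same shape, different scale -> about 0
```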
- skill_score_murphy() float[source]
Skill score after Murphy, 1988, adopted from SkillMetrics. It calculates the non-dimensional skill score (SS) between two variables using the definition of Murphy (1988):
\[SS = 1 - RMSE^2/SDEV^2\]where SDEV is the standard deviation of the true values
\[SDEV^2 = \frac{1}{N-1} \sum_{n=1}^{N} \left( r_n - \bar{r} \right)^2\]where p denotes the predicted values, r the reference (true) values, and N the total number of values in p and r. Note that p and r must have the same number of values. A positive skill score can be interpreted as the percentage of improvement of the new model forecast in comparison to the reference. A negative skill score, on the other hand, denotes that the forecast of interest is worse than the reference forecast. Consequently, a value of zero denotes that both forecasts perform equally [MLAir, 2020].
Examples
>>> import numpy as np >>> from SeqMetrics import RegressionMetrics >>> t = np.random.random(10) >>> p = np.random.random(10) >>> metrics= RegressionMetrics(t, p) >>> metrics.skill_score_murphy()
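A minimal sketch following the SS formula above, with RMSE computed over N and SDEV over N-1 (the helper `skill_score_murphy_sketch` is hypothetical, not SkillMetrics' or SeqMetrics' code).

```python
import numpy as np

def skill_score_murphy_sketch(true, predicted):
    """SS = 1 - RMSE**2 / SDEV**2 with RMSE over N and SDEV over N - 1."""
    true, predicted = np.asarray(true, float), np.asarray(predicted, float)
    rmse_sq = np.mean((predicted - true) ** 2)   # RMSE**2, denominator N
    sdev_sq = np.var(true, ddof=1)               # SDEV**2, denominator N - 1
    return float(1 - rmse_sq / sdev_sq)

true = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
print(skill_score_murphy_sketch(true, true))             # perfect forecast -> 1.0
print(skill_score_murphy_sketch(true, np.full(5, 3.0)))  # mean of true -> 1/N, about 0.2
```

Because of the different denominators, predicting the mean of the true values gives SS = 1/N rather than exactly 0.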
- smape() float[source]
Symmetric Mean Absolute Percentage Error. Adapted from this.
\[SMAPE = \frac{100}{n} \sum_{i=1}^{n} \frac{2 \left| \text{predicted}_i - \text{true}_i \right|}{\left| \text{true}_i \right| + \left| \text{predicted}_i \right|}\]Examples
>>> import numpy as np >>> from SeqMetrics import RegressionMetrics >>> t = np.random.random(10) >>> p = np.random.random(10) >>> metrics= RegressionMetrics(t, p) >>> metrics.smape()
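A minimal NumPy sketch of the SMAPE formula (the helper `smape_sketch` is hypothetical, not the library's implementation). Swapping an over-forecast and an under-forecast of the same size gives the same value, and each term is at most 2, so the result is bounded above by 200.

```python
import numpy as np

def smape_sketch(true, predicted):
    """SMAPE in percent: 100/n * sum(2|p - t| / (|t| + |p|))."""
    true, predicted = np.asarray(true, float), np.asarray(predicted, float)
    ratio = 2 * np.abs(predicted - true) / (np.abs(true) + np.abs(predicted))
    return float(100 / len(true) * np.sum(ratio))

print(smape_sketch([100.0], [150.0]))  # over-forecast by 50
print(smape_sketch([150.0], [100.0]))  # under-forecast by 50, same SMAPE
print(smape_sketch([1.0], [0.0]))      # worst case: the upper bound of 200
```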
- smdape() float[source]
Symmetric Median Absolute Percentage Error. Note: the result is NOT multiplied by 100.
\[\text{smdape} = \text{median} \left( \frac{2 \cdot | \text{predicted} - \text{true} |}{| \text{true} | + | \text{predicted} | + \epsilon} \right)\]Examples
>>> import numpy as np >>> from SeqMetrics import RegressionMetrics >>> t = np.random.random(10) >>> p = np.random.random(10) >>> metrics= RegressionMetrics(t, p) >>> metrics.smdape()
- spearmann_corr() float[source]
Spearman correlation coefficient.
This is a nonparametric metric and assesses how well the relationship between the true and predicted data can be described using a monotonic function.
\[r = \frac{\sum_{i=1}^{n} \left( R_{t,i} - \overline{R_t} \right) \left( R_{p,i} - \overline{R_p} \right)}{\sqrt{ \sum_{i=1}^{n} \left( R_{t,i} - \overline{R_t} \right)^2 \sum_{i=1}^{n} \left( R_{p,i} - \overline{R_p} \right)^2 }}\]Examples
>>> import numpy as np >>> from SeqMetrics import RegressionMetrics >>> t = np.random.random(10) >>> p = np.random.random(10) >>> metrics= RegressionMetrics(t, p) >>> metrics.spearmann_corr()
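Spearman's coefficient is the Pearson correlation of the ranks. The hypothetical helper below ranks with a double `argsort`, which is only correct when there are no ties; `scipy.stats.spearmanr` handles ties by averaging ranks. A monotonic but nonlinear relationship gives a coefficient of 1, while the Pearson correlation of the same data is below 1.

```python
import numpy as np

def spearman_sketch(true, predicted):
    """Pearson correlation of ranks; double-argsort ranking assumes no ties."""
    rank_true = np.argsort(np.argsort(true))
    rank_pred = np.argsort(np.argsort(predicted))
    return float(np.corrcoef(rank_true, rank_pred)[0, 1])

t = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
print(spearman_sketch(t, np.exp(t)))     # monotonic increasing -> 1.0
print(spearman_sketch(t, -t ** 3))       # monotonic decreasing -> -1.0
print(np.corrcoef(t, np.exp(t))[0, 1])   # Pearson on the same data is below 1
```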
- sse() float[source]
Sum of squared errors (model vs. actual). It is a measure of how far the model's predictions are from the observed values. A value of 0 indicates that all predictions are exact; a non-zero value indicates errors.
This is also called the residual sum of squares (RSS) or sum of squared residuals, as per tutorialspoint.
\[\text{SSE} = \sum_{i=1}^{n} (true_i - predicted_i)^2\]Examples
>>> import numpy as np >>> from SeqMetrics import RegressionMetrics >>> t = np.random.random(10) >>> p = np.random.random(10) >>> metrics= RegressionMetrics(t, p) >>> metrics.sse()
- std_ratio(**kwargs) → float
Ratio of the standard deviation of the predictions to that of the true values. Also known as the standard ratio, it ranges from 0.0 to infinity, with 1.0 being the perfect value.
\[\text{std\_ratio} = \frac{\sigma_{\text{predicted}}}{\sigma_{\text{true}}}\]
Examples
>>> import numpy as np
>>> from SeqMetrics import RegressionMetrics
>>> t = np.random.random(10)
>>> p = np.random.random(10)
>>> metrics = RegressionMetrics(t, p)
>>> metrics.std_ratio()
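A minimal sketch of the ratio above using np.std (not the library's code; std_ratio_sketch is a hypothetical name). Because the same number of points enters both standard deviations, the choice between the population (ddof=0) and sample (ddof=1) estimator cancels out in the ratio.

```python
import numpy as np


def std_ratio_sketch(true, predicted, ddof=0):
    """sigma_predicted / sigma_true; 1.0 means the spread is reproduced exactly."""
    return float(np.std(predicted, ddof=ddof) / np.std(true, ddof=ddof))


t = np.array([1.0, 2.0, 3.0, 4.0])
print(std_ratio_sketch(t, t + 10.0))  # shifting the predictions leaves the ratio at 1.0
print(std_ratio_sketch(t, 3.0 * t))   # tripling the spread gives about 3.0
```

The first call also shows a limitation: std_ratio ignores bias, so predictions offset by a constant still score perfectly.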
- tweedie_deviance_score(power=0) → float
Tweedie deviance score. The formula depends on the value of power.
power = 0 (squared error):
\[D(\text{true}, \text{predicted}) = \frac{1}{n} \sum_{i=1}^{n} (\text{true}_i - \text{predicted}_i)^2\]
power = 1 (Poisson deviance):
\[D(\text{true}, \text{predicted}) = 2 \sum_{i=1}^{n} \left( \text{true}_i \log\left(\frac{\text{true}_i + (\text{true}_i = 0)}{\text{predicted}_i}\right) - \text{true}_i + \text{predicted}_i \right)\]
where \((\text{true}_i = 0)\) evaluates to 1 when \(\text{true}_i\) is zero and to 0 otherwise, which keeps the logarithm finite for zero observations.
power = 2 (Gamma deviance):
\[D(\text{true}, \text{predicted}) = 2 \sum_{i=1}^{n} \left( \frac{\text{true}_i}{\text{predicted}_i} - \log\left(\frac{\text{true}_i}{\text{predicted}_i}\right) - 1 \right)\]
power = 3:
\[D(\text{true}, \text{predicted}) = 2 \sum_{i=1}^{n} \left( \frac{(\text{true}_i - \text{predicted}_i)^2}{\text{true}_i^2 \text{predicted}_i} \right)\]
Examples
>>> import numpy as np
>>> from SeqMetrics import RegressionMetrics
>>> t = np.array([1, 2, 3, 4, 5])
>>> p = np.array([1.1, 1.9, 3.1, 4.2, 4.8])
>>> metrics = RegressionMetrics(t, p)
>>> score = metrics.tweedie_deviance_score()
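The first three formulas above are the standard Tweedie deviances for power = 0 (normal), power = 1 (Poisson) and power = 2 (Gamma). The NumPy sketch below evaluates them exactly as written in this section, so the power = 0 case is averaged over n while the Poisson and Gamma cases are summed. It is not the SeqMetrics implementation, tweedie_deviance_sketch is a hypothetical name, and any other power raises an error in this sketch.

```python
import numpy as np


def tweedie_deviance_sketch(true, predicted, power=0):
    """Tweedie deviance for power 0, 1 or 2, following the formulas above."""
    t = np.asarray(true, dtype=float)
    p = np.asarray(predicted, dtype=float)
    if power == 0:
        # normal: mean squared error
        return float(np.mean((t - p) ** 2))
    if power == 1:
        # Poisson: adding (t == 0) inside the log turns 0 * log(0 / p) into 0 * log(1 / p) = 0
        safe_t = t + (t == 0)
        return float(2.0 * np.sum(t * np.log(safe_t / p) - t + p))
    if power == 2:
        # Gamma: requires strictly positive true and predicted values
        return float(2.0 * np.sum(t / p - np.log(t / p) - 1.0))
    raise ValueError("this sketch only covers power = 0, 1 or 2")


t = [1, 2, 3, 4, 5]
p = [1.1, 1.9, 3.1, 4.2, 4.8]
for power in (0, 1, 2):
    print(power, tweedie_deviance_sketch(t, p, power))
```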
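- umbrae(benchmark: ndarray | None = None)
Unscaled Mean Bounded Relative Absolute Error (UMBRAE). It rescales the mean bounded relative absolute error (MBRAE) as MBRAE / (1 - MBRAE), where each bounded relative error compares the prediction error with the error of a benchmark prediction and lies between 0 and 1.
\[UMBRAE = \frac{\frac{1}{n} \sum_{i=1}^{n} \frac{|t_i - p_i|}{|t_i - p_i| + |t_i - b_i|}}{1 - \frac{1}{n} \sum_{i=1}^{n} \frac{|t_i - p_i|}{|t_i - p_i| + |t_i - b_i|}}\]
where \(t_i\), \(p_i\) and \(b_i\) are the true, predicted and benchmark values.
Examples
>>> import numpy as np
>>> from SeqMetrics import RegressionMetrics
>>> t = np.random.random(10)
>>> p = np.random.random(10)
>>> metrics = RegressionMetrics(t, p)
>>> metrics.umbrae()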
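A minimal NumPy sketch of UMBRAE built from bounded relative errors |t_i - p_i| / (|t_i - p_i| + |t_i - b_i|), each of which lies between 0 and 1. It is not the SeqMetrics implementation: the benchmark b is passed in explicitly, umbrae_sketch is a hypothetical name, and how the library chooses a benchmark when benchmark is None is not reproduced. A point where both errors are zero gives 0/0 (NaN) in this sketch.

```python
import numpy as np


def umbrae_sketch(true, predicted, benchmark):
    """UMBRAE = MBRAE / (1 - MBRAE) against an explicit benchmark prediction."""
    t = np.asarray(true, dtype=float)
    err = np.abs(t - np.asarray(predicted, dtype=float))
    bench_err = np.abs(t - np.asarray(benchmark, dtype=float))
    brae = err / (err + bench_err)  # bounded relative absolute error, in [0, 1]
    mbrae = brae.mean()
    return float(mbrae / (1.0 - mbrae))


t = [1.0, 2.0, 3.0]
print(umbrae_sketch(t, [2.0, 3.0, 4.0], [0.0, 1.0, 2.0]))  # 1.0: as accurate as the benchmark
print(umbrae_sketch(t, [1.0, 2.0, 3.0], [0.0, 1.0, 2.0]))  # 0.0: perfect predictions
```

Values below 1 mean the predictions are, on this bounded scale, more accurate than the benchmark; values above 1 mean they are less accurate.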
- variability_ratio() → float
Variability Ratio. It compares the coefficient of variation (standard deviation divided by mean) of the predicted values with that of the true values, measuring how well the predictions reproduce the relative variability of the true values. It equals 1 when both coefficients of variation are identical and decreases as they diverge.
\[VR = 1 - \left| \frac{\frac{\sigma_{\text{predicted}}}{\mu_{\text{predicted}}}}{\frac{\sigma_{\text{true}}}{\mu_{\text{true}}}} - 1 \right|\]
Examples
>>> import numpy as np
>>> from SeqMetrics import RegressionMetrics
>>> t = np.random.random(10)
>>> p = np.random.random(10)
>>> metrics = RegressionMetrics(t, p)
>>> metrics.variability_ratio()
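A minimal NumPy sketch of the VR formula above (not the library's code; variability_ratio_sketch is a hypothetical name). Both standard deviations use the same number of points, so the ddof choice cancels in the ratio of coefficients of variation and np.std's default is used.

```python
import numpy as np


def variability_ratio_sketch(true, predicted):
    """1 - |CV_predicted / CV_true - 1|, where CV = standard deviation / mean."""
    t = np.asarray(true, dtype=float)
    p = np.asarray(predicted, dtype=float)
    cv_true = np.std(t) / np.mean(t)
    cv_pred = np.std(p) / np.mean(p)
    return float(1.0 - abs(cv_pred / cv_true - 1.0))


t = np.array([1.0, 2.0, 3.0, 4.0])
print(variability_ratio_sketch(t, 2.0 * t))  # scaling keeps the CV, so VR is about 1.0
print(variability_ratio_sketch(t, t + 2.5))  # shifting up halves the CV, so VR is about 0.5
```

The ratio is undefined when either mean is zero, so VR is best suited to strictly positive data.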
- ve() → float
Volumetric efficiency. Its values usually lie between 0 and 1, with 1 indicating a perfect match; the larger, the better.
\[VE = 1 - \frac{\sum_{i=1}^{n} \left| \text{predicted}_i - \text{true}_i \right|}{\sum_{i=1}^{n} \text{true}_i}\]
Examples
>>> import numpy as np
>>> from SeqMetrics import RegressionMetrics
>>> t = np.random.random(10)
>>> p = np.random.random(10)
>>> metrics = RegressionMetrics(t, p)
>>> metrics.ve()
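A minimal NumPy sketch of the VE formula above (not the library's code; ve_sketch is a hypothetical name). It shows that VE equals 1 for a perfect prediction and drops as the absolute errors grow relative to the total true volume.

```python
import numpy as np


def ve_sketch(true, predicted):
    """Volumetric efficiency: 1 - sum|p - t| / sum(t)."""
    t = np.asarray(true, dtype=float)
    p = np.asarray(predicted, dtype=float)
    return float(1.0 - np.sum(np.abs(p - t)) / np.sum(t))


print(ve_sketch([2.0, 2.0], [2.0, 2.0]))  # 1.0
print(ve_sketch([2.0, 2.0], [1.0, 3.0]))  # 0.5
```

Over- and under-predictions do not cancel here: the second example has zero total bias but still loses half of its score.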
- volume_error() → float
Returns the Volume Error (Ve) after Reynolds, 2017. It is an indicator of the agreement between the averages of the simulated and observed runoff (i.e., long-term water balance).
\[\text{volume\_error} = \frac{\sum_{i=1}^{n} (\text{predicted}_i - \text{true}_i)}{\sum_{i=1}^{n} \text{predicted}_i}\]
Examples
>>> import numpy as np
>>> from SeqMetrics import RegressionMetrics
>>> t = np.random.random(10)
>>> p = np.random.random(10)
>>> metrics = RegressionMetrics(t, p)
>>> metrics.volume_error()
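A NumPy sketch of the volume error exactly as the formula above is written, with the sum of predictions in the denominator (not the library's code; volume_error_sketch is a hypothetical name). For positive data, positive values indicate that the predictions overestimate the total volume.

```python
import numpy as np


def volume_error_sketch(true, predicted):
    """sum(predicted - true) / sum(predicted), as in the formula above."""
    t = np.asarray(true, dtype=float)
    p = np.asarray(predicted, dtype=float)
    return float(np.sum(p - t) / np.sum(p))


print(volume_error_sketch([1.0, 1.0], [2.0, 2.0]))  # 0.5: total volume overestimated
print(volume_error_sketch([2.0, 2.0], [1.0, 3.0]))  # 0.0: errors cancel in the total
```

The second call shows why this is a long-term water-balance indicator: point-wise errors of opposite sign cancel.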
- wape() → float
Weighted Absolute Percentage Error (WAPE).
It is a variation of MAPE that is better suited to intermittent and low-volume data.
\[\text{WAPE} = \frac{\sum_{i=1}^{n} \left| \text{true}_i - \text{predicted}_i \right|}{\sum_{i=1}^{n} \text{true}_i}\]
Examples
>>> import numpy as np
>>> from SeqMetrics import RegressionMetrics
>>> t = np.random.random(10)
>>> p = np.random.random(10)
>>> metrics = RegressionMetrics(t, p)
>>> metrics.wape()
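The NumPy sketch below (not the library's code; both function names are hypothetical) contrasts WAPE with an unscaled mean of per-point percentage errors. Because WAPE divides the total absolute error by the total of the true values, a small error on a very small true value does not dominate the result, which is why it suits low-volume data.

```python
import numpy as np


def wape_sketch(true, predicted):
    """Total absolute error divided by the total of the true values."""
    t = np.asarray(true, dtype=float)
    p = np.asarray(predicted, dtype=float)
    return float(np.sum(np.abs(t - p)) / np.sum(t))


def mean_percentage_error_sketch(true, predicted):
    """Unscaled mean of |t - p| / |t| over all points."""
    t = np.asarray(true, dtype=float)
    p = np.asarray(predicted, dtype=float)
    return float(np.mean(np.abs(t - p) / np.abs(t)))


t = [1.0, 100.0]
p = [2.0, 101.0]  # both predictions are off by exactly 1
print(wape_sketch(t, p))                   # 2 / 101, about 0.0198
print(mean_percentage_error_sketch(t, p))  # (1.0 + 0.01) / 2, about 0.505
```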
- watt_m() → float
Watterson's M.
\[M = \frac{2}{\pi} \cdot \arcsin \left( 1 - \frac{\frac{1}{n} \sum_{i=1}^{n} ( \text{true}_i - \text{predicted}_i )^2}{\sigma_{\text{true}}^2 + \sigma_{\text{predicted}}^2 + (\mu_{\text{predicted}} - \mu_{\text{true}})^2} \right)\]
where \(\sigma\) and \(\mu\) denote the standard deviation and mean of the true and predicted values.
Examples
>>> import numpy as np
>>> from SeqMetrics import RegressionMetrics
>>> t = np.random.random(10)
>>> p = np.random.random(10)
>>> metrics = RegressionMetrics(t, p)
>>> metrics.watt_m()
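A minimal NumPy sketch of the M formula above (not the library's code; watt_m_sketch is a hypothetical name). It assumes population statistics (np.var and np.mean with NumPy defaults); with that choice the argument of arcsin stays within [-1, 1], and np.clip only guards against floating-point overshoot.

```python
import numpy as np


def watt_m_sketch(true, predicted):
    """Skill score M in [-1, 1]; 1 means a perfect match."""
    t = np.asarray(true, dtype=float)
    p = np.asarray(predicted, dtype=float)
    mse = np.mean((t - p) ** 2)
    # population variances (ddof=0) plus the squared difference of means
    scale = np.var(t) + np.var(p) + (np.mean(p) - np.mean(t)) ** 2
    arg = np.clip(1.0 - mse / scale, -1.0, 1.0)
    return float(2.0 * np.arcsin(arg) / np.pi)


t = [1.0, 2.0, 3.0]
print(watt_m_sketch(t, [1.0, 2.0, 3.0]))  # close to 1.0: perfect agreement
print(watt_m_sketch(t, [3.0, 2.0, 1.0]))  # close to -1.0: anti-correlated, same mean and spread
```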
- wmape() → float
Weighted Mean Absolute Percent Error
\[\text{WMAPE} = \frac{\sum_{i=1}^{n} \left| \text{true}_i - \text{predicted}_i \right|}{\sum_{i=1}^{n} \text{true}_i}\]
Examples
>>> import numpy as np
>>> from SeqMetrics import RegressionMetrics
>>> t = np.random.random(10)
>>> p = np.random.random(10)
>>> metrics = RegressionMetrics(t, p)
>>> metrics.wmape()