Utils
This module contains auxiliary functions that may be useful to users during the processing and analysis of forest data.
Available Functions
stats_summary
Generates a statistical summary for the specified numeric columns of a DataFrame.
Parameters:
- df: Pandas DataFrame with the input data.
- *args: Names of the numeric columns to be summarized.
- ignore_zeros (bool): If True, zero values are ignored in the calculations.
- language (str): Sets the output column language. Accepts "en" or "pt-br".
Output:
- DataFrame with statistics: mean, minimum, maximum, standard deviation, coefficient of variation (CV), quartiles (Q1, Q2, Q3), and interquartile range (IQR).
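To make the listed statistics concrete, here is a minimal sketch that computes them with plain pandas. It is not the module's implementation: the helper name summarize_columns, the output column labels, the sample data, the use of the sample standard deviation, and the CV expressed as a percentage are all assumptions made for illustration.

```python
import pandas as pd


def summarize_columns(df, *columns, ignore_zeros=False):
    """Hypothetical stand-in that mirrors the statistics listed above."""
    rows = {}
    for col in columns:
        values = df[col]
        if ignore_zeros:
            # Drop zero values before computing anything
            values = values[values != 0]
        q1, q2, q3 = values.quantile([0.25, 0.5, 0.75])
        mean = values.mean()
        std = values.std()  # sample standard deviation (ddof=1), an assumption
        rows[col] = {
            "mean": mean,
            "min": values.min(),
            "max": values.max(),
            "std": std,
            "cv_percent": std / mean * 100,  # CV as a percentage, an assumption
            "q1": q1,
            "q2": q2,
            "q3": q3,
            "iqr": q3 - q1,
        }
    # One row per summarized column, one column per statistic
    return pd.DataFrame(rows).T


# Made-up plot data; the zero rows stand for missing measurements
plots = pd.DataFrame(
    {"dbh_cm": [0.0, 10.0, 20.0, 30.0], "height_m": [0.0, 8.0, 12.0, 16.0]}
)
summary = summarize_columns(plots, "dbh_cm", "height_m", ignore_zeros=True)
print(summary)
```

With ignore_zeros=True the zero rows are excluded, so the mean of dbh_cm is taken over three values rather than four.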
get_metrics
Calculates evaluation metrics for predictive models.
Parameters:
- real_y: List or array with the actual values.
- predicted_y: List or array with the predicted values.
Calculated metrics:
- MAE: Mean Absolute Error.
- MAPE: Mean Absolute Percentage Error.
- MSE: Mean Squared Error.
- RMSE: Root Mean Squared Error.
- R²: Coefficient of determination.
- Explained variance.
- Mean error (model bias).
Output:
- Tuple with the metric values in the following order: (mae, mape, mse, rmse, r_squared, explained_variance, mean_error).
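The metrics above follow standard definitions, which the sketch below spells out with NumPy. It is an illustration rather than the module's code: the helper name compute_metrics is hypothetical, and the sign convention for the mean error (actual minus predicted) and the MAPE expressed as a percentage are assumptions that may differ from what get_metrics returns.

```python
import numpy as np


def compute_metrics(real_y, predicted_y):
    """Hypothetical stand-in returning the metrics in the documented order."""
    real = np.asarray(real_y, dtype=float)
    pred = np.asarray(predicted_y, dtype=float)
    residuals = real - pred  # assumed convention: actual minus predicted

    mae = np.mean(np.abs(residuals))
    # MAPE is undefined when any actual value is zero
    mape = np.mean(np.abs(residuals / real)) * 100
    mse = np.mean(residuals ** 2)
    rmse = np.sqrt(mse)
    # R² compares the residual sum of squares with the total sum of squares
    r_squared = 1 - np.sum(residuals ** 2) / np.sum((real - real.mean()) ** 2)
    # Explained variance ignores a constant bias, unlike R²
    explained_variance = 1 - np.var(residuals) / np.var(real)
    mean_error = np.mean(residuals)
    return mae, mape, mse, rmse, r_squared, explained_variance, mean_error


# Made-up values for illustration
real = [10, 20, 30, 40]
predicted = [12, 18, 33, 39]
mae, mape, mse, rmse, r_squared, explained_variance, mean_error = compute_metrics(
    real, predicted
)
print(f"MAE={mae:.2f}  RMSE={rmse:.3f}  R²={r_squared:.3f}  bias={mean_error:.2f}")
```

Unpacking the tuple positionally, as in the last lines, relies on the documented order of the seven values.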
plot_x_y
Generates a scatter plot for one variable x and one or more variables y.
Parameters:
- x: List or array with the X-axis values.
- *ys: One or more lists or arrays with the Y-axis values.
Behavior:
- Each y series is drawn with a unique combination of marker and color.
- Both axes start at zero, a grid is shown in the background, and each y series has its own legend entry.
Output:
- Displays the plot on screen.
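The behavior described for plot_x_y can be approximated with matplotlib as in the sketch below. This is an assumption-laden illustration, not the module's code: the helper name scatter_x_ys, the marker list, the legend labels (y1, y2, ...), and the sample data are all hypothetical, and the sketch returns the Axes instead of showing the figure.

```python
from itertools import cycle

import matplotlib.pyplot as plt


def scatter_x_ys(x, *ys):
    """Hypothetical stand-in: one scatter series per y, sharing the same x."""
    markers = cycle(["o", "s", "^", "D", "v", "P", "X"])
    colors = cycle(plt.rcParams["axes.prop_cycle"].by_key()["color"])

    fig, ax = plt.subplots()
    # zip stops when ys is exhausted, so the endless cycles are safe here
    for i, (y, marker, color) in enumerate(zip(ys, markers, colors), start=1):
        ax.scatter(x, y, marker=marker, color=color, label=f"y{i}")

    ax.set_xlim(left=0)      # both axes start at zero
    ax.set_ylim(bottom=0)
    ax.grid(True)
    ax.set_axisbelow(True)   # keep the grid behind the points
    ax.legend()
    return ax


# Made-up data: two y series plotted against the same x values
age_years = [1, 2, 3, 4]
ax = scatter_x_ys(age_years, [2, 4, 6, 8], [1, 3, 5, 7])
```

Calling plt.show() afterwards displays the figure on screen, which matches the output described above.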