ANN Forecast
Warning
This library is under development; none of the presented solutions are available for download yet.
Use continuous forest inventory databases to predict forest growth and production. Utilize artificial neural networks for greater flexibility. With this module, you will be able to estimate volume, the number of stems, basal area, among other variables of interest.
Class Parameters
ANN Trainer
AnnTrainer(df, y, *train_columns, iterator=None)
Parameters | Description |
---|---|
df | The dataframe containing the processed continuous forest inventory data. |
y | The target variable (Y) that the ANN will be trained to predict. |
*train_columns | (*args) Names of the columns used to train the artificial neural network to predict the values of Y. Must be numeric. |
iterator | (Optional) Name of the column that contains the iterator. An artificial neural network will be fitted for each iterator. |
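Assuming the iterator acts as a grouping key (one network per distinct value in that column), the idea can be sketched with a plain pandas `groupby`. The `Estrato` column, the synthetic values, and the direct use of `MLPRegressor` are illustrative assumptions, not the library's implementation.

```python
import numpy as np
import pandas as pd
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(1)

# Synthetic data; "Estrato" is a hypothetical iterator column.
dados = pd.DataFrame({
    "Estrato": ["A"] * 30 + ["B"] * 30,
    "Idade": rng.uniform(3, 9, size=60),
    "G_m2_ha": rng.uniform(4, 20, size=60),
})
dados["V_m3_ha"] = 5.0 * dados["G_m2_ha"] + 2.0 * dados["Idade"]

train_columns = ["Idade", "G_m2_ha"]

# Fit one independent network per distinct value of the iterator column.
modelos = {}
for estrato, grupo in dados.groupby("Estrato"):
    modelo = MLPRegressor(hidden_layer_sizes=(5,), max_iter=500, random_state=1)
    modelos[estrato] = modelo.fit(grupo[train_columns], grupo["V_m3_ha"])

print(sorted(modelos))
```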
Class Methods
- save_dir = Directory where the .pkl ANN file will be saved.
Methods | Description |
---|---|
.fit_model() | Fit the model using *train_columns to predict the variable Y. |
Ann structures
Six different artificial neural network structures are tested, but only one model is returned: the one selected by the ranking function.
For the 'ann' model, the class sklearn.neural_network.MLPRegressor is used.
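This page does not list the six structures or define the ranking function. As an illustration only, the sketch below fits a few `MLPRegressor` candidates with different hidden-layer layouts on synthetic data and keeps the one with the lowest RMSE on held-out rows. The candidate layouts, the synthetic data, the input scaling, and the RMSE-based ranking are all assumptions.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic regression data (not forest inventory data).
rng = np.random.default_rng(42)
X = rng.uniform(0.0, 1.0, size=(200, 3))
y = 10.0 * X[:, 0] + 5.0 * X[:, 1] * X[:, 2] + rng.normal(0.0, 0.1, size=200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Hypothetical candidate structures; the library's actual six are not documented here.
candidates = [(5,), (10,), (10, 5)]

results = []
for layers in candidates:
    model = make_pipeline(
        StandardScaler(),
        MLPRegressor(hidden_layer_sizes=layers, max_iter=2000, random_state=42),
    )
    model.fit(X_train, y_train)
    residuals = y_test - model.predict(X_test)
    rmse = float(np.sqrt(np.mean(residuals ** 2)))
    results.append((rmse, layers, model))

# Assumed ranking rule: keep the candidate with the lowest held-out RMSE.
best_rmse, best_layers, best_model = min(results, key=lambda r: r[0])
print(best_layers, round(best_rmse, 3))
```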
ANN Predictor
AnnPredictor(pkl_file)
Parameters | Description |
---|---|
pkl_file | Path to the .pkl file that will be used for prediction. |
Class Methods
- Returns the prediction of Y for the *args columns. The *args columns must be the same as those used in *train_columns for training.
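How `AnnPredictor` reads the file is not documented here. The general pattern of saving a fitted scikit-learn estimator to a .pkl file and loading it back for prediction can be sketched as follows; the file name, the synthetic data, and the direct use of `pickle` are assumptions.

```python
import pickle
import tempfile
from pathlib import Path

import numpy as np
from sklearn.neural_network import MLPRegressor

# Synthetic training data with two numeric input columns.
rng = np.random.default_rng(0)
X = rng.uniform(size=(50, 2))
y = 3.0 * X[:, 0] + X[:, 1]

model = MLPRegressor(hidden_layer_sizes=(5,), max_iter=500, random_state=0).fit(X, y)

with tempfile.TemporaryDirectory() as save_dir:
    pkl_file = Path(save_dir) / "y_ann_predictor.pkl"  # hypothetical file name
    with open(pkl_file, "wb") as f:
        pickle.dump(model, f)
    with open(pkl_file, "rb") as f:
        restored = pickle.load(f)

# The restored model expects the same columns, in the same order, as in training.
same = bool(np.allclose(model.predict(X), restored.predict(X)))
print(same)
```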
Example Usage
The main advantage of using artificial neural networks in the estimation of forest variables lies in the possibility of including a large number of variables in the prediction. However, currently, the neural networks in this module operate preferably with continuous variables. Therefore, when one wishes to use categorical variables, it is recommended to apply some kind of transformation, such as one-hot encoding, to convert them into a numerical format before inserting them into the network.
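A minimal sketch of the one-hot encoding step mentioned above, using `pandas.get_dummies`. The `Especie` column and its values are hypothetical and are not part of the example dataset.

```python
import pandas as pd

# Hypothetical plot records; "Especie" is an invented categorical column.
dados = pd.DataFrame({
    "Idade": [3, 4, 5],
    "N_ha": [933, 933, 867],
    "Especie": ["E. dunnii", "E. grandis", "E. dunnii"],
})

# Replace the categorical column with one numeric 0/1 column per category.
dados_num = pd.get_dummies(dados, columns=["Especie"], dtype=int)

print(dados_num.columns.tolist())
```

The resulting numeric columns can then be listed among the training columns like any other continuous variable.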
As an example, we will use an adaptation of the data obtained by Arce and Dobner Jr. (2024) for Eucalyptus dunnii. The dataset consists of 81 permanent plots, with ages ranging from 3 to 9 years, measured continuously over time.
First 5 rows of the file:
Chave_Parcela | Area_m2 | Idade | N_ha | d médio | h médio | H dom | G_m2_ha | V_m3_ha | S |
---|---|---|---|---|---|---|---|---|---|
14401109002_P1 | 300 | 3 | 933 | 7.47 | 6.8 | 7.4 | 4.2 | 6.0 | 15.4 |
14401109002_P1 | 300 | 4 | 933 | 10.3 | 10.1 | 10.8 | 8.0 | 24.5 | 15.4 |
14401109002_P1 | 300 | 5 | 933 | 14.0 | 12.8 | 13.7 | 14.9 | 67.1 | 15.4 |
14401109002_P1 | 300 | 6 | 933 | 14.1 | 14.5 | 15.4 | 15.1 | 77.2 | 15.4 |
14401109002_P1 | 300 | 7 | 867 | 16.6 | 16.1 | 16.5 | 19.4 | 112.8 | 15.4 |
In this case, we will use the columns "Idade", "N_ha", "d médio", "h médio", "H dom", "G_m2_ha", and "S" to predict the value of "V_m3_ha".
exemplo_previsao_rna.py
- Import the classes AnnTrainer and AnnPredictor from the forecast module.
- Import pandas for data manipulation.
- Import train_test_split to split the data into training and validation sets.
- Define the directory where the data in xlsx format is located, saving it in the variable path.
- Load the data and save it in the variable dados.
- Save 80% of the data as training data in the variable dados_treino and 20% as validation data in the variable dados_validacao, using a random seed of 42.
- Create a variable called train_columns containing the list of column names to be used for training.
- Instantiate the class AnnTrainer, saving it in the variable ann, passing the training data, the column V_m3_ha as the target variable, and the list of training columns.
- Fit the neural network model, saving the metrics in the variable metrics and the generated .pkl file in the defined directory.
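Because the library is not yet available, the script itself cannot be run. The sketch below covers only the data-preparation steps from the list above, using the five rows shown earlier in place of the xlsx file. The `AnnTrainer` calls appear as comments that follow the signatures documented on this page; passing `save_dir` as a keyword argument is an assumption based on the Class Methods note, and its value is a placeholder.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# The five sample rows shown above stand in for the xlsx file.
dados = pd.DataFrame({
    "Idade": [3, 4, 5, 6, 7],
    "N_ha": [933, 933, 933, 933, 867],
    "d médio": [7.47, 10.3, 14.0, 14.1, 16.6],
    "h médio": [6.8, 10.1, 12.8, 14.5, 16.1],
    "H dom": [7.4, 10.8, 13.7, 15.4, 16.5],
    "G_m2_ha": [4.2, 8.0, 14.9, 15.1, 19.4],
    "V_m3_ha": [6.0, 24.5, 67.1, 77.2, 112.8],
    "S": [15.4] * 5,
})

# 80% training data, 20% validation data, random seed 42.
dados_treino, dados_validacao = train_test_split(
    dados, test_size=0.2, random_state=42
)

train_columns = ["Idade", "N_ha", "d médio", "h médio", "H dom", "G_m2_ha", "S"]

# Not executed here: the library is not yet published.
# ann = AnnTrainer(dados_treino, "V_m3_ha", *train_columns)
# metrics = ann.fit_model(save_dir="...")  # placeholder directory

print(len(dados_treino), len(dados_validacao))
```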
After that, the trained model will be ready for use. We can test the performance of our model by using it to predict the validation data saved in dados_validacao.
- Import the function get_metrics from the utils module for later metric calculation.
- Instantiate the class AnnPredictor, saving it in the variable predictor, and passing the .pkl file generated during the neural network training.
- Create the column V_m3_ha_predicted in the dados_validacao DataFrame, containing the predictions made by the trained ANN for the training columns in the dados_validacao DataFrame.
- Use the get_metrics function to obtain the metrics between the actual values V_m3_ha and the predicted values V_m3_ha_predicted from the dados_validacao DataFrame. The returned metrics are: MAE (Mean Absolute Error), MAPE (Mean Absolute Percentage Error), MSE (Mean Squared Error), RMSE (Root Mean Squared Error), R² (Coefficient of Determination), EXP_VAR (Explained Variance Score), and ME (Mean Error).
- Create a DataFrame with the validation metrics.
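The formulas behind these metrics can be sketched in plain Python. This is not the library's `get_metrics`; `sketch_metrics` is a hypothetical stand-in, and two conventions are assumed: MAPE is expressed as a percentage, and the error is taken as predicted minus actual. The predicted values are invented for illustration.

```python
import math


def sketch_metrics(actual, predicted):
    """Compute common regression metrics from two equal-length lists."""
    n = len(actual)
    errors = [p - a for a, p in zip(actual, predicted)]  # assumed sign convention
    mean_actual = sum(actual) / n
    mean_error = sum(errors) / n
    mse = sum(e ** 2 for e in errors) / n
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    var_actual = ss_tot / n
    var_error = sum((e - mean_error) ** 2 for e in errors) / n
    return {
        "MAE": sum(abs(e) for e in errors) / n,
        "MAPE": 100.0 * sum(abs(e / a) for a, e in zip(actual, errors)) / n,
        "MSE": mse,
        "RMSE": math.sqrt(mse),
        "R2": 1.0 - (mse * n) / ss_tot,
        "EXP_VAR": 1.0 - var_error / var_actual,
        "ME": mean_error,
    }


actual = [6.0, 24.5, 67.1, 77.2, 112.8]     # V_m3_ha values from the sample rows
predicted = [7.0, 23.5, 68.1, 76.2, 113.8]  # invented predictions
resultado = sketch_metrics(actual, predicted)
print(resultado)
```

Explained variance differs from R² only when the mean error is non-zero, which is why both are worth reporting alongside ME.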
Outputs
Tables
metrics
Table with the performance metrics of the ANN during training.
Iterator | Model | MSE | RMSE | MAE | MAPE | R² | Explained Variance | Max Error |
---|---|---|---|---|---|---|---|---|
Not used | V_m3_ha_ann_predictor | 32.29 | 5.68 | 3.05 | 4.23 | 0.99 | 0.99 | 0.14 |
metrics_val
Table with the performance metrics of the ANN on validation.
MAE | MAPE | MSE | RMSE | R² | Explained Variance | Mean Error |
---|---|---|---|---|---|---|
2.99 | 3.28 | 26.13 | 5.11 | 0.99 | 0.99 | 0.12 |
Files
V_m3_ha_ann_predictor.pkl
.pkl file containing the trained ANN parameters.
References
ARCE, JULIO EDUARDO; DOBNER JR., MARIO. (2024). Manejo e planejamento de florestas plantadas: com ênfase nos gêneros Pinus e Eucalyptus. Curitiba, PR: Ed. dos Autores, 419p.