ANN Forecast

Use continuous forest inventory data to predict forest growth and production with artificial neural networks (ANNs), which offer more flexibility than fixed model forms. With this module you can estimate volume, number of stems, basal area, and other variables of interest.

Class Parameters

ANN Trainer

AnnTrainer(df, y, *train_columns, iterator=None)

Parameters	Description
df	The dataframe containing the processed continuous forest inventory data.
y	Name of the column holding the target variable (Y) that the ANN will be trained to predict.
*train_columns	(`*args`) Names of the columns used to train the artificial neural network to predict the values of Y. Must be numeric.
iterator	(Optional) Name of the column that contains the `iterator`. A separate artificial neural network is fitted for each `iterator`.
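
To illustrate what fitting one network per `iterator` can look like, here is a minimal sketch built on pandas `groupby` and scikit-learn's `MLPRegressor`. It is not the internal implementation of `AnnTrainer`: the column `Material` and all data values are invented for the illustration, and the small network is chosen only to keep the run short.

```python
import pandas as pd
from sklearn.neural_network import MLPRegressor

# Made-up data: "Material" plays the role of a hypothetical iterator column.
df = pd.DataFrame({
    "Material": ["A", "A", "A", "A", "B", "B", "B", "B"],
    "Idade":    [3, 4, 5, 6, 3, 4, 5, 6],
    "G_m2_ha":  [4.2, 8.0, 14.9, 15.1, 5.0, 9.1, 13.2, 16.0],
    "V_m3_ha":  [6.0, 24.5, 67.1, 77.2, 8.3, 30.2, 58.4, 85.0],
})
feature_columns = ["Idade", "G_m2_ha"]

# Fit one independent network per distinct value of the iterator column.
models = {}
for key, group in df.groupby("Material"):
    model = MLPRegressor(hidden_layer_sizes=(5,), solver="lbfgs",
                         max_iter=500, random_state=0)
    model.fit(group[feature_columns], group["V_m3_ha"])
    models[key] = model

# Each group is then predicted with its own model.
predictions = {key: models[key].predict(group[feature_columns])
               for key, group in df.groupby("Material")}
```

With `AnnTrainer`, the equivalent step would presumably be passing `iterator="Material"` as a keyword argument, as the signature above suggests.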

Class Methods

methods and parameters

  AnnTrainer.fit_model(save_dir=None)#(1)!

save_dir = Directory where the .pkl file of the trained ANN will be saved.

Methods	Description
.fit_model()	Fits the model using `*train_columns` to predict the variable Y.

Ann structures

Six different artificial neural network structures are tested, but only one model is returned: the one selected by the ranking function.
For the 'ann' model, the class sklearn.neural_network.MLPRegressor is used.

---
title: ANN Parameters
---
classDiagram
    direction LR
    class MLPRegressor {
        Epochs: 3000
        Activation: logistic
        Solver Mode: lbfgs
        Batch size: dynamic
        Learning rate init: 0.1
        Learning rate mode: adaptive
    }
    class Model-0 {
        Hidden layer sizes: (15, 25, 20, 30, 10)
    }
    class Model-1 {
        Hidden layer sizes: (35, 10, 25, 35, 15)
    }
    class Model-2 {
        Hidden layer sizes: (25, 15, 30, 20)
    }
    class Model-3 {
        Hidden layer sizes: (15, 35, 45)
    }
    class Model-4 {
        Hidden layer sizes: (35, 10, 25, 35, 15)
    }
    class Model-5 {
        Hidden layer sizes: (35, 10, 25, 35, 15, 20, 15, 30)
    }
    MLPRegressor <|-- Model-0
    MLPRegressor <|-- Model-1
    MLPRegressor <|-- Model-2
    MLPRegressor <|-- Model-3
    MLPRegressor <|-- Model-4
    MLPRegressor <|-- Model-5
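
The settings listed in the diagram can be written out with scikit-learn roughly as follows. This is a sketch under stated assumptions, not the module's own code: "Epochs" is mapped to `max_iter`, the data are synthetic, and because the ranking function is not described here, R² on the training data is used as a stand-in selection criterion.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# Hidden layer sizes as listed in the diagram.
hidden_layers = {
    "Model-0": (15, 25, 20, 30, 10),
    "Model-1": (35, 10, 25, 35, 15),
    "Model-2": (25, 15, 30, 20),
    "Model-3": (15, 35, 45),
    "Model-4": (35, 10, 25, 35, 15),
    "Model-5": (35, 10, 25, 35, 15, 20, 15, 30),
}


def build_candidates(random_state=0):
    """Create one unfitted MLPRegressor per structure with the shared settings."""
    return {
        name: MLPRegressor(
            hidden_layer_sizes=sizes,
            activation="logistic",
            solver="lbfgs",
            max_iter=3000,             # assumed equivalent of "Epochs: 3000"
            learning_rate_init=0.1,    # scikit-learn uses this only with 'sgd' or 'adam'
            learning_rate="adaptive",  # scikit-learn uses this only with 'sgd'
            random_state=random_state,
        )
        for name, sizes in hidden_layers.items()
    }


# Synthetic data, only to make the sketch runnable.
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(60, 2))
y = 3 * X[:, 0] + 2 * X[:, 1]

candidates = build_candidates()
scores = {}
for name, model in candidates.items():
    model.fit(X, y)
    scores[name] = model.score(X, y)  # R² on the training data (stand-in criterion)

best_name = max(scores, key=scores.get)
```

Only the estimator stored under `best_name` would be kept, mirroring the "one model returned" behaviour described above.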

ANN Predictor

AnnPredictor(pkl_file)

Parameters	Description
pkl_file	Path to the `.pkl` file that will be used for prediction.

Class Methods

methods and parameters

  AnnPredictor.predict(df, *args)#(1)!

Returns the predicted values of Y for the rows of df, using the columns named in *args. These must be the same columns that were passed as *train_columns during training.
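
Because the prediction columns have to match the training columns, it can help to validate them before calling `predict`. The helper below is hypothetical (it is not part of fptools) and assumes that column order matters as well as column names.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype


def check_predict_columns(df, train_columns, predict_columns):
    """Raise ValueError if the prediction columns cannot match the training setup."""
    # Assumption: the columns must appear in the same order used for training.
    if list(predict_columns) != list(train_columns):
        raise ValueError("Prediction columns differ from the training columns.")
    missing = [c for c in predict_columns if c not in df.columns]
    if missing:
        raise ValueError(f"Columns missing from the dataframe: {missing}")
    non_numeric = [c for c in predict_columns if not is_numeric_dtype(df[c])]
    if non_numeric:
        raise ValueError(f"Non-numeric columns: {non_numeric}")


df = pd.DataFrame({"Idade": [3, 4], "N_ha": [933, 933], "S": [15.4, 15.4]})
train_columns = ["Idade", "N_ha", "S"]

# Passes silently when names, order, and dtypes are all consistent.
check_predict_columns(df, train_columns, ["Idade", "N_ha", "S"])
```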

Example Usage

The main advantage of using artificial neural networks to estimate forest variables is that a large number of variables can be included in the prediction. Currently, however, the neural networks in this module work best with continuous variables. To use categorical variables, it is recommended to first convert them into a numerical format with a transformation such as one-hot encoding.
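
One-hot encoding can be done with `pandas.get_dummies` before the data reach the trainer. In this minimal sketch the column `Clone` and the rows are made up for illustration; they are not part of the example dataset.

```python
import pandas as pd

# Made-up rows; "Clone" is a hypothetical categorical column.
dados = pd.DataFrame({
    "Idade": [3, 4, 5],
    "Clone": ["C1", "C2", "C1"],
    "V_m3_ha": [6.0, 24.5, 67.1],
})

# Replace "Clone" with one 0/1 indicator column per category.
dados_encoded = pd.get_dummies(dados, columns=["Clone"], dtype=int)

# Numeric columns that could then be passed as *train_columns.
train_columns = [c for c in dados_encoded.columns if c != "V_m3_ha"]
```

The indicator columns (`Clone_C1`, `Clone_C2`) are numeric, so they satisfy the requirement that training columns be numeric.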

As an example, we will use an adaptation of the data obtained by Arce and Dobner Jr. (2024) for Eucalyptus dunnii. The dataset consists of 81 permanent plots, with ages ranging from 3 to 9 years, measured continuously over time.

Download the file.

First 5 rows of the file:

Chave_Parcela	Area_m2	Idade	N_ha	d médio	h médio	H dom	G_m2_ha	V_m3_ha	S
14401109002_P1	300	3	933	7.47	6.8	7.4	4.2	6.0	15.4
14401109002_P1	300	4	933	10.3	10.1	10.8	8.0	24.5	15.4
14401109002_P1	300	5	933	14.0	12.8	13.7	14.9	67.1	15.4
14401109002_P1	300	6	933	14.1	14.5	15.4	15.1	77.2	15.4
14401109002_P1	300	7	867	16.6	16.1	16.5	19.4	112.8	15.4

In this case, we will use the columns "Idade", "N_ha", "d médio", "h médio", "H dom", "G_m2_ha", and "S" to predict the value of "V_m3_ha".

exemplo_previsao_rna.py
from fptools.forecast import AnnTrainer, AnnPredictor#(1)!

import pandas as pd#(2)!

from sklearn.model_selection import train_test_split#(3)!

Import the AnnTrainer and AnnPredictor classes from the forecast module.
Import pandas for data manipulation.
Import train_test_split to split the data into training and validation sets.

exemplo_previsao_rna.py
path = r"Your/directory/to/dados_ann.xlsx"#(1)!
dados = pd.read_excel(path)#(2)!

dados_treino, dados_validacao = train_test_split(dados,
                                                test_size=0.2,
                                                random_state=42)#(3)!

train_columns = ["Idade",
                 "N_ha",
                 "d médio",
                 "h médio",
                 "H dom",
                 "G_m2_ha",
                 "S"]#(4)!

ann = AnnTrainer(dados_treino, "V_m3_ha", *train_columns)#(5)!        

metrics = ann.fit_model(r"Your/directory/to/save")#(6)!  

Define the directory where the data in xlsx format is located, saving it in the variable path.
Load the data and save it in the variable dados.
Save 80% of the data as training data in the variable dados_treino and 20% as validation data in the variable dados_validacao, using a random seed of 42.
Create a variable called train_columns containing the list of column names to be used for training.
Instantiate the class AnnTrainer, saving it in the variable ann, passing the training data, the column V_m3_ha as the target variable, and the list of training columns.
Fit the neural network model, saving the metrics in the variable metrics and the generated .pkl file in the defined directory.

After that, the trained model will be ready for use. We can test the performance of our model by using it to predict the validation data saved in dados_validacao.

exemplo_previsao_rna.py
from fptools.utils import get_metrics #(1)!

predictor = AnnPredictor(
    r"Your/directory/to/save/V_m3_ha_ann_predictor.pkl"
)#(2)!

dados_validacao['V_m3_ha_predicted'] = predictor.predict(
    dados_validacao,
    *train_columns
)#(3)!

mae, mape, mse, rmse, r_squared, exp_var, m_error = get_metrics(
    dados_validacao['V_m3_ha'],
    dados_validacao['V_m3_ha_predicted']
)#(4)!
metrics_val = pd.DataFrame({
    'MAE': [mae],
    'MAPE': [mape],
    'MSE': [mse],
    'RMSE': [rmse],
    'R squared': [r_squared],
    'Explained Var': [exp_var],
    'Mean Error': [m_error]
})#(5)!

Import the function get_metrics from the utils module for later metric calculation.
Instantiate the class AnnPredictor, saving it in the variable predictor, and passing the .pkl file generated during the neural network training.
Create the column V_m3_ha_predicted in the dados_validacao DataFrame, containing the trained ANN's predictions computed from the training columns of dados_validacao.
Use the get_metrics function to obtain the metrics between the actual values V_m3_ha and the predicted values V_m3_ha_predicted from the dados_validacao DataFrame. The returned metrics are: MAE (Mean Absolute Error), MAPE (Mean Absolute Percentage Error), MSE (Mean Squared Error), RMSE (Root Mean Squared Error), R² (Coefficient of Determination), EXP_VAR (Explained Variance Score), and ME (Mean Error).
Create a DataFrame with the validation metrics.
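
For reference, the metrics named above follow standard definitions, which can be written out in plain Python as below. This is a sketch of those textbook formulas, not the source of `get_metrics`: the function name `regression_metrics` is hypothetical, the sign convention of the mean error (predicted minus observed) is an assumption, and the predicted values are invented.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return common regression metrics for two equal-length sequences."""
    y_true = [float(v) for v in y_true]
    y_pred = [float(v) for v in y_pred]
    n = len(y_true)
    # Assumed sign convention: error = predicted - observed.
    errors = [p - t for t, p in zip(y_true, y_pred)]

    mae = sum(abs(e) for e in errors) / n
    # MAPE in percent; requires non-zero observed values.
    mape = 100 * sum(abs(e / t) for e, t in zip(errors, y_true)) / n
    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)
    mean_error = sum(errors) / n

    # R² = 1 - SS_res / SS_tot
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r_squared = 1 - sum(e * e for e in errors) / ss_tot

    # Explained variance = 1 - Var(errors) / Var(observed)
    var_err = sum((e - mean_error) ** 2 for e in errors) / n
    exp_var = 1 - var_err / (ss_tot / n)

    return {"MAE": mae, "MAPE": mape, "MSE": mse, "RMSE": rmse,
            "R squared": r_squared, "Explained Var": exp_var,
            "Mean Error": mean_error}


# Observed volumes from the first rows of the example file; predictions are made up.
observed = [6.0, 24.5, 67.1, 77.2, 112.8]
predicted = [7.0, 23.5, 68.1, 76.2, 113.8]
result = regression_metrics(observed, predicted)
```

R² and explained variance differ only when the mean error is non-zero, which is why reporting the mean error alongside them is useful for spotting bias.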

Outputs

Tables

metrics(1)

Table with the performance metrics of the ANN during training.

Iterator	Model	MSE	RMSE	MAE	MAPE	R²	Explained Variance	Max Error
Not used	V_m3_ha_ann_predictor	32.29	5.68	3.05	4.23	0.99	0.99	0.14

metrics_val(1)

Table with the performance metrics of the ANN on validation.

MAE	MAPE	MSE	RMSE	R²	Explained Variance	Mean Error
2.99	3.28	26.13	5.11	0.99	0.99	0.12

Files

V_m3_ha_ann_predictor.pkl(1)

.pkl file containing the trained ANN parameters.

Download the file.

References

ARCE, JULIO EDUARDO; DOBNER JR., MARIO. (2024). Manejo e planejamento de florestas plantadas: com ênfase nos gêneros Pinus e Eucalyptus. Curitiba, PR: Ed. dos Autores, 419p.