CNC machines are essential in industries such as aerospace, automotive, and defense, where tool reliability is critical. Predicting tool wear is challenging due to the complex, time-dependent nature of sensor data. These time-series signals often involve nonlinear relationships and dynamic behaviors.
To address this, we propose an efficient deep learning framework combining Long Short-Term Memory (LSTM) networks and Autoencoders (AEs) for tool wear prediction in high-speed CNC milling cutters. The model begins with multi-domain feature extraction and correlation analysis, incorporating features like entropy and interquartile range (IQR), which show strong relevance to tool wear.
Trained on the PHM10 run-to-failure dataset, the LSTM-AE model predicts tool wear values, which are then used to estimate Remaining Useful Life (RUL). The model generally underestimates RUL (favorable for preventive maintenance) and achieves up to 98% prediction accuracy, with improved MAE and RMSE of (2.6 ± 0.3222)×10⁻³ mm and (3.1 ± 0.6146)×10⁻³ mm, respectively.
Hamdy K. Elminir, Mohamed A. El-Brawany, Dina Adel Ibrahim, Hatem M. Elattar, E.A. Ramadan
Introduction
The goal of Tool Condition Monitoring (TCM) can be divided into three main categories: fault detection, identification of fault types, and estimation of the system’s Remaining Useful Life (RUL), which is commonly referred to as prognostics. Prognostics is an emerging area of research focused on predicting failures before they occur, rather than merely detecting them.
Broadly defined, prognostics refers to any approach that forecasts future conditions. It primarily addresses the health assessment of a system by using sensor data to estimate RUL. This involves several steps, including identifying failure indicators, estimating the current system state, and constructing a health index [1,2].
The key objective of prognostics is RUL prediction, that is, determining how long a machine can operate safely before experiencing failure [3]. In this context, tool wear serves as a critical health index that supports accurate RUL estimation.
Data-driven prognostics do not require prior knowledge of system physics but instead rely on Run-To-Failure (RTF) data reflecting system performance. Artificial Intelligence (AI) has gained significant popularity in predictive maintenance, supporting applications such as diagnosis, fault classification, and Remaining Useful Life (RUL) prediction [4].
Classical AI approaches in this domain often involve Machine Learning (ML) methods like Support Vector Machines (SVM) and Random Forests (RF), as well as Deep Learning (DL) techniques.
SVM and RF have been extensively used in research for tool wear forecasting and cutter RUL prediction. For example, a study using RF demonstrated an effective tool wear prediction method in milling operations [5], later comparing it to earlier ML algorithms [6].
XGBoost, an ML algorithm based on gradient boosting, has also been applied for RUL estimation in lithium-ion batteries (LIB), with its hyperparameters finely tuned to improve performance [7].
XGBoost further proved effective in estimating the state of charge in LIBs, outperforming traditional regression models in terms of Mean Squared Error (MSE) and Root Mean Squared Error (RMSE) [8]. Additional ML techniques have been employed for assessing battery health [9].
SVM has also been used in combination with Artificial Neural Networks (ANN) and XGBoost for predictive maintenance (PdM) of active chilled beam air conditioning systems [10]. In the era of Industry 5.0, ML methods have been extensively adopted for both PdM and Condition Monitoring (CM) [11].
The performance of RF has been shown to surpass that of feed-forward backpropagation (FFBP) models. Furthermore, SVM has been widely implemented in tool condition monitoring studies [12].
Artificial Neural Networks (ANNs) are another widely used approach in TCM. Techniques such as SVM, RF, and Multi-Layer Perceptron (MLP) have been applied to predict and classify tool wear in additively manufactured 316 stainless steel components [13].
As a cornerstone of modern AI research, ANN continues to attract attention for its robust capabilities in prediction, failure analysis, and diagnostics.
For instance, Sindhu, Tabassum Naz, et al. proposed an enhanced ANN model for disease analysis [14], while Çolak, Andaç Batur, et al. studied how ANN combined with Maximum Likelihood Estimation (MLE) could predict the reliability of electrical components [15].
ANN was also utilized to model breast carcinoma, demonstrating its effectiveness in predicting various parameters, including patient survival [16]. Shafiq, Anum, et al. conducted a comparative study involving ANN and MLE on COVID-19 datasets [17]. In another study [18], the Rayleigh distribution was applied to develop a multi-layer ANN optimized with Bayesian techniques for reliability parameter estimation.
ANN has also been employed to model fluid flow dynamics and predict controllable parameters under specific conditions, including Ree–Eyring fluid [19] and nanofluids [20,21]. In summary, ANN has consistently demonstrated its superiority in the reliability analysis of lifetime models [22,23].
Deep Learning (DL) techniques are widely employed in the failure analysis and prognostics of industrial systems [24]. The rise of DL-based models in the prognostics domain has been largely driven by advancements in sensor technology and the proliferation of Big Data [25].
Among various DL architectures, Convolutional Neural Networks (CNNs) and Long Short-Term Memory networks (LSTMs) have been extensively applied in Tool Condition Monitoring (TCM).
CNNs have proven effective in various prognostic applications, including robotic fuse fault diagnosis [26]. They have also been applied to tool wear prediction in numerous studies involving milling data.
For instance, in [27], a deep learning model was developed that leveraged features extracted from force and vibration sensors. Building on this, the researchers introduced a novel model called the Reshaped Time Series Convolutional Neural Network (RTSCNN) [28], which used raw sensor data with CNN as a feature extractor.
The model incorporated a dense layer with a Rectified Linear Unit (ReLU) activation function followed by a regression layer for predicting tool wear. However, the authors noted that the performance showed no significant improvement over their earlier work. In other studies, CNN has been integrated with LSTM in hybrid models for tool wear prediction [29].
In [30], Convolutional Bi-directional Long Short-Term Memory (CBLSTM) networks were designed to process raw sensor inputs. Here, CNN is first used to extract local features, which are then fed into bi-directional LSTMs.
These are followed by stacked fully connected layers and a linear regression layer to predict the target values. This approach was validated through real-world tool wear experiments using raw sensor data. Furthermore, CNNs combined with LSTMs have been used to construct a health index (HI) and perform RUL estimation based on the C-MAPSS dataset [31].
Long Short-Term Memory (LSTM), a specialized form of Recurrent Neural Network (RNN), is widely recognized for its effectiveness in handling sensor and time-series data. Its capabilities have been extensively leveraged in various prognostic applications. LSTM has recently been applied to domains such as Lithium-ion battery (LIB) health prediction and tool wear estimation.
For instance, in [32], LSTM was combined with Gaussian Process Regression (GPR) to build a degradation model based on health indicators (HIs) extracted from LIB experimental data. This model demonstrated strong performance in predicting battery pack health, particularly in terms of Mean Absolute Error (MAE) and Root Mean Square Error (RMSE).
In [33], a three-step model integrating XGBoost, a stacked bidirectional LSTM, and Bayesian optimization was developed to predict battery capacity degradation. Similarly, Wang, Jiujian, et al. applied LSTM for solving the Prognostics and Health Management (PHM) challenge in the PHM08 competition [34].
A transfer learning-based bidirectional LSTM model was introduced in [35] for Remaining Useful Life (RUL) prediction of rolling bearings under varying operational conditions. LSTM has also been used in broader prediction contexts, such as road traffic flow forecasting using Gated Recurrent Units (GRU) and wavelet transforms [36], and in modeling the energy consumption of water treatment facilities [37].
LSTM has proven especially effective in tool wear prediction, often combined with advanced preprocessing techniques. For example, in [38], it was used alongside singular spectrum analysis for feature extraction and Principal Component Analysis (PCA) for dimensionality reduction in a predictive model for tool wear.
To enhance predictive accuracy, LSTM is frequently integrated with Convolutional Neural Networks (CNNs), leveraging the strengths of both architectures [29]. In [39], a TCM model utilized CNN and bidirectional LSTM (BiLSTM), with tool wear values fed into a ResNetD-based framework for final prediction.
Another model in [40] combined CNN and BiLSTM with an attention mechanism to identify the most relevant features, achieving high prediction accuracy across multiple datasets. In [41], a 1D-CNN with residual structure was used for feature extraction in a TCM model, followed by BiLSTM for final prediction.
Autoencoders (AEs) also play a significant role in prognostics due to their ability to capture long-term dependencies and extract representative features. Their strengths include data denoising and dynamic feature learning, making them well-suited for fault diagnosis in industrial systems.
For example, [42] proposed a model for diagnosing faults in rotating machinery using AEs for feature learning and the artificial fish swarm algorithm for optimization. In [43], a denoising AE was improved to enhance fault diagnosis of rolling bearings, while [44] introduced a sparse stacked AE for analyzing faults in solid oxide fuel cell systems, demonstrating its superior performance.
AEs are also valued for their capacity to accurately reconstruct input signals, making them suitable for sequence prediction and sensor-driven tasks like those in the PHM10 challenge.
Based on this, we propose a hybrid model combining LSTM and AE for tool wear prediction, contributing to more precise RUL estimation in CNC milling operations.
Remaining Useful Life (RUL) estimation based on Run-To-Failure (RTF) data is a major focus in prognostics. Our model approaches RUL estimation by leveraging predicted tool wear values and the critical wear threshold.
The accuracy of this prediction method plays a vital role in ensuring reliable RUL estimation, and our approach has demonstrated excellent performance in predicting the RUL of CNC milling machine cutters.
Sensor data used in this context typically consist of multivariate time-series, which often exhibit dynamic behavior with auto-correlation and varying transient correlations between variables. These characteristics pose significant challenges when working with large-scale datasets like PHM10.
Autoencoders (AEs), with their ability to perform dimensionality reduction and extract dynamic features, are well-suited for such high-dimensional sensor data [45]. Meanwhile, Long Short-Term Memory (LSTM) networks have consistently shown strong capabilities in modeling and predicting time-series data.
To address these challenges, we developed a composite LSTM-AE model that integrates the strengths of both architectures for effective prediction of large-scale sensor data. This model was evaluated using the PHM10 dataset, where it delivered accurate results in both tool wear prediction and RUL estimation.
Furthermore, the model is highly adaptable and can be generalized to other datasets involving raw sensor data paired with a target variable or extensive feature sets requiring reduction, making it broadly applicable across various prognostic scenarios.
Based on the reviewed literature, the key contributions of this research are as follows:
This study introduces a comprehensive framework for Tool Condition Monitoring (TCM) of CNC machine cutters, focusing on tool wear prediction and Remaining Useful Life (RUL) estimation. The framework employs the original PHM10 dataset, specifically using cutter data C1, C4, and C6 for model training, testing, and validation.
It incorporates a multi-domain feature extraction approach to identify and select the features most strongly correlated with tool wear. A deep learning-based hybrid model, combining Long Short-Term Memory (LSTM) and Autoencoder (AE) architectures, is developed to map these selected features to the corresponding wear values.
The proposed LSTM-AE model is trained, tested, and validated to predict flank wear using PHM10 data. Based on the predicted wear values and the predefined wear limit, the model estimates RUL with high precision.
The model demonstrated superior performance compared to existing deep learning approaches, achieving a tool wear prediction accuracy of approximately 98%.
This high level of accuracy enables precise RUL estimation and supports predictive maintenance by allowing maintenance to be scheduled just before failure. Importantly, the model tends to slightly underestimate RUL—an intentional and desirable outcome in predictive maintenance, ensuring reliability and safety.
The remainder of the paper is structured as follows: Section 2 provides an overview of the deep learning architectures used in this study, namely LSTM and AE networks.
Section 3 presents the proposed model in detail. Section 4 describes the experimental setup, including the PHM10 dataset and its preprocessing.
Section 5 outlines the model training process and RUL estimation algorithm. Section 6 evaluates the model’s performance using experimental results. Finally, Section 7 concludes the paper with key insights and future directions.
Theoretical methodology
∙ Basic LSTM architecture
Recurrent Neural Networks (RNNs) are designed to retain the state of previous cells, making them well-suited for handling sequence-based or time-dependent datasets. During training, the hidden state is updated by combining the state from the preceding time step with the current input, passed through an activation function.
While RNNs are capable of learning both long-term and short-term dependencies in sequential data, they face challenges such as gradient vanishing and exploding problems.
To overcome these limitations, advanced variants like Gated Recurrent Units (GRU) and Long Short-Term Memory (LSTM) networks were introduced.
LSTM networks are specifically designed to capture long-term dependencies through a memory mechanism regulated by a system of gates that control information flow.
An LSTM network consists of a chain of LSTM cells, as illustrated in Fig. 1. Unlike conventional RNNs, LSTM incorporates four network layers (depicted as rectangles in Fig. 1) and uses three types of control gates—forget gate, input gate, and output gate—that work together to manage the memory.
These three gates function as follows:
(1) Forget gate ft – determines which parts of the previous cell state should be discarded.
(2) Input gate it – determines which parts of the current input should be used to update the cell state.
(3) Output gate ot – selects which information from the cell state will be passed to the output.

In LSTM, the hidden state $h_t$ is updated at each time step $t$ according to the following equations, Eq. (1) [46]:

$$
\begin{aligned}
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) \\
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) \\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) \\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}
\tag{1}
$$

Here $x_t$ is the input at the current time step, $h_{t-1}$ is the hidden state of the preceding instant, $i_t$ the input gate, $f_t$ the forget gate, $o_t$ the output gate, and $c_t$ the memory cell. The input weights $W \in \mathbb{R}^{d \times k}$, recurrent weights $U \in \mathbb{R}^{d \times d}$, and biases $b \in \mathbb{R}^{d}$ are all learned during network training and are common to all time steps. The last equation gives the hidden layer function $\mathbb{H}$.
The symbol 𝜎(⋅) denotes the logistic activation function, while ⊙ represents the element-wise (Hadamard) product. The parameter 𝑘 is a hyperparameter that defines the dimensionality of the hidden vector.
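As a concrete illustration of Eq. (1), the following NumPy sketch performs a single LSTM time step; the gate-wise weights `W`, recurrent weights `U`, and biases `b` are illustrative placeholders rather than the trained parameters of the proposed model.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM time step following Eq. (1).

    W, U, b are dicts keyed by 'i', 'f', 'o', 'c' holding the input
    weights, recurrent weights, and biases of each gate (placeholders).
    """
    i_t = sigmoid(W['i'] @ x_t + U['i'] @ h_prev + b['i'])     # input gate
    f_t = sigmoid(W['f'] @ x_t + U['f'] @ h_prev + b['f'])     # forget gate
    o_t = sigmoid(W['o'] @ x_t + U['o'] @ h_prev + b['o'])     # output gate
    c_cand = np.tanh(W['c'] @ x_t + U['c'] @ h_prev + b['c'])  # candidate cell state
    c_t = f_t * c_prev + i_t * c_cand                          # updated memory cell
    h_t = o_t * np.tanh(c_t)                                   # new hidden state
    return h_t, c_t
```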
∙ Auto-Encoder
An Autoencoder is a type of self-supervised learning model designed to learn a compact representation of input data. It has demonstrated strong performance in tasks such as denoising, fault diagnosis, and feature extraction. As illustrated in Fig. 2, an Autoencoder (AE) consists of two main parts: an encoder and a decoder.
The encoder compresses the input data into a latent feature representation, while the decoder takes this encoded information and reconstructs it to approximate the original input.
The input and output layers of the AE have the same number of neurons to ensure accurate reconstruction. For a given input $x \in \mathbb{R}^m$, the operations of the encoder and decoder can be mathematically described as follows [47]:

$$
h = \sigma(Wx + b), \qquad \hat{x} = \sigma(W'h + b')
$$

Here, $\sigma$ denotes the sigmoid activation function, and $h$ represents the hidden-layer output obtained after the encoding transformation. To minimize the error between the input and the reconstructed output, gradient descent is employed, and the corresponding loss function is defined as

$$
L(x, \hat{x}) = \lVert x - \hat{x} \rVert^{2}
$$

where $\hat{x}$ is the data reconstructed by the decoder.
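To make the encoder–decoder pair concrete, here is a minimal Keras sketch of a dense autoencoder matching the equations above; the layer widths (36 inputs, an 8-unit code) are assumed for illustration only.

```python
from tensorflow.keras import layers, models

m, code_dim = 36, 8  # illustrative sizes, not the paper's settings

inputs = layers.Input(shape=(m,))
h = layers.Dense(code_dim, activation='sigmoid')(inputs)  # encoder: h = sigma(W x + b)
x_hat = layers.Dense(m, activation='sigmoid')(h)          # decoder: x_hat = sigma(W' h + b')
autoencoder = models.Model(inputs, x_hat)
autoencoder.compile(optimizer='adam', loss='mse')         # minimizes ||x - x_hat||^2
# autoencoder.fit(X, X, epochs=50, batch_size=32)         # trained to reproduce its input
```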
Methodology
∙ Research motivation
Time-series data involves more complex interrelationships between features compared to single-variable datasets, making feature engineering a challenging task.
While LSTM can effectively capture the dynamic and nonlinear interactions among parameters, the AE neural network excels in learning deep nonlinear relationships between time-series variables.
The LSTM-AE architecture combines the strengths of both LSTM and AE networks. In this model, input data is encoded into fixed-length vectors and then decoded into the desired sequences.
By integrating these two networks, the model can capture the dynamic properties of the input data, making it particularly suitable for industrial processes, such as those involving datasets like PHM10.
∙ Model construction
An LSTM-AE model is a hybrid structure that combines LSTM and AE components. It consists of two main parts: an encoder and a decoder, as shown in Fig. 3. The LSTM encoder includes multiple layers of concatenated LSTM cells, which capture long-term dependencies between input features.
The output of the encoder is an encoded vector, which a RepeatVector layer repeats once per LSTM timestep. This repeated vector is fed into the decoder LSTM, which is structured with LSTM layers in the reverse order of the encoder, followed by a dense layer wrapped in a TimeDistributed layer to reconstruct the features.
The model’s performance is evaluated based on its ability to reconstruct the input pattern. Once the model reproduces the input data satisfactorily, the decoder (reconstruction) portion can be discarded, and the remaining network can be used solely for prediction tasks.
While the stacked LSTM carries out the prediction itself, the composite model improves wear prediction accuracy because the encoder captures long-term dependencies and the most representative features.
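A minimal Keras sketch of the LSTM-AE structure in Fig. 3 is shown below; the layer widths, window length, and feature count are illustrative assumptions, not the trained configuration reported later.

```python
from tensorflow.keras import layers, models

timesteps, n_features = 50, 36  # assumed window length and feature count

lstm_ae = models.Sequential([
    # Encoder: stacked LSTMs compress the window into a fixed-length vector
    layers.LSTM(64, return_sequences=True, input_shape=(timesteps, n_features)),
    layers.LSTM(32, return_sequences=False),
    # Repeat the encoded vector once per timestep for the decoder
    layers.RepeatVector(timesteps),
    # Decoder: LSTMs in the reverse order of the encoder
    layers.LSTM(32, return_sequences=True),
    layers.LSTM(64, return_sequences=True),
    # Per-timestep dense layer reconstructs the input features
    layers.TimeDistributed(layers.Dense(n_features)),
])
lstm_ae.compile(optimizer='adam', loss='mse')  # reconstruction error drives training
```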

Experimental study
The general framework of the model is illustrated in Fig. 4. It consists of several stages, including feature extraction, selection, model training, and tool wear prediction.
The next section provides a description of the dataset, followed by a discussion of feature engineering, data analysis, and feature selection. Model training and RUL estimation are presented thereafter.

∙ Data description
We used the dataset from the PHM10 data competition, which contains data from high-speed CNC milling machine cutters [48]. The dataset includes files for six cutters with three flutes (C1, C2, C3, C4, C5, and C6). Dynamometer, accelerometer, and acoustic emission (AE) sensors were strategically placed to capture data, as shown in Fig. 5.
Readings from these seven sensor channels were collected for each of the 315 cuts performed by each cutter, at a rate of 50,000 Hz per channel. Each cutter has 315 independent files corresponding to the 315 runs, with each file structured into seven columns: three-dimensional force (fx, fy, fz), three-dimensional vibration (vx, vy, vz), and AE-RMS (V), representing the acoustic emission signal in RMS value.
The operational features of the high-speed CNC milling machine being examined are listed in Table 1. Data from C1, C4, and C6 are used for training, testing, and validation, as their wear files are included in the dataset. Each run file is divided into 50 frames, resulting in over 15,000 records for each cutter.
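As an illustration of this layout, the following pandas sketch loads one run file and splits it into the 50 frames described above; the file path and column names are assumptions based on the description, not the dataset's exact naming convention.

```python
import numpy as np
import pandas as pd

COLS = ['fx', 'fy', 'fz', 'vx', 'vy', 'vz', 'ae_rms']  # seven channels per run

def load_run_frames(path, n_frames=50):
    """Read one run file (hypothetical path) and split it into equal frames."""
    run = pd.read_csv(path, header=None, names=COLS)
    return np.array_split(run, n_frames)  # list of 50 DataFrames

# frames = load_run_frames('c1/c_1_001.csv')  # e.g. the first of C1's 315 runs
```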


∙ Feature engineering
Researchers often install a dynamometer, accelerometer, and microphone on the machine at strategic locations to capture cutting force, vibration, and acoustic emission data. Previous studies have shown that features from both time and frequency domains can accurately assess tool wear status. In [49], features extracted from force and vibration sensor readings in different domains were used to detect virtual tool wear.
A health index (HI) was developed in [50] to monitor the tool’s condition using time-frequency domain features from all sensors, extracted through wavelet packet decomposition. Wu et al. [51] also extracted features from various domains and analyzed the correlation between these features using the Pearson Correlation Coefficient (PCC).
The selected features were then input into an Adaptive Neuro-Fuzzy Inference System (ANFIS) for Remaining Useful Life (RUL) prediction of machining tools. PCC was also utilized in [52] for selecting key features for tool wear regression and RUL estimation.
Through a literature review, it becomes clear that multi-domain feature extraction is a crucial step in feature engineering. In our study, we included additional features, such as the interquartile range (IQR) and entropy, in various domains.
IQR is a valuable measure of data spread, helping to identify outliers and the skewness of the dataset. Another important feature is entropy, which quantifies the average amount of “information” or “uncertainty” associated with the potential outcomes of a variable. Both IQR and entropy have shown a strong correlation with tool wear.
Feature extraction, as part of exploratory data analysis (EDA), is a key step in the PHM cycle. These features act as condition indicators that reflect the machinery’s health.
Features can be combined from different domains to form a health index (HI) that expresses the degradation process. Redundant data can introduce noise, negatively affecting model performance. To address this, features with monotonic, trendable, and predictable behaviors should be selected, as discussed below.
We augmented the five-number summary statistics [53] with the minimum value, mean value, IQR, and entropy of each signal, studying entropy in both the time and frequency domains. The multi-domain features shown in Table 2, including entropy and IQR, were extracted using the TSFEL library [54].
These features were extracted for the seven signals (fx, fy, fz, vx, vy, vz, and AE) from each cutter, and their correlation with tool wear was analyzed to construct the health index for accurate tool wear prediction. Fig. 6 illustrates the extracted features for sensor fx across the entire lifecycle, as an example of the features obtained from RTF.
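A sketch of this extraction step with TSFEL is given below; it assumes TSFEL's documented configuration API, and the per-channel column prefixing is an illustrative choice rather than the paper's exact pipeline.

```python
import pandas as pd
import tsfel

# Feature configuration spanning the statistical, temporal, and spectral domains
cfg = tsfel.get_features_by_domain()

def frame_features(frame, fs=50000):
    """Extract multi-domain features (incl. entropy and IQR) for each of the
    seven channels of one frame; prefixes keep the source channel visible."""
    per_channel = [
        tsfel.time_series_features_extractor(cfg, frame[col], fs=fs, verbose=0)
             .add_prefix(f'{col}_')
        for col in frame.columns
    ]
    return pd.concat(per_channel, axis=1)  # one row of features per frame
```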


∙ Data analysis and feature selection
After feature extraction, data analysis is performed to select features that are strongly correlated with wear values, as these serve as health indicators for predicting the target wear value. The Pearson correlation coefficient measures the strength of the relationship between two parameters, indicating how one parameter is affected by changes in the other.
For example, a correlation value of −0.1 indicates a weak negative relationship between variables X and Y, whereas a correlation coefficient of −0.9 would suggest a strong negative relationship.
Principal Component Analysis (PCA) is a technique used to reduce the dimensionality of the dataset, helping to condense the feature space while still effectively representing the data. Many researchers have applied PCA for dataset reduction to express the target variable [55,56].
In [57], it was found that about 24 features could explain approximately 98% of the original dataset’s variance. The data is first standardized using the standard scaler formula expressed in Eq. (2), where xmean is the mean value and σ is the standard deviation.
The correlation between features from different signals and the target variable is evaluated using Pearson Correlation Coefficient (PCC) to identify the most relevant features for wear prediction. As shown in Fig. 7, the heatmap illustrates the correlation between wear and various features, helping to select those with the highest correlation.
It is evident that the additional features proposed in this framework, such as entropy and IQR, are significantly correlated with the wear value.
This analysis was also performed for other sensors, such as fy and fz, with similar results regarding the selected features. Some features exhibited inter-correlation, which was taken into account when selecting the feature space.
We reduced the feature space to 6 features per signal, keeping the most correlated ones with the wear signal. Since the Acoustic Emission signal showed a low correlation with the target wear value, it was excluded from the feature space used for training the model.

$$
z = \frac{x - x_{\text{mean}}}{\sigma}
\tag{2}
$$

where $x_{\text{mean}}$ is the mean value and $\sigma$ is the standard deviation.
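The standardization and PCC-based selection described above can be sketched as follows; `X` (the extracted feature table) and `y` (the measured flank wear) are assumed inputs, and the six-features-per-signal rule follows the description in this section.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

def select_features(X, y, signals=('fx', 'fy', 'fz', 'vx', 'vy', 'vz'), k=6):
    """Standardize features (Eq. 2), rank them by |PCC| with the wear target,
    and keep the k most wear-correlated features per retained signal.

    X: DataFrame of extracted features, y: Series of measured flank wear.
    The AE channel is omitted from `signals` due to its low correlation.
    """
    X_std = pd.DataFrame(StandardScaler().fit_transform(X), columns=X.columns)
    pcc = X_std.corrwith(y).abs().sort_values(ascending=False)
    keep = []
    for s in signals:
        keep += list(pcc[pcc.index.str.startswith(s)].head(k).index)
    return X_std[keep]
```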
Model training
During the training process, data from C1 and C6 are used for training and validation, while C4 data is used for testing the network. A vanilla LSTM model, implemented with the TensorFlow Keras backend, is trained for tool wear prediction and RUL estimation.
The model is trained for 700 epochs and comprises two LSTM layers, two dropout layers, and one dense layer with a ReLU activation function. Training on the Google Colab T4 GPU backend reduces the time per epoch to about 12 seconds.
The tool wear value is predicted, and based on the wear limit, the maximum number of safe cuts (RUL) can be estimated. The trained network parameters are shown in Table 3, and the model’s architecture is illustrated in Fig. 8.
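The described predictor might be assembled as in the sketch below; the layer widths, dropout rates, batch size, and optimizer are placeholders, while the trained values are those listed in Table 3.

```python
from tensorflow.keras import layers, models

timesteps, n_selected = 50, 36  # assumed window length and selected-feature count

predictor = models.Sequential([
    layers.LSTM(64, return_sequences=True, input_shape=(timesteps, n_selected)),
    layers.Dropout(0.2),                 # assumed dropout rate
    layers.LSTM(32),
    layers.Dropout(0.2),
    layers.Dense(1, activation='relu'),  # non-negative predicted flank wear
])
predictor.compile(optimizer='adam', loss='mse', metrics=['mae'])
# predictor.fit(X_train, y_train, epochs=700, batch_size=64,
#               validation_data=(X_val, y_val))
```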


∙ RUL estimation
RUL (Remaining Useful Life) refers to the number of cuts a cutter tool can safely make before it reaches complete failure. Using run-to-failure (RTF) data, which spans from the beginning to the point of failure, as shown in Fig. 9, the predicted tool wear values can be combined with the RTF data for RUL estimation, using the algorithm outlined in Table 4.
During the study, it was observed that the first and last few runs contained noisy measurements, so these were excluded, and the wear limit was set to run #300. The PHM Society provided the dataset used in this study as part of the PHM10 competition.


The leaderboard for this competition is available in [58]. After predicting tool wear, the RTF data can be used to estimate the RUL, which represents the number of cuts the tool can safely make before failure, as illustrated in Fig. 10.
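The thresholding logic behind this estimate can be sketched as follows; this is a minimal illustration of the idea in Table 4, with the wear limit and cut indices as assumed inputs, not the verbatim competition algorithm.

```python
import numpy as np

def estimate_rul(pred_wear, wear_limit, current_cut):
    """RUL = cuts remaining until the predicted wear first crosses the limit.

    pred_wear: predicted wear per cut over the RTF history (array-like),
    wear_limit: critical flank-wear threshold (reached near run #300 here).
    """
    over = np.nonzero(np.asarray(pred_wear) >= wear_limit)[0]
    failure_cut = over[0] if over.size else len(pred_wear) - 1
    return max(int(failure_cut) - current_cut, 0)
```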
The scoring function used to evaluate the proposed model, which was also employed in the PHM10 competition, is given by Eq. (3). This scoring formula was introduced in [59]. The score achieved by our RUL estimation algorithm was approximately 40, while the winning score in the PHM10 competition was 5500.

$$
\text{Score} = \sum_{i=1}^{m} s_i, \qquad
s_i =
\begin{cases}
e^{-d_i/a_1} - 1, & d_i < 0 \\
e^{\,d_i/a_2} - 1, & d_i \ge 0
\end{cases}
\tag{3}
$$

Here, $m$ represents the total number of data points, and $d_i = \mathrm{RUL}'_i - \mathrm{RUL}_i$ denotes the difference between the predicted and actual RUL for each cycle. The constants $a_1 > a_2$ make late predictions ($d_i \ge 0$) costlier than early ones; in the form introduced in [59], $a_1 = 13$ and $a_2 = 10$.
The lower the value of this score function, the better the prediction algorithm’s performance. The accuracy of the RUL estimation reaches approximately 99%, with a low score value comparable to the leaderboard results of this competition.
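A direct implementation of Eq. (3) might look like this; the default constants follow the commonly cited form of [59] and should be checked against the exact competition definition.

```python
import numpy as np

def phm_score(rul_pred, rul_true, a1=13.0, a2=10.0):
    """Asymmetric exponential score of Eq. (3): late predictions (d >= 0)
    are penalized more heavily than early ones; lower scores are better."""
    d = np.asarray(rul_pred, dtype=float) - np.asarray(rul_true, dtype=float)
    return float(np.sum(np.where(d < 0, np.exp(-d / a1), np.exp(d / a2)) - 1.0))
```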
Experimental results
∙ Performance evaluation criteria
Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE) are the performance metrics used to evaluate the overall efficiency of the proposed LSTM-AE tool wear prediction model. The mathematical formulas for these metrics are provided in Table 5.
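Since Table 5 lists the formulas, here is a short NumPy version of each metric for reference; `y_true` and `y_pred` are the measured and predicted wear arrays.

```python
import numpy as np

def mae(y_true, y_pred):
    """Mean Absolute Error: average magnitude of the prediction error."""
    return float(np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred))))

def rmse(y_true, y_pred):
    """Root Mean Squared Error: emphasizes larger deviations."""
    return float(np.sqrt(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2)))
```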
Fig. 11 illustrates the predicted and actual tool wear values for C4, which served as the test set. After validation on C4, the model can be used to forecast the wear curves, and hence estimate the RUL, of other tools or cutters. The prediction error decreases as the predicted wear values approach the actual values, demonstrating that the LSTM-AE model performs effectively under the adopted error criteria.


∙ Method comparison
Fig. 12 displays the MAE and RMSE values for cutter wear across various methods. The LSTM-AE-based approach is compared with other methods to highlight its effectiveness and advancement.
Conventional intelligence techniques, such as SVR and SVR+KPCA [49], were specifically used for forecasting tool wear. In addition, deep learning methods, including RNN, LSTM, CBLSTM [30], and CNN [60], were also considered for performance comparison. The results of these comparisons are presented in Table 6.
As shown in Table 6, the SVR algorithm performs poorly when addressing large-scale nonlinear regression problems. Although deep learning techniques like RNN, LSTM, CBLSTM, and CNN can handle nonlinear regression without reducing the feature space, their prediction accuracy still falls short of the proposed framework.
Fig. 12. Method Comparison between our LSTM-AE and earlier methods.
| Method | MAE (10⁻³ mm) | RMSE (10⁻³ mm) |
|---|---|---|
| SVR+PCA [49] | 3.9583 ± 0.9371 | 5.4428 ± 1.5894 |
| RNN [30] | 12.1667 ± 6.2292 | 15.7333 ± 6.2164 |
| SVR [49] | 9.3770 ± 2.0422 | 11.9680 ± 3.3337 |
| LSTM [30] | 10.7333 ± 3.8734 | 13.7333 ± 4.5742 |
| CBLSTM [30] | 7.2333 ± 1.0263 | 9.2333 ± 1.9140 |
| CNN [60] | 11.0000 ± 1.3000 | 14.0428 ± 5.5588 |
| Proposed (LSTM-AE) | 2.6000 ± 0.3222 | 3.1000 ± 0.6146 |

Table 6. Result comparison.
∙ Result discussion
Fig. 13 presents a comparison between the predicted and actual tool wear values for various cutter cuts throughout their life cycle. The figure highlights the discrepancies between these two values in the shaded areas. Some sections show underestimation, while others show overestimation.
Underestimation occurs when the predicted values are lower than the actual values, resulting in a better performance of the score function. Overestimation is seen when the predicted values exceed the actual tool wear curve. Referring back to Fig. 11, the overall predicted curve tends to underestimate the expected values across the entire curve.
This indicates that the RUL predicted by the proposed framework is generally lower than the actual value, which can lead to maintenance or component replacement being scheduled slightly before failure occurs.

Fig. 14 illustrates the score function [59]. A notable feature of this function is that, in the case of underestimation, the penalty grows slowly and remains nearly constant, so underestimation yields a lower error and better performance. This characteristic is evident in the results for the proposed model, as shown in Fig. 15.


∙ Model generalization
The model can be applied to any dataset containing raw sensor data, a feature set that requires reduction, or any feature map intended for high-accuracy prediction of a target variable.
If there are specific operating conditions, these can be treated as part of the feature map, or the values of these conditions can be analyzed across different cases to add new columns or features to the dataset.
However, when dealing with very large datasets, the model may encounter computational complexities and redundant information, which could degrade its performance.
This issue can be addressed by incorporating a weighting mechanism for extremely long datasets. This mechanism assigns different weights based on the relevance of the information.
Conclusion
TCM plays a crucial role in industrial processes by focusing on prevention rather than failure detection, ultimately reducing replacement costs and saving lives. Time-series data often involve multiple variables with complex interconnections and dynamic features, which makes traditional TCM particularly challenging, especially when dealing with large sensor datasets. However, deep learning networks such as LSTM and AE show great promise in handling the complexity of these datasets.
As a result, a hybrid deep learning model, LSTM-AE, was developed using an encoder LSTM and an LSTM decoder for tool wear prediction. This model was evaluated for predicting tool wear in CNC milling machine cutters, using the PHM10 original dataset with data from C1, C4, and C6 cutters for training, testing, and validation.
The overall framework consists of several steps: 1) extracting features from different domains of the raw sensor data, 2) performing Pearson correlation coefficient (PCC) analysis to create a feature map that is then input to the LSTM model, and 3) feeding the selected features into the model to predict the target wear.
The study emphasized extracting key features from cutter data, including new ones like IQR and entropy. These features demonstrated a strong correlation with tool wear using PCC, which helped improve model performance.
The proposed framework outperforms previous methods in terms of tool wear prediction accuracy, achieving 98%. It also shows a significant improvement in RMSE and MAE for the test set compared to earlier techniques. By utilizing both the wear threshold and the tool’s degradation curve, the expected wear value is used to estimate RUL, ensuring that RUL values are mostly underestimated, preventing machine failure before it occurs.
The model can be adapted to any dataset with sensor data or feature maps that require reduction, or any feature map aimed at predicting the target variable with high accuracy. However, the model may face computational complexities and redundant data when dealing with very long datasets, which can reduce its performance.
This issue can be addressed by introducing a weighting mechanism for long datasets, assigning different weights based on the value of the information. This approach could open new avenues for research and applications with other datasets. Future research may also focus on developing a multi-objective optimization algorithm to best select the model’s hyperparameters.