CT Image Detection of Pulmonary Tuberculosis Based on the Improved Strategy YOLOv5

The diagnosis of pulmonary tuberculosis (PTB) is a complicated process with a long wait. According to the WS 288-2017 standard, PTB can be divided into five types based on imaging. To date, no relevant studies on PTB CT images based on the YOLOv5 algorithm have been retrieved. To develop an improved strategy YOLOv5 for the classification of PTB lesions based on whole CT slices, YOLOv5 was combined with three other modules. CT slices of PTB collected from hospitals were divided into training, validation, and external test sets. The model was compared with the YOLOv5, SSD, and RetinaNet neural network methods. The values of precision, recall, MAP, and F1-score of the improved strategy YOLOv5 for the external test were 0.707, 0.716, 0.715, and 0.71, respectively. In this study, based on the same dataset, the improved strategy YOLOv5 model achieved better results than the other networks. Our method provides an effective means for the timely detection of PTB.


INTRODUCTION
Tuberculosis (TB) is an infectious disease caused by Mycobacterium tuberculosis and is one of the leading causes of human health issues and death (Elsayed et al., 2021). As defined in the imaging classification of the pulmonary tuberculosis (PTB) industry diagnostic standard of the People's Republic of China, "WS 288-2017", PTB is the most common type of tuberculosis and occurs mostly in adults. The lesions tend to be located in the apicoposterior segment of the superior lobe and the dorsal segment of the lower lobe. Etiological examination first requires qualified test specimens, and the sensitivity of sputum smears is low (Steingart et al., 2006; Zheng, R. et al., 2022). Mycobacterium culture is the reference standard (Yan et al., 2017), but it takes approximately eight weeks and requires effective laboratory quality control of specimen contamination rates. Histopathological examination is invasive: tissue samples are obtained through percutaneous lung puncture or bronchoscopic biopsy for pathological diagnosis (Badr et al., 2022).
Chest CT imaging provides an additional diagnostic modality that is often used in clinical practice. The lesions of PTB are very complex and diverse (Iliyasu et al., 2018; Wang et al., 2022), and CT signs of active PTB include cavitation, pulmonary nodules, tree-in-bud signs, and consolidation (patchy proliferative lesions) (Wetscherek et al., 2022). These signs have clinical diagnostic value. Rapid and accurate diagnosis of TB nevertheless remains a challenge for clinicians. All these diagnostic tests are evaluated by professional radiologists, who must expend considerable time and effort to make accurate diagnostic decisions in daily work. This approach may not be suitable for real-time screening, and there is high variability between and within observers (Nel et al., 2022; Owais et al., 2020).
There are many methods for classifying diseases and segmenting images. Some scholars have studied a multi-feature fusion model for image classification using denoising convolutional neural networks and attention mechanisms (Zhang et al., 2023). Breast cancer has been classified with an improved whale optimization algorithm and compared with other methods (Devi et al., 2023). In an interactive medical image segmentation framework, convolutional neural networks were combined with optimized swarm intelligence to identify the required regions optimally (Kaushal et al., 2022). A segmentation framework based on swarm intelligence and the grasshopper optimization algorithm (GOA) has also been used to carry out feature extraction successfully; its drawback is that only three lesion images were used for training (Thapar et al., 2022).
Object detection is the foundation of artificial intelligence. It was first proposed by Wax in 1955 (Wax et al., 1955). In recent years, deep learning algorithms have also been applied to object detection. YOLO (You Only Look Once) is an object recognition and localization algorithm based on a deep neural network (Shelatkar et al., 2022). Its most important features are its fast running speed and small size, which make it suitable for real-time monitoring systems (Baccouche et al., 2022; Luo et al., 2021). Its core idea is to take the whole image as the input and extract features through the network. The local features of CT images are analyzed by the classifier, and the position and category of the regressed bounding box of the detected target are produced by the output layer. YOLOv5 is a single-stage object recognition algorithm (Huang et al., 2022). It adopts the best optimization strategies developed in the field of convolutional neural networks (CNNs) in recent years. YOLOv5 can run on most ordinary computers in hospitals. Its greatest advantage is the quality of its detection results.
In this study, the authors develop an automatic detection network based on the YOLOv5 algorithm with NAM (Wang et al., 2023), the SCYLLA-IoU (SIoU) loss function (Zheng, J. et al., 2022), and data augmentation, which the authors call the improved strategy YOLOv5, to quickly identify and classify PTB lesions in CT images. The following sections describe the methodology, present the results of each algorithm in detail, discuss the observed performance, and provide a conclusion.

METHODOLOGY
Study Participants
This is a multicenter, retrospective study. The clinical information of each patient, including age, gender, and laboratory indexes (etiological or pathological findings were used as the reference standard for diagnosis), was retrieved from internal hospital records. A total of 3,015 CT slices of 131 patients from hospitals A and B were used as the training set and validation set (8:2) to construct the YOLOv5 PTB lesion classification model. A total of 825 CT slices from 83 patients from hospital C were used as an external test set to examine the model's efficacy in differentiating four types of PTB lesions. The inclusion criteria were (a) a diagnosis in line with WS 288-2017 and based on etiological or pathological criteria, (b) a single CT scan without repetition, and (c) CT imaging characteristics consistent with PTB. The exclusion criteria were (a) an uncertain diagnosis of PTB, (b) other lung diseases (including other infectious diseases, tumors, or interstitial lesions), (c) previous surgical pulmonary resection, (d) respiratory motion or metal artifacts on chest CT, (e) HIV and other immunodeficiency infections, and (f) absence of all four types of lesions. The complete flowcharts of the data collection are shown in Figure 1.

CT Image Acquisition Technique
All patients in this study were scanned using spiral CT scanners (Optima GE CT 680, America; GE Revolution CT 128, America; GE Bright Speed Elite 16 CT, America; Lightspeed VCT; GE Discovery CT 64; and SOMATOM CT). Patients assumed a supine position and were instructed to inhale maximally and hold their breath to ensure the accuracy of the data. Scanning was performed from the lung apex to the posterior costophrenic angle. The scanning parameters were as follows: tube voltage 120 kV, automatic tube current modulation, rotation time 0.5 s/r, a fixed field of view (FOV) adjusted for each patient, and lung algorithm reconstruction (slice thickness of 5 mm/1.25 mm/0.625 mm).

Image Qualitative Analysis and Dataset Annotation
The presence or absence of the above four types of lesions in CT images was recorded by two radiologists in a blinded manner. A statistician compiled the results and entered them into the form. Evaluations on which the two radiologists disagreed were submitted to the chief physician for final evaluation and recording. Unnecessary interference images were screened out and removed to standardize the quality of the lesion images. A unified standard for professional labeling of PTB lesions was developed by the expert group on TB imaging diagnosis. Label numbers for inclusion in the dataset were recorded blindly by two radiologists. All images were exported to the format needed to train the neural network, with the original CT slice size unchanged. Labeling software was used. The statistician then counted and confirmed label consistency and recorded the lesions with inconsistent labels. A review expert with the title of deputy director reviewed these lesions and determined the final label. All data were uniformly numbered and entered into the database for unified management. One radiologist independently labeled the data, and another radiologist independently reviewed and revised the labels. Finally, all manually labeled images were collected and reviewed by a chief physician.
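The export step above can be illustrated with a minimal sketch. It assumes the labels were written in the plain-text format that YOLOv5 training commonly uses (class id followed by box centre, width, and height, all normalized by the image size). The class ids, the 512 x 512 slice size, and the function name `to_yolo_label` are hypothetical; the source does not state them.

```python
# Hypothetical class ids: the source names the four lesion types but not their ids.
LESION_CLASSES = {"consolidation": 0, "tree-in-bud": 1, "cavitation": 2, "nodule": 3}


def to_yolo_label(lesion, box, image_width, image_height):
    """Convert a pixel box (x_min, y_min, x_max, y_max) to one YOLO label line."""
    x_min, y_min, x_max, y_max = box
    # Box centre and size, normalized to the range 0..1 by the image size.
    x_center = (x_min + x_max) / 2 / image_width
    y_center = (y_min + y_max) / 2 / image_height
    width = (x_max - x_min) / image_width
    height = (y_max - y_min) / image_height
    class_id = LESION_CLASSES[lesion]
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"


# Assumed 512 x 512 slice with one cavitation box drawn by the radiologist.
line = to_yolo_label("cavitation", (128, 192, 256, 320), 512, 512)
print(line)  # 2 0.375000 0.500000 0.250000 0.250000
```

Because the coordinates are normalized, the same label stays valid if a slice is resized during training.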

The Improved Strategy YOLOv5
In this study, a model pretrained on the LUNA16 public dataset (Setio et al., 2017) was first used to obtain the initial weights of the network, which helped the YOLOv5 network train better and improved its overall performance. YOLOv5 was then combined with three other modules (NAM, the SIoU loss function, and data augmentation) to establish the improved strategy YOLOv5 model. NAM is based on the Convolutional Block Attention Module (CBAM) architecture (Woo et al., 2018) and is optimized for channel and spatial attention. The SIoU module further considers the angular relationship between regression boxes, which can effectively reduce the total degrees of freedom of the loss and greatly improve the convergence speed and the effectiveness of the model. The multiscale packet data augmentation module for dataset expansion further prevents network overfitting and improves the robustness of the network.
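To make the role of the angle term concrete, the following is a minimal sketch of an SIoU-style loss for a single pair of boxes, written from the commonly published SIoU formulation (IoU term plus angle-weighted distance cost and shape cost). It is an illustration under stated assumptions, not the authors' implementation: the function name `siou_loss`, the exponent `theta=4.0`, and the `eps` stabilizer are assumptions.

```python
import math


def siou_loss(pred, target, theta=4.0, eps=1e-7):
    """SIoU-style loss for two boxes given as (x_min, y_min, x_max, y_max)."""
    px1, py1, px2, py2 = pred
    tx1, ty1, tx2, ty2 = target
    pw, ph = px2 - px1, py2 - py1
    tw, th = tx2 - tx1, ty2 - ty1

    # Plain intersection-over-union term.
    inter_w = max(0.0, min(px2, tx2) - max(px1, tx1))
    inter_h = max(0.0, min(py2, ty2) - max(py1, ty1))
    inter = inter_w * inter_h
    iou = inter / (pw * ph + tw * th - inter + eps)

    # Smallest enclosing box and offsets between the two box centres.
    cw = max(px2, tx2) - min(px1, tx1) + eps
    ch = max(py2, ty2) - min(py1, ty1) + eps
    dx = (tx1 + tx2) / 2 - (px1 + px2) / 2
    dy = (ty1 + ty2) / 2 - (py1 + py2) / 2

    # Angle cost: 0 when the centres are axis-aligned, 1 at 45 degrees.
    sin_alpha = min(1.0, abs(dy) / (math.hypot(dx, dy) + eps))
    angle_cost = 1 - 2 * math.sin(math.asin(sin_alpha) - math.pi / 4) ** 2

    # Distance cost, weighted by the angle term through gamma = 2 - angle_cost.
    gamma = 2 - angle_cost
    rho_x, rho_y = (dx / cw) ** 2, (dy / ch) ** 2
    distance_cost = (1 - math.exp(-gamma * rho_x)) + (1 - math.exp(-gamma * rho_y))

    # Shape cost: penalizes mismatch in width and height.
    omega_w = abs(pw - tw) / (max(pw, tw) + eps)
    omega_h = abs(ph - th) / (max(ph, th) + eps)
    shape_cost = (1 - math.exp(-omega_w)) ** theta + (1 - math.exp(-omega_h)) ** theta

    return 1 - iou + (distance_cost + shape_cost) / 2


same = siou_loss((0, 0, 10, 10), (0, 0, 10, 10))
shifted = siou_loss((0, 0, 10, 10), (2, 0, 12, 10))
far = siou_loss((0, 0, 10, 10), (20, 20, 30, 30))
print(same, shifted, far)
```

The loss is close to zero for identical boxes, grows as the predicted box drifts from the target, and stays informative (above 1) even when the boxes no longer overlap, which plain IoU cannot provide.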
PTB lesions were manually annotated, and 3,015 CT slices of 131 patients from hospitals A and B were randomly divided into training and validation sets at a ratio of 8:2 to extract and learn the features of the detected lesions. After data augmentation, the training set was expanded to 20,867 images, and the number of lesions per type was kept as balanced as possible. Before the trained model was applied to the validation set, five-fold cross-validation was used to help determine the relevant parameters on the training set while avoiding overfitting. A total of 825 CT slices of 83 patients from hospital C were included as an external test set. The improved strategy YOLOv5 is shown in Figure 2. The complete training, validation, and test process of this study is shown in Figure 3.
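The split described above can be sketched with the standard library alone. The sketch assumes a slice-level random shuffle, as the text describes; the seed, the function name `split_and_fold`, and the round-robin fold assignment are illustrative choices, not details from the source.

```python
import random


def split_and_fold(n_slices, train_fraction=0.8, n_folds=5, seed=0):
    """Randomly split slice indices 8:2, then partition the training part into folds."""
    indices = list(range(n_slices))
    random.Random(seed).shuffle(indices)  # fixed seed so the split is reproducible
    n_train = round(n_slices * train_fraction)
    train, val = indices[:n_train], indices[n_train:]
    # Round-robin assignment gives folds whose sizes differ by at most one slice.
    folds = [train[i::n_folds] for i in range(n_folds)]
    return train, val, folds


train, val, folds = split_and_fold(3015)
print(len(train), len(val))  # 2412 603

# Ratio between the augmented training set reported in the text and the raw training set.
augmentation_ratio = 20867 / len(train)
print(round(augmentation_ratio, 2))  # 8.65
```

In each cross-validation round, one fold is held out and the remaining four are used for fitting, so every training slice is used for checking exactly once.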

Statistical Analysis
Python 3.8.5 was used to draw receiver operating characteristic (ROC) curves, which were used to calculate the areas under the ROC curve (AUCs). The authors individually calculated precision (positive predictive value), recall (sensitivity), mean average precision (MAP), and F1-score.
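For reference, the standard definitions of these metrics are given below. They are the conventional forms and are assumed, not confirmed, to match the ones used in this study. Here TP, FP, and FN are the counts of true positive, false positive, and false negative detections, P_c(R) is the precision-recall curve of lesion class c, and N = 4 is the number of lesion classes; the IoU threshold at which a detection counts as a true positive is not stated in this text.

```latex
\begin{aligned}
\text{Precision} &= \frac{TP}{TP + FP}, &
\text{Recall} &= \frac{TP}{TP + FN}, \\[4pt]
F_1 &= \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}, &
AP_c &= \int_0^1 P_c(R)\, dR, \\[4pt]
\text{MAP} &= \frac{1}{N} \sum_{c=1}^{N} AP_c .
\end{aligned}
```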

RESULTS
With 8:2 random assignment, the training set consisted of 2,412 CT slices and the validation set of 603 CT slices; the external test set contained 825 CT slices. Training set models were obtained after five-fold cross-validation was performed to further optimize the model. The distribution and total number of training, data augmentation, and validation sets for the four lesions are shown in Table 1. The precision, recall, MAP, and F1 score of the improved strategy YOLOv5 on the validation set were 0.886, 0.875, 0.927, and 0.88, respectively, and the specific ROC curves and the confusion matrix of the training set model are shown in Figure 4.
The values of precision, recall, MAP, and F1 score of the improved strategy YOLOv5 external test were 0.707, 0.716, 0.715, and 0.71, respectively, and the specific ROC curves and the confusion matrix of the external test are shown in Figure 5.
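As a quick consistency check, the reported F1 scores can be recomputed from the reported precision and recall, assuming F1 is the usual harmonic mean of the two (the source does not spell out its formula). The helper name `f1_score` is illustrative.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs reported for the improved strategy YOLOv5.
reported = {"validation": (0.886, 0.875), "external test": (0.707, 0.716)}
f1 = {name: round(f1_score(p, r), 2) for name, (p, r) in reported.items()}
print(f1)  # {'validation': 0.88, 'external test': 0.71}
```

Both values agree with the F1 scores stated in the text (0.88 and 0.71).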
To evaluate the performance of their improved strategy YOLOv5 model, the authors compared it with YOLOv5, SSD (Souaidi et al., 2023), and RetinaNet (Cai et al., 2021) trained on the same dataset for the detection of lung lesions. The comparison of the results of internal validation and external testing among YOLOv5, SSD, RetinaNet, and the improved strategy YOLOv5 is shown in Table 2.
The results show that the improved strategy YOLOv5 is superior to the classical networks in terms of performance. An object detection comparison between the results of each neural network and the annotation results of radiologists is shown in Figure 6.
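The size of the advantage can be read off directly from the external test values that this paper reports for each network. The short sketch below computes the absolute gain of the improved strategy YOLOv5 over each baseline; the dictionary layout and the helper name `gain_over` are illustrative.

```python
# External test results as reported in this paper.
external_test = {
    "SSD": {"precision": 0.574, "recall": 0.245, "MAP": 0.343, "F1": 0.34},
    "RetinaNet": {"precision": 0.618, "recall": 0.495, "MAP": 0.508, "F1": 0.55},
    "YOLOv5": {"precision": 0.607, "recall": 0.628, "MAP": 0.608, "F1": 0.61},
    "Improved strategy YOLOv5": {"precision": 0.707, "recall": 0.716, "MAP": 0.715, "F1": 0.71},
}
REFERENCE = "Improved strategy YOLOv5"


def gain_over(baseline, metric="MAP"):
    """Absolute difference between the improved model and a baseline on one metric."""
    return round(external_test[REFERENCE][metric] - external_test[baseline][metric], 3)


map_gains = {name: gain_over(name) for name in external_test if name != REFERENCE}
print(map_gains)  # {'SSD': 0.372, 'RetinaNet': 0.207, 'YOLOv5': 0.107}
```

On MAP, the improved model is about 0.11 above the unmodified YOLOv5 and further ahead of SSD and RetinaNet.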

DISCUSSION
This study explored the classification of CT images of PTB lesions. Four types of PTB lesions (cavitation, tree-in-bud sign, pulmonary nodules, and consolidations) were identified from the literature. The authors then went one step further and established a model for the real-time detection of multiclass PTB lesions based on the improved strategy YOLOv5. Because the original medical dataset was difficult to collect, the sample size was small, and the lesion types were unbalanced, this study proposed a multiscale data augmentation method that significantly increased the number of lesions and further improved the generalization ability and robustness of the neural network, allowing it to handle more complex classification tasks effectively. The attention mechanism and loss function of the YOLOv5 model were optimized to further enhance the classification performance of the network for PTB lesions. The precision, recall, MAP, and F1 score of the improved strategy YOLOv5 on the validation set were 0.886, 0.875, 0.927, and 0.88, respectively. The precision, recall, MAP, and F1-score of the improved strategy YOLOv5 on the external test set were 0.707, 0.716, 0.715, and 0.71, respectively.
According to "WS 288-2017", PTB can be divided into the following five categories based on imaging: (a) primary PTB, (b) hematogenous disseminated PTB, (c) secondary PTB, (d) tracheal and bronchial TB, and (e) tuberculous pleurisy. Secondary PTB is the most common. TB is a serious health problem with a high mortality rate, but it can be completely cured with early diagnosis (Simi Margarat et al., 2022). This study was designed based on CT images of secondary PTB (the most common form of PTB in adults) and included cavitation, tree-in-bud signs, consolidations, and nodules, which is consistent with other studies, especially in active PTB. Therefore, this study focuses on these four types of PTB lesions, and the improved strategy YOLOv5 model based on the other three modules was constructed to explore the intrinsic features of the lesions and quickly classify them, providing another effective auxiliary means of simplifying complex clinical work.
PTB lesions are very heterogeneous in size, and poor detection of small objects is a well-known problem in object detection. It has been reported that the YOLOv5 model has higher accuracy for larger objects, which may be because larger lesions contain more features and are therefore easier to detect (Ku et al., 2022). In the results of this study, the external test results for tree-in-bud signs were relatively low, which indicates that YOLOv5 has shortcomings in recognizing such lesions and that solutions for small objects still need to be sought.
According to previous studies, the use of YOLOv5 in the detection of different lesions has shown promising results. In a fine diagnosis study of brain tumors based on YOLOv5 by Tejas Shelatkar et al., the precision reached 88%, and the results showed that the model could successfully detect brain tumors (Shelatkar et al., 2022). Asma Baccouche et al. used YOLOv5 for the early detection and classification of four types of lesions on mammograms, with the highest rates of 93% ± 0.118 for mass lesions, 88% ± 0.09 for calcification lesions, and 95% ± 0.06 for architectural distortion lesions (Baccouche et al., 2022). Yiji Ku et al. proposed a system with a multiclass detection model based on YOLOv5 to detect multicategory lesions synchronously in real time (Ku et al., 2022). Compared with previous studies, our objective was multiclass classification, and most of the lesions were morphologically irregular, multiple, and widely distributed, which made them relatively difficult to identify. This is the first study to use the improved strategy YOLOv5 for object detection of PTB lesions. The model we developed can provide radiologists with more information about the lesions so that a correct diagnosis can be made in a short time (Ku et al., 2022).
This study has several limitations. First, the number of patients enrolled was small, and the data selection may have been biased. Second, this was a retrospective study. At present, the precise diagnosis of PTB still requires a combination of various means, which is time-consuming. Third, YOLOv5 had some shortcomings in the detection of small lesions; in this study, the detection results for small lesions were slightly unsatisfactory. Therefore, improving the detection algorithm for small lesions could be the focus of future research. In addition, CT images for prospective external testing of suspected pulmonary tuberculosis are lacking.

CONCLUSION
The authors performed improved training on YOLOv5 to optimize the test results and meet the needs of auxiliary clinical diagnosis. To the best of the authors' knowledge, classification studies of PTB lesions based on YOLOv5 are currently very rare. The highlight of this work is that it is based on more complex datasets; uses a model pretrained on the LUNA16 public dataset; combines the NAM, SIoU loss function, and data augmentation modules; and improves the generalization ability of the network. The results indicate that the authors' improved strategy YOLOv5 outperforms current classical networks in terms of detection performance. In other words, the accuracy of the detection and classification of PTB lesions has been improved. Overall, the model demonstrated better performance in assisting doctors in screening PTB lesions from pulmonary CT images.

Figure 1. Complete flowcharts of the dataset collection

Figure 4. The ROC curves of recall, precision, MAP, and F1 score (A, B, C, D) of the improved strategy YOLOv5 validation set. The confusion matrix (E) of the improved strategy YOLOv5 validation set. Nidus 1, 2, 3, and 4 refer to consolidation, tree-in-bud sign, cavitation, and pulmonary nodules, respectively.

Figure 5. The ROC curves of recall, precision, MAP, and F1 score (A, B, C, D) of the improved strategy YOLOv5 external test. The confusion matrix (E) of the external test. Nidus 1, 2, 3, and 4 refer to consolidation, tree-in-bud sign, cavitation, and pulmonary nodules, respectively.

Figure 6. Comparison of object detection results with manual annotation. The recognition result of the improved strategy YOLOv5 is equivalent to that of manual annotation. (A) Radiologists, (B) YOLOv5, (C) the improved strategy YOLOv5.

Table 2. Comparison of the Results of Internal Validation and External Testing Between SSD, RetinaNet, YOLOv5, and the Improved Strategy YOLOv5
External test set:
Model                       Precision   Recall   MAP     F1-score
SSD                         0.574       0.245    0.343   0.34
RetinaNet                   0.618       0.495    0.508   0.55
YOLOv5                      0.607       0.628    0.608   0.61
Improved strategy YOLOv5    0.707       0.716    0.715   0.71
These results showed strong classification advantages of the improved strategy YOLOv5 model.