The sensitivity of endoscopy in diagnosing chronic atrophic gastritis is only 42%, and multipoint biopsy, despite being more accurate, is not always available.
Aims
This study aimed to construct a convolutional neural network to improve the diagnostic rate of chronic atrophic gastritis.
Methods
We collected 5470 images of the gastric antrums of 1699 patients and labeled them with their pathological findings. Of these, 3042 images depicted atrophic gastritis and 2428 did not. We designed and trained a convolutional neural network-chronic atrophic gastritis model to diagnose atrophic gastritis accurately, verified by five-fold cross-validation. Moreover, the diagnoses of the deep learning model were compared with those of three experts.
Results
The diagnostic accuracy, sensitivity, and specificity of the convolutional neural network-chronic atrophic gastritis model in diagnosing atrophic gastritis were 0.942, 0.945, and 0.940, respectively, which were higher than those of the experts. The detection rates of mild, moderate, and severe atrophic gastritis were 93%, 95%, and 99%, respectively.
Conclusion
Chronic atrophic gastritis could be diagnosed by gastroscopic images using the convolutional neural network-chronic atrophic gastritis model. This may greatly reduce the burden on endoscopy physicians, simplify diagnostic routines, and reduce costs for doctors and patients.
]. Treatment aimed at eradicating Helicobacter pylori can significantly improve atrophy in atrophic gastritis, thus reducing the risk of gastric cancer in patients with chronic atrophic gastritis (CAG) [
In clinical endoscopy, whether a patient undergoes endoscopic biopsy depends on the experience of the endoscopist: if the endoscopist suspects that the gastric mucosa is atrophic, a biopsy is performed to obtain a definitive diagnosis; otherwise, no biopsy is taken. However, the sensitivity of endoscopy in diagnosing atrophic gastritis is only 42% [
], which leads to a high rate of missed diagnosis of CAG. Furthermore, the accuracy of pathological biopsy largely depends on the experience of the doctor in choosing the location and depth of the mucosa for biopsy. The New Sydney Classification of Chronic Gastritis guidelines require at least five biopsies for gastroscopy [
]. Multipoint biopsy increases gastric trauma and risk of bleeding, and it cannot be performed if the patient is taking medications like aspirin. Furthermore, biopsies are costly and time-consuming. Recently, both chromoendoscopy combined with magnifying endoscopy and confocal laser microscopy have been important tools for the diagnosis and differential diagnosis of chronic gastritis [
]. However, these advanced endoscopy techniques only provide images of the gastrointestinal mucosal surface. Therefore, diagnostic accuracy still depends on the standardized operation of experienced endoscopists and their accumulated exposure to a large volume of pathological mucosal images [
With advances in computing power and access to big data, the convolutional neural network (CNN) has emerged as a promising tool for medical imaging. Compared with data mining, machine learning focuses more on the design of algorithms [
], enabling computers to “learn” rules from data and use existing records to extrapolate to unknown data. The CNN, with its excellent performance in image recognition, is currently a hot topic in the field of machine learning. Deep learning has achieved remarkable results in the identification and diagnosis of various ophthalmic diseases [
Machine learning, represented by CNNs, has demonstrated outstanding performance in the fields of image recognition, image segmentation, and image understanding. It has developed swiftly in the medical and health fields, wherein various medical imaging data can be used to train models. Generally, a CNN is composed of convolutional layers, pooling layers, and fully connected layers. The convolutional layer extracts different types of feature information by sliding convolution kernels over the image. Each convolution kernel is essentially a signal filter used to extract a specific type of feature. Through the combination of multiple convolutional layers and nonlinear activation functions, complex features (combinations of multi-view features) and descriptors containing sufficient information are obtained. The common pooling operations are max pooling and average pooling, which represent a region by its maximum or mean value, respectively, so that the dimension is reduced to facilitate downstream computation. Research shows that max pooling is better for capturing texture information, whereas average pooling is better for capturing background information [
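The two pooling operations described above can be illustrated with a small PyTorch example (a toy tensor, not from the study's pipeline): max pooling keeps the strongest response in each region, while average pooling keeps the regional mean.

```python
import torch
import torch.nn.functional as F

# A single-channel 4x4 "image" with one strong activation (9) and a uniform patch.
x = torch.tensor([[[[0., 0., 1., 9.],
                    [0., 0., 1., 1.],
                    [2., 2., 0., 0.],
                    [2., 2., 0., 0.]]]])

max_pooled = F.max_pool2d(x, kernel_size=2)  # keeps the peak of each 2x2 region
avg_pooled = F.avg_pool2d(x, kernel_size=2)  # keeps the mean of each 2x2 region

print(max_pooled.squeeze().tolist())  # [[0.0, 9.0], [2.0, 0.0]]
print(avg_pooled.squeeze().tolist())  # [[0.0, 3.0], [2.0, 0.0]]
```

Note how the isolated peak (9) survives max pooling but is averaged down to 3 by average pooling, which is why max pooling is favored for texture cues.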
Recently, researchers have designed many new network structures and modules, such as VGG, Inception, residual structures, and skip connections. The architecture of the CNN evolves by adjusting the convolution kernel size and the connection mode between convolutional layers, making CNNs not only deeper but also wider. Moreover, the deeper the network, the stronger its ability to extract features. Many related techniques have been proposed concurrently to improve CNN performance, including dropout, data augmentation, batch normalization, label smoothing, and various types of activation functions.
Skip connection is proven to be an important structure that greatly improves the training process of a deep learning model [
]. By connecting shallow structures to deeper structures with skip-layer connections, gradient vanishing during error backpropagation can be avoided; consequently, more abstract and abundant information is extracted. Feature reuse reduces the loss of key information during forward propagation, which improves the recognition of micro-targets and micro-features. Skip connections mainly include the residual structures in ResNet [
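A minimal sketch of a residual skip connection in PyTorch (illustrative, not the study's architecture): the identity shortcut `x` is added back to the convolutional output, giving gradients a direct path around the convolutional layers during backpropagation.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Minimal residual block: output = F(x) + x (the skip connection)."""
    def __init__(self, channels: int):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        # The identity shortcut lets gradients bypass the two conv layers.
        return self.relu(out + x)

block = ResidualBlock(8)
y = block(torch.randn(2, 8, 16, 16))
print(y.shape)  # torch.Size([2, 8, 16, 16])
```

Because the shortcut is an identity mapping, the block's input and output shapes match, so such blocks can be stacked arbitrarily deep.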
Gastroscopy is necessary for medical professionals to detect patients’ stomach diseases. In long-term medical practice, hospitals accumulate a large number of gastroscopic images that provide abundant data for the computer-aided diagnosis of CAG, giving physicians a chance to study gastroscopic images via deep learning. Researchers have begun to use deep learning algorithms to detect diseases such as gastric cancer, gastric polyps, and erosion. Billah et al. [
] used capsule endoscopy to record endoscopic videos in the gastrointestinal tract and then used CNN and color wavelet features to recognize gastrointestinal polyps; the accuracy rate was 98.65%, which was better than that of previous research methods based on color, location, and textural features such as color histograms and local binary patterns [
] used a deep learning model to diagnose colorectal polyps and adenomas from endoscopic video frames and achieved high recognition accuracy. Urban et al. [
] used a residual network combined with transfer learning to discern gastric cancer, ulcers, and normal gastric images with 90% accuracy. Compared with the morphological characteristics of polyps and tumors, those of mucosal atrophy are subtle and difficult to observe, making computer-aided diagnosis of atrophic gastritis challenging. Pathological images of atrophic gastritis were classified and diagnosed using deep learning technology [
]. In 2005, Lahner et al. used clinical and biological variables together with artificial neural networks and linear discriminant analysis to build a decision support system to diagnose atrophic gastritis without endoscopy. This system achieved a high accuracy rate of 100% [
]. However, CAG is closely related to gene expression and dietary habits; in particular, the characteristics of CAG differ considerably between East Asia and Europe [
], and thus, this method has not been widely used in East Asia. Patients with atrophic gastritis have no specific symptoms, often not even obvious complaints, and in East Asia, gastroscopy and pathological biopsy remain the main methods for diagnosing gastric diseases. However, a gap remains in the computer-aided screening of CAG using gastroscopy. Therefore, it is important to continue developing deep learning methods to assist in the screening of CAG with gastroscopy. This study aimed to construct a CNN, the CNN-CAG model, to improve the diagnostic rate of CAG. We expect the CNN-CAG model to drastically reduce the burden on endoscopists, simplify diagnostic routines, and reduce costs for both doctors and patients.
2. Materials and methods
2.1 Datasets and preprocessing
The study was approved by the Ethics Committee of Shanxi Provincial People’s Hospital, Taiyuan, Shanxi; informed consent was obtained from each patient included in the study. From April 2018 to April 2019, two endoscopists with more than 5 years’ experience in gastroscopy reviewed gastroscopic images from the Gastroscopy Image Database of Shanxi People’s Hospital and selected high-quality images for visual observation. We collected 5470 images of the gastric antrums of 1699 patients and labeled them according to their pathological findings. A total of 3042 images depicted atrophic gastritis, including 1458 images of mild atrophic gastritis, 1348 images of moderate atrophic gastritis, 38 images of severe atrophic gastritis, and 198 images of gastric antrums with an unmarked degree of atrophy. Information about the gastroscopic images with different extents of atrophy is shown in Table 1. An additional 2428 images did not show atrophic gastritis. The negative samples used in the study included images depicting chronic inflammation of the gastric mucosa, chronic non-atrophic gastritis, and active inflammation of the gastric mucosa.
Table 1Image characteristics of atrophic and non-atrophic gastritis in the training and testing datasets.
In total, 70% and 30% of the gastritis images were randomly assigned to the training set and testing set, respectively. Five-fold cross-validation was performed on the training set to ensure the reliability of the model, and the testing set was then used to evaluate the effectiveness of the CNN-CAG. The proportions of images with atrophic gastritis of different severities were consistent between the training and testing sets.
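The splitting scheme above can be sketched as follows (an illustrative reconstruction with scikit-learn, not the authors' code): a stratified 70/30 split of the 5470 labeled images, followed by five-fold cross-validation on the training portion.

```python
import numpy as np
from sklearn.model_selection import train_test_split, StratifiedKFold

labels = np.array([1] * 3042 + [0] * 2428)  # 1 = atrophic, 0 = non-atrophic
image_ids = np.arange(len(labels))          # stand-ins for the 5470 images

# Stratified 70/30 split keeps class proportions consistent across sets.
train_ids, test_ids, y_train, y_test = train_test_split(
    image_ids, labels, test_size=0.30, stratify=labels, random_state=0)

# Five-fold cross-validation on the training set.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
fold_sizes = []
for fold, (tr, va) in enumerate(cv.split(train_ids, y_train)):
    fold_sizes.append(len(va))  # train on indices tr, validate on indices va

print(len(train_ids), len(test_ids))  # 3829 1641
```

Stratification at both stages mirrors the paper's requirement that the proportions of atrophy severities stay consistent between the training and testing sets.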
In this study, three different models of gastroscopes were used: Pentax EPK-i5000, Pentax EPK-i7000, and Pentax EPK-i (Pentax, Tokyo, Japan). The gastric mucosa was observed using the white-light i-Scan mode. The images obtained by different gastroscopes had different formats; therefore, all images were converted to an uncompressed BMP format. These images had high resolution, and each image was larger than 3 MB.
As atrophic gastritis primarily occurs in the antrum, the images used in this study included the entire gastric antrum. The training set excluded images of poor quality; the exclusion criteria included the presence of a staining artifact created by mucus, poor focus, insufficient contrast, motion-blurring, and gastric cancer. We selected pictures capturing different visual angles of the gastric mucosa in the same position to reduce selection bias. The clinical diagnoses used for labeling were based on the degree of mucosal atrophy according to the Sydney System [
]. The histologic reports for these gastric mucosae were also collected. If the gastroscopic diagnosis of gastritis and the histologic report were inconsistent, the final diagnosis was based on the pathological results. Five gastrointestinal pathologists (three attending staff members and two senior fellows) conducted histologic assessments to determine the type of lesion.
The INPAINT_TELEA algorithm was used to remove information, including age, sex, time, and system, from the upper left, upper right, and lower left regions of the gastroscopic images. This process is shown in Fig. 1. The aim of this step was not only to remove any sensitive patient information but also to avoid any interference of the white watermarking in the analysis of the atrophic gastritis images. Image sizes were unified (512 × 512) using bilinear interpolation. All gastroscopic images were uncompressed, and all information from the images was preserved.
Fig. 1INPAINT_TELEA algorithm for removing information watermarks. (a) Watermarking information, such as age, sex, time, and system, exists in the upper-left, upper-right, and lower-left regions of the gastroscopic image. Some of the watermarking information is located in the black background of the image, whereas the rest appears directly in the target area. (b) The watermarking mask covers the upper-left, lower-left, and upper-right parts of the original image; the watermarking in these areas is consistent across gastroscopic images. (c) The region covered by the watermarking mask has been repaired, but some watermarking remains in the upper-right corner of the repaired image. This is because the upper-right watermark encodes the patient’s personal information and varies with the time of the visit; therefore, its repair boundary cannot be determined with a single unified watermarking mask.
To make full use of the available computing resources, this study used different computing devices. All image preprocessing algorithms were run on a standard computer with a 64-bit Windows 7 operating system and the Python environment provided by Anaconda 2.5.0. The scikit-image library was used for watermark removal (the fast-marching inpainting algorithm) and for the image-extension and flip operations. All deep learning experiments for model training were performed on high-speed workstations with four Titan X GPUs. The workstations ran a 64-bit Ubuntu system with PyTorch 0.4, which provided a framework for quickly implementing deep learning algorithms. These computing resources and toolkits met the needs of the experiment.
The deep learning models applied in the experiment were provided by the PyTorch framework, which offers various deep learning algorithms and pre-trained models. Each pre-trained model is a set of parameters trained on the ImageNet dataset for better results [
]. Low-level semantic features, such as edge shape and color distribution, are unchanged across different classification tasks, and combinations of these low-level features play a decisive role in object classification. Therefore, we could use transfer learning to fine-tune the model parameters, avoiding learning from scratch and speeding up the training process [
DenseNet was used to identify CAG lesions in gastric antrum images. The model architecture is shown in Fig. 2. The DenseNet model makes significant improvements over the classical CNN through its basic DenseBlock module: each convolutional layer reuses the feature maps passed from all upstream convolutional layers. With a deeper network and a more complex model structure, its ability to abstract image features is considerably stronger.
Our specific parameter settings were as follows: the first convolutional layer had 64 convolution kernels, and the growth rate of each Dense Block was 32. The batch size was set to 128, and 2000 iterations were performed. The initial learning rate was 0.1, and the learning-rate decay coefficient was 0.9. The loss function was cross-entropy, and the optimization method was stochastic gradient descent.
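These settings can be sketched in PyTorch as follows (a stand-in linear model replaces the CNN-CAG network, and the exact decay schedule, per epoch versus per iteration, is an assumption since the paper does not specify it).

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 2)  # stand-in for the CNN-CAG network
criterion = nn.CrossEntropyLoss()                        # cross-entropy loss
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)  # SGD, initial lr 0.1
# Multiply the learning rate by the decay coefficient 0.9 at each scheduler step.
scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.9)

# One dummy update followed by one decay step: lr goes from 0.1 to 0.1 * 0.9.
loss = criterion(model(torch.randn(4, 10)), torch.tensor([0, 1, 0, 1]))
loss.backward()
optimizer.step()
scheduler.step()
print(optimizer.param_groups[0]["lr"])  # ~0.09
```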
Physicians with different qualifications, including three experts and two novices, were asked to classify the data of the same testing set, and their results were compared with those of the CNN-CAG model. We calculated the accuracy, sensitivity, and specificity in identifying CAG, which were used to plot the ROC curve, P–R curve, and histogram in Fig. 4. The ROC and P–R curves are comprehensive indicators for specificity and sensitivity [
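For reference, the three reported metrics follow directly from the confusion matrix; a small illustrative computation (toy predictions, not the study's data):

```python
import numpy as np
from sklearn.metrics import confusion_matrix

y_true = np.array([1, 1, 1, 1, 0, 0, 0, 0])  # 1 = atrophic, 0 = non-atrophic
y_pred = np.array([1, 1, 1, 0, 0, 0, 0, 1])

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
accuracy = (tp + tn) / (tp + tn + fp + fn)
sensitivity = tp / (tp + fn)   # true-positive rate (recall)
specificity = tn / (tn + fp)   # true-negative rate

print(accuracy, sensitivity, specificity)  # 0.75 0.75 0.75
```

The ROC curve plots sensitivity against 1 − specificity across classification thresholds, while the P–R curve plots precision against recall, which is why both summarize the sensitivity–specificity trade-off.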
In the model selection process, we used many well-known deep learning models for the initial training. The effect of different network architectures on identification of CAG is shown in Fig. 3.
Fig. 3Training process of different models at different epochs.
The accuracy of the different networks in the testing set varied greatly. In the training process, the model adopted a random cropping strategy for the original images; in the testing process, it adopted a central cropping strategy. Our research showed that the DenseNet network achieved the highest performance in the recognition of CAG. Therefore, considering the accuracy and complexity of the algorithm, the DenseNet-121 network architecture was selected to train the CNN-CAG model.
Fig. 4 shows that the CNN-CAG model had strong classification performance for the recognition of CAG based on gastric antrum images; the areas under the P–R curve and the ROC curve approached 0.99. The diagnostic accuracy, sensitivity, and specificity of the CNN model for atrophic gastritis were 0.9424, 0.9458, and 0.9401, respectively, which exceeded those of the human experts. The detection rates of mild, moderate, and severe atrophic gastritis were 93%, 95%, and 99%, respectively.
Fig. 4Receiver operating characteristic (ROC) curve, precision–recall (P–R) curve, and accuracy, sensitivity, and specificity of the different diagnostic methods.
By comparing these heat maps with the image areas used by the doctors for diagnosis, we were able to verify whether the deep learning model had actually learned information regarding atrophic gastritis (Fig. 5). The heat maps generated not only completed the image classifications but also directly located the relative positions of the lesions.
Fig. 5Interpretable heat maps in the automatic diagnosis of chronic atrophic gastritis. (a) Original images. The yellow boxes indicate the areas randomly cropped by the DenseNet model from the original image. (b) Cropped areas from the original images. The red boxes are the areas of focus labeled by the doctor. (c) Heat maps generated with class activation mapping. The orange-red regions of the heat maps are consistent with the atrophic mucosa labeled by the doctors according to the pathological results.
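Class activation mapping itself reduces to weighting the final convolutional feature maps by the classifier weights of the target class; a toy sketch with random tensors and illustrative dimensions:

```python
import torch

# Toy dimensions: 64 feature maps of size 16x16 from the last conv layer,
# and a two-class fully connected head (random values, for illustration only).
features = torch.rand(64, 16, 16)   # C x H x W
fc_weights = torch.rand(2, 64)      # num_classes x C
class_idx = 1                       # e.g., the "atrophic" class

# CAM = class-weighted sum of the feature maps, then min-max normalized
# to [0, 1] so it can be rendered as a heat map over the input image.
cam = torch.einsum("c,chw->hw", fc_weights[class_idx], features)
cam = (cam - cam.min()) / (cam.max() - cam.min())
print(cam.shape)  # torch.Size([16, 16])
```

The resulting low-resolution map is upsampled to the input size and overlaid on the gastroscopic image, which is how the lesion locations in Fig. 5(c) are obtained.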
]; therefore, we classified atrophic gastritis into mild, moderate, and severe. In our experiment, 1458 cases of mild, 1348 cases of moderate, and 38 cases of severe atrophic gastritis were used to train the CNN model. The accuracy of our CNN system in diagnosing mild, moderate, and severe atrophic gastritis was 0.93, 0.95, and 0.99, respectively, indicating that the detection rates for moderate and severe atrophic gastritis were higher than that for mild atrophic gastritis. The model proposed in this paper was therefore deemed accurate and effective. This result may be attributable to the fact that changes in glands and blood-vessel texture are more evident in moderate and severe atrophic gastritis than in mild atrophic gastritis, and the atrophic area is larger. Additionally, atrophy is regional, and some atrophic areas were small and unclear owing to the capturing angle. Using multiple images from a single patient may yield more accurate diagnostic results through integrated analysis.
In future studies, we will adopt this method of target detection to train the CNN to learn endoscopic pictures of atrophic gastritis, reduce interference in unrelated areas, and improve the diagnostic rate of mild atrophic gastritis. This will enable early intervention to slow or even reverse the progression of atrophic gastritis in patients.
Our study had several limitations. First, atrophic gastritis primarily occurs in the gastric antrum; however, a small percentage occurs in the cardia, stomach angle, and other locations. Our research mainly focused on the diagnosis and recognition of atrophic gastritis of the gastric antrum. Future studies should focus on whole gastric images for comprehensive diagnosis of atrophic lesions in the entire gastric mucosa. Second, dynamic videos can simulate the process of gastroscopy more realistically. Using video data to further train the CNN-CAG model could improve its clinical applicability. Third, the training set contained high-quality, clear gastroscopic images; actual gastroscopic images may not be as clear owing to the instability of the handheld endoscope, and such blurred images may hinder the recognition of gastric mucosal lesions. Thus, we should gradually increase the number of low-quality images in the training datasets and test the accuracy of the CNN-based diagnosis. Fourth, the etiology of atrophic gastritis includes H. pylori infection; in our experiment, we did not classify gastric mucosal features caused by different etiologies of atrophic gastritis. Fifth, this study used datasets from a single health center; multicenter data validation should be conducted to improve the accuracy of the CNN-based diagnosis. Sixth, as this was a retrospective study, prospective studies are needed to verify the diagnostic ability of the CNN. Finally, focus localization in the current research is achieved by visualization of the model results. In future studies, we need to use sliding-window and target-detection algorithms to detect lesions in CAG to achieve better classification.
In this study, we achieved a high accuracy rate in CAG screening, higher than that previously reported. Furthermore, we visualized the analytical methods used by the model and compared them to those of diagnosing physicians. These results showed that lesions detected by the model were consistent with those detected by doctors, verifying the accuracy and validity of the model trained in this study. In conclusion, the diagnosis of CAG can be confirmed by gastroscopic images using a CNN. We recommend that CNN be widely used in clinical practice to help endoscopists diagnose atrophic gastritis. This should be confirmed by future research and validated in a large-scale study.
Conflict of interest
None declared.
Funding
This study was funded by the Post-doctoral Research Projects of Shanxi Provincial Department of Human Society (2017-92).
Acknowledgments
The authors thank the endoscopists at the Shanxi Provincial People’s Hospital of Gastroenterology who helped perform the gastroscopy. We are also grateful to the engineers at Baidu Online Network Technology (Beijing) Corporation and School of Computer Science and Technology, Xidian University, who helped develop and test the convolutional neural networks. We would like to thank Editage (www.editage.cn) for English language editing.
References
Bray F., Ferlay J., Soerjomataram I., Siegel R.L., Torre L.A., Jemal A. Global cancer statistics 2018: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries.
Real-time differentiation of adenomatous and hyperplastic diminutive colorectal polyps during analysis of unaltered videos of standard colonoscopy using a deep learning model.
Possible contribution of artificial neural networks and linear discriminant analysis in recognition of patients with suspected atrophic body gastritis.