Applying a CNN-CAD system to determine invasion depth for endoscopic resection Zhu et al
A CNN architecture is essentially a stack of layers, each consisting of a set of neurons. A CNN receives an input (eg, a vectorized image) and gradually transforms it through the stack of layers. Eventually, the CNN outputs an n-element vector, where n equals the number of classification categories, and each element is a continuous value between 0 and 1 indicating the probability of the corresponding classification category.
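The n-element probability vector described above can be illustrated with a minimal sketch. It assumes a softmax output layer, which the text does not specify, and the category names and raw scores are hypothetical placeholders rather than values from the study.

```python
import math

def softmax(logits):
    """Map raw scores to probabilities that each lie in (0, 1) and sum to 1."""
    shift = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(v - shift) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical categories and raw scores (assumptions, not study data).
categories = ["category_A", "category_B"]
logits = [1.2, -0.4]

probabilities = softmax(logits)  # one probability per category (n = 2 here)
predicted = categories[probabilities.index(max(probabilities))]
```

The category with the largest probability is taken as the prediction; any other decision rule (eg, a fixed probability threshold) would be applied to the same vector.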
The quality of input features is generally enriched by the number of layers.25 Thus, leading CNN architectures use a greater number of layers. In 1998 the first modern CNN, LeNet, had 5 layers. In 2012 AlexNet consisted of 8 layers, and VGG had as many as 16 to 19 layers. More layers in a CNN, however, cause notorious issues such as vanishing or exploding gradients, which can devastate a network’s performance.26 ResNet introduced a shortcut mechanism to overcome these issues and expanded the number of layers to 152. Our CNN-CAD system is based on the ResNet50 model.27
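The effect of the shortcut mechanism on vanishing gradients can be sketched with a toy scalar model. This is an illustration under strong simplifying assumptions (each layer is a single multiplication by a hypothetical weight w, with no nonlinearity), not the ResNet50 architecture itself. Because the toy layers are linear, the output for an input of 1.0 equals the gradient of the output with respect to the input.

```python
def forward(x, depth, w, shortcut):
    """Pass x through `depth` toy layers, optionally adding the input back."""
    for _ in range(depth):
        fx = w * x                       # toy layer transformation F(x)
        x = fx + x if shortcut else fx   # shortcut: output is F(x) + x
    return x

# Illustrative values only; not taken from ResNet50.
depth, w = 50, 0.01

# For a linear stack, forward(1.0) is d(output)/d(input).
plain_gradient = forward(1.0, depth, w, shortcut=False)     # w ** depth
residual_gradient = forward(1.0, depth, w, shortcut=True)   # (1 + w) ** depth
```

Without the shortcut the gradient shrinks to roughly 1e-100 after 50 layers, whereas with the shortcut it stays near 1.6, because each layer contributes a factor of 1 + w instead of w.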
Outstanding performance of a CNN requires training on a massive number of images, which imposes a great burden on most researchers. Leveraging a transfer learning method, in which a CNN pretrained on a dataset of millions of images is used to extract features from each image, can reduce this burden and enable accurate classification with far fewer images.
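The transfer learning idea can be sketched as a frozen feature extractor followed by a small trainable classifier. Everything in this sketch is a stand-in: `FROZEN_WEIGHTS` and `extract_features` are hypothetical placeholders for a pretrained CNN, the three-pixel "images" and labels are made up, and the logistic regression head is an assumed choice that the text does not describe.

```python
import math

# Stand-in for a pretrained network: fixed weights that are never updated.
FROZEN_WEIGHTS = [[0.5, -0.2, 0.1], [-0.3, 0.8, 0.4]]

def extract_features(image):
    """Frozen feature extractor: a fixed linear map followed by ReLU."""
    return [max(0.0, sum(w * p for w, p in zip(row, image)))
            for row in FROZEN_WEIGHTS]

def predict(head, features):
    """Trainable head: logistic regression on the extracted features."""
    z = head["bias"] + sum(w * f for w, f in zip(head["weights"], features))
    return 1.0 / (1.0 + math.exp(-z))

def train_head(images, labels, epochs=500, lr=0.5):
    """Fit only the head by gradient descent; the extractor stays frozen."""
    head = {"weights": [0.0, 0.0], "bias": 0.0}
    feats = [extract_features(img) for img in images]  # computed once
    for _ in range(epochs):
        for f, y in zip(feats, labels):
            err = predict(head, f) - y  # gradient of log loss w.r.t. z
            head["weights"] = [w - lr * err * v
                               for w, v in zip(head["weights"], f)]
            head["bias"] -= lr * err
    return head

# Four toy "images" of three pixels each, with made-up binary labels.
images = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 1.0, 0.0], [0.1, 0.9, 0.2]]
labels = [0, 0, 1, 1]

head = train_head(images, labels)
predictions = [round(predict(head, extract_features(img))) for img in images]
```

Only the three parameters of the head are learned, which is why a handful of examples suffices here; the same reasoning is what lets a pretrained extractor reduce the number of labeled images needed.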
The overall accuracy of our CNN-CAD system was 89.1%, which is higher than that previously reported for endoscopic prediction.28 At our center the CNN-CAD system also achieved significantly higher accuracy and specificity than both experienced and junior endoscopists. In addition, the evaluation of 203 test images took only 36 seconds, thus enabling determination of invasion depth immediately after endoscopic examination. Moreover, total screening time does not increase with a larger number of images because of vectorized calculation. Modern graphics processing units and tensor processing units are good at large-scale parallelism and can perform vectorized calculation in the same clock cycle within a range of vector sizes. Furthermore, the screening procedure can be performed online, circumventing the problem of a lack of experienced endoscopists in some parts of the country.
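The reported timing can be turned into a per-image figure, and the batching argument can be made concrete with a short sketch. The image count and total time come from the text; the batch size of 256 is a hypothetical hardware capacity chosen only for illustration, and the sketch counts forward passes rather than measuring real hardware.

```python
import math

TEST_IMAGES = 203      # reported in the text
TOTAL_SECONDS = 36.0   # reported in the text

seconds_per_image = TOTAL_SECONDS / TEST_IMAGES   # about 0.18 s
images_per_second = TEST_IMAGES / TOTAL_SECONDS   # about 5.6

def batches_needed(n_images, batch_size):
    """Number of forward passes when images are processed in batches."""
    return math.ceil(n_images / batch_size)

# batch_size=256 is an assumed capacity, not a figure from the study.
passes_small = batches_needed(100, batch_size=256)
passes_test_set = batches_needed(TEST_IMAGES, batch_size=256)
passes_large = batches_needed(1000, batch_size=256)
```

Under this assumption, 100 images and 203 images both need a single pass, which is the sense in which time stays flat within a range of vector sizes; 1000 images would need 4 passes, so the benefit holds only up to the capacity of the hardware.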
The expanded criteria for ESD are widely accepted because ESD has a higher 5-year overall survival rate and a lower adverse event rate than surgery.29 Because endoscopic resection technology has been used to resect EGC lesions with a relatively low risk of lymph node involvement, we assume that quality of life is better with endoscopic treatment than with surgical treatment. According to the most recent meta-analysis of 13 studies, compared with surgery, ESD is preferable for EGC because of a lower rate of adverse events, shorter hospital stay, lower cost, and higher quality of life.30 Endoscopic treatment should therefore be performed for lesions with a high likelihood of a cure.31 In such a situation, overdiagnosis might be unacceptable because it could lead to overtreatment. That is, patients could lose their opportunity for endoscopic resection, which could preserve their stomach with a lower risk of adverse events. Furthermore, additional surgical resection could be performed if the lesion is underdiagnosed. In the present study, higher specificity to minimize overdiagnosis of invasion depth was related to the threshold set for model classification. At a threshold value of .5, specificity was 95.6%, and unnecessary surgery was performed for only 8.8% of patients (5/57) regardless of other parameters. Of these 5 patients, 3 underwent surgery because of a large lesion size.
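The relationship between the classification threshold and specificity can be sketched as follows. The scores and labels are made up for illustration, and the sketch assumes that label 1 denotes a lesion too deep for endoscopic resection (the positive class) and label 0 a lesion eligible for it. Only the final line uses figures from the text.

```python
def confusion_counts(scores, labels, threshold):
    """Count true/false positives and negatives at a probability threshold."""
    tp = fp = tn = fn = 0
    for score, label in zip(scores, labels):
        predicted_positive = score >= threshold
        if predicted_positive and label == 1:
            tp += 1
        elif predicted_positive and label == 0:
            fp += 1   # overdiagnosis: eligible lesion called deep
        elif label == 0:
            tn += 1
        else:
            fn += 1   # underdiagnosis: deep lesion called eligible
    return tp, fp, tn, fn

def sensitivity_specificity(scores, labels, threshold):
    tp, fp, tn, fn = confusion_counts(scores, labels, threshold)
    return tp / (tp + fn), tn / (tn + fp)

# Made-up model outputs and ground truth (assumptions, not study data).
scores = [0.95, 0.80, 0.62, 0.40, 0.55, 0.30, 0.20, 0.10]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

sens_low, spec_low = sensitivity_specificity(scores, labels, threshold=0.5)
sens_high, spec_high = sensitivity_specificity(scores, labels, threshold=0.7)

# Arithmetic check of the proportion reported in the text: 5 of 57 patients.
unnecessary_surgery_pct = round(100 * 5 / 57, 1)
```

On this toy data, raising the threshold from 0.5 to 0.7 increases specificity from 0.75 to 1.0 while sensitivity falls from 0.75 to 0.5, which is the trade-off between overdiagnosis and underdiagnosis discussed above.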
Obtaining high-quality endoscopic images is not easy for inexperienced endoscopists. The stomach has a complex structure, with different parts having unique characteristics. For example, tumors located in the cardiac section must be observed in retroflexion. The presence of the gastroscope and the inverse angle constraint can prevent suspicious lesions from being captured in a single endoscopic image, making CNN discrimination and interpretation difficult. Bile regurgitation and insufficient air inflation can also influence the quality of images. Thus, new procedures for tumor observation through gastroscopy are needed. All images used in our study were based on conventional white-light endoscopy. We believe that deep learning with narrow-band imaging magnification endoscopy, which can provide more endoscopic features, might lead to greater accuracy.
This study has several limitations. First, the sample size was small, and all endoscopic images were obtained from a single center, although the type of endoscopy and its image resolution are known to be highly variable across different facilities. Therefore, we will use endoscopic images from other centers or other types of endoscopy in future research. Second, we only used the CNN-CAD system to determine invasion depth. Other related parameters, such as histologic type or lesion size, could also be determined by the system.