Article Preview
Top1. Introduction
Often in real-world applications such as multimedia, NLP, and medicine, large quantities of unlabeled data are generated every day. This surge in data gives rise to the challenging semantic gap problem (Lin, Shyu & Chen, 2012; Chen, Lin & Shyu, 2012; Zhu & Shyu, 2015; Sadiq, Yan, Shyu, Chen & Ishwaran, 2016) which is to reduce the gap between high level semantic concepts and their low level features (Sadiq, Tao, Yan & Shyu, 2017a; Sadiq, Zmieva, Shyu & Chen, 2018; Yan, Chen, Sadiq & Shyu, 2017). Despite rigorous research endeavors, this remains one of the most challenging problems in information sciences where we have overwhelming quantities of all sorts of fast, complex, heterogeneous and unstructured data. To handle such data, conventionally we have been utilizing descriptive models that try to find deterministic features and build probabilistic models (Chen & Kashyap, 1997; Chen, 2010; Lin, Shyu & Chen, 2013; Sadiq et al., 2017b). However, the problem with discriminative models is that they, generally, estimate a hyperplane. For example, when categorizing images of cats and dogs, data points on one side of the hyperplane are categorized as cats and everything on the other side as dogs. Discriminative models follow a condition in logistic regression that relaxes the computation of a joint probability to a conditional probability which is much easier to calculate because it maps directly to the hyperplane that divides between two clusters. This problem gets exacerbated in deep learning models due to the increase in dimensionality and boorish memorization of patterns. Mere rotations or color-inversions in the trained images can easily confound very deep neural networks even though the modified images share the same semantics and structures as the original images (Hosseini & Poovendran, 2017).
Recently, there has been a growing concern of this shortcoming, resulting in a soaring interest in Knowledge Representation and Reasoning (KRR) (Liu, 2017) and reasoning based deep learning (Andreas, Rohrbach, Darrell & Klein, 2016; Santoro et al., 2017). We believe that the next generation of AI systems will need to have the ability to understand problems at a deeper level rather than just based on memorization of data. However, discriminative models fail completely in inferencing problems because they do not capture the underlying relationships in the input space. Figure 1 shows how the addition of an adversarial noise can completely fool the classification while the L2 loss between the original and modified images was minimal. The discriminative models considered them as identical images while producing completely different results. All images were classified with 99.9% confidence.
Figure 1. Classification results produced by state-of-the-art deep learning models, but fooled by simple noising up of the images (Nguyen, Yosinski & Clune, 2015)