1. Introduction
Nowadays, with the rapid development of information technology, various improved algorithms have been proposed to increase the effectiveness of these technologies (Li et al. 2018, Li et al. 2019a, Li et al. 2019b, Wang et al. 2018, Wang et al. 2019). Furthermore, software products are becoming increasingly complex, and the APIs in software libraries are becoming increasingly abundant. Improving software development efficiency by reusing existing APIs has therefore become one of the hot research topics in software engineering. However, understanding and learning the many APIs in large software libraries is not easy (Ko et al. 2004), and software developers would prefer to provide only a requirement description and obtain the right API. Existing keyword-based retrieval methods have difficulty identifying the lexical and syntactic differences between requirement descriptions and API documents, which leads to low API recommendation efficiency.
To improve recommendation efficiency, researchers have proposed many API recommendation techniques (Mcmillan et al. 2011, Chan et al. 2012, Holmes et al. 2005, Rahman et al. 2017, Goldberg et al. 2014, Hoecker et al. 1995, Eggert et al. 2004, Bengio et al. 2009, Jang et al. 2016, Ma et al. 2015), including methods based on semantic and non-semantic information. API recommendation based on word embedding is one of the most popular techniques. Word embedding is a way to transform the words in a text into vector representations (Lai et al. 2016). The simplest word embedding method is one-hot coding based on the bag-of-words model (Karakasis et al. 2015). One-hot coding is the most basic vector representation: an N-bit state register encodes N states, so each word in the text is represented by a vector in which only the position corresponding to that word is 1 and all other positions are 0. The most commonly used technique is Word2Vec (Frome et al. 2013), which is based on the CBOW and Skip-gram models. Both models adopt the three-layer structure of the neural network language model. In this method, each word in the text is represented by a vector, and words with similar semantic information occupy close spatial positions; synonyms therefore lie closer together in the vector space, which preserves the semantic information of the text (Lilleberg et al. 2015).
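To make the one-hot scheme concrete, the following minimal sketch builds a toy vocabulary of N words and encodes each word as an N-dimensional vector with a single 1. The corpus and words here are hypothetical illustrations, not drawn from the original study; note that all one-hot vectors are mutually orthogonal, so they carry no similarity information between words.

```python
import numpy as np

# Hypothetical toy corpus; the sentences are illustrative only.
corpus = ["open file read bytes", "read file close stream"]

# Build the vocabulary: the N states of the N-bit state register.
vocab = sorted({word for sentence in corpus for word in sentence.split()})
index = {word: i for i, word in enumerate(vocab)}

def one_hot(word):
    """Return an N-dimensional vector that is 1 only at the word's position."""
    vec = np.zeros(len(vocab), dtype=int)
    vec[index[word]] = 1
    return vec

for word in ("file", "read"):
    print(word, one_hot(word))
```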
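The Word2Vec behaviour described above can likewise be sketched with the gensim library (a sketch assuming gensim 4.x; the tokenized corpus below is a hypothetical example). The sg parameter selects between the two models the paragraph mentions: sg=0 trains CBOW and sg=1 trains Skip-gram. After training, words that appear in similar contexts end up close together in the vector space.

```python
from gensim.models import Word2Vec

# Hypothetical toy corpus of tokenized requirement descriptions.
sentences = [
    ["read", "file", "contents", "into", "string"],
    ["load", "file", "contents", "as", "text"],
    ["write", "string", "to", "file"],
]

# sg=0 trains the CBOW model; sg=1 would train Skip-gram instead.
model = Word2Vec(sentences, vector_size=50, window=2,
                 min_count=1, sg=0, epochs=200)

# Words sharing contexts ("read"/"load") tend to get nearby vectors,
# so their cosine similarity tends to be higher than for unrelated words.
print(model.wv.similarity("read", "load"))
print(model.wv["file"][:5])  # first few dimensions of a learned vector
```

On a corpus this small the learned similarities are noisy; the point of the sketch is only the API shape and the CBOW/Skip-gram switch, not meaningful embeddings.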