Introduction
Frequency hopping communication is a traditional anti-interference technology with strong anti-jamming ability. Over the past few decades, differential frequency hopping, multi-sequence frequency hopping, variable-speed frequency hopping, cognitive frequency hopping, and other technologies have been developed on the basis of traditional frequency hopping. Most of these technologies achieve anti-interference by adjusting, in a timely and appropriate manner, the parameters that most strongly affect the performance of the frequency hopping system. However, as the electromagnetic environment grows more complex and interference strategies become more intelligent, these frequency hopping technologies can no longer meet communication needs.
The frequency hopping pattern is the most important parameter of a frequency hopping communication system, and research on it has continued steadily. Researchers often screen frequency points on the basis of accurate spectrum sensing results and then generate frequency hopping patterns in various ways. However, these methods have limited effect in a complex electromagnetic environment. To ensure the quality of frequency hopping communication, more intelligent anti-interference technology needs to be applied to the design of the frequency hopping pattern.
With the development of machine learning, anti-interference technology is becoming more and more intelligent. Reinforcement learning, an important branch of machine learning, is formulated on the Markov decision process (MDP). Its essence is that an agent with learning ability constantly interacts with the environment. The agent selects and executes an action according to the current state and the environmental experience it has learned, and then transitions to a new state under the joint effect of that action and the environment. At the same time, the environment feeds back a reward or punishment according to the current state and the action taken. The agent uses this feedback to update its knowledge of the environment and to make subsequent decisions. After learning is complete, the agent holds a mapping from environment states to actions. In actual decision-making, this mapping guides the agent to select the appropriate action for each state so as to obtain the maximum cumulative reward from the environment. Reinforcement learning is therefore well suited to intelligent decision-making problems in complex environments.
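The interaction loop described above can be illustrated with a minimal sketch. It uses tabular Q-learning, which is one common reinforcement learning method and is chosen here purely for illustration; the text above does not say which algorithm is used. Everything in the example is an assumption: the four channels, the toy sweep jammer that moves to the next channel in every time slot, the reward of +1 for a clean transmission and -1 for a jammed one, and the values of `ALPHA`, `GAMMA`, and `EPSILON`.

```python
import random

N_CHANNELS = 4   # hypothetical number of available frequency points
ALPHA = 0.1      # learning rate (assumed value)
GAMMA = 0.5      # discount factor (assumed value)
EPSILON = 0.1    # probability of exploring a random channel (assumed value)


def jammed_next(jammed_now):
    """Toy sweep jammer: the jammed channel advances by one each slot."""
    return (jammed_now + 1) % N_CHANNELS


def train(steps=5000, seed=0):
    """Learn a Q-table; the state is the channel jammed in the current slot."""
    rng = random.Random(seed)
    q = [[0.0] * N_CHANNELS for _ in range(N_CHANNELS)]
    state = 0
    for _ in range(steps):
        # Epsilon-greedy: mostly exploit learned experience, sometimes explore.
        if rng.random() < EPSILON:
            action = rng.randrange(N_CHANNELS)
        else:
            action = max(range(N_CHANNELS), key=lambda a: q[state][a])

        # The environment moves to a new state and returns a reward or penalty.
        next_state = jammed_next(state)
        reward = 1.0 if action != next_state else -1.0

        # Q-learning update toward reward plus discounted best future value.
        td_target = reward + GAMMA * max(q[next_state])
        q[state][action] += ALPHA * (td_target - q[state][action])
        state = next_state
    return q


def greedy_policy(q):
    """State-to-action mapping: the best-valued channel for each state."""
    return [max(range(N_CHANNELS), key=lambda a: row[a]) for row in q]


q_table = train()
policy = greedy_policy(q_table)
for jammed, channel in enumerate(policy):
    print(f"jammed now: {jammed} -> transmit next on: {channel}")
```

In this toy setting, the learned policy plays the role of the state-action mapping: for each observed jammed channel, it names the channel to hop to next. A realistic jammer would not follow a fixed sweep, so the state, reward, and learning method would all need to be redesigned for a practical system.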