Physiological Big Data Mining Through Machine Learning and Wireless Sensor Networks

With the improvement of living standards, the demands on the quality of medical care and daily healthcare keep rising. However, the traditional medical diagnosis mode cannot provide patients with comprehensive, real-time, and accurate health status information. As the population ages, the volume of physiological data will grow explosively. The traditional monitoring model, which is centered on the central hospital, cannot meet the current real-time monitoring needs of families and individuals. To solve this issue, this paper establishes a wireless sensor network-based medical platform, which implements sleep monitoring by mining electroencephalogram signals. The platform adopts the end-edge-cloud architecture. Experiments and simulations show the effectiveness of the proposed end-edge-cloud architecture-based medical platform.


INTRODUCTION
With the continuous improvement of living standards, people are paying more attention to their own health. As major companies introduce wearable medical devices, a large number of people will use such devices in daily life to monitor their own health, which will produce a huge amount of physiological data (Tan et al. 2022; Huarng et al. 2022). How to analyze and process these physiological data quickly has become an urgent issue in the medical field (Moge et al. 2022).
At present, the existing telemedicine monitoring and diagnosis systems are run independently by large, decentralized hospitals. Each hospital provides its services independently, which isolates one medical diagnosis system from the others (Lohani & Thirunavukkarasan 2021). This results in low information sharing between systems and high construction and maintenance costs. In addition, due to the current shortage of talent in cloud technology, a system that runs into problems cannot obtain timely technical support or regular updates (Bommala et al. 2021; Cresswell et al. 2022). Thus, it is necessary to design a new system that can upgrade and improve its own medical information intelligence to meet people's requirements for physiological health services. It is urgent to build an efficient medical diagnosis platform with high intelligence and strong analysis ability.
The cloud computing mode can change the decentralized application mode of the traditional medical service system, enhancing the processing capacity for physiological data and conforming to the development trend of medical informatization. Cloud computing based medical diagnosis has great advantages over the traditional medical diagnosis model (Cengiz 2021; Dang et al. 2019). The cloud medical service platform has strong data processing capability and can analyze the physiological data uploaded by users quickly. The medical platform gives full play to the powerful resource integration capability of cloud computing technology, which helps to accelerate the establishment of a comprehensive medical service standard and provides citizens with comprehensive health resource management. With the help of data mining technology, valuable information can be mined from these massive physiological data to provide citizens with a convenient online query function and let them know their health status in real time (Chen et al. 2018; Kumar et al. 2019). Thus, they can adjust their daily life according to whether their bodies remain in a stable, healthy state. At the same time, medical experts can follow patients' physical conditions at any time by accessing cloud resources and respond immediately in case of emergency medical events.
Although the cloud computing based medical platform provides an efficient computing platform for medical data processing, the growth rate of network bandwidth lags far behind the growth rate of data generated by IoT based medical devices (Ali et al. 2018). A single computing resource based on the cloud computing model can no longer meet the requirements of medical care for real-time response, security, low energy consumption and massive data processing. To satisfy the requirements of data transmission in terms of fast connection, real-time business, data optimization, application intelligence, security and privacy protection, it is necessary to make full use of the edge network, composed of sensor nodes, smart phones and other terminals as well as cellular base stations and other infrastructure. The edge network processes the content requested by IoT applications at the network edge near the data source and provides edge intelligent services to optimize connections, offload traffic and enhance the user experience (Greco et al. 2020).
On the basis of centralized big data processing with existing cloud computing models, it is urgent to develop edge data processing technology based on edge computing models. This paper establishes an end-edge-cloud based medical platform to solve the issues of cloud computing based platforms. The proposed platform consists of three parts: end nodes, edge nodes and a cloud server. The end nodes collect the physiological signals of citizens. The edge nodes process the collected signals and send the processed signals to the cloud server. The cloud server sends the mining result back to the edge nodes. The proposed platform implements sleep monitoring through electroencephalogram signals.
The main contributions of this paper are summarized as follows: first, an end-edge-cloud based medical platform is proposed; second, the platform implements daily sleep monitoring through electroencephalogram signals; third, the proposed platform is evaluated by experiments.
The remainder of this paper is organized as follows: the architecture of the end-edge-cloud based medical platform is proposed in Section 2; the implementation of the platform is provided in Section 3; the experiments and simulations are reported in Section 4; the last section presents the conclusions.

THE ARCHITECTURE OF END-EDGE-CLOUD BASED MEDICAL PLATFORM
Edge computing is intended not to replace cloud computing but to supplement and extend it, providing a better computing platform for mobile computing in the Internet of Things (Wang et al. 2019). The edge computing model needs the strong computing power and massive storage of the cloud computing center. Meanwhile, cloud computing needs edge devices to process massive data and private data, so as to meet the needs of real-time response, privacy protection and reduced energy consumption (Caprolu et al. 2019). To this end, an end-edge-cloud driven intelligent medical platform is proposed, which is shown in Figure 1.
In Figure 1, the proposed medical platform provides resources and services through end-edge-cloud collaborative computing. It consists of three parts: end nodes, edge nodes and the cloud server. By complementing each other and exploiting the advantages of each part, the three parts improve resource utilization and transmission efficiency across the whole system and ensure real-time processing. At the same time, the three parts can dynamically migrate tasks according to the current state to balance the computing load, and finally achieve the deep coverage and massive connection of the intelligent Internet of Things.
From the basic functions performed by the various network elements, the end-edge-cloud based medical platform has a strong hierarchical structure, which can be divided into three layers: intelligent IoT terminals, the edge and the cloud platform. Among them, IoT terminals with limited computing and storage capacity are mainly responsible for collecting and forwarding the physiological signals of a given transaction and performing some simple data processing. Unlike the cloud server, the edge nodes mainly collect, transmit and process data, apply different processing depending on the real-time nature of the data and the frequency of traffic and server access, and request network resources for services dynamically. In our medical platform, the data consists mainly of physiological signals.
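The division of labor between the three layers can be sketched as a small data flow. The following is a minimal illustration only, not the authors' implementation: the `Features` record, the function names, the energy/peak features and the threshold rule standing in for a trained classifier are all hypothetical placeholders.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass
class Features:
    """Compact summary an edge node forwards to the cloud (hypothetical format)."""
    node_id: str
    values: List[float]


def end_node_collect(node_id: str,
                     read_sensor: Callable[[], Sequence[float]]) -> Tuple[str, List[float]]:
    """End layer: acquire raw samples and forward them unchanged."""
    return node_id, list(read_sensor())


def edge_node_process(node_id: str, samples: Sequence[float]) -> Features:
    """Edge layer: placeholder pre-processing (mean removal) and feature extraction."""
    mean = sum(samples) / len(samples)
    centered = [s - mean for s in samples]
    energy = sum(c * c for c in centered) / len(centered)
    peak = max(abs(c) for c in centered)
    return Features(node_id, [energy, peak])


def cloud_infer(features: Features, energy_threshold: float = 1.0) -> str:
    """Cloud layer: stand-in for a trained classifier; the label goes back to the edge."""
    return "high-activity" if features.values[0] > energy_threshold else "low-activity"


def run_pipeline(node_id: str, read_sensor: Callable[[], Sequence[float]]) -> str:
    """Chain the three layers: end -> edge -> cloud -> result."""
    node, raw = end_node_collect(node_id, read_sensor)
    return cloud_infer(edge_node_process(node, raw))
```

Only the compact feature record crosses the edge-to-cloud link in this sketch, which is the bandwidth argument the architecture relies on.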

IMPLEMENTATION OF END-EDGE-CLOUD MEDICAL PLATFORM
Automatic electroencephalogram (EEG) detection is implemented in the proposed end-edge-cloud based medical platform. EEG is a routine clinical examination method that is painless and safe. When patients have brain diseases or other neurological diseases, doctors often use EEG as a diagnostic indicator (Sarmukadam et al. 2019; Faiman et al. 2021). EEG can accurately reflect many brain diseases, such as headache, diencephalic attack, functional psychosis and organic psychosis. It is especially accurate, and therefore preferred, for diagnosing epilepsy, because most epileptic patients have epileptic discharges whose abnormal waves are easy to find in EEG.
There are many abnormal waveforms in the EEG of patients with brain diseases. These abnormal waveforms are mainly composed of spike waves, spike-slow complex waves, etc. Early clinical EEG examination relied mainly on manual reading and identification of abnormal waveforms. This method is very time-consuming and laborious. Additionally, due to the interference of artifacts, EEG physicians sometimes make different judgments. Therefore, the automatic detection of EEG has important practical value. The normal rhythms of EEG signals include α, β, θ and δ, which range from 8 to 13 Hz, 13 to 30 Hz, 4 to 8 Hz and 1 to 4 Hz, respectively. The abnormal waves of EEG are divided into simple waves, complex waves and characteristic waveforms. The spike wave, sharp wave and slow wave are all simple waves. Complex waves include sharp-slow waves, sharp wave bursts, slow wave bursts, etc. Characteristic waveforms consist of the excessive spindle wave, three-phase wave and glove wave.
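The rhythm ranges quoted above can be written down as a small lookup. This is an illustrative sketch; the half-open interval convention (lower bound included, upper bound excluded) and the name `band_of` are assumptions made here to resolve the shared boundaries at 4, 8 and 13 Hz.

```python
# Frequency ranges in Hz for the normal EEG rhythms, as listed in the text.
EEG_BANDS = {
    "delta": (1.0, 4.0),
    "theta": (4.0, 8.0),
    "alpha": (8.0, 13.0),
    "beta": (13.0, 30.0),
}


def band_of(freq_hz: float) -> str:
    """Return the rhythm whose range [low, high) contains freq_hz."""
    for name, (low, high) in EEG_BANDS.items():
        if low <= freq_hz < high:
            return name
    return "out-of-range"
```

For example, `band_of(10.0)` falls in the α range, while a frequency of 45 Hz lies outside all four ranges.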
The automatic detection of EEG is very useful in clinical practice and has potential research value. It takes doctors hours to analyze one day of a patient's EEG records. In clinical practice, identifying abnormal EEG waves is difficult and highly subjective. For the same segment of EEG signals, different experts may reach different diagnoses. Therefore, it is necessary to study automatic EEG detection to find abnormal waves in EEG and improve the efficiency of clinical diagnosis through feature extraction (Guyon et al. 2008) or feature selection (Zheng et al. 2018; Zheng et al. 2021).
Time domain analysis is the initial research method for EEG detection (Akan & Cura 2021). Experts read the EEG manually and visually, relying on personal experience and on the frequency and amplitude of the EEG waveform, and then draw conclusions. This method is highly subjective. Frequency domain analysis and power spectrum analysis are the most commonly used methods for EEG signal processing. Fourier transform based signal processing requires the signal to be a stationary random signal, because the spectrum analysis results of non-stationary random signals differ from one time to another. The fast Fourier transform (FFT) is one of the most commonly used methods to analyze EEG signals, but its disadvantages are low frequency resolution and poor variance of the spectral estimate (Bhardwaj et al. 2021).
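To make the frequency-resolution limitation concrete, the sketch below computes a power spectrum with a direct discrete Fourier transform (the FFT computes the same coefficients, only faster). The test signal, the 100 Hz sampling rate and the function names are assumptions for illustration; the bin spacing is `fs / n`, so a short window cannot separate nearby frequencies.

```python
import cmath
import math


def power_spectrum(signal, fs):
    """Return (frequencies, power) for the one-sided DFT of a real signal."""
    n = len(signal)
    freqs, power = [], []
    for k in range(n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(signal))
        freqs.append(k * fs / n)          # bin spacing is fs / n
        power.append(abs(coeff) ** 2 / n)
    return freqs, power


def dominant_frequency(signal, fs):
    """Frequency of the strongest non-DC bin."""
    freqs, power = power_spectrum(signal, fs)
    k = max(range(1, len(power)), key=power.__getitem__)
    return freqs[k]


# Synthetic 10 Hz (alpha-range) sinusoid, 2 seconds at 100 Hz: resolution is 0.5 Hz.
fs = 100.0
alpha_like = [math.sin(2 * math.pi * 10.0 * t / fs) for t in range(200)]
peak_hz = dominant_frequency(alpha_like, fs)
```

Because the whole window contributes to every coefficient, this estimate says nothing about when the 10 Hz activity occurred, which is the motivation for time-frequency methods.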
EEG is a time-varying non-stationary signal, which cannot be represented correctly by either the time domain or the frequency domain alone (Arnin et al. 2018).
In Figure 2, the collected EEG signals are first filtered by an adaptive filter to remove the power frequency artifact, then decomposed into different scales by the wavelet transform, and finally the multi-scale data is input into a trained classifier to determine the status of the EEG signal. The classifier is learned from a labeled EEG signal library.
When the statistical characteristics of the input signal and noise are unknown, the adaptive filter can automatically adjust its parameters to meet an optimality criterion, such as recursive least squares or minimum mean square error. The least mean square (LMS) algorithm is widely used in practice because it is easy to implement and computationally cheap. This paper adopts least mean square to adjust the parameters of the filter. Then, the processed signals are passed to the wavelet transform to extract multi-scale features.
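The adaptive filtering step can be illustrated with a two-weight LMS noise canceller. This is a sketch under stated assumptions rather than the paper's filter: the mains frequency (50 Hz), the step size, the 250 Hz sampling rate used in the usage note and the names `lms_notch` and `mean_square` are all chosen here for illustration. The filter output is the weighted sum of a sine and cosine reference at the mains frequency, and the error signal is the cleaned sample.

```python
import math


def lms_notch(signal, fs, mains_hz=50.0, step=0.01):
    """Remove a sinusoidal interference at mains_hz with a two-weight LMS filter."""
    w_sin, w_cos = 0.0, 0.0
    cleaned = []
    for n, d in enumerate(signal):
        phase = 2.0 * math.pi * mains_hz * n / fs
        r_sin, r_cos = math.sin(phase), math.cos(phase)   # reference inputs
        estimate = w_sin * r_sin + w_cos * r_cos          # filter output y(n)
        error = d - estimate                              # e(n): cleaned sample
        w_sin += 2.0 * step * error * r_sin               # LMS weight update
        w_cos += 2.0 * step * error * r_cos
        cleaned.append(error)
    return cleaned


def mean_square(values):
    """Average power of a sequence."""
    return sum(v * v for v in values) / len(values)
```

As a usage example, a 10 Hz sinusoid sampled at 250 Hz with an added 50 Hz component can be passed through `lms_notch`; after the weights converge, the residual interference power in the output should be far below its initial value. The sampling rate must exceed twice the mains frequency for the reference to be representable.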
Wavelet analysis evolved from the Fourier transform and compensates for its shortcomings (Amin et al. 2020). The time-frequency localization of the wavelet transform is achieved at the expense of some frequency domain localization performance. One of the advantages of the wavelet transform is that it provides more accurate localization in both the time domain and the frequency domain. Most real physical signals, including EEG signals, are non-stationary. The wavelet transform is a powerful tool for processing such non-stationary signals.
The local analysis ability of wavelets in the time and frequency domains, the concept of multi-resolution analysis, the optimal approximation performance for one-dimensional bounded functions and the implementation of the fast wavelet transform have all greatly promoted the development of the wavelet transform. Translation and scaling invariance is one of its most important characteristics.
Wavelet transform methods include the wavelet integral transform, wavelet series and the discrete wavelet transform. Since the basis of a discrete wavelet may not be orthogonal, the coefficients of the discrete wavelet transform can still contain redundant information. To eliminate the redundant information, this paper adopts multi-resolution analysis to establish an orthogonal basis for the wavelet transform.
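What an orthogonal basis buys can be seen in a few lines. The sketch below uses the Haar wavelet, chosen here only because it is the simplest orthonormal wavelet; the paper does not state which wavelet it uses, and the function names are hypothetical. With an orthonormal basis the decomposition has exactly as many coefficients as samples, preserves signal energy, and is perfectly invertible, so no information is duplicated.

```python
import math


def haar_step(signal):
    """One level of the orthonormal Haar DWT: (approximation, detail)."""
    s = 1.0 / math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) * s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) * s for i in range(0, len(signal), 2)]
    return approx, detail


def haar_dwt(signal, levels):
    """Multi-level decomposition; returns [cA_L, cD_L, ..., cD_1]."""
    if len(signal) % (2 ** levels) != 0:
        raise ValueError("signal length must be divisible by 2**levels")
    approx, details = list(signal), []
    for _ in range(levels):
        approx, detail = haar_step(approx)
        details.append(detail)
    return [approx] + details[::-1]


def haar_idwt(coeffs):
    """Invert haar_dwt by merging from the coarsest scale to the finest."""
    s = 1.0 / math.sqrt(2.0)
    approx = coeffs[0]
    for detail in coeffs[1:]:
        merged = []
        for a, d in zip(approx, detail):
            merged.extend([(a + d) * s, (a - d) * s])
        approx = merged
    return approx


def energy(values):
    """Sum of squares of a sequence."""
    return sum(v * v for v in values)
```

Each detail band covers a different scale, which is what lets the classifier see both slow rhythms and short transients from the same segment.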
Let $\Psi_{a,b}(t)$ be the wavelet basis and $\phi_{a,b}(t)$ be the associated complementary scale basis function. Multi-resolution analysis can expand the input signal $x(t)$ at arbitrary resolution. It refers to a series of closed subspaces that satisfy monotonicity, completeness, scalability, translation invariance, the Riesz basis property and positive orthogonal complementarity.
Monotonicity requires that the subspaces are nested and that each space has its own basis vectors at a different resolution. Completeness means that the intersection of the subspaces contains only the zero element and their union is the entire space. For scalability, if [...] holds; otherwise, if there exist two integers $p$ and $q$ to ensure that [...]. To construct the standard orthogonal wavelet basis, a filter group is designed first; then the frequency domain form of the scale function, $\phi(\omega)$, is calculated and the frequency domain form of the wavelet function, $\Psi(\omega)$, is obtained; finally, the orthogonal wavelet function and scale function in the time domain are obtained through the inverse Fourier transform.
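For reference, the six properties named above are usually stated as follows in the standard textbook definition of a multi-resolution analysis. The subspace notation $V_j$, $W_j$ and the indexing direction are assumptions made here; the paper's own notation did not survive in this copy and may differ.

```latex
\begin{align*}
&\text{Monotonicity:} && V_j \subset V_{j+1}, \quad \forall j \in \mathbb{Z} \\
&\text{Completeness:} && \bigcap_{j \in \mathbb{Z}} V_j = \{0\}, \qquad
   \overline{\bigcup_{j \in \mathbb{Z}} V_j} = L^2(\mathbb{R}) \\
&\text{Scalability:} && f(t) \in V_j \iff f(2t) \in V_{j+1} \\
&\text{Translation invariance:} && f(t) \in V_0 \implies f(t-k) \in V_0, \quad \forall k \in \mathbb{Z} \\
&\text{Riesz basis:} && \exists\, \phi \in V_0 \text{ such that } \{\phi(t-k)\}_{k \in \mathbb{Z}} \text{ is a Riesz basis of } V_0 \\
&\text{Orthogonal complement:} && V_{j+1} = V_j \oplus W_j, \qquad W_j \perp V_j
\end{align*}
```

Under these conditions the detail spaces $W_j$ are mutually orthogonal, which is what removes the redundancy between scales.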
Information from both the frequency domain and the time domain is used to represent the EEG signal. The resulting feature set is used to learn a classifier, which then monitors the EEG status of future data. Commonly used classifiers include k-nearest neighbors (Saeedi et al. 2020), linear discriminant analysis (Zhu et al. 2022; Hasan et al. 2015), neural networks (Übeyli 2009), logistic regression (Tomioka 2006) and support vector machines (Zhu et al. 2016; Bhuvaneswari & Kumar 2013; Zhu et al. 2021; Zhu et al. 2017). In this paper, we use the optimal margin distribution machine (ODM) (Zhang & Zhou 2019) as the classification method, which is superior to the support vector machine. To keep this paper self-contained, let us briefly review ODM.
Let $X$ be the training set in a reproducing kernel Hilbert space (RKHS), and let $\phi(x_i)$ be the mapping of $x_i$ into the RKHS, whose inner product can be calculated through the kernel trick. By setting the minimum margin and margin mean as 1, ODM is formulated as the optimization problem in Eq. (2). The second term of the objective function in Eq. (2) is exactly the margin variance, so ODM minimizes the margin variance under a fixed minimum margin and margin mean. By introducing the $\theta$-insensitive loss to ensure the sparsity of the solution, ODM is reformulated as Eq. (4). Here, parameter $\mu$ balances the two different deviations of the two subspaces, and $\theta$ controls the ratio of support vectors. By introducing Lagrange multipliers $\alpha$ and $\beta$ for the constraints in Eq. (4), the corresponding Lagrange function $\mathcal{L}$ is obtained, and its partial derivatives with respect to $w$, $\xi$, and $\epsilon$ are set to zero.
Here, $M = YX^{T}XY$. The dual form of Eq. (4) is then derived from these conditions. After obtaining $\alpha$ and $\beta$, $w$ and the final decision function can be written in terms of them. The problem in Eq. (10) is a quadratic program that can be solved by many existing solvers, such as the dual coordinate descent algorithm.
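As a reading aid, the ODM formulation can be sketched as it is commonly stated following the cited reference (Zhang & Zhou 2019). The trade-off parameter name $\lambda$, the sample count $m$, the kernel symbol $\kappa$ and the unnumbered layout are assumptions here; the equation numbers and constants in this paper may differ, and the experiments call the trade-off parameter C. The stationarity conditions and the dual below are derived from the primal exactly as written in this block.

```latex
\begin{align*}
\text{Basic form:}\quad
& \min_{w,\,\xi_i,\,\epsilon_i}\ \frac{1}{2}\|w\|^2
  + \frac{\lambda}{m}\sum_{i=1}^{m}\left(\xi_i^2 + \epsilon_i^2\right) \\
& \text{s.t.}\quad y_i w^{\top}\phi(x_i) \ge 1 - \xi_i, \qquad
  y_i w^{\top}\phi(x_i) \le 1 + \epsilon_i \\[6pt]
\text{With } \theta\text{-insensitive loss:}\quad
& \min_{w,\,\xi_i,\,\epsilon_i}\ \frac{1}{2}\|w\|^2
  + \frac{\lambda}{m}\sum_{i=1}^{m}\frac{\xi_i^2 + \mu\,\epsilon_i^2}{(1-\theta)^2} \\
& \text{s.t.}\quad y_i w^{\top}\phi(x_i) \ge 1 - \theta - \xi_i, \qquad
  y_i w^{\top}\phi(x_i) \le 1 + \theta + \epsilon_i \\[6pt]
\text{Stationarity:}\quad
& w = \sum_{i=1}^{m} (\alpha_i - \beta_i)\, y_i\, \phi(x_i), \qquad
  \xi_i = \frac{m(1-\theta)^2}{2\lambda}\,\alpha_i, \qquad
  \epsilon_i = \frac{m(1-\theta)^2}{2\lambda\mu}\,\beta_i \\[6pt]
\text{Dual:}\quad
& \min_{\alpha,\,\beta \ge 0}\ \frac{1}{2}(\alpha-\beta)^{\top} M (\alpha-\beta)
  + \frac{m(1-\theta)^2}{4\lambda}\left(\alpha^{\top}\alpha + \frac{1}{\mu}\beta^{\top}\beta\right)
  + (\theta - 1)\mathbf{1}^{\top}\alpha + (\theta + 1)\mathbf{1}^{\top}\beta \\
& \text{with } M_{ij} = y_i y_j\, \kappa(x_i, x_j) \\[6pt]
\text{Decision:}\quad
& f(x) = \operatorname{sign}\!\left(\sum_{i=1}^{m} (\alpha_i - \beta_i)\, y_i\, \kappa(x_i, x)\right)
\end{align*}
```

The dual has only non-negativity constraints on $\alpha$ and $\beta$, which is why coordinate-wise solvers apply directly.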
For multi-class classification, ODM is reformulated over $c$ classes, where $c$ is the number of classes. Furthermore, the ODM model is used to monitor sleep status through EEG signals.
With the quickening pace of life and increasing work pressure, sleep quality has become a focus of both personal health concerns and scientific research. Effectively distinguishing sleep quality is very useful for the treatment of sleep apnea, insomnia and narcolepsy. In the past, the assessment of sleep quality was based largely on subjective feelings. Medical experts could only judge a patient's sleep through the patient's oral description of symptoms and other concurrent symptoms. Such assessment lacks objectivity and is cumbersome and inaccurate. It is therefore necessary to implement sleep monitoring in the medical platform to detect sleep status automatically and accurately. Our method avoids the subjectivity, tedious steps and low accuracy of manually judging sleep quality.
Polysomnography is a technology applied to the diagnosis and treatment of sleep disorders. Since the EEG signal is an electrical signal representing the activity of brain neurons and can be measured non-invasively, it has always been a powerful tool for studying brain activity in the different sleep cycles of the human body. According to the Rechtschaffen and Kales (R&K) classification standard, sleep is divided into six phases: Awake (Awa), Phase 1 (S1), Phase 2 (S2), Phase 3 (S3), Phase 4 (S4), and Rapid Eye Movement (REM). The aim of sleep monitoring in the proposed medical platform is to recognize these six sleep phases.
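The six-phase labeling can be tied to fixed-length scoring intervals with a short sketch. The 30-second interval and 100 Hz sampling rate are the values the paper uses in its experiments, giving 3000 samples per interval; the function names and the choice to drop a trailing partial interval are assumptions made here.

```python
# R&K sleep phases, using the abbreviations from the text.
STAGES = ["Awa", "S1", "S2", "S3", "S4", "REM"]


def split_epochs(samples, fs_hz=100, epoch_s=30):
    """Cut a recording into full scoring intervals; a trailing partial one is dropped."""
    size = int(fs_hz * epoch_s)
    n_full = len(samples) // size
    return [samples[i * size:(i + 1) * size] for i in range(n_full)]


def per_stage_accuracy(true_labels, predicted_labels):
    """Fraction of correctly recognized intervals for each phase that occurs."""
    totals = {s: 0 for s in STAGES}
    hits = {s: 0 for s in STAGES}
    for t, p in zip(true_labels, predicted_labels):
        totals[t] += 1
        hits[t] += int(t == p)
    return {s: hits[s] / totals[s] for s in STAGES if totals[s] > 0}
```

Reporting accuracy per phase rather than as a single number matters because some phases are much harder to separate than others.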

EXPERIMENTS AND SIMULATIONS
In this section, we evaluate the proposed medical platform. In the proposed medical platform, the EEG signals collected from end nodes are unlabeled. Lacking professional knowledge and skills, labeling the collected EEG signals ourselves would be cumbersome and inaccurate. To solve this issue, this paper adopts the Sleep-European Data Format (Sleep-EDF) dataset (Kemp et al. 2000) to simulate the collected EEG signals. Sleep-EDF is part of the Research Resource for Complex Physiologic Signals (PhysioNet) (Goldberger et al. 2000). An EEG signal from Sleep-EDF is illustrated in Figure 3. Sleep-EDF uses two groups to measure sleep EEG signals. The age of the test subjects ranges from 21 to 55 years, with an average of 36 years. The first group records the sleep EEG signals of 112 healthy volunteers, all under normal sleep. The second group collects EEG signals during sleep from volunteers with mild sleep disorders. The recorded data from each subject is saved in an independent EDF file. Each record includes a horizontal EOG and two EEG channels (Fpz-Cz and Pz-Oz). The three channel signals are sampled at 100 Hz. In this experiment, the EEG signals of the Pz-Oz and Fpz-Cz channels are selected to analyze and identify sleep phases. Rechtschaffen and Kales suggested generating the sleep phase sequence from 30-second EEG segments. In this experiment, the interval of each phase is defined as 30 seconds. Thus, each interval contains 3000 data points.
Figure 3. The illustration of an EEG signal in Sleep-EDF dataset
The original sleep stages of these EEG segments are marked with 6 categories: S1, S2, S3, S4, REM and Awa. Each sleep phase of each channel is sampled separately. Every sleep category contains 300 segments. Here, we evaluate our medical platform on the two groups. The EEG segments are processed by the wavelet transform. For the features, we compare the time domain, the frequency domain and the proposed method (time domain plus frequency domain). For the classification model, we compare k-nearest neighbor (kNN), linear discriminant analysis (LDA), support vector machine (SVM) and optimal margin distribution machine (ODM). The parameter k in kNN is tuned in the range from 1 to 10 in steps of 2. The parameter C in both SVM and ODM is tuned over a grid of powers of 10. The results are reported in terms of the accuracy of each sleep phase in Tables 1 and 2. From the results in Table 1, it can be found that when using time plus frequency domain features, the accuracy of ODM reaches 92.13%, which is higher than when using only time domain features (86.24%) or only frequency domain features (86.19%). Compared with other classification methods, the accuracy of ODM is higher than that of kNN (86.59%), LDA (88.41%) and SVM (90.78%) when using time domain plus frequency domain features.
From the results in Table 2, it can be found that when using time plus frequency domain features, the accuracy of ODM reaches 91.23%, which is higher than when using only time domain features (85.37%) or only frequency domain features (85.84%). Compared with other classification methods, the accuracy of ODM is higher than that of kNN (85.72%), LDA (87.83%) and SVM (89.67%) when using time domain plus frequency domain features. The proposed method is superior to the previous ones on both groups. Furthermore, the accuracy of each sleep phase is reported in Table 3 when using time domain plus frequency domain features and ODM as the classification method.
Table 2. The sleep phase accuracy of group 2 in Sleep-EDF dataset
From the results in Table 3, it can be found that for group 1 the accuracy ranges from 86.34% to 96.49%, while for group 2 the accuracy ranges from 87.26% to 95.71%. Except for S1, the accuracy of every sleep phase is above or close to 90%. When the experiment is migrated to the proposed medical platform, the recognition result can be returned in less than 0.1 ms.

CONCLUSION
To overcome the issues of current medical platforms, which are based on the central hospital and cloud computing, this paper establishes an end-edge-cloud based medical platform. First, the end nodes collect various physiological signals and send the collected signals to the edge nodes; second, the edge nodes perform denoising and feature extraction on the collected original physiological signals and send the features to the cloud center; lastly, the cloud center returns the medical results according to the physiological signal features received from the edge nodes. Under the proposed medical platform, this paper implements sleep phase monitoring and recognition through EEG signals during daily life. The end nodes collect the original EEG signals. The edge nodes use the wavelet transform to extract time domain and frequency domain features of the EEG signals. The cloud center returns the sleep phase according to the time domain and frequency domain features from the edge nodes. In terms of both accuracy and time cost, the proposed medical platform meets the requirements of sleep monitoring. Under the proposed medical platform, more functions can be added in the future.

ACKNOWLEDGMENT
This research is supported by the project "Analysis and Research of Multimedia Information in the Context of Big Data" (2018KY0493).

Figure 1. The illustration of end-edge-cloud based medical platform
Many abnormal wave signatures in EEG signals are transient, and the combination of the time domain and the frequency domain can better process EEG signals. Time-frequency analysis methods such as the short-time Fourier transform and the wavelet transform have been widely used in EEG signal analysis. The wavelet transform has the ability to represent local features. The details of an object can be focused on and observed through wavelet decomposition. In this paper, the collected EEG signals are processed by wavelets to extract time domain and frequency domain features. The flowchart of automatic EEG detection is illustrated in Figure 2.
Figure 2. The flowchart of EEG signal analysis