Sleep Apnea
We have several de-identified datasets of EEG recordings from children, collected both in studies of reactions to different stimuli and in sleep apnea studies; these data will be used throughout this project. Generally, EEG data are collected using a Geodesic Sensor Net (Johnson et al., 2001; Tucker, 1993), a system that maps brain activity using a cap containing 128 electrodes that is placed on the subject’s head. This system makes it much easier to collect brain activity data from children. Previously, collecting these types of data required electrodes to be placed on a person’s head one at a time using applicator gel, and most children will not sit still for that procedure. The Geodesic Sensor Net allows all of the electrodes to be placed on a child’s head at once, so data collection is much faster than with the older EEG (electroencephalogram) method. Data are recorded at fixed time intervals, usually measured in seconds. For example, in the data analyzed for any one child, an entire collection of 250 data points can be recorded in under 20 minutes.
Figure 1 shows the positioning of the electrodes for a 128-channel Geodesic Sensor Net.
Figure 1. Geodesic sensor net configuration
The electrodes that run vertically down the center of the figure, from top to bottom, divide the other electrodes into left and right sections. For statistical analyses, all electrode readings from the left side of the brain can be averaged to one value, as can all readings from the right side of the brain. The brain positioning of all electrodes is shown in Table 1.
Table 1.
Position | Brain Position | Electrodes |
FL | Front Left | 18 19 20 22 23 24 25 26 27 28 33 34 39 128 |
FR | Front Right | 1 2 3 4 8 9 10 14 15 121 122 123 124 125 |
CL | Center Left | 7 12 13 21 29 30 31 32 35 36 37 38 41 42 43 46 47 48 51 |
CR | Center Right | 5 81 88 94 98 99 103 104 105 106 107 109 110 111 112 113 117 118 119 |
PL | Parietal Left | 54 61 67 53 60 52 59 58 64 63 |
PR | Parietal Right | 78 79 80 87 86 93 92 97 96 100 |
OL | Occipital Left | 65 66 69 70 71 72 74 75 |
OR | Occipital Right | 77 83 84 85 89 90 91 95 |
TL | Temporal Left | 40 44 45 49 50 56 57 |
TR | Temporal Right | 101 102 108 114 115 116 120 |
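The averaging by region and by hemisphere described above can be sketched in a few lines of Python. This is a minimal illustration, not the analysis code used in the study: the electrode groups are copied from Table 1, while the function names, the dictionary layout (electrode number mapped to a list of readings), and the synthetic readings are assumptions made for the example.

```python
from statistics import mean

# Electrode groups copied from Table 1 (128-channel Geodesic Sensor Net).
REGIONS = {
    "FL": [18, 19, 20, 22, 23, 24, 25, 26, 27, 28, 33, 34, 39, 128],
    "FR": [1, 2, 3, 4, 8, 9, 10, 14, 15, 121, 122, 123, 124, 125],
    "CL": [7, 12, 13, 21, 29, 30, 31, 32, 35, 36, 37, 38, 41, 42, 43,
           46, 47, 48, 51],
    "CR": [5, 81, 88, 94, 98, 99, 103, 104, 105, 106, 107, 109, 110,
           111, 112, 113, 117, 118, 119],
    "PL": [54, 61, 67, 53, 60, 52, 59, 58, 64, 63],
    "PR": [78, 79, 80, 87, 86, 93, 92, 97, 96, 100],
    "OL": [65, 66, 69, 70, 71, 72, 74, 75],
    "OR": [77, 83, 84, 85, 89, 90, 91, 95],
    "TL": [40, 44, 45, 49, 50, 56, 57],
    "TR": [101, 102, 108, 114, 115, 116, 120],
}


def region_average(readings, electrodes):
    """Average the given electrodes at each time point.

    `readings` maps an electrode number to its list of timed readings;
    all lists are assumed to have the same length.
    """
    n = len(readings[electrodes[0]])
    return [mean(readings[e][t] for e in electrodes) for t in range(n)]


def hemisphere_average(readings, side):
    """Average every Table 1 electrode on one side ('L' or 'R')."""
    electrodes = [e for code, group in REGIONS.items()
                  if code.endswith(side) for e in group]
    return region_average(readings, electrodes)


# Synthetic readings for illustration only (not real EEG data):
# electrode e has the value e + t at time point t.
readings = {e: [float(e + t) for t in range(3)] for e in range(1, 129)}

occipital_left = region_average(readings, REGIONS["OL"])
left_side = hemisphere_average(readings, "L")
```

Each call returns one averaged series per region or hemisphere, with the same number of time points as the input readings.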
The EEG data, then, have hundreds, and sometimes thousands, of data points recorded in sequence from each of the net sensors. There may be just a handful of subjects in a study, with each subject contributing many such recordings. The methods used to model the data must be able to accommodate data of this type.
For each electrode i, then, we have a sequence X_{i,1}, X_{i,2}, …, X_{i,n} representing the first through the last timed reading (assuming a total of n readings). This sequence is not a random sample, since X_{i,t} is clearly related to X_{i,t+1}. It is also questionable whether we can assume stationarity, meaning that X_{i,t} and X_{i,t+1} have the same probability distribution; for the purposes of this study, we will make that assumption. Moreover, if electrodes i and j are in the same general location, we must assume that X_{i,t} and X_{j,t} are related in some way.
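One simple way to see that consecutive readings from an electrode are related is to estimate the lag-1 autocorrelation of the series. The sketch below is illustrative only: the function name and the synthetic series are assumptions, and the estimator is the usual sample autocorrelation, which itself presumes a constant mean and variance across the series (in line with the stationarity assumption made above).

```python
import math


def lag_autocorrelation(series, lag=1):
    """Sample autocorrelation between readings `lag` steps apart."""
    n = len(series)
    if not 0 < lag < n:
        raise ValueError("lag must be between 1 and len(series) - 1")
    m = sum(series) / n
    denom = sum((x - m) ** 2 for x in series)
    if denom == 0:
        raise ValueError("series is constant; autocorrelation undefined")
    num = sum((series[t] - m) * (series[t + lag] - m)
              for t in range(n - lag))
    return num / denom


# Synthetic, smoothly varying series of 250 points (not real EEG data).
smooth = [math.sin(2 * math.pi * t / 50) for t in range(250)]
r_smooth = lag_autocorrelation(smooth)

# A series that flips sign at every step, for contrast.
alternating = [1.0 if t % 2 == 0 else -1.0 for t in range(10)]
r_alternating = lag_autocorrelation(alternating)
```

For a random sample, the estimate would be close to zero. A value near 1 (as for the smooth series) or near -1 (as for the alternating series) indicates that neighboring readings are not independent.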
Because of these relationships and the lack of randomness in the variables, we cannot use standard regression techniques to investigate the data: these techniques assume that the observations are independent and identically distributed and that they come from a normal distribution. Such assumptions are clearly false for data collected from EEG monitoring. In the past, attempts have been made to classify, or group, the EEG readings to simplify the problem (Kook, Gupta, Kota, & Molfese, 2007). Another approach, used specifically in hypothesis testing, has been to reduce the sample data to averages and to analyze those averages (Mayes, Molfese, Key, & Hunter, 2005). Such an approach greatly reduces the amount of information from the EEG data that is used in the analysis. By using techniques that were specifically designed for these types of data, and that therefore do not assume independence among the data points, we can greatly expand the amount of knowledge extracted from the data.
The three techniques that will be used in the proposed short course fall under the general topic of data mining. Data mining is a general term for a process of data analysis that begins with the required data preprocessing, continues with exploration and hypothesis generation, and ends with the validation of results and their use in making decisions from the data.