1 Introduction

The pursuit of safety pervades all spheres of society, and as technology evolves in this direction, so do the efforts to defeat security systems. In this context, biometric systems are under constant development, and new ways of capturing traits that are discriminant and robust across people are desirable. The present work deals with the use of electroencephalogram (EEG) signals for the biometry task, since the EEG is difficult to fake or steal.

The seminal work presented in [8] showed the feasibility of using the EEG for the biometric task, and since then many EEG-based approaches have been proposed, such as [2], where the authors performed biometric verification on the Physionet EEG database. They concluded that the best frequency band for EEG biometrics is the gamma band (30–50 Hz), for which they reported an equal error rate (EER) of 4.4%. Their approach is based on phase synchronization, in which the Eigenvector Centrality obtained from every node (subject) is the feature vector. Signals in the resting condition are considered for the analyses in two scenarios: eyes open and eyes closed.

In [12], four different task conditions, related to motor movement and imagery tasks, are investigated. A novel wavelet-based feature was used to represent the EEG. Experiments were conducted on the Physionet EEG data, and a mixture of data from different sessions is used for training. Only nine electrodes are considered, and the lowest EER achieved is 4.5%.

Several machine learning and pattern recognition techniques have been investigated to identify a person by means of EEG signals; however, to the best of our knowledge, deep learning based methods such as Convolutional Neural Networks (CNN) [7] have not been evaluated yet. Deep learning has been used to represent patterns in several computer vision and pattern recognition problems, and outstanding results have been reported [1, 5].

In this work, a novel approach for EEG representation based on deep learning is proposed. The approach is also evaluated on the Physionet database, and data augmentation techniques are explored to train a deep convolutional neural network. Results show that the use of CNN in EEG biometrics is a promising path, outperforming baseline methods by lowering the EER from 4.4% to 0.19% in the best scenario.

The remainder of this paper is organized as follows. Section 2 presents the approach, with the methodology and a description of the database used. In Sect. 3, we show the experimental results and discuss them. Finally, in Sect. 4, the conclusions are presented.

2 Approach

In this section, the Physionet EEG database is described, as well as the proposed method, based on a convolutional network, along with the required preprocessing steps.

2.1 Physionet EEG Database

The Physionet EEG Database [3] is a popular benchmark in the literature for EEG-based biometrics and is publicly availableFootnote 1. The records were acquired from 109 different subjects, using 64 electrodes placed on the scalp (see Fig. 1), each sampled at 160 Hz. The database was created by the developers of the BCI2000 instrumentation systemFootnote 2 and is maintained by Physionet. There are 14 acquisition sessions for each subject, each with different motor/imagery tasks considered during recording. Among those 14 sessions, there are two one-minute (60 or 61 s per record) baseline runs, one with eyes open (EO) and one with eyes closed (EC). The other sessions correspond to four kinds of tasks, each with three two-minute runs.

Fig. 1. Positions of electrodes on the scalp. Source: http://physionet.org/pn4/eegmmidb/.

2.2 Methodology

Data preprocessing: To further investigate the feasibility of the method, only resting-state EEG data is considered. Thus, the baseline sessions, i.e., data captured while the subject rests with eyes open (EO) or eyes closed (EC), are used during the experiments.

All EEG recordings are band-pass filtered into three frequency bands: the first covers delta through gamma frequencies (1–50 Hz), the second covers low to high beta (10–30 Hz), and the third preserves the gamma range (30–50 Hz). A total of 61 s of the raw wave is used for training and testing.
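The filtering step can be sketched as follows. The Butterworth design, the filter order, and the use of zero-phase filtering are assumptions for illustration, since the paper does not specify which filter was used:

```python
import numpy as np
from scipy.signal import butter, filtfilt

FS = 160  # Physionet EEG sampling rate (Hz)

def bandpass(signal, low_hz, high_hz, fs=FS, order=4):
    """Zero-phase Butterworth band-pass filter (design and order are assumptions)."""
    nyq = 0.5 * fs
    b, a = butter(order, [low_hz / nyq, high_hz / nyq], btype="band")
    return filtfilt(b, a, signal, axis=-1)

# The three bands described in the text
bands = {"delta-gamma": (1, 50), "beta": (10, 30), "gamma": (30, 50)}
raw = np.random.randn(64, 61 * FS)          # 64 channels, 61 s of raw EEG
filtered = {name: bandpass(raw, lo, hi) for name, (lo, hi) in bands.items()}
```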

Data Augmentation: To follow the baseline evaluation protocol proposed in [2], the EEG data is divided into 12 s segments (1920 samples), i.e., 5 segments per subject for each record. Since 5 segments per subject are not enough data to train a deep convolutional neural network, a data augmentation technique is proposed here to overcome this issue. The rationale is to allow a large overlap between segments, thereby multiplying their number. The augmented data is created by sliding a 12 s (1920-sample) window over the whole record signal (9600 or 9760 samples), shifting it in 0.125 s increments, i.e., a sliding-window strategy with 20 samples per step [9]. This technique yields 42696 instances for training.
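The sliding-window augmentation can be sketched with a minimal NumPy version. The exact segment count depends on how the window boundaries are handled, so the per-record count produced here may differ slightly from the one reported in the text:

```python
import numpy as np

FS = 160                      # sampling rate (Hz)
WIN = 12 * FS                 # 12 s window = 1920 samples
STEP = 20                     # 0.125 s shift = 20 samples

def augment(record):
    """Slide a 12 s window over a (channels, samples) record in 20-sample steps."""
    n = record.shape[-1]
    starts = range(0, n - WIN + 1, STEP)
    return np.stack([record[..., s:s + WIN] for s in starts])

record = np.random.randn(64, 9600)        # one 60 s baseline run
segments = augment(record)
print(segments.shape)                     # (n_segments, 64, 1920)
```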

Convolutional Neural Network: The architecture of a typical convolutional neural network is structured as a series of stacked operations, beginning with convolutional layers, followed by activation with Rectified Linear Units (ReLu), pooling, normalization and finally fully connected layers (FC) [7].

For this work, three CNN architectures have been investigated. One with small receptive fields in the first convolutional layer inspired by [10] and two others with large receptive fields on the first convolutional layers inspired by [6, 13]. Note that filters are one-dimensional and proportionally adapted to EEG raw signal (see Fig. 2). The width and depth of the networks have been empirically evaluated based on validation error.

Fig. 2. Deep learning model.

After the learning process, the last three layers are removed (Softmax, Dropout, and FC4, as seen in Table 1), and the output of the resulting network is used as the feature vector of a 12 s EEG segment for the verification task.

In the verification task, the performance of the methods is expressed in terms of Detection Error Trade-off (DET) curves, which show the trade-off between type I errors (false acceptance rate, FAR) and type II errors (false rejection rate, FRR). To construct the DET curve, all instances of the testing dataset are compared to each other in an all-against-all scheme. The verification task can be modeled as in Eq. 1, where S is a function that measures the similarity between two feature vectors (\(X_{1}\) and \(X_{2}\)) and t is a predefined threshold [4]. The value \(S(X_{1}, X_{2})\) is the similarity or matching score between the biometric measurements, computed with the Euclidean distance. A claimed identity is classified as genuine when the pair is similar enough, and as impostor otherwise. The genuine (intra-class) and impostor (inter-class) distribution curves are then generated from the similarity scores.

$$\begin{aligned} (X_1, X_{2}) \in {\left\{ \begin{array}{ll}genuine, &{} \text {if } S(X_{1}, X_{2})\ge t \\ impostor, &{} \text {otherwise} \end{array}\right. } \end{aligned}$$
(1)
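Equation 1 can be illustrated with the sketch below. The mapping from Euclidean distance to a similarity score (here 1/(1 + d)) is an assumption, as the paper only states that the matching score is derived from the Euclidean distance:

```python
import numpy as np

def similarity(x1, x2):
    """Similarity score from Euclidean distance.
    The mapping 1/(1 + d) is an assumption for illustration."""
    d = np.linalg.norm(np.asarray(x1) - np.asarray(x2))
    return 1.0 / (1.0 + d)

def verify(x1, x2, t):
    """Eq. 1: a pair is genuine when S(x1, x2) >= t, impostor otherwise."""
    return "genuine" if similarity(x1, x2) >= t else "impostor"

a = np.array([0.1, 0.9, 0.3])
print(verify(a, a + 0.01, t=0.9))   # nearly identical feature vectors
print(verify(a, a + 10.0, t=0.9))   # distant feature vectors
```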
Table 1. Architecture for EEG biometry.

3 Experimental Results and Discussion

Experiments were conducted on an Intel (R) Core i7-5820K CPU @ 3.30 GHz 12-core machine, 64 GB of DDR4 RAM and one GeForce GTX TITAN X GPU. The MatConvNet library is used for the convolutional networks [11] linked to NVIDIA CuDNN.

Data segmentation for experiments was performed following the evaluation proposed in [2], where the window size consists of 12 s (as detailed in Sect. 2.2).

Data augmentation is used on the data from the EO session (training data), yielding 384 or 392 segments (from the 60 and 61 s records, respectively) of 1920 samples (12 s) per subject. For evaluation, five non-overlapping segments of 12 s are extracted per subject from the EC session (see Fig. 3).

During training, the input signal, a 12 s EEG time series, is fed forward through the network layers. Each layer represents one or more CNN operations: convolutional filtering, pooling, striding, rectification (ReLU), and normalization (L2 norm). Convolutional stride and padding are set to one. The pooling layers perform a max-pooling operation, and any down-sampling (\(stride > 1\)) happens in conjunction with pooling. The stack of layers is followed by fully-connected (FC) layers, with the last FC layer used for classification. These FC layers can be seen as a multi-layer perceptron (MLP) network.
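The per-layer operations described above (one-dimensional convolution, ReLU rectification, and max-pooling) can be sketched in NumPy. The filter length is illustrative and not taken from the architecture in Table 1:

```python
import numpy as np

def conv1d(x, w, stride=1):
    """Valid 1-D convolution (cross-correlation, as in CNN libraries)."""
    n = (len(x) - len(w)) // stride + 1
    return np.array([np.dot(x[i * stride:i * stride + len(w)], w) for i in range(n)])

def relu(x):
    """Rectified linear activation."""
    return np.maximum(x, 0)

def max_pool(x, size=2, stride=2):
    """Max-pooling with down-sampling when stride > 1."""
    n = (len(x) - size) // stride + 1
    return np.array([x[i * stride:i * stride + size].max() for i in range(n)])

x = np.random.randn(1920)          # one channel of a 12 s segment
w = np.random.randn(11)            # an illustrative 1-D receptive field
out = max_pool(relu(conv1d(x, w)))
print(out.shape)
```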

The final layer is a softmax loss layer. An FC layer with \(1\times 1\) filter size is used for dimension reduction, followed by rectified linear activation. The network architecture is presented in Table 1.

For training the network, three learning rates, \(L = [0.01, 0.001, 0.0001]\), are distributed over the epochs, the mini-batch size is set to 100, and a momentum coefficient of 0.9 is used throughout training. Filter weights are randomly initialized, and stochastic gradient descent is used for optimization. A dropout operation with a 10% rate is placed before the last layer to reduce over-fitting.
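The optimization step can be sketched as a plain SGD-with-momentum update, using the momentum coefficient and learning-rate schedule stated above; the toy quadratic loss is purely illustrative:

```python
import numpy as np

def sgd_momentum_step(w, grad, velocity, lr, mu=0.9):
    """One SGD update with momentum coefficient mu = 0.9, as in the training setup."""
    velocity = mu * velocity - lr * grad
    return w + velocity, velocity

# Illustrative schedule: the three learning rates distributed over the epochs
schedule = [0.01, 0.001, 0.0001]
w, v = np.ones(4), np.zeros(4)
for lr in schedule:
    grad = 2 * w                     # gradient of a toy quadratic loss ||w||^2
    w, v = sgd_momentum_step(w, grad, v, lr)
print(w)                             # weights move toward the minimum at 0
```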

During the training phase, 90% of the data is reserved for training and 10% for validation, as shown in Fig. 3. The CNNs are trained over 60 epochs.

Fig. 3. The distribution of EEG segments used for training and testing.

The evaluation is carried out in verification mode, and the metric used to report results is the Equal Error Rate (EER), defined as the point where the False Acceptance Rate (FAR) equals the False Rejection Rate (FRR). FAR and FRR are generated from intra-class and inter-class pair comparisons. The present protocol produces 1086 genuine (intra-class) pairs and 146610 impostor (inter-class) pairs. In Fig. 4, the DET curve shows the relationship between FAR, FRR, and the EER as the threshold varies.
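Computing the EER from the genuine and impostor score distributions can be sketched as follows; the score distributions below are synthetic, used only to exercise the code, not the paper's actual scores:

```python
import numpy as np

def eer(genuine, impostor):
    """Equal Error Rate: the operating point where FAR (impostors accepted)
    equals FRR (genuines rejected) as the threshold is swept over all scores."""
    genuine, impostor = np.sort(genuine), np.sort(impostor)
    thresholds = np.sort(np.concatenate([genuine, impostor]))
    frr = np.searchsorted(genuine, thresholds, side="left") / len(genuine)
    far = 1.0 - np.searchsorted(impostor, thresholds, side="left") / len(impostor)
    i = np.argmin(np.abs(far - frr))
    return (far[i] + frr[i]) / 2.0

rng = np.random.default_rng(0)
genuine = rng.normal(0.8, 0.05, 1086)      # 1086 genuine pairs, as in the protocol
impostor = rng.normal(0.4, 0.10, 146610)   # 146610 impostor pairs
print(f"EER = {eer(genuine, impostor):.2%}")
```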

Table 2. EER obtained for the specified frequency bands.
Fig. 4. DET curve for the proposed experiments.

The DET curves in Fig. 4 depict the performance of the detailed experiments. The curve for the 30–50 Hz band corresponds to an overall performance of 0.19% EER, as shown in Table 2, surpassing results published in the literature. As shown by Fraschini et al. [2], the best frequency band for EEG biometrics is the gamma band. The results presented here confirm the findings in [2] regarding the frequency band; however, the gap between the gamma band and the other frequency bands was larger here. More experiments are needed to investigate whether this phenomenon extends to other tasks (T1–T4) or even other databases.

The results presented in Table 3 compare the proposed method with state-of-the-art approaches. As can be noticed, the proposed method significantly reduces the EER. The use of all 64 EEG channels shows the robustness of the method, since it was able to handle all electrodes even though not all of them effectively contribute to the identification of individuals [12].

Table 3. Comparison with related works.

4 Conclusions

In this work, the use of CNN in an EEG-based biometric system is investigated for the first time. When compared to the baseline methods presented in the literature (under the Physionet EEG database), EEG data represented by the CNN model showed a lower EER for person recognition (verification mode).

The contribution of this paper is the proposed deep CNN architecture and the data augmentation technique, which is of paramount importance in the training process. The sliding window strategy for generating new training samples allowed the deep network architecture to learn efficiently even with reduced data.

Results showed that the proposed EEG-based biometric system is a promising approach for future real-world applications, since researchers are developing hardware that facilitates embedding CNN models, such as FPGA-based deep learning accelerators and the NVIDIA TX1Footnote 3.