EEG Signal Classification in Asynchronous BCIs
A Journey from Time Embedding to Convolutional Networks

2016 International BCI Meeting Workshop on
Deep Learning and other Machine Learning and Signal Processing
Methods for Analyzing EEG in BCI Paradigms

Elliott Forney, Charles Anderson, William Gavin and Patricia Davies

Asynchronous BCIs do not involve time-locked stimuli.

Instead, the user can spontaneously perform one of several mental tasks in order to send instructions or achieve fluid control of an assistive device.

Motor imagery is a common example; however, our group often works with a variety of other mental tasks.

These other tasks can produce a diverse set of control signals.

In general, classification of EEG signals is a challenging problem for several reasons:

  • noise and artifacts
  • high dimensionality
  • limited training data (under-sampled)
  • spatial and temporal patterns
  • non-linear and non-stationary patterns
  • difficulty of interpretation
  • requirements for real-time processing

Asynchronous BCIs are especially challenging because they require

  • time-invariance

In other words, we do not know when the task begins or ends.

I plan to give a brief overview of my research on this topic:

  • Power Spectral Densities
  • Autoregressive Models
  • Time-Embedded Models
  • Deep Convolutional Networks

My goal is to spark discussion and new ideas about the best ways to represent and classify EEG signals that are not time-locked.

First, the preamble to load matplotlib, NumPy, SciPy and CEBL3.

In [1]:
import matplotlib
import matplotlib.pyplot as plt
%matplotlib inline
%config InlineBackend.figure_format = 'svg'
matplotlib.rcParams['figure.figsize'] = (9,4.5)

import numpy as np
import scipy.signal as spsig

from cebl import *

Next, let's load some data.

This EEG was recorded using an eight-channel g.tec g.MOBILab+ amplifier from a participant with quadriplegia due to a C4-complete spinal cord lesion. Data were collected in his home.

The participant performed five ten-second trials of each of four mental tasks:

  • Silently count backward from 100 by 3's
  • Imagine making a right-handed fist
  • Silently sing a favorite song
  • Visualize a rotating Rubik's cube

About 50 seconds of EEG per task (not much).

Data is available at

In [22]:
chanNames = ('F3','F4','C3','C4','P3','P4','O1','O2')

# load the five pickled trials into CEBL EEG objects
trials = [eeg.EEGFromPickledMatrix('s11-mentaltasks-trial' + str(i) + '.pkl',
                                   chanNames=chanNames, sampRate=256, transpose=True)
          for i in range(1,6)]

# plot the first ten seconds of the first trial
trials[0].plotTrace(0,10, scale=35)
print trials[0].getSampRate(), trials[0].getNSec()
256.0 61.01171875

CEBL3 uses a marker channel to identify the beginning and end of each trial.

Next, we split the data into two-second segments with 50% overlap. This yields 45 two-second segments per task.

These segments are what we will examine and classify.
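Conceptually, this just slides a two-second window along each ten-second trial, advancing one second at a time. A quick NumPy sketch of the windowing (illustration only; the segment_overlap helper below is hypothetical, and CEBL's split does this for us):

import numpy as np

def segment_overlap(data, sampRate, segSeconds=2.0, overlap=0.5):
    # split a (nObs, nChan) array into overlapping segments
    segLen = int(segSeconds * sampRate)       # samples per segment
    step = int(segLen * (1.0 - overlap))      # hop size; 50% overlap -> half a segment
    starts = range(0, data.shape[0] - segLen + 1, step)
    return np.array([data[s:s+segLen] for s in starts])

# a ten-second, eight-channel trial at 256 Hz yields 9 two-second segments
print segment_overlap(np.random.randn(10*256, 8), 256).shape  # (9, 512, 8)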

In [23]:

# extract the ten-second task segments from each trial
segs = [cls.segment(0,10) for cls in trials]

# combine the segments from all five trials into a single object
# (assuming append joins segmented EEG objects in place)
for trial in segs[1:]:
    segs[0].append(trial)
segs = segs[0]

# select segments by class marker and split into two-second windows with 50% overlap
classData = [segs.selectChr(c) for c in ('C', 'F', 'R', 'S')]
classData = [cls.split(2, 0.5) for cls in classData]

print [cls.getNSeg() for cls in classData]
[45, 45, 45, 45]

One of the most straightforward ways to construct a time-invariant representation of an EEG signal is to use the Discrete Fourier Transform (DFT) to represent the signal in the frequency domain, i.e., as a weighted sum of complex sinusoids.

By discarding the phase information and performing some scaling, we can find the Power Spectral Density (PSD) of the signal. The PSD is time-invariant and expresses the signal in terms of signal power $(\mu V^2)$ across a range of frequencies.
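As a rough illustration of the idea, a periodogram-style PSD for a single channel can be computed directly from the DFT by discarding the phase and applying the appropriate scaling. This is only a sketch (the raw_psd helper is hypothetical); CEBL's sig module handles windowing, scaling and Welch averaging for us:

import numpy as np

def raw_psd(x, sampRate):
    x = np.asarray(x, dtype=float)
    n = x.shape[0]
    spectrum = np.fft.rfft(x)                     # complex DFT (positive frequencies)
    power = np.abs(spectrum)**2 / (sampRate * n)  # discard phase, scale to power density
    power[1:-1] *= 2.0                            # fold in the negative frequencies
    freqs = np.fft.rfftfreq(n, d=1.0/sampRate)
    return freqs, power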

In [24]:
#psdKwargs = {'method': 'raw'}
psdKwargs = {'method': 'welch', 'span': 1.0}
# class markers: C=count, F=fist, R=rotating cube, S=song
labels = ('Count', 'Fist', 'Cube', 'Song')

classData[0].plotAvgPSDByChan(label=labels[0], lowFreq=0.5, highFreq=40, linewidth=2, alpha=0.5,
                              plotChanNames=False, psdKwargs=psdKwargs);
for i, label in enumerate(labels[1:], start=1):
    classData[i].plotAvgPSDByChan(label=label, lowFreq=0.5, highFreq=40, linewidth=2, alpha=0.5,
                                  ax=plt.gca(), psdKwargs=psdKwargs,
                                  plotChanNames=True if label == labels[-1] else False);
plt.legend(loc='lower right');

Some subtle differences around the $\alpha$, $\beta$ and $\gamma$ bands.

Next, we arrange the powers into four matrices, one per class, with a row for each segment and a column for the power at each frequency and channel.

In [25]:
# powers for each segment: (nSeg, nFreq, nChan) per class
classPowers = [np.array([pwr.getPowers() for pwr in cls.psd(span=0.5)])
               for cls in classData]
freqs = classData[0].psd(span=0.5)[0].getFreqs()

# keep only frequencies between 0.5 and 40 Hz
highMask = freqs < 40.0
lowMask = freqs > 0.5
freqMask = lowMask & highMask

classPowers = [cls[:,freqMask,:] for cls in classPowers]

# flatten the frequency and channel dimensions into one feature vector per segment
classPowers = [cls.reshape((cls.shape[0],-1), order='F') for cls in classPowers]

# log-transform the powers; first 36 segments per class for training, last 9 for testing
trainData = [np.log10(cls[:36]) for cls in classPowers]
testData = [np.log10(cls[36:]) for cls in classPowers]

print trainData[0].shape, testData[0].shape
(36, 152) (9, 152)

Classify using LDA with shrinkage toward the average eigenvalue of the average covariance matrix.

In our experience, other classifiers don't do much better.

Dimensionality was reduced using decimation and the spectra were smoothed using Welch's method.
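As a small aside, "shrinkage toward the average eigenvalue" blends the estimated covariance matrix with a scaled identity matrix whose diagonal equals the mean of the covariance's eigenvalues. A minimal NumPy sketch of that regularization (illustration only; the shrink_cov helper is hypothetical and ml.LDA performs this internally):

import numpy as np

def shrink_cov(cov, shrinkage):
    # the mean eigenvalue equals the trace divided by the dimension
    avgEig = np.trace(cov) / float(cov.shape[0])
    # blend the covariance with a scaled identity matrix
    return (1.0 - shrinkage)*cov + shrinkage*avgEig*np.eye(cov.shape[0])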

In [26]:
nFold = 10
shrinkages = np.linspace(0.0, 1.0, 21)
trnCA = np.zeros(shrinkages.shape)
valCA = np.zeros(shrinkages.shape)

for fold, trn, val in ml.part.classStratified(trainData, nFold):
    print 'fold: ', fold
    for i, sh in enumerate(shrinkages):
        model = ml.LDA(trn, shrinkage=sh)
        # accumulate train and validation classification accuracy across folds
        # (assuming ml.LDA.ca returns classification accuracy, like confusion and itr below)
        trnCA[i] += model.ca(trn)
        valCA[i] += model.ca(val)

trnCA /= nFold
valCA /= nFold

bestShrinkage = shrinkages[valCA.argmax()]
print 'bestShrinkage: ', bestShrinkage

model = ml.LDA(trainData, shrinkage=bestShrinkage)

print 'Test Confusion Matrix:\n', model.confusion(testData)
print 'Test Classification Accuracy: ', model.ca(testData)
print 'Information Transfer Rate: ', model.itr(testData, decisionRate=60)

plt.plot(shrinkages, trnCA, linewidth=2, label='Train')
plt.plot(shrinkages, valCA, linewidth=2, label='Validation')
plt.xlabel('Shrinkage')
plt.ylabel('Classification Accuracy')
plt.legend();
fold:  0
fold:  1
fold:  2
fold:  3
fold:  4
fold:  5
fold:  6
fold:  7
fold:  8
fold:  9
bestShrinkage:  0.55
Test Confusion Matrix:
[[ 0.7778  0.1111  0.      0.    ]
 [ 0.2222  0.7778  0.5556  0.1111]
 [ 0.      0.1111  0.2222  0.1111]
 [ 0.      0.      0.2222  0.7778]]
Test Classification Accuracy:  0.638888888889
Information Transfer Rate:  29.0430479397

PSDs have a limited ability to represent some types of patterns. For example, phase information is discarded.

In [2]:
sampRate = 100
t = np.arange(0.0, 10.0, 1.0/sampRate)
s1 = np.sin(5*np.pi*t)
s2 = np.sin(5*np.pi*t + np.pi)

fig = plt.figure()
sigAx = fig.add_subplot(1,2,1)
sigAx.plot(t, s1, linewidth=2, label=r'sin($5\pi t$)', color='blue')
sigAx.plot(t, s2-2.2, linewidth=2, label=r'sin($5\pi t + \pi$)', color='red')
sigAx.set_xlim((0.0, np.pi))

psd1 = sig.RawPSD(s1, sampRate=sampRate)
psd2 = sig.RawPSD(s2, sampRate=sampRate)

psdAx = fig.add_subplot(1,2,2)
psd1.plotPower(linewidth=4, label=r'sin($5\pi t$)', color='blue', ax=psdAx)
psd2.plotPower(linewidth=2, label=r'sin($5\pi t + \pi$)', color='red', ax=psdAx)
psdAx.set_xlim((0.0, 10.0))
psdAx.set_ylim((1e-15, 1e-2))

PSDs are also not well-suited for representing non-linear and non-stationary patterns.

In [8]:
s1 = np.exp(np.sin(np.pi*t))
s2 = spsig.square(np.pi*t)

fig = plt.figure()
sigAx = fig.add_subplot(1,2,1)
sigAx.plot(t, s1, linewidth=2, label=r'exp(sin($\pi t$))', color='blue')
sigAx.plot(t, s2-0.8, linewidth=2, label=r'square($\pi t$)', color='red')
sigAx.set_xlim((0.0, 10.0))

psd1 = sig.RawPSD(s1, sampRate=sampRate)
psd2 = sig.RawPSD(s2, sampRate=sampRate)

psdAx = fig.add_subplot(1,2,2)
psd1.plotPower(linewidth=2, label=r'exp(sin($\pi t$))', color='blue', ax=psdAx)
psd2.plotPower(linewidth=2, label=r'square($\pi t$)', color='red', ax=psdAx)
psdAx.set_xlim((0.0, 10.0))
psdAx.set_ylim((1e-15, 1e-2))

Autoregressive (AR) models are another approach for constructing time-invariant representations. AR models represent each signal value as a function of past values. In general, this can be written as

$$\mathbf{S}(t) = f(\mathbf{S}(t-1), \mathbf{S}(t-2), \ldots, \mathbf{S}(t-d)) + \mathbf{\xi}(t)$$

where $\mathbf{S}(t)$ is a vector of signal values at time $t$, $d$ is the model order, $f$ is a chosen transfer function and $\mathbf{\xi}(t)$ is the vector of model residuals.
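For a single channel with a linear choice of $f$, the coefficients can be fit with ordinary least squares on time-lagged copies of the signal. A minimal sketch (the fit_ar and predict_ar helpers are hypothetical; ml.AR handles the multivariate case for us):

import numpy as np

def fit_ar(x, order):
    # each row of X holds the previous `order` values, y holds the value that follows
    x = np.asarray(x, dtype=float)
    X = np.array([x[i:i+order] for i in range(len(x)-order)])
    y = x[order:]
    coefs = np.linalg.lstsq(X, y)[0]
    return coefs

def predict_ar(x, coefs):
    # one-step-ahead predictions from the fitted coefficients
    x = np.asarray(x, dtype=float)
    order = len(coefs)
    return np.array([np.dot(x[i:i+order], coefs) for i in range(len(x)-order)])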

In [28]:
# first 36 segments per class for training, the remaining 9 for testing
trainData = [cls[:36] for cls in classData]
testData = [cls[36:] for cls in classData]

stand = ml.ClassSegStandardizer(trainData)
trainData = stand.apply(trainData)
testData = stand.apply(testData)

order = 10
model = ml.AR(trainData[0], order=order)
preds, resids = model.eval(trainData[0], returnResid=True)

# overlay the AR model's one-step-ahead predictions on one training segment
sep = util.colsep(trainData[0][1], scale=2.0)
plt.plot(trainData[0][1][order:] + sep, color='grey', linewidth=4)
plt.plot(preds[1][:-order] + sep, color='red', linewidth=2)
legend = plt.legend(['EEG', 'AR'])