Test a Deep Neural Network with Captured Data to Detect WLAN Router Impersonation
This example shows how to train a radio frequency (RF) fingerprinting convolutional neural network (CNN) with captured data. You capture wireless local area network (WLAN) beacon frames from real routers using a software-defined radio (SDR). You program a second SDR to transmit unknown beacon frames and capture them. You train the CNN using these captured signals. You then program an SDR as a router impersonator that transmits beacon signals with the media access control (MAC) address of one of the known routers, and use the CNN to identify it as an impersonator.
For more information on router impersonation and validation of the network design with simulated data, see the Design a Deep Neural Network with Simulated Data to Detect WLAN Router Impersonation (Communications Toolbox) example.
Train with Captured Data
Collect a dataset of 802.11a/g/n/ac OFDM non-high throughput (non-HT) beacon frames from real WLAN routers. As described in the Design a Deep Neural Network with Simulated Data to Detect WLAN Router Impersonation (Communications Toolbox) example, only the legacy long training field (L-LTF) present in each preamble is used as a training unit, to avoid any data dependency.
In this example, the data was collected using the scenario depicted in the following figure. The observer is a stationary ADALM-PLUTO radio. Known router data was collected as follows:
1. Set the observer's center frequency based on the WLAN channel used by the routers.
2. Receive a beacon frame.
3. Extract the L-LTF signal.
4. Decode the MAC address to use as the label.
5. Save the L-LTF signal together with its label.
6. Repeat steps 2-5 to collect numFramesPerRouter frames from numKnownRouters routers.
Unknown router beacon frames are simulated using a mobile ADALM-PLUTO radio as a transmitter. This radio repeatedly transmits beacon frames with a random MAC address. Unknown router data was collected as follows:
1. Generate beacon frames with a random MAC address.
2. Start transmitting the beacon frames repeatedly using the ADALM-PLUTO radio.
3. Collect NUMFRAMES beacon frames.
4. Extract the L-LTF signal.
5. Save the L-LTF frames with label "Unknown".
6. Move the radio to another location.
7. Repeat steps 3-6 to collect data from NUMLOC locations.
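Step 1 of the unknown router procedure needs a random MAC address. The following is a minimal sketch of one way to build such an address as a 12-digit hexadecimal string, the same format as the addresses used later in this example. The variable names are hypothetical, and the helper functions shipped with the example may generate the address differently.

```matlab
% Hypothetical sketch: build a random 12-digit hexadecimal MAC address
rng(0)                                  % fix the seed so the sketch is repeatable
macBytes = randi([0 255],1,6);          % six random octets
macChars = dec2hex(macBytes,2);         % 6-by-2 char array, one row per octet
% Transpose before reshaping so that the two digits of each octet stay adjacent
randomMAC = string(reshape(macChars.',1,[]));
disp(randomMAC)
```

A string in this format can be assigned to the Address2 property of a wlanMACFrameConfig object, as done in the Waveform Generation section.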
This combined dataset of known and unknown routers is used to train the same DL model as in the Design a Deep Neural Network with Simulated Data to Detect WLAN Router Impersonation (Communications Toolbox) example.
This example downloads the training data and the trained network from https://www.mathworks.com/supportfiles/spc/RFFingerprinting/RFFingerprintingCapturedData_R2024b.tar. If you do not have an Internet connection, download the file manually on a computer that is connected to the Internet and save it to the same directory as the current example files. For privacy reasons, MAC addresses have been anonymized in the downloaded data. To replicate the results of this example, capture your own data as described in Appendix: Known and Unknown Router Data Collection.
rfFingerprintingDownloadData('captured')
Files already exist. Skipping download.
To run this example quickly, use the downloaded pretrained network. To train the network on your computer, choose the "Train network now" option (that is, set trainNow to true). Training this network takes about 25 seconds with an NVIDIA® GeForce RTX 3080 GPU and about 2 minutes with an Intel® Xeon W-2133 CPU @ 3.6 GHz.
trainNow = false; %#ok<*UNRCH>
This example uses data from four known routers. The dataset contains 3600 frames per router, where 90% is used as training frames and 10% is used as test frames.
numKnownRouters = 4;
numFramesPerRouter = 3600;
numTrainingFramesPerRouter = numFramesPerRouter * 0.9;
numTestFramesPerRouter = numFramesPerRouter * 0.1;
frameLength = 160;
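The dataset sizes implied by these parameters can be worked out directly. The sketch below repeats the arithmetic for the known routers only. It uses round to guard against floating-point residue in the 90% product, which is a precaution added for this illustration and not part of the example code.

```matlab
% Worked arithmetic for the known-router dataset
numKnownRouters = 4;
numFramesPerRouter = 3600;
frameLength = 160;                      % samples per L-LTF

numTrainingFramesPerRouter = round(numFramesPerRouter*0.9);                  % 3240
numTestFramesPerRouter = numFramesPerRouter - numTrainingFramesPerRouter;    % 360

numTrainingFrames = numTrainingFramesPerRouter*numKnownRouters;   % 12960
numTestFrames = numTestFramesPerRouter*numKnownRouters;           % 1440
numTrainingSamples = numTrainingFrames*frameLength;               % 2073600

fprintf('Training frames: %d, test frames: %d, training samples: %d\n', ...
    numTrainingFrames,numTestFrames,numTrainingSamples)
```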
Preprocess Known and Unknown Router Data
Separate collected complex baseband data into its in-phase and quadrature components and reshape it into a frameLength x 2 x 1 x numFramesPerRouter*numKnownRouters matrix. Repeat the same process for the unknown router data. The following code uses previously collected and pre-processed data. To use your own data, first collect data as described in Appendix: Known and Unknown Router Data Collection. Copy the new data files named rfFingerprintingCapturedDataUser.mat and rfFingerprintingCapturedUnknownFramesUser.mat to the same directory as this example. Then update the load commands to load these files.
if trainNow
  % Load known router data
  load('rfFingerprintingCapturedData.mat')

  % Create label vectors
  yTrain = repelem(MACAddresses.',numTrainingFramesPerRouter);
  yTest = repelem(MACAddresses.',numTestFramesPerRouter);

  % Separate between I and Q
  numTrainingSamples = numTrainingFramesPerRouter*numKnownRouters*frameLength;
  xTrainingFrames = xTrainingFrames(1:numTrainingSamples,1);
  xTrainingFrames = [real(xTrainingFrames),imag(xTrainingFrames)];
  numTestSamples = numTestFramesPerRouter*numKnownRouters*frameLength;
  xTestFrames = xTestFrames(1:numTestSamples,1);
  xTestFrames = [real(xTestFrames),imag(xTestFrames)];

  % Reshape dataset into a frameLength x 2 x 1 x
  % numTrainingFramesPerRouter*numKnownRouters matrix
  xTrainingFrames = permute( ...
    reshape(xTrainingFrames,[frameLength,numTrainingFramesPerRouter*numKnownRouters,2,1]), ...
    [1 3 4 2]);

  % Reshape dataset into a frameLength x 2 x 1 x
  % numTestFramesPerRouter*numKnownRouters matrix
  xTestFrames = permute( ...
    reshape(xTestFrames,[frameLength,numTestFramesPerRouter*numKnownRouters,2,1]), ...
    [1 3 4 2]);

  % Load unknown router data
  load('rfFingerprintingCapturedUnknownFrames.mat')

  % Number of training units
  numUnknownFrames = size(unknownFrames,4);

  % Split data into 90% training and 10% test
  numUnknownTrainingFrames = floor(numUnknownFrames*0.9);
  numUnknownTest = numUnknownFrames - numUnknownTrainingFrames;

  % Add ADALM-PLUTO data into training and test datasets
  xTrainingFrames(:,:,:,(1:numUnknownTrainingFrames) + numTrainingFramesPerRouter*numKnownRouters) = ...
    unknownFrames(:,:,:,1:numUnknownTrainingFrames);
  xTestFrames(:,:,:,(1:numUnknownTest) + numTestFramesPerRouter*numKnownRouters) = ...
    unknownFrames(:,:,:,(1:numUnknownTest) + numUnknownTrainingFrames);

  % Shuffle data
  vr = randperm(numKnownRouters*numTrainingFramesPerRouter+numUnknownTrainingFrames);
  xTrainingFrames = xTrainingFrames(:,:,:,vr);

  % Add "unknown" label and shuffle
  yTrain = [yTrain; repmat("Unknown",[numUnknownTrainingFrames,1])];
  MACAddresses = unique(yTrain(vr));   % List of unique MAC addresses for prediction
  yTrain = categorical(yTrain(vr));
  yTest = [yTest; repmat("Unknown",[numUnknownTest,1])];
  yTest = categorical(yTest);
  uniqueMACAddresses = unique(yTrain);
end
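The reshape and permute steps are easier to follow on a small array. The sketch below applies the same two operations to a toy complex vector holding three frames of four samples each. The values are synthetic and chosen only so that each sample can be traced through to the output.

```matlab
% Toy illustration of the I/Q separation and reshape used above
frameLength = 4;                        % toy value (the example uses 160)
numFrames = 3;                          % toy value

% Complex column vector: real parts 1..12, imaginary parts 101..112
x = (1:frameLength*numFrames).' + 1i*(101:100+frameLength*numFrames).';

% Separate I and Q into two columns
xIQ = [real(x),imag(x)];                % (frameLength*numFrames)-by-2

% Reshape to frameLength x numFrames x 2 x 1, then reorder the dimensions
% to frameLength x 2 x 1 x numFrames
X = permute(reshape(xIQ,[frameLength,numFrames,2,1]),[1 3 4 2]);

disp(size(X))                           % 4 2 1 3
disp(X(:,:,1,2))                        % frame 2: I in column 1, Q in column 2
```

Each page X(:,:,1,n) holds one frame, with the in-phase samples in the first column and the quadrature samples in the second, which matches the 160×2×1 image input of the network.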
Train the CNN
Use the same NN architecture and training options as in the training with simulated data example.
poolSize = [2 1];
strideSize = [2 1];
layers = [
  imageInputLayer([frameLength 2 1],'Normalization','none','Name','Input Layer')

  convolution2dLayer([7 1],50,'Padding',[1 0],'Name','CNN1')
  batchNormalizationLayer('Name','BN1')
  leakyReluLayer('Name','LeakyReLu1')
  maxPooling2dLayer(poolSize,'Stride',strideSize,'Name','MaxPool1')

  convolution2dLayer([7 2],50,'Padding',[1 0],'Name','CNN2')
  batchNormalizationLayer('Name','BN2')
  leakyReluLayer('Name','LeakyReLu2')
  maxPooling2dLayer(poolSize,'Stride',strideSize,'Name','MaxPool2')

  fullyConnectedLayer(256,'Name','FC1')
  leakyReluLayer('Name','LeakyReLu3')
  dropoutLayer(0.5,'Name','DropOut1')

  fullyConnectedLayer(80,'Name','FC2')
  leakyReluLayer('Name','LeakyReLu4')
  dropoutLayer(0.5,'Name','DropOut2')

  fullyConnectedLayer(numKnownRouters+1,'Name','FC3')
  softmaxLayer('Name','SoftMax')
  ]
layers =
  17×1 Layer array with layers:

     1   'Input Layer'   Image Input           160×2×1 images
     2   'CNN1'          2-D Convolution       50 7×1 convolutions with stride [1 1] and padding [1 1 0 0]
     3   'BN1'           Batch Normalization   Batch normalization
     4   'LeakyReLu1'    Leaky ReLU            Leaky ReLU with scale 0.01
     5   'MaxPool1'      2-D Max Pooling       2×1 max pooling with stride [2 1] and padding [0 0 0 0]
     6   'CNN2'          2-D Convolution       50 7×2 convolutions with stride [1 1] and padding [1 1 0 0]
     7   'BN2'           Batch Normalization   Batch normalization
     8   'LeakyReLu2'    Leaky ReLU            Leaky ReLU with scale 0.01
     9   'MaxPool2'      2-D Max Pooling       2×1 max pooling with stride [2 1] and padding [0 0 0 0]
    10   'FC1'           Fully Connected       256 fully connected layer
    11   'LeakyReLu3'    Leaky ReLU            Leaky ReLU with scale 0.01
    12   'DropOut1'      Dropout               50% dropout
    13   'FC2'           Fully Connected       80 fully connected layer
    14   'LeakyReLu4'    Leaky ReLU            Leaky ReLU with scale 0.01
    15   'DropOut2'      Dropout               50% dropout
    16   'FC3'           Fully Connected       5 fully connected layer
    17   'SoftMax'       Softmax               softmax
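To see how the 160×2 input shrinks on its way to the first fully connected layer, the feature-map size after each convolution and pooling layer can be computed with the usual output-size formula, floor((n + 2p - k)/s) + 1, where n is the input size, k the filter size, p the padding on each side, and s the stride. The sketch below applies it to the layer settings listed above. The helper name outSize is hypothetical.

```matlab
% Feature-map size through the convolution and pooling layers
outSize = @(n,k,p,s) floor((n + 2*p - k)/s) + 1;   % hypothetical helper

h = 160; w = 2;                                    % input: 160×2×1

h = outSize(h,7,1,1); w = outSize(w,1,0,1);        % CNN1, 7×1, padding [1 0]
fprintf('After CNN1:     %d x %d x 50\n',h,w)
h = outSize(h,2,0,2); w = outSize(w,1,0,1);        % MaxPool1, 2×1, stride [2 1]
fprintf('After MaxPool1: %d x %d x 50\n',h,w)
h = outSize(h,7,1,1); w = outSize(w,2,0,1);        % CNN2, 7×2, padding [1 0]
fprintf('After CNN2:     %d x %d x 50\n',h,w)
h = outSize(h,2,0,2); w = outSize(w,1,0,1);        % MaxPool2, 2×1, stride [2 1]
fprintf('After MaxPool2: %d x %d x 50\n',h,w)

numFeatures = h*w*50;                              % inputs to FC1
fprintf('Inputs to FC1:  %d\n',numFeatures)
```

Under these settings the second convolution collapses the I and Q columns into a single column, so FC1 receives 37×1×50 = 1850 features.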
Configure the training options to use the Adam optimizer with a mini-batch size of 256. Use the test frames for validation, since the hyperparameters were already optimized in [1].
By default, ExecutionEnvironment is set to 'auto', which uses a GPU for training if one is available. Otherwise, trainnet uses the CPU for training. To explicitly set the execution environment, set ExecutionEnvironment to one of 'cpu', 'gpu', 'multi-gpu', or 'parallel'.
if trainNow
  miniBatchSize = 256;
  iterPerEpoch = floor((numTrainingFramesPerRouter*numKnownRouters + numUnknownTrainingFrames)/miniBatchSize);

  % Training options
  options = trainingOptions('adam', ...
    'MaxEpochs', 12, ...
    'ValidationData', {xTestFrames, yTest}, ...
    'ValidationFrequency', iterPerEpoch, ...
    'Verbose', false, ...
    'InitialLearnRate', 0.002, ...
    'LearnRateSchedule', 'piecewise', ...
    'LearnRateDropFactor', 0.5, ...
    'LearnRateDropPeriod', 2, ...
    'MiniBatchSize', miniBatchSize, ...
    'Plots', 'training-progress', ...
    'Shuffle', 'every-epoch', ...
    'Metrics', "accuracy", ...
    'ExecutionEnvironment', "auto");

  % Train the network
  capturedDataNet = trainnet(xTrainingFrames,yTrain,layers,"crossentropy",options);
else
  load('rfFingerprintingCapturedDataTrainedNN_R2024b.mat', ...
    'capturedDataNet','xTestFrames','yTest','uniqueMACAddresses')
end
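The piecewise schedule configured above halves the learning rate every two epochs. As a rough illustration, the rate in effect during each of the 12 epochs can be tabulated as follows. This is a sketch of the schedule's arithmetic, not output taken from the training run.

```matlab
% Learning rate per epoch for the piecewise schedule configured above
initialLearnRate = 0.002;
dropFactor = 0.5;
dropPeriod = 2;
maxEpochs = 12;

epochs = 1:maxEpochs;
% The rate is multiplied by dropFactor after every dropPeriod epochs
learnRate = initialLearnRate*dropFactor.^floor((epochs-1)/dropPeriod);

disp(table(epochs.',learnRate.','VariableNames',{'Epoch','LearnRate'}))
```

The rate falls from 0.002 in the first two epochs to 0.002 × 0.5^5 = 6.25e-5 in the last two.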
In a training run on a computer with a single NVIDIA GeForce RTX 3080 GPU, the network converged in 12 epochs to 100% accuracy.
Generate the confusion matrix.
figure
yTestPred = predict(capturedDataNet,xTestFrames);
yLabel = scores2label(yTestPred,uniqueMACAddresses);
cm = confusionchart(yTest,yLabel);
cm.Title = 'Confusion Matrix for Test Data';
cm.RowSummary = 'row-normalized';
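The step from network scores to labels and an overall accuracy figure can be illustrated without the trained network. The sketch below uses a made-up score matrix and hypothetical class names, and it picks the highest-scoring class with max in place of scores2label so that it runs with base MATLAB only.

```matlab
% Toy illustration: convert class scores to labels and compute accuracy
classNames = categorical(["AAAAAAAAAAAA","BBBBBBBBBBBB","Unknown"]);  % hypothetical classes

% One row of scores per frame, one column per class (made-up values)
scores = [0.90 0.05 0.05
          0.20 0.70 0.10
          0.10 0.20 0.70
          0.60 0.30 0.10];

[~,idx] = max(scores,[],2);                 % index of the highest score per frame
yPred = reshape(classNames(idx),[],1);      % predicted labels as a column

yTrue = categorical(["AAAAAAAAAAAA";"BBBBBBBBBBBB";"Unknown";"BBBBBBBBBBBB"]);
accuracy = mean(yPred == yTrue);            % fraction of correct predictions
fprintf('Accuracy: %.2f\n',accuracy)
```

In this toy case the last frame is misclassified, so three of the four predictions are correct.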
Test with SDR
Test the performance of the trained network on the class "Unknown". Generate beacon frames with the MAC addresses of the known routers and one unknown router. Transmit these frames using one ADALM-PLUTO radio and receive them using another ADALM-PLUTO radio. Because the channel and RF impairments between these two radios differ from those between the real routers and the observer, the neural network should classify all of the received signals as "Unknown". If the received MAC address is a known one, the system declares the source a router impersonator. If the received MAC address is unknown, the system declares the source an unknown router. To perform this test, you need two ADALM-PLUTO radios, one for transmission and one for reception, and you must install the Communications Toolbox Support Package for Analog Devices ADALM-PLUTO Radio.
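The decision rule described above combines two pieces of evidence: the decoded MAC address and the label predicted from the RF fingerprint. A minimal, hardware-free sketch of that rule follows. The test cases and variable names are hypothetical, and the two known addresses are borrowed from the list used in the Waveform Generation section.

```matlab
% Sketch of the impersonation decision rule on hypothetical inputs
knownMACs = ["71B63A2D0B83","A3F8AC0F2253"];

% Each row: decoded MAC address, label predicted from the L-LTF fingerprint
cases = ["71B63A2D0B83" "71B63A2D0B83"     % known MAC, fingerprint matches
         "71B63A2D0B83" "Unknown"          % known MAC, fingerprint does not match
         "ABCDEFABCDEF" "Unknown"];        % MAC not in the known list

verdict = strings(size(cases,1),1);
for k = 1:size(cases,1)
    mac = cases(k,1);
    predictedLabel = cases(k,2);
    if ismember(mac,knownMACs)
        if predictedLabel == mac
            verdict(k) = "known router";
        else
            verdict(k) = "router impersonator";
        end
    else
        verdict(k) = "unknown router";
    end
end
disp([cases verdict])
```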
Waveform Generation
Generate a transmission waveform consisting of beacon frames with different MAC addresses. The transmitter repeatedly transmits these WLAN frames. The receiver captures the WLAN frames and determines whether the source is a router impersonator, using the received MAC address and the RF fingerprint detected by the trained NN.
chanBW = 'CBW20';    % Channel bandwidth
osf = 2;             % Oversampling factor
frameLength = 160;   % Frame length in samples

% Create Beacon frame-body configuration object
frameBodyConfig = wlanMACManagementConfig;

% Create Beacon frame configuration object
beaconFrameConfig = wlanMACFrameConfig('FrameType','Beacon');
beaconFrameConfig.ManagementConfig = frameBodyConfig;

% Create decimation object
decimator = dsp.FIRDecimator('DecimationFactor',osf);

% Save known MAC addresses
MACAddressesToSimulate = ["71B63A2D0B83"; "A3F8AC0F2253"; "EF11D125044A"; "F636A97E07E7"; "ABCDEFABCDEF"];

% Create WLAN waveform with the MAC addresses of known routers and an
% unknown router
txWaveform = zeros(1540,5);
for i = 1:length(MACAddressesToSimulate)
  % Set MAC address
  beaconFrameConfig.Address2 = MACAddressesToSimulate(i);

  % Generate Beacon frame bits
  [beacon, mpduLength] = wlanMACFrame(beaconFrameConfig,'OutputFormat','bits');

  nonHTcfg = wlanNonHTConfig('ChannelBandwidth',chanBW,"MCS",1,"PSDULength",mpduLength);

  % Append 20 zero samples after each frame
  txWaveform(:,i) = [wlanWaveformGenerator(beacon, nonHTcfg); zeros(20,1)];
end
txWaveform = txWaveform(:);

% Get center frequency for channel 153 in 5 GHz band
fc = wlanChannelFrequency(153,5);

fs = wlanSampleRate(nonHTcfg);

txSig = resample(txWaveform,osf,1);

% Samples per frame in burst mode
spf = length(txSig)/length(MACAddressesToSimulate);

runSDRSection = false;
if helperIsPlutoSDRInstalled()
  radios = findPlutoRadio();
  if length(radios) >= 2
    runSDRSection = true;
  else
    disp("Two ADALM-PLUTO radios are needed. Skipping SDR test.")
  end
else
  disp("Communications Toolbox Support Package for Analog Devices ADALM-PLUTO Radio not found.")
  disp("Click Add-Ons in the Home tab of the MATLAB toolstrip to install the support package.")
  disp("Skipping SDR test.")
end
Communications Toolbox Support Package for Analog Devices ADALM-PLUTO Radio not found.
Click Add-Ons in the Home tab of the MATLAB toolstrip to install the support package.
Skipping SDR test.
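The numeric quantities in the waveform generation code can be checked by hand. The sketch below reproduces them with plain arithmetic and no WLAN Toolbox calls. It assumes the standard 5 GHz channel mapping (5000 MHz plus 5 MHz per channel number), a 20 MHz sample rate for CBW20, and an 8 µs L-LTF. The 1540-sample frame length is taken from the txWaveform preallocation above.

```matlab
% Worked arithmetic for the waveform parameters
channelNumber = 153;
fc = 5000e6 + 5e6*channelNumber;            % center frequency in Hz
fs = 20e6;                                  % CBW20 sample rate in Hz
osf = 2;                                    % oversampling factor
numMACs = 5;                                % four known addresses plus one unknown
samplesPerBeacon = 1540;                    % 1520 waveform samples + 20 zeros

lltfSamples = round(8e-6*fs);               % L-LTF length at 20 MHz
txLength = samplesPerBeacon*numMACs;        % samples before resampling
txSigLength = txLength*osf;                 % samples after upsampling by osf
spf = txSigLength/numMACs;                  % samples per frame at fs*osf
burstDuration = txLength/fs;                % duration of one pass through all frames

fprintf('fc = %.3f GHz, L-LTF = %d samples, spf = %d, burst = %.0f us\n', ...
    fc/1e9,lltfSamples,spf,burstDuration*1e6)
```

The 160-sample L-LTF length matches the frameLength value used for the network input.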
if runSDRSection
  % Set up PlutoSDR transmitter
  deviceNameSDR = 'Pluto';
  txGain = 0;
  txSDR = sdrtx(deviceNameSDR);
  txSDR.RadioID = 'usb:0';
  txSDR.BasebandSampleRate = fs*osf;
  txSDR.CenterFrequency = fc;
  txSDR.Gain = txGain;

  % Set up PlutoSDR receiver
  rxSDR = sdrrx(deviceNameSDR);
  rxSDR.RadioID = 'usb:1';
  rxSDR.BasebandSampleRate = txSDR.BasebandSampleRate;
  rxSDR.CenterFrequency = txSDR.CenterFrequency;
  rxSDR.GainSource = 'Manual';
  rxSDR.Gain = 30;
  rxSDR.OutputDataType = 'double';
  rxSDR.EnableBurstMode = true;
  rxSDR.NumFramesInBurst = 20;
  rxSDR.SamplesPerFrame = osf*spf;
end
L-LTF for Classification
The L-LTF sequence present in each beacon frame preamble is used as the input unit to the NN. The rfFingerprintingNonHTFrontEnd System object™ detects the WLAN packets, performs synchronization, and extracts the L-LTF sequences and data. The MAC address is then decoded, and the L-LTF data is preprocessed and classified using the trained network.
if runSDRSection
  numLLTF = 20;         % Number of L-LTF captured for testing

  rxFrontEnd = rfFingerprintingNonHTFrontEnd('ChannelBandwidth','CBW20');

  % knownMACAddresses is assumed here to be the trained classes other
  % than "Unknown"
  knownMACAddresses = string(uniqueMACAddresses(uniqueMACAddresses ~= "Unknown"));

  disp("The known MAC addresses are:");
  disp(knownMACAddresses)

  % Set PlutoSDR to transmit repeatedly
  disp('Starting transmitter')
  transmitRepeat(txSDR, txSig);

  % Captured frames counter
  numCapturedFrames = 0;

  disp('Starting receiver')
  % Loop until numLLTF frames are collected
  while numCapturedFrames < numLLTF
    % Receive data using PlutoSDR
    rxSig = rxSDR();
    rxSig = decimator(rxSig);

    % Perform front-end processing and payload buffering
    [payloadFull,cfgNonHT,rxNonHTData,chanEst,noiseVar,LLTF] = rxFrontEnd(rxSig);

    if payloadFull
      % Recover payload bits
      recBits = wlanNonHTDataRecover(rxNonHTData,chanEst,noiseVar,cfgNonHT,'EqualizationMethod','ZF');

      % Decode and evaluate recovered bits
      [mpduCfg,~,success] = wlanMPDUDecode(recBits,cfgNonHT);

      if success == wlanMACDecodeStatus.Success
        % Update counter
        numCapturedFrames = numCapturedFrames+1;

        % Create real-valued input
        LLTF = [real(LLTF),imag(LLTF)];
        LLTF = permute(reshape(LLTF,frameLength,[],2,1), [1 3 4 2]);

        % Classify the L-LTF with predict and scores2label, as in the
        % confusion matrix code
        scores = predict(capturedDataNet,LLTF);
        ypred = scores2label(scores,uniqueMACAddresses);

        if sum(contains(knownMACAddresses, mpduCfg.Address2)) ~= 0
          if categorical(convertCharsToStrings(mpduCfg.Address2)) ~= ypred
            disp(strcat("MAC Address ",mpduCfg.Address2," is known, fingerprint mismatch, ROUTER IMPERSONATOR DETECTED"))
          else
            disp(strcat("MAC Address ",mpduCfg.Address2," is known, fingerprint match"))
          end
        else
          disp(strcat("MAC Address ",mpduCfg.Address2," is not recognized, unknown device"));
        end
      end
    end
  end
  release(txSDR)
end
Further Exploration
Capture data from your own routers as explained in Appendix: Known and Unknown Router Data Collection, train the neural network with this data, and test the performance of the network.
Appendix: Known and Unknown Router Data Collection
Use rfFingerprintingRouterDataCollection to collect data from known (that is, trusted) routers. This function extracts the L-LTF signals present in 802.11a/g/n/ac OFDM non-HT beacon frames transmitted from commercial 802.11 hardware. For more information, see the WLAN Beacon Receiver Using Software-Defined Radio (WLAN Toolbox) example. The L-LTF signals and the corresponding router MAC addresses are used to train the RF fingerprinting neural network. This method works best if the routers and their antennas are fixed and hard to move unintentionally. For example, in most office environments, routers are mounted on the ceiling. Follow these steps:
1. Connect an ADALM-PLUTO radio to your PC to use as the observer radio.
2. Place the radio in a central location where it can receive signals from as many routers as possible. Fix the radio so that it does not move. If possible, place the observer radio on the ceiling or high on a wall.
3. Determine the channel number of the routers. You can use a Wi-Fi® analyzer app on your phone to find out the channel numbers.
4. Start data collection by running "rfFingerprintingRouterDataCollection(channel)", where channel is the Wi-Fi channel number.
5. Monitor the "max(abs(LLTF))" value. If it is above 1.2 or below 0.01, adjust the gain of the receiver using the GAIN input of the rfFingerprintingRouterDataCollection function.
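The gain check in the last step can be expressed as a simple range test. The sketch below runs it on a synthetic L-LTF stand-in with a peak magnitude of 0.5. The thresholds come from the step above, while the direction of the suggested adjustment (lower the gain when the peak is too high, raise it when too low) is an assumption made for this illustration.

```matlab
% Sketch of the receive-level check on a synthetic signal
upperLimit = 1.2;
lowerLimit = 0.01;

% Synthetic stand-in for a captured L-LTF: constant-magnitude complex tone
n = (0:159).';
LLTF = 0.5*exp(1i*2*pi*n/64);

peak = max(abs(LLTF));
if peak > upperLimit
    advice = "decrease receiver gain";    % assumed direction of adjustment
elseif peak < lowerLimit
    advice = "increase receiver gain";    % assumed direction of adjustment
else
    advice = "gain is within range";
end
fprintf('max(abs(LLTF)) = %.2f: %s\n',peak,advice)
```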
Use the helper functions rfFingerprintingUnknownClassDataCollectionTx and rfFingerprintingUnknownClassDataCollectionRx to collect data from unknown routers. These functions set two ADALM-PLUTO radios to transmit and receive L-LTF signals. The received signals are combined with the known router signals to train the neural network. You need two ADALM-PLUTO radios, preferably connected to two separate PCs. Follow these steps:
1. Connect an ADALM-PLUTO radio to a stationary PC to act as the unknown router.
2. Start transmissions by running "rfFingerprintingUnknownClassDataCollectionTx".
3. Connect another ADALM-PLUTO radio to a mobile PC to act as the observer.
4. Start data collection by running "rfFingerprintingUnknownClassDataCollectionRx". This function by default collects 200 frames per location. Each location represents a different unknown router.
5. When the function instructs you to move to a new location, move the observer radio to a new location. By default, this function collects data from 10 locations.
6. If the observer does not receive any beacons or it rarely receives beacons, move the observer closer to the transmitter.
7. Once the data collection is done, call "release(sdrTransmitter)" in the transmitting radio's MATLAB® session.
Selected Bibliography
[1] K. Sankhe, M. Belgiovine, F. Zhou, S. Riyaz, S. Ioannidis and K. Chowdhury, "ORACLE: Optimized Radio clAssification through Convolutional neuraL nEtworks," IEEE INFOCOM 2019 - IEEE Conference on Computer Communications, Paris, France, 2019, pp. 370-378.